BERT Sentiment Analysis by Topics (13.04.2022.)



Hazel Y. 2022. 4. 22. 12:16

In the previous post, I ran topic modeling with BERT and obtained 7 topics from the successful product reviews and 3 topics from the unsuccessful ones. In this post, I run sentiment analysis on each of these topics.

* I'll use the BERT model that I fine-tuned in the following post.

https://livelyhheesun.tistory.com/18

BERT Sentiment Analysis - 1 (13.04.2022.)


1. Import necessary packages.

import pandas as pd

from nltk.tokenize import sent_tokenize
import nltk

import tensorflow as tf  # needed in step 5 (tf.nn.softmax, tf.argmax)

import matplotlib.pyplot as plt

2. Download the package 'punkt'.

nltk.download('punkt')

3. Make lists of topic keywords for each topic.

# successful product reviews
stopic0 = [' curl ', ' straight ', ' curler ', ' curled ', ' lashes ', ' curling ', ' curls ', ' hold ', ' eyelash ', ' eyelashes ']
stopic1 = [' brush ', ' lashes ', ' separates ', ' love ', ' coat ', ' product ', ' bristles ', ' like ', ' really ', ' one ']
stopic2 = [' size ', ' sample ', ' received ', ' full ', ' got ', ' birthday ', ' gift ', ' free ', ' sephora ', ' love ']
stopic3 = [' clump ', ' clumps ', ' clumpy ', ' makes ', ' eyelashes ', ' clumping ', ' look ', ' mascara ', ' lashes ', ' long ']
stopic4 = [' eyes ', ' sensitive ', ' itchy ', ' red ', ' irritate ', ' contacts ', ' allergies ', ' allergic ', ' itch ', ' burn ']
stopic5 = [' volume ', ' length ', ' gives ', ' adds ', ' mascara ', ' lengthens ', ' favorite ', ' love ', ' great ', ' ever ']
stopic6 = [' raccoon ', ' eyes ', ' day ', ' end ', ' hours ', ' give ', ' hour ', ' like ', ' racoon ', ' mascara ']

# unsuccessful product reviews
utopic0 = [' lashes ', ' mascara ', ' remover ', ' makeup ', ' day ', ' remove ', ' eye ', ' like ', ' curl ', ' use ']
utopic1 = [' mascara ', ' good ', ' one ', ' even ', ' product ', ' clumpy ', ' store ', ' worst ', ' get ', ' dried ']
utopic2 = [' lashes ', ' mascara ', ' long ', ' length ', ' worth ', ' look ', ' never ', ' best ', ' volume ', ' lengthen ']
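Several keywords appear in more than one list (' lashes ', for instance), so a single sentence can end up under several topics in the next step. A minimal sketch for checking this overlap, using three of the lists above; the helper name `shared_keywords` is made up for illustration and is not part of the original analysis.

```python
# Three of the keyword lists from above (padding spaces kept as in the post).
stopic0 = [' curl ', ' straight ', ' curler ', ' curled ', ' lashes ',
           ' curling ', ' curls ', ' hold ', ' eyelash ', ' eyelashes ']
stopic3 = [' clump ', ' clumps ', ' clumpy ', ' makes ', ' eyelashes ',
           ' clumping ', ' look ', ' mascara ', ' lashes ', ' long ']
stopic5 = [' volume ', ' length ', ' gives ', ' adds ', ' mascara ',
           ' lengthens ', ' favorite ', ' love ', ' great ', ' ever ']


def shared_keywords(a, b):
    """Return the keywords that two topic lists have in common, sorted."""
    return sorted(set(a) & set(b))


print(shared_keywords(stopic0, stopic3))
print(shared_keywords(stopic3, stopic5))
print(shared_keywords(stopic0, stopic5))
```

Topics that share keywords will also share sentences, which is worth keeping in mind when comparing their sentiment ratios.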

4. Make a list of sentences that contain the topic keywords.

# successful product reviews - topic 0
# s_sentences is not defined in this post; it is assumed to be the list of
# sentences from the successful product reviews, prepared beforehand.
# Keep each sentence that contains at least one topic keyword; the set drops duplicates.
stopic_sentences0 = list({s for s in s_sentences if any(k in s for k in stopic0)})

len(stopic_sentences0)
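Because the keywords are padded with spaces, a check like `' curl ' in sentence` misses the word when it sits at the start or end of a sentence or next to punctuation ("Great curl."). One possible alternative, not the method used in this post, is to strip the padding and match whole words with a regular expression. The function name `sentences_for_topic` and the sample sentences are made up for illustration, and the case-insensitive matching is my own assumption.

```python
import re


def sentences_for_topic(sentences, keywords):
    """Return the unique sentences that contain any keyword as a whole word."""
    # Strip the padding spaces and rely on \b word boundaries instead.
    words = [re.escape(k.strip()) for k in keywords]
    # IGNORECASE is an assumption; drop it to keep matching case-sensitive.
    pattern = re.compile(r'\b(?:' + '|'.join(words) + r')\b', re.IGNORECASE)

    seen = set()
    matched = []
    for sentence in sentences:
        if sentence not in seen and pattern.search(sentence):
            seen.add(sentence)
            matched.append(sentence)
    return matched


sample = [
    'It holds a curl all day.',
    'Great curl.',
    'Great curl.',
    'The curly wand is odd.',
]
print(sentences_for_topic(sample, [' curl ', ' hold ']))
```

Unlike `list(set(...))`, this version also keeps the sentences in their original order, which makes the later label list easier to trace back.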

5. Make a list of predicted sentiment labels for the sentences. (0: negative, 1: positive)

# tokenizer and model are the fine-tuned BERT tokenizer and model from the post linked above.
st0_labels = []
batch_size = 5

# Step through the sentences in batches of 5; the last batch may be shorter.
for start in range(0, len(stopic_sentences0), batch_size):

  st_list = stopic_sentences0[start:start + batch_size]

  tf_batch = tokenizer(st_list, max_length=128, padding=True, truncation=True, return_tensors='tf')
  tf_outputs = model(tf_batch)
  tf_predictions = tf.nn.softmax(tf_outputs[0], axis=-1)
  label = tf.argmax(tf_predictions, axis=1).numpy()

  # the argmax index is the label itself (0: negative, 1: positive)
  st0_labels.extend(int(l) for l in label)
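The prediction step turns the two logits the model returns for each sentence into probabilities with softmax and then takes the index of the larger one. A minimal pure-Python sketch of that arithmetic follows; the logits are made-up values for illustration only, since the real ones come from the fine-tuned model.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return the index of the most probable class (0: negative, 1: positive)."""
    probs = softmax(logits)
    return probs.index(max(probs))


# Made-up logits for two sentences: [negative score, positive score]
example_logits = [[-1.2, 2.3], [0.8, -0.4]]
print([predict_label(row) for row in example_logits])
```

Softmax preserves the ordering of the scores, so the predicted label would be the same if argmax were applied to the logits directly; the probabilities are mainly useful when a confidence value is wanted.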

6. Visualization

reaction = ['positive', 'negative']
pos = st0_labels.count(1) / len(st0_labels)  # share of sentences predicted positive
neg = 1 - pos
values = [pos, neg]

plt.bar(reaction, values, color=['b', 'r'])

# write each ratio above its bar
for i, v in enumerate(reaction):
    plt.text(v, values[i], round(values[i], 3), fontsize=9, color='blue', horizontalalignment='center', verticalalignment='bottom')

plt.title('BERT-Predicted Sentiments for Successful Product Review Sentences (Topic0=curliness)', fontsize=15)
plt.show()

Repeating the code above for each topic gives the following visualizations.
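Rather than copying the code for every topic, steps 4 and 5 could be wrapped in a function and looped over the topics. The sketch below is only one way to do it: `positive_ratio_by_topic` and `predict_labels` are hypothetical names, the stub predictor stands in for the fine-tuned BERT model, and the sentences are made up.

```python
def positive_ratio_by_topic(sentences, topics, predict_labels):
    """Map each topic name to the share of its sentences predicted positive.

    topics: dict of topic name -> list of keywords (padded as in the post)
    predict_labels: callable taking a list of sentences, returning 0/1 labels
    """
    ratios = {}
    for name, keywords in topics.items():
        # Unique sentences that contain at least one keyword of this topic.
        matched = sorted({s for s in sentences if any(k in s for k in keywords)})
        if not matched:
            ratios[name] = None  # no sentence matched this topic
            continue
        labels = predict_labels(matched)
        ratios[name] = sum(labels) / len(labels)
    return ratios


def stub_predict(batch):
    # Placeholder for the BERT model: "positive" whenever 'love' appears.
    return [1 if 'love' in s else 0 for s in batch]


sentences = [
    'i love the curl it gives',
    'my eyes get itchy and red',
    'no curl at all',
]
topics = {'curliness': [' curl '], 'allergic reaction': [' itchy ']}
print(positive_ratio_by_topic(sentences, topics, stub_predict))
```

In the real analysis, `predict_labels` would wrap the tokenizer and model calls from step 5, and the returned ratios could be passed straight to the bar chart code from step 6.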

The sentiment ratio of topic 0 of the successful product reviews

The sentiment ratio of topic 1 of the successful product reviews

The sentiment ratio of topic 2 of the successful product reviews

The sentiment ratio of topic 3 of the successful product reviews

The sentiment ratio of topic 4 of the successful product reviews

The sentiment ratio of topic 5 of the successful product reviews

The sentiment ratio of topic 6 of the successful product reviews

All the topics from the successful product reviews show more positive than negative sentiment. Topic 2, 'promotional event', has the highest ratio of positive sentiment, while topic 4, 'allergic reaction', has the lowest. Read together with the framework for managerial implications introduced in the following post, this suggests that, to make a mascara product successful, a firm needs to plan its sales promotion strategy well or benchmark the strategies of products that have already succeeded.

https://livelyhheesun.tistory.com/17

Sentiment Analysis for Each Topic (30.03.2022.)


Also, even the successful products show some issues with allergic reactions or irritation after application, so a firm that invests more in R&D and improves its mascara in this respect would gain an advantage.

The sentiment ratio of topic 0 of the unsuccessful product reviews

The sentiment ratio of topic 1 of the unsuccessful product reviews

The sentiment ratio of topic 2 of the unsuccessful product reviews

The visualizations for the unsuccessful product reviews show a higher ratio of negative sentiment overall than those for the successful products. Applying the same framework, topic 2, 'volumizing & lengthening', shows the highest ratio of positive sentiment, which suggests that this feature is essential for a mascara product. However, compared with the topic of the same name from the successful product reviews, the unsuccessful reviews contain a higher ratio of negative sentences. This may mean that the volumizing and lengthening effect is a must-have for a mascara product and, at the same time, one of the features that help decide whether the product succeeds, so it should be developed carefully.

Topic 1, 'overall impression', has the highest ratio of negative sentiment, which means that what it reflects (a poor overall evaluation of the product) should be avoided. This is fairly obvious, because a product can hardly succeed when consumers' overall impression of it is bad. A firm can try many things to improve the situation: re-establishing its sales promotion strategy or investing more in dermatological R&D, as mentioned above, improving the volumizing and lengthening effect, or making the product easier to remove. Beyond these examples, a firm can of course focus on whatever it currently lacks or needs to improve.

* Unauthorized copying and distribution of this post are not allowed.
