
Pilot Study - 2 (16.03.2022.)

Hazel Y. 2022. 4. 18. 23:19

Following the previous post, this is the second part of the pilot study. Here, I'm going to explain the results of the semantic network analysis, topic modeling, phrase frequency analysis, and supervised learning sentiment analysis.


1. Semantic network analysis

 

  • Successful product reviews


  • length - volume → quality, function
  • like - look - eyelashes, makes, long, product, really, love → Consumers like (love) products that make their eyelashes longer.

  • Unsuccessful product reviews


  • really - tried - remover, use - one - even - product - curl
  • waterproof

→ Consumers really tried to remove the product (its curl) from their lashes, but even the remover couldn't do so.
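For reference, below is a minimal sketch of how such a word co-occurrence network could be built with networkx. It is not the exact code from the linked post; the file name 'successful_reviews.csv', the column name 'review_text', and the edge-count threshold are all assumptions made only for illustration.

from collections import Counter
from itertools import combinations

import pandas as pd
import networkx as nx
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Load review texts (file and column names are assumptions).
reviews = pd.read_csv("successful_reviews.csv")["review_text"].dropna()
stop_words = set(stopwords.words("english"))

# Count how often each pair of words appears in the same review.
pair_counts = Counter()
for text in reviews:
    tokens = {w for w in word_tokenize(text.lower())
              if w.isalpha() and w not in stop_words}
    pair_counts.update(combinations(sorted(tokens), 2))

# Keep only frequent pairs and build the network.
G = nx.Graph()
for (w1, w2), count in pair_counts.items():
    if count >= 30:  # arbitrary threshold, just for the sketch
        G.add_edge(w1, w2, weight=count)

# Words with the highest degree centrality sit at the core of the network.
print(sorted(nx.degree_centrality(G).items(), key=lambda x: x[1], reverse=True)[:10])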

 

For the semantic network analysis code, please go to the following post.

https://livelyhheesun.tistory.com/12

 



2. Topic modeling

 

  • Successful product reviews

The higher the coherence value is, the better the topic results are.

  • 5 topics

Topic 1 of the successful product reviews

< Topic 1 >

  • lash, curl, make, look, long, lengthen, clump, well, coat, definitely, apply, really, flake, hold
    • positive effects of the products

Topic 2 of the successful product reviews

< Topic 2 >

  • try, use, buy, find, sample, size
    • what made consumers buy the products

Topic 3 of the successful product reviews

< Topic 3 >

  • lash, look, love, really, long, clump, make, volume, great, much, length, well, also, flake, clumpy, eyelash, brush, give, curl, good, wand, wear, add
    • both positive and negative effects of the products

Topic 4 of the successful product reviews

< Topic 4 >

  • well, love, clump, smudge, lengthen, flake, coat, black, worth, money
    • effects of the products and consumers' thoughts about them (e.g. Is it worth it?)

Topic 5 of the successful product reviews

< Topic 5 >

  • lash, look, curl, make, long, day, work, dry, time, formula, smudge, flake, last
    • long-lasting feature of the products

  • Unsuccessful product reviews

The higher the coherence value is, the better the topic results are.

  • 4 topics

Topic 1 of the unsuccessful product reviews

< Topic 1 >

  • waterproof, remove, curl, oil, hard
    • waterproof feature of the products and whether they are easy to remove from consumers' lashes

Topic 2 of the unsuccessful product reviews

< Topic 2 >

  • remove, curl, volume, eye, make, brush, long, great, good, smudge, apply
    • effects of the products

Topic 3 of the unsuccessful product reviews

< Topic 3 >

  • make, look, dry, clumpy, never
    • negative effects of the products

Topic 4 of the unsuccessful product reviews

< Topic 4 >

  • review, recommend, star, hard
    • whether consumers are willing to recommend the products to others
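For reference, here is a minimal sketch of how topics and coherence values like the ones above can be obtained with gensim. The toy token lists and the parameter values (passes, random_state, the range of topic numbers) are assumptions for illustration, not the settings from the linked post.

import gensim
import gensim.corpora as corpora
from gensim.models import CoherenceModel

# Toy tokenized reviews stand in for the real preprocessed data.
tokenized_reviews = [
    ["lash", "curl", "long", "love"],
    ["clump", "flake", "smudge", "coat"],
    ["waterproof", "remove", "hard", "oil"],
    ["lash", "volume", "length", "great"],
] * 25

id2word = corpora.Dictionary(tokenized_reviews)
corpus = [id2word.doc2bow(tokens) for tokens in tokenized_reviews]

# Fit models with different numbers of topics and compare coherence (higher is better).
for num_topics in range(3, 8):
    lda = gensim.models.LdaModel(corpus=corpus, id2word=id2word,
                                 num_topics=num_topics, random_state=42, passes=10)
    coherence = CoherenceModel(model=lda, texts=tokenized_reviews,
                               dictionary=id2word, coherence="c_v").get_coherence()
    print(num_topics, "topics -> coherence:", round(coherence, 3))

# Inspect the words of each topic for the chosen model (e.g. 5 topics).
best_lda = gensim.models.LdaModel(corpus=corpus, id2word=id2word,
                                  num_topics=5, random_state=42, passes=10)
for topic_id, words in best_lda.print_topics(num_words=10):
    print(topic_id, words)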

Since this was my first experience conducting topic modeling and naming topics, I failed to name them in just one or two words. So, the next time I do topic modeling, I'll try to give each topic a concise name.

Anyway, for the topic modeling code using LDA, please go to the following post.

https://livelyhheesun.tistory.com/13

 


 


3. Phrase frequency analysis

 

  • Successful product reviews

20 most frequently mentioned 2-word phrases in successful product reviews

  • Appearance of lashes after applying the products
    • lashes look
    • makes lashes
    • long lashes
    • look like
    • curl lashes
    • volume length
    • eyelash curler
    • make lashes
    • length volume
    • lashes long
    • separates lashes
  • Positive emotions
    • best ever
    • love love
    • really like
  • Long-lasting feature of the products
    • end day

There were several phrases, such as 'ever used', 'full size', and 'tried many', that are neutral or whose context can't be determined from the two words alone. So, the professor advised me to do a keyword analysis next time: first, extract the sentences containing a certain keyword, and then compute and compare the percentages of positive and negative sentences among them. I will explain the method and show the results of that analysis in one of the next posts.
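As a rough illustration of that keyword analysis idea, here is a minimal sketch in pandas. It assumes each sentence already carries a sentiment label (in the real analysis, the labels would come from ratings or a sentiment model); the toy data, column names, and the helper keyword_share are hypothetical.

import pandas as pd

# Toy data: sentences with sentiment labels (made up for illustration).
df = pd.DataFrame({
    "sentence":  ["the full size is definitely worth the money",
                  "tried many mascaras but this one flakes",
                  "best mascara I have ever used",
                  "the full size dried out within a month"],
    "sentiment": ["positive", "negative", "positive", "negative"],
})

def keyword_share(data, keyword):
    """Share of positive vs. negative sentences among those mentioning the keyword."""
    hits = data[data["sentence"].str.contains(keyword, case=False)]
    return hits["sentiment"].value_counts(normalize=True)

print(keyword_share(df, "full size"))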


  • Unsuccessful product reviews

20 most frequently mentioned 2-word phrases in unsuccessful product reviews

  • Problems when removing the products
    • makeup remover
    • oil based
    • hard remove
    • make remover
    • impossible remove
    • micellar water
  • Appearance of lashes after applying the products
    • lashes look
    • makes lashes
    • look like
    • curl lashes
    • lashes really
  • Negative emotions
    • worst ever
    • would (not) recommend
    • many negative
  • Long-lasting feature of the products
    • throughout day
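For reference, a minimal sketch of how such 2-word phrase counts can be produced with scikit-learn's CountVectorizer is shown below. The sample reviews are made up for illustration, and this is not necessarily how the linked post implements the analysis.

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Made-up reviews just to show the mechanics.
reviews = pd.Series([
    "makes lashes look so long",
    "my lashes look amazing, the best mascara ever used",
    "impossible to remove even with makeup remover",
])

# ngram_range=(2, 2) counts 2-word phrases; stop words are removed first,
# which is why phrases like "impossible remove" appear without "to".
vectorizer = CountVectorizer(ngram_range=(2, 2), stop_words="english")
counts = vectorizer.fit_transform(reviews)

freq = pd.Series(counts.toarray().sum(axis=0), index=vectorizer.get_feature_names_out())
print(freq.sort_values(ascending=False).head(20))   # 20 most frequent 2-word phrases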

For the phrase frequency analysis code, please go to the following post.

https://livelyhheesun.tistory.com/14

 



4. Supervised learning sentiment analysis

 

  • Successful product reviews
    • CountVectorizer
      • accuracy score: 0.731
      • AUC score: 0.866
    • TfidfVectorizer
      • accuracy score: 0.748
      • AUC score: 0.885
  • Unsuccessful product reviews
    • CountVectorizer
      • accuracy score: 0.659
      • AUC score: 0.707
    • TfidfVectorizer
      • accuracy score: 0.707
      • AUC score: 0.780
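For context, the sketch below shows the kind of pipeline that produces such accuracy and AUC scores: a vectorizer followed by a classifier. The logistic regression classifier and the toy labelled reviews are assumptions; the linked post may use a different model and data split.

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy labelled reviews: 1 = positive, 0 = negative.
texts = ["love how long my lashes look", "impossible to remove and it clumps",
         "adds great volume and length", "flakes and smudges by midday"] * 25
labels = [1, 0, 1, 0] * 25

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels)

for name, vectorizer in [("CountVectorizer", CountVectorizer()),
                         ("TfidfVectorizer", TfidfVectorizer())]:
    model = make_pipeline(vectorizer, LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    prob = model.predict_proba(X_test)[:, 1]   # probability of the positive class
    print(name, "accuracy:", round(accuracy_score(y_test, pred), 3),
          "AUC:", round(roc_auc_score(y_test, prob), 3))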

For both the successful and unsuccessful product reviews, the TfidfVectorizer works better than the CountVectorizer. However, the AUC scores are still not that high. Of course, the score for the successful product reviews is pretty high and shows excellent discrimination ability, but for the unsuccessful product reviews, the discrimination ability is just acceptable. Hence, I decided to conduct a supervised learning sentiment analysis using BERT, which is well known for its high accuracy, and I will post how it can be done.

For the supervised learning sentiment analysis code, please go to the following post.

https://livelyhheesun.tistory.com/15

 


 

 

 

* Unauthorized copying and distribution of this post are not allowed.