Bachelor of Business Administration @PNU/Marketing Analytics

Phrase Frequency Analysis (16.03.2022.)

Hazel Y. 2022. 4. 17. 10:23

One piece of feedback my professor gave me was to run the frequency analysis on phrases (groups of two or three words) rather than on single words. For example, the analysis below shows the frequency of the phrase 'best ever' instead of just the word 'ever'. This helps a researcher better understand the context in which a word is actually used, and thus what it means.

The following are the steps and code for a phrase frequency analysis with groups of two words.


1. Import necessary packages.

import re

import matplotlib.pyplot as plt
import pandas as pd
from nltk.corpus import stopwords  # run nltk.download('stopwords') once if the corpus is not installed

 

2. Import datasets.

suc = pd.read_csv('suc.csv')
un = pd.read_csv('un.csv')

 

3. Define a function for preprocessing.

def data_text_cleaning(data):

    # Replace every character that is not an English letter with a space
    only_english = re.sub('[^a-zA-Z]', ' ', data)

    # Convert to lowercase and split into words
    no_capitals = only_english.lower().split()

    # Remove stopwords (NLTK's English list plus product and brand names)
    stops = set(stopwords.words('english'))
    added_stops = {'mascara', 'mascaras', 'tarte', 'ilia', 'benefit', 'cosmetics', 'sephora', 'collection', 'lara', 'devgan', 'scientific', 'beauty', 'guerlain', 'clinique'}
    no_stops = [word for word in no_capitals if word not in stops]
    no_stops = [word for word in no_stops if word not in added_stops]

    return no_stops
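
To make the effect of the cleaning step concrete, here is a minimal sketch that applies the same three operations to one made-up review. The names `DEMO_STOPS`, `DEMO_ADDED_STOPS`, and `demo_text_cleaning` are hypothetical, and the tiny hard-coded stopword set only stands in for NLTK's full English list so that the example runs without downloading the corpus.

```python
import re

# Hypothetical mini stopword set; the post uses NLTK's full English list instead.
DEMO_STOPS = {"this", "is", "the", "i", "have", "it", "my"}
# A small subset of the post's domain-specific stopwords.
DEMO_ADDED_STOPS = {"mascara", "mascaras"}

def demo_text_cleaning(text):
    # Replace every non-letter character (digits, punctuation) with a space.
    only_english = re.sub('[^a-zA-Z]', ' ', text)
    # Lowercase the text and split it into words.
    words = only_english.lower().split()
    # Drop general and domain-specific stopwords.
    return [w for w in words if w not in DEMO_STOPS and w not in DEMO_ADDED_STOPS]

cleaned = demo_text_cleaning("This is the BEST mascara I have ever used!!! 10/10")
print(cleaned)  # ['best', 'ever', 'used']
```

Digits and punctuation disappear along with the stopwords, so only the content words that can form phrases such as 'best ever' remain.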

 

4. Preprocess the reviews.

# successful product reviews
# Clean each review (str() also turns missing values into text)
suc['review'] = suc['review'].apply(lambda r: data_text_cleaning(str(r)))
# Flatten the per-review word lists into one long list
sreview_list = [word for words in suc['review'] for word in words]

# unsuccessful product reviews
un['review'] = un['review'].apply(lambda r: data_text_cleaning(str(r)))
ureview_list = [word for words in un['review'] for word in words]

 

5. Group two words together as a phrase.

# successful product reviews
# Words are paired without overlap: positions (0, 1), (2, 3), ... of the flattened list
sreview_2plist = []

for i in range(len(sreview_list) // 2):

    suc_2phrase = ' '.join(sreview_list[i*2:i*2+2])
    sreview_2plist.append(suc_2phrase)

# unsuccessful product reviews
ureview_2plist = []

for i in range(len(ureview_list) // 2):

    un_2phrase = ' '.join(ureview_list[i*2:i*2+2])
    ureview_2plist.append(un_2phrase)
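
Step 5 pairs the words at positions (0, 1), (2, 3), and so on in one flattened list, so each word belongs to exactly one phrase and a pair can straddle two reviews. A common alternative for this kind of counting is an overlapping (sliding) window applied to each review separately. The sketch below illustrates that alternative. It is not the method used in this post, its counts would differ from the results reported here, and `sliding_bigrams` and `demo_reviews` are hypothetical names with made-up data.

```python
from collections import Counter

def sliding_bigrams(words):
    # Pair each word with its successor: [a, b, c] -> ['a b', 'b c'].
    return [' '.join(pair) for pair in zip(words, words[1:])]

# Hypothetical cleaned reviews, one word list per review.
demo_reviews = [
    ['best', 'ever', 'used'],
    ['lashes', 'look', 'great'],
    ['best', 'ever', 'tried'],
]

counts = Counter()
for words in demo_reviews:
    # Counting review by review keeps phrases from crossing review boundaries.
    counts.update(sliding_bigrams(words))

print(counts.most_common(1))  # [('best ever', 2)]
```

With the non-overlapping grouping, 'best ever' is counted only when 'best' happens to fall on an even position of the flattened list. The sliding window counts it wherever the two words are adjacent within a review.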

 

6. Show the 20 most frequent phrases.

# successful product reviews
top_s2phrases = pd.Series(sreview_2plist).value_counts().head(20)
print("Top 20 phrases (with two words) from successful product reviews")
print(top_s2phrases)

# unsuccessful product reviews
top_u2phrases = pd.Series(ureview_2plist).value_counts().head(20)
print("Top 20 phrases (with two words) from unsuccessful product reviews")
print(top_u2phrases)

 

7. Visualization (bar chart)

# successful product reviews
top_s2phrases.sort_values().plot(kind='barh', title='successful product reviews phrases counter (with two words)')
plt.show()

# unsuccessful product reviews
top_u2phrases.sort_values().plot(kind='barh', title='unsuccessful product reviews phrases counter (with two words)')
plt.show()

As mentioned above and as the code shows, this analysis grouped two words at a time. With slight changes to the code, however, the number of words in a group can be increased. So I also ran the frequency analysis with 3-word groups; the results are summarized below.
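
The post does not show the modified code for three words, so the following is only a sketch of what the change might look like: the hard-coded 2 from step 5 becomes a parameter `n`. The function name `group_words` and the sample word list are hypothetical.

```python
def group_words(words, n):
    # Join consecutive, non-overlapping groups of n words into phrases.
    # Leftover words at the end (fewer than n) are dropped, as in step 5.
    return [' '.join(words[i * n:i * n + n]) for i in range(len(words) // n)]

# Hypothetical cleaned word list.
demo_words = ['makes', 'lashes', 'look', 'long', 'love', 'love', 'love']

print(group_words(demo_words, 2))  # ['makes lashes', 'look long', 'love love']
print(group_words(demo_words, 3))  # ['makes lashes look', 'long love love']
```

Applied to the word lists from step 4, a call such as `group_words(sreview_list, 3)` would produce the 3-word phrases, which can then be counted and plotted exactly as in steps 6 and 7.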


20 most frequently mentioned 3-word phrases in successful product reviews

  • Description of how consumers' lashes look after applying the products
    • makes lashes look
    • lashes look long
    • makes eyelashes look
    • make lashes look
    • made lashes look
    • use eyelash curler
    • wearing false lashes
    • lashes look great
    • lashes look longer
    • naturally long lashes
    • lashes look amazing
  • Positive emotions
    • best ever used
    • love love love
    • would definitely recommend
    • best ever tried
  • Product removal method
    • eye makeup remover
  • Specific product name
    • lights camera lashes
  • How they decided to buy the products (usually through promotional activities such as freebies)
    • bought full size
    • buy full size

20 most frequently mentioned 3-word phrases in unsuccessful product reviews

  • Product removal method
    • eye makeup removal
    • oil based makeup
  • Specific product name
    • original high impact
  • Negative emotions
    • bring back outrageous

  • In successful product reviews, consumers often described how their eyelashes looked after using the products, mostly in positive terms. The results also suggest that many consumers who bought the successful products first tried sample-sized freebies and then decided to purchase the full-sized products.
  • In unsuccessful product reviews, many consumers mentioned how to remove the products, mostly in negative terms.

 

 

 

* Unauthorized copying and distribution of this post are not allowed.