Sephora Review Web Crawling - 2 (24.02.2022.)

Hazel Y. 2022. 4. 14. 16:09

To additionally collect each review's star rating and helpfulness count from the Sephora product page, I slightly modified my earlier web crawling code.

1. Import necessary packages.

import random
import time

from openpyxl import Workbook
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

2. Web crawling

# One write-only workbook with a separate sheet for each kind of data.
wb = Workbook(write_only=True)
ws1 = wb.create_sheet('except_date')
ws2 = wb.create_sheet('date')
ws3 = wb.create_sheet('rating')
ws4 = wb.create_sheet('helpful')
ws1.append(['brand', 'product', 'review'])
ws2.append(['date'])
ws3.append(['rating'])
ws4.append(['helpful'])

options = webdriver.ChromeOptions()

path = '/path/to/chromedriver'
driver = webdriver.Chrome(service=Service(path), options=options)

# Wait up to 3 seconds for an element to appear before a lookup fails.
driver.implicitly_wait(3)

driver.get('https://www.sephora.com/product/high-impact-lash-elevating-mascara-P421489?skuId=1968247&keyword=CLINIQUE%20High%20Impact%20Lash%20Elevating%20Mascara')

# Close the pop-up that appears when the page first loads.
close_popup = driver.find_element(By.CSS_SELECTOR, 'svg.css-1ikgx7p.eanm77i0')
close_popup.click()
time.sleep(3)

# Scroll down the page to reach the review section.
driver.execute_script('window.scrollTo(0, 3050)')
time.sleep(1)

# Open the sort menu and sort the reviews by "Most Helpful".
sort = driver.find_elements(By.CSS_SELECTOR, 'div.css-tsrkv7')[1]
sort.click()
most_helpful = driver.find_element(By.XPATH, '//button[@class="css-1aawth6 eanm77i0"][contains(text(), "Most Helpful")]')
most_helpful.click()
time.sleep(1)

# Collect the reviews page by page (15 pages).
for _ in range(15):

    review_texts = driver.find_elements(By.XPATH, '//div[@class="css-1s11tbv eanm77i0"]')
    dates = driver.find_elements(By.XPATH, '//span[@class="css-ak0g49 eanm77i0"]')
    rates = driver.find_elements(By.XPATH, '//span[@class="css-1vmt2jw eanm77i0"]/span[@class="css-mu0xdx"]')
    helpfuls = driver.find_elements(By.XPATH, '//div[@class="css-1ds6ck2 eanm77i0"]/button[@class="css-36ie0l"]/span')

    for review_text in review_texts:
        ws1.append(['CLINIQUE', 'High Impact Lash Elevating Mascara', review_text.text])

    for date in dates:
        ws2.append([date.text])

    for rate in rates:
        # strip() removes the characters ' ', 's', 't', 'a', 'r' from both ends,
        # which leaves only the number in the aria-label.
        ws3.append([rate.get_attribute('aria-label').strip(' stars')])

    for helpful in helpfuls:
        # Remove the surrounding parentheses from the helpfulness count.
        ws4.append([helpful.text.strip('()')])

    # Go to the next page and pause for a random 2 to 3.5 seconds.
    next_page = driver.find_elements(By.CSS_SELECTOR, 'li.css-1579ltc')[8]
    next_page.click()
    time.sleep(random.uniform(2, 3.5))

driver.quit()
wb.save('un4.xlsx')
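
The rating and helpfulness values are saved as the raw strings taken from the page. As a rough sketch, they could be converted to integers before analysis. This assumes the rating label looks like "5 stars" and the helpfulness text looks like "(12)"; both formats are only inferred from the strip() calls in the crawling code and were not checked against the live page. The helper names parse_rating and parse_helpful are hypothetical.

```python
import re


def parse_rating(aria_label):
    """Return the leading integer of a label such as '5 stars', or None."""
    # Assumed format: the number comes first, e.g. '5 stars'.
    match = re.match(r'\s*(\d+)', aria_label or '')
    return int(match.group(1)) if match else None


def parse_helpful(text):
    """Return the first integer found in text such as '(12)', or None."""
    # Assumed format: the count is wrapped in parentheses, e.g. '(12)'.
    match = re.search(r'\d+', text or '')
    return int(match.group()) if match else None


print(parse_rating('5 stars'))
print(parse_helpful('(12)'))
```

Returning None for an empty or missing string keeps one unexpected label from stopping a long crawl; such rows can be inspected afterwards.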

Result of the crawling code: see the attached Excel file, un4.xlsx (0.02 MB).
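
Because reviews, dates, ratings, and helpfulness counts are written to four separate sheets, the rows only line up if every review produced exactly one value of each kind. The sketch below shows one possible way to check this and merge the columns into per-review records. The function name combine_columns and the sample values are made up for illustration and are not taken from the actual file.

```python
def combine_columns(reviews, dates, ratings, helpfuls):
    """Zip four equally long columns into a list of per-review records."""
    columns = {
        'review': reviews,
        'date': dates,
        'rating': ratings,
        'helpful': helpfuls,
    }
    lengths = {name: len(values) for name, values in columns.items()}
    # If any column is shorter or longer, the rows would be misaligned.
    if len(set(lengths.values())) != 1:
        raise ValueError(f'column lengths differ: {lengths}')
    return [dict(zip(columns, row)) for row in zip(*columns.values())]


# Made-up sample data standing in for the contents of the four sheets.
records = combine_columns(
    ['Great mascara', 'Too clumpy'],
    ['2 d ago', '5 d ago'],
    ['5', '2'],
    ['12', '3'],
)
print(records[0])
```

Raising an error on a length mismatch is a deliberate choice here: silently truncating to the shortest column would pair reviews with the wrong ratings.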

* Unauthorized copying and distribution of this post are not allowed.
