
Sephora Review Web Crawling - 2 (24.02.2022.)

Hazel Y. 2022. 4. 14. 16:09

To also collect each review's star rating and helpful count from the Sephora product page, I slightly modified the existing web crawling code.


1. Import necessary packages.

 

import random
import time

from openpyxl import Workbook
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

 

2. Web crawling

 

# One sheet per field; rows in different sheets are matched by position only.
wb = Workbook(write_only=True)
ws1 = wb.create_sheet('except_date')
ws2 = wb.create_sheet('date')
ws3 = wb.create_sheet('rating')
ws4 = wb.create_sheet('helpful')
ws1.append(['brand', 'product', 'review'])
ws2.append(['date'])
ws3.append(['rating'])
ws4.append(['helpful'])

options = webdriver.ChromeOptions()

# Selenium 4 style: the driver path is passed through a Service object.
path = '/path/to/chromedriver'
driver = webdriver.Chrome(service=Service(path), options=options)

# Wait up to 3 seconds for elements to appear before failing a lookup.
driver.implicitly_wait(3)

driver.get('https://www.sephora.com/product/high-impact-lash-elevating-mascara-P421489?skuId=1968247&keyword=CLINIQUE%20High%20Impact%20Lash%20Elevating%20Mascara')
# Close the pop-up shown when the page loads.
close_popup = driver.find_element(By.CSS_SELECTOR, 'svg.css-1ikgx7p.eanm77i0')
close_popup.click()
time.sleep(3)

# Scroll down toward the review section.
driver.execute_script('window.scrollTo(0, 3050)')

time.sleep(1)

# Open the sort menu and sort the reviews by "Most Helpful".
sort = driver.find_elements(By.CSS_SELECTOR, 'div.css-tsrkv7')[1]
sort.click()
most_helpful = driver.find_element(By.XPATH, '//button[@class="css-1aawth6 eanm77i0"][contains(text(), "Most Helpful")]')
most_helpful.click()

time.sleep(1)

# Collect 15 pages of reviews.
for _ in range(15):

    review_texts = driver.find_elements(By.XPATH, '//div[@class="css-1s11tbv eanm77i0"]')
    dates = driver.find_elements(By.XPATH, '//span[@class="css-ak0g49 eanm77i0"]')
    rates = driver.find_elements(By.XPATH, '//span[@class="css-1vmt2jw eanm77i0"]/span[@class="css-mu0xdx"]')
    helpfuls = driver.find_elements(By.XPATH, '//div[@class="css-1ds6ck2 eanm77i0"]/button[@class="css-36ie0l"]/span')

    for review_text in review_texts:
        review_t = review_text.text
        ws1.append(['CLINIQUE', "High Impact Lash Elevating Mascara", review_t])

    for date in dates:
        d = date.text
        ws2.append([d])

    for rate in rates:
        # strip() removes the characters ' ', 's', 't', 'a', 'r' from both ends,
        # so a label such as '5 stars' becomes '5'.
        r = rate.get_attribute('aria-label').strip(' stars')
        ws3.append([r])

    for helpful in helpfuls:
        # Drop the surrounding parentheses from the helpful count.
        h = helpful.text.strip('()')
        ws4.append([h])

    # The item at index 8 of the pagination list is used as the next-page button.
    next_page = driver.find_elements(By.CSS_SELECTOR, 'li.css-1579ltc')[8]
    next_page.click()
    # Pause for a random 2 to 3.5 seconds between pages.
    time.sleep(random.uniform(2, 3.5))

driver.quit()
wb.save('un4.xlsx')
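
The script saves the rating and helpful values as text. Below is a minimal sketch of turning them into integers before analysis. It assumes the rating label looks like '5 stars' and the helpful text looks like '(12)'; the function names parse_rating and parse_helpful are hypothetical and not part of the original code.

```python
import re


def parse_rating(aria_label):
    """Return the first integer found in a label such as '5 stars', or None."""
    match = re.search(r'\d+', aria_label)
    return int(match.group()) if match else None


def parse_helpful(text):
    """Return the count in text such as '(12)'; 0 when the text has no digits."""
    digits = re.sub(r'\D', '', text)  # keep digits only
    return int(digits) if digits else 0


if __name__ == '__main__':
    print(parse_rating('5 stars'))
    print(parse_helpful('(12)'))
```

Using a regular expression instead of strip() makes the intent explicit and returns a number that can be averaged or sorted directly.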

The result of the code above is in the attached Excel file.

un4.xlsx
0.02MB
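
Because the script writes each field to its own sheet, a review's text, date, rating, and helpful count are matched only by row position. Below is a minimal sketch of joining the four columns into one record per review. It assumes the columns have already been read into plain Python lists; the function name combine_columns and the sample values are hypothetical.

```python
def combine_columns(reviews, dates, ratings, helpfuls):
    """Join per-field lists into one dict per review, matched by position.

    Raises ValueError when the lists differ in length, because positional
    matching would then pair values that belong to different reviews.
    """
    lengths = {len(reviews), len(dates), len(ratings), len(helpfuls)}
    if len(lengths) != 1:
        raise ValueError('columns have different lengths; rows cannot be aligned')
    return [
        {'review': rv, 'date': d, 'rating': r, 'helpful': h}
        for rv, d, r, h in zip(reviews, dates, ratings, helpfuls)
    ]


if __name__ == '__main__':
    # Made-up sample values, for illustration only.
    rows = combine_columns(['Great mascara'], ['1 Jan 2022'], ['5'], ['12'])
    print(rows)
```

The length check matters here: if a page yields fewer helpful counts than reviews, every later row in that sheet shifts up and no longer belongs to the review beside it.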

 

 

 

* Unauthorized copying and distribution of this post are not allowed.