Sephora Website Review Crawling (18.02.2022.)

Hazel Y. 2022. 4. 14. 11:48

This post uses Selenium to crawl product reviews from the Sephora website.


 

1. Import all the necessary packages.

 

import random
import time

from openpyxl import Workbook

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

 

2. Web crawling

 

# Write-only workbook: review rows go to one sheet and dates to another.
wb = Workbook(write_only=True)
ws1 = wb.create_sheet('except_date')
ws2 = wb.create_sheet('date')
ws1.append(['brand', 'product', 'review'])
ws2.append(['date'])

options = webdriver.ChromeOptions()

# Selenium 4 takes the chromedriver path through a Service object.
path = '/path/to/chromedriver'
driver = webdriver.Chrome(service=Service(path), options=options)

# Wait up to 3 seconds for an element to appear before a lookup fails.
driver.implicitly_wait(3)

driver.get('https://www.sephora.com/product/high-impact-lash-elevating-mascara-P421489?skuId=1968247&keyword=CLINIQUE%20High%20Impact%20Lash%20Elevating%20Mascara')

# Close the pop-up shown when the page opens.
close_popup = driver.find_element(By.CSS_SELECTOR, 'svg.css-1ikgx7p.eanm77i0')
close_popup.click()
time.sleep(3)

# Scroll down the page before using the review controls.
driver.execute_script('window.scrollTo(0, 3050)')
time.sleep(1)

# Open the sort menu and choose "Most Helpful".
sort = driver.find_elements(By.CSS_SELECTOR, 'div.css-tsrkv7')[1]
sort.click()
most_helpful = driver.find_element(
    By.XPATH,
    '//button[@class="css-1aawth6 eanm77i0"][contains(text(), "Most Helpful")]',
)
most_helpful.click()
time.sleep(1)

# Collect review texts and dates from 14 pages.
for _ in range(14):
    review_texts = driver.find_elements(By.XPATH, '//div[@class="css-1s11tbv eanm77i0"]')
    dates = driver.find_elements(By.XPATH, '//span[@class="css-ak0g49 eanm77i0"]')

    for review_text in review_texts:
        ws1.append(['CLINIQUE', 'High Impact Lash Elevating Mascara', review_text.text])

    for date in dates:
        ws2.append([date.text])

    # Index 8 of the pagination items is used as the next-page control.
    next_page = driver.find_elements(By.CSS_SELECTOR, 'li.css-1579ltc')[8]
    next_page.click()
    # Pause for a random interval before reading the next page.
    time.sleep(random.uniform(2, 3.5))

driver.quit()
wb.save('un4_h.xlsx')
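
The crawler relies on fixed time.sleep calls, which assume the page is ready after a set delay. One alternative is to poll for a condition until a timeout expires, which is the general idea behind Selenium's WebDriverWait. Below is a minimal sketch of that idea in plain Python. The helper name wait_until, its parameters, and the fake_lookup stand-in are all hypothetical and are not part of Selenium or of the script in this post.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    (Hypothetical helper for illustration; not a Selenium API.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} s')
        time.sleep(poll)


# Stand-in for an element lookup that returns an empty list
# until the page is ready (here: on the third call).
attempts = []


def fake_lookup():
    attempts.append(1)
    return ['review'] if len(attempts) >= 3 else []


found = wait_until(fake_lookup, timeout=2.0, poll=0.01)
```

In the crawler, the condition could be a lambda wrapping one of the driver.find_elements lookups, since an empty list counts as "not ready yet".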

The crawl results are in the attached Excel file, un4_h.xlsx.

un4_h.xlsx
0.02MB
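
The workbook keeps review texts in the 'except_date' sheet and dates in the 'date' sheet, so row n of one sheet matches row n of the other only if every page yields the same number of texts and dates. Below is a minimal sketch of a per-page check that builds complete rows instead. The helper pair_rows is hypothetical and not part of the original script, and the sample texts and dates are made-up values standing in for the element.text results.

```python
def pair_rows(brand, product, reviews, dates):
    """Combine one page of review texts and dates into complete rows.

    Raises ValueError when the two lists differ in length, because the
    texts and dates could then no longer be matched by position.
    (Hypothetical helper for illustration.)
    """
    if len(reviews) != len(dates):
        raise ValueError(
            f'{len(reviews)} reviews but {len(dates)} dates on this page'
        )
    return [[brand, product, text, date] for text, date in zip(reviews, dates)]


# Made-up sample values, not real crawl output.
rows = pair_rows(
    'CLINIQUE',
    'High Impact Lash Elevating Mascara',
    ['Great mascara', 'Smudges a little'],
    ['1 d ago', '3 d ago'],
)
```

Each returned row could then be passed to a single sheet's append call, so a review and its date stay on the same line.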

 

 

 

* Unauthorized copying and distribution of this post are not allowed.