To additionally collect each review's star rating and helpfulness count from the Sephora product page, I slightly modified my earlier web crawling code.
1. Import necessary packages.
import time
import random
from openpyxl import Workbook
from selenium import webdriver
2. Web crawling
# Write-only workbook with one sheet per collected field
wb = Workbook(write_only=True)
ws1 = wb.create_sheet('except_date')
ws2 = wb.create_sheet('date')
ws3 = wb.create_sheet('rating')
ws4 = wb.create_sheet('helpful')
# Header row for each sheet
ws1.append(['brand', 'product', 'review'])
ws2.append(['date'])
ws3.append(['rating'])
ws4.append(['helpful'])
# Start Chrome. This follows the Selenium 3 API: the positional driver path,
# chrome_options=, and find_element(s)_by_* are deprecated or removed in
# Selenium 4 releases.
options = webdriver.ChromeOptions()
path = '/path/to/chromedriver'
driver = webdriver.Chrome(path, chrome_options=options)
driver.implicitly_wait(3)
driver.get('https://www.sephora.com/product/high-impact-lash-elevating-mascara-P421489?skuId=1968247&keyword=CLINIQUE%20High%20Impact%20Lash%20Elevating%20Mascara')
# Close the pop-up shown when the page loads
close_popup = driver.find_element_by_css_selector('svg.css-1ikgx7p.eanm77i0')
close_popup.click()
time.sleep(3)
# Scroll down toward the review section
driver.execute_script('window.scrollTo(0, 3050)')
time.sleep(1)
# Open the sort menu and sort the reviews by "Most Helpful"
sort = driver.find_elements_by_css_selector('div.css-tsrkv7')[1]
sort.click()
most_helpful = driver.find_element_by_xpath('//button[@class="css-1aawth6 eanm77i0"][contains(text(), "Most Helpful")]')
most_helpful.click()
time.sleep(1)
# Collect 15 pages of reviews
for page in range(15):
    # Grab every review text, date, rating, and helpful count on the current page
    review_texts = driver.find_elements_by_xpath('//div[@class="css-1s11tbv eanm77i0"]')
    dates = driver.find_elements_by_xpath('//span[@class="css-ak0g49 eanm77i0"]')
    rates = driver.find_elements_by_xpath('//span[@class="css-1vmt2jw eanm77i0"]/span[@class="css-mu0xdx"]')
    helpfuls = driver.find_elements_by_xpath('//div[@class="css-1ds6ck2 eanm77i0"]/button[@class="css-36ie0l"]/span')
    for review_text in review_texts:
        review_t = review_text.text
        ws1.append(['CLINIQUE', "High Impact Lash Elevating Mascara", review_t])
    for date in dates:
        d = date.text
        ws2.append([d])
    for rate in rates:
        # strip() removes the characters ' ', 's', 't', 'a', 'r' from both ends,
        # leaving only the number from a label such as "5 stars"
        r = rate.get_attribute('aria-label').strip(' stars')
        ws3.append([r])
    for helpful in helpfuls:
        # Drop the surrounding parentheses, e.g. "(12)" becomes "12"
        h = helpful.text.strip('()')
        ws4.append([h])
    # Move to the next page (the ninth matching pagination item) and pause
    # for a random interval before reading it
    next_page = driver.find_elements_by_css_selector('li.css-1579ltc')[8]
    next_page.click()
    time.sleep(random.uniform(2, 3.5))
driver.quit()
wb.save('un4.xlsx')
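The rating and helpful values are cleaned with `str.strip`, which removes a set of characters from both ends rather than a fixed suffix, and the results are stored as text. As a minimal sketch of a stricter alternative, the helpers below pull the number out with a regular expression and return an integer. This assumes the rating label looks like "5 stars" and the helpful count looks like "(12)"; the function names `parse_rating` and `parse_helpful` are hypothetical and not part of the original code.

```python
import re


def parse_rating(aria_label):
    """Return the leading integer of a label such as '5 stars', or None."""
    match = re.match(r'\s*(\d+)', aria_label or '')
    return int(match.group(1)) if match else None


def parse_helpful(text):
    """Return the integer inside text such as '(12)', or None if there is none."""
    match = re.search(r'\d+', (text or '').replace(',', ''))
    return int(match.group()) if match else None
```

Inside the loop, these could replace the two `strip` calls, for example `ws3.append([parse_rating(rate.get_attribute('aria-label'))])`.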
Result of the code above: see the attached Excel file.
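Because the four sheets are filled independently, row n of one sheet matches row n of another only if every page yielded the same number of review texts, dates, ratings, and helpful counts. The sketch below is one possible way to check that before writing, and to build one combined row per review. The function name `combine_page` and the sample values are hypothetical, used only for illustration.

```python
def combine_page(reviews, dates, ratings, helpfuls):
    """Zip four per-page lists into rows; raise if their lengths differ."""
    counts = [len(reviews), len(dates), len(ratings), len(helpfuls)]
    if len(set(counts)) != 1:
        raise ValueError(f'field counts differ on this page: {counts}')
    return [list(row) for row in zip(reviews, dates, ratings, helpfuls)]


# Hypothetical sample data for a page with a single review
rows = combine_page(['Great mascara'], ['1 d ago'], ['5'], ['12'])
```

Each returned row could then be appended to a single sheet, so that a review's text, date, rating, and helpful count stay on the same line.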
* Unauthorized copying and distribution of this post are not allowed.