[Crawling] What are the differences between requests and selenium?

데이터과학 삼학년

Natural Language Processing

Dan-k 2022. 5. 27. 19:45

While web crawling, have you ever loaded a page's HTML with requests and found that the data you wanted to collect was not in it?

If so, don't panic. This is simply a limitation of requests.

requests

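- It fetches the page as the server returns it, that is, only the initial HTML source, so it does not necessarily capture all the information shown on the web page

- As a result, part of what you intended to crawl may not be collected

- It is fast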
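The limitation above can be illustrated without any network access. The following is a minimal sketch, not the author's code: `INITIAL_HTML` is a hypothetical page invented for illustration, and the standard-library `HTMLParser` stands in for BeautifulSoup so the example runs without extra packages. The `div` with class `accord_hd` is only created by the script at runtime, so a parser that sees just the initial HTML never finds it.

```python
from html.parser import HTMLParser

# Hypothetical initial HTML, as a server might return it. The div with class
# "accord_hd" does not exist yet; the script would create it inside a browser.
INITIAL_HTML = """
<html>
  <body>
    <div id="app"></div>
    <script>
      var el = document.createElement('div');
      el.className = 'accord_hd';
      el.textContent = 'loaded by JavaScript';
      document.getElementById('app').appendChild(el);
    </script>
  </body>
</html>
"""


class ClassCounter(HTMLParser):
    """Count start tags of a given name that carry a given CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.count += 1


def count_in_static_html(html, tag, css_class):
    """Parse the HTML text as-is (no JavaScript is executed) and count matches."""
    parser = ClassCounter(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.count


if __name__ == "__main__":
    # The script is never run, so the element it would create is not found.
    print(count_in_static_html(INITIAL_HTML, "div", "accord_hd"))  # 0
```

The same thing happens when the HTML comes from `requests.get(...)`: the response body is text, and nothing in it is executed.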

selenium

- It opens the web page through a real web driver and collects data from it, so it has the advantage of retrieving not only the initial HTML but also the HTML source used to render the page

- However, it is slow

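The difference between these two modules comes down to whether they can pull in content that the site generates dynamically through rendering/JavaScript.

Because Selenium actually opens a browser to fetch the page, you can think of it as being able to collect most of the information on it.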
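Since requests is fast and Selenium is slow but more complete, one possible way to combine them is to try the static fetch first and open a browser only when the expected content is missing. This is a rough sketch under assumptions, not a method from the original post: `fetch_with_fallback`, `marker`, `fetch_static`, and `fetch_rendered` are hypothetical names, and the two fetchers are passed in as plain functions so the logic can be tried without a network or a browser.

```python
from typing import Callable, Tuple


def fetch_with_fallback(
    url: str,
    marker: str,
    fetch_static: Callable[[str], str],
    fetch_rendered: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (html, source), where source is "static" or "rendered".

    marker is a substring expected in the HTML when the target data is
    present, for example a class name such as "accord_hd".
    """
    # Fast path: a plain HTTP fetch of the initial HTML
    html = fetch_static(url)
    if marker in html:
        return html, "static"
    # Slow path: let a real browser render the page, then read its HTML
    return fetch_rendered(url), "rendered"
```

In practice `fetch_static` could be something like `lambda u: requests.get(u).text`, and `fetch_rendered` a function that calls `driver.get(u)` and returns `driver.page_source`. A substring check is a crude test, so a proper selector lookup on the parsed HTML would be more reliable.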

Code

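from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import requests

# --- requests: fetches only the initial HTML returned by the server
response = requests.get('https://www.python.org/')
soup = BeautifulSoup(response.content, 'html.parser')
sample_requests = soup.find_all('div', class_='accord_hd')

# --- selenium: drives a real Chrome browser, so page_source reflects the rendered page
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(executable_path='C:/chromedriver_win32/chromedriver.exe'))
driver.get('https://www.python.org/')
soup = BeautifulSoup(driver.page_source, 'html.parser')
sample_selenium = soup.find_all('div', class_='accord_hd')

# quit() closes the browser and also ends the chromedriver process
driver.quit()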

https://stackoverflow.com/questions/57249863/what-is-difference-between-soup-of-selenium-and-requests

What is difference between soup of selenium and requests?

https://blog.naver.com/PostView.nhn?isHttpsRedirect=true&blogId=kiddwannabe&logNo=221188260422

Crawling methods) Requests? Selenium? Is crawling possible?

- requests: can collect everything on page A, but only the HTML on page B

- selenium: can collect everything on both page A and page B
