Python(20) Crawling with BeautifulSoup

UserDonghu 2023. 9. 22. 21:30

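I had already studied bs4 and Selenium once before, and even practiced with code that crawled a hot-deal site and saved the results to CSV files split by category, but that code no longer runs because of a Chrome driver error.

The code automatically installed the driver matching the installed Chrome version, so I suspect the driver simply hasn't been updated for the current Chrome release yet.

I'll study this part again later and fix it.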

Basic setup

import requests
from bs4 import BeautifulSoup

url = 'https://example.com'  # placeholder; replace with the site you want to crawl

response = requests.get(url)  # send an HTTP request using the GET method
# params = {'pa1': 'val1', 'pa2': 'value2'}
# response = requests.get(url, params=params)
# print(response.url)  # 'https://example.com/?pa1=val1&pa2=value2'

print(response)  # <Response [200]> means the request succeeded
# response.status_code  # 200 if the page was fetched normally

print(type(response))  # <class 'requests.models.Response'>

print(response.encoding)  # e.g. utf-8
# response.encoding = 'utf-8'  # set this if Korean text comes out garbled

print(response.headers)  # the response headers

print(response.text)  # the HTML source

html = response.text

soup = BeautifulSoup(html, 'html.parser')  # parse the HTML document
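The commented-out `params` lines can be checked without sending anything over the network. Here is a minimal sketch, assuming `https://example.com/` as a placeholder URL: it builds the request with `requests.Request` and calls `prepare()`, which encodes `params` into the query string the same way `requests.get` would.

```python
import requests

# Placeholder URL (an assumption); nothing is actually sent here.
url = 'https://example.com/'
params = {'pa1': 'val1', 'pa2': 'value2'}

# prepare() builds the final request, including the encoded query string.
prepared = requests.Request('GET', url, params=params).prepare()

print(prepared.url)  # https://example.com/?pa1=val1&pa2=value2
```

This is handy for checking the final URL before running a real crawl.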

Getting the tags you want

soup.head  # the first head tag in the document
soup.title  # the first title tag in the document
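To try attribute-style access without fetching a page, here is a small sketch on a hand-written HTML string (the markup is made up for illustration). It also shows `find_all`, which returns every matching tag rather than only the first.

```python
from bs4 import BeautifulSoup

# A tiny hypothetical document, so no request is needed.
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <p class="intro">first</p>
    <p>second</p>
  </body>
</html>
"""

soup = BeautifulSoup(html, 'html.parser')

print(soup.title)       # <title>Sample page</title>
print(soup.title.text)  # Sample page
print(soup.p.text)      # first (soup.p is only the first <p>)

# find_all returns every matching tag, not just the first one.
paragraphs = soup.find_all('p')
print(len(paragraphs))  # 2
```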

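Selectors

soup.select('#_market_sum')[0].text  # text of the first (index 0) element whose id is _market_sum

# Text of the 7th td in the 2nd matching row.
# Note: `hover` is read as a tag name here; if it is part of a class name on the page, write that class in full.
soup.select('.table>hover>tbody>tr')[1].select('td')[6].text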
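The same kinds of selectors can be tried on a small inline table. This is only a sketch: the `prices` class and the cell values are invented for the example, and only `#_market_sum` comes from the post. It also shows `select_one`, which returns `None` when nothing matches, whereas indexing an empty `select` result with `[0]` raises `IndexError`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the class name and values are made up.
html = """
<div id="_market_sum">1,234</div>
<table class="prices">
  <tbody>
    <tr><td>A</td><td>10</td></tr>
    <tr><td>B</td><td>20</td></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(html, 'html.parser')

# id selector: select_one returns the first match or None.
market_sum = soup.select_one('#_market_sum').text

# class selector plus child combinators, then a nested select on one row.
rows = soup.select('.prices > tbody > tr')
second_row_cells = [td.text for td in rows[1].select('td')]

# A selector that matches nothing gives None instead of raising.
missing = soup.select_one('#does_not_exist')

print(market_sum, second_row_cells, missing)
```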

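There is also this approach

from bs4 import BeautifulSoup
import urllib.request as req

response = req.urlopen('https://example.com')  # placeholder; replace with the target address

html = response.read()  # raw bytes
# html = response.read().decode('utf-8')  # use this if Korean text comes out garbled

soup = BeautifulSoup(html, 'html.parser')
# soup = BeautifulSoup(response, 'html.parser')  # passing the response object directly also works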
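Since `response.read()` returns bytes rather than a string, garbled Korean text usually comes down to the encoding. Here is a minimal sketch, using a hand-made byte string as a stand-in for the downloaded page, of the two ways to deal with it: decode first, or pass the bytes along with `from_encoding`.

```python
from bs4 import BeautifulSoup

# Stand-in for what response.read() returns: raw UTF-8 bytes.
raw = '<html><head><title>핫딜</title></head></html>'.encode('utf-8')

# Option 1: decode to str yourself, then parse.
soup_from_str = BeautifulSoup(raw.decode('utf-8'), 'html.parser')

# Option 2: hand over the bytes and name the encoding explicitly.
soup_from_bytes = BeautifulSoup(raw, 'html.parser', from_encoding='utf-8')

print(soup_from_str.title.text)
print(soup_from_bytes.title.text)
```

If `from_encoding` is omitted, BeautifulSoup tries to detect the encoding itself, which usually works but is worth overriding when the text looks wrong.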
