Web Scraping for Beginners, Part 1
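I have wanted to learn web scraping for a long time. Today I found some spare time, read a tutorial blog post, and got started. Many thanks to the tutorial's author for the explanation and for sharing it; the link to the tutorial is at the end of this post.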
Parsing Douban's upcoming movie listings with BeautifulSoup
Python code:
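import requests
from bs4 import BeautifulSoup

# 1-1. Option 1: save the page to a file first, then parse it
# 1-1-1. Fetch the page and save it to a file
# url = "/cinema/later/chengdu/"
# response = requests.get(url)
# with open('douban.html', 'w', encoding='utf-8') as file_obj:
#     file_obj.write(response.content.decode('utf-8'))

# 1-1-2. Read the saved page back from the file
# with open('douban.html', 'r', encoding='utf-8') as file_obj:
#     html = file_obj.read()

# 1-1-3. Initialize BeautifulSoup and parse the page
# soup = BeautifulSoup(html, 'lxml')
# print(soup.prettify())

# 1-2. Option 2: fetch and parse directly
# NOTE: the host part of this URL is missing in the original; requests needs an
# absolute URL, so prepend the site's base URL before running.
url = "/cinema/later/chengdu/"
response = requests.get(url)
response.raise_for_status()  # stop early on HTTP errors
soup = BeautifulSoup(response.content.decode('utf-8'), 'lxml')

# 2. Locate the element that contains the upcoming movies
all_movies = soup.find('div', id="showing-soon")

# 3. Print the useful fields for each movie
for each_movie in all_movies.find_all('div', class_="item"):
    # print(each_movie)
    all_a_tag = each_movie.find_all('a')
    all_li_tag = each_movie.find_all('li')
    # The code assumes the second <a> in each item is the title link
    movie_name = all_a_tag[1].text
    movie_href = all_a_tag[1]['href']
    # The four <li> items hold date, genre, region and "want to watch" count, in that order
    movie_date = all_li_tag[0].text
    movie_type = all_li_tag[1].text
    movie_area = all_li_tag[2].text
    movie_lovers = all_li_tag[3].text
    print('Title: {}, Link: {}, Release date: {}, Genre: {}, Region: {}, Want to watch: {}'.format(
        movie_name, movie_href, movie_date, movie_type, movie_area, movie_lovers))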
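To try the extraction logic without touching the network, here is a minimal sketch that runs the same find/find_all calls against a hard-coded HTML fragment. The fragment, the example.com links and the helper name parse_upcoming are all hypothetical: the fragment only mirrors the structure the code above assumes (a div with id "showing-soon" containing div.item blocks, each with two links and four list items), and the real Douban markup may differ. It uses BeautifulSoup's built-in "html.parser" so lxml isn't required, and it skips items that don't match the expected layout instead of raising IndexError.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML that mirrors the structure the scraper assumes;
# the real Douban page may differ.
SAMPLE_HTML = """
<div id="showing-soon">
  <div class="item">
    <a href="https://example.com/subject/1/"><img src="poster.jpg"></a>
    <h3><a href="https://example.com/subject/1/">Movie A</a></h3>
    <ul>
      <li>Jul 01</li>
      <li>Drama</li>
      <li>Mainland China</li>
      <li>1234 want to watch</li>
    </ul>
  </div>
</div>
"""


def parse_upcoming(html):
    """Hypothetical helper: return one dict per div.item inside div#showing-soon."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id="showing-soon")
    if container is None:
        # Layout changed or the page was not the expected one
        return []
    movies = []
    for item in container.find_all("div", class_="item"):
        links = item.find_all("a")
        fields = [li.get_text(strip=True) for li in item.find_all("li")]
        if len(links) < 2 or len(fields) < 4:
            continue  # skip items that don't match the assumed layout
        movies.append({
            "name": links[1].get_text(strip=True),
            "href": links[1]["href"],
            "date": fields[0],
            "type": fields[1],
            "area": fields[2],
            "lovers": fields[3],
        })
    return movies


for movie in parse_upcoming(SAMPLE_HTML):
    print(movie)
```

Separating parsing from fetching this way also fits the "save to file, then parse" option in the original code: the same function can be fed the contents of douban.html instead of a live response.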
Output:
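Link to the tutorial I learned from:
爬虫入门教程⑧— BeautifulSoup解析豆瓣即将上映的电影信息 (Web Scraping Tutorial ⑧: Parsing Douban's Upcoming Movie Listings with BeautifulSoup)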