
Python Crawlers: Crawling with a Headless Browser

Date: 2021-06-29 11:57:24


Ubuntu

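Use Chromium

sudo apt-get install -y chromium-browser # Install the browser. This step is required: if you only run a manually downloaded build, it fails with missing dependencies.

Alternatively, follow this project to install a newer browser build and point Selenium at it with binary_location (reaching it requires a proxy or VPN):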

/scheib/chromium-latest-linux

You can also download a build manually without a proxy:

/getting-involved/download-chromium
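
Whether binary_location has to be set at all depends on whether a browser is already on the PATH. The sketch below is one way to check that from Python with shutil.which. The helper name find_browser and the candidate executable names are assumptions for illustration, not something the article specifies.

```python
import shutil
from typing import Iterable, Optional


def find_browser(
    candidates: Iterable[str] = ("chromium-browser", "chromium", "google-chrome"),
) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, else None.

    The default candidate names are assumptions; adjust them to your system.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    found = find_browser()
    if found:
        print("Browser found on PATH:", found)
    else:
        print("No browser on PATH; set options.binary_location to a downloaded build.")
```

If the function returns None, a manually downloaded build is the likely fallback, and its path would go into options.binary_location.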

CentOS

Use Firefox

yum -y install firefox

Drivers:

Make the downloaded driver executable (chmod +x).

chrome:/

or earlier versions: http://chromedriver./index.html

firefox:/mozilla/geckodriver/releases
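
The "+x" step can also be done from Python, which is handy when a deployment script downloads the driver itself. This is a minimal sketch that is roughly equivalent to chmod a+x. The function name make_executable and the example path are assumptions.

```python
import os
import stat


def make_executable(path: str) -> None:
    """Add the execute bit for user, group and others, keeping other bits."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    # Hypothetical location; use wherever you saved chromedriver or geckodriver.
    driver_path = "/home/ubuntu/chromedriver"
    if os.path.exists(driver_path):
        make_executable(driver_path)
```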

Using the browser's headless mode

Install the modules:

pip3 install selenium beautifulsoup4 lxml # ChromeDriver

chrome

#!/usr/bin/env python
# coding=utf-8
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import time

url = ""
options = Options()
options.headless = True

# Windows:
# options.binary_location = "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe"
# driver = webdriver.Chrome(executable_path='chromedriver.exe', chrome_options=options)

# Linux:
# options.binary_location = "/home/ubuntu/chrome-linux/chrome"
driver = webdriver.Chrome(executable_path='/home/ubuntu/chromedriver', chrome_options=options)
driver.get(url)
html = driver.page_source
print(html)
driver.quit()
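
The script above relies on Selenium 3 style arguments (executable_path, chrome_options, options.headless). Newer Selenium 4 releases deprecated and later removed them, so on a current install the same flow might look like the sketch below. The function name fetch_page_source and its parameters are hypothetical, and the selenium imports are done lazily inside the function only so that the module loads even where selenium is not installed.

```python
from typing import Optional


def fetch_page_source(url: str, driver_path: str,
                      binary_location: Optional[str] = None) -> str:
    """Load url in headless Chrome/Chromium and return the rendered HTML.

    Assumes Selenium 4, where the driver path is passed through a Service
    object and headless mode is requested with a command-line argument.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    options.add_argument("--headless")
    if binary_location:
        # Only needed for a manually downloaded browser build.
        options.binary_location = binary_location

    driver = webdriver.Chrome(service=Service(driver_path), options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

Wrapping the page load in try/finally is a design choice rather than a requirement: it keeps headless browser processes from piling up when a crawl raises an exception midway.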

firefox

#!/usr/bin/env python
# coding=utf-8
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from bs4 import BeautifulSoup
import time

url = '/'
options = Options()
options.headless = True

# The profile is created unconditionally because it is passed to webdriver.Firefox below.
profile = webdriver.FirefoxProfile()
# To use a proxy, uncomment the following preferences:
# profile.set_preference('network.proxy.type', 1)
# profile.set_preference('network.proxy.http', "127.0.0.1")
# profile.set_preference('network.proxy.http_port', 1080)
# profile.set_preference('network.proxy.socks', "127.0.0.1")
# profile.set_preference('network.proxy.socks_port', 1080)
# profile.set_preference('network.proxy.ssl', "127.0.0.1")
# profile.set_preference('network.proxy.ssl_port', 1080)
# profile.set_preference('network.proxy.ftp', "127.0.0.1")
# profile.set_preference('network.proxy.ftp_port', 1080)
# Optional:
# profile.set_preference("network.proxy.share_proxy_settings", True)
# profile.update_preferences()

# Windows:
# options.binary_location = "D:/Program Files/Mozilla Firefox/firefox.exe"
# driver = webdriver.Firefox(executable_path='geckodriver.exe', firefox_profile=profile, firefox_options=options)

# Linux:
# options.binary_location = "/root/firefox-linux/bin/firefox"
driver = webdriver.Firefox(executable_path='/root/geckodriver', firefox_profile=profile, firefox_options=options)
driver.get(url)
html = driver.page_source
print(html)
driver.quit()
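
The proxy preferences in the Firefox script repeat one host and port across four protocols. A small helper can generate them from a single pair, as in the sketch below. The name firefox_proxy_prefs is hypothetical, and the defaults simply mirror the 127.0.0.1:1080 values used above.

```python
from typing import Dict, Union


def firefox_proxy_prefs(host: str = "127.0.0.1",
                        port: int = 1080) -> Dict[str, Union[str, int]]:
    """Build the network.proxy.* preferences for a manual proxy setup."""
    prefs: Dict[str, Union[str, int]] = {
        "network.proxy.type": 1,  # 1 selects manual proxy configuration
    }
    for scheme in ("http", "socks", "ssl", "ftp"):
        prefs[f"network.proxy.{scheme}"] = host
        prefs[f"network.proxy.{scheme}_port"] = port
    return prefs


if __name__ == "__main__":
    for key, value in firefox_proxy_prefs().items():
        print(key, "=", value)
```

Each key/value pair would then be applied with profile.set_preference(key, value) before the profile is handed to the driver.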
