phantomjs

[Crawler] Dynamic page requests with selenium and simulated login to Zhihu

懵懂的女人 submitted on 2021-02-18 04:05:55

1. Install selenium

    pip install selenium

2. Install the driver for your browser (see the selenium documentation): http://selenium-python.readthedocs.io/api.html
Chrome is recommended.

3. Using selenium

    # -*- coding: utf-8 -*-

    from selenium import webdriver
    from scrapy.selector import Selector


    # Simulated login to Zhihu
    browser = webdriver.Chrome(executable_path="E:/chromedriver.exe")  # path where chromedriver.exe is stored
    browser.get("https://www.zhihu.com/#signin")
    browser.find_element_by_css_selector(".view-signin input[name='account']").send_keys("********")  # account
    browser.find_element_by_css_selector(".view-signin input[name='password']")
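The `find_element_by_css_selector` helpers used in the excerpt were deprecated in Selenium 4 and later removed, so on a current install the same step is usually written with `find_element(by, value)`. The following is a minimal sketch under stated assumptions: `fill_login_form` and `RecordingBrowser` are hypothetical names introduced here, the selectors are copied from the excerpt (whether they still match Zhihu's page is not verified), and the submit step is omitted because the excerpt is cut off before it.

```python
# Locator strategy string; this is the value of selenium's By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"

# Selectors copied from the excerpt above.
ACCOUNT_INPUT = ".view-signin input[name='account']"
PASSWORD_INPUT = ".view-signin input[name='password']"


def fill_login_form(browser, account, password):
    """Type the credentials into the sign-in form (hypothetical helper).

    `browser` is any object exposing find_element(by, value),
    for example a selenium WebDriver instance.
    """
    browser.find_element(CSS_SELECTOR, ACCOUNT_INPUT).send_keys(account)
    browser.find_element(CSS_SELECTOR, PASSWORD_INPUT).send_keys(password)


class RecordingBrowser:
    """Stand-in for a WebDriver so the sketch runs without a real browser."""

    def __init__(self):
        self.typed = {}

    def find_element(self, by, value):
        recorder = self

        class _Element:
            def send_keys(self, text):
                # Remember what was typed into which locator.
                recorder.typed[(by, value)] = text

        return _Element()


demo = RecordingBrowser()
fill_login_form(demo, "user@example.com", "secret")
print(demo.typed)
```

With a real driver, the same function would be called after `browser.get("https://www.zhihu.com/#signin")`, passing the `webdriver.Chrome` instance in place of the recording stub.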

Crawlers (7): image lazy loading, selenium, and PhantomJS

给你一囗甜甜゛ submitted on 2021-02-12 06:56:10

Handling dynamically loaded data

1. Image lazy loading

What is image lazy loading?

Case study: scraping the image data from 站长素材 (http://sc.chinaz.com/)

    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    import requests
    from lxml import etree

    if __name__ == "__main__":
        url = 'http://sc.chinaz.com/tupian/gudianmeinvtupian.html'
        headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
        }
        # Fetch the page text
        response = requests.get(url=url, headers=headers)
        response.encoding = 'utf-8'
        page_text = response.text
        # Parse the page data (extract the image links on the page)
        # Create the etree object
        tree = etree.HTML(page_text)
        div_list = tree.xpath( '//div[@id=
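Lazy-loading pages commonly keep the real image URL in a placeholder attribute and only copy it into `src` once the image scrolls into view, so a plain `requests` fetch sees an empty or missing `src`. The sketch below shows one way to read such a placeholder attribute using only the standard library. The attribute name `src2`, the class name `LazyImageParser`, and the sample markup are assumptions made for illustration; they are not taken from the excerpt, and the actual attribute has to be found by inspecting the target page.

```python
from html.parser import HTMLParser


class LazyImageParser(HTMLParser):
    """Collect image URLs, preferring a lazy-load placeholder attribute."""

    def __init__(self, lazy_attr="src2"):
        super().__init__()
        self.lazy_attr = lazy_attr  # assumed placeholder attribute name
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # Use the placeholder attribute if present, otherwise fall back to src.
        url = attr_map.get(self.lazy_attr) or attr_map.get("src")
        if url:
            self.urls.append(url)


# Made-up markup standing in for a downloaded page.
SAMPLE_HTML = """
<div id="container">
  <div><img src2="http://example.com/a.jpg" alt="a"></div>
  <div><img src="http://example.com/b.jpg" alt="b"></div>
  <div><img alt="no url"></div>
</div>
"""

parser = LazyImageParser()
parser.feed(SAMPLE_HTML)
print(parser.urls)
```

The same idea carries over to an lxml XPath query: select the placeholder attribute (for example `./img/@src2` under the assumption above) instead of `@src`.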

selenium automation tool

允我心安 submitted on 2021-02-09 09:56:44

Problem

Today, while using selenium + PhantomJS to scrape a dynamic web page, I got the following warning:

    UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
      warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '

In other words, selenium has dropped PhantomJS and recommends using headless Firefox or Chrome instead.

Solution: switch to Selenium + Headless Chrome

1. Install the Chrome browser

2. Install Selenium

    pip install selenium

3. Install chromedriver

chromedriver download locations:
https://sites.google.com/a/chromium.org/chromedriver/downloads (blocked)
http://npm.taobao.org/mirrors/chromedriver/ (works)

Note: the chromedriver version must match the version of Chrome you are using. For the version mapping, follow the link.

Once the download is complete
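The excerpt stops before showing the replacement code, so here is a minimal sketch of what a Selenium + headless Chrome setup can look like. It assumes selenium, Chrome, and a matching chromedriver are installed and discoverable; the function names `headless_chrome_arguments` and `make_headless_chrome` and the default window size are hypothetical choices for illustration, not something the original post specifies.

```python
def headless_chrome_arguments(width=1920, height=1080):
    """Return Chrome command-line switches for a headless session."""
    return [
        "--headless",                          # run without a visible window
        "--disable-gpu",                       # commonly added alongside --headless
        f"--window-size={width},{height}",     # viewport size for rendering
    ]


def make_headless_chrome():
    """Start Chrome without a window.

    Requires selenium, Chrome, and chromedriver to be installed.
    """
    # Imported here so the module can be loaded without selenium present.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in headless_chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)


print(headless_chrome_arguments())
```

A typical session would then be `driver = make_headless_chrome()`, followed by `driver.get(url)`, reading `driver.page_source`, and finally `driver.quit()` to release the browser process.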

Vue.js not rendering when trying to generate a PDF using Phantom.js

雨燕双飞 submitted on 2021-02-07 14:52:24

Question

In this simple example with a hardcoded URL, my Vue.js components are not rendering. Plain HTML gets rendered, but every place where I have a component appears blank. Should Phantom.js work normally with Vue.js?

    var webPage = require('webpage');
    var page = webPage.create();

    page.viewportSize = { width: 1920, height: 1080 };
    page.open("-----------", function start(status) {
      page.render('test.jpeg', {format: 'jpeg', quality: '100'});
      phantom.exit();
    });

Quick Vue code for anyone who wants to help and run the test:

    <

handle tinymce window with python, selenium and phantomjs

China☆狼群 submitted on 2021-02-07 13:44:05

Question

I have the following code for logging in to a site and posting something in a forum:

    driver = webdriver.PhantomJS()
    Username = "username"
    Password = "password"
    driver.get(LoginPage)
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "login")))
    driver.find_element_by_name("usr").send_keys(Username)
    driver.find_element_by_name("pas").send_keys(Password)
    driver.find_element_by_id("login").click()
    payload = "some text"
    driver.get(ForumPage)
    WebDriverWait(driver, 10).until(EC

PhantomJS require() a relative path

别说谁变了你拦得住时间么 submitted on 2021-02-06 00:52:32

Question

In a PhantomJS script I would like to load a custom module, but it seems relative paths do not work in PhantomJS?

script.js:

    var foo = require('./script/lib/foo.js');
    foo.bar('hello world');
    phantom.exit();

foo.js:

    exports.bar = function(text){
      console.log(text);
    }

According to fs.workingDirectory I am in the right directory.
foo.js is not in the lookup path of phantomjs.

Am I missing something?

EDIT: inject() is not relevant because I do not need to inject a JS file into an HTML page but instead
