I can't get the whole source code of an HTML page

清酒与你 2021-01-22 19:47

Using Python, I want to crawl data from a web page whose source is quite big (it is the Facebook page of some user).

Say the URL is the URL I am trying to crawl. I run the

2 answers

误落风尘 (OP)

2021-01-22 20:08
This page probably runs JavaScript, and that JavaScript generates part of the content, so the content is not in the HTML you download.
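One way to check this explanation before switching tools is to inspect the raw HTML you already downloaded: if most of it is `<script>` code and there is very little text outside the scripts, the rest of the page is probably built by JavaScript in the browser. Below is a minimal sketch using only the standard library's `html.parser`; the function name `script_heavy` and the 0.5 threshold are hypothetical choices for illustration, not an established test.

```python
from html.parser import HTMLParser


class _TextVsScript(HTMLParser):
    """Counts non-whitespace characters inside <script> tags versus everywhere else."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.script_chars = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        n = len(data.strip())
        if self.in_script:
            self.script_chars += n
        else:
            self.text_chars += n


def script_heavy(html, threshold=0.5):
    """Hypothetical heuristic: True if script code makes up more than
    `threshold` of all character data in the page."""
    parser = _TextVsScript()
    parser.feed(html)
    parser.close()
    total = parser.script_chars + parser.text_chars
    if total == 0:
        return False  # empty page: nothing to judge
    return parser.script_chars / total > threshold
```

Pass it the string your current Python request returns, e.g. `script_heavy(raw_html)`. A `True` result suggests you need a tool that executes JavaScript, such as the ones below.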
Try Twill.
It is based on Mechanize, but executes JavaScript.
Sample in Python:
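```
# Twill commands: go = open a URL, fv = fill a form field,
# submit = press a submit button, info/show = inspect the result
from twill.commands import go, fv, submit, info, show

go("http://google.com/")   # load the page
fv("f", "q", "test")       # form "f", field "q", value "test"
submit("btnG")             # click the submit button named "btnG"
info()                     # shows page info
show()                     # shows the HTML of the current page
```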
Another option is to use Zombie.js on Node.js.
This library works even better than Twill, and it is a headless (browserless) solution.
Sample in CoffeeScript:
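```
Browser = require "zombie"
browser = new Browser()
# load the page; the callback runs after the page has loaded
browser.visit "https://www.google.ru/", =>
    browser.fill "q", "node.js"          # type into the field named "q"
    # the button label is Russian for "Google Search"
    browser.pressButton "Поиск в Google", ->
        # print the inner HTML of each search-result link
        for item in browser.queryAll "h3.r a"
            console.log item.innerHTML
```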