Using Python, I want to crawl data on a web page whose source is quite big (it is the Facebook page of some user).
Say the URL is the URL I am trying to crawl. I run the
This page may execute some JavaScript, and that JavaScript generates some of the content.
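To illustrate why a plain fetch of the page source misses such content, here is a minimal sketch using only the standard library. The sample page and the names StaticTextExtractor and static_view are made up for illustration; they are not part of any crawler library. It extracts the text that is present in the raw HTML and counts the script tags, without running any of the scripts.

```python
from html.parser import HTMLParser


class StaticTextExtractor(HTMLParser):
    """Collects the text a crawler sees when it does not run JavaScript."""

    def __init__(self):
        super().__init__()
        self.text_parts = []    # visible text found in the static markup
        self.script_count = 0   # number of <script> elements encountered
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies are code, not page content, so skip them.
        if not self._in_script and data.strip():
            self.text_parts.append(data.strip())


def static_view(html):
    """Return (list of visible text fragments, number of scripts)."""
    parser = StaticTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text_parts, parser.script_count


# Hypothetical page: the <div> stays empty until a browser runs the script.
SAMPLE_HTML = """
<html><body>
<h1>Profile</h1>
<div id="feed"></div>
<script>document.getElementById("feed").textContent = "Latest post";</script>
</body></html>
"""

text, scripts = static_view(SAMPLE_HTML)
print(text)     # only the text that exists in the raw source
print(scripts)  # how many scripts could still add content
```

In this sample the string "Latest post" only exists inside the script, so it never shows up in the extracted text. Getting at it requires something that actually executes the page's JavaScript.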
Try Twill.
It is based on Mechanize, but executes JavaScript.
Sample in Python:
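from twill.commands import go, fv, submit, info, show

go("http://google.com/")   # open the page
fv("f", "q", "test")       # set field "q" of form "f" to "test"
submit("btnG")             # submit the form via the "btnG" button
info()                     # show page info
show()                     # show the page HTML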
Another option is to use Zombie.js on Node.js.
This library works even better than Twill, and it is a browserless solution.
Sample in CoffeeScript:
zombie = require "zombie"
browser = new zombie()
browser.visit "https://www.google.ru/", =>
  browser.fill "q", "node.js"
  # "Поиск в Google" is the Russian label of the "Google Search" button
  browser.pressButton "Поиск в Google", ->
    for item in browser.queryAll "h3.r a"
      console.log item.innerHTML