Lianjia Data Scraping + Map-Based House Search
1. Scraping Lianjia data
(Lianjia's second-hand housing search returns at most 100 pages, i.e. 3,000 listings, so I scrape the search results district by district.)
First, collect the URL of each listing's detail page from the search result pages and store them in apartment_url.csv.
# -*- coding: utf-8 -*-
import csv
import re
import urllib2
import sqlite3
import random
import threading
from bs4 import BeautifulSoup
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
#Some User Agents
hds=[{'User-Agent':'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'},\
    {'User-Agent':'Mozilla/5.0 (Windows NT 6.2) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.12 Safari/535.11'},\
    {'User-Agent':'Mozilla/5.0 (compatible; MSIE 10.0