beautifulsoup

Scraping Lianjia second-hand housing listings with Python

杀马特。学长 韩版系。学妹 submitted on 2021-02-09 09:53:35
Scraping second-hand housing listings from the Lianjia website. This is my first attempt, for reference only; it uses Scrapy.

import scrapy, pypinyin, requests
import bs4
from ..items import LianjiaItem

class LianjiaSpider(scrapy.Spider):
    name = 'lianjia_dl'
    allowed_domains = ['www.lianjia.com']
    start_urls = []
    url_0 = 'https://www.lianjia.com/city/'
    res = requests.get(url_0)
    bs_cs = bs4.BeautifulSoup(res.text, 'html.parser')
    xinxi_cs = bs_cs.find_all('div', class_='city_province')
    for data_cs in xinxi_cs:
        cs_s = data_cs.find('ul').find_all('li')
        for cs_1 in cs_s:
            yess = cs_1.find('a')['href']
            if yess.find('fang') >= 0:
                # if the string 'fang' occurs in yess, yess.find('fang') is >= 0
                # (it returns the position of the match in the string)
                continue
            else:
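For reference, here is a minimal standalone sketch of the same city-filtering idea outside of the Scrapy spider (variable names are illustrative, not the original author's): collect the city links from lianjia.com/city/ and skip any href that contains 'fang'.

import requests
import bs4

# Fetch the city index page and keep only city links whose href
# does not contain 'fang' ('in' is the idiomatic membership test).
res = requests.get('https://www.lianjia.com/city/')
soup = bs4.BeautifulSoup(res.text, 'html.parser')

city_urls = []
for province in soup.find_all('div', class_='city_province'):
    for li in province.find('ul').find_all('li'):
        href = li.find('a')['href']
        if 'fang' in href:
            continue
        city_urls.append(href)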

ImportError: cannot import name 'BeautifulSoup4'

試著忘記壹切 submitted on 2021-02-09 08:31:59
Question: I know this question has been asked a lot on this site, but I have tried all of the solutions given and I cannot figure out how to solve this problem. For starters: I am on a Windows 10 computer using Python 3.6. I installed Anaconda as my IDE. I tried to install BeautifulSoup4 with pip install beautifulsoup4, but I got the "Requirement already satisfied" response. The code I am trying to run is just from bs4 import BeautifulSoup4, to which I get the error: ImportError: cannot import name
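For what it's worth, the package is installed as beautifulsoup4, but the class it exports lives in the bs4 module and is named BeautifulSoup (no trailing 4), so the working import looks like this:

from bs4 import BeautifulSoup   # the class is BeautifulSoup, not BeautifulSoup4

soup = BeautifulSoup('<p>hello</p>', 'html.parser')
print(soup.p.text)   # hello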

Is it possible to scrape a “dynamical webpage” with beautifulsoup?

╄→гoц情女王★ submitted on 2021-02-08 17:01:35
Question: I am currently beginning to use BeautifulSoup to scrape websites. I think I have got the basics, even though I lack theoretical knowledge about web pages; I will do my best to formulate my question. What I mean by a dynamical webpage is the following: a site whose HTML changes based on user action, in my case collapsible tables. I want to obtain the data inside some "div" tags, but when you load the page the data seems unavailable in the HTML code; when you click on the table it expands, and the
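One common route (not necessarily the answer chosen for this question) is to let a real browser run the page's JavaScript and hand the rendered HTML to BeautifulSoup; often the same data is also available from the XHR request the page makes and can be fetched directly with requests. A sketch of the browser route, where the URL and CSS selector are placeholders:

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

# Render the page in a real browser so JavaScript-driven content exists,
# click the collapsible table to expand it, then parse the rendered HTML.
driver = webdriver.Chrome()
driver.get('https://example.com/page-with-collapsible-tables')
driver.find_element(By.CSS_SELECTOR, 'table.collapsible').click()
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

for div in soup.find_all('div'):
    print(div.get_text(strip=True))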

Passing web data into Beautiful Soup - Empty list

守給你的承諾、 submitted on 2021-02-08 13:47:22
Question: I've rechecked my code and looked at comparable operations on opening a URL to pass web data into Beautiful Soup; for some reason my code just doesn't return anything, although it's in the correct form:

>>> from bs4 import BeautifulSoup
>>> from urllib3 import poolmanager
>>> connectBuilder = poolmanager.PoolManager()
>>> content = connectBuilder.urlopen('GET', 'http://www.crummy.com/software/BeautifulSoup/')
>>> content
<urllib3.response.HTTPResponse object at 0x00000000032EC390>
>>> soup =
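Assuming the urllib3 route is kept, the likely issue is that the HTTPResponse object itself is being passed to BeautifulSoup; the parser needs the body, which urllib3 exposes as the .data attribute (bytes). A minimal sketch:

import urllib3
from bs4 import BeautifulSoup

# PoolManager.request (or urlopen) returns an HTTPResponse; the HTML body
# lives in resp.data, and that is what BeautifulSoup should receive.
http = urllib3.PoolManager()
resp = http.request('GET', 'http://www.crummy.com/software/BeautifulSoup/')
soup = BeautifulSoup(resp.data, 'html.parser')
print(soup.title.string)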

How do I write a BeautifulSoup strainer that only parses objects with certain text between the tags?

北城余情 submitted on 2021-02-08 13:24:10
Question: I'm using Django and Python 3.7. I want to have more efficient parsing, so I was reading about SoupStrainer objects. I created a custom one to help me parse only the elements I need ...

def my_custom_strainer(self, elem, attrs):
    for attr in attrs:
        print("attr:" + attr + "=" + attrs[attr])
        if elem == 'div' and 'class' in attr and attrs['class'] == "score":
            return True
        elif elem == "span" and elem.text == re.compile("my text"):
            return True

article_stat_page_strainer = SoupStrainer(self.my_custom
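One caveat worth noting: the filter function a SoupStrainer calls receives the tag name and its attributes, not a Tag object, so elem.text is not available there, and comparing a string to re.compile(...) with == is never true. A hedged sketch (names and sample HTML are illustrative, not from the question) is to strain on what is knowable at parse time and apply the text condition on the parsed tree:

import re
from bs4 import BeautifulSoup, SoupStrainer

html = '<div class="score">42</div><span>my text here</span><p>skip me</p>'

# Keep only div and span tags while parsing, then filter by class and text.
strainer = SoupStrainer(['div', 'span'])
soup = BeautifulSoup(html, 'html.parser', parse_only=strainer)

score_divs = soup.find_all('div', class_='score')
my_spans = soup.find_all('span', string=re.compile('my text'))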

Getting data from hidden html (popup) using BS4

↘锁芯ラ submitted on 2021-02-08 12:38:36
Question: I am trying to scrape the name of a link in a popup on Wikipedia. When you hover over a link on Wikipedia, it brings up a little snippet from the intro of the linked article. I need to scrape that information, but I am unsure where it would be in the source. When I inspect the element (while it is popped up), this is the HTML (for this example I am hovering over the link "Greek"): <a dir="ltr" lang="en" class="mwe-popups-extract" href="/wiki/Ancient_Greek"> <p>The <b>Ancient Greek</b> language includes the
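The hover card is not part of the article's static HTML; Wikipedia's page previews load it separately. One way to get a comparable snippet (assuming the English Wikipedia REST summary endpoint) is:

import requests

# The 'extract' field of the REST summary endpoint holds the intro snippet
# shown in the hover popup. 'Ancient_Greek' is the hovered link's target.
title = 'Ancient_Greek'
url = f'https://en.wikipedia.org/api/rest_v1/page/summary/{title}'
data = requests.get(url, headers={'User-Agent': 'example-scraper/0.1'}).json()
print(data['extract'])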

Python - find a substring between two strings based on the last occurrence of the latter string

£可爱£侵袭症+ submitted on 2021-02-08 12:12:07
Question: I am trying to find a substring which is between two strings. The first string is <br> and the last string is <br><br>. The first string I look for is repetitive, while the latter string can serve as an anchor. Here is an example: <div class="linkTabBl" style="float:left;padding-top:6px;width:240px"> Anglo American plc <br> 20 Carlton House Terrace <br> SW1Y 5AN London <br> United Kingdom <br><br> Phone : +44 (0)20 7968 8888 <br> Fax : +44 (0)20 7968 8500 <br> Internet : <a class="pageprofil
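Assuming the markup is exactly as in the excerpt, one simple sketch is to cut everything after the first <br><br> anchor and then take the last <br>-separated segment, which is the piece sitting between the last single <br> and the anchor:

# Sample HTML taken from the excerpt (shortened).
html = ('Anglo American plc <br> 20 Carlton House Terrace <br> '
        'SW1Y 5AN London <br> United Kingdom <br><br> Phone : +44 (0)20 7968 8888')

before_anchor = html.split('<br><br>')[0]         # drop everything after the anchor
wanted = before_anchor.split('<br>')[-1].strip()  # last segment before the anchor
print(wanted)                                     # United Kingdom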