beautifulsoup

Scraping Lianjia second-hand housing listings with Python

杀马特。学长 韩版系。学妹 submitted on 2021-02-09 09:53:35
Scraping second-hand housing listings from the Lianjia website. This is my first attempt, for reference only; it uses Scrapy.

import scrapy, pypinyin, requests
import bs4
from ..items import LianjiaItem

class LianjiaSpider(scrapy.Spider):
    name = 'lianjia_dl'
    allowed_domains = ['www.lianjia.com']
    start_urls = []
    url_0 = 'https://www.lianjia.com/city/'
    res = requests.get(url_0)
    bs_cs = bs4.BeautifulSoup(res.text, 'html.parser')
    xinxi_cs = bs_cs.find_all('div', class_='city_province')
    for data_cs in xinxi_cs:
        cs_s = data_cs.find('ul').find_all('li')
        for cs_1 in cs_s:
            yess = cs_1.find('a')['href']
            if yess.find('fang') >= 0:
                # if the string 'fang' occurs in yess, yess.find('fang') is >= 0
                # (it returns the position of the match in the string)
                continue
            else:
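For reference, here is a minimal standalone sketch of the same city-filtering idea outside of the Scrapy spider (variable names are illustrative, not the original author's): collect the city links from lianjia.com/city/ and skip any href that contains 'fang'.

import requests
import bs4

# Fetch the city index page and keep only city links whose href
# does not contain 'fang' ('in' is the idiomatic membership test).
res = requests.get('https://www.lianjia.com/city/')
soup = bs4.BeautifulSoup(res.text, 'html.parser')

city_urls = []
for province in soup.find_all('div', class_='city_province'):
    for li in province.find('ul').find_all('li'):
        href = li.find('a')['href']
        if 'fang' in href:
            continue
        city_urls.append(href)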

ImportError: cannot import name 'BeautifulSoup4'

試著忘記壹切 submitted on 2021-02-09 08:31:59
Question: I know this question has been asked a lot on this site, but I have tried all of the solutions given and I cannot figure out how to solve this problem. For starters: I am on a Windows 10 computer using Python 3.6. I installed Anaconda as my IDE. I tried to install BeautifulSoup4 with pip install beautifulsoup4, but I got the "Requirement already satisfied" response. The code I am trying to run is just from bs4 import BeautifulSoup4, to which I get the error: ImportError: cannot import name
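For what it's worth, the package is installed as beautifulsoup4, but the class it exports lives in the bs4 module and is named BeautifulSoup (no trailing 4), so the working import looks like this:

from bs4 import BeautifulSoup   # the class is BeautifulSoup, not BeautifulSoup4

soup = BeautifulSoup('<p>hello</p>', 'html.parser')
print(soup.p.text)   # hello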

Is it possible to scrape a “dynamical webpage” with beautifulsoup?

╄→гoц情女王★ submitted on 2021-02-08 17:01:35
Question: I am currently beginning to use BeautifulSoup to scrape websites. I think I have got the basics, even though I lack theoretical knowledge about web pages; I will do my best to formulate my question. What I mean by a dynamical webpage is the following: a site whose HTML changes based on user action, in my case collapsible tables. I want to obtain the data inside some "div" tags, but when you load the page the data seems unavailable in the HTML code; when you click on the table it expands, and the
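One common route (not necessarily the answer chosen for this question) is to let a real browser run the page's JavaScript and hand the rendered HTML to BeautifulSoup; often the same data is also available from the XHR request the page makes and can be fetched directly with requests. A sketch of the browser route, where the URL and CSS selector are placeholders:

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

# Render the page in a real browser so JavaScript-driven content exists,
# click the collapsible table to expand it, then parse the rendered HTML.
driver = webdriver.Chrome()
driver.get('https://example.com/page-with-collapsible-tables')
driver.find_element(By.CSS_SELECTOR, 'table.collapsible').click()
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

for div in soup.find_all('div'):
    print(div.get_text(strip=True))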

Passing web data into Beautiful Soup - Empty list

守給你的承諾、 submitted on 2021-02-08 13:47:22
Question: I've rechecked my code and looked at comparable operations on opening a URL to pass web data into Beautiful Soup; for some reason my code just doesn't return anything, although it's in the correct form:

>>> from bs4 import BeautifulSoup
>>> from urllib3 import poolmanager
>>> connectBuilder = poolmanager.PoolManager()
>>> content = connectBuilder.urlopen('GET', 'http://www.crummy.com/software/BeautifulSoup/')
>>> content
<urllib3.response.HTTPResponse object at 0x00000000032EC390>
>>> soup =
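Assuming the urllib3 route is kept, the likely issue is that the HTTPResponse object itself is being passed to BeautifulSoup; the parser needs the body, which urllib3 exposes as the .data attribute (bytes). A minimal sketch:

import urllib3
from bs4 import BeautifulSoup

# PoolManager.request (or urlopen) returns an HTTPResponse; the HTML body
# lives in resp.data, and that is what BeautifulSoup should receive.
http = urllib3.PoolManager()
resp = http.request('GET', 'http://www.crummy.com/software/BeautifulSoup/')
soup = BeautifulSoup(resp.data, 'html.parser')
print(soup.title.string)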

How do I write a BeautifulSoup strainer that only parses objects with certain text between the tags?

北城余情 submitted on 2021-02-08 13:24:10
Question: I'm using Django and Python 3.7. I want to have more efficient parsing, so I was reading about SoupStrainer objects. I created a custom one to help me parse only the elements I need ...

def my_custom_strainer(self, elem, attrs):
    for attr in attrs:
        print("attr:" + attr + "=" + attrs[attr])
        if elem == 'div' and 'class' in attr and attrs['class'] == "score":
            return True
        elif elem == "span" and elem.text == re.compile("my text"):
            return True

article_stat_page_strainer = SoupStrainer(self.my_custom
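One caveat worth noting: the filter function a SoupStrainer calls receives the tag name and its attributes, not a Tag object, so elem.text is not available there, and comparing a string to re.compile(...) with == is never true. A hedged sketch (names and sample HTML are illustrative, not from the question) is to strain on what is knowable at parse time and apply the text condition on the parsed tree:

import re
from bs4 import BeautifulSoup, SoupStrainer

html = '<div class="score">42</div><span>my text here</span><p>skip me</p>'

# Keep only div and span tags while parsing, then filter by class and text.
strainer = SoupStrainer(['div', 'span'])
soup = BeautifulSoup(html, 'html.parser', parse_only=strainer)

score_divs = soup.find_all('div', class_='score')
my_spans = soup.find_all('span', string=re.compile('my text'))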

Getting data from hidden html (popup) using BS4

↘锁芯ラ submitted on 2021-02-08 12:38:36
Question: I am trying to scrape the name of a link in a popup on Wikipedia. When you hover over a link on Wikipedia, it brings up a little snippet from the intro of the linked article. I need to scrape that information, but I am unsure where it would be in the source. When I inspect the element (while it is popped up), this is the HTML (for this example I am hovering over the link "Greek"): <a dir="ltr" lang="en" class="mwe-popups-extract" href="/wiki/Ancient_Greek"> <p>The <b>Ancient Greek</b> language includes the
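The hover card is not part of the article's static HTML; Wikipedia's page previews load it separately. One way to get a comparable snippet (assuming the English Wikipedia REST summary endpoint) is:

import requests

# The 'extract' field of the REST summary endpoint holds the intro snippet
# shown in the hover popup. 'Ancient_Greek' is the hovered link's target.
title = 'Ancient_Greek'
url = f'https://en.wikipedia.org/api/rest_v1/page/summary/{title}'
data = requests.get(url, headers={'User-Agent': 'example-scraper/0.1'}).json()
print(data['extract'])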

Python - find a substring between two strings based on the last occurrence of the latter string

£可爱£侵袭症+ submitted on 2021-02-08 12:12:07
Question: I am trying to find a substring which is between two strings. The first string is <br> and the last string is <br><br>. The first string I look for is repetitive, while the latter string can serve as an anchor. Here is an example: <div class="linkTabBl" style="float:left;padding-top:6px;width:240px"> Anglo American plc <br> 20 Carlton House Terrace <br> SW1Y 5AN London <br> United Kingdom <br><br> Phone : +44 (0)20 7968 8888 <br> Fax : +44 (0)20 7968 8500 <br> Internet : <a class="pageprofil
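Assuming the markup is exactly as in the excerpt, one simple sketch is to cut everything after the first <br><br> anchor and then take the last <br>-separated segment, which is the piece sitting between the last single <br> and the anchor:

# Sample HTML taken from the excerpt (shortened).
html = ('Anglo American plc <br> 20 Carlton House Terrace <br> '
        'SW1Y 5AN London <br> United Kingdom <br><br> Phone : +44 (0)20 7968 8888')

before_anchor = html.split('<br><br>')[0]         # drop everything after the anchor
wanted = before_anchor.split('<br>')[-1].strip()  # last segment before the anchor
print(wanted)                                     # United Kingdom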