beautifulsoup

Using Beautiful Soup I scrape Twitter data. I am able to get the data but can't save it to a CSV file

天大地大妈咪最大 submitted on 2020-01-23 17:31:06
Question: I scraped Twitter for user names, tweets, replies, and retweets, but I can't save them to a CSV file. Here is the code:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    file = "5_twitterBBC.csv"
    f = open(file, "w")
    Headers = "tweet_user, tweet_text, replies, retweets\n"
    f.write(Headers)
    for page in range(0, 5):
        url = "https://twitter.com/BBCWorld".format(page)
        html = urlopen(url)
        soup = BeautifulSoup(html, "html.parser")
        tweets = soup.find_all("div", {"class": "js-stream-item"})
        for tweet in
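A common fix for the saving part of this problem is to use the standard csv module instead of concatenating strings by hand, so commas and quotes inside tweet text don't corrupt the file. A minimal sketch, assuming the per-tweet values have already been extracted (the rows below are made up; Twitter's old js-stream-item markup no longer exists, so the scraping step itself is not reproduced here):

```python
import csv

# Hypothetical rows, standing in for values extracted from each tweet.
rows = [
    ("BBCWorld", "Example tweet text, with a comma", 12, 34),
    ("BBCWorld", "Another tweet", 5, 8),
]

# newline="" is the documented way to open CSV files for the csv module.
with open("5_twitterBBC.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["tweet_user", "tweet_text", "replies", "retweets"])
    writer.writerows(rows)
```

csv.writer quotes fields containing commas automatically, which the manual f.write() approach does not.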

BeautifulSoup returns incomplete HTML

梦想的初衷 submitted on 2020-01-23 17:08:39
Question: I am reading a book about Python right now. There is a small homework project: "Write a program that goes to a photo-sharing site like Flickr or Imgur, searches for a category of photos, and then downloads all the resulting images." It is suggested to use only the webbrowser, requests, and bs4 libraries. I cannot do it for Flickr. I found that the parser cannot go inside the element (div class="interaction-view"). Using "Inspect element" in Chrome I can see that there are a few "div" elements
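The usual cause of "incomplete" HTML like this is that the element is filled in by JavaScript after the page loads, so requests and bs4 only ever see the empty container that the server sends, while Chrome's "Inspect element" shows the DOM after scripts have run. A minimal sketch of what the parser actually receives, with a hand-written string standing in for the raw server response (the markup is simplified and hypothetical):

```python
from bs4 import BeautifulSoup

# What the server actually returns (simplified): the div exists,
# but JavaScript has not yet inserted any photo elements into it.
raw_html = '<html><body><div class="interaction-view"></div></body></html>'

soup = BeautifulSoup(raw_html, "html.parser")
div = soup.find("div", class_="interaction-view")
print(div is not None)       # the div itself is found
print(len(div.find_all()))   # ...but it has no children in the raw HTML
```

When this happens, the data either has to come from the site's API or from whatever network request the page's JavaScript makes, or the page has to be rendered in a real browser.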

BeautifulSoup “AttributeError: 'NoneType' object has no attribute 'text'”

和自甴很熟 submitted on 2020-01-23 16:46:10
Question: I was web-scraping a Google weather search with bs4, and Python can't find a <span> tag even though there is one. How can I solve this problem? I tried to find this <span> by its class and by its id, but both failed.

    <div id="wob_dcp">
      <span class="vk_gy vk_sh" id="wob_dc">Clear with periodic clouds</span>
    </div>

Above is the HTML code I was trying to scrape on the page. Sorry, I can't post images because of my reputation.

    response = requests.get('https://www.google.com/search?hl=ja&ei
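The AttributeError means find() returned None, i.e. the tag was not in the HTML that was actually downloaded: Google often serves different markup to scripts than to a browser (depending on headers, consent pages, and JavaScript rendering), so what DevTools shows is no guarantee of what requests receives. Whatever the cause, the crash itself is avoided by checking for None before touching .text. A sketch, using the snippet from the question as the input:

```python
from bs4 import BeautifulSoup

html = '''
<div id="wob_dcp">
  <span class="vk_gy vk_sh" id="wob_dc">Clear with periodic clouds</span>
</div>
'''

soup = BeautifulSoup(html, "html.parser")

# Guard against find() returning None instead of calling .text blindly.
span = soup.find("span", id="wob_dc")
weather = span.text if span is not None else "not found"
print(weather)
```

If the guard reports "not found" on the live page, the next step is to inspect response.text itself, not the browser's rendered DOM.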

BeautifulSoup, extracting strings within HTML tags, ResultSet objects

帅比萌擦擦* submitted on 2020-01-23 08:40:26
Question: I am confused about exactly how I can use the ResultSet object with BeautifulSoup, i.e. bs4.element.ResultSet. After using find_all(), how can one extract the text? Example: in the bs4 documentation, the HTML document html_doc looks like:

    <p class="story">
    Once upon a time there were three little sisters; and their names were
    <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
    <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
    <a class="sister" href=
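A ResultSet is essentially a list of Tag objects, so text is extracted from the individual tags, not from the ResultSet itself. A sketch using the "three sisters" document from the bs4 documentation (the third link, Tillie, is completed here from that documentation, since the excerpt above is cut off):

```python
from bs4 import BeautifulSoup

html_doc = '''
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>.</p>
'''

soup = BeautifulSoup(html_doc, "html.parser")

# find_all() returns a ResultSet: iterate over it and call get_text()
# (or .text) on each Tag.
links = soup.find_all("a", class_="sister")
names = [a.get_text(strip=True) for a in links]
print(names)  # ['Elsie', 'Lacie', 'Tillie']
```

Calling .text directly on the ResultSet raises an AttributeError, which is the usual stumbling block here.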

Scrape hidden pages if a search yields more results than displayed

我们两清 submitted on 2020-01-22 02:17:50
Question: Some of the search queries entered under https://www.comparis.ch/carfinder/default would yield more than 1'000 results (shown dynamically on the search page). However, the results only show a maximum of 100 pages with 10 results each, so I'm trying to scrape the remaining data for a query that yields more than 1'000 results. The code to scrape the IDs of the first 100 pages is (it takes approx. 2 minutes to run through all 100 pages):

    from bs4 import BeautifulSoup
    import requests

    # as the max number
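A common workaround for result caps like this (not something the site documents, just a general pattern) is to partition the search with an extra filter, e.g. price or year ranges, so that every sub-query returns fewer than 1'000 results, then scrape each sub-query separately. A sketch of the splitting logic, where count_results is a stand-in for a request asking the site how many hits a range-restricted query would return:

```python
def split_ranges(lo, hi, count_results, cap=1000):
    """Recursively split [lo, hi] until each sub-range yields at most `cap` results.

    count_results(lo, hi) stands in for a request that returns the hit count
    for a query restricted to that range (e.g. a price filter).
    """
    if count_results(lo, hi) <= cap or lo >= hi:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return (split_ranges(lo, mid, count_results, cap)
            + split_ranges(mid + 1, hi, count_results, cap))

# Fake counter for illustration: pretend 3500 results are spread
# uniformly over prices 0..9999.
total = 3500
fake_count = lambda lo, hi: total * (hi - lo + 1) // 10000

ranges = split_ranges(0, 9999, fake_count)
print(ranges)
```

The union of the sub-queries covers the same result set, but each one stays under the pagination cap.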

Cannot get table data - HTML

我只是一个虾纸丫 submitted on 2020-01-21 17:22:04
Question: I am trying to get the 'Earnings Announcements' table from https://www.zacks.com/stock/research/amzn/earnings-announcements. I am using different BeautifulSoup options, but none of them get the table:

    table = soup.find('table', attrs={'class': 'earnings_announcements_earnings_table'})
    table = soup.find_all('table')

When I inspect the table, its elements are there. I am pasting a portion of the code I am getting for the table (js, json?):

    document.obj_data = { "earnings_announcements
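That document.obj_data fragment is the giveaway: the table rows are rendered client-side by JavaScript from a JSON object embedded in the page, so there are no <td> cells in the downloaded HTML for bs4 to find. One common approach is to pull the object literal out with a regular expression and parse it with json.loads. A sketch on a simplified, hand-made stand-in for the page source (the real object's field layout may differ):

```python
import json
import re

# Simplified stand-in for the page source: the table data is embedded as a
# JavaScript object literal, not as HTML table rows.
page_source = '''
<script>
document.obj_data = { "earnings_announcements_earnings_table" :
  [ [ "4/30/2020", "3/2020", "$6.25", "$5.01", "--", "--", "After Close" ] ] };
</script>
'''

# Capture everything between "document.obj_data =" and the closing "};".
match = re.search(r"document\.obj_data\s*=\s*(\{.*?\})\s*;", page_source, re.DOTALL)
data = json.loads(match.group(1))
rows = data["earnings_announcements_earnings_table"]
print(rows[0][0])  # date of the first announcement
```

This only works when the embedded literal is valid JSON; if it uses unquoted keys or trailing commas, a tolerant parser or more careful extraction is needed.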

Issues with invoking an "on click" event on the HTML page using Beautiful Soup in Python

别来无恙 submitted on 2020-01-21 05:16:18
Question: I am trying to scrape the names of all the items present on the webpage, but by default only 18 are visible on the page, and my code scrapes only those. You can view all items by clicking the "Show all" button, but that button is driven by JavaScript. After some research, I found that the PyQt module can be used to solve this issue with JavaScript buttons, and I used it, but I am still not able to invoke the "on click" event. Below is the referenced code:

    import csv
    import urllib2
    import sys
    import time
    from
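Before wrestling with PyQt, it is worth checking whether "Show all" merely un-hides items that are already present in the HTML: BeautifulSoup parses markup and ignores CSS entirely, so find_all() returns hidden elements too. A sketch with hypothetical markup illustrating that point:

```python
from bs4 import BeautifulSoup

# Hypothetical page: only two items are visible, the rest are hidden by CSS
# until "Show all" is clicked -- but all of them exist in the raw HTML.
html = '''
<ul>
  <li class="item">Item 1</li>
  <li class="item">Item 2</li>
  <li class="item" style="display:none">Item 3</li>
  <li class="item" style="display:none">Item 4</li>
</ul>
'''

soup = BeautifulSoup(html, "html.parser")
names = [li.get_text() for li in soup.find_all("li", class_="item")]
print(len(names))  # hidden items are found as well
```

If instead the extra items are fetched over the network when the button is clicked, then a real browser driver (e.g. Selenium) or replaying that background request is required; BeautifulSoup alone cannot trigger click handlers.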

How to scrape multiple pages with an unchanging URL - Python 3

牧云@^-^@ submitted on 2020-01-19 13:12:30
Question: I recently got into web scraping and have tried to scrape various pages. For now, I am trying to scrape the following site: http://www.pizzahut.com.cn/StoreList. So far I've used Selenium to scrape the longitude and latitude. However, my code right now only extracts the first page. I know there is dynamic web scraping that executes JavaScript and loads different pages, but I had a hard time finding the right solution. I was wondering if there's a way to access the other 49
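When the URL does not change between pages, the page numbers are usually fetched by a background POST/AJAX request, so one option is to find that request in the browser's network tab and replay it once per page with requests, parsing each response with BeautifulSoup. The endpoint and class names below are hypothetical placeholders; only the parsing step is shown so the sketch stays self-contained:

```python
from bs4 import BeautifulSoup

def parse_stores(html):
    """Extract store names from one page of results.

    The "store-item" class is a made-up placeholder; the real selector would
    need to be taken from the actual page source.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [div.get_text(strip=True) for div in soup.find_all("div", class_="store-item")]

# One page's worth of (made-up) response HTML; in practice this would come
# from a requests.post() to whatever endpoint the site calls per page.
sample_page = '''
<div class="store-item">Store A</div>
<div class="store-item">Store B</div>
'''
stores = parse_stores(sample_page)
print(stores)
```

The alternative, since Selenium is already in use, is to locate the pagination links with the driver and click through them, letting the browser execute the JavaScript.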

Find index of tag with certain text in beautifulsoup/python

落花浮王杯 submitted on 2020-01-17 05:01:09
Question: I have a simple 4x2 HTML table that contains information about a property. I'm trying to extract the value 1972, which is under the column heading Year Built. If I find all the td tags, how do I extract the index of the tag that contains the text Year Built? Once I find that index, I can just add 4 to get to the tag that contains the value 1972. Here is the HTML:

    <table>
      <tbody>
        <tr>
          <td>Building</td>
          <td>Type</td>
          <td>Year Built</td>
          <td>Sq. Ft.</td>
        </tr>
        <tr>
          <td>R01</td>
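The index-plus-4 idea works because the table has 4 columns, so the matching value sits exactly one row (4 cells) further along in the flat find_all() list. A sketch on the table from the question, with the second row completed using made-up filler values apart from 1972 (the excerpt is cut off after R01):

```python
from bs4 import BeautifulSoup

# Second row filled in with hypothetical values ("Ranch", "1554"); only
# "R01" and "1972" come from the original question.
html = '''
<table><tbody>
<tr><td>Building</td><td>Type</td><td>Year Built</td><td>Sq. Ft.</td></tr>
<tr><td>R01</td><td>Ranch</td><td>1972</td><td>1554</td></tr>
</tbody></table>
'''

soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

# list.index() finds the position of the header cell; with 4 columns,
# the value in the next row is at index + 4.
texts = [td.get_text(strip=True) for td in cells]
idx = texts.index("Year Built")
year = texts[idx + 4]
print(year)  # 1972
```

The same pattern generalizes to index + n for an n-column table, and raises ValueError if the heading is absent, which is easier to debug than silently reading the wrong cell.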