beautifulsoup

Using Beautiful Soup I scrape Twitter data. I am able to get the data but can't save it to a CSV file

天大地大妈咪最大 submitted on 2020-01-23 17:31:06
Question: I scraped Twitter for user names, tweets, replies, and retweets, but I can't save them to a CSV file. Here is the code:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    file = "5_twitterBBC.csv"
    f = open(file, "w")
    Headers = "tweet_user, tweet_text, replies, retweets\n"
    f.write(Headers)
    for page in range(0, 5):
        url = "https://twitter.com/BBCWorld".format(page)
        html = urlopen(url)
        soup = BeautifulSoup(html, "html.parser")
        tweets = soup.find_all("div", {"class": "js-stream-item"})
        for tweet in
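A common fix for the saving part of this problem is to use the standard csv module instead of concatenating strings by hand, so commas and quotes inside tweet text don't corrupt the file. A minimal sketch, assuming the per-tweet values have already been extracted (the rows below are made up; Twitter's old js-stream-item markup no longer exists, so the scraping step itself is not reproduced here):

```python
import csv

# Hypothetical rows, standing in for values extracted from each tweet.
rows = [
    ("BBCWorld", "Example tweet text, with a comma", 12, 34),
    ("BBCWorld", "Another tweet", 5, 8),
]

# newline="" is the documented way to open CSV files for the csv module.
with open("5_twitterBBC.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["tweet_user", "tweet_text", "replies", "retweets"])
    writer.writerows(rows)
```

csv.writer quotes fields containing commas automatically, which the manual f.write() approach does not.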

BeautifulSoup returns incomplete HTML

梦想的初衷 submitted on 2020-01-23 17:08:39
Question: I am reading a book about Python right now. There is a small homework project: "Write a program that goes to a photo-sharing site like Flickr or Imgur, searches for a category of photos, and then downloads all the resulting images." It is suggested to use only the webbrowser, requests, and bs4 libraries. I cannot do it for Flickr. I found that the parser cannot go inside the element (div class="interaction-view"). Using "Inspect element" in Chrome I can see that there are a few "div" elements
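The usual cause of "incomplete" HTML like this is that the element is filled in by JavaScript after the page loads, so requests and bs4 only ever see the empty container that the server sends, while Chrome's "Inspect element" shows the DOM after scripts have run. A minimal sketch of what the parser actually receives, with a hand-written string standing in for the raw server response (the markup is simplified and hypothetical):

```python
from bs4 import BeautifulSoup

# What the server actually returns (simplified): the div exists,
# but JavaScript has not yet inserted any photo elements into it.
raw_html = '<html><body><div class="interaction-view"></div></body></html>'

soup = BeautifulSoup(raw_html, "html.parser")
div = soup.find("div", class_="interaction-view")
print(div is not None)       # the div itself is found
print(len(div.find_all()))   # ...but it has no children in the raw HTML
```

When this happens, the data either has to come from the site's API or from whatever network request the page's JavaScript makes, or the page has to be rendered in a real browser.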

BeautifulSoup “AttributeError: 'NoneType' object has no attribute 'text'”

和自甴很熟 submitted on 2020-01-23 16:46:10
Question: I was web-scraping a Google weather search with bs4, and Python can't find a <span> tag even though there is one. How can I solve this problem? I tried to find this <span> by its class and by its id, but both failed.

    <div id="wob_dcp">
      <span class="vk_gy vk_sh" id="wob_dc">Clear with periodic clouds</span>
    </div>

Above is the HTML code I was trying to scrape on the page. Sorry, I can't post images because of my reputation.

    response = requests.get('https://www.google.com/search?hl=ja&ei
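The AttributeError means find() returned None, i.e. the tag was not in the HTML that was actually downloaded: Google often serves different markup to scripts than to a browser (depending on headers, consent pages, and JavaScript rendering), so what DevTools shows is no guarantee of what requests receives. Whatever the cause, the crash itself is avoided by checking for None before touching .text. A sketch, using the snippet from the question as the input:

```python
from bs4 import BeautifulSoup

html = '''
<div id="wob_dcp">
  <span class="vk_gy vk_sh" id="wob_dc">Clear with periodic clouds</span>
</div>
'''

soup = BeautifulSoup(html, "html.parser")

# Guard against find() returning None instead of calling .text blindly.
span = soup.find("span", id="wob_dc")
weather = span.text if span is not None else "not found"
print(weather)
```

If the guard reports "not found" on the live page, the next step is to inspect response.text itself, not the browser's rendered DOM.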

BeautifulSoup, extracting strings within HTML tags, ResultSet objects

帅比萌擦擦* submitted on 2020-01-23 08:40:26
Question: I am confused about exactly how I can use the ResultSet object with BeautifulSoup, i.e. bs4.element.ResultSet. After using find_all(), how can one extract the text? Example: in the bs4 documentation, the HTML document html_doc looks like:

    <p class="story">
    Once upon a time there were three little sisters; and their names were
    <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
    <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
    <a class="sister" href=
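A ResultSet is essentially a list of Tag objects, so text is extracted from the individual tags, not from the ResultSet itself. A sketch using the "three sisters" document from the bs4 documentation (the third link, Tillie, is completed here from that documentation, since the excerpt above is cut off):

```python
from bs4 import BeautifulSoup

html_doc = '''
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>.</p>
'''

soup = BeautifulSoup(html_doc, "html.parser")

# find_all() returns a ResultSet: iterate over it and call get_text()
# (or .text) on each Tag.
links = soup.find_all("a", class_="sister")
names = [a.get_text(strip=True) for a in links]
print(names)  # ['Elsie', 'Lacie', 'Tillie']
```

Calling .text directly on the ResultSet raises an AttributeError, which is the usual stumbling block here.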

Scrape hidden pages if a search yields more results than displayed

我们两清 submitted on 2020-01-22 02:17:50
Question: Some of the search queries entered under https://www.comparis.ch/carfinder/default would yield more than 1'000 results (shown dynamically on the search page). However, the results only show a maximum of 100 pages with 10 results each, so I'm trying to scrape the remaining data for a query that yields more than 1'000 results. The code to scrape the IDs of the first 100 pages is (it takes approx. 2 minutes to run through all 100 pages):

    from bs4 import BeautifulSoup
    import requests

    # as the max number
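A common workaround for result caps like this (not something the site documents, just a general pattern) is to partition the search with an extra filter, e.g. price or year ranges, so that every sub-query returns fewer than 1'000 results, then scrape each sub-query separately. A sketch of the splitting logic, where count_results is a stand-in for a request asking the site how many hits a range-restricted query would return:

```python
def split_ranges(lo, hi, count_results, cap=1000):
    """Recursively split [lo, hi] until each sub-range yields at most `cap` results.

    count_results(lo, hi) stands in for a request that returns the hit count
    for a query restricted to that range (e.g. a price filter).
    """
    if count_results(lo, hi) <= cap or lo >= hi:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return (split_ranges(lo, mid, count_results, cap)
            + split_ranges(mid + 1, hi, count_results, cap))

# Fake counter for illustration: pretend 3500 results are spread
# uniformly over prices 0..9999.
total = 3500
fake_count = lambda lo, hi: total * (hi - lo + 1) // 10000

ranges = split_ranges(0, 9999, fake_count)
print(ranges)
```

The union of the sub-queries covers the same result set, but each one stays under the pagination cap.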

Cannot get table data - HTML

我只是一个虾纸丫 submitted on 2020-01-21 17:22:04
Question: I am trying to get the 'Earnings Announcements' table from https://www.zacks.com/stock/research/amzn/earnings-announcements. I am using different BeautifulSoup options, but none of them get the table:

    table = soup.find('table', attrs={'class': 'earnings_announcements_earnings_table'})
    table = soup.find_all('table')

When I inspect the table, its elements are there. I am pasting a portion of the code I am getting for the table (js, json?):

    document.obj_data = { "earnings_announcements
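That document.obj_data fragment is the giveaway: the table rows are rendered client-side by JavaScript from a JSON object embedded in the page, so there are no <td> cells in the downloaded HTML for bs4 to find. One common approach is to pull the object literal out with a regular expression and parse it with json.loads. A sketch on a simplified, hand-made stand-in for the page source (the real object's field layout may differ):

```python
import json
import re

# Simplified stand-in for the page source: the table data is embedded as a
# JavaScript object literal, not as HTML table rows.
page_source = '''
<script>
document.obj_data = { "earnings_announcements_earnings_table" :
  [ [ "4/30/2020", "3/2020", "$6.25", "$5.01", "--", "--", "After Close" ] ] };
</script>
'''

# Capture everything between "document.obj_data =" and the closing "};".
match = re.search(r"document\.obj_data\s*=\s*(\{.*?\})\s*;", page_source, re.DOTALL)
data = json.loads(match.group(1))
rows = data["earnings_announcements_earnings_table"]
print(rows[0][0])  # date of the first announcement
```

This only works when the embedded literal is valid JSON; if it uses unquoted keys or trailing commas, a tolerant parser or more careful extraction is needed.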

Issues with invoking an "on click" event on the HTML page using Beautiful Soup in Python

别来无恙 submitted on 2020-01-21 05:16:18
Question: I am trying to scrape the names of all the items present on the webpage, but by default only 18 are visible on the page, and my code scrapes only those. You can view all items by clicking the "Show all" button, but that button is driven by JavaScript. After some research, I found that the PyQt module can be used to solve this issue with JavaScript buttons, and I used it, but I am still not able to invoke the "on click" event. Below is the referenced code:

    import csv
    import urllib2
    import sys
    import time
    from
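Before wrestling with PyQt, it is worth checking whether "Show all" merely un-hides items that are already present in the HTML: BeautifulSoup parses markup and ignores CSS entirely, so find_all() returns hidden elements too. A sketch with hypothetical markup illustrating that point:

```python
from bs4 import BeautifulSoup

# Hypothetical page: only two items are visible, the rest are hidden by CSS
# until "Show all" is clicked -- but all of them exist in the raw HTML.
html = '''
<ul>
  <li class="item">Item 1</li>
  <li class="item">Item 2</li>
  <li class="item" style="display:none">Item 3</li>
  <li class="item" style="display:none">Item 4</li>
</ul>
'''

soup = BeautifulSoup(html, "html.parser")
names = [li.get_text() for li in soup.find_all("li", class_="item")]
print(len(names))  # hidden items are found as well
```

If instead the extra items are fetched over the network when the button is clicked, then a real browser driver (e.g. Selenium) or replaying that background request is required; BeautifulSoup alone cannot trigger click handlers.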

How to scrape multiple pages with an unchanging URL - Python 3

牧云@^-^@ submitted on 2020-01-19 13:12:30
Question: I recently got into web scraping and have tried to scrape various pages. For now, I am trying to scrape the following site: http://www.pizzahut.com.cn/StoreList. So far I've used Selenium to scrape the longitude and latitude. However, my code right now only extracts the first page. I know there is dynamic web scraping that executes JavaScript and loads different pages, but I had a hard time finding the right solution. I was wondering if there's a way to access the other 49
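When the URL does not change between pages, the page numbers are usually fetched by a background POST/AJAX request, so one option is to find that request in the browser's network tab and replay it once per page with requests, parsing each response with BeautifulSoup. The endpoint and class names below are hypothetical placeholders; only the parsing step is shown so the sketch stays self-contained:

```python
from bs4 import BeautifulSoup

def parse_stores(html):
    """Extract store names from one page of results.

    The "store-item" class is a made-up placeholder; the real selector would
    need to be taken from the actual page source.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [div.get_text(strip=True) for div in soup.find_all("div", class_="store-item")]

# One page's worth of (made-up) response HTML; in practice this would come
# from a requests.post() to whatever endpoint the site calls per page.
sample_page = '''
<div class="store-item">Store A</div>
<div class="store-item">Store B</div>
'''
stores = parse_stores(sample_page)
print(stores)
```

The alternative, since Selenium is already in use, is to locate the pagination links with the driver and click through them, letting the browser execute the JavaScript.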

Find index of tag with certain text in beautifulsoup/python

落花浮王杯 submitted on 2020-01-17 05:01:09
Question: I have a simple 4x2 HTML table that contains information about a property. I'm trying to extract the value 1972, which is under the column heading Year Built. If I find all the td tags, how do I extract the index of the tag that contains the text Year Built? Once I find that index, I can just add 4 to get to the tag that contains the value 1972. Here is the HTML:

    <table>
      <tbody>
        <tr>
          <td>Building</td>
          <td>Type</td>
          <td>Year Built</td>
          <td>Sq. Ft.</td>
        </tr>
        <tr>
          <td>R01</td>
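The index-plus-4 idea works because the table has 4 columns, so the matching value sits exactly one row (4 cells) further along in the flat find_all() list. A sketch on the table from the question, with the second row completed using made-up filler values apart from 1972 (the excerpt is cut off after R01):

```python
from bs4 import BeautifulSoup

# Second row filled in with hypothetical values ("Ranch", "1554"); only
# "R01" and "1972" come from the original question.
html = '''
<table><tbody>
<tr><td>Building</td><td>Type</td><td>Year Built</td><td>Sq. Ft.</td></tr>
<tr><td>R01</td><td>Ranch</td><td>1972</td><td>1554</td></tr>
</tbody></table>
'''

soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

# list.index() finds the position of the header cell; with 4 columns,
# the value in the next row is at index + 4.
texts = [td.get_text(strip=True) for td in cells]
idx = texts.index("Year Built")
year = texts[idx + 4]
print(year)  # 1972
```

The same pattern generalizes to index + n for an n-column table, and raises ValueError if the heading is absent, which is easier to debug than silently reading the wrong cell.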