beautifulsoup

Are spaces around CSS combinators really optional?

 ̄綄美尐妖づ submitted on 2019-12-24 22:51:39
Question: I'm a bit confused by using CSS selectors with axis combinators in BeautifulSoup. Below is some simple code to illustrate what I mean:

from bs4 import BeautifulSoup as bs
import requests

response = requests.get('https://stackoverflow.com/questions/tagged/python')
soup = bs(response.text)
print(len(soup.select('#mainbar > div')))

returns 6 children... but

print(len(soup.select('#mainbar>div')))

returns 0 children... The same happens with '#mainbar ~ div' (found 1 sibling) and '#mainbar~div' (found
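In CSS itself, whitespace around the `>` and `~` combinators is optional. Older BeautifulSoup releases used a limited home-grown selector parser that could trip over the unspaced form; the sketch below assumes bs4 4.7.0 or later, where `select()` is delegated to the soupsieve package. It uses a small static stand-in document (not the real Stack Overflow markup) and compares both spellings of each selector.

```python
from bs4 import BeautifulSoup

# Static stand-in for the page; the real Stack Overflow markup is not reproduced here.
html = """
<div id="mainbar">
  <div class="a">one</div>
  <div class="b">two</div>
  <p>not a div</p>
</div>
<div class="sibling">after mainbar</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Assuming bs4 >= 4.7.0 (soupsieve backend), the spaced and unspaced
# forms of each combinator should select the same elements.
pairs = [("#mainbar > div", "#mainbar>div"),
         ("#mainbar ~ div", "#mainbar~div")]
for spaced, unspaced in pairs:
    print(f"{spaced!r}: {len(soup.select(spaced))}  "
          f"{unspaced!r}: {len(soup.select(unspaced))}")
```

If the unspaced form still returns nothing, checking `bs4.__version__` is a reasonable first step.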

Converting a web scrape into excel?

荒凉一梦 submitted on 2019-12-24 21:51:57
Question: UPDATE: I tried to install the pandas module in PyCharm and got an error (IndexError: list index out of range). I also tried to install it from a Command Prompt (cmd.exe) window using C:> pip install pandas, with no luck. I was finally able to get pip install pandas to work, but it still says I don't have the module. I am trying to save this information automatically into an Excel file similar to this sample:

import requests
from bs4 import
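If pandas keeps causing install trouble, one dependency-free alternative (a swapped-in technique, not what the asker used) is writing the scraped rows to a CSV file, which Excel opens directly. This is a minimal sketch; the row data, column names and `scraped.csv` filename are hypothetical placeholders. pandas' `DataFrame.to_excel` can write a real .xlsx file instead, but it needs an extra engine such as openpyxl installed.

```python
import csv

def save_rows_to_csv(rows, path, header):
    """Write scraped rows to a CSV file that Excel can open directly."""
    # newline="" prevents blank lines between rows on Windows;
    # utf-8-sig adds a BOM so Excel detects the UTF-8 encoding.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

# Hypothetical scraped data; replace with whatever your requests/bs4 code extracts.
rows = [["Item A", "10.00"], ["Item B", "12.50"]]
save_rows_to_csv(rows, "scraped.csv", header=["name", "price"])
```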

PyCharm not recognizing installed BeautifulSoup4 module

依然范特西╮ submitted on 2019-12-24 21:28:50
Question: I'm having trouble getting PyCharm to recognize installed modules. I'm using PyCharm 2018.2.4 CE with Python 3.7 x64 on Windows 10. I don't have Python 2.x installed. I installed requests and BeautifulSoup4 from the command line using the 'pip' and 'pip3' commands. pip list shows the installed modules, and PyCharm lists them in the interpreter settings, but when I enter my code the imports are grayed out as if none of them were installed. I tried everything: reinstalling both Python (x64 and x86) and
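A common cause of this symptom is that pip installed into a different interpreter than the one the project runs. A quick diagnostic sketch, run from inside PyCharm, prints the interpreter actually in use and whether it can find each module. Note that the BeautifulSoup4 package is imported as `bs4`, not `BeautifulSoup4`. The helper name `module_available` is hypothetical.

```python
import importlib.util
import sys

def module_available(name):
    """Return True if `name` can be found by the *current* interpreter."""
    return importlib.util.find_spec(name) is not None

# The path printed here should match the project interpreter configured in
# PyCharm's settings; if not, pip installed into a different Python.
print("Interpreter:", sys.executable)
for mod in ("requests", "bs4"):  # the BeautifulSoup4 package imports as "bs4"
    print(mod, "found" if module_available(mod) else "MISSING")
```

If a module shows as MISSING, installing with that exact interpreter (`"<path printed above>" -m pip install beautifulsoup4`) avoids the mismatch.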

Scraping contents of multiple web pages of a website using BeautifulSoup and Selenium

别说谁变了你拦得住时间么 submitted on 2019-12-24 20:59:02
Question: The website I want to scrape is: http://www.mouthshut.com/mobile-operators/Reliance-Jio-reviews-925812061 I want to get the last page number from the above link so I can proceed; it was 499 when I took the screenshot. My code:

from bs4 import BeautifulSoup
from urllib.request import urlopen as uReq
from selenium import webdriver
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected
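One way to find the last page number is to collect every numeric link in the pager and take the maximum, which ignores "Next"/"Prev" links. This is a sketch against hypothetical markup: the `ul.pages` selector is an assumption, so inspect the real page and adjust it. If the pager is rendered by JavaScript, pass Selenium's `driver.page_source` to the function instead of the `urlopen` response.

```python
from bs4 import BeautifulSoup

# Hypothetical pagination markup; the real mouthshut.com HTML may differ.
html = """
<ul class="pages">
  <li><a href="?page=1">1</a></li>
  <li><a href="?page=2">2</a></li>
  <li><a href="?page=499">499</a></li>
  <li><a href="?page=2">Next</a></li>
</ul>
"""

def last_page_number(page_html, selector="ul.pages a"):
    """Return the largest numeric link text in the pager, or None if there is none."""
    soup = BeautifulSoup(page_html, "html.parser")
    texts = (a.get_text(strip=True) for a in soup.select(selector))
    numbers = [int(t) for t in texts if t.isdigit()]  # skips "Next", "Prev", etc.
    return max(numbers, default=None)

print(last_page_number(html))  # 499
```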

Web scraping using Beautiful Soup

左心房为你撑大大i submitted on 2019-12-24 20:48:58
Question: How can I get all the categories mentioned on each listing page of the website "https://www.sfma.org.sg/member/category"? For example, when I choose the alcoholic beverage category on the above-mentioned page, the listings on that page have category information like this: Category: Alcoholic Beverage, Bottled Beverage, Spirit / Liquor / Hard Liquor, Wine, Distributor, Exporter, Importer, Supplier. How can I extract the categories mentioned here into a single variable? The
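Once the "Category: ..." text has been pulled out of a listing (for example with BeautifulSoup's `get_text()`), it can be split into a single list with plain string methods. A minimal sketch, assuming the label and values are separated by the first colon and the values by commas; a category name that itself contains a comma would be split incorrectly. The helper name `parse_categories` is hypothetical.

```python
def parse_categories(text):
    """Split 'Category: A, B, C' into ['A', 'B', 'C']."""
    # Drop everything up to and including the first colon, if there is one.
    _, sep, rest = text.partition(":")
    values = rest if sep else text
    return [v.strip() for v in values.split(",") if v.strip()]

sample = ("Category: Alcoholic Beverage, Bottled Beverage, "
          "Spirit / Liquor / Hard Liquor, Wine, Distributor, "
          "Exporter, Importer, Supplier")
categories = parse_categories(sample)
print(categories)
```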

python beautifulsoup4 parsing google finance data

纵饮孤独 submitted on 2019-12-24 20:48:53
Question: I'm new to BeautifulSoup and to scraping in general, so I'm trying to get my feet wet, so to speak. I'd like to get the first row of information for the Dow Jones Industrial Average from here: http://www.google.com/finance/historical?q=INDEXDJX%3A.DJI&ei=ZN_2UqD9NOTt6wHYrAE While I can read the data, and print(soup) outputs everything, I can't seem to drill down far enough. How would I select the rows so that I can save them into a table? And how would I get just the first row? Thank you so much for your help! import
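The usual pattern is to locate the table, iterate over its `<tr>` elements, and collect the text of each row's `<td>` cells; the first data row is then just the first element of that list. The sketch below uses a hypothetical table with dummy values and an assumed `prices` class name, since the original Google Finance markup is not shown in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical table with dummy values; the class name "prices" is an assumption.
html = """
<table class="prices">
  <tr><th>Date</th><th>Open</th><th>Close</th></tr>
  <tr><td>day 2</td><td>11.0</td><td>11.5</td></tr>
  <tr><td>day 1</td><td>10.0</td><td>10.5</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.select_one("table.prices")

rows = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # the header row has only <th> cells, so it is skipped
        rows.append(cells)

first_row = rows[0]
print(first_row)  # ['day 2', '11.0', '11.5']
```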

Web scraping remax.com with Python

我们两清 submitted on 2019-12-24 20:44:15
Question: This is similar to a question I asked earlier, which was answered perfectly. Now that I have something to work with, instead of entering a URL manually to get the data, I want to write a function that takes just the address and zip code and returns the data I want. The problem is modifying the URL to get the correct one. For example url = 'https://www.remax.com/realestatehomesforsale/25-montage-way-laguna-beach-ca-92651-gid100012499996.html' I see
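The address portion of the example URL looks like a lowercase, hyphen-separated slug of street, city, state and zip code, which can be built with a regular expression. This sketch only covers that part: the trailing `gid...` number appears to be an internal listing ID that cannot be derived from the address, so it would still have to be discovered some other way (for example through the site's search). The helper name `remax_address_slug` is hypothetical, and the slug rule is inferred from a single example URL.

```python
import re

def remax_address_slug(street, city, state, zipcode):
    """Build the address part of a remax.com listing URL (hypothetical helper)."""
    raw = f"{street} {city} {state} {zipcode}"
    # Lowercase, then collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", raw.lower()).strip("-")

print(remax_address_slug("25 Montage Way", "Laguna Beach", "CA", "92651"))
# 25-montage-way-laguna-beach-ca-92651
```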

Unable to get all the data including links from a tr tag

亡梦爱人 submitted on 2019-12-24 20:26:54
Question: I've written a script in Python to get data from some HTML elements in a table. I have picked out some sample data from within a tr tag. My goal is to get the data (including the href links) within class fn. What I have tried so far parses the text of all of them (from class fn) but not the links. How can I change the script below to get the links from that class as well? Thanks in advance for any solution. This is what I've tried so far:

from bs4 import BeautifulSoup
content="""
<tr>
<td
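To get links as well as text, look for an `<a>` element with an `href` inside each `.fn` cell and read its `href` attribute; cells without a link can fall back to `None`. This is a sketch on a hypothetical row, since the question's own `content` string is truncated. Relative links can be made absolute with `urllib.parse.urljoin` if the page's base URL is known.

```python
from bs4 import BeautifulSoup

# Hypothetical row; the original content string is truncated in the question.
content = """
<tr>
  <td class="fn"><a href="/people/alice">Alice</a></td>
  <td class="fn"><a href="/people/bob">Bob</a></td>
  <td class="fn">No link here</td>
</tr>
"""

soup = BeautifulSoup(content, "html.parser")

results = []
for cell in soup.select(".fn"):
    link = cell.find("a", href=True)  # first <a> that actually has an href
    results.append((cell.get_text(strip=True), link["href"] if link else None))

for text, href in results:
    print(text, href)
```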

Web scraping from remax.com

老子叫甜甜 submitted on 2019-12-24 20:09:56
Question: I am trying to scrape some data from Remax.com, such as the lot size or square footage of a property. However, I get the following errors:

---------------------------------------------------------------------------
Error                                     Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\contrib\pyopenssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname)
    440         try:
--> 441             cnx.do_handshake()
    442         except

Python Convert HTML into JSON using Soup

冷暖自知 submitted on 2019-12-24 19:19:01
Question: These are the rules:

1. The HTML will start with any of the following tags: <p>, <ol> or <ul>.
2. When any of the step 1 tags is found, its content will contain only the following tags: <em>, <strong> or <span style="text-decoration:underline">.
3. Map the step 2 tags as follows: <strong> becomes the item {"bold":True} in the JSON, <em> becomes {"italics":True} and <span style="text-decoration:underline"> becomes {"decoration":"underline"}.
4. Any text found would be {"text": "this is the text"} in
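The rules map naturally onto a recursive walk: each text node becomes a `{"text": ...}` dict carrying the formatting flags of every enclosing inline tag. This is a sketch only. The question's description of the final JSON layout is cut off, so the output shape used here (one `{"type", "content"}` object per top-level <p>/<ol>/<ul>, with list items flattened into one run list) is an assumption. Python's `True` is serialized as JSON `true`.

```python
import json

from bs4 import BeautifulSoup, Comment, NavigableString, Tag

def style_for(tag):
    """Map an inline tag to its JSON flags, per rule 3."""
    if tag.name == "strong":
        return {"bold": True}
    if tag.name == "em":
        return {"italics": True}
    # Spaces are ignored so "text-decoration: underline" also matches.
    if tag.name == "span" and tag.get("style", "").replace(" ", "") == "text-decoration:underline":
        return {"decoration": "underline"}
    return {}

def runs(node, inherited=None):
    """Yield one dict per non-empty text node, with the styles of all enclosing tags."""
    inherited = inherited or {}
    for child in node.children:
        if isinstance(child, Comment):
            continue  # HTML comments are not content
        if isinstance(child, NavigableString):
            if child.strip():
                yield {"text": str(child), **inherited}
        elif isinstance(child, Tag):
            yield from runs(child, {**inherited, **style_for(child)})

def html_to_json(html):
    soup = BeautifulSoup(html, "html.parser")
    # Assumed output shape: one object per top-level block tag (rule 1).
    blocks = [{"type": block.name, "content": list(runs(block))}
              for block in soup.find_all(["p", "ol", "ul"], recursive=False)]
    return json.dumps(blocks)

print(html_to_json("<p>Hi <strong>bold <em>both</em></strong></p>"))
```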