bs4

BeautifulSoup output to .txt file

Submitted by 大憨熊 on 2019-12-20 03:10:09

Question: I am trying to export my data as a .txt file.

from bs4 import BeautifulSoup
import requests
import os

os.getcwd()   # '/home/folder'
os.mkdir("Probeersel6")
os.chdir("Probeersel6")
os.getcwd()   # '/home/Desktop/folder'
os.mkdir("img")   # now inside `folder`
url = "http://nos.nl/artikel/2093082-steeds-meer-nekklachten-bij-kinderen-door-gebruik-tablets.html"
r = requests.get(url)
soup = BeautifulSoup(r.content)
data = soup.find_all("article", {"class": "article"})
with open(""%s".txt", "wb" %(url)) as
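The open() call at the end is where this breaks: the % formatting is applied to the mode string "wb" and the quotes are unbalanced. A minimal working sketch, using an inline stand-in for the fetched page and a hypothetical filename slug (a raw URL contains slashes and is not a valid file name):

```python
from bs4 import BeautifulSoup

# Inline stand-in for r.content; the real page would come from requests.
html = '<article class="article"><p>Steeds meer nekklachten.</p></article>'
soup = BeautifulSoup(html, "html.parser")
data = soup.find_all("article", {"class": "article"})

# Apply % to the filename template, not to the mode, and use a slug
# rather than the raw URL.
filename = "%s.txt" % "nekklachten-artikel"  # hypothetical slug
with open(filename, "w", encoding="utf-8") as f:
    for article in data:
        f.write(article.get_text(strip=True) + "\n")
```

Opening in text mode with an explicit encoding also removes the need for "wb" and manual byte handling.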

How to find all comments with Beautiful Soup

Submitted by 别来无恙 on 2019-12-17 06:45:56

Question: This question was asked four years ago, but the answer is now out of date for BS4. I want to delete all comments in my HTML file using Beautiful Soup. Since BS4 makes each comment a special type of navigable string, I thought this code would work:

for comments in soup.find_all('comment'):
    comments.decompose()

That didn't work... How do I find all comments using BS4?

Answer 1: You can pass a function to find_all() to help it check whether the string is a Comment. For example I have below
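find_all('comment') searches for a <comment> tag, which doesn't exist; comments are a subclass of NavigableString, so you search strings and test their type. A short sketch:

```python
from bs4 import BeautifulSoup, Comment

html = "<p>text<!-- a comment --></p><div><!-- another --></div>"
soup = BeautifulSoup(html, "html.parser")

# Comments are strings, not tags: search strings and check their type.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
for comment in comments:
    comment.extract()  # decompose() is for tags; extract() removes a string
```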

How to get Python bs4 to work properly on XML?

Submitted by 。_饼干妹妹 on 2019-12-13 19:45:30

Question: I'm trying to use Python and BeautifulSoup 4 (bs4) to convert Inkscape SVGs into an XML-like format for some proprietary software. I can't seem to get bs4 to correctly parse a minimal example. I need the parser to respect self-closing tags, handle Unicode, and not add HTML boilerplate. I thought specifying the 'lxml' parser with selfClosingTags would do it, but no; check it out.

#!/usr/bin/python
from __future__ import print_function
from bs4 import BeautifulSoup

print('\nbs4 mangled XML:')
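The 'lxml' parser is lxml's HTML mode; for XML you pass "xml" instead, which selects lxml's XML parser (so lxml must be installed). Self-closing tags then survive and no <html>/<body> wrapper is added. A minimal sketch with a made-up SVG fragment:

```python
from bs4 import BeautifulSoup

svg = '<svg xmlns="http://www.w3.org/2000/svg"><path d="M0 0"/></svg>'

# "xml" selects lxml's XML parser (pip install lxml); unlike the HTML
# parsers it keeps self-closing tags and adds no html/body wrapper.
soup = BeautifulSoup(svg, "xml")
out = str(soup)
```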

soup.select('.r a') in 'https://www.google.com/#q=vigilante+mic' gives empty list in python BeautifulSoup

Submitted by ≡放荡痞女 on 2019-12-13 08:42:10

Question: I am using BeautifulSoup to extract all links from a Google search results page. Here's the snippet of the code:

import requests, bs4

res = requests.get('https://www.google.com/#q=vigilante+mic')
soup = bs4.BeautifulSoup(res.text)
linkElem = soup.select('.r a')

Now soup.select('.r a') is returning an empty list. Thank you.

Answer 1: That's because of the URL you are using: https://www.google.com/#q=vigilante+mic is a JavaScript version of the search. If you curl it you will see there are no answers in
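The empty list is mostly a URL problem: everything after # is a fragment, which the browser keeps client-side and requests never sends, so the server returns a page with no results in it. A small offline check (note that scraping Google's server-rendered pages is fragile, and its markup, including the .r class, changes regularly):

```python
from urllib.parse import urlsplit

url = "https://www.google.com/#q=vigilante+mic"
parts = urlsplit(url)

# The query lives in the fragment, which never reaches the server:
print(parts.query)     # ''
print(parts.fragment)  # 'q=vigilante+mic'

# The server-rendered endpoint carries the query in the query string:
search_url = "https://www.google.com/search?q=vigilante+mic"
```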

Extracting information from a table except its header using bs4

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-13 07:23:35

Question: I am trying to extract information from a table using bs4 and Python. When I use the following code to read the header of the table:

tr_header = table.findAll("tr")[0]
tds_in_header = [td.get_text() for td in tr_header.findAll("td")]
header_items = [data.encode('utf-8') for data in tds_in_header]
len_table_header = len(header_items)

it works, but with the following code, where I try to extract information from the first row to the end of the table: tr_all=table
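Once the header works, the remaining rows are just a slice: take find_all("tr")[1:] and iterate row by row. A sketch with a stand-in table:

```python
from bs4 import BeautifulSoup

html = """<table>
<tr><td>Code</td><td>Display</td></tr>
<tr><td>min</td><td>Minute</td></tr>
<tr><td>happy</td><td>Hour</td></tr>
</table>"""
table = BeautifulSoup(html, "html.parser").table

# Slice past the header row, then read each remaining row cell by cell.
rows = table.find_all("tr")[1:]
body = [[td.get_text(strip=True) for td in row.find_all("td")]
        for row in rows]
```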

Beautiful Soup captures null values in a table

Submitted by 我只是一个虾纸丫 on 2019-12-13 05:53:18

Question: For the following piece of HTML code, I used BeautifulSoup to capture the table information:

<table>
  <tr> <td><b>Code</b></td> <td><b>Display</b></td> </tr>
  <tr> <td>min</td> <td>Minute</td><td/> </tr>
  <tr> <td>happy </td> <td>Hour</td><td/> </tr>
  <tr> <td>daily </td> <td>Day</td><td/> </tr>

This is my code:

comments = [td.get_text() for td in table.findAll("td")]
Comments = [data.encode('utf-8') for data in comments]

As you see, this table has two headers, "Code" and "Display", and some values in
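The empty <td/> cells come back from get_text() as empty strings; filtering them out (and stripping the stray whitespace around values like "happy ") gives a clean list. A sketch on a shortened version of the table above:

```python
from bs4 import BeautifulSoup

html = ("<table><tr><td><b>Code</b></td><td><b>Display</b></td></tr>"
        "<tr><td>min</td><td>Minute</td><td/></tr>"
        "<tr><td>happy </td><td>Hour</td><td/></tr></table>")
table = BeautifulSoup(html, "html.parser").table

raw = [td.get_text() for td in table.find_all("td")]
# The self-closing <td/> cells yield '': drop them and trim spaces.
cells = [text.strip() for text in raw if text.strip()]
```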

Cannot find table using Python BeautifulSoup

Submitted by 巧了我就是萌 on 2019-12-13 02:55:44

Question: I am trying to scrape the data from the table id=AWS on the following NOAA site, https://www.weather.gov/afc/alaskaObs, but when I try to find the table using .find, my result comes up as None. I am able to return the parent div, but can't seem to access the table. Below is my code.

from bs4 import BeautifulSoup
from urllib2 import urlopen

# Get soup set up
html = urlopen('https://www.weather.gov/afc/alaskaObs').read()
soup = BeautifulSoup(html, 'lxml').find("div", {"id":"obDataDiv"}).find
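When find() returns None even though the parent div is there, the first thing to check is whether the table exists in the raw HTML at all: observation tables on pages like this are often filled in by JavaScript, which urlopen never executes, so print the downloaded html and look for id="AWS". When the table really is in the markup, the chained lookup works; a sketch with a stand-in page:

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded page; on the live site the AWS table may be
# injected by JavaScript, in which case it is absent from urlopen's HTML.
html = ('<div id="obDataDiv">'
        '<table id="AWS"><tr><td>PAED</td><td>-2C</td></tr></table>'
        '</div>')
soup = BeautifulSoup(html, "html.parser")

table = soup.find("div", {"id": "obDataDiv"}).find("table", {"id": "AWS"})
first_row = [td.get_text() for td in table.find_all("td")]
```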

Extracting data properly with bs4?

Submitted by 試著忘記壹切 on 2019-12-12 05:29:54

Question: Here is my first question on this site, as I have tried many ways to get what I want but didn't succeed. I am trying to extract two types of data from a French website similar to Craigslist. My need is simple and I do manage to get the information, but I still have tags and other characters in my extract. I also have an encoding issue, even when using .encode('utf-8').

# -*- coding: utf-8 -*-
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import csv

csvfile = open("test.csv", 'w+')

Extract specific columns from a given webpage

Submitted by 风流意气都作罢 on 2019-12-12 03:54:18

Question: I am trying to read a web page using Python and save the data in CSV format, to be imported as a pandas DataFrame. I have the following code, which extracts the links from all the pages; instead, I want to read certain column fields.

for i in range(10):
    url = 'https://pythonexpress.in/workshop/' + str(i).zfill(3)
    import urllib2
    from bs4 import BeautifulSoup
    try:
        page = urllib2.urlopen(url).read()
        soup = BeautifulSoup(page)
        for anchor in soup.find_all('div', {'class':'col-xs-8'})[:9]:
            print i, anchor
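The per-page work can be wrapped in a small helper that keeps only the text of the wanted column divs; on Python 3, requests or urllib.request replaces urllib2. A sketch checked against a stand-in page, since the workshop URLs and the col-xs-8 class are taken from the question and may have changed:

```python
from bs4 import BeautifulSoup

def column_texts(html, css_class, limit=9):
    """Return the stripped text of the first `limit` divs of a class."""
    soup = BeautifulSoup(html, "html.parser")
    return [div.get_text(strip=True)
            for div in soup.find_all("div", {"class": css_class})[:limit]]

# Offline stand-in for urllib2.urlopen(url).read() in the question:
sample = ('<div class="col-xs-8">Pune</div>'
          '<div class="col-xs-8">2016-07-30</div>')
fields = column_texts(sample, "col-xs-8")
```

Each page's fields list can then be appended as one CSV row and read back with pandas.read_csv.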