beautifulsoup | 易学教程

Python to parse html data and store into the database

Question: This has troubled me for two days. I am new to Python. I want to parse the HTML data at the following link: http://movie.walkerplus.com/list/2015/12/ and then store the data in a PostgreSQL database named movie_db, which has a table named films created by the following command: CREATE TABLE films ( title varchar(128) NOT NULL, description varchar(256) NOT NULL, directors varchar(128)[], roles varchar(128)[] ); I have parsed the data; there are three lists of data, for title,

Python associate urls's ids and url's titles in lists

Question: A continuation of this question: Python beautifulsoup how to get the line after 'href'. I have this HTML code: <a href="http://pluzz.francetv.fr/videos/monte_le_son_live_,101973832.html" class="ss-titre"> Monte le son </a> <div class="rs-cell-details"> <a href="http://pluzz.francetv.fr/videos/monte_le_son_live_,101973832.html" class="ss-titre"> "Rubin_Steiner" </a> <a href="http://pluzz.francetv.fr/videos/fare_maohi_,102103928.html" class="ss-titre"> Fare maohi </a> As you see, "Monte le son" and
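
The excerpt above is cut off, so the exact output the asker wanted is unknown. One plausible reading is that titles sharing the same numeric id in their href should be grouped together. A minimal sketch under that assumption follows; the function name `group_titles_by_id`, the regular expression, and the use of the built-in `html.parser` backend are illustrative choices, not taken from the original question.

```python
import re
from collections import defaultdict

from bs4 import BeautifulSoup

# Sample markup copied from the question (the <div> is left unclosed there too).
HTML = """
<a href="http://pluzz.francetv.fr/videos/monte_le_son_live_,101973832.html" class="ss-titre"> Monte le son </a>
<div class="rs-cell-details">
<a href="http://pluzz.francetv.fr/videos/monte_le_son_live_,101973832.html" class="ss-titre"> "Rubin_Steiner" </a>
<a href="http://pluzz.francetv.fr/videos/fare_maohi_,102103928.html" class="ss-titre"> Fare maohi </a>
"""

# Assumption: the id is the run of digits between the last comma and ".html".
ID_PATTERN = re.compile(r",(\d+)\.html$")


def group_titles_by_id(html):
    """Return a dict mapping each video id to the list of link titles using it."""
    soup = BeautifulSoup(html, "html.parser")
    titles_by_id = defaultdict(list)
    for link in soup.find_all("a", class_="ss-titre"):
        match = ID_PATTERN.search(link.get("href", ""))
        if match is None:
            continue  # skip links whose href does not carry an id
        # Strip surrounding whitespace and the literal quotes around some titles.
        title = link.get_text(strip=True).strip('"')
        titles_by_id[match.group(1)].append(title)
    return dict(titles_by_id)


print(group_titles_by_id(HTML))
```

With the sample markup, both "Monte le son" and "Rubin_Steiner" end up under id 101973832, and "Fare maohi" under 102103928.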

Can I extract comments of any page from https://www.rt.com/ using python3?

Question: I am writing a web crawler. I extracted the heading and main discussion from this link, but I am unable to find any of the comments (Ctrl+U -> Ctrl+F, comment text). I think the comments are loaded by JavaScript. Can I extract them?

Answer 1: RT uses a service from spot.im for comments. You need to make two POST requests: first to https://api.spot.im/me/network-token/spotim to get a token, then to https://api.spot.im/conversation-read/spot/sp_6phY2k0C/post/353493/get to get the comments as JSON.

Can we change the CSS and HTML values using Splinter, Beautiful Soup, or Selenium?

Question: Can I change the value of any HTML or CSS element using Splinter or Selenium, as we can do with Inspect Element? `<form action="/action_page.php" oninput="x.value=parseInt(a.value)+parseInt(b.value)"> 0 <input type="range" id="a" name="a" value="50"> 100 + <input type="number" id="b" name="b" value="50"> = <output name="x" for="a b"></output> <br><br> <input type="submit"> </form>` Can I select <input type="range" id="a" name="a" value="50"> and change it to value="30" by using splinter or

Parsing BeautifulSoup html tag

Question: I need to parse an HTML file using BeautifulSoup. The HTML looks like this: <div class="entry_container"> <div class="entry lang_en-gb" id="turn-over_1"> <span class="inline"> <h1 class="hwd">turn over</h1> </span> <div class="hom" id="turn-over_1.1"> <span class="gramGrp"><span class="pos">intransitive verb</span></span> <div class="sense"><span class="bold">1 </span><span class="gramGrp"><span class="colloc"><span>[</span>person<span>]</span></span></span><span class="lbl"><span> (</span>in

Beautiful Soup parsing multiple <div> and successive <p> tags into dictionary

Question: I have multiple inline divs (which are 'headers') and paragraph tags beneath them (not in the divs) that are theoretically 'children'. I would like to convert this to a dictionary, but I can't quite figure out the best way to do it. Here is roughly what the site looks like: <div><span>This should be dict key1</span></div> <p>This should be the value of key1</p> <p>This should be the value of key1</p> <div><span>This should be dict key2</span></div> <p>This should be the value of key2</p> The Python
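
One way to approach the structure described above is to walk the div and p tags in document order and treat each div as the start of a new dictionary key. The sketch below assumes the markup is as flat as in the sample (no p tags nested inside the header divs) and that each key should map to a list of paragraph texts; the function name `headers_to_dict` is hypothetical.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
HTML = """
<div><span>This should be dict key1</span></div>
<p>This should be the value of key1</p>
<p>This should be the value of key1</p>
<div><span>This should be dict key2</span></div>
<p>This should be the value of key2</p>
"""


def headers_to_dict(html):
    """Map each header <div> text to the list of <p> texts that follow it."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    current_key = None
    # find_all with a list of names yields matching tags in document order.
    for tag in soup.find_all(["div", "p"]):
        if tag.name == "div":
            current_key = tag.get_text(strip=True)
            result[current_key] = []
        elif current_key is not None:
            # A <p> before any header <div> has no key and is ignored.
            result[current_key].append(tag.get_text(strip=True))
    return result


print(headers_to_dict(HTML))
```

Keeping the values as lists preserves repeated paragraphs; joining them into a single string per key would be a small change if that is what the page calls for.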

How to collect data of Google Search with beautiful soup using python

Question: I want to know how I can collect all the URLs from the page source using Beautiful Soup, visit each of them one by one in the Google search results, and then move on to the next Google results pages. Here is the URL that I want to collect from: https://www.google.com/search?q=site%3Awww.rashmi.com&rct=j and a screenshot is here: http://www.rashmi.com/blog/wp-content/uploads/2014/11/screencapture-www-google-com-search-1433026719960.png Here is the code I'm trying: def getPageLinks(page): links = [] for

pass argument to findAll in bs4 in python

Question: I need help with using bs4 in a function. If I want to pass the path to findAll (or find) through a function, it does not work. Please see the sample below. from bs4 import BeautifulSoup data = '<h1 class="headline">Willkommen!</h1>' def check_text(path, value): soup = BeautifulSoup(''.join(data), "lxml") x1 = "h1", {"class":"headline"} x2 = path x3 = tuple(path) print type(x1), 'soup.findAll(x1)===', soup.findAll(x1) print type(x2), 'soup.findAll(x2)===', soup.findAll(x2) print type(x3), 'soup
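
The excerpt is truncated, but the likely cause is that `soup.findAll(x1)` hands the whole tuple to bs4 as a single argument, so it appears to be treated as the tag-name filter instead of as a separate name and attribute dictionary. A minimal sketch of one fix is to unpack the tuple with `*` so that each element becomes its own positional argument. The function name `find_by_path` is hypothetical, the code uses Python 3 and `find_all` (the current spelling of `findAll`), and it swaps in the built-in `html.parser` backend in place of the question's `lxml` to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
DATA = '<h1 class="headline">Willkommen!</h1>'


def find_by_path(path, markup=DATA):
    """Run find_all with a (name, attrs) tuple passed in by the caller."""
    soup = BeautifulSoup(markup, "html.parser")
    # *path expands ("h1", {"class": "headline"}) into
    # find_all("h1", {"class": "headline"}).
    return soup.find_all(*path)


matches = find_by_path(("h1", {"class": "headline"}))
print([tag.get_text() for tag in matches])  # prints ['Willkommen!']
```

If the path arrives as a list, the same unpacking works, since `*` accepts any iterable.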