beautifulsoup

Python: parse HTML data and store it in the database

心已入冬 submitted on 2020-01-07 08:32:58
Question: This has been troubling me for two days. I am new to Python. I want to parse the HTML data at the following link: http://movie.walkerplus.com/list/2015/12/ and then store the data in the PostgreSQL database named movie_db, which has a table named films created by the following command: CREATE TABLE films ( title varchar(128) NOT NULL, description varchar(256) NOT NULL, directors varchar(128)[], roles varchar(128)[] ); I have parsed the data, and there are three lists of data for title,
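
A minimal sketch of the storage step only, assuming the parsed data ends up as parallel lists (the names `titles`, `descriptions`, `directors` and `roles` are hypothetical, with the last two holding one list of names per film) and that the third-party psycopg2 driver is installed. psycopg2 adapts Python lists to PostgreSQL arrays, so the `varchar(128)[]` columns can be filled directly; the connection string is also an assumption.

```python
INSERT_SQL = (
    "INSERT INTO films (title, description, directors, roles) "
    "VALUES (%s, %s, %s, %s)"
)


def build_rows(titles, descriptions, directors, roles):
    """Zip the parallel lists (hypothetical names) into one tuple per film."""
    if not (len(titles) == len(descriptions) == len(directors) == len(roles)):
        raise ValueError("parsed lists have different lengths")
    # directors and roles stay Python lists; psycopg2 sends them as arrays.
    return [
        (title, description, list(dirs), list(cast))
        for title, description, dirs, cast in zip(titles, descriptions, directors, roles)
    ]


def store_films(rows, dsn="dbname=movie_db"):
    """Insert rows into the films table; the DSN is an assumption."""
    import psycopg2  # third-party driver, assumed to be installed

    conn = psycopg2.connect(dsn)
    try:
        # The connection context manager commits on success and rolls back on error.
        with conn:
            with conn.cursor() as cur:
                cur.executemany(INSERT_SQL, rows)
    finally:
        conn.close()
```

Usage would be `store_films(build_rows(titles, descriptions, directors, roles))` once the four lists have been scraped.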

Python: associate URLs' IDs and URLs' titles in lists

倖福魔咒の submitted on 2020-01-07 06:37:25
Question: This is a continuation of this question: Python beautifulsoup how to get the line after 'href'. I have this HTML code: <a href="http://pluzz.francetv.fr/videos/monte_le_son_live_,101973832.html" class="ss-titre"> Monte le son </a> <div class="rs-cell-details"> <a href="http://pluzz.francetv.fr/videos/monte_le_son_live_,101973832.html" class="ss-titre"> "Rubin_Steiner" </a> <a href="http://pluzz.francetv.fr/videos/fare_maohi_,102103928.html" class="ss-titre"> Fare maohi </a> As you see, "Monte le son" and
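
One way to associate each ID with its titles, sketched under the assumption that the ID is always the number between the comma and `.html` in each `href` (as in the URLs above): extract it with a regular expression and group the link texts in a dictionary keyed by ID. The input is a list of `(href, text)` pairs; the function name is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: IDs look like ",101973832.html" at the end of each URL.
ID_PATTERN = re.compile(r",(\d+)\.html$")


def group_titles_by_id(links):
    """Map each video ID to the list of link texts that point at it.

    `links` is an iterable of (href, text) pairs.
    """
    grouped = defaultdict(list)
    for href, text in links:
        match = ID_PATTERN.search(href)
        if match:  # skip links whose href has no recognizable ID
            grouped[match.group(1)].append(text.strip())
    return dict(grouped)
```

With Beautiful Soup the pairs could be built as `[(a["href"], a.get_text()) for a in soup.find_all("a", class_="ss-titre")]`; for the snippet above, the first two link texts would land under 101973832 and "Fare maohi" under 102103928.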

Can I extract comments from any page on https://www.rt.com/ using Python 3?

不打扰是莪最后的温柔 submitted on 2020-01-07 06:25:27
Question: I am writing a web crawler. I extracted the heading and main discussion of this link, but I am unable to find any of the comments (viewing the source with Ctrl+U and searching for the comment text with Ctrl+F finds nothing). I think the comments are loaded with JavaScript. Can I extract them?
Answer 1: RT uses a service from spot.im for comments. You need to make two POST requests: first to https://api.spot.im/me/network-token/spotim to get a token, then to https://api.spot.im/conversation-read/spot/sp_6phY2k0C/post/353493/get to get the comments as JSON.

Can we change CSS and HTML values using Splinter, Beautiful Soup, or Selenium?

99封情书 submitted on 2020-01-07 04:55:47
Question: Can I change the value of any HTML or CSS property using Splinter or Selenium, like we can with Inspect Element? `<form action="/action_page.php" oninput="x.value=parseInt(a.value)+parseInt(b.value)"> 0 <input type="range" id="a" name="a" value="50"> 100 + <input type="number" id="b" name="b" value="50"> = <output name="x" for="a b"></output> <br><br> <input type="submit"> </form>` Can I select <input type="range" id="a" name="a" value="50"> and change it to value="30" by using Splinter or
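
A hedged sketch with Selenium: the value of an `<input>` in the live page can be changed from Python through the WebDriver's `execute_script`, which runs JavaScript in the page much like editing in Inspect Element. Dispatching an `input` event afterwards is an assumption about the goal: it lets the form's `oninput` handler recompute the `<output>`. Beautiful Soup, by contrast, only edits its own parsed copy of the HTML, not the page shown in the browser. The helper name is hypothetical.

```python
SET_VALUE_JS = (
    "arguments[0].value = arguments[1];"
    # Fire an input event so the form's oninput handler runs.
    "arguments[0].dispatchEvent(new Event('input', {bubbles: true}));"
)


def set_input_value(driver, element_id, value):
    """Set the value of the element with the given id in the live page.

    `driver` is a Selenium WebDriver; find_element("id", ...) is the
    string form of find_element(By.ID, ...).
    """
    element = driver.find_element("id", element_id)
    driver.execute_script(SET_VALUE_JS, element, str(value))
    return element
```

For the form above, with a driver already on that page, `set_input_value(driver, "a", 30)` would move the range slider to 30.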

Parsing an HTML tag with BeautifulSoup

十年热恋 submitted on 2020-01-07 04:52:26
Question: I need to parse an HTML file using BeautifulSoup. The HTML looks like this: <div class="entry_container"> <div class="entry lang_en-gb" id="turn-over_1"> <span class="inline"> <h1 class="hwd">turn over</h1> </span> <div class="hom" id="turn-over_1.1"> <span class="gramGrp"><span class="pos">intransitive verb</span></span> <div class="sense"><span class="bold">1 </span><span class="gramGrp"><span class="colloc"><span>[</span>person<span>]</span></span></span><span class="lbl"><span> (</span>in
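
The question is cut off before it says what should be extracted, so this is only a sketch that assumes the goal is the headword and the parts of speech. It works on a shortened, closed-off fragment of the markup above and uses the `class_` keyword of `find`/`find_all`, which matches any one of an element's CSS classes.

```python
from bs4 import BeautifulSoup

# Shortened, well-formed fragment of the entry markup from the question.
HTML = """
<div class="entry_container">
  <div class="entry lang_en-gb" id="turn-over_1">
    <span class="inline"><h1 class="hwd">turn over</h1></span>
    <div class="hom" id="turn-over_1.1">
      <span class="gramGrp"><span class="pos">intransitive verb</span></span>
    </div>
  </div>
</div>
"""


def parse_entry(html):
    soup = BeautifulSoup(html, "html.parser")
    # class_="entry" also matches class="entry lang_en-gb".
    entry = soup.find("div", class_="entry")
    headword = entry.find("h1", class_="hwd").get_text(strip=True)
    # Collect every part-of-speech label inside the entry.
    parts_of_speech = [
        span.get_text(strip=True) for span in entry.find_all("span", class_="pos")
    ]
    return headword, parts_of_speech
```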

Beautiful Soup: parsing multiple <div> and successive <p> tags into a dictionary

六眼飞鱼酱① submitted on 2020-01-07 04:19:28
Question: I have multiple inline divs (which are 'headers') and paragraph tags beneath them (not IN the divs) that are, in theory, their 'children'. I would like to convert this to a dictionary, but I can't quite figure out the best way to do it. Here is roughly what the site looks like: <div><span>This should be dict key1</span></div> <p>This should be the value of key1</p> <p>This should be the value of key1</p> <div><span>This should be dict key2</span></div> <p>This should be the value of key2</p> The Python
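
A minimal sketch, assuming every key `<div>` and its `<p>` paragraphs are siblings as in the sample: for each `<div>`, walk its following siblings with `find_next_siblings()` and collect the `<p>` texts until the next `<div>` starts a new key. Values are stored as lists because one key can own several paragraphs; the function name is hypothetical.

```python
from bs4 import BeautifulSoup


def headers_to_dict(html):
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for div in soup.find_all("div"):
        key = div.get_text(strip=True)
        values = []
        # find_next_siblings() with no arguments yields only tags, skipping whitespace.
        for sibling in div.find_next_siblings():
            if sibling.name == "div":  # the next header starts a new key
                break
            if sibling.name == "p":
                values.append(sibling.get_text(strip=True))
        result[key] = values
    return result
```

A repeated header text would overwrite the earlier entry; if that can happen, appending to `result.setdefault(key, [])` instead keeps both.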

How to collect Google Search data with Beautiful Soup using Python

此生再无相见时 submitted on 2020-01-07 04:18:14
Question: I want to know how I can collect all the URLs from the page source using Beautiful Soup, visit each of them one by one in the Google search results, and then move on to the next Google results pages. Here is the URL that I want to collect from: https://www.google.com/search?q=site%3Awww.rashmi.com&rct=j and a screenshot here: http://www.rashmi.com/blog/wp-content/uploads/2014/11/screencapture-www-google-com-search-1433026719960.png Here is the code I'm trying: def getPageLinks(page): links = [] for
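
A possible shape for a link-collecting helper (the name `get_page_links` is hypothetical, modeled on the `getPageLinks` in the question). It takes HTML that has already been downloaded, resolves relative `href`s with `urllib.parse.urljoin`, and drops duplicates while keeping order. Unwrapping redirect links of the form `/url?q=...` is an assumption about how some Google result pages encode their targets; Google's markup changes and automated queries may be blocked, so treat this as a sketch.

```python
from urllib.parse import parse_qs, urljoin, urlparse

from bs4 import BeautifulSoup


def get_page_links(html, base_url):
    soup = BeautifulSoup(html, "html.parser")
    links = []
    seen = set()
    for a in soup.find_all("a", href=True):  # only <a> tags that have an href
        url = urljoin(base_url, a["href"])
        parsed = urlparse(url)
        # Assumption: "/url?q=<target>&..." wraps the real result URL.
        if parsed.path == "/url":
            target = parse_qs(parsed.query).get("q")
            if target:
                url = target[0]
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links
```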

Pass an argument to findAll in bs4 in Python

浪尽此生 submitted on 2020-01-07 04:07:06
Question: I need help with using bs4 in a function. If I pass the path to findAll (or find) through a function argument, it does not work. Please see the sample below.
from bs4 import BeautifulSoup
data = '<h1 class="headline">Willkommen!</h1>'
def check_text(path, value):
    soup = BeautifulSoup(''.join(data), "lxml")
    x1 = "h1", {"class":"headline"}
    x2 = path
    x3 = tuple(path)
    print type(x1), 'soup.findAll(x1)===', soup.findAll(x1)
    print type(x2), 'soup.findAll(x2)===', soup.findAll(x2)
    print type(x3), 'soup
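
A likely fix: `find_all` (`findAll` is its older alias) takes the tag name and the attribute dictionary as two separate positional arguments, `name` and `attrs`, so a `(name, attrs)` tuple should be unpacked with `*` rather than passed as a single value. Below is a sketch in Python 3 using the built-in `html.parser`; returning a bool instead of printing is a choice made here for illustration.

```python
from bs4 import BeautifulSoup

data = '<h1 class="headline">Willkommen!</h1>'


def check_text(path, value):
    """Return True if any tag matching `path` has exactly `value` as its text.

    `path` is a tuple such as ("h1", {"class": "headline"}).
    """
    soup = BeautifulSoup(data, "html.parser")
    # *path expands to find_all("h1", {"class": "headline"}).
    matches = soup.find_all(*path)
    return any(tag.get_text(strip=True) == value for tag in matches)
```

Because `*path` also works for a one-element tuple such as `("h1",)`, the same function accepts a bare tag name.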