beautifulsoup

Extract JSON from HTML Script tag with BeautifulSoup in Python

无人久伴 submitted on 2020-02-02 07:03:39
Question: I have the following HTML. How can I extract the JSON assigned to the variable window.__INITIAL_STATE__?

    <!DOCTYPE doctype html>
    <html lang="en">
    <script>
    window.sessConf = "-2912474957111138742";
    /* <sl:translate_json> */
    window.__INITIAL_STATE__ = { /* Target JSON here with 12 million characters */};
    /* </sl:translate_json> */
    </script>
    </html>

Answer 1: You can use the following Python code to extract the JavaScript code.

    soup = BeautifulSoup(html)
    s=soup.find('script')
    js = 'window = {}
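The quoted answer is cut off, so its full approach is not visible here. As an alternative sketch (not the answer's method), the value can be read without running any JavaScript: locate the assignment in the script text and let `json.JSONDecoder.raw_decode` parse one JSON value starting at that position. This assumes the assigned value is strict JSON rather than a looser JavaScript object literal. The small payload and the helper name `extract_initial_state` are made up for illustration.

```python
import json
import re

from bs4 import BeautifulSoup

# Stand-in for the real page; the JSON payload here is a made-up example.
html = """<!DOCTYPE doctype html>
<html lang="en">
<script>
window.sessConf = "-2912474957111138742";
/* <sl:translate_json> */
window.__INITIAL_STATE__ = {"user": {"id": 7}, "items": [1, 2, 3]};
/* </sl:translate_json> */
</script>
</html>"""


def extract_initial_state(html_text):
    """Return the object assigned to window.__INITIAL_STATE__, or None."""
    soup = BeautifulSoup(html_text, "html.parser")
    marker = re.compile(r"window\.__INITIAL_STATE__\s*=\s*")
    for script in soup.find_all("script"):
        text = script.string or ""
        match = marker.search(text)
        if match:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever follows it (the trailing ';' and comment).
            obj, _end = json.JSONDecoder().raw_decode(text, match.end())
            return obj
    return None


state = extract_initial_state(html)
print(state["items"])
```

Because `raw_decode` stops at the end of the first complete value, no regular expression has to guess where a very large object ends.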

Parsing web page in python using Beautiful Soup

孤街醉人 submitted on 2020-01-31 09:30:13
Question: I am having trouble getting data from a website. The page source is here: view-source:http://release24.pl/wpis/23714/%22La+mer+a+boire%22+%282011%29+FRENCH.DVDRip.XviD-AYMO and it contains something like this:

    INFORMACJE O FILMIE
    Tytuł............................................: La mer à boire
    Ocena.............................................: IMDB - 6.3/10 (24)
    Produkcja.........................................: Francja
    Gatunek...........................................: Dramat
    Czas trwania..
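The excerpt ends before the asker's code, so what follows is only a sketch of one way to handle a dotted "key.....: value" block like the one quoted. It assumes the text has already been pulled out of the page with one field per line (for example via `get_text()`); the regular expression and the helper name `parse_info_block` are illustrative, not taken from the original post.

```python
import re

# A key, then a run of dots, then ':' and the value.
FIELD_RE = re.compile(r"^\s*(?P<key>[^.:]+?)\.{2,}:\s*(?P<value>.*\S)\s*$")


def parse_info_block(text):
    """Turn 'Key.......: value' lines into a dict; other lines are skipped."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group("key").strip()] = match.group("value")
    return fields


sample = """INFORMACJE O FILMIE
Tytuł............................................: La mer à boire
Ocena.............................................: IMDB - 6.3/10 (24)
Produkcja.........................................: Francja
Gatunek...........................................: Dramat
"""

info = parse_info_block(sample)
print(info["Ocena"])
```

The heading line has no dot run followed by a colon, so it is ignored rather than treated as a field.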

Python - which is considered better for scrapping: selenium or beautifulsoup with selenium? [closed]

天大地大妈咪最大 submitted on 2020-01-30 11:01:46
Question: This question is for Python 3.6.3, bs4 and Selenium 3.8 on Win10. I am trying to scrape pages with dynamic content. What I want to scrape is numbers and text (from http://www.oddsportal.com, for example). From my understanding using requests+beautifulsoup will not do the

Beautifulsoup HTML table parsing--only able to get the last row?

隐身守侯 submitted on 2020-01-30 06:42:48
Question: I have a simple HTML table to parse, but somehow BeautifulSoup only gives me results from the last row. Could someone take a look and see what's wrong? I have already created the rows object from the HTML table:

    <table class='participants-table'>
    <thead>
    <tr>
    <th data-field="name" class="sort-direction-toggle name">Name</th>
    <th data-field="type" class="sort-direction-toggle type active-sort asc">Type</th>
    <th data-field="sector" class="sort-direction-toggle
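The excerpt stops before the asker's loop, so the actual cause cannot be confirmed from this page. One frequent source of a "last row only" result is doing the per-row work after the loop has finished, when the loop variable still points at the final row. A minimal sketch that keeps the work inside the loop is shown here; the table body and the names `records` and `cells` are invented for illustration.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the participants table; the data rows are made up.
html = """
<table class='participants-table'>
  <thead>
    <tr><th data-field="name">Name</th><th data-field="type">Type</th></tr>
  </thead>
  <tbody>
    <tr><td>Alpha</td><td>Company</td></tr>
    <tr><td>Beta</td><td>NGO</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="participants-table")

records = []
for row in table.find_all("tr"):
    # Extract and store the cells inside the loop, once per row.
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if cells:  # the header row only has <th> cells, so it yields nothing
        records.append(cells)

print(records)
```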

pandas read_html - no tables found

為{幸葍}努か submitted on 2020-01-30 03:44:13
Question: I am trying to read a table of data from WU.com, but I am getting a type error for no tables found. (I am also new to web scraping.) Another person has a very similar Stack Overflow question about a WU table of data, but the solution is a little too complicated for me.

    import pandas as pd
    df_list = pd.read_html('https://www.wunderground.com/history/daily/us/wi/milwaukee/KMKE/date/2013-6-26')
    print(df_list)

On the webpage of historical data for Milwaukee,

find() after replaceWith() doesn't work (using BeautifulSoup)

*爱你&永不变心* submitted on 2020-01-29 05:14:19
Question: Please consider the following Python session:

    >>> from BeautifulSoup import BeautifulSoup
    >>> s = BeautifulSoup("<p>This <i>is</i> a <i>test</i>.</p>"); myi = s.find("i")
    >>> myi.replaceWith(BeautifulSoup("was"))
    >>> s.find("i")
    >>> s = BeautifulSoup("<p>This <i>is</i> a <i>test</i>.</p>"); myi = s.find("i")
    >>> myi.replaceWith("was")
    >>> s.find("i")
    <i>test</i>

Please note the missing output of s.find("i") after line 4! What's the reason for this? Is there a workaround? EDIT: Actually, the
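The session above uses the old BeautifulSoup 3 import (`from BeautifulSoup import BeautifulSoup`). As a point of comparison rather than an explanation of the reported behaviour, here is the plain-string replacement written against bs4, where the method is spelled `replace_with`. It assumes bs4 is installed and uses the built-in `html.parser`.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>This <i>is</i> a <i>test</i>.</p>", "html.parser")

# Replace the first <i> element with a plain string.
first_i = soup.find("i")
first_i.replace_with("was")

# The second <i> element is still in the tree and can be found.
remaining = soup.find("i")
print(remaining)
print(soup)
```

Passing a plain string, as in the second half of the quoted session, keeps the replacement a simple text node in the existing tree.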

Getting style of <tr> tag using BeautifulSoup

我的未来我决定 submitted on 2020-01-28 11:27:10
Question: I'm scraping a page, and from a table on that page I'm getting all <tr> elements like so:

    r = requests.get("http://lol.esportswikis.com/wiki/G2_Esports/Match_History")
    s = BeautifulSoup(r.content, "lxml")
    tr = s.find_all("table", class_="wikitable sortable")[0].find_all("tr")[3:]
    print tr[0]

which outputs:

    <tr style="background-color:#C6EFCE"><td>...</td> ... <td>...</td></tr>

Now I'm trying to get the style of the <tr> tag, but I have no idea how. If I do this for example:

    for item in tr[0]:
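Iterating over a tag, as in the last line quoted, walks its children (the `<td>` cells), not its attributes. In bs4 a tag's attributes are read like dictionary entries, so a sketch of reading the row style could look like the following. The two-row table is a made-up stand-in for the real page, and `html.parser` is used so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the match history table.
html = """<table class="wikitable sortable">
<tr style="background-color:#C6EFCE"><td>win</td></tr>
<tr><td>no style</td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table", class_="wikitable sortable").find_all("tr")

# tag["style"] raises KeyError when the attribute is absent;
# tag.get("style") returns None instead, which is safer for mixed rows.
styles = [row.get("style") for row in rows]
first_style = rows[0]["style"]

print(styles)
```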