html-parsing

BeautifulSoup: parse every HTML file in a folder (web scraping) [closed]

房东的猫 posted on 2019-12-13 07:43:39
Question (closed as off-topic): My task is to read every HTML file in a directory and check whether each file contains the tags (1) <strong>OO</strong> and (2) <strong>QQ</strong>. Then
Answer 1: Your write call is nested inside the for loop, which is why multiple lines are written to your index.txt. Move the write out of the loop and put
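The advice in the answer (do the writing once, outside the loop) can be sketched as follows. This is a minimal sketch, not the asker's code: it uses Python's standard-library `html.parser` instead of BeautifulSoup so that it runs without extra packages, and the names `strong_texts`, `scan_folder`, `write_index`, and the `name: OO=yes, QQ=no` output format are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path


class StrongTextCollector(HTMLParser):
    """Collects the text content of every <strong> element."""

    def __init__(self):
        super().__init__()
        self.in_strong = False
        self.buffer = []
        self.found = set()

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self.in_strong = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_strong:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "strong" and self.in_strong:
            self.found.add("".join(self.buffer).strip())
            self.in_strong = False


def strong_texts(html):
    """Return the set of texts found inside <strong> tags."""
    collector = StrongTextCollector()
    collector.feed(html)
    collector.close()
    return collector.found


def scan_folder(folder, wanted=("OO", "QQ")):
    """Build one result line per .html file (hypothetical output format)."""
    lines = []
    for path in sorted(Path(folder).glob("*.html")):
        found = strong_texts(path.read_text(encoding="utf-8", errors="replace"))
        flags = ", ".join(f"{w}={'yes' if w in found else 'no'}" for w in wanted)
        lines.append(f"{path.name}: {flags}")
    return lines


def write_index(folder, out_file="index.txt"):
    """Collect all lines first, then write the file once, after the loop."""
    lines = scan_folder(folder)
    Path(out_file).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Calling `write_index("some_folder")` would produce a single `index.txt` with one line per HTML file, because the results are accumulated in the loop and written only once at the end.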

How to store the response of an HTTP URL in Android

流过昼夜 posted on 2019-12-13 07:18:08
Question: I am building an Android application where the user enters a name and an email ID. These are passed to a URL, and all the user data is stored there. When a user sends this information, an ID is generated on the server at that time, and the server sends that unique ID back to me as the URL response. How can I get this response and store it for future use? Here is my code for passing these values to the URL: private class DownloadJSON extends AsyncTask<String, String, String> { @Override protected void

Regular Expression Negative Lookahead/Lookbehind to Exclude HTML from Find-and-Replace

匆匆过客 posted on 2019-12-13 07:06:37
Question: I have a feature on my site where the search query is highlighted in the search results. However, some of the fields that the site searches contain HTML. For example, say a search result consists of <span>Hello all</span>. If the user searches for the letter a, I want the code to return <span>Hello <mark>a</mark>ll</span> instead of the messy <sp<mark>a</mark>n>Hello <mark>a</mark>ll</sp<mark>a</mark>n> that it returns now. I know that I can use negative

How to access innerText of HTML tag inside a <TD> tag

北城以北 posted on 2019-12-13 07:04:10
Question: I would like to get some text from a web page containing the markup below. I want the piece of information associated with href="#spec_Brand". <td class="table_spec"> <dl> <dt class="table_spec_title"> <a class="href_icon href_icon_help table_spec_titleimg" title="Which manufacturer is producing the product?" href="#spec_Brand"> <span>Brand</span> </a> <span class="table_spec_titletext">Brand</span> </dt> <dd class="table_spec_definition"> Producer of the product? </dd> </dl> </td> I'm trying to use:
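The question does not show which language or library the asker is using, so the following is only a sketch of one possible reading: extract the inner text of the `<a>` element whose `href` is `#spec_Brand`. It uses Python's standard-library `html.parser`; the class name `AnchorTextParser` and the function `anchor_text` are hypothetical, and the depth counter assumes the anchor contains no unclosed void elements such as `<img>`.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Collects the text inside the <a> element with a given href."""

    def __init__(self, href):
        super().__init__()
        self.href = href
        self.depth = 0      # > 0 while inside the target <a>
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1             # nested element inside the target
        elif tag == "a" and dict(attrs).get("href") == self.href:
            self.depth = 1              # entering the target <a>

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def anchor_text(html, href):
    """Return the whitespace-normalized inner text of the matching <a>."""
    parser = AnchorTextParser(href)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


SNIPPET = (
    '<td class="table_spec"> <dl> <dt class="table_spec_title"> '
    '<a class="href_icon href_icon_help table_spec_titleimg" '
    'title="Which manufacturer is producing the product?" href="#spec_Brand"> '
    '<span>Brand</span> </a> '
    '<span class="table_spec_titletext">Brand</span> </dt> '
    '<dd class="table_spec_definition"> Producer of the product? </dd> </dl> </td>'
)

print(anchor_text(SNIPPET, "#spec_Brand"))
# Brand
```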

How to parse a rendered web page containing JavaScript

强颜欢笑 posted on 2019-12-13 06:33:06
Question: How can one extract data from a rendered web page in which JavaScript updates the data over time? Is it possible to write a user script that can access variables from the web page's JavaScript? Please suggest a possible way to achieve this.
Answer 1: According to Turing's halting problem theorem, you can't. That's what we mean when we say that JavaScript is a Turing-complete language. The only way is to execute the JavaScript and let it render the page.
Answer 2: It depends on your programming language.

Parsing jsoup list

让人想犯罪 __ posted on 2019-12-13 06:26:47
Question: I created a list in which I parse a web page. I can display the titles and the first image of the first article, but what I want is to display the image for each article. This is the (edited) code: public class MainActivity extends Activity{ ProgressDialog mProgressDialog; public static final String TAG_TITOLI = "titoli"; private static final String TAG_IMMAGINE = "immagine"; ListView lista; Bitmap bitmap; public ImageView immagine; public ImageView logoimg; static final String BLOG_URL = "http

How to download dynamically generated content from a web page?

让人想犯罪 __ posted on 2019-12-13 05:41:25
Question: I'm trying to download some data from a web page that is dynamically generated, so using wget on the page (http://gaceta.diputados.gob.mx/SIL/Legislaturas/Listados.html) doesn't work. I want to download the list shown for each of the options that can be selected in the "Legislatura" field; once it is downloaded, I can process the data in Ruby. I just want to know the best way to download this and, if possible, how to select each of the options and download its list.
Answer 1: You can use the Web Inspector in Safari

Why is this PHP DOM parse with getAttribute not working?

元气小坏坏 posted on 2019-12-13 05:32:41
Question: From this page, I want to get the value of the input tag with the name form_key. But I get the following error: Fatal error: Uncaught Error: Call to undefined method simple_html_dom::getAttribute() in test.php:9 Stack trace: #0 {main} thrown in test.php on line 9 Please note that I am using the PHP Simple HTML DOM Parser. PHP: include('simple_html_dom.php'); $html = file_get_html('https://b2b.chiemsee.com/customer/account/login/'); if($html->getAttribute('name') =='form_key'){ echo $html->nodeValue;

Parse HTML/XML and find locations of elements in original document

雨燕双飞 posted on 2019-12-13 05:19:18
Question: Is there a way to get the original location of an element in a document, i.e. the start and end character index, when parsing HTML/XML in Python? I've looked through the lxml documentation and couldn't find anything. E.g. <a>1</a><b>2</b> ... print tree.find('b').original_position # result: (9, 16)
Answer 1: Google found this, the gist of which is: it's hard for malformed documents because parsing requires synthesizing valid tokens that don't have any corresponding input. It's possible for valid
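For well-formed input, one way to recover character offsets is to ask the parser where each tag starts. The sketch below does this with the standard-library `html.parser` rather than lxml: `HTMLParser.getpos()` returns the line number and column of the tag currently being handled, which can be converted to an absolute index. The names `PositionParser` and `element_spans` are hypothetical, offsets are 0-based with an exclusive end, and the end of a closing tag is found by searching for the next `>` in the source.

```python
from html.parser import HTMLParser


class PositionParser(HTMLParser):
    """Records (tag, start, end) character offsets for each closed element."""

    def __init__(self, source):
        super().__init__()
        self.source = source
        # Absolute index at which each line starts, for (line, col) conversion.
        self.line_starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self.line_starts.append(i + 1)
        self.open_tags = []   # stack of (tag, start_index)
        self.spans = []       # finished elements as (tag, start, end)

    def _index(self):
        lineno, col = self.getpos()   # lineno is 1-based, col is 0-based
        return self.line_starts[lineno - 1] + col

    def handle_starttag(self, tag, attrs):
        self.open_tags.append((tag, self._index()))

    def handle_endtag(self, tag):
        end = self.source.index(">", self._index()) + 1
        # Close the nearest matching open tag, dropping unclosed children.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                start = self.open_tags[i][1]
                del self.open_tags[i:]
                self.spans.append((tag, start, end))
                break


def element_spans(source):
    parser = PositionParser(source)
    parser.feed(source)
    parser.close()
    return parser.spans


doc = "<a>1</a><b>2</b>"
print(element_spans(doc))
# [('a', 0, 8), ('b', 8, 16)]
```

With these conventions, `doc[8:16]` is exactly `<b>2</b>`. As the answer notes, malformed documents are harder, because a repairing parser may synthesize elements that have no corresponding input.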

How to get the meta name keywords (VBA)

偶尔善良 posted on 2019-12-13 04:53:25
Question: I am trying to get the meta name keywords from a web page: meta name="keywords" content="Mitch Albom,For One More Day,Little, Brown Book Group,0751537535,Fiction / General,General & Literary Fiction,Modern & contemporary fiction (post c 1945),USA I need to get the contents from it and need help. Option Explicit Sub GetData() Dim ie As New InternetExplorer Dim str As String Dim wk As Worksheet Dim webpage As New HTMLDocument Dim item As HTMLHtmlElement Set wk = Sheet1 str = wk.Range("Link").value ie