Question
The question is how to read text from an Instagram profile if a user inputs an Instagram URL. I tried using java.net.URL and all I get is a big load of HTML text. I know little to nothing about working with web pages, so I am seeking some help with how I would get text from the profile (bio, post captions, comments).
Thanks!
Answer 1:
You can use a scraping tool (Scrapy, ParseHub, etc.). Just a heads up, though: this is against Instagram's TOS, so be careful (hint, hint).
Answer 2:
Hello, you could treat the HTML as a string and split it twice: once on the opening tag and once on the closing tag.
From the first split, take the second string in the list. Split that on the closing tag and take the first string in the list. What is left is the text between the two tags.
You do need some knowledge of HTML to know what an HTML tag is and to work out which tag you need to split on.
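The two splits described above could be sketched in Java roughly as follows. This is only an illustration: the class name `SplitTagExample`, the helper `textBetween`, and the sample HTML are made up for the example, and it assumes the opening tag appears in the page exactly as written (no attributes, same case). Real profile pages are much messier, so treat it as a starting point rather than a robust parser.

```java
import java.util.regex.Pattern;

public class SplitTagExample {

    // Hypothetical helper: returns the text between the first occurrence of
    // openTag and the next closeTag, or null if either tag is missing.
    static String textBetween(String html, String openTag, String closeTag) {
        // First split: everything after the opening tag is the second string.
        // Pattern.quote makes split treat the tag as literal text, not a regex.
        String[] afterOpen = html.split(Pattern.quote(openTag), 2);
        if (afterOpen.length < 2) {
            return null; // opening tag not found
        }
        // Second split: everything before the closing tag is the first string.
        String[] beforeClose = afterOpen[1].split(Pattern.quote(closeTag), 2);
        if (beforeClose.length < 2) {
            return null; // closing tag not found
        }
        return beforeClose[0];
    }

    public static void main(String[] args) {
        // Sample HTML invented for this example.
        String html = "<html><body><h1>HelloWorld</h1></body></html>";
        System.out.println(textBetween(html, "<h1>", "</h1>"));
    }
}
```

The limit of 2 passed to `split` keeps each result to at most two pieces, so only the first occurrence of each tag matters.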
Have fun, I hope I could help you!
Answer 3:
You can use jsoup (https://jsoup.org/) to extract specific tags from HTML content.
Here is an example that extracts the h1 tag content from the body of the HTML.
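import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupExample {
    public static void main(String[] args) {
        // Parse an HTML string using the jsoup library
        String htmlString = "<!DOCTYPE html>"
                + "<html>"
                + "<head>"
                + "<title>JSoup Example</title>"
                + "</head>"
                + "<body>"
                + "<table><tr><td><h1>HelloWorld</h1></td></tr>"
                + "</table>"
                + "</body>"
                + "</html>";
        Document html = Jsoup.parse(htmlString);
        // Text of the <title> element
        String title = html.title();
        // Combined text of every <h1> element inside <body>
        String h1 = html.body().getElementsByTag("h1").text();
    }
}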
You can find a few more examples in this blog post: https://javarevisited.blogspot.com/2014/09/how-to-parse-html-file-in-java-jsoup-example.html
Hope this is helpful.
Source: https://stackoverflow.com/questions/62854893/reading-text-from-an-instagram-profile