I'm writing an Android app that takes relevant data from a website and presents it to the user (HTML scraping). The application downloads the source code and parses it, loo
To get a web page in Java, you'll find code at the bottom of this answer.
To parse it, you can use regular expressions.
Here's a nice reference:
android regex
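As a minimal sketch of the regular-expression approach, assuming the goal is to collect the link targets from the downloaded source: `LinkExtractor` and `extractLinks` are hypothetical names chosen for illustration, and the pattern only handles quoted `href` values, so it will likely need adjusting for the page you are scraping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the Get_Webpage class in this answer.
final class LinkExtractor {
    // Matches href="..." or href='...' inside an <a> tag; group 2 is the URL.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns every quoted href value found in the HTML, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<String>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }
}
```

You would call it as `LinkExtractor.extractLinks(source)`, where `source` is the string returned by `get_webpage_source()`. Regular expressions are brittle on messy markup, so keep the pattern as narrow as the data you actually need.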
But if the HTML is well formed, you can also try Yahoo's YQL. It outputs JSON or XML, so you can easily grab the data afterwards.
yahoo yql console
Personally, I parse pages in Python or PHP because I feel more comfortable in those languages.
Get webpage. How to use it:
Get_Webpage obj = new Get_Webpage("http://your_url_here");
String source = obj.get_webpage_source();
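import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

public class Get_Webpage {
    public String parsing_url = "";

    public Get_Webpage(String url_2_get) {
        parsing_url = url_2_get;
    }

    // Downloads the page and returns its source, or whatever was read before a failure.
    public String get_webpage_source() {
        HttpClient client = new DefaultHttpClient();
        HttpGet request = new HttpGet(parsing_url);
        StringBuilder str = new StringBuilder();
        InputStream in = null;
        try {
            HttpResponse response = client.execute(request);
            HttpEntity entity = response.getEntity();
            if (entity == null) {
                return ""; // the response carried no body
            }
            in = entity.getContent();
            BufferedReader reader = new BufferedReader(new InputStreamReader(in));
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() strips the line break, so put it back
                str.append(line).append('\n');
            }
        } catch (IOException e) {
            // also covers ClientProtocolException, which extends IOException
            e.printStackTrace();
        } finally {
            if (in != null) {
                try {
                    in.close();
                } catch (IOException e) {
                    // nothing useful to do if closing fails
                }
            }
        }
        return str.toString();
    }
}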
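The class above relies on `DefaultHttpClient`, and the Apache HTTP client classes were removed from the Android SDK as of Android 6.0 (API 23). If they are not available in your project, a rough alternative using only `java.net.HttpURLConnection` might look like the sketch below. `PageFetcher`, `readAll`, and `fetch` are hypothetical names, and the UTF-8 charset and 10-second timeouts are assumptions rather than anything the original answer specifies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical alternative to Get_Webpage, built only on java.net and java.io.
final class PageFetcher {
    // Reads a stream to the end as text; UTF-8 is an assumption about the page.
    static String readAll(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line).append('\n'); // readLine() strips the line break
        }
        return sb.toString();
    }

    // Downloads the page at the given URL and returns its source.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(10000); // milliseconds, assumed value
        conn.setReadTimeout(10000);    // milliseconds, assumed value
        InputStream in = null;
        try {
            in = conn.getInputStream();
            return readAll(in);
        } finally {
            if (in != null) {
                in.close();
            }
            conn.disconnect();
        }
    }
}
```

Either way, Android does not allow network calls on the main thread (it throws `NetworkOnMainThreadException`), so call `PageFetcher.fetch("http://your_url_here")` from a background thread.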