Does anyone know of an algorithm that extracts the main content from a webpage, like Instapaper does?
boilerpipe is an open-source Java library. Its algorithm is published in a scientific paper, so you can read how well it does compared to other algorithms. From what I've read, it seems to be one of the best.
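To give a feel for the general idea, here is a rough sketch in Python. As far as I understand it, boilerpipe classifies blocks of text using shallow features such as word count and link density. This is NOT boilerpipe's actual classifier: the names `BlockCollector` and `extract_main_text`, the list of block tags, and the thresholds (`min_words=10`, `max_link_density=0.33`) are all made up for illustration, and it only uses the standard library.

```python
from html.parser import HTMLParser

# Assumption: these tags start/end a text block. The real set is a judgment call.
BLOCK_TAGS = {"p", "div", "li", "td", "tr", "table", "ul", "ol", "br",
              "h1", "h2", "h3", "h4", "h5", "h6",
              "article", "section", "nav", "header", "footer", "body"}
SKIP_TAGS = {"script", "style"}  # their contents are never page text


class BlockCollector(HTMLParser):
    """Splits a page into text blocks and counts words / linked words in each."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # (text, word_count, link_word_count)
        self._words = []
        self._link_words = 0
        self._in_link = 0
        self._skip = 0

    def _flush(self):
        # Close the current block, if it holds any words.
        if self._words:
            self.blocks.append(
                (" ".join(self._words), len(self._words), self._link_words))
        self._words = []
        self._link_words = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip:
            return
        words = data.split()
        self._words.extend(words)
        if self._in_link:
            self._link_words += len(words)

    def close(self):
        super().close()
        self._flush()


def extract_main_text(html, min_words=10, max_link_density=0.33):
    """Keep blocks that are long enough and not mostly link text.

    Both thresholds are illustrative guesses, not values from boilerpipe.
    """
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    kept = []
    for text, n_words, n_link_words in parser.blocks:
        link_density = n_link_words / n_words  # n_words > 0 by construction
        if n_words >= min_words and link_density <= max_link_density:
            kept.append(text)
    return "\n\n".join(kept)


if __name__ == "__main__":
    sample = """
    <html><body>
    <div id="nav"><a href="/">Home</a> <a href="/about">About</a></div>
    <p>Content extraction tries to keep the article body of a page while
    dropping menus, adverts and footers around it.</p>
    <div class="footer">Copyright 2010 <a href="/terms">Terms</a></div>
    </body></html>
    """
    print(extract_main_text(sample))
```

On the small sample page, the navigation and footer blocks are too short and too link-heavy to pass, so only the paragraph should survive. Real pages are much messier, which is why a tested library like boilerpipe is the better choice for anything serious.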