I need to extract text from a node like this:
Some text with tags might go here.
Also there are paragraphs
Assuming you want text only (no tags), my solution is below.
Output is:
Some text with tags might go here. Also there are paragraphs. More text can go without paragraphs
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class StripTagsExample {

    public static void main(String[] args) {
        // Sample markup matching the question; the inline tags are illustrative.
        String str =
            "<div>"
            + " Some text <b>with tags</b> might go here."
            + " <p>Also there are paragraphs.</p>"
            + " More text can go without paragraphs<br/>"
            + "</div>";
        Document doc = Jsoup.parse(str);
        Element div = doc.select("div").first();
        StringBuilder builder = new StringBuilder();
        stripTags(builder, div.childNodes());
        System.out.println("Text without tags: " + builder.toString());
    }

    /**
     * Strip tags from a List of type Node
     * @param builder StringBuilder : input and output
     * @param nodesList List of type Node
     */
    public static void stripTags(StringBuilder builder, List<Node> nodesList) {
        for (Node node : nodesList) {
            if (node instanceof TextNode) {
                // text() gives the unescaped, whitespace-normalized text;
                // toString() would return the node's HTML with entities escaped.
                builder.append(((TextNode) node).text());
            } else {
                // recurse into the element's children
                stripTags(builder, node.childNodes());
            }
        }
    }
}
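The recursion in stripTags can also be tried without jsoup on the classpath. The sketch below is only an illustration of the same depth-first walk: `SimpleNode`, `ofText`, `ofElement`, and `collectText` are hypothetical stand-ins (not jsoup classes), and the `b`, `p`, and `br` tag names are assumed for the sample tree.

```java
import java.util.ArrayList;
import java.util.List;

public class TextCollector {

    /** Hypothetical stand-in for a DOM node; this is not a jsoup class. */
    static class SimpleNode {
        final String name;   // "#text" for text nodes, otherwise a tag name
        final String text;   // content, only meaningful for "#text" nodes
        final List<SimpleNode> children = new ArrayList<>();

        SimpleNode(String name, String text) {
            this.name = name;
            this.text = text;
        }

        static SimpleNode ofText(String text) {
            return new SimpleNode("#text", text);
        }

        static SimpleNode ofElement(String name, SimpleNode... kids) {
            SimpleNode node = new SimpleNode(name, "");
            for (SimpleNode kid : kids) {
                node.children.add(kid);
            }
            return node;
        }
    }

    /** Appends the text of every "#text" node, in document order. */
    static void collectText(StringBuilder builder, List<SimpleNode> nodes) {
        for (SimpleNode node : nodes) {
            if (node.name.equals("#text")) {
                builder.append(node.text);
            } else {
                // Elements contribute nothing themselves; descend into children.
                collectText(builder, node.children);
            }
        }
    }

    public static void main(String[] args) {
        // Hand-built tree mirroring the sample node (tag names are assumed).
        SimpleNode div = SimpleNode.ofElement("div",
            SimpleNode.ofText("Some text "),
            SimpleNode.ofElement("b", SimpleNode.ofText("with tags")),
            SimpleNode.ofText(" might go here. "),
            SimpleNode.ofElement("p", SimpleNode.ofText("Also there are paragraphs.")),
            SimpleNode.ofText(" More text can go without paragraphs"),
            SimpleNode.ofElement("br"));

        StringBuilder builder = new StringBuilder();
        collectText(builder, div.children);
        System.out.println(builder);
    }
}
```

Because text nodes are appended in the order they are visited, text nested inside inline tags ends up in the right place between its siblings, and empty elements such as the `br` here simply add nothing.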