Most questions about extracting text from HTML (i.e., stripping the tags) use:
jQuery( htmlString ).text();
While this
This seems to be (nearly) doing what you want:
function getText($node) {
return $node.contents().map(function () {
if (this.nodeName === 'BR') {
return '\n';
} else if (this.nodeType === 3) {
return this.nodeValue;
} else {
return getText($(this));
}
}).get().join('');
}
DEMO
It just recursively concatenates the values of all text nodes and replaces elements with line breaks.
But there is no semantics in this, it completely relies the original HTML formatting (the leading and trailing white spaces seem to come from how jsFiddle embeds the HTML, but you can easily trim those). For example, notice how it indents the definition term.
If you really want to do this on a semantic level, you need a list of block level elements, recursively iterate over the elements and indent them accordingly. You treat different block elements differently with respect to indentation and line breaks around them. This should not be too difficult.