HTML truncator in Java
Is there any utility (or sample source code) that truncates HTML (for preview) in Java? I want to do the truncation on the server, not on the client. I'm using HTMLUnit to parse the HTML.

UPDATE: I want to be able to preview the HTML, so the truncator should keep the HTML structure intact while stripping out everything after the desired output length.

I think you're going to need to write your own truncator on top of an XML parser to accomplish this. Pull out the body node, keep adding its child nodes for as long as the total length stays under some fixed size, and then rebuild the document from the nodes you kept. If HTMLUnit doesn't give you well-formed XHTML, I'd recommend running the input through TagSoup first.
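A rough sketch of that idea, using only the DOM and transformer classes that ship with the JDK rather than HTMLUnit. It assumes the input is already well-formed XHTML (for example after a TagSoup pass), and it counts characters of text content rather than bytes, which is a simplification of the "binary length" mentioned above. The names `HtmlTruncator`, `truncate` and `prune` are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class HtmlTruncator {

    /**
     * Keeps at most maxChars characters of text and drops every node after
     * the cut-off. Elements that were already open at the cut-off stay in
     * the tree, so the result is still well-formed.
     * Assumption: xhtml is well-formed XML with a single root element.
     */
    public static String truncate(String xhtml, int maxChars) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            // A one-element array serves as a mutable counter shared by the recursion.
            prune(doc.getDocumentElement(), new int[] { maxChars });

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or serialize the input", e);
        }
    }

    /** Walks the children in document order, spending the remaining character budget. */
    private static void prune(Node parent, int[] budget) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // fetch before a possible removal
            if (budget[0] <= 0) {
                // Budget used up: everything from here on is dropped.
                parent.removeChild(child);
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue();
                if (text.length() > budget[0]) {
                    // Naive cut: may split a word or a surrogate pair.
                    child.setNodeValue(text.substring(0, budget[0]));
                    budget[0] = 0;
                } else {
                    budget[0] -= text.length();
                }
            } else {
                prune(child, budget);
            }
            child = next;
        }
    }

    public static void main(String[] args) {
        String html = "<div><p>Hello <b>world</b> and more</p><p>gone</p></div>";
        System.out.println(truncate(html, 8));
    }
}
```

Because only text nodes are shortened and whole trailing nodes are removed, every element that was open at the cut-off point is still closed properly when the tree is serialized. If the limit has to apply to the length of the markup itself rather than to the visible text, the counter would need to account for tag and attribute lengths as well.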