Question
In my application, I use a JTextPane to display some log information. Because I want to highlight specific lines in this text (for example, the error messages), I set the contentType to "text/html". This way, I can format my text.
Now I have created a JButton that copies the content of this JTextPane to the clipboard. That part is easy, but my problem is that when I call myTextPane.getText(), I get the HTML code, such as:
<html>
<head>
</head>
<body>
blabla<br>
<font color="#FFCC66"><b>foobar</b></font><br>
blabla
</body>
</html>
instead of getting only the raw content:
blabla
foobar
blabla
Is there a way to get only the content of my JTextPane as plain text? Or do I need to convert the HTML into raw text myself?
Answer 1:
Based on the accepted answer to: Removing HTML from a Java String
MyHtml2Text parser = new MyHtml2Text();
try {
    parser.parse(new StringReader(myTextPane.getText()));
} catch (IOException ee) {
    // handle exception
}
System.out.println(parser.getText());
Slightly modified version of the Html2Text class found in the answer I linked to:
import java.io.IOException;
import java.io.Reader;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;

public class MyHtml2Text extends HTMLEditorKit.ParserCallback {
    // Collects the text found between the tags
    StringBuffer s;

    public MyHtml2Text() {}

    // Parses the HTML from the reader; the parser calls back into this object
    public void parse(Reader in) throws IOException {
        s = new StringBuffer();
        ParserDelegator delegator = new ParserDelegator();
        delegator.parse(in, this, Boolean.TRUE);
    }

    // Called for every run of text; each run ends up on its own line
    @Override
    public void handleText(char[] text, int pos) {
        s.append(text);
        s.append("\n");
    }

    public String getText() {
        return s.toString();
    }
}
If you need more fine-grained handling, consider overriding more of the callback methods defined by HTMLEditorKit.ParserCallback.
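As a rough sketch of what "more fine-grained handling" could look like: the callback below only starts a new line on <br> and, as one possible policy, writes out the alt text of images. The class name LineAwareHtml2Text and the decision to include alt text are assumptions made for illustration, not part of the linked answer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical variant of MyHtml2Text that reacts to specific tags.
class LineAwareHtml2Text extends HTMLEditorKit.ParserCallback {
    private final StringBuilder s = new StringBuilder();

    public void parse(Reader in) throws IOException {
        s.setLength(0); // reset so the instance can be reused
        new ParserDelegator().parse(in, this, true);
    }

    @Override
    public void handleText(char[] text, int pos) {
        // Append the text run as-is; line breaks come from <br> only
        s.append(text);
    }

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.BR) {
            s.append('\n');
        } else if (tag == HTML.Tag.IMG) {
            // Assumption: represent an image by its alt text, if it has one
            Object alt = attrs.getAttribute(HTML.Attribute.ALT);
            if (alt != null) {
                s.append(alt);
            }
        }
    }

    public String getText() {
        return s.toString();
    }

    public static void main(String[] args) throws IOException {
        LineAwareHtml2Text parser = new LineAwareHtml2Text();
        parser.parse(new StringReader(
                "<html><body>blabla<br><b>foobar</b><br>blabla</body></html>"));
        System.out.println(parser.getText());
    }
}
```

Other callbacks such as handleStartTag and handleEndTag can be overridden in the same way, for example to insert a line break after block-level elements.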
Answer 2:
No need to use the ParserCallback. Just use:
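textPane.getDocument().getText(0, textPane.getDocument().getLength());
Note that Document.getText(int, int) declares the checked BadLocationException, so the call has to be wrapped in a try/catch.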
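A minimal sketch of this approach wrapped in a helper, using the sample markup from the question. The class and method names (PlainTextFromPane, plainText) are made up for illustration. The returned string comes from the document model, so its exact whitespace may differ from the HTML source, for example an extra line break at the start.

```java
import javax.swing.JTextPane;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;

// Hypothetical helper; only the getDocument().getText(...) call comes from the answer.
class PlainTextFromPane {
    /** Returns the content of the pane's document model, without HTML markup. */
    public static String plainText(JTextPane pane) {
        Document doc = pane.getDocument();
        try {
            return doc.getText(0, doc.getLength());
        } catch (BadLocationException e) {
            // Should not happen for the range [0, length]; rethrow unchecked
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        JTextPane pane = new JTextPane();
        pane.setContentType("text/html");
        pane.setText("<html><head></head><body>blabla<br>"
                + "<font color=\"#FFCC66\"><b>foobar</b></font><br>"
                + "blabla</body></html>");
        System.out.println(plainText(pane));
    }
}
```

The result can then be handed to the clipboard code instead of the value of myTextPane.getText().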
Answer 3:
Unfortunately, you need to do it yourself. Imagine that some of the content is HTML-specific, e.g. images: the text representation is unclear. For instance, should the alt text be included or not?
Answer 4:
(Is regexp allowed? This isn't parsing, is it?)
Take the getText() result and use String.replaceAll() to filter out all tags. Then call trim() to remove leading and trailing whitespace. For the whitespace between your first and your last 'blabla', I don't see a general solution. Maybe you can split the rest around CRLF and trim all the strings again.
(I'm no regexp expert, maybe someone can provide the regexp and earn some reputation ;) )
Edit
.. I just assumed that you don't use < and > in your text - otherwise it.. say, it's a challenge.
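One possible way to fill in the regexp idea, as a sketch only: remove everything that looks like a tag, split on line breaks, trim each line and drop the empty ones. The class name TagStripper and the pattern <[^>]*> are assumptions. As the edit above says, this breaks if the text itself contains < or >, and it does not decode entities such as &amp;amp;.

```java
// Hypothetical helper illustrating the replaceAll()/trim() idea.
class TagStripper {
    public static String stripTags(String html) {
        // Remove anything between '<' and the next '>'
        String noTags = html.replaceAll("<[^>]*>", "");

        // Split on any line break, trim each line, skip lines left empty
        StringBuilder out = new StringBuilder();
        for (String line : noTags.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (out.length() > 0) {
                out.append('\n');
            }
            out.append(trimmed);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<html>\n<head>\n</head>\n<body>\nblabla<br>\n"
                + "<font color=\"#FFCC66\"><b>foobar</b></font><br>\n"
                + "blabla\n</body>\n</html>";
        // Prints blabla, foobar and blabla on three separate lines
        System.out.println(stripTags(html));
    }
}
```

Because the line structure is taken from the line breaks in the HTML string, markup written on a single line would come out as a single line of text.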
Source: https://stackoverflow.com/questions/1859686/getting-raw-text-from-jtextpane