I am using Html.fromHtml(STRING).toString() to convert a string that may or may not contain HTML and/or HTML entities to a plain-text string.
This is pretty slow; my last measurement was about 22 ms on average. With a large batch of these it can add over a minute, so I am looking for a faster option built for performance.
Is there any way to speed this up, or are there other decoding options available?
Edit: Since there doesn't appear to be a built-in method that is faster or built specifically for performance, I will award the bounty to anyone who can point me in the direction of a library that:
- Works well with Android
- Licensed for free use
- Faster than Html.fromHtml(String).toString()
As a note, I already tried Jsoup with this method: Jsoup.parse(String).text(), and it was slower.
What about unescapeHtml() from org.apache.commons.lang.StringEscapeUtils? The library is available on the Apache site.
(EDIT: June 2019 - See the comments below for updates about the library)
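For reference, with Commons Lang 2.x on the classpath the call suggested above would be StringEscapeUtils.unescapeHtml(text); it decodes entities but does not strip tags. To show roughly what a lightweight entity decoder does, here is a minimal single-pass sketch in plain Java. EntityDecoder and its six-entry table are hypothetical, written only for illustration; this is not the library's implementation, and a real library covers the full HTML entity set.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class, not part of any library.
final class EntityDecoder {
    // Deliberately tiny table (assumption: only these names matter for the demo).
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("apos", "'");
        NAMED.put("nbsp", "\u00A0");
    }

    static String decode(String input) {
        int amp = input.indexOf('&');
        if (amp < 0) {
            return input; // fast path: nothing to decode, no allocation
        }
        StringBuilder out = new StringBuilder(input.length());
        int pos = 0;
        while (amp >= 0) {
            out.append(input, pos, amp); // copy text before the '&'
            int semi = input.indexOf(';', amp + 1);
            String replacement = null;
            // Only try short, non-empty bodies such as "amp" or "#x41".
            if (semi > amp + 1 && semi - amp <= 10) {
                replacement = resolve(input.substring(amp + 1, semi));
            }
            if (replacement != null) {
                out.append(replacement);
                pos = semi + 1;
            } else {
                out.append('&'); // unknown or malformed: keep the '&' literally
                pos = amp + 1;
            }
            amp = input.indexOf('&', pos);
        }
        out.append(input, pos, input.length());
        return out.toString();
    }

    // Returns the decoded text for an entity body, or null if it is not recognised.
    private static String resolve(String body) {
        if (body.charAt(0) != '#') {
            return NAMED.get(body);
        }
        try {
            boolean hex = body.length() > 1
                    && (body.charAt(1) == 'x' || body.charAt(1) == 'X');
            int codePoint = hex
                    ? Integer.parseInt(body.substring(2), 16)
                    : Integer.parseInt(body.substring(1), 10);
            if (!Character.isValidCodePoint(codePoint)) {
                return null;
            }
            return new String(Character.toChars(codePoint));
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

Calling EntityDecoder.decode("Tom &amp; Jerry") returns "Tom & Jerry". Strings without any '&' are returned as-is, which may matter when most of a batch is already plain text.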
fromHtml() does not have a high-performance HTML parser, and I have no idea how quick the toString() implementation on SpannedString is. I doubt either was designed for your scenario.
Ideally, the strings are clean before they get to a low-power phone. Either clean them up in the build process (for resources/assets), or clean them up on a server (before you download them).
If, for whatever reason, you absolutely need to clean them up on the device, you can perhaps use the NDK to create a C/C++ library that does the cleaning for you faster.
This is an incredibly fast and simple option: Unbescape
It greatly improved our parsing performance, since every string has to be run through a decoder.
Have you looked at "Strip HTML from Text JavaScript"?
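The linked question is about JavaScript, but the regex idea carries over to Java. A minimal sketch, assuming the input contains only simple tags; TagStripper is a hypothetical helper name, and a regex like this is not a real HTML parser.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes anything that looks like <...>.
final class TagStripper {
    // Compiled once and reused; compiling a Pattern per call would waste time in a batch.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String strip(String html) {
        if (html.indexOf('<') < 0) {
            return html; // fast path: no '<', so there is nothing to strip
        }
        return TAG.matcher(html).replaceAll("");
    }
}
```

For example, TagStripper.strip("<p>Hello <b>world</b></p>") returns "Hello world". Note that this does not decode entities, and text such as "a < b and c > d" would be mangled because the span between '<' and '>' is treated as a tag.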
With a large batch of these it can add over a minute
Any parsing will take some time; 22 ms seems fast to me. Anyway, can you do it in the background? Could some kind of caching help?
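The background-plus-caching idea could look roughly like the sketch below. CachedBatchDecoder and its Decoder interface are hypothetical names; on Android the decoder passed in might be s -> Html.fromHtml(s).toString(), which is an assumption here, and the lambda syntax assumes Java 8 language support in the build.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical wrapper: memoizes a slow decoder and runs batches off the calling thread.
final class CachedBatchDecoder {
    interface Decoder {
        String decode(String raw);
    }

    private final Decoder decoder;
    // Unbounded cache for simplicity; a real app would probably cap its size.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachedBatchDecoder(Decoder decoder) {
        this.decoder = decoder;
    }

    String decode(String raw) {
        String cached = cache.get(raw);
        if (cached != null) {
            return cached;
        }
        // Two threads racing on the same input may both decode it; the result is the same.
        String decoded = decoder.decode(raw);
        cache.put(raw, decoded);
        return decoded;
    }

    // Decodes all inputs on a worker pool and returns results in input order.
    List<String> decodeAll(List<String> inputs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (final String raw : inputs) {
                Callable<String> task = () -> decode(raw);
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>(inputs.size());
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("decoding interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("decoding failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Caching only pays off if the batch contains repeated strings, and decodeAll still blocks its caller until the batch is done, so on Android it would itself need to be called from a background thread rather than the UI thread.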
Although I have not tried them yet, I found some possible solutions:
I hope it helps.
Source: https://stackoverflow.com/questions/4321896/is-there-a-faster-way-to-decode-html-characters-to-a-string-than-html-fromhtml