Camel: changing stream encoding

Posted by 本小妞迷上赌 on 2019-12-20 03:39:05

Question


I'm receiving a data stream over HTTP with this route:

from("direct:foo").
to("http://foo.com/bar.html").
to("file:///tmp/bar.html")

The HTTP stream arrives in Windows-1251 encoding. I'd like to re-encode it to UTF-8 on the fly and then store it in a file.

How can I do that the standard Camel way?


Answer 1:


Please have a look at .convertBodyTo() - in particular the charset argument.

from("direct:foo").
to("http://foo.com/bar.html").
convertBodyTo(String.class, "UTF-8").
to("file:///tmp/bar.html")

Reference: http://camel.apache.org/convertbodyto.html




Answer 2:


I think vikingsteve's solution misses a step. The input stream contains characters encoded in CP1251. Those bytes do not change their encoding when you convert the stream contents to a string. When you decode them, you need to specify the same character encoding that was used by whoever encoded them; otherwise you will get garbled text. Only after decoding correctly can you re-encode the text in the target charset when writing it out.
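
The decode-then-re-encode idea is independent of Camel, so here is a minimal plain-Java sketch of it. The class name `Transcoder` and its `transcode` methods are hypothetical helpers invented for this illustration, not part of Camel or the JDK. It assumes the source bytes really are Windows-1251, as in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class Transcoder {

    // Step 1: decode the incoming bytes with the charset they were written in.
    // Step 2: encode the resulting characters with the target charset.
    static void transcode(InputStream in, Charset from, OutputStream out, Charset to)
            throws IOException {
        Reader reader = new InputStreamReader(in, from);
        Writer writer = new OutputStreamWriter(out, to);
        char[] buffer = new char[4096];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, n);
        }
        writer.flush();
    }

    // Convenience overload for in-memory data.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            transcode(new ByteArrayInputStream(input), from, out, to);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Charset cp1251 = Charset.forName("windows-1251");
        // Cyrillic sample text, written with Unicode escapes so the source
        // file itself does not depend on any particular encoding.
        String text = "\u041f\u0440\u0438\u0432\u0435\u0442";
        byte[] source = text.getBytes(cp1251);
        byte[] utf8 = transcode(source, cp1251, StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text)); // prints true
    }
}
```

In a Camel route the same two steps map onto the charset passed to convertBodyTo (decoding) and the charset option on the file endpoint (encoding).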

<route id="process_umlaug_file" startupOrder="2">
    <from uri="file:///home/steppra1/Downloads?fileName=input_umlauts.txt"/>
    <convertBodyTo type="java.lang.String" charset="ISO-8859-1"/>
    <to uri="file:///home/steppra1/Downloads?fileName=output_umlauts.txt&amp;charset=UTF-8"/>
</route>

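I tested this by reading an ISO-8859-1 encoded file containing German umlauts (for the Windows-1251 stream from the question, the charset given to convertBodyTo would be the source encoding, windows-1251, instead):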

steppra1@steppra1-linux-mint ~/Downloads $ file input_umlauts.txt 
input_umlauts.txt: ISO-8859 text, with CRLF line terminators

steppra1@steppra1-linux-mint ~/Downloads $ file output_umlauts.txt 
output_umlauts.txt: UTF-8 Unicode text, with CRLF line terminators

Using the two steps of decoding and then re-encoding yields properly encoded German umlauts. If I change the above route to

<route id="process_umlaug_file" startupOrder="2">
    <from uri="file:///home/steppra1/Downloads?fileName=input_umlauts.txt"/>
    <convertBodyTo type="java.lang.String" charset="UTF-8"/>
    <to uri="file:///home/steppra1/Downloads?fileName=output_umlauts.txt"/>
</route>

then the output file is still UTF-8 encoded, possibly because that is my platform default, but the umlauts are garbled.
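
A likely explanation for the garbling can be reproduced outside Camel. The sketch below is only an illustration under the assumption that the input bytes are ISO-8859-1, as the `file` output above suggests; `WrongCharsetDemo` and its `decode` method are hypothetical names. It decodes the same bytes once with the correct charset and once as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class WrongCharsetDemo {

    // Decodes raw bytes into a String using the given charset.
    // Malformed input is replaced with the charset's replacement character.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // German umlauts a, o, u with diaeresis, as Unicode escapes.
        String umlauts = "\u00e4\u00f6\u00fc";
        // In ISO-8859-1 each umlaut is a single byte above 0x7F.
        byte[] latin1 = umlauts.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the charset the bytes were written in restores the text.
        String right = decode(latin1, StandardCharsets.ISO_8859_1);
        // Decoding the same bytes as UTF-8 hits invalid byte sequences,
        // which are replaced with U+FFFD: the original text is lost.
        String wrong = decode(latin1, StandardCharsets.UTF_8);

        System.out.println(right.equals(umlauts));
        System.out.println(wrong.contains("\uFFFD"));
    }
}
```

Once the replacement characters are in the string, writing it out as UTF-8 produces a valid UTF-8 file whose contents are still wrong, which matches the behaviour described above.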



Source: https://stackoverflow.com/questions/21570081/camel-changing-stream-encoding
