Unwanted chars written from java REST-API to HadoopDFS using FSDataOutputStream

こ雲淡風輕ζ submitted on 2019-12-24 03:01:54

Question


We built a Java REST API that receives event data (such as a click on a buy button) and writes that data to HDFS. Essentially, for every host that sends data (as JSON) we open a stream or reuse an existing one, enrich the data with a timestamp, an event name, and the hostname, and write it to an FSDataOutputStream:

1 public synchronized void writeToFile(String filename, String hostname, String content) throws IOException {
2    FSDataOutputStream stream = registry.getStream(filename, hostname);
3    stream.writeBytes(content);
4    stream.hflush();
5  }

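First, we used stream.writeChars(content) in line 3, which resulted in files like: .{.".m.e.s.s.a.g.e.".:.".h.e.l.l.o.".} Looking into the implementation of DataOutputStream.writeChars(String s), you see that every char is written as two bytes: the high byte (the char shifted right by 8 bits) first, then the low byte. For ASCII characters the high byte is 0x00, which explains the extra byte in front of every character, although I don't understand why the method works that way.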
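
The two-bytes-per-char behaviour can be reproduced without a cluster. The following is a minimal sketch, assuming a plain java.io.DataOutputStream behaves the same way here (FSDataOutputStream extends it); the class name WriteCharsDemo and the helper writeChars are made up for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WriteCharsDemo {
    // Hypothetical helper: returns the raw bytes that writeChars emits for s.
    static byte[] writeChars(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeChars(s); // two bytes per char, high byte first
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // The first three characters of the JSON from the question: {"m
        for (byte b : writeChars("{\"m")) {
            System.out.printf("%02x ", b);
        }
        System.out.println(); // prints: 00 7b 00 22 00 6d
    }
}
```

Each 00 byte is the high byte of an ASCII character, which a file viewer typically renders as a dot.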

Then I tried stream.writeUTF(content) in line 3, and the files looked much better: .W{"message":"hello"} But there were still a few bytes too many. Looking into the code, writeUTF(String s) first writes the number of bytes in the encoded string as a two-byte prefix, and then the string itself. So .W represents the number of bytes in the event data; varying the length of the event data produced different leading characters in the file, which confirms this.
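
The length prefix is easy to inspect in isolation. This is a small sketch, again assuming a plain java.io.DataOutputStream as a stand-in for the HDFS stream; WriteUtfDemo and writeUtf are hypothetical names:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WriteUtfDemo {
    // Hypothetical helper: returns the raw bytes that writeUTF emits for s.
    static byte[] writeUtf(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s); // 2-byte big-endian length, then the encoded string
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = writeUtf("{\"message\":\"hello\"}");
        // Reassemble the unsigned 16-bit prefix from the first two bytes.
        int prefix = ((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF);
        System.out.println("prefix = " + prefix);       // prefix = 19
        System.out.println("total  = " + bytes.length); // total  = 21
    }
}
```

For comparison, the W seen in the file is byte 0x57, which would correspond to an 87-byte payload, plausible for the enriched event rather than the bare 19-byte JSON shown here.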

So my last resort was stream.writeBytes(content). Here everything looked fine: {"message":"hello"} until special characters came into play: {"message":"hallöchen"} became {"message":"hall.chen"}. writeBytes discards the high 8 bits of each char before writing it. I think I need some UTF-8 functionality to write these characters correctly.
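
To make the difference concrete, here is a hedged sketch that compares what writeBytes emits for ö (U+00F6) with its UTF-8 encoding. It assumes a plain java.io.DataOutputStream in place of the HDFS stream; WriteBytesDemo and both helper names are invented for this example:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class WriteBytesDemo {
    // Hypothetical helper: raw bytes produced by DataOutputStream.writeBytes.
    static byte[] viaWriteBytes(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeBytes(s); // keeps only the low 8 bits of every char
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Hypothetical helper: the UTF-8 encoding of the same string.
    static byte[] viaUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String umlaut = "\u00f6"; // the letter o with diaeresis
        System.out.println(viaWriteBytes(umlaut).length); // 1 (the single byte 0xF6)
        System.out.println(viaUtf8(umlaut).length);       // 2 (the bytes 0xC3 0xB6)
    }
}
```

A lone 0xF6 byte is not valid UTF-8, so a reader that expects UTF-8 shows a placeholder where the character should be.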

So, now I'm kind of lost. How can I solve that?


Answer 1:


When I read this: "Why does DataOutputStream.writeUTF() add additional 2 bytes at the beginning?" I felt like the mentioned FSDataOutputStream methods would not work for this. A quick (and maybe dirty) solution is this:

3 byte[] contentAsBytes = content.getBytes("UTF-8");
4 for (byte singleByte : contentAsBytes) {
5   stream.writeByte(singleByte);
6 }

A cleaner way would be not to use the FSDataOutputStream, but I couldn't find an alternative. Any hint is still appreciated.
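
Since FSDataOutputStream extends java.io.DataOutputStream, it should also accept a whole byte array through the plain write(byte[]) method, which avoids the per-byte loop. A minimal sketch under that assumption follows; Utf8Writer and writeUtf8 are hypothetical names, and the parameter is typed as OutputStream so the example runs without Hadoop on the classpath:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class Utf8Writer {
    // Hypothetical helper: encodes content as UTF-8 and writes it in one call.
    // With the code from the question, the caller would still invoke
    // stream.hflush() afterwards.
    static void writeUtf8(OutputStream stream, String content) throws IOException {
        stream.write(content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for the HDFS stream
        writeUtf8(sink, "{\"message\":\"hall\u00f6chen\"}");
        // Decoding the bytes as UTF-8 gives back the original JSON text.
        System.out.println(new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Using StandardCharsets.UTF_8 instead of the string "UTF-8" also removes the checked UnsupportedEncodingException from getBytes.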




Answer 2:


Have you tried wrapping the FSDataOutputStream in a java.io.PrintStream and using its print methods? It is a long shot, but let me know if that works for you.
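
If you try this, note that a PrintStream created without an explicit encoding uses the platform default charset, so the encoding should probably be passed in. The sketch below assumes a ByteArrayOutputStream as a stand-in for the FSDataOutputStream; PrintStreamDemo and printUtf8 are made-up names:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class PrintStreamDemo {
    // Hypothetical helper: prints s through a UTF-8 PrintStream and returns the bytes.
    static byte[] printUtf8(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            // Arguments: target stream, autoFlush, charset name.
            PrintStream printer = new PrintStream(buffer, false, "UTF-8");
            printer.print(s);
            // PrintStream swallows IOExceptions; checkError() flushes and reports them.
            if (printer.checkError()) {
                throw new IllegalStateException("write to underlying stream failed");
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(printUtf8("hall\u00f6chen").length); // 10
    }
}
```

Closing the PrintStream would also close the wrapped stream, which may not be wanted when the stream is kept in a registry and reused across requests.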



Source: https://stackoverflow.com/questions/19687576/unwanted-chars-written-from-java-rest-api-to-hadoopdfs-using-fsdataoutputstream
