Escape special characters in Apache Pig data

Submitted by 二次信任 on 2019-12-24 14:17:49

Question


I am using Apache Pig to process some data.
My data set has some strings that contain special characters, such as #, {, }, [, and ].

The Programming Pig book says that these characters cannot be escaped.

So how can I process my data without deleting the special characters?

I thought about replacing them but would like to avoid that.

Thanks


Answer 1:


Have you tried loading your data? There is no way to escape these characters when they are part of the values in a tuple, bag, or map, but there is no problem whatsoever in loading them when they are part of a string. Just declare that field as type chararray.

The only issue to watch out for is strings that contain the character Pig is using as the field delimiter, for example if you are USING PigStorage(',') and your strings contain commas. But as long as you are not telling Pig to parse your field as a map, #, [, and ] will be handled just fine.
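The delimiter point can be illustrated with a small plain-Java sketch. This is not Pig's implementation; the class `DelimiterDemo` and the method `splitFields` are made-up names, and the code only mimics the general idea of splitting one input line into fields on a single delimiter character.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: this is NOT Pig's code. It mimics the idea of splitting
// one input line into fields on a single delimiter character.
final class DelimiterDemo {

    // Split a line on the given delimiter, keeping empty trailing fields.
    static List<String> splitFields(String line, char delimiter) {
        String regex = Pattern.quote(String.valueOf(delimiter));
        return Arrays.asList(line.split(regex, -1));
    }

    public static void main(String[] args) {
        String line = "42\tname#joel,{a}[b]";

        // Tab-delimited: the special characters stay inside the second field.
        System.out.println(splitFields(line, '\t'));

        // Comma-delimited: the comma inside the string cuts the value in two.
        System.out.println(splitFields(line, ','));
    }
}
```

With a tab delimiter the characters #, {, }, [, and ] pass through untouched; only the comma causes trouble, and only when the comma is the delimiter.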




Answer 2:


The easiest way would be:

raw_lines = LOAD 'inputLocation' USING TextLoader() AS (unparsedString:chararray);

TextLoader just reads each line of input into a String regardless of what's inside that string. You could then use your own parsing logic.
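As one possible example of such parsing logic, the plain-Java sketch below treats only the first tab as a separator and keeps the rest of the line verbatim. The class `RawLineParser`, the method `splitOnFirstTab`, and the "key, then free text" layout are assumptions made for illustration; logic like this could live in a user-defined function applied to the loaded chararray.

```java
// Illustration only: a hypothetical parsing rule, not part of Pig.
final class RawLineParser {

    // Returns {key, rest}. "rest" keeps every character after the first tab,
    // including #, {, }, [, ], commas, and further tabs.
    // A line without any tab yields {line, ""}.
    static String[] splitOnFirstTab(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    public static void main(String[] args) {
        String[] parts = splitOnFirstTab("42\tvalue with #, {}, [] and\ta second tab");
        System.out.println(parts[0]);
        System.out.println(parts[1]);
    }
}
```

Searching for the first separator instead of splitting on every occurrence means the free-text part never needs escaping.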




Answer 3:


When writing your own loader function, instead of returning tuples that hold e.g. a map as a String (and thus relying on Utf8StorageConverter to convert it to a map correctly later):

Tuple tuple = tupleFactory.newTuple( 1 );
tuple.set(0, new DataByteArray("[age#22, name#joel]"));

you can create a Java map and set it directly:

HashMap<String, Object> map = new HashMap<String, Object>(2);
map.put("age", 22);
map.put("name", "joel");
tuple.set(0, map);

This is especially useful if you have to parse the data during loading anyway.
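A minimal sketch of what that parsing step might look like, using only the Java standard library. The class `MapFieldParser` and the "key#value, key#value" input layout are assumptions for illustration. Each entry is split on its first '#' only, so a value may itself contain '#'. Values are kept as Strings here for simplicity, and a value containing a comma would need a different rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: a hypothetical helper, not part of the Pig API.
final class MapFieldParser {

    // Parses text such as "age#22, name#joel" into a map with String keys.
    // Entries without a '#' are skipped.
    static Map<String, Object> parse(String text) {
        Map<String, Object> map = new LinkedHashMap<>();
        for (String entry : text.split(",\\s*")) {
            int hash = entry.indexOf('#');
            if (hash < 0) {
                continue; // malformed entry: no key/value separator
            }
            String key = entry.substring(0, hash).trim();
            String value = entry.substring(hash + 1); // may contain further '#'
            map.put(key, value);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(parse("age#22, name#joel#smith"));
    }
}
```

The resulting map could then be stored in the tuple with tuple.set(0, map), as in the snippet above, so the special characters in the values never pass through a string-to-map conversion.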



Source: https://stackoverflow.com/questions/15806221/escape-special-characters-in-apache-pig-data
