Extracting string from logs with regex in pig script

匆匆过客 提交于 2019-12-11 15:04:07

问题


I have log data and I want to extract each information into a variable

The following is sample one line log. {:id=>306, :name=>"bblite", :cpu=>{:quota=>4, :allocated=>4, :actual=>0}, :memory=>{:quota=>8192, :allocated=>8192, :actual=>8578}, :cluster_stats=>{"wc1104"=>{:cpu=>0, :mem=>8578}}}

I need variable that have all ids,a variable that have all names,a variable that have CPUs and a variable that have all cluster stats

The following is the portion of my pig script. I can store the ids but I have no idea how to extract the rest of them using regex.

. . .

matching_messages = FILTER raw_lines BY (LOWER(message) MATCHES '.*cc_altus-plaform.*');

ids = FOREACH matching_messages GENERATE REGEX_EXTRACT(message,'id=>\\d*',0);

names = FOREACH matching_messages GENERATE REGEX_EXTRACT(message,'name=>\\"\\",',0);

line_with_date = FOREACH matching_messages GENERATE
DateFormatter(timestamp) AS formatted_time: chararray, message;

DUMP names;

回答1:


The following codes snippet is the regex I have written which works:

id = FOREACH matching_messages GENERATE REGEX_EXTRACT(message,'(?<=id=>)\\d*',0);

name = FOREACH matching_messages GENERATE REGEX_EXTRACT(message,'name=>\\"[\\w]*\\"',0);

cpu = FOREACH matching_messages GENERATE REPLACE( REGEX_EXTRACT(message, 'cpu=>\\{.*?\\}',0), ',','');

memory = FOREACH matching_messages GENERATE REGEX_EXTRACT(message,'memory=>\\{.*?\\}',0);

cluster = FOREACH matching_messages GENERATE REGEX_EXTRACT(message,'cluster_stats=>\\{.*?\\}',0);


来源:https://stackoverflow.com/questions/47184158/extracting-string-from-logs-with-regex-in-pig-script

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!