apache-pig

Get MAX value per year in Apache Pig

不羁岁月 submitted on 2020-06-27 05:51:37
Question: I have been trying to get the max temperature per year using the data below. The actual data looks like this, but I am only interested in the first column (the year) and the fourth column (the temperature):

2016-11-03 12:00:00.000 +0100,Mostly Cloudy,rain,10.594444444444443,10.594444444444443,0.73,13.2664,174.0,10.1913,0.0,1019.74,Partly cloudy throughout the day.
2016-11-03 13:00:00.000 +0100,Mostly Cloudy,rain,11.072222222222223,11.072222222222223,0.72,13.1698,176.0,12.4131,0.0,1019.45,Partly cloudy
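A minimal sketch of the per-year maximum, written in Python rather than Pig Latin so it can run on its own. It assumes the fields are comma-separated with no quoted commas, that the year is the first four characters of the timestamp in column 1, and that the temperature is column 4. In Pig the same dataflow would roughly be a FOREACH that projects SUBSTRING(ts, 0, 4) and the temperature, a GROUP BY on the year, and MAX over the grouped bag. The function name `max_temp_per_year` is hypothetical.

```python
def max_temp_per_year(lines):
    """Return {year: max temperature} from CSV weather rows.

    Assumes column 1 is a timestamp starting with 'YYYY' and column 4
    is the temperature, as in the sample rows from the question.
    """
    best = {}
    for line in lines:
        fields = line.split(",")
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        year = fields[0][:4]  # '2016-11-03 12:00:00.000 +0100' -> '2016'
        try:
            temp = float(fields[3])
        except ValueError:
            continue  # e.g. a header row or an empty temperature field
        if year not in best or temp > best[year]:
            best[year] = temp
    return best


rows = [
    "2016-11-03 12:00:00.000 +0100,Mostly Cloudy,rain,10.594444444444443,"
    "10.594444444444443,0.73,13.2664,174.0,10.1913,0.0,1019.74,"
    "Partly cloudy throughout the day.",
    "2016-11-03 13:00:00.000 +0100,Mostly Cloudy,rain,11.072222222222223,"
    "11.072222222222223,0.72,13.1698,176.0,12.4131,0.0,1019.45,"
    "Partly cloudy",
]
print(max_temp_per_year(rows))
```

Only the two sample rows are used here, so the result has a single 2016 entry holding the 13:00 reading.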

Rule of thumb for reading from a file and defining schema for complex data structure

天涯浪子 submitted on 2020-04-17 18:59:27
Question: I am confused about reading a complex file (i.e. with tuples and bags) in Pig and defining schemas; more precisely, how I should translate {, (, and a delimiter (e.g. |) when reading a file. For example, I cannot figure out what the content of 'complex_7.txt' should be for the following line in Pig (I am reverse-engineering: I have this example, and I am trying to write a text file that this schema can be used on):

    a = LOAD '/user/maria_dev/complex_7.txt' AS (f1:int,f2:int,B:bag{T:tuple(t1:int,t2:int)});
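Working backwards from the schema: a LOAD with no USING clause uses PigStorage, whose default field delimiter is a tab, and when a schema is declared PigStorage can read a bag written as {...} containing tuples written as (...), with commas between the tuples and between the fields inside each tuple. As far as I know, those inner commas do not change when you pick a different top-level delimiter. The Python sketch below only prints lines in that shape so you can see what complex_7.txt would need to contain; the helper name `pig_text_row` and the sample values are hypothetical.

```python
def pig_text_row(f1, f2, bag, delimiter="\t"):
    """Render one record in the text layout PigStorage reads for the schema
    (f1:int, f2:int, B:bag{T:tuple(t1:int, t2:int)}).

    Top-level fields are joined with `delimiter` (tab is PigStorage's default);
    the bag is written as {...}, each tuple as (...), with commas inside both.
    """
    tuples = ",".join(f"({t1},{t2})" for t1, t2 in bag)
    return delimiter.join([str(f1), str(f2), "{" + tuples + "}"])


# Hypothetical sample records, one per output line of complex_7.txt.
records = [
    (1, 2, [(3, 4), (5, 6)]),
    (7, 8, [(9, 10)]),
    (11, 12, []),  # an empty bag is written as {}
]
for rec in records:
    print(pig_text_row(*rec))
```

Redirecting this output to complex_7.txt should give a file the LOAD statement above can read. For a |-separated file, pass delimiter="|" and load it with USING PigStorage('|') instead.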

How to delete repeating rows of data in Pig

那年仲夏 submitted on 2020-03-05 01:02:39
Question:
"YouTube Rewind: The Shape of 2017 | #YouTubeRewind" 137843120 3014479 1602383 817582
"YouTube Rewind: The Shape of 2017 | #YouTubeRewind" 125431369 2912715 1545018 807558
"YouTube Rewind: The Shape of 2017 | #YouTubeRewind" 113876217 2811217 1470387 787174
"YouTube Rewind: The Shape of 2017 | #YouTubeRewind" 100911567 2656678 1353655 682890
"Marvel Studios' Avengers: Infinity War Official Trailer" 89930713 2606665 53011 347982
"Marvel Studios' Avengers: Infinity War Official Trailer"
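These rows repeat the title but differ in their numeric columns, so Pig's DISTINCT, which only drops rows that are identical in every field, would keep all of them. The question does not say which copy to keep; the sketch below assumes, hypothetically, that the first number is a view count and keeps the row with the most views per title. In Pig this is commonly expressed as a GROUP BY title followed by a nested FOREACH that does ORDER ... DESC and LIMIT 1. The Python version only illustrates that logic, and `dedupe_by_title` is a hypothetical name.

```python
def dedupe_by_title(rows):
    """Keep one row per title: the one whose first numeric column is largest.

    Assumes each row is (title, views, ...) with views as an int-like string.
    Titles keep the order in which they first appear (dicts are ordered).
    """
    best = {}
    for row in rows:
        title, views = row[0], int(row[1])
        if title not in best or views > int(best[title][1]):
            best[title] = row
    return list(best.values())


rewind = "YouTube Rewind: The Shape of 2017 | #YouTubeRewind"
trailer = "Marvel Studios' Avengers: Infinity War Official Trailer"
rows = [
    (rewind, "137843120", "3014479", "1602383", "817582"),
    (rewind, "125431369", "2912715", "1545018", "807558"),
    (rewind, "113876217", "2811217", "1470387", "787174"),
    (rewind, "100911567", "2656678", "1353655", "682890"),
    (trailer, "89930713", "2606665", "53011", "347982"),
]
for row in dedupe_by_title(rows):
    print(row)
```

If any copy per title is acceptable, keeping the first row seen is simpler; keeping the maximum makes the result independent of input order.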

Can't run Pig with a single-node Hadoop server

耗尽温柔 submitted on 2020-02-21 05:47:01
Question: I have set up a VM with Ubuntu. It runs Hadoop as a single node. Later I installed Apache Pig on it. Apache Pig runs great in local mode, but otherwise it always gives ERROR 2999: Unexpected internal error. Failed to create DataStorage

I must be missing something very obvious. Can someone help me get this running, please? More details:

1. I assume that Hadoop is running fine, because I could run MapReduce jobs in Python.
2. pig -x local runs as I expect.
3. When I just type pig, it gives me the following error

Pig - parsing a string with regex

你。 submitted on 2020-02-01 05:50:06
Question: I'm stuck on string parsing in Pig. I have looked at the documentation for regex_extract and regex_extract_all and hoped to use one of those functions. I have the file '/logs/test.log':

    cat '/logs/test.log'
    user=242562&friend=6226&friend=93856&age=35&friend=35900

I want to extract the friend tags from the URL, and in this case I have 3 identical tags. regex_extract seems to only work for the first instance, which is what I expected, and for regex_extract_all, it seems like I have to know the
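REGEX_EXTRACT_ALL returns a single tuple with one field per capture group in the pattern, which is why it seems to require knowing the number of friend tags in advance. What is wanted here is "every match of one group", which is what Python's re.findall returns; the sketch below shows that pattern on the sample line, plus a split-then-filter variant. In Pig, a workaround with the same effect is to TOKENIZE the string on & (newer Pig releases accept a delimiter argument), FLATTEN the tokens, and FILTER the ones that MATCHES 'friend=.*'. The function names are hypothetical.

```python
import re

# Anchor on start-of-string or '&' so keys like 'bestfriend=' are not matched.
FRIEND_RE = re.compile(r"(?:^|&)friend=([^&]*)")


def extract_friends(query):
    """Return every friend= value in a URL query string, in order."""
    return FRIEND_RE.findall(query)


def extract_friends_split(query):
    """Same result via split-then-filter, mirroring TOKENIZE + FILTER in Pig."""
    prefix = "friend="
    return [tok[len(prefix):] for tok in query.split("&") if tok.startswith(prefix)]


line = "user=242562&friend=6226&friend=93856&age=35&friend=35900"
print(extract_friends(line))        # ['6226', '93856', '35900']
print(extract_friends_split(line))  # ['6226', '93856', '35900']
```

The split version avoids regular expressions entirely, which is closer to what the Pig tokenize-and-filter pipeline does.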

Pig Latin split columns to rows

青春壹個敷衍的年華 submitted on 2020-01-30 12:00:26
Question: Is there any way in Pig Latin to transform columns into rows to get the output below?

Input:
id|column1|column2
1|a,b,c|1,2,3
2|d,e,f|4,5,6

Required output:
id|column1|column2
1|a|1
1|b|2
1|c|3
2|d|4
2|e|5
2|f|6

Thanks.

Answer 1: I'm willing to bet this is not the best way to do this, however ...

    data = load 'input' using PigStorage('|') as (id:chararray, col1:chararray, col2:chararray);
    A = foreach data generate id, flatten(TOKENIZE(col1));
    B = foreach data generate id, flatten(TOKENIZE(col2));
    RA =
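Whatever the Pig mechanics, the row-level logic is a positional pairing: the i-th token of column1 goes with the i-th token of column2, repeated under the same id. A minimal Python sketch of that pairing, assuming both comma-separated lists always have the same length (zip silently stops at the shorter one otherwise); `explode_row` is a hypothetical name.

```python
def explode_row(line, sep="|"):
    """Turn 'id|a,b,c|1,2,3' into ['id|a|1', 'id|b|2', 'id|c|3']."""
    row_id, col1, col2 = line.split(sep)
    pairs = zip(col1.split(","), col2.split(","))
    # Pair tokens by position; zip() stops at the shorter list if lengths differ.
    return [sep.join([row_id, a, b]) for a, b in pairs]


lines = ["1|a,b,c|1,2,3", "2|d,e,f|4,5,6"]
print("id|column1|column2")
for line in lines:
    for out in explode_row(line):
        print(out)
```

This prints exactly the required output from the question. The positional index is the part that tokenizing each column separately loses, so any Pig version needs some way to carry it through before matching the two sides back up.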