apache-pig

Java UDF for adding columns

Submitted by 别说谁变了你拦得住时间么 on 2019-12-12 06:09:44
Question: I am writing a Java UDF to add the pincode by comparing against the locality column. Here is my code:

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.commons.lang3.StringUtils;

public class MB_pincodechennai extends EvalFunc<String> {
    private String pincode(String input) {
        String property_pincode = null;
        String[] items = new String[]{"600088", "600016", "600053", "600070", "600040", "600106", "632301", "600109", "600083", "600054",
```
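Setting the Pig `EvalFunc` wrapper aside, the core of such a UDF is a plain locality-to-pincode lookup. A minimal sketch in plain Java (the `EvalFunc` subclass and `Tuple` handling are omitted; the locality names below are hypothetical examples, not taken from the original code):

```java
import java.util.HashMap;
import java.util.Map;

public class PincodeLookup {
    // Hypothetical locality -> pincode mapping; a real UDF would hold the full table.
    private static final Map<String, String> PINCODES = new HashMap<>();
    static {
        PINCODES.put("adambakkam", "600088");
        PINCODES.put("alandur", "600016");
        PINCODES.put("ambattur", "600053");
    }

    // Case-insensitive lookup; returns null for an unknown locality,
    // mirroring the null-by-default style of the original snippet.
    public static String pincode(String locality) {
        if (locality == null) {
            return null;
        }
        return PINCODES.get(locality.trim().toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(pincode("Adambakkam")); // prints 600088
    }
}
```

Inside the real UDF, `exec(Tuple input)` would pull the locality field out of the tuple and delegate to a method like this.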

How to exclude special characters in a string using regular expressions in hive

Submitted by 三世轮回 on 2019-12-12 04:36:09
Question: I want to exclude periods (.) and parentheses ((, )); however, decimal numbers should be left intact. So basically, if the input is:

```text
Hive supports subqueries only in the FROM clause (through Hive 0.12). The subquery has to be given a name because every table in a FROM clause must have a name. Columns in the subquery select list must have unique names.
```

the output should be:

```text
Hive supports subqueries only in the FROM clause through Hive 0.12 The subquery has to be given a name because every table in
```
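One regex that satisfies the sample is "remove parentheses, and remove any period not immediately followed by a digit", which keeps decimal points like the one in `0.12`. Hive's `regexp_replace` uses Java regex syntax, so the pattern can be sketched and checked in plain Java (note this heuristic fits the sample shown; a period that happens to precede a digit, as in `.5`, would survive):

```java
public class StripSpecials {
    // Removes '(' and ')', and any '.' not immediately followed by a digit,
    // so decimal points such as the one in "0.12" are preserved.
    public static String strip(String s) {
        return s.replaceAll("\\.(?!\\d)|[()]", "");
    }

    public static void main(String[] args) {
        String in = "Hive supports subqueries only in the FROM clause (through Hive 0.12).";
        System.out.println(strip(in));
        // Hive supports subqueries only in the FROM clause through Hive 0.12
    }
}
```

In Hive the equivalent call would be `regexp_replace(col, '\\.(?!\\d)|[()]', '')`.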

Pig - Get Max Count

Submitted by 大兔子大兔子 on 2019-12-12 03:38:02
Question: Sample data:

```text
DATE      WindDirection
1/1/2000  SW
1/2/2000  SW
1/3/2000  SW
1/4/2000  NW
1/5/2000  NW
```

Every day is unique and wind direction is not unique, so now we are trying to get the COUNT of the most COMMON wind direction. My query was:

```pig
weather_data = FOREACH Weather GENERATE $16 AS Date, $9 AS w_direction;
e = FOREACH weather_data {
    unique_winds = DISTINCT weather_data.w_direction;
    GENERATE unique_winds, COUNT(unique_winds);
}
dump e;
```

The logic is to find the DISTINCT WindDirections
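The usual Pig approach is GROUP by direction, COUNT each group, ORDER the counts descending, and LIMIT 1. The same logic can be illustrated in plain Java with the five sample rows:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.stream.Collectors;

public class MaxWindCount {
    // Count occurrences per direction and return the highest count,
    // i.e. the COUNT of the most common wind direction.
    public static long maxCount(String[] directions) {
        Map<String, Long> counts = Arrays.stream(directions)
                .collect(Collectors.groupingBy(d -> d, Collectors.counting()));
        return Collections.max(counts.values());
    }

    public static void main(String[] args) {
        String[] sample = {"SW", "SW", "SW", "NW", "NW"};
        System.out.println(maxCount(sample)); // prints 3 (SW occurs three times)
    }
}
```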

Apache PIG - Get only date from TimeStamp

Submitted by 家住魔仙堡 on 2019-12-12 03:37:17
Question: I have the following code (the FOREACH referenced `Source_Data`, aligned here to the `Data` alias it loads):

```pig
Data = LOAD '/user/cloudera/' USING PigStorage('\t')
    AS (ID:chararray, Time_Interval:chararray, Code:chararray);
transf = FOREACH Data GENERATE (int) ID,
    ToString(ToDate((long) Time_Interval), 'yyyy-MM-dd hh:ss:mm') AS TimeStamp,
    (int) Code;
SPLIT transf INTO
    Src25 IF (ToString(TimeStamp, 'yyyy-MM-dd') == '2016-07-25'),
    Src26 IF (ToString(TimeStamp, 'yyyy-MM-dd') == '2016-07-26');
STORE Src25 INTO '/user/cloudera/2016-07-25' USING PigStorage('\t');
STORE Src26
```
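What the script needs from each timestamp is only the date part. Extracting `yyyy-MM-dd` from an epoch-millisecond value, the plain-Java analogue of Pig's `ToString(ToDate(millis), 'yyyy-MM-dd')`, can be sketched with `java.time` (the epoch value below is an arbitrary example, not from the question's data):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class DateOnly {
    private static final DateTimeFormatter DATE_FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    // Formats an epoch-millisecond timestamp as yyyy-MM-dd, dropping the time part.
    public static String dateOnly(long epochMillis) {
        return DATE_FMT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // 2016-07-25T12:00:00Z expressed in epoch milliseconds
        System.out.println(dateOnly(1469448000000L)); // prints 2016-07-25
    }
}
```

Note in passing that the question's pattern `'yyyy-MM-dd hh:ss:mm'` puts seconds before minutes; the conventional pattern would be `'yyyy-MM-dd HH:mm:ss'`.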

Pig Round Decimal to Two Places

Submitted by 风格不统一 on 2019-12-12 03:01:00
Question: Any ideas on how I can round a float data type to 2 decimal places in Apache Pig? For example:

```pig
test = FOREACH (JOIN Load BY (Op1, Op2), Load2 BY (Op3, Op4))
    GENERATE Load2::Number2 * Load::Number1 AS Output;
```

The fields Number1 and Number2 are floats. My current calculations give me 5 to 6 decimal places.

Answer 1: Try this:

```pig
B = FOREACH A GENERATE ROUND((myfloat1 * myfloat2) * 100f) / 100f AS myfloat3;
```

Source: https://stackoverflow.com/questions/15538504/pig-round-decimal-to-two-places
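The answer's idea, scale up, round to an integer, scale back down, looks like this in plain Java, with `Math.round` standing in for Pig's `ROUND` builtin:

```java
public class RoundTwo {
    // Multiply by 100, round to the nearest integer, divide back by 100:
    // the same scale-round-unscale trick as ROUND(x * 100f) / 100f in Pig.
    public static float roundTwo(float x) {
        return Math.round(x * 100f) / 100f;
    }

    public static void main(String[] args) {
        System.out.println(roundTwo(2.345678f)); // prints 2.35
    }
}
```

This rounds the stored value; if only the display needs two decimals, formatting the output (e.g. `%.2f`) avoids touching the data at all.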

scalars can only be used with projection in PIG

Submitted by 穿精又带淫゛_ on 2019-12-12 02:49:05
Question: I am getting the error "scalars can only be used with projections" while using FOREACH. How can I resolve this error? How can I use LIMIT within FOREACH? Please suggest something; thanks in advance.

Edit (Tichdroma): copied code from comment:

```pig
A = LOAD 'part-r-00000';
G = GROUP A BY ($0, $2);
Y = FOREACH G GENERATE FLATTEN(group), FLATTEN($1);
sorted = ORDER Y BY $0 ASC, $1 DESC;
X = FOREACH Y {
    lim = LIMIT sorted 3;
    GENERATE lim;
};
DUMP X;
```

Answer 1: LIMIT is available in Pig 0.9 in the FOREACH nested
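What the nested FOREACH with LIMIT is trying to express is "top 3 per group": order a group's rows and keep the first three. The same idea in plain Java streams (hypothetical data, purely illustrative):

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class TopNPerGroup {
    // Sort one group's values descending and keep at most n of them:
    // the equivalent of ORDER ... DESC followed by LIMIT n in a nested FOREACH.
    public static List<Integer> topN(List<Integer> groupValues, int n) {
        return groupValues.stream()
                .sorted(Comparator.reverseOrder())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(topN(List.of(4, 9, 1, 7, 3), 3)); // prints [9, 7, 4]
    }
}
```

In Pig the ORDER and LIMIT would both live inside one FOREACH over the grouped relation, operating on the group's bag rather than on an outer alias.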

Not able to export Hbase table into CSV file using HUE Pig Script

Submitted by 若如初见. on 2019-12-12 02:43:28
Question: I have installed Apache Ambari and configured Hue. I want to export HBase table data into a CSV file using a Pig script, but I am getting the following error:

```text
2017-06-03 10:27:45,518 [ATS Logger 0] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Exception caught by TimelineClientConnectionRetry, will try 30 more time(s). Message: java.net.ConnectException: Connection refused
2017-06-03 10:27:45,703 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
```

Calculating percentage using PIG latin

Submitted by 扶醉桌前 on 2019-12-12 02:36:49
Question: I have a table with two columns (code:chararray, sp:double). I want to calculate the percentage of every sp.

INPUT:

```text
t001 60
a002 75
a003 34
bb04 56
bbc5 23
cc2c 45
ddc5 45
```

Desired OUTPUT:

```text
code  Perc
t001  17%
a002  22%
a003  10%
bb04  16.5%
bbc5  6%
cc2c  13.3%
ddc5  13.3%
```

I tried like this, but the output is not coming:

```pig
A = LOAD '....' AS (code:chararray, sp:double);
B = GROUP A BY (code);
allcount = FOREACH B GENERATE SUM(A.speed) AS total;
perc = FOREACH A GENERATE code, speed / (double) allcount.total *
```
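The percentage computation itself, each value divided by the grand total (in Pig, a GROUP ALL followed by SUM) times 100, can be sketched in plain Java using the sample sp column. The exact figures differ slightly from the question's desired output, which appears to round loosely:

```java
import java.util.Arrays;

public class Percentages {
    // Percentage of each value relative to the sum of all values.
    public static double[] percentages(double[] values) {
        double total = Arrays.stream(values).sum();
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] / total * 100.0;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] sp = {60, 75, 34, 56, 23, 45, 45}; // sample sp column; total = 338
        System.out.printf("t001 %.1f%%%n", percentages(sp)[0]);
    }
}
```

Note the question's script groups by `code` before summing, which yields one total per code rather than a grand total, and refers to `speed` where the schema declares `sp`; both likely contribute to "the output is not coming".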

Unix Shell Script as UDF for Pig and Hive

Submitted by 偶尔善良 on 2019-12-12 02:26:55
Question: Can we use a Unix shell script instead of Java or Python for user-defined functions in Apache Pig and Hive? If it is possible, how do we reference it in a Hive query or Pig script?

Answer 1: No, you can't use a Unix shell script as a Pig UDF. Pig UDFs are currently supported in only six languages: Java, Jython, Python, JavaScript, Ruby, and Groovy. Please refer to this link for more details: http://pig.apache.org/docs/r0.14.0/udf.html

Source: https://stackoverflow.com/questions/27415656/unix-shell-script-as-udf-for-pig

JsonLoader throws error in pig

Submitted by 為{幸葍}努か on 2019-12-12 02:23:40
Question: I am unable to decode this simple JSON, and I don't know what I am doing wrong. Please help me with this Pig script. I have to decode the below data in JSON format.

3.json:

```json
{
    "id": 6668,
    "source_name": "National Stock Exchange of India",
    "source_code": "NSE"
}
```

My Pig script is:

```pig
a = LOAD '3.json' USING org.apache.pig.builtin.JsonLoader('id:int, source_name:chararray, source_code:chararray');
dump a;
```

The error I get is given below:

```text
2015-07-23 13:40:08,715 [LocalJobRunner Map Task Executor #0] INFO
```