apache-pig

Java UDF for adding columns

Submitted by 别说谁变了你拦得住时间么 on 2019-12-12 06:09:44
Question: I am writing a Java UDF to add the pincode by comparing against the locality column. Here is my code:

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.commons.lang3.StringUtils;

public class MB_pincodechennai extends EvalFunc<String> {
    private String pincode(String input) {
        String property_pincode = null;
        String[] items = new String[]{"600088", "600016", "600053", "600070", "600040", "600106", "632301", "600109", "600083", "600054",
```
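Setting the Pig `EvalFunc` wrapper aside, the core of such a UDF is a plain locality-to-pincode lookup. A minimal sketch in plain Java (the `EvalFunc` subclass and `Tuple` handling are omitted; the locality names below are hypothetical examples, not taken from the original code):

```java
import java.util.HashMap;
import java.util.Map;

public class PincodeLookup {
    // Hypothetical locality -> pincode mapping; a real UDF would hold the full table.
    private static final Map<String, String> PINCODES = new HashMap<>();
    static {
        PINCODES.put("adambakkam", "600088");
        PINCODES.put("alandur", "600016");
        PINCODES.put("ambattur", "600053");
    }

    // Case-insensitive lookup; returns null for an unknown locality,
    // mirroring the null-by-default style of the original snippet.
    public static String pincode(String locality) {
        if (locality == null) {
            return null;
        }
        return PINCODES.get(locality.trim().toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(pincode("Adambakkam")); // prints 600088
    }
}
```

Inside the real UDF, `exec(Tuple input)` would pull the locality field out of the tuple and delegate to a method like this.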

How to exclude special characters in a string using regular expressions in hive

Submitted by 三世轮回 on 2019-12-12 04:36:09
Question: I want to exclude periods (.) and parentheses ((, )); however, decimal numbers should be left intact. So basically, if the input is:

```text
Hive supports subqueries only in the FROM clause (through Hive 0.12). The subquery has to be given a name because every table in a FROM clause must have a name. Columns in the subquery select list must have unique names.
```

the output should be:

```text
Hive supports subqueries only in the FROM clause through Hive 0.12 The subquery has to be given a name because every table in
```
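One regex that satisfies the sample is "remove parentheses, and remove any period not immediately followed by a digit", which keeps decimal points like the one in `0.12`. Hive's `regexp_replace` uses Java regex syntax, so the pattern can be sketched and checked in plain Java (note this heuristic fits the sample shown; a period that happens to precede a digit, as in `.5`, would survive):

```java
public class StripSpecials {
    // Removes '(' and ')', and any '.' not immediately followed by a digit,
    // so decimal points such as the one in "0.12" are preserved.
    public static String strip(String s) {
        return s.replaceAll("\\.(?!\\d)|[()]", "");
    }

    public static void main(String[] args) {
        String in = "Hive supports subqueries only in the FROM clause (through Hive 0.12).";
        System.out.println(strip(in));
        // Hive supports subqueries only in the FROM clause through Hive 0.12
    }
}
```

In Hive the equivalent call would be `regexp_replace(col, '\\.(?!\\d)|[()]', '')`.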

Pig - Get Max Count

Submitted by 大兔子大兔子 on 2019-12-12 03:38:02
Question: Sample data:

```text
DATE      WindDirection
1/1/2000  SW
1/2/2000  SW
1/3/2000  SW
1/4/2000  NW
1/5/2000  NW
```

Every day is unique and wind direction is not unique, so now we are trying to get the COUNT of the most COMMON wind direction. My query was:

```pig
weather_data = FOREACH Weather GENERATE $16 AS Date, $9 AS w_direction;
e = FOREACH weather_data {
    unique_winds = DISTINCT weather_data.w_direction;
    GENERATE unique_winds, COUNT(unique_winds);
}
dump e;
```

The logic is to find the DISTINCT WindDirections
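The usual Pig approach is GROUP by direction, COUNT each group, ORDER the counts descending, and LIMIT 1. The same logic can be illustrated in plain Java with the five sample rows:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.stream.Collectors;

public class MaxWindCount {
    // Count occurrences per direction and return the highest count,
    // i.e. the COUNT of the most common wind direction.
    public static long maxCount(String[] directions) {
        Map<String, Long> counts = Arrays.stream(directions)
                .collect(Collectors.groupingBy(d -> d, Collectors.counting()));
        return Collections.max(counts.values());
    }

    public static void main(String[] args) {
        String[] sample = {"SW", "SW", "SW", "NW", "NW"};
        System.out.println(maxCount(sample)); // prints 3 (SW occurs three times)
    }
}
```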

Apache PIG - Get only date from TimeStamp

Submitted by 家住魔仙堡 on 2019-12-12 03:37:17
Question: I have the following code (the FOREACH referenced `Source_Data`, aligned here to the `Data` alias it loads):

```pig
Data = LOAD '/user/cloudera/' USING PigStorage('\t')
    AS (ID:chararray, Time_Interval:chararray, Code:chararray);
transf = FOREACH Data GENERATE (int) ID,
    ToString(ToDate((long) Time_Interval), 'yyyy-MM-dd hh:ss:mm') AS TimeStamp,
    (int) Code;
SPLIT transf INTO
    Src25 IF (ToString(TimeStamp, 'yyyy-MM-dd') == '2016-07-25'),
    Src26 IF (ToString(TimeStamp, 'yyyy-MM-dd') == '2016-07-26');
STORE Src25 INTO '/user/cloudera/2016-07-25' USING PigStorage('\t');
STORE Src26
```
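What the script needs from each timestamp is only the date part. Extracting `yyyy-MM-dd` from an epoch-millisecond value, the plain-Java analogue of Pig's `ToString(ToDate(millis), 'yyyy-MM-dd')`, can be sketched with `java.time` (the epoch value below is an arbitrary example, not from the question's data):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class DateOnly {
    private static final DateTimeFormatter DATE_FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    // Formats an epoch-millisecond timestamp as yyyy-MM-dd, dropping the time part.
    public static String dateOnly(long epochMillis) {
        return DATE_FMT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // 2016-07-25T12:00:00Z expressed in epoch milliseconds
        System.out.println(dateOnly(1469448000000L)); // prints 2016-07-25
    }
}
```

Note in passing that the question's pattern `'yyyy-MM-dd hh:ss:mm'` puts seconds before minutes; the conventional pattern would be `'yyyy-MM-dd HH:mm:ss'`.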

Pig Round Decimal to Two Places

Submitted by 风格不统一 on 2019-12-12 03:01:00
Question: Any ideas on how I can round a float data type to 2 decimal places in Apache Pig? For example:

```pig
test = FOREACH (JOIN Load BY (Op1, Op2), Load2 BY (Op3, Op4))
    GENERATE Load2::Number2 * Load::Number1 AS Output;
```

The fields Number1 and Number2 are floats. My current calculations give me 5 to 6 decimal places.

Answer 1: Try this:

```pig
B = FOREACH A GENERATE ROUND((myfloat1 * myfloat2) * 100f) / 100f AS myfloat3;
```

Source: https://stackoverflow.com/questions/15538504/pig-round-decimal-to-two-places
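The answer's idea, scale up, round to an integer, scale back down, looks like this in plain Java, with `Math.round` standing in for Pig's `ROUND` builtin:

```java
public class RoundTwo {
    // Multiply by 100, round to the nearest integer, divide back by 100:
    // the same scale-round-unscale trick as ROUND(x * 100f) / 100f in Pig.
    public static float roundTwo(float x) {
        return Math.round(x * 100f) / 100f;
    }

    public static void main(String[] args) {
        System.out.println(roundTwo(2.345678f)); // prints 2.35
    }
}
```

This rounds the stored value; if only the display needs two decimals, formatting the output (e.g. `%.2f`) avoids touching the data at all.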

scalars can only be used with projection in PIG

Submitted by 穿精又带淫゛_ on 2019-12-12 02:49:05
Question: I am getting the error "scalars can only be used with projections" while using FOREACH. How can I resolve this error? How can I use LIMIT within FOREACH? Please suggest something; thanks in advance.

Edit (Tichdroma): copied code from comment:

```pig
A = LOAD 'part-r-00000';
G = GROUP A BY ($0, $2);
Y = FOREACH G GENERATE FLATTEN(group), FLATTEN($1);
sorted = ORDER Y BY $0 ASC, $1 DESC;
X = FOREACH Y {
    lim = LIMIT sorted 3;
    GENERATE lim;
};
DUMP X;
```

Answer 1: LIMIT is available in Pig 0.9 in the FOREACH nested
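What the nested FOREACH with LIMIT is trying to express is "top 3 per group": order a group's rows and keep the first three. The same idea in plain Java streams (hypothetical data, purely illustrative):

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class TopNPerGroup {
    // Sort one group's values descending and keep at most n of them:
    // the equivalent of ORDER ... DESC followed by LIMIT n in a nested FOREACH.
    public static List<Integer> topN(List<Integer> groupValues, int n) {
        return groupValues.stream()
                .sorted(Comparator.reverseOrder())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(topN(List.of(4, 9, 1, 7, 3), 3)); // prints [9, 7, 4]
    }
}
```

In Pig the ORDER and LIMIT would both live inside one FOREACH over the grouped relation, operating on the group's bag rather than on an outer alias.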

Not able to export Hbase table into CSV file using HUE Pig Script

Submitted by 若如初见. on 2019-12-12 02:43:28
Question: I have installed Apache Ambari and configured Hue. I want to export HBase table data into a CSV file using a Pig script, but I am getting the following error:

```text
2017-06-03 10:27:45,518 [ATS Logger 0] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Exception caught by TimelineClientConnectionRetry, will try 30 more time(s). Message: java.net.ConnectException: Connection refused
2017-06-03 10:27:45,703 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
```

Calculating percentage using PIG latin

Submitted by 扶醉桌前 on 2019-12-12 02:36:49
Question: I have a table with two columns (code:chararray, sp:double). I want to calculate the percentage of every sp.

INPUT:

```text
t001 60
a002 75
a003 34
bb04 56
bbc5 23
cc2c 45
ddc5 45
```

Desired OUTPUT:

```text
code  Perc
t001  17%
a002  22%
a003  10%
bb04  16.5%
bbc5  6%
cc2c  13.3%
ddc5  13.3%
```

I tried like this, but the output is not coming:

```pig
A = LOAD '....' AS (code:chararray, sp:double);
B = GROUP A BY (code);
allcount = FOREACH B GENERATE SUM(A.speed) AS total;
perc = FOREACH A GENERATE code, speed / (double) allcount.total *
```
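The percentage computation itself, each value divided by the grand total (in Pig, a GROUP ALL followed by SUM) times 100, can be sketched in plain Java using the sample sp column. The exact figures differ slightly from the question's desired output, which appears to round loosely:

```java
import java.util.Arrays;

public class Percentages {
    // Percentage of each value relative to the sum of all values.
    public static double[] percentages(double[] values) {
        double total = Arrays.stream(values).sum();
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] / total * 100.0;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] sp = {60, 75, 34, 56, 23, 45, 45}; // sample sp column; total = 338
        System.out.printf("t001 %.1f%%%n", percentages(sp)[0]);
    }
}
```

Note the question's script groups by `code` before summing, which yields one total per code rather than a grand total, and refers to `speed` where the schema declares `sp`; both likely contribute to "the output is not coming".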

Unix Shell Script as UDF for Pig and Hive

Submitted by 偶尔善良 on 2019-12-12 02:26:55
Question: Can we use a Unix shell script instead of Java or Python for user-defined functions in Apache Pig and Hive? If it is possible, how do we reference it in a Hive query or Pig script?

Answer 1: No, you can't use a Unix shell script as a Pig UDF. Pig UDFs are currently supported in only six languages: Java, Jython, Python, JavaScript, Ruby, and Groovy. Please refer to this link for more details: http://pig.apache.org/docs/r0.14.0/udf.html

Source: https://stackoverflow.com/questions/27415656/unix-shell-script-as-udf-for-pig

JsonLoader throws error in pig

Submitted by 為{幸葍}努か on 2019-12-12 02:23:40
Question: I am unable to decode this simple JSON, and I don't know what I am doing wrong. Please help me with this Pig script. I have to decode the below data in JSON format.

3.json:

```json
{
    "id": 6668,
    "source_name": "National Stock Exchange of India",
    "source_code": "NSE"
}
```

My Pig script is:

```pig
a = LOAD '3.json' USING org.apache.pig.builtin.JsonLoader('id:int, source_name:chararray, source_code:chararray');
dump a;
```

The error I get is given below:

```text
2015-07-23 13:40:08,715 [LocalJobRunner Map Task Executor #0] INFO
```