apache-pig

Loading JSON file with serde in Cloudera

Submitted by 我的梦境 on 2019-12-31 07:18:12
Question: I am trying to work with a JSON file with this bag structure: { "user_id": "kim95", "type": "Book", "title": "Modern Database Systems: The Object Model, Interoperability, and Beyond.", "year": "1995", "publisher": "ACM Press and Addison-Wesley", "authors": [ { "name": "null" } ], "source": "DBLP" } { "user_id": "marshallo79", "type": "Book", "title": "Inequalities: Theory of Majorization and Its Application.", "year": "1979", "publisher": "Academic Press", "authors": [ { "name": "Albert W. …
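A common way to load line-delimited JSON like this in Pig (including on Cloudera clusters) is Elephant Bird's JsonLoader, which returns each record as a map. A hedged sketch, assuming the Elephant Bird jars are available (the jar names and input path below are placeholders, not from the question):

```pig
-- Sketch only: register the Elephant Bird jars (placeholder paths).
REGISTER elephant-bird-core.jar;
REGISTER elephant-bird-pig.jar;
REGISTER elephant-bird-hadoop-compat.jar;

-- '-nestedLoad' keeps nested structures such as the "authors" array.
books = LOAD '/path/to/books.json'
        USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');

-- Each record is a map; fields are pulled out with the # operator.
titles = FOREACH books GENERATE (chararray) $0#'title' AS title,
                                (chararray) $0#'year'  AS year;
DUMP titles;
```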

Transform bag of key-value tuples to map in Apache Pig

Submitted by 允我心安 on 2019-12-30 04:37:06
Question: I am new to Pig and I want to convert a bag of tuples to a map, using a specific value in each tuple as the key. Basically I want to change {(id1, value1),(id2, value2), ...} into [id1#value1, id2#value2]. I've been looking around online for a while, but I can't seem to find a solution. I've tried: bigQMap = FOREACH bigQFields GENERATE TOMAP(queryId, queryStart); but I end up with a bag of maps (e.g. {[id1#value1], [id2#value2], ...}), which is not what I want. How can I build up a map out of a bag …
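One commonly cited pattern for this (Pig 0.11+) is to collapse the bag into a single tuple with the builtin BagToTuple, so that TOMAP sees one alternating key/value argument list instead of one pair per input tuple. A hedged sketch, reusing the field names from the question and assuming all pairs should land in one map:

```pig
-- Group everything into one bag, flatten it to (id1, value1, id2, value2, ...),
-- and let TOMAP pair up the alternating arguments into a single map.
grouped = GROUP bigQFields ALL;
asMap   = FOREACH grouped GENERATE
          TOMAP(FLATTEN(BagToTuple(bigQFields.(queryId, queryStart))));
```

If the maps should instead be built per group rather than over the whole relation, the same FOREACH works after a GROUP BY on the grouping key.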

how to include external jar file using PIG

Submitted by 好久不见. on 2019-12-29 06:45:09
Question: When I run a MapReduce job with the hadoop command, I use -libjars to add my jar to the cache and the classpath. How do I do something like this in Pig? Answer 1: register /local/path/to/myJar.jar Answer 2: There are two ways to add external jars to the Pig environment: start Pig with "-Dpig.additional.jars" (pig -Dpig.additional.jars=/local/path/to/your.jar), or use the "register" command in Pig scripts or grunt (register /local/path/to/your.jar;). Use whichever fits your requirement. Answer 3: An extension to …
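The two options from the answers side by side, as a short sketch (paths are the illustrative ones from the answers):

```pig
-- Option 1: register the jar from inside the script or the grunt shell.
REGISTER /local/path/to/your.jar;

-- Option 2: pass it on the command line when starting Pig instead:
--   pig -Dpig.additional.jars=/local/path/to/your.jar myscript.pig
```

Registering in the script keeps the dependency visible next to the UDFs that need it; the -D flag is convenient when the jar location varies per environment.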

how to load files with different delimiter each time in piglatin

Submitted by 亡梦爱人 on 2019-12-29 01:43:47
Question: Data from input sources has different delimiters, such as , or ; — sometimes it is , and sometimes ;. But the PigStorage function accepts only a single delimiter argument at a time. How can this kind of data [with delimiter , or ;] be loaded? Answer 1: Can you check whether this works for you? It handles input files with different delimiters, and even the same file with different delimiters. You can add as many delimiters as you like inside the character class [,:,]. Example: input1.txt 1,2,3,4 input2.txt a-b …
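The character-class idea from the answer can be sketched like this: read each line whole, then split with STRSPLIT, whose second argument is a regular expression, so a class such as [,;] matches either delimiter (input path is a placeholder):

```pig
-- Load each line as a single chararray, then split on comma OR semicolon.
lines  = LOAD '/path/to/input' USING TextLoader() AS (line:chararray);
fields = FOREACH lines GENERATE FLATTEN(STRSPLIT(line, '[,;]'));
```

Extra delimiters are handled by extending the character class, e.g. '[,;|-]'.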

How to force STORE (overwrite) to HDFS in Pig?

Submitted by 半城伤御伤魂 on 2019-12-28 05:39:04
Question: When developing Pig scripts that use the STORE command, I have to delete the output directory before every run or the script stops and reports: 2012-06-19 19:22:49,680 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 6000: Output Location Validation Failed for: 'hdfs://[server]/user/[user]/foo/bar More info to follow: Output directory hdfs://[server]/user/[user]/foo/bar already exists. So I'm searching for an in-Pig solution that automatically removes the directory, ideally one that doesn't choke …
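A sketch of the usual in-Pig answer: the grunt command rmf ("remove, force") deletes the output path if it exists and is a no-op if it doesn't, so re-runs no longer abort (the relation name result is a placeholder; the path is the one from the question):

```pig
-- Force-remove the output directory; succeeds even when it is absent.
rmf hdfs://[server]/user/[user]/foo/bar;
STORE result INTO 'hdfs://[server]/user/[user]/foo/bar';
```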

PIG - Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

Submitted by 泪湿孤枕 on 2019-12-28 04:28:07
Question: I was trying to load a table from Hive, using HCatalog for that. I logged into Hive using pig -useHCatalog and registered almost all the jars from Hive and Hadoop: register 'hdfs://localhost:8020/user/pig/jars/hive-jdbc-0.10.0-cdh4.5.0.jar'; register 'hdfs://localhost:8020/user/pig/jars/hive-exec-0.10.0-cdh4.5.0.jar'; register 'hdfs://localhost:8020/user/pig/jars/hive-common-0.10.0-cdh4.5.0.jar'; register 'hdfs://localhost:8020/user/pig/jars/hive-metastore-0.10.0-cdh4.5.0.jar'; register 'hdfs:/ …

Pig gives me this error when I tried dump the data

Submitted by 强颜欢笑 on 2019-12-25 16:08:31
Question: I used the following three statements to read data that was present in HDFS and then dump it, using Pig in MapReduce mode. It gives me the huge error below; could somebody please explain it to me or provide a solution? grunt> a= load '/temp' AS (name:chararray, age:int, salary:int); grunt> b= foreach a generate (name, salary); grunt> dump b; 2017-04-19 20:47:00,463 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN 2017-04-19 20:47:00,544 …
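Whatever the truncated stack trace says, one thing worth checking in the script itself: in Pig, (name, salary) inside a GENERATE builds a single tuple-typed column rather than projecting two columns. A hedged sketch of the likely intended script:

```pig
a = LOAD '/temp' AS (name:chararray, age:int, salary:int);
-- GENERATE (name, salary) would yield one tuple field per record;
-- dropping the parentheses projects name and salary as two columns.
b = FOREACH a GENERATE name, salary;
DUMP b;
```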

Storing Date and Time In PIG

Submitted by 给你一囗甜甜゛ on 2019-12-25 14:47:28
Question: I am trying to store a txt file that has two columns, date and time respectively, something like this: 1999-01-01 12:08:56. Now I want to perform some date operations using Pig, but I want to store the date and time like this: 1999-01-01T12:08:56 (I checked this link: http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html). What I want to know is what kind of format I can use so that my date and time are in one column, so that I can feed it to Pig, and then how to load that …
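A hedged sketch of one way to do this without changing the input file: load date and time as two chararrays, join them with a literal 'T', and convert with the builtin ToDate (Pig 0.11+), which accepts a SimpleDateFormat pattern like the one in the linked Javadoc (the input path is a placeholder):

```pig
-- Split on the single space between the date and time columns.
raw = LOAD '/path/to/input.txt' USING PigStorage(' ')
      AS (d:chararray, t:chararray);

-- Rebuild the ISO-style string and parse it into a Pig datetime.
ts  = FOREACH raw GENERATE
      ToDate(CONCAT(CONCAT(d, 'T'), t), 'yyyy-MM-dd\'T\'HH:mm:ss') AS event_time;
```

From there, the usual datetime builtins (GetYear, SubtractDuration, and so on) can operate on event_time directly.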

Pig 0.13.0 Installation on windows 8

Submitted by 試著忘記壹切 on 2019-12-25 14:41:24
Question: I could get into the grunt shell in Pig 0.13.0 on Windows. When trying to load a simple file from HDFS and dump it, the following error occurs: 2014-10-13 17:29:45,167 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.hadoop.mapreduce.JobContext Details at logfile: C:\hadoop-2.5.1\logs\pig_1413201361692.log Has anyone ever faced this error? I need a solution for it. Answer 1: I have solved this issue by building Pig with ant from the …