bigdata

How to pass Hive conf variable in hive udf?

Submitted by 我怕爱的太早我们不能终老 on 2019-12-02 10:28:30
Question: I want to pass a Hive conf variable to a Hive UDF. Below is a code snippet: hive -f ../hive/testHive.sql -hivevar testArg=${testArg} And below is the Hive UDF call: select setUserDefinedValueForColumn(columnName,'${testArg}') from testTable; In the UDF I am getting the value of testArg as null. Please advise me on how to use a Hive conf variable in a UDF and how to access the Hive configuration in a Hive UDF. Answer 1: I think that you should pass the Hive variable as 'hiveconf' using the command below: hive --hiveconf testArg="my test
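For reference, a minimal sketch of the 'hiveconf' pattern the answer is pointing at, using the script and variable names from the question; note the hiveconf: prefix when the value is referenced inside the script:

hive --hiveconf testArg="my test args" -f ../hive/testHive.sql

-- inside testHive.sql (query shape assumed from the question):
select setUserDefinedValueForColumn(columnName, '${hiveconf:testArg}') from testTable;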

Load JSON array into Pig

Submitted by 倖福魔咒の on 2019-12-02 09:26:54
I have a JSON file in the following format: [ { "id": 2, "createdBy": 0, "status": 0, "utcTime": "Oct 14, 2014 4:49:47 PM", "placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia", "longitude": 77.5983817, "latitude": 12.9832418, "createdDate": "Sep 16, 2014 2:59:03 PM", "accuracy": 5, "loginType": 1, "mobileNo": "0000005567" }, { "id": 4, "createdBy": 0, "status": 0, "utcTime": "Oct 14, 2014 4:52:48 PM", "placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia", "longitude": 77.5983817, "latitude": 12.9832418, "createdDate": "Oct 8, 2014
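The excerpt is cut off before any answer appears. For reference, one commonly used approach is the elephant-bird JsonLoader; this is only a sketch, assuming the data sits in a file named places.json with one JSON object per line (a pretty-printed top-level array would first need to be flattened to line-delimited JSON), and the jar name is illustrative:

REGISTER elephant-bird-pig.jar;  -- plus its hadoop-compat and json-simple dependencies
records = LOAD 'places.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
ids = FOREACH records GENERATE $0#'id' AS id, $0#'mobileNo' AS mobileNo;
DUMP ids;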

Why is my large JSF data table not populating only in IE?

Submitted by 假如想象 on 2019-12-02 07:21:11
I am trying to generate a table dynamically using HtmlDataTable in JSF. When I set the number of rows and columns to more than 25 each, some of the cells are not populated, but only in IE, and rendering becomes very slow. However, I can see the values when debugging the code using Firebug. It works fine in Firefox and Chrome. How is this caused and how can I solve it? BalusC: Internet Explorer is known to have an extremely poor table renderer, especially when the column count and table nesting become excessive. There's no other solution than making your table smaller by introducing lazy
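The advice is cut off at "lazy" (presumably lazy loading or pagination). A minimal sketch of the idea for a dynamically built table, with the 25-row figure taken from the question and the helper class invented purely for illustration:

import javax.faces.component.html.HtmlDataTable;

public class PagedTableBuilder {
    // Hypothetical helper: cap how many rows the generated table renders per request.
    public static HtmlDataTable build() {
        HtmlDataTable table = new HtmlDataTable();
        table.setRows(25);   // render at most 25 rows at a time
        table.setFirst(0);   // first visible row; advance it from pager controls
        return table;
    }
}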

I am trying to get a list of all the authors who have had more than 3 pieces of work - DBpedia SPARQL

Submitted by 家住魔仙堡 on 2019-12-02 07:13:40
Question: I am trying to get a list of all the authors who have had 3 or more pieces of work done (in DBpedia). My example can be run on: http://dbpedia.org/sparql Base code: select (count(?work) as ?totalWork), ?author Where { ?work dbo:author ?author. } GROUP BY ?author With this I get each author's total number of pieces of work. But when I try to filter to show only the list of authors that have more than 3 pieces of work, I get an error. I tried the HAVING keyword and the FILTER keyword. Using FILTER: select (count(
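The attempt is cut off mid-query. For reference, the aggregate-plus-HAVING form the question is reaching for looks like this; it can be pasted into the endpoint linked above, the >= 3 threshold matches the question, and the dbo:author pattern is taken from the base query:

PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?author (COUNT(?work) AS ?totalWork)
WHERE {
  ?work dbo:author ?author .
}
GROUP BY ?author
HAVING (COUNT(?work) >= 3)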

How to pass Hive conf variable in hive udf?

Submitted by 老子叫甜甜 on 2019-12-02 06:38:26
I want to pass a Hive conf variable to a Hive UDF. Below is a code snippet: hive -f ../hive/testHive.sql -hivevar testArg=${testArg} And below is the Hive UDF call: select setUserDefinedValueForColumn(columnName,'${testArg}') from testTable; In the UDF I am getting the value of testArg as null. Please advise me on how to use a Hive conf variable in a UDF and how to access the Hive configuration in a Hive UDF. I think that you should pass the Hive variable as 'hiveconf' using the command below: hive --hiveconf testArg="my test args" -f ../hive/testHive.sql Then you may have the code below inside a GenericUDF evaluate() method: @Override
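The answer breaks off right after "@Override". Below is only a sketch of how a GenericUDF can pick the value up: it assumes the variable is read in configure(MapredContext), which Hive calls only when the query actually runs as a MapReduce/Tez task, and the class body is illustrative rather than the original UDF:

import org.apache.hadoop.hive.ql.exec.MapredContext;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class SetUserDefinedValueForColumn extends GenericUDF {
    private String testArg;

    @Override
    public void configure(MapredContext context) {
        // Values passed with --hiveconf end up in the job configuration.
        testArg = context.getJobConf().get("testArg");
    }

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) {
        return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        return testArg;  // combine with the column argument as needed
    }

    @Override
    public String getDisplayString(String[] children) {
        return "setUserDefinedValueForColumn()";
    }
}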

Hadoop 2 IOException only when trying to open supposed cache files

Submitted by |▌冷眼眸甩不掉的悲伤 on 2019-12-02 06:33:47
I recently updated to Hadoop 2.2 (using this tutorial here). My main job class looks like this, and throws an IOException: import java.io.*; import java.net.*; import java.util.*; import java.util.regex.*; import org.apache.hadoop.conf.*; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.*; import org.apache.hadoop.mapreduce.*; import org.apache.hadoop.mapreduce.lib.chain.*; import org.apache.hadoop.mapreduce.lib.input.*; import org.apache.hadoop.mapreduce.lib.output.*; import org.apache.hadoop.mapreduce.lib.reduce.*; public class UFOLocation2 { public static class MapClass extends
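The excerpt stops before the code that actually touches the cache files, so the class below is not the poster's code; it is only a sketch of the Hadoop 2 distributed-cache API that this symptom usually involves, with the HDFS path and file name assumed:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheFileSketch {

    public static class MapClass extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // Hadoop 2: files registered with job.addCacheFile() are listed here.
            URI[] cached = context.getCacheFiles();
            // Each file is symlinked into the task's working directory under its own
            // name (or the #alias fragment, if one was supplied on the URI).
            String localName = new File(cached[0].getPath()).getName();
            try (BufferedReader reader = new BufferedReader(new FileReader(localName))) {
                reader.readLine();  // read the lookup data as needed
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "UFOLocation2");
        job.setJarByClass(CacheFileSketch.class);
        job.addCacheFile(new URI("/user/hduser/lookup/locations.txt"));  // assumed path
    }
}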

Hadoop hdfs showing ls: `/home/hduser/input/': No such file or directory error

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-02 05:15:26
Question: I have installed Hadoop 2.6 on a single machine using This Tutorial. I am using an Ubuntu 12.04 machine and Java version 1.6.0_27. I have created a separate user, hduser, for Hadoop operations. I have set the HADOOP_HOME environment variable to /usr/local/hadoop, where I have extracted the Hadoop distribution. Now I am following an example, but when I execute the command $HADOOP_HOME/bin/hdfs dfs -ls /home/hduser/input/ it gives the following error - 15/01/02 18:32:38 WARN util.NativeCodeLoader: Unable
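The error output is cut off after the NativeCodeLoader warning. A common cause of this particular message is that /home/hduser/input/ exists on the local filesystem but not in HDFS, which is what hdfs dfs -ls actually queries; a sketch of creating and populating the HDFS directory before listing it (the local file name here is an assumption):

$HADOOP_HOME/bin/hdfs dfs -mkdir -p /home/hduser/input
$HADOOP_HOME/bin/hdfs dfs -put /home/hduser/input/sample.txt /home/hduser/input/
$HADOOP_HOME/bin/hdfs dfs -ls /home/hduser/input/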

PySpark: inconsistency in converting timestamp to integer in dataframe

Submitted by 坚强是说给别人听的谎言 on 2019-12-02 03:29:12
Question: I have a dataframe with a rough structure like the following:
+-------------------------+-------------------------+--------+
| timestamp               | adj_timestamp           | values |
+-------------------------+-------------------------+--------+
| 2017-05-31 15:30:48.000 | 2017-05-31 11:30:00.000 | 0      |
+-------------------------+-------------------------+--------+
| 2017-05-31 15:31:45.000 | 2017-05-31 11:30:00.000 | 0      |
+-------------------------+-------------------------+--------+
| 2017-05-31 15:32:49.000 |
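The excerpt ends mid-table, before the conversion code, so the snippet below is only a sketch of one common way to turn such timestamp columns into integer epoch seconds consistently; df stands for the dataframe shown above and the output column names are made up:

from pyspark.sql import functions as F

# unix_timestamp() on a true timestamp column yields epoch seconds directly,
# avoiding the string-parsing step where inconsistencies often creep in.
df2 = (df
       .withColumn("timestamp_int", F.unix_timestamp(F.col("timestamp")))
       .withColumn("adj_timestamp_int", F.unix_timestamp(F.col("adj_timestamp"))))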

Unique Key generation in Hive/Hadoop

Submitted by 亡梦爱人 on 2019-12-02 03:22:24
Question: While selecting a set of records from a big-data Hive table, a unique key needs to be created for each record. In a sequential mode of operation, it is easy to generate a unique id by calling something like max(id). Since Hive runs the task in parallel, how can we generate a unique key as part of a select query without compromising the performance of Hadoop? Is this really a map-reduce problem, or do we need to go for a sequential approach to solve it? Answer 1: If for some reason you do not want
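The answer is cut off at "if for some reason you do not want". Two approaches commonly used for this in practice, sketched against a placeholder table name (neither is necessarily what the answer goes on to recommend):

-- A globally unique string key per row (behaviour can vary by Hive version):
SELECT reflect('java.util.UUID', 'randomUUID') AS row_key, t.* FROM source_table t;

-- A dense numeric key; numbering over an empty OVER() funnels all rows through one reducer:
SELECT row_number() OVER () AS row_key, t.* FROM source_table t;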

Are Data Lake and Big Data the same?

Submitted by 非 Y 不嫁゛ on 2019-12-02 02:27:14
I am trying to understand whether there is a real difference between a data lake and big data. If you check the concepts, both are like a big repository which saves the information until it becomes necessary. So, when can we say that we are using big data versus a data lake? Thanks in advance. I can't say I've come across the term 'big repository' before, but to answer the original question, no, a data lake and big data are not the same, although in fairness both terms are thrown around a lot and the definitions vary depending on who you ask, but I'll try to give it a shot: Big Data is used to describe both the