bigdata

How to pass Hive conf variable in hive udf?

Submitted by 我怕爱的太早我们不能终老 on 2019-12-02 10:28:30
Question: I want to pass a Hive conf variable to a Hive UDF. Below is a code snippet: hive -f ../hive/testHive.sql -hivevar testArg=${testArg} And below is the Hive UDF call: select setUserDefinedValueForColumn(columnName,'${testArg}') from testTable; In the UDF I am getting the value of testArg as null. Please advise me on how to use a Hive conf variable in a UDF and how to access the Hive configuration in a Hive UDF. Answer 1: I think that you should pass the Hive variable as 'hiveconf' using the command below: hive --hiveconf testArg="my test
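For reference, a minimal sketch of the 'hiveconf' pattern the answer is pointing at, using the script and variable names from the question; note the hiveconf: prefix when the value is referenced inside the script:

hive --hiveconf testArg="my test args" -f ../hive/testHive.sql

-- inside testHive.sql (query shape assumed from the question):
select setUserDefinedValueForColumn(columnName, '${hiveconf:testArg}') from testTable;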

Load JSON array into Pig

Submitted by 倖福魔咒の on 2019-12-02 09:26:54
I have a JSON file in the following format: [ { "id": 2, "createdBy": 0, "status": 0, "utcTime": "Oct 14, 2014 4:49:47 PM", "placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia", "longitude": 77.5983817, "latitude": 12.9832418, "createdDate": "Sep 16, 2014 2:59:03 PM", "accuracy": 5, "loginType": 1, "mobileNo": "0000005567" }, { "id": 4, "createdBy": 0, "status": 0, "utcTime": "Oct 14, 2014 4:52:48 PM", "placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia", "longitude": 77.5983817, "latitude": 12.9832418, "createdDate": "Oct 8, 2014
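The excerpt is cut off before any answer appears. For reference, one commonly used approach is the elephant-bird JsonLoader; this is only a sketch, assuming the data sits in a file named places.json with one JSON object per line (a pretty-printed top-level array would first need to be flattened to line-delimited JSON), and the jar name is illustrative:

REGISTER elephant-bird-pig.jar;  -- plus its hadoop-compat and json-simple dependencies
records = LOAD 'places.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
ids = FOREACH records GENERATE $0#'id' AS id, $0#'mobileNo' AS mobileNo;
DUMP ids;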

Why is my large JSF data table not populating only in IE?

Submitted by 假如想象 on 2019-12-02 07:21:11
I am trying to generate a table dynamically using HtmlDataTable in JSF. When I set the number of rows and columns to more than 25 each, some of the cells are not populated, but only in IE, and rendering becomes very slow. However, I can see the values when debugging the code using Firebug. It works fine in Firefox and Chrome. How is this caused and how can I solve it? BalusC: Internet Explorer is known to have an extremely poor table renderer, especially when the column count and table nesting become excessive. There's no other solution than making your table smaller by introducing lazy
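The advice is cut off at "lazy" (presumably lazy loading or pagination). A minimal sketch of the idea for a dynamically built table, with the 25-row figure taken from the question and the helper class invented purely for illustration:

import javax.faces.component.html.HtmlDataTable;

public class PagedTableBuilder {
    // Hypothetical helper: cap how many rows the generated table renders per request.
    public static HtmlDataTable build() {
        HtmlDataTable table = new HtmlDataTable();
        table.setRows(25);   // render at most 25 rows at a time
        table.setFirst(0);   // first visible row; advance it from pager controls
        return table;
    }
}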

I am trying to get a list of all the authors who have had more than 3 pieces of work - DBpedia SPARQL

Submitted by 家住魔仙堡 on 2019-12-02 07:13:40
Question: I am trying to get a list of all the authors who have had 3 or more pieces of work done (in DBpedia). My example can be run on: http://dbpedia.org/sparql Base code: select (count(?work) as ?totalWork), ?author Where { ?work dbo:author ?author. } GROUP BY ?author With this I get each author's total number of pieces of work. But when I try to filter to show only the list of authors that have more than 3 pieces of work, I get an error. I tried the HAVING keyword and the FILTER keyword. Using FILTER: select (count(
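The attempt is cut off mid-query. For reference, the aggregate-plus-HAVING form the question is reaching for looks like this; it can be pasted into the endpoint linked above, the >= 3 threshold matches the question, and the dbo:author pattern is taken from the base query:

PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?author (COUNT(?work) AS ?totalWork)
WHERE {
  ?work dbo:author ?author .
}
GROUP BY ?author
HAVING (COUNT(?work) >= 3)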

How to pass Hive conf variable in hive udf?

Submitted by 老子叫甜甜 on 2019-12-02 06:38:26
I want to pass a Hive conf variable to a Hive UDF. Below is a code snippet: hive -f ../hive/testHive.sql -hivevar testArg=${testArg} And below is the Hive UDF call: select setUserDefinedValueForColumn(columnName,'${testArg}') from testTable; In the UDF I am getting the value of testArg as null. Please advise me on how to use a Hive conf variable in a UDF and how to access the Hive configuration in a Hive UDF. I think that you should pass the Hive variable as 'hiveconf' using the command below: hive --hiveconf testArg="my test args" -f ../hive/testHive.sql Then you may have the code below inside a GenericUDF evaluate() method: @Override
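The answer breaks off right after "@Override". Below is only a sketch of how a GenericUDF can pick the value up: it assumes the variable is read in configure(MapredContext), which Hive calls only when the query actually runs as a MapReduce/Tez task, and the class body is illustrative rather than the original UDF:

import org.apache.hadoop.hive.ql.exec.MapredContext;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class SetUserDefinedValueForColumn extends GenericUDF {
    private String testArg;

    @Override
    public void configure(MapredContext context) {
        // Values passed with --hiveconf end up in the job configuration.
        testArg = context.getJobConf().get("testArg");
    }

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) {
        return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        return testArg;  // combine with the column argument as needed
    }

    @Override
    public String getDisplayString(String[] children) {
        return "setUserDefinedValueForColumn()";
    }
}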

Hadoop 2 IOException only when trying to open supposed cache files

Submitted by |▌冷眼眸甩不掉的悲伤 on 2019-12-02 06:33:47
I recently updated to Hadoop 2.2 (using this tutorial here). My main job class looks like this, and throws an IOException: import java.io.*; import java.net.*; import java.util.*; import java.util.regex.*; import org.apache.hadoop.conf.*; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.*; import org.apache.hadoop.mapreduce.*; import org.apache.hadoop.mapreduce.lib.chain.*; import org.apache.hadoop.mapreduce.lib.input.*; import org.apache.hadoop.mapreduce.lib.output.*; import org.apache.hadoop.mapreduce.lib.reduce.*; public class UFOLocation2 { public static class MapClass extends
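The excerpt stops before the code that actually touches the cache files, so the class below is not the poster's code; it is only a sketch of the Hadoop 2 distributed-cache API that this symptom usually involves, with the HDFS path and file name assumed:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheFileSketch {

    public static class MapClass extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // Hadoop 2: files registered with job.addCacheFile() are listed here.
            URI[] cached = context.getCacheFiles();
            // Each file is symlinked into the task's working directory under its own
            // name (or the #alias fragment, if one was supplied on the URI).
            String localName = new File(cached[0].getPath()).getName();
            try (BufferedReader reader = new BufferedReader(new FileReader(localName))) {
                reader.readLine();  // read the lookup data as needed
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "UFOLocation2");
        job.setJarByClass(CacheFileSketch.class);
        job.addCacheFile(new URI("/user/hduser/lookup/locations.txt"));  // assumed path
    }
}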

Hadoop hdfs showing ls: `/home/hduser/input/': No such file or directory error

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-02 05:15:26
Question: I have installed Hadoop 2.6 on a single machine using This Tutorial. I am using an Ubuntu 12.04 machine and Java version 1.6.0_27. I have created a separate user, hduser, for Hadoop operations. I have set the HADOOP_HOME environment variable to /usr/local/hadoop, where I have extracted the Hadoop distribution. Now I am following an example, but when I execute the command $HADOOP_HOME/bin/hdfs dfs -ls /home/hduser/input/ it gives the following error - 15/01/02 18:32:38 WARN util.NativeCodeLoader: Unable
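The error output is cut off after the NativeCodeLoader warning. A common cause of this particular message is that /home/hduser/input/ exists on the local filesystem but not in HDFS, which is what hdfs dfs -ls actually queries; a sketch of creating and populating the HDFS directory before listing it (the local file name here is an assumption):

$HADOOP_HOME/bin/hdfs dfs -mkdir -p /home/hduser/input
$HADOOP_HOME/bin/hdfs dfs -put /home/hduser/input/sample.txt /home/hduser/input/
$HADOOP_HOME/bin/hdfs dfs -ls /home/hduser/input/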

PySpark: inconsistency in converting timestamp to integer in dataframe

Submitted by 坚强是说给别人听的谎言 on 2019-12-02 03:29:12
Question: I have a dataframe with a rough structure like the following:
+-------------------------+-------------------------+--------+
| timestamp               | adj_timestamp           | values |
+-------------------------+-------------------------+--------+
| 2017-05-31 15:30:48.000 | 2017-05-31 11:30:00.000 | 0      |
+-------------------------+-------------------------+--------+
| 2017-05-31 15:31:45.000 | 2017-05-31 11:30:00.000 | 0      |
+-------------------------+-------------------------+--------+
| 2017-05-31 15:32:49.000 |
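The excerpt ends mid-table, before the conversion code, so the snippet below is only a sketch of one common way to turn such timestamp columns into integer epoch seconds consistently; df stands for the dataframe shown above and the output column names are made up:

from pyspark.sql import functions as F

# unix_timestamp() on a true timestamp column yields epoch seconds directly,
# avoiding the string-parsing step where inconsistencies often creep in.
df2 = (df
       .withColumn("timestamp_int", F.unix_timestamp(F.col("timestamp")))
       .withColumn("adj_timestamp_int", F.unix_timestamp(F.col("adj_timestamp"))))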

Unique Key generation in Hive/Hadoop

Submitted by 亡梦爱人 on 2019-12-02 03:22:24
Question: While selecting a set of records from a big-data Hive table, a unique key needs to be created for each record. In a sequential mode of operation, it is easy to generate a unique id by calling something like max(id). Since Hive runs the task in parallel, how can we generate a unique key as part of a select query without compromising the performance of Hadoop? Is this really a map-reduce problem, or do we need to go for a sequential approach to solve it? Answer 1: If for some reason you do not want
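The answer is cut off at "if for some reason you do not want". Two approaches commonly used for this in practice, sketched against a placeholder table name (neither is necessarily what the answer goes on to recommend):

-- A globally unique string key per row (behaviour can vary by Hive version):
SELECT reflect('java.util.UUID', 'randomUUID') AS row_key, t.* FROM source_table t;

-- A dense numeric key; numbering over an empty OVER() funnels all rows through one reducer:
SELECT row_number() OVER () AS row_key, t.* FROM source_table t;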

Are Data Lake and Big Data the same?

Submitted by 非 Y 不嫁゛ on 2019-12-02 02:27:14
I am trying to understand whether there is a real difference between a data lake and big data. If you check the concepts, both are like a big repository which saves the information until it becomes necessary. So, when can we say that we are using big data versus a data lake? Thanks in advance. I can't say I've come across the term 'big repository' before, but to answer the original question, no, a data lake and big data are not the same, although in fairness both terms are thrown around a lot and the definitions vary depending on who you ask, but I'll try to give it a shot: Big Data is used to describe both the