HDInsight

Replace character in Pig

Submitted by 拈花ヽ惹草 on 2019-12-13 04:55:46
Question: My data is in the following format: {"Foo":"ABC","Bar":"20090101100000","Quux":"{\"QuuxId\":1234,\"QuuxName\":\"Sam\"}"} I need it to be in this format: {"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}} I'm trying to use Pig's REPLACE function to get it into the format I need, so I tried: LOGS = LOAD 'inputloc' USING TextStorage() as unparsedString:chararray; REPL1 = foreach LOGS REPLACE($0, '"{', '{'); REPL2 = foreach REPL1 REPLACE($0, '}"', '}');
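A minimal corrected sketch of that script, under assumptions not in the original post: the input holds one JSON object per line, and the loader is the built-in TextLoader (stock Pig has no TextStorage). Pig's REPLACE takes a regular expression and must sit inside a GENERATE clause, so the quotes, braces, and backslashes are escaped:

    -- strip the quotes around the nested object, then unescape its inner quotes
    LOGS  = LOAD 'inputloc' USING TextLoader() AS (unparsedString:chararray);
    REPL1 = FOREACH LOGS  GENERATE REPLACE(unparsedString, '"\\{', '{') AS unparsedString;
    REPL2 = FOREACH REPL1 GENERATE REPLACE(unparsedString, '\\}"', '}') AS unparsedString;
    REPL3 = FOREACH REPL2 GENERATE REPLACE(unparsedString, '\\\\"', '"') AS cleaned;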

HDInsight Hive not finding SerDe jar in ADD JAR statement

Submitted by 人走茶凉 on 2019-12-13 04:40:46
Question: I've uploaded json-serde-1.1.9.2.jar to the blob store under the path "/lib/" and added ADD JAR /lib/json-serde-1.1.9.2.jar but I am getting /lib/json-serde-1.1.9.2.jar does not exist. I've tried it without the path and also provided the full URL in the ADD JAR statement, with the same result. Would really appreciate some help on this, thanks! Answer 1: If you don't include the scheme, then Hive is going to look on the local filesystem (you can see the code around line 768 of the source) when you included…
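A short sketch of the fix the answer points at: qualify the path with the wasb:// scheme so Hive resolves it against blob storage rather than the local filesystem. The container and account names below are placeholders:

    ADD JAR wasb:///lib/json-serde-1.1.9.2.jar;
    -- or fully qualified:
    ADD JAR wasb://mycontainer@myaccount.blob.core.windows.net/lib/json-serde-1.1.9.2.jar;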

Hive function quarter() returns 'invalid function'

Submitted by 别说谁变了你拦得住时间么 on 2019-12-13 04:32:01
Question: This page says the function quarter() was introduced in Hive 1.3: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions I am using the default version of HDInsight (3.1) to run Hadoop: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-component-versioning/ When I try to use the quarter function I get: Logging initialized using configuration in file:/C:/apps/dist/hive-0.13.0.2.1.15.1-0001/conf/hive-log4j.properties SLF4J: Class path…
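The log path shows the cluster is running Hive 0.13, which predates quarter(), hence the "invalid function" error. A hedged workaround sketch that derives the quarter from the month; ts and mytable are placeholder names:

    SELECT ceil(month(ts) / 3.0) AS quarter
    FROM mytable;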

Sqoop on HDInsight does not close JDBC connection properly?

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-13 03:52:16
Question: If I use Azure SQL or Azure MySQL as the metastore for Sqoop jobs, there seems to be a serious bug in Sqoop on HDInsight: it does not close the connection properly for saved Sqoop jobs. Here are the repro steps: use Azure SQL or Azure MySQL as the Sqoop metastore, create an incremental-import saved Sqoop job, and then run it; at the very end you get an exception: ----------------ON AZURE SQL------------ 17/08/02 23:15:51 INFO tool.ImportTool: Updated data for job: FactOnlineSalesIncr 17/08/02 23:15:51 WARN…
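A hedged sketch of the repro described above, with placeholder server, database, and column names; the job name comes from the log, and the table name is assumed from it:

    # create the incremental-import saved job against the Azure SQL metastore
    sqoop job --create FactOnlineSalesIncr \
      --meta-connect "jdbc:sqlserver://<server>.database.windows.net:1433;database=<metastoredb>" \
      -- import \
      --connect "jdbc:sqlserver://<server>.database.windows.net:1433;database=<sourcedb>" \
      --table FactOnlineSales \
      --incremental append --check-column <key-column> \
      --target-dir /data/FactOnlineSales

    # run it; the exception appears at the very end of this execution
    sqoop job --meta-connect "jdbc:sqlserver://<server>.database.windows.net:1433;database=<metastoredb>" \
      --exec FactOnlineSalesIncr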

How to submit Apache Spark job to Hadoop YARN on Azure HDInsight

Submitted by 不打扰是莪最后的温柔 on 2019-12-12 10:55:59
Question: I am very excited that HDInsight switched to Hadoop version 2, which supports Apache Spark through YARN. Apache Spark is a much better-fitting parallel programming paradigm than MapReduce for the task I want to perform. I was unable to find any documentation, however, on how to do remote job submission of an Apache Spark job to my HDInsight cluster. For remote job submission of standard MapReduce jobs I know there are several REST endpoints, such as Templeton and Oozie. But as far as I was…
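For context, a hedged sketch of submission from the cluster head node with spark-submit, which hands the job to YARN once Spark is installed; the class and jar names are placeholders, and this is local rather than the remote REST-based submission the question asks about:

    spark-submit \
      --master yarn-cluster \
      --class com.example.MyApp \
      wasb:///jars/myapp.jar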

Microsoft Windows Azure Storage: the remote server returned an error: (404) Not Found

Submitted by 折月煮酒 on 2019-12-12 03:45:13
Question: I am constantly getting the error "404 not found". I have created a cluster, a storage account, and a container. The detailed error I get is: Unhandled Exception: System.AggregateException: One or more errors occurred. ---> Microsoft.WindowsAzure.Storage.StorageException: The remote server returned an error: (404) Not Found. ---> System.Net.WebException: The remote server returned an error: (404) Not Found. This is my code: public static void ConnectToAzureCloudServer() { HadoopJobConfiguration…
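A hedged sketch of a first diagnostic step: a 404 from the storage client usually means the addressed container or blob does not exist, so it is worth checking the container directly with the storage SDK. Account, key, and container names are placeholders:

    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    static class StorageCheck
    {
        public static bool ContainerExists()
        {
            CloudStorageAccount account = CloudStorageAccount.Parse(
                "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>");
            CloudBlobClient blobClient = account.CreateCloudBlobClient();
            CloudBlobContainer container = blobClient.GetContainerReference("<container>");
            return container.Exists(); // false => requests against it return 404
        }
    }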

Rebuild index failed on Hive on Azure HDInsight with Tez

Submitted by 吃可爱长大的小学妹 on 2019-12-12 03:22:22
Question: I am trying to create indexes on Hive on Azure HDInsight with Tez enabled. I can create the indexes successfully, but I can't rebuild them: the job fails with this output: Map 1: -/- Reducer 2: 0/1 Status: Failed Vertex failed, vertexName=Map 1, vertexId=vertex_1421234198072_0091_1_01, diagnostics=[Vertex Input: measures initializer failed.] Vertex killed, vertexName=Reducer 2, vertexId=vertex_1421234198072_0091_1_00, diagnostics=[Vertex received Kill in INITED state.] DAG failed due to vertex…
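A hedged workaround sketch: since only the rebuild fails under Tez, switch the session back to MapReduce for that one statement. The index name is a placeholder; "measures" is taken from the error output:

    SET hive.execution.engine=mr;
    ALTER INDEX idx_measures ON measures REBUILD;
    SET hive.execution.engine=tez;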

Exception from loading Microsoft.WindowsAzure.Storage when creating a new HiveConnection

Submitted by 做~自己de王妃 on 2019-12-11 20:37:45
Question: I had this code working: ClusterDetails details return new HiveConnection( new Uri(details.ConnectionUrl), details.HttpUserName, details.HttpPassword, details.DefaultStorageAccount.Name, details.DefaultStorageAccount.Key); but when I updated the DLLs through NuGet, I started getting this exception: {"Could not load file or assembly 'Microsoft.WindowsAzure.Storage, Version=2.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. The located assembly's manifest…
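A hedged sketch of the usual fix for this kind of exception: an app.config binding redirect mapping the 2.0.0.0 reference onto whatever newer Microsoft.WindowsAzure.Storage the NuGet update installed. The newVersion value 3.0.0.0 is an assumed placeholder; the publicKeyToken comes from the exception text:

    <runtime>
      <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
        <dependentAssembly>
          <assemblyIdentity name="Microsoft.WindowsAzure.Storage"
                            publicKeyToken="31bf3856ad364e35" culture="neutral" />
          <!-- newVersion is a placeholder for the version NuGet actually installed -->
          <bindingRedirect oldVersion="0.0.0.0-2.0.0.0" newVersion="3.0.0.0" />
        </dependentAssembly>
      </assemblyBinding>
    </runtime>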

Insert Overwrite: Cannot move … to the trash, as it contains the trash

Submitted by ↘锁芯ラ on 2019-12-11 20:35:51
Question: I am attempting to insert into a table by selecting from another: INSERT OVERWRITE TABLE testtable1 select * from testtable0 The error: Moving data to: wasb://{container}@{storage}.blob.core.windows.net/hive/scratch/hive_2015-06-01_15-05-14_062_6478651325775395196-1/-ext-10000 Loading data to table default.testtable1 rmr: DEPRECATED: Please use 'rm -r' instead. rmr: Cannot move "wasb://{container}@{storage}.blob.core.windows.net/" to the trash, as it contains the trash. Consider using…
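The error shows the overwrite trying to move the root of the wasb:// container into the trash, which is impossible because the trash lives inside that container. A hedged workaround sketch: disable the trash for the session (or relocate the table off the container root) so the overwrite deletes the old files directly:

    SET fs.trash.interval=0;
    INSERT OVERWRITE TABLE testtable1 SELECT * FROM testtable0;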

How to improve performance of loading data from a non-partitioned table into an ORC partitioned table in Hive

Submitted by 喜夏-厌秋 on 2019-12-11 11:15:37
Question: I'm new to Hive querying and am looking for best practices for retrieving data from a Hive table. We have enabled Tez as the execution engine and enabled vectorization. We want to do reporting from a Hive table; I read in the Tez documentation that it can be used for real-time reporting. The scenario: from my web application I would like to show the result of the Hive query Select * from Hive table on the UI, but any query takes at least 20-60 seconds at the hive command prompt, even though the Hive table has 60 GB of data…
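A hedged sketch of the usual pattern for such a load: enable dynamic partitioning and insert from the non-partitioned staging table into the ORC table, with the partition column last in the select list. All table, column, and partition names are placeholders:

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.vectorized.execution.enabled=true;

    CREATE TABLE orc_table (col1 STRING, col2 BIGINT)
    PARTITIONED BY (dt STRING)
    STORED AS ORC;

    INSERT OVERWRITE TABLE orc_table PARTITION (dt)
    SELECT col1, col2, dt FROM staging_table;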