hadoop2

Error when running Sqoop2 server on Amazon EMR with YARN

梦想的初衷 posted on 2020-03-26 08:11:23
Question: I'm trying to install Sqoop 2 (version 1.99.3) on an Amazon EMR cluster (AMI version 3.2.0 / Hadoop version 2.4.0). When I start the Sqoop server, I see this error in localhost.log:
Sep 10, 2014 4:55:56 PM org.apache.catalina.core.StandardContext listenerStart
SEVERE: Exception sending context initialized event to listener instance of class org.apache.sqoop.server.ServerInitializer
java.lang.RuntimeException: Failure in server initialization
    at org.apache.sqoop.core.SqoopServer.initialize

Developing and debugging Hadoop 2 programs on Windows

拟墨画扇 posted on 2020-03-24 20:38:13
1. The Windows build of winutils. Someone on GitHub provides a Windows build of winutils; the project is at https://github.com/srccodes/hadoop-common-2.2.0-bin . Download the project's zip archive directly; the downloaded file is named hadoop-common-2.2.0-bin-master.zip. Extract it to a directory.
2. Configure the environment variable. Set HADOOP_HOME to the directory the zip was extracted to, e.g. D:\Program\hadoop-common-2.2.0-bin-master.
3. Verify. Run "show tables" over Hive JDBC:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
public class Demo {
    public final static String hiveJDBC = "jdbc:hive2://172.168.10.12:10000";
    public static void main
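The Demo excerpt above is cut off at main. For reference, a self-contained sketch of the same verification, assuming HiveServer2 is reachable at the URL from the excerpt, that the hive-jdbc driver (class org.apache.hive.jdbc.HiveDriver) is on the classpath, and that no credentials are required:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class Demo {
        public final static String hiveJDBC = "jdbc:hive2://172.168.10.12:10000";

        public static void main(String[] args) throws ClassNotFoundException, SQLException {
            // Load the HiveServer2 JDBC driver
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // Connect, run "show tables" and print each table name
            try (Connection conn = DriverManager.getConnection(hiveJDBC, "", "");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("show tables")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }

If HADOOP_HOME is not set or winutils.exe is missing, Hadoop client code on Windows typically fails with an error like "Could not locate executable ...\bin\winutils.exe", which is exactly what the winutils download above works around.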

Hadoop Cluster - “hadoop” user ssh communication

≡放荡痞女 posted on 2020-03-23 09:49:13
Question: I am setting up a Hadoop 2.7.3 cluster on EC2 servers - 1 NameNode, 1 Secondary NameNode and 2 DataNodes. Hadoop uses SSH to communicate with the slave nodes and launch processes on them. Do we need the same SSH keys on all the nodes for the hadoop user? What is the best practice / ideal way to distribute the NameNode's SSH credentials to the slave nodes? Answer 1: Do we need the same SSH keys on all the nodes for the hadoop user? The same public key needs to be on all of the nodes. What

jobTracker property in job.properties of Oozie

穿精又带淫゛_ posted on 2020-02-25 04:49:46
Question: I'm using hadoop-2.7.2 and oozie-4.0.1; what should the jobTracker value be in the job.properties file of an Oozie workflow? I referred to this link, http://hadooptutorial.info/apache-oozie-installation-on-ubuntu-14-04/ , which states that in the YARN architecture the job tracker runs on port 8032, and I'm currently using that. But in Hadoop's mapred-site.xml I have the value hdfs://localhost:54311 for the job tracker property. I'm confused; can anyone explain or provide some useful links for installing
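The excerpt ends before any answer. As a hedged note that is not part of the original post: with YARN there is no JobTracker process, and the jobTracker property in job.properties is normally pointed at the ResourceManager's IPC address (yarn.resourcemanager.address, default port 8032), while hdfs:// URIs belong in nameNode instead. A minimal job.properties sketch, assuming a single-node setup with default ports and a hypothetical application path:

    nameNode=hdfs://localhost:9000
    jobTracker=localhost:8032
    queueName=default
    oozie.wf.application.path=${nameNode}/user/${user.name}/apps/my-workflow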

Apache PIG - How to cut digits after decimal point

一世执手 posted on 2020-01-25 16:23:11
Question: Is there any possibility to cut off a certain part of a float or double number after the decimal point? For example: given 2.67894, I want 2.6 as the result (and not 2.7 as rounding would give). Answer 1: Try this, where val holds your values, e.g. 2.666, 3.666, 4.666666, 5.3456334:
b = foreach a GENERATE (FLOOR(val * 10) / 10);
dump b;
Answer 2: Write a UDF (User Defined Function) for this. A very simple Python UDF (numformat.py):
@outputSchema('value:double')
def format(data):
    return round(data,1)
(Of

HBase client API not connecting to HBase

泪湿孤枕 posted on 2020-01-24 21:32:30
Question: I am following this link to insert data into my HBase. I followed all the steps and wrote the code below:
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache
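The question's code is cut off in the import list above. For orientation only, here is a minimal sketch of a single insert using the pre-1.0 HBase client API that those imports suggest; the table name "test", column family "cf", and ZooKeeper quorum "localhost" are placeholders rather than values from the question:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBasePutDemo {
        public static void main(String[] args) throws IOException {
            // Client-side configuration; the quorum and port must match hbase-site.xml on the cluster
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", "localhost");
            conf.set("hbase.zookeeper.property.clientPort", "2181");

            // Open an existing table and write one cell
            HTable table = new HTable(conf, "test");
            Put put = new Put(Bytes.toBytes("row1"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes("value1"));
            table.put(put);
            table.close();
        }
    }

A mismatch between the quorum/port the client uses and the cluster's actual ZooKeeper settings is one of the most common reasons the client API fails to connect.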

Counter is not working in reducer code

限于喜欢 posted on 2020-01-15 11:07:27
Question: I am working on a big Hadoop project and there is a small KPI where I have to write only the top 10 values to the reducer output. To meet this requirement, I used a counter and break out of the loop when the counter equals 11, but the reducer still writes all of the values to HDFS. This is pretty simple Java code, but I am stuck :( For testing, I created a standalone Java application to do the same thing and it works there; I'm wondering why it does not work in the reducer code.
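The reducer itself is not shown in the excerpt, so the following is only a sketch of the usual fix under one assumption: the counter was declared inside reduce(), where it is re-initialised for every key, so the break never limits the overall output. Declaring it as an instance field lets it persist across reduce() calls; the Text/IntWritable types are placeholders.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class TopTenReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        // Instance field: keeps its value across every reduce() invocation in this task
        private int written = 0;

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            for (IntWritable value : values) {
                if (written >= 10) {
                    return;          // stop emitting once ten records have been written
                }
                context.write(key, value);
                written++;
            }
        }
    }

Note that this limits output per reducer task; with more than one reducer the job can still emit up to ten records per task, so a true global top 10 needs a single reducer or a second pass.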

Know the disk space of data nodes in hadoop?

萝らか妹 posted on 2020-01-15 05:13:06
Question: Is there a way or any command with which I can find out the disk space of each datanode, or the total cluster disk space? I tried the command dfs -du -h / but it seems that I do not have permission to execute it for many directories, and hence cannot get the actual disk space. Answer 1: From the UI: http://namenode:50070/dfshealth.html#tab-datanode , which will give you all the details about the datanodes. From the command line, to get the disk space of each datanode: sudo -u hdfs hdfs dfsadmin -report --->

How to change the output file name from part-00000 in reducer to input file name

时光毁灭记忆、已成空白 posted on 2020-01-14 04:16:06
Question: Currently I am able to implement the name change from part-00000 to a custom fileName in the mapper. I do this by taking the InputSplit. I tried the same in the reducer to rename the file, but the FileSplit approach is not available in a reducer. So, is there a good way to rename the output of a reducer to the input file name? Below is how I achieved it in the mapper.
@Override
public void setup(Context con) throws IOException, InterruptedException {
    fileName = ((FileSplit) con.getInputSplit()).getPath()
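The mapper snippet above is cut off at getPath(). Since a reducer has no InputSplit to read a file name from, a common alternative (not confirmed as the asker's eventual solution) is MultipleOutputs, sketched below; the configuration property custom.output.name used to carry the desired base name into the reducer is hypothetical and would have to be set on the Job by the driver.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class RenamingReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private MultipleOutputs<Text, IntWritable> mos;
        private String baseName;

        @Override
        protected void setup(Context context) {
            mos = new MultipleOutputs<>(context);
            // Hypothetical property carrying the desired output name into the reducer
            baseName = context.getConfiguration().get("custom.output.name", "output");
        }

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            for (IntWritable value : values) {
                // Produces files named <baseName>-r-00000 instead of part-r-00000
                mos.write(key, value, baseName);
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            mos.close();
        }
    }

In the driver, LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class) is usually added as well, so that empty part-r-* files are not created next to the renamed output.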