hadoop2

Error when running Sqoop2 server on Amazon EMR with YARN

梦想的初衷 posted on 2020-03-26 08:11:23
Question: I'm trying to install Sqoop 2 (version 1.99.3) on an Amazon EMR cluster (AMI version 3.2.0 / Hadoop version 2.4.0). When I start the Sqoop server, I see this error in localhost.log:
Sep 10, 2014 4:55:56 PM org.apache.catalina.core.StandardContext listenerStart
SEVERE: Exception sending context initialized event to listener instance of class org.apache.sqoop.server.ServerInitializer
java.lang.RuntimeException: Failure in server initialization
    at org.apache.sqoop.core.SqoopServer.initialize

Developing and debugging Hadoop 2 programs on Windows

拟墨画扇 posted on 2020-03-24 20:38:13
1. The Windows build of winutils. Someone on GitHub provides a Windows build of winutils; the project is at https://github.com/srccodes/hadoop-common-2.2.0-bin . Download the project's zip archive directly; the downloaded file is named hadoop-common-2.2.0-bin-master.zip. Extract it to a directory.
2. Configure the environment variable. Set HADOOP_HOME to the directory the zip was extracted to, e.g. D:\Program\hadoop-common-2.2.0-bin-master.
3. Verify. Run "show tables" over Hive JDBC:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
public class Demo {
    public final static String hiveJDBC = "jdbc:hive2://172.168.10.12:10000";
    public static void main
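The Demo excerpt above is cut off at main. For reference, a self-contained sketch of the same verification, assuming HiveServer2 is reachable at the URL from the excerpt, that the hive-jdbc driver (class org.apache.hive.jdbc.HiveDriver) is on the classpath, and that no credentials are required:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class Demo {
        public final static String hiveJDBC = "jdbc:hive2://172.168.10.12:10000";

        public static void main(String[] args) throws ClassNotFoundException, SQLException {
            // Load the HiveServer2 JDBC driver
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // Connect, run "show tables" and print each table name
            try (Connection conn = DriverManager.getConnection(hiveJDBC, "", "");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("show tables")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }

If HADOOP_HOME is not set or winutils.exe is missing, Hadoop client code on Windows typically fails with an error like "Could not locate executable ...\bin\winutils.exe", which is exactly what the winutils download above works around.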

Hadoop Cluster - “hadoop” user ssh communication

≡放荡痞女 posted on 2020-03-23 09:49:13
Question: I am setting up a Hadoop 2.7.3 cluster on EC2 servers - 1 NameNode, 1 Secondary NameNode and 2 DataNodes. Hadoop uses SSH to communicate with the slave nodes and launch processes on them. Do we need the same SSH keys on all the nodes for the hadoop user? What is the best practice / ideal way to distribute the NameNode's SSH credentials to the slave nodes? Answer 1: Do we need the same SSH keys on all the nodes for the hadoop user? The same public key needs to be on all of the nodes. What

jobTracker property in job.properties of Oozie

穿精又带淫゛_ posted on 2020-02-25 04:49:46
Question: I'm using hadoop-2.7.2 and oozie-4.0.1; what should the jobTracker value be in the job.properties file of an Oozie workflow? I referred to this link, http://hadooptutorial.info/apache-oozie-installation-on-ubuntu-14-04/ , which states that in the YARN architecture the job tracker runs on port 8032, and I'm currently using that. But in Hadoop's mapred-site.xml I have the value hdfs://localhost:54311 for the job tracker property. I'm confused; can anyone explain or provide some useful links for installing
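The excerpt ends before any answer. As a hedged note that is not part of the original post: with YARN there is no JobTracker process, and the jobTracker property in job.properties is normally pointed at the ResourceManager's IPC address (yarn.resourcemanager.address, default port 8032), while hdfs:// URIs belong in nameNode instead. A minimal job.properties sketch, assuming a single-node setup with default ports and a hypothetical application path:

    nameNode=hdfs://localhost:9000
    jobTracker=localhost:8032
    queueName=default
    oozie.wf.application.path=${nameNode}/user/${user.name}/apps/my-workflow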

Apache PIG - How to cut digits after decimal point

一世执手 posted on 2020-01-25 16:23:11
Question: Is there any possibility to cut off a certain part of a float or double number after the decimal point? For example: given 2.67894, I want 2.6 as the result (and not 2.7 as rounding would give). Answer 1: Try this, where val holds your values, e.g. 2.666, 3.666, 4.666666, 5.3456334:
b = foreach a GENERATE (FLOOR(val * 10) / 10);
dump b;
Answer 2: Write a UDF (User Defined Function) for this. A very simple Python UDF (numformat.py):
@outputSchema('value:double')
def format(data):
    return round(data,1)
(Of

HBase client API not connecting to HBase

泪湿孤枕 posted on 2020-01-24 21:32:30
Question: I am following this link to insert data into my HBase. I followed all the steps and wrote the code below:
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache
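The question's code is cut off in the import list above. For orientation only, here is a minimal sketch of a single insert using the pre-1.0 HBase client API that those imports suggest; the table name "test", column family "cf", and ZooKeeper quorum "localhost" are placeholders rather than values from the question:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBasePutDemo {
        public static void main(String[] args) throws IOException {
            // Client-side configuration; the quorum and port must match hbase-site.xml on the cluster
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", "localhost");
            conf.set("hbase.zookeeper.property.clientPort", "2181");

            // Open an existing table and write one cell
            HTable table = new HTable(conf, "test");
            Put put = new Put(Bytes.toBytes("row1"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes("value1"));
            table.put(put);
            table.close();
        }
    }

A mismatch between the quorum/port the client uses and the cluster's actual ZooKeeper settings is one of the most common reasons the client API fails to connect.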

Counter is not working in reducer code

限于喜欢 posted on 2020-01-15 11:07:27
Question: I am working on a big Hadoop project and there is a small KPI where I have to write only the top 10 values to the reducer output. To meet this requirement, I used a counter and break out of the loop when the counter equals 11, but the reducer still writes all of the values to HDFS. This is pretty simple Java code, but I am stuck :( For testing, I created a standalone Java application to do the same thing and it works there; I'm wondering why it does not work in the reducer code.
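The reducer itself is not shown in the excerpt, so the following is only a sketch of the usual fix under one assumption: the counter was declared inside reduce(), where it is re-initialised for every key, so the break never limits the overall output. Declaring it as an instance field lets it persist across reduce() calls; the Text/IntWritable types are placeholders.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class TopTenReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        // Instance field: keeps its value across every reduce() invocation in this task
        private int written = 0;

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            for (IntWritable value : values) {
                if (written >= 10) {
                    return;          // stop emitting once ten records have been written
                }
                context.write(key, value);
                written++;
            }
        }
    }

Note that this limits output per reducer task; with more than one reducer the job can still emit up to ten records per task, so a true global top 10 needs a single reducer or a second pass.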

Know the disk space of data nodes in hadoop?

萝らか妹 posted on 2020-01-15 05:13:06
Question: Is there a way or any command with which I can find out the disk space of each datanode, or the total cluster disk space? I tried the command dfs -du -h / but it seems that I do not have permission to execute it for many directories, and hence cannot get the actual disk space. Answer 1: From the UI: http://namenode:50070/dfshealth.html#tab-datanode , which will give you all the details about the datanodes. From the command line, to get the disk space of each datanode: sudo -u hdfs hdfs dfsadmin -report --->

How to change the output file name from part-00000 in reducer to input file name

时光毁灭记忆、已成空白 posted on 2020-01-14 04:16:06
Question: Currently I am able to implement the name change from part-00000 to a custom fileName in the mapper. I do this by taking the InputSplit. I tried the same in the reducer to rename the file, but the FileSplit approach is not available in a reducer. So, is there a good way to rename the output of a reducer to the input file name? Below is how I achieved it in the mapper.
@Override
public void setup(Context con) throws IOException, InterruptedException {
    fileName = ((FileSplit) con.getInputSplit()).getPath()
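The mapper snippet above is cut off at getPath(). Since a reducer has no InputSplit to read a file name from, a common alternative (not confirmed as the asker's eventual solution) is MultipleOutputs, sketched below; the configuration property custom.output.name used to carry the desired base name into the reducer is hypothetical and would have to be set on the Job by the driver.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class RenamingReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private MultipleOutputs<Text, IntWritable> mos;
        private String baseName;

        @Override
        protected void setup(Context context) {
            mos = new MultipleOutputs<>(context);
            // Hypothetical property carrying the desired output name into the reducer
            baseName = context.getConfiguration().get("custom.output.name", "output");
        }

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            for (IntWritable value : values) {
                // Produces files named <baseName>-r-00000 instead of part-r-00000
                mos.write(key, value, baseName);
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            mos.close();
        }
    }

In the driver, LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class) is usually added as well, so that empty part-r-* files are not created next to the renamed output.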