hadoop2

Couldn't start hadoop datanode normally

Submitted by 为君一笑 on 2019-12-04 13:00:42
I am trying to install Hadoop 2.2.0 and I am getting the following kind of error while starting the datanode service. Please help me resolve this issue. Thanks in advance.

    2014-03-11 08:48:16,406 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /home/prassanna/usr/local/hadoop/yarn_data/hdfs/datanode/in_use.lock acquired by nodename 3627@prassanna-Studio-1558
    2014-03-11 08:48:16,426 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-611836968-127.0.1.1-1394507838610 (storage id DS-1960076343-127.0.1.1-50010-1394127604582) service to
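A FATAL "Initialization failed for block pool" during datanode startup is very often a clusterID mismatch between the namenode and the datanode storage directory, typically after the namenode was reformatted. A minimal diagnostic sketch, assuming that cause: the datanode path is taken from the log above, while the namenode path is a placeholder that must be replaced with your dfs.namenode.name.dir.

    # Compare the clusterID recorded on the namenode and the datanode sides
    grep clusterID /path/to/namenode/dir/current/VERSION            # placeholder namenode path
    grep clusterID /home/prassanna/usr/local/hadoop/yarn_data/hdfs/datanode/current/VERSION
    # If they differ, either copy the namenode's clusterID into the datanode VERSION file,
    # or (only on a throwaway cluster with no data to keep) wipe the datanode directory:
    # rm -rf /home/prassanna/usr/local/hadoop/yarn_data/hdfs/datanode/*
    hadoop-daemon.sh start datanode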

Hadoop : Reading ORC files and putting into RDBMS?

Submitted by 冷眼眸甩不掉的悲伤 on 2019-12-04 12:13:04
I have a Hive table which is stored in ORC file format. I want to export the data to a Teradata database. I researched Sqoop but could not find a way to export ORC files. Is there a way to make Sqoop work for ORC, or is there any other tool that I could use to export the data? Thanks.

vinayak_narune: You can use HCatalog:

    sqoop export --connect "jdbc:sqlserver://xxxx:1433;databaseName=xxx;USERNAME=xxx;PASSWORD=xxx" --table rdmsTableName --hcatalog-database hiveDB --hcatalog-table hiveTableName

Source: https://stackoverflow.com/questions/36475364/hadoop-reading-orc-files-and-putting-into-rdbms
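For Teradata specifically, the same --hcatalog flags should apply with a Teradata JDBC connection string; the sketch below is not from the original answer, and the host, database, and table names are placeholders. The Teradata JDBC driver jars also need to be on Sqoop's classpath.

    sqoop export \
      --connect "jdbc:teradata://tdhost/DATABASE=targetdb" \
      --username xxx --password xxx \
      --table targetTable \
      --hcatalog-database hiveDB \
      --hcatalog-table hiveTableName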

What is the replacement of NULLIF in Hive?

Submitted by 可紊 on 2019-12-04 09:51:44
I would like to know what the replacement for NULLIF is in Hive. I am using COALESCE but it is not serving my requirement. My query statement is something like:

    COALESCE(A, B, C) AS D

COALESCE returns the first NOT NULL value, but my A/B/C columns contain blank values, so COALESCE is not assigning the intended value to D because it treats a blank string as NOT NULL. In SQL I could have used COALESCE(NULLIF(A,'')......) so it would check for blanks as well. I tried CASE but it's not working.

Answer: Just use CASE:

    select (case when A is null or A = '' then . . . end)

This is
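One way to emulate NULLIF(A, '') inside COALESCE is to map empty strings to NULL with CASE and then coalesce the results. A minimal sketch, assuming hypothetical columns A, B, C in a table t; if your Hive version is recent enough, a built-in nullif() may also be available.

    hive -e "
      SELECT COALESCE(
               CASE WHEN A = '' THEN NULL ELSE A END,
               CASE WHEN B = '' THEN NULL ELSE B END,
               C) AS D
      FROM t;"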

Hadoop Error - All data nodes are aborting

Submitted by ☆樱花仙子☆ on 2019-12-04 09:07:43
I am using Hadoop version 2.3.0. Sometimes when I execute a MapReduce job, the error below is displayed:

    14/08/10 12:14:59 INFO mapreduce.Job: Task Id : attempt_1407694955806_0002_m_000780_0, Status : FAILED
    Error: java.io.IOException: All datanodes 192.168.30.2:50010 are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483
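"All datanodes ... are bad. Aborting" means the write pipeline to every replica failed; common culprits are exhausted file descriptors, too few datanode transfer threads, or full data disks, rather than a bug in the job itself. A hedged checklist sketch, with placeholder paths:

    # Open-file limit for the user running the datanodes (often needs raising, e.g. to 64000)
    ulimit -n
    # Free space on the datanode data directories
    df -h /path/to/dfs/data
    # Datanode log around the failure time for the underlying cause (xceiver limit, I/O errors, ...)
    tail -n 200 $HADOOP_HOME/logs/hadoop-*-datanode-*.log

If the datanode log points at a transfer-thread/xceiver limit, raising dfs.datanode.max.transfer.threads in hdfs-site.xml and restarting the datanodes is the usual fix.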

In spark, how does broadcast work?

Submitted by 烈酒焚心 on 2019-12-04 08:19:13
Question: This is a very simple question: in Spark, broadcast can be used to send variables to executors efficiently. How does this work? More precisely:

When are the values sent: as soon as I call broadcast, or when the values are used?
Where exactly is the data sent: to all executors, or only to the ones that will need it?
Where is the data stored? In memory, or on disk?
Is there a difference in how simple variables and broadcast variables are accessed?
What happens under the hood when I call
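For orientation: in current Spark versions broadcast is by default implemented as TorrentBroadcast, where the driver chunks the serialized value and each executor fetches the chunks lazily the first time a task dereferences .value, caching them in its BlockManager (memory first, spilling to disk). A minimal spark-shell sketch just to anchor the terminology; the lookup map and values are made up:

    spark-shell <<'EOF'
    // Broadcast a small lookup table once per executor instead of shipping it with every task
    val lookup = sc.broadcast(Map(1 -> "a", 2 -> "b", 3 -> "c"))
    val rdd = sc.parallelize(Seq(1, 2, 2, 3))
    // Tasks read the broadcast value lazily via .value
    rdd.map(x => lookup.value.getOrElse(x, "?")).collect().foreach(println)
    EOF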

Increase number of Hive mappers in Hadoop 2

Submitted by 巧了我就是萌 on 2019-12-04 08:10:45
Question: I created an HBase table from Hive and I'm trying to do a simple aggregation on it. This is my Hive query:

    from my_hbase_table select col1, count(1) group by col1;

The map reduce job spawns only 2 mappers and I'd like to increase that. With a plain map reduce job I would configure the YARN and mapper memory to increase the number of mappers. I tried the following in Hive but it did not work:

    set yarn.nodemanager.resource.cpu-vcores=16;
    set yarn.nodemanager.resource.memory-mb=32768;
    set
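For HBase-backed tables the number of map tasks generally follows the number of HBase regions (roughly one split per region), so YARN memory settings do not change it; the usual lever is giving the table more regions. A hedged sketch: my_hbase_table comes from the question, and splitting a live table should be done with care.

    # Split the backing HBase table into more regions (more regions -> more mappers)
    echo "split 'my_hbase_table'" | hbase shell
    # Re-run the aggregation; the mapper count should track the new region count
    hive -e "from my_hbase_table select col1, count(1) group by col1;"

For plain file-backed Hive tables, shrinking mapreduce.input.fileinputformat.split.maxsize is the more usual way to get more mappers.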

What is --direct mode in sqoop?

Submitted by 旧巷老猫 on 2019-12-04 06:16:36
As per my understanding, Sqoop is used to import or export tables/data between a database and HDFS, Hive, or HBase, and we can directly import a single table or a list of tables; internally a MapReduce program (I think only the map task) will run. My doubt is: what is Sqoop's --direct mode, and when should one go with the --direct option?

Answer: Just read the Sqoop documentation! The general principles are located here for imports and there for exports:

Some databases can perform imports in a more high-performance fashion by using database-specific data movement tools (...)
Some databases provides a direct mode for exports as
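In short, --direct bypasses the generic JDBC path and delegates to the database's native bulk tools (for example mysqldump/mysqlimport for MySQL), which those tools must be installed on the worker nodes for. A minimal sketch, assuming a MySQL source; connection details, table, and target directory are placeholders:

    sqoop import \
      --connect jdbc:mysql://dbhost/salesdb \
      --username xxx --password xxx \
      --table orders \
      --direct \
      --target-dir /user/me/orders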

Is there the equivalent for a `find` command in `hadoop`?

Submitted by 时光怂恿深爱的人放手 on 2019-12-04 04:47:54
I know that from the terminal one can use a find command to find files, such as:

    find . -type d -name "*something*" -maxdepth 4

But when I am in the Hadoop file system, I have not found a way to do this; hadoop fs -find .... throws an error. How do people traverse files in Hadoop? I'm using hadoop 2.6.0-cdh5.4.1.

Answer: hadoop fs -find was introduced in Apache Hadoop 2.7.0. Most likely you're using an older version, hence you don't have it yet. See HADOOP-8989 for more information. In the meantime you can use

    hdfs dfs -ls -R <pattern>

e.g.:

    hdfs dfs -ls -R /demo/order*.*

but that's not as powerful
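Once on 2.7+, the HDFS equivalent of the shell find above looks roughly like the following; the path and pattern are placeholders, and the initial implementation only supports a small set of expressions such as -name/-iname (no -type or -maxdepth):

    # Hadoop 2.7.0 and later
    hadoop fs -find /some/dir -name "*something*"
    # Older versions: recursive listing piped through grep as a rough substitute
    hdfs dfs -ls -R /some/dir | grep something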

Pig and Hadoop connection error

Submitted by 淺唱寂寞╮ on 2019-12-03 19:39:59
I am getting a ConnectionRefused error when I run Pig in mapreduce mode.

Details: I have installed Pig from a tarball (pig-0.14) and exported the classpath in bashrc. I have all the Hadoop (hadoop-2.5) daemons up and running (confirmed by jps):

    [root@localhost sbin]# jps
    2272 Jps
    2130 DataNode
    2022 NameNode
    2073 SecondaryNameNode
    2238 NodeManager
    2190 ResourceManager

I am running Pig in mapreduce mode:

    [root@localhost sbin]# pig
    grunt> file = LOAD '/input/pig_input.csv' USING PigStorage(',') AS (col1,col2,col3);
    grunt> dump file;

And then I am getting the error:

    java.io.IOException: java
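One thing the jps output above does not show is a JobHistoryServer; on Hadoop 2.x a ConnectionRefused (commonly on port 10020) when Pig fetches job status after a dump is frequently caused by the MapReduce job history server not running. A hedged sketch of starting it, assuming a standard Hadoop 2.5 layout:

    # Start the MapReduce job history server (listens on port 10020 by default)
    $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
    # Verify it is up
    jps | grep JobHistoryServer

If the refused connection is to port 8020/9000 instead, the more likely cause is a fs.defaultFS mismatch between Pig's HADOOP_CONF_DIR and the running namenode.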

Setting Spark as default execution engine for Hive

Submitted by 天涯浪子 on 2019-12-03 16:36:26
Hadoop 2.7.3, Spark 2.1.0 and Hive 2.1.1. I am trying to set Spark as the default execution engine for Hive. I uploaded all jars in $SPARK_HOME/jars to an HDFS folder and copied the scala-library, spark-core, and spark-network-common jars to $HIVE_HOME/lib. Then I configured hive-site.xml with the following properties:

    <property>
      <name>hive.execution.engine</name>
      <value>spark</value>
    </property>
    <property>
      <name>spark.master</name>
      <value>spark://master:7077</value>
      <description>Spark Master URL</description>
    </property>
    <property>
      <name>spark.eventLog.enabled</name>
      <value>true</value>
      <description
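A quick way to check whether the engine switch takes effect, independent of the rest of the hive-site.xml (which is cut off above), is to set the engine per-session and run a small query; the table name is a placeholder:

    hive -e "
      set hive.execution.engine=spark;
      set spark.master=spark://master:7077;
      select count(*) from some_table;"
    # Then check the Spark master UI (http://master:8080 by default) to confirm
    # that a Hive-on-Spark application was actually launched.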