Cloudera

Is CDH4 meant mainly for YARN?

三世轮回 submitted on 2019-12-08 09:45:49
Question: I have several questions, or rather confusions, regarding CDH4. I am posting here since I did not get any concrete information on them. Is CDH4 meant to promote YARN? I tried setting up MapReduce1 on CDH4.3.0 using the tarball. I finally managed it, but the process is roundabout and painful, whereas the YARN setup is straightforward. Is anyone using YARN in production at all? Apache clearly says that YARN is still in alpha and not meant for production. In that case, why is Cloudera making…

Unable to delete HDFS Corrupt files

浪子不回头ぞ submitted on 2019-12-08 07:19:18
Question: I am unable to delete corrupt files present in my HDFS. The NameNode has gone into safe mode. The total number of blocks is 980, of which 978 have reported. When I run the following command:

sudo -u hdfs hdfs dfsadmin -report

the report generated is:

Safe mode is ON
Configured Capacity: 58531520512 (54.51 GB)
Present Capacity: 35774078976 (33.32 GB)
DFS Remaining: 32374509568 (30.15 GB)
DFS Used: 3399569408 (3.17 GB)
DFS Used%: 9.50%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
…
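The usual way out, once the missing blocks are confirmed unrecoverable, is to take the NameNode out of safe mode and then delete the files backing the corrupt blocks (from the shell: hdfs dfsadmin -safemode leave, then hdfs fsck / -delete). Below is a minimal Scala sketch of the same two steps through the HDFS API; the object name is made up for illustration, and the cluster configuration is assumed to be on the classpath:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.hdfs.DistributedFileSystem
import org.apache.hadoop.hdfs.protocol.HdfsConstants

object DropCorruptFiles {
  def main(args: Array[String]): Unit = {
    // Picks up core-site.xml / hdfs-site.xml from the classpath.
    val dfs = FileSystem.get(new Configuration()).asInstanceOf[DistributedFileSystem]

    // The NameNode rejects deletes while in safe mode, so leave it first.
    dfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_LEAVE)

    // List the files that own corrupt blocks and delete them one by one.
    val corrupt = dfs.listCorruptFileBlocks(new Path("/"))
    while (corrupt.hasNext) {
      val path = corrupt.next()
      println(s"deleting corrupt file: $path")
      dfs.delete(path, false)
    }
  }
}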

Adding a new namenode data directory to an existing cluster

我是研究僧i submitted on 2019-12-08 06:02:24
Question: What procedure do I need to follow to properly add a new NameNode data directory (dfs.name.dir, dfs.namenode.name.dir) to an existing production cluster? I have added the new path to the comma-delimited list in the hdfs-site.xml file, but when I try to start the NameNode I get the following error: Directory /data/nfs/dfs/nn is in an inconsistent state: storage directory does not exist or is not accessible. In my case, I have two directories already in place and working (/data/1/dfs/nn, /data/2…

CDH4: Version conflict: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected

不问归期 submitted on 2019-12-08 04:57:30
Question: I'm trying to upgrade from CDH3 to CDH4 and am getting a version conflict between compile time and run time. I'm getting this error:

Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected

From googling, it seems that my code is being compiled against Hadoop 1.x and is running on Hadoop 2.0. I'm compiling and running the app on the same Hadoop client, so it should all be Hadoop 2.0. Here's what I get from running…
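This error is the classic symptom of the Counter type changing between major versions: it was a class in Hadoop 1.x and became an interface in Hadoop 2.x. A quick diagnostic, sketched here in Scala with a made-up object name, is to ask the JVM which jar the class was actually loaded from at run time:

import org.apache.hadoop.mapreduce.Counter

object WhichJar {
  def main(args: Array[String]): Unit = {
    val klass = classOf[Counter]
    // True on Hadoop 2.x, false on Hadoop 1.x.
    println(s"Counter is an interface: ${klass.isInterface}")
    // The jar the runtime classloader actually resolved Counter from.
    val source = Option(klass.getProtectionDomain.getCodeSource).map(_.getLocation)
    println(s"loaded from: ${source.getOrElse("bootstrap classpath")}")
  }
}

If the printed location is a Hadoop 1.x artifact (e.g. a stale hadoop-core jar), removing it from the classpath and depending on the CDH4 client artifacts is the likely fix.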

Hadoop connectivity with SAS

烂漫一生 submitted on 2019-12-08 01:20:31
Question: I want to use the SAS/ACCESS 9.3M2 Interface to connect SAS with my Hive. My question is: does SAS import Hive cubes into the SAS environment and query them there, or does it hit Hive again at reporting time, running MapReduce and degrading my reporting performance beyond 2-4 seconds? If it imports Hive tables into its environment, what would the performance be compared with normal SQL cubes? I am totally new to SAS. I want my reports generated within 2-4 seconds, where my aggregated data…

Filtering input files using globStatus in MapReduce

孤人 submitted on 2019-12-07 20:58:30
Question: I have a lot of input files and I want to process only selected ones, based on the date appended at the end of each name. I am confused about where to use the globStatus method to filter out the files. I have a custom RecordReader class and was trying to use globStatus in its next method, but it didn't work out.

public boolean next(Text key, Text value) throws IOException {
  Path filePath = fileSplit.getPath();
  if (!processed) {
    key.set(filePath.getName());
    byte[] contents = new byte[(int)…
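Input selection like this normally happens in the job driver, before splits are computed, not inside the RecordReader. A hedged sketch in Scala — the helper name and path layout are made up, but the FileSystem.globStatus and FileInputFormat.addInputPath calls are the standard Hadoop API and exist identically in Java:

import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat

object GlobInputs {
  // Add only the files whose names end with the given date suffix.
  def addInputsForDate(job: Job, inputDir: String, date: String): Unit = {
    val fs = FileSystem.get(job.getConfiguration)
    // globStatus expands the wildcard against what actually exists in HDFS;
    // it can return null when nothing matches, hence the Option guard.
    val matches: Array[FileStatus] =
      Option(fs.globStatus(new Path(s"$inputDir/*$date"))).getOrElse(Array.empty[FileStatus])
    matches.foreach(status => FileInputFormat.addInputPath(job, status.getPath))
  }
}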

Is Hadoop in Docker container faster/worth it? [closed]

女生的网名这么多〃 submitted on 2019-12-07 15:28:53
Question: Closed. This question is opinion-based and is not currently accepting answers. Closed 3 years ago. I have a Hadoop-based environment. I use Flume, Hue, and Cassandra in this system. There is a lot of hype around Docker nowadays, so I would like to examine the pros and cons of dockerizing this setup. I think it should be much more portable, but it can be set using…

PhoenixOutputFormat not found when running a Spark Job on CDH 5.4 with Phoenix 4.5

别等时光非礼了梦想. submitted on 2019-12-07 14:26:37
Question: I managed to configure Phoenix 4.5 on Cloudera CDH 5.4 by recompiling the source code. sqlline.py works well, but there are problems with Spark.

spark-submit --class my.JobRunner \
  --master yarn --deploy-mode client \
  --jars `ls -dm /myapp/lib/* | tr -d ' \r\n'` \
  /myapp/mainjar.jar

The /myapp/lib folder contains the Phoenix core lib, which contains the class org.apache.phoenix.mapreduce.PhoenixOutputFormat. But it seems that the driver/executor cannot see it:

Exception in thread "main" java…
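One plausible cause, assuming the jar really is in /myapp/lib: in client deploy mode, --jars ships jars to the executors but does not add them to the driver's own classpath, so a driver-side lookup of PhoenixOutputFormat fails. The extraClassPath settings address that. A hedged Scala sketch follows (the jar name is illustrative; and since the driver JVM is already running in client mode, its entry must be passed at launch, e.g. via --driver-class-path, rather than set in code):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("phoenix-job")
  // Make the Phoenix classes visible on the executors' system classpath.
  // The driver-side equivalent must go on the spark-submit command line:
  //   --driver-class-path /myapp/lib/phoenix-core-4.5.jar
  .set("spark.executor.extraClassPath", "/myapp/lib/phoenix-core-4.5.jar")
val sc = new SparkContext(conf)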

Spark writing to hdfs not working with the saveAsNewAPIHadoopFile method

五迷三道 submitted on 2019-12-07 10:12:45
Question: I am using Spark 1.1.0 on CDH 5.2.0 and am trying to make sure that I can read from and write to HDFS. I quickly realized that .textFile and .saveAsTextFile call the old API and do not seem to be compatible with our HDFS version.

def testHDFSReadOld(sc: SparkContext, readFile: String) {
  // THIS WILL FAIL WITH
  // (TID 0, dl1rhd416.internal.edmunds.com): java.lang.IllegalStateException: unread block data
  // java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java…
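For reference, a minimal sketch of the new-API read/write pair the title refers to, written against the Spark 1.x Scala API. Paths and record types are illustrative; import SparkContext._ supplies the pair-RDD save method on pre-1.3 Spark:

import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

def testHDFSNewAPI(sc: SparkContext, readPath: String, writePath: String): Unit = {
  // Read via the new (mapreduce) API instead of sc.textFile.
  val lines = sc
    .newAPIHadoopFile[LongWritable, Text, TextInputFormat](readPath)
    .map { case (_, text) => text.toString }

  // Write via the new API instead of saveAsTextFile.
  lines
    .map(line => (NullWritable.get(), new Text(line)))
    .saveAsNewAPIHadoopFile[TextOutputFormat[NullWritable, Text]](writePath)
}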

HDFS as volume in cloudera quickstart docker

戏子无情 submitted on 2019-12-07 08:12:31
Question: I am fairly new to both Hadoop and Docker. I have been working on extending the cloudera/quickstart Docker image Dockerfile, and I wanted to mount a directory from the host and map it to an HDFS location, so that performance is increased and the data persist locally. When I mount a volume anywhere with -v /localdir:/someDir everything works fine, but that's not my goal. When I do -v /localdir:/var/lib/hadoop-hdfs both the datanode and the namenode fail to start and I get: "cd /var/lib/hadoop-hdfs:…