Cloudera

How To Automate Hadoop Trash Cleanup

Posted by 青春壹個敷衍的年華 on 2019-12-23 12:38:09
Question: I can clear the trash under my user folder by running hadoop fs -expunge. This gets rid of files that are older than the fs.trash.interval value. Is there a way for expunge to happen automatically to recover disk space? Also, I see the following output when I run expunge:

[cloudera@localhost conf]$ hadoop fs -expunge
14/07/17 15:43:54 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1 minutes, Emptier interval = 0 minutes.

The emptier interval is 0, which suggests that…
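For context: HDFS trash is purged automatically by a NameNode "emptier" thread; no cron job is needed when trash is enabled. The interval values below are illustrative, not recommendations — a sketch of the two relevant core-site.xml properties:

```xml
<!-- core-site.xml (values are examples only) -->
<property>
  <name>fs.trash.interval</name>
  <!-- minutes a trash checkpoint survives before deletion; 0 disables trash -->
  <value>1440</value>
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <!-- minutes between NameNode emptier runs; 0 means "same as fs.trash.interval",
       which matches the "Emptier interval = 0 minutes" line in the log above -->
  <value>60</value>
</property>
```

So an emptier interval of 0 does not mean the emptier is off; it means checkpointing runs at the deletion interval.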

Running wordcount sample using MRV1 on CDH4.0.1 VM

Posted by ぐ巨炮叔叔 on 2019-12-23 07:25:02
Question: I downloaded the VM from https://downloads.cloudera.com/demo_vm/vmware/cloudera-demo-vm-cdh4.0.0-vmware.tar.gz and found that the services listed below are running after the system boots.

MRV1 services: hadoop-0.20-mapreduce-jobtracker, hadoop-0.20-mapreduce-tasktracker
MRV2 services: hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-mapreduce-historyserver
HDFS services: hadoop-hdfs-namenode, hadoop-hdfs-datanode

The word count example runs fine and generates the output as expected. /usr/bin…
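For reference, this is how the MRV1 word count example is typically launched on such a VM; the jar path is an assumption about the CDH4 layout and may differ on your image. The last line simulates what wordcount computes using only standard tools, so the logic can be sanity-checked without a cluster:

```shell
# On the VM (jar path is a guess; locate it with: ls /usr/lib/hadoop-0.20-mapreduce/):
#   hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar wordcount input output
# Local simulation of the same per-word counting:
printf 'hadoop runs hadoop jobs\n' | tr ' ' '\n' | sort | uniq -c | sort -rn
```

Which of the two frameworks (MRV1 jobtracker/tasktracker vs. YARN) actually executes the job depends on which mapred configuration the client picks up, not on which daemons happen to be running.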

Cloudera: upload a File in the HDFS Exception

Posted by 三世轮回 on 2019-12-23 02:32:46
Question: I use Mac OS X Yosemite with the VM cloudera-quickstart-vm-5.4.2-0-virtualbox. When I type "hdfs dfs -put testfile.txt" to put a text file into HDFS, I get a DataStreamer exception. I notice that the main problem is that the number of nodes I have is null. I copy the complete error message below and would like to know how to solve this.

WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException):
File /user/cloudera…

Offline Installation of Cloudera (CDH 5.x)

Posted by 人走茶凉 on 2019-12-22 04:23:31
Before configuring a production environment, it is best to follow the official documentation strictly. For example, if the docs say a package is for RHEL 6 / CentOS 6, install it on version 6; otherwise all kinds of errors are likely. I fell into many pitfalls configuring CDH 5. The two most important points:

1. For the HP DL60 Gen9 server I initially wanted to install CentOS 7, but the official docs only support up to 6. After much fiddling I did get it installed (I will cover installing CentOS 7 on this server in a separate article).
2. CDH does not yet officially support CentOS 7. I found reports from people abroad who installed it successfully, but with various small problems to resolve. For example, CentOS 7 ships Python 2.7 while CDH 5 still uses 2.6, so packages are missing. In the end I reinstalled my server with CentOS 6.5.

================== Preparation ==================

1. Check and download the latest CDH version from http://archive.cloudera.com/cdh5/parcels/latest/ (still 5.3.3 as of 2015-05-30). For CentOS 6, download the files marked el6. Three files are needed: the parcel, the parcel.sha1, and manifest.json. After downloading, rename the .sha1 extension to .sha.
2. Download the matching CM version from http://archive-primary.cloudera.com/cm5/cm/5/
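The .sha1 → .sha rename from step 1 can be scripted. A minimal sketch, to be run in the download directory; the download URLs are shown as comments because the exact parcel filename must come from the archive listing:

```shell
# Downloads (exact filenames come from the archive.cloudera.com listing):
#   wget http://archive.cloudera.com/cdh5/parcels/latest/<parcel>
#   wget http://archive.cloudera.com/cdh5/parcels/latest/<parcel>.sha1
#   wget http://archive.cloudera.com/cdh5/parcels/latest/manifest.json
# Cloudera Manager expects the checksum file to end in .sha, not .sha1:
for f in *.parcel.sha1; do
  [ -e "$f" ] || continue      # no matches: the glob stays literal, skip it
  mv "$f" "${f%.sha1}.sha"
done
```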

Cloudera 5.6: Parquet does not support date. See HIVE-6384

Posted by 徘徊边缘 on 2019-12-22 04:21:23
Question: I am currently using Cloudera 5.6 and trying to create a Parquet-format table in Hive based off another table, but I am running into an error.

create table sfdc_opportunities_sandbox_parquet like sfdc_opportunities_sandbox STORED AS PARQUET

Error message: Parquet does not support date. See HIVE-6384

I read that Hive 1.2 has a fix for this issue, but Cloudera 5.6 and 5.7 do not come with Hive 1.2. Has anyone found a way around this issue?

Answer 1: Apart from using another data type like TIMESTAMP…
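The TIMESTAMP workaround mentioned in the answer can be sketched as follows. Since CREATE TABLE ... LIKE copies the DATE columns verbatim, spell the schema out instead and cast the offending columns; the column names here are invented for illustration:

```sql
-- Hypothetical schema: close_date is a DATE column in the source table.
CREATE TABLE sfdc_opportunities_sandbox_parquet
STORED AS PARQUET
AS
SELECT opportunity_id,
       amount,
       CAST(close_date AS TIMESTAMP) AS close_date   -- or CAST(... AS STRING)
FROM   sfdc_opportunities_sandbox;
```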

Connecting and Persisting to HBase

Posted by 筅森魡賤 on 2019-12-21 17:39:34
Question: I just tried to connect to HBase, which is part of the Cloudera VM, using a Java client (192.168.56.102 is the inet IP of the VM). I use VirtualBox with a host-only network setting, so I can access the web UI of the HBase master at http://192.168.56.102:60010/master.jsp. My Java client (which worked well on the VM itself) also established a connection to 192.168.56.102:2181. But when it calls getMaster, I get connection refused; see log:

11/09/14 11:19:30 INFO zookeeper.ZooKeeper: Initiating client connection…
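A common cause of this pattern (ZooKeeper reachable, getMaster refused) is that the master registers itself in ZooKeeper under the VM's internal hostname, which the client then fails to resolve or resolves to the wrong address. A hedged sketch of the usual fix, assuming the VM's hostname is localhost.localdomain (check the real value with hostname -f inside the VM):

```text
# /etc/hosts on the CLIENT machine -- map the name the master advertises
# to the VM's host-only IP (the hostname below is a guess):
192.168.56.102   localhost.localdomain
```

With the mapping in place, the Java client's hbase.zookeeper.quorum can keep pointing at 192.168.56.102; only the master address returned by ZooKeeper needed to become resolvable.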

Accessing HBase running in VM with a client on host system

Posted by 这一生的挚爱 on 2019-12-21 04:54:12
Question: I am trying to write some data to HBase with a client program. HBase and Hadoop run in a preconfigured VM from Cloudera on Ubuntu. The client runs on the system hosting the VM, and running the client directly in the VM works. So now I want to use the client outside the VM, with NAT, to access the servers on the VM. To be able to reach the servers running on the VM, such as the HBase master and HUE, I configured port forwarding in VirtualBox. Thus I can reach the overview pages of the HBase master, HUE, etc. To…

Issue in connecting kafka from outside

Posted by 末鹿安然 on 2019-12-21 02:41:02
Question: I am using the Hortonworks Sandbox as a Kafka server and trying to connect to Kafka from Eclipse with Java code. I use this configuration for the producer to send messages:

metadata.broker.list=sandbox.hortonworks.com:45000
serializer.class=kafka.serializer.DefaultEncoder
zk.connect=sandbox.hortonworks.com:2181
request.required.acks=0
producer.type=sync

where sandbox.hortonworks.com is the sandbox name I connect to. In the Kafka server.properties I changed this configuration: host.name=sandbox…
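One thing to check with forwarded ports: an old-style (0.8.x) Kafka broker hands clients the address it *advertises*, not the one they dialed, so the broker must advertise the name and port that are reachable from outside the sandbox. A sketch of the relevant server.properties entries, assuming the 45000 forwarding from the question (values are illustrative):

```text
# server.properties on the sandbox
host.name=sandbox.hortonworks.com
advertised.host.name=sandbox.hortonworks.com   # name resolvable from the client machine
advertised.port=45000                          # the forwarded port, not the internal 6667
```

The client machine also needs sandbox.hortonworks.com in its hosts file so both the bootstrap connection and the advertised address resolve to the same place.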

Oozie workflow: Hive table not found but it does exist

Posted by 流过昼夜 on 2019-12-21 02:34:53
Question: I have an Oozie workflow running on a CDH4 cluster of 4 machines (one master-for-everything, three "dumb" workers). The Hive metastore runs on the master using MySQL (the driver is present), and the Oozie server also runs on the master, likewise using MySQL. Using the web interface I can import and query Hive as expected, but when I run the same queries within an Oozie workflow they fail. Even adding "IF EXISTS" leads to the error below. I tried to add the connection information as properties to…
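"Table not found" inside Oozie but not from the shell usually means the launched Hive action never sees the real metastore configuration and falls back to a local empty one on whichever worker it lands on. A hedged sketch of a Hive action that ships the client hive-site.xml with the workflow; paths, the metastore URI, and the action name are illustrative:

```xml
<!-- workflow.xml: hive action (names and URI are examples, not the poster's values) -->
<action name="hive-query">
  <hive xmlns="uri:oozie:hive-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- copy the working client hive-site.xml into the workflow directory -->
    <job-xml>hive-site.xml</job-xml>
    <configuration>
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://master-host:9083</value>
      </property>
    </configuration>
    <script>query.q</script>
  </hive>
  <ok to="end"/>
  <error to="fail"/>
</action>
```

This way every worker that launches the action talks to the shared MySQL-backed metastore on the master instead of an empty Derby instance.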