hue

cannot configure HDFS address using gethue/hue docker image

回眸只為那壹抹淺笑 posted on 2019-12-11 07:32:17
Question: I'm trying to use the Hue docker image from gethue/hue, but it seems to ignore the configuration I give it and always looks for HDFS on localhost instead of the docker container I point it at. Here is some context: I'm using the following docker-compose service to launch an HDFS cluster:

hdfs-namenode:
  image: bde2020/hadoop-namenode:1.1.0-hadoop2.7.1-java8
  hostname: namenode
  environment:
    - CLUSTER_NAME=davidov
  ports:
    - "8020:8020"
    - "50070:50070"
  volumes:
    - ./data/hdfs/namenode:/hadoop/dfs
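One commonly suggested fix (a sketch, not from the excerpt above, which is cut off before any answer): run Hue in the same compose file and override its hue.ini so the HDFS section points at the namenode service instead of localhost. The mount path inside the gethue/hue image and the ini keys named in the comments are assumptions based on Hue's standard configuration layout:

hue:
  image: gethue/hue
  ports:
    - "8888:8888"
  links:
    - hdfs-namenode:namenode
  volumes:
    # Files in the conf directory override the bundled hue.ini; inside
    # z-hue.ini, set fs_defaultfs=hdfs://namenode:8020 and
    # webhdfs_url=http://namenode:50070/webhdfs/v1 under
    # [hadoop] -> [[hdfs_clusters]] -> [[[default]]].
    - ./conf/z-hue.ini:/usr/share/hue/desktop/conf/z-hue.ini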

Apache Hue: Integrating Hue with Oozie -- scheduling shell scripts with Hue -- scheduling Hive scripts with Hue -- scheduling MapReduce programs with Hue -- configuring scheduled tasks in Hue

醉酒当歌 posted on 2019-12-11 07:11:09
Integrating Hue with Oozie. Edit the Hue configuration file hue.ini:

[liboozie]
# The URL where the Oozie service runs on. This is required in order for
# users to submit jobs. Empty value disables the config check.
oozie_url=http://node-1:11000/oozie

# Requires FQDN in oozie_url if enabled
## security_enabled=false

# Location on HDFS where the workflows/coordinators are deployed when submitted.
# (The misspelled key name below is the one Hue itself expects.)
remote_deployement_dir=/user/root/oozie_works

[oozie]
# Location on local FS where the examples are stored.
# local_data_dir=/export/servers/oozie-4.1.0-cdh5.14.0/examples/apps
# Location on local FS where the data for the
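A minimal sketch of applying the change afterwards, assuming Hue was built from source under /export/servers/hue-3.9.0-cdh5.14.0 (the path used in the installation post later in this digest):

cd /export/servers/hue-3.9.0-cdh5.14.0
build/env/bin/supervisor    # restart Hue so the [liboozie]/[oozie] settings take effect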

Apache Hue: Detailed Introduction

狂风中的少年 posted on 2019-12-11 05:47:06
Introduction to Apache Hue. What Hue is: HUE = Hadoop User Experience. Hue is an open-source Apache Hadoop UI system that evolved from Cloudera Desktop; Cloudera later contributed it to the Apache Hadoop community. It is built on the Python web framework Django. Through Hue you can interact with a Hadoop cluster from a web console in the browser to analyze and process data, for example operating on data in HDFS, running MapReduce jobs, executing Hive SQL statements, browsing HBase databases, and so on.

What Hue can do:
- Access HDFS and browse files
- Debug and develop Hive from the web and display query results
- Query Solr, display results, and generate reports
- Debug and develop interactive Impala SQL queries from the web
- Debug and develop Spark
- Develop and debug Pig
- Develop, monitor, and coordinate/schedule Oozie workflows
- Query, modify, and display HBase data
- Query Hive metadata (metastore)
- Track MapReduce job progress and trace logs
- Create and submit MapReduce, Streaming, and Java jobs
- Develop and debug Sqoop2
- Browse and edit ZooKeeper
- Query and display databases (MySQL, PostgreSQL, SQLite, Oracle)

Hue's architecture: Hue is a friendly UI integration framework

Describe table shows “from deserializer” for column comments in Hue Hive Avro format

为君一笑 posted on 2019-12-11 05:07:02
Question: We have observed that when we store data in Avro format, the byte stream is converted to binary, and as a result all the column comments get converted to "from deserializer". We found a Jira bug for this issue as well, and a few comments confirm it was addressed in version 0.13. We are using Hive 1.1 (Cloudera), but we are still facing the issue. Jira: https://issues.apache.org/jira/browse/HIVE-6681 https://www.bountysource.com/issues/1320154-describe-on-a-table-returns-from-deserializer-for-column
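Since an Avro-backed table takes its column comments from the "doc" attributes of the Avro schema rather than from COMMENT clauses, one way to check whether the HIVE-6681 fix is active is a table whose schema carries a doc field. A hedged HiveQL sketch (table and field names are made up):

-- On a patched Hive, DESCRIBE should show "primary key" as the comment
-- instead of "from deserializer".
CREATE TABLE avro_comment_test
STORED AS AVRO
TBLPROPERTIES ('avro.schema.literal'='{
  "type": "record", "name": "avro_comment_test",
  "fields": [ {"name": "id", "type": "int", "doc": "primary key"} ]
}');
DESCRIBE avro_comment_test;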

Apache Hue Installation, Deployment, and Compilation

╄→尐↘猪︶ㄣ posted on 2019-12-11 04:09:33
1. Upload and extract the installation package. Hue can be installed in several ways, including from an rpm package, from a tar.gz package, or via Cloudera Manager; here we install from the tar.gz package. The Hue tarballs can be downloaded from http://archive.cloudera.com/cdh5/cdh/5/ . We use the build matching CDH 5.14.0, downloaded from http://archive.cloudera.com/cdh5/cdh/5/hue-3.9.0-cdh5.14.0.tar.gz

cd /export/servers/
tar -zxvf hue-3.9.0-cdh5.14.0.tar.gz

2. Pre-build initialization (requires network access)

2.1 Install the required dependency packages:

yum install -y asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make openldap-devel python-devel sqlite-devel gmp-devel

2.2 Initial Hue configuration (all required dependencies must be installed first). Go into the desktop directory under the Hue installation directory
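A minimal sketch of the build step this excerpt is leading up to, assuming the tarball was extracted to the path above (make apps is Hue's standard source-build target):

cd /export/servers/hue-3.9.0-cdh5.14.0
# edit desktop/conf/hue.ini to match your cluster first, then compile:
make apps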

Apache Hue Installation and Introduction

白昼怎懂夜的黑 posted on 2019-12-11 02:39:15
1. What Hue is: HUE = Hadoop User Experience. Hue is an open-source Apache Hadoop UI system that evolved from Cloudera Desktop; Cloudera later contributed it to the Apache Hadoop community. It is built on the Python web framework Django. Through Hue you can interact with a Hadoop cluster from a web console in the browser to analyze and process data, for example operating on data in HDFS, running MapReduce jobs, executing Hive SQL statements, browsing HBase databases, and so on.

2. What Hue can do:
1. Access HDFS and browse files
2. Debug and develop Hive from the web and display query results
3. Query Solr, display results, and generate reports
4. Debug and develop interactive Impala SQL queries from the web
5. Debug and develop Spark
6. Develop and debug Pig
7. Develop, monitor, and coordinate/schedule Oozie workflows
8. Query, modify, and display HBase data
9. Query Hive metadata (metastore)
10. Track MapReduce job progress and trace logs
11. Create and submit MapReduce, Streaming, and Java jobs
12. Develop and debug Sqoop2
13. Browse and edit ZooKeeper
14. Query and display databases (MySQL, PostgreSQL, SQLite, Oracle)

3. Hue's architecture

Example Oozie job works from Hue, but not from command line: SparkMain not found

﹥>﹥吖頭↗ posted on 2019-12-11 00:25:36
Question: I've successfully run the example Spark workflow ("Copy a file by launching a Spark Java program") provided in the Hue Oozie workflow editor (in the Cloudera 5.5.1 QuickStart VM). I'm now trying to run it manually using the oozie command-line tool:

oozie job -oozie http://localhost:11000/oozie -config job.properties -run

The workflow XML is basically unchanged; I have copied it to HDFS and have the following job.properties:

nameNode=hdfs://localhost:8020
jobTracker=localhost:8032
oozie.wf
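The excerpt is cut off before any resolution, but the explanation most often given for a SparkMain ClassNotFoundException in this situation is that Hue submits workflows with the Oozie system ShareLib (which contains the Spark action classes) enabled, while a bare oozie job invocation does not. A hedged one-line addition to job.properties:

# put the system ShareLib, which provides
# org.apache.oozie.action.hadoop.SparkMain, on the action classpath
oozie.use.system.libpath=true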

04. Integrating Apache Hue with Other Software

◇◆丶佛笑我妖孽 posted on 2019-12-10 20:54:29
1. Integrating Hue with HDFS. Note: after changing the HDFS configuration, scp the files to every machine in the cluster and restart the HDFS cluster.

Step 1: edit core-site.xml and add the following:

cd /export/servers/hadoop-2.6.0-cdh5.14.0/etc/hadoop
vim core-site.xml

<!-- Hosts allowed to access HDFS via httpfs -->
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<!-- User groups allowed to access HDFS via httpfs -->
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>

Step 2: edit hdfs-site.xml and add the following:

cd /export/servers/hadoop-2.6.0-cdh5.14.0/etc/hadoop
vim hdfs-site.xml

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

Copy the two modified files to the same path on the other two nodes: scp core-site.xml
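For completeness, the Hue side of this integration lives in hue.ini; a hedged sketch, assuming the NameNode runs on node-1 (the hostname and ports must match your cluster):

[hadoop]
  [[hdfs_clusters]]
    [[[default]]]
      fs_defaultfs=hdfs://node-1:8020
      webhdfs_url=http://node-1:50070/webhdfs/v1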

Obtain date from timestamp

♀尐吖头ヾ posted on 2019-12-10 17:48:18
Question: I have a date field like this: 2017-03-22 11:09:55 (column name: install_date). I have another date field with a date like this: 2017-04-20 (column name: test_date). I would like to obtain only the date part of the first field (2017-03-22) so that I can perform a DATEDIFF between install_date and test_date.

Answer 1: Assuming you are looking for this in Hive, you can use the TO_DATE function. TO_DATE('2000-01-01 10:20:30') returns '2000-01-01'. NOTE: the input to TO_DATE is a string.

Source: https://stackoverflow
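Applying the answer to the question's own columns, a short HiveQL sketch (the table name is a placeholder):

SELECT DATEDIFF(test_date, TO_DATE(install_date)) AS days_between
FROM some_table;
-- e.g. DATEDIFF('2017-04-20', '2017-03-22') returns 29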

Can't instantiate SparkSession on EMR 5.0 HUE

旧城冷巷雨未停 posted on 2019-12-10 10:16:32
Question: I'm running an EMR 5.0 cluster and I'm using HUE to create an OOZIE workflow to submit a SPARK 2.0 job. I have run the job with spark-submit directly on YARN and as a step on the same cluster with no problem. But when I do it with HUE I get the following error:

java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.internal.SessionState':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:949)
at org.apache.spark
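The excerpt ends before any answer, but the workaround most often cited for Spark 2 actions on early EMR 5.x is that Oozie's bundled ShareLib targets Spark 1.x, so the job needs a ShareLib directory built from the Spark 2 jars. The paths, the lib_<timestamp> placeholder, and the spark2 label below are all assumptions:

# copy the Spark 2 jars into a new sharelib directory on HDFS
hdfs dfs -mkdir -p /user/oozie/share/lib/lib_<timestamp>/spark2
hdfs dfs -put /usr/lib/spark/jars/* /user/oozie/share/lib/lib_<timestamp>/spark2/
# make Oozie pick up the new directory
oozie admin -oozie http://localhost:11000/oozie -sharelibupdate
# then route the Spark action to it in job.properties:
# oozie.action.sharelib.for.spark=spark2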