HDFS

Practices in Building Ctrip's Real-Time Intelligent Detection Platform

Submitted by 大城市里の小女人 on 2020-01-06 14:36:14
I. Background

1. Problems with rule-based alerting

Most monitoring platforms implement alerting on monitored metrics through rules. Rule-based alerts are generally grounded in statistics, for example firing when a metric rises or falls, year-over-year or period-over-period, past a given threshold for several consecutive intervals. To configure thresholds with any accuracy, users must already be quite familiar with the shape of their business metrics, so configuring rule-based alerts is tedious, the alerting quality is poor, and a great deal of manpower is needed to maintain the rules. When an alert fires, significant effort is also spent verifying whether it is genuine and deciding whether the threshold needs to be re-tuned. At Ctrip, rule-based alerting brings further problems: Ctrip has three company-level monitoring platforms alone, and each business unit also builds its own monitoring platform for its own requirements and scenarios. With more than a dozen monitoring platforms of different sizes inside Ctrip, configuring metrics on every one of them is extremely cumbersome for users.

II. Prophet

To address these problems with rule-based alerting, Ctrip built its own real-time intelligent anomaly detection platform, Prophet. The inspiration for Prophet came from Facebook's Prophet, but the implementation differs from it.

1. A one-stop anomaly detection solution

First, Prophet takes time-series data as its input. Second, Prophet integrates with monitoring platforms as its consumers, with the goal of eliminating rules. It detects anomalies intelligently with deep-learning algorithms and detects them in real time on a real-time computation engine, providing a unified anomaly detection solution.

2. Prophet system architecture

How to prevent committing an empty Avro file to HDFS?

Submitted by 风流意气都作罢 on 2020-01-06 09:02:59
Question: I have a job that creates an Avro file in HDFS and appends data to it. Occasionally, however, there is no data to append; in that case I don't want the application to flush and close the file. Instead it should check whether the file is empty (although I assume the Avro schema is written into the header, so technically it is not an empty file) and delete it if so. Is this feasible with the Avro + HDFS libraries?

Answer 1: Try using LazyOutputFormat when specifying the output
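A minimal sketch of that suggestion with the new MapReduce API and the avro-mapred library: wrapping AvroKeyOutputFormat in LazyOutputFormat makes Hadoop create the output file (and write the Avro header) only when the first record is actually emitted, so a run with no data leaves no empty file to delete. The mapper, schema, and paths below are illustrative assumptions, not from the original question.

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;

public class LazyAvroJob {

    // Hypothetical mapper: emits a record only for non-empty input lines,
    // so some runs may legitimately produce no output at all.
    public static class FilterMapper
            extends Mapper<LongWritable, Text, AvroKey<CharSequence>, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (value.getLength() > 0) {
                context.write(new AvroKey<CharSequence>(value.toString()), NullWritable.get());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "lazy-avro-output");
        job.setJarByClass(LazyAvroJob.class);
        job.setMapperClass(FilterMapper.class);
        job.setNumReduceTasks(0); // map-only job

        // Illustrative output schema; replace with the real record schema.
        AvroJob.setOutputKeySchema(job, Schema.create(Schema.Type.STRING));
        job.setOutputValueClass(NullWritable.class);

        // The key line: wrap the Avro output format in LazyOutputFormat so the
        // part file (and its Avro header) is created only when the first record
        // is written. Runs with no data then leave no empty file behind.
        LazyOutputFormat.setOutputFormatClass(job, AvroKeyOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}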

I am trying to format the namenode in HDFS but it says: permission denied

Submitted by 爱⌒轻易说出口 on 2020-01-06 01:35:59
Question: I am trying to format the namenode. For this I have tried:

hduser@Ubuntu:/usr/hadoop/hadoop-2.7.1$ bin/hdfs namenode -format

It says:

bin/hdfs: line 304: /root/software/jdk1.8.0_45/bin/java: Permission denied
bin/hdfs: line 304: exec: /root/software/jdk1.8.0_45/bin/java: cannot execute: Permission denied

Answer 1: So, you have an odd install. It looks like you are trying to reference a JDK installation that is installed to /root (this is very unusual). You are getting permission denied because you

HDFS API

Submitted by ε祈祈猫儿з on 2020-01-05 19:36:18
HDFS API in Detail

In Hadoop, the classes for file manipulation live almost entirely in the org.apache.hadoop.fs package. The operations these APIs support include opening files, reading and writing files, deleting files, and so on. The interface class that the Hadoop library ultimately exposes to users is FileSystem. It is an abstract class, and concrete instances can only be obtained through the class's get method. The get method has several overloads; the most commonly used is:

static FileSystem get(Configuration conf);

This class encapsulates almost all file operations, such as mkdir and delete. From the above, the basic skeleton of a program that manipulates files is:

operator() {
    obtain a Configuration object
    obtain a FileSystem object
    perform the file operations
}

6.1 Uploading a local file

FileSystem.copyFromLocalFile(Path src, Path dst) uploads a local file to the specified location on HDFS, where src and dst are both complete file paths. A concrete example follows:

package com.hebut.file;
import org.apache.hadoop.conf.Configuration;
import org.apache
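The excerpt cuts off at this point. As a stand-in, here is a minimal sketch of the pattern the article describes (get a Configuration, get a FileSystem from it, call copyFromLocalFile) with illustrative local and HDFS paths; it is not the article's original example class.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyLocalToHdfs {
    public static void main(String[] args) throws Exception {
        // Obtain a Configuration object (reads core-site.xml/hdfs-site.xml from
        // the classpath), then obtain the FileSystem from it.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Illustrative paths; replace with the real local source and HDFS destination.
        Path src = new Path("/tmp/local-file.txt");
        Path dst = new Path("/user/hadoop/local-file.txt");

        // Perform the file operation: upload the local file to HDFS.
        fs.copyFromLocalFile(src, dst);
        fs.close();
    }
}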

Cloudera Manager and hdfs-site.xml

Submitted by 谁说胖子不能爱 on 2020-01-05 05:58:30
Question: When using Cloudera Manager I can access the hdfs-site.xml file via: Cloudera Manager > Cluster > HDFS > Instances > (NameNode, for example) > Processes > Configuration Files > hdfs-site.xml. The URL then points to: http://quickstart.cloudera:7180/cmf/process/8/config?filename=hdfs-site.xml. Is this file accessible directly via the file system, and if yes, where is it located?

Answer 1: The configurations set in Cloudera Manager are stored in the Cloudera Manager database. They are not persisted
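As a side note, independent of where Cloudera Manager materializes the generated files, a Hadoop client can report which resource a given HDFS property was actually loaded from. A small sketch, assuming an ordinary Hadoop client classpath; the property name is just an example.

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class WhereIsMyHdfsSite {
    public static void main(String[] args) {
        // HdfsConfiguration pulls hdfs-default.xml and hdfs-site.xml from the
        // client classpath (on a gateway host this is typically the deployed
        // client configuration, e.g. under /etc/hadoop/conf).
        Configuration conf = new HdfsConfiguration();

        String key = args.length > 0 ? args[0] : "dfs.replication"; // example property
        String value = conf.get(key);

        // getPropertySources reports which resource supplied the value, e.g.
        // "hdfs-default.xml" or the path of the hdfs-site.xml that was loaded.
        String[] sources = conf.getPropertySources(key);
        System.out.println(key + " = " + value + " (from " + Arrays.toString(sources) + ")");
    }
}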

What are some common HDFS commands that can be mapped in the bash files?

Submitted by 只愿长相守 on 2020-01-05 05:54:54
Question: I am relatively new to Hadoop and I have been using the HDFS CLI a lot. Commands like hdfs dfs -ls are becoming tedious to type. Is it possible to create an alias for this command (e.g., h -ls) in either the .bashrc or .bash_profile files? Are there any other useful commands that I can map here?

Answer 1: The good practice is to put aliases in .bash_aliases. For your problem, I'd put alias h="hdfs dfs" in my .bash_aliases file (create it if it doesn't exist). Most distribs will already have this in

What should be the size of the file in HDFS for best MapReduce job performance

Submitted by 两盒软妹~` on 2020-01-05 02:57:10
Question: I want to copy text files from external sources to HDFS. Let's assume that I can combine and split the files based on their size; what should the size of the text files be for the best custom MapReduce job performance? Does size matter?

Answer 1: HDFS is designed to support very large files, not small files. Applications that are compatible with HDFS are those that deal with large data sets. These applications write their data only once but read it one or more times, and require these reads to
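To see why size matters in concrete terms: with FileInputFormat-style splitting, each HDFS block of an input file typically becomes one input split and therefore one map task, so files much smaller than the block size still cost a full mapper each. A small sketch, assuming a reachable HDFS and an existing file path, that prints a file's length, block size, and approximate split count:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockCount {
    public static void main(String[] args) throws Exception {
        // Illustrative default path; pass a real HDFS file as the first argument.
        Path file = new Path(args.length > 0 ? args[0] : "/user/hadoop/input/data.txt");

        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(file);

        long length = status.getLen();          // file size in bytes
        long blockSize = status.getBlockSize(); // e.g. 128 MB by default
        long blocks = (length + blockSize - 1) / blockSize;

        // With FileInputFormat-style splitting, each block is roughly one split,
        // i.e. one map task; many small files mean many tiny, inefficient mappers.
        System.out.printf("%s: %d bytes, block size %d, ~%d split(s)%n",
                file, length, blockSize, blocks);
        fs.close();
    }
}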

How to give custom name to Sqoop output files

Submitted by 妖精的绣舞 on 2020-01-04 17:33:06
Question: When I import data to Hive using Sqoop, by default it creates files named part-m-0000, part-m-0001, etc. on HDFS. Is it possible to rename these files? If I wish to give them a more meaningful name, like suffixing the file name with a date to indicate the load, how can I do it? Please suggest.

Answer 1: You can't do it with Sqoop directly, but you can rename them in HDFS after Sqoop is done importing:

today=`date +%Y-%m-%d`
files=$(hadoop fs -ls /path-to-files | awk '{print $8}')
for f in $files; do hadoop fs -mv $f $f
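For completeness, the same post-import rename can be done programmatically with the HDFS FileSystem API. The sketch below is an illustration under assumed paths and naming (Sqoop itself offers no such option), not part of the original answer.

import java.time.LocalDate;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RenameSqoopParts {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Hypothetical import directory; point this at the Sqoop target dir.
        Path importDir = new Path(args.length > 0 ? args[0] : "/path-to-files");
        String suffix = "-" + LocalDate.now(); // e.g. "-2020-01-04"

        // Rename every part file written by the Sqoop mappers, appending the date.
        for (FileStatus status : fs.listStatus(importDir)) {
            String name = status.getPath().getName();
            if (name.startsWith("part-")) {
                fs.rename(status.getPath(), new Path(importDir, name + suffix));
            }
        }
        fs.close();
    }
}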

Spark Hadoop Failed to get broadcast

Submitted by 别等时光非礼了梦想. on 2020-01-04 06:15:16
Question: I'm running a spark-submit job and receiving a "Failed to get broadcast_58_piece0..." error. I'm really not sure what I'm doing wrong. Am I overusing UDFs? Is my function too complicated? As a summary of my objective: I am parsing text from PDFs, which are stored as base64-encoded strings in JSON objects. I'm using Apache Tika to get the text, and trying to make copious use of data frames to make things easier. I had written a piece of code that ran the text extraction through Tika as a function