Getting data directly from a website into HDFS

Submitted by 放肆的年华 on 2019-12-13 10:42:33

Question


How do I get data that users are entering on a website into HDFS concurrently, as it arrives?


Answer 1:


If you need highly available reads and writes, you can use HBase to store the data.

If you are exposing a REST API, you can store the data directly in HBase, since HBase ships a dedicated REST server that can write into HBase tables (a minimal sketch follows the feature list below).

HBase gives you:

1) Linear and modular scalability.
2) Strictly consistent reads and writes.
3) Automatic and configurable sharding of tables.

For more about HBase: https://hbase.apache.org/
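
As an illustration, here is a minimal sketch of writing one row through the HBase REST server using Python's requests library. The REST host, table name, column family, and row key are all assumptions, and the REST server must already be running (started with `hbase rest start`):

```python
import base64
import requests

def b64(s: str) -> str:
    """The HBase REST API expects base64-encoded keys, columns, and values."""
    return base64.b64encode(s.encode("utf-8")).decode("ascii")

# Assumed REST server location, table, and column family -- adjust to your cluster.
HBASE_REST = "http://hbase-rest-host:8080"
TABLE = "web_events"
row_key = "user123-20191213"

payload = {
    "Row": [{
        "key": b64(row_key),
        "Cell": [
            {"column": b64("d:page"),   "$": b64("/checkout")},
            {"column": b64("d:action"), "$": b64("click")},
        ],
    }]
}

# PUT /<table>/<row> inserts or updates the cells given in the body.
resp = requests.put(f"{HBASE_REST}/{TABLE}/{row_key}",
                    json=payload,
                    headers={"Accept": "application/json"})
resp.raise_for_status()
```

Your web backend would call something like this on each incoming request, so the data lands in HBase (and thus on HDFS) as it arrives.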

Alternatively, if you want to stream data into HDFS from any source, you can look into the Confluent Platform (which bundles Kafka) and use it to store the data in HDFS.
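
For instance, a minimal producer sketch using the confluent-kafka Python client might look like the following; the broker address, topic name, and event fields are assumptions, and a Kafka-to-HDFS sink (such as the Confluent HDFS connector, see the second answer) would then drain the topic to HDFS:

```python
import json
from confluent_kafka import Producer

# Assumed broker address -- substitute your own.
producer = Producer({"bootstrap.servers": "broker1:9092"})

def on_delivery(err, msg):
    if err is not None:
        print(f"delivery failed: {err}")

event = {"user": "user123", "page": "/checkout", "ts": "2019-12-13T10:42:33Z"}

# produce() is asynchronous; flush() blocks until the message is acknowledged.
producer.produce("web_events",
                 value=json.dumps(event).encode("utf-8"),
                 callback=on_delivery)
producer.flush()
```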




Answer 2:


This entirely depends on what data you have and how willing you are to maintain extra tools on top of Hadoop.

If you're just accepting events from a log file, then Flume, Fluentd, or Filebeat are your best options.

If you are accepting client-side events, such as clicks or mouse movements, then you need some backend server to accept those requests, for example a Flume TCP source. You probably want some kind of authenticating endpoint in front of that service, though, to prevent random external messages from reaching your event channel.
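
As a minimal sketch of such a front end, here is a hypothetical Flask endpoint that checks a shared token and forwards accepted events onward. It targets Flume's HTTP source (with the default JSONHandler) rather than the TCP source mentioned above, since the HTTP source accepts JSON directly; the host, port, route, and token are all assumptions:

```python
import json
from flask import Flask, abort, request
import requests

app = Flask(__name__)

# Placeholders: the address of a Flume HTTP source and a shared secret.
FLUME_HTTP_SOURCE = "http://flume-host:44444"
API_TOKEN = "change-me"

@app.route("/events", methods=["POST"])
def accept_event():
    # Reject anything without the expected token so random external
    # traffic never reaches the event channel.
    if request.headers.get("X-Api-Token") != API_TOKEN:
        abort(401)
    event = request.get_json(force=True)
    # Flume's HTTP source (with the default JSONHandler) expects a JSON
    # array of {"headers": ..., "body": ...} events.
    requests.post(FLUME_HTTP_SOURCE,
                  json=[{"headers": {"source": "web"}, "body": json.dumps(event)}],
                  timeout=5)
    return "", 204
```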

You can also use Kafka. The Kafka REST Proxy (by Confluent) can accept REST requests and produce them to a Kafka topic, and the Kafka Connect HDFS sink (also by Confluent) can consume from Kafka and write messages to HDFS in near real time, much like Flume.
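
As a sketch under assumed hostnames and ports (Confluent's defaults of 8082 for the REST Proxy and 8083 for Kafka Connect), the snippet below first produces a JSON record through the REST Proxy's v2 API and then registers an HDFS sink connector via the Kafka Connect REST interface; the topic name, connector name, and NameNode URL are placeholders:

```python
import requests

REST_PROXY = "http://rest-proxy:8082"   # assumed Kafka REST Proxy address
CONNECT = "http://connect:8083"         # assumed Kafka Connect address
TOPIC = "web_events"

# 1) Produce a JSON record to Kafka through the REST Proxy (v2 API).
resp = requests.post(
    f"{REST_PROXY}/topics/{TOPIC}",
    headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
    json={"records": [{"value": {"user": "user123", "page": "/checkout"}}]},
)
resp.raise_for_status()

# 2) Register an HDFS sink connector so Kafka Connect drains the topic to HDFS.
connector = {
    "name": "hdfs-sink-web-events",
    "config": {
        "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
        "tasks.max": "1",
        "topics": TOPIC,
        "hdfs.url": "hdfs://namenode:8020",   # assumed NameNode address
        "flush.size": "1000",
    },
}
requests.post(f"{CONNECT}/connectors", json=connector).raise_for_status()
```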

Other options include Apache NiFi or StreamSets; again, the pattern is a TCP or HTTP event-source listener paired with an HDFS destination processor.



Source: https://stackoverflow.com/questions/49726697/getting-data-directly-from-a-website-to-a-hdfs
