Can Apache Sqoop and Flume be used interchangeably?

Submitted by 点点圈 on 2019-12-10 09:40:53

Question


I am new to big data. According to some of the answers to "What's the difference between Flume and Sqoop?", both Flume and Sqoop can pull data from a source and push it to Hadoop. Can anyone please specify exactly where Flume is used and where Sqoop is? Can both be used for the same tasks?


Answer 1:


Flume and Sqoop are designed to work with different kinds of data sources.

Sqoop works with any RDBMS that supports JDBC connectivity. Flume, on the other hand, works well with streaming data sources, such as log data that is generated continuously in your environment.

Specifically,

  • Sqoop can be used to import data from, and export data to, RDBMSs such as Oracle, MS SQL Server, MySQL, PostgreSQL, Netezza, Teradata, and others that support JDBC connectivity.
  • Flume can be used to ingest high-throughput data from sources such as those listed below and write it to destinations (sinks) such as those listed below.
    • Commonly used Flume sources:
      • Spooling directory: a directory in which many files are being created; used mostly for collecting and aggregating log data
      • JMS: collect metrics from JMS-based systems
      • And lots more
    • Commonly used Flume sinks:
      • HDFS
      • HBase
      • Solr
      • Elasticsearch
      • And lots more
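
To make the split above concrete, here is a minimal Python sketch that only builds text and does not run anything on a cluster. The first helper assembles the argument list for a `sqoop import` run against a JDBC source; the second renders a Flume agent properties file that wires a spooling directory source through a memory channel to an HDFS sink. The helper names, the agent name `a1`, the component names `src`, `ch` and `sink`, and every host, path, user and table name are hypothetical and exist only for illustration; the Sqoop options and Flume property keys are the commonly documented ones, but check them against the versions you run.

```python
import shlex


def sqoop_import_argv(jdbc_url, username, password_file, table, target_dir, mappers=4):
    """Build the argument list for a `sqoop import` run (nothing is executed here)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the source RDBMS
        "--username", username,
        "--password-file", password_file,  # file that holds the database password
        "--table", table,                  # table to copy into Hadoop
        "--target-dir", target_dir,        # HDFS directory that receives the rows
        "--num-mappers", str(mappers),     # number of parallel map tasks
    ]


def flume_spooldir_to_hdfs(agent, spool_dir, hdfs_path):
    """Render Flume agent properties: spooling directory -> memory channel -> HDFS."""
    props = {
        "sources": "src",
        "channels": "ch",
        "sinks": "sink",
        "sources.src.type": "spooldir",
        "sources.src.spoolDir": spool_dir,  # directory where new log files appear
        "sources.src.channels": "ch",
        "channels.ch.type": "memory",
        "sinks.sink.type": "hdfs",
        "sinks.sink.hdfs.path": hdfs_path,  # HDFS location for the collected events
        "sinks.sink.channel": "ch",
    }
    return "\n".join(f"{agent}.{key} = {value}" for key, value in props.items())


if __name__ == "__main__":
    # All values below are placeholders, not real systems.
    argv = sqoop_import_argv(
        "jdbc:mysql://db.example.com/sales", "etl_user",
        "/user/etl/.db-password", "orders", "/data/raw/orders",
    )
    print(shlex.join(argv))
    print(flume_spooldir_to_hdfs("a1", "/var/log/app", "hdfs://namenode/flume/events"))
```

The printed Sqoop line is what you would run from a shell, and the printed properties are what you would save to a file and pass to `flume-ng agent`. The Sqoop side is a one-off batch job against a table, while the Flume side describes a long-running agent that keeps watching for new files.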

No, the two tools cannot be used for the same task: for example, Flume cannot be used with databases, and Sqoop cannot be used with streaming data sources or flat files.

If you are interested, Flume also has an alternative that does the same thing, called Chukwa.



Source: https://stackoverflow.com/questions/27162380/can-apache-sqoop-and-flume-be-used-interchangeably
