Apache Drill vs Spark

有刺的猬 2021-02-05 14:30

I have some experience with Apache Spark and Spark SQL. Recently I've found the Apache Drill project. Could you describe what are the most significant advantages/differences bet

3 answers
  •  天命终不由人
    2021-02-05 14:55

    Drill lets you query many different kinds of datasets with ANSI SQL. That makes it great for ad hoc data exploration and for connecting BI tools to datasets via ODBC. You can even use Drill to SQL JOIN different kinds of datasets. For example, you could join records in a MySQL table with rows in a JSON file, a CSV file, OpenTSDB, or MapR-DB. The list goes on: Drill can connect to lots of different types of data sources.
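
To make the cross-source join concrete, here is a minimal Python sketch that builds such a query and submits it through Drill's REST endpoint (`POST /query.json`). It assumes a Drillbit on `localhost:8047`, a storage plugin registered under the name `mysql`, and a JSON file at `/data/orders.json`; the table, column, and path names are hypothetical.

```python
import json
import urllib.request

# Assumption: a local Drillbit with the web/REST port at its default, 8047.
DRILL_URL = "http://localhost:8047/query.json"


def build_join_query(mysql_table, json_path):
    """Build a SQL JOIN between a MySQL table and a JSON file.

    `mysql` and `dfs` are storage plugin names; `mysql` is whatever name
    the plugin was registered under (hypothetical here), and the column
    names are made up for illustration.
    """
    return (
        "SELECT c.id, c.name, o.total "
        f"FROM mysql.{mysql_table} AS c "
        f"JOIN dfs.`{json_path}` AS o ON c.id = o.customer_id"
    )


def build_payload(sql):
    """Wrap a SQL string in the JSON body the REST query endpoint expects."""
    return {"queryType": "SQL", "query": sql}


def run_query(sql, url=DRILL_URL):
    """POST the query to Drill and return the result rows (needs a live Drillbit)."""
    body = json.dumps(build_payload(sql)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["rows"]


sql = build_join_query("shop.customers", "/data/orders.json")
print(sql)
```

Calling `run_query(sql)` against a running Drillbit would return the joined rows as a list of dicts; neither source has to be loaded or converted first.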

    When I reach for Spark, it's typically because I want RDDs (resilient distributed datasets). RDDs make it easy to process a lot of data quickly. Spark also has a bunch of libraries for ML and streaming. Drill, by contrast, isn't a general-purpose data processing framework. It just gets you SQL access to said data. You could use Drill to pull data into Spark, TensorFlow, PySpark, Tableau, etc.
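
As a rough sketch of the "pull data into Spark" idea, the snippet below prepares options for Spark's generic JDBC reader pointed at Drill. It assumes the Drill JDBC driver jar is on Spark's classpath and that a Drillbit listens on the default user port 31010; the helper names and the file path are hypothetical, and depending on versions Spark's identifier quoting may need extra tuning to work with Drill.

```python
def drill_jdbc_options(host, query, port=31010):
    """Options for spark.read.format("jdbc") targeting a Drillbit directly.

    Assumption: direct Drillbit connection on the default user port 31010.
    """
    return {
        "url": f"jdbc:drill:drillbit={host}:{port}",
        "driver": "org.apache.drill.jdbc.Driver",
        "query": query,  # the "query" option requires Spark 2.4 or later
    }


def load_from_drill(spark, options):
    """Return a Spark DataFrame backed by a Drill query.

    `spark` is an existing SparkSession; this needs PySpark and a live
    Drillbit, so it is defined here but not called.
    """
    return spark.read.format("jdbc").options(**options).load()


# Hypothetical path: Drill exposes the JSON file as a table via the dfs plugin.
opts = drill_jdbc_options("localhost", "SELECT * FROM dfs.`/data/orders.json`")
print(opts["url"])
```

From there, `load_from_drill(spark, opts).rdd` would hand the rows to Spark as an RDD, so Drill does the data access and Spark does the heavy processing.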
