apache-drill

Can pyarrow write multiple parquet files to a folder like fastparquet's file_scheme='hive' option?

Submitted by 天大地大妈咪最大 on 2019-12-10 17:53:16

Question: I have a multi-million-record SQL table that I'm planning to write out to many Parquet files in a folder, using the pyarrow library. The data seems too large to store in a single Parquet file. However, I can't find an API or parameter in pyarrow that lets me specify something like: file_scheme="hive" as supported by the fastparquet Python library. Here's my sample code: #!/usr/bin/python import pyodbc import pandas as pd import pyarrow as pa import

Apache Drill 1.2 and Oracle JDBC

Submitted by 混江龙づ霸主 on 2019-12-08 03:44:25

Using Apache Drill v1.2 and Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 (64-bit) in embedded mode. I'm curious whether anyone has had success connecting Apache Drill to an Oracle DB. I've updated drill-override.conf with the following configuration (per the documentation): drill.exec: { cluster-id: "drillbits1", zk.connect: "localhost:2181", drill.exec.sys.store.provider.local.path = "/mypath" } and placed the ojdbc6.jar in \apache-drill-1.2.0\jars\3rdparty . I can successfully create the storage plug-in: { "type": "jdbc", "driver": "oracle.jdbc.driver.OracleDriver", "url": "jdbc
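Two things worth double-checking here. First, inside the drill.exec: { ... } block the nested key is normally written as sys.store.provider.local.path rather than the fully qualified drill.exec.sys.store.provider.local.path. Second, based on the Drill documentation for the JDBC storage plugin (which was introduced in Drill 1.2), an Oracle plugin definition usually has roughly this shape; the hostname, port, SID, and credentials below are placeholders, not values from the question:

```json
{
  "type": "jdbc",
  "driver": "oracle.jdbc.driver.OracleDriver",
  "url": "jdbc:oracle:thin:@localhost:1521:orcl",
  "username": "scott",
  "password": "tiger",
  "enabled": true
}
```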

How to start drillbit locally in distributed mode?

Submitted by 不羁的心 on 2019-12-06 07:20:36

I downloaded Apache Drill v1.8 and edited conf/drill-override.conf with the following changes: drill.exec: { cluster-id: "drillbits1", zk.connect: "10.178.23.140:2181,10.178.23.140:2182,10.178.23.140:2183,10.178.23.140:2184" } The ZooKeeper cluster consists of 4 ZooKeeper instances started on the same machine I'm trying to start Drill on (i.e. I'm using a single machine, whose IP is 10.178.23.140, for both Apache Drill and the ZooKeeper cluster). Yet I keep getting this error: Exception in thread "main" org.apache.drill.exec.exception.DrillbitStartupException: Failure
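Startup failures in this area often trace back to a malformed zk.connect string (a stray space, a missing port, a trailing comma). A small hypothetical helper, not part of Drill, to sanity-check the string before restarting the drillbit:

```python
def validate_zk_connect(zk_connect: str) -> list[tuple[str, int]]:
    """Parse a Drill zk.connect string into (host, port) pairs,
    raising ValueError on any malformed entry."""
    pairs = []
    for entry in zk_connect.split(","):
        host, sep, port = entry.strip().partition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"malformed zk.connect entry: {entry!r}")
        pairs.append((host, int(port)))
    return pairs

# The connection string from the question parses cleanly:
print(validate_zk_connect(
    "10.178.23.140:2181,10.178.23.140:2182,"
    "10.178.23.140:2183,10.178.23.140:2184"))
```

If this parses but the drillbit still fails, the next thing to verify is that all four ZooKeeper instances are actually reachable on those ports.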

How to write custom storage plugin for apache drill

Submitted by 丶灬走出姿态 on 2019-12-05 16:17:43

I have my data in a proprietary format, none of the ones supported by Apache Drill. Are there any tutorials on how to write my own storage plugin to handle such data? This is something that really should be in the docs but currently is not. The interface isn't too complicated, but it can be a bit much to look at one of the existing plugins and understand everything that is going on. There are two major components to writing a storage plugin: exposing information to the query planner and schema management system, and then actually implementing the translation from the data-source API to the Drill

Apache Drill connection through Java

Submitted by 笑着哭i on 2019-12-05 07:26:44

Throughout the Apache Drill wiki, I could only see queries being run via the SqlLine client. Is there any programmatic way to run queries in Drill other than the REST API? Any samples or pointers? Or is it simply a matter of using the JDBC driver to run SQL queries? You can use the Drill JDBC driver, which is documented here: http://drill.apache.org/docs/using-the-jdbc-driver/ Note that if you're building your Java program with Maven, you'll need to install the Drill dependencies locally: mvn install:install-file -Dfile=/opt/apache-drill-1.0.0/jars/drill-java-exec-1.0.0-rebuffed.jar -DgroupId=org
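As the question notes, the REST API is the other programmatic route: Drill accepts a POST to /query.json with a JSON body of the form {"queryType": "SQL", "query": ...} on its web port (8047 by default). A sketch that builds, but does not send, such a request; the host and the sample query are illustrative, and a running drillbit would be needed to actually execute it:

```python
import json
import urllib.request

def build_drill_query_request(sql: str,
                              host: str = "localhost",
                              port: int = 8047) -> urllib.request.Request:
    """Build a POST request for Drill's REST query endpoint.
    Assumes a drillbit with the web UI enabled on the default port."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_drill_query_request("SELECT * FROM cp.`employee.json` LIMIT 5")
print(req.full_url)  # → http://localhost:8047/query.json
```

Sending it with urllib.request.urlopen(req) (against a live drillbit) returns the result rows as JSON.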

Is it possible to read and write Parquet using Java without a dependency on Hadoop and HDFS?

Submitted by 给你一囗甜甜゛ on 2019-12-04 08:48:11

I've been hunting around for a solution to this question. It appears to me that there is no way to embed reading and writing the Parquet format in a Java program without pulling in dependencies on HDFS and Hadoop. Is this correct? I want to read and write on a client machine, outside of a Hadoop cluster. I started to get excited about Apache Drill, but it appears that it must run as a separate process. What I need is an in-process ability to read and write a file using the Parquet format. Krishas: You can write the Parquet format outside a Hadoop cluster using the Java Parquet client API. Here is a sample

Fast Hadoop Analytics (Cloudera Impala vs Spark/Shark vs Apache Drill)

Submitted by 北慕城南 on 2019-12-04 07:27:29

Question: I want to do some "near real-time" data analysis (OLAP-like) on data in HDFS. My research showed that the three frameworks mentioned report significant performance gains compared to Apache Hive. Does anyone have practical experience with any of them? Not only concerning performance, but also with respect to stability? Answer 1: Comparing Hive with Impala, Spark, or Drill sometimes sounds inappropriate to me. The goals behind developing Hive and these tools were

Apache Drill: table not found on s3 bucket

Submitted by a 夏天 on 2019-12-04 03:58:05

Question: I'm a newbie with Apache Drill. The scenario is this: I have an S3 bucket where I've placed a CSV file called test.csv. I installed Apache Drill following the instructions on the official website, and I followed this tutorial to create an S3 plugin: https://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/ I start Drill and switch to the correct workspace (with: use my-s3;), but when I try to select records from the test.csv file an error occurs: Table 's3./test.csv' not found. Can anyone help me?
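One thing worth checking (an educated guess, not a confirmed fix for this exact setup): Drill requires backticks around identifiers that contain dots or hyphens, so both the my-s3 workspace name and the test.csv file name need quoting in SQL:

```sql
USE `my-s3`;
SELECT * FROM `test.csv` LIMIT 10;

-- or fully qualified in a single statement:
SELECT * FROM `my-s3`.`test.csv` LIMIT 10;
```

The 's3./test.csv' in the error message (with nothing between the dot and the slash) suggests the workspace or table reference was not parsed the way the asker intended.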

Apache Drill vs Spark

Submitted by 被刻印的时光 ゝ on 2019-12-03 12:45:01

Question: I have some experience with Apache Spark and Spark SQL. Recently I found the Apache Drill project. Could you describe the most significant advantages and differences between them? I've already read Fast Hadoop Analytics (Cloudera Impala vs Spark/Shark vs Apache Drill), but the topic is still unclear to me. Answer 1: Here's an article I came across that discusses some of the SQL technologies: http://www.zdnet.com/article/sql-and-hadoop-its-complicated/ Drill is fundamentally different in