Apache Drill using Google Cloud Storage

温柔的废话 2021-01-21 14:15

The Apache Drill features list mentions that it can query data from Google Cloud Storage, but I can't find any information on how to do that. I've got it working fine with S3.

2 Answers
  •  醉酒成梦
    2021-01-21 15:20

    This is quite an old question, so I imagine you either found a solution or moved on with your life, but for anyone who wants to do this without using Dataproc, here's a solution:

    1. Add the JAR file from the GCP connectors to the jars/3rdparty directory.
    2. Add the following properties to the core-site.xml file in the conf directory (replace the upper-case placeholders such as YOUR_PROJECT_ID with your own details):
    
        <property>
          <name>fs.gs.project.id</name>
          <value>YOUR_PROJECT_ID</value>
          <description>
            Optional. Google Cloud Project ID with access to GCS buckets.
            Required only for list buckets and create bucket operations.
          </description>
        </property>
        <property>
          <name>fs.gs.auth.service.account.private.key.id</name>
          <value>YOUR_PRIVATE_KEY_ID</value>
        </property>
        <property>
          <name>fs.gs.auth.service.account.private.key</name>
          <value>-----BEGIN PRIVATE KEY-----\nYOUR_PRIVATE_KEY\n-----END PRIVATE KEY-----\n</value>
        </property>
        <property>
          <name>fs.gs.auth.service.account.email</name>
          <value>YOUR_SERVICE_ACCOUNT_EMAIL</value>
          <description>
            The email address is associated with the service account used for GCS
            access when fs.gs.auth.service.account.enable is true. Required
            when authentication key specified in the Configuration file (Method 1)
            or a PKCS12 certificate (Method 3) is being used.
          </description>
        </property>
        <property>
          <name>fs.gs.working.dir</name>
          <value>/</value>
          <description>
            The directory relative gs: uris resolve in inside of the default bucket.
          </description>
        </property>
        <property>
          <name>fs.gs.implicit.dir.repair.enable</name>
          <value>true</value>
          <description>
            Whether or not to create objects for the parent directories of objects
            with / in their path e.g. creating gs://bucket/foo/ upon deleting or
            renaming gs://bucket/foo/bar.
          </description>
        </property>
        <property>
          <name>fs.gs.glob.flatlist.enable</name>
          <value>true</value>
          <description>
            Whether or not to prepopulate potential glob matches in a single list
            request to minimize calls to GCS in nested glob cases.
          </description>
        </property>
        <property>
          <name>fs.gs.copy.with.rewrite.enable</name>
          <value>true</value>
          <description>
            Whether or not to perform copy operation using Rewrite requests. Allows
            to copy files between different locations and storage classes.
          </description>
        </property>

    3. Start Apache Drill.
    4. Add a custom storage plugin to Drill.
    5. You're good to go.
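
The four credential values in step 2 correspond to fields of the JSON key file that Google Cloud issues for a service account (`project_id`, `private_key_id`, `private_key`, `client_email`). Here is a minimal Python sketch that turns an already-parsed key file into the matching `<property>` elements. The helper name `key_to_properties` is hypothetical, the sample key is a placeholder, and writing newlines as a literal `\n` simply mirrors the form shown in step 2.

```python
from xml.sax.saxutils import escape

# Maps the property names from step 2 to fields of a
# Google service-account JSON key file.
KEY_FIELDS = {
    "fs.gs.project.id": "project_id",
    "fs.gs.auth.service.account.private.key.id": "private_key_id",
    "fs.gs.auth.service.account.private.key": "private_key",
    "fs.gs.auth.service.account.email": "client_email",
}


def key_to_properties(key: dict) -> str:
    """Render <property> elements for the credentials in a parsed key file."""
    blocks = []
    for name, field in KEY_FIELDS.items():
        # Assumption: keep each value on one line by turning real newlines
        # into a literal backslash-n, as in the step 2 example.
        value = escape(key[field].replace("\n", "\\n"))
        blocks.append(
            "<property>\n"
            f"  <name>{name}</name>\n"
            f"  <value>{value}</value>\n"
            "</property>"
        )
    return "\n".join(blocks)


# Placeholder key; in practice this dict would come from json.load() on the key file.
sample_key = {
    "project_id": "YOUR_PROJECT_ID",
    "private_key_id": "YOUR_PRIVATE_KEY_ID",
    "private_key": "-----BEGIN PRIVATE KEY-----\nYOUR_PRIVATE_KEY\n-----END PRIVATE KEY-----\n",
    "client_email": "YOUR_SERVICE_ACCOUNT_EMAIL",
}
print(key_to_properties(sample_key))
```

The remaining properties in step 2 (`fs.gs.working.dir` and the three `*.enable` flags) are fixed values and can be copied as they are.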

    The solution is from here, where I detail some more about what we do around data exploration with Apache Drill.
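
For step 4, the answer does not show the storage plugin definition itself. The sketch below builds one plausible configuration, assuming a `file`-type plugin whose `connection` is a `gs://` bucket URI. The function name, the plugin name `gcs`, the bucket placeholder, and the choice of workspace and formats are all illustrative assumptions, not taken from the answer.

```python
import json


def gcs_plugin_payload(name: str, bucket: str) -> str:
    """Build a JSON body describing a file-type storage plugin on a GCS bucket."""
    config = {
        "type": "file",
        "enabled": True,
        # Assumption: the bucket is addressed through the gs:// scheme
        # provided by the connector JAR from step 1.
        "connection": f"gs://{bucket}",
        "workspaces": {
            "root": {"location": "/", "writable": False, "defaultInputFormat": None}
        },
        "formats": {
            "parquet": {"type": "parquet"},
            "json": {"type": "json", "extensions": ["json"]},
            "csv": {"type": "text", "extensions": ["csv"], "delimiter": ","},
        },
    }
    return json.dumps({"name": name, "config": config}, indent=2)


print(gcs_plugin_payload("gcs", "YOUR_BUCKET_NAME"))
```

The `config` part can be pasted into the Storage tab of the Drill web UI. The whole body should also be accepted by Drill's REST endpoint for storage plugins (`POST /storage/gcs.json` on the web port, 8047 by default), though this has not been verified against a running instance here.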
