How to list all databases and tables in AWS Glue Catalog?

Submitted by 不打扰是莪最后的温柔 on 2019-12-07 06:45:03

Question


I created a Development Endpoint in the AWS Glue console, and I now have access to SparkContext and SQLContext in the gluepyspark console.

How can I access the catalog and list all databases and tables? The usual sqlContext.sql("show tables").show() does not work.

The CatalogConnection class might help, but I have no idea which package it is in. I tried importing it from awsglue.context without success.


Answer 1:


I spent several hours trying to find information about the CatalogConnection class but found nothing, not even in the aws-glue-libs repository (https://github.com/awslabs/aws-glue-libs).

In my case, I needed the table names in the Glue Job Script console.

In the end I used the boto3 library and retrieved the database and table names with the Glue client:

import boto3

# Create a Glue client; the region must match the one that holds the catalog.
client = boto3.client('glue', region_name='us-east-1')

# Fetch the databases in the catalog (first page of results only).
responseGetDatabases = client.get_databases()
databaseList = responseGetDatabases['DatabaseList']

for databaseDict in databaseList:
    databaseName = databaseDict['Name']
    print('\ndatabaseName: ' + databaseName)

    # Fetch the tables of this database (first page of results only).
    responseGetTables = client.get_tables(DatabaseName=databaseName)
    tableList = responseGetTables['TableList']

    for tableDict in tableList:
        tableName = tableDict['Name']
        print('\n-- tableName: ' + tableName)

The important thing is to set the region correctly.

Reference: get_databases - http://boto3.readthedocs.io/en/latest/reference/services/glue.html#Glue.Client.get_databases

get_tables - http://boto3.readthedocs.io/en/latest/reference/services/glue.html#Glue.Client.get_tables
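
The loop above reads only the first page of each response, so a large catalog may come back incomplete. As a rough sketch, assuming boto3's built-in paginators for get_databases and get_tables, the whole catalog could be collected into a dictionary like this. The function name collect_catalog is hypothetical, and the client is passed in so the region choice stays with the caller.

```python
def collect_catalog(client):
    """Return {database_name: [table_name, ...]} for the whole catalog.

    `client` is expected to be a boto3 Glue client. collect_catalog is a
    hypothetical helper name, not part of boto3.
    """
    catalog = {}
    # The paginator follows NextToken internally and yields one page at a time.
    for db_page in client.get_paginator('get_databases').paginate():
        for database in db_page['DatabaseList']:
            name = database['Name']
            catalog[name] = []
            table_pages = client.get_paginator('get_tables').paginate(
                DatabaseName=name
            )
            for table_page in table_pages:
                catalog[name].extend(t['Name'] for t in table_page['TableList'])
    return catalog

# Usage (needs AWS credentials and the boto3 package):
#   import boto3
#   client = boto3.client('glue', region_name='us-east-1')
#   for database, tables in collect_catalog(client).items():
#       print(database, tables)
```

Returning a dictionary instead of printing makes the result easy to reuse later in a job script.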




Answer 2:


Glue returns one page per response. If you have more than 100 tables, make sure you use NextToken to retrieve all of them.

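import boto3

# Set the region to the one that holds your catalog.
glue_client = boto3.client('glue', region_name='us-east-1')

def get_glue_tables(database):
    # Request arguments; NextToken is added only once Glue returns one.
    kwargs = {'DatabaseName': database}

    while True:
        response = glue_client.get_tables(**kwargs)

        for table in response.get('TableList', []):
            print(table.get('Name'))

        next_token = response.get('NextToken')

        # No token means this was the last page.
        if not next_token:
            break

        kwargs['NextToken'] = next_token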
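
The NextToken loop is not specific to get_tables; get_databases returns its results in pages the same way. A minimal sketch of a reusable generator follows. The name iter_items and its parameters are hypothetical, and it assumes the wrapped call accepts and returns NextToken as the two Glue calls used here do.

```python
def iter_items(call, list_key, **kwargs):
    """Yield every item from a paged Glue call by following NextToken.

    call     -- a bound client method such as glue_client.get_tables
    list_key -- response key that holds the items, e.g. 'TableList'
    kwargs   -- arguments forwarded to the call, e.g. DatabaseName='mydb'
    """
    params = dict(kwargs)
    while True:
        response = call(**params)
        for item in response.get(list_key, []):
            yield item
        next_token = response.get('NextToken')
        if not next_token:
            break  # last page reached
        params['NextToken'] = next_token

# Usage (needs AWS credentials and the boto3 package):
#   import boto3
#   glue_client = boto3.client('glue', region_name='us-east-1')
#   for db in iter_items(glue_client.get_databases, 'DatabaseList'):
#       for table in iter_items(glue_client.get_tables, 'TableList',
#                               DatabaseName=db['Name']):
#           print(db['Name'], table['Name'])
```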


Source: https://stackoverflow.com/questions/46080504/how-to-list-all-databases-and-tables-in-aws-glue-catalog
