Question
I have been trying to create a table in our data catalog using the Python API, following the documentation posted here and here. I understand how that part works. However, I need to understand how to declare a struct field when I create the table: the StorageDescriptor documentation for the table here gives no explanation of how to define this type of column. In addition, I don't see where the classification property of the table is covered. Maybe under properties? I used the boto3 documentation for this sample code:
import boto3

client = boto3.client(service_name='glue', region_name='us-east-1')

response = client.create_table(
    DatabaseName='dbname',
    TableInput={
        'Name': 'tbname',
        'Description': 'tb description',
        'Owner': "I'm",
        'StorageDescriptor': {
            'Columns': [
                # 'struct' with no field list: how should this be declared?
                {'Name': 'agents', 'Type': 'struct', 'Comment': 'from deserializer'},
                {'Name': 'conference_sid', 'Type': 'string', 'Comment': 'from deserializer'},
                {'Name': 'call_sid', 'Type': 'string', 'Comment': 'from deserializer'},
            ],
            'Location': 's3://bucket/location/',
            'InputFormat': 'org.apache.hadoop.mapred.TextInputFormat',
            'OutputFormat': 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat',
            'Compressed': False,
            'SerdeInfo': {'SerializationLibrary': 'org.openx.data.jsonserde.JsonSerDe'},
        },
        'TableType': 'EXTERNAL_TABLE',
    },
)
Answer 1:
I found this post because I ran into the same issue, and I eventually found the solution. You can write the full nested definition as the column type:
array<struct<id:string,timestamp:bigint,message:string>>
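Applied to the table from the question, this could look roughly like the sketch below. The field names inside the agents struct (agent_id, agent_name) are made up for illustration, since the question does not show the real JSON keys. The 'Parameters' entry is an assumption about where the classification property goes: table properties are passed as 'Parameters' in TableInput, and crawler-created JSON tables appear to carry a 'classification' key there. The create_table helper is a hypothetical wrapper around the same boto3 call used in the question.

```python
# Hypothetical field names for the 'agents' struct; replace with the real JSON keys.
AGENTS_TYPE = "struct<agent_id:string,agent_name:string>"

table_input = {
    "Name": "tbname",
    "TableType": "EXTERNAL_TABLE",
    # Assumption: table properties such as 'classification' are set via Parameters.
    "Parameters": {"classification": "json"},
    "StorageDescriptor": {
        "Columns": [
            {"Name": "agents", "Type": AGENTS_TYPE},
            {"Name": "conference_sid", "Type": "string"},
            {"Name": "call_sid", "Type": "string"},
        ],
        "Location": "s3://bucket/location/",
        "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
        "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        "SerdeInfo": {"SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"},
    },
}


def create_table(database_name="dbname", region_name="us-east-1"):
    """Send the definition to Glue; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the dict above can be inspected offline

    client = boto3.client("glue", region_name=region_name)
    return client.create_table(DatabaseName=database_name, TableInput=table_input)
```

Nothing is sent to AWS until create_table() is called, so the dictionary can be checked first.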
I found this "hint" in the AWS Console by clicking on the data type of an existing table that was created via a Crawler. It reads:
An ARRAY of scalar type as a top-level column.
ARRAY <STRING>
An ARRAY with elements of complex type (STRUCT).
ARRAY < STRUCT <
place: STRING,
start_year: INT
>>
An ARRAY as a field (CHILDREN) within a STRUCT. (The STRUCT is inside another ARRAY, because it is rare for a STRUCT to be a top-level column.)
ARRAY < STRUCT <
spouse: STRING,
children: ARRAY <STRING>
>>
A STRUCT as the element type of an ARRAY.
ARRAY < STRUCT <
street: STRING,
city: STRING,
country: STRING
>>
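Writing these nested type strings by hand gets error-prone once the nesting is deep. As a rough sketch, a small helper can render them from plain Python structures. The name hive_type and the convention used here (a str is a scalar type, a one-element list is an array, a dict is a struct) are inventions for this example and not part of boto3 or Glue.

```python
def hive_type(spec):
    """Render a nested Python description as a Hive-style type string.

    str  -> scalar type name, returned unchanged
    list -> array of its single element type
    dict -> struct whose keys are the field names
    """
    if isinstance(spec, str):
        return spec
    if isinstance(spec, list):
        if len(spec) != 1:
            raise ValueError("an array spec needs exactly one element type")
        return "array<" + hive_type(spec[0]) + ">"
    if isinstance(spec, dict):
        fields = ",".join(f"{name}:{hive_type(sub)}" for name, sub in spec.items())
        return "struct<" + fields + ">"
    raise TypeError(f"unsupported spec: {spec!r}")


# The type from the answer above
print(hive_type([{"id": "string", "timestamp": "bigint", "message": "string"}]))
# The ARRAY-inside-STRUCT case from the console hint
print(hive_type([{"spouse": "string", "children": ["string"]}]))
```

The resulting string can be used directly as the 'Type' value of a column in 'StorageDescriptor'.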
Source: https://stackoverflow.com/questions/52318838/glue-aws-creating-a-data-catalog-table-on-boto3-python