AWS Glue Crawler Not Creating Table

Anonymous (unverified), submitted 2019-12-03 01:33:01

Question:

I have a crawler I created in AWS Glue that does not create a table in the Data Catalog after it successfully completes.

The crawler takes roughly 20 seconds to run, and the logs show that it completed successfully. The CloudWatch log shows:

  • Benchmark: Running Start Crawl for Crawler
  • Benchmark: Classification Complete, writing results to DB
  • Benchmark: Finished writing to Catalog
  • Benchmark: Crawler has finished running and is in ready state

I am at a loss as to why the tables are not being created in the Data Catalog, and the AWS docs are not much help for debugging this.
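One way to narrow this down is to look at the crawler's run counters rather than only the log lines. The sketch below is minimal and hedged: it assumes you have already fetched the response of the Glue `GetCrawlerMetrics` API (in boto3, `glue.get_crawler_metrics(CrawlerNameList=[...])`) and it only interprets the `TablesCreated`, `TablesUpdated` and `TablesDeleted` counters. The helper name `diagnose`, the crawler name and the hint wording are hypothetical, not part of any AWS API.

```python
from typing import Any, Dict


def diagnose(metrics_response: Dict[str, Any], crawler_name: str) -> str:
    """Turn one crawler's entry from a GetCrawlerMetrics response into a hint.

    Assumes the response shape {"CrawlerMetricsList": [{...}, ...]} that
    boto3's glue.get_crawler_metrics(...) returns.
    """
    for entry in metrics_response.get("CrawlerMetricsList", []):
        if entry.get("CrawlerName") != crawler_name:
            continue
        created = entry.get("TablesCreated", 0)
        updated = entry.get("TablesUpdated", 0)
        deleted = entry.get("TablesDeleted", 0)
        if created > 0:
            return f"{created} table(s) created; refresh the database's table list"
        if updated > 0:
            # No new table, but existing ones changed: the new files were
            # probably folded into a table that was already there.
            return f"no new tables, but {updated} existing table(s) updated"
        if deleted > 0:
            return f"no new tables, {deleted} table(s) deleted"
        # All counters zero: the crawler classified nothing it could write,
        # commonly because the role cannot read the path or the path is empty.
        return "no tables created, updated or deleted; check IAM role and S3 path"
    raise KeyError(f"crawler {crawler_name!r} not found in response")
```

For example, passing the result of `boto3.client("glue").get_crawler_metrics(CrawlerNameList=["my-crawler"])` together with `"my-crawler"` returns one of the hints above; an all-zero run points toward permissions or the S3 path, while a non-zero `TablesUpdated` points toward an existing table absorbing the data.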

Answer 1:

Check the IAM role associated with the crawler. Most likely it does not have the correct permissions.

When you create the crawler and choose to create a new IAM role (the default setting), the generated policy grants access only to the S3 path you specified at that time. If you later edit the crawler and change only the S3 path, the role associated with the crawler will not have permission to read the new path.
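To check whether this is what happened, you can compare the crawler's current S3 path with the `Resource` entries of the role's S3 policy. Below is a minimal sketch, assuming you copy those `Resource` strings out of the policy by hand and supply the URI of one object the crawler should be able to read. It only implements IAM's `*` and `?` wildcards for the object-level (`s3:GetObject`) resource check; it ignores `Deny` statements, conditions and permission boundaries, so the IAM policy simulator (or boto3's `iam.simulate_principal_policy`) remains the authoritative check. The bucket, prefixes and function names are hypothetical.

```python
import re
from typing import Iterable, List


def object_arn(s3_uri: str) -> str:
    """Convert s3://bucket/key into the object ARN that IAM evaluates."""
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {s3_uri}")
    return "arn:aws:s3:::" + s3_uri[len("s3://"):]


def arn_matches(pattern: str, arn: str) -> bool:
    """IAM-style match: '*' is any run of characters (including '/'), '?' is one character."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, arn) is not None


def uncovered(s3_uris: Iterable[str], resources: List[str]) -> List[str]:
    """Return the object ARNs that none of the policy's Resource patterns cover."""
    return [
        arn
        for arn in map(object_arn, s3_uris)
        if not any(arn_matches(p, arn) for p in resources)
    ]
```

With a leftover resource such as `arn:aws:s3:::my-bucket/old-data*` from the crawler's original path, `uncovered(["s3://my-bucket/new-data/part-0000.csv"], ["arn:aws:s3:::my-bucket/old-data*"])` returns that object's ARN, which matches the situation described above.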



Answer 2:

If you have existing tables in the target database, the crawler may associate your new files with an existing table rather than create a new one.

This can happen when the new data is similar to the data of an existing table, or when the folder structure is something Glue may interpret as partitioning.

Also, on occasion I have needed to refresh a database's table listing to get new tables to show up.
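To see whether new files were absorbed into an existing table, you can look at where each existing table points. The sketch below assumes you already have the `TableList` from the Glue `GetTables` API (boto3: `glue.get_tables(DatabaseName=...)`, paginated for large databases) and flags tables whose `StorageDescriptor.Location` is the new data's folder or one of its parent folders. It is a heuristic for where to look, not a reimplementation of the crawler's own grouping logic; the table names and paths are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def _as_folder(uri: str) -> str:
    """Add a trailing '/' so prefix checks respect folder boundaries."""
    return uri if uri.endswith("/") else uri + "/"


def tables_covering(
    table_list: List[Dict[str, Any]], new_path: str
) -> List[Tuple[str, List[str]]]:
    """Return (table name, partition key names) for every table whose
    StorageDescriptor.Location is new_path itself or a parent folder of it."""
    target = _as_folder(new_path)
    hits = []
    for table in table_list:
        location = table.get("StorageDescriptor", {}).get("Location")
        if location and target.startswith(_as_folder(location)):
            # Subfolders under a table's location may be read as partitions.
            keys = [k["Name"] for k in table.get("PartitionKeys", [])]
            hits.append((table["Name"], keys))
    return hits
```

A hit such as `("sales", ["partition_0"])` for a new folder under `s3://my-bucket/sales/` suggests the crawler treated that folder as another partition of `sales` rather than as a new table.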



Answer 3:

You can try excluding some files in the S3 bucket; the excluded files should then appear in the log. I find this helpful for debugging what the crawler is doing.
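Before editing the crawler, it can help to preview which keys an exclude pattern will drop. The following is a minimal sketch under stated assumptions: it supports only the `*` (within one folder level), `**` (across folder levels) and `?` glob forms, and matches patterns against keys relative to the crawler's include path. Glue's exclude patterns also support bracket and brace syntax that this sketch does not handle, so treat it as an approximation. The `Targets` dict it builds has the shape accepted by boto3's `glue.update_crawler(Name=..., Targets=...)`; the keys, paths and patterns are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simplified Glue-style glob: '**' crosses '/', '*' and '?' do not."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def split_keys(keys: List[str], exclusions: List[str]) -> Tuple[List[str], List[str]]:
    """Split keys (relative to the include path) into (included, excluded)."""
    compiled = [glob_to_regex(p) for p in exclusions]
    included, excluded = [], []
    for key in keys:
        if any(rx.fullmatch(key) for rx in compiled):
            excluded.append(key)
        else:
            included.append(key)
    return included, excluded


def s3_targets(path: str, exclusions: List[str]) -> Dict[str, list]:
    """Build a Targets argument for glue.update_crawler with exclusions set."""
    return {"S3Targets": [{"Path": path, "Exclusions": exclusions}]}
```

For instance, `split_keys(["2019/01/data.csv", "2019/01/_SUCCESS", "tmp/x.json"], ["**/_SUCCESS", "tmp/**"])` keeps only `2019/01/data.csv`; after applying the same patterns to the crawler, you can compare the excluded keys reported in the crawler's log against this preview.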



Answer 4:

Here is the sample policy JSON attached to my crawler's IAM role; it allows Glue to access S3 and create tables.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "ec2:DeleteTags",
                    "ec2:CreateTags"
                ],
                "Resource": [
                    "arn:aws:ec2:*:*:instance/*",
                    "arn:aws:ec2:*:*:security-group/*",
                    "arn:aws:ec2:*:*:network-interface/*"
                ],
                "Condition": {
                    "ForAllValues:StringEquals": {
                        "aws:TagKeys": "aws-glue-service-resource"
                    }
                }
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": [
                    "iam:GetRole",
                    "cloudwatch:PutMetricData",
                    "ec2:DeleteNetworkInterface",
                    "s3:ListBucket",
                    "s3:GetBucketAcl",
                    "logs:PutLogEvents",
                    "ec2:DescribeVpcAttribute",
                    "glue:*",
                    "ec2:DescribeSecurityGroups",
                    "ec2:CreateNetworkInterface",
                    "s3:GetObject",
                    "s3:PutObject",
                    "logs:CreateLogStream",
                    "s3:ListAllMyBuckets",
                    "ec2:DescribeNetworkInterfaces",
                    "logs:AssociateKmsKey",
                    "ec2:DescribeVpcEndpoints",
                    "iam:ListRolePolicies",
                    "s3:DeleteObject",
                    "ec2:DescribeSubnets",
                    "iam:GetRolePolicy",
                    "s3:GetBucketLocation",
                    "ec2:DescribeRouteTables"
                ],
                "Resource": "*"
            },
            {
                "Sid": "VisualEditor2",
                "Effect": "Allow",
                "Action": "s3:CreateBucket",
                "Resource": "arn:aws:s3:::aws-glue-*"
            },
            {
                "Sid": "VisualEditor3",
                "Effect": "Allow",
                "Action": "logs:CreateLogGroup",
                "Resource": "*"
            }
        ]
    }
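The policy above grants its S3 actions on `"Resource": "*"`, which sidesteps the stale-path problem from the first answer but is broad. If you prefer to scope S3 access to the crawler's data, the sketch below builds a narrower read-only S3 policy for one bucket and prefix, plus the trust policy that lets the Glue service assume the role. It assumes the crawler only reads from S3 and that the Glue, CloudWatch Logs and other non-S3 permissions come from elsewhere (for example the other statements above); the bucket, prefix and function name are hypothetical.

```python
import json
from typing import Any, Dict

# Trust policy: lets the AWS Glue service assume the role.
GLUE_TRUST_POLICY: Dict[str, Any] = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def crawler_s3_read_policy(bucket: str, prefix: str) -> Dict[str, Any]:
    """Read-only S3 permissions limited to s3://bucket/prefix (a sketch)."""
    prefix = prefix.strip("/")
    objects = f"arn:aws:s3:::{bucket}/{prefix}/*" if prefix else f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions are granted on the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object reads are granted only on objects under the prefix.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": objects,
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(crawler_s3_read_policy("my-bucket", "new-data/"), indent=4))
```

The two documents could then be attached with boto3, e.g. `iam.create_role(RoleName=..., AssumeRolePolicyDocument=json.dumps(GLUE_TRUST_POLICY))` and `iam.put_role_policy(RoleName=..., PolicyName=..., PolicyDocument=json.dumps(...))`. Because the object resource is tied to one prefix, the policy has to be regenerated whenever the crawler's S3 path changes.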


