aws-glue

Connecting to Oracle with cx_Oracle from AWS Glue Python Shell

Submitted by 旧街凉风 on 2020-08-10 05:21:27
Question: I am working on AWS Glue Python Shell and want to connect the Python shell to Oracle. I installed the psycopg2 and mysql libraries successfully, and I also managed to install cx_Oracle, but when I try to connect to Oracle I get the error: DatabaseError: DPI-1047: Cannot locate a 64-bit Oracle Client library: "libclntsh.so: cannot open shared object file: No such file or directory". I have tried the following: I downloaded the .so files from S3 and placed them in the lib folder …
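
A minimal sketch of one common fix, assuming cx_Oracle 8+ and a 64-bit Oracle Instant Client zip staged in S3; the bucket, key and instantclient version directory below are placeholders, not details from the question:

```python
# Hypothetical sketch: bucket, key and instantclient version are placeholders.
import zipfile

import boto3
import cx_Oracle  # assumes the cx_Oracle package itself installed fine, per the question

CLIENT_ZIP = "/tmp/instantclient.zip"
CLIENT_DIR = "/tmp/instantclient"

# Fetch and unpack a 64-bit Oracle Instant Client (Basic Light is sufficient).
boto3.client("s3").download_file(
    "my-artifacts-bucket", "oracle/instantclient-basiclite-linuxx64.zip", CLIENT_ZIP)
with zipfile.ZipFile(CLIENT_ZIP) as zf:
    zf.extractall(CLIENT_DIR)

# cx_Oracle 8+ can be pointed at the client libraries explicitly, avoiding
# the need to set LD_LIBRARY_PATH before the Python process starts.
cx_Oracle.init_oracle_client(lib_dir=CLIENT_DIR + "/instantclient_19_8")

conn = cx_Oracle.connect("user/password@host:1521/service_name")
print(conn.version)
```

DPI-1047 means the client shared libraries could not be found at load time; init_oracle_client() sidesteps the fact that a Glue job cannot easily set LD_LIBRARY_PATH before its own interpreter launches.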

AWS Glue Crawler cannot parse large files (classification UNKNOWN)

Submitted by ε祈祈猫儿з on 2020-07-10 10:27:29
Question: I've been trying to use the AWS Glue crawler to obtain the columns and other features of a certain JSON file. I parsed the JSON file locally, converting it to UTF-8, used boto3 to move it into an S3 container, and pointed the crawler at that container. I created a JSON classifier with the custom classifier $[*] and created a crawler with normal settings. When I do this with a relatively small file (<50 KB) the crawler correctly identifies the columns …
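
One workaround often suggested for UNKNOWN classification on large JSON documents is to rewrite the top-level array as JSON Lines before crawling, so the crawler can sample individual records instead of one huge document. A hedged sketch; file names are placeholders:

```python
# Hypothetical sketch: input/output file names are placeholders.
import json

with open("input.json", encoding="utf-8") as f:
    records = json.load(f)  # a top-level array, matching the $[*] classifier

# One object per line: small, independently parseable records that the
# crawler can sample without reading the whole document.
with open("output.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```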

Trouble updating IAM to allow AWS Glue access to AWS Secrets Manager

Submitted by 倖福魔咒の on 2020-07-10 09:56:07
Question: I am working on a project that requires an AWS Glue Python script to access AWS Secrets Manager. I tried giving Glue permission to do this via IAM, but I don't see how: I can see the permission strings showing that Lambda has access, but I don't see a way to edit them. I tried creating a new role that had the right permissions, but when I went to attach it, it seemed to have disappeared ... My fallback workaround is to grab the secret via a tiny Lambda and transfer it via S3 to Glue ... but …
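
A hedged sketch of the IAM-based route, assuming permission to edit the Glue job's execution role; the role name, policy name, secret name and ARN are all placeholders:

```python
# Hypothetical sketch: role name, policy name, secret name and ARN are placeholders.
import json

import boto3

# Grant the Glue job's role read access to the one secret it needs.
boto3.client("iam").put_role_policy(
    RoleName="MyGlueJobRole",
    PolicyName="AllowReadMySecret",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "secretsmanager:GetSecretValue",
            "Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret-*",
        }],
    }),
)

# Inside the Glue Python script, the secret can then be read directly,
# with no Lambda/S3 hand-off needed.
secret = boto3.client("secretsmanager").get_secret_value(SecretId="my-secret")
value = secret["SecretString"]
```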

AWS Glue output file name

Submitted by 时间秒杀一切 on 2020-07-05 07:59:31
Question: I am using AWS Glue to transform some JSON files. I have added the files to Glue from S3. The job I have set up reads the files in OK and runs successfully, and a file is added to the correct S3 bucket. The issue is that I can't name the file - it is given a random name, and it is not given the .json extension. How can I name the file and add the extension to the output? Answer 1: Due to the nature of how Spark works, it's not possible to name the file. However, it's possible to …
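
A sketch of one common workaround: let Spark write its randomly named part file, then copy it to a deterministic key with a .json extension via boto3. The bucket and prefix are placeholders, and the job is assumed to have coalesced its output to a single partition:

```python
# Hypothetical sketch: bucket and prefix are placeholders, and the output is
# assumed to have been coalesced/repartitioned to a single partition.
import boto3

s3 = boto3.client("s3")
bucket, prefix = "my-output-bucket", "glue-output/"

# Locate the single part-* file Spark wrote under the prefix.
objects = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)["Contents"]
part_key = next(o["Key"] for o in objects if "part-" in o["Key"])

# S3 has no rename, so copy to the desired name and delete the original.
s3.copy_object(Bucket=bucket, Key=prefix + "result.json",
               CopySource={"Bucket": bucket, "Key": part_key})
s3.delete_object(Bucket=bucket, Key=part_key)
```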

Calling a stored procedure from an AWS Glue script

Submitted by 半世苍凉 on 2020-06-29 03:59:36
Question: After the ETL job is done, what is the best way to call a stored procedure in an AWS Glue script? I am using PySpark to fetch the data from S3 and store it in a staging table. After this process, I need to call a stored procedure that loads data from the staging table into the appropriate MDS tables. If I have to call a stored procedure after the ETL job is done, what is the best way? If I consider AWS Lambda, is there any way that Lambda can be notified after the ETL? Answer 1: You can use …
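
One option is to call the procedure from the tail of the Glue script itself, after the PySpark writes finish. A sketch assuming a PostgreSQL-compatible staging database and that psycopg2 is available to the job; the host, credentials and procedure name are placeholders:

```python
# Hypothetical sketch: host, credentials and procedure name are placeholders,
# and a PostgreSQL-compatible staging database is assumed.
import psycopg2

# ... the PySpark ETL above has finished loading the staging table ...

conn = psycopg2.connect(host="mds-host", dbname="mds", user="etl", password="***")
try:
    with conn.cursor() as cur:
        cur.execute("CALL load_mds_from_staging()")
    conn.commit()
finally:
    conn.close()
```

Alternatively, for the Lambda notification part of the question, an EventBridge (CloudWatch Events) rule matching the Glue "Job State Change" event with state SUCCEEDED can trigger a Lambda after the job completes.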

Why do I need to set the transformation_ctx parameter when calling transformation and sink operations for AWS Glue bookmarks to work?

Submitted by 六眼飞鱼酱① on 2020-06-29 03:33:15
Question: The AWS Glue bookmark documentation (https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html) seems to suggest that one has to pass a transformation_ctx parameter to source, transform and sink operations for the bookmark to work. This is reflected in the sample code on that page, where the invocations of create_dynamic_frame.from_catalog(), ApplyMapping.apply() and write_dynamic_frame.from_options() are all passed a transformation_ctx value. I can understand the point of passing such a …
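
For reference, a minimal bookmark-enabled job in the shape that documentation describes: every source, transform and sink call carries a stable transformation_ctx, and job.init()/job.commit() bracket the run. Database, table and path names are placeholders:

```python
# Hypothetical sketch: database, table and S3 path are placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Each transformation_ctx names the checkpoint under which bookmark state
# for that step is stored; it must stay stable across runs.
src = glue_context.create_dynamic_frame.from_catalog(
    database="my_db", table_name="my_table", transformation_ctx="src")

mapped = ApplyMapping.apply(
    frame=src,
    mappings=[("id", "long", "id", "long")],
    transformation_ctx="mapped")

glue_context.write_dynamic_frame.from_options(
    frame=mapped, connection_type="s3",
    connection_options={"path": "s3://my-bucket/out/"},
    format="json", transformation_ctx="sink")

job.commit()  # persists the bookmark; without it the next run reprocesses everything
```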

AWS Glue crawler needs to create one table from many files with identical schemas

Submitted by 北慕城南 on 2020-06-23 06:52:37
Question: We have a very large number of folders and files in S3, all under one particular folder, and we want to crawl all the CSV files and then query them from one table in Athena. The CSV files all have the same schema. The problem is that the crawler generates a table for every file instead of one table. The crawler configuration has a checkbox option to "Create a single schema for each S3 path", but this doesn't seem to do anything. Is what I need possible? Thanks. Answer 1: Glue crawlers …
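
Beyond the console checkbox, the crawler's grouping policy can also be set explicitly in its JSON configuration. A hedged sketch via boto3; the crawler name, role ARN, database and S3 path are placeholders:

```python
# Hypothetical sketch: crawler name, role ARN, database and S3 path are placeholders.
import json

import boto3

boto3.client("glue").create_crawler(
    Name="csv-single-table-crawler",
    Role="arn:aws:iam::123456789012:role/MyGlueCrawlerRole",
    DatabaseName="my_db",
    Targets={"S3Targets": [{"Path": "s3://my-bucket/top-level-folder/"}]},
    # Fold files with compatible schemas under the target path into one table.
    Configuration=json.dumps({
        "Version": 1.0,
        "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"},
    }),
)
```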
