Connection timeout when reading Netezza from AWS Glue

Submitted by 倾然丶 夕夏残阳落幕 on 2021-01-29 19:11:01

Question


I am trying to use AWS Glue to pull data from my on-premises Netezza database into S3. Here is the code I have written so far (not complete):

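from awsglue.context import GlueContext
from pyspark.context import SparkContext

# GlueContext wraps the SparkContext and extends SQLContext, so .read is available
glueContext = GlueContext(SparkContext.getOrCreate())

# Read a Netezza table over JDBC using IBM's driver class (org.netezza.Driver)
df = glueContext.read.format("jdbc")\
    .option("driver", "org.netezza.Driver")\
    .option("url", "jdbc:netezza://NetezzaHost01:5480/Netezza_DB")\
    .option("dbtable", "ADMIN.table1")\
    .option("user", "myUser")\
    .option("password", "myPassword")\
    .load()

print(df.count())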

Because AWS Glue does not support Netezza natively, I am using a custom JDBC driver JAR (provided by IBM), which I specify as a dependent JAR when triggering the job.
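One way to pass the driver when starting the job programmatically is Glue's `--extra-jars` special job parameter, which takes comma-separated S3 paths to JAR files to add to the job's classpath. The sketch below assumes the run is started with boto3; `start_netezza_job` is a hypothetical helper, `glue_client` is expected to be a `boto3.client("glue")`, and the job name and S3 path are placeholders.

```python
from typing import Any


def start_netezza_job(glue_client: Any, job_name: str, driver_jar_s3: str) -> str:
    """Start a Glue job run with the Netezza JDBC driver JAR on the classpath.

    Hypothetical helper: glue_client is assumed to be boto3.client("glue"),
    and job_name / driver_jar_s3 are placeholders for your own job and JAR.
    """
    response = glue_client.start_job_run(
        JobName=job_name,
        # Glue special parameter: comma-separated S3 paths of extra JARs.
        Arguments={"--extra-jars": driver_jar_s3},
    )
    return response["JobRunId"]
```

For example, `start_netezza_job(boto3.client("glue"), "netezza-to-s3", "s3://my-bucket/jars/nzjdbc.jar")`, where both names are made up. Note that the JAR only affects the classpath; it does not change which network the job runs in.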

This code keeps failing with a timeout error:

py4j.protocol.Py4JJavaError: An error occurred while calling o68.load.
: org.netezza.error.NzSQLException: Connection timed out (Connection timed out)

A few things I have tried that did not work:
- Using Spark instead of Glue to read
- Using a very small table (<100 rows) as the source

I should add that the Netezza database is behind a corporate firewall, but I do not see any options to specify security groups (as you can do with Glue native connections) when using custom drivers.
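Since `Connection timed out` typically means the packets never get an answer (rather than an authentication or driver failure), a plain TCP probe can separate network problems from driver problems. The sketch below uses only the standard library; the host and port 5480 come from the JDBC URL above, and running it from a Glue Python shell job attached to the same VPC is an assumption about how you would test from inside AWS. `probe` is a hypothetical helper name.

```python
import socket


def probe(host: str, port: int = 5480, timeout: float = 5.0) -> str:
    """Try a TCP connection to host:port and classify the outcome.

    Returns "ok", "dns_error", "timeout", or "error: <message>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.gaierror:
        # Host name could not be resolved (e.g. a name only corporate DNS knows).
        return "dns_error"
    except (socket.timeout, TimeoutError):
        # No answer in time: typical of a firewall silently dropping packets.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. connection refused or no route to host.
        return f"error: {exc}"
```

For example, `probe("NetezzaHost01")` returning `"timeout"` from inside the VPC would be consistent with a firewall dropping the traffic, which matches the error above.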

Any thoughts?


Answer 1:


1) If you are trying to access a Netezza host that is on-premises, you first need to validate that you can reach Netezza from the VPC you have chosen for your Glue job.

2) This poses a problem, since the VPC is chosen on the basis of the connection you add to Glue, which apparently does not list Netezza as supported. However, you can still enter the Netezza URL and set the connection up. The connection test might not work, but at least you will be able to choose your own subnet and security group. Your security group should open up the Netezza port (5480 in the URL above).

3) I'm guessing your VPC has AWS Direct Connect or a VPN set up to your office network. As long as your firewall accepts connections from the CIDR range of the subnet you added to your Glue job, it should work. You might need to ask the team that manages the Netezza firewall to open up connections from your VPC/subnet IP range.
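Steps 2 and 3 can be sketched with boto3 roughly as follows. This assumes `glue` and `ec2` are `boto3.client("glue")` and `boto3.client("ec2")`; the helper names, connection name, subnet, security group, and availability zone are hypothetical placeholders. Registering a JDBC connection with a Netezza URL through the API follows the answer's suggestion and is an assumption; as the answer notes, Glue's connection test may still fail for it. The CIDR check only mirrors what the firewall team would verify and does not query the real firewall.

```python
import ipaddress
from typing import Any, Dict, Iterable, List


def netezza_connection_input(
    name: str,
    jdbc_url: str,
    user: str,
    password: str,
    subnet_id: str,
    security_group_ids: List[str],
    availability_zone: str,
) -> Dict[str, Any]:
    """Build the ConnectionInput argument for glue.create_connection()."""
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": jdbc_url,
            "USERNAME": user,
            "PASSWORD": password,
        },
        # The subnet and security groups the Glue job's network interfaces will use.
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": security_group_ids,
            "AvailabilityZone": availability_zone,
        },
    }


def subnet_cidr(ec2_client: Any, subnet_id: str) -> str:
    """Return the IPv4 CIDR block of a subnet, e.g. to hand to the firewall team."""
    response = ec2_client.describe_subnets(SubnetIds=[subnet_id])
    return response["Subnets"][0]["CidrBlock"]


def firewall_allows(subnet: str, allowed_cidrs: Iterable[str]) -> bool:
    """True if the whole subnet lies inside at least one allowed CIDR range."""
    net = ipaddress.ip_network(subnet)
    for cidr in allowed_cidrs:
        allowed = ipaddress.ip_network(cidr)
        # subnet_of() raises TypeError for mixed IPv4/IPv6, so compare versions first.
        if allowed.version == net.version and net.subnet_of(allowed):
            return True
    return False
```

A possible flow: call `glue.create_connection(ConnectionInput=netezza_connection_input(...))`, attach the connection to the job (for example `Connections={"Connections": ["netezza-onprem"]}` in `create_job`), then compare `subnet_cidr(ec2, subnet_id)` against the ranges the Netezza firewall admits. Glue also expects the chosen security group to have a self-referencing inbound rule for all TCP ports.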



Source: https://stackoverflow.com/questions/61943210/connection-timeout-when-reading-netezza-from-aws-glue
