Filtering nested JSON in AWS Glue

Submitted by 依然范特西╮ on 2020-01-03 06:03:08

Question


We would like to use an AWS Glue job to filter JSON messages stored in an S3 bucket.

Here is some example JSON:

{ "property": {"subproperty1": "A", "subproperty2": "B" }}
{ "property": {"subproperty1": "C", "subproperty2": "D" }}

We want to keep only the records whose subproperty1 is in ["A", "B"]. This is what we tried:

applyFilter1 = Filter.apply(
  frame = datasource0, 
  f = lambda x: x["property.subproperty1"] in ["A", "B"]
)

The output is then written to a new S3 bucket as follows:

datasink2 = glueContext.write_dynamic_frame.from_options(
    frame = applyFilter1, 
    connection_type = "s3", 
    connection_options = {"path": "s3://<my-s3-location>"}, 
    format = "json", 
    transformation_ctx = "datasink2"
)

Unfortunately, the resulting file is empty. Any ideas? Is filtering on nested properties like this supported in AWS Glue?
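
To make the suspected cause concrete, here is a minimal plain-Python sketch that does not use the Glue runtime. It assumes that the record passed to the filter function behaves like a nested dictionary, which is an assumption rather than something confirmed in the question. The helper names dotted_key_match and nested_key_match are hypothetical and exist only for illustration. Under that assumption, a dotted string such as "property.subproperty1" is treated as a single top-level key and never matches, while stepping through each level does.

```python
import json

# The two sample records from the question, one JSON object per line.
raw_lines = [
    '{ "property": {"subproperty1": "A", "subproperty2": "B" }}',
    '{ "property": {"subproperty1": "C", "subproperty2": "D" }}',
]
records = [json.loads(line) for line in raw_lines]

ALLOWED = {"A", "B"}


def dotted_key_match(record):
    # Hypothetical helper: looks up the literal top-level key
    # "property.subproperty1", which does not exist in these records.
    return record.get("property.subproperty1") in ALLOWED


def nested_key_match(record):
    # Hypothetical helper: descends one level at a time.
    return record.get("property", {}).get("subproperty1") in ALLOWED


dotted = [r for r in records if dotted_key_match(r)]
nested = [r for r in records if nested_key_match(r)]

print(len(dotted), len(nested))  # prints: 0 1
```

If the same assumption holds inside Glue, the predicate would be written as lambda x: x["property"]["subproperty1"] in ["A", "B"]. This has not been verified here against the Glue runtime.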

Source: https://stackoverflow.com/questions/48687238/filtering-nested-json-in-aws-glue
