Using sys.stdout.write() to create multiple files in NiFi?

僤鯓⒐⒋嵵緔 submitted on 2021-02-11 12:02:46

Question


I have a pipeline in NiFi that pulls down some invalid JSON that I need to clean up. The best solution I've come up with is to run a Python script via ExecuteStreamCommand that cleans and splits the content in one pass. However, even though I call sys.stdout.write() inside my for loop, only the original JSON comes out of the output stream in NiFi.

Am I misusing sys.stdout.write(), or is this possible and I've just done something wrong? My end goal is for each line of the JSON to become a new flow file, i.e. file 1 is {"fruit":"apple",..., file 2 is {"fruit":"cherry",..., and so on.

example JSON

{"fruit":"apple", "vegetable":"celery", "location":{"country":"nor\\way", "city":"oslo", }, "color":"blue"}
{"fruit":"cherry", "vegetable":"kale", "location":{"country":"france", "city":"calais", }, "color":"green"}
{"fruit":"peach", "vegetable":"peas", "location":{"country":"united\\kingdom", "city":"london", }, "color":"yellow"}

script

import json
import re
import sys

flow_file = sys.stdin.read()
try:
    load = json.loads(flow_file)
    sys.stdout.write(flow_file)
except:
    flow_file_esc = re.sub(r"[(\\)]", "", flow_file)
    for f in flow_file_esc.splitlines():
        sys.stdout.write(str(f))

Answer 1:


Can you clean the file first with ReplaceText and then split it with SplitJson, SplitRecord, or ForkRecord?

If you need to combine the two operations and want to script it, you could try ExecuteScript with Jython (since it doesn't look like you're using native CPython libraries). I have some simple examples in my cookbook and my blog.
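Whichever processor ends up running it, the cleanup step itself can be sketched in plain Python. This is a minimal sketch, not a tested NiFi flow: the helper names `clean_line` and `clean_stream` are hypothetical, and it assumes the only defects are the stray backslashes and the trailing commas shown in the question's sample. Note that `sys.stdout.write()` does not append a newline, so the loop in the question would run all the lines together; the sketch emits one JSON object per line instead.

```python
import json
import re

# Two of the malformed lines from the question, kept verbatim via raw strings.
SAMPLE = "\n".join([
    r'{"fruit":"apple", "vegetable":"celery", "location":{"country":"nor\\way", "city":"oslo", }, "color":"blue"}',
    r'{"fruit":"cherry", "vegetable":"kale", "location":{"country":"france", "city":"calais", }, "color":"green"}',
])


def clean_line(line):
    """Return a valid JSON string for one malformed input line (hypothetical helper)."""
    # Assumption: every backslash is unwanted, as in the question's regex.
    # This would also strip legitimate JSON escapes such as \" or \n.
    no_backslashes = line.replace("\\", "")
    # Drop a comma that sits directly before a closing brace or bracket.
    # Assumption: no string value contains such a sequence.
    no_trailing_commas = re.sub(r",\s*([}\]])", r"\1", no_backslashes)
    # Round-trip through json so an unfixable line raises instead of passing through.
    return json.dumps(json.loads(no_trailing_commas))


def clean_stream(text):
    """Clean every non-empty line and join the results with newlines."""
    return "\n".join(clean_line(l) for l in text.splitlines() if l.strip())


print(clean_stream(SAMPLE))
```

Inside a script called by ExecuteStreamCommand, the last line would instead be something like `sys.stdout.write(clean_stream(sys.stdin.read()) + "\n")`. As far as I know, that processor places the command's entire stdout into a single flow file, so a downstream split (for example SplitText with a Line Split Count of 1, or one of the record processors mentioned above) would still be needed to get one flow file per line.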



Source: https://stackoverflow.com/questions/60383529/using-sys-stdout-write-to-create-multiple-files-in-nifi
