Why is ExecuteSQLRecord taking a long time to start outputting flow files on large tables?

Submitted by 社会主义新天地 on 2019-12-11 14:55:58

Question


I am using the ExecuteSQLRecord processor to dump the contents of a large table (100 GB) with 100+ million records.

I have set up the properties as shown below. However, it takes a good 45 minutes before I see any flow files coming out of this processor.

What am I missing?

I am on NiFi 1.9.1.

Thank you.


Answer 1:


An alternative to ExecuteSQL(Record), or even GenerateTableFetch -> ExecuteSQL(Record), is to use QueryDatabaseTable without a Max-Value Column. It has a Fetch Size property that attempts to set the number of rows returned on each pull from the database. Oracle's default is 10, for example, so with 10,000 rows per flow file, ExecuteSQL has to make 1,000 trips to the database, fetching 10 rows at a time. As a general rule, I recommend setting Fetch Size equal to Max Rows Per Flow File, so that one fetch is made per outgoing flow file.
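To make the arithmetic above concrete, here is a rough sketch that counts database fetches for a given fetch size. The function name `round_trips` and the variable names are made up for illustration and are not part of NiFi or any JDBC driver. The 100,000,000 row total is taken from the question's "100+ million records" and is only a lower bound.

```python
def round_trips(rows: int, fetch_size: int) -> int:
    """Fetches needed to pull `rows` rows when each fetch returns at most `fetch_size` rows."""
    if rows < 0 or fetch_size <= 0:
        raise ValueError("rows must be >= 0 and fetch_size must be > 0")
    # Integer ceiling division, so a partial final batch still counts as one fetch.
    return (rows + fetch_size - 1) // fetch_size


# Hypothetical names; the values come from the answer and the question.
max_rows_per_flow_file = 10_000
oracle_default_fetch_size = 10
total_rows = 100_000_000  # lower bound from "100+ million records"

# Fetches per outgoing flow file.
per_file_default = round_trips(max_rows_per_flow_file, oracle_default_fetch_size)
per_file_tuned = round_trips(max_rows_per_flow_file, max_rows_per_flow_file)

# Fetches for the whole table.
total_default = round_trips(total_rows, oracle_default_fetch_size)
total_tuned = round_trips(total_rows, max_rows_per_flow_file)

print(per_file_default, per_file_tuned)  # 1000 1
print(total_default, total_tuned)        # 10000000 10000
```

Under these assumptions, matching the fetch size to the rows per flow file cuts the number of fetches by a factor of 1,000. How much wall-clock time that saves depends on the per-fetch latency between NiFi and the database, which this sketch does not model.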

The Fetch Size property should be available in the ExecuteSQL processors as well; I wrote up Apache Jira NIFI-6865 to cover this improvement.



Source: https://stackoverflow.com/questions/58824311/why-is-executesqlrecord-taking-a-long-time-to-start-outputting-flow-files-on-lar
