Question
I am using the ExecuteSQLRecord processor to dump the contents of a large table (100 GB) with 100+ million records.
I have set up the properties as shown below. However, it takes a good 45 minutes before I see any flow files coming out of this processor.
What am I missing?
I am on NiFi 1.9.1.
Thank you.
Answer 1:
An alternative to ExecuteSQL(Record), or even GenerateTableFetch -> ExecuteSQL(Record), is to use QueryDatabaseTable without a Max-Value Column. It has a Fetch Size property that attempts to set the number of rows returned on each pull from the database. Oracle's default is 10, for example, so with 10000 rows per flow file, ExecuteSQL has to make 1000 trips to the DB, fetching 10 rows at a time. As a general rule, I recommend setting Fetch Size equal to Max Rows Per Flow File, so that one fetch is made per outgoing flow file.
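To make the round-trip arithmetic concrete, here is a minimal Python sketch. The function name `round_trips` is hypothetical and is not part of NiFi or any JDBC driver. It assumes every fetch returns exactly Fetch Size rows until the rows are exhausted, which is a simplification of what a real driver does.

```python
def round_trips(rows: int, fetch_size: int) -> int:
    """Estimate how many fetches are needed to pull `rows` rows.

    Hypothetical helper for illustration only: assumes each fetch
    returns exactly `fetch_size` rows, except possibly the last one.
    """
    if rows < 0:
        raise ValueError("rows must be non-negative")
    if fetch_size <= 0:
        raise ValueError("fetch_size must be positive")
    # Ceiling division using integers only, to avoid float rounding.
    return -(-rows // fetch_size)


if __name__ == "__main__":
    rows_per_flow_file = 10_000

    # A driver default of 10 rows per fetch, as in the Oracle example.
    print(round_trips(rows_per_flow_file, 10))      # 1000

    # Fetch Size set equal to Max Rows Per Flow File.
    print(round_trips(rows_per_flow_file, 10_000))  # 1
```

With the fetch size matched to the flow file size, the estimate drops from 1000 trips to a single trip per flow file, which is the reasoning behind the general rule.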
The Fetch Size property should be available to the ExecuteSQL processors as well. I wrote up Apache Jira NIFI-6865 to cover this improvement.
Source: https://stackoverflow.com/questions/58824311/why-is-executesqlrecord-taking-a-long-time-to-start-outputting-flow-files-on-lar