Question
I have two CSV files:
Identity(no,name,Age), which has 10 rows
Location(Address,no,City), which has 100 rows
I need to extract rows and check the no column in the Identity CSV file against the Location CSV file.
Get a single row from the Identity CSV file and check Identity.no against Location.no across the 100 rows in the Location CSV file.
If they match, combine the name, Age, Address, and City fields from Identity and Location.
Note: I need to take the 1st row from Identity and compare it with the 100 rows in the Location CSV file, then take the 2nd row and compare it with the 100 rows. This continues for all 10 rows in the Identity CSV file.
The overall results should then be converted to JSON and moved into SQL Server.
Is it possible in Apache NiFi?
Any help appreciated.
Answer 1:
You can do this in NiFi by using the DistributedMapCache feature, which implements a key/value store for lookups. The setup requires a distributed map cache, plus two flows: one to populate the cache with your Address records, and one to look up the address by the no field.
The DistributedMapCache is defined by two controller services, a DistributedMapCacheServer and a DistributedMapCacheClientService. If your data set is small, you can just use "localhost" as the server.
Populating the cache requires reading the Address file, splitting the records, extracting the no key, and putting key/value pairs to the cache. An approximate flow might include GetFile -> SplitText -> ExtractText -> UpdateAttribute -> PutDistributedMapCache.
Looking up your identity records is fairly similar to the flow above, in that it requires reading the Identity file, splitting the records, extracting the no key, and then fetching the address record. The processor flow might include GetFile -> SplitText -> ExtractText -> UpdateAttribute -> FetchDistributedMapCache.
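To make the logic of the two flows concrete, here is a minimal Python sketch of the same idea outside NiFi. It is not NiFi code: a plain dict stands in for the distributed map cache, and the function names and sample rows are hypothetical. Only the column names come from the question.

```python
import csv
import io


def build_cache(location_csv):
    """Populate step: map each `no` key to its Location row."""
    reader = csv.DictReader(io.StringIO(location_csv))
    return {row["no"]: row for row in reader}


def lookup_identities(identity_csv, cache):
    """Lookup step: fetch the cached Location row for each Identity row."""
    merged = []
    for row in csv.DictReader(io.StringIO(identity_csv)):
        location = cache.get(row["no"])
        if location is None:
            continue  # no Location row shares this `no`, so skip it
        merged.append({
            "no": row["no"],
            "name": row["name"],
            "Age": row["Age"],
            "Address": location["Address"],
            "City": location["City"],
        })
    return merged


# Hypothetical sample data, using the column layout from the question.
LOCATION_CSV = "Address,no,City\n12 Main St,1,Springfield\n9 Oak Ave,2,Shelbyville\n"
IDENTITY_CSV = "no,name,Age\n1,Alice,30\n3,Bob,41\n"

cache = build_cache(LOCATION_CSV)
merged = lookup_identities(IDENTITY_CSV, cache)
```

Because the cache is keyed on no, each Identity row costs one lookup instead of a scan over all 100 Location rows. In this sketch a key/value store holds one value per key, so if several Location rows share the same no, only the last one loaded is kept.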
You can convert the whole or parts from CSV to JSON with AttributesToJSON, or maybe ExecuteScript.
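As a rough picture of the target output, the Python sketch below turns a set of named attributes into one flat JSON object per record. It is an illustration only, not NiFi code; the function name, the attribute values, and the choice of which keys to keep are all assumptions.

```python
import json


def attributes_to_json(attributes, keys):
    """Serialize the selected attributes as one flat JSON object."""
    return json.dumps({k: attributes[k] for k in keys if k in attributes})


# Hypothetical attributes for one combined Identity/Location record.
attrs = {
    "no": "1",
    "name": "Alice",
    "Age": "30",
    "Address": "12 Main St",
    "City": "Springfield",
    "filename": "identity.csv",  # unrelated attribute, left out of the output
}

record = attributes_to_json(attrs, ["name", "Age", "Address", "City"])
```

Restricting the output to an explicit key list keeps unrelated attributes out of the JSON that is later written to the database.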
Source: https://stackoverflow.com/questions/40673060/how-can-i-compare-the-one-line-in-one-csv-with-all-lines-in-another-csv-file