How can I compare one line in one CSV file with all lines in another CSV file?

Question

I have two CSV files:

Identity (no, name, Age), which has 10 rows
Location (Address, no, City), which has 100 rows

I need to take each row of the Identity file and compare its no column against the no column in the Location file.

That is, take a single row from the Identity CSV file and check Identity.no against Location.no for all 100 rows of the Location CSV file.

If they match, combine name, Age, Address, and City from Identity and Location.

Note: I need to take the 1st row from Identity and compare it with all 100 rows in the Location CSV file, then take the 2nd row and compare it with the 100 rows, and so on through all 10 rows of the Identity CSV file.

The overall results should then be converted to JSON and moved into SQL Server.

Is this possible in Apache NiFi?

Any help appreciated.

Answer 1:

You can do this in NiFi by using the DistributedMapCache feature, which implements a key/value store for lookups. The setup requires a distributed map cache plus two flows: one to populate the cache with your Location records, and one to look up the address for each Identity record by the no field.

The DistributedMapCache is defined by two controller services, a DistributedMapCacheServer and a DistributedMapCacheClientService. If your data set is small, you can just use "localhost" as the server.
Populating the cache requires reading the Location file, splitting the records, extracting the no key, and putting key/value pairs into the cache. An approximate flow might include GetFile -> SplitText -> ExtractText -> UpdateAttribute -> PutDistributedMapCache.
Looking up your Identity records is fairly similar to the flow above, in that it requires reading the Identity file, splitting the records, extracting the no key, and then fetching the address record. The processor flow might include GetFile -> SplitText -> ExtractText -> UpdateAttribute -> FetchDistributedMapCache.
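
The two flows above amount to a key/value join. Here is a minimal sketch of that logic in plain Python, outside NiFi, assuming the column names from the question (no, name, Age, Address, City). The dictionary only stands in for the DistributedMapCache; the function names and the sample data are hypothetical and are not NiFi APIs.

```python
import csv
import io


def build_cache(location_csv: str) -> dict:
    """Populate a key/value store keyed by 'no' (stand-in for the cache-populating flow)."""
    cache = {}
    for row in csv.DictReader(io.StringIO(location_csv)):
        cache[row["no"]] = {"Address": row["Address"], "City": row["City"]}
    return cache


def join_identity(identity_csv: str, cache: dict) -> list:
    """Look up each Identity row by 'no' (stand-in for the lookup flow)."""
    merged = []
    for row in csv.DictReader(io.StringIO(identity_csv)):
        location = cache.get(row["no"])
        if location is None:
            continue  # no matching Location row, so skip this Identity row
        merged.append(
            {"no": row["no"], "name": row["name"], "Age": row["Age"], **location}
        )
    return merged


# Hypothetical sample data; the header rows match the question's columns.
LOCATION_CSV = "Address,no,City\n12 Main St,1,Chennai\n7 Lake Rd,3,Pune\n"
IDENTITY_CSV = "no,name,Age\n1,Asha,30\n2,Ravi,41\n3,Mina,25\n"

cache = build_cache(LOCATION_CSV)
records = join_identity(IDENTITY_CSV, cache)
```

With this approach each Identity row costs one lookup instead of a scan over all 100 Location rows. In this sketch, if several Location rows share the same no, only the last one is kept.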

You can convert all or part of the CSV data to JSON with AttributesToJSON, or perhaps ExecuteScript.
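
As a rough illustration of the result of such a conversion, the following plain-Python sketch (not NiFi code) serializes selected key/value attributes of one merged record as a JSON object. The function name, the attribute values, and the extra filename attribute are hypothetical.

```python
import json


def attributes_to_json(attributes: dict, wanted: list) -> str:
    """Serialize only the selected attributes as one JSON object."""
    selected = {key: attributes[key] for key in wanted if key in attributes}
    return json.dumps(selected)


# Hypothetical attributes for one merged Identity/Location record.
attrs = {
    "no": "1",
    "name": "Asha",
    "Age": "30",
    "Address": "12 Main St",
    "City": "Chennai",
    "filename": "identity.csv",  # unrelated attribute, left out of the JSON
}

json_text = attributes_to_json(attrs, ["name", "Age", "Address", "City"])
```

Selecting the fields explicitly keeps unrelated attributes out of the JSON that is later written to SQL Server.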

Source: https://stackoverflow.com/questions/40673060/how-can-i-compare-the-one-line-in-one-csv-with-all-lines-in-another-csv-file

Tags

sql-server

csv

apache-nifi