I have a large csv file with more than 1 million rows. Each row has two features, callsite (the location of an API invocation) and a sequence of tokens to the callsite. They