I need to find all 3-gram shingles in a txt file (sports articles, each with a title and text) using MapReduce. However, the txt files have the format:
This is the ti