I'm building a backend and trying to crunch the following problem.
2000 characters on average)
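A "Patricia tree" is a good solution for this kind of problem. It's sort of a radix tree, with the radix being the set of character choices involved. So to find out whether "the dog" is in the tree, you start at the root, take the "t" branch, then the "h" branch, and so on. What makes Patricia trees fast is that chains of nodes with only one child are collapsed into a single edge, so a lookup visits far fewer nodes than a plain character-by-character trie would.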
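To make the lookup concrete, here is a minimal sketch of a compressed radix tree over characters. It is not Morrison's original bit-level PATRICIA and not any particular GitHub implementation; the names `RadixNode`, `insert` and `contains` are made up for this illustration.

```python
class RadixNode:
    """One node of a character-level compressed radix tree (illustrative)."""

    def __init__(self):
        self.edges = {}        # first character -> (edge label, child node)
        self.terminal = False  # True if a stored phrase ends exactly here


def insert(root, word):
    """Add `word` to the tree, splitting an edge where labels diverge."""
    node = root
    while word:
        entry = node.edges.get(word[0])
        if entry is None:
            # No edge starts with this character: hang the rest as one edge.
            leaf = RadixNode()
            leaf.terminal = True
            node.edges[word[0]] = (word, leaf)
            return
        label, child = entry
        # Length of the common prefix of the edge label and the remaining word.
        n = 0
        while n < len(label) and n < len(word) and label[n] == word[n]:
            n += 1
        if n < len(label):
            # The word leaves the edge partway through: split the edge at n.
            mid = RadixNode()
            mid.edges[label[n]] = (label[n:], child)
            node.edges[word[0]] = (label[:n], mid)
            child = mid
        node = child
        word = word[n:]
    node.terminal = True


def contains(root, word):
    """Return True if exactly `word` was inserted."""
    node = root
    while word:
        entry = node.edges.get(word[0])
        if entry is None:
            return False
        label, child = entry
        if not word.startswith(label):
            return False
        node = child
        word = word[len(label):]
    return node.terminal


root = RadixNode()
for phrase in ["the dog", "the cat"]:
    insert(root, phrase)

print(contains(root, "the dog"))  # True
print(contains(root, "the d"))    # False
print(root.edges["t"][0])         # 'the ' (shared prefix stored once, on one edge)
```

The two phrases share the single edge "the " and only branch at "d" versus "c", which is where the saving over a one-character-per-node trie comes from.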
So you run your text through the tree, and you collect every tree location (= phrase) that it hits. This will even get you overlapping matches if you want them.
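A rough sketch of that scan is below. To keep it short it uses a plain, uncompressed trie built from nested dicts rather than a real Patricia tree; the matching logic is the same idea. `build_trie`, `find_all` and the `END` marker are names invented for this example.

```python
END = ""  # marker key; never collides with a real one-character key


def build_trie(phrases):
    """Build a nested-dict trie; a node holding END marks a complete phrase."""
    root = {}
    for phrase in phrases:
        node = root
        for ch in phrase:
            node = node.setdefault(ch, {})
        node[END] = phrase
    return root


def find_all(text, root):
    """Return (start_index, phrase) for every phrase occurrence in `text`,
    including overlapping ones."""
    matches = []
    for start in range(len(text)):
        node = root
        for pos in range(start, len(text)):
            node = node.get(text[pos])
            if node is None:
                break  # no stored phrase continues with this character
            if END in node:
                matches.append((start, node[END]))
    return matches


trie = build_trie(["the dog", "dog", "dog house"])
print(find_all("the dog house", trie))
# [(0, 'the dog'), (4, 'dog'), (4, 'dog house')]
```

This restarts the walk at every position of the text, so the cost is roughly the text length times the length of the longest phrase. That is usually fine for texts of a few thousand characters.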
The main article about them is Donald R. Morrison, "PATRICIA - Practical Algorithm to Retrieve Information Coded in Alphanumeric", Journal of the ACM, 15(4):514-534, October 1968. There's some discussion at https://xlinux.nist.gov/dads/HTML/patriciatree.html and there are several implementations on GitHub, though I don't know which are good.