I have heard that clustering can be used to group similar data. I want to know how it works in the specific case of strings.
I have a table with more than 100,000 different words.
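You can use a string metric such as the Levenshtein distance to measure how far apart two words are, and a clustering algorithm such as k-means to form the groups. Note that standard k-means averages points to find its centroids, which is not defined for strings, so a variant that works from pairwise distances alone (k-medoids, for example) is usually a better fit.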
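The Levenshtein distance is a string metric that measures the difference between two sequences: it is the minimum number of single-character insertions, deletions, or substitutions needed to turn one into the other.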
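To make the definition concrete, here is a minimal sketch of the distance in plain Python, using the usual row-by-row dynamic programming table. The function name `levenshtein` is just an illustrative choice, not something from a particular library.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string as b so each row stays as small as possible.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] = distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))

    for i, char_a in enumerate(a, start=1):
        # current[0] = distance between a[:i] and the empty string.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current

    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

For a table of 100,000 words a pure Python loop like this will probably be slow, so a compiled implementation may be worth looking at once the approach is validated.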
Do some testing and find a similarity threshold per word that will decide your groups.
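As a starting point for that testing, the sketch below groups words with a simple greedy threshold rule rather than k-means: each word joins the first group whose representative (its first member) is close enough, otherwise it starts a new group. The names `group_words` and `max_ratio`, and the idea of scaling the threshold by the longer word's length, are assumptions made for illustration; the value 0.4 is arbitrary and is exactly the kind of number you would tune.

```python
def levenshtein(a, b):
    """Edit distance via row-by-row dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def group_words(words, max_ratio=0.4):
    """Greedy grouping: a word joins the first group whose representative
    is within max_ratio * (length of the longer word) edits."""
    groups = []  # each group is a list; its first element is the representative
    for word in words:
        for group in groups:
            representative = group[0]
            limit = max_ratio * max(len(word), len(representative))
            if levenshtein(word, representative) <= limit:
                group.append(word)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([word])
    return groups


words = ["apple", "apples", "aple", "banana", "bananna", "cherry"]
for group in group_words(words):
    print(group)
```

The result depends on the order of the input and on the threshold, so it is worth running it on a sample of your words with a few different values of `max_ratio` before committing to one. Comparing each word only against group representatives also avoids computing all pairwise distances, which matters at 100,000 words.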