There is an R package called stringdist that supports string comparison using several different methods. Copy-pasting from that page:
- Hamming distance: Number of positions with different symbols in both strings. Only defined for strings of equal length.
- Levenshtein distance: Minimal number of insertions, deletions and replacements needed for transforming string a into string b.
- (Full) Damerau-Levenshtein distance: Like Levenshtein distance, but transposition of adjacent symbols is allowed.
- Optimal String Alignment / restricted Damerau-Levenshtein distance: Like (full) Damerau-Levenshtein distance but each substring may only be edited once.
- Longest Common Substring distance: Minimum number of symbols that have to be removed in both strings until resulting substrings are identical.
- q-gram distance: Sum of absolute differences between N-gram vectors of both strings.
- Cosine distance: 1 minus the cosine similarity of both N-gram vectors.
- Jaccard distance: 1 minus the quotient of shared N-grams and all observed N-grams.
- Jaro distance: The Jaro distance is a formula of 4 values and effectively a special case of the Jaro-Winkler distance with p = 0.
- Jaro-Winkler distance: This distance is a formula of 5 parameters determined by the two compared strings (A,B,m,t,l) and p chosen from [0, 0.25].
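
To make a few of these definitions concrete, here is a minimal sketch in plain Python. It is only an illustration of the definitions above, not how stringdist computes them, and the function names (`hamming`, `qgrams`, `qgram_distance`, `jaccard_distance`) and the default `q=2` are my own choices.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Count the positions at which the two strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal lengths")
    return sum(x != y for x, y in zip(a, b))


def qgrams(s: str, q: int = 2) -> Counter:
    """Count every overlapping substring of length q in s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(a: str, b: str, q: int = 2) -> int:
    """Sum of absolute differences between the two q-gram count vectors."""
    ca, cb = qgrams(a, q), qgrams(b, q)
    return sum(abs(ca[g] - cb[g]) for g in ca.keys() | cb.keys())


def jaccard_distance(a: str, b: str, q: int = 2) -> float:
    """1 minus (shared q-grams / all observed q-grams), using sets."""
    sa, sb = set(qgrams(a, q)), set(qgrams(b, q))
    if not sa and not sb:
        return 0.0  # assumption: two strings with no q-grams count as identical
    return 1 - len(sa & sb) / len(sa | sb)


print(hamming("karolin", "kathrin"))     # differs at 3 positions
print(qgram_distance("abcd", "abce"))    # "cd" and "ce" are unshared
print(jaccard_distance("abcd", "abce"))  # 2 shared out of 4 observed
```

The q-gram based measures ignore where in the string a q-gram occurs, so they tend to be more forgiving of reordered words than the edit-based distances.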
That will give you the distance. You might not need to perform a cluster analysis; sorting by the string distance itself may be sufficient. I have created a script to provide the basic functionality here... feel free to improve it as needed.
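
As a rough illustration of the "just sort by distance" idea, here is a minimal, self-contained Python sketch. It uses a textbook dynamic-programming Levenshtein distance rather than stringdist itself, and `levenshtein` and `sort_by_distance` are hypothetical helper names, not part of any package.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimal number of insertions, deletions and replacements turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # replacement or match
        prev = curr
    return prev[-1]


def sort_by_distance(reference: str, candidates: list[str]) -> list[str]:
    """Order candidates from closest to farthest from the reference string."""
    return sorted(candidates, key=lambda s: levenshtein(reference, s))


words = ["maple", "apply", "banana", "apple"]
print(sort_by_distance("apple", words))
# ['apple', 'apply', 'maple', 'banana']
```

Sorting against a single reference string only works if you have a sensible reference to compare against; if you need groups of mutually similar strings, a full pairwise distance matrix and clustering would still be the way to go.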