Find a common string within a list of strings

点点圈 submitted on 2019-11-29 13:43:12

This works better than my first approach (struck out).

You can use the following extension method to get all substrings of a string. Apply it to the shortest string in the list, for efficiency:

public static IEnumerable<string> getAllSubstrings(this string word)
{
    return from charIndex1 in Enumerable.Range(0, word.Length)
           from charIndex2 in Enumerable.Range(0, word.Length - charIndex1 + 1)
           where charIndex2 > 0
           select word.Substring(charIndex1, charIndex2);
}
  • Now order these substrings by length (longest first).
  • Check whether all other strings (excluding the shortest string itself, because that test is redundant) contain the substring. Enumerable.All returns as soon as one string doesn't contain it.
  • If a substring appears in all the other strings, you have found the longest common substring.
  • Otherwise, repeat until you've checked all substrings. If none matches, the strings have no common substring.

string shortest = list.OrderBy(s => s.Length).First();
IEnumerable<string> shortestSubstrings = shortest
    .getAllSubstrings()
    .OrderByDescending(s => s.Length);
var other = list.Where(s => s != shortest).ToArray();
string longestCommonIntersection = string.Empty;
foreach (string subStr in shortestSubstrings)
{
    bool allContains = other.All(s => s.Contains(subStr));
    if (allContains)
    {
        longestCommonIntersection = subStr;
        break;
    }
}
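
For anyone who wants to run the snippets above as a whole, here is a minimal self-contained sketch. The class names SubstringExtensions and Program, the helper name LongestCommonSubstring, and the sample list are assumptions made for illustration; they are not part of the original answer. The substring query is written with Enumerable.Range(1, ...) instead of the where filter, which should yield the same substrings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubstringExtensions
{
    // Every non-empty substring of word, grouped by start index.
    public static IEnumerable<string> getAllSubstrings(this string word)
    {
        return from start in Enumerable.Range(0, word.Length)
               from length in Enumerable.Range(1, word.Length - start)
               select word.Substring(start, length);
    }
}

public static class Program
{
    // Hypothetical wrapper around the loop shown above.
    // Returns string.Empty when the strings share no substring.
    public static string LongestCommonSubstring(IReadOnlyList<string> list)
    {
        if (list.Count == 0) return string.Empty;

        string shortest = list.OrderBy(s => s.Length).First();
        var other = list.Where(s => s != shortest).ToArray();

        // OrderByDescending is a stable sort, so equal-length candidates
        // keep their left-to-right order.
        foreach (string subStr in shortest.getAllSubstrings().OrderByDescending(s => s.Length))
        {
            if (other.All(s => s.Contains(subStr))) return subStr;
        }
        return string.Empty;
    }

    public static void Main()
    {
        // Sample data assumed for this sketch.
        var list = new List<string> { "Today", "Monday", "Tuesday", "Wednesday" };
        Console.WriteLine(LongestCommonSubstring(list)); // prints "day"
    }
}
```

Note that string.Contains is case-sensitive, so "T" and "t" are treated as different characters.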

Find the shortest entry in the list.

  • Today
  • Monday
  • Tuesday
  • Wednesday

So we use "Today".

Build a list of all runs of consecutive characters in "Today", from the full length of the string down to single characters, in "longest first" order.

"Today",

"Toda", "oday",

"Tod", "oda", "day",

"To", "od", "da", "ay",

"T", "o", "d", "a", "y"

Enumerate over this list, finding the first entry for which all the other strings contain that entry.

        List<string> words = new List<string> { "Today", "Monday", "Tuesday", "Wednesday" };

        // Select shortest word in the list
        string shortestWord = (from word in words
                            orderby word.Length
                            select word).First();

        int shortWordLength = shortestWord.Length;

        // Build up the list of consecutive character strings, in length order.
        List<string> parts = new List<string>();
        for (int partLength = shortWordLength; partLength > 0; partLength--)
        {
            for (int partStartIndex = 0; partStartIndex <= shortWordLength - partLength; partStartIndex++)
            {
                parts.Add(shortestWord.Substring(partStartIndex, partLength));
            }
        }
        // Find the first part which is in all the words.
        string longestSubString = (from part in parts where words.All(s => s.Contains(part)) select part).FirstOrDefault();

        // longestSubString is the longest substring common to all the words, or null if no match is found.

EDIT

Thinking a little more about it, you can optimise a little.

You don't need to build up a list of parts - just test each part as it is generated. Also, by sorting the word list in length order, you always test against the shortest strings first to reject candidate parts more quickly.

        string longestSubString = null;

        List<string> words = new List<string> { "Todays", "Monday", "Tuesday" };

        // Sort word list by length
        List<string> wordsInLengthOrder = (from word in words
                                           orderby word.Length
                                           select word).ToList();

        string shortestWord = wordsInLengthOrder[0];
        int shortWordLength = shortestWord.Length;

        // Work through the consecutive character strings, in length order.
        for (int partLength = shortWordLength; (partLength > 0) && (longestSubString == null); partLength--)
        {
            for (int partStartIndex = 0; partStartIndex <= shortWordLength - partLength; partStartIndex++)
            {
                string part = shortestWord.Substring(partStartIndex, partLength);

                // Test if all the words in the sorted list contain the part.
                if (wordsInLengthOrder.All(s => s.Contains(part)))
                {
                    longestSubString = part;
                    break;
                }
            }

        }

        Console.WriteLine(longestSubString);
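
The same "test each part as it is generated" idea can also be sketched with a lazy iterator, which keeps the LINQ style of the first version without building the parts list up front. This is only one possible arrangement; the names LazyParts, PartsLongestFirst and LongestCommon are made up for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyParts
{
    // Hypothetical helper: yields the substrings of word one at a time,
    // longest first, so nothing is generated after a match is found.
    public static IEnumerable<string> PartsLongestFirst(string word)
    {
        for (int partLength = word.Length; partLength > 0; partLength--)
        {
            for (int partStartIndex = 0; partStartIndex <= word.Length - partLength; partStartIndex++)
            {
                yield return word.Substring(partStartIndex, partLength);
            }
        }
    }

    // Returns the longest part contained in every word, or null if there is none.
    public static string LongestCommon(IEnumerable<string> words)
    {
        // Shortest words first, so a bad candidate is rejected quickly.
        List<string> wordsInLengthOrder = words.OrderBy(w => w.Length).ToList();
        if (wordsInLengthOrder.Count == 0) return null;

        return PartsLongestFirst(wordsInLengthOrder[0])
            .FirstOrDefault(part => wordsInLengthOrder.All(s => s.Contains(part)));
    }

    public static void Main()
    {
        var words = new List<string> { "Todays", "Monday", "Tuesday" };
        Console.WriteLine(LongestCommon(words)); // prints "day"
        Console.WriteLine(LongestCommon(new[] { "abc", "xyz" }) ?? "(none)"); // prints "(none)"
    }
}
```

FirstOrDefault stops pulling from the iterator at the first match, so it plays the same role as the break and the (longestSubString == null) loop condition above.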