Implementing an efficient algorithm to find the intersection of two strings

挽巷 2020-12-18 13:29

Implement an algorithm that takes two strings as input, and returns the intersection of the two, with each letter represented at most once.

5 Answers
  • 2020-12-18 13:36

    "with each letter represented at most once"

    I'm assuming that this means you just need to know the intersections, and not how many times they occurred. If that's so then you can trim down your algorithm by making use of yield. Instead of storing the count and continuing to iterate the second string looking for additional matches, you can yield the intersection right there and continue to the next possible match from the first string.
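As a rough illustration of the yield idea, here is a minimal sketch. The class and method names are made up for the example, and it assumes a quadratic scan with `IndexOf` is acceptable in exchange for using no containers at all.

```csharp
using System.Collections.Generic;

public static class YieldIntersection
{
    // Lazily yields each character that occurs in both strings, once.
    public static IEnumerable<char> Intersect(string first, string second)
    {
        for (int i = 0; i < first.Length; i++)
        {
            char c = first[i];

            // Only consider the first occurrence of c in the first string,
            // so no character can be yielded twice.
            if (first.IndexOf(c) != i) continue;

            // Yield as soon as a match exists; no counting of further matches.
            if (second.IndexOf(c) >= 0) yield return c;
        }
    }
}
```

Because the method is an iterator, nothing is scanned until the caller enumerates the result, for example with `foreach` or `string.Concat(...)`.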

  • 2020-12-18 13:44

    You don't need the 2 char arrays. The System.String data type has a built-in indexer by position that returns the char at that position, so you could just loop from 0 to (String.Length - 1). If you're more interested in speed than in optimizing storage space, you could make a HashSet for one of the strings, then make a second HashSet that will contain your final result. Then iterate through the second string, testing each char against the first HashSet, and if it exists, add it to the second HashSet. By the end you already have a single HashSet with all the intersections, and you save yourself the pass through the Hashtable looking for entries with a non-zero value.

    EDIT: I entered this before all the comments on the question about not wanting to use any built-in containers at all
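The two-HashSet approach could look something like the sketch below. The class and method names are invented for illustration, and it assumes built-in containers are allowed.

```csharp
using System.Collections.Generic;

public static class HashSetIntersection
{
    public static HashSet<char> Intersect(string first, string second)
    {
        // string implements IEnumerable<char>, so it can seed the set directly.
        var firstChars = new HashSet<char>(first);
        var result = new HashSet<char>();

        // Use the string indexer; no char arrays are needed.
        for (int i = 0; i < second.Length; i++)
        {
            char c = second[i];
            if (firstChars.Contains(c))
            {
                result.Add(c); // Add ignores duplicates, so each letter appears once.
            }
        }

        return result;
    }
}
```

Both passes are linear in the length of their string, at the cost of two sets' worth of extra memory.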

  • 2020-12-18 13:45

    Haven't tested this, but here's my thought:

    1. Quicksort both strings, so you have an ordered sequence of characters.
    2. Keeping an index into both strings, compare the "next" character from each string. If they differ, increment the index for the string with the smaller character. If they match, output the character (unless it was the last one output) and increment both indexes.
    3. Continue until you get to the end of one of the strings; whatever is left in the other string can't be part of the intersection.

    Apart from the sorted copies of the two strings, this only needs two integers and an output string (or StringBuilder). As an added bonus, the output values will be sorted too!

    Part 2: This is what I'd write (sorry about the comments, new to stackoverflow):

    private static string intersect(string left, string right)
    {
      StringBuilder theResult = new StringBuilder();

      //  Program.sort is assumed to return the characters of its argument in ascending order.
      string sortedLeft = Program.sort(left);
      string sortedRight = Program.sort(right);

      int leftIndex = 0;
      int rightIndex = 0;

      //  Walk both sorted strings until either one runs out.
      while (leftIndex < sortedLeft.Length && rightIndex < sortedRight.Length)
      {
        char leftChar = sortedLeft[leftIndex];
        char rightChar = sortedRight[rightIndex];

        //  Skip past the smaller character; it can't match anything later in the other string.
        if (leftChar < rightChar) { leftIndex++; continue; }
        if (leftChar > rightChar) { rightIndex++; continue; }

        //  Same character in both strings: output it only once.
        if (theResult.Length == 0 || theResult[theResult.Length - 1] != leftChar)
        {
          theResult.Append(leftChar);
        }

        leftIndex++;
        rightIndex++;
      }

      return theResult.ToString();
    }
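The `Program.sort` helper isn't shown in the answer. One possible version, assuming all it has to do is return the characters in ascending order, is sketched below with a hypothetical `StringSorting.Sort` name. It leans on `Array.Sort` instead of a hand-written quicksort.

```csharp
using System;

public static class StringSorting
{
    // Returns a new string with the characters of value in ascending order.
    public static string Sort(string value)
    {
        // Strings are immutable in C#, so sort a copy of the characters.
        char[] chars = value.ToCharArray();
        Array.Sort(chars);
        return new string(chars);
    }
}
```

If built-in sorting is off the table too, the `Array.Sort` call is the one line that would need replacing with your own quicksort over the char array.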
    

    I hope that makes more sense.

  • 2020-12-18 13:52

    Here's how I would do this. It's still O(N), and it doesn't use a hash table but instead one int array of length 26 (ideally).

    1. Make an array of 26 integers, one element for each letter of the alphabet, initialized to 0.
    2. Iterate over the first string, decrementing the element for each letter encountered.
    3. Iterate over the second string and, for each letter encountered, replace the value at its index with its absolute value. (edit: thanks to scwagner in comments)
    4. Return the letters whose indexes hold a value greater than 0.

    Still O(N), with extra space of only 26 ints.

    Of course, if you're not limited to only lowercase or only uppercase characters, your array size may need to change.
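The four steps above can be sketched roughly as follows. The names are invented for the example, and it assumes the input only matters for lowercase 'a' to 'z' (anything else is skipped).

```csharp
using System;
using System.Text;

public static class LetterTableIntersection
{
    public static string Intersect(string first, string second)
    {
        // Step 1: one slot per lowercase letter, all starting at 0.
        int[] table = new int[26];

        // Step 2: letters seen in the first string go negative.
        foreach (char c in first)
        {
            if (c >= 'a' && c <= 'z') table[c - 'a']--;
        }

        // Step 3: letters seen in the second string flip to their absolute value.
        // Letters absent from the first string stay at 0.
        foreach (char c in second)
        {
            if (c >= 'a' && c <= 'z') table[c - 'a'] = Math.Abs(table[c - 'a']);
        }

        // Step 4: only letters present in both strings are now greater than 0.
        var result = new StringBuilder();
        for (int i = 0; i < table.Length; i++)
        {
            if (table[i] > 0) result.Append((char)('a' + i));
        }

        return result.ToString();
    }
}
```

Letters that appear only in the first string stay negative and letters that appear only in the second stay at 0, so neither ends up in the output.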

  • 2020-12-18 13:57

    How about this ...

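    // Requires: using System.Linq;
    var s1 = "aabbccccffffd";
    var s2 = "aabc";

    // Intersect yields the distinct characters found in both strings: 'a', 'b', 'c'.
    var ans = s1.Intersect(s2);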
    