Implementing an efficient algorithm to find the intersection of two strings

Implement an algorithm that takes two strings as input, and returns the intersection of the two, with each letter represented at most once.

5 Answers

挽巷

2020-12-18 13:36

"with each letter represented at most once"

I'm assuming that this means you just need to know the intersections, and not how many times they occurred. If that's so then you can trim down your algorithm by making use of yield. Instead of storing the count and continuing to iterate the second string looking for additional matches, you can yield the intersection right there and continue to the next possible match from the first string.
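A minimal sketch of that idea, assuming a hypothetical `YieldIntersection` class and that no built-in containers are wanted: each character of the first string is yielded as soon as it is found in the second, and a character is skipped if it already appeared earlier in the first string. The repeated `IndexOf` scans make this quadratic in the worst case, so it illustrates the early-yield idea rather than the fastest approach.

```csharp
using System;
using System.Collections.Generic;

static class YieldIntersection
{
    // Lazily yields each character that occurs in both strings, at most once.
    public static IEnumerable<char> Intersect(string first, string second)
    {
        for (int i = 0; i < first.Length; i++)
        {
            char c = first[i];

            // Already handled at an earlier position in the first string.
            if (first.IndexOf(c) < i) continue;

            // Yield on the first match; no counting of further occurrences.
            if (second.IndexOf(c) >= 0) yield return c;
        }
    }
}
```

Because the method is an iterator, a caller can stop enumerating early, or collect everything with `string.Concat(YieldIntersection.Intersect(s1, s2))`.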

野趣味

2020-12-18 13:44

You don't need two char arrays. The System.String data type has a built-in indexer that returns the char at a given position, so you can just loop from 0 to (String.Length - 1). If you're more interested in speed than in optimizing storage space, you could build a HashSet from one of the strings, then make a second HashSet that will contain your final result. Then iterate through the second string, testing each char against the first HashSet, and if it exists, add it to the second HashSet. By the end, you already have a single HashSet with all the intersections, and you save yourself the pass through the Hashtable looking for entries with a non-zero value.
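The two-HashSet approach could look roughly like this. The class and method names are placeholders, not part of the original question.

```csharp
using System.Collections.Generic;

static class HashSetIntersection
{
    public static HashSet<char> Intersect(string first, string second)
    {
        // string implements IEnumerable<char>, so it can seed the set directly.
        var seen = new HashSet<char>(first);
        var result = new HashSet<char>();

        foreach (char c in second)
        {
            // Adding to a HashSet ignores duplicates, so each char appears once.
            if (seen.Contains(c)) result.Add(c);
        }

        return result;
    }
}
```

Lookups and inserts on a HashSet are O(1) on average, so the whole pass is linear in the combined length of the two strings.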

EDIT: I entered this before all the comments on the question about not wanting to use any built-in containers at all

故里飘歌

2020-12-18 13:45

Haven't tested this, but here's my thought:

Sort both strings (quicksort, for example), so you have two ordered sequences of characters.
Keeping an index into both strings, compare the "next" character from each string. If they differ, increment the index of the string with the smaller character; if they match, output the character once and increment both indexes.
Continue until you get to the end of one of the strings; whatever is left in the other string cannot be part of the intersection.

Apart from the sorting, this only needs the two strings, two integers, and an output string (or StringBuilder). Note that C# strings are immutable, so the sort has to work on char-array copies rather than in place. As an added bonus, the output values will be sorted too!

Part 2: This is what I'd write (sorry about the comments, new to stackoverflow):

//  Requires: using System; using System.Text;
private static string intersect(string left, string right)
{
  StringBuilder theResult = new StringBuilder();

  //  Strings are immutable, so sort copies of the characters.
  char[] sortedLeft = left.ToCharArray();
  char[] sortedRight = right.ToCharArray();
  Array.Sort(sortedLeft);
  Array.Sort(sortedRight);

  int leftIndex = 0;
  int rightIndex = 0;

  //  Walk both sorted sequences until either one is exhausted.
  while (leftIndex < sortedLeft.Length && rightIndex < sortedRight.Length)
  {
    char leftChar = sortedLeft[leftIndex];
    char rightChar = sortedRight[rightIndex];

    //  The smaller character cannot occur in the other string; skip it.
    if (leftChar < rightChar) { leftIndex++; continue; }
    if (leftChar > rightChar) { rightIndex++; continue; }

    //  Common character: output it once, skipping repeats.
    if (theResult.Length == 0 || theResult[theResult.Length - 1] != leftChar)
      theResult.Append(leftChar);

    leftIndex++;
    rightIndex++;
  }

  return theResult.ToString();
}

I hope that makes more sense.

野的像风

2020-12-18 13:52
Here's how I would do this. It's still O(N), and it doesn't use a hash table but instead one int array of length 26 (ideally).
1. Make an array of 26 integers, one element for each letter of the alphabet. Initialize all of them to 0.
2. Iterate over the first string, decrementing the letter's element each time a letter is encountered.
3. Iterate over the second string and take the absolute value of whatever is at the index corresponding to each letter you encounter. (edit: thanks to scwagner in comments)
4. Return all letters whose indexes hold a value greater than 0.
Still O(N), with extra space of only 26 ints.

Of course, if you're not limited to only lowercase or only uppercase characters, your array size may need to change.
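The four steps above can be sketched as follows, assuming input limited to lowercase 'a' to 'z' (anything else is ignored here, which is a choice made for this sketch). The class name is hypothetical.

```csharp
using System;
using System.Text;

static class CountArrayIntersection
{
    public static string Intersect(string first, string second)
    {
        // Step 1: one slot per lowercase letter, all initialized to 0.
        int[] counts = new int[26];

        // Step 2: letters of the first string drive their slot negative.
        foreach (char c in first)
            if (c >= 'a' && c <= 'z') counts[c - 'a']--;

        // Step 3: a letter also seen in the second string flips to positive;
        // a letter only in the second string stays at Math.Abs(0) == 0.
        foreach (char c in second)
            if (c >= 'a' && c <= 'z') counts[c - 'a'] = Math.Abs(counts[c - 'a']);

        // Step 4: slots holding a value greater than 0 are the intersection.
        var result = new StringBuilder();
        for (int i = 0; i < counts.Length; i++)
            if (counts[i] > 0) result.Append((char)('a' + i));

        return result.ToString();
    }
}
```

Letters that occur only in the first string stay negative and letters that occur only in the second stay at zero, so neither is returned.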
太阳男子

2020-12-18 13:57
How about this ...
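```
using System.Linq;

var s1 = "aabbccccffffd";
var s2 = "aabc";

// Enumerable.Intersect returns an IEnumerable<char> with each common char once.
var ans = s1.Intersect(s2);
```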