I have 2 strings
string a = \"foo bar\";
string b = \"bar foo\";
and I want to detect the changes from a
to b
. <
I'll go out on a limb here and provide an algorithm that's not the most efficient, but is easy to reason about.
Let's cover some ground first:
1) Order matters
string before = "bar foo"
string after = "foo bar"
Even though "bar" and "foo" occur in both strings, "bar" will need to be removed and added again later. This also tells us it's the after
string that gives us the order of chars we're interested in, we want "foo" first.
2) Order over count
Another way to look at it, is that some chars may never get their turn.
string before = "abracadabra"
string after = "bar bar"
Only the bold chars of "bar bar", get their say in "abracadabra". Even though we've got two b's in both strings, only the first one counts. By the time we get to the second b in "bar bar" the second b in "abracadabra" has already been passed, when we were looking for the first occurrence of 'r'.
3) Barriers
Barriers are the chars that exist in both strings, taking order and count into consideration. This already suggests a set might not be the most appropriate data structure, as we would lose count.
For an input
string before = "pinata"
string after = "accidental"
We get (pseudocode)
var barriers = { 'a', 't', 'a' }
"pinata"
"accidental"
Let's follow the execution flow:
after
so everything prepending the first 'a' in before
can be removed. "pinata" -> "ata"after
string, so we can insert everything in between. "ata" -> "accidenta"after
, so there will be some post processing. "accidenta" -> "accidental"Note 'i' and 'n' don't get to play, again, order over count.
We've established that order and count matter, a Queue
comes to mind.
static public List CalculateDifferences(string before, string after)
{
List result = new List();
Queue barriers = new Queue();
#region Preprocessing
int index = 0;
for (int i = 0; i < after.Length; i++)
{
// Look for the first match starting at index
int match = before.IndexOf(after[i], index);
if (match != -1)
{
barriers.Enqueue(after[i]);
index = match + 1;
}
}
#endregion
#region Queue Processing
index = 0;
while (barriers.Any())
{
char barrier = barriers.Dequeue();
// Get the offset to the barrier in both strings,
// ignoring the part that's already been handled
int offsetBefore = before.IndexOf(barrier, index) - index;
int offsetAfter = after.IndexOf(barrier, index) - index;
// Remove prefix from 'before' string
if (offsetBefore > 0)
{
RemoveChars(before.Substring(index, offsetBefore), result);
before = before.Substring(offsetBefore);
}
// Insert prefix from 'after' string
if (offsetAfter > 0)
{
string substring = after.Substring(index, offsetAfter);
AddChars(substring, result);
before = before.Insert(index, substring);
index += substring.Length;
}
// Jump over the barrier
KeepChar(barrier, result);
index++;
}
#endregion
#region Post Queue processing
if (index < before.Length)
{
RemoveChars(before.Substring(index), result);
}
if (index < after.Length)
{
AddChars(after.Substring(index), result);
}
#endregion
return result;
}
static private void KeepChar(char barrier, List result)
{
result.Add(new Difference()
{
c = barrier,
op = Operation.Equal
});
}
static private void AddChars(string substring, List result)
{
result.AddRange(substring.Select(x => new Difference()
{
c = x,
op = Operation.Add
}));
}
static private void RemoveChars(string substring, List result)
{
result.AddRange(substring.Select(x => new Difference()
{
c = x,
op = Operation.Remove
}));
}