I'm looking for the fastest way to replace multiple (~500) substrings of a big (~1 MB) string. Whatever I have tried, it seems that String.Replace is the fastest way of doing
I made a variation on Fredou's code that requires fewer comparisons, as it works on int* instead of char*. It still requires n iterations for a string of length n; it just has to do fewer comparisons. You could get down to n/2 iterations if the string is neatly aligned by 2 (so the string to replace can only occur at indexes 0, 2, 4, 6, 8, etc.), or even n/4 if it's aligned by 4 (you'd use long*). I'm not very good at bit fiddling like this, so someone might be able to spot an obvious flaw in my code or a way to make it more efficient. I verified that the result of my variation is the same as that of the simple string.Replace.
Additionally, I expect that some gains could be made in the 500x string.Copy that it does, but haven't looked into that yet.
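To illustrate the idea of comparing two chars at once, here is a minimal managed sketch (no unsafe code, so it will not be as fast as the pointer version). The names PackedPairSketch, Pack and ReplacePairs, and the step parameter, are made up for this illustration and are not part of Fredou's code or mine. It packs two chars into one int, does a single comparison per position, and can step by 2 when matches are known to occur only at even indexes:

```csharp
using System;

static class PackedPairSketch
{
    // Pack two UTF-16 chars into one 32-bit value, first char in the low bits
    // (the layout an int* read over a char buffer sees on little-endian hardware).
    static int Pack(char a, char b)
    {
        return a | (b << 16);
    }

    // Replaces every non-overlapping occurrence of the two-char string `find`
    // with the two-char string `replaceBy`, scanning left to right.
    // `step` is hypothetical: use 1 in general, 2 only when matches can
    // occur solely at even indexes (the "aligned by 2" case).
    public static string ReplacePairs(string input, string find, string replaceBy, int step)
    {
        if (find.Length != 2 || replaceBy.Length != 2)
            throw new ArgumentException("Both strings must be exactly two chars long.");

        char[] output = input.ToCharArray();
        int target = Pack(find[0], find[1]);
        int n = 0;
        while (n < input.Length - 1)
        {
            // One int comparison covers both chars.
            if (Pack(input[n], input[n + 1]) == target)
            {
                output[n] = replaceBy[0];
                output[n + 1] = replaceBy[1];
                n += 2;
            }
            else
            {
                n += step;
            }
        }
        return new string(output);
    }
}
```

For example, PackedPairSketch.ReplacePairs("aabbaabb", "aa", "xy", 1) gives the same result as "aabbaabb".Replace("aa", "xy"), which is the kind of check I used to verify the unsafe version.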
My results (Fredou II):
IMPLEMENTATION | EXEC MS | GC MS
#1 Simple | 6816 | 0
#2 Simple parallel | 4202 | 0
#3 ParallelSubstring | 27839 | 4
#4 Fredou I | 2103 | 106
#5 Fredou II | 1334 | 91
So Fredou II takes about 2/3 of the time of Fredou I (measured on x86, but x64 was about the same).
For this code:
// Holds two chars so they can be read or written as a single 32-bit value.
private unsafe struct TwoCharStringChunk
{
    public fixed char chars[2];
}

private unsafe static void FredouImplementation_Variation1(string input, int inputLength, string replace, TwoCharStringChunk[] replaceBy)
{
    // One private copy of the input per replacement; each copy is mutated in place.
    var output = new string[replaceBy.Length];
    for (var i = 0; i < replaceBy.Length; ++i)
        output[i] = string.Copy(input);

    var r = new TwoCharStringChunk();
    r.chars[0] = replace[0];
    r.chars[1] = replace[1];
    _staticToReplace = r;

    Parallel.For(0, replaceBy.Length, l => Process_Variation1(output[l], input, inputLength, replaceBy[l]));
}

private static TwoCharStringChunk _staticToReplace;

private static unsafe void Process_Variation1(string output, string input, int len, TwoCharStringChunk replaceBy)
{
    int n = 0;
    int m = len - 1; // a two-char match must start before index len - 1

    fixed (char* i = input, o = output, chars = _staticToReplace.chars)
    {
        // Reinterpret each pair of chars as one int so a single compare covers both.
        var replaceValAsInt = *((int*)chars);
        var replaceByValAsInt = *((int*)replaceBy.chars);

        while (n < m)
        {
            var compareInput = *((int*)&i[n]);
            if (compareInput == replaceValAsInt)
            {
                // Overwrite both chars in the output copy with one int store.
                ((int*)&o[n])[0] = replaceByValAsInt;
                n += 2;
            }
            else
            {
                ++n;
            }
        }
    }
}
The struct with the fixed buffer is not strictly necessary here and could have been replaced by a simple int field. However, if you expand the char[2] to char[3], this code can be made to work with three-letter strings as well, which wouldn't be possible with an int field.
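As a rough illustration of the three-letter case, here is a managed sketch that packs three chars into the low 48 bits of a long and compares them in one go. This is only my assumption of how the idea carries over; it is not taken from the gist, and PackedTripleSketch, Pack and ReplaceTriples are hypothetical names. An unsafe version would presumably have to read a wider value and ignore the unused 16 bits, which I have not written or measured.

```csharp
using System;

static class PackedTripleSketch
{
    // Pack three UTF-16 chars into the low 48 bits of a 64-bit value.
    static long Pack(char a, char b, char c)
    {
        return (long)a | ((long)b << 16) | ((long)c << 32);
    }

    // Replaces every non-overlapping occurrence of the three-char string `find`
    // with the three-char string `replaceBy`, scanning left to right.
    public static string ReplaceTriples(string input, string find, string replaceBy)
    {
        if (find.Length != 3 || replaceBy.Length != 3)
            throw new ArgumentException("Both strings must be exactly three chars long.");

        char[] output = input.ToCharArray();
        long target = Pack(find[0], find[1], find[2]);
        int n = 0;
        while (n < input.Length - 2)
        {
            // One long comparison covers all three chars.
            if (Pack(input[n], input[n + 1], input[n + 2]) == target)
            {
                output[n] = replaceBy[0];
                output[n + 1] = replaceBy[1];
                output[n + 2] = replaceBy[2];
                n += 3;
            }
            else
            {
                ++n;
            }
        }
        return new string(output);
    }
}
```

PackedTripleSketch.ReplaceTriples("abcabcab", "abc", "xyz") should match "abcabcab".Replace("abc", "xyz").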
It required some changes to the Program.cs as well, so here's the full gist:
https://gist.github.com/JulianR/7763857
EDIT: I'm not sure why my ParallelSubstring is so slow. I'm running .NET 4 in Release mode, no debugger, in either x86 or x64.