string str1 = "12345ABC...\\...ABC100000";
// Hypothetically huge string of 100000 + Unicode Chars
str1 = str1.Replace("1", string.Empty);
str1 = str1.Rep
Since you have multiple replaces on one string, I would recommend using Regex over StringBuilder.
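A minimal sketch of what a single-regex approach to several replacements could look like. The `MultiReplace` class, the `ReplaceAll` helper and the `map` dictionary are hypothetical names for illustration, not part of the original answer. One pattern is built from the dictionary keys, and a MatchEvaluator looks up the replacement for each match:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MultiReplace
{
    // Hypothetical helper: replaces every key of `map` with its value in one regex pass.
    public static string ReplaceAll(string input, IDictionary<string, string> map)
    {
        // An empty pattern would match everywhere, so return early.
        if (map.Count == 0)
            return input;

        // Escape the keys so they match literally. Longer keys go first so that
        // a key which is a prefix of another key does not shadow it.
        string pattern = string.Join("|",
            map.Keys.OrderByDescending(k => k.Length).Select(k => Regex.Escape(k)));

        // The evaluator runs once per match and picks the replacement by matched text.
        return Regex.Replace(input, pattern, m => map[m.Value]);
    }
}
```

For example, `MultiReplace.ReplaceAll("$TS$ and $DOC$", new Dictionary<string, string> { { "$TS$", "1" }, { "$DOC$", "2" } })` returns `"1 and 2"`. Whether this beats chained Replace calls depends on the input, so it is worth benchmarking on your own data.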
If you want a built-in class in .NET, I think StringBuilder is the best. To do it manually, you can use unsafe code with char*, iterate through your string, and replace based on your criteria.
Here is my benchmark:
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
internal static class MeasureTime
{
internal static TimeSpan Run(Action func, uint count = 1)
{
if (count <= 0)
{
throw new ArgumentOutOfRangeException("count", "Must be greater than zero");
}
long[] arr_time = new long[count];
Stopwatch sw = new Stopwatch();
for (uint i = 0; i < count; i++)
{
sw.Start();
func();
sw.Stop();
// Elapsed.Ticks is in 100 ns TimeSpan ticks; ElapsedTicks depends on Stopwatch.Frequency.
arr_time[i] = sw.Elapsed.Ticks;
sw.Reset();
}
return new TimeSpan(count == 1 ? arr_time.Sum() : Convert.ToInt64(Math.Round(arr_time.Sum() / (double)count)));
}
}
public class Program
{
public static string RandomString(int length)
{
Random random = new Random();
const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
return new String(Enumerable.Range(1, length).Select(_ => chars[random.Next(chars.Length)]).ToArray());
}
public static void Main()
{
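// Caveat: RandomString emits only uppercase letters and digits, so the lowercase
// patterns below never match; each run measures a scan that replaces nothing.
string rnd_str = RandomString(500000);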
Regex regex = new Regex("a|c|e|g|i|k", RegexOptions.Compiled);
TimeSpan ts1 = MeasureTime.Run(() => regex.Replace(rnd_str, "!!!"), 10);
Console.WriteLine("Regex time: {0:hh\\:mm\\:ss\\:fff}", ts1);
StringBuilder sb_str = new StringBuilder(rnd_str);
TimeSpan ts2 = MeasureTime.Run(() => sb_str.Replace("a", "").Replace("c", "").Replace("e", "").Replace("g", "").Replace("i", "").Replace("k", ""), 10);
Console.WriteLine("StringBuilder time: {0:hh\\:mm\\:ss\\:fff}", ts2);
TimeSpan ts3 = MeasureTime.Run(() => rnd_str.Replace("a", "").Replace("c", "").Replace("e", "").Replace("g", "").Replace("i", "").Replace("k", ""), 10);
Console.WriteLine("String time: {0:hh\\:mm\\:ss\\:fff}", ts3);
char[] ch_arr = {'a', 'c', 'e', 'g', 'i', 'k'};
TimeSpan ts4 = MeasureTime.Run(() => new String((from c in rnd_str where !ch_arr.Contains(c) select c).ToArray()), 10);
Console.WriteLine("LINQ time: {0:hh\\:mm\\:ss\\:fff}", ts4);
}
}
Regex time: 00:00:00:008
StringBuilder time: 00:00:00:015
String time: 00:00:00:005
LINQ can't process rnd_str (Fatal Error: Memory usage limit was exceeded)
String.Replace is the fastest.
1. What is the fastest way of replacing these values, ignoring memory concerns?
The fastest way is to build a custom component that's specific to your use case. As of .NET 4.6, there's no class in the BCL designed for multiple string replacements.
If you NEED something fast out of the BCL, StringBuilder is the fastest BCL component for simple string replacement. Its source code is publicly available, and it's pretty efficient for replacing a single string. Only use Regex if you really need the pattern-matching power of regular expressions. It's slower and a little more cumbersome, even when compiled.
2. What is the most memory efficient way of achieving the same result?
The most memory-efficient way is to perform a filtered stream copy from the source to the destination (explained below). Memory consumption will be limited to your buffer; however, this will be more CPU intensive. As a rule of thumb, you're going to trade CPU performance for memory consumption.
Technical Details
String replacements are tricky. Even when performing a string replacement in a mutable memory space (such as with StringBuilder), it's expensive. If the replacement string is a different length than the original string, you're going to be relocating every character following the replacement string to keep the whole string contiguous. This results in a LOT of memory writes and, even in the case of StringBuilder, causes you to rewrite most of the string in memory on every call to Replace.
So what is the fastest way to do string replacements? Write the new string in a single pass: don't let your code go back and rewrite anything. Writes are more expensive than reads. You're going to have to code this yourself for best results.
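As a rough illustration of the single-pass idea for the simplest case, removing a set of characters, here is a minimal sketch. The `SinglePass` class and the `RemoveChars` helper are hypothetical names, and the sketch assumes memory is not a concern, since it allocates a buffer as large as the source:

```csharp
using System;
using System.Collections.Generic;

public static class SinglePass
{
    // Hypothetical helper: removes every char listed in `toRemove` in one pass.
    public static string RemoveChars(string source, params char[] toRemove)
    {
        var remove = new HashSet<char>(toRemove);

        // The result can never be longer than the source, so one buffer is enough.
        var buffer = new char[source.Length];
        int written = 0;

        // Each source char is read once and each kept char is written once;
        // nothing already written is ever moved again.
        foreach (char c in source)
        {
            if (!remove.Contains(c))
                buffer[written++] = c;
        }
        return new string(buffer, 0, written);
    }
}
```

`SinglePass.RemoveChars("abcdefghijkl", 'a', 'c', 'e', 'g', 'i', 'k')` returns `"bdfhjl"`. Replacing with longer strings would need a growable destination (for example a StringBuilder that is only appended to), but the principle of never rewriting earlier output stays the same.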
High-Memory Solution
The class I've written generates strings based on templates. I place tokens ($ReplaceMe$) in a template which marks places where I want to insert a string later. I use it in cases where XmlWriter is too onerous for XML that's largely static and repetitive, and I need to produce large XML (or JSON) data streams.
The class works by slicing the template into parts and placing each part into a numbered dictionary. Parameters are also enumerated. The order in which the parts and parameters are inserted into a new string is stored in an integer array. When a new string is generated, the parts and parameters are picked from the dictionary and used to create a new string.
It's neither fully optimized nor bulletproof, but it works great for generating very large data streams from templates.
Low-Memory Solution
You'll need to read small chunks from the source string into a buffer, search the buffer using an optimized search algorithm, and then write the new string to the destination stream / string. There are a lot of potential caveats here, but it would be memory efficient and a better solution for source data that's dynamic and can't be cached, such as whole-page translations or source data that's too large to reasonably cache. I don't have a sample solution for this handy.
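For a sense of the shape such a filtered copy could take, here is a minimal sketch. It is not the author's solution: the `StreamFilter` class, the `CopyWithout` helper and the 4096-char default buffer are assumptions for illustration. It also only removes single characters, which sidesteps the main caveat (a multi-character search string that straddles two buffers would need carry-over logic):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StreamFilter
{
    // Hypothetical helper: copies `reader` to `writer`, dropping chars in `toRemove`.
    // Memory use is bounded by `bufferSize`, regardless of the input length.
    public static void CopyWithout(TextReader reader, TextWriter writer,
                                   ISet<char> toRemove, int bufferSize = 4096)
    {
        var buffer = new char[bufferSize];
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Compact the kept chars to the front of the same buffer.
            // `kept` never overtakes `i`, so nothing unread is overwritten.
            int kept = 0;
            for (int i = 0; i < read; i++)
            {
                char c = buffer[i];
                if (!toRemove.Contains(c))
                    buffer[kept++] = c;
            }
            writer.Write(buffer, 0, kept);
        }
    }
}
```

With a `StringReader` over `"abcdefghijkl"`, a `StringWriter` as the destination and `new HashSet<char>("acegik")` as the filter, the writer ends up holding `"bdfhjl"`. In practice the reader and writer would wrap files or network streams.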
Sample Code
Desired Results
<DataTable source='Users'>
<Rows>
<Row id='25' name='Administrator' />
<Row id='29' name='Robert' />
<Row id='55' name='Amanda' />
</Rows>
</DataTable>
Template
<DataTable source='$TableName$'>
<Rows>
<Row id='$0$' name='$1$'/>
</Rows>
</DataTable>
Test Case
using System;
using System.IO;
class Program
{
static string[,] _users =
{
{ "25", "Administrator" },
{ "29", "Robert" },
{ "55", "Amanda" },
};
static StringTemplate _documentTemplate = new StringTemplate(@"<DataTable source='$TableName$'><Rows>$Rows$</Rows></DataTable>");
static StringTemplate _rowTemplate = new StringTemplate(@"<Row id='$0$' name='$1$' />");
static void Main(string[] args)
{
_documentTemplate.SetParameter("TableName", "Users");
_documentTemplate.SetParameter("Rows", GenerateRows);
Console.WriteLine(_documentTemplate.GenerateString(4096));
Console.ReadLine();
}
private static void GenerateRows(StreamWriter writer)
{
for (int i = 0; i <= _users.GetUpperBound(0); i++)
_rowTemplate.GenerateString(writer, _users[i, 0], _users[i, 1]);
}
}
StringTemplate Source
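using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
public class StringTemplate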
{
private string _template;
private string[] _parts;
private int[] _tokens;
private string[] _parameters;
private Dictionary<string, int> _parameterIndices;
private string[] _replaceGraph;
private Action<StreamWriter>[] _callbackGraph;
private bool[] _graphTypeIsReplace;
public string[] Parameters
{
get { return _parameters; }
}
public StringTemplate(string template)
{
_template = template;
Prepare();
}
public void SetParameter(string name, string replacement)
{
int index = _parameterIndices[name] + _parts.Length;
_replaceGraph[index] = replacement;
_graphTypeIsReplace[index] = true;
}
public void SetParameter(string name, Action<StreamWriter> callback)
{
int index = _parameterIndices[name] + _parts.Length;
_callbackGraph[index] = callback;
_graphTypeIsReplace[index] = false;
}
private static Regex _parser = new Regex(@"\$(\w{1,64})\$", RegexOptions.Compiled);
private void Prepare()
{
_parameterIndices = new Dictionary<string, int>(64);
List<string> parts = new List<string>(64);
List<object> tokens = new List<object>(64);
int param_index = 0;
int part_start = 0;
foreach (Match match in _parser.Matches(_template))
{
if (match.Index > part_start)
{
//Add Part
tokens.Add(parts.Count);
parts.Add(_template.Substring(part_start, match.Index - part_start));
}
//Add Parameter
var param = _template.Substring(match.Index + 1, match.Length - 2);
if (!_parameterIndices.TryGetValue(param, out param_index))
_parameterIndices[param] = param_index = _parameterIndices.Count;
tokens.Add(param);
part_start = match.Index + match.Length;
}
//Add last part, if it exists.
if (part_start < _template.Length)
{
tokens.Add(parts.Count);
parts.Add(_template.Substring(part_start, _template.Length - part_start));
}
//Set State
_parts = parts.ToArray();
_tokens = new int[tokens.Count];
int index = 0;
foreach (var token in tokens)
{
var parameter = token as string;
if (parameter == null)
_tokens[index++] = (int)token;
else
_tokens[index++] = _parameterIndices[parameter] + _parts.Length;
}
_parameters = _parameterIndices.Keys.ToArray();
int graphlen = _parts.Length + _parameters.Length;
_callbackGraph = new Action<StreamWriter>[graphlen];
_replaceGraph = new string[graphlen];
_graphTypeIsReplace = new bool[graphlen];
for (int i = 0; i < _parts.Length; i++)
{
_graphTypeIsReplace[i] = true;
_replaceGraph[i] = _parts[i];
}
}
public void GenerateString(Stream output)
{
var writer = new StreamWriter(output);
GenerateString(writer);
writer.Flush();
}
public void GenerateString(StreamWriter writer)
{
//Resolve graph
foreach(var token in _tokens)
{
if (_graphTypeIsReplace[token])
writer.Write(_replaceGraph[token]);
else
_callbackGraph[token](writer);
}
}
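public void SetReplacements(params string[] parameters)
{
int index;
for (int i = 0; i < _parameters.Length; i++)
{
//Only numeric parameter names ($0$, $1$, ...) map to positional arguments.
if (!Int32.TryParse(_parameters[i], out index))
continue;
//Pick the argument by the parsed position, not by declaration order.
SetParameter(_parameters[i], parameters[index]);
}
}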
public string GenerateString(int bufferSize = 1024)
{
using (var ms = new MemoryStream(bufferSize))
{
GenerateString(ms);
ms.Position = 0;
using (var reader = new StreamReader(ms))
return reader.ReadToEnd();
}
}
public string GenerateString(params string[] parameters)
{
SetReplacements(parameters);
return GenerateString();
}
public void GenerateString(StreamWriter writer, params string[] parameters)
{
SetReplacements(parameters);
GenerateString(writer);
}
}
All characters in a .NET string are "unicode chars". Do you mean they're non-ASCII? That shouldn't make any odds - unless you run into composition issues, e.g. an "e + acute accent" not being replaced when you try to replace an "e acute".
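To make the composition issue concrete, here is a small sketch. The `NormalizedReplace` class and its `Remove` helper are hypothetical names, and it assumes the target is non-empty and that normalizing the whole source to form C is acceptable for your data:

```csharp
using System;
using System.Text;

public static class NormalizedReplace
{
    // Hypothetical helper: brings both strings to Unicode normalization form C
    // before removing `target`, so decomposed and precomposed spellings match.
    public static string Remove(string source, string target)
    {
        string s = source.Normalize(NormalizationForm.FormC);
        string t = target.Normalize(NormalizationForm.FormC);
        return s.Replace(t, string.Empty);
    }
}
```

`string.Replace(string, string)` performs an ordinal search, so `"cafe\u0301".Replace("\u00E9", "")` leaves the string unchanged: the text holds 'e' followed by a combining acute accent, while the target is the single precomposed character. `NormalizedReplace.Remove("cafe\u0301", "\u00E9")` returns `"caf"`.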
You could try using a regular expression with Regex.Replace, or StringBuilder.Replace. Here's sample code doing the same thing with both:
using System;
using System.Text;
using System.Text.RegularExpressions;
class Test
{
static void Main(string[] args)
{
string original = "abcdefghijkl";
Regex regex = new Regex("a|c|e|g|i|k", RegexOptions.Compiled);
string removedByRegex = regex.Replace(original, "");
string removedByStringBuilder = new StringBuilder(original)
.Replace("a", "")
.Replace("c", "")
.Replace("e", "")
.Replace("g", "")
.Replace("i", "")
.Replace("k", "")
.ToString();
Console.WriteLine(removedByRegex);
Console.WriteLine(removedByStringBuilder);
}
}
I wouldn't like to guess which is more efficient - you'd have to benchmark with your specific application. The regex way may be able to do it all in one pass, but that pass will be relatively CPU-intensive compared with each of the many replaces in StringBuilder.
Here's a quick benchmark...
Stopwatch s = new Stopwatch();
s.Start();
string replace = source;
replace = replace.Replace("$TS$", tsValue);
replace = replace.Replace("$DOC$", docValue);
s.Stop();
Console.WriteLine("String.Replace:\t\t" + s.ElapsedMilliseconds);
s.Reset();
s.Start();
StringBuilder sb = new StringBuilder(source);
sb = sb.Replace("$TS$", tsValue);
sb = sb.Replace("$DOC$", docValue);
string output = sb.ToString();
s.Stop();
Console.WriteLine("StringBuilder.Replace:\t\t" + s.ElapsedMilliseconds);
I didn't see much difference on my machine (String.Replace took 85 ms and StringBuilder.Replace took 80 ms), and that was against about 8 MB of text in "source"...