Regex replacements inside a StringBuilder

一个人想着一个人 submitted on 2019-11-28 09:42:00

The most efficient use of your time is to try the simplest approach first: forget the StringBuilder and just use Regex.Replace. Then measure how slow it is - it may very well be good enough. Don't forget to try the regex both with and without RegexOptions.Compiled.
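A rough way to do that measurement, as a minimal sketch: the class name RegexTiming and the method TimeReplace are made up for illustration, and a Stopwatch loop like this only gives a ballpark figure, not a rigorous benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexTiming
{
    // Hypothetical helper: runs regex.Replace `iterations` times and returns elapsed milliseconds.
    public static long TimeReplace(Regex regex, string input, string replacement, int iterations)
    {
        // Warm-up call so one-time costs (pattern compilation, JIT) are not counted in the loop.
        regex.Replace(input, replacement);

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            regex.Replace(input, replacement);
        }
        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds;
    }
}
```

Call it once with new Regex(pattern) and once with new Regex(pattern, RegexOptions.Compiled) on input that looks like your real data, and compare the two numbers. Compiled mode costs more to construct, so it tends to pay off only when the same Regex instance is reused many times.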

If that isn't fast enough, consider using a StringBuilder for any replacements you can express simply, and then use Regex.Replace for the rest. You might also want to consider trying to combine replacements, reducing the number of regexes (and thus intermediate strings) used.
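As a sketch of that split, assuming some made-up replacements (the class HybridReplace, the method Clean, and the specific patterns are all illustrative, not from the question): literal substitutions go through StringBuilder.Replace, and the two pattern-based ones are merged into a single regex so only one extra string is produced.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class HybridReplace
{
    // One combined pattern instead of two separate regexes:
    // a run of whitespace (group "ws") or a run of digits (group "num").
    private static readonly Regex Combined = new Regex(@"(?<ws>\s+)|(?<num>\d+)");

    public static string Clean(string input)
    {
        // Literal replacements need no regex; StringBuilder.Replace edits the buffer in place.
        var sb = new StringBuilder(input);
        sb.Replace("&", "and");
        sb.Replace("colour", "color");

        // Single regex pass for the rest: whitespace runs become one space, digit runs become "#".
        return Combined.Replace(sb.ToString(), m => m.Groups["ws"].Success ? " " : "#");
    }
}
```

Here HybridReplace.Clean("salt  &   pepper 42 colour") returns "salt and pepper # color" with one StringBuilder and one Regex.Replace call, rather than four intermediate strings.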

You have 3 options:

  1. Do this in an inefficient way with strings as others have recommended here.

  2. Use the .Matches() call on your Regex object, and emulate the way .Replace() works (see #3).

  3. Adapt the Mono implementation of Regex to build a Regex that accepts StringBuilder (and please share it here!) Almost all of the work is already done for you in Mono, but it will take time to suss out the parts that make it work into their own library. Mono's Regex leverages Novell's 2002 JVM implementation of Regex, oddly enough.

In Mono:

System.Text.RegularExpressions.Regex uses an RxCompiler to instantiate an IMachineFactory in the form of an RxInterpreterFactory, which unsurprisingly makes IMachines as RxInterpreters. Getting those to emit is most of what you need to do, although if you're just looking to learn how it's all structured for efficiency, it's worth noting that much of what you're looking for is in its base class, BaseMachine.

In particular, BaseMachine is where the StringBuilder-based stuff lives. In the method LTRReplace, it first instantiates a StringBuilder with the initial string, and everything from there on out is purely StringBuilder-based. It's actually very annoying that Regex doesn't have StringBuilder methods hanging out, if we assume the internal Microsoft .NET implementation is similar.

Circling back to suggestion 2, you can mimic LTRReplace's behavior by calling .Matches(), tracking where you are in the original string, and looping:

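// Requires: using System.Text; using System.Text.RegularExpressions;
var matches = regex.Matches(original);
var sb = new StringBuilder(original.Length);
int pos = 0; // position in original string
foreach (Match match in matches) // explicit Match type: MatchCollection may only enumerate as object
{
    // Append the portion of the original we skipped (the last argument is a count, not an end index)
    sb.Append(original, pos, match.Index - pos);

    // Append your replacement for match.Value to sb here: your own custom Replace, or even the result of another Regex

    pos = match.Index + match.Length; // continue just past this match
}
sb.Append(original, pos, original.Length - pos); // append the tail after the last match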

But, this only saves you some strings - the mod-Mono approach is the only one that really does it right.
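For reference, the .Matches() loop can be wrapped up as a self-contained extension method. This is only a sketch: the names StringBuilderRegex and AppendReplaced are hypothetical (nothing like them exists in System.Text.RegularExpressions), and it assumes the regex matches left to right, i.e. RegexOptions.RightToLeft is not set.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class StringBuilderRegex
{
    // Hypothetical helper: appends `input` to `target`, replacing each match of `regex`
    // with whatever `evaluator` returns for it. Assumes left-to-right matching.
    public static StringBuilder AppendReplaced(
        this StringBuilder target, Regex regex, string input, Func<Match, string> evaluator)
    {
        int pos = 0; // position in input just past the previous match
        foreach (Match match in regex.Matches(input))
        {
            target.Append(input, pos, match.Index - pos); // unmatched text before this match
            target.Append(evaluator(match));              // replacement for this match
            pos = match.Index + match.Length;
        }
        return target.Append(input, pos, input.Length - pos); // tail after the last match
    }
}
```

For example, new StringBuilder().AppendReplaced(new Regex(@"\d+"), "a1b22c", m => "<" + m.Value + ">") leaves "a<1>b<22>c" in the builder. The input is still a string, so this avoids the intermediate result strings but not the source copy.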

Paul Smith

I'm not sure if this helps your scenario or not, but I ran into some memory consumption ceilings with Regex and I needed a simple wildcard replacement extension method on a StringBuilder to push past it. If you need complex Regex matching and/or backreferences, this won't do, but if simple * or ? wildcard replacements (with literal "replace" text) would get the job done for you, then the workaround at the end of my question here should at least give you a boost:

Has anyone implemented a Regex and/or Xml parser around StringBuilders or Streams?

Here's an extension method you could use to accomplish what you want. It takes in a Dictionary where the key is the pattern you're looking for and the value is what you want to replace it with. You still create copies of the incoming string but you only have to deal with this once instead of creating copies for multiple calls to Regex.Replace.

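// Requires: using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Text.RegularExpressions;
public static StringBuilder BulkReplace(this StringBuilder source, IDictionary<string, string> replacementMap)
{
    if (source.Length == 0 || replacementMap.Count == 0)
    {
        return source;
    }
    // Matching ignores case, so the lookup must too; otherwise a match like "FOO" for the key "foo" throws KeyNotFoundException.
    // Note: this constructor throws ArgumentException if two keys differ only by case.
    var lookup = new Dictionary<string, string>(replacementMap, StringComparer.OrdinalIgnoreCase);
    // Escape each key so it is matched literally, then join the keys into one alternation pattern.
    string pattern = String.Join("|", replacementMap.Keys.Select(Regex.Escape));
    string replaced = Regex.Replace(source.ToString(), pattern, m => lookup[m.Value], RegexOptions.IgnoreCase);
    return source.Clear().Append(replaced);
}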