C# Tokenizer - keeping the separators [duplicate]

筅森魡賤 提交于 2019-11-30 07:33:34

You can use Regex.Split with zero-width assertions. For example, the following will split on +-*/:

Regex.Split(str, @"(?=[-+*/])|(?<=[-+*/])");

Effectively this says, "split at this point if it is followed by, or preceded by, any of -+*/. The matched string itself will be zero-length, so you won't lose any part of the input string.

This produces your output:

string s = "24+3";
string seps = @"(\t)|(\n)|(\+)|(-)|(\*)|(/)|(\()|(\))";
string[] tokens = System.Text.RegularExpressions.Regex.Split(s, seps);

foreach (string token in tokens)
    Console.WriteLine(token);

If you want a very flexible, powerful, reliable, and expandable solution, you can use the C# port of ANTLR. There is some initial overhead (link is setup information for VS2008) that would likely result in overkill for such a tiny project. Here's a calculator example with support for variables.

Probably overkill for your class, but if you're interested in learning about "real" solutions to this type of real-world problem, have a look-see. I even have a Visual Studio package for working with the grammars, or you can use ANTLRWorks separately.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!