I'm looking for a decent lexical scanner generator for C#/.NET -- something that supports Unicode character categories and generates somewhat readable & efficient code.
I agree with @David Robbins: ANTLR is probably your best bet. However, the code ANTLR generates needs a separate runtime library, because it relies on string parsing and other common routines that live in that library. Note also that ANTLR generates a lexer AND a parser.
On a side note: ANTLR is great. I wrote a 400+ line grammar that generates over 10k lines of C# code to efficiently parse a language. This included built-in error checking for every possible thing that could go wrong in the parsing of the language. Try to do that by hand, and you'll never keep up with the bugs.
GPLEX seems to support your requirements.
The two solutions that come to mind are ANTLR and Gold. ANTLR has a GUI-based grammar designer, and an excellent sample project in C# can be found here.
I just found this:
http://www.seclab.tuwien.ac.at/projects/cuplex/lex.htm
It says that it's configurable enough to support Unicode ;-).
Herber