Using Roslyn to parse/transform/generate code: am I aiming too high, or too low?

情到浓时终转凉″ submitted on 2019-11-30 00:09:33
svick

If your requirement is parsing C# source code, then I think Roslyn is a good choice. And if you're going to use it for this part, I think it also makes sense to use it for code generation.

Code generation using Roslyn can be quite verbose (especially when compared with CodeDom), but I think that's not going to be a big issue for you.

I think SyntaxRewriter is best suited for making localized changes to code. But since you're asking about parsing a whole class and generating types based on it, I think querying the syntax tree directly would work best.

For example, generating a read-only interface for all properties in a class could, in its simplest form, look something like this:

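using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// compilationUnit is the root of the parsed file, e.g.
// CSharpSyntaxTree.ParseText(code).GetCompilationUnitRoot()
var originalClass =
    compilationUnit.DescendantNodes().OfType<ClassDeclarationSyntax>().Single();
string originalClassName = originalClass.Identifier.ValueText;

// Only the class's own properties (DescendantNodes would also pick up nested types).
var properties =
    originalClass.Members.OfType<PropertyDeclarationSyntax>();

// Each property becomes a get-only member of the generated I<ClassName> interface.
var generatedInterface =
    SyntaxFactory.InterfaceDeclaration('I' + originalClassName)
          .AddMembers(
              properties.Select(
                  p =>
                  SyntaxFactory.PropertyDeclaration(p.Type, p.Identifier)
                        .AddAccessorListAccessors(
                            SyntaxFactory.AccessorDeclaration(SyntaxKind.GetAccessorDeclaration)
                                  .WithSemicolonToken(SyntaxFactory.Token(SyntaxKind.SemicolonToken))))
                        .ToArray());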
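By contrast, the localized-change case that the answer above assigns to SyntaxRewriter (the class is called CSharpSyntaxRewriter in current Roslyn) might look like the minimal sketch below. The task itself (making public instance methods of classes virtual) and the names MakeVirtualRewriter and Rewrite are hypothetical, chosen only to show that a rewriter replaces one kind of node and hands everything else back unchanged, trivia included.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical localized change: make every public, non-static, non-abstract
// class method virtual, leaving the rest of the file untouched.
public class MakeVirtualRewriter : CSharpSyntaxRewriter
{
    public override SyntaxNode VisitMethodDeclaration(MethodDeclarationSyntax node)
    {
        var modifiers = node.Modifiers;
        bool HasModifier(SyntaxKind kind) => modifiers.Any(m => m.IsKind(kind));

        bool isCandidate = node.Parent is ClassDeclarationSyntax
            && HasModifier(SyntaxKind.PublicKeyword)
            && !HasModifier(SyntaxKind.StaticKeyword)
            && !HasModifier(SyntaxKind.AbstractKeyword)
            && !HasModifier(SyntaxKind.VirtualKeyword)
            && !HasModifier(SyntaxKind.OverrideKeyword);

        if (!isCandidate)
        {
            return base.VisitMethodDeclaration(node);
        }

        // Only the modifier list changes; the body, comments and formatting are reused.
        var virtualKeyword = SyntaxFactory.Token(SyntaxKind.VirtualKeyword)
            .WithTrailingTrivia(SyntaxFactory.Space);
        return node.WithModifiers(modifiers.Add(virtualKeyword));
    }

    // Parses the source, applies the rewriter and returns the full text.
    public static string Rewrite(string code)
    {
        var root = CSharpSyntaxTree.ParseText(code).GetRoot();
        return new MakeVirtualRewriter().Visit(root).ToFullString();
    }
}
```

For instance, `MakeVirtualRewriter.Rewrite("class C { public void M() { } }")` should yield `class C { public virtual void M() { } }`, with the original spacing kept because only one token list is replaced.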

I think Roslyn is a great way to solve this problem. As for which parts of Roslyn to use: I would probably use a SyntaxWalker over the original class, and then use the fluent API to build up new SyntaxNodes for the new types you want to generate. You may be able to reuse some parts of the original tree in the generated code (for example, the parameter lists).

A quick example of what this might look like is:

using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

class Program
{
    static void Main(string[] args)
    {
        var syntaxTree = CSharpSyntaxTree.ParseText(@"
class C
{
    internal void M(string s, int i)
    {
    }
}");

        // Walk the original code; the walker accumulates the generated members.
        var walker = new Walker();
        walker.Visit(syntaxTree.GetRoot());

        Console.WriteLine(walker.Interface.NormalizeWhitespace().ToFullString());
    }
}

class Walker : CSharpSyntaxWalker
{
    private InterfaceDeclarationSyntax @interface = SyntaxFactory.InterfaceDeclaration("ISettings");

    private ClassDeclarationSyntax wrapperClass = SyntaxFactory.ClassDeclaration("SettingsWrapper")
        .AddBaseListTypes(SyntaxFactory.SimpleBaseType(SyntaxFactory.ParseTypeName("ISettings")));

    private ClassDeclarationSyntax @class = SyntaxFactory.ClassDeclaration("SettingsClass")
        .AddBaseListTypes(SyntaxFactory.SimpleBaseType(SyntaxFactory.ParseTypeName("ISettings")));

    public InterfaceDeclarationSyntax Interface => @interface;

    public override void VisitMethodDeclaration(MethodDeclarationSyntax node)
    {
        // Reuse the original return type, name and parameter list. TypeParameterList
        // is null for non-generic methods, so it is copied as-is, never dereferenced.
        // Interface members have no body, so they end with a semicolon instead.
        @interface = @interface.AddMembers(
            SyntaxFactory.MethodDeclaration(node.ReturnType, node.Identifier)
                .WithTypeParameterList(node.TypeParameterList)
                .WithParameterList(node.ParameterList)
                .WithSemicolonToken(SyntaxFactory.Token(SyntaxKind.SemicolonToken)));

        // More code to add members to the classes too.
        base.VisitMethodDeclaration(node);
    }
}

I am doing something very similar, and I am also using Roslyn to parse the existing C# code. However, I use T4 templates to generate the new code. T4 templates are designed for text generation and provide a nice abstraction: you write a template that looks like the code it produces, instead of assembling a complex object tree.

On the question of code generation, my advice is to use a combination of inline code snippets (parsed using CSharpSyntaxTree.ParseText) and manually generated SyntaxNodes, with a strong preference for the former. I have also used T4 in the past but am moving away from it because of its general lack of integration and capability.
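A minimal sketch of that mixed approach, assuming the member text comes from a C# 6 interpolated string and the enclosing class is assembled with SyntaxFactory. It uses SyntaxFactory.ParseMemberDeclaration, a sibling of CSharpSyntaxTree.ParseText that parses a single member; the SnippetCodeGen class and GenerateClass method are hypothetical.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class SnippetCodeGen
{
    // Hypothetical generator: a public class with one get-only property.
    public static string GenerateClass(string className, string propertyType, string propertyName)
    {
        // The member comes from a text "template"; {{ and }} are literal braces.
        // Inputs are not validated here: a malformed type or name simply produces
        // a tree that carries parse diagnostics.
        MemberDeclarationSyntax property = SyntaxFactory.ParseMemberDeclaration(
            $"public {propertyType} {propertyName} {{ get; }}");

        // The surrounding structure is built with SyntaxFactory.
        ClassDeclarationSyntax @class = SyntaxFactory.ClassDeclaration(className)
            .AddModifiers(SyntaxFactory.Token(SyntaxKind.PublicKeyword))
            .AddMembers(property);

        // Factory-built tokens carry no whitespace, so normalize before printing.
        return @class.NormalizeWhitespace().ToFullString();
    }
}
```

Calling `SnippetCodeGen.GenerateClass("Person", "string", "Name")` returns the source of a `public class Person` containing `public string Name { get; }`. The text snippet keeps the property readable, while the factory calls keep the class structure explicit.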

Advantages/disadvantages of each:

Roslyn ParseText

  • Generates arguably more readable code-generator code.
  • Allows 'text templating' approach e.g. using C# 6 string interpolation.
  • Less verbose.
  • Guarantees well-formed syntax trees (though parse errors still show up via GetDiagnostics()).
  • Can be more performant.
  • Easier to get started.
  • Text can become harder to read than SyntaxNodes if most of the generation logic is procedural.
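To make the point above about parsed trees concrete: CSharpSyntaxTree.ParseText never throws on bad input. It always returns a complete tree that round-trips to the original text, and reports problems as diagnostics. The sketch below shows one way a generator might check its own output; the ParseCheck class and HasSyntaxErrors method are hypothetical.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class ParseCheck
{
    // Parses generated text and reports whether the parser found syntax errors.
    // roundTripped is the full text of the parsed tree, which equals the input
    // because missing tokens have zero width.
    public static bool HasSyntaxErrors(string code, out string roundTripped)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(code);
        roundTripped = tree.GetRoot().ToFullString();
        return tree.GetDiagnostics().Any(d => d.Severity == DiagnosticSeverity.Error);
    }
}
```

For example, `"class C { int x }"` (missing semicolon) still parses into a tree, but `HasSyntaxErrors` returns true, so a generator can fail fast instead of writing broken files.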

Roslyn SyntaxNode building

  • Better for transforming existing syntax trees - no need to start from scratch.
    • But existing trivia can make this confusing/complex.
  • More verbose. Arguably harder to read and build.
    • Syntax trees are often more complex than you imagine.
  • The SyntaxFactory API provides guidance on valid syntax.
  • Roslyn Quoter helps you transform textual code to factory code.
  • Hand-built syntax trees are not necessarily valid.
  • Code is perhaps more robust once written.
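The trivia and validity caveats above are easy to hit. In the sketch below (the FactoryPitfalls class and its methods are hypothetical), SyntaxFactory inserts no whitespace between tokens, and it will happily build a method declaration that has neither a body nor a semicolon, which is not valid C#.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class FactoryPitfalls
{
    static TypeSyntax VoidType() =>
        SyntaxFactory.PredefinedType(SyntaxFactory.Token(SyntaxKind.VoidKeyword));

    // A method with an empty body. Factory tokens carry only zero-width elastic
    // trivia, so ToFullString() gives "voidM(){}" until NormalizeWhitespace().
    public static MethodDeclarationSyntax MethodWithBody() =>
        SyntaxFactory.MethodDeclaration(VoidType(), "M").WithBody(SyntaxFactory.Block());

    // Neither a body nor a semicolon: not valid C#, yet the factory builds it
    // without complaint. Only re-parsing (or compiling) reveals the problem.
    public static MethodDeclarationSyntax IncompleteMethod() =>
        SyntaxFactory.MethodDeclaration(VoidType(), "M");
}
```

Calling `NormalizeWhitespace()` fixes the spacing (`void M()`), but it does not repair the missing body or semicolon; that is the sense in which factory-built trees "are not necessarily valid".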

T4 templates

  • Good if majority of code to be generated is boiler plate.
  • No proper CI support.
  • No syntax highlighting or IntelliSense without third-party extensions.
  • One-to-one mapping between input and output files.
    • Not ideal for more complex generation, e.g. generating an entire class hierarchy from a single input.
  • Still probably want to use Roslyn to "reflect" on input types, otherwise you will get into trouble with System.Reflection and file locks etc.
  • Less discoverable API. T4 includes, parameters etc. can be confusing to learn.

Roslyn code-gen tips

  • If you are only parsing snippets of code e.g. method statements, then you will need to use CSharpParseOptions.Default.WithKind(SourceCodeKind.Script) to get the right syntax nodes back.
  • If you are parsing a whole block of code for a method body then you will want to parse it as a GlobalStatementSyntax and then access the Statement property as a BlockSyntax.
  • Use a helper method to parse single SyntaxNodes:

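        // Parses `code` and returns the first top-level node of type TSyntax
        // (First() throws if there is none). Use asScript: true for loose
        // statements, as described in the first tip.
        private static TSyntax ParseText<TSyntax>(string code, bool asScript = false)
            where TSyntax : SyntaxNode
        {
            var options = asScript
                ? CSharpParseOptions.Default.WithKind(SourceCodeKind.Script)
                : CSharpParseOptions.Default;

            var syntaxNodes =
                CSharpSyntaxTree.ParseText(code, options)
                    .GetRoot()
                    .ChildNodes();

            return syntaxNodes.OfType<TSyntax>().First();
        }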
    
  • When building SyntaxNodes by hand you will typically want to make a final call to SyntaxNode.NormalizeWhitespace(elasticTrivia: true) on the root node to make the code "round-trippable".
  • Typically you will want to use SyntaxNode.ToFullString() to get the actual code text including trivia.
  • Use SyntaxTree.WithFilePath() as a convenient place to store the eventual file name for when you come to write out the code.
  • If your goal is to output source files, the end game is to end up with valid CompilationUnitSyntax nodes.
  • Don't forget to pretty-print using Formatter.Format as one of the final steps.
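Several of these tips can be combined into one small pipeline. The sketch below parses loose statements as a script block, wraps them in a factory-built method and class, normalizes whitespace into a CompilationUnitSyntax, and stores the intended output path with WithFilePath. The CodeGenTips class and its method names are hypothetical, and the Formatter.Format step is left out because it needs the separate Workspaces package.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class CodeGenTips
{
    // Parses loose statements as a method body: wrap them in braces, parse in
    // script mode, and take the Statement of the resulting GlobalStatementSyntax.
    public static BlockSyntax ParseBody(string statements)
    {
        var options = CSharpParseOptions.Default.WithKind(SourceCodeKind.Script);
        SyntaxNode root = CSharpSyntaxTree.ParseText("{" + statements + "}", options).GetRoot();
        var global = root.ChildNodes().OfType<GlobalStatementSyntax>().First();
        return (BlockSyntax)global.Statement;
    }

    // Hypothetical final assembly: wrap the body in a public int method inside
    // a class, normalize whitespace, and record the output path on the tree.
    public static SyntaxTree BuildFile(string className, string methodName, BlockSyntax body, string path)
    {
        MethodDeclarationSyntax method = SyntaxFactory.MethodDeclaration(
                SyntaxFactory.PredefinedType(SyntaxFactory.Token(SyntaxKind.IntKeyword)), methodName)
            .AddModifiers(SyntaxFactory.Token(SyntaxKind.PublicKeyword))
            .WithBody(body);

        CompilationUnitSyntax unit = SyntaxFactory.CompilationUnit()
            .AddMembers(SyntaxFactory.ClassDeclaration(className).AddMembers(method))
            .NormalizeWhitespace();

        return SyntaxFactory.SyntaxTree(unit).WithFilePath(path);
    }
}
```

When writing the result out, `tree.FilePath` gives the destination and `tree.GetRoot().ToFullString()` gives the text, including trivia.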