How to use Roslyn C# scripting in batch processing with several scripts?

Submitted by 天大地大妈咪最大 on 2020-01-13 16:29:13

Question


I am writing a multi-threaded solution that will be used to transfer data from different sources into a central database. The solution has two parts:

  1. A single-threaded Import engine
  2. A multi-threaded client that invokes the Import engine in threads.

To minimize custom development I am using Roslyn scripting. This feature is enabled with the NuGet package manager in the Import engine project. Every import is defined as a transformation of an input table – which has a collection of input fields – into a destination table – again with a collection of destination fields.

The scripting engine is used here to allow custom transformations between input and output. For every input/output pair there is a text field holding a custom script. Here is the simplified code used for script initialization:

//Instance of the class passed to the script engine as globals
_ScriptHost = new ScriptHost_Import();

if (Script != "") //The script has been fetched from the DB as text
{
  try
  {
    //Create the script object …
    ScriptObject = CSharpScript.Create<string>(Script, globalsType: typeof(ScriptHost_Import));
    //… and compile it up front to save time, since it might be invoked multiple times.
    ScriptObject.Compile();
    IsScriptCompiled = true;
  }
  catch
  {
    IsScriptCompiled = false;
  }
}

Later we invoke this script with:

async Task<string> RunScript()
{
    return (await ScriptObject.RunAsync(_ScriptHost)).ReturnValue.ToString();
}

So after import definition initialization, where we might have any number of input/output pair descriptions along with script objects, the memory footprint increases by approximately 50 MB per pair where scripting is defined. A similar usage pattern applies to the validation of destination rows before storing them in the DB (every field might have several scripts that check the validity of the data).

All in all, the typical memory footprint with modest transformation/validation scripting is 200 MB per thread. If we need to invoke several threads, memory usage becomes very high, and 99% of it is used for scripting. If the Import engine is enclosed in a WCF-based middle layer (which I did), we quickly stumble upon an "Insufficient memory" problem.

The obvious solution would be to have one scripting instance that somehow dispatches code execution to a specific function inside the script, depending on the need (input/output transformation, validation, or something else). That is, instead of script text for every field we would have a SCRIPT_ID passed as a global parameter to the script engine. Somewhere in the script we would switch to the specific portion of code that executes and returns the appropriate value.

The benefit of such a solution should be considerably better memory usage. The drawback is that script maintenance is removed from the specific point where the script is used.
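For illustration, a single dispatching script might look like the sketch below. ScriptId and the Input collection are hypothetical globals assumed to be exposed by ScriptHost_Import; they are not part of the code above:

```csharp
// One script per import definition, compiled once.
// ScriptId and Input are assumed globals on ScriptHost_Import.
switch (ScriptId)
{
    case 1: // transformation: build a full name from two input fields
        return Input["FirstName"] + " " + Input["LastName"];
    case 2: // validation: reject empty amounts
        return string.IsNullOrEmpty(Input["Amount"]) ? "INVALID" : Input["Amount"];
    default:
        return "";
}
```

One compiled Script<string> then serves every field; the host just sets ScriptId on the globals object before each RunAsync call.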

Before implementing this change, I would like to hear opinions about this solution and suggestions for different approaches.


Answer 1:


As it seems, using scripting for this mission might be wasteful overkill – you use many application layers and the memory fills up.

Other solutions:

  • How do you interface with the DB? You can manipulate the query itself according to your needs instead of writing a whole script for that.
  • How about using generics, with enough type parameters to fit your needs:

    public class ImportEngine<T1, T2, T3, T4, T5>

  • Using tuples (which is pretty much like using generics)

But if you still think scripts are the right tool for you, I found that the memory usage of scripts can be lowered by running the script's work inside your application rather than with RunAsync. You can do this by getting the logic back from RunAsync and reusing it, instead of doing the work inside the heavy, memory-hungry RunAsync. Here is an example:

Instead of simply (the script string):

DoSomeWork();

You can do this (IHaveWork is an interface defined in your app, with only one method, Work):

public class ScriptWork : IHaveWork
{
    public void Work()
    {
        DoSomeWork();
    }
}
return new ScriptWork();

This way you call the heavy RunAsync only once, briefly, and it returns a worker that you can reuse inside your application (you can of course extend this by adding parameters to the Work method, inheriting logic from your application, and so on...).

This pattern also breaks the isolation between your app and the script, so you can easily pass data to and from the script.
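A host-side sketch of this pattern, assuming the IHaveWork interface above plus a scriptText variable holding the script source (both are illustrative assumptions), and relying on the Microsoft.CodeAnalysis.CSharp.Scripting package:

```csharp
// Interface shared between the host application and the script.
public interface IHaveWork
{
    void Work();
}

// Compile and run the script ONCE; its return value is a reusable worker.
var options = ScriptOptions.Default
    .WithReferences(typeof(IHaveWork).Assembly); // let the script see the interface
IHaveWork worker = await CSharpScript.EvaluateAsync<IHaveWork>(scriptText, options);

// Reuse the worker cheaply, e.g. once per imported row, with no further compilation.
foreach (var row in rows)
    worker.Work();
```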

EDIT

Some quick benchmark:

This code:

    static void Main(string[] args)
    {
        Console.WriteLine("Compiling");
        string code = "System.Threading.Thread.SpinWait(100000000);  System.Console.WriteLine(\" Script end\");";
        List<Script<object>> scripts = Enumerable.Range(0, 50).Select(num =>
             CSharpScript.Create(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly))).ToList();

        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced); // for fair-play

        for (int i = 0; i < 10; i++)
            Task.WaitAll(scripts.Select(script => script.RunAsync()).ToArray());
    }

This consumes about ~600 MB in my environment (I referenced System.Windows.Forms in the ScriptOptions just to inflate the size of the scripts). It reuses the Script<object> – it does not consume more memory on the second call to RunAsync.

But we can do better:

    static void Main(string[] args)
    {
        Console.WriteLine("Compiling");
        string code = "return () => { System.Threading.Thread.SpinWait(100000000);  System.Console.WriteLine(\" Script end\"); };";

        List<Action> scripts = Enumerable.Range(0, 50).Select(async num =>
            await CSharpScript.EvaluateAsync<Action>(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly))).Select(t => t.Result).ToList();

        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);

        for (int i = 0; i < 10; i++)
            Task.WaitAll(scripts.Select(script => Task.Run(script)).ToArray());
    }

In this version I simplified the proposed solution a bit by returning an Action object, but I think the performance impact is small (in real implementations I really think you should use your own interface to keep it flexible).

While the scripts run you can see a steep rise in memory to ~240 MB, but after calling the garbage collector (for demonstration purposes; I did the same in the previous code) memory usage drops back to ~30 MB. It is also faster.




Answer 2:


I am not sure whether this existed when the question was created, but there is something very similar and, let's say, official: a way to run scripts multiple times without increasing the program's memory. You need the CreateDelegate method, which does exactly what is expected.

I will post it here just for convenience:

var script = CSharpScript.Create<int>("X*Y", globalsType: typeof(Globals));
ScriptRunner<int> runner = script.CreateDelegate();

for (int i = 0; i < 10; i++)
{
  Console.WriteLine(await runner(new Globals { X = i, Y = i }));
}

It takes some memory initially, but keep the runner in some global list and you can invoke it quickly later.
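Applied to the question's scenario, the compiled delegates could be cached per SCRIPT_ID and shared across the import threads. The dictionary, method name, and scriptText parameter below are illustrative assumptions, not part of the original code:

```csharp
// Cache one compiled delegate per script ID; compilation happens only once.
private readonly ConcurrentDictionary<int, ScriptRunner<string>> _runners = new();

ScriptRunner<string> GetRunner(int scriptId, string scriptText)
{
    return _runners.GetOrAdd(scriptId, _ =>
        CSharpScript.Create<string>(scriptText, globalsType: typeof(ScriptHost_Import))
                    .CreateDelegate());
}

// Per field/row: a cheap invocation, safe to call from multiple threads.
string result = await GetRunner(scriptId, scriptText)(_ScriptHost);
```

ConcurrentDictionary.GetOrAdd makes the lazy compilation thread-safe, so several import threads can share one runner per script instead of each holding its own compiled copy.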



Source: https://stackoverflow.com/questions/43432940/how-to-use-roslyn-c-sharp-scripting-in-batch-processing-with-several-scripts
