How to use Roslyn C# scripting in batch processing with several scripts?

Submitted by 天大地大妈咪最大 on 2020-01-13 16:29:13

Question


I am writing a multi-threaded solution that will be used to transfer data from different sources into a central database. The solution has two parts:

  1. A single-threaded Import engine
  2. A multi-threaded client that invokes the Import engine in threads.

To minimize custom development I am using Roslyn scripting. This feature is enabled with the NuGet package manager in the Import engine project. Every import is defined as a transformation of an input table – which has a collection of input fields – into a destination table – again with a collection of destination fields.

The scripting engine is used here to allow custom transformations between input and output. For every input/output pair there is a text field holding a custom script. Here is the simplified code used for script initialization:

//Instance of the class passed to the script engine as globals
_ScriptHost = new ScriptHost_Import();

if (Script != "") //The script has been fetched from the DB as text
{
  try
  {
    //Create the script object …
    ScriptObject = CSharpScript.Create<string>(Script, globalsType: typeof(ScriptHost_Import));
    //… and compile it up front to save time, since it might be invoked multiple times.
    ScriptObject.Compile();
    IsScriptCompiled = true;
  }
  catch
  {
    IsScriptCompiled = false;
  }
}

Later we invoke this script with:

async Task<string> RunScript()
{
    return (await ScriptObject.RunAsync(_ScriptHost)).ReturnValue.ToString();
}

So after import definition initialization, where we might have any number of input/output pair descriptions along with script objects, the memory footprint increases by approximately 50 MB per pair where scripting is defined. A similar usage pattern applies to the validation of destination rows before storing them in the DB (every field might have several scripts that check the validity of the data).

All in all, the typical memory footprint with modest transformation/validation scripting is 200 MB per thread. If we need to invoke several threads, memory usage becomes very high, and 99% of it is used for scripting. If the Import engine is enclosed in a WCF-based middle layer (which I did), we quickly stumble upon an "Insufficient memory" problem.

The obvious solution would be to have one scripting instance that somehow dispatches code execution to a specific function inside the script, depending on the need (input/output transformation, validation, or something else). That is, instead of script text for every field we would have a SCRIPT_ID passed as a global parameter to the script engine. Somewhere in the script we would switch to the specific portion of code that executes and returns the appropriate value.

The benefit of such a solution should be considerably better memory usage. The drawback is that script maintenance is removed from the specific point where the script is used.
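For illustration, a single dispatching script might look like the sketch below. ScriptId and the Input collection are hypothetical globals assumed to be exposed by ScriptHost_Import; they are not part of the code above:

```csharp
// One script per import definition, compiled once.
// ScriptId and Input are assumed globals on ScriptHost_Import.
switch (ScriptId)
{
    case 1: // transformation: build a full name from two input fields
        return Input["FirstName"] + " " + Input["LastName"];
    case 2: // validation: reject empty amounts
        return string.IsNullOrEmpty(Input["Amount"]) ? "INVALID" : Input["Amount"];
    default:
        return "";
}
```

One compiled Script<string> then serves every field; the host just sets ScriptId on the globals object before each RunAsync call.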

Before implementing this change, I would like to hear opinions about this solution and suggestions for different approaches.


Answer 1:


As it seems, using scripting for this mission might be wasteful overkill – you use many application layers and the memory fills up.

Other solutions:

  • How do you interface with the DB? You can manipulate the query itself according to your needs instead of writing a whole script for that.
  • How about using generics, with enough type parameters to fit your needs:

    public class ImportEngine<T1, T2, T3, T4, T5>

  • Using tuples (which is pretty much like using generics)

But if you still think scripts are the right tool for you, I found that the memory usage of scripts can be lowered by running the script's work inside your application rather than with RunAsync. You can do this by getting the logic back from RunAsync and reusing it, instead of doing the work inside the heavy, memory-hungry RunAsync. Here is an example:

Instead of simply (the script string):

DoSomeWork();

You can do this (IHaveWork is an interface defined in your app, with only one method, Work):

public class ScriptWork : IHaveWork
{
    public void Work()
    {
        DoSomeWork();
    }
}
return new ScriptWork();

This way you call the heavy RunAsync only once, briefly, and it returns a worker that you can reuse inside your application (you can of course extend this by adding parameters to the Work method, inheriting logic from your application, and so on...).

This pattern also breaks the isolation between your app and the script, so you can easily pass data to and from the script.
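A host-side sketch of this pattern, assuming the IHaveWork interface above plus a scriptText variable holding the script source (both are illustrative assumptions), and relying on the Microsoft.CodeAnalysis.CSharp.Scripting package:

```csharp
// Interface shared between the host application and the script.
public interface IHaveWork
{
    void Work();
}

// Compile and run the script ONCE; its return value is a reusable worker.
var options = ScriptOptions.Default
    .WithReferences(typeof(IHaveWork).Assembly); // let the script see the interface
IHaveWork worker = await CSharpScript.EvaluateAsync<IHaveWork>(scriptText, options);

// Reuse the worker cheaply, e.g. once per imported row, with no further compilation.
foreach (var row in rows)
    worker.Work();
```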

EDIT

Some quick benchmark:

This code:

    static void Main(string[] args)
    {
        Console.WriteLine("Compiling");
        string code = "System.Threading.Thread.SpinWait(100000000);  System.Console.WriteLine(\" Script end\");";
        List<Script<object>> scripts = Enumerable.Range(0, 50).Select(num =>
             CSharpScript.Create(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly))).ToList();

        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced); // for fair-play

        for (int i = 0; i < 10; i++)
            Task.WaitAll(scripts.Select(script => script.RunAsync()).ToArray());
    }

This consumes about ~600 MB in my environment (I referenced System.Windows.Forms in the ScriptOptions just to inflate the size of the scripts). It reuses the Script<object> – it does not consume more memory on the second call to RunAsync.

But we can do better:

    static void Main(string[] args)
    {
        Console.WriteLine("Compiling");
        string code = "return () => { System.Threading.Thread.SpinWait(100000000);  System.Console.WriteLine(\" Script end\"); };";

        List<Action> scripts = Enumerable.Range(0, 50).Select(async num =>
            await CSharpScript.EvaluateAsync<Action>(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly))).Select(t => t.Result).ToList();

        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);

        for (int i = 0; i < 10; i++)
            Task.WaitAll(scripts.Select(script => Task.Run(script)).ToArray());
    }

In this version I simplified the proposed solution a bit by returning an Action object, but I think the performance impact is small (in real implementations I really think you should use your own interface to keep it flexible).

While the scripts run you can see a steep rise in memory to ~240 MB, but after calling the garbage collector (for demonstration purposes; I did the same in the previous code) memory usage drops back to ~30 MB. It is also faster.




Answer 2:


I am not sure whether this existed when the question was created, but there is something very similar and, let's say, official: a way to run scripts multiple times without increasing the program's memory. You need the CreateDelegate method, which does exactly what is expected.

I will post it here just for convenience:

var script = CSharpScript.Create<int>("X*Y", globalsType: typeof(Globals));
ScriptRunner<int> runner = script.CreateDelegate();

for (int i = 0; i < 10; i++)
{
  Console.WriteLine(await runner(new Globals { X = i, Y = i }));
}

It takes some memory initially, but keep the runner in some global list and you can invoke it quickly later.
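Applied to the question's scenario, the compiled delegates could be cached per SCRIPT_ID and shared across the import threads. The dictionary, method name, and scriptText parameter below are illustrative assumptions, not part of the original code:

```csharp
// Cache one compiled delegate per script ID; compilation happens only once.
private readonly ConcurrentDictionary<int, ScriptRunner<string>> _runners = new();

ScriptRunner<string> GetRunner(int scriptId, string scriptText)
{
    return _runners.GetOrAdd(scriptId, _ =>
        CSharpScript.Create<string>(scriptText, globalsType: typeof(ScriptHost_Import))
                    .CreateDelegate());
}

// Per field/row: a cheap invocation, safe to call from multiple threads.
string result = await GetRunner(scriptId, scriptText)(_ScriptHost);
```

ConcurrentDictionary.GetOrAdd makes the lazy compilation thread-safe, so several import threads can share one runner per script instead of each holding its own compiled copy.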



Source: https://stackoverflow.com/questions/43432940/how-to-use-roslyn-c-sharp-scripting-in-batch-processing-with-several-scripts
