Options for deploying R models in production

星月不相逢 2020-12-12 12:15

There don't seem to be many options for deploying predictive models in production, which is surprising given the explosion in Big Data.

I understand that the

6 answers
  •  春和景丽
    2020-12-12 12:39

    The answer really depends on what your production environment is.

    If your "big data" live on Hadoop, you can try Pattern, a relatively new open-source PMML "scoring engine".
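    To show what a PMML "scoring engine" does, here is a minimal sketch in Python (this is not Pattern's API): it parses a PMML RegressionModel and scores one record. The PMML document is a trimmed, hypothetical example (a real file also carries Header, DataDictionary and MiningSchema elements), and only NumericPredictor terms of a single RegressionTable are handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed PMML for the linear model y = 1.5 + 2.0*x1 - 0.5*x2.
PMML_XML = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def load_regression(pmml_text):
    """Parse the first RegressionTable into (intercept, [(name, coef, exponent), ...])."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")
    intercept = float(table.get("intercept", "0"))
    terms = [
        (p.get("name"), float(p.get("coefficient")), int(p.get("exponent", "1")))
        for p in table.findall("pmml:NumericPredictor", NS)
    ]
    return intercept, terms


def score(model, row):
    """Score one record (a dict of field name -> value) against the parsed model."""
    intercept, terms = model
    return intercept + sum(coef * row[name] ** exp for name, coef, exp in terms)


if __name__ == "__main__":
    model = load_regression(PMML_XML)
    print(score(model, {"x1": 3.0, "x2": 4.0}))  # 1.5 + 6.0 - 2.0 = 5.5
```

    The point of the design is that the scoring side only needs the model's parameters in a portable format, not the R runtime that fitted it.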

    Otherwise you have no choice (short of writing custom, model-specific code) but to run R on your server. You would use save() to store your fitted models in .RData files, then load() them on the server and call the corresponding predict() method. (That is bound to be slow, but you can always try throwing more hardware at it.)
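    The save/load/predict workflow can be sketched as follows. This is a Python analogue rather than R: pickle stands in for save()/load() and the .RData file, and LinearModel is a hypothetical stand-in for a fitted R model object.

```python
import os
import pickle
import tempfile
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Hypothetical stand-in for a fitted model object."""
    intercept: float
    slope: float

    def predict(self, xs):
        return [self.intercept + self.slope * x for x in xs]


def save_model(model, path):
    # Plays the role of save(fit, file = "model.RData") on the training machine.
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    # Plays the role of load("model.RData") on the production server.
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "model.pkl")
    save_model(LinearModel(intercept=1.0, slope=2.0), path)
    served = load_model(path)  # load once at server start-up, not per request
    print(served.predict([0.0, 1.5]))  # [1.0, 4.0]
```

    Loading the model once when the server process starts, rather than on every request, avoids paying the deserialization cost for each prediction.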

    How you do that depends on your platform. Usually there is a way to add "custom" functions written in R; the term to look for is UDF (user-defined function). In Hadoop you can add such functions to Pig (e.g. https://github.com/cd-wood/pigaddons), or you can use RHadoop to write simple map-reduce code that loads the model and calls predict in R. If your data are in Hive, you can use Hive TRANSFORM to call an external R script.
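    Hive TRANSFORM streams each row to the external script's stdin as tab-separated text and reads tab-separated rows back from its stdout. A minimal sketch of such a script, written in Python instead of R for illustration; the coefficients, the score.py file name and the features table are hypothetical.

```python
import sys

# Hypothetical Hive usage (file, table and column names are illustrative):
#   ADD FILE score.py;
#   SELECT TRANSFORM (id, x1, x2) USING 'python score.py' AS (id, prediction)
#   FROM features;

# Hypothetical coefficients; a real job would load the fitted model instead.
INTERCEPT, COEF_X1, COEF_X2 = 1.5, 2.0, -0.5


def score_line(line):
    """Turn one input row 'id<TAB>x1<TAB>x2' into 'id<TAB>prediction'."""
    row_id, x1, x2 = line.rstrip("\n").split("\t")
    if r"\N" in (x1, x2):  # Hive writes NULL as \N by default
        return f"{row_id}\t\\N"
    pred = INTERCEPT + COEF_X1 * float(x1) + COEF_X2 * float(x2)
    return f"{row_id}\t{pred}"


def transform(lines):
    """Yield one output row per non-blank input row."""
    for line in lines:
        if line.strip():
            yield score_line(line)


if __name__ == "__main__":
    for out in transform(sys.stdin):
        sys.stdout.write(out + "\n")
```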

    There are also vendor-specific ways to add functions written in R to various SQL databases; again, look for UDF in the documentation. For instance, PostgreSQL has PL/R.
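    The UDF idea can be sketched with SQLite through Python's sqlite3 module, used here only as a stand-in for PL/R (with PL/R the function body would be R code declared with LANGUAGE plr). The coefficients and the features table are hypothetical.

```python
import sqlite3

# Hypothetical fitted coefficients standing in for a model loaded from R.
INTERCEPT, SLOPE = 1.0, 2.0


def predict(x):
    """Score one value; SQL NULL arrives as None and is passed through."""
    if x is None:
        return None
    return INTERCEPT + SLOPE * x


def scored_rows(conn):
    # Register the Python function as a one-argument scalar UDF named predict.
    conn.create_function("predict", 1, predict)
    return conn.execute("SELECT id, predict(x) FROM features ORDER BY id").fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE features (id INTEGER PRIMARY KEY, x REAL)")
    conn.executemany(
        "INSERT INTO features (id, x) VALUES (?, ?)", [(1, 0.5), (2, None), (3, 3.0)]
    )
    print(scored_rows(conn))  # [(1, 2.0), (2, None), (3, 7.0)]
```

    Scoring inside the database this way keeps the data where it is; only the function travels.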
