Options for deploying R models in production

星月不相逢 2020-12-12 12:15

There don't seem to be many options for deploying predictive models in production, which is surprising given the explosion in Big Data.

I understand that the

6 answers

春和景丽 (original poster)

2020-12-12 12:39

The answer really depends on what your production environment is.

If your "big data" are on Hadoop, you can try this relatively new open source PMML "scoring engine" called Pattern.

Otherwise you have no choice (short of writing custom model-specific code) but to run R on your server. You would use save to store your fitted models in .RData files, then load them on the server and call the corresponding predict method. (That is bound to be slow, but you can always throw more hardware at it.)
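The save / load / predict round trip described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the lm model, the built-in mtcars data, the file location and the rows being scored are all made up for the example and are not part of the original answer.

```r
# Training side: fit a model and persist it to disk.
# lm() on the built-in mtcars dataset stands in for whatever model you really use.
fit <- lm(mpg ~ wt + hp, data = mtcars)
model_path <- file.path(tempdir(), "model.RData")  # hypothetical location
save(fit, file = model_path)

# Server side: load() restores the object under its original name ("fit").
rm(fit)
load(model_path)

# Made-up rows to score; column names must match the predictors used in training.
new_data <- data.frame(wt = c(2.5, 3.2), hp = c(110, 150))
scores <- predict(fit, newdata = new_data)
print(scores)
```

Note that predict dispatches on the class of the fitted object, so the server needs the same modelling package installed as the training machine. saveRDS and readRDS are an alternative to save and load if you prefer to choose the object name at read time.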

How you do that really depends on your platform. Usually there is a way to add "custom" functions written in R. The term is UDF (user-defined function). In Hadoop you can add such functions to Pig (e.g. https://github.com/cd-wood/pigaddons) or you can use RHadoop to write simple map-reduce code that would load the model and call predict in R. If your data are in Hive, you can use Hive TRANSFORM to call an external R script.
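For the Hive TRANSFORM route, the external script receives rows as tab-separated lines on standard input and is expected to write tab-separated lines to standard output. A minimal sketch of such a script might look like this; the function name score_stream, the column names (id, wt, hp), the script name score.R and the model are all hypothetical, and the demo at the bottom feeds two made-up rows through a text connection instead of a real Hive job.

```r
# Scores tab-separated rows and writes "<id>\t<prediction>" lines.
# `input` and `output` may be connections or file paths.
score_stream <- function(input, output, model, col_names) {
  rows <- read.delim(input, header = FALSE, col.names = col_names,
                     stringsAsFactors = FALSE)
  rows$score <- predict(model, newdata = rows)
  write.table(rows[, c(col_names[1], "score")], output, sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
}

# In a real script invoked by Hive, the call would be along these lines
# (file("stdin") is the process-level standard input under Rscript):
#   load("model.RData")   # restores `fit`, saved earlier with save()
#   score_stream(file("stdin"), stdout(), fit, c("id", "wt", "hp"))
# with a Hive query such as (table and column names assumed):
#   ADD FILE score.R;
#   SELECT TRANSFORM (id, wt, hp) USING 'Rscript score.R' AS (id, score) FROM cars;

# Local demo without Hive: an illustrative model and two made-up input rows.
fit <- lm(mpg ~ wt + hp, data = mtcars)
demo_in <- textConnection(c("a\t2.5\t110", "b\t3.2\t150"))
out_file <- tempfile()
score_stream(demo_in, out_file, fit, c("id", "wt", "hp"))
close(demo_in)
out_lines <- readLines(out_file)
print(out_lines)
```

The sketch reads the whole input at once for simplicity; it does not handle empty input or Hive's NULL encoding, and for large partitions you would probably want to read and score in chunks.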

There are also vendor-specific ways to add functions written in R to various SQL databases. Again, look for UDF in the documentation. For instance, PostgreSQL has PL/R.
