Best Practise Coding for R script running in production [closed]

泄露秘密 submitted on 2019-12-25 18:52:57

Question


We have a Linux production server and are writing a number of scripts to run on it. The scripts collect data, which will then be put into a Spark data lake.

My background is SQL Server / Fortran, where there are very specific best practices that should be followed:

  • Production environments should be stable and version-controlled, not only the code but also the installed applications, operating system, etc.
  • Changes to code/applications/operating system should be done either in a separate environment or in a way that is controlled and can be backed out.
  • If a second environment exists, system changes can be tested there by running it in parallel with production.
  • Developers are (largely) restricted from changing the production environment.

In reviewing the R code, I have questions about a number of things:

  • library(), install.packages(): I would like to exclude the possibility of newer versions of packages being installed each time the scripts are run.
  • What is the best way to call R packages that are scheduled through a cron job? There are a number of choices here.
  • When using RSelenium, what is the most efficient way to use a GUI web browser or a virtualised web browser?

Answer 1:


In any case, I would scratch any notion of updating the packages automatically. Expect the maintainers of the packages you rely on to introduce backward-incompatible changes. If you auto-update, your code will stop working out of the blue. Do not assume anything is sacred.
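
One way to act on this is to have each script verify its dependencies instead of installing them. The following is a minimal sketch in base R, not an established recipe: `check_pinned_versions` and the `pins` vector are hypothetical names, and base packages are pinned here only so the example runs on any R installation. In a real script the pins would list the third-party packages you depend on.

```r
# Compare installed package versions against a pinned list and stop on any
# mismatch. Nothing is installed or upgraded at run time.
# `pins` is a named character vector: package name -> exact expected version.
check_pinned_versions <- function(pins) {
  problems <- character(0)
  for (pkg in names(pins)) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      problems <- c(problems, sprintf("%s: not installed", pkg))
      next
    }
    installed <- utils::packageVersion(pkg)
    if (installed != package_version(pins[[pkg]])) {
      problems <- c(problems,
                    sprintf("%s: expected %s, found %s",
                            pkg, pins[[pkg]], as.character(installed)))
    }
  }
  if (length(problems) > 0) {
    stop("Package check failed:\n", paste(problems, collapse = "\n"),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Hypothetical pins: base packages carry the version of the running R itself.
pins <- c(stats = as.character(getRversion()),
          utils = as.character(getRversion()))
check_pinned_versions(pins)  # stops with an error if anything has drifted
library(stats)               # load only after the check has passed
```

Because the script fails loudly instead of calling install.packages(), a version change on the server becomes a visible deployment error and not a silent change in behaviour.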

Past that, you need to ask yourself how hands-on your deployment is. If you're OK with manually setting up each deployment, you can probably get away with using the packrat package to pull down and keep the sources of the exact versions you are using. That way, reproducing your deployment is painful but at least possible. If you want fully automated, reproducible deployments, I suggest you start building Docker images with your packages and tagging them with dates or versions.
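
The packrat route might look roughly like the sketch below. It assumes packrat is already installed on both machines. `pin_project`, `update_pins`, `deploy_project` and the `project_dir` argument are hypothetical names chosen for illustration; only the `packrat::` calls belong to the package.

```r
# Development machine, run once: create a private package library for the
# project and record the exact versions in packrat/packrat.lock.
# enter = FALSE leaves the current R session outside packrat mode.
pin_project <- function(project_dir) {
  packrat::init(project = project_dir, enter = FALSE)
}

# Development machine, after deliberately changing a package version:
# update the lockfile (and stored sources) to match the private library.
update_pins <- function(project_dir) {
  packrat::snapshot(project = project_dir, prompt = FALSE)
}

# Production server, at deployment time only (not on every script run):
# rebuild the private library from the lockfile.
deploy_project <- function(project_dir) {
  packrat::restore(project = project_dir, prompt = FALSE, restart = FALSE)
}
```

Keeping restore as an explicit deployment step means the scheduled scripts themselves never touch the package library.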

If you make no provision for reproducing your environment, you are asking for trouble. It may seem OK at first to simply fix incompatibilities as updates introduce them, and that does indeed seem to be the official workflow from the powers that be, however misguided that is. But as your codebase grows, fixing them will eventually be all you end up doing.



Source: https://stackoverflow.com/questions/36919953/best-practise-coding-for-r-script-running-in-production
