Using AWS for parallel processing with R

长情又很酷 2020-12-23 18:17

I want to take a shot at the Kaggle Dunnhumby challenge by building a model for each customer. I want to split the data into ten groups and use Amazon Web Services (AWS) to fit the models for the ten groups in parallel.

2 Answers
  •  难免孤独
    2020-12-23 18:50

    You can build everything up manually on AWS. You have to set up your own Amazon compute cluster with several instances. There is a nice tutorial video on the Amazon website: http://www.youtube.com/watch?v=YfCgK1bmCjw

    But it will take you several hours to get everything running:

    • starting 11 EC2 instances (one instance per group plus one head instance)
    • installing R and MPI on all machines (check for preinstalled images)
    • configuring MPI correctly (and probably adding a security layer)
    • ideally, a file server mounted on all nodes (to share the data)
    • with this infrastructure, the best solution is to use the snow or foreach package (with Rmpi); see the sketch after this list
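
    A minimal sketch of that last step, assuming MPI is already configured on all nodes and the customer data is available as a data frame called customer_data with a group_id column (both names, like the visits and spend columns, are made up for illustration). It uses the snow package on top of Rmpi to fit one model per group:

        # requires the snow and Rmpi packages on every node
        library(snow)

        cl <- makeCluster(10, type = "MPI")   # one worker per customer group

        # placeholder model; replace with the real per-customer model
        fit_group <- function(group_data) {
          glm(visits ~ spend, data = group_data)   # visits/spend are made-up columns
        }

        groups <- split(customer_data, customer_data$group_id)  # ten data frames
        models <- clusterApply(cl, groups, fit_group)           # one fit per worker

        stopCluster(cl)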

    The segue package is nice, but you will definitely run into data-communication problems! A sketch of its interface follows.
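
    For comparison, this is roughly how segue's apply-style interface is used (the setCredentials/createCluster/emrlapply calls below reflect my understanding of the package and are not taken from the answer). It spins up an Elastic MapReduce cluster behind the scenes, which is where the data-transfer overhead comes from:

        # requires the segue package and AWS credentials
        library(segue)
        setCredentials("YOUR_AWS_ACCESS_KEY", "YOUR_AWS_SECRET_KEY")

        emr_cluster <- createCluster(numInstances = 5)

        # emrlapply works like lapply, but each element is processed on EMR;
        # groups is the same list of per-group data frames as in the sketch above
        models <- emrlapply(emr_cluster, groups, function(group_data) {
          glm(visits ~ spend, data = group_data)   # made-up columns, as above
        })

        stopCluster(emr_cluster)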

    The simplest solution is cloudnumbers.com (http://www.cloudnumbers.com). This platform gives you easy access to compute clusters in the cloud. You can test a small cluster in the cloud for free for 5 hours! Check the slides from the useR! 2011 conference: http://cloudnumbers.com/hpc-news-from-the-user2011-conference
