Migrate from running ML training and testing locally to Google Cloud

Submitted by 非 Y 不嫁゛ on 2020-01-24 20:55:50

Question


I currently have a simple Machine Learning infrastructure running locally and I want to migrate this all onto Google Cloud. I simply fetch the data I need from a database, build my model and then test the model on test data. This is all done in PyCharm locally.

I want to move this whole workflow to Google Cloud while keeping the flexibility to make changes locally and have them apply when the code runs in the cloud as well. There are many Google Cloud resources relating to this, so I am looking for the best practices people follow for such a setup.

Thanks and please let me know if there are any clarifications needed.


Answer 1:


I highly suggest taking a look at this machine learning workflow in the cloud, which consists of:

  • Data ingestion and collection
  • Storing the data
  • Processing data
  • ML training
  • ML deployment

Data Ingestion and Collection

There are multiple resources you can use to ingest data with Google Cloud Platform. The simplest options I can recommend are Google Compute Engine or an App Engine app (for example, for a form that a user fills in).

If you would like to ingest data in real time, you can also use Cloud Pub/Sub.
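As a rough illustration of the real-time route, here is a minimal sketch of publishing one record to Pub/Sub. It assumes the `google-cloud-pubsub` package is installed and credentials are configured; the project ID, topic ID and record fields are hypothetical placeholders, not something from your setup.

```python
import json


def encode_record(record: dict) -> bytes:
    """Serialize a record to UTF-8 JSON bytes, the payload type Pub/Sub expects."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def publish_record(project_id: str, topic_id: str, record: dict) -> str:
    """Publish one record and return the message ID assigned by Pub/Sub.

    The import is kept inside the function so the rest of the module
    still works on a machine without the client library installed.
    """
    from google.cloud import pubsub_v1  # assumption: google-cloud-pubsub is installed

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    future = publisher.publish(topic_path, data=encode_record(record))
    return future.result()  # blocks until the publish is acknowledged
```

You would call it as something like `publish_record("my-project", "training-events", {"feature": 1.5, "label": 1})`, where both names are made up for the example.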

Storing the data

As you mentioned, you are retrieving all the information from a database. If you are used to working with SQL, I highly suggest Cloud SQL. It not only provides a good interface for building your instance, but also lets you access it securely and quickly. Note that Cloud SQL is a managed relational service, so if your current database is NoSQL, a store such as Firestore or Cloud Bigtable is likely a closer fit.

If that is not the case, you can also use Google Cloud Storage or BigQuery. Of those two, I would pick BigQuery, since it can also work with streaming data.
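To keep the same fetch step working both locally and in the cloud, one option is to hide the data source behind two small functions that return rows in the same shape. This is only a sketch: the local side uses SQLite as a stand-in for whatever database you have now, the cloud side assumes the `google-cloud-bigquery` package, and the table and column names in the usage note are hypothetical.

```python
import sqlite3


def fetch_local(db_path: str, query: str) -> list:
    """Run the query against a local SQLite file and return a list of dicts."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
    try:
        return [dict(row) for row in conn.execute(query)]
    finally:
        conn.close()


def fetch_bigquery(query: str, project=None) -> list:
    """Run the query in BigQuery and return a list of dicts.

    Imported lazily so local runs do not need the client library.
    """
    from google.cloud import bigquery  # assumption: google-cloud-bigquery is installed

    client = bigquery.Client(project=project)
    return [dict(row.items()) for row in client.query(query).result()]
```

Locally you might call `fetch_local("train.db", "SELECT feature, label FROM training_examples")`, and in the cloud `fetch_bigquery("SELECT feature, label FROM my_dataset.training_examples")`. The rest of the training code then sees the same list of dicts either way.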

Processing data

For processing data before feeding it to the model you can use either:

  • Cloud Dataflow: Cloud Dataflow is a fully managed service for transforming and enriching data in stream (real-time) and batch (historical) modes with equal reliability and expressiveness, without complex workarounds or compromises.
  • Cloud Dataproc: Dataproc is a fast, easy-to-use, fully managed cloud service for running Apache Spark and Apache Hadoop clusters in a simpler, more cost-efficient way.
  • Cloud Dataprep: Cloud Dataprep by Trifacta is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis, reporting, and machine learning.
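For the Dataflow option, a preprocessing step could be sketched as an Apache Beam pipeline like the one below. It assumes the `apache-beam` package (with the GCP extras if you run it on Dataflow) and newline-delimited JSON input; the `feature` and `label` field names are hypothetical. The cleaning logic lives in plain functions so you can unit test it locally without Beam.

```python
import json


def is_complete(row: dict) -> bool:
    """Keep only rows that have both fields present."""
    return row.get("feature") is not None and row.get("label") is not None


def clean_row(row: dict) -> dict:
    """Cast fields to the types the model expects."""
    return {"feature": float(row["feature"]), "label": int(row["label"])}


def run(input_path: str, output_path: str, pipeline_args=None) -> None:
    """Read JSON lines, drop incomplete rows, cast types, write JSON lines.

    With no extra arguments Beam uses its local DirectRunner; passing
    flags such as --runner=DataflowRunner in pipeline_args sends the
    same pipeline to Dataflow.
    """
    import apache_beam as beam  # assumption: apache-beam is installed
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(list(pipeline_args or []))
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromText(input_path)
            | "Parse" >> beam.Map(json.loads)
            | "DropIncomplete" >> beam.Filter(is_complete)
            | "Clean" >> beam.Map(clean_row)
            | "Serialize" >> beam.Map(json.dumps)
            | "Write" >> beam.io.WriteToText(output_path)
        )
```

The input and output paths can be local files while you develop and `gs://` paths once the data sits in Cloud Storage.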

ML training & ML deployment

For training and deploying your ML model, I would suggest using AI Platform.

AI Platform makes it easy for machine learning developers, data scientists, and data engineers to take their ML projects from ideation to production and deployment, quickly and cost-effectively.

If you have to work with huge datasets, the best practice is to run training as a TensorFlow job on AI Platform so that it can use a training cluster.
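To make the same code runnable locally and as an AI Platform training job, the usual shape is a small Python package with an entry-point module that takes its output location from a `--job-dir` argument. The sketch below is only an illustration: the module would live at something like `trainer/task.py` (a hypothetical name), the toy data stands in for rows fetched from your database, and a closed-form least-squares fit stands in for your real model, which could be TensorFlow as suggested above.

```python
import argparse
import json
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Toy training entry point")
    # Output location: a local folder when run locally, a gs:// path in the cloud.
    parser.add_argument("--job-dir", default="./output")
    # parse_known_args ignores any extra flags passed by the caller.
    args, _unknown = parser.parse_known_args(argv)
    return args


def train(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def evaluate(model, xs, ys):
    """Mean squared error of the fitted line on held-out data."""
    preds = [model["slope"] * x + model["intercept"] for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def main(argv=None):
    args = parse_args(argv)
    # Toy data (an assumption for the example, not real results).
    train_x, train_y = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
    test_x, test_y = [4.0], [9.0]
    model = train(train_x, train_y)
    model["test_mse"] = evaluate(model, test_x, test_y)
    if args.job_dir.startswith("gs://"):
        # Plain open() cannot write to Cloud Storage; a real job would use
        # a GCS-aware writer here. This sketch just logs the result.
        print(json.dumps(model))
    else:
        os.makedirs(args.job_dir, exist_ok=True)
        with open(os.path.join(args.job_dir, "model.json"), "w") as f:
            json.dump(model, f)
    return model
```

With a `main()` call under an `if __name__ == "__main__":` guard at the end of the module, you could run it locally as `python -m trainer.task --job-dir ./output`, and submit the same package with `gcloud ai-platform jobs submit training`, pointing `--module-name` at `trainer.task` and `--job-dir` at a bucket path. Check the current gcloud documentation for the exact set of required flags.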

Finally, for deploying your models with AI Platform, you can take a look here.
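Once a model version is deployed, requesting online predictions from Python could look roughly like this. It is a sketch that assumes the `google-api-python-client` package and default credentials; the project, model and version names are placeholders you would replace with your own.

```python
def model_resource_name(project: str, model: str, version=None) -> str:
    """Build the resource name AI Platform uses to address a model (or one version)."""
    name = f"projects/{project}/models/{model}"
    if version:
        name += f"/versions/{version}"
    return name


def predict_online(project: str, model: str, instances: list, version=None) -> list:
    """Send instances to a deployed model and return its predictions."""
    from googleapiclient import discovery  # assumption: google-api-python-client is installed

    service = discovery.build("ml", "v1")
    response = (
        service.projects()
        .predict(
            name=model_resource_name(project, model, version),
            body={"instances": instances},
        )
        .execute()
    )
    if "error" in response:
        raise RuntimeError(response["error"])
    return response["predictions"]
```

If you leave `version` unset, the request goes to the model's default version.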



Source: https://stackoverflow.com/questions/59467642/migrate-from-running-ml-training-and-testing-locally-to-google-cloud
