How to create a pipeline in SageMaker with PyTorch

Question

I am working on a text classification problem in SageMaker. I first fit and transform the text into a structured format (for example, with TF-IDF in sklearn), store the result in an S3 bucket, and then use it to train my PyTorch model, whose code is in my entry point script.

Notice that by the end of this process I have two models:

the sklearn TF-IDF model
the actual PyTorch model

So every time I need to predict on new text data, I have to transform it separately with the TF-IDF model that I fitted during training.

How can I create a pipeline in SageMaker that combines sklearn's TF-IDF with the PyTorch model?

If I fit and transform the text data with TF-IDF in the main method of my entry point and then train my PyTorch model in the same main method, I can return only one model, which will be used in model_fn().

Answer 1:

First, check out the MNIST example here:

https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/pytorch_mnist/pytorch_mnist.ipynb

With script mode, you can run the code (in mnist.py) using the estimator below.

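from sagemaker.pytorch import PyTorch

# `role` is the IAM role ARN used by SageMaker, e.g. from sagemaker.get_execution_role().
# The parameter names below follow SageMaker Python SDK v1; in SDK v2 they are
# instance_count and instance_type.
estimator = PyTorch(entry_point='mnist.py',
                    role=role,
                    framework_version='1.1.0',
                    train_instance_count=2,
                    train_instance_type='ml.c4.xlarge',
                    hyperparameters={
                        'epochs': 6,
                        'backend': 'gloo'
                    })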

Simply update the mnist.py script so that it also runs the TF-IDF step. Hope this helps.
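One way to read this suggestion is to keep both artifacts in a single entry point: the training code saves the fitted TF-IDF vectorizer next to the PyTorch weights, and model_fn() loads both, so the "only one model" limit is worked around by returning a dict. The following is a minimal sketch under stated assumptions, not the MNIST example's actual code. The file names, the TextClassifier class, and the JSON request format are hypothetical, and scikit-learn and joblib are assumed to be installed in the PyTorch container.

```python
import json
import os

import joblib
import numpy as np
import torch
import torch.nn as nn
from sklearn.feature_extraction.text import TfidfVectorizer

VECTORIZER_FILE = "tfidf.joblib"  # hypothetical artifact names
MODEL_FILE = "model.pth"


class TextClassifier(nn.Module):
    """Tiny linear classifier over TF-IDF features (illustrative only)."""

    def __init__(self, n_features, n_classes):
        super().__init__()
        self.linear = nn.Linear(n_features, n_classes)

    def forward(self, x):
        return self.linear(x)


def train(texts, labels, model_dir, epochs=20):
    """Fit TF-IDF, train the network, and save BOTH artifacts into model_dir."""
    vectorizer = TfidfVectorizer()
    features = vectorizer.fit_transform(texts).toarray().astype(np.float32)
    x = torch.from_numpy(features)
    y = torch.tensor(labels, dtype=torch.long)

    n_features, n_classes = int(x.shape[1]), int(max(labels)) + 1
    model = TextClassifier(n_features, n_classes)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

    os.makedirs(model_dir, exist_ok=True)
    joblib.dump(vectorizer, os.path.join(model_dir, VECTORIZER_FILE))
    torch.save(
        {"state_dict": model.state_dict(),
         "n_features": n_features,
         "n_classes": n_classes},
        os.path.join(model_dir, MODEL_FILE),
    )


def model_fn(model_dir):
    """Called by the serving container; returns both artifacts in one dict."""
    vectorizer = joblib.load(os.path.join(model_dir, VECTORIZER_FILE))
    checkpoint = torch.load(os.path.join(model_dir, MODEL_FILE), map_location="cpu")
    model = TextClassifier(checkpoint["n_features"], checkpoint["n_classes"])
    model.load_state_dict(checkpoint["state_dict"])
    model.eval()
    return {"vectorizer": vectorizer, "model": model}


def input_fn(request_body, content_type="application/json"):
    """Assumes the request body is a JSON list of raw strings."""
    if content_type != "application/json":
        raise ValueError("Unsupported content type: " + content_type)
    return json.loads(request_body)


def predict_fn(texts, artifacts):
    """Transform raw text with the saved TF-IDF, then run the network."""
    features = artifacts["vectorizer"].transform(texts).toarray().astype(np.float32)
    with torch.no_grad():
        logits = artifacts["model"](torch.from_numpy(features))
    return logits.argmax(dim=1).tolist()


def output_fn(prediction, accept="application/json"):
    """Serialize the predicted class indices as JSON."""
    return json.dumps(prediction)
```

In a real training script, the `__main__` block would call train() with the data read from the input channel and with os.environ["SM_MODEL_DIR"] as model_dir, so that both files end up in the model archive SageMaker uploads to S3.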

Answer 2:

Apparently, we need to use inference pipelines.

An inference pipeline is an Amazon SageMaker model that is composed of a linear sequence of two to five containers that process requests for inferences on data. You use an inference pipeline to define and deploy any combination of pretrained Amazon SageMaker built-in algorithms and your own custom algorithms packaged in Docker containers. You can use an inference pipeline to combine preprocessing, predictions, and post-processing data science tasks. Inference pipelines are fully managed.

One can read the docs here:

https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipelines.html

Example:

https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/scikit_learn_inference_pipeline/Inference%20Pipeline%20with%20Scikit-learn%20and%20Linear%20Learner.ipynb
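Applied to the TF-IDF and PyTorch case, an inference pipeline could look roughly like the sketch below, which chains a scikit-learn container and a PyTorch container with PipelineModel from the SageMaker Python SDK. It is only a sketch under assumptions: the two inference script names, the pipeline name, and the framework versions are hypothetical placeholders, and each script would need its own model_fn/input_fn/predict_fn so that the output of the first container matches what the second one expects.

```python
def build_inference_pipeline(role, tfidf_model_data, pytorch_model_data):
    """Chain a scikit-learn TF-IDF container and a PyTorch container.

    role: IAM role ARN for SageMaker.
    tfidf_model_data / pytorch_model_data: S3 URIs of the two model.tar.gz files.
    """
    # Imported lazily so this module can be loaded without the SageMaker SDK.
    from sagemaker.pipeline import PipelineModel
    from sagemaker.pytorch.model import PyTorchModel
    from sagemaker.sklearn.model import SKLearnModel

    tfidf_model = SKLearnModel(
        model_data=tfidf_model_data,
        role=role,
        entry_point="tfidf_inference.py",    # hypothetical script name
        framework_version="0.23-1",          # assumed version, adjust as needed
    )
    pytorch_model = PyTorchModel(
        model_data=pytorch_model_data,
        role=role,
        entry_point="pytorch_inference.py",  # hypothetical script name
        framework_version="1.5.0",           # assumed version, adjust as needed
        py_version="py3",
    )
    # Containers run in list order: TF-IDF transform first, then the classifier.
    return PipelineModel(
        name="tfidf-pytorch-pipeline",       # hypothetical pipeline name
        role=role,
        models=[tfidf_model, pytorch_model],
    )


# Usage (requires AWS credentials and real S3 artifacts):
# pipeline = build_inference_pipeline(role, "s3://.../tfidf/model.tar.gz",
#                                     "s3://.../pytorch/model.tar.gz")
# predictor = pipeline.deploy(initial_instance_count=1,
#                             instance_type="ml.c4.xlarge")
```

The single endpoint created this way accepts raw text, so the caller no longer has to run the TF-IDF transform separately before each prediction.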

Source: https://stackoverflow.com/questions/57767899/how-to-create-a-pipeline-in-sagemaker-with-pytorch

Tags

python

scikit-learn

pytorch

amazon-sagemaker