Question
I am working on a text classification problem in SageMaker. I first fit a TF-IDF transformer from sklearn on the text and transform it into a structured (numeric) format, store the result in an S3 bucket, and then use it to train my PyTorch model, whose code lives in my entry point script.
By the end of this process I have two models:
- the sklearn TF-IDF model
- the actual PyTorch model
So every time I need to predict on new text, I have to separately transform it with the TF-IDF model that was fitted during training.
How can I create a pipeline in SageMaker that combines sklearn's TF-IDF with the PyTorch model?
If I fit and transform the text with TF-IDF in the main method of my entry point and then train my PyTorch model there as well, I can return only one model, which will be used in model_fn().
Answer 1:
First, check out the MNIST example here:
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/pytorch_mnist/pytorch_mnist.ipynb
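With script mode, you can run the code (in mnist.py) using the estimator below.

    from sagemaker.pytorch import PyTorch

    # `role` is the IAM role ARN that SageMaker assumes for the training job.
    # Note: train_instance_count / train_instance_type are the SageMaker
    # Python SDK v1 names; SDK v2 renamed them to instance_count / instance_type.
    estimator = PyTorch(entry_point='mnist.py',
                        role=role,
                        framework_version='1.1.0',
                        train_instance_count=2,
                        train_instance_type='ml.c4.xlarge',
                        hyperparameters={
                            'epochs': 6,
                            'backend': 'gloo'
                        })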
Then update the mnist.py script so that it runs your TF-IDF step as well as the PyTorch training. Hope this helps.
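One way to read "update the script" is to have the entry point write both artifacts into the model directory and load both again in model_fn(). The following is only a rough sketch of that idea, not taken from the MNIST example: TextClassifier, save_artifacts, the file names tfidf.pkl and model.pth, and the {"texts": [...]} JSON payload are all made up for illustration. It relies on the serving hooks model_fn, input_fn, predict_fn and output_fn of the SageMaker PyTorch container, and on the fact that whatever model_fn returns is handed to predict_fn as-is.

```python
import json
import os
import pickle

import torch
import torch.nn as nn
from sklearn.feature_extraction.text import TfidfVectorizer


class TextClassifier(nn.Module):
    """Hypothetical stand-in for the real network."""

    def __init__(self, n_features, n_classes):
        super().__init__()
        self.linear = nn.Linear(n_features, n_classes)

    def forward(self, x):
        return self.linear(x)


def save_artifacts(vectorizer, model, n_classes, model_dir):
    """Hypothetical helper: call at the end of training.

    On SageMaker, model_dir would be the SM_MODEL_DIR directory, whose
    contents are packed into model.tar.gz.
    """
    with open(os.path.join(model_dir, "tfidf.pkl"), "wb") as f:
        pickle.dump(vectorizer, f)
    checkpoint = {
        "state_dict": model.state_dict(),
        "n_features": len(vectorizer.vocabulary_),
        "n_classes": n_classes,
    }
    torch.save(checkpoint, os.path.join(model_dir, "model.pth"))


def model_fn(model_dir):
    """Load BOTH artifacts and return them together in one object."""
    with open(os.path.join(model_dir, "tfidf.pkl"), "rb") as f:
        vectorizer = pickle.load(f)
    checkpoint = torch.load(os.path.join(model_dir, "model.pth"),
                            map_location="cpu")
    model = TextClassifier(checkpoint["n_features"], checkpoint["n_classes"])
    model.load_state_dict(checkpoint["state_dict"])
    model.eval()
    return {"vectorizer": vectorizer, "model": model}


def input_fn(request_body, request_content_type):
    """Accept raw text; the JSON shape {"texts": [...]} is an assumption."""
    if request_content_type != "application/json":
        raise ValueError("Unsupported content type: " + request_content_type)
    return json.loads(request_body)["texts"]


def predict_fn(texts, artifacts):
    """Apply the fitted TF-IDF transform, then run the PyTorch model."""
    features = artifacts["vectorizer"].transform(texts).toarray()
    x = torch.tensor(features, dtype=torch.float32)
    with torch.no_grad():
        logits = artifacts["model"](x)
    return logits.argmax(dim=1).tolist()


def output_fn(prediction, accept):
    """Return the predicted class indices as a JSON list."""
    return json.dumps(prediction)
```

With this layout the endpoint takes raw text and the TF-IDF step is no longer a separate manual call. The trade-off is that preprocessing and the network share one container, so scikit-learn has to be installed in the PyTorch image (for example via a requirements.txt next to the entry point, if your container version supports that).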
Answer 2:
Apparently, we need to use inference pipelines.
An inference pipeline is an Amazon SageMaker model that is composed of a linear sequence of two to five containers that process requests for inferences on data. You use an inference pipeline to define and deploy any combination of pretrained Amazon SageMaker built-in algorithms and your own custom algorithms packaged in Docker containers. You can use an inference pipeline to combine preprocessing, predictions, and post-processing data science tasks. Inference pipelines are fully managed.
The docs are here:
https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipelines.html
Example:
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/scikit_learn_inference_pipeline/Inference%20Pipeline%20with%20Scikit-learn%20and%20Linear%20Learner.ipynb
Source: https://stackoverflow.com/questions/57767899/how-to-create-a-pipeline-in-sagemaker-with-pytorch