Google Cloud ML-engine scikit-learn prediction probability 'predict_proba()'

Submitted by 与世无争的帅哥 on 2019-12-04 11:15:29

Question


Google Cloud ML-engine supports deploying scikit-learn Pipeline objects. For example, a text classification Pipeline could look like the following:

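from sklearn import naive_bayes
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
classifier = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', naive_bayes.MultinomialNB())])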

The classifier can then be trained:

classifier.fit(train_x, train_y)

Then the classifier can be serialized and uploaded to Google Cloud Storage:

model = 'model.joblib'
joblib.dump(classifier, model)
model_remote_path = os.path.join('gs://', bucket_name, datetime.datetime.now().strftime('model_%Y%m%d_%H%M%S'), model)
subprocess.check_call(['gsutil', 'cp', model, model_remote_path], stderr=sys.stdout)

Then a Model and Version can be created, either through the Google Cloud Console, or programmatically, linking the 'model.joblib' file to the Version.
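
For the programmatic route, a minimal sketch using the same discovery client as the prediction call might look like the following. The helper names `build_version_body` and `create_model_and_version` are made up for illustration, and the runtime and Python versions are left as parameters because they depend on your setup. Note the assumption that `deploymentUri` points at the Cloud Storage directory containing 'model.joblib' rather than at the file itself.

```python
def build_version_body(version_name, model_dir_uri, runtime_version, python_version):
    """Build the request body for projects.models.versions.create."""
    return {
        'name': version_name,
        # Cloud Storage directory that contains model.joblib, not the file itself.
        'deploymentUri': model_dir_uri,
        'runtimeVersion': runtime_version,
        'framework': 'SCIKIT_LEARN',
        'pythonVersion': python_version,
    }


def create_model_and_version(ml, project_name, model_name, version_body):
    """Create a Model, then a Version under it.

    `ml` is assumed to be a client from discovery.build('ml', 'v1').
    """
    parent = 'projects/{}'.format(project_name)
    ml.projects().models().create(
        parent=parent, body={'name': model_name}).execute()
    # The returned dict describes the long-running deployment operation.
    return ml.projects().models().versions().create(
        parent='{}/models/{}'.format(parent, model_name),
        body=version_body).execute()
```

Keeping the body in its own function makes it easy to inspect what will be sent before any request is made.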

This classifier can then be used to make predictions on new data by calling the deployed model's predict endpoint:

from googleapiclient import discovery

ml = discovery.build('ml', 'v1')
# Despite its name, project_id holds the full resource name of the model (or version)
project_id = 'projects/{}/models/{}'.format(project_name, model_name)
if version_name is not None:
    project_id += '/versions/{}'.format(version_name)
request_dict = {'instances': ['Test data']}
ml_request = ml.projects().predict(name=project_id, body=request_dict).execute()
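
The value returned by execute() is a plain dict. As far as I can tell, it carries a 'predictions' list on success and an 'error' entry on failure, so a small helper (the name `extract_predictions` is hypothetical) can unwrap it:

```python
def extract_predictions(response):
    """Return the predictions list from an online prediction response dict."""
    if 'error' in response:
        # The service reports prediction failures in the response body.
        raise RuntimeError(response['error'])
    return response['predictions']
```

With the call above, this would be used as `extract_predictions(ml_request)`, which yields one entry per item in 'instances'.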

Google Cloud ML-engine calls the predict function of the classifier and returns the predicted class. However, I would like to be able to return the confidence score. Normally this could be achieved by calling the predict_proba function of the classifier, but there doesn't seem to be an option to change which function is called. My question is: is it possible to return the confidence score for a scikit-learn classifier when using Google Cloud ML-engine? If not, do you have any recommendations for how else to achieve this result?

Update: I've found a hacky solution. It involves overwriting the predict function of the classifier with its own predict_proba function:

nb = naive_bayes.MultinomialNB()
nb.predict = nb.predict_proba
classifier = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', nb)])

Surprisingly this works. If anyone knows of a neater solution then please let me know.
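
My understanding of why it works: assigning to nb.predict creates an instance attribute that shadows the method defined on the class, and the Pipeline simply delegates predict to its final step. The sketch below reproduces that mechanism with toy stand-ins (`ToyClassifier` and `ToyPipeline` are made-up classes, not scikit-learn ones), so it runs without any dependencies:

```python
class ToyClassifier:
    """Hypothetical stand-in for an estimator such as MultinomialNB."""
    classes_ = ['neg', 'pos']

    def predict_proba(self, X):
        # Pretend each input is already the probability of the 'pos' class.
        return [[1.0 - x, x] for x in X]

    def predict(self, X):
        # Pick the label with the highest probability for each row.
        return [self.classes_[row.index(max(row))]
                for row in self.predict_proba(X)]


class ToyPipeline:
    """Hypothetical stand-in for Pipeline: delegates predict to its last step."""

    def __init__(self, final_step):
        self.final_step = final_step

    def predict(self, X):
        return self.final_step.predict(X)


clf = ToyClassifier()
pipeline = ToyPipeline(clf)
labels_before = pipeline.predict([0.25, 0.75])

# The instance attribute now shadows the predict method defined on the class.
clf.predict = clf.predict_proba
probabilities_after = pipeline.predict([0.25, 0.75])
```

Here `labels_before` holds class labels while `probabilities_after` holds one probability row per input, even though the same `pipeline.predict` call produced both.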

Update: Google has released a new feature (currently in beta) called Custom prediction routines. This allows you to define what code is run when a prediction request comes in. It adds more code to the solution, but it is certainly less hacky.
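
A rough sketch of what such a routine could look like for this use case is given below. It assumes the predictor interface consists of a from_path class method plus a predict method, that the model file is named 'model.joblib', and that joblib is importable in the serving environment. The class name `ProbabilityPredictor` is made up for illustration.

```python
import os


class ProbabilityPredictor(object):
    """Returns class probabilities instead of class labels."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        probabilities = self._model.predict_proba(instances)
        # Convert to plain lists of floats so the result is JSON serializable.
        return [[float(p) for p in row] for row in probabilities]

    @classmethod
    def from_path(cls, model_dir):
        # Imported lazily; assumes joblib is available in the serving runtime.
        import joblib
        return cls(joblib.load(os.path.join(model_dir, 'model.joblib')))
```

Because the model is passed to the constructor, the predict logic can be exercised locally with any object that has a predict_proba method before the routine is packaged and deployed.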


Answer 1:


The ML Engine API you are using only has the predict method, as you can see in the documentation, so it will only do the prediction (unless you force it to do something else with the hack you mentioned).

If you want to do something else with your trained model, you'll have to load it and use it normally. If you want to use the model stored in Cloud Storage, you can do something like:

import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
from google.cloud import storage

bucket_name = "<BUCKET_NAME>"
gs_model = "path/to/model.joblib"  # path in your Cloud Storage bucket
local_model = "/path/to/model.joblib"  # path on your local machine

# Download the serialized model from Cloud Storage
client = storage.Client()
bucket = client.get_bucket(bucket_name)
blob = bucket.blob(gs_model)
blob.download_to_filename(local_model)

# Load the model and compute class probabilities; test_data is your own input
model = joblib.load(local_model)
model.predict_proba(test_data)
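
The result of predict_proba has one row per input and one column per class, with the columns in the order given by model.classes_. A small helper along these lines (the name `label_probabilities` is made up) can make the output easier to read:

```python
def label_probabilities(classes, probabilities):
    """Pair each class label with its probability, one dict per input row."""
    return [dict(zip(classes, (float(p) for p in row)))
            for row in probabilities]
```

It could be called as `label_probabilities(model.classes_, model.predict_proba(test_data))`.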


Source: https://stackoverflow.com/questions/52151548/google-cloud-ml-engine-scikit-learn-prediction-probability-predict-proba
