Get actual field name from JPMML model's InputField

Submitted by 徘徊边缘 on 2019-12-11 04:58:28

Question


I have a scikit-learn model that I'm using in my Java app via JPMML. I'm trying to set the InputFields using the names of the columns that were used during training, but "inField.getName().getValue()" returns an obfuscated name of the form "x{#}". Is there any way I could map "x{#}" back to the original feature/attribute name?

Map<FieldName, FieldValue> arguments = new LinkedHashMap<>();
for (InputField inField : patternEvaluator.getInputFields()) {
    // 1 if the field's name is among the active features, 0 otherwise
    int value = activeFeatures.contains(inField.getName().getValue()) ? 1 : 0;
    FieldValue inputFieldValue = inField.prepare(value);
    arguments.put(inField.getName(), inputFieldValue);
}
Map<FieldName, ?> results = patternEvaluator.evaluate(arguments);

Here's how I'm generating the model:

from sklearn2pmml import PMMLPipeline, sklearn2pmml
import os
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import numpy as np

data = pd.read_csv('/pydata/training.csv')
# Features: every column except the last; target: the 'classname' column
X = data[data.keys()[:-1]].as_matrix()
y = data['classname'].as_matrix()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

estimators = [("read", RandomForestClassifier(n_jobs=5, n_estimators=200, max_features='auto'))]
pipe = PMMLPipeline(estimators)
pipe.fit(X_train, y_train)
pipe.active_fields = np.array(data.columns)
sklearn2pmml(pipe, "/pydata/model.pmml", with_repr = True)

Thanks


Answer 1:


Does the PMML document contain the actual field names at all? Open it in a text editor and check the values of the /PMML/DataDictionary/DataField@name attributes.
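
If opening the file by hand is inconvenient, the same check can be scripted. The following is only a minimal sketch using Python's standard library; the helper name data_field_names and the inline sample document are made up for illustration and are not part of JPMML or sklearn2pmml.

```python
import xml.etree.ElementTree as ET


def data_field_names(pmml_text):
    """Return the name attribute of every DataField element in the document."""
    root = ET.fromstring(pmml_text)
    names = []
    for elem in root.iter():
        # Tags look like "{namespace}DataField"; strip the namespace so the
        # check does not depend on the PMML schema version.
        if elem.tag.rsplit("}", 1)[-1] == "DataField":
            names.append(elem.get("name"))
    return names


# Tiny hand-written PMML fragment, for illustration only.
SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <DataDictionary>
    <DataField name="y" optype="categorical" dataType="string"/>
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
  </DataDictionary>
</PMML>"""

print(data_field_names(SAMPLE))  # ['y', 'x1', 'x2']
```

For a real model, read the text of the exported .pmml file and pass it in. If the list contains only x1, x2, ... the original column names never made it into the document.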

Your question indicates that the conversion from Scikit-Learn to PMML was incomplete, because it didn't include information about the active field (aka input field) names. In that case they are assumed to be x1, x2, ..., xn.
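
As a stopgap for an already exported model, the placeholder names could be mapped back by position. This is a sketch under an assumption that should be verified against your own data: that x1, x2, ..., xn follow the order of the feature columns used for training, counted from 1. The function name and the example column names below are hypothetical.

```python
def placeholder_to_column(feature_columns):
    """Map 'x1', 'x2', ... to the training feature columns, by position."""
    return {"x%d" % (i + 1): name for i, name in enumerate(feature_columns)}


# Hypothetical feature columns, in the same order as in the training data.
columns = ["age", "income", "has_account"]
mapping = placeholder_to_column(columns)

print(mapping["x2"])  # income
```

Such a mapping could be written to a file and loaded by the Java app, but regenerating the PMML with the real field names is the more robust fix.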




Answer 2:


Your pipeline only includes the estimator, which is why the names are lost. You have to include all the preprocessing steps as well in order to get the names into the PMML.

Let's assume you do not do any preprocessing at all; then this is probably what you need (I do not repeat the parts of your code that this snippet requires):

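from sklearn_pandas import DataFrameMapper

# Pass every feature column through unchanged (None = no transformation);
# the last column is the target, so it is left out
nones = [(d, None) for d in data.columns[:-1]]

mapper = DataFrameMapper(nones, df_out=True)

lm = PMMLPipeline([
    ("mapper", mapper),
    ("estimator", estimators[0][1])  # the RandomForestClassifier from the question
])

# X_train must be a pandas DataFrame here (not a NumPy array),
# so that the mapper can select the columns by name
lm.fit(X_train, y_train)

sklearn2pmml(lm, "ScikitLearnNew.pmml", with_repr=True)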

If you do require some preprocessing of your data, you can use any other transformer (e.g. LabelBinarizer) instead of None. But the preprocessing has to happen inside the pipeline in order to be included in the PMML.



Source: https://stackoverflow.com/questions/48047109/get-actual-field-name-from-jpmml-models-inputfield
