Question
I recently started with machine learning in Python.
I am working on a Python project involving regression to predict some values. The input is a data set of 70 features, a mix of categorical and ordinal variables; the dependent variable is continuous.
The inputs would be the data and the desired number of significant variables.
I have a few questions, listed below.
1] Is there a way to perform feature selection using a forward selection technique in TensorFlow?
2] Are there alternatives to feature selection?
Any help would be appreciated!
Answer 1:
Problem
I have N features (N = 70 for example) and I want to select the top K. (1) How could I do this in TensorFlow, and (2) what alternatives are there to feature selection?
Discussion
I will show one way to limit the N features to at most K using a variant of L1 loss. As for alternatives to feature selection, there are many, depending on what you want to achieve. If you can go outside of TensorFlow, you could use Decision Trees or a Random Forest and constrain the number of leaves so that at most K features are used (a sketch of one such approach follows this paragraph). If you must use TensorFlow and you want an alternative to selecting the top K features that still regularizes your weights, you could use random dropout or L2 loss. Again, it really depends on what you want to achieve when looking for an alternative to the top K features.
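As a concrete illustration of the Random Forest route, here is a minimal sketch using scikit-learn (my choice of library, since we are outside TensorFlow here). Rather than constraining leaves directly, it takes the related shortcut of ranking features by impurity-based importance and keeping the top K; X, y, and K are placeholders for your data and feature budget.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def top_k_features(X, y, K):
    # Fit a forest, then rank features by impurity-based importance.
    forest = RandomForestRegressor(n_estimators=100, random_state=0)
    forest.fit(X, y)
    # Indices of the K most important features, most important first.
    return np.argsort(forest.feature_importances_)[::-1][:K]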
Solution to Top K of N features.
Suppose our TensorFlow graph is defined as follows:
import tensorflow as tf

size = 4

# Inputs: features, targets, and the strength of the L1 penalty.
x_in = tf.placeholder( shape=[None,size] , dtype=tf.float32 )
y_in = tf.placeholder( shape=[None] , dtype=tf.float32 )
l1_weight = tf.placeholder( shape=[] , dtype=tf.float32 )

# The weights start strictly positive; the relu clamps anything the
# optimizer pushes negative to exactly zero, which is what lets the
# L1 penalty switch a feature off entirely.
m = tf.Variable( tf.random_uniform( shape=[size,1] , minval=0.1 , maxval=0.9 ) , dtype=tf.float32 )
m = tf.nn.relu(m)
b = tf.Variable( [-10] , dtype=tf.float32 )
predict = tf.squeeze( tf.nn.xw_plus_b(x_in,m,b) )

# Mean squared error plus a weighted L1 penalty on the feature weights.
l1_loss = tf.reduce_sum( tf.abs(m) )
loss = tf.reduce_mean( tf.square( y_in - predict ) ) + l1_loss * l1_weight
optimizer = tf.train.GradientDescentOptimizer(1e-4)
train = optimizer.minimize(loss)

# k = the number of features whose weight is still non-zero.
m_all_0 = tf.zeros( [size,1] , dtype=tf.float32 )
zerod_feature_count = tf.reduce_sum( tf.cast( tf.equal( m , m_all_0 ) , dtype=tf.float32 ) )
k = size - zerod_feature_count
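Because m is passed through a relu, any weight the optimizer pushes negative is clamped to exactly zero, so the exact-equality test against zero above is meaningful. If you also want the indices of the surviving features, something like the following works (this helper is my addition, not part of the original answer):
# Sketch (assumption, not in the original answer): indices of the
# features whose relu-clamped weight is still strictly positive.
selected_features = tf.squeeze( tf.where( tf.greater( tf.squeeze(m) , 0.0 ) ) )
Evaluating selected_features with sess.run after training returns the column indices of the features that were kept.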
Let's define some data and use it (this is the first 100 rows of the classic Iris data set, with a 0/1 class label appended):
import numpy as np
data = np.array([[5.1,3.5,1.4,0.2,0],[4.9,3.0,1.4,0.2,0],[4.7,3.2,1.3,0.2,0],[4.6,3.1,1.5,0.2,0],[5.0,3.6,1.4,0.2,0],[5.4,3.9,1.7,0.4,0],[4.6,3.4,1.4,0.3,0],[5.0,3.4,1.5,0.2,0],[4.4,2.9,1.4,0.2,0],[4.9,3.1,1.5,0.1,0],[5.4,3.7,1.5,0.2,0],[4.8,3.4,1.6,0.2,0],[4.8,3.0,1.4,0.1,0],[4.3,3.0,1.1,0.1,0],[5.8,4.0,1.2,0.2,0],[5.7,4.4,1.5,0.4,0],[5.4,3.9,1.3,0.4,0],[5.1,3.5,1.4,0.3,0],[5.7,3.8,1.7,0.3,0],[5.1,3.8,1.5,0.3,0],[5.4,3.4,1.7,0.2,0],[5.1,3.7,1.5,0.4,0],[4.6,3.6,1.0,0.2,0],[5.1,3.3,1.7,0.5,0],[4.8,3.4,1.9,0.2,0],[5.0,3.0,1.6,0.2,0],[5.0,3.4,1.6,0.4,0],[5.2,3.5,1.5,0.2,0],[5.2,3.4,1.4,0.2,0],[4.7,3.2,1.6,0.2,0],[4.8,3.1,1.6,0.2,0],[5.4,3.4,1.5,0.4,0],[5.2,4.1,1.5,0.1,0],[5.5,4.2,1.4,0.2,0],[4.9,3.1,1.5,0.1,0],[5.0,3.2,1.2,0.2,0],[5.5,3.5,1.3,0.2,0],[4.9,3.1,1.5,0.1,0],[4.4,3.0,1.3,0.2,0],[5.1,3.4,1.5,0.2,0],[5.0,3.5,1.3,0.3,0],[4.5,2.3,1.3,0.3,0],[4.4,3.2,1.3,0.2,0],[5.0,3.5,1.6,0.6,0],[5.1,3.8,1.9,0.4,0],[4.8,3.0,1.4,0.3,0],[5.1,3.8,1.6,0.2,0],[4.6,3.2,1.4,0.2,0],[5.3,3.7,1.5,0.2,0],[5.0,3.3,1.4,0.2,0],[7.0,3.2,4.7,1.4,1],[6.4,3.2,4.5,1.5,1],[6.9,3.1,4.9,1.5,1],[5.5,2.3,4.0,1.3,1],[6.5,2.8,4.6,1.5,1],[5.7,2.8,4.5,1.3,1],[6.3,3.3,4.7,1.6,1],[4.9,2.4,3.3,1.0,1],[6.6,2.9,4.6,1.3,1],[5.2,2.7,3.9,1.4,1],[5.0,2.0,3.5,1.0,1],[5.9,3.0,4.2,1.5,1],[6.0,2.2,4.0,1.0,1],[6.1,2.9,4.7,1.4,1],[5.6,2.9,3.6,1.3,1],[6.7,3.1,4.4,1.4,1],[5.6,3.0,4.5,1.5,1],[5.8,2.7,4.1,1.0,1],[6.2,2.2,4.5,1.5,1],[5.6,2.5,3.9,1.1,1],[5.9,3.2,4.8,1.8,1],[6.1,2.8,4.0,1.3,1],[6.3,2.5,4.9,1.5,1],[6.1,2.8,4.7,1.2,1],[6.4,2.9,4.3,1.3,1],[6.6,3.0,4.4,1.4,1],[6.8,2.8,4.8,1.4,1],[6.7,3.0,5.0,1.7,1],[6.0,2.9,4.5,1.5,1],[5.7,2.6,3.5,1.0,1],[5.5,2.4,3.8,1.1,1],[5.5,2.4,3.7,1.0,1],[5.8,2.7,3.9,1.2,1],[6.0,2.7,5.1,1.6,1],[5.4,3.0,4.5,1.5,1],[6.0,3.4,4.5,1.6,1],[6.7,3.1,4.7,1.5,1],[6.3,2.3,4.4,1.3,1],[5.6,3.0,4.1,1.3,1],[5.5,2.5,4.0,1.3,1],[5.5,2.6,4.4,1.2,1],[6.1,3.0,4.6,1.4,1],[5.8,2.6,4.0,1.2,1],[5.0,2.3,3.3,1.0,1],[5.6,2.7,4.2,1.3,1],[5.7,3.0,4.2,1.2,1],[5.7,2.9,4.2,1.3,1],[6.2,2.9,4.3,1.3,1],[5.1,2.5,3.0,1.1,1],[5.7,2.8,4.1,1.3,1]])
x = data[:,0:4]
y = data[:,-1]
sess = tf.Session()
sess.run(tf.global_variables_initializer())
Let's define a reusable trial function that can test different weights for l1_loss:
def trial(weight=1):
    # Show the starting state, then fit without any L1 penalty.
    print( "initial m,loss" , sess.run([m,loss],feed_dict={x_in:x, y_in:y, l1_weight:0}) )
    for _ in range(10000):
        sess.run(train,feed_dict={x_in:x, y_in:y, l1_weight:0})
    print( "after training m,loss" , sess.run([m,loss],feed_dict={x_in:x, y_in:y, l1_weight:0}) )
    # Turn the L1 penalty on and train until at most 3 features survive.
    for _ in range(10000):
        sess.run(train,feed_dict={x_in:x, y_in:y, l1_weight:weight})
        if sess.run(k) <= 3 :
            break
    print( "after l1 loss m" , sess.run([m,loss],feed_dict={x_in:x, y_in:y, l1_weight:weight}) )
Then we'll try it out:
print( "The number of non-zero parameters is" , sess.run(k) )
print( "Doing a training session" )
trial()
print( "The number of non-zero parameters is" , sess.run(k) )
And the results look good:
The number of non-zero parameters is 4.0
Doing a training session
initial m,loss [array([[0.75030506],
[0.4959089 ],
[0.646675 ],
[0.44027993]], dtype=float32), 8.228347]
after training m,loss [array([[1.1898338 ],
[1.0033115 ],
[0.15164669],
[0.16414128]], dtype=float32), 0.68957466]
after l1 loss m [array([[1.250356 ],
[0.92532456],
[0.10235767],
[0. ]], dtype=float32), 2.9621665]
The number of non-zero parameters is 3.0
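For completeness, the L2 alternative mentioned in the discussion is a one-line change to the loss; a minimal sketch (the names l2_weight and loss_l2 are my own):
# Sketch of the L2 alternative: penalize squared weights instead of
# absolute values. L2 shrinks weights toward zero but rarely makes
# them exactly zero, so it regularizes rather than selects features.
l2_weight = tf.placeholder( shape=[] , dtype=tf.float32 )
l2_loss = tf.reduce_sum( tf.square(m) )
loss_l2 = tf.reduce_mean( tf.square( y_in - predict ) ) + l2_loss * l2_weight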
Source: https://stackoverflow.com/questions/49843886/perform-feature-selection-in-tensorflow