Question
I recently started with machine learning in Python.
I am working on a Python project involving regression to predict some values. The input is a data set of 70 features, a mix of categorical and ordinal variables; the dependent variable is continuous.
The inputs would be the data and the desired number of significant variables.
I have a few questions, listed below.
1] Is there a way to perform feature selection using a forward selection technique in TensorFlow?
2] Are there alternatives to feature selection?
Any help would be appreciated!
Answer 1:
Problem
I have N features (N = 70 for example) and I want to select the top K. (1) How could I do this in TensorFlow, and (2) what alternatives are there to feature selection?
Discussion
I will show one way to limit the N features to at most K using a variant of L1 loss. As for alternatives to feature selection, there are many, depending on what you want to achieve. If you can go outside of TensorFlow, you could use Decision Trees or a Random Forest and constrain the number of leaves so that at most K features are used (a sketch of one such approach follows this paragraph). If you must use TensorFlow and you want an alternative to selecting the top K features that still regularizes your weights, you could use random dropout or L2 loss. Again, it really depends on what you want to achieve when looking for an alternative to the top K features.
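As a concrete illustration of the Random Forest route, here is a minimal sketch using scikit-learn (my choice of library, since we are outside TensorFlow here). Rather than constraining leaves directly, it takes the related shortcut of ranking features by impurity-based importance and keeping the top K; X, y, and K are placeholders for your data and feature budget.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def top_k_features(X, y, K):
    # Fit a forest, then rank features by impurity-based importance.
    forest = RandomForestRegressor(n_estimators=100, random_state=0)
    forest.fit(X, y)
    # Indices of the K most important features, most important first.
    return np.argsort(forest.feature_importances_)[::-1][:K]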
Solution to Top K of N features.
Suppose our TensorFlow graph is defined as follows:
import tensorflow as tf

size = 4

# Inputs: features, targets, and the strength of the L1 penalty.
x_in = tf.placeholder( shape=[None,size] , dtype=tf.float32 )
y_in = tf.placeholder( shape=[None] , dtype=tf.float32 )
l1_weight = tf.placeholder( shape=[] , dtype=tf.float32 )

# The weights start strictly positive; the relu clamps anything the
# optimizer pushes negative to exactly zero, which is what lets the
# L1 penalty switch a feature off entirely.
m = tf.Variable( tf.random_uniform( shape=[size,1] , minval=0.1 , maxval=0.9 ) , dtype=tf.float32 )
m = tf.nn.relu(m)
b = tf.Variable( [-10] , dtype=tf.float32 )
predict = tf.squeeze( tf.nn.xw_plus_b(x_in,m,b) )

# Mean squared error plus a weighted L1 penalty on the feature weights.
l1_loss = tf.reduce_sum( tf.abs(m) )
loss = tf.reduce_mean( tf.square( y_in - predict ) ) + l1_loss * l1_weight
optimizer = tf.train.GradientDescentOptimizer(1e-4)
train = optimizer.minimize(loss)

# k = the number of features whose weight is still non-zero.
m_all_0 = tf.zeros( [size,1] , dtype=tf.float32 )
zerod_feature_count = tf.reduce_sum( tf.cast( tf.equal( m , m_all_0 ) , dtype=tf.float32 ) )
k = size - zerod_feature_count
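Because m is passed through a relu, any weight the optimizer pushes negative is clamped to exactly zero, so the exact-equality test against zero above is meaningful. If you also want the indices of the surviving features, something like the following works (this helper is my addition, not part of the original answer):
# Sketch (assumption, not in the original answer): indices of the
# features whose relu-clamped weight is still strictly positive.
selected_features = tf.squeeze( tf.where( tf.greater( tf.squeeze(m) , 0.0 ) ) )
Evaluating selected_features with sess.run after training returns the column indices of the features that were kept.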
Let's define some data and use it (this is the first 100 rows of the classic Iris data set, with a 0/1 class label appended):
import numpy as np
data = np.array([[5.1,3.5,1.4,0.2,0],[4.9,3.0,1.4,0.2,0],[4.7,3.2,1.3,0.2,0],[4.6,3.1,1.5,0.2,0],[5.0,3.6,1.4,0.2,0],[5.4,3.9,1.7,0.4,0],[4.6,3.4,1.4,0.3,0],[5.0,3.4,1.5,0.2,0],[4.4,2.9,1.4,0.2,0],[4.9,3.1,1.5,0.1,0],[5.4,3.7,1.5,0.2,0],[4.8,3.4,1.6,0.2,0],[4.8,3.0,1.4,0.1,0],[4.3,3.0,1.1,0.1,0],[5.8,4.0,1.2,0.2,0],[5.7,4.4,1.5,0.4,0],[5.4,3.9,1.3,0.4,0],[5.1,3.5,1.4,0.3,0],[5.7,3.8,1.7,0.3,0],[5.1,3.8,1.5,0.3,0],[5.4,3.4,1.7,0.2,0],[5.1,3.7,1.5,0.4,0],[4.6,3.6,1.0,0.2,0],[5.1,3.3,1.7,0.5,0],[4.8,3.4,1.9,0.2,0],[5.0,3.0,1.6,0.2,0],[5.0,3.4,1.6,0.4,0],[5.2,3.5,1.5,0.2,0],[5.2,3.4,1.4,0.2,0],[4.7,3.2,1.6,0.2,0],[4.8,3.1,1.6,0.2,0],[5.4,3.4,1.5,0.4,0],[5.2,4.1,1.5,0.1,0],[5.5,4.2,1.4,0.2,0],[4.9,3.1,1.5,0.1,0],[5.0,3.2,1.2,0.2,0],[5.5,3.5,1.3,0.2,0],[4.9,3.1,1.5,0.1,0],[4.4,3.0,1.3,0.2,0],[5.1,3.4,1.5,0.2,0],[5.0,3.5,1.3,0.3,0],[4.5,2.3,1.3,0.3,0],[4.4,3.2,1.3,0.2,0],[5.0,3.5,1.6,0.6,0],[5.1,3.8,1.9,0.4,0],[4.8,3.0,1.4,0.3,0],[5.1,3.8,1.6,0.2,0],[4.6,3.2,1.4,0.2,0],[5.3,3.7,1.5,0.2,0],[5.0,3.3,1.4,0.2,0],[7.0,3.2,4.7,1.4,1],[6.4,3.2,4.5,1.5,1],[6.9,3.1,4.9,1.5,1],[5.5,2.3,4.0,1.3,1],[6.5,2.8,4.6,1.5,1],[5.7,2.8,4.5,1.3,1],[6.3,3.3,4.7,1.6,1],[4.9,2.4,3.3,1.0,1],[6.6,2.9,4.6,1.3,1],[5.2,2.7,3.9,1.4,1],[5.0,2.0,3.5,1.0,1],[5.9,3.0,4.2,1.5,1],[6.0,2.2,4.0,1.0,1],[6.1,2.9,4.7,1.4,1],[5.6,2.9,3.6,1.3,1],[6.7,3.1,4.4,1.4,1],[5.6,3.0,4.5,1.5,1],[5.8,2.7,4.1,1.0,1],[6.2,2.2,4.5,1.5,1],[5.6,2.5,3.9,1.1,1],[5.9,3.2,4.8,1.8,1],[6.1,2.8,4.0,1.3,1],[6.3,2.5,4.9,1.5,1],[6.1,2.8,4.7,1.2,1],[6.4,2.9,4.3,1.3,1],[6.6,3.0,4.4,1.4,1],[6.8,2.8,4.8,1.4,1],[6.7,3.0,5.0,1.7,1],[6.0,2.9,4.5,1.5,1],[5.7,2.6,3.5,1.0,1],[5.5,2.4,3.8,1.1,1],[5.5,2.4,3.7,1.0,1],[5.8,2.7,3.9,1.2,1],[6.0,2.7,5.1,1.6,1],[5.4,3.0,4.5,1.5,1],[6.0,3.4,4.5,1.6,1],[6.7,3.1,4.7,1.5,1],[6.3,2.3,4.4,1.3,1],[5.6,3.0,4.1,1.3,1],[5.5,2.5,4.0,1.3,1],[5.5,2.6,4.4,1.2,1],[6.1,3.0,4.6,1.4,1],[5.8,2.6,4.0,1.2,1],[5.0,2.3,3.3,1.0,1],[5.6,2.7,4.2,1.3,1],[5.7,3.0,4.2,1.2,1],[5.7,2.9,4.2,1.3,1],[6.2,2.9,4.3,1.3,1],[5.1,2.5,3.0,1.1,1],[5.7,2.8,4.1,1.3,1]])
x = data[:,0:4]
y = data[:,-1]
sess = tf.Session()
sess.run(tf.global_variables_initializer())
Let's define a reusable trial function that can test different weights for l1_loss:
def trial(weight=1):
    # Show the starting state, then fit without any L1 penalty.
    print( "initial m,loss" , sess.run([m,loss],feed_dict={x_in:x, y_in:y, l1_weight:0}) )
    for _ in range(10000):
        sess.run(train,feed_dict={x_in:x, y_in:y, l1_weight:0})
    print( "after training m,loss" , sess.run([m,loss],feed_dict={x_in:x, y_in:y, l1_weight:0}) )
    # Turn the L1 penalty on and train until at most 3 features survive.
    for _ in range(10000):
        sess.run(train,feed_dict={x_in:x, y_in:y, l1_weight:weight})
        if sess.run(k) <= 3 :
            break
    print( "after l1 loss m" , sess.run([m,loss],feed_dict={x_in:x, y_in:y, l1_weight:weight}) )
Then we'll try it out:
print( "The number of non-zero parameters is" , sess.run(k) )
print( "Doing a training session" )
trial()
print( "The number of non-zero parameters is" , sess.run(k) )
And the results look good:
The number of non-zero parameters is 4.0
Doing a training session
initial m,loss [array([[0.75030506],
[0.4959089 ],
[0.646675 ],
[0.44027993]], dtype=float32), 8.228347]
after training m,loss [array([[1.1898338 ],
[1.0033115 ],
[0.15164669],
[0.16414128]], dtype=float32), 0.68957466]
after l1 loss m [array([[1.250356 ],
[0.92532456],
[0.10235767],
[0. ]], dtype=float32), 2.9621665]
The number of non-zero parameters is 3.0
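For completeness, the L2 alternative mentioned in the discussion is a one-line change to the loss; a minimal sketch (the names l2_weight and loss_l2 are my own):
# Sketch of the L2 alternative: penalize squared weights instead of
# absolute values. L2 shrinks weights toward zero but rarely makes
# them exactly zero, so it regularizes rather than selects features.
l2_weight = tf.placeholder( shape=[] , dtype=tf.float32 )
l2_loss = tf.reduce_sum( tf.square(m) )
loss_l2 = tf.reduce_mean( tf.square( y_in - predict ) ) + l2_loss * l2_weight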
Source: https://stackoverflow.com/questions/49843886/perform-feature-selection-in-tensorflow