Cannot change number of clusters in KMeansClustering Tensorflow

Posted by 我们两清 on 2019-12-13 04:46:49

Question


I found this code and it works perfectly. The idea is to split my data and train KMeansClustering on it, so I create an InitHook and an iterator and use them for training.

class _IteratorInitHook(tf.train.SessionRunHook):
    """Hook to initialize data iterator after session is created."""

    def __init__(self):
        super(_IteratorInitHook, self).__init__()
        self.iterator_initializer_fn = None

    def after_create_session(self, session, coord):
        """Initialize the iterator after the session has been created."""
        del coord
        self.iterator_initializer_fn(session)


# Run K-means clustering.
def _get_input_fn():
    """Helper function to create input function and hook for training.
    Returns:
        input_fn: Input function for k-means Estimator training.
        init_hook: Hook used to load data during training.
    """
    init_hook = _IteratorInitHook()

    def _input_fn():
        """Produces tf.data.Dataset object for k-means training.
        Returns:
            Tensor with the data for training.
        """
        features_placeholder = tf.placeholder(tf.float32, my_data.shape)
        delf_dataset = tf.data.Dataset.from_tensor_slices((features_placeholder))
        delf_dataset = delf_dataset.shuffle(1000).batch(my_data.shape[0])
        iterator = delf_dataset.make_initializable_iterator()

        def _initializer_fn(sess):
            """Initialize dataset iterator, feed in the data."""
            sess.run(
                iterator.initializer,
                feed_dict={features_placeholder: my_data})

        init_hook.iterator_initializer_fn = _initializer_fn
        return iterator.get_next()

    return _input_fn, init_hook


input_fn, init_hook = _get_input_fn()

output_cluster_dir = 'parameters/clusters'

kmeans = tf.contrib.factorization.KMeansClustering(
    num_clusters=1024,
    model_dir=output_cluster_dir,
    use_mini_batch=False,
)


print('Starting K-means clustering...')
kmeans.train(input_fn, hooks=[init_hook])

But if I change num_clusters to 512 or 256, I get the following error:

InvalidArgumentError: segment_ids[0] = 600 is out of range [0, 256)
[[node UnsortedSegmentSum (defined at /home/mikhail/.conda/envs/tf2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py:1112) ]] [[node Squeeze (defined at /home/mikhail/.conda/envs/tf2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py:1112) ]]

It looks like I either have a problem splitting the data into batches, OR my KMeans uses 1024 clusters by default even though I set another value!

I can't figure out what to change to make it work correctly. The traceback is huge; if it's needed I can attach it as a file.


Answer 1:


I found the problem: as you can see, I save the codebook to parameters/clusters. When it was created, TensorFlow saved the graph there too. The default behaviour of TensorFlow is to NOT create a new graph if one already exists in the model directory!

So every time I tried to run KMeansClustering, it still used the graph loaded from the old codebook. I solved the issue by deleting the clusters folder each time before running KMeansClustering.
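For reference, a minimal sketch of that workaround, assuming the same output_cluster_dir, input_fn and init_hook as in the question (the cleanup step itself is my addition, not part of the original answer):

import os
import shutil

import tensorflow as tf

output_cluster_dir = 'parameters/clusters'

# Remove any previously saved checkpoint/graph so the Estimator does not
# restore a model that was built with a different num_clusters.
if os.path.exists(output_cluster_dir):
    shutil.rmtree(output_cluster_dir)

kmeans = tf.contrib.factorization.KMeansClustering(
    num_clusters=512,  # now takes effect, since no old graph is restored
    model_dir=output_cluster_dir,
    use_mini_batch=False,
)
kmeans.train(input_fn, hooks=[init_hook])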

I still have one remaining issue: after I create new clusters, I start 2 scripts in parallel to create features using them, and one of them produces features with the old codebook while the other uses the new one! I am still working around this by hand, but my recommendation is to restart everything after you have created a new codebook (maybe some info is still loaded in TensorFlow).



Source: https://stackoverflow.com/questions/56337848/cannot-change-number-of-clusters-in-kmeansclustering-tensorflow
