Gensim equivalent of training steps


Question


Does gensim's Word2Vec have an option that is the equivalent of "training steps" in the TensorFlow word2vec example here: Word2Vec Basic? If not, what default value does gensim use? Is the gensim parameter iter related to the number of training steps?

The TensorFlow script includes this section.

with tf.Session(graph=graph) as session:
    # We must initialize all variables before we use them.
    init.run()
    print('Initialized')

    average_loss = 0
    for step in xrange(num_steps):
        batch_inputs, batch_labels = generate_batch(
            batch_size, num_skips, skip_window)
        feed_dict = {train_inputs: batch_inputs, train_labels: batch_labels}

        # We perform one update step by evaluating the optimizer op (including
        # it in the list of returned values for session.run()).
        _, loss_val = session.run([optimizer, loss], feed_dict=feed_dict)
        average_loss += loss_val

        if step % 2000 == 0:
            if step > 0:
                average_loss /= 2000
            # The average loss is an estimate of the loss over the last 2000 batches.
            print('Average loss at step ', step, ': ', average_loss)
            average_loss = 0

        # Note that this is expensive (~20% slowdown if computed every 500 steps)
        if step % 10000 == 0:
            sim = similarity.eval()
            for i in xrange(valid_size):
                valid_word = reverse_dictionary[valid_examples[i]]
                top_k = 8  # number of nearest neighbors
                nearest = (-sim[i, :]).argsort()[1:top_k + 1]
                log_str = 'Nearest to %s:' % valid_word
                for k in xrange(top_k):
                    close_word = reverse_dictionary[nearest[k]]
                    log_str = '%s %s,' % (log_str, close_word)
                print(log_str)
    final_embeddings = normalized_embeddings.eval()

In the TensorFlow example, if I run t-SNE on the embeddings and plot the result with matplotlib (see the sketch below), the plot looks more reasonable to me when the number of steps is high. I am using a small corpus of 1,200 emails. One way it looks more reasonable is that number tokens are clustered together. I would like to attain the same apparent level of quality using gensim.
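For reference, a minimal sketch of the kind of plot described above, assuming scikit-learn and matplotlib are available and a trained gensim model named model (plot_embeddings and n_words are illustrative names, not part of either library):

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_embeddings(model, n_words=200):
    # Take the n_words most frequent words and their vectors (gensim 3.x API;
    # in gensim 4.x, index2word is index_to_key).
    words = model.wv.index2word[:n_words]
    vectors = model.wv[words]
    # Project the vectors to 2-D with t-SNE and scatter-plot them.
    coords = TSNE(n_components=2, random_state=0).fit_transform(vectors)
    plt.figure(figsize=(12, 12))
    for word, (x, y) in zip(words, coords):
        plt.scatter(x, y, s=4)
        plt.annotate(word, xy=(x, y), fontsize=8)
    plt.show()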


Answer 1:


Yes, the Word2Vec class constructor has an iter argument:

iter = number of iterations (epochs) over the corpus. Default is 5.

Also, if you call the Word2Vec.train() method directly, you can pass an epochs argument that has the same meaning.

The actual number of training steps is derived from the number of epochs, but also depends on other parameters such as corpus size, window size, and batch size. If you're simply looking to improve the quality of the embedding vectors, increasing iter is the right approach.
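As an illustration, a minimal sketch of both routes, assuming gensim 3.x (in gensim 4.0 the parameters were renamed: size became vector_size and iter became epochs); the toy sentences here stand in for a real tokenized corpus:

from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (stand-in for the 1,200 emails).
sentences = [
    ['call', 'me', 'at', '555', '1234'],
    ['the', 'meeting', 'moved', 'to', 'friday'],
]

# Route 1: pass iter to the constructor (default is 5).
model = Word2Vec(sentences, size=100, window=5, min_count=1, iter=50)

# Route 2: build the vocabulary first, then call train() with epochs.
model = Word2Vec(size=100, window=5, min_count=1)
model.build_vocab(sentences)
model.train(sentences, total_examples=model.corpus_count, epochs=50)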



Source: https://stackoverflow.com/questions/47025885/gensim-equivalent-of-training-steps
