CBOW vs. skip-gram: why invert context and target words?

伪装坚强ぢ 2021-01-29 20:32

On this page, it is said that:

[...] skip-gram inverts contexts and targets, and tries to predict each context word from its target word [...]

2 Answers
  •  孤城傲影
    2021-01-29 21:30

    It has to do with what exactly you're calculating at any given point. The difference will become clearer if you start to look at models that incorporate a larger context for each probability calculation.

    In skip-gram, you're calculating the context word(s) from the word at the current position in the sentence; you're "skipping" the current word (and potentially a bit of the context) in your calculation. The result can be more than one word (but not if your context window is just one word long).

    In CBOW, you're calculating the current word from the context word(s), so you will only ever have one word as a result.
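    To make the direction of prediction concrete, here is a minimal Python sketch (not part of the original answer) that turns one sentence into the training examples each model would see. The sentence and the window size of 2 are arbitrary choices for illustration.

    ```python
    # Minimal sketch of how skip-gram and CBOW slice the same sentence
    # into training examples (illustrative only; window size is arbitrary).
    sentence = "the quick brown fox jumps".split()
    window = 2

    skipgram_pairs = []  # (input = target word, output = one context word)
    cbow_pairs = []      # (input = all context words, output = target word)

    for i, target in enumerate(sentence):
        context = [sentence[j]
                   for j in range(max(0, i - window), min(len(sentence), i + window + 1))
                   if j != i]
        # Skip-gram: one example per (target, context word) pair;
        # the target word predicts each of its context words separately.
        skipgram_pairs.extend((target, c) for c in context)
        # CBOW: one example per position; the whole context predicts the target.
        cbow_pairs.append((context, target))

    print(skipgram_pairs[:4])
    # [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ('quick', 'brown')]
    print(cbow_pairs[0])
    # (['quick', 'brown'], 'the')
    ```

    The asymmetry shows up directly in the examples: skip-gram produces several single-word outputs per position, while CBOW produces a single output word per position, predicted from the combined context.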
