NLP: Building (small) corpora, or “Where to get lots of not-too-specialized English-language text files?”

温柔的废话 2021-01-13 03:41

Does anyone have a suggestion for where to find archives or collections of everyday English text for use in a small corpus? I have been using Gutenberg Project books for a

7 answers
  •  猫巷女王i
    2021-01-13 03:53

    If you're willing to pay money, you should check out the data available at the Linguistic Data Consortium, such as the Penn Treebank.
