Export INCEpTION output to spaCy's training input format

こ雲淡風輕ζ submitted on 2021-01-28 21:18:42


I am using INCEpTION 0.11.0 (https://inception-project.github.io/) to annotate my training data, and I would like to use this data with spaCy in Python. INCEpTION offers a couple of export formats, but I am not sure which one is best suited for spaCy.

I could not find any documentation about converting these exported files to spaCy's format.

I could write a new script to do this conversion, but before doing that I was wondering whether someone has already solved this and can give some advice. Which export format should I choose to make the conversion to spaCy's format easiest?


Exporting your data as CoNLL-U is likely the most straightforward approach. spaCy can convert CoNLL-U documents to its expected format using the converter script: python -m spacy convert /path/to/input/doc.conllu /path/to/output/doc.jsonl -c conllu.

The converter supports CoNLL documents, but it isn't immediately obvious which CoNLL variants it accepts. You can find out by trying different values for the -c argument above (for example conllu, ner, or iob; the available choices depend on your spaCy version).
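
Before running the converter, it can help to check that the INCEpTION export really is well-formed CoNLL-U. The following is only a minimal sketch, assuming the standard ten-column, tab-separated CoNLL-U layout (ID, FORM, LEMMA, UPOS, ...). The function name read_conllu and the SAMPLE text are hypothetical and not part of spaCy or INCEpTION.

```python
from typing import Iterator, List, Tuple


def read_conllu(text: str) -> Iterator[List[Tuple[str, str]]]:
    """Yield each sentence as a list of (FORM, UPOS) pairs.

    Assumes standard CoNLL-U: 10 tab-separated columns per token line,
    '#' comment lines, and a blank line between sentences.
    """
    sentence: List[Tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ..." are skipped.
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        token_id, form, upos = cols[0], cols[1], cols[3]
        if "-" in token_id or "." in token_id:
            # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
            continue
        sentence.append((form, upos))
    if sentence:
        # Handle a final sentence that is not followed by a blank line.
        yield sentence


# Hypothetical two-token sample, for illustration only.
SAMPLE = (
    "# text = Hello world\n"
    "1\tHello\thello\tINTJ\t_\t_\t2\tdiscourse\t_\t_\n"
    "2\tworld\tworld\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    for sent in read_conllu(SAMPLE):
        print(sent)
```

If this raises a ValueError on your exported file, the export is probably a different CoNLL variant, and a different -c value (or a different INCEpTION export format) may be the better fit.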

