What do the BILOU tags mean in Named Entity Recognition?

前端未结


Title pretty much sums up the question. I've noticed that in some papers people have referred to a BILOU encoding scheme for NER as opposed to the typical BIO tagging sche


5 Answers

情深已故

2020-12-24 11:21
BIO is the same as BILOU except for the following points:
1. In BILOU, the last I tag of a multi-token entity (a run of I tags) is converted to L, e.g.
```
BIO - B-foo, I-foo, I-foo, O, O, O, B-bar, I-bar
BILOU - B-foo, I-foo, L-foo, O, O, O, B-bar, L-bar
```
2. In BILOU, any standalone B tag (a single-token entity) is converted to a U tag, e.g.
```
BIO - B-foo, O, O, O, B-bar
BILOU - U-foo, O, O, O, U-bar
```
The following shows the same tag sequence in both BIO and BILOU notation:
```
BIO - B-foo, I-foo, I-foo, O, O, B-bar, I-bar, O, B-bar, O
BILOU - B-foo, I-foo, L-foo, O, O, B-bar, L-bar, O, U-bar, O
```
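The two rules above can be sketched as a small conversion function. This is only a minimal sketch: the name `bio_to_bilou` is made up for illustration, and it assumes well-formed BIO input with tags written as `B-label`, `I-label` and `O`, as in the examples here.

```python
def bio_to_bilou(tags):
    """Convert a list of BIO tags (e.g. "B-foo", "I-foo", "O") to BILOU."""
    bilou = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bilou.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # The entity continues only if the next tag is an I- tag with the same label.
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == "I-" + label
        if prefix == "B":
            # Rule 2: a B with nothing after it becomes U.
            bilou.append(("B-" if continues else "U-") + label)
        else:
            # Rule 1: the last I of a run becomes L.
            bilou.append(("I-" if continues else "L-") + label)
    return bilou


bio = "B-foo I-foo I-foo O O B-bar I-bar O B-bar O".split()
print(", ".join(bio_to_bilou(bio)))
```

Run on the BIO line from the last example, this should print the BILOU line shown there.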

梦谈多话

2020-12-24 11:31

I would like to add some experience from comparing the BIO and BILOU schemes. My experiment used one dataset only and may not be representative.

My dataset contains around 35 thousand short utterances (2-10 tokens each), annotated with 11 different tags. In other words, there are 11 named entity types.

The features used include the word, left and right 2-grams, 1-5 character n-grams (except middle ones), shape features and so on. A few entities are gazetteer-backed as well.
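To make that feature list more concrete, here is a rough sketch of a per-token feature function. It is not the code from my experiment. The function and feature names are hypothetical, and it rests on three assumptions: "left and right 2-grams" means word bigrams with the previous and next token, "character n-grams except middle ones" means prefixes and suffixes of length 1-5, and the shape maps uppercase to `X`, lowercase to `x` and digits to `d`. Gazetteer features are left out.

```python
def word_shape(token):
    """Map characters to a coarse shape, e.g. "Paris2020" -> "Xxxxxdddd"."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)  # keep punctuation and other symbols as-is
    return "".join(shape)


def token_features(tokens, i):
    """Build a feature dict for tokens[i]; all feature names are hypothetical."""
    word = tokens[i]
    prev_word = tokens[i - 1] if i > 0 else "<s>"
    next_word = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    feats = {
        "word": word.lower(),
        "left_bigram": prev_word.lower() + " " + word.lower(),
        "right_bigram": word.lower() + " " + next_word.lower(),
        "shape": word_shape(word),
    }
    # Prefixes and suffixes of length 1-5 (character n-grams at the word edges).
    for n in range(1, min(5, len(word)) + 1):
        feats["prefix%d" % n] = word[:n]
        feats["suffix%d" % n] = word[-n:]
    return feats


print(token_features(["fly", "to", "Paris"], 2))
```

A dict of string-valued features like this is the kind of input that CRF toolkits generally accept for each token, but the exact format depends on the wrapper you use.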

I shuffled the dataset and split it 80/20 into training and test sets. This process was repeated 5 times, and for each entity I recorded precision, recall and F1-measure. Performance was measured at the entity level, not at the token level as in the Ratinov & Roth (2009) paper.
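For anyone unsure what entity-level scoring means: a predicted entity counts as correct only if its start, end and label all match a gold entity. Below is a minimal sketch, not my actual evaluation script. The names `bio_spans` and `entity_prf` are made up, it assumes BIO tags of the form `B-label` / `I-label` / `O`, and it pools all labels into one score (per-entity scores would filter the spans by label first).

```python
def bio_spans(tags):
    """Return the set of (start, end, label) spans in a BIO sequence; end is exclusive.

    Stray I- tags that do not continue an open entity are ignored.
    """
    spans = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_prf(gold_seqs, pred_seqs):
    """Entity-level precision, recall and F1 over parallel lists of tag sequences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_spans(gold), bio_spans(pred)
        tp += len(g & p)  # exact match on boundaries and label
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = [["B-foo", "I-foo", "O", "B-bar"]]
pred = [["B-foo", "O", "O", "B-bar"]]
print(entity_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

In this toy case three of the four tokens are tagged correctly, but only one of the two entities is, which is why the two ways of measuring can give quite different numbers.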

I trained the model with CRFSuite, using the L-BFGS solver with c1=0 and c2=1 (that is, L2 regularization only).

First of all, the test results across the 5 runs are very similar. This means there is little variability from run to run, which is good. Second, the BIO scheme performed very similarly to the BILOU scheme. If there is any difference at all, it is perhaps in the third or fourth digit after the decimal point of the precision, recall and F1 values.

Conclusion: in my experiment, the BILOU scheme was not better (but also not worse) than the BIO scheme.

北海茫月

2020-12-24 11:38
```
B = Beginning
I/M = Inside / Middle
L/E = Last / End
O = Outside
U/W = Unit-length / Whole
```
BILOU is the same as BMEWO.

There is also BMEWO+, which adds information about the class of the surrounding words to the Outside tokens (thus "O plus").

See details here https://lingpipe-blog.com/2009/10/14/coding-chunkers-as-taggers-io-bio-bmewo-and-bmewo/
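
Since the two schemes differ only in the letters used, converting between them is a matter of renaming prefixes. A minimal sketch, assuming tags are written as `prefix-label` like elsewhere in this thread (the helper name is made up, and BMEWO+ is not covered because it needs the neighbouring tags as well):

```python
# Prefix correspondence taken from the table above.
BILOU_TO_BMEWO = {"B": "B", "I": "M", "L": "E", "O": "O", "U": "W"}


def bilou_to_bmewo(tags):
    """Rename BILOU prefixes to their BMEWO equivalents, leaving labels unchanged."""
    out = []
    for tag in tags:
        prefix, sep, label = tag.partition("-")  # "O" gives ("O", "", "")
        out.append(BILOU_TO_BMEWO[prefix] + sep + label)
    return out


print(bilou_to_bmewo(["B-foo", "I-foo", "L-foo", "O", "U-bar"]))
# ['B-foo', 'M-foo', 'E-foo', 'O', 'W-bar']
```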

误落风尘

2020-12-24 11:39
Based on an issue and a patch in ClearTK, it seems like BILOU stands for "**B**eginning, **I**nside and **L**ast tokens of multi-token chunks, **U**nit-length chunks and **O**utside" (emphasis added). For instance, the chunking denoted by brackets
```
(foo foo foo) (bar) no no no (bar bar)
```
can be encoded with BILOU as
```
B-foo, I-foo, L-foo, U-bar, O, O, O, B-bar, L-bar
```
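Going the other way, from tags back to chunks, is just as mechanical. Here is a minimal sketch that assumes a well-formed BILOU sequence with `prefix-label` tags; `bilou_spans` is a made-up name and has nothing to do with ClearTK's own implementation.

```python
def bilou_spans(tags):
    """Return (start, end, label) chunks from a well-formed BILOU sequence; end is exclusive."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "U":
            spans.append((i, i + 1, label))  # unit-length chunk
        elif prefix == "B":
            start = i  # open a multi-token chunk
        elif prefix == "L" and start is not None:
            spans.append((start, i + 1, label))  # close it on the last token
            start = None
        # I and O tags need no action when the input is well-formed
    return spans


tags = ["B-foo", "I-foo", "L-foo", "U-bar", "O", "O", "O", "B-bar", "L-bar"]
print(bilou_spans(tags))
# [(0, 3, 'foo'), (3, 4, 'bar'), (7, 9, 'bar')]
```

The three spans correspond to the three bracketed chunks in the example above.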

悲哀的现实

2020-12-24 11:39
- B - 'begin'
- I - 'inside'
- L - 'last'
- O - 'outside/other'
- U - 'unigram'