Grammatical inference of regular expressions for a given finite list of representative strings?

一生所求 2020-12-03 01:30

I'm working on analyzing a large public dataset with lots of verbose human-readable strings that were clearly generated by some regular (in the formal language theory sense

2 Answers

萌比男神i (OP)

2020-12-03 01:55

The only thing I can suggest is to play around with NLTK (the Natural Language Toolkit for Python) a bit and see whether it can at least recognize recurring patterns.
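To make "recognize recurring patterns" a bit more concrete, here is a minimal sketch that uses only Python's standard library rather than NLTK. It collapses each string into a coarse regex template (runs of digits, ASCII letters, and whitespace are generalized, everything else is kept literally) and counts how often each template occurs. The token classes and the names `template` and `recurring_templates` are illustrative assumptions, not part of any toolkit mentioned here, and this is a crude heuristic rather than a real grammatical inference algorithm.

```python
import re
from collections import Counter

# Hypothetical token classes: digits, ASCII letters, whitespace, anything else.
_TOKEN = re.compile(
    r"(?P<num>\d+)|(?P<word>[A-Za-z]+)|(?P<space>\s+)|(?P<other>.)",
    re.DOTALL,
)

# Regex fragment emitted for each generalized class.
_GENERALIZED = {"num": r"\d+", "word": r"[A-Za-z]+", "space": r"\s+"}


def template(s):
    """Return a regex string that generalizes s token by token."""
    parts = []
    for m in _TOKEN.finditer(s):
        generalized = _GENERALIZED.get(m.lastgroup)
        # Characters outside the known classes are kept as escaped literals.
        parts.append(generalized if generalized else re.escape(m.group()))
    return "".join(parts)


def recurring_templates(strings):
    """Return (template, count) pairs, most frequent template first."""
    return Counter(template(s) for s in strings).most_common()


if __name__ == "__main__":
    samples = [
        "Error 404: page missing",
        "Error 500: server down",
        "Warning 12: disk full",
        "ok",
    ]
    for pattern, count in recurring_templates(samples):
        print(count, pattern)
```

Each template is itself a regular expression that matches every string it was derived from, so the most frequent templates are candidate patterns to refine by hand. Note that this sketch generalizes every word, so "Error" and "Warning" fall into the same template; keeping frequent words literal would be a natural refinement.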

Another thing you may look into is MALLET (a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, etc.).

Perl has something called LinkParser, but it seems to require you to provide a representation of the actual grammar (on the other hand, it comes with a large set of different models, so maybe it could be shoehorned into helping you sort your samples).

GATE may allow you to create examples from a subset of records in your corpus and possibly reverse engineer the grammar from those.

Finally, have a look at the CRAN repository for text-specific packages.
