How best to store data for a chatbot?

淺唱寂寞╮ submitted on 2019-12-03 04:00:13

Data Storage Choices: It Depends

Simple, non-learning bots: XML is fine

It looks like you already have a basic XML structure worked out. If you're just starting out, I'd say that's fine, especially for AI support-chat kinds of bots that match keywords to canned replies (if userMsg.contains('lega') then print('TOS & Copyright...')).
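To make the keyword-matching idea concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The <bot>/<rule keyword="..."> layout and the second rule are hypothetical; your existing XML structure will differ, but the load-then-match pattern is the same.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file: each <rule> pairs a keyword with a canned reply.
RULES_XML = """
<bot>
  <rule keyword="lega">TOS &amp; Copyright...</rule>
  <rule keyword="refund">Refunds are processed within 5 business days.</rule>
</bot>
"""

def load_rules(xml_text):
    """Parse the XML into a list of (keyword, reply) pairs, in file order."""
    root = ET.fromstring(xml_text.strip())
    return [(rule.get("keyword"), rule.text) for rule in root.iter("rule")]

def reply(user_msg, rules, default="Sorry, I don't understand."):
    """Return the reply of the first rule whose keyword occurs in the message."""
    msg = user_msg.lower()
    for keyword, text in rules:
        if keyword in msg:
            return text
    return default

rules = load_rules(RULES_XML)
print(reply("Where is your legal info?", rules))  # -> TOS & Copyright...
```

The whole rule list is parsed into memory at startup, which is exactly why this approach is fine for a small bot and becomes a problem once the file grows large.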

Of course, switching to any new format will take time and overhead.

Learning, Complicated bots: database!

If you're looking to do something much larger, especially if you have Cleverbot in mind, I think you're going to need a database. A file is just a file: once it becomes gigantic, keeping all of it available in memory is resource-intensive, whereas a database can fetch only the records a given message actually needs.
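As a rough illustration of the "fetch only what you need" point, here is a sketch using Python's built-in sqlite3 module. The table name, column names, and phrase-to-response schema are hypothetical simplifications; a real learning bot would need a richer schema. The demo uses an in-memory database, so pass a file path to open_store() to keep the data on disk.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a database with one table of phrase -> response pairs."""
    conn = sqlite3.connect(path)
    # PRIMARY KEY gives the phrase column an index, so lookups don't scan the table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses ("
        "phrase TEXT PRIMARY KEY, response TEXT NOT NULL)"
    )
    return conn

def learn(conn, phrase, response):
    """Store the response for a phrase, overwriting any previous one."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO responses (phrase, response) VALUES (?, ?)",
            (phrase.lower(), response),
        )

def lookup(conn, phrase):
    """Fetch a single row by key; returns None if the phrase is unknown."""
    row = conn.execute(
        "SELECT response FROM responses WHERE phrase = ?", (phrase.lower(),)
    ).fetchone()
    return row[0] if row else None

conn = open_store()
learn(conn, "Hello", "Hi there!")
print(lookup(conn, "hello"))  # -> Hi there!
```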

Why? English is Complicated

A while back I wrote a naive Bayes spam sorter. It took about 10,000 pieces of spam to "train" it to a 7% accuracy rate, which took about 6 hours and 1.5GB of RAM to hold the data in memory. That's a lot of data. English is very hard and can't really be broken down into simple rules like if 'pony' then 'saddle', so for a bot to "learn" the best responses, your database is going to become massive, and very quickly.
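For readers unfamiliar with the technique, below is a textbook multinomial naive Bayes classifier with add-one smoothing in plain Python. It is not the author's spam sorter, and the four training sentences are made-up toy data. It does show where the memory goes: the model keeps a word counter per class, so its size grows with the vocabulary of everything it has been trained on.

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial naive Bayes over word counts with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}         # label -> Counter of word frequencies
        self.doc_counts = Counter()   # label -> number of training documents
        self.vocab = set()            # every word seen in training

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # log P(label) + sum of log P(word | label); logs avoid float underflow.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

nb = NaiveBayes()
nb.train("win free money now", "spam")
nb.train("cheap pills free offer", "spam")
nb.train("meeting notes for monday", "ham")
nb.train("lunch on monday with the team", "ham")
print(nb.classify("free money offer"))  # -> spam
```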

I think we can model this information as an ontology, which lets you encode much richer information in terms of relations, attributes, levels, etc. There are standard formats such as RDF and OWL, and libraries for working with them exist for almost all languages.

Most importantly, managing the data becomes easy if you use an ontology editor. I would recommend Protege (http://protege.stanford.edu/); take a look at it.
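To show the idea behind RDF-style modelling, here is a toy in-memory store of (subject, predicate, object) triples in plain Python. It is not RDF or OWL itself; in practice you would use an RDF library (for Python, rdflib is one option) or an editor like Protege. The predicate names "isA" and "needs" are hypothetical, and is_a() is a simplified stand-in for the class-hierarchy reasoning an ontology gives you.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

class TripleStore:
    """A toy in-memory store of (subject, predicate, object) facts."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add(Triple(s, p, o))

    def query(self, s=None, p=None, o=None):
        """Return matching triples, sorted; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)
        )

    def is_a(self, s, cls):
        """Follow 'isA' links transitively, like a simple class hierarchy."""
        seen, frontier = set(), [s]
        while frontier:
            node = frontier.pop()
            if node == cls:
                return True
            if node in seen:
                continue
            seen.add(node)
            frontier.extend(t.obj for t in self.query(s=node, p="isA"))
        return False

kb = TripleStore()
kb.add("pony", "isA", "horse")
kb.add("horse", "isA", "animal")
kb.add("pony", "needs", "saddle")
print(kb.is_a("pony", "animal"))  # -> True
```

The fact that a pony is an animal is never stored directly; it is derived from the relations, which is the kind of richer structure a flat keyword file cannot express.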

You are probably looking at a database. Any serious NLP system would use one, unless it is a rule-based system that operates on a small set of rules. Think about whether you would want to write a piece of C code that handles a 5 MB XML file; I most definitely would not. Stanford University hosts a nice demo if you are interested in the linguistic side of it.

You could also try a graph database like the one Freebase uses to store relations between various entities. Basically, it is a graph of nodes and edges: each node has attributes and values for those attributes, edges carry attributes the same way nodes do, and an edge connecting two nodes defines a relationship between them.
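The node/edge/attribute model described above can be sketched in a few lines of Python. This is a toy in-memory illustration, not Freebase's actual storage or API, and the "alice"/"acme"/"works_at" data is hypothetical. A real graph database adds indexing, persistence, and a query language on top of the same basic shape.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    attrs: dict = field(default_factory=dict)

@dataclass
class Edge:
    src: str
    dst: str
    attrs: dict = field(default_factory=dict)  # e.g. {"type": "works_at"}

class PropertyGraph:
    """Nodes and directed edges, both carrying arbitrary attributes."""

    def __init__(self):
        self.nodes = {}   # node id -> Node
        self.edges = []   # list of Edge

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = Node(node_id, attrs)

    def add_edge(self, src, dst, **attrs):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(src, dst, attrs))

    def neighbors(self, node_id, **edge_filter):
        """Yield nodes one outgoing edge away whose edge attributes match edge_filter."""
        for e in self.edges:
            if e.src == node_id and all(e.attrs.get(k) == v for k, v in edge_filter.items()):
                yield self.nodes[e.dst]

g = PropertyGraph()
g.add_node("alice", kind="person", name="Alice")
g.add_node("acme", kind="company", name="Acme")
g.add_edge("alice", "acme", type="works_at", since=2015)
print([n.id for n in g.neighbors("alice", type="works_at")])  # -> ['acme']
```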
