Interesting NLP/machine-learning style project — analyzing privacy policies

别那么骄傲 2021-01-05 14:55

I wanted some input on an interesting problem I've been assigned. The task is to analyze hundreds, and eventually thousands, of privacy policies and identify core characte

3 answers
  •  南笙
    南笙 (original poster)
    2021-01-05 15:36

    A very interesting problem indeed!

    At a high level, what you want is summarization: each document has to be reduced to a few key phrases. This is far from a solved problem. A simpler approach is to search for keywords rather than key phrases. You can try something like LDA for topic modelling to find out what each document is about, and then look for topics that are present in all documents. I suspect what will come up is stuff to do with licenses, location, copyright, etc. MALLET has an easy-to-use implementation of LDA.
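    To make the simpler keyword route concrete, here is a minimal sketch using only the Python standard library. It is plain keyword counting, not LDA: it tallies how many documents each term appears in and keeps the terms shared across documents. The sample policies, the tiny stopword list, and the names `tokenize` and `shared_keywords` are all made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real project would use a fuller one).
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "we", "you", "your", "our",
    "is", "it", "in", "may", "with", "for", "not", "do", "this",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def shared_keywords(documents, min_docs=None):
    """Return (term, document_count) pairs for terms found in at least
    `min_docs` documents (default: all of them), most widespread first."""
    if min_docs is None:
        min_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document
    hits = [(term, n) for term, n in doc_freq.items() if n >= min_docs]
    # Sort by document count (descending), then alphabetically for stable output.
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Made-up example policies, purely for illustration.
policies = [
    "We collect your location and share it with advertising partners.",
    "Your location is stored; we do not share data with third parties.",
    "This policy covers copyright, location data, and license terms.",
]

print(shared_keywords(policies))              # terms present in every policy
print(shared_keywords(policies, min_docs=2))  # terms present in at least two
```

    On real policies you would probably want a proper stopword list and some stemming, and a topic model such as the LDA implementation in MALLET would replace the raw term counts with topics.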
