AI/Deep Learning approach to judge reference in paper?

Submitted by 删除回忆录丶 on 2019-12-11 05:08:08

Question


This is a real part of my job. To keep students' papers in line with a given format standard, I have to identify the type of each reference in a paper, split it into its items (author, title, journal name, year, etc.), and then advise the student on what to fix if an item is missing. It is tedious, and after several years of doing it by hand I am worn out, so I would like to do it programmatically.

A paper cites many kinds of references, for example journal papers, dissertations, books and so on, and each kind has its own format. On top of that, if I submit a paper to a different journal, I may have to meet a different format.

I am looking for algorithms (my previous attempt, "python re can't find this grouped name", used regex, but that will obviously fail once more diverse formats appear) that can:

  1. decide whether a reference is a journal paper, dissertation, book, ...
  2. separate the authors, paper title, book name, publisher, year, and so on. Note that the author, paper title, and book name may themselves contain punctuation
  3. if there are several authors, identify each name. We usually list at most 3 authors, so if more are found, the rest should be replaced by "et al"
  4. if some information is missing, give a hint so the reference can be completed
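As a starting point for items 1 and 3, here is a minimal sketch. It assumes the reference carries a bracketed type code such as [J]. Only [J] appears in the examples in this question; the [D] and [M] entries, and the names `TYPE_MARKERS`, `classify_by_marker` and `format_authors`, are assumptions made for illustration.

```python
import re

# Assumed mapping of bracketed type codes to reference types.
# Only [J] is shown in the question; [D] and [M] are assumptions.
TYPE_MARKERS = {"J": "journal", "D": "dissertation", "M": "book"}


def classify_by_marker(reference):
    """Return the reference type named by a bracketed code, or 'unknown'."""
    match = re.search(r"\[([A-Z])\]", reference)
    if match is None:
        return "unknown"
    return TYPE_MARKERS.get(match.group(1), "unknown")


def format_authors(authors, max_authors=3):
    """Join author names, keeping at most max_authors and adding 'et al'."""
    if len(authors) > max_authors:
        return ", ".join(authors[:max_authors]) + ", et al"
    return ", ".join(authors)


ref = ("Duan,C., X.Meng, C.Tu. How to make local image features more "
       "efficient and distinctive[J].IET Computer Vision,2008,2(3):178-189.")
print(classify_by_marker(ref))
print(format_authors(["A. One", "B. Two", "C. Three", "D. Four"]))
```

A reference without such a code (an MLA or IEEE entry, for instance) comes back as "unknown", which is exactly the case that needs something smarter than a lookup.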

The following are just a few examples of the formats used for journal papers. As they show, it is hard to parse them by simple string matching.

[example 1] Duan,C., X.Meng, C.Tu. How to make local image features more efficient and distinctive[J].IET Computer Vision,2008,2(3):178-189.

Here there are 3 authors ("Duan,C.", "X.Meng", "C.Tu") whose names are separated by commas; however, a comma is also used inside one person's name ("Duan,C."). So it is actually hard to pick out people's names with a regex.
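To make the comma problem concrete, here is a small sketch of one possible heuristic: split on every comma, then glue a piece that consists only of initials (such as "C.") back onto the piece before it. It assumes the author segment has already been cut out of the reference, which is itself a hard step; `split_authors` and `INITIALS_ONLY` are hypothetical names.

```python
import re

# A token made only of initials, e.g. "C." or "J. D."
INITIALS_ONLY = re.compile(r"^(?:[A-Z]\.\s?)+$")


def split_authors(author_segment):
    """Split an author segment on commas, re-attaching bare initials."""
    names = []
    for token in (t.strip() for t in author_segment.split(",")):
        if not token:
            continue
        if names and INITIALS_ONLY.match(token):
            # "Duan" followed by "C." is one person written "Last,Initials"
            names[-1] = f"{names[-1]},{token}"
        else:
            names.append(token)
    return names


segment = "Duan,C., X.Meng, C.Tu"
print(segment.split(","))      # naive split: 4 pieces for 3 people
print(split_authors(segment))  # heuristic split: 3 names
```

The heuristic is fragile: it cannot tell "Last, First" with a spelled-out first name from two separate people, so it only illustrates why a pure pattern-based approach runs out of steam.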

[example 2] Harris,C. & M.Stephens. A combined corner and edge detector[J]. Alvey Vision Conference,1988,5(7):147-151.

Here & separates the two names, but someone else might write the same authors as Harris,C., M.Stephens.
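One way to cope with varying separators is to rewrite them all as commas before doing anything else. The sketch below treats "&" and a standalone lowercase "and" (with an optional comma in front) as equivalent to a comma; `normalize_separators` is a hypothetical helper, not part of any library.

```python
import re

# "&" or the standalone word "and", with an optional preceding comma.
NAME_SEPARATOR = re.compile(r"\s*,?\s*(?:&|\band\b)\s*")


def normalize_separators(author_segment):
    """Rewrite '&' and 'and' between names as a plain ', ' separator."""
    return NAME_SEPARATOR.sub(", ", author_segment)


print(normalize_separators("Harris,C. & M.Stephens"))
print(normalize_separators("G. Liu, K. Y. Lee, and H. F. Jordan"))
```

Because the match is case-sensitive and requires word boundaries, surnames that merely contain "and" (Anderson, Sandra) are left alone.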

If we use MLA format (see "Autocite a Journal in MLA Format"):

[example 3] Fearon, James D., and David D. Laitin. "Ethnicity, Insurgency, and Civil War." American Political Science Review 97.01 (2003): 75. Print.

This one does not use [J], but since it follows the pattern Last, First M., and First M. Last. "Article Title." Journal Title Series Volume. Issue (Year Published): Page(s). Print., we can tell it is a journal paper. We can "translate" it into the other format:

Fearon, James D., David D. Laitin. Ethnicity, Insurgency, and Civil War[J]. American Political Science Review. 2003 97(01): 75

where 75 means that this paper has only one page, namely page 75.
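For this single MLA pattern the translation can be sketched with one regex using named groups. The pattern below is an assumption tuned to example 3 only (straight quotes, a Volume.Issue pair, a numeric page or page range), and the output follows the punctuation of example 1 rather than any official standard; `MLA_JOURNAL` and `mla_to_bracket` are hypothetical names.

```python
import re

# Assumed pattern for: Authors. "Title." Journal Vol.Issue (Year): Pages.
MLA_JOURNAL = re.compile(
    r'^(?P<authors>.+?)\. "(?P<title>.+?)\." '
    r"(?P<journal>.+?) (?P<volume>\d+)\.(?P<issue>\d+) "
    r"\((?P<year>\d{4})\): (?P<pages>[\d-]+)\."
)


def mla_to_bracket(reference):
    """Rewrite an MLA journal reference in the [J] style, or return None."""
    m = MLA_JOURNAL.match(reference)
    if m is None:
        return None
    authors = m["authors"].replace(", and ", ", ")
    return (f"{authors}. {m['title']}[J]. {m['journal']},"
            f"{m['year']},{m['volume']}({m['issue']}):{m['pages']}.")


mla = ('Fearon, James D., and David D. Laitin. "Ethnicity, Insurgency, '
       'and Civil War." American Political Science Review 97.01 (2003): '
       '75. Print.')
print(mla_to_bracket(mla))
```

Every additional citation style needs another hand-written pattern of this kind, which is the maintenance burden that motivates looking for a learning-based approach.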

As for IEEE format (see "IEEE Style: Journal Articles"), we find vol., pp. and so on, which the formats above do not require:

[example 4, IEEE format] G. Liu, K. Y. Lee, and H. F. Jordan, "TDM and TWDM de Bruijn networks and shufflenets for optical communications," IEEE Trans. Comp., vol. 46, pp. 695-701, June 1997.

If we read

[example 4, missing pages] G. Liu, K. Y. Lee, and H. F. Jordan, "TDM and TWDM de Bruijn networks and shufflenets for optical communications," IEEE Trans. Comp., vol. 46, June 1997.

we should tell the user to check the reference and complete it by adding the page numbers.
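The completeness check for this IEEE case can be sketched as a table of required fields, each with a pattern that must appear somewhere in the reference. The field list and patterns are assumptions based only on example 4 (for instance, the year pattern would be fooled by a four-digit page number), and `IEEE_FIELDS` and `missing_fields` are hypothetical names.

```python
import re

# Assumed required fields for an IEEE journal reference, per example 4.
IEEE_FIELDS = {
    "title": re.compile(r'"[^"]+"'),
    "volume": re.compile(r"\bvol\.\s*\d+"),
    "pages": re.compile(r"\bpp\.\s*\d+"),
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
}


def missing_fields(reference):
    """Return the names of required fields not found in the reference."""
    return [name for name, pattern in IEEE_FIELDS.items()
            if pattern.search(reference) is None]


complete = ('G. Liu, K. Y. Lee, and H. F. Jordan, "TDM and TWDM de Bruijn '
            'networks and shufflenets for optical communications," '
            'IEEE Trans. Comp., vol. 46, pp. 695-701, June 1997.')
incomplete = complete.replace(" pp. 695-701,", "")

for ref in (complete, incomplete):
    missing = missing_fields(ref)
    if missing:
        print("Please check and add:", ", ".join(missing))
    else:
        print("Reference looks complete.")
```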

Source: https://stackoverflow.com/questions/51093751/ai-deep-learning-approach-to-judge-reference-in-paper
