Convert BibTeX file to database entries using Python

Submitted by 夙愿已清 on 2021-01-20 15:59:10

Question


Given a BibTeX file, I need to add the respective fields (author, title, journal, etc.) to a table in a MySQL database (with a custom schema).

After some initial research, I found Bibutils, which I could use to convert a .bib file to XML. My initial idea was to convert the file to XML and then parse the XML in Python to populate a dictionary.

My main questions are:

  1. Is there a better way I could do this conversion?
  2. Is there a library that directly parses a BibTeX file and gives me the fields in Python?

(I did find bibliography.parsing, which uses Bibutils internally, but it has little documentation and I am finding it tough to get it to work.)


Answer 1:


Old question, but I am currently doing the same thing with the Pybtex library, which has a built-in parser:

from pybtex.database.input import bibtex

# Open and parse a BibTeX file
parser = bibtex.Parser()
bibdata = parser.parse_file("myrefs.bib")

# Loop through the individual references
for bib_id, entry in bibdata.entries.items():
    b = entry.fields
    try:
        # Change these lines to create a SQL insert
        print(b["title"])
        print(b["journal"])
        print(b["year"])
        # Deal with multiple authors (name parts are lists of strings)
        for author in entry.persons["author"]:
            print(" ".join(author.first_names), " ".join(author.last_names))
    # A field may not exist for a reference
    except KeyError:
        continue
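The comment about creating a SQL insert could be fleshed out roughly as follows. This is only a sketch under stated assumptions: it uses the standard-library sqlite3 module as a stand-in for MySQL so that it runs without a server, and the table names (refs, authors), the column names, and the insert_reference helper are all hypothetical. With a MySQL driver the placeholder style is typically %s rather than ?.

```python
import sqlite3


def insert_reference(conn, ref):
    """Insert one reference and its authors; return the new row id.

    `ref` is a plain dict (hypothetical shape) holding the values that
    the parsing loop would otherwise print.
    """
    cur = conn.execute(
        "INSERT INTO refs (bib_key, title, journal, year) VALUES (?, ?, ?, ?)",
        # .get() yields None (SQL NULL) when a field is missing
        (ref["key"], ref.get("title"), ref.get("journal"), ref.get("year")),
    )
    ref_id = cur.lastrowid
    # One row per author, linked back to the reference
    conn.executemany(
        "INSERT INTO authors (ref_id, first_name, last_name) VALUES (?, ?, ?)",
        [(ref_id, first, last) for first, last in ref.get("authors", [])],
    )
    return ref_id


# In-memory database with an illustrative schema (not the asker's schema)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE refs (id INTEGER PRIMARY KEY, bib_key TEXT, "
    "title TEXT, journal TEXT, year TEXT)"
)
conn.execute("CREATE TABLE authors (ref_id INTEGER, first_name TEXT, last_name TEXT)")

# Placeholder data, not a real publication
example_ref = {
    "key": "doe2020",
    "title": "Example Title",
    "year": "2020",  # no journal on purpose
    "authors": [("Jane", "Doe"), ("John", "Roe")],
}
new_id = insert_reference(conn, example_ref)
conn.commit()
```

Passing values as query parameters instead of formatting them into the SQL string avoids quoting problems with titles that contain apostrophes or braces.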



Answer 2:


Converting to XML is a fine idea.

XML exists as an application-independent data format, so that you can parse it with readily-available libraries; using it as an intermediary has no particular drawbacks. In fact, you can usually import XML into a database without even going through a programming language such as Python (although the amount of Python you'd have to write for a task like this is trivial).

As far as I know, there is no direct, mature BibTeX reader for Python.
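To illustrate the XML route, here is a minimal sketch that parses a small MODS-style document with the standard library's xml.etree.ElementTree and builds one dictionary per record. The sample XML is hand-written for this example and is an assumption: the exact elements that Bibutils emits for a given .bib file may differ, so the paths below would need adjusting to the real output. The mods_to_dicts helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample with placeholder data; real converter output may differ
SAMPLE_MODS = """<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="ref1">
    <titleInfo><title>Example Title</title></titleInfo>
    <name type="personal">
      <namePart type="given">Jane</namePart>
      <namePart type="family">Doe</namePart>
    </name>
    <originInfo><dateIssued>2020</dateIssued></originInfo>
  </mods>
</modsCollection>"""

# Prefix-to-URI map so the search paths stay readable
NS = {"m": "http://www.loc.gov/mods/v3"}


def mods_to_dicts(xml_text):
    """Return a list of dicts, one per <mods> record in the collection."""
    root = ET.fromstring(xml_text)
    records = []
    for mods in root.findall("m:mods", NS):
        authors = []
        for name in mods.findall("m:name", NS):
            given = name.findtext("m:namePart[@type='given']", default="", namespaces=NS)
            family = name.findtext("m:namePart[@type='family']", default="", namespaces=NS)
            authors.append((given + " " + family).strip())
        records.append({
            "id": mods.get("ID"),
            # findtext returns None when the element is absent
            "title": mods.findtext("m:titleInfo/m:title", namespaces=NS),
            "year": mods.findtext("m:originInfo/m:dateIssued", namespaces=NS),
            "authors": authors,
        })
    return records


records = mods_to_dicts(SAMPLE_MODS)
```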




Answer 3:


You can also use Python BibtexParser: https://github.com/sciunto/python-bibtexparser

Documentation: https://bibtexparser.readthedocs.org

It's very straightforward (I use it in production).

For the record, I am not the developer of this library.




Answer 4:


You could use the Perl package Bib2ML (aka. Bib2HTML). It contains a bib2sql tool that generates a SQL database from a BibTeX database, with the following schema:

Alternative tools: bibsql and bibtosql.

Then you can feed it to your schema by writing some SQL conversion queries.




Answer 5:


My workaround is to use bibtexparser to export the relevant fields to a CSV file:

import bibtexparser
import pandas as pd

with open("../../bib/small.bib") as bibtex_file:
    bib_database = bibtexparser.load(bibtex_file)
    
df = pd.DataFrame(bib_database.entries)
selection = df[['doi', 'number']]
selection.to_csv('temp.csv', index=False)

Then write the CSV to a table in the database and delete temp.csv.

This avoids some complications I found with pybtex.
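The last step of this workaround (load the CSV into a table, then delete the file) might look like the sketch below. It assumes a CSV with doi and number columns, as in the export above, and again uses sqlite3 only as a stand-in for the real database. The papers table and the load_csv_into_table helper are hypothetical names, and the demo writes its own small temp.csv with placeholder values so that it runs on its own.

```python
import csv
import os
import sqlite3
import tempfile


def load_csv_into_table(conn, csv_path):
    """Insert every CSV row into `papers`, delete the file, return the row count."""
    with open(csv_path, newline="") as f:
        rows = [(r["doi"], r["number"]) for r in csv.DictReader(f)]
    conn.executemany("INSERT INTO papers (doi, number) VALUES (?, ?)", rows)
    conn.commit()
    # Remove the temporary file only after the insert has been committed
    os.remove(csv_path)
    return len(rows)


# Demo setup: write a small CSV with placeholder values
csv_path = os.path.join(tempfile.mkdtemp(), "temp.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["doi", "number"])
    writer.writerow(["10.0000/example.1", "3"])
    writer.writerow(["10.0000/example.2", ""])

csv_conn = sqlite3.connect(":memory:")
csv_conn.execute("CREATE TABLE papers (doi TEXT, number TEXT)")
loaded = load_csv_into_table(csv_conn, csv_path)
```

If the DataFrame is already in memory, writing it to the database directly would skip the temporary file, but the CSV step is kept here because it is what the answer describes.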



Source: https://stackoverflow.com/questions/9235853/convert-bibtex-file-to-database-entries-using-python
