How to parse .ttl files with RDFLib?

一个人的身影 2021-02-07 22:14

I have a file in .ttl form. It has 4 attributes/columns containing quadruples of the following form:

  1. (id, student_name, student_address, student
3 answers
  •  广开言路
    2021-02-07 22:29

    Turtle is a subset of Notation 3, so rdflib should be able to parse the file with format='turtle' (or format='n3'). However, the ids in your sample are specified in comments (#...), and rdflib, like other RDF parsers, discards comments while parsing, so the ids would be lost. If the input format is as simple as shown in your example, you could parse it manually instead:

    import re
    from collections import namedtuple
    from itertools import takewhile
    
    Entry = namedtuple('Entry', 'id name address phone')
    
    def get_entries(path):
        with open(path) as file:
            # an entry starts with `#@` line and ends with a blank line
            for line in file:
                if line.startswith('#@'):
                    buf = [line]
                    buf.extend(takewhile(str.strip, file)) # read until blank line
                    yield Entry(*re.findall(r'<([^>]+)>', ''.join(buf)))
    
    print("\n".join(map(str, get_entries('example.ttl'))))
    

    Output:

    Entry(id='id1', name='Alice', address='USA', phone='12345')
    Entry(id='id1', name='Jane', address='France', phone='78900')
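
To try the manual approach without the original file, here is a minimal sketch that runs the same logic on an in-memory sample. The sample layout (a `#@ <id>` comment line, then a line of `<...>` values, then a blank line) is an assumption inferred from the parser above, since the example in the question is cut off. `parse_entries` is a hypothetical variant of `get_entries` that accepts any iterable of lines and fails loudly when a block does not contain exactly four values.

```python
import io
import re
from collections import namedtuple
from itertools import takewhile

Entry = namedtuple('Entry', 'id name address phone')

# Assumed sample layout; the real file may differ.
SAMPLE = """\
#@ <id1>
<Alice> <USA> <12345>

#@ <id1>
<Jane> <France> <78900>
"""

def parse_entries(lines):
    """Yield one Entry per block that starts with a `#@` line."""
    lines = iter(lines)  # share one iterator between the loop and takewhile
    for line in lines:
        if line.startswith('#@'):
            buf = [line]
            buf.extend(takewhile(str.strip, lines))  # read until blank line
            values = re.findall(r'<([^>]+)>', ''.join(buf))
            if len(values) != 4:
                raise ValueError('expected 4 values, got %r' % (values,))
            yield Entry(*values)

entries = list(parse_entries(io.StringIO(SAMPLE)))
for entry in entries:
    print(entry)
```

Passing an open file object instead of `io.StringIO(SAMPLE)` should behave the same way, because both yield lines one at a time.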
    

    To save entries to a db:

    import sqlite3
    
    with sqlite3.connect('example.db') as conn:
        conn.execute('''CREATE TABLE IF NOT EXISTS entries
                 (id text, name text, address text, phone text)''')
        conn.executemany('INSERT INTO entries VALUES (?,?,?,?)',
                         get_entries('example.ttl'))
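
One detail worth knowing: using a sqlite3 connection as a context manager commits (or rolls back) the transaction, but it does not close the connection. The sketch below wraps the connection in `contextlib.closing` to close it as well. It uses an in-memory database and hard-coded rows taken from the output above, purely so that it runs on its own.

```python
import sqlite3
from contextlib import closing

# Rows copied from the sample output; in practice use get_entries(...).
rows = [
    ('id1', 'Alice', 'USA', '12345'),
    ('id1', 'Jane', 'France', '78900'),
]

with closing(sqlite3.connect(':memory:')) as conn:  # closes on exit
    with conn:  # commits on success, rolls back on an exception
        conn.execute('''CREATE TABLE IF NOT EXISTS entries
                 (id text, name text, address text, phone text)''')
        conn.executemany('INSERT INTO entries VALUES (?,?,?,?)', rows)
    count = conn.execute('SELECT COUNT(*) FROM entries').fetchone()[0]

print(count)
```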
    

    To group by id if you need some postprocessing in Python:

    import sqlite3
    from itertools import groupby
    from operator import itemgetter
    
    with sqlite3.connect('example.db') as c:
        rows = c.execute('SELECT * FROM entries ORDER BY id LIMIT ?', (10,))
        # groupby only merges adjacent rows, hence the ORDER BY id above
        for entry_id, group in groupby(rows, key=itemgetter(0)):
            print("%s:\n\t%s" % (entry_id, "\n\t".join(map(str, group))))
    

    Output:

    id1:
        ('id1', 'Alice', 'USA', '12345')
        ('id1', 'Jane', 'France', '78900')
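
Note that `itertools.groupby` starts a new group every time the key changes, so it only works as expected on rows already sorted by id. If the rows might arrive unsorted, collecting them into a dict is a possible alternative. A small sketch with made-up rows (the `id2` row is invented for illustration and is not part of the original data):

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Made-up, deliberately unsorted rows; the 'id2' row is hypothetical.
rows = [
    ('id1', 'Alice', 'USA', '12345'),
    ('id2', 'Bob', 'UK', '55555'),
    ('id1', 'Jane', 'France', '78900'),
]

# On unsorted input, groupby yields 'id1' twice.
naive_keys = [key for key, _ in groupby(rows, key=itemgetter(0))]

# A dict of lists groups correctly regardless of row order.
grouped = defaultdict(list)
for row in rows:
    grouped[row[0]].append(row)

print(naive_keys)
for entry_id, group in grouped.items():
    print("%s:\n\t%s" % (entry_id, "\n\t".join(map(str, group))))
```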
    
