I have a number of very large text files which I need to process, the largest being about 60GB.
Each line has 54 characters in seven fields and I want to remove the
Those seem like very large files... Why are they so large? What processing are you doing per line? Why not use a database, with some map-reduce calls (if appropriate) or simple operations on the data? The point of a database is to abstract away the handling and management of large amounts of data that can't all fit in memory.
You can start to play with the idea using sqlite3, which just stores each database in a flat file. If you find the idea useful, then upgrade to something a little more robust and versatile, like postgresql.
Create a database:
import sqlite3
conn = sqlite3.connect('pts.db')
c = conn.cursor()
Create a table:
c.execute('''CREATE TABLE ptsdata (filename, line, x, y, z)''')
Then use one of the algorithms above to insert all the lines and points into the database with something like
c.execute("INSERT INTO ptsdata VALUES (?, ?, ?, ?, ?)", (filename, lineNumber, x, y, z))
Now how you use it depends on what you want to do. For example, to work with all the points in a given file, run a query:
c.execute("SELECT line, x, y, z FROM ptsdata WHERE filename = ? ORDER BY line ASC", ('file.txt',))
And get n lines at a time from this query with
c.fetchmany(size=n)
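For example, a simple loop that processes the result set in chunks of n rows, so only n rows are ever in memory at once (the batch size and the per-row work below are placeholders):
# Process the query results batch by batch.
n = 10000
while True:
    batch = c.fetchmany(size=n)
    if not batch:
        break
    for line, x, y, z in batch:
        pass  # do your per-line processing here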
I'm sure there is a better wrapper for the SQL statements somewhere, but you get the idea.
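One small step in that direction that the standard library already gives you: a sqlite3 connection can be used as a context manager, so a batch of statements is committed (or rolled back on error) as a single transaction, and the ? placeholders above keep you from building SQL strings by hand. A sketch with made-up values:
# Commit (or roll back) the inserts as one transaction.
with conn:
    conn.execute("INSERT INTO ptsdata VALUES (?, ?, ?, ?, ?)",
                 ('points.txt', 1, 1.0, 2.0, 3.0))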