How do quickly search through a .csv file in Python

无人久伴 提交于 2019-12-01 05:04:31

问题


I'm reading a 6 million entry .csv file with Python, and I want to be able to search through this file for a particular entry.

Are there any tricks to search the entire file? Should you read the whole thing into a dictionary or should you perform a search every time? I tried loading it into a dictionary but that took ages so I'm currently searching through the whole file every time which seems wasteful.

Could I possibly utilize that the list is alphabetically ordered? (e.g. if the search word starts with "b" I only search from the line that includes the first word beginning with "b" to the line that includes the last word beginning with "b")

I'm using import csv.

(a side question: it is possible to make csv go to a specific line in the file? I want to make the program start at a random line)

Edit: I already have a copy of the list as an .sql file as well, how could I implement that into Python?


回答1:


If the csv file isn't changing, load in it into a database, where searching is fast and easy. If you're not familiar with SQL, you'll need to brush up on that though.

Here is a rough example of inserting from a csv into a sqlite table. Example csv is ';' delimited, and has 2 columns.

import csv
import sqlite3

con = sqlite3.Connection('newdb.sqlite')
cur = con.cursor()
cur.execute('CREATE TABLE "stuff" ("one" varchar(12), "two" varchar(12));')

f = open('stuff.csv')
csv_reader = csv.reader(f, delimiter=';')

cur.executemany('INSERT INTO stuff VALUES (?, ?)', csv_reader)
cur.close()
con.commit()
con.close()
f.close()



回答2:


you can use memory mapping for really big files

import mmap,os,re
reportFile = open( "big_file" )
length = os.fstat( reportFile.fileno() ).st_size
try:
    mapping = mmap.mmap( reportFile.fileno(), length, mmap.MAP_PRIVATE, mmap.PROT_READ )
except AttributeError:
    mapping = mmap.mmap( reportFile.fileno(), 0, None, mmap.ACCESS_READ )
data = mapping.read(length)
pat =re.compile("b.+",re.M|re.DOTALL) # compile your pattern here.
print pat.findall(data)



回答3:


Well, if your words aren't too big (meaning they'll fit in memory), then here is a simple way to do this (I'm assuming that they are all words).

from bisect import bisect_left

f = open('myfile.csv')

words = []
for line in f:
    words.extend(line.strip().split(','))

wordtofind = 'bacon'
ind = bisect_left(words,wordtofind)
if words[ind] == wordtofind:
    print '%s was found!' % wordtofind

It might take a minute to load in all of the values from the file. This uses binary search to find your words. In this case I was looking for bacon (who wouldn't look for bacon?). If there are repeated values you also might want to use bisect_right to find the the index of 1 beyond the rightmost element that equals the value you are searching for. You can still use this if you have key:value pairs. You'll just have to make each object in your words list be a list of [key, value].

Side Note

I don't think that you can really go from line to line in a csv file very easily. You see, these files are basically just long strings with \n characters that indicate new lines.




回答4:


You can't go directly to a specific line in the file because lines are variable-length, so the only way to know when line #n starts is to search for the first n newlines. And it's not enough to just look for '\n' characters because CSV allows newlines in table cells, so you really do have to parse the file anyway.




回答5:


my idea is to use python zodb module to store dictionaty type data and then create new csv file using that data structure. do all your operation at that time.




回答6:


There is a fairly simple way to do this.Depending on how many columns you want python to print then you may need to add or remove some of the print lines.

import csv
search=input('Enter string to search: ')
stock=open ('FileName.csv', 'wb')
reader=csv.reader(FileName)
for row in reader:
    for field in row:
        if field==code:
            print('Record found! \n')
            print(row[0])
            print(row[1])
            print(row[2])

I hope this managed to help.



来源:https://stackoverflow.com/questions/2299454/how-do-quickly-search-through-a-csv-file-in-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!