How does unicodecsv.DictReader represent a csv file

Submitted by 我只是一个虾纸丫 on 2019-12-13 07:41:35

Question


I'm currently going through the Udacity course on data analysis in Python, and we've been using the unicodecsv library.

More specifically, we've written the following code, which reads a CSV file and converts it into a list:

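import unicodecsv

def read_csv(filename):
    # unicodecsv works on bytes, so the file is opened in binary mode ('rb').
    with open(filename, 'rb') as f:
        reader = unicodecsv.DictReader(f)
        return list(reader)  # one dict per row, keyed by the header row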
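To see what read_csv returns without the course files, here is a minimal Python 3 sketch. It assumes the standard library csv.DictReader as a stand-in for unicodecsv.DictReader (in Python 3, csv already yields Unicode str values) and reads a small in-memory CSV; the helper name read_csv_text and the column names and values are made up for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for a course CSV file.
SAMPLE_CSV = (
    "account_key,status,cancel_date\n"
    "448,canceled,2015-01-14\n"
    "700,current,\n"
)

def read_csv_text(text):
    """Parse CSV text into a list of dicts, one dict per data row."""
    # DictReader uses the header row as the keys of every row dict.
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)

enrollments = read_csv_text(SAMPLE_CSV)
print(type(enrollments).__name__, len(enrollments))  # list 2
print(enrollments[0])   # the first row, as a dict keyed by column name
```

The outer container is a plain list and each element is a dictionary; an empty field such as the second row's cancel_date comes back as the empty string, not None.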

In order to get my head around this, I'm trying to figure out how the data is represented in the dictionary and the list, and I'm very confused. Can someone please explain it to me?

For example, one thing I don't understand is why the following throws an error:

enrollments['cancel_date']

While the following works fine:

for enrollment in enrollments:
    enrollment['cancel_date'] = parse_date(enrollment['cancel_date'])

Hopefully this question makes sense. I'm just having trouble visualizing how all of this is represented.

Any help would be appreciated. Thanks.


Answer 1:


I also landed here while having trouble with the course and found this question unanswered. You have probably figured it out by now, but I'm answering anyway so that someone else might find this helpful.

As we all know, dictionaries are accessed like

dictionary_name['key']

so you might expect enrollments['cancel_date'] to work as well.

But if you print it:

print(enrollments)

you will see this structure:

[{u'status': u'canceled', u'is_udacity': u'True', ...}, {}, ... {}]

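Look at the brackets: the outer square brackets mean a list, and each pair of braces inside it is a dictionary. You might argue that it is a list of lists, so try indexing it that way:

print(enrollments[0][0])

You'll get a KeyError, because enrollments[0] is a dictionary and 0 is not one of its keys.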
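Both failure modes can be reproduced with a short Python 3 sketch. It assumes the standard library csv.DictReader as a stand-in for unicodecsv, uses a made-up one-row CSV, and introduces a hypothetical helper error_name only to report which exception is raised: indexing the list with a string gives TypeError, and indexing a row dictionary with an integer gives KeyError.

```python
import csv
import io

# Made-up CSV content for illustration only.
rows = list(csv.DictReader(io.StringIO("status,cancel_date\ncanceled,2015-01-14\n")))

def error_name(action):
    """Run action() and return the name of the exception it raises, or None."""
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return None

# A list only accepts integer (or slice) indices.
print(error_name(lambda: rows["cancel_date"]))   # TypeError
# A row is a dict keyed by column name, so the integer 0 is not a key.
print(error_name(lambda: rows[0][0]))            # KeyError
# Integer into the list, then column name into the dict: no error.
print(error_name(lambda: rows[0]["cancel_date"]))  # None
```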

So it is a collection of dictionaries, one per row of the CSV file. To reach any of them, index into the list with enrollments[n].

That gives you a dictionary, so you can now use the column name as a key:

print(enrollments[0]['cancel_date'])
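The two-step access (position into the list, then column name into the row) looks like this in a small Python 3 sketch. It assumes the standard library csv.DictReader in place of unicodecsv and uses hypothetical data. If what you actually want is one column across all rows, which is what enrollments['cancel_date'] might seem to promise, you have to collect it from each dictionary yourself.

```python
import csv
import io

# Hypothetical three-row CSV used only to illustrate the access pattern.
TEXT = (
    "account_key,cancel_date\n"
    "448,2015-01-14\n"
    "448,2014-11-10\n"
    "700,\n"
)
enrollments = list(csv.DictReader(io.StringIO(TEXT)))

# Step 1: pick a row (a dict) by its position in the list.
first_row = enrollments[0]
# Step 2: look up a column in that row by its header name.
print(first_row["cancel_date"])   # 2015-01-14

# To get one column across all rows, pull the key out of every dict.
cancel_dates = [row["cancel_date"] for row in enrollments]
print(cancel_dates)               # ['2015-01-14', '2014-11-10', '']
```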

Now, coming to your loop:

for enrollment in enrollments:
    enrollment['cancel_date'] = parse_date(enrollment['cancel_date'])

Here enrollment is the loop variable: on each iteration it is bound to the next element of enrollments, that is, enrollments[0], enrollments[1], ..., enrollments[n-1].

So each time through the loop, enrollment holds one dictionary from enrollments, which is why enrollment['cancel_date'] works while enrollments['cancel_date'] does not: a list can only be indexed by integers (or slices), so indexing it with a string raises a TypeError.
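The loop updates the rows in place because enrollment refers to the very same dictionary object stored in the list. A minimal Python 3 sketch, assuming the standard library csv module instead of unicodecsv, made-up data, and a hypothetical parse_date that expects dates as YYYY-MM-DD and treats a blank field as missing:

```python
import csv
import io
from datetime import datetime

def parse_date(date):
    """Hypothetical helper: 'YYYY-MM-DD' -> datetime, '' -> None."""
    if date == "":
        return None
    return datetime.strptime(date, "%Y-%m-%d")

# Made-up data: one canceled enrollment, one still active.
TEXT = "account_key,cancel_date\n448,2015-01-14\n700,\n"
enrollments = list(csv.DictReader(io.StringIO(TEXT)))

for enrollment in enrollments:
    # enrollment is the same dict object that is stored in the list,
    # so assigning to one of its keys updates that row of the list.
    enrollment["cancel_date"] = parse_date(enrollment["cancel_date"])

print(enrollments[0]["cancel_date"])   # 2015-01-14 00:00:00
print(enrollments[1]["cancel_date"])   # None
```

Note that rebinding the loop variable itself (enrollment = something_else) would not change the list; only mutating the dictionary it refers to does.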

Lastly, I want to add one more thing, which is why I came to this thread in the first place.

What does the "u" in u'...' mean? For example: u'cancel_date': u'11-02-19'.

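The u prefix means the value is a Python 2 unicode object rather than a byte string (str). It is not part of the string's contents; it is just how Python 2 displays unicode values. Unicode itself is not a library but a standard that assigns a code point to the characters and symbols of virtually all of the world's writing systems.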
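The u'...' output in this answer comes from Python 2. As a rough Python 3 analogue (an assumption about the reader's setup, not the course environment), the same text-versus-bytes distinction shows up as str versus bytes: every str is already Unicode text, and raw file contents become text only after decoding.

```python
# Python 3: every str is Unicode text; the u prefix is accepted but redundant.
text = u"caf\u00e9"
print(text == "caf\u00e9")        # True
print(len(text))                  # 4

# Bytes are what a file opened in binary mode ('rb') actually contains.
raw = text.encode("utf-8")
print(raw)                        # b'caf\xc3\xa9'
print(len(raw))                   # 5

# Decoding the bytes recovers the original text, roughly what
# unicodecsv does for each field it reads.
print(raw.decode("utf-8") == text)  # True
```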

This happens because unicodecsv does not try to detect or convert the type of each field in the CSV file. It decodes every value into a Unicode string (using UTF-8 by default) so that no characters are lost. That's why you and Caroline defined and used parse_date() and other functions to convert those Unicode strings into the desired data types. This is all part of the data wrangling process.
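A Python 3 sketch of that wrangling step, assuming the standard library in place of unicodecsv: the raw bytes are decoded as UTF-8, every field comes back as a string, and converting to int or bool is left to you. The column names, the sample values, and the helper parse_maybe_int are hypothetical.

```python
import csv
import io

# Made-up file contents as raw UTF-8 bytes, like a file opened in 'rb' mode.
RAW = b"account_key,days_to_cancel,is_udacity\n448,65,True\n700,,False\n"

def parse_maybe_int(value):
    """Hypothetical helper: '' -> None, otherwise int."""
    return None if value == "" else int(value)

# TextIOWrapper decodes the bytes to str; DictReader then splits the rows.
# The csv docs recommend newline="" for file objects passed to the reader.
with io.TextIOWrapper(io.BytesIO(RAW), encoding="utf-8", newline="") as f:
    rows = list(csv.DictReader(f))

# Before conversion, numbers are still text.
print(rows[0]["days_to_cancel"], type(rows[0]["days_to_cancel"]).__name__)  # 65 str

for row in rows:
    row["days_to_cancel"] = parse_maybe_int(row["days_to_cancel"])
    # bool("False") would be True, so compare the text explicitly.
    row["is_udacity"] = row["is_udacity"] == "True"

print(rows[0]["days_to_cancel"] + 1)   # 66
print(rows[1]["is_udacity"])           # False
```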



Source: https://stackoverflow.com/questions/43302791/how-does-unicodecsv-dictreader-represent-a-csv-file
