问题
I'm currently going through the Udacity course on data analysis in python, and we've been using the unicodecsv library.
More specifically we've written the following code which reads a csv file and converts it into a list. Here is the code:
def read_csv(filename):
with open(filename,'rb')as f:
reader = unicodecsv.DictReader(f)
return list(reader)
In order to get my head around this, I'm trying to figure out how the data is represented in the dictionary and the list, and I'm very confused. Can someone please explain it to me.
For example, one thing I don't understand is why the following throws an error
enrollment['cancel_date']
While the following works fine:
for enrollment in enrollments:
enrollments['cancel_date'] = parse_date(enrollment['cancel_date'])
Hopefully this question makes sense. I'm just having trouble visualizing how all of this is represented.
Any help would be appreciated. Thanks.
回答1:
I too landed up here for some troubles related to the course and found this unanswered. However I think you already managed it. Anyway answering here so that someone else might find this helpful.
Like we all know, dictionaries can be accessed like
dictionary_name['key']
and likewise
enrollments['cancel_date']
should also work.
But if you do something like
print enrollments
you will see the structure
[{u'status': u'canceled', u'is_udacity': u'True', ...}, {}, ... {}]
If you notice the brackets, it's like a list of dictionaries
. You may argue it is a list of list
. Try it.
print enrollments[0][0]
You'll get an error! KeyError
.
So, it's like a collection of dictionaries. How to access them? Zoom down to any dictionary (rather rows of the csv) by enrollments[n]
.
Now you have a dictionary. You can now use freely the key
.
print enrollments[0]['cancel_date']
Now coming to your loop,
for enrollment in enrollments:
enrollment['cancel_date'] = parse_date(enrollment['cancel_date'])
What this is doing is the enrollment
is the dummy variable capturing each of the iterable element enrollments
like enrollments[1], enrollments[2] ... enrollments[n]
.
So every-time enrollment
is having a dictionary from enrollments
and so enrollment['cancel_date']
works over enrollments['cancel_date']
.
Lastly I want to add a little more thing which is why I came to the thread.
What is the meaning of "u" in u'..' ? Ex: u'cancel_date' = u'11-02-19'.
The answer is this means the string is encoded as an Unicode
. It is not part of the string, it is python notation. Unicode is a library that contains the characters and symbol for all of the world's languages.
This mainly happens because the unicodecsv
package does not take the headache of tracking and converting each item in the csv file. It reads them as Unicode to preserve all characters. Now that's why Caroline and you defined and used parse_date()
and other functions to convert the Unicode strings to the desired datatype. This is all a part of the Data Wrangling process.
来源:https://stackoverflow.com/questions/43302791/how-does-unicodecsv-dictreader-represent-a-csv-file