Only first character of unicode strings getting written to csv

Question

In a nutshell, my script cannot write complete unicode strings (retrieved from a database) to a CSV file: only the first character of each string is written. For example:

U,1423.0,831,1,139

The output should be:

University of Washington Students,1423.0,831,1,139

Some background: I'm connecting to an MSSQL database using pyodbc. My ODBC configuration file is set up for unicode, and I connect to the database as follows:

import pyodbc as p  # assumption: "p" is the pyodbc module
p.connect("DSN=myserver;UID=username;PWD=password;DATABASE=mydb;CHARSET=utf-8")

I can retrieve data without any problem; the issue arises when I try to save the query results to the CSV file. I've tried csv.writer, the UnicodeWriter recipe from the official docs and, most recently, the unicodecsv module from GitHub. All three give the same result.

The weird thing is that I can print the strings in the Python console without any problem, yet if I write that same string to the CSV file, the problem appears. Here are my test code and results.

Code that reproduces the issue:

# Python 2
from StringIO import StringIO
import unicodecsv

# report.data holds the rows fetched through pyodbc (report is defined elsewhere)
print "'Raw' string from database:"
print "\tencoding:\t" + whatisthis(report.data[1][0])
print "\tprint string:\t" + report.data[1][0]
print "\tstring len:\t" + str(len(report.data[1][0]))

f = StringIO()
w = unicodecsv.writer(f, encoding='utf-8')
w.writerows(report.data)
f.seek(0)
r = unicodecsv.reader(f)
row = r.next()  # row 0
row = r.next()  # row 1, to match report.data[1] above

print "Write/Read from csv file:"
print "\tencoding:\t" + whatisthis(row[0])
print "\tprint string:\t" + row[0]
print "\tstring len:\t" + str(len(row[0]))

Output from test:

'Raw' string from database:
    encoding: unicode string
    print string: University of Washington Students
    string len: 66
Write/Read from csv file:
    encoding: unicode string
    print string: U
    string len: 1

What could be the reason for this issue and how might I resolve it? Thanks!

EDIT: the whatisthis function only checks the string type; it is taken from this post:

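def whatisthis(s):
    # Return (rather than print) the description, so the result can be
    # concatenated with other strings as in the test code above.
    if isinstance(s, str):
        return "ordinary string"
    elif isinstance(s, unicode):
        return "unicode string"
    else:
        return "not a string"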

Answer 1:

import StringIO as sio
import unicodecsv as ucsv

class Report(object):
    def __init__(self, data):
        self.data = data

report = Report( 
  [
     ["University of Washington Students", 1, 2, 3],
     ["UCLA", 5, 6, 7]
  ]
)



print report.data
print report.data[0][0]

print "*" * 20

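f = sio.StringIO()
# unicodecsv encodes unicode fields to UTF-8 before handing them to the csv module
writer = ucsv.writer(f, encoding='utf-8')
writer.writerows(report.data)

print f.getvalue()
print "-" * 20

# Rewind the buffer and read the first row back; unicodecsv decodes it to unicode
f.seek(0)

reader = ucsv.reader(f)
row = reader.next()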

print row
print row[0]



--output:--
[['University of Washington Students', 1, 2, 3], ['UCLA', 5, 6, 7]]
University of Washington Students
********************
University of Washington Students,1,2,3
UCLA,5,6,7

--------------------
[u'University of Washington Students', u'1', u'2', u'3']
University of Washington Students
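For comparison, here is a minimal sketch of the same round trip in Python 3, assuming upgrading is an option. There the built-in csv module works on str (Unicode) fields directly, so neither the UnicodeWriter recipe nor unicodecsv is needed. The rows are the same sample data as above.

```python
import csv
import io

rows = [
    ["University of Washington Students", 1, 2, 3],
    ["UCLA", 5, 6, 7],
]

# Write to an in-memory text buffer; newline="" disables newline translation,
# as the csv docs recommend for objects passed to csv.writer/csv.reader.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)

# Rewind and read everything back; every field comes back as a str.
buf.seek(0)
read_back = list(csv.reader(buf))

print(read_back[0])          # ['University of Washington Students', '1', '2', '3']
print(len(read_back[0][0]))  # 33
```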

With these sample rows the round trip works as expected, so who knows what mischief your whatisthis() function is up to.
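One hypothesis worth checking, not confirmed by either post: the question's output reports a length of 66 for "University of Washington Students", which is only 33 characters long. A doubled length is what you get when every character is followed by a hidden NUL (u'\x00'), which is typical of UTF-16 data that was decoded one byte per character somewhere between the ODBC driver and Python. print does not display NULs, but Python 2's C-level csv writer treats each field as a NUL-terminated C string and stops at the first NUL, which would leave exactly "U" no matter which of the three writers is used. Printing repr(report.data[1][0]) instead of the string itself would show any NULs as \x00. The sketch below uses Python 3 syntax and simulates that mis-decode with Latin-1; the helper names are hypothetical, and the repair only applies if the data were garbled in exactly this way.

```python
# Sketch (Python 3 syntax). Assumption: the driver delivered UTF-16-LE bytes
# that were widened to one character per byte (simulated here with Latin-1).
# Helper names are hypothetical, not part of pyodbc or unicodecsv.

def has_embedded_nul(s):
    """True if the string contains NUL characters, which print() hides."""
    return "\x00" in s


def repair_utf16le_misdecode(s):
    """Undo a UTF-16-LE -> one-character-per-byte (Latin-1) mis-decode.

    Latin-1 maps code points 0-255 one-to-one onto bytes, so encoding back
    to Latin-1 recovers the raw bytes, which are then decoded as UTF-16-LE.
    May raise UnicodeError (e.g. odd length, characters above U+00FF) if the
    string was not garbled this way.
    """
    return s.encode("latin-1").decode("utf-16-le")


original = "University of Washington Students"
# Simulate the suspected corruption: 'U\x00n\x00i\x00...'
garbled = original.encode("utf-16-le").decode("latin-1")

print(len(original), len(garbled))  # 33 66
print(repr(garbled[:6]))            # 'U\x00n\x00i\x00'
print(has_embedded_nul(garbled))    # True

fixed = repair_utf16le_misdecode(garbled)
print(fixed == original)            # True
```

Simply stripping the NULs with s.replace("\x00", "") would also work for this sample, but only while every character is in the range U+0000 to U+00FF. The cleaner fix, if this hypothesis holds, is to correct the encoding settings at the driver/ODBC level so the strings arrive intact; a repair step like the one above is only a stopgap.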

Source: https://stackoverflow.com/questions/17394092/only-first-character-of-unicode-strings-getting-written-to-csv

Tags: python, csv, unicode, pyodbc