Question
I am trying to remove the Unicode "u'" prefixes from the strings in my list. The list holds the actors from this site: http://www.boxofficemojo.com/yearly/chart/?yr=2013&p=.htm.
I have a method that gets these strings from this website:
def getActors(item_url):
    response = requests.get(item_url)
    soup = BeautifulSoup(response.content, "lxml")  # or BeautifulSoup(response.content, "html5lib")
    tempActors = []
    try:
        tempActors.append(soup.find(text="Actors:").find_parent("tr").find_all(text=True)[1:])
    except AttributeError:
        tempActors.append("n/a")
    return tempActors
This method puts each movie's actors into a temporary list. I call it later from a web-crawling method with
listOfActors.append(getActors(href))
to collect all these temporary lists into one big list of every movie's actors.
Later, I write this list into a csv file with
for item in listOfActors:
    wr.writerow((item))
Right now the output looks like this:
[u'Jennifer Lawrence', u'Josh Hutcherson', u'Liam Hemsworth', u'Elizabeth Banks', u'Stanley Tucci', u'Woody Harrelson', u'Philip Seymour Hoffman', u'Jeffrey Wright', u'Jena Malone', u'Amanda Plummer', u'Sam Claflin', u'Donald Sutherland', u'Lenny Kravitz']
[u'Robert Downey, Jr.', u'Gwyneth Paltrow', u'Don Cheadle', u'Guy Pearce', u'Rebecca Hall', u'James Badge Dale', u'Jon Favreau', u'Ben Kingsley', u'Paul Bettany*', u' ', u'(Voice)', u'Mark Ruffalo*', u' ', u'(Cameo)']
I tried using str(), but it doesn't seem to work: either I'm not putting it in the right place, or it isn't the right way to do this.
The issue is that I'm not storing each actor as a separate item in the list. Each movie's actors are clumped together, so I don't know how to convert the entire list.
Answer 1:
A small example that reproduces the problem would make it much easier to correct your mistakes. Lacking that, here's an example that uses the UnicodeWriter class from the Python 2 csv module documentation. Just make sure your data is a list of lists of Unicode strings:
#!python2
#coding:utf8
import csv
import cStringIO
import codecs

data = [[u'Chinese',u'English'],
        [u'马克',u'Mark'],
        [u'你好',u'Hello']]

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """
    def __init__(self, f, dialect=csv.excel, encoding="utf-8-sig", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

with open('out.csv','wb') as f:
    w = UnicodeWriter(f)
    w.writerows(data)
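The "u'" marks in the question's output appear to come from nesting rather than from encoding: getActors appends a whole list to tempActors, so each CSV row contains a single list, and the csv writer stores that list's string form in one cell. The following is a minimal sketch of the difference, assuming Python 3 (where the csv module accepts Unicode text directly, so no UnicodeWriter is needed). The helper write_rows and the sample names are hypothetical and used only for illustration.

```python
import csv
import io


def write_rows(rows):
    """Write rows to an in-memory CSV and return the resulting text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


# Sample data shaped like the scraped text nodes in the question.
actors = ["Paul Bettany*", " ", "(Voice)", "Don Cheadle"]

# Nested, as in the question: append() puts the whole list inside another
# list, so the row has one field and csv writes str(list) into that cell.
nested = []
nested.append(actors)
nested_out = write_rows([nested])

# Flat: extend() copies the individual strings, so each one gets its own
# cell. Whitespace-only entries are skipped here (an optional cleanup).
flat = []
flat.extend(name for name in actors if name.strip())
flat_out = write_rows([flat])

print(nested_out)
print(flat_out)
```

To write to a real file on Python 3, open it in text mode with open('out.csv', 'w', newline='', encoding='utf-8-sig') and pass the file object to csv.writer.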
Source: https://stackoverflow.com/questions/31064344/how-to-remove-string-unicode-from-list