How to remove string unicode from list

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-25 04:33:31

Question


I am trying to remove the Unicode "u'" prefixes that show up in my list of strings. The list holds actors scraped from this site: http://www.boxofficemojo.com/yearly/chart/?yr=2013&p=.htm.

I have a method that gets these strings from this website:

import requests
from bs4 import BeautifulSoup

def getActors(item_url):
    response = requests.get(item_url)
    soup = BeautifulSoup(response.content, "lxml")  # or   BeautifulSoup(response.content, "html5lib")
    tempActors = []
    try:
        tempActors.append(soup.find(text="Actors:").find_parent("tr").find_all(text=True)[1:])
    except AttributeError:
        tempActors.append("n/a")

    return tempActors

This method puts each movie's actors into a temporary list. I call it later from a web-crawling method with

listOfActors.append(getActors(href))

to append all these temporary lists to one big list of all the movies' actors.

Later, I write this list into a csv file with

for item in listOfActors:
    wr.writerow((item))

Right now the output looks like this:

[u'Jennifer Lawrence', u'Josh Hutcherson', u'Liam Hemsworth', u'Elizabeth Banks', u'Stanley Tucci', u'Woody Harrelson', u'Philip Seymour Hoffman', u'Jeffrey Wright', u'Jena Malone', u'Amanda Plummer', u'Sam Claflin', u'Donald Sutherland', u'Lenny Kravitz']
[u'Robert Downey, Jr.', u'Gwyneth Paltrow', u'Don Cheadle', u'Guy Pearce', u'Rebecca Hall', u'James Badge Dale', u'Jon Favreau', u'Ben Kingsley', u'Paul Bettany*', u' ', u'(Voice)', u'Mark Ruffalo*', u' ', u'(Cameo)']

I tried using the str() method, but I don't think it's working: either I'm not putting it in the right place or this isn't the right way to do it. The issue is that I'm not getting each individual actor in the list by itself; I'm clumping each movie's actors together, so I don't know how to convert the entire list.
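
The clumping described above can be reproduced without the scraper. The following is a minimal sketch, assuming Python 3 and two sample names from the output above; temp_actors is a hypothetical stand-in for what getActors() returns. Because the list of names is appended as a single element, writerow() receives one cell that holds a whole list and writes that list's string form. Under Python 2, that string form is most likely where the u' marks come from. Passing the inner list itself gives one actor per cell.

```python
import csv
import io

# Hypothetical stand-in for the return value of getActors(): the list of
# names is appended as ONE element, so the result is a list inside a list.
temp_actors = []
temp_actors.append(["Jennifer Lawrence", "Josh Hutcherson"])

# Writing the outer list: the row has a single cell, and that cell is the
# inner list converted to a string (brackets and quotes included).
nested_buffer = io.StringIO()
csv.writer(nested_buffer).writerow(temp_actors)
nested_output = nested_buffer.getvalue()

# Writing the inner list: each name becomes its own cell.
flat_buffer = io.StringIO()
csv.writer(flat_buffer).writerow(temp_actors[0])
flat_output = flat_buffer.getvalue()

print(nested_output)
print(flat_output)
```

Using tempActors.extend(...) instead of tempActors.append(...) inside getActors() would avoid the extra level of nesting in the first place, though that is a suggestion based on the code shown, not something tested against the site.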


Answer 1:


If you provide a small example that reproduces the problem, it is much easier to correct your mistakes. Lacking that, here's an example that uses the UnicodeWriter from the Python 2 csv module documentation. Just make sure your data is a list of lists of Unicode strings:

#!python2
#coding:utf8
import csv
import cStringIO
import codecs

data = [[u'Chinese',u'English'],
        [u'马克',u'Mark'],
        [u'你好',u'Hello']]

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8-sig", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

with open('out.csv','wb') as f:
    w = UnicodeWriter(f)
    w.writerows(data)
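
The UnicodeWriter above is only needed on Python 2. As a rough sketch, assuming Python 3 (where the csv module works with text strings directly), the same data can be written with the standard csv.writer. The temporary output path is an assumption made so the sketch does not overwrite an existing out.csv.

```python
import csv
import os
import tempfile

data = [["Chinese", "English"],
        ["马克", "Mark"],
        ["你好", "Hello"]]

# Write into a fresh temporary directory (an assumption for this sketch).
out_path = os.path.join(tempfile.mkdtemp(), "out.csv")

# newline="" lets the csv module control line endings; "utf-8-sig" writes a
# byte order mark, matching the default encoding of the UnicodeWriter above.
with open(out_path, "w", newline="", encoding="utf-8-sig") as f:
    csv.writer(f).writerows(data)

# Read the file back to confirm the rows survive the round trip.
with open(out_path, newline="", encoding="utf-8-sig") as f:
    rows_read_back = list(csv.reader(f))

print(rows_read_back)
```

On Python 3 there is no u'' prefix in the string form of a list, but the data still has to be a list of flat lists of strings for each value to land in its own cell.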


Source: https://stackoverflow.com/questions/31064344/how-to-remove-string-unicode-from-list
