UnicodeDecodeError when performing os.walk

故里飘歌 2020-12-05 14:30

I am getting the error:

'ascii' codec can't decode byte 0x8b in position 14: ordinal not in range(128)

when trying to do os.walk. The er

6 answers
  •  情深已故
    2020-12-05 15:13

    After examining the source of the error, it appears that the C-level listdir routine returns non-unicode (byte-string) filenames when they are not plain ASCII. The only fix I found is therefore to force-decode the directory listing inside os.walk, which requires replacing os.walk. This replacement function works:

    import os

    def asciisafewalk(top, topdown=True, onerror=None, followlinks=False):
        """
        Duplicate of os.walk, except that names are force-decoded after listdir.
        """
        islink, join, isdir = os.path.islink, os.path.join, os.path.isdir
    
        try:
            names = os.listdir(top)
            # Force non-ASCII bytes out: decode each byte-string name as UTF-8
            # and silently drop bytes that do not decode. This assumes listdir
            # returned byte strings (Python 2 behaviour).
            names = [name.decode('utf8','ignore') for name in names]
        except os.error as err:
            # Report the listing failure to the caller's handler, if any
            if onerror is not None:
                onerror(err)
            return
    
        # Split the entries into directories and everything else
        dirs, nondirs = [], []
        for name in names:
            if isdir(join(top, name)):
                dirs.append(name)
            else:
                nondirs.append(name)
    
        if topdown:
            yield top, dirs, nondirs
        # Recurse into subdirectories, skipping symlinks unless followlinks is set
        for name in dirs:
            new_path = join(top, name)
            if followlinks or not islink(new_path):
                for x in asciisafewalk(new_path, topdown, onerror, followlinks):
                    yield x
        if not topdown:
            yield top, dirs, nondirs
    

    With the added line names = [name.decode('utf8','ignore') for name in names], every name becomes a unicode string, with any bytes that are not valid UTF-8 dropped, and the walk completes without the error.
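To make the effect of decode('utf8','ignore') concrete, here is a minimal sketch on a few made-up byte-string names (the names and the helper force_decode are hypothetical, chosen only for illustration). It suggests one caveat of the approach: a name containing an invalid byte such as 0x8b comes back shortened, so it may no longer match the entry on disk.

```python
# -*- coding: utf-8 -*-
# Hypothetical file names as raw byte strings (illustration only)
raw_names = [b"plain.txt", b"caf\xc3\xa9.txt", b"broken_\x8b.txt"]


def force_decode(name):
    """Decode a byte-string name as UTF-8, dropping bytes that do not decode."""
    return name.decode("utf8", "ignore")


decoded = [force_decode(name) for name in raw_names]

# Valid UTF-8 survives intact; the stray 0x8b byte is silently removed,
# so the third decoded name differs from the original on-disk name.
for raw, text in zip(raw_names, decoded):
    print(repr(raw), "->", repr(text))
```

Because the dropped byte changes the name, joining the decoded name back onto the directory path can point at a file that does not exist.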

    A big question remains, however: how can this be solved without resorting to this hack?
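One possible direction, offered as an assumption rather than anything the original post confirms: keep every path as a byte string while walking, and decode only a copy for display. The sketch below uses a hypothetical helper, walk_bytes, and a throwaway demo directory; whether it fits depends on how the names are used afterwards.

```python
import os
import sys
import tempfile


def walk_bytes(top):
    """Walk top as a byte-string path, yielding (raw_path, display_path) per file."""
    if not isinstance(top, bytes):
        # Assumption: the path can be encoded with the filesystem encoding
        top = top.encode(sys.getfilesystemencoding())
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            raw = os.path.join(dirpath, name)      # bytes: use this for open(), os.stat()
            shown = raw.decode("utf8", "replace")  # text: for printing or logging only
            yield raw, shown


# Hypothetical demo tree, created only for illustration
demo_root = tempfile.mkdtemp()
open(os.path.join(demo_root, "example.txt"), "w").close()

for raw, shown in walk_bytes(demo_root):
    print(shown)
```

The raw byte path is never altered, so it still refers to the real file even when the display form contains replacement characters.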
