Strip white spaces from CSV file

梦谈多话 2020-12-15 17:02

I need to strip the white spaces from a CSV file that I read:

import csv

aList=[]
with open(self.filename, 'r') as f:
    reader = csv.reader(f, delimite         


        
7 answers
  • 2020-12-15 17:12

    A memory-efficient way to format the cells after parsing is to use generators, so that rows are cleaned one at a time as they are consumed. Something like:

    with open(self.filename, 'r') as f:
        reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
        for row in reader:
            yield (cell.strip() for cell in row)
    

    But it may be worth moving this into a function, so that you can add further cleanup steps in one place and avoid extra passes over the data later. For instance:

    nulls = {'NULL', 'null', 'None', ''}
    
    def clean(reader):
        # Inner helper renamed so it no longer shadows the outer function
        def clean_row(row):
            for cell in row:
                cell = cell.strip()
                yield None if cell in nulls else cell
    
        for row in reader:
            yield clean_row(row)
    

    Or it can be turned into a factory that yields each row as a dict keyed by the header fields:

    def factory(reader):
        fields = next(reader)
    
        def clean(row):
            for cell in row:
                cell = cell.strip()
                yield None if cell in nulls else cell
    
        for row in reader:
            yield dict(zip(fields, clean(row)))
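
    As a rough illustration of how the factory behaves, here is a self-contained sketch that feeds it an in-memory file. The sample data, the `io.StringIO` stand-in for the real file, the name `rows_as_dicts`, and the extra stripping of the header row are assumptions made for the demo, not part of the answer above.

```python
import csv
import io

# Values treated as missing; same set as in the answer above.
nulls = {'NULL', 'null', 'None', ''}

def rows_as_dicts(reader):
    """Lazily yield each data row as a dict of stripped cells."""
    # Assumption: the header cells are stripped as well.
    fields = [name.strip() for name in next(reader)]
    for row in reader:
        cells = (cell.strip() for cell in row)
        yield dict(zip(fields, (None if c in nulls else c for c in cells)))

# Hypothetical sample data standing in for the real file.
sample = io.StringIO("name , city\n Ann ,  Oslo \nBob, NULL \n")
reader = csv.reader(sample, delimiter=',', quoting=csv.QUOTE_NONE)
records = list(rows_as_dicts(reader))
print(records)
```

    Because the function is a generator, nothing is read until the result is iterated; `list(...)` is used here only to make the output visible.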
    
  • 2020-12-15 17:12

    Read a CSV (or Excel file) using Pandas and trim it using this custom function.

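    import pandas as pd

    # Definition for stripping whitespace
    def trim(dataset):
        # Strip string cells only; leave numbers, NaN, etc. untouched
        strip_cell = lambda x: x.strip() if isinstance(x, str) else x
        # Note: pandas 2.1+ deprecates applymap in favour of DataFrame.map
        return dataset.applymap(strip_cell)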
    

    You can now apply trim to the result of reading a CSV or Excel file, like so (as part of a loop, etc.):

    dataset = trim(pd.read_csv(dataset))
    dataset = trim(pd.read_excel(dataset))
    
  • 2020-12-15 17:23

    You can do:

    aList.append([element.strip() for element in row])
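
    Put in context, the line above might sit in a loop like the following sketch. The question's code is truncated, so the `delimiter` and `quoting` arguments are borrowed from the other answers, and the function name `read_stripped` and the sample data are hypothetical.

```python
import csv
import io

def read_stripped(f):
    """Return all rows from an open CSV file object with each cell stripped."""
    aList = []
    # Assumption: comma-delimited input with no quote handling.
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        aList.append([element.strip() for element in row])
    return aList

# An in-memory file stands in for open(self.filename, 'r').
rows = read_stripped(io.StringIO(" a , b\nc ,  d \n"))
print(rows)
```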
    
  • In my case, I only cared about stripping the whitespace from the field names (aka column headers, aka dictionary keys), when using csv.DictReader.

    Create a class based on csv.DictReader, and override the fieldnames property to strip the whitespace from each field name.

    Do this by getting the regular list of fieldnames, and then iterating over it while creating a new list with the whitespace stripped from each field name, and setting the underlying _fieldnames attribute to this new list.

    import csv
    
    class DictReaderStrip(csv.DictReader):
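        @property
        def fieldnames(self):
            if self._fieldnames is None:
                # Initialize self._fieldnames by calling the parent getter directly.
                # Note: in Python 2, DictReader is an old-style class, so super()
                # can't be used; this direct call works in Python 2 and 3 alike.
                csv.DictReader.fieldnames.fget(self)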
                if self._fieldnames is not None:
                    self._fieldnames = [name.strip() for name in self._fieldnames]
            return self._fieldnames
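
    A short usage sketch may help show what this does and does not strip. The class is repeated so the block runs on its own; the sample data and the `io.StringIO` stand-in for a real file are assumptions for the demo.

```python
import csv
import io

class DictReaderStrip(csv.DictReader):
    @property
    def fieldnames(self):
        if self._fieldnames is None:
            # Let the parent getter read the header row first.
            csv.DictReader.fieldnames.fget(self)
            if self._fieldnames is not None:
                self._fieldnames = [name.strip() for name in self._fieldnames]
        return self._fieldnames

# Hypothetical sample: padded header names and a padded value.
sample = io.StringIO(" id , name \n1, Ann \n")
reader = DictReaderStrip(sample)
rows = list(reader)
print(reader.fieldnames)
print(rows)
```

    Only the keys are cleaned; the values keep their surrounding whitespace, so this approach would need to be combined with a per-cell strip() if the data itself is padded.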
    
  • 2020-12-15 17:26

    You can create a wrapper object around your file that strips away the spaces before the CSV reader sees them. This way, you can even use the CSV file with csv.DictReader.

    import re
    
    class CSVSpaceStripper:
      def __init__(self, filename):
        self.fh = open(filename, "r")
        # Raw strings keep the regex escapes intact
        self.surroundingWhiteSpace = re.compile(r"\s*;\s*")
        self.leadingOrTrailingWhiteSpace = re.compile(r"^\s*|\s*$")
    
      def close(self):
        self.fh.close()
        self.fh = None
    
      def __iter__(self):
        return self
    
      def __next__(self):
        # Python 3 iterator protocol; StopIteration from the file ends iteration
        line = next(self.fh)
        line = self.surroundingWhiteSpace.sub(";", line)
        line = self.leadingOrTrailingWhiteSpace.sub("", line)
        return line
    
      next = __next__  # Python 2 compatibility
    

    Then use it like this:

    o = csv.reader(CSVSpaceStripper(filename), delimiter=";")
    o = csv.DictReader(CSVSpaceStripper(filename), delimiter=";")
    

    I hardcoded ";" to be the delimiter. Generalising the code to any delimiter is left as an exercise to the reader.
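
    One possible way to do that exercise is sketched below as a generator function instead of a class. The name `strip_around_delimiter`, the `"|"` delimiter, and the sample data are hypothetical; `re.escape` is used so that delimiters with a special meaning in regular expressions are matched literally.

```python
import csv
import io
import re

def strip_around_delimiter(lines, delimiter=";"):
    """Yield lines with whitespace removed around each delimiter and at both ends."""
    around = re.compile(r"\s*" + re.escape(delimiter) + r"\s*")
    for line in lines:
        yield around.sub(delimiter, line).strip()

# Hypothetical pipe-delimited sample standing in for a real file.
sample = io.StringIO(" id | name \n 1 |  Ann \n")
rows = list(csv.DictReader(strip_around_delimiter(sample, "|"), delimiter="|"))
print(rows)
```

    Like the class above, this works on raw lines, so it also alters whitespace inside quoted fields, and it is not suitable when the delimiter is itself whitespace (such as a tab), because adjacent delimiters would be collapsed into one.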

  • 2020-12-15 17:28
    with open(self.filename, 'r') as f:
        reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
        return [[x.strip() for x in row] for row in reader]
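
    For comparison, the csv module also has a `skipinitialspace` option, which drops spaces that directly follow a delimiter. The sketch below, using made-up sample data, suggests why the explicit strip() above is still the safer choice when trailing whitespace matters.

```python
import csv
import io

data = "a,  b ,c \n"  # hypothetical sample line

# skipinitialspace=True ignores spaces right after each delimiter...
partial = next(csv.reader(io.StringIO(data), skipinitialspace=True))

# ...but trailing spaces survive, so strip() is still needed for a full trim.
full = [[x.strip() for x in row] for row in csv.reader(io.StringIO(data))]

print(partial)
print(full)
```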
    