I'm reading a CSV file into a DataFrame. I need to strip whitespace from all the string-like cells, leaving the other cells unchanged, in Python 2.7.
When you call pandas.read_csv, you can use a regular expression that matches zero or more spaces, followed by a comma, followed by zero or more spaces as the delimiter.
For example, here's "data.csv":
In [19]: !cat data.csv
1.5, aaa, bbb , ffffd , 10 , XXX
2.5, eee, fff , ggg, 20 , YYY
(The first line ends with three spaces after XXX, while the second line ends at the last Y.)
The following uses pandas.read_csv() to read the file, with the regular expression ' *, *' as the delimiter. (Using a regular expression as the delimiter is only available with the "python" engine of read_csv().)
In [20]: import pandas as pd
In [21]: df = pd.read_csv('data.csv', header=None, delimiter=' *, *', engine='python')
In [22]: df
Out[22]:
     0    1    2      3   4    5
0  1.5  aaa  bbb  ffffd  10  XXX
1  2.5  eee  fff    ggg  20  YYY
Here is a column-wise solution with pandas apply:
import numpy as np

def strip_obj(col):
    # Only object-dtype columns can hold string values
    if col.dtype == object:
        return (col.astype(str)
                   .str.strip()
                   .replace({'nan': np.nan}))
    return col

df = df.apply(strip_obj, axis=0)
Note that this converts the values in object-dtype columns to strings, so take caution with mixed-type columns. For example, if a zip-code column contains the int 20001 and the string ' 21110 ', you will end up with the strings '20001' and '21110'.
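A minimal sketch of that caveat, reusing the strip_obj function above (the zip_code DataFrame is made up purely for illustration):

import pandas as pd

# Hypothetical mixed-type column: one int and one padded string
df = pd.DataFrame({'zip_code': [20001, ' 21110 ']})
print(df['zip_code'].tolist())    # [20001, ' 21110 ']

df = df.apply(strip_obj, axis=0)  # strip_obj as defined above
print(df['zip_code'].tolist())    # ['20001', '21110'] -- the int has become a string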
The "data['values'].str.strip()" answer above did not work for me, but I found a simple work around. I am sure there is a better way to do this. The str.strip() function works on Series. Thus, I converted the dataframe column into a Series, stripped the whitespace, replaced the converted column back into the dataframe. Below is the example code.
import pandas as pd

data = pd.DataFrame({'values': [' ABC ', ' DEF', ' GHI ']})
print('-----')
print(data)

# This returns a stripped copy but does not modify data in place
data['values'].str.strip()
print('-----')
print(data)

# Assign the stripped Series back to the column
new = data['values'].str.strip()
data['values'] = new
print('-----')
print(new)
You could use pandas' Series.str.strip() method to do this quickly for each string-like column:
>>> data = pd.DataFrame({'values': [' ABC ', ' DEF', ' GHI ']})
>>> data
   values
0   ABC  
1    DEF
2   GHI  
>>> data['values'].str.strip()
0    ABC
1    DEF
2    GHI
Name: values, dtype: object
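If you want to do this for every string-like column at once rather than one column at a time, one possible sketch (assuming the object-dtype columns are exactly the ones you want stripped) is:

import pandas as pd

data = pd.DataFrame({'values': [' ABC ', ' DEF', ' GHI '],
                     'n': [1, 2, 3]})

# Strip only the object-dtype (string-like) columns; numeric columns are left alone
obj_cols = data.select_dtypes(include=['object']).columns
data[obj_cols] = data[obj_cols].apply(lambda s: s.str.strip())
print(data)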
I found the following code useful and something that would likely help others. This snippet will allow you to delete spaces in a column as well as in the entire DataFrame, depending on your use case.
import pandas as pd

def remove_whitespace(x):
    try:
        # Remove spaces inside and outside of the string
        x = "".join(x.split())
    except AttributeError:
        # Non-string values have no .split(), so leave them unchanged
        pass
    return x

# Apply remove_whitespace to one column only
df.orderId = df.orderId.apply(remove_whitespace)
print(df)

# Apply remove_whitespace to the entire DataFrame
df = df.applymap(remove_whitespace)
print(df)
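Note that, unlike str.strip(), this removes interior spaces as well. A quick sketch with made-up data (the orderId column is just an illustration):

import pandas as pd

df = pd.DataFrame({'orderId': [' A 1 ', 'B 2', ' C3 '], 'qty': [1, 2, 3]})

df.orderId = df.orderId.apply(remove_whitespace)  # remove_whitespace as defined above
print(df.orderId.tolist())                        # ['A1', 'B2', 'C3']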
We want to:

Apply our function to each element in our DataFrame - use applymap.

Use type(x) == str (versus x.dtype == 'object'), because Pandas will label a column as object when it holds mixed datatypes (an object column may contain int and/or str).

Only strip the elements that are actually a str (and leave everything else unchanged).

Therefore, I've found the following to be the easiest:

df.applymap(lambda x: x.strip() if type(x) == str else x)
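For example, on a small mixed-type frame (the data here is made up for illustration), only the string cells get stripped:

import pandas as pd

df = pd.DataFrame({'mixed': [' ABC ', 10, ' 20 '], 'num': [1.5, 2.5, 3.5]})
df = df.applymap(lambda x: x.strip() if type(x) == str else x)
print(df['mixed'].tolist())   # ['ABC', 10, '20'] -- the int 10 is left unchanged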