Question
I'm importing some csv data into a Pandas DataFrame (in Python). One series is meant to be all numerical values. However, it also contains some spurious "$-" elements represented as strings. These have been left over from previous formatting. If I just import the series, Pandas reports it as a series of 'object'.
What's the best way to replace these "$-" strings with zeros? Or more generally, how can I replace all the strings in a series (which is predominantly numerical), with a numerical value, and convert the series to a floating point type?
- Steve
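For reference, the situation described above can be reproduced with a small sketch. The CSV contents and the column names `item` and `amount` are made up for illustration; only the "$-" marker comes from the question.

```python
import io
import pandas as pd

# Hypothetical CSV contents; column names and values are invented for illustration.
csv_text = "item,amount\nA,10.5\nB,$-\nC,3\nD,$-\n"

df = pd.read_csv(io.StringIO(csv_text))

# Because of the stray "$-" entries the column cannot be parsed as numbers,
# so every value in it is kept as a string.
print(df["amount"].dtype)
print(df["amount"].tolist())
```

The printed dtype is non-numeric (typically `object`), which is the symptom the question starts from.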
Answer 1:
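Use Series.str.replace and Series.astype:
import pandas as pd
s = pd.Series(['2$-32$-4','123$-12','00123','44'])
# regex=False matches "$-" literally (recent pandas no longer treats the pattern as a regex by default)
s.str.replace('$-', '0', regex=False).astype(float)
0    203204.0
1    123012.0
2       123.0
3        44.0
dtype: float64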
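Applied to the case in the question, where an entire entry is "$-", the same two calls might look like the sketch below. The DataFrame and its `amount` column are hypothetical stand-ins for the imported CSV data.

```python
import pandas as pd

# Hypothetical data standing in for the imported CSV column.
df = pd.DataFrame({"amount": ["10.5", "$-", "3", "$-"]})

# regex=False treats "$-" as a literal string, so no escaping is needed.
df["amount"] = df["amount"].str.replace("$-", "0", regex=False).astype(float)

print(df["amount"].tolist())
```

This only handles the known "$-" marker. Any other stray string left in the column would still make astype(float) raise a ValueError.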
Answer 2:
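You can use the convert_objects method of the DataFrame, with convert_numeric=True, to change the strings to NaNs. (convert_objects has since been deprecated and removed from pandas; pd.to_numeric is its replacement.)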
From the docs:
convert_numeric: If True, attempt to coerce to numbers (including strings), with unconvertible values becoming NaN.
In [17]: df
Out[17]:
    a   b  c
0  1.  2.  4
1  sd  2.  4
2  1.  fg  5
In [18]: df2 = df.convert_objects(convert_numeric=True)
In [19]: df2
Out[19]:
     a    b  c
0    1    2  4
1  NaN    2  4
2    1  NaN  5
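Finally, if you want to convert those NaNs to 0s, you can use df.fillna (the NaNs are missing float values, not the string 'NaN', so replacing the string 'NaN' would leave them untouched):
In [20]: df2.fillna(0)
Out[20]:
   a  b  c
0  1  2  4
1  0  2  4
2  1  0  5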
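On pandas versions that no longer ship convert_objects, a roughly equivalent result can be sketched with pd.to_numeric applied column by column. The frame below mirrors the one in the session above, and the names `df2` and `df3` are just illustrative.

```python
import pandas as pd

# Same shape as the frame in the session above; the values are illustrative.
df = pd.DataFrame({"a": ["1.", "sd", "1."], "b": ["2.", "2.", "fg"], "c": [4, 4, 5]})

# Apply pd.to_numeric to each column; strings that cannot be parsed become NaN.
df2 = df.apply(pd.to_numeric, errors="coerce")

# Replace the resulting NaNs with zeros.
df3 = df2.fillna(0)
print(df3)
```

Columns a and b end up as floats because they held NaN before the fill; column c was already numeric and is left unchanged.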
Answer 3:
Use pd.to_numeric to convert the strings to numeric values (set unparseable strings to NaN using the errors option 'coerce'):
df = pd.to_numeric(df, errors='coerce')
and then convert the NaN values to zeros using fillna:
df = df.fillna(0)
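Put together as a self-contained sketch, the coerce-then-fill approach might look like this. The sample values and the names `s`, `numeric` and `result` are hypothetical. fillna does the zero fill here because the coerced values are real NaNs rather than the string 'NaN'.

```python
import pandas as pd

# Hypothetical series mixing numeric strings, a real number and the stray "$-" marker.
s = pd.Series(["12.5", "$-", 7, "$-", "3"])

# errors="coerce" turns anything that cannot be parsed as a number into NaN...
numeric = pd.to_numeric(s, errors="coerce")

# ...and fillna(0) swaps those NaNs for zeros, leaving a float64 series.
result = numeric.fillna(0)
print(result.tolist())
```

Unlike a targeted string replacement, this maps every unparseable entry to zero, not just "$-", which may or may not be what you want.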
Source: https://stackoverflow.com/questions/33440234/pandas-how-to-replace-string-with-zero-values-in-a-dataframe-series