I'm trying to see if I can remove the trailing zeros from this phone number column.
Example:
0
1 8.00735e+09
2 4.35789e+09
3 6.10644e
import numpy as np
import pandas as pd
s = pd.Series([ None, np.nan, '',8.00735e+09, 4.35789e+09, 6.10644e+09])
s_new = s.fillna('').astype(str).str.replace(".0","",regex=False)
s_new
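Here I filled the null values with an empty string, converted the series to string type, and replaced the literal ".0" with an empty string (regex=False makes the match literal). Note that this removes every ".0" in a value, not only a trailing one.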
This outputs:
0
1
2
3 8007350000
4 4357890000
5 6106440000
dtype: object
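A minimal sketch of a stricter variant, in case some values carry a real fractional part: anchoring the pattern so only a ".0" at the very end is removed. The helper name strip_trailing_point_zero and the sample value 10.05 are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd


def strip_trailing_point_zero(series: pd.Series) -> pd.Series:
    """Hypothetical helper: drop '.0' only when it ends the string."""
    as_text = series.fillna('').astype(str)
    # \. matches a literal dot, $ anchors the match to the end of the value
    return as_text.str.replace(r'\.0$', '', regex=True)


# 10.05 is an invented value that contains '.0' in the middle
s = pd.Series([None, np.nan, '', 8.00735e+09, 10.05])

# Literal replacement removes '.0' wherever it occurs
literal = s.fillna('').astype(str).str.replace('.0', '', regex=False)

# Anchored replacement only touches a trailing '.0'
result = strip_trailing_point_zero(s)

print(literal.tolist())
print(result.tolist())
```

For phone numbers stored as whole-number floats both versions give the same result; the anchored one just avoids mangling a value such as 10.05.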
Here is a solution using pandas nullable integers (the solution assumes that input Series values are either empty strings or floating point numbers):
import pandas as pd, numpy as np
s = pd.Series(['', 8.00735e+09, 4.35789e+09, 6.10644e+09])
s.replace('', np.nan).astype('Int64')
Output (pandas-0.25.1):
0 NaN
1 8007350000
2 4357890000
3 6106440000
dtype: Int64
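If the column might also contain other non-numeric junk, a possible variation (a sketch, not part of the answer above) is to coerce with pd.to_numeric first and then cast to the nullable integer type. The names numbers and phones are just for this example.

```python
import pandas as pd

s = pd.Series(['', 8.00735e+09, 4.35789e+09, 6.10644e+09])

# errors='coerce' turns anything that is not a number (here the empty
# string) into NaN, giving a float64 Series
numbers = pd.to_numeric(s, errors='coerce').astype('Int64')

# Optional: go back to plain text, with '' for the missing entries
phones = pd.Series(
    ['' if pd.isna(v) else str(int(v)) for v in numbers],
    index=s.index,
)

print(numbers)
print(phones.tolist())
```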
Advantages of the solution:
Use astype(np.int64) on the numeric entries only:
s = pd.Series(['', 8.00735e+09, 4.35789e+09, 6.10644e+09])
mask = pd.to_numeric(s).notnull()
s.loc[mask] = s.loc[mask].astype(np.int64)
s
0
1 8007350000
2 4357890000
3 6106440000
dtype: object
If anybody is still interested: I had the problem that rounding the df left me with trailing zeros. Here is what I did.
new_df = np.round(old_df,3).astype(str)
Then all trailing zeros were gone in the new_df.
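A small sketch of what that looks like, with an invented old_df (the column name x and the values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Made-up frame with long float tails
old_df = pd.DataFrame({'x': [1.2000001, 2.5004]})

# Round to 3 decimals, then convert every value to its string form
new_df = np.round(old_df, 3).astype(str)

print(new_df['x'].tolist())
```

Keep in mind that a whole-number float such as 3.0 would still be written as '3.0' by this conversion, so it helps with rounding noise rather than with the ".0" suffix itself.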
In Pandas/NumPy, integers are not allowed to take NaN values, and arrays/series (including dataframe columns) are homogeneous in their datatype, so having a column of NumPy integers where some entries are None/np.nan is downright impossible.
EDIT: data.phone.astype('object') should do the trick; in this case, Pandas treats your column as a series of generic Python objects rather than a specific datatype (e.g. str/float/int), at the cost of performance if you intend to run any heavy computations with this data (probably not in your case).
Assuming you want to keep those NaN entries, your approach of converting to strings is a valid possibility:
data.phone.astype(str).str.split('.', expand = True)[0]
should give you what you're looking for (there are alternative string methods you can use, such as .replace or .extract, but .split seems the most straightforward in this case).
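For comparison, here is a rough sketch of the .split and .extract routes side by side on a made-up frame (the data values are assumptions, not the asker's data). How the missing entry is rendered after astype(str) may differ between pandas versions, so only the numeric rows are printed.

```python
import numpy as np
import pandas as pd

# Invented sample: a float column with one missing phone number
data = pd.DataFrame({'phone': [8.00735e+09, np.nan, 4.35789e+09]})

as_text = data['phone'].astype(str)

# Keep everything before the first dot
by_split = as_text.str.split('.', expand=True)[0]

# Keep the leading run of digits
by_extract = as_text.str.extract(r'^(\d+)', expand=False)

print(by_split.iloc[[0, 2]].tolist())
print(by_extract.iloc[[0, 2]].tolist())
```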
Alternatively, if you are only interested in the display of floats (unlikely I'd suppose), you can do pd.set_option('display.float_format','{:.0f}'.format), which doesn't actually affect your data.
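A quick sketch of the display-only route, using pd.option_context instead of pd.set_option so the setting is limited to one block (that swap is my choice for the example; the sample values are made up):

```python
import pandas as pd

s = pd.Series([8.00735e+09, 4.35789e+09])

# The format function is applied only when floats are rendered as text
with pd.option_context('display.float_format', '{:.0f}'.format):
    shown = s.to_string()

print(shown)

# The underlying data is untouched and still float64
print(s.dtype)
```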
Just do
data['phone'] = data['phone'].astype(str)
data['phone'] = data['phone'].str.replace(r'\.0$', '', regex=True)
which uses a regex to find a trailing '.0' in each entry of the column and replaces it with an empty string. The dot has to be escaped, because an unescaped '.' in a regex matches any character. For example
import pandas as pd

data = pd.DataFrame(
data = [['bob','39384954.0'],['Lina','23827484.0']],
columns = ['user','phone'], index = [1,2]
)
data['phone'] = data['phone'].astype(str)
# Strip only a trailing '.0'; \. matches a literal dot, $ anchors to the end
data['phone'] = data['phone'].str.replace(r'\.0$', '', regex=True)
print(data)
user phone
1 bob 39384954
2 Lina 23827484