Pandas (Python) reading and working on Java BigInteger/ large numbers

Submitted by 六眼飞鱼酱① on 2019-12-10 17:53:18

Question


I have a CSV data file containing Nilsimsa hash values. Some of them are as long as 80 characters. I want to read them into Python for data analysis. Is there a way to import the data into Python without losing information?

EDIT: I have tried the implementations proposed in the comments, but they do not work for me. An example value from the CSV file: 77241756221441762028881402092817125017724447303212139981668021711613168152184106


Answer 1:


Start with a simple text file to read in, just one variable and one row.

%more foo.txt
x
77241756221441762028881402092817125017724447303212139981668021711613168152184106

In [268]: df=pd.read_csv('foo.txt')

Pandas reads it in as a string because the value is too large for int64, and float64 could not represent it exactly. The information is all there; nothing is lost.

In [269]: df.x
Out[269]: 
0    7724175622144176202888140209281712501772444730...
Name: x, dtype: object

In [270]: type(df.x[0])
Out[270]: str
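
If you would rather not rely on type inference, one option is to ask for strings explicitly. The sketch below is a minimal illustration, assuming the same one-column layout as foo.txt; the in-memory `csv_text` is a stand-in for the real file, and `dtype=str` is the `read_csv` parameter that keeps the column as text.

```python
import io

import pandas as pd

# Stand-in for foo.txt: a header line and a single 80-digit value.
csv_text = (
    "x\n"
    "77241756221441762028881402092817125017724447303212139981668021711613168152184106\n"
)

# dtype=str tells pandas to keep the column as text instead of inferring a numeric type.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

value = df["x"].iloc[0]
print(type(value))
print(value)
```

Because the value stays a string, every digit in the file is preserved exactly as written.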

You can then use plain Python to treat it as a number. Recall the caveats from the links in the comments: this will not be as fast as numpy and pandas operations on a column stored as int64, because it uses the more flexible but slower object mode.

You can convert a column to Python integers like this. These are arbitrary-precision integers (the separate long type in Python 2, plain int in Python 3). Note that the dtype is still object, because everything except the core numpy types (int32, int64, float64, etc.) is stored as objects.

In [271]: df.x = df.x.map(int)

After that you can more or less treat it like a number.

In [272]: df.x * 2
Out[272]: 
0    1544835124428835240577628041856342500354488946...
Name: x, dtype: object
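
As a self-contained check of the conversion and the arithmetic, the following sketch builds the column directly instead of reading foo.txt (the names `raw`, `as_int` and `doubled` are only for illustration). It compares the pandas result with the same calculation done on a plain Python int.

```python
import pandas as pd

raw = "77241756221441762028881402092817125017724447303212139981668021711613168152184106"

# An object-dtype Series of strings, like the column read from the CSV.
s = pd.Series([raw], name="x", dtype=object)

# map(int) turns each element into an arbitrary-precision Python int.
as_int = s.map(int)

# Arithmetic is applied element by element using Python's own int operations.
doubled = as_int * 2

print(as_int.dtype)
print(doubled.iloc[0] == int(raw) * 2)
```

The dtype stays object, so the operation is exact but runs at Python speed rather than numpy speed.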

The Series display truncates long values, so you'll have to do some formatting to see the whole number. Or go the numpy route, which shows the whole number by default.

In [273]: df.x.values * 2
Out[273]: array([ 154483512442883524057762804185634250035448894606424279963336043423226336304368212L], dtype=object)
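
Two possible ways to do that formatting are sketched here, assuming a reasonably recent pandas in which the `display.max_colwidth` option accepts `None`. The first converts a single element with `str`; the second lifts the width limit temporarily when printing the Series.

```python
import pandas as pd

raw = "77241756221441762028881402092817125017724447303212139981668021711613168152184106"
s = pd.Series([int(raw)], name="x", dtype=object)
doubled = s * 2

# Option 1: format one element as text; Python prints every digit of an int.
full_text = str(doubled.iloc[0])
print(full_text)

# Option 2: temporarily remove the column-width limit used when displaying values.
with pd.option_context("display.max_colwidth", None):
    print(doubled)
```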



Answer 2:


As @JohnE explains in his answer, we do not lose any information when reading big numbers with pandas. They are stored as dtype=object; to do numerical computation on them, we need to convert the data to a numerical type.

For series:

Apply map(func) to the series in the dataframe and assign the result back:

df['columnName'] = df['columnName'].map(int)

Whole dataframe:

If, for some reason, the entire dataframe is composed of columns with dtype=object, we can use applymap(func).

From the pandas documentation:

DataFrame.applymap(func): Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame

So, to transform all columns in the dataframe:

df = df.applymap(int)
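
One caveat: `DataFrame.applymap` was deprecated in pandas 2.1 in favor of `DataFrame.map`. The sketch below picks whichever method the installed version provides. The column names `hash_a` and `hash_b` and the second value are made up for illustration.

```python
import pandas as pd

# Hypothetical frame in which every column holds big numbers as strings.
df = pd.DataFrame(
    {
        "hash_a": ["77241756221441762028881402092817125017724447303212139981668021711613168152184106"],
        "hash_b": ["123456789012345678901234567890123456789012345678901234567890"],
    },
    dtype=object,
)

# Use DataFrame.map where it exists (pandas 2.1+), otherwise fall back to applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
converted = elementwise(int)

print(converted.dtypes)
print(converted["hash_b"].iloc[0] + 1)
```

Every cell is now a Python int, and the columns keep dtype object, just as in the single-series case.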


Source: https://stackoverflow.com/questions/31373134/pandas-python-reading-and-working-on-java-biginteger-large-numbers
