start index at 1 for Pandas DataFrame

我的未来我决定 submitted on 2019-12-31 08:23:32

Question


I need the index to start at 1 rather than 0 when writing a Pandas DataFrame to CSV.

Here's an example:

In [1]: import pandas as pd

In [2]: result = pd.DataFrame({'Count': [83, 19, 20]})

In [3]: result.to_csv('result.csv', index_label='Event_id')                               

Which produces the following output:

In [4]: !cat result.csv
Event_id,Count
0,83
1,19
2,20

But my desired output is this:

In [5]: !cat result2.csv
Event_id,Count
1,83
2,19
3,20

I realize that this could be done by adding a sequence of integers shifted by 1 as a column to my data frame, but I'm new to Pandas and I'm wondering if a cleaner way exists.


Answer 1:


The index is an object, and the default index starts from 0:

>>> result.index
Int64Index([0, 1, 2], dtype=int64)

You can shift this index by 1 with

>>> result.index += 1 
>>> result.index
Int64Index([1, 2, 3], dtype=int64)
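
To tie this back to the question, here is a minimal sketch that shifts the index and then produces the CSV. It assumes the same `result` frame as in the question, and it calls `to_csv` without a path so that the CSV text comes back as a string instead of being written to disk; that choice is only for illustration.

```python
import pandas as pd

result = pd.DataFrame({'Count': [83, 19, 20]})

# Shift the default 0-based index so that the labels start at 1.
result.index += 1

# With no path argument, to_csv returns the CSV text as a string.
csv_text = result.to_csv(index_label='Event_id')
print(csv_text)
```

Passing a file name such as 'result.csv' as the first argument would write the same rows to disk.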



Answer 2:


Just set the index before writing to CSV (with NumPy imported as np):

df.index = np.arange(1, len(df) + 1)

And then write it normally.




Answer 3:


This worked for me:

df.index = np.arange(1, len(df)+1)
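
The `+ 1` on the stop value matters because `np.arange` excludes its stop. A small sketch, using a hypothetical three-row `df`, shows that leaving it out gives an array that is one element short, which pandas rejects with a ValueError:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Count': [83, 19, 20]})  # hypothetical example data

# np.arange(1, n) stops before n, so this has only len(df) - 1 labels.
too_short = np.arange(1, len(df))

try:
    df.index = too_short
    mismatch_rejected = False
except ValueError:
    # pandas refuses an index whose length differs from the number of rows.
    mismatch_rejected = True

# Including + 1 yields exactly one label per row.
df.index = np.arange(1, len(df) + 1)
print(mismatch_rejected, list(df.index))
```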



Answer 4:


Source: "In Python pandas, start row index from 1 instead of zero without creating additional column"

Working example:

import pandas as pd

# input_file is the path to the CSV file to read
dframe = pd.read_csv(input_file)
dframe.index = dframe.index + 1



Answer 5:


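Another way in one line, with a caveat: shift() moves the values down one position and [1:] then discards the first row, so the labels start at 1 but the last row of data is lost and integer columns become floats: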

df.shift()[1:]
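
It may be worth checking what this one-liner actually returns before relying on it. The sketch below, on a hypothetical three-row `df`, suggests it is not equivalent to the other answers: the result has labels 1 and 2 only, and the value 20 is gone.

```python
import pandas as pd

df = pd.DataFrame({'Count': [83, 19, 20]})  # hypothetical example data

# shift() moves values down by one row (the first row becomes NaN);
# [1:] then drops that first row by position.
shifted = df.shift()[1:]
print(shifted)

# Labels now start at 1, but only two of the three rows survive.
print(len(df), len(shifted))
```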



Answer 6:


You can use this one:

import pandas as pd

result = pd.DataFrame({'Count': [83, 19, 20]})
result.index += 1
print(result)

or this one, which uses the NumPy library:

import pandas as pd
import numpy as np

result = pd.DataFrame({'Count': [83, 19, 20]})
result.index = np.arange(1, len(result)+1)
print(result)

np.arange(1, len(result)+1) returns a NumPy array of the integers from 1 up to and including len(result) (the stop value itself is excluded), and that array is then assigned to result.index.




Answer 7:


Building on the original answer, here are my two cents:

  • if I'm not mistaken, in recent pandas versions (I checked on 0.23) the default index object is a RangeIndex

From the official doc:

RangeIndex is a memory-saving special case of Int64Index limited to representing monotonic ranges. Using RangeIndex may in some instances improve computing speed.

For a huge index range this makes sense: storing only the start, stop and step of the range saves memory compared with materializing every label at once.

Here is an example (using a Series, but it applies to a DataFrame as well):

>>> import pandas as pd
>>> 
>>> countries = ['China', 'India', 'USA']
>>> ds = pd.Series(countries)
>>> 
>>>
>>> type(ds.index)
<class 'pandas.core.indexes.range.RangeIndex'>
>>> ds.index
RangeIndex(start=0, stop=3, step=1)
>>> 
>>> ds.index += 1
>>> 
>>> ds.index
RangeIndex(start=1, stop=4, step=1)
>>> 
>>> ds
1    China
2    India
3      USA
dtype: object
>>> 

As you can see, incrementing the index object changes its start and stop parameters.
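
As a small check of this point, the following sketch compares the two approaches from the earlier answers. It assumes a pandas version in which the default index is a RangeIndex; the names `a` and `b` are illustrative only.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'Count': [83, 19, 20]})
b = pd.DataFrame({'Count': [83, 19, 20]})

# Incrementing keeps the compact RangeIndex representation.
a.index += 1

# Assigning a NumPy array stores every label explicitly instead.
b.index = np.arange(1, len(b) + 1)

print(type(a.index).__name__, type(b.index).__name__)
print(list(a.index) == list(b.index))
```

Both frames end up with the same labels, so for writing a small CSV the difference is unlikely to matter; it is mainly relevant for very large frames.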



Source: https://stackoverflow.com/questions/20167930/start-index-at-1-for-pandas-dataframe
