Question:
I have a DataFrame using pandas and column labels that I need to edit to replace the original column labels.
I'd like to change the column names in a DataFrame A
where the original column names are:
['$a', '$b', '$c', '$d', '$e']
to
['a', 'b', 'c', 'd', 'e'].
I have the edited column names stored in a list, but I don't know how to replace the column names.
Answer 1:
Just assign the new list to the .columns attribute:
>>> df = pd.DataFrame({'$a': [1, 2], '$b': [10, 20]})
>>> df.columns = ['a', 'b']
>>> df
   a   b
0  1  10
1  2  20
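One thing worth knowing about direct assignment is that the new list must have exactly one label per column. The following is a minimal sketch of that behavior; the variable name length_error is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [10, 20]})

# Direct assignment replaces every label, matched by position.
df.columns = ['a', 'b']

# A list of the wrong length is rejected rather than silently truncated.
try:
    df.columns = ['a', 'b', 'c']
    length_error = None
except ValueError as exc:
    length_error = exc
```

The failed assignment leaves the existing labels untouched.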
Answer 2:
Use the df.rename() method and specify the columns to be renamed. Not all the columns have to be renamed:
df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
# Or rename the existing DataFrame in place (rather than creating a copy)
df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)
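To illustrate the point that not every column has to be listed, here is a small sketch on a made-up three-column frame. The key 'missing' is a hypothetical name that matches no column; with rename's default settings it is simply skipped.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [10, 20], '$c': [5, 6]})

# Only '$a' is listed, so '$b' and '$c' keep their labels.
# 'missing' matches no column and is skipped by default.
renamed = df.rename(columns={'$a': 'a', 'missing': 'x'})
```

Because inplace is not set, df itself keeps its original labels and only renamed carries the new one.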
Answer 3:
The rename method can take a function, for example:
In [11]: df.columns
Out[11]: Index([u'$a', u'$b', u'$c', u'$d', u'$e'], dtype=object)

In [12]: df.rename(columns=lambda x: x[1:], inplace=True)

In [13]: df.columns
Out[13]: Index([u'a', u'b', u'c', u'd', u'e'], dtype=object)
Answer 4:
Since you only want to remove the $ sign from all column names, you could just do:
df = df.rename(columns=lambda x: x.replace('$', ''))
or
df.rename(columns=lambda x: x.replace('$', ''), inplace=True)
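Answer 5:
As documented in http://pandas.pydata.org/pandas-docs/stable/text.html:
df.columns = df.columns.str.replace('$', '', regex=False)
(Passing regex=False makes sure '$' is treated as a literal character rather than as the regular-expression end-of-string anchor.)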
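Since '$' has a special meaning in regular expressions, it can help to be explicit about how the pattern is interpreted. This is a sketch assuming a pandas version where str.replace accepts the regex argument; the names literal and escaped are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [10, 20]})

# regex=False treats '$' as a literal character, not the end-of-string anchor.
literal = df.columns.str.replace('$', '', regex=False)

# The equivalent regular-expression form escapes the dollar sign.
escaped = df.columns.str.replace(r'\$', '', regex=True)
```

Both calls return a new Index; assign the result back to df.columns to apply it.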
Answer 6:
df.columns = ['a', 'b', 'c', 'd', 'e']
It will replace the existing names with the names you provide, in the order you provide.
You can also assign them by index like this:
df.columns.values[2] = 'c'  # renames the third column (index 2) to 'c'
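Writing into .values goes around the Index object, which itself does not allow item assignment. A possibly safer sketch for changing a single label by position is to copy the labels into a list, edit the list, and assign it back; index_error and labels are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [3, 4], '$c': [5, 6]})

# Index objects do not support item assignment.
try:
    df.columns[2] = 'c'
    index_error = None
except TypeError as exc:
    index_error = exc

# Build a new list of labels instead and assign it in one step.
labels = list(df.columns)
labels[2] = 'c'
df.columns = labels
```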
Answer 7:
Pandas 0.21+ Answer
There have been some significant updates to column renaming in version 0.21.
- The rename method has added the axis parameter, which may be set to columns or 1. This update makes this method match the rest of the pandas API. It still has the index and columns parameters, but you are no longer forced to use them.
- The set_axis method with inplace set to False enables you to rename all the index or column labels with a list.
Examples for Pandas 0.21+
Construct sample DataFrame:
df = pd.DataFrame({'$a':[1,2], '$b': [3,4], '$c':[5,6], '$d':[7,8], '$e':[9,10]})

   $a  $b  $c  $d  $e
0   1   3   5   7   9
1   2   4   6   8  10

Using rename with axis='columns' or axis=1
df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis='columns')
or
df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis=1)
Both result in the following:
   a  b  c  d   e
0  1  3  5  7   9
1  2  4  6  8  10
It is still possible to use the old method signature:
df.rename(columns={'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'})
The rename method also accepts functions that will be applied to each column name.
df.rename(lambda x: x[1:], axis='columns')
or
df.rename(lambda x: x[1:], axis=1)
Using set_axis with a list and inplace=False
You can supply a list to the set_axis method that is equal in length to the number of columns (or index). In pandas 0.21, inplace defaults to True, with a change to False announced for future releases. Later releases made that change, and pandas 2.0 removed the inplace parameter altogether, so leave it out there.
df.set_axis(['a', 'b', 'c', 'd', 'e'], axis='columns', inplace=False)
or
df.set_axis(['a', 'b', 'c', 'd', 'e'], axis=1, inplace=False)
Why not use df.columns = ['a', 'b', 'c', 'd', 'e']?
There is nothing wrong with assigning columns directly like this. It is a perfectly good solution.
The advantage of using set_axis is that it can be used as part of a method chain and that it returns a new copy of the DataFrame. Without it, you would have to store the intermediate result of the chain in another variable before reassigning the columns.
# new for pandas 0.21+
(df.some_method1()
   .some_method2()
   .set_axis()
   .some_method3())

# old way
df1 = df.some_method1().some_method2()
df1.columns = columns
df1.some_method3()
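The placeholder chain above can be made concrete. This is only a sketch: astype and assign stand in for whatever steps come before and after the rename, and it assumes pandas 1.0 or later, where set_axis returns a new DataFrame by default.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [3, 4]})

# Rename in the middle of a chain, with no intermediate variable.
result = (
    df.astype('float64')
      .set_axis(['a', 'b'], axis='columns')
      .assign(total=lambda d: d['a'] + d['b'])
)
```

The original df is left as it was, which is what makes this style convenient in pipelines.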
Answer 8:
old_names = ['$a', '$b', '$c', '$d', '$e']
new_names = ['a', 'b', 'c', 'd', 'e']
df.rename(columns=dict(zip(old_names, new_names)), inplace=True)
This way you can manually edit new_names as you wish. It works great when you need to rename only a few columns to correct misspellings, fix accents, remove special characters, etc.
Answer 9:
Column names vs Names of Series
I would like to explain a bit what happens behind the scenes.
DataFrames are a collection of Series.
Each Series in turn wraps a numpy.array and adds a .name property.
This is the name of the Series. Pandas seldom relies on this attribute, but it lingers in places and can be used to hack some pandas behaviors.
Naming the list of columns
A lot of answers here talk about the df.columns attribute being a list, when in fact it is an Index. Like a Series, an Index has a .name attribute.
This is what happens if you decide to fill in the name of the columns Index:
df.columns = ['column_one', 'column_two']
df.columns.names = ['name of the list of columns']
df.index.names = ['name of the index']

name of the list of columns  column_one  column_two
name of the index
0                                     4           1
1                                     5           2
2                                     6           3

Note that the name of the index always appears one row lower than the name of the columns.
Artifacts that linger
The .name attribute lingers on sometimes. If you set df.columns = ['one', 'two'], then df.one.name will be 'one'.
If you set df.one.name = 'three', then df.columns will still give you ['one', 'two'], and df.one.name will give you 'three'.
BUT
pd.DataFrame(df.one)
will return
   three
0      1
1      2
2      3
because pandas reuses the .name of the already defined Series.
Multi level column names
Pandas has ways of doing multi-level column names. There is not so much magic involved, but I wanted to cover this in my answer too since I don't see anyone picking up on this here.
    |one  |
    |one  |two  |
0   |  4  |  1  |
1   |  5  |  2  |
2   |  6  |  3  |
This is easily achievable by setting columns to lists, like this:
df.columns = [['one', 'one'], ['one', 'two']]
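As a rough sketch of what that assignment produces: pandas reads a list of two lists as one list of labels per level and builds a MultiIndex. The starting labels 'x' and 'y' and the new label 'second' are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'x': [4, 5, 6], 'y': [1, 2, 3]})

# A list of two lists is read as one list of labels per level.
df.columns = [['one', 'one'], ['one', 'two']]

# rename can target a single level through its level argument.
renamed = df.rename(columns={'two': 'second'}, level=1)
```

Each column label is now a tuple such as ('one', 'two'), which is why plain single-level renaming needs the level argument here.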
Answer 10:
One line or Pipeline solutions
I'll focus on two things:
First, OP clearly states:
"I have the edited column names stored in a list, but I don't know how to replace the column names."
I do not want to solve the problem of how to replace '$' or strip the first character off of each column header. OP has already done this step. Instead I want to focus on replacing the existing columns object with a new one given a list of replacement column names.
Second, df.columns = new, where new is the list of new column names, is as simple as it gets. The drawback of this approach is that it requires editing the existing dataframe's columns attribute and it isn't done inline. I'll show a few ways to perform this via pipelining without editing the existing dataframe.
Setup 1
To focus on the need to rename or replace column names with a pre-existing list, I'll create a new sample dataframe df with initial column names and unrelated new column names.
df = pd.DataFrame({'Jack': [1, 2], 'Mahesh': [3, 4], 'Xin': [5, 6]})
new = ['x098', 'y765', 'z432']

df

   Jack  Mahesh  Xin
0     1       3    5
1     2       4    6
Solution 1
pd.DataFrame.rename
It has been said already that if you had a dictionary mapping the old column names to new column names, you could use pd.DataFrame.rename.
d = {'Jack': 'x098', 'Mahesh': 'y765', 'Xin': 'z432'}
df.rename(columns=d)

   x098  y765  z432
0     1     3     5
1     2     4     6

However, you can easily create that dictionary and include it in the call to rename. The following takes advantage of the fact that when iterating over df, we iterate over each column name.
# given just a list of new column names
df.rename(columns=dict(zip(df, new)))

   x098  y765  z432
0     1     3     5
1     2     4     6
This works great if your original column names are unique. But if they are not, then this breaks down.
Setup 2
non-unique columns
df = pd.DataFrame(
    [[1, 3, 5], [2, 4, 6]],
    columns=['Mahesh', 'Mahesh', 'Xin']
)
new = ['x098', 'y765', 'z432']

df

   Mahesh  Mahesh  Xin
0       1       3    5
1       2       4    6
Solution 2
pd.concat using the keys argument
First, notice what happens when we attempt to use solution 1:
df.rename(columns=dict(zip(df, new)))

   y765  y765  z432
0     1     3     5
1     2     4     6

We didn't map the new list as the column names. We ended up repeating y765. Instead, we can use the keys argument of the pd.concat function while iterating through the columns of df.
pd.concat([c for _, c in df.items()], axis=1, keys=new)

   x098  y765  z432
0     1     3     5
1     2     4     6
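To see why the dictionary approach repeats y765, and as one possible alternative to pd.concat, here is a sketch using set_axis, which matches labels by position. It assumes pandas 1.0 or later, where set_axis returns a new DataFrame by default; mapping and renamed are illustrative names.

```python
import pandas as pd

df = pd.DataFrame([[1, 3, 5], [2, 4, 6]], columns=['Mahesh', 'Mahesh', 'Xin'])
new = ['x098', 'y765', 'z432']

# A dict built from duplicate keys keeps only the last value per key,
# so 'x098' is lost before rename ever sees it.
mapping = dict(zip(df, new))

# set_axis matches labels by position, so duplicates are not a problem.
renamed = df.set_axis(new, axis=1)
```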
Solution 3
Reconstruct. This should only be used if you have a single dtype for all columns. Otherwise, you'll end up with dtype object for all columns, and converting them back requires more dictionary work.
Single dtype
pd.DataFrame(df.values, df.index, new)

   x098  y765  z432
0     1     3     5
1     2     4     6

Mixed dtype
pd.DataFrame(df.values, df.index, new).astype(dict(zip(new, df.dtypes)))

   x098  y765  z432
0     1     3     5
1     2     4     6
Solution 4
This is a gimmicky trick with transpose and set_index. pd.DataFrame.set_index allows us to set an index inline, but there is no corresponding set_columns. So we can transpose, then set_index, and transpose back. However, the same single dtype versus mixed dtype caveat from solution 3 applies here.
Single dtype
df.T.set_index(np.asarray(new)).T

   x098  y765  z432
0     1     3     5
1     2     4     6

Mixed dtype
df.T.set_index(np.asarray(new)).T.astype(dict(zip(new, df.dtypes)))

   x098  y765  z432
0     1     3     5
1     2     4     6
Solution 5
Use a lambda in pd.DataFrame.rename that cycles through each element of new
In this solution, we pass a lambda that takes x but then ignores it. It also takes a y but doesn't expect it. Instead, an iterator is given as a default value and I can then use that to cycle through one at a time without regard to what the value of x is.
df.rename(columns=lambda x, y=iter(new): next(y))

   x098  y765  z432
0     1     3     5
1     2     4     6

And as pointed out to me by the folks in sopython chat, if I add a * in between x and y, I can protect my y variable. Though, in this context I don't believe it needs protecting. It is still worth mentioning.
df.rename(columns=lambda x, *, y=iter(new): next(y))

   x098  y765  z432
0     1     3     5
1     2     4     6
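The trick relies on a default argument being evaluated once, when the lambda is defined. The sketch below shows that mechanism in plain Python, without pandas; relabel, first_pass and exhausted are names made up for the illustration.

```python
new = ['x098', 'y765', 'z432']

# The default value iter(new) is created once, when the lambda is defined.
relabel = lambda x, y=iter(new): next(y)

# Each call ignores x and pulls the next replacement label.
first_pass = [relabel(col) for col in ['Jack', 'Mahesh', 'Xin']]

# The shared iterator is now exhausted, so the lambda cannot be reused.
try:
    relabel('Jack')
    exhausted = False
except StopIteration:
    exhausted = True
```

This is also why the lambda has to be created fresh for each rename call.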
Answer 11:
If you've got the dataframe, df.columns dumps everything into a list you can manipulate and then reassign into your dataframe as the names of columns...
old_columns = df.columns
new_columns = [col.replace("$", "") for col in old_columns]
df.rename(columns=dict(zip(old_columns, new_columns)), inplace=True)
df.head()  # to validate the output
Best way? IDK. A way - yes.
A better way of evaluating all the main techniques put forward in the answers to the question is below, using cProfile to gauge execution time. @kadee, @kaitlyn, & @eumiro had the functions with the fastest execution times, though these functions are so fast we're comparing the rounding of .000 and .001 seconds for all the answers. Moral: my answer above likely isn't the 'Best' way.
import cProfile
import pandas as pd

old_names = ['$a', '$b', '$c', '$d', '$e']
new_names = ['a', 'b', 'c', 'd', 'e']
col_dict = {'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}

df = pd.DataFrame({'$a': [1, 2], '$b': [10, 20], '$c': ['bleep', 'blorp'], '$d': [1, 2], '$e': ['texa$', '']})
df.head()

# Note: several of these functions rename df in place,
# so later calls see columns that have already been renamed.

def eumiro(df, nn):
    # This direct renaming approach is duplicated in several other answers.
    df.columns = nn
    return df

def lexual1(df):
    return df.rename(columns=col_dict)

def lexual2(df, col_dict):
    return df.rename(columns=col_dict, inplace=True)

def Panda_Master_Hayden(df):
    return df.rename(columns=lambda x: x[1:], inplace=True)

def paulo1(df):
    return df.rename(columns=lambda x: x.replace('$', ''))

def paulo2(df):
    return df.rename(columns=lambda x: x.replace('$', ''), inplace=True)

def migloo(df, on, nn):
    return df.rename(columns=dict(zip(on, nn)), inplace=True)

def kadee(df):
    return df.columns.str.replace('$', '', regex=False)

def awo(df):
    old = df.columns
    new = [col.replace("$", "") for col in old]
    return df.rename(columns=dict(zip(old, new)), inplace=True)

def kaitlyn(df):
    df.columns = [col.strip('$') for col in df.columns]
    return df

print('eumiro')
cProfile.run('eumiro(df, new_names)')
print('lexual1')
cProfile.run('lexual1(df)')
print('lexual2')
cProfile.run('lexual2(df, col_dict)')
print('andy hayden')
cProfile.run('Panda_Master_Hayden(df)')
print('paulo1')
cProfile.run('paulo1(df)')
print('paulo2')
cProfile.run('paulo2(df)')
print('migloo')
cProfile.run('migloo(df, old_names, new_names)')
print('kadee')
cProfile.run('kadee(df)')
print('awo')
cProfile.run('awo(df)')
print('kaitlyn')
cProfile.run('kaitlyn(df)')
Answer 12:
df = pd.DataFrame({'$a': [1], '$b': [1], '$c': [1], '$d': [1], '$e': [1]})
If your new list of columns is in the same order as the existing columns, the assignment is simple:
new_cols = ['a', 'b', 'c', 'd', 'e']
df.columns = new_cols
>>> df
   a  b  c  d  e
0  1  1  1  1  1
If you had a dictionary keyed on old column names to new column names, you could do the following:
d = {'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}
df.columns = df.columns.map(lambda col: d[col])  # Or `.map(d.get)` as pointed out by @PiRSquared.
>>> df
   a  b  c  d  e
0  1  1  1  1  1
If you don't have a list or dictionary mapping, you could strip the leading $ symbol via a list comprehension:
df.columns = [col[1:] if col[0] == '$' else col for col in df]
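The dictionary lookup above assumes every existing label is a key in d. A sketch of a more forgiving variant, for the case where the mapping covers only some columns, falls back to the original label; the frame and the partial dictionary here are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1], '$b': [1], 'c': [1]})
d = {'$a': 'a', '$b': 'b'}

# d[col] would raise KeyError for 'c'; d.get(col, col) keeps unmapped labels.
df.columns = [d.get(col, col) for col in df.columns]
```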
Answer 13:
For a DataFrame, df.rename() will work.
df = df.rename(columns={'Old Name': 'New Name'})
Here df is your DataFrame, Old Name is the column name you want to change, and New Name is the name you want to change it to. This built-in DataFrame method makes things much easier.
Answer 14:
Another way we could replace the original column labels is by stripping the unwanted characters (here '$') from the original column labels.
This could have been done by running a for loop over df.columns and collecting the stripped labels in a new list.
Instead, we can do this neatly in a single statement by using a list comprehension like below:
df.columns = [col.strip('$') for col in df.columns]
(The strip method in Python strips the given character from the beginning and end of the string.)
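Because strip works on both ends, a label that legitimately ends in '$' would lose that character too. The following sketch, on made-up labels, contrasts it with lstrip, which only touches the beginning.

```python
cols = ['$a', '$b$', 'c']

# strip removes the character from both ends of each label.
both_ends = [col.strip('$') for col in cols]

# lstrip removes it from the beginning only.
leading_only = [col.lstrip('$') for col in cols]
```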
Answer 15:
Real simple, just use
df.columns = ['Name1', 'Name2', 'Name3'...]
and it will assign the column names in the order you put them.
Answer 16:
You could use str.slice for that:
df.columns = df.columns.str.slice(1)
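Note that slicing drops the first character of every label, whether or not it is a '$'. A small sketch on a made-up frame with one unprefixed column shows the difference from a guarded version; sliced and guarded are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1], 'b': [2]})

# slice(1) is unconditional: 'b' loses its only character.
sliced = df.columns.str.slice(1)

# Checking for the prefix first leaves unprefixed labels alone.
guarded = [col[1:] if col.startswith('$') else col for col in df.columns]
```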
Answer 17:
I know this question and its answers have been chewed to death, but I referred to it for inspiration for one of the problems I was having. I was able to solve it using bits and pieces from different answers, hence I am providing my response in case anyone needs it.
My method is generic: you can add additional delimiters by comma-separating them in the delimiters= variable, which future-proofs it.
Working Code:
import pandas as pd
import re

df = pd.DataFrame({'$a':[1,2], '$b': [3,4],'$c':[5,6], '$d': [7,8], '$e': [9,10]})
delimiters = '$'
matchPattern = '|'.join(map(re.escape, delimiters))
df.columns = [re.split(matchPattern, i)[1] for i in df.columns]
Output (before and after):
>>> df
   $a  $b  $c  $d  $e
0   1   3   5   7   9
1   2   4   6   8  10
>>> df
   a  b  c  d   e
0  1  3  5  7   9
1  2  4  6  8  10
Answer 18:
Note that these approaches do not work for a MultiIndex. For a MultiIndex, you need to do something like the following:
>>><