Renaming columns in pandas
Question
I have a pandas DataFrame whose column labels I need to edit, replacing the original labels.
I'd like to change the column names in a DataFrame A, where the original column names are:
['$a', '$b', '$c', '$d', '$e']
to
['a', 'b', 'c', 'd', 'e'].
I have the edited column names stored in a list, but I don't know how to replace the column names.
Accepted Answer
Just assign the list to the .columns attribute:
>>> df = pd.DataFrame({'$a':[1,2], '$b': [10,20]})
>>> df.columns = ['a', 'b']
>>> df
   a   b
0  1  10
1  2  20
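A slightly fuller sketch, assuming the edited names are already in a list as described in the question. The list is matched to the columns by position, and it must contain exactly one name per column; otherwise pandas raises a ValueError and leaves the labels unchanged.

```python
import pandas as pd

# Sample frame with the question's original labels
df = pd.DataFrame({'$a': [1, 2], '$b': [3, 4], '$c': [5, 6],
                   '$d': [7, 8], '$e': [9, 10]})

new_names = ['a', 'b', 'c', 'd', 'e']  # edited names, in column order
df.columns = new_names
print(list(df.columns))  # ['a', 'b', 'c', 'd', 'e']

# Supplying the wrong number of names is rejected
try:
    df.columns = ['a', 'b']
except ValueError as exc:
    print("ValueError:", exc)
```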
Popular Answer
RENAME SPECIFIC COLUMNS
Use the df.rename() method and pass a dictionary that maps old column names to new ones. Not all the columns have to be renamed:
df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
# Or rename the existing DataFrame (rather than creating a copy)
df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)
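A small sketch showing that columns left out of the mapping keep their names (the column names here are placeholders). With the default errors='ignore', a key that matches no existing column is skipped silently rather than reported.

```python
import pandas as pd

df = pd.DataFrame({'oldName1': [1], 'oldName2': [2], 'other': [3]})

# 'other' is not in the mapping, so it is left alone;
# 'missing' matches no column and is ignored by default
renamed = df.rename(columns={'oldName1': 'newName1',
                             'oldName2': 'newName2',
                             'missing': 'x'})
print(list(renamed.columns))  # ['newName1', 'newName2', 'other']
print(list(df.columns))       # ['oldName1', 'oldName2', 'other'] (original untouched)
```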
Minimal Code Example
df = pd.DataFrame('x', index=range(3), columns=list('abcde'))
df
   a  b  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x
The following methods all work and produce the same output:
df2 = df.rename({'a': 'X', 'b': 'Y'}, axis=1) # new method
df2 = df.rename({'a': 'X', 'b': 'Y'}, axis='columns')
df2 = df.rename(columns={'a': 'X', 'b': 'Y'}) # old method
df2
   X  Y  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x
Remember to assign the result back, because the modification is not in-place. Alternatively, specify inplace=True:
df.rename({'a': 'X', 'b': 'Y'}, axis=1, inplace=True)
df
   X  Y  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x
From v0.25, you can also specify errors='raise' to raise an error if the mapping names a column that does not exist. See the v0.25 rename() docs.
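A minimal sketch of errors='raise' catching a typo in the mapping ('zz' is a deliberately non-existent column). pandas reports the unknown label as a KeyError, and the original DataFrame is not modified.

```python
import pandas as pd

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))

caught = None
try:
    # 'zz' is not a column, so this raises instead of being silently ignored
    df.rename(columns={'a': 'X', 'zz': 'Y'}, errors='raise')
except KeyError as exc:
    caught = exc

print(type(caught).__name__)  # KeyError
```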
REASSIGN COLUMN HEADERS
Use df.set_axis() with axis=1 and inplace=False (to return a copy).
df2 = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1, inplace=False)
df2
   V  W  X  Y  Z
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x
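This returns a copy, but you can modify the DataFrame in-place by setting inplace=True (this was the default behaviour in older versions). Note that the default changed to inplace=False in pandas 1.0, and the inplace parameter was removed from set_axis in pandas 2.0, so on current versions omit it and reassign the result instead.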
You can also assign headers directly:
df.columns = ['V', 'W', 'X', 'Y', 'Z']
df
   V  W  X  Y  Z
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x
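A sketch of set_axis that avoids the inplace keyword entirely, so it should run on pandas 1.0 and later: set_axis returns a new DataFrame, and you reassign it when you want the change to stick.

```python
import pandas as pd

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))

# set_axis returns a new DataFrame; the original keeps its labels
df2 = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1)
print(list(df2.columns))  # ['V', 'W', 'X', 'Y', 'Z']
print(list(df.columns))   # ['a', 'b', 'c', 'd', 'e']

# To make the change "stick", reassign the result
df = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1)
```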
The rename method can take a function, for example:
In [11]: df.columns
Out[11]: Index([u'$a', u'$b', u'$c', u'$d', u'$e'], dtype=object)
In [12]: df.rename(columns=lambda x: x[1:], inplace=True)
In [13]: df.columns
Out[13]: Index([u'a', u'b', u'c', u'd', u'e'], dtype=object)
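The lambda above assumes every label starts with '$', because x[1:] drops the first character unconditionally. If some labels might not carry the prefix, a slightly more defensive function can be passed instead. The helper strip_dollar is hypothetical and written only for this illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=['$a', '$b', 'c'])

def strip_dollar(name):
    """Hypothetical helper: drop a leading '$', leave other names unchanged."""
    return name[1:] if name.startswith('$') else name

# The callable is applied to every column label
print(list(df.rename(columns=strip_dollar).columns))     # ['a', 'b', 'c']
# x[1:] would also chop the first character of names without '$'
print(list(df.rename(columns=lambda x: x[1:]).columns))  # ['a', 'b', '']
```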
Pandas 0.21+ Answer
There have been some significant updates to column renaming in version 0.21.
- The rename method has added the axis parameter, which may be set to columns or 1. This update makes this method match the rest of the pandas API. It still has the index and columns parameters, but you are no longer forced to use them.
- The set_axis method with inplace set to False enables you to rename all the index or column labels with a list.
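Since axis selects which set of labels is targeted, the same two methods also work on the row index. A small sketch (the labels 'first', 'second', 'r1' and 'r2' are arbitrary). set_axis is called without inplace, so it returns a copy on pandas 1.0 and later.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [3, 4]})

# axis='columns' (or 1) targets column labels ...
by_cols = df.rename({'$a': 'a'}, axis='columns')
# ... while axis='index' (or 0) targets row labels
by_rows = df.rename({0: 'first', 1: 'second'}, axis='index')

# set_axis takes a full list of labels for whichever axis you choose
relabelled = df.set_axis(['r1', 'r2'], axis=0)

print(list(by_cols.columns))   # ['a', '$b']
print(list(by_rows.index))     # ['first', 'second']
print(list(relabelled.index))  # ['r1', 'r2']
```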
Examples for Pandas 0.21+
Construct sample DataFrame:
df = pd.DataFrame({'$a':[1,2], '$b': [3,4],
'$c':[5,6], '$d':[7,8],
'$e':[9,10]})
   $a  $b  $c  $d  $e
0   1   3   5   7   9
1   2   4   6   8  10
Using rename with axis='columns' or axis=1
df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis='columns')
or
df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis=1)
Both result in the following:
   a  b  c  d   e
0  1  3  5  7   9
1  2  4  6  8  10
It is still possible to use the old method signature:
df.rename(columns={'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'})
The rename method also accepts a function, which is applied to each column name.
df.rename(lambda x: x[1:], axis='columns')
or
df.rename(lambda x: x[1:], axis=1)
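If the edited names are already in a list, as in the question, one way to hand them to rename is to pair old and new labels with zip and pass the resulting dictionary. This is a sketch under that assumption.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [3, 4], '$c': [5, 6],
                   '$d': [7, 8], '$e': [9, 10]})
new_names = ['a', 'b', 'c', 'd', 'e']

# Pair each existing label with its replacement, position by position
mapping = dict(zip(df.columns, new_names))
renamed = df.rename(mapping, axis='columns')
print(list(renamed.columns))  # ['a', 'b', 'c', 'd', 'e']
```

One difference from assigning df.columns directly: zip stops at the shorter input, so a list that is too short leaves the trailing columns with their old names instead of raising an error.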
Using set_axis with a list and inplace=False
You can supply a list to the set_axis method whose length equals the number of columns (or index labels). In pandas 0.21, inplace defaults to True, but it will default to False in future releases.
df.set_axis(['a', 'b', 'c', 'd', 'e'], axis='columns', inplace=False)
or
df.set_axis(['a', 'b', 'c', 'd', 'e'], axis=1, inplace=False)
Why not use df.columns = ['a', 'b', 'c', 'd', 'e']?
There is nothing wrong with assigning columns directly like this. It is a perfectly good solution.
The advantage of using set_axis is that it returns a new copy of the DataFrame, so it can be used as part of a method chain. Without it, you would have to store the intermediate steps of the chain in another variable before reassigning the columns.
# new for pandas 0.21+ (some_method1/2/3 are placeholders)
(df.some_method1()
   .some_method2()
   .set_axis(columns, axis=1)  # on 0.21-0.25, also pass inplace=False
   .some_method3())

# old way
df1 = (df.some_method1()
         .some_method2())
df1.columns = columns
df1.some_method3()
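A concrete, runnable version of the chained style. The choice of drop and assign to stand in for the placeholder calls is only for illustration, and set_axis is called without inplace, so this assumes pandas 1.0 or later.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [3, 4], '$c': [5, 6],
                   '$d': [7, 8], '$e': [9, 10]})

# One expression: drop a column, relabel the rest, keep working on the result
result = (
    df.drop(columns='$e')
      .set_axis(['a', 'b', 'c', 'd'], axis=1)  # returns a new DataFrame
      .assign(total=lambda d: d.sum(axis=1))    # new names usable immediately
)
print(result)
```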
Since you only want to remove the $ sign in all column names, you could just do:
df = df.rename(columns=lambda x: x.replace('$', ''))
OR
df.rename(columns=lambda x: x.replace('$', ''), inplace=True)
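A vectorised alternative is to call str.replace on the column Index itself. Passing regex=False is important here, because as a regular expression '$' would match the end of each string rather than a literal dollar sign.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1], '$b': [2]})

# Replace the literal '$' in every column label at once
df.columns = df.columns.str.replace('$', '', regex=False)
print(list(df.columns))  # ['a', 'b']
```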
df.columns = ['a', 'b', 'c', 'd', 'e']
It will replace the existing names with the names you provide, in the order you provide.