Advertisement
Advertisement


Remap values in pandas column with a dict


Question

I have a dictionary which looks like this: di = {1: "A", 2: "B"}

I would like to apply it to the "col1" column of a dataframe similar to:

     col1   col2
0       w      a
1       1      2
2       2    NaN

to get:

     col1   col2
0       w      a
1       A      2
2       B    NaN

How can I best do this? For some reason googling terms relating to this only shows me links about how to make columns from dicts and vice-versa :-/

2013/12/01
1
343
12/1/2013 4:58:41 AM

Accepted Answer

You can use .replace. For example:

>>> df = pd.DataFrame({'col2': {0: 'a', 1: 2, 2: np.nan}, 'col1': {0: 'w', 1: 1, 2: 2}})
>>> di = {1: "A", 2: "B"}
>>> df
  col1 col2
0    w    a
1    1    2
2    2  NaN
>>> df.replace({"col1": di})
  col1 col2
0    w    a
1    A    2
2    B  NaN

or directly on the Series, i.e. df["col1"].replace(di, inplace=True).

2018/10/03
364
10/3/2018 8:31:34 AM


There is a bit of ambiguity in your question. There are at least three two interpretations:

  1. the keys in di refer to index values
  2. the keys in di refer to df['col1'] values
  3. the keys in di refer to index locations (not the OP's question, but thrown in for fun.)

Below is a solution for each case.


Case 1: If the keys of di are meant to refer to index values, then you could use the update method:

df['col1'].update(pd.Series(di))

For example,

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1':['w', 10, 20],
                   'col2': ['a', 30, np.nan]},
                  index=[1,2,0])
#   col1 col2
# 1    w    a
# 2   10   30
# 0   20  NaN

di = {0: "A", 2: "B"}

# The value at the 0-index is mapped to 'A', the value at the 2-index is mapped to 'B'
df['col1'].update(pd.Series(di))
print(df)

yields

  col1 col2
1    w    a
2    B   30
0    A  NaN

I've modified the values from your original post so it is clearer what update is doing. Note how the keys in di are associated with index values. The order of the index values -- that is, the index locations -- does not matter.


Case 2: If the keys in di refer to df['col1'] values, then @DanAllan and @DSM show how to achieve this with replace:

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1':['w', 10, 20],
                   'col2': ['a', 30, np.nan]},
                  index=[1,2,0])
print(df)
#   col1 col2
# 1    w    a
# 2   10   30
# 0   20  NaN

di = {10: "A", 20: "B"}

# The values 10 and 20 are replaced by 'A' and 'B'
df['col1'].replace(di, inplace=True)
print(df)

yields

  col1 col2
1    w    a
2    A   30
0    B  NaN

Note how in this case the keys in di were changed to match values in df['col1'].


Case 3: If the keys in di refer to index locations, then you could use

df['col1'].put(di.keys(), di.values())

since

df = pd.DataFrame({'col1':['w', 10, 20],
                   'col2': ['a', 30, np.nan]},
                  index=[1,2,0])
di = {0: "A", 2: "B"}

# The values at the 0 and 2 index locations are replaced by 'A' and 'B'
df['col1'].put(di.keys(), di.values())
print(df)

yields

  col1 col2
1    A    a
2   10   30
0    B  NaN

Here, the first and third rows were altered, because the keys in di are 0 and 2, which with Python's 0-based indexing refer to the first and third locations.

2013/11/27

DSM has the accepted answer, but the coding doesn't seem to work for everyone. Here is one that works with the current version of pandas (0.23.4 as of 8/2018):

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 2, 3, 1],
            'col2': ['negative', 'positive', 'neutral', 'neutral', 'positive']})

conversion_dict = {'negative': -1, 'neutral': 0, 'positive': 1}
df['converted_column'] = df['col2'].replace(conversion_dict)

print(df.head())

You'll see it looks like:

   col1      col2  converted_column
0     1  negative                -1
1     2  positive                 1
2     2   neutral                 0
3     3   neutral                 0
4     1  positive                 1

The docs for pandas.DataFrame.replace are here.

2018/09/02

Adding to this question if you ever have more than one columns to remap in a data dataframe:

def remap(data,dict_labels):
    """
    This function take in a dictionnary of labels : dict_labels 
    and replace the values (previously labelencode) into the string.

    ex: dict_labels = {{'col1':{1:'A',2:'B'}}

    """
    for field,values in dict_labels.items():
        print("I am remapping %s"%field)
        data.replace({field:values},inplace=True)
    print("DONE")

    return data

Hope it can be useful to someone.

Cheers

2017/12/06

Or do apply:

df['col1'].apply(lambda x: {1: "A", 2: "B"}.get(x,x))

Demo:

>>> df['col1']=df['col1'].apply(lambda x: {1: "A", 2: "B"}.get(x,x))
>>> df
  col1 col2
0    w    a
1    1    2
2    2  NaN
>>> 
2018/09/16

Given map is faster than replace (@JohnE's solution) you need to be careful with Non-Exhaustive mappings where you intend to map specific values to NaN. The proper method in this case requires that you mask the Series when you .fillna, else you undo the mapping to NaN.

import pandas as pd
import numpy as np

d = {'m': 'Male', 'f': 'Female', 'missing': np.NaN}
df = pd.DataFrame({'gender': ['m', 'f', 'missing', 'Male', 'U']})

keep_nan = [k for k,v in d.items() if pd.isnull(v)]
s = df['gender']

df['mapped'] = s.map(d).fillna(s.mask(s.isin(keep_nan)))

    gender  mapped
0        m    Male
1        f  Female
2  missing     NaN
3     Male    Male
4        U       U
2020/05/05