
## Difference between map, applymap and apply methods in Pandas

### Question

Can you tell me when to use these vectorization methods with basic examples?

I see that `map` is a `Series` method whereas the rest are `DataFrame` methods. I got confused about `apply` and `applymap` methods though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples which illustrate the usage would be great!

2019/01/20

### Accepted Answer

Straight from Wes McKinney's *Python for Data Analysis*, p. 132 (I highly recommend this book):

Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame’s apply method does exactly this:

``````
In : frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In : frame
Out:
               b         d         e
Utah   -0.029638  1.081563  1.280300
Ohio    0.647747  0.831136 -1.549481
Texas   0.513416 -0.884417  0.195343
Oregon -0.485454 -0.477388 -0.309548

In : f = lambda x: x.max() - x.min()

In : frame.apply(f)
Out:
b    1.133201
d    1.965980
e    2.829781
dtype: float64
``````

Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.
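As a minimal sketch of that point, the snippet below computes the same range statistic three ways: column-wise with `apply`, row-wise with `apply(..., axis=1)`, and with built-in reductions only. The seeded random generator and the helper name `spread` are assumptions for illustration, so the numbers differ from the book's output.

```python
import numpy as np
import pandas as pd

# Fixed seed so the sketch is reproducible; values differ from the book's.
rng = np.random.default_rng(0)
frame = pd.DataFrame(
    rng.standard_normal((4, 3)),
    columns=list("bde"),
    index=["Utah", "Ohio", "Texas", "Oregon"],
)


def spread(x):
    """Range (max minus min) of a 1D Series."""
    return x.max() - x.min()


per_column = frame.apply(spread)        # default axis=0: one call per column
per_row = frame.apply(spread, axis=1)   # axis=1: one call per row

# The same column-wise result using built-in DataFrame reductions only.
vectorized = frame.max() - frame.min()

print(per_column)
print(per_row)
```

When a built-in reduction exists, it is usually the simpler choice; `apply` is mainly useful when no such method covers the computation.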

Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:

``````
In : format = lambda x: '%.2f' % x

In : frame.applymap(format)
Out:
            b      d      e
Utah    -0.03   1.08   1.28
Ohio     0.65   0.83  -1.55
Texas    0.51  -0.88   0.20
Oregon  -0.49  -0.48  -0.31
``````
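A note for newer installations: as far as I know, pandas 2.1 deprecated `DataFrame.applymap` in favour of the equivalent `DataFrame.map`. The sketch below is one way to write the formatting example so it runs on either side of that change. The two-row frame and the helper name `two_decimals` are assumptions for illustration.

```python
import pandas as pd

# A small slice of the frame above, typed in by hand for reproducibility.
frame = pd.DataFrame(
    {"b": [-0.029638, 0.647747], "d": [1.081563, 0.831136]},
    index=["Utah", "Ohio"],
)


def two_decimals(x):
    """Format one float with two decimal places."""
    return f"{x:.2f}"


# Pick whichever element-wise DataFrame method this pandas version provides.
if hasattr(pd.DataFrame, "map"):
    formatted = frame.map(two_decimals)
else:
    formatted = frame.applymap(two_decimals)

# The Series counterpart is map in every version.
formatted_b = frame["b"].map(two_decimals)

print(formatted)
```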

The reason for the name applymap is that Series has a map method for applying an element-wise function:

``````
In : frame['e'].map(format)
Out:
Utah       1.28
Ohio      -1.55
Texas      0.20
Oregon    -0.31
Name: e, dtype: object
``````

Summing up, `apply` works on a row / column basis of a DataFrame, `applymap` works element-wise on a DataFrame, and `map` works element-wise on a Series.

2013/11/05

## Comparing `map`, `applymap` and `apply`: Context Matters

First major difference: DEFINITION

• `map` is defined on Series ONLY
• `applymap` is defined on DataFrames ONLY
• `apply` is defined on BOTH

Second major difference: INPUT ARGUMENT

• `map` accepts `dict`s, `Series`, or callable
• `applymap` and `apply` accept callables only
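A small sketch of the three kinds of argument `Series.map` accepts. The names `lookup_dict` and `lookup_series` are made up for the example.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

lookup_dict = {1: "a", 2: "b", 3: "c"}
lookup_series = pd.Series(["a", "b", "c"], index=[1, 2, 3])

via_dict = s.map(lookup_dict)                     # keys are matched against values of s
via_series = s.map(lookup_series)                 # the index plays the role of the keys
via_callable = s.map(lambda x: lookup_dict[x])    # any one-argument callable

print(via_dict.tolist(), via_series.tolist(), via_callable.tolist())
```

All three calls give the same values here; the dictionary and Series forms are lookups, while the callable form runs Python code once per element.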

Third major difference: BEHAVIOR

• `map` is elementwise for Series
• `applymap` is elementwise for DataFrames
• `apply` also works elementwise on a Series (and row- or column-wise on a DataFrame), but is suited to more complex operations and aggregation. The behaviour and return value depend on the function.

Fourth major difference (the most important one): USE CASE

• `map` is meant for mapping values from one domain to another, so is optimised for performance (e.g., `df['A'].map({1:'a', 2:'b', 3:'c'})`)
• `applymap` is good for elementwise transformations across multiple rows/columns (e.g., `df[['A', 'B', 'C']].applymap(str.strip)`)
• `apply` is for applying any function that cannot be vectorised (e.g., `df['sentences'].apply(nltk.sent_tokenize)`)
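The three use cases above, sketched on a toy frame. The column contents are invented, and `split_sentences` is a hypothetical stand-in for a real tokenizer such as `nltk.sent_tokenize`, used only so the example has no extra dependency.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["  x ", " y", "z  "],
    "C": [" p", "q ", " r "],
    "sentences": ["One. Two.", "Three.", "Four. Five. Six."],
})

# map: translate values from one domain to another.
labels = df["A"].map({1: "a", 2: "b", 3: "c"})

# Element-wise transformation across several columns
# (DataFrame.map where available, applymap on older pandas).
text_cols = df[["B", "C"]]
if hasattr(pd.DataFrame, "map"):
    stripped = text_cols.map(str.strip)
else:
    stripped = text_cols.applymap(str.strip)


def split_sentences(s):
    """Hypothetical stand-in for a tokenizer: split on full stops."""
    return [part.strip() + "." for part in s.split(".") if part.strip()]


# apply: an arbitrary Python function that has no vectorised equivalent.
tokens = df["sentences"].apply(split_sentences)

print(labels.tolist())
print(stripped)
print(tokens.tolist())
```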

## Summarising Footnotes

1. `map` when passed a dictionary/Series will map elements based on the keys in that dictionary/Series. Missing values will be recorded as NaN in the output.
2. `applymap` in more recent versions has been optimised for some operations. You will find `applymap` slightly faster than `apply` in some cases. My suggestion is to test them both and use whatever works better.

3. `map` is optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.

4. `Series.apply` returns a scalar for aggregating operations, Series otherwise. Similarly for `DataFrame.apply`. Note that `apply` also has fastpaths when called with certain NumPy functions such as `mean`, `sum`, etc.
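Footnote 1 in a few lines: a value with no matching key comes back as NaN. The `fillna` step is just one possible way to handle the gap and is not part of the original answer.

```python
import pandas as pd

s = pd.Series([1, 2, 4])

# 4 has no key in the dictionary, so it maps to NaN.
mapped = s.map({1: "a", 2: "b", 3: "c"})
print(mapped)

# One option for handling unmatched values afterwards.
filled = mapped.fillna("unknown")
print(filled)
```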
2019/06/20

## Quick Summary

• `DataFrame.apply` operates on entire rows or columns at a time.

• `DataFrame.applymap`, `Series.apply`, and `Series.map` operate on one element at a time.

`Series.apply` and `Series.map` are similar and often interchangeable. Some of their slight differences are discussed in osa's answer below.
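One way to see the difference is to check what the function actually receives. This is a sketch on a toy frame with invented column names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# DataFrame.apply hands the function a whole column (or row) as a Series.
got_series = df.apply(lambda col: isinstance(col, pd.Series))

# Series.apply and Series.map hand the function one scalar at a time.
apply_scalar = df["a"].apply(np.isscalar)
map_scalar = df["a"].map(np.isscalar)

print(got_series)
print(apply_scalar.tolist(), map_scalar.tolist())
```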

2020/07/29

Adding to the other answers: a `Series` has both `map` and `apply` as well.

Apply can make a DataFrame out of a series; however, map will just put a series in every cell of another series, which is probably not what you want.

``````
In : p=pd.Series([1,2,3])
In : p
Out:
0    1
1    2
2    3
dtype: int64

In : p.apply(lambda x: pd.Series([x, x]))
Out:
   0  1
0  1  1
1  2  2
2  3  3

In : p.map(lambda x: pd.Series([x, x]))
Out:
0    0    1
1    1
dtype: int64
1    0    2
1    2
dtype: int64
2    0    3
1    3
dtype: int64
dtype: object
``````

Also if I had a function with side effects, such as "connect to a web server", I'd probably use `apply` just for the sake of clarity.

``````
series.apply(download_file_for_every_element)
``````

`map` accepts not only a function but also a dictionary or another Series. Let's say you want to manipulate permutations.

Take

``````1 2 3 4 5
2 1 4 5 3
``````

The square of this permutation is

``````1 2 3 4 5
1 2 5 3 4
``````

You can compute it using `map`. Not sure if self-application is documented, but it works in `0.15.1`.

``````
In : p=pd.Series([1,0,3,4,2])

In : p.map(p)
Out:
0    0
1    1
2    4
3    2
4    3
dtype: int64
``````
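The same computation can be sketched with the one-based labels used in the two-row notation above, by giving the Series an explicit index. The name `perm` is an assumption for the example.

```python
import pandas as pd

# One-based permutation from the text: 1->2, 2->1, 3->4, 4->5, 5->3.
perm = pd.Series([2, 1, 4, 5, 3], index=[1, 2, 3, 4, 5])

# Mapping a Series through itself looks each value up in the index,
# which composes the permutation with itself.
square = perm.map(perm)

print(square.tolist())  # [1, 2, 5, 3, 4]
```

The result matches the second row of the squared permutation shown earlier.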
2017/11/01

@jeremiahbuddha mentioned that `apply` works on rows/columns, while `applymap` works element-wise. But it seems you can still use `apply` for element-wise computation:

``````
In : frame.apply(np.sqrt)
Out:
               b         d         e
Utah         NaN  1.435159       NaN
Ohio    1.098164  0.510594  0.729748
Texas        NaN  0.456436  0.697337
Oregon  0.359079       NaN       NaN

In : frame.applymap(np.sqrt)
Out:
               b         d         e
Utah         NaN  1.435159       NaN
Ohio    1.098164  0.510594  0.729748
Texas        NaN  0.456436  0.697337
Oregon  0.359079       NaN       NaN
``````
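The two calls agree here most likely because `np.sqrt` is a NumPy ufunc: given a whole column, it works element by element anyway. With a function that treats a column differently from a single value, the results diverge. Here is a sketch using `len` on an invented frame of strings.

```python
import pandas as pd

words = pd.DataFrame({"x": ["a", "bb", "ccc"], "y": ["dddd", "e", "ff"]})

# apply calls len once per column, so it counts rows.
column_lengths = words.apply(len)

# The element-wise method calls len once per cell, so it measures each string
# (DataFrame.map where available, applymap on older pandas).
elementwise = words.map if hasattr(pd.DataFrame, "map") else words.applymap
cell_lengths = elementwise(len)

print(column_lengths)
print(cell_lengths)
```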
2013/12/19

Just wanted to point this out, as I struggled with it for a bit:

``````
def f(x):
    # Clamp a single value to the range [0, 100000].
    if x < 0:
        x = 0
    elif x > 100000:
        x = 100000
    return x

df.applymap(f)
df.describe()
``````

This does not modify the DataFrame itself; the result has to be reassigned:

``````
df = df.applymap(f)
df.describe()
``````
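A self-contained sketch of the same pitfall on an invented one-column frame. The final `clip` call is a vectorised alternative that I am swapping in for comparison; it is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"v": [-5, 50, 200000]})


def f(x):
    """Clamp a single value to the range [0, 100000]."""
    if x < 0:
        return 0
    if x > 100000:
        return 100000
    return x


# DataFrame.map where available, applymap on older pandas.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

elementwise(f)                    # result discarded: df is unchanged
unchanged = df["v"].tolist()

df = elementwise(f)               # reassign to keep the result
clamped = df["v"].tolist()

# Vectorised alternative for this particular function.
clipped = pd.DataFrame({"v": [-5, 50, 200000]}).clip(lower=0, upper=100000)

print(unchanged, clamped, clipped["v"].tolist())
```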
2015/09/26

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow