# Difference between map, applymap and apply methods in Pandas

### Question

Can you tell me when to use these vectorization methods with basic examples?

I see that `map` is a `Series` method whereas the rest are `DataFrame` methods. I got confused about the `apply` and `applymap` methods though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples which illustrate the usage would be great!

### Accepted Answer

Straight from Wes McKinney's Python for Data Analysis book, pg. 132 (I highly recommend this book):

Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame’s apply method does exactly this:

```
In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
In [117]: frame
Out[117]:
               b         d         e
Utah   -0.029638  1.081563  1.280300
Ohio    0.647747  0.831136 -1.549481
Texas   0.513416 -0.884417  0.195343
Oregon -0.485454 -0.477388 -0.309548
In [118]: f = lambda x: x.max() - x.min()
In [119]: frame.apply(f)
Out[119]:
b 1.133201
d 1.965980
e 2.829781
dtype: float64
```

Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.

Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:

```
In [120]: format = lambda x: '%.2f' % x
In [121]: frame.applymap(format)
Out[121]:
            b      d      e
Utah    -0.03   1.08   1.28
Ohio     0.65   0.83  -1.55
Texas    0.51  -0.88   0.20
Oregon  -0.49  -0.48  -0.31
```

The reason for the name applymap is that Series has a map method for applying an element-wise function:

```
In [122]: frame['e'].map(format)
Out[122]:
Utah 1.28
Ohio -1.55
Texas 0.20
Oregon -0.31
Name: e, dtype: object
```

Summing up, `apply` works on a row / column basis of a DataFrame, `applymap` works element-wise on a DataFrame, and `map` works element-wise on a Series.
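A minimal sketch of the three methods side by side, on a small made-up frame (the values are illustrative, not the ones from the book). One assumption to flag: pandas 2.1 and later expose `DataFrame.applymap` under the name `DataFrame.map`, so the sketch uses whichever of the two is available.

```python
import pandas as pd

# Illustrative data, not taken from the book excerpt above.
frame = pd.DataFrame(
    {"b": [1.0, 4.0, 2.0], "d": [10.0, 30.0, 20.0]},
    index=["Utah", "Ohio", "Texas"],
)

# apply: the function receives a whole column (a Series) at a time.
col_range = frame.apply(lambda col: col.max() - col.min())

# applymap: the function receives one scalar at a time.
# Newer pandas offers the same operation as DataFrame.map.
elementwise = frame.map if hasattr(frame, "map") else frame.applymap
formatted = elementwise(lambda x: "%.2f" % x)

# map: element-wise on a single Series.
formatted_b = frame["b"].map(lambda x: "%.2f" % x)

print(col_range)
print(formatted)
print(formatted_b)
```

The first call returns one value per column, while the other two return objects of the same shape as their input.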


## Comparing `map`, `applymap` and `apply`: Context Matters

First major difference: **DEFINITION**

- `map` is defined on Series ONLY
- `applymap` is defined on DataFrames ONLY
- `apply` is defined on BOTH

Second major difference: **INPUT ARGUMENT**

- `map` accepts `dict`s, `Series`, or callable
- `applymap` and `apply` accept callables only
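To make the three input types for `map` concrete, here is a small sketch. The Series `s` and the lookup Series are invented for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# dict: each value is looked up as a key; 4 has no key, so it becomes NaN.
from_dict = s.map({1: "a", 2: "b", 3: "c"})

# Series: each value is looked up in the other Series' index.
lookup = pd.Series(["one", "two", "three", "four"], index=[1, 2, 3, 4])
from_series = s.map(lookup)

# callable: called once per element.
from_func = s.map(lambda x: x * 10)

print(from_dict)
print(from_series)
print(from_func)
```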

Third major difference: **BEHAVIOR**

- `map` is elementwise for Series
- `applymap` is elementwise for DataFrames
- `apply` also works elementwise but is suited to more complex operations and aggregation. The behaviour and return value depend on the function.

Fourth major difference (the most important one): **USE CASE**

- `map` is meant for mapping values from one domain to another, so is optimised for performance (e.g., `df['A'].map({1:'a', 2:'b', 3:'c'})`)
- `applymap` is good for elementwise transformations across multiple rows/columns (e.g., `df[['A', 'B', 'C']].applymap(str.strip)`)
- `apply` is for applying any function that cannot be vectorised (e.g., `df['sentences'].apply(nltk.sent_tokenize)`)
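The last two use cases can be sketched as follows. The frame is made up, and `split_sentences` is a hypothetical, deliberately naive stand-in for `nltk.sent_tokenize` so that the example runs without NLTK installed. As an assumption about versions, the code falls back between `DataFrame.map` (pandas 2.1+) and `DataFrame.applymap`.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [" x ", "y "],
    "B": [" p", " q "],
    "sentences": ["One. Two.", "Three."],
})

# Element-wise transformation across several columns.
cols = df[["A", "B"]]
elementwise = cols.map if hasattr(cols, "map") else cols.applymap
stripped = elementwise(str.strip)

# Hypothetical stand-in for nltk.sent_tokenize: split after ". ".
def split_sentences(text):
    return [part for part in text.replace(". ", ".\n").split("\n") if part]

# A plain Python function with no vectorised equivalent: use apply.
tokens = df["sentences"].apply(split_sentences)

print(stripped)
print(tokens)
```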

## Summarising

Footnotes

- `map` when passed a dictionary/Series will map elements based on the keys in that dictionary/Series. Missing values will be recorded as NaN in the output.
- `applymap` in more recent versions has been optimised for some operations. You will find `applymap` slightly faster than `apply` in some cases. My suggestion is to test them both and use whatever works better.
- `map` is optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.
- `Series.apply` returns a scalar for aggregating operations, Series otherwise. Similarly for `DataFrame.apply`. Note that `apply` also has fastpaths when called with certain NumPy functions such as `mean`, `sum`, etc.

## Quick Summary

- `DataFrame.apply` operates on entire rows or columns at a time.
- `DataFrame.applymap`, `Series.apply`, and `Series.map` operate on one element at a time.

`Series.apply` and `Series.map` are similar and often interchangeable. Some of their slight differences are discussed in osa's answer below.

Adding to the other answers, in a `Series` there are also `map` and `apply`.

**Apply can make a DataFrame out of a series**; however, map will just put a series in every cell of another series, which is probably not what you want.

```
In [40]: p=pd.Series([1,2,3])
In [41]: p
Out[41]:
0    1
1    2
2    3
dtype: int64
In [42]: p.apply(lambda x: pd.Series([x, x]))
Out[42]:
   0  1
0  1  1
1  2  2
2  3  3
In [43]: p.map(lambda x: pd.Series([x, x]))
Out[43]:
0    0    1
1    1
dtype: int64
1    0    2
1    2
dtype: int64
2    0    3
1    3
dtype: int64
dtype: object
```

Also, if I had a function with side effects, such as "connect to a web server", I'd probably use `apply` just for the sake of clarity.

```
series.apply(download_file_for_every_element)
```

**Map can use not only a function, but also a dictionary or another series.** Let's say you want to manipulate permutations.

Take

```
1 2 3 4 5
2 1 4 5 3
```

The square of this permutation is

```
1 2 3 4 5
1 2 5 3 4
```

You can compute it using `map`. Not sure if self-application is documented, but it works in `0.15.1`.

```
In [39]: p=pd.Series([1,0,3,4,2])
In [40]: p.map(p)
Out[40]:
0 0
1 1
2 4
3 2
4 3
dtype: int64
```

@jeremiahbuddha mentioned that apply works on row/columns, while applymap works element-wise. But it seems you can still use apply for element-wise computation....

```
frame.apply(np.sqrt)
Out[102]:
               b         d         e
Utah         NaN  1.435159       NaN
Ohio    1.098164  0.510594  0.729748
Texas        NaN  0.456436  0.697337
Oregon  0.359079       NaN       NaN
frame.applymap(np.sqrt)
Out[103]:
               b         d         e
Utah         NaN  1.435159       NaN
Ohio    1.098164  0.510594  0.729748
Texas        NaN  0.456436  0.697337
Oregon  0.359079       NaN       NaN
```
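One way to see why the two calls agree, as a hedged sketch: `apply` hands each column to the function as a `Series`, and a NumPy ufunc such as `np.sqrt` then operates element-wise on that `Series`, whereas `applymap` calls the function once per scalar. The helper `traced_sqrt` and the small frame below are made up for illustration. The fallback between `DataFrame.map` (pandas 2.1+) and `DataFrame.applymap` is an assumption about the installed version.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"b": [1.0, 4.0], "d": [9.0, 16.0]})

arg_types = []

# Hypothetical helper: records the type of each argument it receives.
def traced_sqrt(x):
    arg_types.append(type(x))
    return np.sqrt(x)

# apply passes whole columns; np.sqrt handles each Series in one call.
via_apply = frame.apply(traced_sqrt)
apply_types = set(arg_types)

# applymap (DataFrame.map in newer pandas) passes individual scalars.
arg_types.clear()
elementwise = frame.map if hasattr(frame, "map") else frame.applymap
via_elementwise = elementwise(traced_sqrt)

print(apply_types)      # only pandas Series objects
print(set(arg_types))   # scalar float types
```

The results match here only because the function happens to be element-wise; a reducing function such as `lambda x: x.max() - x.min()` works with `apply` but would fail on the scalars that `applymap` passes in.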

Just wanted to point out, as I struggled with this for a bit:

```
def f(x):
    if x < 0:
        x = 0
    elif x > 100000:
        x = 100000
    return x
df.applymap(f)
df.describe()
```

This does not modify the DataFrame itself; the result has to be reassigned:

```
df = df.applymap(f)
df.describe()
```