Advertisement
Advertisement


How can I count the occurrences of a list item?


Question

Given an item, how can I count its occurrences in a list in Python?

2020/09/02
1
1578
9/2/2020 9:25:16 AM

Accepted Answer

If you only want one item's count, use the count method:

>>> [1, 2, 3, 4, 1, 4, 1].count(1)
3

Don't use this if you want to count multiple items. Calling count in a loop requires a separate pass over the list for every count call, which can be catastrophic for performance. If you want to count all items, or even just multiple items, use Counter, as explained in the other answers.

2017/11/15
1908
11/15/2017 9:48:15 PM


Counting the occurrences of one item in a list

For counting the occurrences of just one list item you can use count()

>>> l = ["a","b","b"]
>>> l.count("a")
1
>>> l.count("b")
2

Counting the occurrences of all items in a list is also known as "tallying" a list, or creating a tally counter.

Counting all items with count()

To count the occurrences of items in l one can simply use a list comprehension and the count() method

[[x,l.count(x)] for x in set(l)]

(or similarly with a dictionary dict((x,l.count(x)) for x in set(l)))

Example:

>>> l = ["a","b","b"]
>>> [[x,l.count(x)] for x in set(l)]
[['a', 1], ['b', 2]]
>>> dict((x,l.count(x)) for x in set(l))
{'a': 1, 'b': 2}

Counting all items with Counter()

Alternatively, there's the faster Counter class from the collections library

Counter(l)

Example:

>>> l = ["a","b","b"]
>>> from collections import Counter
>>> Counter(l)
Counter({'b': 2, 'a': 1})

How much faster is Counter?

I checked how much faster Counter is for tallying lists. I tried both methods out with a few values of n and it appears that Counter is faster by a constant factor of approximately 2.

Here is the script I used:

from __future__ import print_function
import timeit

t1=timeit.Timer('Counter(l)', \
                'import random;import string;from collections import Counter;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]'
                )

t2=timeit.Timer('[[x,l.count(x)] for x in set(l)]',
                'import random;import string;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]'
                )

print("Counter(): ", t1.repeat(repeat=3,number=10000))
print("count():   ", t2.repeat(repeat=3,number=10000)

And the output:

Counter():  [0.46062711701961234, 0.4022796869976446, 0.3974247490405105]
count():    [7.779430688009597, 7.962715800967999, 8.420845870045014]
2017/05/23

Another way to get the number of occurrences of each item, in a dictionary:

dict((i, a.count(i)) for i in a)
2015/07/31

list.count(x) returns the number of times x appears in a list

see: http://docs.python.org/tutorial/datastructures.html#more-on-lists

2015/12/24

Given an item, how can I count its occurrences in a list in Python?

Here's an example list:

>>> l = list('aaaaabbbbcccdde')
>>> l
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'e']

list.count

There's the list.count method

>>> l.count('b')
4

This works fine for any list. Tuples have this method as well:

>>> t = tuple('aabbbffffff')
>>> t
('a', 'a', 'b', 'b', 'b', 'f', 'f', 'f', 'f', 'f', 'f')
>>> t.count('f')
6

collections.Counter

And then there's collections.Counter. You can dump any iterable into a Counter, not just a list, and the Counter will retain a data structure of the counts of the elements.

Usage:

>>> from collections import Counter
>>> c = Counter(l)
>>> c['b']
4

Counters are based on Python dictionaries, their keys are the elements, so the keys need to be hashable. They are basically like sets that allow redundant elements into them.

Further usage of collections.Counter

You can add or subtract with iterables from your counter:

>>> c.update(list('bbb'))
>>> c['b']
7
>>> c.subtract(list('bbb'))
>>> c['b']
4

And you can do multi-set operations with the counter as well:

>>> c2 = Counter(list('aabbxyz'))
>>> c - c2                   # set difference
Counter({'a': 3, 'c': 3, 'b': 2, 'd': 2, 'e': 1})
>>> c + c2                   # addition of all elements
Counter({'a': 7, 'b': 6, 'c': 3, 'd': 2, 'e': 1, 'y': 1, 'x': 1, 'z': 1})
>>> c | c2                   # set union
Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1, 'y': 1, 'x': 1, 'z': 1})
>>> c & c2                   # set intersection
Counter({'a': 2, 'b': 2})

Why not pandas?

Another answer suggests:

Why not use pandas?

Pandas is a common library, but it's not in the standard library. Adding it as a requirement is non-trivial.

There are builtin solutions for this use-case in the list object itself as well as in the standard library.

If your project does not already require pandas, it would be foolish to make it a requirement just for this functionality.

2020/06/20

I've compared all suggested solutions (and a few new ones) with perfplot (a small project of mine).

Counting one item

For large enough arrays, it turns out that

numpy.sum(numpy.array(a) == 1) 

is slightly faster than the other solutions.

enter image description here

Counting all items

As established before,

numpy.bincount(a)

is what you want.

enter image description here


Code to reproduce the plots:

from collections import Counter
from collections import defaultdict
import numpy
import operator
import pandas
import perfplot


def counter(a):
    return Counter(a)


def count(a):
    return dict((i, a.count(i)) for i in set(a))


def bincount(a):
    return numpy.bincount(a)


def pandas_value_counts(a):
    return pandas.Series(a).value_counts()


def occur_dict(a):
    d = {}
    for i in a:
        if i in d:
            d[i] = d[i]+1
        else:
            d[i] = 1
    return d


def count_unsorted_list_items(items):
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return dict(counts)


def operator_countof(a):
    return dict((i, operator.countOf(a, i)) for i in set(a))


perfplot.show(
    setup=lambda n: list(numpy.random.randint(0, 100, n)),
    n_range=[2**k for k in range(20)],
    kernels=[
        counter, count, bincount, pandas_value_counts, occur_dict,
        count_unsorted_list_items, operator_countof
        ],
    equality_check=None,
    logx=True,
    logy=True,
    )

2.

from collections import Counter
from collections import defaultdict
import numpy
import operator
import pandas
import perfplot


def counter(a):
    return Counter(a)


def count(a):
    return dict((i, a.count(i)) for i in set(a))


def bincount(a):
    return numpy.bincount(a)


def pandas_value_counts(a):
    return pandas.Series(a).value_counts()


def occur_dict(a):
    d = {}
    for i in a:
        if i in d:
            d[i] = d[i]+1
        else:
            d[i] = 1
    return d


def count_unsorted_list_items(items):
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return dict(counts)


def operator_countof(a):
    return dict((i, operator.countOf(a, i)) for i in set(a))


perfplot.show(
    setup=lambda n: list(numpy.random.randint(0, 100, n)),
    n_range=[2**k for k in range(20)],
    kernels=[
        counter, count, bincount, pandas_value_counts, occur_dict,
        count_unsorted_list_items, operator_countof
        ],
    equality_check=None,
    logx=True,
    logy=True,
    )
2017/09/13

Source: https://stackoverflow.com/questions/2600191
Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Email: [email protected]