Advertisement
Advertisement


Fastest way to check if a value exists in a list


Question

What is the fastest way to know if a value exists in a list (a list with millions of values in it) and what its index is?

I know that all values in the list are unique as in this example.

The first method I try is (3.8 sec in my real code):

a = [4,2,3,1,5,6]

if a.count(7) == 1:
    b=a.index(7)
    "Do something with variable b"

The second method I try is (2x faster: 1.9 sec for my real code):

a = [4,2,3,1,5,6]

try:
    b=a.index(7)
except ValueError:
    "Do nothing"
else:
    "Do something with variable b"

Proposed methods from Stack Overflow user (2.74 sec for my real code):

a = [4,2,3,1,5,6]
if 7 in a:
    a.index(7)

In my real code, the first method takes 3.81 sec and the second method takes 1.88 sec. It's a good improvement, but:

I'm a beginner with Python/scripting, and is there a faster way to do the same things and save more processing time?

More specific explication for my application:

In the Blender API I can access a list of particles:

particles = [1, 2, 3, 4, etc.]

From there, I can access a particle's location:

particles[x].location = [x,y,z]

And for each particle I test if a neighbour exists by searching each particle location like so:

if [x+1,y,z] in particles.location
    "Find the identity of this neighbour particle in x:the particle's index
    in the array"
    particles.index([x+1,y,z])
1
857
10/1/2019 5:59:01 PM

Accepted Answer

7 in a

Clearest and fastest way to do it.

You can also consider using a set, but constructing that set from your list may take more time than faster membership testing will save. The only way to be certain is to benchmark well. (this also depends on what operations you require)

2011/09/27
1642
9/27/2011 3:57:30 PM

As stated by others, in can be very slow for large lists. Here are some comparisons of the performances for in, set and bisect. Note the time (in second) is in log scale.

enter image description here

Code for testing:

import random
import bisect
import matplotlib.pyplot as plt
import math
import time


def method_in(a, b, c):
    start_time = time.time()
    for i, x in enumerate(a):
        if x in b:
            c[i] = 1
    return time.time() - start_time


def method_set_in(a, b, c):
    start_time = time.time()
    s = set(b)
    for i, x in enumerate(a):
        if x in s:
            c[i] = 1
    return time.time() - start_time


def method_bisect(a, b, c):
    start_time = time.time()
    b.sort()
    for i, x in enumerate(a):
        index = bisect.bisect_left(b, x)
        if index < len(a):
            if x == b[index]:
                c[i] = 1
    return time.time() - start_time


def profile():
    time_method_in = []
    time_method_set_in = []
    time_method_bisect = []

    # adjust range down if runtime is to great or up if there are to many zero entries in any of the time_method lists
    Nls = [x for x in range(10000, 30000, 1000)]
    for N in Nls:
        a = [x for x in range(0, N)]
        random.shuffle(a)
        b = [x for x in range(0, N)]
        random.shuffle(b)
        c = [0 for x in range(0, N)]

        time_method_in.append(method_in(a, b, c))
        time_method_set_in.append(method_set_in(a, b, c))
        time_method_bisect.append(method_bisect(a, b, c))

    plt.plot(Nls, time_method_in, marker='o', color='r', linestyle='-', label='in')
    plt.plot(Nls, time_method_set_in, marker='o', color='b', linestyle='-', label='set')
    plt.plot(Nls, time_method_bisect, marker='o', color='g', linestyle='-', label='bisect')
    plt.xlabel('list size', fontsize=18)
    plt.ylabel('log(time)', fontsize=18)
    plt.legend(loc='upper left')
    plt.yscale('log')
    plt.show()


profile()
2020/07/23

You could put your items into a set. Set lookups are very efficient.

Try:

s = set(a)
if 7 in s:
  # do stuff

edit In a comment you say that you'd like to get the index of the element. Unfortunately, sets have no notion of element position. An alternative is to pre-sort your list and then use binary search every time you need to find an element.

2017/05/23

def check_availability(element, collection: iter):
    return element in collection

Usage

check_availability('a', [1,2,3,4,'a','b','c'])

I believe this is the fastest way to know if a chosen value is in an array.

2018/08/11

a = [4,2,3,1,5,6]

index = dict((y,x) for x,y in enumerate(a))
try:
   a_index = index[7]
except KeyError:
   print "Not found"
else:
   print "found"

This will only be a good idea if a doesn't change and thus we can do the dict() part once and then use it repeatedly. If a does change, please provide more detail on what you are doing.

2011/09/27

The original question was:

What is the fastest way to know if a value exists in a list (a list with millions of values in it) and what its index is?

Thus there are two things to find:

  1. is an item in the list, and
  2. what is the index (if in the list).

Towards this, I modified @xslittlegrass code to compute indexes in all cases, and added an additional method.

Results

enter image description here

Methods are:

  1. in--basically if x in b: return b.index(x)
  2. try--try/catch on b.index(x) (skips having to check if x in b)
  3. set--basically if x in set(b): return b.index(x)
  4. bisect--sort b with its index, binary search for x in sorted(b). Note mod from @xslittlegrass who returns the index in the sorted b, rather than the original b)
  5. reverse--form a reverse lookup dictionary d for b; then d[x] provides the index of x.

Results show that method 5 is the fastest.

Interestingly the try and the set methods are equivalent in time.


Test Code

import random
import bisect
import matplotlib.pyplot as plt
import math
import timeit
import itertools

def wrapper(func, *args, **kwargs):
    " Use to produced 0 argument function for call it"
    # Reference https://www.pythoncentral.io/time-a-python-function/
    def wrapped():
        return func(*args, **kwargs)
    return wrapped

def method_in(a,b,c):
    for i,x in enumerate(a):
        if x in b:
            c[i] = b.index(x)
        else:
            c[i] = -1
    return c

def method_try(a,b,c):
    for i, x in enumerate(a):
        try:
            c[i] = b.index(x)
        except ValueError:
            c[i] = -1

def method_set_in(a,b,c):
    s = set(b)
    for i,x in enumerate(a):
        if x in s:
            c[i] = b.index(x)
        else:
            c[i] = -1
    return c

def method_bisect(a,b,c):
    " Finds indexes using bisection "

    # Create a sorted b with its index
    bsorted = sorted([(x, i) for i, x in enumerate(b)], key = lambda t: t[0])

    for i,x in enumerate(a):
        index = bisect.bisect_left(bsorted,(x, ))
        c[i] = -1
        if index < len(a):
            if x == bsorted[index][0]:
                c[i] = bsorted[index][1]  # index in the b array

    return c

def method_reverse_lookup(a, b, c):
    reverse_lookup = {x:i for i, x in enumerate(b)}
    for i, x in enumerate(a):
        c[i] = reverse_lookup.get(x, -1)
    return c

def profile():
    Nls = [x for x in range(1000,20000,1000)]
    number_iterations = 10
    methods = [method_in, method_try, method_set_in, method_bisect, method_reverse_lookup]
    time_methods = [[] for _ in range(len(methods))]

    for N in Nls:
        a = [x for x in range(0,N)]
        random.shuffle(a)
        b = [x for x in range(0,N)]
        random.shuffle(b)
        c = [0 for x in range(0,N)]

        for i, func in enumerate(methods):
            wrapped = wrapper(func, a, b, c)
            time_methods[i].append(math.log(timeit.timeit(wrapped, number=number_iterations)))

    markers = itertools.cycle(('o', '+', '.', '>', '2'))
    colors = itertools.cycle(('r', 'b', 'g', 'y', 'c'))
    labels = itertools.cycle(('in', 'try', 'set', 'bisect', 'reverse'))

    for i in range(len(time_methods)):
        plt.plot(Nls,time_methods[i],marker = next(markers),color=next(colors),linestyle='-',label=next(labels))

    plt.xlabel('list size', fontsize=18)
    plt.ylabel('log(time)', fontsize=18)
    plt.legend(loc = 'upper left')
    plt.show()

profile()
2020/02/15

Source: https://stackoverflow.com/questions/7571635
Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Email: [email protected]