Advertisement
Advertisement


How do I check if a string is a number (float)?


Question

What is the best possible way to check if a string can be represented as a number in Python?

The function I currently have right now is:

def is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

Which, not only is ugly and slow, seems clunky. However I haven't found a better method because calling float in the main function is even worse.

2017/07/04
1
1649
7/4/2017 6:05:13 PM

Accepted Answer

Which, not only is ugly and slow

I'd dispute both.

A regex or other string parsing method would be uglier and slower.

I'm not sure that anything much could be faster than the above. It calls the function and returns. Try/Catch doesn't introduce much overhead because the most common exception is caught without an extensive search of stack frames.

The issue is that any numeric conversion function has two kinds of results

  • A number, if the number is valid
  • A status code (e.g., via errno) or exception to show that no valid number could be parsed.

C (as an example) hacks around this a number of ways. Python lays it out clearly and explicitly.

I think your code for doing this is perfect.

2019/04/25
714
4/25/2019 9:24:35 PM


TL;DR The best solution is s.replace('.','',1).isdigit()

I did some benchmarks comparing the different approaches

def is_number_tryexcept(s):
    """ Returns True is string is a number. """
    try:
        float(s)
        return True
    except ValueError:
        return False

import re    
def is_number_regex(s):
    """ Returns True is string is a number. """
    if re.match("^\d+?\.\d+?$", s) is None:
        return s.isdigit()
    return True


def is_number_repl_isdigit(s):
    """ Returns True is string is a number. """
    return s.replace('.','',1).isdigit()

If the string is not a number, the except-block is quite slow. But more importantly, the try-except method is the only approach that handles scientific notations correctly.

funcs = [
          is_number_tryexcept, 
          is_number_regex,
          is_number_repl_isdigit
          ]

a_float = '.1234'

print('Float notation ".1234" is not supported by:')
for f in funcs:
    if not f(a_float):
        print('\t -', f.__name__)

Float notation ".1234" is not supported by:
- is_number_regex

scientific1 = '1.000000e+50'
scientific2 = '1e50'


print('Scientific notation "1.000000e+50" is not supported by:')
for f in funcs:
    if not f(scientific1):
        print('\t -', f.__name__)




print('Scientific notation "1e50" is not supported by:')
for f in funcs:
    if not f(scientific2):
        print('\t -', f.__name__)

Scientific notation "1.000000e+50" is not supported by:
- is_number_regex
- is_number_repl_isdigit
Scientific notation "1e50" is not supported by:
- is_number_regex
- is_number_repl_isdigit

EDIT: The benchmark results

import timeit

test_cases = ['1.12345', '1.12.345', 'abc12345', '12345']
times_n = {f.__name__:[] for f in funcs}

for t in test_cases:
    for f in funcs:
        f = f.__name__
        times_n[f].append(min(timeit.Timer('%s(t)' %f, 
                      'from __main__ import %s, t' %f)
                              .repeat(repeat=3, number=1000000)))

where the following functions were tested

from re import match as re_match
from re import compile as re_compile

def is_number_tryexcept(s):
    """ Returns True is string is a number. """
    try:
        float(s)
        return True
    except ValueError:
        return False

def is_number_regex(s):
    """ Returns True is string is a number. """
    if re_match("^\d+?\.\d+?$", s) is None:
        return s.isdigit()
    return True


comp = re_compile("^\d+?\.\d+?$")    

def compiled_regex(s):
    """ Returns True is string is a number. """
    if comp.match(s) is None:
        return s.isdigit()
    return True


def is_number_repl_isdigit(s):
    """ Returns True is string is a number. """
    return s.replace('.','',1).isdigit()

enter image description here

2017/07/12

There is one exception that you may want to take into account: the string 'NaN'

If you want is_number to return FALSE for 'NaN' this code will not work as Python converts it to its representation of a number that is not a number (talk about identity issues):

>>> float('NaN')
nan

Otherwise, I should actually thank you for the piece of code I now use extensively. :)

G.

2010/09/01

how about this:

'3.14'.replace('.','',1).isdigit()

which will return true only if there is one or no '.' in the string of digits.

'3.14.5'.replace('.','',1).isdigit()

will return false

edit: just saw another comment ... adding a .replace(badstuff,'',maxnum_badstuff) for other cases can be done. if you are passing salt and not arbitrary condiments (ref:xkcd#974) this will do fine :P

2013/05/21

Which, not only is ugly and slow, seems clunky.

It may take some getting used to, but this is the pythonic way of doing it. As has been already pointed out, the alternatives are worse. But there is one other advantage of doing things this way: polymorphism.

The central idea behind duck typing is that "if it walks and talks like a duck, then it's a duck." What if you decide that you need to subclass string so that you can change how you determine if something can be converted into a float? Or what if you decide to test some other object entirely? You can do these things without having to change the above code.

Other languages solve these problems by using interfaces. I'll save the analysis of which solution is better for another thread. The point, though, is that python is decidedly on the duck typing side of the equation, and you're probably going to have to get used to syntax like this if you plan on doing much programming in Python (but that doesn't mean you have to like it of course).

One other thing you might want to take into consideration: Python is pretty fast in throwing and catching exceptions compared to a lot of other languages (30x faster than .Net for instance). Heck, the language itself even throws exceptions to communicate non-exceptional, normal program conditions (every time you use a for loop). Thus, I wouldn't worry too much about the performance aspects of this code until you notice a significant problem.

2008/12/11

Updated after Alfe pointed out you don't need to check for float separately as complex handles both:

def is_number(s):
    try:
        complex(s) # for int, long, float and complex
    except ValueError:
        return False

    return True

Previously said: Is some rare cases you might also need to check for complex numbers (e.g. 1+2i), which can not be represented by a float:

def is_number(s):
    try:
        float(s) # for int, long and float
    except ValueError:
        try:
            complex(s) # for complex
        except ValueError:
            return False

    return True
2016/04/22

Source: https://stackoverflow.com/questions/354038
Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Email: [email protected]