Advertisement
Advertisement


How to extract numbers from a string in Python?


Question

I would extract all the numbers contained in a string. Which is the better suited for the purpose, regular expressions or the isdigit() method?

Example:

line = "hello 12 hi 89"

Result:

[12, 89]
2019/05/13
1
449
5/13/2019 6:13:04 AM

Accepted Answer

If you only want to extract only positive integers, try the following:

>>> str = "h3110 23 cat 444.4 rabbit 11 2 dog"
>>> [int(s) for s in str.split() if s.isdigit()]
[23, 11, 2]

I would argue that this is better than the regex example for three reasons. First, you don't need another module; secondly, it's more readable because you don't need to parse the regex mini-language; and third, it is faster (and thus likely more pythonic):

python -m timeit -s "str = 'h3110 23 cat 444.4 rabbit 11 2 dog' * 1000" "[s for s in str.split() if s.isdigit()]"
100 loops, best of 3: 2.84 msec per loop

python -m timeit -s "import re" "str = 'h3110 23 cat 444.4 rabbit 11 2 dog' * 1000" "re.findall('\\b\\d+\\b', str)"
100 loops, best of 3: 5.66 msec per loop

This will not recognize floats, negative integers, or integers in hexadecimal format. If you can't accept these limitations, slim's answer below will do the trick.

2017/05/23
512
5/23/2017 12:18:24 PM


This is more than a bit late, but you can extend the regex expression to account for scientific notation too.

import re

# Format is [(<string>, <expected output>), ...]
ss = [("apple-12.34 ba33na fanc-14.23e-2yapple+45e5+67.56E+3",
       ['-12.34', '33', '-14.23e-2', '+45e5', '+67.56E+3']),
      ('hello X42 I\'m a Y-32.35 string Z30',
       ['42', '-32.35', '30']),
      ('he33llo 42 I\'m a 32 string -30', 
       ['33', '42', '32', '-30']),
      ('h3110 23 cat 444.4 rabbit 11 2 dog', 
       ['3110', '23', '444.4', '11', '2']),
      ('hello 12 hi 89', 
       ['12', '89']),
      ('4', 
       ['4']),
      ('I like 74,600 commas not,500', 
       ['74,600', '500']),
      ('I like bad math 1+2=.001', 
       ['1', '+2', '.001'])]

for s, r in ss:
    rr = re.findall("[-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?", s)
    if rr == r:
        print('GOOD')
    else:
        print('WRONG', rr, 'should be', r)

Gives all good!

Additionally, you can look at the AWS Glue built-in regex


I'm assuming you want floats not just integers so I'd do something like this:

l = []
for t in s.split():
    try:
        l.append(float(t))
    except ValueError:
        pass

Note that some of the other solutions posted here don't work with negative numbers:

>>> re.findall(r'\b\d+\b', 'he33llo 42 I\'m a 32 string -30')
['42', '32', '30']

>>> '-3'.isdigit()
False
2010/11/27

If you know it will be only one number in the string, i.e 'hello 12 hi', you can try filter.

For example:

In [1]: int(''.join(filter(str.isdigit, '200 grams')))
Out[1]: 200
In [2]: int(''.join(filter(str.isdigit, 'Counters: 55')))
Out[2]: 55
In [3]: int(''.join(filter(str.isdigit, 'more than 23 times')))
Out[3]: 23

But be carefull !!! :

In [4]: int(''.join(filter(str.isdigit, '200 grams 5')))
Out[4]: 2005
2019/02/06

# extract numbers from garbage string:
s = '12//n,[email protected]#$%3.14kjlw0xdadfackvj1.6e-19&*ghn334'
newstr = ''.join((ch if ch in '0123456789.-e' else ' ') for ch in s)
listOfNumbers = [float(i) for i in newstr.split()]
print(listOfNumbers)
[12.0, 3.14, 0.0, 1.6e-19, 334.0]
2018/03/29

I was looking for a solution to remove strings' masks, specifically from Brazilian phones numbers, this post not answered but inspired me. This is my solution:

>>> phone_number = '+55(11)8715-9877'
>>> ''.join([n for n in phone_number if n.isdigit()])
'551187159877'
2018/07/12

Source: https://stackoverflow.com/questions/4289331
Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Email: [email protected]