Advertisement
Advertisement


How to validate phone numbers using regex


Question

I'm trying to put together a comprehensive regex to validate phone numbers. Ideally it would handle international formats, but it must handle US formats, including the following:

  • 1-234-567-8901
  • 1-234-567-8901 x1234
  • 1-234-567-8901 ext1234
  • 1 (234) 567-8901
  • 1.234.567.8901
  • 1/234/567/8901
  • 12345678901

I'll answer with my current attempt, but I'm hoping somebody has something better and/or more elegant.

2020/02/14
1
935
2/14/2020 7:35:49 PM

Accepted Answer

Better option... just strip all non-digit characters on input (except 'x' and leading '+' signs), taking care because of the British tendency to write numbers in the non-standard form +44 (0) ... when asked to use the international prefix (in that specific case, you should discard the (0) entirely).

Then, you end up with values like:

 12345678901
 12345678901x1234
 345678901x1234
 12344678901
 12345678901
 12345678901
 12345678901
 +4112345678
 +441234567890

Then when you display, reformat to your hearts content. e.g.

  1 (234) 567-8901
  1 (234) 567-8901 x1234
2016/03/10
520
3/10/2016 10:47:57 AM

It turns out that there's something of a spec for this, at least for North America, called the NANP.

You need to specify exactly what you want. What are legal delimiters? Spaces, dashes, and periods? No delimiter allowed? Can one mix delimiters (e.g., +0.111-222.3333)? How are extensions (e.g., 111-222-3333 x 44444) going to be handled? What about special numbers, like 911? Is the area code going to be optional or required?

Here's a regex for a 7 or 10 digit number, with extensions allowed, delimiters are spaces, dashes, or periods:

^(?:(?:\+?1\s*(?:[.-]\s*)?)?(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?)?([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4})(?:\s*(?:#|x\.?|ext\.?|extension)\s*(\d+))?$
2010/09/01

.*

If the users want to give you their phone numbers, then trust them to get it right. If they do not want to give it to you then forcing them to enter a valid number will either send them to a competitor's site or make them enter a random string that fits your regex. I might even be tempted to look up the number of a premium rate horoscope hotline and enter that instead.

I would also consider any of the following as valid entries on a web site:

"123 456 7890 until 6pm, then 098 765 4321"  
"123 456 7890 or try my mobile on 098 765 4321"  
"ex-directory - mind your own business"
2019/12/18

I would also suggest looking at the "libphonenumber" Google Library. I know it is not regex but it does exactly what you want.

For example, it will recognize that:

15555555555

is a possible number but not a valid number. It also supports countries outside the US.

Highlights of functionality:

  • Parsing/formatting/validating phone numbers for all countries/regions of the world.
  • getNumberType - gets the type of the number based on the number itself; able to distinguish Fixed-line, Mobile, Toll-free, Premium Rate, Shared Cost, VoIP and Personal Numbers (whenever feasible).
  • isNumberMatch - gets a confidence level on whether two numbers could be the same.
  • getExampleNumber/getExampleNumberByType - provides valid example numbers for all countries/regions, with the option of specifying which type of example phone number is needed.
  • isPossibleNumber - quickly guessing whether a number is a possible phonenumber by using only the length information, much faster than a full validation.
  • isValidNumber - full validation of a phone number for a region using length and prefix information.
  • AsYouTypeFormatter - formats phone numbers on-the-fly when users enter each digit.
  • findNumbers - finds numbers in text input.
  • PhoneNumberOfflineGeocoder - provides geographical information related to a phone number.

Examples

The biggest problem with phone number validation is it is very culturally dependant.

  • America
    • (408) 974–2042 is a valid US number
    • (999) 974–2042 is not a valid US number
  • Australia
    • 0404 999 999 is a valid Australian number
    • (02) 9999 9999 is also a valid Australian number
    • (09) 9999 9999 is not a valid Australian number

A regular expression is fine for checking the format of a phone number, but it's not really going to be able to check the validity of a phone number.

I would suggest skipping a simple regular expression to test your phone number against, and using a library such as Google's libphonenumber (link to GitHub project).

Introducing libphonenumber!

Using one of your more complex examples, 1-234-567-8901 x1234, you get the following data out of libphonenumber (link to online demo):

Validation Results

Result from isPossibleNumber()  true
Result from isValidNumber()     true

Formatting Results:

E164 format                    +12345678901
Original format                (234) 567-8901 ext. 123
National format                (234) 567-8901 ext. 123
International format           +1 234-567-8901 ext. 123
Out-of-country format from US  1 (234) 567-8901 ext. 123
Out-of-country format from CH  00 1 234-567-8901 ext. 123

So not only do you learn if the phone number is valid (which it is), but you also get consistent phone number formatting in your locale.

As a bonus, libphonenumber has a number of datasets to check the validity of phone numbers, as well, so checking a number such as +61299999999 (the international version of (02) 9999 9999) returns as a valid number with formatting:

Validation Results

Result from isPossibleNumber()  true
Result from isValidNumber()     true

Formatting Results

E164 format                    +61299999999
Original format                61 2 9999 9999
National format                (02) 9999 9999
International format           +61 2 9999 9999
Out-of-country format from US  011 61 2 9999 9999
Out-of-country format from CH  00 61 2 9999 9999

libphonenumber also gives you many additional benefits, such as grabbing the location that the phone number is detected as being, and also getting the time zone information from the phone number:

PhoneNumberOfflineGeocoder Results
Location        Australia

PhoneNumberToTimeZonesMapper Results
Time zone(s)    [Australia/Sydney]

But the invalid Australian phone number ((09) 9999 9999) returns that it is not a valid phone number.

Validation Results

Result from isPossibleNumber()  true
Result from isValidNumber()     false

Google's version has code for Java and Javascript, but people have also implemented libraries for other languages that use the Google i18n phone number dataset:

Unless you are certain that you are always going to be accepting numbers from one locale, and they are always going to be in one format, I would heavily suggest not writing your own code for this, and using libphonenumber for validating and displaying phone numbers.

2019/02/18

/^(?:(?:\(?(?:00|\+)([1-4]\d\d|[1-9]\d?)\)?)?[\-\.\ \\\/]?)?((?:\(?\d{1,}\)?[\-\.\ \\\/]?){0,})(?:[\-\.\ \\\/]?(?:#|ext\.?|extension|x)[\-\.\ \\\/]?(\d+))?$/i

This matches:

 - (+351) 282 43 50 50
 - 90191919908
 - 555-8909
 - 001 6867684
 - 001 6867684x1
 - 1 (234) 567-8901
 - 1-234-567-8901 x1234
 - 1-234-567-8901 ext1234
 - 1-234 567.89/01 ext.1234
 - 1(234)5678901x1234
 - (123)8575973
 - (0055)(123)8575973

On $n, it saves:

  1. Country indicator
  2. Phone number
  3. Extension

You can test it on https://www.regexpal.com/?fam=99127

2017/09/22

Although the answer to strip all whitespace is neat, it doesn't really solve the problem that's posed, which is to find a regex. Take, for instance, my test script that downloads a web page and extracts all phone numbers using the regex. Since you'd need a regex anyway, you might as well have the regex do all the work. I came up with this:

1?\W*([2-9][0-8][0-9])\W*([2-9][0-9]{2})\W*([0-9]{4})(\se?x?t?(\d*))?

Here's a perl script to test it. When you match, $1 contains the area code, $2 and $3 contain the phone number, and $5 contains the extension. My test script downloads a file from the internet and prints all the phone numbers in it.

#!/usr/bin/perl

my $us_phone_regex =
        '1?\W*([2-9][0-8][0-9])\W*([2-9][0-9]{2})\W*([0-9]{4})(\se?x?t?(\d*))?';


my @tests =
(
"1-234-567-8901",
"1-234-567-8901 x1234",
"1-234-567-8901 ext1234",
"1 (234) 567-8901",
"1.234.567.8901",
"1/234/567/8901",
"12345678901",
"not a phone number"
);

foreach my $num (@tests)
{
        if( $num =~ m/$us_phone_regex/ )
        {
                print "match [$1-$2-$3]\n" if not defined $4;
                print "match [$1-$2-$3 $5]\n" if defined $4;
        }
        else
        {
                print "no match [$num]\n";
        }
}

#
# Extract all phone numbers from an arbitrary file.
#
my $external_filename =
        'http://web.textfiles.com/ezines/PHREAKSANDGEEKS/PnG-spring05.txt';
my @external_file = `curl $external_filename`;
foreach my $line (@external_file)
{
        if( $line =~ m/$us_phone_regex/ )
        {
                print "match $1 $2 $3\n";
        }
}

Edit:

You can change \W* to \s*\W?\s* in the regex to tighten it up a bit. I wasn't thinking of the regex in terms of, say, validating user input on a form when I wrote it, but this change makes it possible to use the regex for that purpose.

'1?\s*\W?\s*([2-9][0-8][0-9])\s*\W?\s*([2-9][0-9]{2})\s*\W?\s*([0-9]{4})(\se?x?t?(\d*))?';
2008/09/23

Source: https://stackoverflow.com/questions/123559
Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Email: [email protected]