Regular Expression for alphanumeric and underscores


I would like to have a regular expression that checks if a string contains only upper and lowercase letters, numbers, and underscores.

12/3/2008 4:25:27 AM

There's a lot of verbosity in here, and I'm deeply against it, so my conclusive answer would be:

^\w+$

\w is equivalent to [A-Za-z0-9_], which is pretty much what you want (unless we introduce Unicode into the mix).

Using the + quantifier you'll match one or more characters. If you want to accept an empty string too, use * instead.
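
As a quick sanity check of the equivalence claim, here is a minimal Python sketch. It assumes Python's re module, where \w is Unicode-aware by default for str patterns, so the re.ASCII flag is used to get the [A-Za-z0-9_] behaviour. The variable names are illustrative only.

```python
import re

# With re.ASCII, \w is restricted to ASCII word characters.
word = re.compile(r"\w", re.ASCII)
explicit = re.compile(r"[A-Za-z0-9_]")

# Compare both patterns over every ASCII code point.
ascii_chars = [chr(i) for i in range(128)]
same = all(
    bool(word.fullmatch(c)) == bool(explicit.fullmatch(c))
    for c in ascii_chars
)
matched = sum(1 for c in ascii_chars if word.fullmatch(c))

print(same)     # True
print(matched)  # 63 (26 + 26 letters, 10 digits, 1 underscore)
```

Other regex flavors may define \w differently, so it is worth running a similar check in the engine you actually use.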


You want to check that each character matches your requirements, which is why we use:

[A-Za-z0-9_]

And you can even use the shorthand version:

\w

which is equivalent (in some regex flavors, so make sure you check before you use it). Then, to indicate that the entire string must match, you use:

^

to indicate that the string must start with that character, then

$

to indicate that the string must end with that character. Then use

\w+ or \w*

to indicate "1 or more" or "0 or more". Putting it all together, we have:

^\w*$
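
To see why the anchors matter, here is a small Python sketch of the anchored pattern (using the "1 or more" variant). It assumes Python's re module. The helper name is_valid is made up for illustration. One Python-specific caveat is included: $ also matches just before a trailing newline, so re.fullmatch is the stricter choice there.

```python
import re

ANCHORED = re.compile(r"^\w+$", re.ASCII)

def is_valid(s: str) -> bool:
    """Return True if the anchored pattern matches s."""
    return ANCHORED.search(s) is not None

print(is_valid("abc_123"))  # True
print(is_valid("abc-123"))  # False

# Without the anchors, a partial match ("abc") is enough to succeed.
unanchored = re.search(r"\w+", "abc-123", re.ASCII) is not None
print(unanchored)  # True

# Caveat: in Python, $ also matches just before a trailing newline.
print(is_valid("abc\n"))  # True
strict = re.fullmatch(r"\w+", "abc\n", re.ASCII) is not None
print(strict)  # False
```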


Um... question: does it need to have at least one character or not? Can it be an empty string?

^[A-Za-z0-9_]+$

will match at least one upper- or lowercase letter, digit, or underscore. If it can be zero length, then just substitute * for the +:

^[A-Za-z0-9_]*$
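
The difference between the two quantifiers only shows up on the empty string. A minimal Python sketch, assuming the explicit character class [A-Za-z0-9_] and using re.fullmatch in place of the ^ and $ anchors:

```python
import re

one_or_more = re.compile(r"[A-Za-z0-9_]+")
zero_or_more = re.compile(r"[A-Za-z0-9_]*")

# The empty string is rejected by + and accepted by *.
empty_plus = one_or_more.fullmatch("") is not None
empty_star = zero_or_more.fullmatch("") is not None
print(empty_plus)  # False
print(empty_star)  # True

# Non-empty valid input is accepted by both.
print(one_or_more.fullmatch("a_1") is not None)   # True
print(zero_or_more.fullmatch("a_1") is not None)  # True
```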



If diacritics need to be included (such as the cedilla, ç), then you would need to use the word character class \w, which does the same as the above but also includes the accented characters (in flavors where \w is Unicode-aware):

^\w+$

or

^\w*$
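
Whether \w covers diacritics depends on the regex flavor and its flags. As a hedged illustration, in Python 3 \w is Unicode-aware by default for str patterns and can be limited to ASCII with re.ASCII. The sample string below is made up for the example.

```python
import re

sample = "fa\u00e7ade_1"  # "façade_1", with a precomposed c-cedilla

# Default (Unicode-aware) \w accepts the cedilla.
unicode_ok = re.fullmatch(r"\w+", sample) is not None
# ASCII-only \w and the explicit range both reject it.
ascii_ok = re.fullmatch(r"\w+", sample, re.ASCII) is not None
range_ok = re.fullmatch(r"[A-Za-z0-9_]+", sample) is not None

print(unicode_ok)  # True
print(ascii_ok)    # False
print(range_ok)    # False
```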




Although it's more verbose than \w, I personally appreciate the readability of the full POSIX character class names (such as [:alnum:]), so I'd say:

^[[:alnum:]_]+$


However, while some documentation states that \w will "Match any character in the range 0 - 9, A - Z and a - z (equivalent of POSIX [:alnum:])", I have not found this to be true, at least not with grep -P. You need to explicitly include the underscore if you use [:alnum:], but not if you use \w. You can't beat the following for short and sweet:

^\w+$


Along with readability, using the POSIX character classes means that your regex can work on non-ASCII strings. The range-based regexes won't, since they rely on the underlying ordering of the ASCII characters, which may differ in other character sets, and will therefore exclude some non-ASCII characters (letters such as œ) that you might want to capture.


In computer science, an alphanumeric value often means that the first character is not a digit but a letter or an underscore. After that, each character can be 0-9, A-Z, a-z, or underscore (_).

Here is how you would do that:

Tested under PHP:

$regex = '/^[A-Za-z_][A-Za-z\d_]*$/';

or take this

^[A-Za-z_][A-Za-z\d_]*$

and place it in your development language.
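
For instance, placed in Python, the same identifier-style pattern might look like the sketch below. The names IDENTIFIER and is_identifier are hypothetical. re.ASCII is passed because Python's \d would otherwise also accept non-ASCII digits, and re.fullmatch stands in for the ^ and $ anchors.

```python
import re

# First character: letter or underscore; the rest: letters, digits, underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z\d_]*", re.ASCII)

def is_identifier(s: str) -> bool:
    """Return True if s matches the identifier-style pattern in full."""
    return IDENTIFIER.fullmatch(s) is not None

print(is_identifier("_foo1"))    # True
print(is_identifier("foo_bar"))  # True
print(is_identifier("1foo"))     # False (starts with a digit)
print(is_identifier(""))         # False (needs at least one character)
```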


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow