Extract substring in Bash


Given a filename in the form someletters_12345_moreleters.ext, I want to extract the 5 digits and put them into a variable.

So to emphasize the point, I have a filename with x number of characters then a five digit sequence surrounded by a single underscore on either side then another set of x number of characters. I want to take the 5 digit number and put that into a variable.

I am very interested in the number of different ways that this can be accomplished.

10/9/2017 4:03:04 AM

Accepted Answer

Use cut:

echo 'someletters_12345_moreleters.ext' | cut -d'_' -f 2

More generic:

SUBSTRING=$(echo $INPUT| cut -d'_' -f 2)
11/4/2015 10:51:55 PM

Generic solution where the number can be anywhere in the filename, using the first of such sequences:

number=$(echo $filename | egrep -o '[[:digit:]]{5}' | head -n1)

Another solution to extract exactly a part of a variable:


If your filename always have the format stuff_digits_... you can use awk:

number=$(echo $filename | awk -F _ '{ print $2 }')

Yet another solution to remove everything except digits, use

number=$(echo $filename | tr -cd '[[:digit:]]')

just try to use cut -c startIndx-stopIndx


In case someone wants more rigorous information, you can also search it in man bash like this

$ man bash [press return key]
/substring  [press return key]
[press "n" key]
[press "n" key]
[press "n" key]
[press "n" key]


              Substring Expansion.  Expands to  up  to  length  characters  of
              parameter  starting  at  the  character specified by offset.  If
              length is omitted, expands to the substring of parameter  start‐
              ing at the character specified by offset.  length and offset are
              arithmetic expressions (see ARITHMETIC  EVALUATION  below).   If
              offset  evaluates  to a number less than zero, the value is used
              as an offset from the end of the value of parameter.  Arithmetic
              expressions  starting  with  a - must be separated by whitespace
              from the preceding : to be distinguished from  the  Use  Default
              Values  expansion.   If  length  evaluates to a number less than
              zero, and parameter is not @ and not an indexed  or  associative
              array,  it is interpreted as an offset from the end of the value
              of parameter rather than a number of characters, and the  expan‐
              sion is the characters between the two offsets.  If parameter is
              @, the result is length positional parameters beginning at  off‐
              set.   If parameter is an indexed array name subscripted by @ or
              *, the result is the length members of the array beginning  with
              ${parameter[offset]}.   A  negative  offset is taken relative to
              one greater than the maximum index of the specified array.  Sub‐
              string  expansion applied to an associative array produces unde‐
              fined results.  Note that a negative offset  must  be  separated
              from  the  colon  by  at least one space to avoid being confused
              with the :- expansion.  Substring indexing is zero-based  unless
              the  positional  parameters are used, in which case the indexing
              starts at 1 by default.  If offset  is  0,  and  the  positional
              parameters are used, $0 is prefixed to the list.

Here's how i'd do it:

[[ ${FN} =~ _([[:digit:]]{5})_ ]] && NUM=${BASH_REMATCH[1]}



Regular Expressions (RE): _([[:digit:]]{5})_

  • _ are literals to demarcate/anchor matching boundaries for the string being matched
  • () create a capture group
  • [[:digit:]] is a character class, i think it speaks for itself
  • {5} means exactly five of the prior character, class (as in this example), or group must match

In english, you can think of it behaving like this: the FN string is iterated character by character until we see an _ at which point the capture group is opened and we attempt to match five digits. If that matching is successful to this point, the capture group saves the five digits traversed. If the next character is an _, the condition is successful, the capture group is made available in BASH_REMATCH, and the next NUM= statement can execute. If any part of the matching fails, saved details are disposed of and character by character processing continues after the _. e.g. if FN where _1 _12 _123 _1234 _12345_, there would be four false starts before it found a match.


I'm surprised this pure bash solution didn't come up:

set $a
echo $2
# prints 12345

You probably want to reset IFS to what value it was before, or unset IFS afterwards!