How can I match a string with a regex in Bash?


I am trying to write a bash script that contains a function so when given a .tar, .tar.bz2, .tar.gz etc. file it uses tar with the relevant switches to decompress the file.

I am using if elif then statements which test the filename to see what it ends with and I cannot get it to match using regex metacharacters.

To save constantly rewriting the script I am using 'test' at the command line, I thought the statement below should work, I have tried every combination of brackets, quotes and metacharaters possible and still it fails.

test sed-4.2.2.tar.bz2 = tar\.bz2$; echo $?
(this returns 1, false)

I'm sure the problem is a simple one and I've looked everywhere, yet I cannot fathom how to do it. Does someone know how I can do this?

8/20/2019 8:21:30 PM

Accepted Answer

To match regexes you need to use the =~ operator.

Try this:

[[ sed-4.2.2.tar.bz2 =~ tar.bz2$ ]] && echo matched

Alternatively, you can use wildcards (instead of regexes) with the == operator:

[[ sed-4.2.2.tar.bz2 == *tar.bz2 ]] && echo matched

If portability is not a concern, I recommend using [[ instead of [ or test as it is safer and more powerful. See What is the difference between test, [ and [[ ? for details.

7/2/2013 8:37:37 AM

A Function To Do This

extract () {
  if [ -f $1 ] ; then
      case $1 in
          *.tar.bz2)   tar xvjf $1    ;;
          *.tar.gz)    tar xvzf $1    ;;
          *.bz2)       bunzip2 $1     ;;
          *.rar)       rar x $1       ;;
          *.gz)        gunzip $1      ;;
          *.tar)       tar xvf $1     ;;
          *.tbz2)      tar xvjf $1    ;;
          *.tgz)       tar xvzf $1    ;;
          *.zip)       unzip $1       ;;
          *.Z)         uncompress $1  ;;
          *.7z)        7z x $1        ;;
          *)           echo "don't know '$1'..." ;;
      echo "'$1' is not a valid file!"

Other Note

In response to Aquarius Power in the comment above, We need to store the regex on a var

The variable BASH_REMATCH is set after you match the expression, and ${BASH_REMATCH[n]} will match the nth group wrapped in parentheses ie in the following ${BASH_REMATCH[1]} = "compressed" and ${BASH_REMATCH[2]} = ".gz"

if [[ "compressed.gz" =~ ^(.*)(\.[a-z]{1,5})$ ]]; 
  echo ${BASH_REMATCH[2]} ; 
  echo "Not proper format"; 

(The regex above isn't meant to be a valid one for file naming and extensions, but it works for the example)


I don't have enough rep to comment here, so I'm submitting a new answer to improve on dogbane's answer. The dot . in the regexp

[[ sed-4.2.2.tar.bz2 =~ tar.bz2$ ]] && echo matched

will actually match any character, not only the literal dot between 'tar.bz2', for example

[[ sed-4.2.2.tar4bz2 =~ tar.bz2$ ]] && echo matched
[[ sed-4.2.2.tar┬žbz2 =~ tar.bz2$ ]] && echo matched

or anything that doesn't require escaping with '\'. The strict syntax should then be

[[ sed-4.2.2.tar.bz2 =~ tar\.bz2$ ]] && echo matched

or you can go even stricter and also include the previous dot in the regex:

[[ sed-4.2.2.tar.bz2 =~ \.tar\.bz2$ ]] && echo matched

Since you are using bash, you don't need to create a child process for doing this. Here is one solution which performs it entirely within bash:

[[ $TEST =~ ^(.*):\ +(.*)$ ]] && TEST=${BASH_REMATCH[1]}:${BASH_REMATCH[2]}

Explanation: The groups before and after the sequence "colon and one or more spaces" are stored by the pattern match operator in the BASH_REMATCH array.


if [[ $STR == *pattern* ]]
    echo "It is the string!"
    echo "It's not him!"

Works for me! GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu)


shopt -s nocasematch

if [[ sed-4.2.2.$LINE =~ (yes|y)$ ]]
 then exit 0