Advertisement
Advertisement


How to filter rows containing a string pattern from a Pandas dataframe


Question

Assume we have a data frame in Python Pandas that looks like this:

df = pd.DataFrame({'vals': [1, 2, 3, 4], 'ids': [u'aball', u'bball', u'cnut', u'fball']})

Or, in table form:

ids    vals
aball   1
bball   2
cnut    3
fball   4

How do I filter rows which contain the key word "ball?" For example, the output should be:

ids    vals
aball   1
bball   2
fball   4
2018/12/15
1
155
12/15/2018 2:57:27 PM

Accepted Answer

In [3]: df[df['ids'].str.contains("ball")]
Out[3]:
     ids  vals
0  aball     1
1  bball     2
3  fball     4
2015/01/15
301
1/15/2015 11:59:55 PM


>>> mask = df['ids'].str.contains('ball')    
>>> mask
0     True
1     True
2    False
3     True
Name: ids, dtype: bool

>>> df[mask]
     ids  vals
0  aball     1
1  bball     2
3  fball     4
2015/01/15

If you want to set the column you filter on as a new index, you could also consider to use .filter; if you want to keep it as a separate column then str.contains is the way to go.

Let's say you have

df = pd.DataFrame({'vals': [1, 2, 3, 4, 5], 'ids': [u'aball', u'bball', u'cnut', u'fball', 'ballxyz']})

       ids  vals
0    aball     1
1    bball     2
2     cnut     3
3    fball     4
4  ballxyz     5

and your plan is to filter all rows in which ids contains ball AND set ids as new index, you can do

df.set_index('ids').filter(like='ball', axis=0)

which gives

         vals
ids          
aball       1
bball       2
fball       4
ballxyz     5

But filter also allows you to pass a regex, so you could also filter only those rows where the column entry ends with ball. In this case you use

df.set_index('ids').filter(regex='ball$', axis=0)

       vals
ids        
aball     1
bball     2
fball     4

Note that now the entry with ballxyz is not included as it starts with ball and does not end with it.

If you want to get all entries that start with ball you can simple use

df.set_index('ids').filter(regex='^ball', axis=0)

yielding

         vals
ids          
ballxyz     5

The same works with columns; all you then need to change is the axis=0 part. If you filter based on columns, it would be axis=1.

2017/12/12