Drop data frame columns by name


I have a number of columns that I would like to remove from a data frame. I know that we can delete them individually using something like:

df$x <- NULL

But I was hoping to do this with fewer commands.

Also, I know that I could drop columns using integer indexing like this:

df <- df[ -c(1, 3:6, 12) ]

But I am concerned that the relative position of my variables may change.

Given how powerful R is, I figured there might be a better way than dropping each column one by one.

1/29/2017 11:06:18 AM

Accepted Answer

You can use a simple list of names :

DF <- data.frame(
drops <- c("x","z")
DF[ , !(names(DF) %in% drops)]

Or, alternatively, you can make a list of those to keep and refer to them by name :

keeps <- c("y", "a")

EDIT : For those still not acquainted with the drop argument of the indexing function, if you want to keep one column as a data frame, you do:

keeps <- "y"
DF[ , keeps, drop = FALSE]

drop=TRUE (or not mentioning it) will drop unnecessary dimensions, and hence return a vector with the values of column y.

2/26/2016 8:57:39 PM

within(df, rm(x))

is probably easiest, or for multiple variables:

within(df, rm(x, y))

Or if you're dealing with data.tables (per How do you delete a column by name in data.table?):

dt[, x := NULL]   # Deletes column x by reference instantly.

dt[, !"x"]   # Selects all but x into a new data.table.

or for multiple variables

dt[, c("x","y") := NULL]

dt[, !c("x", "y")]

You could use %in% like this:

df[, !(colnames(df) %in% c("x","bar","foo"))]

list(NULL) also works:

dat <- mtcars
# [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
# [11] "carb"
dat[,c("mpg","cyl","wt")] <- list(NULL)
# [1] "disp" "hp"   "drat" "qsec" "vs"   "am"   "gear" "carb"

If you want remove the columns by reference and avoid the internal copying associated with data.frames then you can use the data.table package and the function :=

You can pass a character vector names to the left hand side of the := operator, and NULL as the RHS.


df <- data.frame(a=1:10, b=1:10, c=1:10, d=1:10)
DT <- data.table(df)
# or more simply  DT <- data.table(a=1:10, b=1:10, c=1:10, d=1:10) #

DT[, c('a','b') := NULL]

If you want to predefine the names as as character vector outside the call to [, wrap the name of the object in () or {} to force the LHS to be evaluated in the calling scope not as a name within the scope of DT.

del <- c('a','b')
DT <- data.table(a=1:10, b=1:10, c=1:10, d=1:10)
DT[, (del) := NULL]
DT <-  <- data.table(a=1:10, b=1:10, c=1:10, d=1:10)
DT[, {del} := NULL]
# force or `c` would also work.   

You can also use set, which avoids the overhead of [.data.table, and also works for data.frames!

df <- data.frame(a=1:10, b=1:10, c=1:10, d=1:10)
DT <- data.table(df)

# drop `a` from df (no copying involved)

set(df, j = 'a', value = NULL)
# drop `b` from DT (no copying involved)
set(DT, j = 'b', value = NULL)

There is a potentially more powerful strategy based on the fact that grep() will return a numeric vector. If you have a long list of variables as I do in one of my dataset, some variables that end in ".A" and others that end in ".B" and you only want the ones that end in ".A" (along with all the variables that don't match either pattern, do this:

dfrm2 <- dfrm[ , -grep("\\.B$", names(dfrm)) ]

For the case at hand, using Joris Meys example, it might not be as compact, but it would be:

DF <- DF[, -grep( paste("^",drops,"$", sep="", collapse="|"), names(DF) )]