Create an empty data.frame
Create an empty data.frame
Question
I'm trying to initialize a data.frame without any rows. Basically, I want to specify the data types for each column and name them, but not have any rows created as a result.
The best I've been able to do so far is something like:
df <- data.frame(Date=as.Date("01/01/2000", format="%m/%d/%Y"),
File="", User="", stringsAsFactors=FALSE)
df <- df[-1,]
Which creates a data.frame with a single row containing all of the data types and column names I wanted, but also creates a useless row which then needs to be removed.
Is there a better way to do this?
Accepted Answer
Just initialize it with empty vectors:
df <- data.frame(Date=as.Date(character()),
File=character(),
User=character(),
stringsAsFactors=FALSE)
Here's an other example with different column types :
df <- data.frame(Doubles=double(),
Ints=integer(),
Factors=factor(),
Logicals=logical(),
Characters=character(),
stringsAsFactors=FALSE)
str(df)
> str(df)
'data.frame': 0 obs. of 5 variables:
$ Doubles : num
$ Ints : int
$ Factors : Factor w/ 0 levels:
$ Logicals : logi
$ Characters: chr
N.B. :
Initializing a data.frame
with an empty column of the wrong type does not prevent further additions of rows having columns of different types.
This method is just a bit safer in the sense that you'll have the correct column types from the beginning, hence if your code relies on some column type checking, it will work even with a data.frame
with zero rows.
Read more… Read less…
If you already have an existent data frame, let's say df
that has the columns you want, then you can just create an empty data frame by removing all the rows:
empty_df = df[FALSE,]
Notice that df
still contains the data, but empty_df
doesn't.
I found this question looking for how to create a new instance with empty rows, so I think it might be helpful for some people.
You can do it without specifying column types
df = data.frame(matrix(vector(), 0, 3,
dimnames=list(c(), c("Date", "File", "User"))),
stringsAsFactors=F)
You could use read.table
with an empty string for the input text
as follows:
colClasses = c("Date", "character", "character")
col.names = c("Date", "File", "User")
df <- read.table(text = "",
colClasses = colClasses,
col.names = col.names)
Alternatively specifying the col.names
as a string:
df <- read.csv(text="Date,File,User", colClasses = colClasses)
Thanks to Richard Scriven for the improvement
Just declare
table = data.frame()
when you try to rbind
the first line it will create the columns
The most efficient way to do this is to use structure
to create a list that has the class "data.frame"
:
structure(list(Date = as.Date(character()), File = character(), User = character()),
class = "data.frame")
# [1] Date File User
# <0 rows> (or 0-length row.names)
To put this into perspective compared to the presently accepted answer, here's a simple benchmark:
s <- function() structure(list(Date = as.Date(character()),
File = character(),
User = character()),
class = "data.frame")
d <- function() data.frame(Date = as.Date(character()),
File = character(),
User = character(),
stringsAsFactors = FALSE)
library("microbenchmark")
microbenchmark(s(), d())
# Unit: microseconds
# expr min lq mean median uq max neval
# s() 58.503 66.5860 90.7682 82.1735 101.803 469.560 100
# d() 370.644 382.5755 523.3397 420.1025 604.654 1565.711 100