How to convert a data frame column to numeric type?

Question

How do you convert a data frame column to a numeric type?

2015/10/10
1
272
10/10/2015 5:54:38 AM

Accepted Answer

Since nobody has received the check mark yet, I assume you have a practical issue in mind, mostly because you haven't specified what type of vector you want to convert to `numeric`. I suggest using the `transform` function to do the job.

Now let me demonstrate a certain "conversion anomaly":

``````# create dummy data.frame
d <- data.frame(char = letters[1:5],
                fake_char = as.character(1:5),
                fac = factor(1:5),
                char_fac = factor(letters[1:5]),
                num = 1:5, stringsAsFactors = FALSE)
``````

Let us have a glance at the `data.frame`:

``````> d
  char fake_char fac char_fac num
1    a         1   1        a   1
2    b         2   2        b   2
3    c         3   3        c   3
4    d         4   4        d   4
5    e         5   5        e   5
``````

and let us run:

``````> sapply(d, mode)
       char   fake_char         fac    char_fac         num
"character" "character"   "numeric"   "numeric"   "numeric"
> sapply(d, class)
       char   fake_char         fac    char_fac         num
"character" "character"    "factor"    "factor"   "integer"
``````

Now you probably ask yourself "Where's the anomaly?" Well, I've bumped into quite peculiar things in R, and this is not the most confounding one, but it can confuse you, especially if you read this before rolling into bed.

Here goes: the first two columns are `character`. I've deliberately called the second one `fake_char`. Spot the similarity of this `character` variable with the one that Dirk created in his reply. It's actually a `numeric` vector converted to `character`. The third and fourth columns are `factor`, and the last one is "purely" `numeric`.

If you use the `transform` function, you can convert `fake_char` into `numeric`, but not the `char` variable itself.

``````> transform(d, char = as.numeric(char))
  char fake_char fac char_fac num
1   NA         1   1        a   1
2   NA         2   2        b   2
3   NA         3   3        c   3
4   NA         4   4        d   4
5   NA         5   5        e   5
Warning message:
In eval(expr, envir, enclos) : NAs introduced by coercion
``````

But if you do the same thing with `fake_char` and `char_fac`, you'll be lucky and get away with no NAs:

``````> transform(d, fake_char = as.numeric(fake_char),
                 char_fac = as.numeric(char_fac))

  char fake_char fac char_fac num
1    a         1   1        1   1
2    b         2   2        2   2
3    c         3   3        3   3
4    d         4   4        4   4
5    e         5   5        5   5
``````

If you save the transformed `data.frame` and check `mode` and `class`, you'll get:

``````> D <- transform(d, fake_char = as.numeric(fake_char),
                      char_fac = as.numeric(char_fac))

> sapply(D, mode)
       char   fake_char         fac    char_fac         num
"character"   "numeric"   "numeric"   "numeric"   "numeric"
> sapply(D, class)
       char   fake_char         fac    char_fac         num
"character"   "numeric"    "factor"   "numeric"   "integer"
``````

So, the conclusion is: yes, you can convert a `character` vector into a `numeric` one, but only if its elements are "convertible" to `numeric`. If even one element is not, that element becomes `NA` and R issues a coercion warning when you convert the vector.

And just to prove my point:

``````> err <- c(1, "b", 3, 4, "e")
> mode(err)
[1] "character"
> class(err)
[1] "character"
> char <- as.numeric(err)
Warning message:
NAs introduced by coercion
> char
[1]  1 NA  3  4 NA
``````
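
To find out in advance which elements would turn into `NA`, one option is to compare the missing values before and after coercion. The following is only a minimal sketch; the helper name `non_numeric` is hypothetical and not part of base R.

```r
# Hypothetical helper: return the elements of x that as.numeric() cannot parse.
non_numeric <- function(x) {
  x <- as.character(x)                          # also handles factors
  converted <- suppressWarnings(as.numeric(x))  # silence the coercion warning
  x[is.na(converted) & !is.na(x)]               # keep values that became NA
}

err <- c(1, "b", 3, 4, "e")
non_numeric(err)   # "b" "e"
```

An empty result suggests that the whole vector can be converted without introducing new `NA` values.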

And now, just for fun (or practice), try to guess the output of these commands:

``````> fac <- as.factor(err)
> fac
???
> num <- as.numeric(fac)
> num
???
``````

Kind regards to Patrick Burns! =)

2010/02/19
276
2/19/2010 12:31:30 AM

If `x` is a column of the data frame `dat`, and `x` is of type factor, use:

``````as.numeric(as.character(dat$x))
``````
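
The intermediate `as.character()` matters because calling `as.numeric()` directly on a factor returns its internal integer codes rather than the labels. A small illustration with made-up values (the vector `f` is only an example):

```r
# Made-up factor whose labels look like numbers.
f <- factor(c("10", "20", "5"))

as.numeric(f)                 # level codes: 1 2 3
as.numeric(as.character(f))   # label values: 10 20 5
as.numeric(levels(f))[f]      # same result, converting each level only once
```

The last form gives the same values as the second and may be a little faster on long vectors, since only the distinct levels are parsed.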
2013/11/22

I would have added a comment, but I can't because my rating is too low.

Just to add to the answers by user276042 and pangratz:

``````dat$x = as.numeric(as.character(dat$x))
``````

This will overwrite the values of the existing column `x`.

2014/12/06

While your question is strictly about numeric conversion, many type conversions are difficult to understand when beginning R. I'll aim to address methods that help. This question is similar to This Question.

Type conversion can be a pain in R because (1) factors can't be converted directly to numeric and need to be converted to character first, (2) dates are a special case that you typically need to deal with separately, and (3) looping across data frame columns can be tricky. Fortunately, the "tidyverse" has solved most of these issues.

This solution uses `mutate_all()` to apply a function to all columns in a data frame. In this case, we want to apply the `type.convert()` function, which converts strings to numeric where it can. Because R loves factors (not sure why), character columns that should stay character get changed to factor. To fix this, `mutate_if()` is used to detect columns that are factors and change them back to character. Last, I wanted to show how lubridate can be used to change a timestamp in character class to date-time, because this is also often a sticking point for beginners.

``````library(tidyverse)
library(lubridate)

# Data that needs to be converted to numeric, date-time, etc.
data_df
#> # A tibble: 5 × 9
#>             TIMESTAMP SYMBOL    EX  PRICE  SIZE  COND   BID BIDSIZ   OFR
#>                 <chr>  <chr> <chr>  <chr> <chr> <chr> <chr>  <chr> <chr>
#> 1 2012-05-04 09:30:00    BAC     T 7.8900 38538     F  7.89    523  7.90
#> 2 2012-05-04 09:30:01    BAC     Z 7.8850   288     @  7.88  61033  7.90
#> 3 2012-05-04 09:30:03    BAC     X 7.8900  1000     @  7.88   1974  7.89
#> 4 2012-05-04 09:30:07    BAC     T 7.8900 19052     F  7.88   1058  7.89
#> 5 2012-05-04 09:30:08    BAC     Y 7.8900 85053     F  7.88 108101  7.90

# Converting columns to numeric using "tidyverse"
data_df %>%
  mutate_all(type.convert) %>%
  mutate_if(is.factor, as.character) %>%
  mutate(TIMESTAMP = as_datetime(TIMESTAMP, tz = Sys.timezone()))
#> # A tibble: 5 × 9
#>             TIMESTAMP SYMBOL    EX PRICE  SIZE  COND   BID BIDSIZ   OFR
#>                <dttm>  <chr> <chr> <dbl> <int> <chr> <dbl>  <int> <dbl>
#> 1 2012-05-04 09:30:00    BAC     T 7.890 38538     F  7.89    523  7.90
#> 2 2012-05-04 09:30:01    BAC     Z 7.885   288     @  7.88  61033  7.90
#> 3 2012-05-04 09:30:03    BAC     X 7.890  1000     @  7.88   1974  7.89
#> 4 2012-05-04 09:30:07    BAC     T 7.890 19052     F  7.88   1058  7.89
#> 5 2012-05-04 09:30:08    BAC     Y 7.890 85053     F  7.88 108101  7.90
``````
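
For readers without the tidyverse installed, roughly the same column-wise conversion can be sketched in base R. The data frame `raw` below is a made-up stand-in for `data_df`, and `as.is = TRUE` asks `type.convert()` to leave non-numeric strings as character rather than turning them into factors. Date-time columns would still need separate handling.

```r
# Made-up stand-in for data_df: every column starts out as character.
raw <- data.frame(
  SYMBOL = c("BAC", "BAC"),
  PRICE  = c("7.8900", "7.8850"),
  SIZE   = c("38538", "288"),
  stringsAsFactors = FALSE
)

# Convert each column where possible; as.is = TRUE keeps strings as character.
raw[] <- lapply(raw, type.convert, as.is = TRUE)

sapply(raw, class)   # SYMBOL stays character, PRICE is numeric, SIZE is integer
```

Assigning into `raw[]` keeps the object a data frame instead of replacing it with a plain list.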
2018/10/24

Tim is correct, and Shane has an omission. Here are additional examples:

``````R> df <- data.frame(a = as.character(10:15), stringsAsFactors = TRUE)
R> df <- data.frame(df, num = as.numeric(df$a),
                    numchr = as.numeric(as.character(df$a)))
R> df
   a num numchr
1 10   1     10
2 11   2     11
3 12   3     12
4 13   4     13
5 14   5     14
6 15   6     15
R> summary(df)
a          num           numchr
10:1   Min.   :1.00   Min.   :10.0
11:1   1st Qu.:2.25   1st Qu.:11.2
12:1   Median :3.50   Median :12.5
13:1   Mean   :3.50   Mean   :12.5
14:1   3rd Qu.:4.75   3rd Qu.:13.8
15:1   Max.   :6.00   Max.   :15.0
R>
``````

The summary now shows counts for the factor column `a`, a numeric summary of the `as.numeric()` result (which is wrong, because it picked up the internal factor level codes), and the correct summary of the `as.numeric(as.character())` result.

2011/12/06

With the following code you can convert all data frame columns to numeric (`X` is the data frame whose columns we want to convert):

``````as.data.frame(lapply(X, as.numeric))
``````
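
Note that `as.numeric` applied to a factor column yields level codes, so when `X` may contain factors, a slightly more defensive variant goes through `as.character` first. A minimal sketch, with `X` made up for illustration:

```r
# Made-up data frame with one factor column and one character column.
X <- data.frame(
  a = factor(c("10", "20", "5")),
  b = c("1.5", "2.5", "3.5"),
  stringsAsFactors = FALSE
)

# Go through character so factor labels, not level codes, are converted.
X[] <- lapply(X, function(col) as.numeric(as.character(col)))

str(X)
```

Both columns should come out as `num`, with `a` holding 10, 20 and 5 rather than 1, 2 and 3.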

To convert a whole matrix to numeric, you have two options. Either:

``````mode(X) <- "numeric"
``````

or:

``````X <- apply(X, 2, as.numeric)
``````

Alternatively, you can use the `data.matrix` function to convert everything to numeric, but be aware that factor and character columns are replaced by their internal integer codes rather than the values they spell out. It is therefore safer to convert everything to `character` first and then switch the mode of the resulting matrix:

``````X <- sapply(X, as.character)
mode(X) <- "numeric"
``````

I usually use this last one if I want to convert to matrix and numeric simultaneously.

2017/07/04

Licensed under CC-BY-SA with attribution
Not affiliated with Stack Overflow