How can I trim leading and trailing white space?
How can I trim leading and trailing white space?
Question
I am having some troubles with leading and trailing white space in a data.frame.
For example, I like to take a look at a specific row
in a data.frame
based on a certain condition:
> myDummy[myDummy$country == c("Austria"),c(1,2,3:7,19)]
[1] codeHelper country dummyLI dummyLMI dummyUMI
[6] dummyHInonOECD dummyHIOECD dummyOECD
<0 rows> (or 0-length row.names)
I was wondering why I didn't get the expected output since the country Austria obviously existed in my data.frame
. After looking through my code history and trying to figure out what went wrong I tried:
> myDummy[myDummy$country == c("Austria "),c(1,2,3:7,19)]
codeHelper country dummyLI dummyLMI dummyUMI dummyHInonOECD dummyHIOECD
18 AUT Austria 0 0 0 0 1
dummyOECD
18 1
All I have changed in the command is an additional white space after Austria.
Further annoying problems obviously arise. For example, when I like to merge two frames based on the country column. One data.frame
uses "Austria "
while the other frame has "Austria"
. The matching doesn't work.
- Is there a nice way to 'show' the white space on my screen so that I am aware of the problem?
- And can I remove the leading and trailing white space in R?
So far I used to write a simple Perl script which removes the whites pace, but it would be nice if I can somehow do it inside R.
Accepted Answer
Probably the best way is to handle the trailing white spaces when you read your data file. If you use read.csv
or read.table
you can set the parameterstrip.white=TRUE
.
If you want to clean strings afterwards you could use one of these functions:
# Returns string without leading white space
trim.leading <- function (x) sub("^\\s+", "", x)
# Returns string without trailing white space
trim.trailing <- function (x) sub("\\s+$", "", x)
# Returns string without leading or trailing white space
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
To use one of these functions on myDummy$country
:
myDummy$country <- trim(myDummy$country)
To 'show' the white space you could use:
paste(myDummy$country)
which will show you the strings surrounded by quotation marks (") making white spaces easier to spot.
Popular Answer
As of R 3.2.0 a new function was introduced for removing leading/trailing white spaces:
trimws()
Read more... Read less...
To manipulate the white space, use str_trim() in the stringr package. The package has manual dated Feb 15, 2013 and is in CRAN. The function can also handle string vectors.
install.packages("stringr", dependencies=TRUE)
require(stringr)
example(str_trim)
d4$clean2<-str_trim(d4$V2)
(Credit goes to commenter: R. Cotton)
A simple function to remove leading and trailing whitespace:
trim <- function( x ) {
gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
}
Usage:
> text = " foo bar baz 3 "
> trim(text)
[1] "foo bar baz 3"
Ad 1) To see white spaces you could directly call print.data.frame
with modified arguments:
print(head(iris), quote=TRUE)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 "5.1" "3.5" "1.4" "0.2" "setosa"
# 2 "4.9" "3.0" "1.4" "0.2" "setosa"
# 3 "4.7" "3.2" "1.3" "0.2" "setosa"
# 4 "4.6" "3.1" "1.5" "0.2" "setosa"
# 5 "5.0" "3.6" "1.4" "0.2" "setosa"
# 6 "5.4" "3.9" "1.7" "0.4" "setosa"
See also ?print.data.frame
for other options.
Removing leading and trailing blanks might be achieved through the trim() function from the gdata package as well:
require(gdata)
example(trim)
Usage example:
> trim(" Remove leading and trailing blanks ")
[1] "Remove leading and trailing blanks"
I'd prefer to add the answer as comment to user56's, but I am yet unable so writing as an independent answer.