(I'm using R.) I have a list of words called "goodwords.corpus". I am looping through the documents in a corpus and replacing each word on that list with the word + a number.
So for example if the word "good" is on the list, and "goodnight" is NOT on the list, then this document:
I am having a good time goodnight
would turn into:
I am having a good 1234 time goodnight
I'm using this code (edited to make it reproducible):
goodwords.corpus <- c("good")
test <- "I am having a good time goodnight"
for (i in seq_along(goodwords.corpus)) {
  test <- gsub(goodwords.corpus[[i]], paste(goodwords.corpus[[i]], "1234"), test)
}
However, I want gsub to replace only ENTIRE words. As written, "good" is on the "goodwords.corpus" list, but "goodnight", which is NOT on the list, is also affected because it contains "good". So I get this:
I am having a good 1234 time good 1234night
Is there any way I can tell gsub to replace only ENTIRE words, and not words that are part of other words?
I want to use something like this inside the loop:
test <- gsub("\\<goodwords.corpus[[i]]\\>", paste(goodwords.corpus[[i]], "1234"), test)
I've read that \< and \> (written as "\\<" and "\\>" inside an R string) tell gsub to match only whole words. But obviously that doesn't work here, because goodwords.corpus[[i]] is treated as literal text, not evaluated, when it's inside the quotes.
Any suggestions?
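One possible direction, as a minimal sketch rather than a definitive fix: build the pattern with paste0() so that the value of each word, not its name, ends up between the word-boundary markers. This assumes R's default regex engine (no perl = TRUE) and that the words contain only letters; anything with regex metacharacters would need escaping first. The names "word" and "pattern" are introduced here for illustration.

```r
goodwords.corpus <- c("good")
test <- "I am having a good time goodnight"

for (word in goodwords.corpus) {
  # Concatenate the boundary markers around the word's value.
  # "\\<" and "\\>" match the empty string at the start and end of a word.
  pattern <- paste0("\\<", word, "\\>")
  test <- gsub(pattern, paste(word, "1234"), test)
}

print(test)
```

With the example above, "goodnight" should be left alone because "good" is not followed by a word boundary there.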