My text contains some HTML-escaped characters; for instance, instead of ' there is &#39;. I would like to unescape these sequences. Since I do not know in advance which characters are escaped, I do not want to use a simple mapping such as c("&#39;" = "'", ...).
I understand that the number after the ampersand and hash sign is the decimal Unicode code point. So &#39; is \u27, since 27 is the hexadecimal representation of 39. So I thought of a solution that involves sprintf("\u%x", s), where s is the number extracted between &# and ;. However, this results in an error: "\u used without hex numbers."
What would be a better approach to convert HTML escaped sequences back to characters?
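To make the idea concrete, here is a minimal sketch in base R of what I am after. It assumes that only decimal numeric references of the form &#NNN; occur (no named entities such as &amp;amp; and no hexadecimal &#x...; forms), and the function name unescape_decimal is just a placeholder of mine. As far as I can tell, the sprintf attempt fails because R resolves \u escapes when it parses the string literal, so they cannot be assembled at run time; intToUtf8() converts a code point directly instead.

```r
# Hypothetical helper (name is illustrative): decode decimal numeric
# character references such as &#39; using base R only.
unescape_decimal <- function(x) {
  # Locate every decimal reference in each element of x
  m <- gregexpr("&#[0-9]+;", x)
  # Replace each match with the character for its code point
  regmatches(x, m) <- lapply(regmatches(x, m), function(refs) {
    codes <- as.integer(gsub("[^0-9]", "", refs))  # keep only the digits
    intToUtf8(codes, multiple = TRUE)              # one string per code point
  })
  x
}

intToUtf8(39)   # "'"
unescape_decimal("It&#39;s")
```

This only covers the decimal case described above; a general solution would also need to handle named and hexadecimal entities.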