From ?Quotes:
\xnn character with given hex code (1 or 2 hex digits)
\unnnn Unicode character with given code (1--4 hex digits)
In the case where the Unicode character has only one or two hex digits, I would expect these two forms to give the same character. In fact, one of the examples on the ?Quotes help page shows:
"\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21"
## [1] "Hello World!"
"\u48\u65\u6c\u6c\u6f\u20\u57\u6f\u72\u6c\u64\u21"
## [1] "Hello World!"
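To make that expectation concrete, here is a minimal sketch that compares the two forms for ASCII-range codes. The variable names `hex_form` and `uni_form` are my own, and I am assuming a reasonably recent R; the point is only that both escapes should produce the same bytes when every code is below 0x80.

```r
# Both strings use only codes below 0x80 (plain ASCII).
hex_form <- "\x48\x65\x6c\x6c\x6f"
uni_form <- "\u48\u65\u6c\u6c\u6f"

# Compare the strings and the bytes that back them.
identical(hex_form, uni_form)
charToRaw(hex_form)
charToRaw(uni_form)
```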
However, under Linux, when trying to print a pound sign, I see:
cat("\ua3")
## £
cat("\xa3")
## ?
That is, the \x hex code fails to display correctly. (This behaviour persisted with every locale that I tried.) Under Windows 7 both versions show a pound sign.
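One way to see what each escape actually stores is to look at the raw bytes and the declared encoding. This is only a diagnostic sketch, with `hex_pound` and `uni_pound` as names I made up. I would expect the \x form to hold the single byte a3 and the \u form to hold a two-byte UTF-8 sequence, but the encoding marks may depend on the R version and platform.

```r
hex_pound <- "\xa3"  # written with the \x escape
uni_pound <- "\ua3"  # written with the \u escape

# Raw bytes behind each string.
charToRaw(hex_pound)
charToRaw(uni_pound)

# Encoding mark R attaches to each string.
Encoding(hex_pound)
Encoding(uni_pound)
```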
If I convert to integer and back, then the pound sign displays correctly under Linux.
cat(intToUtf8(utf8ToInt("\xa3")))
## £
Incidentally, this doesn't work under Windows, since utf8ToInt("\xa3") returns NA.
Some \x characters return NA under Windows but throw an error under Linux. For example:
utf8ToInt("\xf0")
## Error in utf8ToInt("\xf0") : invalid UTF-8 string
("\uf0" is a valid character.)
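A small check along the same lines, using `validUTF8()` (available in base R since 3.3.0, as far as I know) to ask whether each string is well-formed UTF-8. The names `bytes_f0` and `char_f0` are hypothetical, and this is a sketch of how to probe the difference rather than an explanation of it.

```r
bytes_f0 <- "\xf0"  # written with the \x escape
char_f0 <- "\uf0"   # written with the \u escape

# Is each string a well-formed UTF-8 byte sequence?
validUTF8(c(bytes_f0, char_f0))

# Bytes stored for the \u form.
charToRaw(char_f0)
```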
These examples show that there are some differences between the \x and \u forms of characters, which seem to be OS-specific, but I can't see any logic in how they are defined.
What is the difference between these two character forms?