I have a test site that has been using windows-1252 all along. The users need some symbols, such as the square root symbol, but they have no need to display any language other than English. I was recently asked to switch the site to UTF-8 because of some security concerns. After I changed it to UTF-8, the square roots and other symbols (which are pulled out of an Oracle DB and passed through ColdFusion) appeared fine on the resulting web page. However, if I saved the document again (post to DB, page refreshes), the symbols turned into strange characters. If I saved again, even more strange characters appeared. So...
- If I don't need anything other than English is there anything wrong with sticking to windows-1252? Any security/hacking issues?
- Are there any implications of NOT using UTF-8 if you are using HTML5 (since that is the default encoding for HTML5)?
- If it's recommended that I switch to UTF-8, how do I get the currently stored square root symbols (and other symbols) to work?
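To make the symptom concrete, here is a minimal Python sketch of one way characters can multiply like this on every save. It assumes (a guess, not something confirmed on the site) that the text is encoded as UTF-8 at one layer and then decoded as windows-1252 at another on each round trip. The helper name `misread_once` is made up for illustration.

```python
def misread_once(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as windows-1252.

    This simulates one save/refresh cycle under the assumed mismatch.
    errors="replace" covers the few byte values cp1252 leaves undefined.
    """
    return text.encode("utf-8").decode("cp1252", errors="replace")


original = "\u221a"            # the square root symbol
once = misread_once(original)  # first save: 1 character becomes 3
twice = misread_once(once)     # second save: 3 characters become 6

print(len(original), len(once), len(twice))
print(once)
print(twice)

# Reversing a single mismatch recovers the symbol, which suggests the
# stored data may be repairable if exactly one bad round trip happened.
repaired = once.encode("cp1252").decode("utf-8")
print(repaired == original)
```

If the real cause is this kind of mismatch, the fix would be to make every layer (database connection, ColdFusion, HTTP header, form submission) agree on one encoding rather than to change only the page header.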
I've already read all these pages, but I'm still having a little trouble grasping it all. Hoping someone here can help clarify for me. Thanks!
- https://www.owasp.org/index.php/Canonicalization,_locale_and_Unicode
- Excellent description of how UTF-8 came about, why it’s awesome, and the problems it solves… https://www.youtube.com/watch?v=MijmeoH9LT4
- http://www.w3.org/International/questions/qa-choosing-encodings “Use UTF-8, if you can”. “In fact the HTML5 specification draft currently says "Authors are encouraged to use UTF-8. Conformance checkers may advise authors against using legacy encodings. Authoring tools should default to using UTF-8 for newly-created documents."”
- http://www.w3schools.com/tags/ref_charactersets.asp “For HTML5, the default character encoding is UTF-8.”
- http://www.joelonsoftware.com/articles/Unicode.html
* * * UPDATE * * *
I appreciate all the help so far in making this easier to understand. I'll simplify the original 3 questions so that hopefully a clear answer can be reached. The customer doesn't need support for other languages; they will be using some HTML5 tags and a TON of JSON/XML traffic sent back and forth via jQuery.ajax(). Given that info, from a security standpoint, is there anything wrong with keeping the database set to NLS_CHARACTERSET: WE8MSWIN1252 and the webpages set to <CFHEADER NAME="Content-Type" value="text/html; charset=windows-1252">? Thank you.
Here is another question that is a slight spin-off from this one: "Why am I able to use a character that's not part of a charset (windows-1252)?"
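As background for that spin-off, the square root symbol (U+221A) has no byte in windows-1252 at all. The short Python check below shows this, along with one common fallback: writing the character as a numeric character reference. Whether that is what actually happens on this site is an assumption; the snippet only demonstrates Python's own codec behavior.

```python
symbol = "\u221a"  # the square root symbol

# A strict encode fails because cp1252 (windows-1252) has no slot for U+221A.
try:
    symbol.encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False

# With the xmlcharrefreplace handler, Python substitutes a numeric
# character reference that an HTML page can still render as the symbol.
as_reference = symbol.encode("cp1252", errors="xmlcharrefreplace")

print(representable)
print(as_reference)
```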