Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

This question has unicode text that may not display correctly in all browsers.

clang (since 3.3) supports Unicode characters in variable names: http://llvm.org/releases/3.3/tools/clang/docs/ReleaseNotes.html#major-new-features

However, some special characters are still forbidden.

int main(){
    double α = 2.; // alpha, ok!
    double ∞ = 99999.; // infinity, error
}

giving:

error: non-ASCII characters are not allowed outside of literals and identifiers
        double ∞ = 99999.;

What is the fundamental difference between α (alpha) and ∞ (infinity) for clang? Is it that the former is Unicode, while the latter is not Unicode but at the same time is not ASCII?

Is there a workaround or an option to allow this set of characters in clang (or BTW in gcc)?

Notes:
1) ∞ is just an example; there are a lot of characters that are potentially useful but also forbidden, like ? or ?.
2) I am not asking whether it is a good idea; please take it as a technical question.
3) I am interested in the C++ compiler of clang 3.4 on Linux (gcc 4.8.3 doesn't support this). I am saving the source files with gedit using UTF-8 encoding and Unix/Linux line endings.
4) Adding other normal first characters doesn't help: _∞
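
Both α and ∞ are Unicode characters stored as UTF-8 in the source file; what differs is their code point. The following is a minimal sketch for listing the code points of an identifier so they can be compared against whatever set the compiler accepts. The helper name decode_utf8 is hypothetical, and the sketch assumes well-formed UTF-8 input (it does no validation).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode a UTF-8 byte string into Unicode code points.
// Assumption: the input is well-formed UTF-8; malformed input is not detected.
std::vector<std::uint32_t> decode_utf8(const std::string& s) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        std::uint32_t cp;
        int extra;  // number of continuation bytes that follow the lead byte
        if (b < 0x80)      { cp = b;        extra = 0; }  // 1-byte (ASCII)
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }  // 2-byte sequence
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }  // 3-byte sequence
        else               { cp = b & 0x07; extra = 3; }  // 4-byte sequence
        ++i;
        // Each continuation byte contributes its low 6 bits.
        for (int k = 0; k < extra && i < s.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}
```

For the identifier _∞ from note 4 (bytes 5F E2 88 9E), the result is the two code points 0x5F and 0x221E, so the character in question is still part of the identifier regardless of what precedes it.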


The answers point to a definite NO. Some ranges are not allowed, and they will not be any time soon. To move one step further toward total craziness, the best alternative I found was to use allowed characters that look practically the same. (Now, this, I admit, might not be a good idea.) Such lookalikes can be found at http://shapecatcher.com/. The result (sorry if it hurts your eyes):

//  double ∞ = 99999.; // infinity (U+221E): still an error
//  double ⧞ = 99999.; // infinity negated (U+29DE): still an error
    double ꝏ = 99999.; // letter oo (U+A74F)
    double Ꝏ = 99999.; // letter OO (U+A74E)
//  double ⧜ = 99999.; // incomplete infinity (U+29DC): still an error
Other "alternative" dead ringers mentioned in the question that are in the allowed range: ?, ????????.

1 Answer

The clang release notes say (emphasis mine):

This feature allows identifiers to contain certain Unicode characters, as specified by the active language standard;

This is covered in Annex E of the draft C++ standard. The allowed characters are as follows:

E.1 Ranges of characters allowed [charname.allowed]

00A8, 00AA, 00AD, 00AF, 00B2-00B5, 00B7-00BA, 00BC-00BE, 00C0-00D6, 00D8-00F6, 00F8-00FF

0100-167F, 1681-180D, 180F-1FFF

200B-200D, 202A-202E, 203F-2040, 2054, 2060-206F

2070-218F, 2460-24FF, 2776-2793, 2C00-2DFF, 2E80-2FFF

3004-3007, 3021-302F, 3031-303F

3040-D7FF

F900-FD3D, FD40-FDCF, FDF0-FE44, FE47-FFFD

10000-1FFFD, 20000-2FFFD, 30000-3FFFD, 40000-4FFFD, 50000-5FFFD, 60000-6FFFD, 70000-7FFFD, 80000-8FFFD, 90000-9FFFD, A0000-AFFFD, B0000-BFFFD, C0000-CFFFD, D0000-DFFFD, E0000-EFFFD

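The code point for infinity, U+221E, falls in the gap between 2070-218F and 2460-24FF, so it is not included in the list, whereas α (U+03B1) lies inside 0100-167F.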
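
To make the lookup concrete, here is a minimal sketch that encodes the E.1 ranges listed above as a table and tests a code point against it. The names Range, kAllowedBmp, and in_annex_e1 are hypothetical and not part of any compiler. The sketch covers only the extended characters in E.1; it does not model the basic source character set or any rule about which characters may start an identifier.

```cpp
#include <cstdint>

// Hypothetical helper types and names, for illustration only.
// Each entry is an inclusive [first, last] range of code points.
struct Range { std::uint32_t first, last; };

// Annex E.1 ranges below U+10000, transcribed from the list above.
static const Range kAllowedBmp[] = {
    {0x00A8, 0x00A8}, {0x00AA, 0x00AA}, {0x00AD, 0x00AD}, {0x00AF, 0x00AF},
    {0x00B2, 0x00B5}, {0x00B7, 0x00BA}, {0x00BC, 0x00BE}, {0x00C0, 0x00D6},
    {0x00D8, 0x00F6}, {0x00F8, 0x00FF},
    {0x0100, 0x167F}, {0x1681, 0x180D}, {0x180F, 0x1FFF},
    {0x200B, 0x200D}, {0x202A, 0x202E}, {0x203F, 0x2040}, {0x2054, 0x2054},
    {0x2060, 0x206F},
    {0x2070, 0x218F}, {0x2460, 0x24FF}, {0x2776, 0x2793}, {0x2C00, 0x2DFF},
    {0x2E80, 0x2FFF},
    {0x3004, 0x3007}, {0x3021, 0x302F}, {0x3031, 0x303F},
    {0x3040, 0xD7FF},
    {0xF900, 0xFD3D}, {0xFD40, 0xFDCF}, {0xFDF0, 0xFE44}, {0xFE47, 0xFFFD},
};

// Returns true if cp falls inside one of the E.1 ranges.
bool in_annex_e1(std::uint32_t cp) {
    // Supplementary planes 1 through 14: each allows xx0000-xxFFFD.
    std::uint32_t plane = cp >> 16;
    if (plane >= 1 && plane <= 0xE) {
        return (cp & 0xFFFF) <= 0xFFFD;
    }
    // Everything else is checked against the table (linear scan).
    for (const Range& r : kAllowedBmp) {
        if (cp >= r.first && cp <= r.last) return true;
    }
    return false;
}
```

With this table, in_annex_e1(0x03B1) (α) returns true and in_annex_e1(0x221E) (∞) returns false, matching the compiler behaviour reported in the question.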

For reference: these are the codes above converted to unicode characters (some of them may not display correctly in all browsers/available fonts).

¨, ª, (soft hyphen),

¯, ²-µ, ·-º, ¼-¾, À-Ö, Ø-ö, ø-ÿ

ā-?, ?-?, ?-? ?-?, ?-?, ?-?, ?,

?-? ?-?, ①-?, ?-?, ?-?, ?-?

?-〇, 〡-?, ?-?

?-? 豈-?, ?-?,

?-﹄, ?-?

??-??, ??-??, ??-??, ??-??, ??-??, ??-??, ??-??, ??-??, ??-??, ??-??, ??-??, ??-??, ??-??, ??-??

I could not find an extensive document that covers the rationale for the chosen ranges, although N3146: Recommendations for extended identifier characters for C and C++ does provide some details on the influences.

