Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

Can someone explain to me this odd thing:

When in python shell I type the following Cyrillic string:

>>> print 'абвгд'
абвгд

but when I type:

>>> print u'абвгд'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128)

Since the first tring came out correctly, I reckon my OS X terminal can represent unicode, but it turns out it can't in the second case. Why ?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
736 views
Welcome To Ask or Share your Answers For Others

1 Answer

>>> print 'абвгд'
абвгд

When you type in some characters, your terminal decides how these characters are represented to the application. Your terminal might give the characters to the application encoded as utf-8, ISO-8859-5 or even something that only your terminal understands. Python gets these characters as some sequence of bytes. Then python prints out these bytes as they are, and your terminal interprets them in some way to display characters. Since your terminal usually interprets the bytes the same way as it encoded them before, everything is displayed like you typed it in.

>>> u'абвгд'

Here you type in some characters that arrive at the python interpreter as a sequence of bytes, maybe encoded in some way by the terminal. With the u prefix python tries to convert this data to unicode. To do this correctly python has to known what encoding your terminal uses. In your case it looks like Python guesses your terminals encoding would be ASCII, but the received data doesn't match that, so you get an encoding error.

The straight forward way to create unicode strings in an interactive session would therefore be something like this this:

>>> us = 'абвгд'.decode('my-terminal-encoding')

In files you can also specify the encoding of the file with a special mode line:

# -*- encoding: ISO-8859-5 -*-
us = u'абвгд'

For other ways to set the default input encoding you can look at sys.setdefaultencoding(...) or sys.stdin.encoding.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...