Updated: 2017-11-09 17:46 Source: 乐鱼播客
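1. This article skips the complicated theory and teaches, from experience, an eight-character rule for handling text: determine the encoding, and only let the same kind interact.
2. It targets Python 2.7, mainly because Python 3 has already improved encoding handling a great deal and the underlying principles are the same; only the commands need adjusting.
3. After reading it, you should be able to deal with text processing, encodings on special platforms (Windows?), web-crawler encodings, and similar problems with ease.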
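Reading suggestions
This article is divided into the following parts:
• Principles
• Concrete operations
• Recommended habits
• Troubleshooting
If you want to see the habits I recommend, jump straight to "Recommended habits".
If you only want to solve a specific problem, jump straight to "Troubleshooting".
I hope this article helps.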
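I. Principles
To keep things easy to follow, this section uses analogies instead of theory; if you want the theory behind the various encodings, search for it on Baidu.
1. First, why do we run into all kinds of encoding problems?
• Because we have not unified the encoding
• Because we did not use the right command (or pass the right data)
2. Next, what an encoding is. Python's encodings look complicated, but in practice you can think of just two kinds: Unicode and binary
• Unicode should be familiar: it looks like \u0000 (the unicode type in Python 2.7)
• Binary encodings are just as simple: they look like \x00\x00 (the str type in Python 2.7). The utf-8 and cp936 you see every day are both binary encodings
• A binary encoding is concrete: 10001100 can be stored as is. Unicode is abstract and cannot be stored that way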
#coding=utf8
# Unicode demo
print('Unicode:')
print(repr(u'Unicode编码'))
# Binary encoding demo
print(u'Binary encoding:')
print(repr('Unicode编码'))
# This is only to show what the two look like; no need to dig into the code
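To make the split between abstract Unicode and concrete bytes more tangible, here is a minimal sketch. It assumes nothing beyond the standard codecs, and it writes every literal with an explicit u'' or b'' prefix so that it should run unchanged on Python 2.7 and Python 3. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# One abstract Unicode string, two concrete binary encodings.
text = u'测试'                      # two code points: U+6D4B, U+8BD5

utf8_bytes = text.encode('utf-8')  # three bytes per character for this text
gbk_bytes = text.encode('gbk')     # two bytes per character (Python treats cp936 as an alias of gbk)

print(repr(utf8_bytes))
print(repr(gbk_bytes))
print(len(text), len(utf8_bytes), len(gbk_bytes))
```

The same two characters turn into six bytes in one encoding and four in the other, which is why the bytes alone never tell you what the text is.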
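3. Now what to do: only data in the same encoding can be operated on together
• A simple analogy
Think of a string of data as a roast duck. As humans, we look at a roast duck very differently from the way a duck does.
We see tonight's dinner; the duck sees its own uncle.
If I walk through a roast duck restaurant with a duck's eyes, there will be trouble,
because everything I see in the restaurant is my own uncle.
• "The same kind" here means the encodings we already know: utf-8, unicode, ucs-bom
• This is the core of every encoding problem, and it is very important.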
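The roast duck analogy can be sketched in a few lines: the same bytes, read through two different encodings, give two different texts. This is only an illustration; latin-1 is picked as the "wrong" encoding here because it accepts every byte and so never raises.

```python
# -*- coding: utf-8 -*-
data = u'测试'.encode('utf-8')       # the "roast duck": six concrete bytes

as_utf8 = data.decode('utf-8')       # read with the encoding it was written in
as_latin1 = data.decode('latin-1')   # read with the wrong encoding: no error, wrong text

print(as_utf8 == u'测试')
print(repr(as_latin1))

# The damage is reversible as long as the wrong step lost nothing:
# undo it, then decode with the right encoding.
repaired = as_latin1.encode('latin-1').decode('utf-8')
```

The wrong reading does not fail, it silently produces six unrelated characters, which is exactly how garbled text ends up on screen.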
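4. Finally, the Python environment
• Source code itself is decoded as ASCII; if the file contains anything ASCII cannot decode, you have to tell Python how to decode it
• A large number of built-in commands accept Unicode by default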
# The line that tells Python is the one below; delete it and you get an error
#coding=utf8
print(u'测试编码')
II. Concrete operations
Getting hold of content in various encodings needs no explanation. But what if we want to construct it ourselves? See below:
#coding=utf8
# A u prefix constructs a Unicode string
unicodeString = u'Unicode字符串'
# No prefix constructs a string in the source encoding (utf8 here, as set by the first line)
utf8String = 'Utf-8字符串'
# Without the first line, of course, the default encoding is ASCII
So how do we convert between them? That is just as simple:
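# Continuing from the previous snippet
# Convert Unicode to one of the binary encodings: utf8
unicodeString.encode('utf8')
# Convert a binary encoding to Unicode according to its own encoding
utf8String.decode('utf8')
# If something odd has slipped into the binary data, pick a decode error strategy to suit
# (\xff can never appear in valid UTF-8, so the default strict decode would raise here)
print(repr('u8字\xff符串'.decode('utf8', 'replace')))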
So when do problems appear?
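# Continuing from the previous snippet
# Once both are converted to the same encoding, they can be operated on together (e.g. concatenated)
print(repr(unicodeString + utf8String.decode('utf8')))
print(repr(unicodeString.encode('utf8') + utf8String))
# Without the conversion, the world fills up with roast-duck uncles: this line raises UnicodeDecodeError
unicodeString + utf8String
# So we also learn that the program has to be told how to convert between encodings
# Every `decode` call produces Unicode, which suits the many built-in commands that accept Unicode, as mentioned earlier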
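The failure when the two kinds are mixed can be observed without crashing the script. The sketch below is an illustration, not part of the original snippets; it uses explicit u'' and b''-style values so it should behave sensibly on both Python 2.7 and Python 3, which fail in different ways.

```python
# -*- coding: utf-8 -*-
text = u'Unicode字符串'
data = u'Utf-8字符串'.encode('utf-8')

try:
    mixed = text + data          # mixing the two kinds
except (TypeError, UnicodeDecodeError) as exc:
    # Python 3 refuses outright with TypeError; Python 2.7 attempts an
    # implicit ASCII decode of the bytes and raises UnicodeDecodeError.
    mixed = None
    error_name = type(exc).__name__
    print(error_name)

# Converting to the same kind first makes the operation safe.
same_kind = text + data.decode('utf-8')
```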
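So we need to decide which encoding the program uses; this is what we have to tell the program
• On one hand, make sure strings are in the same encoding whenever you operate on them
• On the other, when calling commands you did not write yourself, generally pass Unicode, or use a command that accepts binary data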
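#coding=utf8
import io
# Writing to a file as the example
# Generally use Unicode (io.open encodes it with the encoding you name)
with io.open('Unicode.txt', 'w', encoding='utf8') as f: f.write(u'Unicode测试')
# Or use a command that accepts binary data
with open('Utf8.txt', 'wb') as f: f.write('Utf8测试')
# Try swapping the two strings and you will get an error
# Binary commands let you keep working (writing a file) even when you do not know how to decode the data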
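To check what actually lands on disk, a file can be written in text mode and read back in binary mode. This is a minimal sketch assuming the standard io and tempfile modules; the scratch file name demo.txt is made up for the example.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

text = u'Unicode测试'
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')   # hypothetical scratch file

# Text mode with an explicit encoding: Unicode goes in, bytes land on disk.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Binary mode: read back exactly the bytes that were stored.
with io.open(path, 'rb') as f:
    raw = f.read()

print(repr(raw))
```

Naming the encoding at the point where the file is opened keeps the conversion in one visible place instead of relying on a platform default.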
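III. Recommended habits
By now I have covered everything I understand about encodings.
Why do we run into all kinds of encoding problems?
• Because we have not unified the encoding
• Because we did not use the right command (or pass the right data)
So here is the eight-character rule once more: determine the encoding, and only let the same kind interact
• When you hit a problem, ask yourself: which encoding am I holding right now?
• Only the same encoding can interact, so which encoding should I be holding?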
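Here are my own habits:
• Settle on one internal encoding
• Choose the internal encoding in this order of priority: the encoding the program must use, the encoding a third-party package uses, the encoding you like, Unicode
• Convert to a specific encoding only at output time
Remember to settle the internal encoding before you start writing the program; otherwise a tangle of encodings will produce many unnecessary bugs.
Do not put blind faith in internal Unicode. For Evernote development, for example, the internal encoding should follow the Utf8 used by the third-party package.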
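These habits can be sketched as "convert on the way in, work in one form, convert on the way out". The helper names to_internal and to_output are hypothetical, and Unicode is used as the internal form purely for illustration; as noted above, another internal encoding may suit a given project better.

```python
# -*- coding: utf-8 -*-
def to_internal(data, source_encoding):
    """Convert incoming bytes to the internal form (Unicode in this sketch)."""
    return data.decode(source_encoding)

def to_output(text, target_encoding):
    """Convert the internal form to the encoding the destination expects."""
    return text.encode(target_encoding)

# Two inputs arrive in different encodings...
from_web = to_internal(b'\xe6\xb5\x8b', 'utf-8')   # one character, stored as UTF-8
from_file = to_internal(b'\xca\xd4', 'gbk')        # one character, stored as GBK

# ...are combined while both are in the internal form...
combined = from_web + from_file

# ...and are encoded once, on the way out.
payload = to_output(combined, 'utf-8')
print(repr(payload))
```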
IV. Troubleshooting
Detecting the encoding
We said to determine the encoding, so how do you determine it for a run of binary data?
The simplest way is chardet (it needs to be installed):
python -m pip install chardet
It is very easy to use:
#coding=utf8
from chardet import detect
print(detect('这是一串utf8的测试字符'))
# Result: `{'confidence': 0.99, 'encoding': 'utf-8'}`
Also, when scraping a website, the headers very likely hint at how to decode the content, so remember to check them.
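If the hint arrives as an HTTP Content-Type header, the standard library can pull the charset out of it. This is a sketch under that assumption; charset_from_content_type is a hypothetical helper, and the header values are made-up examples.

```python
from email.message import Message

def charset_from_content_type(header_value, default=None):
    """Return the charset named in a Content-Type header value, or default."""
    msg = Message()
    msg['Content-Type'] = header_value
    # get_content_charset() lower-cases the name and returns None if absent.
    return msg.get_content_charset() or default

print(charset_from_content_type('text/html; charset=GBK'))
print(charset_from_content_type('text/html', default='utf-8'))
```

A header hint, when present, is usually a better first guess than statistical detection, with chardet kept as the fallback.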
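Converting encodings
Quite possibly something odd is mixed into the string, so decoding still fails even though the encoding is right.
I know I covered this earlier, but some readers jump straight to Troubleshooting.
Here you can use the second argument of decode: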
#coding=utf8
# A stray \xff byte (never valid in UTF-8) has slipped into the string
rubbishUtf8String = 'Utf-8字\xff符串'
print(repr(rubbishUtf8String.decode('utf8', 'replace')))
print(repr(rubbishUtf8String.decode('utf8', 'ignore')))
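The three common error strategies can be compared side by side. The sketch below builds damaged input by inserting a byte that cannot occur in valid UTF-8; the variable names are illustrative, and the explicit prefixes should let it run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
good = u'字符'.encode('utf-8')
damaged = good[:3] + b'\xff' + good[3:]   # \xff can never occur in valid UTF-8

try:
    damaged.decode('utf-8')               # the default strategy is 'strict'
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

replaced = damaged.decode('utf-8', 'replace')   # bad byte becomes U+FFFD
ignored = damaged.decode('utf-8', 'ignore')     # bad byte is dropped

print(strict_failed, repr(replaced), repr(ignored))
```

'replace' keeps a visible marker where data was lost, while 'ignore' hides the loss, so 'replace' is often the safer choice when the result will be inspected later.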
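Encodings on special platforms
Many people call Windows a trap, and that holds even under Python 3,
because Chinese file names come out garbled.
Here is a trick: however unusual the platform encoding is, the command line can at least read and create a folder without garbling.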
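import sys, os
# List the sub-folders of the current directory, decoded with the console encoding
for folder in next(os.walk('.'))[1]:
    print(folder.decode(sys.stdin.encoding))
Input and output can be improved in the same way:
import sys
def sys_print(msg):
    # Encode Unicode to the console encoding before printing
    print(msg.encode(sys.stdin.encoding))
def sys_input(msg):
    # Encode the prompt for the console, then decode what the user typed
    return raw_input(msg.encode(sys.stdin.encoding)).decode(sys.stdin.encoding)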
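One caveat with the trick above is that a stream's encoding attribute can be None, for example when output is piped in Python 2. A hedged sketch of a fallback follows; console_encoding and to_console_bytes are hypothetical helpers, and falling back to the locale's preferred encoding and then to utf-8 is an assumption made here, not something the snippets above do.

```python
import codecs
import locale
import sys

def console_encoding(stream=None):
    """Best-effort guess at the encoding a console stream expects."""
    stream = sys.stdout if stream is None else stream
    # .encoding may be missing or None, e.g. when output is redirected.
    name = getattr(stream, 'encoding', None)
    return name or locale.getpreferredencoding(False) or 'utf-8'

def to_console_bytes(text, stream=None):
    """Encode Unicode for the console, replacing characters it cannot show."""
    return text.encode(console_encoding(stream), 'replace')

# Show which codec the guess resolves to on this machine.
print(codecs.lookup(console_encoding()).name)
```

Using 'replace' when encoding means an unprintable character degrades to a placeholder instead of aborting the program.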
Writing files
What if you have scraped some content and do not know how to decode it, but still want to write it to a file?
Just specify binary mode when writing the file:
#coding=utf8
import urllib
with open('Utf8.txt', 'wb') as f: f.write('Utf8测试')
# For example, after fetching a web page you can write it to a file and keep working with it without knowing its encoding
content = urllib.urlopen('http://www.baidu.com').read()
with open('baidu.txt', 'wb') as f: f.write(content)
Bare Unicode characters
What if a Unicode character has been stored as six ASCII characters? That can be handled with decode as well
#coding=utf8
# This is ordinary Unicode
s = u'测'
for i in s: print(i)
print(repr(s))
# This is bare Unicode, actually stored as six ASCII characters
s = repr(s)[2:-1]
for i in s: print(i)
print(repr(s))
# Converting it back is simple too
s = s.decode('unicode-escape')
for i in s: print(i)
print(repr(s))
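The same round trip can be written without going through repr, using the unicode-escape codec in both directions. This is a small sketch with illustrative variable names; it should run on Python 2.7 and Python 3 alike.

```python
# -*- coding: utf-8 -*-
text = u'测'

escaped = text.encode('unicode-escape')      # six ASCII bytes: backslash, u, 6d4b
restored = escaped.decode('unicode-escape')  # back to a single Unicode character

print(repr(escaped))
print(len(escaped), len(restored))
```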
That's all. I hope this article helps you solve your Python encoding problems!