Character Encoding and Transcoding
Detailed articles:
http://www.cnblogs.com/yuanchenqi/articles/5956943.html
http://www.diveintopython3.net/strings.html
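1. In Python 2 the default encoding is ASCII (sys.getdefaultencoding() returns 'ascii'). In Python 3, strings are Unicode and the default encoding is UTF-8.
2. Unicode has several encoding forms: UTF-32 (always 4 bytes per character), UTF-16 (2 bytes for most characters, 4 bytes for characters outside the Basic Multilingual Plane), and UTF-8 (1 to 4 bytes per character). UTF-16 is common as an in-memory representation (for example in Java, JavaScript, and the Windows API), but files are usually stored as UTF-8 because it saves space.
3. In Python 3, encode converts the text to the target encoding and at the same time turns the str into bytes; decode decodes the data and at the same time turns the bytes back into str.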
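The three points above can be checked in a Python 3 interpreter. The following is a minimal sketch. The sample characters and variable names are chosen here for illustration and do not come from the original article.

```python
import sys

# In Python 3 the default encoding reported by the interpreter is UTF-8.
default_encoding = sys.getdefaultencoding()

text = "你好"

# encode() turns str into bytes; decode() turns bytes back into str.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# Sample characters (an illustrative choice): an ASCII letter, a CJK
# character, and an emoji that lies outside the Basic Multilingual Plane.
samples = "a你\U0001F600"

# Bytes needed per character in each encoding form. The "-le" codecs are
# used so that no byte order mark (BOM) is added to the output.
sizes = {
    "utf-8": [len(ch.encode("utf-8")) for ch in samples],
    "utf-16": [len(ch.encode("utf-16-le")) for ch in samples],
    "utf-32": [len(ch.encode("utf-32-le")) for ch in samples],
}

print(default_encoding)
print(type(utf8_bytes), type(round_trip))
print(sizes)
```

The sizes show why UTF-16 cannot be described as a fixed two-byte encoding: the last sample character needs four bytes in it.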
The diagram above applies only to Python 2.
In Python 2:

    # -*- coding: utf-8 -*-
    import sys

    print(sys.getdefaultencoding())  # print the system default encoding

    s = '你好'  # a byte string, encoded as UTF-8 per the declaration above
    s_to_unicode = s.decode('utf-8')  # first decode to unicode
    print(s_to_unicode, type(s_to_unicode))

    s_to_gbk = s_to_unicode.encode('gbk')  # then encode to GBK
    print(s_to_gbk)
    print('你好')

    # convert the GBK bytes back to UTF-8
    gbk_to_utf8 = s_to_gbk.decode('gbk').encode('utf-8')
    print(gbk_to_utf8)
In Python 3, strings are already Unicode and source files default to UTF-8, so there is no need to decode before encoding:

    # -*- coding: gb2312 -*-
    # The line above declares the source file's character set as GB2312.
    # It can also be removed; if it is kept, the file itself must be saved as GB2312.
    __author__ = 'Alex Li'

    import sys

    print(sys.getdefaultencoding())

    msg = "我爱北京天安门"
    # In Python 2 you would first decode to unicode and then encode to GB2312:
    # msg_gb2312 = msg.decode("utf-8").encode("gb2312")
    msg_gb2312 = msg.encode("gb2312")  # in Python 3 str is already Unicode, so no decode is needed

    gb2312_to_unicode = msg_gb2312.decode("gb2312")
    gb2312_to_utf8 = msg_gb2312.decode("gb2312").encode("utf-8")

    print(msg)
    print(msg_gb2312)
    print(gb2312_to_unicode)
    print(gb2312_to_utf8)
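The decode-then-encode pattern used above can be wrapped in a small helper. This is only a sketch for Python 3: the function name transcode and its parameters are hypothetical and are not part of the original article or of any library.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode bytes from the src encoding to the dst encoding.

    Hypothetical helper: it goes through str (Unicode) as the
    intermediate form, since bytes cannot be converted between
    encodings directly.
    """
    return data.decode(src).encode(dst)


msg = "我爱北京天安门"
gb2312_bytes = msg.encode("gb2312")  # 2 bytes per character here
utf8_bytes = transcode(gb2312_bytes, "gb2312", "utf-8")  # 3 bytes per character here

# Decoding with the wrong codec fails instead of silently converting.
try:
    gb2312_bytes.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

print(len(gb2312_bytes), len(utf8_bytes))
print(utf8_bytes.decode("utf-8") == msg)
print(wrong_codec_failed)
```

The helper takes the source encoding explicitly because bytes carry no record of how they were encoded; the caller has to know it.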
Reposted from: https://www.cnblogs.com/sunhao96/p/7601506.html