Why Is Python Encoding Such a Pain?
How many character sets does Python support in total? Here is a list:
{'ascii', 'base64-codec', 'big5', 'big5hkscs', 'bz2-codec', 'charmap', 'cp037', 'cp1006', 'cp1026', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'cp1361', 'cp367', 'cp424', 'cp437', 'cp500', 'cp720', 'cp737', 'cp775', 'cp819', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp858', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875', 'cp932', 'cp936', 'cp949', 'cp950', 'euc-jis-2004', 'euc-jisx0213', 'euc-jp', 'euc-kr', 'gb18030', 'gb2312', 'gbk', 'hex-codec', 'hp-roman8', 'hz', 'idna', 'iso2022-jp', 'iso2022-jp-1', 'iso2022-jp-2', 'iso2022-jp-2004', 'iso2022-jp-3', 'iso2022-jp-ext', 'iso2022-kr', 'iso8859-1', 'iso8859-10', 'iso8859-11', 'iso8859-13', 'iso8859-14', 'iso8859-15', 'iso8859-16', 'iso8859-2', 'iso8859-3', 'iso8859-4', 'iso8859-5', 'iso8859-6', 'iso8859-7', 'iso8859-8', 'iso8859-9', 'johab', 'koi8-r', 'koi8-u', 'latin-1', 'mac-arabic', 'mac-centeuro', 'mac-croatian', 'mac-cyrillic', 'mac-farsi', 'mac-greek', 'mac-iceland', 'mac-latin2', 'mac-roman', 'mac-romanian', 'mac-turkish', 'mbcs', 'palmos', 'ptcp154', 'punycode', 'quopri-codec', 'raw-unicode-escape', 'rot-13', 'shift-jis', 'shift-jis-2004', 'shift-jisx0213', 'tis-620', 'unicode-escape', 'unicode-internal', 'utf-16', 'utf-16-be', 'utf-16-le', 'utf-32', 'utf-32-be', 'utf-32-le', 'utf-7', 'utf-8', 'utf-8-sig', 'uu-codec', 'zlib-codec'}
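A list like the one above can be rebuilt on a current Python 3 interpreter roughly as follows. This is only a sketch, not necessarily how the original list was produced: it assumes the alias table in `encodings.aliases` is a good enough catalogue of codecs, and `list_codec_names` is a hypothetical helper name. The names come back in module form with underscores (`utf_8` rather than `utf-8`), and the exact count depends on the Python version and platform.

```python
import codecs
import encodings.aliases


def list_codec_names():
    """Return the sorted codec names from the alias table that this interpreter can load."""
    # The values of the alias table are the canonical codec module names.
    names = set(encodings.aliases.aliases.values())
    usable = []
    for name in sorted(names):
        try:
            codecs.lookup(name)  # raises LookupError if the codec is unavailable here
        except LookupError:
            continue  # e.g. Windows-only codecs on other platforms
        usable.append(name)
    return usable


codec_names = list_codec_names()
print(len(codec_names), "codecs available")
```

Checking each name with `codecs.lookup` drops entries that exist in the table but cannot be loaded on the current platform.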
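How many of these encodings are compatible with ASCII?
To find out, I encode the printable ASCII characters once with every encoding above and look at the results.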
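s = T.printAscii      # the printable ASCII characters (from the author's helper module T)
r = []; re = []       # r: successful encodings, re: failures
for i in T.gcoding:   # the codec names listed above
    try:
        r.append([i, s.encode(i)])
    except Exception as e:
        re.append([i, e])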
Out[60]: [118 passed, 2 errors, 120 total]
In [61]: re
Out[61]:
[['cp864',
UnicodeEncodeError('charmap',
u' !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~',
5,
6,
'character maps to <undefined>')],
['idna', UnicodeError('label empty or too long')]]
In [62]: [i[0] for i in _]
Out[62]: ['cp864', 'idna']
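The helper module `T` is not shown, so here is a self-contained approximation of the same experiment for Python 3. It is a sketch under assumptions: the codec names come from `encodings.aliases`, and `try_encode_all` is a hypothetical function name. In Python 3, `str.encode` rejects codecs that are not text encodings (such as the base64 or zlib codecs), so the numbers of successes and failures will differ from the session above.

```python
import encodings.aliases

# Printable ASCII: code points 32 (space) through 126 (~), 95 characters in total.
PRINTABLE_ASCII = "".join(chr(i) for i in range(32, 127))


def try_encode_all(text, codec_names):
    """Encode text with every codec; return (successes, failures)."""
    ok, failed = [], []
    for name in codec_names:
        try:
            data = text.encode(name)
        except Exception as exc:  # LookupError, UnicodeError, ...
            failed.append((name, exc))
            continue
        ok.append((name, data))
    return ok, failed


names = sorted(set(encodings.aliases.aliases.values()))
ok, failed = try_encode_all(PRINTABLE_ASCII, names)
print(len(ok), "passed,", len(failed), "errors,", len(names), "total")
for name, exc in failed:
    print(name, type(exc).__name__)
```

Printing the exception type for each failure makes it easy to tell a codec that cannot represent a character apart from one that is not a text encoding at all.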
Printable ASCII values lie in the range 32-126, so next I strip every non-ASCII character from the encoded results.
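import pandas as pd
import matplotlib.pyplot as plt

rs = []
for i in r:
    rs.append([i[0], ''])
    for c in i[1]:
        # keep only characters whose value lies in the printable ASCII range 32-126
        if not 31 < ord(c) < 127:
            continue
        rs[-1][-1] += c
ri = [len(i[1]) for i in rs]   # number of printable ASCII characters left per encoding
data = pd.Series(ri)
data.plot.hist()               # histogram of those counts
plt.show()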
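The filtering step can also be written without `T`, pandas, or matplotlib. The sketch below is for Python 3, where iterating over `bytes` yields integers, so no `ord` call is needed. `printable_bytes` and `is_ascii_compatible` are hypothetical names, and "ASCII compatible" is taken here to mean that the encoded bytes are identical to the ASCII bytes, which is an assumption about what the question intends.

```python
PRINTABLE_ASCII = "".join(chr(i) for i in range(32, 127))


def printable_bytes(data):
    """Keep only the bytes whose value lies in the printable ASCII range 32-126."""
    return bytes(b for b in data if 32 <= b <= 126)


def is_ascii_compatible(name, text=PRINTABLE_ASCII):
    """True if encoding text with codec `name` gives exactly the ASCII bytes."""
    try:
        return text.encode(name) == text.encode("ascii")
    except (LookupError, UnicodeError):
        return False


for name in ("utf_8", "latin_1", "gbk", "utf_16", "cp037"):
    encoded = PRINTABLE_ASCII.encode(name)
    kept = printable_bytes(encoded)
    print(name, len(encoded), len(kept), is_ascii_compatible(name))
```

Comparing the bytes directly is stricter than counting the printable characters that survive: UTF-16 keeps all 95 printable bytes but interleaves them with zero bytes, so it would look fine in a count while not being ASCII compatible.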