python - code snippet notes 2

Posted 2023-02-06, updated 2023-02-27

Python 3 is probably friendlier to Chinese text, but at work I am currently on 2.7.

My current understanding:

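Python represents text internally as Unicode (the unicode type in 2.7, str in Python 3), while a 2.7 str is a byte string in some particular encoding. When converting between encodings, unicode is normally the intermediate form:
first find out the encoding of the original str, decode it to unicode, then encode the result into the target encoding (commonly utf-8).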
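The decode-then-encode route described above can be sketched as follows. This is a minimal sketch, assuming the input happens to be GBK-encoded; the helper name `transcode` and the sample bytes are illustrative and not part of the original notes.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper: re-encode a byte string by going through unicode.

def transcode(data, src_encoding, dst_encoding='utf-8'):
    """Decode bytes from src_encoding to unicode, then encode to dst_encoding."""
    text = data.decode(src_encoding)   # bytes -> unicode; needs the real source encoding
    return text.encode(dst_encoding)   # unicode -> bytes in the target encoding

# Sample data (assumed): the characters U+4E2D U+6587 encoded as GBK.
gbk_bytes = b'\xd6\xd0\xce\xc4'
utf8_bytes = transcode(gbk_bytes, 'gbk')
print(repr(utf8_bytes))
```

If the wrong source encoding is passed, the decode step either raises UnicodeDecodeError or silently produces the wrong characters, which is why the original encoding has to be known first.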

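Python 2.7 assumes ASCII as the default encoding of source files, so a file that contains Chinese needs the declaration below on its first or second line. The declaration only lets the program contain and display Chinese literals at run time; it does not convert any data.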

# -*- coding: UTF-8 -*-

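import re
regex = re.compile(u'[\u4e00-\u9fa5]+')  # one or more characters in U+4E00..U+9FA5; match against unicode, not bytes
if regex.search(i) is None:
    pass  # i (the unicode string being tested) contains no Chinese characters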
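A self-contained version of the check above might look like this. It is only a sketch: the function name `contains_chinese` and the sample strings are made up for illustration. Note that the range U+4E00..U+9FA5 covers common ideographs but not Chinese punctuation such as the fullwidth comma.

```python
# -*- coding: utf-8 -*-
import re

# Same character class as in the snippet above: U+4E00..U+9FA5.
CHINESE_RE = re.compile(u'[\u4e00-\u9fa5]+')

def contains_chinese(text):
    """Return True if the unicode string has at least one character in the range."""
    return CHINESE_RE.search(text) is not None

# Hypothetical samples: plain ASCII, ASCII plus two ideographs, a fullwidth comma.
samples = [u'hello', u'hello \u4e16\u754c', u'\uff0c']
flags = [contains_chinese(s) for s in samples]
print(flags)  # prints [False, True, False]
```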

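# Python 2 only: site.py removes sys.setdefaultencoding at startup, so reload(sys) restores it.
# This changes the codec used for implicit str <-> unicode conversion for the whole process.
import sys
reload(sys)
sys.setdefaultencoding('utf-8')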
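As background for the snippet above: in Python 2.7, mixing a str with a unicode triggers an implicit decode using the default codec, which is ASCII unless it has been changed. The sketch below reproduces that decode explicitly so it behaves the same on 2.7 and 3; the sample bytes and variable names are illustrative assumptions.

```python
# Sample data (assumed): UTF-8 bytes for the characters U+4E2D U+6587.
utf8_bytes = b'\xe4\xb8\xad\xe6\x96\x87'

# Roughly what the implicit conversion does with the default ASCII codec.
try:
    utf8_bytes.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True   # non-ASCII bytes cannot be decoded as ASCII

# Decoding explicitly avoids relying on the process-wide default.
text = utf8_bytes.decode('utf-8')
print(len(text))
```

Decoding explicitly at the point where bytes enter the program is an alternative to changing the default encoding globally.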

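# Encode: unicode -> UTF-8 bytes
def encode_utf8(string):
    return string.encode('utf-8')
# Decode: UTF-8 bytes -> unicode (Python 2: uses the built-in unicode type)
def decode_utf8(string):
    return unicode(string, encoding='utf-8')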
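A quick round trip shows how the two helpers fit together. This sketch is a variant, not the code above: it uses `data.decode('utf-8')` in place of `unicode(...)` so that it also runs on Python 3, where the `unicode` built-in no longer exists. The sample text is an assumption.

```python
# -*- coding: utf-8 -*-

def encode_utf8(text):
    """unicode text -> UTF-8 bytes."""
    return text.encode('utf-8')

def decode_utf8(data):
    """UTF-8 bytes -> unicode text (variant that works on 2.7 and 3)."""
    return data.decode('utf-8')

original = u'\u4e2d\u6587'            # sample text (assumed)
encoded = encode_utf8(original)       # 6 bytes: each character takes 3 bytes in UTF-8
restored = decode_utf8(encoded)
print(len(encoded), restored == original)
```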

# `str` here stands for a variable holding UTF-8 bytes (the name shadows the built-in type)
print(str.strip().decode('utf-8'))
# or
b"str".decode('utf-8')
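The order of strip and decode matters slightly. The sketch below assumes a raw line read as UTF-8 bytes; the variable names and sample bytes are illustrative. Stripping the bytes first removes only ASCII whitespace, while stripping after decoding also removes Unicode whitespace such as the ideographic space U+3000.

```python
# Sample data (assumed): U+3000, then U+4E2D U+6587, then ASCII whitespace.
raw_line = b'\xe3\x80\x80\xe4\xb8\xad\xe6\x96\x87 \r\n'

bytes_first = raw_line.strip().decode('utf-8')   # strip on bytes: ASCII whitespace only
text_first = raw_line.decode('utf-8').strip()    # strip on unicode: also removes U+3000

print(len(bytes_first), len(text_first))
```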