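python3: splitting mixed Chinese-English strings (Chinese split per character, English split per word, digits split at spaces and other special symbols)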

Date: 2022-01-11 19:38:37

Sentence to segment:

s = "12、China's Legend Holdings will split its several business arms to go public on stock markets, the group's president Zhu Linan said on Tuesday.该集团总裁朱利安周二表示,haha中国联想控股将分拆其多个业务部门在股市上市,。"

Segmentation result:

["12", "china", "s", "legend", "holdings", "will", "split", "its", "several", "business", "arms", "to", "go", "public", "on", "stock", "markets", "the", "group", "s", "president", "zhu", "linan", "said", "on", "tuesday", "该", "集", "团", "总", "裁", "朱", "利", "安", "周", "二", "表", "示", "haha", "中", "国", "联", "想", "控", "股", "将", "分", "拆", "其", "多", "个", "业", "务", "部", "门", "在", "股", "市", "上", "市"]

Code:

import re


def get_word_list(s1):
    # Split the sentence into tokens: Chinese per character, English per word,
    # digits at spaces and other separators.
    # Separator = any run of characters that are not letters, digits, or underscore.
    # \W+ is used rather than \W*: a pattern that can match the empty string
    # makes split() cut between every character on Python 3.7 and later.
    reg_ex = re.compile(r"\W+")
    res = re.compile(r"([\u4e00-\u9fa5])")  # [\u4e00-\u9fa5] is the Chinese character range
    p1 = reg_ex.split(s1.lower())
    str1_list = []
    for chunk in p1:
        # The capturing group makes split() return each Chinese character as its own item.
        str1_list.extend(res.split(chunk))
    list_word1 = [w for w in str1_list if len(w.strip()) > 0]  # drop empty strings
    return list_word1


if __name__ == "__main__":
    s = "12、China's Legend Holdings will split its several business arms to go public on stock markets, the group's president Zhu Linan said on Tuesday.该集团总裁朱利安周二表示,haha中国联想控股将分拆其多个业务部门在股市上市。"
    list_word1 = get_word_list(s)
    print(list_word1)
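The same splitting rules can probably also be expressed as a single pattern passed to re.findall, which avoids the second split and the empty-string cleanup. The sketch below is an alternative and not the code from this post; the names TOKEN_RE and tokenize_mixed are made up for illustration, and it assumes that only ASCII letters, digits, and underscores should form words, so accented Latin letters and other scripts are skipped rather than kept.

```python
import re

# One alternation, tried left to right at each position:
#   [\u4e00-\u9fa5]  a single Chinese character (same range as in the post)
#   [a-z0-9_]+       a run of ASCII letters, digits, or underscores
# Anything else (spaces, punctuation, other scripts) matches neither
# branch and is therefore skipped. TOKEN_RE is a hypothetical name.
TOKEN_RE = re.compile(r"[\u4e00-\u9fa5]|[a-z0-9_]+")


def tokenize_mixed(text):
    """Return lower-cased tokens: Chinese per character, ASCII runs per word."""
    return TOKEN_RE.findall(text.lower())


if __name__ == "__main__":
    print(tokenize_mixed("12、China's 总裁said haha中国。"))
    # prints ['12', 'china', 's', '总', '裁', 'said', 'haha', '中', '国']
```

Because the pattern lists what counts as a token instead of what counts as a separator, widening it (for example to cover more CJK blocks) only means adding another branch to the alternation.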

