Fuzzy pinyin-to-Hanzi conversion, mainly intended for use in Chinese ASR language models

mokundong/MHPinYin2Hanzi

Acknowledgements

This project is largely based on Pinyin2Hanzi by @letiantian; see that project for details.

Installation

For installation and usage, refer to Pinyin2Hanzi.

Example

from MHPinyin2Hanzi import DefaultHmmParams
from MHPinyin2Hanzi import viterbi

hmmparams = DefaultHmmParams()

# Ask for the 2 best candidates
pinyin = ("si", "dao", "luan", "le", "ma", "xiong", "hai", "zi")

result = viterbi(hmm_params=hmmparams, observations=pinyin, path_num=2)

print(pinyin)
for item in result:
    print(item.score, item.path)

Output

('si', 'dao', 'luan', 'le', 'ma', 'xiong', 'hai', 'zi')
5.950955778865384e-20 ['是', '捣', '乱', '了', '吗', '熊', '孩', '子']
1.7656844345456464e-20 ['是', '捣', '乱', '了', '嘛', '熊', '孩', '子']

See example for more examples.
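Each candidate path in the output above is a list of single characters. As a small convenience sketch, the helper below joins every path into a sentence and orders the candidates by score. `rank_sentences` and `Candidate` are hypothetical names that are not part of the package; `Candidate` is only a stand-in for the items returned by `viterbi`, assuming nothing beyond the `.score` and `.path` attributes used in the example.

```python
from collections import namedtuple

# Stand-in for the items returned by viterbi(). Only the .score and .path
# attributes used in the example are assumed; this type is not the library's.
Candidate = namedtuple("Candidate", ["score", "path"])


def rank_sentences(results):
    """Join each candidate path into one string, highest score first."""
    ranked = sorted(results, key=lambda item: item.score, reverse=True)
    return [("".join(item.path), item.score) for item in ranked]


# The two candidates from the output above, deliberately given out of order.
sample = [
    Candidate(1.7656844345456464e-20, ["是", "捣", "乱", "了", "嘛", "熊", "孩", "子"]),
    Candidate(5.950955778865384e-20, ["是", "捣", "乱", "了", "吗", "熊", "孩", "子"]),
]

for sentence, score in rank_sentences(sample):
    print(sentence, score)
```

With the real package, the list returned by `viterbi(...)` could presumably be passed to `rank_sentences` directly, since the helper only reads those two attributes.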

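Training

The raw data and training code are in the train directory. The data comes from jpinyin, pinyin, the Sogou corpus (Internet lexicon, 搜狗语料库-互联网词库), and other sources. ChineseTone, a Hanzi-to-pinyin tool, was used to process the data.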

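How it works

This project builds on the approach described in "How to convert between pinyin and Hanzi" (如何实现拼音与汉字的互相转换). On top of that, commonly confused pinyin spellings are treated as additional readings of a character, as if the character were polyphonic (多音字), when the model is trained. This is why the input si in the example above can be decoded as 是 (shi). See processTrain.py under train.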
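The idea of treating confusable pinyin as extra readings can be sketched as follows. This is a minimal illustration and not the code in processTrain.py: the confusion pairs, `fuzzy_variants`, and `expand_readings` are all assumptions made for the example, and the project's actual list of confusable spellings may differ.

```python
# Hypothetical confusion pairs, chosen for illustration only. The list the
# project really uses is defined in train/processTrain.py and may differ.
FUZZY_INITIALS = [("zh", "z"), ("ch", "c"), ("sh", "s")]
FUZZY_FINALS = [("ang", "an"), ("eng", "en"), ("ing", "in")]


def fuzzy_variants(pinyin):
    """Return the syllable plus the spellings it is assumed to be confused with."""
    # Step 1: swap a retroflex initial for its flat counterpart, or the reverse.
    by_initial = {pinyin}
    for retroflex, flat in FUZZY_INITIALS:
        if pinyin.startswith(retroflex):
            by_initial.add(flat + pinyin[len(retroflex):])
        elif pinyin.startswith(flat):
            by_initial.add(retroflex + pinyin[len(flat):])

    # Step 2: for every spelling so far, swap a back nasal final for the
    # front nasal, or the reverse.
    variants = set(by_initial)
    for syllable in by_initial:
        for back, front in FUZZY_FINALS:
            if syllable.endswith(back):
                variants.add(syllable[:-len(back)] + front)
            elif syllable.endswith(front):
                variants.add(syllable[:-len(front)] + back)
    return variants


def expand_readings(readings):
    """Give each character its exact readings plus their fuzzy variants."""
    expanded = {}
    for hanzi, pinyins in readings.items():
        expanded[hanzi] = set()
        for p in pinyins:
            expanded[hanzi] |= fuzzy_variants(p)
    return expanded


readings = {"是": ["shi"], "长": ["chang", "zhang"]}
for hanzi, pinyins in expand_readings(readings).items():
    print(hanzi, sorted(pinyins))
```

Under these assumed pairs, 是 gains the reading si alongside shi, so training data built from the expanded table would let the model emit 是 for either spelling. The sketch does not check that a generated spelling is a valid syllable; a real pipeline would likely filter against a syllable table.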

License

MIT
