Fixing AIML recognition of Chinese input that contains English (letters, special symbols)
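aiml has no trouble recognizing pure English, but it fails when a sentence mixes Chinese with English letters. The main cause is that a space gets inserted between every Chinese character and letter, so the input no longer matches the patterns in the corpus and no answer is found.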
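The mismatch described above can be illustrated with a few lines of plain Python. This is only a sketch: `space_out` is a hypothetical helper that imitates the described space insertion, not the aiml package's actual code, and the pattern string is a made-up example.

```python
def space_out(text):
    """Insert a space between every character (assumed behaviour, for illustration)."""
    return " ".join(text)


pattern = "打开WIFI"      # hypothetical pattern as written in the corpus
user_input = "打开WIFI"   # the user types exactly the same text

spaced = space_out(user_input)

# The spaced input splits into six one-character "words",
# while the pattern is a single word, so a word-by-word match fails.
input_words = spaced.split()
pattern_words = pattern.split()
print(spaced)
print(input_words == pattern_words)
```

Leaving the mixed text untouched on both the pattern side and the input side keeps the two word lists identical.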
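Many write-ups online fix this by rewriting the _check_contain_english method of the Kernel class inside the aiml package. That works, but it is inconvenient: every time the program is deployed somewhere new, the installed package has to be patched again, which is not a wise approach. And as soon as the environment is upgraded, the problem comes back.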
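Since we cannot change other people's code, we change our own: adapting our own program is enough. Create a class that subclasses aiml.Kernel. I override two methods, learn and respond, so that they no longer add spaces. Adapt this to your own situation, of course: my project only uses AIML to look up answers in an intelligent customer-service system, and the conversations never consist entirely of English (semantic recognition, entity extraction and so on are handled by other NLP models). Here is the code: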
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import glob
import os
import sys
import time
import xml.sax

import aiml
from aiml import Utils
from aiml.Kernel import create_parser


class myAiml(aiml.Kernel):
    """aiml.Kernel subclass whose learn/respond do not insert spaces."""

    def __init__(self):
        super(myAiml, self).__init__()

    def learn(self, filename):
        """Load and learn the contents of the specified AIML file.

        If filename includes wildcard characters, all matching files
        will be loaded and learned.
        """
        for f in glob.glob(filename):
            if self._verboseMode:
                print("Loading %s..." % f, end="")
            # time.clock() was removed in Python 3.8; use perf_counter().
            start = time.perf_counter()
            # Load and parse the AIML file.
            parser = create_parser()
            handler = parser.getContentHandler()
            handler.setEncoding(self._textEncoding)
            try:
                parser.parse(f)
            except xml.sax.SAXParseException as msg:
                err = "\nFATAL PARSE ERROR in file %s:\n%s\n" % (f, msg)
                sys.stderr.write(err)
                continue
            # Store the pattern/template pairs in the PatternMgr.
            # Use the extension of the matched file, not of the glob pattern.
            em_ext = os.path.splitext(f)[1]
            for key, tem in handler.categories.items():
                new_key = key
                if key and key[0] and key[1] and key[2] and em_ext == '.aiml':
                    if self._check_contain_english(key[0]):
                        new_key = (key[0].upper(), key[1], key[2])
                    else:
                        # ''.join() on a str returns it unchanged: no spaces added.
                        new_key = (''.join(key[0]), key[1], key[2])
                self._brain.add(new_key, tem)
            # Parsing was successful.
            if self._verboseMode:
                print("done (%.2f seconds)" % (time.perf_counter() - start))

    def respond(self, input_, sessionID=aiml.Kernel._globalSessionID):
        """Return the Kernel's response to the input string."""
        if len(input_) == 0:
            return u""
        # Decode the input (assumed to be an encoded string) into a unicode
        # string. Note that if encoding is False, this will be a no-op.
        try:
            input_ = self._cod.dec(input_)
        except UnicodeError:
            pass
        except AttributeError:
            pass
        # Prevent other threads from stomping all over us.
        self._respondLock.acquire()
        try:
            # Add the session, if it doesn't already exist.
            self._addSession(sessionID)
            # Split the input into discrete sentences.
            sentences = Utils.sentences(input_)
            finalResponse = u""
            for s in sentences:
                if not self._check_contain_english(s):
                    # No-op join: the sentence is passed on without added spaces.
                    s = ''.join(s)
                # Add the input to the history list before fetching the
                # response, so that <input/> tags work properly.
                inputHistory = self.getPredicate(self._inputHistory, sessionID)
                inputHistory.append(s)
                while len(inputHistory) > self._maxHistorySize:
                    inputHistory.pop(0)
                self.setPredicate(self._inputHistory, inputHistory, sessionID)
                # Fetch the response.
                response = self._respond(s, sessionID)
                # Add the data from this exchange to the history lists.
                outputHistory = self.getPredicate(self._outputHistory, sessionID)
                outputHistory.append(response)
                while len(outputHistory) > self._maxHistorySize:
                    outputHistory.pop(0)
                self.setPredicate(self._outputHistory, outputHistory, sessionID)
                # Append this response to the final response.
                finalResponse += (response + u" ")
            finalResponse = finalResponse.strip()
            assert (len(self.getPredicate(self._inputStack, sessionID)) == 0)
            # And return, encoding the string into the I/O encoding.
            return self._cod.enc(finalResponse)
        finally:
            # Release the lock.
            self._respondLock.release()

The essential change in the code above is that the spaces are dropped (highlighted in red in the original post). With that, sentences mixing Chinese and English can be recognized.
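The key-normalization rule used in learn can be tried out without the aiml package. The sketch below is a simplified stand-in, not the library's code: `contains_english` is a hypothetical replacement for `_check_contain_english`, and it assumes that method simply reports whether the text contains an ASCII letter, which the post does not confirm.

```python
import re


def contains_english(text):
    """Hypothetical stand-in: True if the text has at least one ASCII letter."""
    return re.search(r"[A-Za-z]", text) is not None


def normalize_pattern(pattern):
    """Mirror the rule in learn: upper-case mixed text, never add spaces."""
    if contains_english(pattern):
        return pattern.upper()
    return pattern


mixed = normalize_pattern("打开wifi")
chinese_only = normalize_pattern("你好")
print(mixed)
print(chinese_only)
```

Upper-casing only affects the Latin letters, so the Chinese characters in a mixed pattern are stored exactly as written.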
Call it like this:
from .myAiml import myAiml
self.__alice__ = myAiml()  # create the alice bot object
self.__alice__.learn('startup.xml')  # load startup.xml
self.__alice__.respond('這里是目錄')  # load the corpus under the directory
It is called exactly like the stock kernel.