當前位置：首頁 > 编程语言 > python >内容正文

python

【编译原理】让我们来构建一个简单的解释器（Let’s Build A Simple Interpreter. Part 4.）（python/c/c++版）（笔记）

發(fā)布時間：2025/3/20 python 38 豆豆

生活随笔收集整理的這篇文章主要介紹了【编译原理】让我们来构建一个简单的解释器（Let’s Build A Simple Interpreter. Part 4.）（python/c/c++版）（笔记）小編覺得挺不錯的,現(xiàn)在分享給大家,幫大家做個參考.

【編譯原理】讓我們來構建一個簡單的解釋器（Let’s Build A Simple Interpreter. Part 4.）

文章目錄

- python代碼
- C語言代碼
- 總結

在上一篇文章中，您學習了如何解析（識別）和解釋包含任意數(shù)量的加號或減號運算符的算術表達式，例如“7 - 3 + 2 - 1”。您還了解了語法圖以及如何使用它們來指定編程語言的語法。

今天，您將學習如何解析和解釋包含任意數(shù)量的乘法和除法運算符的算術表達式，例如“7 * 4 / 2 * 3”。本文中的除法將是一個整數(shù)除法，所以如果表達式是“9 / 4”，那么答案將是一個整數(shù)：2。

今天我還將討論另一種廣泛使用的用于指定編程語言語法的符號。它被稱為上下文無關文法（context-free grammars）（簡稱為文法（grammars））或BNF（巴科斯-諾爾形式（Backus-Naur Form））。出于本文的目的，我不會使用純BNF表示法，而是更像是一種修改后的EBNF 表示法。

以下是使用文法的幾個原因：

文法以簡潔的方式指定了編程語言的語法。與語法圖不同，語法非常緊湊。在以后的文章中，您會看到我越來越多地使用文法。
文法可以作為很好的文檔。
即使您從頭開始手動編寫解析器，文法也是一個很好的起點。通常，您只需遵循一組簡單的規(guī)則即可將文法轉(zhuǎn)換為代碼。
有一組工具，稱為解析器生成器，它們接受文法作為輸入并根據(jù)該語法自動為您生成解析器。我將在本系列的后面部分討論這些工具。
現(xiàn)在，讓我們談談文法的機械方面，好嗎？

這是一個描述算術表達式的文法，如“7 * 4 / 2 * 3”（它只是該文法可以生成的眾多表達式之一）：

文法由一系列規(guī)則組成，也稱為產(chǎn)生式。我們的文法中有兩條規(guī)則：

規(guī)則由一個非終結符（稱為產(chǎn)生式的頭部或左側）、一個冒號和一系列終結符和/或非終結符（稱為產(chǎn)生式的主體或右側）組成：

在我上面展示的文法中，像MUL、DIV和INTEGER這樣的標記被稱為終端，而像expr和factor這樣的變量被稱為非終端。非終結符通常由一系列終結符和/或非終結符組成：

第一條規(guī)則左側的非終結符稱為起始符號。在我們的文法中，起始符號是expr：

您可以將規(guī)則expr解讀為“一個expr可以是一個因子，可選地后跟乘法或除法運算符，然后是另一個因子，后者又可選地后跟乘法或除法運算符，然后是另一個因子，依此類推。 ”

什么是因子（factor）？就本文而言，因子只是一個整數(shù)。

讓我們快速瀏覽一下語法中使用的符號及其含義。

| - 備擇方案。條形表示“或”。所以 ( MUL | DIV ) 表示MUL或DIV。
( … ) - 左括號和右括號表示對終端和/或非終端進行分組，如 ( MUL | DIV )。
( … ) * - 匹配組內(nèi)的內(nèi)容零次或多次。

如果您過去使用過正則表達式，那么符號| , ()和(…) * 對您來說應該很熟悉。

文法通過解釋語言可以形成的句子來定義語言。這是使用語法推導出算術表達式的方法：首先以起始符號expr開始，然后用該非終結符的規(guī)則體重復替換非終結符，直到生成僅由終結符組成的句子. 這些句子形成語言的語法定義。

如果文法無法導出某個算術表達式，則它不支持該表達式，并且解析器在嘗試識別該表達式時將產(chǎn)生語法錯誤。

我認為有幾個例子是有序的。這是文法推導表達式3 的方式：

這就是文法推導表達式3 * 7 的方式：

現(xiàn)在，讓我們將該文法映射到代碼，好嗎？

以下是我們將用于將文法轉(zhuǎn)換為源代碼的指南。通過遵循它們，您可以從字面上將文法轉(zhuǎn)換為工作解析器：

1、文法中定義的每條規(guī)則R成為同名的方法，對該規(guī)則的引用成為方法調(diào)用：R()。該方法的主體遵循使用完全相同的準則的規(guī)則主體的流程。
2、替代方案(a1 | a2 | aN)成為if - elif - else 語句
3、一個可選的分組**(…) *** 變成一個可以循環(huán)零次或多次的while語句
4、每個標記引用T成為對方法eat的調(diào)用：eat(T)。eat方法的工作方式是，如果token T與當前的lookahead token匹配，那么它將使用token T，然后從lexer獲取一個新的token，并將該token分配給當前的 token內(nèi)部變量。

從視覺上看，指南如下所示：

讓我們開始行動，按照上述指南將我們的語法轉(zhuǎn)換為代碼。

我們的文法中有兩條規(guī)則：一條expr規(guī)則和一條factor規(guī)則。讓我們從因子規(guī)則（生產(chǎn)）開始。根據(jù)指南，您需要創(chuàng)建一個名為factor（指南 1）的方法，該方法只需調(diào)用Eat方法即可使用INTEGER標記（指南 4）：

def factor(self):self.eat(INTEGER)

很容易，不是嗎？

繼續(xù)！

規(guī)則expr變成了expr方法（再次根據(jù)準則 1）。規(guī)則的主體開始與一參考因子變?yōu)橐粋€因子（）方法的調(diào)用。可選的分組(…)*成為一個while循環(huán)，( MUL | DIV )替代品成為一個if-elif-else語句。通過將這些部分組合在一起，我們得到以下expr 方法：

def expr(self):self.factor()while self.current_token.type in (MUL, DIV):token = self.current_tokenif token.type == MUL:self.eat(MUL)self.factor()elif token.type == DIV:self.eat(DIV)self.factor()

請花一些時間研究我如何將語法映射到源代碼。確保你理解那部分，因為它稍后會派上用場。

為方便起見，我將上述代碼放入parser.py文件中，該文件包含一個詞法分析器和一個沒有解釋器的解析器。您可以直接從GitHub下載該文件并使用它。它有一個交互式提示，您可以在其中輸入表達式并查看它們是否有效：也就是說，根據(jù)語法構建的解析器是否可以識別表達式。

我忍不住再次提到語法圖。這是相同expr規(guī)則的語法圖的外觀：

是時候深入研究新算術表達式解釋器的源代碼了。下面是一個計算器的代碼，它可以處理包含整數(shù)和任意數(shù)量的乘法和除法（整數(shù)除法）運算符的有效算術表達式。您還可以看到，我將詞法分析器重構為一個單獨的Lexer類，并更新了Interpreter類以將Lexer實例作為參數(shù)：

python代碼

# -*- coding: utf-8 -*- """ @File : calc4.py @Time : 2021/7/18 20:37 @Author : Dontla @Email : sxana@qq.com @Software: PyCharm """ # Token types # # EOF (end-of-file) token is used to indicate that # there is no more input left for lexical analysis INTEGER, MUL, DIV, EOF = 'INTEGER', 'MUL', 'DIV', 'EOF'class Token(object):def __init__(self, type, value):# token type: INTEGER, MUL, DIV, or EOFself.type = type# token value: non-negative integer value, '*', '/', or Noneself.value = valuedef __str__(self):"""String representation of the class instance.Examples:Token(INTEGER, 3)Token(MUL, '*')"""return 'Token({type}, {value})'.format(type=self.type,value=repr(self.value))def __repr__(self):return self.__str__()class Lexer(object):def __init__(self, text):# client string input, e.g. "3 * 5", "12 / 3 * 4", etcself.text = text# self.pos is an index into self.textself.pos = 0self.current_char = self.text[self.pos]def error(self):raise Exception('Invalid character')def advance(self):"""Advance the `pos` pointer and set the `current_char` variable."""self.pos += 1if self.pos > len(self.text) - 1:self.current_char = None # Indicates end of inputelse:self.current_char = self.text[self.pos]def skip_whitespace(self):while self.current_char is not None and self.current_char.isspace():self.advance()def integer(self):"""Return a (multidigit) integer consumed from the input."""result = ''while self.current_char is not None and self.current_char.isdigit():result += self.current_charself.advance()return int(result)def get_next_token(self):"""Lexical analyzer (also known as scanner or tokenizer)This method is responsible for breaking a sentenceapart into tokens. One token at a time."""while self.current_char is not None:if self.current_char.isspace():self.skip_whitespace()continueif self.current_char.isdigit():return Token(INTEGER, self.integer())if self.current_char == '*':self.advance()return Token(MUL, '*')if self.current_char == '/':self.advance()return Token(DIV, '/')self.error()return Token(EOF, None)class Interpreter(object):def __init__(self, lexer):self.lexer = lexer# set current token to the first token taken from the inputself.current_token = self.lexer.get_next_token()def error(self):raise Exception('Invalid syntax')def eat(self, token_type):# compare the current token type with the passed token# type and if they match then "eat" the current token# and assign the next token to the self.current_token,# otherwise raise an exception.if self.current_token.type == token_type:self.current_token = self.lexer.get_next_token()else:self.error()def factor(self):"""Return an INTEGER token value.factor : INTEGER"""token = self.current_tokenself.eat(INTEGER)return token.valuedef expr(self):"""Arithmetic expression parser / interpreter.expr : factor ((MUL | DIV) factor)*factor : INTEGER"""result = self.factor()while self.current_token.type in (MUL, DIV):token = self.current_tokenif token.type == MUL:self.eat(MUL)result = result * self.factor()elif token.type == DIV:self.eat(DIV)result = result / self.factor()return resultdef main():while True:try:# To run under Python3 replace 'raw_input' call# with 'input'# text = raw_input('calc> ')text = input('calc> ')except EOFError:breakif not text:continuelexer = Lexer(text)interpreter = Interpreter(lexer)result = interpreter.expr()print(result)if __name__ == '__main__':main()

運行結果：

D:\python_virtualenv\my_flask\Scripts\python.exe C:/Users/Administrator/Desktop/編譯原理/python/calc4.py calc> 3 * 4 /2 6.0 calc> 4 / 5 * 4 3.2 calc>

代碼給人看來有點懵逼，為什么要多加一個lexer？不用加不也能實現(xiàn)嗎，為下張加括號做準備？

C語言代碼

#include <stdio.h> #include <stdlib.h> #include <memory.h> #include <string.h> #include<math.h>#define flag_digital 0 #define flag_plus 1 #define flag_minus 2 #define flag_multiply 3 #define flag_divide 4#define flag_EOF 5struct Token {int type;int value; };struct Lexer {char* text;int pos; };struct Interpreter {struct Lexer* lexer;struct Token current_token; };void error() {printf("輸入非法！\n");exit(-1); }void skip_whitespace(struct Lexer* le) {while (le->text[le->pos] == ' ') {le->pos++;} }//判斷Interpreter中當前pos是不是數(shù)字 int is_integer(char c) {if (c >= '0' && c <= '9')return 1;elsereturn 0; }void advance(struct Lexer* le) {le->pos++; }char current_char(struct Lexer* le) {return(le->text[le->pos]); }//獲取數(shù)字token的數(shù)值（把數(shù)字字符數(shù)組轉(zhuǎn)換為數(shù)字） int integer(struct Lexer* le) {char temp[20];int i = 0;while (is_integer(le->text[le->pos])) {temp[i] = le->text[le->pos];i++;advance(le);}int result = 0;int j = 0;int len = i;while (j < len) {result += (temp[j] - '0') * pow(10, len - j - 1);j++;}return result; }void get_next_token(struct Interpreter* pipt) {//先跳空格，再判斷有沒有結束符if (current_char(pipt->lexer) == ' ')skip_whitespace(pipt->lexer);if (pipt->lexer->pos > (strlen(pipt->lexer->text) - 1)) {pipt->current_token = { flag_EOF, NULL };return;}char current = current_char(pipt->lexer);if (is_integer(current)) {pipt->current_token = { flag_digital, integer(pipt->lexer)};return;}if (current == '*') {pipt->current_token = { flag_multiply, NULL };pipt->lexer->pos++;return;}if (current == '/') {pipt->current_token = { flag_divide, NULL };pipt->lexer->pos++;return;}error();//如果都不是以上的字符，則報錯并退出程序 }int eat(struct Interpreter* pipt, int type) {int current_token_value = pipt->current_token.value;if (pipt->current_token.type == type) {get_next_token(pipt);return current_token_value;}else {error();} }int factor(struct Interpreter* pipt) {return eat(pipt, flag_digital); }int expr(struct Interpreter* pipt) {get_next_token(pipt);int result;result = factor(pipt);while (true) {int token_type = pipt->current_token.type;if (token_type == flag_multiply) {eat(pipt, flag_multiply);result = result * factor(pipt);}else if (token_type == flag_divide) {eat(pipt, flag_divide);result = result / factor(pipt);}else {return result;}} }int main() {char text[50];while (1){printf("請輸入算式：\n");//scanf_s("%s", text, sizeof(text));//sanf沒法輸入空格？int i = 0;while ((text[i] = getchar()) != '\n') {//putchar(text[i]);i++;}text[i] = '\0';struct Lexer le = {text, 0};struct Interpreter ipt = { &le };int result = expr(&ipt);printf("= %d\n\n", result);}return 0; }

運行結果：

請輸入算式： 3*5 = 15請輸入算式： 3*5/2 = 7請輸入算式： 3 * 4 /44 = 0請輸入算式：6 * 3 /9 = 2請輸入算式：

都已經(jīng)實現(xiàn)了，為什么還是很懵逼呢？？？

總結

新練習：

編寫一個語法來描述包含任意數(shù)量的 +、-、* 或 / 運算符的算術表達式。使用語法，您應該能夠推導出諸如“2 + 7 * 4”、“7 - 8 / 4”、“14 + 2 * 3 - 6 / 2”等表達式。
使用語法編寫一個解釋器，該解釋器可以計算包含任意數(shù)量的 +、-、* 或 / 運算符的算術表達式。您的解釋器應該能夠處理諸如“2 + 7 * 4”、“7 - 8 / 4”、“14 + 2 * 3 - 6 / 2”等表達式。
如果您完成了上述練習，請放松并享受:)

記住今天文章中的文法，回答以下問題，根據(jù)需要參考下圖：

什么是上下文無關文法（grammar）？
文法有多少規(guī)則/產(chǎn)生式？
什么是終端？（識別圖中所有終端）
什么是非終端？（識別圖中所有非終端）
什么是規(guī)則頭？（識別圖片中的所有頭部/左側）
什么是規(guī)則體？（識別圖片中的所有身體/右側）
-文法的起始符號是什么？

與50位技術專家面對面20年技術見證，附贈技術全景圖

總結

以上是生活随笔為你收集整理的【编译原理】让我们来构建一个简单的解释器（Let’s Build A Simple Interpreter. Part 4.）（python/c/c++版）（笔记）的全部內(nèi)容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網(wǎng)站內(nèi)容還不錯，歡迎將生活随笔推薦給好友。

上一篇：【编译原理】让我们来构建一个简单的解释器
下一篇：【编译原理】让我们来构建一个简单的解释器