Writing a JSON Parser by Hand
Author | 田小波
Source: cnblogs.com/nullllun/p/8358146.html

1. Background

JSON (JavaScript Object Notation) is a lightweight data-interchange format. Compared with XML, another data-interchange format, JSON has several advantages, such as better readability and a smaller footprint. In web development, thanks to JavaScript's solid support for JSON, developers tend to prefer it over XML. It is therefore worth a developer's time to understand JSON in some depth. To explore how JSON works, this article walks through the parsing flow and implementation details of a simple JSON parser. JSON itself is fairly simple, and parsing it is not complicated, so if you are interested, try implementing a JSON parser yourself after reading. With that said, let's move on to the main part.

2. How a JSON Parser Works

A JSON parser is essentially a state machine built from the JSON grammar rules: its input is a JSON string and its output is a JSON object. In general, parsing has two phases, lexical analysis and syntax analysis. The goal of lexical analysis is to split the JSON string into a stream of Tokens according to the lexical rules. Take the following JSON string:

{
    "name" : "小明",
    "age": 18
}
After lexical analysis, we get the following sequence of Tokens:
"{", "name", ":", "小明", ",", "age", ":", "18", "}"
Once lexical analysis has produced the Token sequence, syntax analysis follows. Its purpose is to check, against the JSON grammar, whether the structure formed by that Token sequence is legal. For example, the JSON grammar requires a non-empty JSON object to consist of key-value pairs, of the form object = {string : value}. Suppose a malformed string is passed in, such as:

{
    "name", "小明"
}

During syntax analysis, after the parser has processed the Token name, it accepts it as a valid Token and treats it as a key. It then reads the next Token, expecting it to be ":". But the Token it reads is ",", not the ":" it expected, so the parser reports an error.

To summarize the two phases: lexical analysis turns a string into a sequence of Tokens, and syntax analysis checks whether the JSON structure formed by that sequence is legal. A rough picture of the flow is enough for now; each phase is covered in detail below.

2.1 Lexical Analysis

At the start of this chapter I said that the goal of lexical analysis is to split the JSON string into a Token stream according to the "lexical rules". Note the quoted term: lexical rules are the rules the lexer consults when turning the string into Tokens. In JSON, the lexical rules correspond to a handful of data types. When the lexer reads a word whose type matches one of the data types JSON defines, it considers the word valid and produces the corresponding Token. Following the definition of JSON at http://www.json.org/, the data types are:
BEGIN_OBJECT({)
END_OBJECT(})
BEGIN_ARRAY([)
END_ARRAY(])
NULL(null)
NUMBER(number)
STRING(string)
BOOLEAN(true/false)
SEP_COLON(:)
SEP_COMMA(,)

A word that matches one of these types can be turned into a Token. The types can be represented by an enum, where each code is a distinct bit so that several types can later be combined into one mask:

public enum TokenType {
    BEGIN_OBJECT(1),
    END_OBJECT(2),
    BEGIN_ARRAY(4),
    END_ARRAY(8),
    NULL(16),
    NUMBER(32),
    STRING(64),
    BOOLEAN(128),
    SEP_COLON(256),
    SEP_COMMA(512),
    END_DOCUMENT(1024);

    TokenType(int code) {
        this.code = code;
    }

    private int code;

    public int getTokenCode() {
        return code;
    }
}

During parsing, TokenType alone is not enough. Besides the type of each word, we also need to keep its literal value. So we define a Token class that wraps the type and the literal:

public class Token {
    private TokenType tokenType;
    private String value;
    // less important code omitted
}

With the Token class defined, the next step is a class that reads the input characters:

import java.io.IOException;
import java.io.Reader;

public class CharReader {
    // Assumed value: the field declarations are missing from this listing.
    private static final int BUFFER_SIZE = 1024;

    private Reader reader;
    private char[] buffer;
    private int pos;
    private int size;

    public CharReader(Reader reader) {
        this.reader = reader;
        buffer = new char[BUFFER_SIZE];
    }

    /**
     * Returns the most recently read character (index pos - 1) without advancing.
     * @return the character, or (char) -1 when pos is past the buffered data
     * @throws IOException
     */
    public char peek() throws IOException {
        if (pos - 1 >= size) {
            return (char) -1;
        }
        return buffer[Math.max(0, pos - 1)];
    }

    /**
     * Returns the character at index pos, then advances pos by one.
     * @return the character, or (char) -1 when no input remains
     * @throws IOException
     */
    public char next() throws IOException {
        if (!hasMore()) {
            return (char) -1;
        }
        return buffer[pos++];
    }

    public void back() {
        pos = Math.max(0, --pos);
    }

    public boolean hasMore() throws IOException {
        if (pos < size) {
            return true;
        }
        fillBuffer();
        return pos < size;
    }

    void fillBuffer() throws IOException {
        int n = reader.read(buffer);
        if (n == -1) {
            return;
        }
        pos = 0;
        size = n;
    }
}
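Because peek() in this design returns the character that next() has just consumed, rather than looking ahead, it may help to see that behavior in isolation. The following is a minimal, self-contained sketch. MiniCharReader, readAll, and the buffer size of 4 are assumptions made for illustration (the small buffer forces several refills); this is not the author's class.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class CharReaderDemo {
    /** Hypothetical stand-in for CharReader with a tiny buffer to force refills. */
    static class MiniCharReader {
        private static final int BUFFER_SIZE = 4; // assumed value, small on purpose
        private final Reader reader;
        private final char[] buffer = new char[BUFFER_SIZE];
        private int pos;
        private int size;

        MiniCharReader(Reader reader) {
            this.reader = reader;
        }

        /** Returns the most recently consumed character (index pos - 1). */
        char peek() {
            if (pos - 1 >= size) {
                return (char) -1;
            }
            return buffer[Math.max(0, pos - 1)];
        }

        /** Returns the character at pos and advances. */
        char next() throws IOException {
            if (!hasMore()) {
                return (char) -1;
            }
            return buffer[pos++];
        }

        /** Refills the buffer when it is exhausted; false at end of input. */
        boolean hasMore() throws IOException {
            if (pos < size) {
                return true;
            }
            int n = reader.read(buffer);
            if (n == -1) {
                return false;
            }
            pos = 0;
            size = n;
            return pos < size;
        }
    }

    /** Reads the whole input, checking that peek() echoes each next(). */
    static String readAll(String input) {
        try {
            MiniCharReader r = new MiniCharReader(new StringReader(input));
            StringBuilder sb = new StringBuilder();
            while (r.hasMore()) {
                char c = r.next();
                if (r.peek() != c) {
                    throw new IllegalStateException("peek() did not echo next()");
                }
                sb.append(c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAll("{\"age\": 18}"));
    }
}
```

This is why the lexer below can call next() to consume a character and still ask peek() which character it was.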
With the three helper classes TokenType, Token, and CharReader in place, we can implement the lexer:

public class Tokenizer {
    private CharReader charReader;
    private TokenList tokens;

    public TokenList tokenize(CharReader charReader) throws IOException {
        this.charReader = charReader;
        tokens = new TokenList();
        tokenize();
        return tokens;
    }

    private void tokenize() throws IOException {
        // do-while so that an empty input still yields an END_DOCUMENT Token
        Token token;
        do {
            token = start();
            tokens.add(token);
        } while (token.getTokenType() != TokenType.END_DOCUMENT);
    }

    private Token start() throws IOException {
        char ch;
        // Skip whitespace until the first character of the next word
        for (;;) {
            if (!charReader.hasMore()) {
                return new Token(TokenType.END_DOCUMENT, null);
            }
            ch = charReader.next();
            if (!isWhiteSpace(ch)) {
                break;
            }
        }
        // Dispatch on the first character of the word
        switch (ch) {
            case '{':
                return new Token(TokenType.BEGIN_OBJECT, String.valueOf(ch));
            case '}':
                return new Token(TokenType.END_OBJECT, String.valueOf(ch));
            case '[':
                return new Token(TokenType.BEGIN_ARRAY, String.valueOf(ch));
            case ']':
                return new Token(TokenType.END_ARRAY, String.valueOf(ch));
            case ',':
                return new Token(TokenType.SEP_COMMA, String.valueOf(ch));
            case ':':
                return new Token(TokenType.SEP_COLON, String.valueOf(ch));
            case 'n':
                return readNull();
            case 't':
            case 'f':
                return readBoolean();
            case '"':
                return readString();
            case '-':
                return readNumber();
        }
        if (isDigit(ch)) {
            return readNumber();
        }
        throw new JsonParseException("Illegal character");
    }

    private Token readNull() {...}
    private Token readBoolean() {...}
    private Token readString() {...}
    private Token readNumber() {...}
}
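The listing above elides readNumber, and its body is not reproduced here. As a rough illustration of what a number scanner has to accept, here is a minimal sketch based only on the number grammar from json.org. The class NumberScanDemo and the methods scanNumber and digits are hypothetical names, and the sketch works on a String and a start index instead of the article's CharReader.

```java
class NumberScanDemo {
    /**
     * Scans a JSON number that starts at index start and returns its text.
     * Grammar: -? (0 | [1-9][0-9]*) ('.' [0-9]+)? ([eE] [+-]? [0-9]+)?
     */
    static String scanNumber(String s, int start) {
        int i = start;
        if (i < s.length() && s.charAt(i) == '-') {
            i++;
        }
        if (i < s.length() && s.charAt(i) == '0') {
            i++; // a leading zero stands alone as the integer part
        } else {
            i = digits(s, i, "integer part");
        }
        if (i < s.length() && s.charAt(i) == '.') {
            i = digits(s, i + 1, "fraction");
        }
        if (i < s.length() && (s.charAt(i) == 'e' || s.charAt(i) == 'E')) {
            i++;
            if (i < s.length() && (s.charAt(i) == '+' || s.charAt(i) == '-')) {
                i++;
            }
            i = digits(s, i, "exponent");
        }
        return s.substring(start, i);
    }

    /** Consumes one or more ASCII digits from index i and returns the index after them. */
    private static int digits(String s, int i, String what) {
        int begin = i;
        while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
            i++;
        }
        if (i == begin) {
            throw new IllegalArgumentException("Expected digits in " + what);
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(scanNumber("-12.5e3,", 0));
        System.out.println(scanNumber("18}", 0));
    }
}
```

The scanner stops at the first character that cannot extend the number, which matches how the lexer hands control back to start() for the next word.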
The code above is the lexer; some parts are left out here and shown later where they are discussed. Start with its core method, start. It is short and not complicated: it reads characters in a loop until it finds a non-whitespace character, then runs different parsing logic depending on that character. As mentioned earlier, parsing JSON is fairly simple, because the first character of each word is enough to determine its Token type. For example:

- If the first character is {, }, [, ], "," or ":", wrap it in the corresponding Token and return it.
- If the first character is n, the word is expected to be null, and the Token type is NULL.
- If the first character is t or f, the word is expected to be true or false, and the Token type is BOOLEAN.
- If the first character is ", the word is expected to be a string, and the Token type is STRING.
- If the first character is 0~9 or -, the word is expected to be a number, and the Token type is NUMBER.

So from the first character alone the lexer knows what it expects to read next. If the expectation is met, it returns a Token; otherwise it reports an error. Let's look at how the lexer handles a first character of n and of ". First, the character n:

private Token readNull() throws IOException {
    if (!(charReader.next() == 'u' && charReader.next() == 'l' && charReader.next() == 'l')) {
        throw new JsonParseException("Invalid json string");
    }
    return new Token(TokenType.NULL, "null");
}

After reading the character n, the lexer expects the next three characters to be u, l, l, which together with n form the word null. If they are, it returns a Token of type NULL; otherwise it throws an exception. Next, the handling of string data:

private Token readString() throws IOException {
    StringBuilder sb = new StringBuilder();
    for (;;) {
        char ch = charReader.next();
        // Handle escape sequences
        if (ch == '\\') {
            if (!isEscape()) {
                throw new JsonParseException("Invalid escape character");
            }
            sb.append('\\');
            // isEscape() consumed the escape character; peek() returns it again
            ch = charReader.peek();
            sb.append(ch);
            // Handle Unicode escapes such as \u4e2d; only \u0000 ~ \uFFFF is supported
            if (ch == 'u') {
                for (int i = 0; i < 4; i++) {
                    ch = charReader.next();
                    if (isHex(ch)) {
                        sb.append(ch);
                    } else {
                        throw new JsonParseException("Invalid character");
                    }
                }
            }
        } else if (ch == '"') {    // a second double quote ends the string; return the Token
            return new Token(TokenType.STRING, sb.toString());
        } else if (ch == '\r' || ch == '\n') {    // raw line breaks are not allowed inside a JSON string
            throw new JsonParseException("Invalid character");
        } else if (ch == (char) -1) {    // next() signals end of input; avoid looping forever
            throw new JsonParseException("Unterminated string");
        } else {
            sb.append(ch);
        }
    }
}

private boolean isEscape() throws IOException {
    char ch = charReader.next();
    // '/' is included because the escape list below allows it
    return (ch == '"' || ch == '\\' || ch == '/' || ch == 'u' || ch == 'r'
                || ch == 'n' || ch == 'b' || ch == 't' || ch == 'f');
}

private boolean isHex(char ch) {
    return ((ch >= '0' && ch <= '9') || ('a' <= ch && ch <= 'f')
            || ('A' <= ch && ch <= 'F'));
}

Parsing string data is slightly more involved, mainly because certain special characters have to be handled. Note that readString keeps each escape sequence in the Token value as written (the backslash is appended unchanged). JSON allows the following escape sequences:
\"
\\
\b
\f
\n
\r
\t
\u four-hex-digits
\/
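Since readString stores escape sequences verbatim, some later step has to decode them if the caller wants the actual characters. The following is a minimal sketch of such a step for the escapes listed above. The class UnescapeDemo and the method unescape are hypothetical names, not part of the author's code, and surrogate pairs are simply passed through as two separate code units.

```java
class UnescapeDemo {
    /** Decodes JSON escape sequences; raw is the text between the quotes. */
    static String unescape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char ch = raw.charAt(i);
            if (ch != '\\') {
                sb.append(ch);
                continue;
            }
            if (i + 1 >= raw.length()) {
                throw new IllegalArgumentException("Dangling backslash");
            }
            char esc = raw.charAt(++i);
            switch (esc) {
                case '"':  sb.append('"');  break;
                case '\\': sb.append('\\'); break;
                case '/':  sb.append('/');  break;
                case 'b':  sb.append('\b'); break;
                case 'f':  sb.append('\f'); break;
                case 'n':  sb.append('\n'); break;
                case 'r':  sb.append('\r'); break;
                case 't':  sb.append('\t'); break;
                case 'u':
                    // Exactly four hex digits must follow the 'u'
                    if (i + 4 >= raw.length()) {
                        throw new IllegalArgumentException("Truncated unicode escape");
                    }
                    int code = 0;
                    for (int k = 1; k <= 4; k++) {
                        int d = Character.digit(raw.charAt(i + k), 16);
                        if (d < 0) {
                            throw new IllegalArgumentException("Invalid hex digit");
                        }
                        code = code * 16 + d;
                    }
                    sb.append((char) code);
                    i += 4;
                    break;
                default:
                    throw new IllegalArgumentException("Invalid escape: " + esc);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Java literal below holds backslash-n and a backslash-u escape as raw text
        System.out.println(unescape("line1\\nname: \\u5c0f\\u660e"));
    }
}
```

Keeping decoding separate from tokenizing is one possible design; decoding inside the lexer while the characters are read would work just as well.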
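
2.2 Syntax Analysis

The syntax analyzer takes the Token sequence as input and implements the following grammar:

object = {} | { members }
members = pair | pair , members
pair = string : value
array = [] | [ elements ]
elements = value | value , elements
value = string | number | object | array | true | false | null

The syntax analyzer relies on two helper classes, which are also its output types: JsonObject and JsonArray. The code is as follows:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JsonObject {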
    private Map<String, Object> map = new HashMap<String, Object>();

    public void put(String key, Object value) {
        map.put(key, value);
    }

    public Object get(String key) {
        return map.get(key);
    }

    public List<Map.Entry<String, Object>> getAllKeyValue() {
        return new ArrayList<>(map.entrySet());
    }

    public JsonObject getJsonObject(String key) {
        if (!map.containsKey(key)) {
            throw new IllegalArgumentException("Invalid key");
        }
        Object obj = map.get(key);
        if (!(obj instanceof JsonObject)) {
            throw new JsonTypeException("Type of value is not JsonObject");
        }
        return (JsonObject) obj;
    }

    public JsonArray getJsonArray(String key) {
        if (!map.containsKey(key)) {
            throw new IllegalArgumentException("Invalid key");
        }
        Object obj = map.get(key);
        if (!(obj instanceof JsonArray)) {
            throw new JsonTypeException("Type of value is not JsonArray");
        }
        return (JsonArray) obj;
    }

    @Override
    public String toString() {
        return BeautifyJsonUtils.beautify(this);
    }
}

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class JsonArray implements Iterable<Object> {
    private List<Object> list = new ArrayList<Object>();

    public void add(Object obj) {
        list.add(obj);
    }

    public Object get(int index) {
        return list.get(index);
    }

    public int size() {
        return list.size();
    }

    public JsonObject getJsonObject(int index) {
        Object obj = list.get(index);
        if (!(obj instanceof JsonObject)) {
            throw new JsonTypeException("Type of value is not JsonObject");
        }
        return (JsonObject) obj;
    }

    public JsonArray getJsonArray(int index) {
        Object obj = list.get(index);
        if (!(obj instanceof JsonArray)) {
            throw new JsonTypeException("Type of value is not JsonArray");
        }
        return (JsonArray) obj;
    }

    @Override
    public String toString() {
        return BeautifyJsonUtils.beautify(this);
    }

    @Override
    public Iterator<Object> iterator() {
        return list.iterator();
    }
}

The core logic of the syntax analyzer lives in two methods, parseJsonObject and parseJsonArray. I will go through parseJsonObject in detail and leave parseJsonArray for you to work through. parseJsonObject is implemented as follows:

private JsonObject parseJsonObject() {
    JsonObject jsonObject = new JsonObject();
    // The *_TOKEN constants are int bit flags that are tested against TokenType codes
    int expectToken = STRING_TOKEN | END_OBJECT_TOKEN;
    String key = null;
    Object value = null;
    while (tokens.hasMore()) {
        Token token = tokens.next();
        TokenType tokenType = token.getTokenType();
        String tokenValue = token.getValue();
        switch (tokenType) {
        case BEGIN_OBJECT:
            checkExpectToken(tokenType, expectToken);
            jsonObject.put(key, parseJsonObject());    // parse the nested json object recursively
            expectToken = SEP_COMMA_TOKEN | END_OBJECT_TOKEN;
            break;
        case END_OBJECT:
            checkExpectToken(tokenType, expectToken);
            return jsonObject;
        case BEGIN_ARRAY:    // parse a json array
            checkExpectToken(tokenType, expectToken);
            jsonObject.put(key, parseJsonArray());
            expectToken = SEP_COMMA_TOKEN | END_OBJECT_TOKEN;
            break;
        case NULL:
            checkExpectToken(tokenType, expectToken);
            jsonObject.put(key, null);
            expectToken = SEP_COMMA_TOKEN | END_OBJECT_TOKEN;
            break;
        case NUMBER:
            checkExpectToken(tokenType, expectToken);
            if (tokenValue.contains(".") || tokenValue.contains("e") || tokenValue.contains("E")) {
                jsonObject.put(key, Double.valueOf(tokenValue));
            } else {
                Long num = Long.valueOf(tokenValue);
                if (num > Integer.MAX_VALUE || num < Integer.MIN_VALUE) {
                    jsonObject.put(key, num);
                } else {
                    jsonObject.put(key, num.intValue());
                }
            }
            expectToken = SEP_COMMA_TOKEN | END_OBJECT_TOKEN;
            break;
        case BOOLEAN:
            checkExpectToken(tokenType, expectToken);
            jsonObject.put(key, Boolean.valueOf(token.getValue()));
            expectToken = SEP_COMMA_TOKEN | END_OBJECT_TOKEN;
            break;
        case STRING:
            checkExpectToken(tokenType, expectToken);
            Token preToken = tokens.peekPrevious();
            /*
             * In JSON, a string can be either a key or a value.
             * As a key, the only Token expected next is SEP_COLON.
             * As a value, the Token expected next is SEP_COMMA or END_OBJECT.
             */
            if (preToken.getTokenType() == TokenType.SEP_COLON) {
                value = token.getValue();
                jsonObject.put(key, value);
                expectToken = SEP_COMMA_TOKEN | END_OBJECT_TOKEN;
            } else {
                key = token.getValue();
                expectToken = SEP_COLON_TOKEN;
            }
            break;
        case SEP_COLON:
            checkExpectToken(tokenType, expectToken);
            expectToken = NULL_TOKEN | NUMBER_TOKEN | BOOLEAN_TOKEN | STRING_TOKEN
                    | BEGIN_OBJECT_TOKEN | BEGIN_ARRAY_TOKEN;
            break;
        case SEP_COMMA:
            checkExpectToken(tokenType, expectToken);
            expectToken = STRING_TOKEN;
            break;
        case END_DOCUMENT:
            checkExpectToken(tokenType, expectToken);
            return jsonObject;
        default:
            throw new JsonParseException("Unexpected Token.");
        }
    }
    throw new JsonParseException("Parse error, invalid Token.");
}

private void checkExpectToken(TokenType tokenType, int expectToken) {
    // The Token is legal only if its bit is set in the expectation mask
    if ((tokenType.getTokenCode() & expectToken) == 0) {
        throw new JsonParseException("Parse error, invalid Token.");
    }
}
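The NUMBER branch above chooses between Double, Long, and Integer based on the text of the Token. Pulled out on its own, the decision can be sketched as follows. NumberNarrowDemo and toNumber are hypothetical names used only for this illustration; the logic is meant to mirror the branch above rather than extend it.

```java
class NumberNarrowDemo {
    /** Converts a JSON number lexeme to a Double, an Integer, or a Long. */
    static Object toNumber(String text) {
        // A fraction or an exponent means the value is treated as floating point
        if (text.contains(".") || text.contains("e") || text.contains("E")) {
            return Double.valueOf(text);
        }
        long num = Long.parseLong(text);
        if (num > Integer.MAX_VALUE || num < Integer.MIN_VALUE) {
            return num;        // autoboxed as Long
        }
        return (int) num;      // fits in 32 bits, autoboxed as Integer
    }

    public static void main(String[] args) {
        System.out.println(toNumber("18").getClass().getSimpleName());
        System.out.println(toNumber("2147483648").getClass().getSimpleName());
        System.out.println(toNumber("1e3").getClass().getSimpleName());
    }
}
```

One limit of this approach is that an integer outside the range of long makes Long.parseLong throw a NumberFormatException; a parser that needs such values could fall back to java.math.BigInteger.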
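The parsing flow of parseJsonObject is roughly as follows:

1. Read a Token and check whether it is of an expected type.
2. If it is, update the expected Token types. Otherwise, throw an exception and exit.
3. Repeat steps 1 and 2 until all Tokens have been parsed or an exception occurs.

The steps are not complicated, but they may be hard to picture, so here is an example. Take the following Token sequence:
"{", "id", ":", "1", "}"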
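The three steps can be sketched on this sequence as a small state machine over the bit-flag codes. This is a simplified illustration, not the author's parser: ExpectMaskDemo, isExpected, and accepts are hypothetical names, the sketch handles only flat objects with string or number values, and it reuses the TokenType codes from section 2.1.

```java
class ExpectMaskDemo {
    // Bit flags with the same values as the TokenType codes in the article
    static final int BEGIN_OBJECT = 1;
    static final int END_OBJECT = 2;
    static final int NUMBER = 32;
    static final int STRING = 64;
    static final int SEP_COLON = 256;
    static final int SEP_COMMA = 512;

    /** Returns true when the token's bit is set in the expectation mask. */
    static boolean isExpected(int tokenCode, int expectMask) {
        return (tokenCode & expectMask) != 0;
    }

    /** Checks a flat object's token codes, updating the mask after each token. */
    static boolean accepts(int[] codes) {
        if (codes.length == 0 || codes[0] != BEGIN_OBJECT) {
            return false;
        }
        int expect = STRING | END_OBJECT;
        int previous = BEGIN_OBJECT;
        for (int i = 1; i < codes.length; i++) {
            int code = codes[i];
            if (!isExpected(code, expect)) {
                return false;                       // step 2: unexpected token
            }
            switch (code) {
                case STRING:
                    // A string right after a colon is a value, otherwise a key
                    expect = (previous == SEP_COLON) ? (SEP_COMMA | END_OBJECT) : SEP_COLON;
                    break;
                case SEP_COLON:
                    expect = NUMBER | STRING;
                    break;
                case NUMBER:
                    expect = SEP_COMMA | END_OBJECT;
                    break;
                case SEP_COMMA:
                    expect = STRING;
                    break;
                case END_OBJECT:
                    return i == codes.length - 1;   // the brace must be the last token
                default:
                    return false;
            }
            previous = code;
        }
        return false; // ran out of tokens before the closing brace
    }

    public static void main(String[] args) {
        int[] idObject = {BEGIN_OBJECT, STRING, SEP_COLON, NUMBER, END_OBJECT};
        int[] commaInsteadOfColon = {BEGIN_OBJECT, STRING, SEP_COMMA, STRING, END_OBJECT};
        System.out.println(accepts(idObject));
        System.out.println(accepts(commaInsteadOfColon));
    }
}
```

Using one bit per Token type means a set of acceptable types is a single int, and the membership test is a single bitwise AND.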
After parseJsonObject has consumed the { Token, it expects either a STRING Token or an END_OBJECT Token. It reads a new Token, finds that it is of type STRING, and the expectation is met. parseJsonObject then updates the expected Token type to SEP_COLON, that is, ":". The loop continues this way until the Token sequence is exhausted or an exception is thrown.

The flow is not very complicated, but a few details need attention in the implementation:

- In JSON, a string can be either a key or a value. As a key, the parser expects the next Token to be SEP_COLON. As a value, it expects SEP_COMMA or END_OBJECT. So the parser has to decide which role the string plays, and the test is simple: look at the type of the previous Token. If the previous Token is SEP_COLON, the string can only be a value; otherwise it can only be a key.
- For integer Tokens, the simplest approach is to parse every integer as a Long. To save space, though, integers within [Integer.MIN_VALUE, Integer.MAX_VALUE] are better parsed as Integer, so the parser needs to handle that as well.

3. Testing and Results

To check the code, I ran some simple tests. The test data comes from NetEase Music (网易音乐) and is about 45,000 characters long. To avoid downloading the data on every run, and to avoid tests failing because the data changed, I saved one download to the file music.json, and every test reads from that file. I won't paste the test code or screenshots here; if you are interested, download the source and try it yourself.

Next, the JSON beautifying output. I mocked some data for this: the hero info of 狄仁杰 from 王者荣耀 (yes, I play this hero a lot).

[Figure: beautified JSON output for the mock hero data]

I won't go through the JSON beautifying code either. It is not the focus of this article, just an Easter egg.

4. Closing Words

That is about it for this article. The code is on GitHub; download it if you need it: https://github.com/code4wt/JSONParser
Note that the code accompanying this article implements a fairly crude JSON parser whose purpose is to explore how JSON parsing works. JSONParser is only a practice project: the code is not elegant and lacks thorough tests. Also, given my own limits (my compiler-theory background is close to nil), I cannot guarantee that the article and the code are free of errors. If you find mistakes or poorly written parts while reading the code, please point them out and I will fix them. If these mistakes have caused you trouble, my apologies in advance. Finally, this article and its implementation draw mainly on two articles, 一起写一个JSON解析器 and 如何编写一个JSON解析器, and on their accompanying code. My thanks to the authors of both posts.