
Java - Java I/O Streams Explained: Character-Based I/O and Character Streams

Published: 2025/3/21 java 21 豆豆


Overview
Abstract superclass Reader and Writer
File I/O Character-Streams - FileReader & FileWriter
Buffered I/O Character-Streams - BufferedReader & BufferedWriter
Character Set (or Charset) - Package java.nio.charset (JDK 1.4)
Text File I/O - InputStreamReader and OutputStreamWriter
Code

Overview

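Internally, Java stores characters (type char) in the 16-bit UCS-2 character set (strictly speaking, a Java String is UTF-16, which coincides with UCS-2 for every character in the Basic Multilingual Plane). External data sources and sinks, however, may store characters in other character sets (e.g., US-ASCII, ISO-8859-x, UTF-8, UTF-16 and many others), either in a fixed length of 8 or 16 bits or in a variable length of 1 to 4 bytes. [Read "Character Sets and Encoding Schemes".]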

Java must therefore distinguish between byte-based I/O, which processes raw 8-bit bytes, and character-based I/O, which processes text.

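A character stream must translate between the charset used by the external I/O device and Java's internal UCS-2 format. For example, the character "您" is stored as "60 A8" in UCS-2 (Java internal), as "E6 82 A8" in UTF-8, as "C4 FA" in GBK/GB2312, and as "B1 7A" in BIG5. If this character is written to a file using UTF-8, the character stream must translate "60 A8" into "E6 82 A8". The reverse translation takes place on a read operation.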
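The byte sequences quoted above can be checked directly. The following is a minimal sketch that uses `String.getBytes(Charset)` rather than a stream class, purely to show what a character stream's encoder produces; the class `EncodingBytesDemo` and its helpers `hex` and `encodeHex` are hypothetical names. GBK and Big5 are extended charsets that ship with standard JDK builds but are not guaranteed on every runtime, so `main` may throw `UnsupportedCharsetException` on a stripped-down JRE.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytesDemo {

    // Format bytes as space-separated uppercase hex, e.g. "E6 82 A8"
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encode text with the named charset and return the bytes as hex
    static String encodeHex(String text, String charsetName) {
        return hex(text.getBytes(Charset.forName(charsetName)));
    }

    public static void main(String[] args) {
        String nin = "您";
        // The 16-bit value Java holds internally for this char
        System.out.printf("UCS-2: %04X%n", (int) nin.charAt(0));
        System.out.println("UTF-8: " + encodeHex(nin, "UTF-8"));
        System.out.println("GBK:   " + encodeHex(nin, "GBK"));
        System.out.println("Big5:  " + encodeHex(nin, "Big5"));

        // Decoding with the same charset reverses the translation
        byte[] utf8 = nin.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(nin));
    }
}
```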

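"Byte" and "character" streams refer to the unit of operation inside the Java program, which need not correspond to the amount of data transferred from the external I/O device. This is because some charsets use a fixed length of 8 bits (e.g., US-ASCII, ISO-8859-1) or 16 bits (e.g., UCS-2), while others use a variable length of 1 to 4 bytes (e.g., UTF-8, UTF-16, UTF-16BE, UTF-16LE, GBK, BIG5).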
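To make the distinction between program units and transferred bytes concrete, here is a small sketch comparing the number of chars in a string with the number of bytes it occupies in two encodings. The class `UnitCountDemo` and its methods are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

public class UnitCountDemo {

    // Bytes needed when the text is encoded as UTF-8 (1 to 4 bytes per character)
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed as UTF-16BE without a byte-order mark (2 bytes per BMP character)
    static int utf16beBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String[] samples = { "Hi", "您好", "Hi,您好!" };
        for (String s : samples) {
            // length() counts 16-bit chars, the unit a character stream works in
            System.out.printf("%s: chars=%d, UTF-8 bytes=%d, UTF-16BE bytes=%d%n",
                    s, s.length(), utf8Bytes(s), utf16beBytes(s));
        }
    }
}
```

For "Hi,您好!" the program sees 6 chars, while a UTF-8 file holding the same text is 10 bytes long and a UTF-16BE file is 12 bytes long.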

When a character stream reads an 8-bit ASCII file, 8 bits of data are read from the file and placed into a 16-bit char in the Java program.

Abstract superclass Reader and Writer

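Apart from the unit of operation and the charset conversion (which is quite involved), character-based I/O is almost identical to byte-based I/O. Instead of InputStream and OutputStream, we use Reader and Writer for character-based I/O.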

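The abstract superclass Reader operates on char. Its method read() reads one character from the input source. It returns the character as an int between 0 and 65535 (a char in Java can be treated as an unsigned 16-bit integer), or -1 if the end of the stream is detected.

There are also two variants of read() that read a block of characters into a char array. The three-argument form is the one declared abstract, so every concrete Reader subclass must implement it; the single-character read() is built on top of it.

public int read() throws IOException
public abstract int read(char[] chars, int offset, int length) throws IOException
public int read(char[] chars) throws IOException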
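The read() contract described above can be exercised without touching the disk. This is a minimal sketch that assumes a StringReader as a stand-in input source; the class `ReaderReadDemo` and its two helper methods are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReaderReadDemo {

    // One char at a time: read() returns 0..65535, or -1 at end of stream
    static String readCharByChar(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int ch;
        while ((ch = reader.read()) != -1) {
            sb.append((char) ch); // narrow the int back to a 16-bit char
        }
        return sb.toString();
    }

    // Block reads: read(char[]) returns how many chars were stored, or -1 at end
    static String readInBlocks(Reader reader, int bufferSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[bufferSize];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n); // only the first n chars are valid
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        try (Reader r = new StringReader("Hi,您好!")) {
            System.out.println(readCharByChar(r));
        }
        try (Reader r = new StringReader("Hi,您好!")) {
            System.out.println(readInBlocks(r, 4));
        }
    }
}
```

Appending only `n` chars per block matters: the last block is usually shorter than the buffer, and the remaining slots still hold chars from the previous read.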

File I/O Character-Streams - FileReader & FileWriter

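FileReader and FileWriter are concrete implementations of the abstract superclasses Reader and Writer that support I/O on disk files. Their classic constructors assume that the disk file uses the default character encoding (charset); since Java 11, both classes also offer constructors that accept an explicit Charset.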

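The default charset is kept in the JVM system property "file.encoding". You can obtain it with the static method java.nio.charset.Charset.defaultCharset() or with System.getProperty("file.encoding"). Since JDK 18 (JEP 400), UTF-8 is the default charset on all platforms unless it is explicitly overridden.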

If the default charset is ASCII-compatible (e.g., US-ASCII, ISO-8859-x, UTF-8 and many others, but not UTF-16, UTF-16BE, UTF-16LE, etc.), FileReader/FileWriter can be used safely for ASCII text.

FileReader/FileWriter with the default charset are not recommended when you have no control over the file's encoding.
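When the charset cannot be left to the platform default, one option (assuming Java 11 or later, where FileReader and FileWriter gained constructors taking a Charset) is to name the charset explicitly. This is a minimal sketch; `FileReaderCharsetDemo` and its helpers are hypothetical, and the temporary file stands in for a real data file.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileReaderCharsetDemo {

    // Write text as UTF-8 regardless of the platform default (Java 11+ constructor)
    static void write(Path file, String text) throws IOException {
        try (Writer w = new FileWriter(file.toFile(), StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    // Read the file back with the same charset it was written with
    static String read(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new FileReader(file.toFile(), StandardCharsets.UTF_8)) {
            int ch;
            while ((ch = r.read()) != -1) {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("charset-demo", ".txt");
        write(tmp, "Hi,您好!");
        System.out.println(read(tmp));
        System.out.println(Files.size(tmp) + " bytes on disk");
        Files.delete(tmp);
    }
}
```

On Java 8 and earlier these constructors do not exist; wrapping a FileInputStream/FileOutputStream in InputStreamReader/OutputStreamWriter, as covered under Text File I/O, is the portable alternative.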

Buffered I/O Character-Streams - BufferedReader & BufferedWriter

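BufferedReader and BufferedWriter can be stacked on top of FileReader/FileWriter or other character streams to perform buffered I/O instead of character-by-character I/O.

BufferedReader provides a new method readLine(), which reads a line and returns it as a String (without the line terminator), or null at the end of the stream.

A line can be terminated by "\n" (Unix), "\r\n" (Windows) or "\r" (classic Mac OS).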
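A quick way to see that readLine() accepts all three terminators is to feed it a string that mixes them. This sketch assumes a StringReader as the underlying stream; `ReadLineDemo` and `lines` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {

    // Collect every line; readLine() strips "\n", "\r\n" or "\r" and returns null at the end
    static List<String> lines(String text) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String mixed = "unix\nwindows\r\nold-mac\rlast";
        System.out.println(lines(mixed)); // [unix, windows, old-mac, last]
    }
}
```

A trailing terminator does not produce an extra empty line: "a\n" yields the single line "a".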

Example:

package com.xgj.master.java.io.fileDemo.characterStreams;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.Charset;

import org.junit.Test;

/**
 * @ClassName: BufferedFileReaderWriterJDK7
 * @Description: Demo of BufferedReader and BufferedWriter in JDK 7.
 *               Write a text message to an output file, then read it back.
 *               NOTE: FileReader/FileWriter use the default charset for
 *               file encoding in this demo.
 * @author: Mr.Yang
 * @date: 2017-09-07 11:36:56
 */
public class BufferedFileReaderWriterJDK7 {

    @Test
    public void test() {
        String fileName = "D:\\xgj.txt";
        // 2 lines of text
        String message = "Character Streams!\nCharacter Stream Operation!\n";

        // Print the default charset
        System.out.println(Charset.defaultCharset());
        System.out.println(System.getProperty("file.encoding"));

        // JDK 7 style: try-with-resources closes the streams automatically
        // Write content to file
        try (BufferedWriter bw = new BufferedWriter(new FileWriter(new File(fileName)))) {
            bw.write(message);
            bw.flush();
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Read content from file
        try (BufferedReader br = new BufferedReader(new FileReader(new File(fileName)))) {
            String inLine;
            // readLine() returns one line without its terminator,
            // or null at the end of the character stream
            while ((inLine = br.readLine()) != null) {
                System.out.println(inLine);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Character Set (or Charset) - Package java.nio.charset (JDK 1.4)

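JDK 1.4 introduced a new package, java.nio.charset, as part of NIO (New IO). It supports converting characters between the Unicode (UCS-2) used internally by Java programs and the formats used by external devices (e.g., US-ASCII, ISO-8859-x, UTF-8, UTF-16, UTF-16BE, UTF-16LE, etc.).

The main class, java.nio.charset.Charset, provides static methods for testing whether a particular charset is supported, for locating a Charset instance by name, and for listing all the available charsets and the default charset.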

public static SortedMap<String,Charset> availableCharsets() // Lists all the available charsets
public static Charset defaultCharset()                      // Returns the default charset
public static Charset forName(String charsetName)           // Returns a Charset instance for the given charset name
public static boolean isSupported(String charsetName)       // Tests whether this charset name is supported

Example:

package com.xgj.master.java.io.fileDemo.characterStreams;

import java.nio.charset.Charset;

import org.junit.Test;

public class TestCharset {

    @Test
    public void test() {
        // Print the default charset
        System.out.println("The default charset is " + Charset.defaultCharset());
        System.out.println("The default charset is " + System.getProperty("file.encoding"));

        // Print the list of available charsets as name=Charset
        System.out.println("The available charsets are:");
        System.out.println(Charset.availableCharsets());

        // Check if the given charset name is supported
        System.out.println(Charset.isSupported("UTF-8")); // true
        System.out.println(Charset.isSupported("UTF8"));  // true
        System.out.println(Charset.isSupported("UTF_8")); // false

        // Get an instance of a Charset
        Charset charset = Charset.forName("UTF8");
        // Print this Charset's canonical name
        System.out.println(charset.name()); // "UTF-8"
        // Print all the other aliases
        System.out.println(charset.aliases()); // [unicode-1-1-utf-8, UTF8]
    }
}

Output:

The default charset is UTF-8
The default charset is UTF-8
The available charsets are:
{Big5=Big5, Big5-HKSCS=Big5-HKSCS, EUC-JP=EUC-JP, EUC-KR=EUC-KR, GB18030=GB18030, GB2312=GB2312, GBK=GBK, IBM-Thai=IBM-Thai, IBM00858=IBM00858, IBM01140=IBM01140, IBM01141=IBM01141, IBM01142=IBM01142, IBM01143=IBM01143, IBM01144=IBM01144, IBM01145=IBM01145, IBM01146=IBM01146, IBM01147=IBM01147, IBM01148=IBM01148, IBM01149=IBM01149, IBM037=IBM037, IBM1026=IBM1026, IBM1047=IBM1047, IBM273=IBM273, IBM277=IBM277, IBM278=IBM278, IBM280=IBM280, IBM284=IBM284, IBM285=IBM285, IBM290=IBM290, IBM297=IBM297, IBM420=IBM420, IBM424=IBM424, IBM437=IBM437, IBM500=IBM500, IBM775=IBM775, IBM850=IBM850, IBM852=IBM852, IBM855=IBM855, IBM857=IBM857, IBM860=IBM860, IBM861=IBM861, IBM862=IBM862, IBM863=IBM863, IBM864=IBM864, IBM865=IBM865, IBM866=IBM866, IBM868=IBM868, IBM869=IBM869, IBM870=IBM870, IBM871=IBM871, IBM918=IBM918, ISO-2022-CN=ISO-2022-CN, ISO-2022-JP=ISO-2022-JP, ISO-2022-JP-2=ISO-2022-JP-2, ISO-2022-KR=ISO-2022-KR, ISO-8859-1=ISO-8859-1, ISO-8859-13=ISO-8859-13, ISO-8859-15=ISO-8859-15, ISO-8859-2=ISO-8859-2, ISO-8859-3=ISO-8859-3, ISO-8859-4=ISO-8859-4, ISO-8859-5=ISO-8859-5, ISO-8859-6=ISO-8859-6, ISO-8859-7=ISO-8859-7, ISO-8859-8=ISO-8859-8, ISO-8859-9=ISO-8859-9, JIS_X0201=JIS_X0201, JIS_X0212-1990=JIS_X0212-1990, KOI8-R=KOI8-R, KOI8-U=KOI8-U, Shift_JIS=Shift_JIS, TIS-620=TIS-620, US-ASCII=US-ASCII, UTF-16=UTF-16, UTF-16BE=UTF-16BE, UTF-16LE=UTF-16LE, UTF-32=UTF-32, UTF-32BE=UTF-32BE, UTF-32LE=UTF-32LE, UTF-8=UTF-8, windows-1250=windows-1250, windows-1251=windows-1251, windows-1252=windows-1252, windows-1253=windows-1253, windows-1254=windows-1254, windows-1255=windows-1255, windows-1256=windows-1256, windows-1257=windows-1257, windows-1258=windows-1258, windows-31j=windows-31j, x-Big5-HKSCS-2001=x-Big5-HKSCS-2001, x-Big5-Solaris=x-Big5-Solaris, x-euc-jp-linux=x-euc-jp-linux, x-EUC-TW=x-EUC-TW, x-eucJP-Open=x-eucJP-Open, x-IBM1006=x-IBM1006, x-IBM1025=x-IBM1025, x-IBM1046=x-IBM1046, x-IBM1097=x-IBM1097, x-IBM1098=x-IBM1098, x-IBM1112=x-IBM1112, x-IBM1122=x-IBM1122, x-IBM1123=x-IBM1123, x-IBM1124=x-IBM1124, x-IBM1364=x-IBM1364, x-IBM1381=x-IBM1381, x-IBM1383=x-IBM1383, x-IBM300=x-IBM300, x-IBM33722=x-IBM33722, x-IBM737=x-IBM737, x-IBM833=x-IBM833, x-IBM834=x-IBM834, x-IBM856=x-IBM856, x-IBM874=x-IBM874, x-IBM875=x-IBM875, x-IBM921=x-IBM921, x-IBM922=x-IBM922, x-IBM930=x-IBM930, x-IBM933=x-IBM933, x-IBM935=x-IBM935, x-IBM937=x-IBM937, x-IBM939=x-IBM939, x-IBM942=x-IBM942, x-IBM942C=x-IBM942C, x-IBM943=x-IBM943, x-IBM943C=x-IBM943C, x-IBM948=x-IBM948, x-IBM949=x-IBM949, x-IBM949C=x-IBM949C, x-IBM950=x-IBM950, x-IBM964=x-IBM964, x-IBM970=x-IBM970, x-ISCII91=x-ISCII91, x-ISO-2022-CN-CNS=x-ISO-2022-CN-CNS, x-ISO-2022-CN-GB=x-ISO-2022-CN-GB, x-iso-8859-11=x-iso-8859-11, x-JIS0208=x-JIS0208, x-JISAutoDetect=x-JISAutoDetect, x-Johab=x-Johab, x-MacArabic=x-MacArabic, x-MacCentralEurope=x-MacCentralEurope, x-MacCroatian=x-MacCroatian, x-MacCyrillic=x-MacCyrillic, x-MacDingbat=x-MacDingbat, x-MacGreek=x-MacGreek, x-MacHebrew=x-MacHebrew, x-MacIceland=x-MacIceland, x-MacRoman=x-MacRoman, x-MacRomania=x-MacRomania, x-MacSymbol=x-MacSymbol, x-MacThai=x-MacThai, x-MacTurkish=x-MacTurkish, x-MacUkraine=x-MacUkraine, x-MS932_0213=x-MS932_0213, x-MS950-HKSCS=x-MS950-HKSCS, x-MS950-HKSCS-XP=x-MS950-HKSCS-XP, x-mswin-936=x-mswin-936, x-PCK=x-PCK, x-SJIS_0213=x-SJIS_0213, x-UTF-16LE-BOM=x-UTF-16LE-BOM, X-UTF-32BE-BOM=X-UTF-32BE-BOM, X-UTF-32LE-BOM=X-UTF-32LE-BOM, x-windows-50220=x-windows-50220, x-windows-50221=x-windows-50221, x-windows-874=x-windows-874, x-windows-949=x-windows-949, x-windows-950=x-windows-950, x-windows-iso2022jp=x-windows-iso2022jp}
true
true
false
UTF-8
[unicode-1-1-utf-8, UTF8]

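The default charset for file encoding is kept in the system property "file.encoding".
To change the JVM's default file-encoding charset, use the command-line VM option "-Dfile.encoding".
For example, the following command runs the program with UTF-8 as the default charset.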

> java -Dfile.encoding=UTF-8 TestCharset

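Most importantly, the Charset class provides methods for encoding and decoding characters between the UCS-2 used inside Java programs and the specific charset used by an external device (such as UTF-8).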

// Encodes the Unicode UCS-2 characters in the String/CharBuffer into a
// "byte sequence" using this charset, and returns a ByteBuffer.
public final ByteBuffer encode(String s)
public final ByteBuffer encode(CharBuffer cb)

// Decodes the byte sequence in the ByteBuffer (encoded using this charset)
// into Unicode UCS-2, and returns a CharBuffer.
public final CharBuffer decode(ByteBuffer bb)

Example:

The following example encodes some Unicode text in various encoding schemes and prints the hex codes of each encoded byte sequence.

package com.xgj.master.java.io.fileDemo.characterStreams;

import java.nio.ByteBuffer;
import java.nio.charset.Charset;

import org.junit.Test;

public class TestCharsetEncodeDecode {

    @Test
    public void test() {
        // Try these charsets for encoding
        String[] charsetNames = { "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16",
                "UTF-16BE", "UTF-16LE", "GBK", "BIG5" };
        String message = "Hi,您好!"; // Unicode message to be encoded

        // Print UCS-2 in hex codes
        System.out.printf("%10s: ", "UCS-2");
        for (int i = 0; i < message.length(); ++i) {
            System.out.printf("%04X ", (int) message.charAt(i));
        }
        System.out.println();

        for (String charsetName : charsetNames) {
            // Get a Charset instance given the charset name string
            Charset charset = Charset.forName(charsetName);
            System.out.printf("%10s: ", charset.name());
            // Encode the Unicode UCS-2 characters into a byte sequence in this charset
            ByteBuffer bb = charset.encode(message);
            while (bb.hasRemaining()) {
                System.out.printf("%02X ", bb.get()); // Print hex code
            }
            System.out.println();
        }
    }
}

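Output interpretation: each line lists the bytes that one charset produces for "Hi,您好!". US-ASCII and ISO-8859-1 cannot represent 您 or 好, so Charset.encode() substitutes the charset's replacement byte 3F ('?') for each of them, and the "UTF-16" line starts with the byte-order mark FE FF.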
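The US-ASCII and ISO-8859-1 rows deserve a closer look, because neither charset can represent 您 or 好. Charset.encode() always replaces such characters silently, whereas a CharsetEncoder set to CodingErrorAction.REPORT throws instead. The sketch below contrasts the two; `UnmappableDemo` and its methods are hypothetical names, and setting REPORT explicitly is only for readability (it is already the default for a fresh encoder).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class UnmappableDemo {

    // Charset.encode() replaces unmappable chars with the charset's replacement bytes
    static byte[] lenientEncode(String s, Charset cs) {
        ByteBuffer bb = cs.encode(s);
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    // An encoder configured with REPORT throws CharacterCodingException instead
    static boolean strictEncodeSucceeds(String s, Charset cs) {
        CharsetEncoder enc = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            enc.encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String msg = "Hi,您好!";
        for (byte b : lenientEncode(msg, StandardCharsets.US_ASCII)) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
        System.out.println(strictEncodeSucceeds(msg, StandardCharsets.US_ASCII));
        System.out.println(strictEncodeSucceeds(msg, StandardCharsets.UTF_8));
    }
}
```

Silent replacement is convenient for display, but when data loss must not go unnoticed the strict encoder is the safer choice.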

Example 2:

The following example encodes and decodes using CharBuffer and ByteBuffer.

package com.xgj.master.java.io.fileDemo.characterStreams;

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

import org.junit.Test;

public class TestCharsetEncodeByteBuffer {

    @Test
    public void test() {
        // "Hi,您好!" as big-endian UCS-2 bytes
        byte[] bytes = { 0x00, 0x48, 0x00, 0x69, 0x00, 0x2C, 0x60, (byte) 0xA8,
                0x59, 0x7D, 0x00, 0x21 };

        // Print UCS-2 in hex codes
        System.out.printf("%10s: ", "UCS-2");
        for (int i = 0; i < bytes.length; ++i) {
            System.out.printf("%02X ", bytes[i]);
        }
        System.out.println();

        Charset charset = Charset.forName("UTF-8");

        // Encode from UCS-2 to UTF-8
        // Create a ByteBuffer by wrapping a byte array
        ByteBuffer bb = ByteBuffer.wrap(bytes);
        // Create a CharBuffer view of this ByteBuffer (big-endian by default)
        CharBuffer cb = bb.asCharBuffer();
        ByteBuffer bbOut = charset.encode(cb);
        // Print hex code
        System.out.printf("%10s: ", charset.name());
        while (bbOut.hasRemaining()) {
            System.out.printf("%02X ", bbOut.get());
        }
        System.out.println();

        // Decode from UTF-8 to UCS-2
        bbOut.rewind();
        CharBuffer cbOut = charset.decode(bbOut);
        System.out.printf("%10s: ", "UCS-2");
        while (cbOut.hasRemaining()) {
            char aChar = cbOut.get();
            // Print char & hex code
            System.out.printf("'%c'[%04X] ", aChar, (int) aChar);
        }
        System.out.println();
    }
}

Output:

     UCS-2: 00 48 00 69 00 2C 60 A8 59 7D 00 21
     UTF-8: 48 69 2C E6 82 A8 E5 A5 BD 21
     UCS-2: 'H'[0048] 'i'[0069] ','[002C] '您'[60A8] '好'[597D] '!'[0021]

Text File I/O - InputStreamReader and OutputStreamWriter

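As mentioned earlier, Java stores characters (type char) internally in the 16-bit UCS-2 character set, while external data sources and sinks may store them in other charsets (e.g., US-ASCII, ISO-8859-x, UTF-8, UTF-16 and many others), with either a fixed length of 8 or 16 bits or a variable length of 1 to 4 bytes.

The FileReader/FileWriter introduced earlier use the default charset for decoding/encoding, so the result depends on the platform the program runs on, which makes the program non-portable.

To choose a charset explicitly, we use InputStreamReader and OutputStreamWriter. InputStreamReader is a bridge from byte streams to character streams, and OutputStreamWriter is a bridge from character streams to byte streams.

The charset is chosen in the InputStreamReader constructor: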

public InputStreamReader(InputStream in) // Use default charset
public InputStreamReader(InputStream in, String charsetName) throws UnsupportedEncodingException
public InputStreamReader(InputStream in, Charset cs)

The available charsets can be listed with the static method java.nio.charset.Charset.availableCharsets(). Commonly used charset names supported by Java are:

"US-ASCII": 7-bit ASCII (aka ISO646-US)
"ISO-8859-1": Latin-1
"UTF-8": Most commonly used encoding scheme for Unicode
"UTF-16BE": Big-endian (most significant byte first); big-endian is usually the default
"UTF-16LE": Little-endian (least significant byte first)
"UTF-16": With a 2-byte BOM (Byte-Order-Mark) to specify the byte order. FE FF indicates big-endian, FF FE indicates little-endian.
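The difference between the three UTF-16 variants is easiest to see on a single character. This sketch encodes "您" with each of them and decodes a little-endian byte sequence that carries a BOM; `Utf16BomDemo` and `hex` are hypothetical names. It relies on the documented behaviour of Java's "UTF-16" charset: it writes a big-endian BOM when encoding, and honours (and strips) a BOM when decoding.

```java
import java.nio.charset.StandardCharsets;

public class Utf16BomDemo {

    // Format bytes as space-separated uppercase hex
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "您";
        System.out.println("UTF-16BE: " + hex(s.getBytes(StandardCharsets.UTF_16BE))); // 60 A8
        System.out.println("UTF-16LE: " + hex(s.getBytes(StandardCharsets.UTF_16LE))); // A8 60
        System.out.println("UTF-16:   " + hex(s.getBytes(StandardCharsets.UTF_16)));   // FE FF 60 A8

        // FF FE tells the "UTF-16" decoder that the rest is little-endian
        byte[] le = { (byte) 0xFF, (byte) 0xFE, (byte) 0xA8, 0x60 };
        System.out.println(new String(le, StandardCharsets.UTF_16)); // 您
    }
}
```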

Because InputStreamReader/OutputStreamWriter often need to read/write several bytes per character, it is best to wrap them in a BufferedReader/BufferedWriter.
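The wrapping pattern just described can be sketched with in-memory byte streams so that no files are involved. This is a minimal illustration, assuming ByteArrayOutputStream/ByteArrayInputStream as stand-ins for file streams; `BridgeDemo`, `encode` and `decodeFirstLine` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BridgeDemo {

    // chars -> bytes: OutputStreamWriter encodes, BufferedWriter batches the writes
    static byte[] encode(String text, Charset cs) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (BufferedWriter w = new BufferedWriter(new OutputStreamWriter(bytes, cs))) {
            w.write(text);
        } // close() flushes the buffer and the encoder into `bytes`
        return bytes.toByteArray();
    }

    // bytes -> chars: InputStreamReader decodes, BufferedReader adds readLine()
    static String decodeFirstLine(byte[] data, Charset cs) throws IOException {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), cs))) {
            return r.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Charset cs = StandardCharsets.UTF_16LE;
        byte[] data = encode("Hi,您好!\nsecond line", cs);
        System.out.println(data.length + " bytes");
        System.out.println(decodeFirstLine(data, cs));
    }
}
```

The same charset must be used on both sides; the bridge classes only translate, they do not detect the encoding.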

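Example:

The following program writes Unicode text to disk files, using a different charset for each file's encoding. It then reads each file byte by byte (via a byte-based input stream) to show how the characters were encoded in the various charsets. Finally, it reads the files back with a character-based reader.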

package com.xgj.master.java.io.fileDemo.characterStreams;

import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

import org.junit.Test;

/**
 * @ClassName: TextFileEncodingJDK7
 * @Description: Write texts to file using OutputStreamWriter specifying its
 *               charset encoding.
 *               Read byte-by-byte using FileInputStream.
 *               Read char-by-char using InputStreamReader specifying its
 *               charset encoding.
 * @author: Mr.Yang
 * @date: 2017-09-07 13:35:15
 */
public class TextFileEncodingJDK7 {

    @Test
    public void test() {
        String message = "Hi,您好!"; // with non-ASCII chars

        // Java internally stores char in UCS-2/UTF-16
        // Print the characters stored with hex codes
        for (int i = 0; i < message.length(); ++i) {
            char aChar = message.charAt(i);
            System.out.printf("[%d]'%c'(%04X) ", (i + 1), aChar, (int) aChar);
        }
        System.out.println();

        // Try these charsets for encoding the text file
        String[] csStrs = { "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16", "GB2312", "GBK", "BIG5" };
        String outFileExt = "-out.txt"; // Output filenames are "charset-out.txt"

        // Write the text file in the specified file-encoding charset
        for (int i = 0; i < csStrs.length; ++i) {
            // Buffered for efficiency
            try (OutputStreamWriter out = new OutputStreamWriter(
                    new FileOutputStream(csStrs[i] + outFileExt), csStrs[i]);
                 BufferedWriter bufOut = new BufferedWriter(out)) {
                // Print the file-encoding charset
                System.out.println(out.getEncoding());
                bufOut.write(message);
                bufOut.flush();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }

        // Read raw bytes from the various encoded files
        // to check how the characters were encoded.
        for (int i = 0; i < csStrs.length; ++i) {
            // Buffered for efficiency
            try (BufferedInputStream in = new BufferedInputStream(
                    new FileInputStream(csStrs[i] + outFileExt))) {
                // Print the file-encoding charset
                System.out.printf("%10s: ", csStrs[i]);
                int inByte;
                while ((inByte = in.read()) != -1) {
                    System.out.printf("%02X ", inByte); // Print hex codes
                }
                System.out.println();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }

        // Read the text file with a character stream specifying its encoding.
        // Each char is translated from the file-encoding charset to
        // Java's internal UCS-2.
        for (int i = 0; i < csStrs.length; ++i) {
            // Buffered for efficiency
            try (InputStreamReader in = new InputStreamReader(
                    new FileInputStream(csStrs[i] + outFileExt), csStrs[i]);
                 BufferedReader bufIn = new BufferedReader(in)) {
                // Print the file-encoding charset
                System.out.println(in.getEncoding());
                int inChar;
                int count = 0;
                // Read through the buffered reader, one char at a time
                while ((inChar = bufIn.read()) != -1) {
                    ++count;
                    System.out.printf("[%d]'%c'(%04X) ", count, (char) inChar, inChar);
                }
                System.out.println();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    }
}

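Output analysis:

As the output shows, the characters "您好" are encoded differently in each charset.
Nevertheless, InputStreamReader is able to translate them back into the same UCS-2 characters used in the Java program.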
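That round trip only works because each file is read with the charset it was written in. The following sketch shows what happens otherwise; `WrongCharsetDemo` is a hypothetical name, and the GBK bytes C4 FA for "您" are taken from the overview above and hard-coded so the example does not depend on the GBK charset being installed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {

    // Decode bytes with the given charset; malformed input becomes U+FFFD
    static String decode(byte[] data, Charset cs) {
        return new String(data, cs);
    }

    public static void main(String[] args) {
        byte[] utf8 = "您好".getBytes(StandardCharsets.UTF_8); // 6 bytes

        // Right charset: the original two characters come back
        System.out.println(decode(utf8, StandardCharsets.UTF_8));
        // Wrong charset: ISO-8859-1 turns each of the 6 bytes into its own char
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1));

        // GBK bytes for "您" are not valid UTF-8, so replacement chars appear
        byte[] gbk = { (byte) 0xC4, (byte) 0xFA };
        System.out.println(decode(gbk, StandardCharsets.UTF_8));
    }
}
```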

Code

The code is hosted on GitHub: https://github.com/yangshangwei/JavaMaster
