
Java - A Few Things About I/O, Told Through File Compression

Published: 2025/3/21  java  12  豆豆

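This article, collected and compiled by 生活随笔, covers Java - A Few Things About I/O, Told Through File Compression. We found it quite good and are sharing it here for your reference.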

Contents

Background
Reproducing the Problem
- Version 1: no buffer
- Version 2: with buffer
  - Source-code analysis: why it got faster
- Version 3: NIO - Channel
- Version 4: NIO - Channel with buffer
- Version 5: MMAP
- Version 6: PIPE
Going Deeper
- Kernel space and user space
- Direct and non-direct buffers
  - Non-direct buffers
  - Direct buffers
  - Comparison

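Background

We had a file-compression requirement. A teammate banged out an implementation in no time, and it flew through small files.

But then a file of a few dozen megabytes came along, and after 100 seconds it still hadn't finished....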

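import java.io.*;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.*;
import java.util.concurrent.CompletableFuture;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import org.openjdk.jmh.runner.RunnerException; // main() throws it, so the benchmark appears to use JMH

/**
 * @author 小工匠
 * @version 1.0
 * @description: TODO
 * @date 2021/2/3 16:40
 * @mark: show me the code , change the world
 */
public class FileCompress {

    // Location of the file to compress
    public static String COMPRESS_FILE_PATH = "D:/test/1.pdf";
    // Location where the zip archive is written
    public static String ZIP_FILE = "D:/test/1.zip";
    // The file to compress
    public static File COMPRESS_FILE = null;
    // File size in bytes
    public static long FILE_SIZE = 0;
    // File name
    public static String FILE_NAME = "";
    // File extension, including the dot
    public static String SUFFIX_FILE = "";

    static {
        File file = new File(COMPRESS_FILE_PATH);
        COMPRESS_FILE = file;
        FILE_NAME = file.getName();
        FILE_SIZE = file.length();
        SUFFIX_FILE = FILE_NAME.substring(FILE_NAME.indexOf('.'));
    }

    public static void main(String[] args) throws RunnerException {
        // .................................
    }
}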

Reproducing the Problem

To illustrate the problem, let's recreate the scene.

Version 1: no buffer

public static void zipFileVersion1() {
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile))) {
        // Start time
        long beginTime = System.currentTimeMillis();
        try (InputStream input = new FileInputStream(COMPRESS_FILE)) {
            zipOut.putNextEntry(new ZipEntry(FILE_NAME + 1));
            int temp = 0;
            // One read() and one write() per byte: every read() goes down to a native call
            while ((temp = input.read()) != -1) {
                zipOut.write(temp);
            }
        }
        long cost = (System.currentTimeMillis() - beginTime);
        System.out.println("fileSize:" + FILE_SIZE / 1024 / 1024 + "M");
        System.out.println("zip file cost time:" + cost / 1000 + "s");
    } catch (Exception e) {
        e.printStackTrace();
    }
}

It is compressing just one PDF of about 60 MB.

The problem is obvious: it doesn't even use a buffer. No mercy for code like that.

Version 2: with buffer

public static void zipFileVersion() {
    long beginTime = System.currentTimeMillis();
    // Read through a 1 KB byte array instead of one byte at a time
    byte[] buffer = new byte[1024];
    // try-with-resources closes both streams, even if opening the second one fails
    try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(ZIP_FILE));
         FileInputStream fis = new FileInputStream(COMPRESS_FILE)) {
        zos.putNextEntry(new ZipEntry("artisan" + SUFFIX_FILE));
        int length;
        while ((length = fis.read(buffer)) > 0) {
            zos.write(buffer, 0, length);
        }
        zos.closeEntry();
    } catch (IOException e) {
        System.out.println("Error : " + e);
    } finally {
        long cost = (System.currentTimeMillis() - beginTime);
        System.out.println("fileSize:" + FILE_SIZE / 1024 / 1024 + "M");
        System.out.println("test zip file cost time:" + cost + "ms");
    }
}

public static void zipFileVersion2() {
    long beginTime = System.currentTimeMillis();
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new BufferedOutputStream(new FileOutputStream(zipFile)));
         BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream(COMPRESS_FILE))) {
        // Extra buffer in front of the ZipOutputStream, so single-byte writes are batched
        BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(zipOut);
        zipOut.putNextEntry(new ZipEntry("artisan" + SUFFIX_FILE));
        int temp = 0;
        // Still one byte per call, but each call is now served from an in-memory buffer
        while ((temp = bufferedInputStream.read()) != -1) {
            bufferedOutputStream.write(temp);
        }
        // Push the bytes still held in bufferedOutputStream into the entry before closing it;
        // otherwise the last partial buffer (up to 8 KB) never reaches the zip file
        bufferedOutputStream.flush();
        zipOut.closeEntry();
        long cost = (System.currentTimeMillis() - beginTime);
        System.out.println("fileSize:" + FILE_SIZE / 1024 / 1024 + "M");
        System.out.println("zip file cost time:" + cost + "ms");
    } catch (Exception e) {
        e.printStackTrace();
    }
}

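WTF, it dropped straight to 2.8 seconds. What happened?

Source-code analysis: why it got faster

First, let's look at the core method Version 1 uses to read the file: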

FileInputStream#read

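read() simply calls read0(), a native method that talks to the operating system to read data from disk. So every single byte read costs one native call into the OS. A 63 MB document is 63 × 1024 × 1024 = 66,060,288 bytes, which means about 66 million native calls... imagine how long that takes.

With a buffer, on the other hand (assuming, for the sake of argument, that the buffer is large enough to hold all 63 MB), a single call would be enough: on the first call to read(), the buffer pulls the data from disk into memory in one go, and subsequent read() calls hand it back one byte at a time from memory.

BufferedInputStream wraps an internal byte array that holds the data; its default size is 8192 bytes. Even with that default, the 63 MB file needs only about 66,060,288 / 8192 = 8,064 native reads instead of about 66 million.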
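To make the difference in call counts concrete, here is a small sketch that needs no disk at all. CountingInputStream is a hypothetical helper that counts how often the wrapped stream is asked for data, standing in for FileInputStream's native calls, and an in-memory ByteArrayInputStream stands in for the PDF. Reading 1,000,000 bytes one at a time, the unbuffered stream is called 1,000,001 times (the last call returns -1). Behind a default BufferedInputStream it is called only 124 times: 123 fills of up to 8192 bytes plus one final call that hits end of stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadCallCounter {

    // Hypothetical helper: counts how often the wrapped stream is asked for data.
    // It stands in for the native calls FileInputStream makes; no real file is involved.
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Reads `size` bytes one at a time and returns how many calls reached the underlying stream.
    public static int countCalls(int size, boolean buffered) {
        CountingInputStream counting = new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        try (InputStream in = buffered ? new BufferedInputStream(counting) : counting) {
            while (in.read() != -1) {
                // consume byte by byte, just like the loops in Version 1 and Version 2
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counting.calls;
    }

    public static void main(String[] args) {
        int size = 1_000_000;
        System.out.println("unbuffered calls: " + countCalls(size, false));
        System.out.println("buffered calls:   " + countCalls(size, true));
    }
}
```

The caller's loop is identical in both cases. Only the number of trips down to the underlying stream changes, which is exactly the effect the buffer has on FileInputStream's native calls.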

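Version 3: NIO - Channel

Satisfied?

Everything above is classic I/O. Don't you want to give NIO a try?

NIO's Channel and ByteBuffer are structured much more like the way the operating system actually performs I/O, so they can be significantly faster than classic IO.

Think of a Channel as the railway track and a Buffer as the train carrying the cargo.

NIO processes data by moving Buffers that hold the data along Channels.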

In NIO, three classes can produce a FileChannel (via getChannel()):

FileInputStream
FileOutputStream
RandomAccessFile, which can both read and write
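Before looking at Version 3, here is what the "train on the tracks" picture looks like when written out by hand. This is a minimal sketch, assuming a plain file-to-file copy with no zip involved. A ByteBuffer is filled from one FileChannel, flipped, drained into another FileChannel, and cleared for the next trip. The class name ChannelCopyDemo and its copy method are illustrative only.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ChannelCopyDemo {

    // Copies src to dst: the ByteBuffer is the "train", the two FileChannels are the "tracks".
    public static long copy(Path src, Path dst) throws IOException {
        try (FileChannel in = new FileInputStream(src.toFile()).getChannel();
             FileChannel out = new FileOutputStream(dst.toFile()).getChannel()) {
            ByteBuffer buffer = ByteBuffer.allocate(8192);
            long total = 0;
            while (in.read(buffer) != -1) {
                buffer.flip();                  // switch the buffer from loading to unloading
                while (buffer.hasRemaining()) {
                    total += out.write(buffer); // write() may drain only part of the buffer
                }
                buffer.clear();                 // empty the train for the next trip
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("channel-src", ".bin");
        Path dst = Files.createTempFile("channel-dst", ".bin");
        byte[] data = new byte[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        Files.write(src, data);
        long copied = copy(src, dst);
        System.out.println("copied " + copied + " bytes, identical: "
                + Arrays.equals(data, Files.readAllBytes(dst)));
        Files.delete(src);
        Files.delete(dst);
    }
}
```

Each pass of the outer loop is one trip of the train: read fills the buffer, flip() turns it around, write empties it, and clear() makes it ready to load again.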

public static void zipFileVersion3() {
    long beginTime = System.currentTimeMillis();
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
         WritableByteChannel writableByteChannel = Channels.newChannel(zipOut)) {
        try (FileChannel fileChannel = new FileInputStream(COMPRESS_FILE).getChannel()) {
            zipOut.putNextEntry(new ZipEntry("artisan" + SUFFIX_FILE));
            // transferTo may move fewer bytes than requested, so loop until the whole file is sent
            long position = 0;
            while (position < FILE_SIZE) {
                long n = fileChannel.transferTo(position, FILE_SIZE - position, writableByteChannel);
                if (n <= 0) { break; }
                position += n;
            }
        }
        long cost = (System.currentTimeMillis() - beginTime);
        System.out.println("fileSize:" + FILE_SIZE / 1024 / 1024 + "M");
        System.out.println("zip file cost time:" + cost + "ms");
    } catch (Exception e) {
        e.printStackTrace();
    }
}

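Notice that this code does not move the data through a ByteBuffer at all. Instead it calls transferTo, which connects the two channels directly.

Here is what the official Javadoc says: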

This method is potentially much more efficient than a simple loop that reads from this channel and writes to the target channel. Many operating systems can transfer bytes directly from the filesystem cache to the target channel without actually copying them.

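In other words, transferTo can be more efficient than a loop that reads from one Channel and writes into another, because many operating systems can move the bytes straight from the filesystem cache into the target channel without an actual copy step. That benefit depends on the target, though. In Version 3 the target is a ZipOutputStream wrapped by Channels.newChannel, so the bytes still have to pass through user space to be compressed.

So what is this copy step? It is the copying of data from kernel space into user space.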
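For contrast, this is what transferTo looks like when both ends are plain files, which is the situation the Javadoc's remark about skipping the copy is aimed at. It is a minimal sketch under that assumption; whether the OS really avoids the user-space copy depends on the platform and the JDK. Because transferTo may return fewer bytes than requested, the loop keeps calling it from the new position until the whole file has been sent. TransferToDemo is an illustrative name.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class TransferToDemo {

    // Copies src to dst channel-to-channel; no ByteBuffer appears in user code.
    public static long copy(Path src, Path dst) throws IOException {
        try (FileChannel in = new FileInputStream(src.toFile()).getChannel();
             FileChannel out = new FileOutputStream(dst.toFile()).getChannel()) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                // transferTo may move fewer bytes than requested, so continue from the new position
                long n = in.transferTo(position, size - position, out);
                if (n <= 0) {
                    break; // defensive: nothing more could be transferred (e.g. the source shrank)
                }
                position += n;
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("transfer-src", ".bin");
        Path dst = Files.createTempFile("transfer-dst", ".bin");
        byte[] data = new byte[200_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 251);
        }
        Files.write(src, data);
        long copied = copy(src, dst);
        System.out.println("copied " + copied + " bytes, identical: "
                + Arrays.equals(data, Files.readAllBytes(dst)));
        Files.delete(src);
        Files.delete(dst);
    }
}
```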

Version 4: NIO - Channel with buffer

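public static void zipFileChannelBuffer() {
    long beginTime = System.currentTimeMillis();
    File zipFile = new File(ZIP_FILE);
    // Same as Version 3, but the FileOutputStream is wrapped in a BufferedOutputStream
    try (ZipOutputStream zipOut = new ZipOutputStream(new BufferedOutputStream(new FileOutputStream(zipFile)));
         WritableByteChannel writableByteChannel = Channels.newChannel(zipOut)) {
        try (FileChannel fileChannel = new FileInputStream(COMPRESS_FILE).getChannel()) {
            zipOut.putNextEntry(new ZipEntry("artisan" + SUFFIX_FILE));
            // transferTo may move fewer bytes than requested, so loop until the whole file is sent
            long position = 0;
            while (position < FILE_SIZE) {
                long n = fileChannel.transferTo(position, FILE_SIZE - position, writableByteChannel);
                if (n <= 0) { break; }
                position += n;
            }
        }
        // printInfo is not shown in the article; presumably it prints the file size and elapsed time
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}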

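Version 5: MMAP

Another new NIO feature is memory-mapped files. Why are they fast? Essentially, the file is mapped into a direct buffer in memory, and the program works with the data through that buffer directly, without read() calls copying it into a byte array first.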

public static void zipFileMMAP() {
    // Start time
    long beginTime = System.currentTimeMillis();
    File zipFile = new File(ZIP_FILE);
    // The source channel is a resource too, so the file handle is released at the end
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
         WritableByteChannel writableByteChannel = Channels.newChannel(zipOut);
         FileChannel fileChannel = new RandomAccessFile(COMPRESS_FILE_PATH, "r").getChannel()) {
        zipOut.putNextEntry(new ZipEntry("artisan" + SUFFIX_FILE));
        // Memory-mapped view of the whole file, read-only
        MappedByteBuffer mappedByteBuffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, FILE_SIZE);
        while (mappedByteBuffer.hasRemaining()) {
            writableByteChannel.write(mappedByteBuffer);
        }
        long cost = (System.currentTimeMillis() - beginTime);
        System.out.println("fileSize:" + FILE_SIZE / 1024 / 1024 + "M");
        System.out.println("mmap file cost time:" + cost + "ms");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
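The mapping itself can also be tried in isolation. The sketch below uses a hypothetical class MmapDemo on a temporary file. It maps the file read-only with FileChannel.map and sums its bytes straight from the resulting MappedByteBuffer. No read() call copies the data into a byte[]; the OS brings the pages in as the buffer is accessed.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class MmapDemo {

    // Maps the whole file read-only and sums its bytes (as unsigned values) directly from the mapping.
    public static long sumBytes(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
             FileChannel channel = raf.getChannel()) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (mapped.hasRemaining()) {
                sum += mapped.get() & 0xFF; // pages are faulted in by the OS as they are touched
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".bin");
        // A live mapping can block deletion on some platforms, so delete on JVM exit instead
        file.toFile().deleteOnExit();
        byte[] data = new byte[256];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        Files.write(file, data);
        System.out.println("sum = " + sumBytes(file)); // 0 + 1 + ... + 255 = 32640
    }
}
```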

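Version 6: PIPE

A Java NIO pipe is a one-way data connection between two threads. A Pipe has a source channel and a sink channel: data is written into the sink channel and read from the source channel.

Whether or not a thread writing bytes to a pipe will block until another thread reads those bytes...

The Javadoc goes on to say that this is system-dependent and therefore unspecified: many pipe implementations buffer up to a certain number of bytes between the sink and the source, so a writer does not necessarily block until a reader takes its bytes. A reader in blocking mode, on the other hand, waits until data has been written, and read() returns -1 once the sink has been closed.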

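public static void zipFilePip() {
    long beginTime = System.currentTimeMillis();
    try (WritableByteChannel out = Channels.newChannel(new FileOutputStream(ZIP_FILE))) {
        Pipe pipe = Pipe.open();
        // Asynchronous task: compresses the file and writes the zip bytes into the pipe's sink
        CompletableFuture.runAsync(() -> runTask(pipe));
        // Get the read end of the pipe
        try (ReadableByteChannel readableByteChannel = pipe.source()) {
            // A fixed-size buffer is enough: the loop drains the pipe chunk by chunk
            ByteBuffer buffer = ByteBuffer.allocate(8192);
            while (readableByteChannel.read(buffer) >= 0) {
                buffer.flip();
                while (buffer.hasRemaining()) {
                    out.write(buffer);
                }
                buffer.clear();
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    // printInfo is not shown in the article; presumably it prints the file size and elapsed time
    printInfo(beginTime);
}

// Asynchronous task: closing the ZipOutputStream also closes the sink, which ends the reader's loop
public static void runTask(Pipe pipe) {
    try (ZipOutputStream zos = new ZipOutputStream(Channels.newOutputStream(pipe.sink()));
         WritableByteChannel out = Channels.newChannel(zos);
         FileChannel inputChannel = new FileInputStream(new File(COMPRESS_FILE_PATH)).getChannel()) {
        System.out.println("Begin");
        zos.putNextEntry(new ZipEntry("artisan" + SUFFIX_FILE));
        long position = 0;
        while (position < FILE_SIZE) {
            long n = inputChannel.transferTo(position, FILE_SIZE - position, out);
            if (n <= 0) { break; }
            position += n;
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}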
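Stripped of the zip logic, the sink/source hand-off in Version 6 looks like this. It is a minimal sketch with a hypothetical PipeDemo.roundTrip method. A background task writes a string into the sink and then closes it. Meanwhile the calling thread reads from the source until read() returns -1, which only happens after the sink has been closed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

public class PipeDemo {

    // Sends `message` through a Pipe from a background task to the calling thread.
    public static String roundTrip(String message) {
        try {
            Pipe pipe = Pipe.open();

            // Writer: fills the sink, then closes it so the reader sees end-of-stream
            CompletableFuture<Void> writer = CompletableFuture.runAsync(() -> {
                try (Pipe.SinkChannel sink = pipe.sink()) {
                    ByteBuffer data = ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8));
                    while (data.hasRemaining()) {
                        sink.write(data);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });

            // Reader: read() blocks until bytes arrive and returns -1 once the sink is closed
            ByteArrayOutputStream received = new ByteArrayOutputStream();
            try (Pipe.SourceChannel source = pipe.source()) {
                ByteBuffer buffer = ByteBuffer.allocate(1024);
                while (source.read(buffer) != -1) {
                    buffer.flip();
                    received.write(buffer.array(), 0, buffer.limit());
                    buffer.clear();
                }
            }
            writer.join(); // rethrows (wrapped) any exception from the writer task
            return new String(received.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello through the pipe"));
    }
}
```

Closing the sink is what terminates the reader's loop. In Version 6 the same job is done by closing the ZipOutputStream at the end of runTask, because that close propagates down to the sink channel.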

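Going Deeper

Kernel space and user space

To protect the system's core resources, x86 processors define four privilege levels, or rings; the further inward, the greater the privilege. Ring 0 is called kernel space and is used to access critical resources, while Ring 3 is called user space. (Mainstream operating systems such as Linux and Windows use only these two rings.)

User mode and kernel mode: a thread executing in kernel space is in kernel mode; a thread executing in user space is in user mode.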

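The first thing to be clear about: applications run in user mode. So what happens when an application needs to access core resources?

It has to go through interfaces the kernel exposes for this purpose, called system calls. Take reading a file on disk: the application invokes the open system call (and then read), the kernel accesses the file on disk, and the file contents are returned to the application.

Roughly, the flow is: application (user mode) → system call → kernel (kernel mode) accesses the disk → data returned to the application (user mode).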

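Direct and non-direct buffers

Non-direct buffers

NIO connects a disk file and the application through a Channel, and moves data in both directions through a ByteBuffer.

Access to the physical disk is managed by the operating system, so any data operation on the disk has to go through the kernel address space, whereas the application works with buffer space allocated by the JVM. One lives in kernel space and the other in user space, so data has to be copied back and forth between them. (In OpenJDK there is even one more copy: when a heap ByteBuffer is handed to a channel, its contents are first copied into a temporary direct buffer.)

Is there a way to avoid all this copying back and forth between kernel space and user space? Less of it should mean better efficiency, right?

There is: the direct buffer.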

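Direct buffers

A direct buffer no longer copies cached data between the kernel address space and the user address space. Instead, a block of physical memory is requested and mapped into both the kernel address space and the user address space, and data moves between the application and the disk through that memory. Strictly speaking, this shared mapping describes a memory-mapped MappedByteBuffer (Version 5). A buffer from ByteBuffer.allocateDirect lives in native memory outside the Java heap, which saves the extra copy between the Java heap and native memory, while the read/write system call itself still copies between kernel and user space.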
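In code, the two kinds come from different factory methods. ByteBuffer.allocate returns a non-direct buffer backed by a heap byte[], while ByteBuffer.allocateDirect returns a direct buffer in native memory. The small sketch below (hypothetical class BufferKinds) prints isDirect() and hasArray() for each, which is a quick way to tell them apart.

```java
import java.nio.ByteBuffer;

public class BufferKinds {

    // Summarises where a buffer's bytes live.
    public static String describe(ByteBuffer buffer) {
        return "isDirect=" + buffer.isDirect() + ", hasArray=" + buffer.hasArray();
    }

    public static void main(String[] args) {
        // Non-direct: backed by a byte[] on the Java heap (in OpenJDK, copied to native memory for channel I/O)
        ByteBuffer heap = ByteBuffer.allocate(1024);
        // Direct: native memory outside the Java heap, with no accessible backing array
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);
        System.out.println("heap:   " + describe(heap));   // isDirect=false, hasArray=true
        System.out.println("direct: " + describe(direct)); // isDirect=true, hasArray=false
    }
}
```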

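Comparison

So if direct buffers perform better, why do two kinds of buffers exist at all? Because direct buffers have drawbacks too:

(1) Not safe

(2) Higher cost: the memory is not allocated on the JVM heap, and allocating and freeing it is typically more expensive than for a heap buffer. It is released only when the garbage collector reclaims the buffer object that owns it, and when that happens is outside our control.

(3) Once data has been written into the physical memory buffer, the program loses control over it: when the data is finally written to disk is up to the operating system, and the application can no longer intervene. (For a memory-mapped buffer, MappedByteBuffer.force() can at least ask for the changes to be written out.)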

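This is also the idea behind the transferTo method used earlier. When the target channel allows it, the JDK can hand the transfer to the operating system (for example, sendfile on Linux), so the bytes never pass through a user-space buffer. When the target is an arbitrary channel, as with the ZipOutputStream wrapper in Versions 3 and 4, the JDK falls back to copying through an intermediate buffer, so the gain there comes mainly from moving data in large chunks instead of byte by byte.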
