Google Archive Patch Source Code Analysis
If you find this article too long, you can jump straight to my summarized conclusions:

Google Archive Patch is a delta algorithm built strictly on top of the Zip file format. The core delta algorithm is still BsDiff and the core patching algorithm is still BsPatch; the difference is that the contents of the old and new Zip files are first uncompressed into "delta-friendly" files, and the delta is generated between those. When applying the patch, the old Zip file's contents are again uncompressed into their delta-friendly form, the patching algorithm produces the new file's delta-friendly form, and then the offset and length of each ZipEntry recorded in the patch, together with its compression level, strategy, and nowrap flags, are used to rebuild a proper Zip file. Delta-friendly files are used because changes to uncompressed data are easy to describe: if the string "abc" becomes "abcd", the change is simply "one character 'd' was appended". Once the data has been compressed, the change can no longer be described so easily. So compressed content is converted back to uncompressed content, yielding the delta-friendly file.

Slow generation: patch generation takes longer because, once the Zip contents are uncompressed, the delta-friendly files are much larger, so BsDiff takes longer. If the uncompressed content is twice as big, diffing takes roughly twice as long as diffing the original file as a whole.

Slow application: applying the patch also takes longer, partly because the delta-friendly files are larger, but that is not the root cause — BsPatch is extremely fast, and even double the time would be negligible. The real cost lies in rebuilding the Zip file: the output stream has to decide, range by range, whether to recompress or to copy, and the vast majority of the time goes into recompression.

Small patches: as explained above, the delta is generated from delta-friendly files, between which the changes are easy to describe.
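To make the "delta friendly" point concrete, here is a small self-contained sketch (plain JDK code, not part of Archive Patch) that deflates two inputs differing by a single inserted character, in the spirit of the "abc" vs "abcd" example, and compares how far the raw and the compressed byte streams stay identical:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class DeflateDiffUnfriendlyDemo {

  // Deflate the input with fixed, deterministic settings (level 9, nowrap).
  static byte[] deflate(byte[] input) {
    Deflater deflater = new Deflater(9, true);
    deflater.setInput(input);
    deflater.finish();
    byte[] buffer = new byte[input.length + 1024];
    int size = deflater.deflate(buffer);
    deflater.end();
    byte[] out = new byte[size];
    System.arraycopy(buffer, 0, out, 0, size);
    return out;
  }

  // Length of the identical leading run shared by two byte arrays.
  static int commonPrefix(byte[] a, byte[] b) {
    int i = 0;
    while (i < Math.min(a.length, b.length) && a[i] == b[i]) {
      i++;
    }
    return i;
  }

  public static void main(String[] args) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 200; i++) {
      sb.append("the quick brown fox jumps over the lazy dog ");
    }
    String oldText = sb.toString();
    // The "new" input is the old input with one character inserted in the middle,
    // the same kind of tiny edit as "abc" -> "abcd" above.
    int mid = oldText.length() / 2;
    String newText = oldText.substring(0, mid) + "!" + oldText.substring(mid);

    byte[] oldRaw = oldText.getBytes(StandardCharsets.UTF_8);
    byte[] newRaw = newText.getBytes(StandardCharsets.UTF_8);
    byte[] oldDeflated = deflate(oldRaw);
    byte[] newDeflated = deflate(newRaw);

    // Uncompressed, the inputs share a huge identical prefix, so the change is trivial
    // to describe. Deflated, the byte streams diverge at or before the edit point and
    // do not re-align, so a byte-level diff of the compressed data is no longer small.
    System.out.println("uncompressed: common prefix " + commonPrefix(oldRaw, newRaw)
        + " of " + oldRaw.length + " bytes");
    System.out.println("deflated:     common prefix " + commonPrefix(oldDeflated, newDeflated)
        + " of " + oldDeflated.length + " bytes");
  }
}
```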
Project Repository
Google Archive Patch
There are three modules to look at: shared, generator, and applier. shared is the common module used by the other two, generator produces the delta, and applier applies it. generator contains a Java implementation of the bsdiff algorithm, and applier contains a Java implementation of bspatch.
The Zip File Format
Google Archive Patch is a delta algorithm built strictly on the Zip file format, so it is worth reviewing that format first. I consulted several articles online and noticed a common flaw: they all describe the structure from front to back, when it is actually easier to understand from back to front.

A Zip file generally consists of three segments: the contents of the ZIP entries (the local entries), the Central Directory, and the End of Central Directory record.

Let's go through these three segments one by one, starting with the last.
End of Central Directory
| Offset | Size (bytes) | Field | Description |
| --- | --- | --- | --- |
| 0 | 4 | End of Central Directory SIGNATURE = 0x06054b50 | Record signature, fixed value 0x06054b50 |
| 4 | 2 | disk number for this archive | Ignored |
| 6 | 2 | disk number for the central directory | Ignored |
| 8 | 2 | num entries in the central directory on this disk | Ignored |
| 10 | 2 | num entries in the central directory overall | Total number of central directory entries |
| 12 | 4 | the length of the central directory | Size of the central directory |
| 16 | 4 | the file offset of the central directory | Offset of the central directory |
| 20 | 2 | the length of the zip file comment | Comment length |
| 22 | n | from here to the EOF is the zip file comment | Comment content |
This segment is a single record with the layout shown above. Its purpose is to locate the Central Directory.
Central Directory
The End of Central Directory record points to the Central Directory; its layout is:
| Offset | Size (bytes) | Field | Description |
| --- | --- | --- | --- |
| 0 | 4 | Central Directory SIGNATURE = 0x02014b50 | Record signature, fixed value 0x02014b50 |
| 4 | 2 | the version-made-by | Ignored |
| 6 | 2 | the version-needed-to-extract | Ignored |
| 8 | 2 | the general-purpose flags, read for language encoding | General purpose bit flags |
| 10 | 2 | the compression method | Compression method |
| 12 | 2 | the MSDOS last modified file time | Last modified time (MS-DOS format) |
| 14 | 2 | the MSDOS last modified file date | Last modified date (MS-DOS format) |
| 16 | 4 | the CRC32 of the uncompressed data | CRC-32 checksum of the uncompressed data |
| 20 | 4 | the compressed size | Compressed size |
| 24 | 4 | the uncompressed size | Uncompressed size |
| 28 | 2 | the length of the file name | File name length |
| 30 | 2 | the length of the extras | Extra field length |
| 32 | 2 | the length of the comment | File comment length |
| 34 | 2 | the disk number | Ignored |
| 36 | 2 | the internal file attributes | Ignored |
| 38 | 4 | the external file attributes | Ignored |
| 42 | 4 | the offset of the local section entry, where the data is | Offset of the local entry |
| 46 | i | the file name | File name |
| 46+i | j | the extras | Extra field |
| 46+i+j | k | the comment | File comment |
This segment consists of n records with the layout shown above, one per entry. Its purpose is to locate where each entry's actual data lives in the Zip file.
Contents of ZIP entries
The Central Directory in turn points to the local entries; finally, here is the layout of a local entry:
| Offset | Size (bytes) | Field | Description |
| --- | --- | --- | --- |
| 0 | 4 | Local Entry SIGNATURE = 0x04034b50 | Record signature, fixed value 0x04034b50 |
| 4 | 2 | the version-needed-to-extract | Ignored |
| 6 | 2 | the general-purpose flags | General purpose bit flags |
| 8 | 2 | the compression method | Compression method |
| 10 | 2 | the MSDOS last modified file time | Last modified time (MS-DOS format) |
| 12 | 2 | the MSDOS last modified file date | Last modified date (MS-DOS format) |
| 14 | 4 | the CRC32 of the uncompressed data | CRC-32 checksum of the uncompressed data |
| 18 | 4 | the compressed size | Compressed size |
| 22 | 4 | the uncompressed size | Uncompressed size |
| 26 | 2 | the length of the file name | File name length |
| 28 | 2 | the length of the extras | Extra field length |
| 30 | i | the file name | File name |
| 30+i | j | the extras | Extra field |
| 30+i+j | k | file data | The actual (compressed) entry data |
This segment consists of n records with the layout shown above.
How Google Archive Patch Parses the Zip File

Google Archive Patch implements its own minimal Zip parsing layer. The parsing itself is done by com.google.archivepatcher.generator.MinimalZipParser, and the parsed data is carried by MinimalCentralDirectoryMetadata, MinimalZipArchive, and MinimalZipEntry. The end result of parsing is a list of MinimalZipEntry objects sorted by offset.
```java
private static List<MinimalZipEntry> listEntriesInternal(RandomAccessFileInputStream in) throws IOException {
  // Step 1: Locate the end-of-central-directory record header.
  long offsetOfEocd = MinimalZipParser.locateStartOfEocd(in, 32768);
  if (offsetOfEocd == -1) {
    // Archive is weird, abort.
    throw new ZipException("EOCD record not found in last 32k of archive, giving up");
  }

  // Step 2: Parse the end-of-central-directory data to locate the central directory itself
  in.setRange(offsetOfEocd, in.length() - offsetOfEocd);
  MinimalCentralDirectoryMetadata centralDirectoryMetadata = MinimalZipParser.parseEocd(in);

  // Step 3: Extract a list of all central directory entries (contiguous data stream)
  in.setRange(centralDirectoryMetadata.getOffsetOfCentralDirectory(),
      centralDirectoryMetadata.getLengthOfCentralDirectory());
  List<MinimalZipEntry> minimalZipEntries =
      new ArrayList<MinimalZipEntry>(centralDirectoryMetadata.getNumEntriesInCentralDirectory());
  for (int x = 0; x < centralDirectoryMetadata.getNumEntriesInCentralDirectory(); x++) {
    minimalZipEntries.add(MinimalZipParser.parseCentralDirectoryEntry(in));
  }

  // Step 4: Sort the entries in file order, not central directory order.
  Collections.sort(minimalZipEntries, LOCAL_ENTRY_OFFSET_COMAPRATOR);

  // Step 5: Seek out each local entry and calculate the offset of the compressed data within
  for (int x = 0; x < minimalZipEntries.size(); x++) {
    MinimalZipEntry entry = minimalZipEntries.get(x);
    long offsetOfNextEntry;
    if (x < minimalZipEntries.size() - 1) {
      // Don't allow reading past the start of the next entry, for sanity.
      offsetOfNextEntry = minimalZipEntries.get(x + 1).getFileOffsetOfLocalEntry();
    } else {
      // Last entry. Don't allow reading into the central directory, for sanity.
      offsetOfNextEntry = centralDirectoryMetadata.getOffsetOfCentralDirectory();
    }
    long rangeLength = offsetOfNextEntry - entry.getFileOffsetOfLocalEntry();
    in.setRange(entry.getFileOffsetOfLocalEntry(), rangeLength);
    long relativeDataOffset = MinimalZipParser.parseLocalEntryAndGetCompressedDataOffset(in);
    entry.setFileOffsetOfCompressedData(entry.getFileOffsetOfLocalEntry() + relativeDataOffset);
  }

  // Done!
  return minimalZipEntries;
}
```
The code above does the following:
- Locate the starting offset of the End of Central Directory record
- Find the Central Directory segment
- Parse the Central Directory segment
- Sort the entries by offset, ascending
- Parse each local entry to find the offset of its actual (compressed) data
Locating the End of Central Directory is simple: scan the bytes for the signature 0x06054b50. The implementation scans the last 32 KB of the zip file; if the signature is found the offset is returned, otherwise an exception is thrown. This raises a question: what if the record is not within the last 32 KB? I couldn't find any guarantee that the End of Central Directory must lie within the last 32 KB, and Android's Multidex implementation scans the last 64 KB instead; for now, let's assume the scan will find it. The implementation is:
```java
public static long locateStartOfEocd(RandomAccessFileInputStream in, int searchBufferLength) throws IOException {
  final int maxBufferSize = (int) Math.min(searchBufferLength, in.length());
  final byte[] buffer = new byte[maxBufferSize]; // 32k
  final long rangeStart = in.length() - buffer.length;
  in.setRange(rangeStart, buffer.length);
  readOrDie(in, buffer, 0, buffer.length); // read to buffer
  int offset = locateStartOfEocd(buffer); // locate
  if (offset == -1) {
    return -1;
  }
  return rangeStart + offset;
}

public static int locateStartOfEocd(byte[] buffer) {
  int last4Bytes = 0; // This is the 32 bits of data from the file
  for (int offset = buffer.length - 1; offset >= 0; offset--) {
    last4Bytes <<= 8;
    last4Bytes |= buffer[offset];
    if (last4Bytes == EOCD_SIGNATURE) { // 0x06054b50
      return offset;
    }
  }
  return -1;
}
```
Having found the start of the End of Central Directory, the next step is to parse that segment into a MinimalCentralDirectoryMetadata structure:
```java
public static MinimalCentralDirectoryMetadata parseEocd(InputStream in)
    throws IOException, ZipException {
  if (((int) read32BitUnsigned(in)) != EOCD_SIGNATURE) { // 0x06054b50
    throw new ZipException("Bad eocd header");
  }

  // *** 4 bytes encode EOCD_SIGNATURE, ignore (already found and verified).
  // 2 bytes encode disk number for this archive, ignore.
  // 2 bytes encode disk number for the central directory, ignore.
  // 2 bytes encode num entries in the central directory on this disk, ignore.
  // *** 2 bytes encode num entries in the central directory overall [READ THIS]
  // *** 4 bytes encode the length of the central directory [READ THIS]
  // *** 4 bytes encode the file offset of the central directory [READ THIS]
  // 2 bytes encode the length of the zip file comment, ignore.
  // Everything else from here to the EOF is the zip file comment, or junk. Ignore.
  skipOrDie(in, 2 + 2 + 2);
  int numEntriesInCentralDirectory = read16BitUnsigned(in); // number
  if (numEntriesInCentralDirectory == 0xffff) {
    // If 0xffff, this is a zip64 archive and this code doesn't handle that.
    throw new ZipException("No support for zip64");
  }
  long lengthOfCentralDirectory = read32BitUnsigned(in); // length
  long offsetOfCentralDirectory = read32BitUnsigned(in); // offset
  return new MinimalCentralDirectoryMetadata(
      numEntriesInCentralDirectory, offsetOfCentralDirectory, lengthOfCentralDirectory);
}
```
As the code shows, only three important values are extracted:
- The number of Central Directory entries, n
- The starting offset of the Central Directory
- The total length of the Central Directory
The input is then restricted to [offset, offset + length] (backed internally by a RandomAccessFile), and a for loop runs n times, parsing each Central Directory entry in turn. Parsing a single entry looks like this:
```java
public static MinimalZipEntry parseCentralDirectoryEntry(InputStream in) throws IOException {
  // *** 4 bytes encode the CENTRAL_DIRECTORY_ENTRY_SIGNATURE, verify for sanity
  // 2 bytes encode the version-made-by, ignore
  // 2 bytes encode the version-needed-to-extract, ignore
  // *** 2 bytes encode the general-purpose flags, read for language encoding. [READ THIS]
  // *** 2 bytes encode the compression method, [READ THIS]
  // 2 bytes encode the MSDOS last modified file time, ignore
  // 2 bytes encode the MSDOS last modified file date, ignore
  // *** 4 bytes encode the CRC32 of the uncompressed data [READ THIS]
  // *** 4 bytes encode the compressed size [READ THIS]
  // *** 4 bytes encode the uncompressed size [READ THIS]
  // *** 2 bytes encode the length of the file name [READ THIS]
  // *** 2 bytes encode the length of the extras, needed to skip the bytes later [READ THIS]
  // *** 2 bytes encode the length of the comment, needed to skip the bytes later [READ THIS]
  // 2 bytes encode the disk number, ignore
  // 2 bytes encode the internal file attributes, ignore
  // 4 bytes encode the external file attributes, ignore
  // *** 4 bytes encode the offset of the local section entry, where the data is [READ THIS]
  // n bytes encode the file name
  // n bytes encode the extras
  // n bytes encode the comment
  if (((int) read32BitUnsigned(in)) != CENTRAL_DIRECTORY_ENTRY_SIGNATURE) {
    throw new ZipException("Bad central directory header");
  }
  skipOrDie(in, 2 + 2); // Skip version stuff
  int generalPurposeFlags = read16BitUnsigned(in);
  int compressionMethod = read16BitUnsigned(in);
  skipOrDie(in, 2 + 2); // Skip MSDOS junk
  long crc32OfUncompressedData = read32BitUnsigned(in);
  long compressedSize = read32BitUnsigned(in);
  long uncompressedSize = read32BitUnsigned(in);
  int fileNameLength = read16BitUnsigned(in);
  int extrasLength = read16BitUnsigned(in);
  int commentLength = read16BitUnsigned(in);
  skipOrDie(in, 2 + 2 + 4); // Skip the disk number and file attributes
  long fileOffsetOfLocalEntry = read32BitUnsigned(in);
  byte[] fileNameBuffer = new byte[fileNameLength];
  readOrDie(in, fileNameBuffer, 0, fileNameBuffer.length);
  skipOrDie(in, extrasLength + commentLength);
  // General purpose flag bit 11 is an important hint for the character set used for file names.
  boolean generalPurposeFlagBit11 = (generalPurposeFlags & (0x1 << 10)) != 0;
  return new MinimalZipEntry(
      compressionMethod,
      crc32OfUncompressedData,
      compressedSize,
      uncompressedSize,
      fileNameBuffer,
      generalPurposeFlagBit11,
      fileOffsetOfLocalEntry);
}
```
It mainly extracts the following data:
- Compression method
- CRC-32 checksum
- Uncompressed size
- Compressed size
- File name
- General purpose flags
- Offset of the local entry
The result is a list of n MinimalZipEntry structures. After sorting it by offset in ascending order, the list is walked again to parse, for each entry, the offset of its actual data within the local entry:
```java
public static long parseLocalEntryAndGetCompressedDataOffset(InputStream in) throws IOException {
  // *** 4 bytes encode the LOCAL_ENTRY_SIGNATURE, verify for sanity
  // 2 bytes encode the version-needed-to-extract, ignore
  // 2 bytes encode the general-purpose flags, ignore
  // 2 bytes encode the compression method, ignore (redundant with central directory)
  // 2 bytes encode the MSDOS last modified file time, ignore
  // 2 bytes encode the MSDOS last modified file date, ignore
  // 4 bytes encode the CRC32 of the uncompressed data, ignore (redundant with central directory)
  // 4 bytes encode the compressed size, ignore (redundant with central directory)
  // 4 bytes encode the uncompressed size, ignore (redundant with central directory)
  // *** 2 bytes encode the length of the file name, needed to skip the bytes later [READ THIS]
  // *** 2 bytes encode the length of the extras, needed to skip the bytes later [READ THIS]
  // The rest is the data, which is the main attraction here.
  if (((int) read32BitUnsigned(in)) != LOCAL_ENTRY_SIGNATURE) {
    throw new ZipException("Bad local entry header");
  }
  int junkLength = 2 + 2 + 2 + 2 + 2 + 4 + 4 + 4;
  skipOrDie(in, junkLength); // Skip everything up to the length of the file name
  final int fileNameLength = read16BitUnsigned(in);
  final int extrasLength = read16BitUnsigned(in);

  // The file name is already known and will match the central directory, so no need to read it.
  // The extra field length can be different here versus in the central directory and is used for
  // things like zipaligning APKs. This single value is the critical part as it dictates where the
  // actual DATA for the entry begins.
  return 4 + junkLength + 2 + 2 + fileNameLength + extrasLength;
}
```
Very simple: it skips every byte that precedes the local entry's actual data and returns that offset.

With that, parsing of the Zip file is complete.
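As a quick sanity check of this parser, a few lines like the following will list every entry of an archive with its offsets. This is a usage sketch, not code from the project; the getters used are exactly the ones that appear in the source quoted above.

```java
import com.google.archivepatcher.generator.MinimalZipArchive;
import com.google.archivepatcher.generator.MinimalZipEntry;
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class ListZipEntries {
  public static void main(String[] args) throws Exception {
    File archive = new File(args[0]);
    // Parses the EOCD, the central directory and every local header, as described above.
    List<MinimalZipEntry> entries = MinimalZipArchive.listEntries(archive);
    for (MinimalZipEntry entry : entries) {
      String name = new String(entry.getFileNameBytes(), StandardCharsets.UTF_8);
      System.out.println(name
          + " localEntryOffset=" + entry.getFileOffsetOfLocalEntry()
          + " dataOffset=" + entry.getFileOffsetOfCompressedData()
          + " compressedSize=" + entry.getCompressedSize()
          + " deflated=" + entry.isDeflateCompressed());
    }
  }
}
```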
Generating the Delta File

The implementation lives mainly in FileByFileV1DeltaGenerator:
```java
public void generateDelta(File oldFile, File newFile, OutputStream patchOut)
    throws IOException, InterruptedException {
  try (TempFileHolder deltaFriendlyOldFile = new TempFileHolder();
      TempFileHolder deltaFriendlyNewFile = new TempFileHolder();
      TempFileHolder deltaFile = new TempFileHolder();
      FileOutputStream deltaFileOut = new FileOutputStream(deltaFile.file);
      BufferedOutputStream bufferedDeltaOut = new BufferedOutputStream(deltaFileOut)) {
    PreDiffExecutor.Builder builder =
        new PreDiffExecutor.Builder()
            .readingOriginalFiles(oldFile, newFile)
            .writingDeltaFriendlyFiles(deltaFriendlyOldFile.file, deltaFriendlyNewFile.file);
    for (RecommendationModifier modifier : recommendationModifiers) {
      builder.withRecommendationModifier(modifier);
    }
    PreDiffExecutor executor = builder.build();
    PreDiffPlan preDiffPlan = executor.prepareForDiffing();
    DeltaGenerator deltaGenerator = getDeltaGenerator();
    deltaGenerator.generateDelta(
        deltaFriendlyOldFile.file, deltaFriendlyNewFile.file, bufferedDeltaOut);
    bufferedDeltaOut.close();
    PatchWriter patchWriter =
        new PatchWriter(
            preDiffPlan,
            deltaFriendlyOldFile.file.length(),
            deltaFriendlyNewFile.file.length(),
            deltaFile.file);
    patchWriter.writeV1Patch(patchOut);
  }
}

protected DeltaGenerator getDeltaGenerator() {
  return new BsDiffDeltaGenerator();
}
```
It does the following:
- Create three temporary files for the delta-friendly old file, the delta-friendly new file, and the delta file; all three are deleted automatically when the JVM exits.
- Call PreDiffExecutor's prepareForDiffing to produce a PreDiffPlan. This function does a great deal of fairly involved work, detailed below.
- Run the BsDiff algorithm to generate the delta file.
- Write the patch file; the patch format is described later.
Now, PreDiffExecutor's prepareForDiffing:
```java
public PreDiffPlan prepareForDiffing() throws IOException {
  PreDiffPlan preDiffPlan = generatePreDiffPlan();
  List<TypedRange<JreDeflateParameters>> deltaFriendlyNewFileRecompressionPlan = null;
  if (deltaFriendlyOldFile != null) {
    // Builder.writingDeltaFriendlyFiles() ensures old and new are non-null when called, so a
    // check on either is sufficient.
    deltaFriendlyNewFileRecompressionPlan =
        Collections.unmodifiableList(generateDeltaFriendlyFiles(preDiffPlan));
  }
  return new PreDiffPlan(
      preDiffPlan.getQualifiedRecommendations(),
      preDiffPlan.getOldFileUncompressionPlan(),
      preDiffPlan.getNewFileUncompressionPlan(),
      deltaFriendlyNewFileRecompressionPlan);
}
```
It does the following:
- Call generatePreDiffPlan to produce a PreDiffPlan (detailed below)
- Using that PreDiffPlan, call generateDeltaFriendlyFiles to produce the delta-friendly files (detailed below)
- Construct a new PreDiffPlan carrying the recommendation list, the list of old-file entries to uncompress, the list of new-file entries to uncompress, and the recompression plan for the delta-friendly new file
Now, the generatePreDiffPlan function (together with the deflate-parameter divination it relies on):
```java
private PreDiffPlan generatePreDiffPlan() throws IOException {
  Map<ByteArrayHolder, MinimalZipEntry> originalOldArchiveZipEntriesByPath =
      new HashMap<ByteArrayHolder, MinimalZipEntry>();
  Map<ByteArrayHolder, MinimalZipEntry> originalNewArchiveZipEntriesByPath =
      new HashMap<ByteArrayHolder, MinimalZipEntry>();
  Map<ByteArrayHolder, JreDeflateParameters> originalNewArchiveJreDeflateParametersByPath =
      new HashMap<ByteArrayHolder, JreDeflateParameters>();

  for (MinimalZipEntry zipEntry : MinimalZipArchive.listEntries(originalOldFile)) {
    ByteArrayHolder key = new ByteArrayHolder(zipEntry.getFileNameBytes());
    originalOldArchiveZipEntriesByPath.put(key, zipEntry);
  }

  DefaultDeflateCompressionDiviner diviner = new DefaultDeflateCompressionDiviner();
  for (DivinationResult divinationResult : diviner.divineDeflateParameters(originalNewFile)) {
    ByteArrayHolder key = new ByteArrayHolder(divinationResult.minimalZipEntry.getFileNameBytes());
    originalNewArchiveZipEntriesByPath.put(key, divinationResult.minimalZipEntry);
    originalNewArchiveJreDeflateParametersByPath.put(key, divinationResult.divinedParameters);
  }

  PreDiffPlanner preDiffPlanner =
      new PreDiffPlanner(
          originalOldFile,
          originalOldArchiveZipEntriesByPath,
          originalNewFile,
          originalNewArchiveZipEntriesByPath,
          originalNewArchiveJreDeflateParametersByPath,
          recommendationModifiers.toArray(new RecommendationModifier[] {}));
  return preDiffPlanner.generatePreDiffPlan();
}

public List<DivinationResult> divineDeflateParameters(File archiveFile) throws IOException {
  List<DivinationResult> results = new ArrayList<>();
  for (MinimalZipEntry minimalZipEntry : MinimalZipArchive.listEntries(archiveFile)) {
    JreDeflateParameters divinedParameters = null;
    if (minimalZipEntry.isDeflateCompressed()) {
      // TODO(pasc): Reuse streams to avoid churning file descriptors
      MultiViewInputStreamFactory isFactory =
          new RandomAccessFileInputStreamFactory(
              archiveFile,
              minimalZipEntry.getFileOffsetOfCompressedData(),
              minimalZipEntry.getCompressedSize());

      // Keep small entries in memory to avoid unnecessary file I/O.
      if (minimalZipEntry.getCompressedSize() < (100 * 1024)) {
        try (InputStream is = isFactory.newStream()) {
          byte[] compressedBytes = new byte[(int) minimalZipEntry.getCompressedSize()];
          is.read(compressedBytes);
          divinedParameters =
              divineDeflateParameters(new ByteArrayInputStreamFactory(compressedBytes));
        } catch (Exception ignore) {
          divinedParameters = null;
        }
      } else {
        divinedParameters = divineDeflateParameters(isFactory);
      }
    }
    results.add(new DivinationResult(minimalZipEntry, divinedParameters));
  }
  return results;
}

public JreDeflateParameters divineDeflateParameters(
    MultiViewInputStreamFactory compressedDataInputStreamFactory) throws IOException {
  byte[] copyBuffer = new byte[32 * 1024];
  // Iterate over all relevant combinations of nowrap, strategy and level.
  for (boolean nowrap : new boolean[] {true, false}) {
    Inflater inflater = new Inflater(nowrap);
    Deflater deflater = new Deflater(0, nowrap);
    strategy_loop:
    for (int strategy : new int[] {0, 1, 2}) {
      deflater.setStrategy(strategy);
      for (int level : LEVELS_BY_STRATEGY.get(strategy)) {
        deflater.setLevel(level);
        inflater.reset();
        deflater.reset();
        try {
          if (matches(inflater, deflater, compressedDataInputStreamFactory, copyBuffer)) {
            end(inflater, deflater);
            return JreDeflateParameters.of(level, strategy, nowrap);
          }
        } catch (ZipException e) {
          // Parse error in input. The only possibilities are corruption or the wrong nowrap.
          // Skip all remaining levels and strategies.
          break strategy_loop;
        }
      }
    }
    end(inflater, deflater);
  }
  return null;
}
```
generatePreDiffPlan builds three maps:
- The first map holds the old file's data: the key is a ByteArrayHolder wrapping the Zip entry's file name bytes, and the value is the MinimalZipEntry.
- The second map holds the new file's data, keyed the same way.
- The third map holds the divined deflate settings — compression level, strategy, and nowrap — for each Zip entry of the new file; the key is again the ByteArrayHolder of the file name bytes, and the value is a JreDeflateParameters.

The first two maps come straight from the Zip parsing functions described earlier, which return a List of MinimalZipEntry: the key comes from MinimalZipEntry.getFileNameBytes() and the value is the entry itself.

The third map is harder-won: the parameters have to be divined, and the divination is brute force. Three nested loops decompress each entry's compressed data and then recompress it with every combination of level, strategy, and nowrap; if the recompressed bytes match the compressed bytes parsed from the Zip, the corresponding level, strategy, and nowrap have been found. These three values are carried by a JreDeflateParameters.
These three maps are used to construct a PreDiffPlanner, whose generatePreDiffPlan method returns the PreDiffPlan:
```java
PreDiffPlan generatePreDiffPlan() throws IOException {
  List<QualifiedRecommendation> recommendations = getDefaultRecommendations();
  for (RecommendationModifier modifier : recommendationModifiers) {
    // Allow changing the recommendations base on arbitrary criteria.
    recommendations = modifier.getModifiedRecommendations(oldFile, newFile, recommendations);
  }

  // Process recommendations to extract ranges for decompression & recompression
  Set<TypedRange<Void>> oldFilePlan = new HashSet<>();
  Set<TypedRange<JreDeflateParameters>> newFilePlan = new HashSet<>();
  for (QualifiedRecommendation recommendation : recommendations) {
    if (recommendation.getRecommendation().uncompressOldEntry) {
      long offset = recommendation.getOldEntry().getFileOffsetOfCompressedData();
      long length = recommendation.getOldEntry().getCompressedSize();
      TypedRange<Void> range = new TypedRange<Void>(offset, length, null);
      oldFilePlan.add(range);
    }
    if (recommendation.getRecommendation().uncompressNewEntry) {
      long offset = recommendation.getNewEntry().getFileOffsetOfCompressedData();
      long length = recommendation.getNewEntry().getCompressedSize();
      JreDeflateParameters newJreDeflateParameters =
          newArchiveJreDeflateParametersByPath.get(
              new ByteArrayHolder(recommendation.getNewEntry().getFileNameBytes()));
      TypedRange<JreDeflateParameters> range =
          new TypedRange<JreDeflateParameters>(offset, length, newJreDeflateParameters);
      newFilePlan.add(range);
    }
  }

  List<TypedRange<Void>> oldFilePlanList = new ArrayList<>(oldFilePlan);
  Collections.sort(oldFilePlanList);
  List<TypedRange<JreDeflateParameters>> newFilePlanList = new ArrayList<>(newFilePlan);
  Collections.sort(newFilePlanList);
  return new PreDiffPlan(
      Collections.unmodifiableList(recommendations),
      Collections.unmodifiableList(oldFilePlanList),
      Collections.unmodifiableList(newFilePlanList));
}
```
This function mainly produces two lists:
- For the old file: the offset and length of the compressed data of every Zip entry recommended for uncompression, carried as TypedRange<Void>; all such entries form one List.
- For the new file: the offset and length of the compressed data of every Zip entry recommended for uncompression, carried as TypedRange<JreDeflateParameters>, where the metadata values come from the third map built in the previous step; all such entries form another List.

Both lists are sorted by offset in ascending order.

These lists are built from the entries recommended for uncompression — so where does that recommendation come from? From the following function:
```java
private List<QualifiedRecommendation> getDefaultRecommendations() throws IOException {
  List<QualifiedRecommendation> recommendations = new ArrayList<>();

  // This will be used to find files that have been renamed, but not modified. This is relatively
  // cheap to construct as it just requires indexing all entries by the uncompressed CRC32, and
  // the CRC32 is already available in the ZIP headers.
  SimilarityFinder trivialRenameFinder =
      new Crc32SimilarityFinder(oldFile, oldArchiveZipEntriesByPath.values());

  // Iterate over every pair of entries and get a recommendation for what to do.
  for (Map.Entry<ByteArrayHolder, MinimalZipEntry> newEntry :
      newArchiveZipEntriesByPath.entrySet()) {
    ByteArrayHolder newEntryPath = newEntry.getKey();
    MinimalZipEntry oldZipEntry = oldArchiveZipEntriesByPath.get(newEntryPath);
    if (oldZipEntry == null) {
      // The path is only present in the new archive, not in the old archive. Try to find a
      // similar file in the old archive that can serve as a diff base for the new file.
      List<MinimalZipEntry> identicalEntriesInOldArchive =
          trivialRenameFinder.findSimilarFiles(newFile, newEntry.getValue());
      if (!identicalEntriesInOldArchive.isEmpty()) {
        // An identical file exists in the old archive at a different path. Use it for the
        // recommendation and carry on with the normal logic.
        // All entries in the returned list are identical, so just pick the first one.
        // NB, in principle it would be optimal to select the file that required the least work
        // to apply the patch - in practice, it is unlikely that an archive will contain multiple
        // copies of the same file that are compressed differently, so don't bother with that
        // degenerate case.
        oldZipEntry = identicalEntriesInOldArchive.get(0);
      }
    }

    // If the attempt to find a suitable diff base for the new entry has failed, oldZipEntry is
    // null (nothing to do in that case). Otherwise, there is an old entry that is relevant, so
    // get a recommendation for what to do.
    if (oldZipEntry != null) {
      recommendations.add(getRecommendation(oldZipEntry, newEntry.getValue()));
    }
  }
  return recommendations;
}
```
This function does the following:
- Create a similarity finder, which internally uses a Map keyed by the uncompressed CRC-32; the values are the old file's MinimalZipEntry objects, stored as lists because several entries can share the same CRC-32. (A minimal sketch of this idea follows the list.)
- Iterate over the new file's MinimalZipEntry list and check whether an entry with the same name exists in the old file. If it does not, use the similarity finder to look for an old entry with the same CRC-32 and, if one is found, take the first element of the returned list. If nothing is found, there is no diff base in the old archive for this entry and nothing needs to be done for it.
- With the old entry and the new entry, call getRecommendation to obtain a QualifiedRecommendation and add it to the list; this object holds the old and new entries together with whether each should be uncompressed, and why.
- Return the list of QualifiedRecommendation objects
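The Crc32SimilarityFinder itself is not quoted here, but the idea behind the first step is easy to sketch independently. The following is an illustrative re-implementation rather than the library's code, and it assumes a getCrc32OfUncompressedData() getter matching the constructor parameter seen in parseCentralDirectoryEntry above.

```java
import com.google.archivepatcher.generator.MinimalZipEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of CRC32-based "renamed but unmodified" detection.
class Crc32RenameIndex {
  private final Map<Long, List<MinimalZipEntry>> oldEntriesByCrc32 = new HashMap<>();

  Crc32RenameIndex(Iterable<MinimalZipEntry> oldEntries) {
    // Several old entries may share the same CRC32, so keep a list per key.
    for (MinimalZipEntry entry : oldEntries) {
      oldEntriesByCrc32
          .computeIfAbsent(entry.getCrc32OfUncompressedData(), crc -> new ArrayList<>())
          .add(entry);
    }
  }

  // Returns old entries whose uncompressed CRC32 matches the new entry's, i.e. likely the same
  // content under a different path; returns an empty list if nothing matches.
  List<MinimalZipEntry> findSimilar(MinimalZipEntry newEntry) {
    return oldEntriesByCrc32.getOrDefault(
        newEntry.getCrc32OfUncompressedData(), Collections.emptyList());
  }
}
```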
How is a QualifiedRecommendation decided? It is returned by getRecommendation:
```java
private QualifiedRecommendation getRecommendation(MinimalZipEntry oldEntry, MinimalZipEntry newEntry)
    throws IOException {

  // Reject anything that is unsuitable for uncompressed diffing.
  // Reason singled out in order to monitor unsupported versions of zlib.
  if (unsuitableDeflate(newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_NEITHER,
        RecommendationReason.DEFLATE_UNSUITABLE);
  }

  // Reject anything that is unsuitable for uncompressed diffing.
  if (unsuitable(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_NEITHER,
        RecommendationReason.UNSUITABLE);
  }

  // If both entries are already uncompressed there is nothing to do.
  if (bothEntriesUncompressed(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_NEITHER,
        RecommendationReason.BOTH_ENTRIES_UNCOMPRESSED);
  }

  // The following are now true:
  // 1. At least one of the entries is compressed.
  // 2. The old entry is either uncompressed, or is compressed with deflate.
  // 3. The new entry is either uncompressed, or is reproducibly compressed with deflate.

  if (uncompressedChangedToCompressed(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_NEW,
        RecommendationReason.UNCOMPRESSED_CHANGED_TO_COMPRESSED);
  }

  if (compressedChangedToUncompressed(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_OLD,
        RecommendationReason.COMPRESSED_CHANGED_TO_UNCOMPRESSED);
  }

  // At this point, both entries must be compressed with deflate.
  if (compressedBytesChanged(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_BOTH,
        RecommendationReason.COMPRESSED_BYTES_CHANGED);
  }

  // If the compressed bytes have not changed, there is no need to do anything.
  return new QualifiedRecommendation(
      oldEntry,
      newEntry,
      Recommendation.UNCOMPRESS_NEITHER,
      RecommendationReason.COMPRESSED_BYTES_IDENTICAL);
}
```
There are seven possible outcomes:
- The entry is compressed, but its JreDeflateParameters (compression level, strategy, nowrap) could not be divined. Without those three parameters the data cannot be recompressed, so the recommendation is to uncompress neither entry; the reason is that no deflate settings could be found that reproduce the compressed data.
- The old or new entry is compressed with an unsupported compression method, so uncompress neither; the reason is the unsupported method.
- If neither the old nor the new entry is compressed, uncompress neither; the reason is that both are already uncompressed.
- If the old entry is uncompressed and the new one is compressed, uncompress the new entry; the reason is the change from uncompressed to compressed.
- If the old entry is compressed and the new one is uncompressed, uncompress the old entry; the reason is the change from compressed to uncompressed.
- If both entries are compressed and the compressed bytes have changed, uncompress both; the reason is that the compressed bytes changed.
- If the compressed bytes have not changed, uncompress neither; the reason is that the compressed bytes are identical.
With all of that in place, here is how the delta-friendly files are generated:
```java
private List<TypedRange<JreDeflateParameters>> generateDeltaFriendlyFiles(PreDiffPlan preDiffPlan)
    throws IOException {
  try (FileOutputStream out = new FileOutputStream(deltaFriendlyOldFile);
      BufferedOutputStream bufferedOut = new BufferedOutputStream(out)) {
    DeltaFriendlyFile.generateDeltaFriendlyFile(
        preDiffPlan.getOldFileUncompressionPlan(), originalOldFile, bufferedOut);
  }
  try (FileOutputStream out = new FileOutputStream(deltaFriendlyNewFile);
      BufferedOutputStream bufferedOut = new BufferedOutputStream(out)) {
    return DeltaFriendlyFile.generateDeltaFriendlyFile(
        preDiffPlan.getNewFileUncompressionPlan(), originalNewFile, bufferedOut);
  }
}

public static <T> List<TypedRange<T>> generateDeltaFriendlyFile(
    List<TypedRange<T>> rangesToUncompress, File file, OutputStream deltaFriendlyOut)
    throws IOException {
  return generateDeltaFriendlyFile(
      rangesToUncompress, file, deltaFriendlyOut, true, DEFAULT_COPY_BUFFER_SIZE);
}

public static <T> List<TypedRange<T>> generateDeltaFriendlyFile(
    List<TypedRange<T>> rangesToUncompress,
    File file,
    OutputStream deltaFriendlyOut,
    boolean generateInverse,
    int copyBufferSize)
    throws IOException {
  List<TypedRange<T>> inverseRanges = null;
  if (generateInverse) {
    inverseRanges = new ArrayList<TypedRange<T>>(rangesToUncompress.size());
  }
  long lastReadOffset = 0;
  RandomAccessFileInputStream oldFileRafis = null;
  PartiallyUncompressingPipe filteredOut =
      new PartiallyUncompressingPipe(deltaFriendlyOut, copyBufferSize);
  try {
    oldFileRafis = new RandomAccessFileInputStream(file);
    for (TypedRange<T> rangeToUncompress : rangesToUncompress) {
      long gap = rangeToUncompress.getOffset() - lastReadOffset;
      if (gap > 0) {
        // Copy bytes up to the range start point
        oldFileRafis.setRange(lastReadOffset, gap);
        filteredOut.pipe(oldFileRafis, PartiallyUncompressingPipe.Mode.COPY);
      }

      // Now uncompress the range.
      oldFileRafis.setRange(rangeToUncompress.getOffset(), rangeToUncompress.getLength());
      long inverseRangeStart = filteredOut.getNumBytesWritten();
      // TODO(andrewhayden): Support nowrap=false here? Never encountered in practice.
      // This would involve catching the ZipException, checking if numBytesWritten is still zero,
      // resetting the stream and trying again.
      filteredOut.pipe(oldFileRafis, PartiallyUncompressingPipe.Mode.UNCOMPRESS_NOWRAP);
      lastReadOffset = rangeToUncompress.getOffset() + rangeToUncompress.getLength();

      if (generateInverse) {
        long inverseRangeEnd = filteredOut.getNumBytesWritten();
        long inverseRangeLength = inverseRangeEnd - inverseRangeStart;
        TypedRange<T> inverseRange =
            new TypedRange<T>(inverseRangeStart, inverseRangeLength, rangeToUncompress.getMetadata());
        inverseRanges.add(inverseRange);
      }
    }
    // Finish the final bytes of the file
    long bytesLeft = oldFileRafis.length() - lastReadOffset;
    if (bytesLeft > 0) {
      oldFileRafis.setRange(lastReadOffset, bytesLeft);
      filteredOut.pipe(oldFileRafis, PartiallyUncompressingPipe.Mode.COPY);
    }
  } finally {
    try {
      oldFileRafis.close();
    } catch (Exception ignored) {
      // Nothing
    }
    try {
      filteredOut.close();
    } catch (Exception ignored) {
      // Nothing
    }
  }
  return inverseRanges;
}
```
This function is quite clever, and somewhat involved. The process is:
- Iterate over the ranges to uncompress. For each range, subtract the last read position (lastReadOffset) from the range's offset to get a gap; the gap bytes are copied straight through (COPY mode).
- Then restrict the input to [offset, offset + length], record the number of bytes already written as inverseRangeStart, uncompress the range with the corresponding parameters, and set lastReadOffset to offset + length.
- If generateInverse is true (it always is here, since true is passed in), record the number of bytes written so far as inverseRangeEnd; inverseRangeEnd minus inverseRangeStart is the uncompressed size. Build a TypedRange from these values and add it to the list.
- After all ranges are processed, if any data remains between the current read position and the end of the file, copy it through as well.
- Return the List of TypedRange objects.
The process is rather abstract, so picture the Zip file as a sequence of regions: gaps of descriptive metadata between entries, the entries' real compressed data, and some leftover data at the tail of the file. Gaps are copied through unchanged; compressed regions are uncompressed, and for each one the offset and length of its uncompressed output are recorded; once every range has been handled, the leftover data at the end of the file is copied through too.

Pay particular attention to the returned TypedRange list: it holds each entry's offset and length after decompression, i.e. its position in the new file's delta-friendly form. This data is crucial — restoring the Zip file depends entirely on it.
With the delta-friendly old and new files in hand, the rest is simple: run BsDiff on them to generate the delta, then write the delta into the patch file. The patch file format is:
| Offset | Size (bytes) | Field | Description |
| --- | --- | --- | --- |
| 0 | 8 | Versioned Identifier | Header identifier, the fixed string "GFbFv1_0" (ASCII) |
| 8 | 4 | Flags (currently unused, but reserved) | Flags, reserved |
| 12 | 8 | Delta-friendly old archive size | Size of the delta-friendly old file, 64-bit unsigned integer |
| 20 | 4 | Num old archive uncompression ops | Number of old-file entries to uncompress, 32-bit unsigned integer |
| 24 | i | Old archive uncompression op 1…n | Offsets and lengths of old-file entries to uncompress, n records in total |
| 24+i | 4 | Num new archive recompression ops | Number of new-file entries to recompress, 32-bit unsigned integer |
| 24+i+4 | j | New archive recompression op 1…n | Offsets and lengths of new-file entries to recompress, n records in total |
| 24+i+4+j | 4 | Num delta descriptor records | Number of delta descriptor records, 32-bit unsigned integer |
| 24+i+4+j+4 | k | Delta descriptor record 1…n | Delta algorithm descriptor records, n in total |
| 24+i+4+j+4+k | l | Delta 1…n | The deltas themselves |
The layout of an Old Archive Uncompression Op is:
| Size (bytes) | Field | Description |
| --- | --- | --- |
| 8 | Offset of first byte to uncompress | Offset of the data to uncompress, 64-bit unsigned integer |
| 8 | Number of bytes to uncompress | Number of bytes to uncompress, 64-bit unsigned integer |
The layout of a New Archive Recompression Op is:
| Size (bytes) | Field | Description |
| --- | --- | --- |
| 8 | Offset of first byte to compress | Offset of the data to recompress, 64-bit unsigned integer |
| 8 | Number of bytes to compress | Number of bytes to recompress, 64-bit unsigned integer |
| 4 | Compression settings | Compression parameters: level, strategy, nowrap |
The layout of the Compression Settings is:
| Size (bytes) | Field | Description |
| --- | --- | --- |
| 1 | Compatibility window ID | Compatibility window; currently always 0, the default window |
| 1 | Deflate level | Compression level, in [1, 9] |
| 1 | Deflate strategy | Strategy, in [0, 2] |
| 1 | Wrap mode | 0 = wrap, 1 = nowrap |
The Compatibility Window: the default window has ID 0 and uses the following configuration:
- Compression with the deflate algorithm (zlib)
- A 32768-byte buffer
- Verified compression levels 1 through 9
- Verified strategies 0 through 2
- Verified wrap modes, both wrap and nowrap

The default compatibility window is compatible with Android 4.0 and later.
Where does this compatibility window come from? There is a class named DefaultDeflateCompatibilityWindow; calling getIncompatibleValues returns the incompatible JreDeflateParameters combinations (the carrier for level, strategy, and nowrap). Internally it enumerates the combinations of these three parameters, compresses a fixed piece of content with each, hex-encodes the output, and compares it with baseline data bundled in the library: a match means compatible, a mismatch means incompatible.

One caveat: officially, levels 1-9, strategies 0-2, and both wrap modes are supported, but in my own testing on a PC a handful of combinations — roughly four — turned out to be incompatible. I haven't tested on Android, so I don't know whether the same issue appears there.
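To reproduce that check on a given JVM or device, the class can be queried directly. The sketch below only assumes the getIncompatibleValues() method described above; its exact return type is not verified here, so the result is simply printed:

```java
import com.google.archivepatcher.shared.DefaultDeflateCompatibilityWindow;

public class CompatibilityCheck {
  public static void main(String[] args) {
    DefaultDeflateCompatibilityWindow window = new DefaultDeflateCompatibilityWindow();
    // Per the description above, getIncompatibleValues() reports the (level, strategy, nowrap)
    // combinations whose deflate output on the current runtime does not match the baseline
    // bundled with the library. An empty result means every combination is reproducible here.
    System.out.println("Incompatible deflate parameter combinations on this runtime: "
        + window.getIncompatibleValues());
  }
}
```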
The Delta Descriptor Record describes a delta algorithm. In the current V1 patch format only BsDiff exists, so there is exactly one such record. Its layout is:
| Size (bytes) | Field | Description |
| --- | --- | --- |
| 1 | Delta format ID | Enum id of the delta algorithm; 0 for bsdiff |
| 8 | Old delta-friendly region start | Offset in the delta-friendly old file where the delta applies |
| 8 | Old delta-friendly region length | Length of that old-file region |
| 8 | New delta-friendly region start | Offset in the delta-friendly new file where the delta applies |
| 8 | New delta-friendly region length | Length of that new-file region |
| 8 | Delta length | Length of the generated delta |
The patch file is written by writeV1Patch:
```java
public void writeV1Patch(OutputStream out) throws IOException {
  // Use DataOutputStream for ease of writing. This is deliberately left open, as closing it would
  // close the output stream that was passed in and that is not part of the method's documented
  // behavior. ("resource")
  DataOutputStream dataOut = new DataOutputStream(out);
  dataOut.write(PatchConstants.IDENTIFIER.getBytes("US-ASCII")); // GFbFv1_0
  dataOut.writeInt(0); // Flags (reserved)
  dataOut.writeLong(deltaFriendlyOldFileSize);

  // Write out all the delta-friendly old file uncompression instructions
  dataOut.writeInt(plan.getOldFileUncompressionPlan().size());
  for (TypedRange<Void> range : plan.getOldFileUncompressionPlan()) {
    dataOut.writeLong(range.getOffset());
    dataOut.writeLong(range.getLength());
  }

  // Write out all the delta-friendly new file recompression instructions
  dataOut.writeInt(plan.getDeltaFriendlyNewFileRecompressionPlan().size());
  for (TypedRange<JreDeflateParameters> range : plan.getDeltaFriendlyNewFileRecompressionPlan()) {
    dataOut.writeLong(range.getOffset());
    dataOut.writeLong(range.getLength());
    // Write the deflate information
    dataOut.write(PatchConstants.CompatibilityWindowId.DEFAULT_DEFLATE.patchValue);
    dataOut.write(range.getMetadata().level);
    dataOut.write(range.getMetadata().strategy);
    dataOut.write(range.getMetadata().nowrap ? 1 : 0);
  }

  // Now the delta section
  // First write the number of deltas present in the patch. In v1, there is always exactly one
  // delta, and it is for the entire input; in future versions there may be multiple deltas, of
  // arbitrary types.
  dataOut.writeInt(1);

  // In v1 the delta format is always bsdiff, so write it unconditionally.
  dataOut.write(PatchConstants.DeltaFormat.BSDIFF.patchValue);

  // Write the working ranges. In v1 these are always the entire contents of the delta-friendly
  // old file and the delta-friendly new file. These are for forward compatibility with future
  // versions that may allow deltas of arbitrary formats to be mapped to arbitrary ranges.
  dataOut.writeLong(0); // i.e., start of the working range in the delta-friendly old file
  dataOut.writeLong(deltaFriendlyOldFileSize); // i.e., length of the working range in old
  dataOut.writeLong(0); // i.e., start of the working range in the delta-friendly new file
  dataOut.writeLong(deltaFriendlyNewFileSize); // i.e., length of the working range in new

  // Finally, the length of the delta and the delta itself.
  dataOut.writeLong(deltaFile.length());
  try (FileInputStream deltaFileIn = new FileInputStream(deltaFile);
      BufferedInputStream deltaIn = new BufferedInputStream(deltaFileIn)) {
    byte[] buffer = new byte[32768];
    int numRead = 0;
    while ((numRead = deltaIn.read(buffer)) >= 0) {
      dataOut.write(buffer, 0, numRead);
    }
  }
  dataOut.flush();
}
```
The main steps are:
- Write the file header, "GFbFv1_0"
- Write the flags field (reserved, value 0)
- Write the size of the delta-friendly old file
- Write the number of old-file entries to uncompress
- Write, for each of the n old-file entries to uncompress, its offset and length
- Write the number of new-file entries to recompress
- Write, for each of the n new-file entries to recompress, its offset and length followed by the compatibility window data (window id, compression level, strategy, nowrap)
- Write the number of delta descriptors; only BsDiff is used, so this is 1
- Write the delta format id, then the offset and length of the working range in the delta-friendly old file and in the delta-friendly new file
- Write the length of the delta (the BsDiff output)
- Write the delta content generated by BsDiff
Synthesizing the New File

Patch application goes through com.google.archivepatcher.applier.FileByFileV1DeltaApplier's applyDelta, which ultimately calls applyDeltaInternal:
```java
private void applyDeltaInternal(
    File oldBlob, File deltaFriendlyOldBlob, InputStream deltaIn, OutputStream newBlobOut)
    throws IOException {
  // First, read the patch plan from the patch stream.
  PatchReader patchReader = new PatchReader();
  PatchApplyPlan plan = patchReader.readPatchApplyPlan(deltaIn);
  writeDeltaFriendlyOldBlob(plan, oldBlob, deltaFriendlyOldBlob);

  // Apply the delta. In v1 there is always exactly one delta descriptor, it is bsdiff, and it
  // takes up the rest of the patch stream - so there is no need to examine the list of
  // DeltaDescriptors in the patch at all.
  long deltaLength = plan.getDeltaDescriptors().get(0).getDeltaLength();
  DeltaApplier deltaApplier = getDeltaApplier();
  // Don't close this stream, as it is just a limiting wrapper. ("resource")
  LimitedInputStream limitedDeltaIn = new LimitedInputStream(deltaIn, deltaLength);
  // Don't close this stream, as it would close the underlying OutputStream (that we don't own). ("resource")
  PartiallyCompressingOutputStream recompressingNewBlobOut =
      new PartiallyCompressingOutputStream(
          plan.getDeltaFriendlyNewFileRecompressionPlan(),
          newBlobOut,
          DEFAULT_COPY_BUFFER_SIZE);
  deltaApplier.applyDelta(deltaFriendlyOldBlob, limitedDeltaIn, recompressingNewBlobOut);
  recompressingNewBlobOut.flush();
}
```
It does the following:
- Parse the patch file into a PatchApplyPlan
- Generate the old file's delta-friendly file
- Apply the delta algorithm to synthesize the new file's delta-friendly data; as that data is written to the stream, the new Zip file is reassembled at the same time

For the first step, here is how the patch is parsed:
```java
public PatchApplyPlan readPatchApplyPlan(InputStream in) throws IOException {
  // Use DataInputStream for ease of reading. This is deliberately left open, as closing it would
  // close the input stream that was passed in and that is not part of the method's documented
  // behavior. ("resource")
  DataInputStream dataIn = new DataInputStream(in);

  // Read header and flags.
  byte[] expectedIdentifier = PatchConstants.IDENTIFIER.getBytes("US-ASCII");
  byte[] actualIdentifier = new byte[expectedIdentifier.length];
  dataIn.readFully(actualIdentifier);
  if (!Arrays.equals(expectedIdentifier, actualIdentifier)) {
    throw new PatchFormatException("Bad identifier");
  }
  dataIn.skip(4); // Flags (ignored in v1)
  long deltaFriendlyOldFileSize =
      checkNonNegative(dataIn.readLong(), "delta-friendly old file size");

  // Read old file uncompression instructions.
  int numOldFileUncompressionInstructions =
      (int) checkNonNegative(dataIn.readInt(), "old file uncompression instruction count");
  List<TypedRange<Void>> oldFileUncompressionPlan =
      new ArrayList<TypedRange<Void>>(numOldFileUncompressionInstructions);
  long lastReadOffset = -1;
  for (int x = 0; x < numOldFileUncompressionInstructions; x++) {
    long offset = checkNonNegative(dataIn.readLong(), "old file uncompression range offset");
    long length = checkNonNegative(dataIn.readLong(), "old file uncompression range length");
    if (offset < lastReadOffset) {
      throw new PatchFormatException("old file uncompression ranges out of order or overlapping");
    }
    TypedRange<Void> range = new TypedRange<Void>(offset, length, null);
    oldFileUncompressionPlan.add(range);
    lastReadOffset = offset + length; // To check that the next range starts after the current one
  }

  // Read new file recompression instructions
  int numDeltaFriendlyNewFileRecompressionInstructions = dataIn.readInt();
  checkNonNegative(
      numDeltaFriendlyNewFileRecompressionInstructions,
      "delta-friendly new file recompression instruction count");
  List<TypedRange<JreDeflateParameters>> deltaFriendlyNewFileRecompressionPlan =
      new ArrayList<TypedRange<JreDeflateParameters>>(
          numDeltaFriendlyNewFileRecompressionInstructions);
  lastReadOffset = -1;
  for (int x = 0; x < numDeltaFriendlyNewFileRecompressionInstructions; x++) {
    long offset =
        checkNonNegative(dataIn.readLong(), "delta-friendly new file recompression range offset");
    long length =
        checkNonNegative(dataIn.readLong(), "delta-friendly new file recompression range length");
    if (offset < lastReadOffset) {
      throw new PatchFormatException(
          "delta-friendly new file recompression ranges out of order or overlapping");
    }
    lastReadOffset = offset + length; // To check that the next range starts after the current one

    // Read the JreDeflateParameters
    // Note that v1 only supports the default deflate compatibility window.
    checkRange(
        dataIn.readByte(),
        PatchConstants.CompatibilityWindowId.DEFAULT_DEFLATE.patchValue,
        PatchConstants.CompatibilityWindowId.DEFAULT_DEFLATE.patchValue,
        "compatibility window id");
    int level = (int) checkRange(dataIn.readUnsignedByte(), 1, 9, "recompression level");
    int strategy = (int) checkRange(dataIn.readUnsignedByte(), 0, 2, "recompression strategy");
    int nowrapInt = (int) checkRange(dataIn.readUnsignedByte(), 0, 1, "recompression nowrap");
    TypedRange<JreDeflateParameters> range =
        new TypedRange<JreDeflateParameters>(
            offset,
            length,
            JreDeflateParameters.of(level, strategy, nowrapInt == 0 ? false : true));
    deltaFriendlyNewFileRecompressionPlan.add(range);
  }

  // Read the delta metadata, but stop before the first byte of the actual delta.
  // V1 has exactly one delta and it must be bsdiff.
  int numDeltaRecords = (int) checkRange(dataIn.readInt(), 1, 1, "num delta records");
  List<DeltaDescriptor> deltaDescriptors = new ArrayList<DeltaDescriptor>(numDeltaRecords);
  for (int x = 0; x < numDeltaRecords; x++) {
    byte deltaFormatByte =
        (byte)
            checkRange(
                dataIn.readByte(),
                PatchConstants.DeltaFormat.BSDIFF.patchValue,
                PatchConstants.DeltaFormat.BSDIFF.patchValue,
                "delta format");
    long deltaFriendlyOldFileWorkRangeOffset =
        checkNonNegative(dataIn.readLong(), "delta-friendly old file work range offset");
    long deltaFriendlyOldFileWorkRangeLength =
        checkNonNegative(dataIn.readLong(), "delta-friendly old file work range length");
    long deltaFriendlyNewFileWorkRangeOffset =
        checkNonNegative(dataIn.readLong(), "delta-friendly new file work range offset");
    long deltaFriendlyNewFileWorkRangeLength =
        checkNonNegative(dataIn.readLong(), "delta-friendly new file work range length");
    long deltaLength = checkNonNegative(dataIn.readLong(), "delta length");
    DeltaDescriptor descriptor =
        new DeltaDescriptor(
            PatchConstants.DeltaFormat.fromPatchValue(deltaFormatByte),
            new TypedRange<Void>(
                deltaFriendlyOldFileWorkRangeOffset, deltaFriendlyOldFileWorkRangeLength, null),
            new TypedRange<Void>(
                deltaFriendlyNewFileWorkRangeOffset, deltaFriendlyNewFileWorkRangeLength, null),
            deltaLength);
    deltaDescriptors.add(descriptor);
  }

  return new PatchApplyPlan(
      Collections.unmodifiableList(oldFileUncompressionPlan),
      deltaFriendlyOldFileSize,
      Collections.unmodifiableList(deltaFriendlyNewFileRecompressionPlan),
      Collections.unmodifiableList(deltaDescriptors));
}
```
The steps are:
- Read and verify the file header
- Skip the 4-byte flags field
- Read the delta-friendly old file size, checking it is non-negative
- Read the number of old-file entries to uncompress, checking it is non-negative
- Read the n old-file uncompression offsets and lengths, checking they are non-negative
- Read the number of new-file entries to recompress, checking it is non-negative
- Read the n new-file recompression offsets and lengths (checking non-negativity) together with the compression level, strategy, and nowrap values
- Read the number of delta descriptors
- Read the n delta descriptors: the delta format id, the working range offset and length in the delta-friendly old file, the working range offset and length in the delta-friendly new file, and the delta length
- Return the PatchApplyPlan

Next, the old-file uncompression plan (a list of TypedRange) taken from the returned PatchApplyPlan is fed to DeltaFriendlyFile.generateDeltaFriendlyFile to produce the delta-friendly old file — the same process used during patch generation, so it is not repeated here. The code:
```java
private void writeDeltaFriendlyOldBlob(
    PatchApplyPlan plan, File oldBlob, File deltaFriendlyOldBlob) throws IOException {
  RandomAccessFileOutputStream deltaFriendlyOldFileOut = null;
  try {
    deltaFriendlyOldFileOut =
        new RandomAccessFileOutputStream(deltaFriendlyOldBlob, plan.getDeltaFriendlyOldFileSize());
    DeltaFriendlyFile.generateDeltaFriendlyFile(
        plan.getOldFileUncompressionPlan(),
        oldBlob,
        deltaFriendlyOldFileOut,
        false,
        DEFAULT_COPY_BUFFER_SIZE);
  } finally {
    try {
      deltaFriendlyOldFileOut.close();
    } catch (Exception ignored) {
      // Nothing
    }
  }
}
```
What remains is synthesizing the new file. BsPatch performs the synthesis and writes into an OutputStream which, following the decorator pattern, is wrapped in a PartiallyCompressingOutputStream; the wrapper is constructed from the list of TypedRange objects describing which parts of the delta-friendly new file must be recompressed. The actual reassembly of the Zip file ends up in PartiallyCompressingOutputStream's writeChunk:
```java
private int writeChunk(byte[] buffer, int offset, int length) throws IOException {
  if (bytesTillCompressionStarts() == 0 && !currentlyCompressing()) {
    // Compression will begin immediately.
    JreDeflateParameters parameters = nextCompressedRange.getMetadata();
    if (deflater == null) {
      deflater = new Deflater(parameters.level, parameters.nowrap);
    } else if (lastDeflateParameters.nowrap != parameters.nowrap) {
      // Last deflater must be destroyed because nowrap settings do not match.
      deflater.end();
      deflater = new Deflater(parameters.level, parameters.nowrap);
    }
    // Deflater will already have been reset at the end of this method, no need to do it again.
    // Just set up the right parameters.
    deflater.setLevel(parameters.level);
    deflater.setStrategy(parameters.strategy);
    deflaterOut = new DeflaterOutputStream(normalOut, deflater, compressionBufferSize);
  }

  int numBytesToWrite;
  OutputStream writeTarget;
  if (currentlyCompressing()) {
    // Don't write past the end of the compressed range.
    numBytesToWrite = (int) Math.min(length, bytesTillCompressionEnds());
    writeTarget = deflaterOut;
  } else {
    writeTarget = normalOut;
    if (nextCompressedRange == null) {
      // All compression ranges have been consumed.
      numBytesToWrite = length;
    } else {
      // Don't write past the point where the next compressed range begins.
      numBytesToWrite = (int) Math.min(length, bytesTillCompressionStarts());
    }
  }

  writeTarget.write(buffer, offset, numBytesToWrite);
  numBytesWritten += numBytesToWrite;

  if (currentlyCompressing() && bytesTillCompressionEnds() == 0) {
    // Compression range complete. Finish the output and set up for the next run.
    deflaterOut.finish();
    deflaterOut.flush();
    deflaterOut = null;
    deflater.reset();
    lastDeflateParameters = nextCompressedRange.getMetadata();
    if (rangeIterator.hasNext()) {
      // More compression ranges await in the future.
      nextCompressedRange = rangeIterator.next();
    } else {
      // All compression ranges have been consumed.
      nextCompressedRange = null;
      deflater.end();
      deflater = null;
    }
  }
  return numBytesToWrite;
}

private boolean currentlyCompressing() {
  return deflaterOut != null;
}

private long bytesTillCompressionStarts() {
  if (nextCompressedRange == null) {
    // All compression ranges have been consumed
    return -1L;
  }
  return nextCompressedRange.getOffset() - numBytesWritten;
}

private long bytesTillCompressionEnds() {
  if (nextCompressedRange == null) {
    // All compression ranges have been consumed
    return -1L;
  }
  return (nextCompressedRange.getOffset() + nextCompressedRange.getLength()) - numBytesWritten;
}
```
This function is the heart of the reassembly and is designed very cleverly; it is worth setting a breakpoint and stepping through it. In brief:
- PartiallyCompressingOutputStream's constructor fetches the first range from compressionRanges.
- If the distance to the start of the next compressed range is 0 and we are not currently compressing, read the compression settings (level, strategy, nowrap), configure the Deflater accordingly, and wrap the output stream in a compressing stream.
- If we are currently compressing, write the smaller of the incoming length and the bytes remaining in the compressed range, directing the output to the compressing stream — so this data is compressed rather than copied.
- If we are not compressing: when there are no more compressed ranges, write the whole incoming length directly; otherwise write the smaller of the incoming length and the distance to the next compressed range, directing the output to the plain stream — so this data is copied rather than compressed.
- If we are compressing and the current range has been written completely, finish and flush the compressing stream, reset the compression state, and advance to the next range to compress.
- Repeat until all data has been written.
Viewed as a whole, the synthesized delta-friendly new file is processed region by region: when a gap region is reached, its bytes are copied verbatim to the output stream; when a region of previously uncompressed data is reached, it is recompressed with the corresponding level, strategy, and nowrap parameters; and the remaining trailing data is written straight through to the target.
With that, the whole synthesis is complete.

Why does this correctly rebuild the Zip file? The analysis above makes it clear: Google Archive Patch records, for every piece of the new file that needs recompression, the parameters to recompress it with. That data is compressed with those parameters and written at its real position in the new file; all other data in the Zip file is simply copied. The two operations together reproduce the new Zip file — and for an APK, we don't even need to care about the signature.

A genuinely clever design!
Compressing and Decompressing the Patch File

Google Archive Patch does not compress the patch file itself; that is left to you, so that the patch can be kept small on the wire, and the client must decompress it accordingly after receiving it. This keeps the choice of patch compression completely free and easy to extend.
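One reasonable setup — a sketch, not something the library mandates — is to deflate the patch stream while generating it and inflate it while applying it, using the FileByFileV1DeltaGenerator and FileByFileV1DeltaApplier entry points discussed above (the applyDelta(File, InputStream, OutputStream) signature is assumed here):

```java
import com.google.archivepatcher.applier.FileByFileV1DeltaApplier;
import com.google.archivepatcher.generator.FileByFileV1DeltaGenerator;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

public class PatchPipeline {

  // Server side: generate the patch and deflate it on the fly.
  static void generateCompressedPatch(File oldApk, File newApk, File patchFile) throws Exception {
    Deflater deflater = new Deflater(9, true); // nowrap, maximum compression
    try (FileOutputStream fileOut = new FileOutputStream(patchFile);
        DeflaterOutputStream compressedOut = new DeflaterOutputStream(fileOut, deflater)) {
      new FileByFileV1DeltaGenerator().generateDelta(oldApk, newApk, compressedOut);
      compressedOut.finish();
    } finally {
      deflater.end();
    }
  }

  // Client side: inflate the patch while applying it to rebuild the new archive.
  static void applyCompressedPatch(File oldApk, File patchFile, File newApkOut) throws Exception {
    Inflater inflater = new Inflater(true); // must mirror the nowrap setting used above
    try (FileInputStream fileIn = new FileInputStream(patchFile);
        InflaterInputStream compressedIn = new InflaterInputStream(fileIn, inflater);
        FileOutputStream newOut = new FileOutputStream(newApkOut)) {
      new FileByFileV1DeltaApplier().applyDelta(oldApk, compressedIn, newOut);
    } finally {
      inflater.end();
    }
  }
}
```

Gzip or any other codec works just as well; the only requirement is that the client undoes exactly what the server did before handing the stream to the applier.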
A Generic Delta Generation and Application Framework

Over the past few days I put together a simple generic framework for delta generation and application; see CorePatch on GitHub. It currently supports bsdiff, Google Archive Patch, and full synthesis (directly copying the file).
Optimizations

Optimizing Delta Generation

Delta generation uses bsdiff, but the base files grow considerably once their contents are uncompressed, so generating the delta takes much longer. There is no way around this other than replacing BsDiff with a better delta algorithm.
Optimizing New-File Synthesis

Synthesis uses BsPatch, which is extremely fast, so it needs no optimization. The part that matters is rebuilding the new Zip file — the writeChunk function above — and its only expensive operation is compression, which accounts for roughly 80-90% of the total time. So there is little to optimize here either.
Summary

The core of Google Archive Patch is: generate delta-friendly files, run the delta algorithm over them, and record the offsets and lengths of the data in the delta-friendly new file that must be recompressed. When the patch is applied to synthesize the new file, data that needs recompression is compressed with the parameters stored in the patch, while everything else in the Zip structure is copied. The result is a faithful reconstruction of the new file. The advantage is a smaller patch than file-level bsdiff produces; the disadvantages are longer generation and application times.

A fundamental prerequisite of the algorithm is that compressing the same data with the same compression level, strategy, and nowrap parameters always produces identical output. If that premise does not hold, the whole approach falls apart.
Original post: http://fucknmb.com/2017/10/05/Google-Archive-Patch-%E6%BA%90%E7%A0%81%E8%A7%A3%E6%9E%90/