How to use the ZSTD algorithm for string compression in C
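Preface
String compression usually comes with three requirements: a high compression ratio, a high compression speed, and a high decompression speed. A high ratio and a high compression speed pull against each other, so you cannot have both, and good algorithms generally settle for a compromise between ratio and performance. Judged on compression ratio, compression speed and decompression speed, zstd and lz4 both compress and decompress well, so these two were selected for evaluation.
zstd is a fast compression algorithm open-sourced by Facebook that offers a high compression ratio. This article looks at how it actually performs when compressing and decompressing.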
1. zstd compression and decompression
ZSTD_compress belongs to the Simple API of ZSTD; the compression level is the only setting it exposes.
The prototype of ZSTD_compress is:
size_t ZSTD_compress(void* dst, size_t dstCapacity, const void* src, size_t srcSize, int compressionLevel);
The prototype of ZSTD_decompress is:
size_t ZSTD_decompress(void* dst, size_t dstCapacity, const void* src, size_t compressedSize);
Let's first look at an example of compressing and decompressing with zstd.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zstd.h>
#include <iostream>
using namespace std;

int main()
{
    // compress
    size_t com_space_size;
    size_t peppa_pig_text_size;
    char *com_ptr = NULL;
    char peppa_pig_buf[2048] = "Narrator:Itisrainingtoday.So,PeppaandGeorgecannotplayoutside.Peppa:Daddy,it'sstoppedraining.Canwegoouttoplay?Daddy:Alright,runalongyoutwo.Narrator:Peppalovesjumpinginmuddypuddles.Peppa:Ilovemuddypuddles.Mummy:Peppa.Ifyoujumpinginmuddypuddles,youmustwearyourboots.Peppa:Sorry,Mummy.Narrator:Georgelikestojumpinmuddypuddles,too.Peppa:George.Ifyoujumpinmuddypuddles,youmustwearyourboots.Narrator:Peppalikestolookafterherlittlebrother,George.Peppa:George,let'sfindsomemorepuddles.Narrator:PeppaandGeorgearehavingalotoffun.Peppahasfoundalttlepuddle.Georgehasfoundabigpuddle.Peppa:Look,George.There'sareallybigpuddle.Narrator:Georgewantstojumpintothebigpuddlefirst.Peppa:Stop,George.|mustcheckifit'ssafeforyou.Good.Itissafeforyou.Sorry,George.It'sonlymud.Narrator:PeppaandGeorgelovejumpinginmuddypuddles.Peppa:Comeon,George.Let'sgoandshowDaddy.Daddy:Goodnessme.Peppa:Daddy.Daddy.Guesswhatwe'vebeendoing.Daddy:Letmethink...Haveyoubeenwatchingtelevision?Peppa:No.No.Daddy.Daddy:Haveyoujusthadabath?Peppa:No.No.Daddy:|know.You'vebeenjumpinginmuddypuddles.Peppa:Yes.Yes.Daddy.We'vebeenjumpinginmuddypuddles.Daddy:Ho.Ho.Andlookatthemessyou'rein.Peppa:Oooh....Daddy:Oh,well,it'sonlymud.Let'scleanupquicklybeforeMummyseesthemess.Peppa:Daddy,whenwe'vecleanedup,willyouandMummyComeandplay,too?Daddy:Yes,wecanallplayinthegarden.Narrator:PeppaandGeorgearewearingtheirboots.MummyandDaddyarewearingtheirboots.Peppalovesjumpingupanddowninmuddypuddles.Everyonelovesjumpingupanddowninmuddypuddles.Mummy:Oh,Daddypig,lookatthemessyou'rein..Peppa:It'sonlymud.";
    peppa_pig_text_size = strlen(peppa_pig_buf);
    // worst-case compressed size for an input of this length
    com_space_size = ZSTD_compressBound(peppa_pig_text_size);
    com_ptr = (char *)malloc(com_space_size);
    if (NULL == com_ptr) {
        cout << "compress malloc failed" << endl;
        return -1;
    }
    size_t com_size;
    // level 1: the fastest of the regular (positive) levels
    com_size = ZSTD_compress(com_ptr, com_space_size, peppa_pig_buf, peppa_pig_text_size, 1);
    if (ZSTD_isError(com_size)) {
        cout << "compress failed: " << ZSTD_getErrorName(com_size) << endl;
        free(com_ptr);
        return -1;
    }
    cout << "peppa pig text size:" << peppa_pig_text_size << endl;
    cout << "compress text size:" << com_size << endl;
    cout << "compress ratio:" << (float)peppa_pig_text_size / (float)com_size << endl << endl;

    // decompress
    char *decom_ptr = NULL;
    unsigned long long decom_buf_size;
    // the original size is read back from the frame header
    decom_buf_size = ZSTD_getFrameContentSize(com_ptr, com_size);
    if (decom_buf_size == ZSTD_CONTENTSIZE_ERROR || decom_buf_size == ZSTD_CONTENTSIZE_UNKNOWN) {
        cout << "cannot read content size from frame" << endl;
        free(com_ptr);
        return -1;
    }
    decom_ptr = (char *)malloc((size_t)decom_buf_size);
    if (NULL == decom_ptr) {
        cout << "decompress malloc failed" << endl;
        free(com_ptr);
        return -1;
    }
    size_t decom_size;
    decom_size = ZSTD_decompress(decom_ptr, decom_buf_size, com_ptr, com_size);
    if (ZSTD_isError(decom_size)) {
        cout << "decompress failed: " << ZSTD_getErrorName(decom_size) << endl;
    } else {
        cout << "decompress text size:" << decom_size << endl;
        if (decom_size != peppa_pig_text_size || strncmp(peppa_pig_buf, decom_ptr, peppa_pig_text_size)) {
            cout << "decompress text is not equal peppa pig text" << endl;
        }
    }
    free(com_ptr);
    free(decom_ptr);
    return 0;
}
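Execution result:
The output shows that the peppa pig text is 1827 bytes long before compression and 759 bytes after, a compression ratio of about 2.4, and that the decompressed length equals the length before compression.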
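As mentioned above, the compression level passed to ZSTD_compress can be adjusted. The default level is ZSTD_CLEVEL_DEFAULT = 3 and the maximum is ZSTD_MAX_CLEVEL = 22, which is also what ZSTD_maxCLevel() returns. Level 0 is not the lowest level: it selects the default, and the real minimum, returned by ZSTD_minCLevel(), is negative. zstd also defines compression strategies, for example ZSTD_fast, ZSTD_greedy, ZSTD_lazy, ZSTD_lazy2 and ZSTD_btlazy2. These are values of the ZSTD_strategy enum rather than compression levels, and they are set separately through the ZSTD_c_strategy parameter of the advanced API; passing one of them where a level is expected simply uses its numeric value as the level (ZSTD_fast is 1, ZSTD_btlazy2 is 6). In general, the higher the level, the higher the compression ratio and the lower the compression speed.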
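2. Exploring ZSTD compression and decompression performance
The previous section covered the basic way to compress and decompress with zstd. Next, let's look at how fast zstd compresses and decompresses.
The test method is to call ZSTD_compress on the same text repeatedly for 10 seconds and then compute the average number of compressions per second. The code for the compression test is as follows: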
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <zstd.h>
#include <iostream>
using namespace std;

int main()
{
    int cnt = 0;
    size_t com_size;
    size_t com_space_size;
    size_t peppa_pig_text_size;
    char *com_ptr = NULL;
    char peppa_pig_buf[2048] = "Narrator:Itisrainingtoday.So,PeppaandGeorgecannotplayoutside.Peppa:Daddy,it'sstoppedraining.Canwegoouttoplay?Daddy:Alright,runalongyoutwo.Narrator:Peppalovesjumpinginmuddypuddles.Peppa:Ilovemuddypuddles.Mummy:Peppa.Ifyoujumpinginmuddypuddles,youmustwearyourboots.Peppa:Sorry,Mummy.Narrator:Georgelikestojumpinmuddypuddles,too.Peppa:George.Ifyoujumpinmuddypuddles,youmustwearyourboots.Narrator:Peppalikestolookafterherlittlebrother,George.Peppa:George,let'sfindsomemorepuddles.Narrator:PeppaandGeorgearehavingalotoffun.Peppahasfoundalttlepuddle.Georgehasfoundabigpuddle.Peppa:Look,George.There'sareallybigpuddle.Narrator:Georgewantstojumpintothebigpuddlefirst.Peppa:Stop,George.|mustcheckifit'ssafeforyou.Good.Itissafeforyou.Sorry,George.It'sonlymud.Narrator:PeppaandGeorgelovejumpinginmuddypuddles.Peppa:Comeon,George.Let'sgoandshowDaddy.Daddy:Goodnessme.Peppa:Daddy.Daddy.Guesswhatwe'vebeendoing.Daddy:Letmethink...Haveyoubeenwatchingtelevision?Peppa:No.No.Daddy.Daddy:Haveyoujusthadabath?Peppa:No.No.Daddy:|know.You'vebeenjumpinginmuddypuddles.Peppa:Yes.Yes.Daddy.We'vebeenjumpinginmuddypuddles.Daddy:Ho.Ho.Andlookatthemessyou'rein.Peppa:Oooh....Daddy:Oh,well,it'sonlymud.Let'scleanupquicklybeforeMummyseesthemess.Peppa:Daddy,whenwe'vecleanedup,willyouandMummyComeandplay,too?Daddy:Yes,wecanallplayinthegarden.Narrator:PeppaandGeorgearewearingtheirboots.MummyandDaddyarewearingtheirboots.Peppalovesjumpingupanddowninmuddypuddles.Everyonelovesjumpingupanddowninmuddypuddles.Mummy:Oh,Daddypig,lookatthemessyou'rein..Peppa:It'sonlymud.";
    timeval st, et;
    double elapsed = 0;
    peppa_pig_text_size = strlen(peppa_pig_buf);
    com_space_size = ZSTD_compressBound(peppa_pig_text_size);
    gettimeofday(&st, NULL);
    while (1) {
        // the buffer is allocated and freed on every iteration,
        // so that cost is part of the measurement
        com_ptr = (char *)malloc(com_space_size);
        if (NULL == com_ptr) {
            cout << "compress malloc failed" << endl;
            return -1;
        }
        com_size = ZSTD_compress(com_ptr, com_space_size, peppa_pig_buf, peppa_pig_text_size, 1);
        free(com_ptr);
        if (ZSTD_isError(com_size)) {
            cout << "compress failed: " << ZSTD_getErrorName(com_size) << endl;
            return -1;
        }
        cnt++;
        gettimeofday(&et, NULL);
        // elapsed wall-clock seconds, including the microsecond part
        elapsed = (et.tv_sec - st.tv_sec) + (et.tv_usec - st.tv_usec) / 1e6;
        if (elapsed >= 10) {
            break;
        }
    }
    cout << "compress per second:" << cnt / elapsed << " times" << endl;
    return 0;
}
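Execution result:
The result shows that ZSTD compresses this text roughly 60,000 to 70,000 times per second, which is not especially impressive. Note that compression performance also depends on the length and the content of the text being compressed.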
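The same "run for a fixed time, then divide" measurement can also be written with std::chrono::steady_clock, which is monotonic and has sub-second resolution. The sketch below is independent of zstd: calls_per_second is a hypothetical helper that takes any callable, so the call to ZSTD_compress would be wrapped in a lambda.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: call `work()` repeatedly for roughly `seconds` of
// wall-clock time and return the average number of calls per second.
template <typename Fn>
double calls_per_second(Fn work, double seconds) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    std::size_t calls = 0;
    double elapsed = 0.0;
    do {
        work();
        ++calls;
        // Measure the real elapsed time instead of assuming the target.
        elapsed = std::chrono::duration<double>(clock::now() - start).count();
    } while (elapsed < seconds);
    return elapsed > 0.0 ? static_cast<double>(calls) / elapsed : 0.0;
}
```

With the variables of the test program, the measurement would look like calls_per_second([&] { ZSTD_compress(com_ptr, com_space_size, peppa_pig_buf, peppa_pig_text_size, 1); }, 10.0), with com_ptr allocated once before the call.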
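Now let's look at ZSTD's decompression performance. The method is similar to the one above: compress the text once, then decompress the same compressed data repeatedly for 10 seconds and compute the average number of decompressions per second. The code for the decompression test is as follows: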
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <zstd.h>
#include <iostream>
using namespace std;

int main()
{
    int cnt = 0;
    size_t com_size;
    size_t com_space_size;
    size_t peppa_pig_text_size;
    timeval st, et;
    double elapsed = 0;
    char *com_ptr = NULL;
    char peppa_pig_buf[2048] = "Narrator:Itisrainingtoday.So,PeppaandGeorgecannotplayoutside.Peppa:Daddy,it'sstoppedraining.Canwegoouttoplay?Daddy:Alright,runalongyoutwo.Narrator:Peppalovesjumpinginmuddypuddles.Peppa:Ilovemuddypuddles.Mummy:Peppa.Ifyoujumpinginmuddypuddles,youmustwearyourboots.Peppa:Sorry,Mummy.Narrator:Georgelikestojumpinmuddypuddles,too.Peppa:George.Ifyoujumpinmuddypuddles,youmustwearyourboots.Narrator:Peppalikestolookafterherlittlebrother,George.Peppa:George,let'sfindsomemorepuddles.Narrator:PeppaandGeorgearehavingalotoffun.Peppahasfoundalttlepuddle.Georgehasfoundabigpuddle.Peppa:Look,George.There'sareallybigpuddle.Narrator:Georgewantstojumpintothebigpuddlefirst.Peppa:Stop,George.|mustcheckifit'ssafeforyou.Good.Itissafeforyou.Sorry,George.It'sonlymud.Narrator:PeppaandGeorgelovejumpinginmuddypuddles.Peppa:Comeon,George.Let'sgoandshowDaddy.Daddy:Goodnessme.Peppa:Daddy.Daddy.Guesswhatwe'vebeendoing.Daddy:Letmethink...Haveyoubeenwatchingtelevision?Peppa:No.No.Daddy.Daddy:Haveyoujusthadabath?Peppa:No.No.Daddy:|know.You'vebeenjumpinginmuddypuddles.Peppa:Yes.Yes.Daddy.We'vebeenjumpinginmuddypuddles.Daddy:Ho.Ho.Andlookatthemessyou'rein.Peppa:Oooh....Daddy:Oh,well,it'sonlymud.Let'scleanupquicklybeforeMummyseesthemess.Peppa:Daddy,whenwe'vecleanedup,willyouandMummyComeandplay,too?Daddy:Yes,wecanallplayinthegarden.Narrator:PeppaandGeorgearewearingtheirboots.MummyandDaddyarewearingtheirboots.Peppalovesjumpingupanddowninmuddypuddles.Everyonelovesjumpingupanddowninmuddypuddles.Mummy:Oh,Daddypig,lookatthemessyou'rein..Peppa:It'sonlymud.";
    size_t decom_size;
    char *decom_ptr = NULL;
    unsigned long long decom_buf_size;
    peppa_pig_text_size = strlen(peppa_pig_buf);
    com_space_size = ZSTD_compressBound(peppa_pig_text_size);
    com_ptr = (char *)malloc(com_space_size);
    if (NULL == com_ptr) {
        cout << "compress malloc failed" << endl;
        return -1;
    }
    // compress once; only decompression is timed below
    com_size = ZSTD_compress(com_ptr, com_space_size, peppa_pig_buf, peppa_pig_text_size, 1);
    if (ZSTD_isError(com_size)) {
        cout << "compress failed: " << ZSTD_getErrorName(com_size) << endl;
        free(com_ptr);
        return -1;
    }
    decom_buf_size = ZSTD_getFrameContentSize(com_ptr, com_size);
    gettimeofday(&st, NULL);
    while (1) {
        // allocated and freed on every iteration, so that cost is measured too
        decom_ptr = (char *)malloc((size_t)decom_buf_size);
        if (NULL == decom_ptr) {
            cout << "decompress malloc failed" << endl;
            break;
        }
        decom_size = ZSTD_decompress(decom_ptr, decom_buf_size, com_ptr, com_size);
        free(decom_ptr);
        if (decom_size != peppa_pig_text_size) {
            cout << "decompress error" << endl;
            break;
        }
        cnt++;
        gettimeofday(&et, NULL);
        // elapsed wall-clock seconds, including the microsecond part
        elapsed = (et.tv_sec - st.tv_sec) + (et.tv_usec - st.tv_usec) / 1e6;
        if (elapsed >= 10) {
            break;
        }
    }
    if (elapsed > 0) {
        cout << "decompress per second:" << cnt / elapsed << " times" << endl;
    }
    free(com_ptr);
    return 0;
}
Execution result:
The result shows that ZSTD decompresses this text roughly 120,000 times per second, so decompression is faster than compression.
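3. Advanced usage of zstd
zstd provides a compression and decompression tool named PZSTD. PZSTD (parallel zstd) is a command-line tool that uses multiple threads to split the input into chunks and compress the chunks in parallel.
Newer versions of zstd (v1.4.0 and above) also expose API functions for multi-threaded parallel compression. These belong to the advanced API of zstd, which is the subject of this section. Let's look at how to use zstd's multi-threaded compression.
Multi-threaded compression relies on two key API functions: one sets parameters and the other compresses.
The prototype of the parameter-setting function is:
size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, int value);
The prototype of the compression function is:
size_t ZSTD_compress2(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize);
The demo below shows parallel compression with zstd. It uses ZSTD_CCtx_setParameter to set the number of threads to 3, that is, it sets the ZSTD_c_nbWorkers parameter to 3, and then compresses the text with ZSTD_compress2. To make it visible that zstd really uses several threads, the demo first reads a very large file as the input, so that zstd runs for a reasonably long time.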
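#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>
#include <iostream>
using namespace std;

int main()
{
    size_t com_size;
    size_t com_space_size;
    FILE *fp = NULL;
    long file_len;
    char *com_ptr = NULL;
    char *file_text_ptr = NULL;
    // binary mode, so the bytes are read unchanged on every platform
    fp = fopen("xxxxxx", "rb");
    if (NULL == fp) {
        cout << "file open failed" << endl;
        return -1;
    }
    fseek(fp, 0, SEEK_END);
    file_len = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    if (file_len <= 0) {
        cout << "file is empty or its length cannot be read" << endl;
        fclose(fp);
        return -1;
    }
    cout << "file length:" << file_len << endl;
    // malloc space for file content
    file_text_ptr = (char *)malloc(file_len);
    if (NULL == file_text_ptr) {
        cout << "malloc failed" << endl;
        fclose(fp);
        return -1;
    }
    // malloc space for compress space
    com_space_size = ZSTD_compressBound(file_len);
    com_ptr = (char *)malloc(com_space_size);
    if (NULL == com_ptr) {
        cout << "malloc failed" << endl;
        free(file_text_ptr);
        fclose(fp);
        return -1;
    }
    // read text from source file
    size_t read_len = fread(file_text_ptr, 1, file_len, fp);
    fclose(fp);
    if (read_len != (size_t)file_len) {
        cout << "file read failed" << endl;
        free(com_ptr);
        free(file_text_ptr);
        return -1;
    }
    ZSTD_CCtx *cctx;
    cctx = ZSTD_createCCtx();
    if (NULL == cctx) {
        cout << "create cctx failed" << endl;
        free(com_ptr);
        free(file_text_ptr);
        return -1;
    }
    // set multi-thread parameter; this fails if the library was built without multi-threading
    if (ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 3))) {
        cout << "multi-threading is not available in this zstd build" << endl;
    }
    // compression level 6
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 6);
    com_size = ZSTD_compress2(cctx, com_ptr, com_space_size, file_text_ptr, file_len);
    if (ZSTD_isError(com_size)) {
        cout << "compress failed: " << ZSTD_getErrorName(com_size) << endl;
    }
    ZSTD_freeCCtx(cctx);
    free(com_ptr);
    free(file_text_ptr);
    return 0;
}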
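Running the demo above shows that zstd does start 3 threads and compresses the text in parallel. The more threads are configured, the shorter the compression time; this is not shown in detail here, and readers can try it for themselves.
Note that, at the time of writing, zstd built a single-threaded library by default. To call the multi-threaded API, the library has to be built with the ZSTD_MULTITHREAD macro specified at make time.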
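In addition, zstd supports a thread pool. The prototype of the thread pool function is:
ZSTD_threadPool* ZSTD_createThreadPool(size_t numThreads);
In scenarios with many consecutive compressions, a thread pool avoids the unnecessary overhead of repeatedly creating and destroying threads, so that the computing power is spent mainly on compressing the text.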