HOG (Histogram of Oriented Gradients)
Based on the papers I read this week, here is my understanding of the Histogram of Oriented Gradients (HOG) method I have been studying:
HOG descriptors are feature descriptors used in computer vision and image processing for object detection. The technique counts occurrences of gradient orientations in localized portions of an image. It has much in common with edge orientation histograms, scale-invariant feature transform (SIFT) descriptors and shape contexts, but differs from them in that HOG descriptors are computed on a dense grid of uniformly spaced cells and use overlapping local contrast normalization to improve performance.
The authors, Navneet Dalal and Bill Triggs, were researchers at the French National Institute for Research in Computer Science and Control (INRIA), and this paper, published at CVPR 2005, is where they first proposed the HOG method. They applied it mainly to pedestrian detection in static images, and later extended it to pedestrian detection in film and video, as well as to detecting vehicles and common animals in static images.
The key idea behind the HOG descriptor is that the local appearance and shape of an object in an image can be well described by the distribution of gradient or edge directions. Concretely: first divide the image into small connected regions, called cells; then accumulate a histogram of gradient (or edge) orientations over the pixels of each cell; finally, concatenate these histograms to form the feature descriptor. To improve performance, the local histograms can additionally be contrast-normalized over larger regions of the image, called blocks: a measure of the histogram "energy" is computed over the block and used to normalize all cells within it. This normalization gives better invariance to illumination changes and shadowing.
Compared with other feature descriptors, HOG has several advantages. First, because it operates on local cells, it stays largely invariant to geometric and photometric deformations, which only show up over larger spatial regions. Second, the authors found experimentally that with coarse spatial sampling, fine orientation sampling and strong local photometric normalization, small limb movements of pedestrians can be ignored without hurting detection, as long as the person stays roughly upright. For these reasons HOG is particularly well suited to pedestrian detection in images.
[Figure from the paper: pedestrian-detection visualizations, described below]
The figure above is from the authors' pedestrian-detection experiment: (a) the average gradient over all training images; (b) and (c) the maximum positive and negative SVM weights, respectively, for each block of the image; (d) a test image; (e) the test image's computed R-HOG descriptor; (f) and (g) the R-HOG weighted by the positive and negative SVM weights, respectively.
Implementation of the algorithm:
Color and gamma normalization
The authors tried color and gamma normalization in grayscale, RGB and LAB color spaces, but the experiments showed that this preprocessing has no effect on the final results, probably because the normalization performed in later steps already covers what this step would achieve. In practice, this step can therefore be skipped.
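For illustration, a minimal sketch of one such preprocessing step, square-root gamma compression (the idea behind the rgb_sqrt option that appears in the OLT commands later in these notes); the function name is my own and this is not the authors' exact pipeline:

#include "cv.h"
using namespace cv;

// Sketch: square-root gamma compression of an 8-bit BGR image.
Mat sqrtGamma(const Mat& bgr8u)
{
    Mat f;
    bgr8u.convertTo(f, CV_32F, 1.0 / 255.0); // scale to [0, 1]
    cv::sqrt(f, f);                          // compress the dynamic range
    f.convertTo(f, CV_8U, 255.0);            // back to 8-bit
    return f;
}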
Gradient computation
The most common method is simply to apply a 1-D centered, point discrete derivative mask in one direction, or in both the horizontal and vertical directions. More precisely, the color or intensity data of the image are filtered with the following kernels:
[-1, 0, 1] and its transpose [-1, 0, 1]^T
The authors also tried more complex masks, such as a 3×3 Sobel mask and diagonal masks, but these performed worse in the pedestrian-detection experiments, so their conclusion was that the simpler the mask, the better the result. They also tried applying Gaussian smoothing before the derivative mask, but adding the smoothing made detection worse; the reason is that much of the useful image information comes from sharply varying edges, and smoothing before computing the gradients filters those edges away.
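A minimal sketch of this gradient step (my own helper, assuming a single-channel CV_32F input; filter with the [-1, 0, 1] kernels, then compute per-pixel magnitude and unsigned orientation):

#include "cv.h"
using namespace cv;

// Sketch: centered [-1, 0, 1] derivative masks, then per-pixel gradient
// magnitude and unsigned orientation folded into [0, 180) degrees.
void computeGradient(const Mat& gray32f, Mat& mag, Mat& angleDeg)
{
    Mat kx = (Mat_<float>(1, 3) << -1, 0, 1); // horizontal mask
    Mat ky = (Mat_<float>(3, 1) << -1, 0, 1); // vertical mask
    Mat gx, gy;
    filter2D(gray32f, gx, CV_32F, kx);
    filter2D(gray32f, gy, CV_32F, ky);
    magnitude(gx, gy, mag);                   // sqrt(gx^2 + gy^2)
    phase(gx, gy, angleDeg, true);            // angle in degrees, [0, 360)
    for (int y = 0; y < angleDeg.rows; ++y)   // fold signed angles to [0, 180)
        for (int x = 0; x < angleDeg.cols; ++x)
            if (angleDeg.at<float>(y, x) >= 180.f)
                angleDeg.at<float>(y, x) -= 180.f;
}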
Building the orientation histograms (creating the orientation histograms)
The third step is to build a gradient orientation histogram for every cell. Each pixel in the cell casts a weighted vote for an orientation-based histogram channel. The vote is weighted by the pixel's gradient magnitude: either the magnitude itself or some function of it can be used. Testing showed that using the magnitude itself gives the best results; alternatives include its square root, its square, and a clipped version of the magnitude. Cells can be rectangular or radial. The histogram channels are spread evenly over 0-180° (unsigned gradients) or 0-360° (signed gradients). The authors found that unsigned gradients with 9 histogram channels give the best results in the pedestrian-detection experiments.
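A simplified sketch of this voting step (my own helper: 9 unsigned bins over 0-180°, magnitude-weighted hard assignment to the nearest bin; the paper actually interpolates each vote, see the excerpt from section 4.3.3 further below):

#include <vector>

// Sketch: magnitude-weighted orientation histogram of one cell.
// mag and angleDeg are the outputs of the gradient step above
// (angleDeg already folded into [0, 180)); cellRect selects the cell.
std::vector<float> cellHistogram(const cv::Mat& mag, const cv::Mat& angleDeg,
                                 cv::Rect cellRect, int nbins = 9)
{
    std::vector<float> hist(nbins, 0.f);
    float binWidth = 180.f / nbins;
    for (int y = cellRect.y; y < cellRect.y + cellRect.height; ++y)
        for (int x = cellRect.x; x < cellRect.x + cellRect.width; ++x)
        {
            int bin = (int)(angleDeg.at<float>(y, x) / binWidth) % nbins;
            hist[bin] += mag.at<float>(y, x);  // vote weighted by magnitude
        }
    return hist;
}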
Grouping the cells together into larger blocks
Because of local variations in illumination and in foreground-background contrast, gradient strengths vary over a wide range, so they must be locally normalized. The authors' approach is to group the cells into larger, spatially connected blocks. The HOG descriptor then becomes the vector formed by the histogram components of all cells in all blocks. The blocks overlap, which means each cell contributes to the final descriptor more than once. Blocks come in two main geometries: rectangular blocks (R-HOG) and circular blocks (C-HOG). An R-HOG block is roughly a square grid and can be characterized by three parameters: the number of cells per block, the number of pixels per cell, and the number of histogram channels per cell. The authors' experiments show that the best settings for pedestrian detection are 3×3 cells per block, 6×6 pixels per cell, and 9 histogram channels. They also found that applying a Gaussian spatial window to each block before accumulating the histograms is essential, because it reduces the weight of the pixels around the edge of the block.
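As a rough sketch of how a block descriptor is assembled from its cell histograms, building on the cellHistogram helper above; here a 2×2-cell block of 8×8-pixel cells, the configuration of the default detector quoted later (the paper's best pedestrian setting was 3×3 cells of 6×6 pixels), and the Gaussian spatial window is omitted for brevity:

// Sketch: concatenate the 4 cell histograms of a 2x2-cell R-HOG block
// (16x16 pixels, 8x8-pixel cells, 9 bins -> 36 values, still unnormalized).
std::vector<float> blockDescriptor(const cv::Mat& mag, const cv::Mat& angleDeg,
                                   cv::Point blockTopLeft)
{
    std::vector<float> block;
    for (int cy = 0; cy < 2; ++cy)
        for (int cx = 0; cx < 2; ++cx)
        {
            cv::Rect cell(blockTopLeft.x + cx * 8, blockTopLeft.y + cy * 8, 8, 8);
            std::vector<float> h = cellHistogram(mag, angleDeg, cell);
            block.insert(block.end(), h.begin(), h.end());
        }
    return block; // normalized in the next step
}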
R-HOG looks quite similar to the SIFT descriptor, but they differ: R-HOG blocks are computed in dense grids at a single scale without orientation alignment, whereas SIFT descriptors are computed at sparse, scale-invariant key image points and are rotated to align their orientation. In addition, R-HOG blocks are used in conjunction to encode spatial form information, whereas SIFT descriptors are used singly.
C-HOG blocks come in two variants, which differ in whether the central cell is kept whole or divided, as shown in the paper's figure.
The authors found that both C-HOG variants achieve the same performance. A C-HOG block can be characterized by four parameters: the number of angular bins, the number of radial bins, the radius of the central bin, and the expansion factor for the radius. Experimentally, the best settings for pedestrian detection are 4 angular bins, 2 radial bins, a central-bin radius of 4 pixels, and an expansion factor of 2. As mentioned above, a Gaussian spatial window is essential for R-HOG, but for C-HOG it appears unnecessary. C-HOG looks a lot like the shape-context method, but differs in that the cells of a C-HOG block contain multiple orientation channels, whereas shape contexts use only a single edge presence count.
Block normalization
The authors tried four different block normalization schemes and compared the results. Let v be the unnormalized vector containing all the histograms of a given block, ||v||_k its k-norm for k = 1, 2, and e a small constant. The normalization factors can then be expressed as follows:
L2-norm: f = v / sqrt(||v||_2^2 + e^2)
L1-norm: f = v / (||v||_1 + e)
L1-sqrt: f = sqrt(v / (||v||_1 + e))
There is also a fourth scheme, L2-Hys, obtained by first applying L2-norm, clipping the result, and then renormalizing. The authors found that L2-Hys, L2-norm and L1-sqrt perform equally well, while L1-norm is slightly less reliable; all four, however, bring a marked improvement over unnormalized data.
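A sketch of these schemes applied to one block's histogram vector (my own helpers; the 0.2 clipping value for L2-Hys matches the L2HysThreshold default of OpenCV's HOGDescriptor listed further below):

#include <vector>
#include <cmath>

// Sketch: block normalization schemes, applied in place to one block's
// histogram vector v; eps plays the role of the small constant e above.
void l2norm(std::vector<float>& v, float eps = 1e-3f)
{
    float sum = 0.f;
    for (size_t i = 0; i < v.size(); ++i) sum += v[i] * v[i];
    float scale = 1.f / std::sqrt(sum + eps * eps);
    for (size_t i = 0; i < v.size(); ++i) v[i] *= scale;
}

void l1sqrt(std::vector<float>& v, float eps = 1e-3f)
{
    float sum = 0.f;
    for (size_t i = 0; i < v.size(); ++i) sum += std::fabs(v[i]);
    for (size_t i = 0; i < v.size(); ++i) v[i] = std::sqrt(v[i] / (sum + eps));
}

// L2-Hys: L2-normalize, clip every component at 0.2, then renormalize.
void l2hys(std::vector<float>& v, float clip = 0.2f)
{
    l2norm(v);
    for (size_t i = 0; i < v.size(); ++i) if (v[i] > clip) v[i] = clip;
    l2norm(v);
}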
SVM classifier
The final step feeds the extracted HOG features into an SVM classifier, which finds an optimal separating hyperplane as the decision function. The authors used the free SVMLight package to train the classifier, and then used it together with the HOG features to find pedestrians in test images.
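For a linear SVM the trained classifier boils down to a weight vector plus a bias, so classifying one detection window is just a dot product over its HOG descriptor. A sketch (the weights would come from training, for example the 3,780 coefficients of OpenCV's default people detector discussed below):

#include <vector>

// Sketch: scoring one 64x128 window with a trained linear SVM.
// descriptor: the window's HOG vector; weights: learned SVM weights;
// bias: learned offset.
float svmScore(const std::vector<float>& descriptor,
               const std::vector<float>& weights, float bias)
{
    float s = bias;
    for (size_t i = 0; i < descriptor.size(); ++i)
        s += descriptor[i] * weights[i];
    return s; // s above the chosen threshold (often 0) => pedestrian
}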
Reposted from http://hi.baidu.com/ykaitao_handsome/blog/item/d7a2c3156e368a0a4b90a745.html
From the CSDN blog; please cite the source when reposting: http://blog.csdn.net/forsiny/archive/2010/03/22/5404268.aspx
(Repost) Notes on peopledetect, from the OpenCV Chinese forum. OpenCV 2.0 ships a pedestrian-detection sample based on the method that Navneet Dalal first proposed at CVPR 2005.
I have been studying it recently; below are my notes, shared in the hope of discussing and improving them together.
1. Installing OpenCV 2.0 under VC 2008 Express -- you can use 2.1 directly, which avoids building with CMake and the build errors that come with it.
This is the foundation for everything that follows; thanks to the forum moderator for the reference: http://www.opencv.org.cn/index.php/VC_2008_Express下安装OpenCV2.0
2. Getting a feel for the program
At the DOS prompt, change to C:\OpenCV2.0\samples\c and run: peopledetect.exe filename.jpg
where filename.jpg is the name of the image file to be detected.
3. Building the program
Create a console project, add peopledetect.cpp from C:\OpenCV2.0\samples\c to the project, and configure it as in step 1. It builds successfully, but the EXE built in DEBUG mode crashes at runtime, which is odd.
After switching to RELEASE mode and rebuilding, the generated EXE runs fine.
4. Brief walkthrough of the code
1) getDefaultPeopleDetector() returns the 3,780-dimensional detector (105 blocks with 4 histograms each and 9 bins per histogram gives 3,780 values). Why 105 blocks? With a 64×128 window, 16×16 blocks and an 8-pixel block stride: ((64-16)/8+1) × ((128-16)/8+1) = 7 × 15 = 105.
2) cv::HOGDescriptor hog; creates the object; its member variables are initialized as:
winSize(64,128), blockSize(16,16), blockStride(8,8),
cellSize(8,8), nbins(9), derivAperture(1), winSigma(-1),
histogramNormType(L2Hys), L2HysThreshold(0.2), gammaCorrection(true)
3) Call the function: detectMultiScale(img, found, 0, cv::Size(8,8), cv::Size(24,16), 1.05, 2);
The parameters are, in order: the input image, the output list of detections, the hit threshold hitThreshold, the window stride winStride, the image padding margin, the scale factor, and the grouping threshold groupThreshold. Tweaking the parameters on one particular image, I found: changing 0 to 0.01 misses the detection while 0.001 still works; changing 1.05 to 1.1 fails while 1.06 works; changing 2 to 1 works, but 0.8 or below does not; (24,16) can be changed to (0,0), and (32,32) also works.
The function works as follows:
(1) Compute the number of pyramid levels (levels).
For a 530×402 image, for example, log(402/128)/log(1.05) = 23.4, so there are 24 levels.
(2) Loop levels times; each iteration runs:
HOGThreadData& tdata = threadData[getThreadNum()];
Mat smallerImg(sz, img.type(), tdata.smallerImgBuf.data);
and then calls the core function:
detect(smallerImg, tdata.locations, hitThreshold, winStride, padding);
Its parameters are the image at this scale, the output list of detections, the hit threshold, the stride, and the padding margin.
This function works as follows:
(a) Compute the padded image size paddedImgSize.
(b) Create a HOGCache object: HOGCache cache(this, img, padding, padding, nwindows == 0, cacheStride); its constructor first calls HOGCache::init, which computes the gradients (descriptor->computeGradient), the number of blocks (105), and the number of values per block (36).
(c) Compute the number of windows nwindows. At the first level, for example, it is ((530+32*2-64)/8+1) × ((402+32*2-128)/8+1) = 67 × 43 = 2881, where (32,32) is the padding parameter ((24,16) can also be used) and 8 is the window stride.
(d) Loop over every window; for each window:
loop over the 105 blocks; for each block, compute and normalize the HOG features via the getBlock function, and multiply the 36 values against the corresponding values of the detector; if the sum s over all 105 blocks satisfies s >= hitThreshold, the window is considered a detection (see the sketch below).
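In other words, the per-window test is the same linear scoring sketched earlier, accumulated block by block; roughly (the names are mine, not OpenCV internals):

#include <vector>

// Sketch of the per-window decision: accumulate the dot product of each of
// the 105 normalized 36-value blocks with the matching slice of the 3,780
// detector weights, then compare against hitThreshold.
bool windowIsPerson(const std::vector<std::vector<float> >& blocks, // 105 x 36
                    const std::vector<float>& detector,             // 3780 weights (+ optional bias)
                    float hitThreshold)
{
    float s = detector.size() > 3780 ? detector[3780] : 0.f; // bias, if supplied
    size_t k = 0;
    for (size_t b = 0; b < blocks.size(); ++b)
        for (size_t j = 0; j < blocks[b].size(); ++j, ++k)
            s += blocks[b][j] * detector[k];
    return s >= hitThreshold;
}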
4) That seems to cover the main flow, but many details still need to be worked out.
5. The algorithm flow as described in the original thesis
Figure 5.5 on page 78 of NavneetDalalThesis.pdf describes the complete object detection algorithm.
The first two steps are initialization, which was mostly covered above. The last two steps are as follows:
For each scale Si = [Ss, SsSr, . . . , Sn]
(a) Rescale the input image using bilinear interpolation
(b) Extract features (Fig. 4.12) and densely scan the scaled image with stride Ns for object/non-object detections
(c) Push all detections with t(wi) > c to a list
Non-maximum suppression
(a) Represent each detection in 3-D position and scale space yi
(b) Using (5.9), compute the uncertainty matrices Hi for each point
(c) Compute the mean shift vector (5.7) iteratively for each point in the list until it converges to a mode
(d) The list of all of the modes gives the final fused detections
(e) For each mode compute the bounding box from the final centre point and scale
The following is excerpted from NavneetDalalThesis.pdf, with the important parts picked out. The original section numbers are kept so the passages are easy to look up.
4. Histogram of Oriented Gradients Based Encoding of Images
Default Detector.
As a yardstick for the purpose of comparison, throughout this section we compare results to our default detector which has the following properties: input image in RGB colour space (without any gamma correction); image gradient computed by applying [-1, 0, 1] filter along x- and y-axis with no smoothing; linear gradient voting into 9 orientation bins in 0°–180°; 16×16 pixel blocks containing 2×2 cells of 8×8 pixel; Gaussian block windowing with σ = 8 pixel; L2-Hys (Lowe-style clipped L2 norm) block normalisation; blocks spaced with a stride of 8 pixels (hence 4-fold coverage of each cell); 64×128 detection window; and linear SVM classifier. We often quote the performance at 10^-4 false positives per window (FPPW) – the maximum false positive rate that we consider to be useful for a real detector given that 10^3–10^4 windows are tested for each image.
4.3.2 Gradient Computation
The simple [-1, 0, 1] masks give the best performance.
4.3.3 Spatial / Orientation Binning
Each pixel contributes a weighted vote for orientation based on the orientation of the gradient element centred on it.
The votes are accumulated into orientation bins over local spatial regions that we call cells.
To reduce aliasing, votes are interpolated trilinearly between the neighbouring bin centres in both orientation and position.
Details of the trilinear interpolation voting procedure are presented in Appendix D.
The vote is a function of the gradient magnitude at the pixel, either the magnitude itself, its square, its
square root, or a clipped form of the magnitude representing soft presence/absence of an edge at the pixel. In practice, using the magnitude itself gives the best results.
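As a rough illustration of the orientation part of that interpolation (a simplified sketch of my own, linear in orientation only; the full trilinear scheme in Appendix D also interpolates over the x and y cell positions):

#include <vector>
#include <cmath>

// Sketch: split one pixel's vote linearly between the two nearest
// orientation bin centres (9 unsigned bins over 0-180 degrees).
void voteInterpolated(std::vector<float>& hist, float angleDeg, float magnitude)
{
    const int nbins = (int)hist.size();
    const float binWidth = 180.f / nbins;       // 20 degrees for 9 bins
    float pos = angleDeg / binWidth - 0.5f;     // position relative to bin centres
    int b0 = (int)std::floor(pos);
    float w1 = pos - b0;                        // weight for the upper bin
    int b1 = (b0 + 1 + nbins) % nbins;          // wrap around at 0/180
    b0 = (b0 + nbins) % nbins;
    hist[b0] += magnitude * (1.f - w1);
    hist[b1] += magnitude * w1;
}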
4.3.4 Block Normalisation Schemes and Descriptor Overlap
good normalisation is critical and including overlap significantly improves the performance.
Figure 4.4(d) shows that L2-Hys, L2-norm and L1-sqrt all perform equally well for the person detector.
For other object classes such as cars and motorbikes, L1-sqrt gives the best results.
4.3.5 Descriptor Blocks
R-HOG.
For human detection, 3×3 cell blocks of 6×6 pixel cells perform best with 10.4% miss-rate
at 10^-4 FPPW. Our standard 2×2 cell blocks of 8×8 pixel cells are a close second.
We find 2×2 and 3×3 cell blocks work best.
4.3.6 Detector Window and Context
Our 64×128 detection window includes about 16 pixels of margin around the person on all four
sides.
4.3.7 Classifier
By default we use a soft (C=0.01) linear SVM trained with SVMLight [Joachims 1999]. We modified
SVMLight to reduce memory usage for problems with large dense descriptor vectors.
---------------------------------
5. Multi-Scale Object Localisation
the detector scans the image with a detection window at all positions and scales, running the classifier in each window and fusing multiple overlapping detections to yield the final object detections.
We represent detections using kernel density estimation (KDE) in 3-D position and scale space. KDE is a data-driven process where continuous densities are evaluated by applying a smoothing kernel to observed data points. The bandwidth of the smoothing kernel defines the local neighbourhood. The detection scores are incorporated by weighting the observed detection points by their score values while computing the density estimate. Thus KDE naturally incorporates the first two criteria. The overlap criterion follows from the fact that detections at very different scales or positions are far off in 3-D position and scale space, and are thus not smoothed together. The modes (maxima) of the density estimate correspond to the positions and scales of final detections.
Let xi = [xi, yi] and s'i denote the detection position and scale, respectively, for the i-th detection.
The detections are represented in 3-D space as y = [x, y, s], where s = log(s').
the variable bandwidth mean shift vector is defined as (5.7)
For each of the n points the mean shift based iterative procedure is guaranteed to converge to a mode.
Detection Uncertainty Matrix Hi.
One key input to the above mode detection algorithm is the amount of uncertainty Hi to be associated with each point. We assume isosymmetric covariances, i.e. the Hi’s are diagonal matrices.
Let diag[H] represent the 3 diagonal elements of H. We use scale dependent covariance matrices such that
diag[Hi] = [(exp(si) σx)^2, (exp(si) σy)^2, (σs)^2]    (5.9)
where σx, σy and σs are user supplied smoothing values.
The term t(wi) provides the weight for each detection. For linear SVMs we usually use threshold = 0.
The smoothing parameters σx, σy and σs used in the non-maximum suppression stage can have a significant impact on performance, so proper evaluation is necessary. For all of the results here, unless otherwise noted, a scale ratio of 1.05, a stride of 8 pixels, and σx = 8, σy = 16, σs = log(1.3) are used as default values.
A scale ratio of 1.01 gives the best performance, but significantly slows the overall process.
Scale smoothing of log(1.3)–log(1.6) gives good performance for most object classes.
We group these mode candidates using a proximity measure. The final location is the mode corresponding to the highest density.
----------------------------------------------------
Appendix A. INRIA Static Person Data Set
The (centred and normalised) positive windows are supplied by the user, and the initial set of negatives is created once and for all by randomly sampling negative images. A preliminary classifier is thus trained using these. Second, the preliminary detector is used to exhaustively scan the negative training images for hard examples (false positives). The classifier is then re-trained using this augmented training set (user supplied positives, initial negatives and hard examples) to produce the final detector.
INRIA Static Person Data Set
As images of people are highly variable, to learn an effective classifier, the positive training examples need to be properly normalized and centered to minimize the variance among them. For this we manually annotated all upright people in the original images.
The image regions belonging to the annotations were cropped and rescaled to 64×128 pixel image windows. On average the subjects' height is 96 pixels in these normalised windows, to allow for an approximately 16 pixel margin on each side. In practice we leave a further 16 pixel margin around each side of the image window to ensure that flow and gradients can be computed without boundary effects. The margins were added by appropriately expanding the annotations on each side before cropping the image regions.
//<------------------------ The above is excerpted from Dalal's PhD thesis
For more about the INRIA Person Dataset, see the following link:
http://pascal.inrialpes.fr/data/human/
Original Images
Folders 'Train' and 'Test' correspond, respectively, to original training and test images. Both folders have three sub folders: (a) 'pos' (positive training or test images), (b) 'neg' (negative training or test images), and (c) 'annotations' (annotation files for positive images in Pascal Challenge format).
Normalized Images
Folders 'train_64x128_H96' and 'test_64x128_H96' correspond to normalized dataset as used in above referenced paper. Both folders have two sub folders: (a) 'pos' (normalized positive training or test images centered on the person with their left-right reflections), (b) 'neg' (containing original negative training or test images). Note images in folder 'train/pos' are of 96x160 pixels (a margin of 16 pixels around each side), and images in folder 'test/pos' are of 70x134 pixels (a margin of 3 pixels around each side). This has been done to avoid boundary conditions (thus to avoid any particular bias in the classifier). In both folders, use the centered 64x128 pixels window for original detection task.
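For reference, extracting that centered 64×128 detection window from a 96×160 normalized training image can be sketched as follows (the 16-pixel offset follows from (96-64)/2 = (160-128)/2 = 16):

#include "cv.h"

// Sketch: take the centered 64x128 window out of a 96x160 'train/pos' image.
cv::Mat centeredWindow(const cv::Mat& img96x160)
{
    return img96x160(cv::Rect(16, 16, 64, 128)).clone();
}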
Negative windows
To generate negative training windows from normalized images, a fixed set of 12180 windows (10 windows per negative image) are sampled randomly from 1218 negative training photos providing the initial negative training set. For each detector and parameter combination, a preliminary detector is trained and all negative training images are searched exhaustively (over a scale-space pyramid) for false positives (`hard examples'). All examples with score greater than zero are considered hard examples. The method is then re-trained using this augmented set (initial 12180 + hard examples) to produce the final detector. The set of hard examples is subsampled if necessary, so that the descriptors of the final training set fit into 1.7 GB of RAM for SVM training.
//------------------------------------------------------
The original author has updated the OpenCV 2.0 peopledetect sample twice:
https://code.ros.org/trac/opencv/changeset/2314/trunk
The most recent version is as follows:
---------------------
#include "cvaux.h"
#include "highgui.h"
#include <stdio.h>
#include <string.h>
#include <ctype.h>
using namespace cv;
using namespace std;
int main(int argc, char** argv)
{
Mat img;
FILE* f = 0;
char _filename[1024];
if( argc == 1 )
{
printf("Usage: peopledetect (<image_filename> | <image_list>.txt)\n");
return 0;
}
img = imread(argv[1]);
if( img.data )
{
strcpy(_filename, argv[1]);
}
else
{
f = fopen(argv[1], "rt");
if(!f)
{
fprintf( stderr, "ERROR: the specified file could not be loaded\n");
return -1;
}
}
HOGDescriptor hog;
hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector());
for(;;)
{
char* filename = _filename;
if(f)
{
if(!fgets(filename, (int)sizeof(_filename)-2, f))
break;
//while(*filename && isspace(*filename))
// ++filename;
if(filename[0] == '#')
continue;
int l = strlen(filename);
while(l > 0 && isspace(filename[l-1]))
--l;
filename[l] = '\0';
img = imread(filename);
}
printf("%s:\n", filename);
if(!img.data)
continue;
fflush(stdout);
vector<Rect> found, found_filtered;
double t = (double)getTickCount();
// run the detector with default parameters. to get a higher hit-rate
// (and more false alarms, respectively), decrease the hitThreshold and
// groupThreshold (set groupThreshold to 0 to turn off the grouping completely).
int can = img.channels();
hog.detectMultiScale(img, found, 0, Size(8,8), Size(32,32), 1.05, 2);
t = (double)getTickCount() - t;
printf("tdetection time = %gms\n", t*1000./cv::getTickFrequency());
size_t i, j;
for( i = 0; i < found.size(); i++ )
{
Rect r = found[i];
for( j = 0; j < found.size(); j++ )
if( j != i && (r & found[j]) == r)
break;
if( j == found.size() )
found_filtered.push_back(r);
}
for( i = 0; i < found_filtered.size(); i++ )
{
Rect r = found_filtered[i];
// the HOG detector returns slightly larger rectangles than the real objects.
// so we slightly shrink the rectangles to get a nicer output.
r.x += cvRound(r.width*0.1);
r.width = cvRound(r.width*0.8);
r.y += cvRound(r.height*0.07);
r.height = cvRound(r.height*0.8);
rectangle(img, r.tl(), r.br(), cv::Scalar(0,255,0), 3);
}
imshow("people detector", img);
int c = waitKey(0) & 255;
if( c == 'q' || c == 'Q' || !f)
break;
}
if(f)
fclose(f);
return 0;
}
After the update, images can be detected in batches.
Put the images you want to process into a text file, e.g. filename.txt, with contents like:
1.jpg
2.jpg
......
Then at the DOS prompt run peopledetect filename.txt and every image in the list is detected automatically.
//------------------------------ Description of Navneet Dalal's OLT workflow
Navneet Dalal provides the INRIA Object Detection and Localization Toolkit at
http://pascal.inrialpes.fr/soft/olt/
Wilson Suryajaya Leoputra provides a Windows port of it:
http://www.computing.edu.au/~12482661/hog.html
You need to copy all the dll's (boost_1.34.1*.dll, blitz_0.9.dll, opencv*.dll) into "<ROOT_PROJECT_DIR>/debug/"
Navneet Dalal provides Linux executables; I borrowed someone's Linux machine to run them first and get a picture of the overall workflow.
The workflow below is described based on the two files OLTbinaries\readme and OLTbinaries\HOG\record.
1. Download the INRIA person detection database and unpack it into OLTbinaries\; rename 'train_64x128_H96' to 'train' and 'test_64x128_H96' to 'test'.
2. Under Linux, run the 'runall.sh' script.
Once the results are ready, open MATLAB and run plotdet.m to plot the DET curve.
------ That is the all-in-one approach --------------------------------------------------
------- The toolkit also supports a step-by-step approach -------------------------------------
1. Compute the positive-sample R-HOG features from the images listed in pos.lst; the pos.lst format is as follows:
train/pos/crop_000010a.png
train/pos/crop_000010b.png
train/pos/crop_000011a.png
------ The lines below are the Linux commands to run (same below) ------
./bin//dump_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 -- epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -s 1 train/pos.lst HOG/train_pos.RHOG
2. Compute the negative-sample R-HOG features
./bin//dump_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 -- epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -s 10 train/neg.lst HOG/train_neg.RHOG
3. Training
./bin//dump4svmlearn -p HOG/train_pos.RHOG -n HOG/train_neg.RHOG HOG/train_BiSVMLight.blt -v
4. Create the model file: HOG/model_4BiSVMLight.alt
./bin//svm_learn -j 3 -B 1 -z c -v 1 -t 0 HOG/train_BiSVMLight.blt HOG/model_4BiSVMLight.alt
5. Create a directory
mkdir -p HOG/hard
6. Classification
./bin//classify_rhog train/neg.lst HOG/hard/list.txt HOG/model_4BiSVMLight.alt -d HOG/hard/hard_neg.txt -c HOG/hard/hist.txt -m 0 -t 0 --no_nonmax 1 --avsize 0 --margin 0 --scaleratio 1.2 -l N -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --
epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys
--------
The false +/- classification results are written to HOG/hard/hard_neg.txt
7. Add the hard examples to the negatives and recompute the R-HOG features
./bin//dump_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 -- epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -s 0 HOG/hard/hard_neg.txt OG/train_hard_neg.RHOG --poscases 2416 --negcases 12180 --dumphard 1 --hardscore 0 -- memorylimit 1700
8. Retrain
./bin//dump4svmlearn -p HOG/train_pos.RHOG -n HOG/train_neg.RHOG -n HOG/train_hard_neg.RHOG HOG/train_BiSVMLight.blt -v 4
9. Obtain the final model
./bin//svm_learn -j 3 -B 1 -z c -v 1 -t 0 HOG/train_BiSVMLight.blt HOG/model_4BiSVMLight.alt
The 3,780 values used in OpenCV should be contained in this model file, model_4BiSVMLight.alt. Its format is undocumented, so it cannot be read directly, but one could study how the svm_learn program generates it; the model is also read by classify_rhog, so studying how that program parses it is probably the way to decode the format.
10. Create a directory
mkdir -p HOG/WindowTest_Negative
11. Detection results on the negative samples
./bin//classify_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -p 1 --no_nonmax 1 --nopyramid 0 - -scaleratio 1.2 -t 0 -m 0 --avsize 0 --margin 0 test/neg.lst HOG/WindowTest_Negative/list.txt HOG/model_4BiSVMLight.alt -c HOG/WindowTest_Negative/histogram.txt
12. Create a directory
mkdir -p HOG/WindowTest_Positive
13. Detection results on the positive samples
./bin//classify_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 -- epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -p 1 --no_nonmax 1 --nopyramid 1 -t 0 -m 0 --avsize 0 --margin 0 test/pos.lst HOG/WindowTest_Positive/list.txt HOG/model_4BiSVMLight.alt -c HOG/WindowTest_Positive/histogram.txt
How to make training samples
Based on an analysis of the original author's dataset and some material found online, here is how to make training samples.
1. How to generate samples from the original images
Comparing INRIAPerson\INRIAPerson\Train\pos (original images) with INRIAPerson\train_64x128_H96\pos (generated samples), you can see that the author cropped standing, unoccluded people out of the original images and then left-right reflected each crop. Take the first image, crop001001, as an example: two unoccluded people are cropped out, which together with the original photo gives 3 images, and with the left-right mirrors the total is 6.
2. Cropping
You can use imageclipper, a program based on OpenCV 1.0, to crop and save the regions; it automatically generates the file names and saves them into a newly created imageclipper directory under the same path.
3. Resizing the images
You can use ACDSee: Tools / Open in editor, then the Resize option; Tools / Rotate can also do the left-right reflection (a cv::flip sketch is given after this list).
I also wrote a small program to batch-resize images; the code is given below.
4. Making the pos.lst list
At the DOS prompt, change to the directory containing the images and run dir /b > pos.lst to generate the file list.
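Before the batch-resize program below, here is a small sketch of the left-right reflection step mentioned above, using cv::flip (my own helper, not part of the program that follows):

#include "cv.h"

// Sketch: left-right reflection of a cropped sample; flipCode = 1 flips
// the image around the vertical axis.
cv::Mat mirrorLR(const cv::Mat& sample)
{
    cv::Mat flipped;
    cv::flip(sample, flipped, 1);
    return flipped;
}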
#include "cv.h"
#include "highgui.h"
#include "cvaux.h"
int main(int argc,char * argv[])
{
IplImage* src ;
IplImage* dst = 0;
CvSize dst_size;
FILE* f = 0;
char _filename[1024];
int l;
f = fopen(argv[1], "rt");
if(!f)
{
fprintf( stderr, "ERROR: the specified file could not be loaded\n");
return -1;
}
for(;;)
{
char* filename = _filename;
if(f)
{
if(!fgets(filename, (int)sizeof(_filename)-2, f))
break;
if(filename[0] == '#')
continue;
l = strlen(filename);
while(l > 0 && isspace(filename[l-1]))
--l;
filename[l] = '\0';
src=cvLoadImage(filename,1);
}
dst_size.width = 96;
dst_size.height = 160;
dst=cvCreateImage(dst_size,src->depth,src->nChannels);
cvResize(src,dst,CV_INTER_LINEAR);//
char* filename2 = _filename;char* filename3 = _filename; filename3="_96x160.jpg";
strncat(filename2, filename,l-4);
strcat(filename2, filename3);
cvSaveImage(filename2, dst);
}
if(f)
fclose(f);
cvWaitKey(-1);
cvReleaseImage( &src );
cvReleaseImage( &dst );
return 0;
}