CV Algorithms: K-means, HAC, Mean Shift Clustering
At a high level, we can specify Mean Shift as follows:
Fix a window around each data point.
Compute the mean of data within the window.
Shift the window to the mean and repeat till convergence.
K-means
Group many colors into just a few representative ones.
Randomly place a few cluster centers, compute each point's distance to every center, assign the point to the nearest center, recompute each center as the mean of the points assigned to it, and repeat until no point changes its assignment.
Issues: how many cluster centers should we choose? The algorithm easily converges to a local optimum rather than the global one (run it several times with different random initializations and keep the best result).
Derivation:
The idea: fix K, the number of cluster centers, and define a cost function J to be minimized; note the meaning of the indicator r in the formula. With the centers fixed, assigning each point to its nearest center decreases J. With the assignments fixed, setting the derivative to zero shows that the mean of each cluster's points is the center that minimizes J.
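For reference, the cost function in question is the standard K-means objective, with $r_{nk} = 1$ when point $x_n$ is assigned to center $\mu_k$ and $0$ otherwise:

$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk}\,\lVert x_n - \mu_k \rVert^2$$

Minimizing over $r_{nk}$ with the centers fixed gives the nearest-center assignment; minimizing over $\mu_k$ with the assignments fixed gives $\mu_k = \frac{\sum_n r_{nk} x_n}{\sum_n r_{nk}}$, the mean of the cluster's points.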
Rough program flow:
Preprocessing:
The kmeans function takes a matrix in which each row is one vector, so the input has as many rows as the image has pixels. For a 373x363, 3-channel color image, the image must be reshaped to 135399x3, where 135399 (= 373 x 363) is the number of pixels and 3 is each pixel's coordinates in color space. It also has to be converted to floating point.
Mat image = imread("F:\\cv\\cv-pictures\\hands.jpg"); // 373*363
Mat reshapedImage = image.reshape(1, image.cols * image.rows); // 135399*3: one row per pixel, one column per color channel
Mat reshaped_image32f;
reshapedImage.convertTo(reshaped_image32f, CV_32FC1, 1.0 / 255.0);
Kmeans:
iterals is the number of attempts (restarts): because K-means can get stuck in a local optimum, each attempt starts from a fresh random initialization. centers receives the k computed centers; labels has size 135399x1, and each entry is the index of the center assigned to that pixel.
int k = 9;
int iterals = 1;
Mat labels; // cluster index assigned to each input vector
cv::Mat centers; // cluster centers
TermCriteria criteria{TermCriteria::COUNT, 100, 1}; // at most 100 iterations
kmeans(reshaped_image32f, k, labels, criteria, iterals, KMEANS_RANDOM_CENTERS, centers);
Postprocessing:
Convert the centers back to CV_8UC3; remember that the computed centers are floating point.
Mat centers_u8c3;
centers.convertTo(centers_u8c3, CV_8UC3, 255.0);
Rebuild the image; the new image contains only the few colors corresponding to the centers.
typedef Vec3b Pixel;
//typedef Point3_<uchar> Pixel;
Mat rgb_image(image.rows, image.cols, CV_8UC3);
MatIterator_<Pixel> rgb_first = rgb_image.begin<Pixel>();
MatIterator_<Pixel> rgb_end = rgb_image.end<Pixel>();
MatConstIterator_<int> labes_first = labels.begin<int>();
while (rgb_first != rgb_end) {
    // look up the center color for this pixel's cluster label
    Pixel& rgb = centers_u8c3.ptr<Pixel>(*labes_first)[0];
    (*rgb_first) = rgb;
    ++labes_first;
    ++rgb_first;
}
Final result:
The colors have been regularized: similar colors collapse into a single one, which makes it easy to partition the image into regions.
The cluster count is shown in the window title; the more clusters you allow, the closer the result is to the original colors.
Feature Space
The example above uses a three-dimensional RGB color feature space; a grayscale image has only a one-dimensional (intensity) feature space.
The feature space can also be texture.
It can also be intensity plus spatial position; such a feature space imposes spatial coherence on the clusters, which is usually not meaningful, because:
Clusters don’t have to be spatially coherent
How do we choose the number of clusters?
Test with a validation set; on the training objective alone, more cluster centers always look better, so the validation set is what actually tells you when to stop adding clusters.
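One common heuristic is the "elbow" plot: run k-means for several values of k, plot the within-cluster sum of squared distances, and pick the k where the curve stops dropping sharply. A minimal sketch, assuming the kmeans function and the synthetic data X defined later in this post:

import numpy as np
import matplotlib.pyplot as plt

def within_cluster_sse(features, assignments):
    # Sum of squared distances from each point to its cluster mean.
    sse = 0.0
    for c in np.unique(assignments):
        pts = features[assignments == c]
        sse += np.sum((pts - pts.mean(axis=0)) ** 2)
    return sse

ks = range(1, 9)
sses = [within_cluster_sse(X, kmeans(X, k)) for k in ks]
plt.plot(list(ks), sses, marker='o')
plt.xlabel('k')
plt.ylabel('within-cluster SSE')
plt.show()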
Pros and cons of k-means
Concrete k-means implementation
Generate 4 clusters of data points:
import numpy as np
import matplotlib.pyplot as plt

# Cluster 1
mean1 = [-1, 0]
cov1 = [[0.1, 0], [0, 0.1]]
X1 = np.random.multivariate_normal(mean1, cov1, 100)
# Cluster 2
mean2 = [0, 1]
cov2 = [[0.1, 0], [0, 0.1]]
X2 = np.random.multivariate_normal(mean2, cov2, 100)
# Cluster 3
mean3 = [1, 0]
cov3 = [[0.1, 0], [0, 0.1]]
X3 = np.random.multivariate_normal(mean3, cov3, 100)
# Cluster 4
mean4 = [0, -1]
cov4 = [[0.1, 0], [0, 0.1]]
X4 = np.random.multivariate_normal(mean4, cov4, 100)
# Merge the four sets of data points
X = np.concatenate((X1, X2, X3, X4))
# Plot data points
plt.scatter(X[:, 0], X[:, 1])
plt.axis('equal')
plt.show()
K-means: results and implementation
### Clustering Methods
def kmeans(features, k, num_iters=100):
    """ Use kmeans algorithm to group features into k clusters.
    K-Means algorithm can be broken down into following steps:
        1. Randomly initialize cluster centers
        2. Assign each point to the closest center
        3. Compute new center of each cluster
        4. Stop if cluster assignments did not change
        5. Go to step 2
    Args:
        features - Array of N features vectors. Each row represents a feature
            vector.
        k - Number of clusters to form.
        num_iters - Maximum number of iterations the algorithm will run.
    Returns:
        assignments - Array representing cluster assignment of each point.
            (e.g. i-th point is assigned to cluster assignments[i])
    """
    N, D = features.shape
    assert N >= k, 'Number of clusters cannot be greater than number of points'
    # Randomly initalize cluster centers
    idxs = np.random.choice(N, size=k, replace=False)
    centers = features[idxs]
    assignments = np.zeros(N)
    for n in range(num_iters):
        distances_to_centers = []
        ### YOUR CODE HERE
        # Squared Euclidean distance from every point to each center, one column per center.
        for i in range(k):
            ds = np.sum(np.square(features - centers[i]), axis=1, keepdims=True)
            distances_to_centers.append(ds)
        distances_to_centers = np.concatenate(distances_to_centers, axis=1)
        # Assign each point to its nearest center.
        assignments = np.argmin(distances_to_centers, axis=1)
        # Recompute each center as the mean of its assigned points.
        new_centers = []
        for i in range(k):
            new_centers.append(np.mean(features[assignments == i], axis=0))
        if np.allclose(centers, new_centers):
            break
        else:
            centers = new_centers
        ### END YOUR CODE
    assert np.size(assignments) == N, "Wrong number of assignments returned"
    return assignments
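A quick usage sketch, assuming the synthetic data X generated above:

assignments = kmeans(X, 4)
plt.scatter(X[:, 0], X[:, 1], c=assignments)
plt.axis('equal')
plt.show()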
HAC (hierarchical agglomerative clustering)
The algorithm is explained in the docstring and code below; it is quite simple.
def hierarchical_clustering(features, k):
    """ Run the hierarchical agglomerative clustering algorithm.
    The algorithm is conceptually simple:
    Assign each point to its own cluster
    While the number of clusters is greater than k:
        Compute the distance between all pairs of clusters
        Merge the pair of clusters that are closest to each other
    We will use Euclidean distance to define distance between two clusters.
    Recomputing the centroids of all clusters and the distances between all
    pairs of centroids at each step of the loop would be very slow. Thankfully
    most of the distances and centroids remain the same in successive
    iterations of the outer loop; therefore we can speed up the computation by
    only recomputing the centroid and distances for the new merged cluster.
    Even with this trick, this algorithm will consume a lot of memory and run
    very slowly when clustering large set of points. In practice, you probably
    do not want to use this algorithm to cluster more than 10,000 points.
    Args:
        features - Array of N features vectors. Each row represents a feature
            vector.
        k - Number of clusters to form.
    Returns:
        assignments - Array representing cluster assignment of each point.
            (e.g. i-th point is assigned to cluster assignments[i])
    """
    N, D = features.shape
    assert N >= k, 'Number of clusters cannot be greater than number of points'
    # Assign each point to its own cluster
    assignments = np.arange(N)
    centers = np.copy(features)
    n_clusters = N
    from scipy.spatial.distance import pdist, squareform
    while n_clusters > k:
        ### YOUR CODE HERE
        # Pairwise distances between current cluster centers, as a square matrix.
        distances = pdist(centers)
        matrixDistances = squareform(distances)
        # Mask the zero self-distances so argmin finds the closest *pair*.
        matrixDistances = np.where(matrixDistances != 0.0, matrixDistances, 1e10)
        minValue = np.argmin(matrixDistances)  # flat index of the smallest distance
        min_i = minValue // n_clusters
        min_j = minValue - min_i * n_clusters
        if min_j < min_i:
            min_i, min_j = min_j, min_i
        # Merge cluster min_j into cluster min_i ...
        for i in range(N):
            if assignments[i] == min_j:
                assignments[i] = min_i
        # ... and shift labels above min_j down so labels stay contiguous.
        for i in range(N):
            if assignments[i] > min_j:
                assignments[i] -= 1
        centers = np.delete(centers, min_j, axis=0)
        centers[min_i] = np.mean(features[assignments == min_i], axis=0)
        n_clusters -= 1
        ### END YOUR CODE
    return assignments
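The same kind of usage sketch, again assuming the data X from above:

assignments = hierarchical_clustering(X, 4)
plt.scatter(X[:, 0], X[:, 1], c=assignments)
plt.axis('equal')
plt.show()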
Kernel Density Estimation (KDE)
Drawbacks of estimating a probability density with a histogram:
1. It is not smooth
2. It depends on where the bin edges are placed
3. It depends on the bin width
Not smooth
A histogram is piecewise constant: the density estimate near a bin's edge should be close to the neighboring bin's edge, not forced to equal the value at the opposite end of its own bin.
Endpoints
With the same data and the same bin width, merely shifting the bin edges can make the estimate look unimodal in one case and bimodal in another.
Bin width
The effect of the bin width is obvious; the small sketch below illustrates both the edge-placement and bin-width sensitivity.
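A minimal sketch, assuming some bimodal synthetic data: the same samples are histogrammed with shifted bin edges and with a different bin width, and the shapes come out noticeably different.

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
# Bimodal 1-D data
data = np.concatenate([np.random.normal(-1, 0.35, 200), np.random.normal(1, 0.35, 200)])

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(data, bins=np.arange(-2.5, 2.6, 0.5))    # width 0.5, edges at multiples of 0.5
axes[0].set_title('width 0.5')
axes[1].hist(data, bins=np.arange(-2.25, 2.85, 0.5))  # same width, edges shifted by 0.25
axes[1].set_title('same width, shifted edges')
axes[2].hist(data, bins=np.arange(-2.5, 2.6, 1.5))    # much wider bins
axes[2].set_title('width 1.5')
plt.show()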
Solution
To address these problems, we can increase the resolution by placing a small box around each data point instead of using fixed bins, which gives a finer-grained (though still discontinuous) estimate. This is the box-car kernel, and it is essentially a smoothing filter; other kernels are just generalizations of this smoothing, typically giving more weight near the data point rather than a constant weight across the whole box.
Choosing a kernel involves two parameters: shape and bandwidth. The bandwidth describes how quickly the kernel falls off; for a flat (box) kernel the bandwidth is simply its width. Empirically, the bandwidth matters more than the shape.
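For reference, the standard kernel density estimate built from samples $x_1, \dots, x_n$ with kernel $K$ and bandwidth $h$ is

$$\hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),$$

so a larger $h$ spreads each sample's contribution over a wider region and gives a smoother estimate.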
Bandwidth too small vs. too large
How do we choose a suitable bandwidth?
I couldn't make sense of this part...
Mean shift
Basic operation
In k-means the data points stay fixed while the cluster centers keep moving toward dense regions. Mean shift instead effectively computes a KDE of the data, and each point then moves in the direction of the density gradient (gradient ascent, straight toward a point where the gradient is zero), so the points collect at a few modes.
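In symbols (the standard mean shift update, consistent with the weighted version shown further down): starting from a location $x$, each iteration moves $x$ to the kernel-weighted mean of the data around it,

$$x \;\leftarrow\; \frac{\sum_i K(x_i - x)\, x_i}{\sum_i K(x_i - x)},$$

and the displacement (the "mean shift vector") points in the direction of the KDE gradient, which is why repeating the step performs gradient ascent on the density.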
Using Mean-Shift on Color Models — two approaches:
[Lecture 29: Video Tracking: Mean-shift]
(http://www.cse.psu.edu/~rtc12/CSE486/lecture29.pdf)
Approach 1:
Mean-shift tracking obtains a per-pixel probability estimate of the target's location by backprojecting the target's histogram, then uses mean shift on this probability map to climb to the point of maximum probability. Backprojection (see my other article on backProject) yields a grayscale image: the brighter a region, the larger the share of that region's color in the tracked object's histogram, and therefore the more likely that region belongs to the target (strong color correlation). One option is to threshold this image, keep the pixels above the threshold as very likely target pixels, and let mean shift cluster them onto the tracked region; alternatively, skip the threshold and feed the values directly into the formula as weights, as shown below. Mean shift is precisely a tool for locating dense regions.
In the formula below, w(a) is that unthresholded per-pixel weight, a ranges over all points, and K(a - x)(a - x) is the gradient term (the (a - x) factor comes from differentiating the squared distance); the derivation referenced below makes this clear.
$$x \;\leftarrow\; \frac{\sum_a K(a - x)\, w(a)\, a}{\sum_a K(a - x)\, w(a)}$$
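A hedged sketch of this pipeline using OpenCV's Python API; the file name, the initial window, and the choice of a hue-only histogram are illustrative assumptions, not from the original post:

import cv2
import numpy as np

frame = cv2.imread("frame.jpg")   # assumed input frame
c, r, w, h = 100, 80, 60, 60      # assumed initial target window (x, y, width, height)

# Hue histogram of the target region.
hsv_roi = cv2.cvtColor(frame[r:r + h, c:c + w], cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# Backproject the histogram onto a frame: bright pixels have colors common in the target.
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)

# Mean shift climbs the backprojection to the densest nearby region.
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
ret, track_window = cv2.meanShift(back_proj, (c, r, w, h), term_crit)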
Applications:
Approach 2:
Represent the target's color distribution with a histogram, then use mean shift to find the location whose histogram is most similar. Concretely, this probably works as follows: center a window on each pixel, compute a histogram of that window, turn each histogram into a vector, and compare it with the target's histogram (for example with cosine similarity), giving a similarity score at every pixel; then run mean shift to converge on the region where these scores are large and densely packed.
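A rough, deliberately unoptimized sketch of that idea; the window size, bin count, and hue-only histogram are assumptions for illustration:

import numpy as np
import cv2

def similarity_map(image_bgr, target_hist, win=15, bins=16):
    # target_hist: hue histogram of the target with the same number of bins.
    # Slide a window over the image, histogram the hue values in each window,
    # and score the window against the target histogram with cosine similarity.
    hue = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)[:, :, 0]
    H, W = hue.shape
    half = win // 2
    sim = np.zeros((H, W), dtype=np.float32)
    t = target_hist / (np.linalg.norm(target_hist) + 1e-12)
    for y in range(half, H - half):
        for x in range(half, W - half):
            patch = hue[y - half:y + half + 1, x - half:x + half + 1]
            hist, _ = np.histogram(patch, bins=bins, range=(0, 180))
            hist = hist / (np.linalg.norm(hist) + 1e-12)
            sim[y, x] = float(np.dot(hist, t))
    return sim  # this map can then be fed to mean shift as in approach 1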
Mathematical foundations
The most important prerequisite is the Kernel Density Estimation section earlier in this article.
K is the kernel function. In practice two approximations are also common: approximate a multi-dimensional kernel as a product of one-dimensional kernels (for example, a product of 1-D Gaussians standing in for a high-dimensional one; this implicitly assumes the dimensions are independent, and such a product cannot reproduce every high-dimensional kernel — it can give a circle but not a tilted ellipse), or use a kernel that only depends on the length of its argument (a radially symmetric kernel).
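To make the circle-versus-ellipse remark concrete (standard Gaussian algebra): a product of identical 1-D Gaussian kernels gives

$$\prod_{d=1}^{D} \exp\!\left(-\frac{x_d^{2}}{2h^{2}}\right) = \exp\!\left(-\frac{\lVert x\rVert^{2}}{2h^{2}}\right),$$

whose level sets are circles (spheres); using a different bandwidth per dimension yields axis-aligned ellipses, but a product of independent 1-D kernels can never produce the tilted elliptical contours of a full-covariance Gaussian.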
Proper kernel functions
This article has the most accessible derivation of the whole topic:
https://zhuanlan.zhihu.com/p/31183313
After the above, you should be able to follow the program in the Mean Shift Clustering article; focus on the update expression:
for p in copied_points:
    while not at_kde_peak:
        p = shift(p, original_points)

def shift(p, original_points):
    shift_x = float(0)
    shift_y = float(0)
    scale_factor = float(0)
    for p_temp in original_points:
        # numerator
        dist = euclidean_dist(p, p_temp)
        weight = kernel(dist, kernel_bandwidth)
        shift_x += p_temp[0] * weight
        shift_y += p_temp[1] * weight
        # denominator
        scale_factor += weight
    shift_x = shift_x / scale_factor
    shift_y = shift_y / scale_factor
    return [shift_x, shift_y]
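The excerpt leaves euclidean_dist, kernel, and kernel_bandwidth undefined; a minimal sketch of typical choices, assuming a Gaussian kernel (a common but not the only option):

import numpy as np

kernel_bandwidth = 2.0  # assumed value

def euclidean_dist(p, q):
    return float(np.linalg.norm(np.asarray(p) - np.asarray(q)))

def kernel(dist, bandwidth):
    # Gaussian kernel: points farther than a few bandwidths contribute almost nothing.
    return np.exp(-0.5 * (dist / bandwidth) ** 2)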
Summary
K-means, HAC, and mean shift all cluster points in a feature space: K-means and HAC need the number of clusters up front, while mean shift instead needs a kernel and bandwidth and finds the modes of the estimated density.