
ML kNN: a detailed guide to the k-nearest-neighbor (kNN) algorithm (introduction, applications, and classic cases)

Published: 2025/3/21 · Programming Q&A · 27 · 豆豆

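This article, collected and compiled by 生活随笔, introduces "ML kNN: a detailed guide to the k-nearest-neighbor (kNN) algorithm (introduction, applications, and classic cases)". The editor found it quite good and is sharing it here for reference.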


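Introduction to the kNN algorithm

1. How kNN works

1.1 The meaning of k

1.2 Explaining the principle with a nearest-distance example: exploring the kNN process through a real case

2. The three elements of the K-nearest-neighbor algorithm

Applications of the k-nearest-neighbor (kNN) algorithm

1. Walking through the kNN code

Classic cases of the k-nearest-neighbor (kNN) algorithm

1. Basic cases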

Introduction to the kNN algorithm

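The nearest-neighbor algorithm, also called the k-nearest-neighbor (kNN, k-NearestNeighbor) classification algorithm, is one of the simplest classification techniques in data mining. "K nearest neighbors" simply means the k closest neighbors: each sample can be represented by the k neighbors closest to it.

The core idea of kNN: if most of the k nearest samples to a given sample in feature space belong to a certain class, then that sample is also assigned to that class and shares the characteristics of the samples in it.

When making the classification decision, the method determines the class of the sample to be classified solely from the classes of the one or few nearest samples, so each decision involves only a very small number of neighboring samples.
Because kNN relies mainly on the limited set of nearby samples rather than on discriminating between class regions, it is better suited than other methods to sample sets whose class regions intersect or overlap heavily.

kNN can be used not only for classification but also for regression: find a sample's k nearest neighbors and assign the average of those neighbors' attribute (target) values to the sample. The figure below shows kNN classification results for different values of k.

In short, kNN works like this: you have a set of data whose classes are already known. When a new data point arrives, you compute its distance to every point in the training data and pick the k training points closest to the new point. You then check which classes those k points belong to and assign the new point to the majority class.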
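The procedure just described fits in a few lines of NumPy. This is a minimal illustration, not the implementation used by any library. The function name `knn_predict`, the toy data and the use of Euclidean distance are assumptions made for this example. It covers both uses mentioned above: a majority vote for classification and the neighbors' mean for regression.

```python
import numpy as np
from collections import Counter


def knn_predict(X_train, y_train, x_new, k=3, task="classification"):
    """Hypothetical helper: brute-force kNN prediction for one new point."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x_new = np.asarray(x_new, dtype=float)

    # Euclidean distance from the new point to every training point
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k training points closest to the new point
    nearest = np.argsort(dists)[:k]

    if task == "classification":
        # Majority vote among the k neighbors
        return Counter(y_train[nearest].tolist()).most_common(1)[0][0]
    # Regression: average of the k neighbors' target values
    return float(y_train[nearest].mean())


# Toy data (made up): two well-separated groups
X = [[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7]]
labels = ["A", "A", "A", "B", "B", "B"]
targets = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

print(knn_predict(X, labels, [1.5, 1.5], k=3))                      # A
print(knn_predict(X, labels, [6.5, 6.2], k=3))                      # B
print(knn_predict(X, targets, [1.5, 1.5], k=3, task="regression"))  # 2.0
```

Every training point is compared with the query, so each prediction costs O(n·d) for n training samples with d features. Tree indexes such as KD-trees and ball trees are the usual way to reduce that cost.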

1. How kNN works

1.1 The meaning of k
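A tiny invented 1-D dataset shows the effect of k. The same query point gets a different label as k changes, because the vote is taken over a different set of neighbors. This sketch uses scikit-learn's `KNeighborsClassifier`; the data and the helper name `predict_with_k` are made up for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Invented 1-D training data: class 0 at 1.0, 3.0, 3.2; class 1 at 1.5, 1.6
X = np.array([[1.0], [1.5], [1.6], [3.0], [3.2]])
y = np.array([0, 1, 1, 0, 0])
query = [[0.9]]


def predict_with_k(k):
    """Fit a kNN classifier with the given k and classify the query point."""
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    return int(clf.predict(query)[0])


for k in (1, 3, 5):
    # k=1: the single nearest point is 1.0 (class 0)
    # k=3: neighbors 1.0, 1.5, 1.6 give two votes for class 1
    # k=5: all five points vote, three of them for class 0
    print(k, predict_with_k(k))
```

With k=1 the answer follows a single point, which makes it sensitive to noise. When k equals the whole training set, every query gets the overall majority class.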

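1.2 Explaining the principle with a nearest-distance example: exploring the kNN process through a real case

There are 22 training images in total, so the neighbor indices fall in [0, 21], and each returned index comes with its distance. Finally, we predict the two targets in a single image, whose encodings are stored in encodings.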

knn_clf.kneighbors()

(array([[0.30532235, 0.31116033],
        [0.32661427, 0.33672689],
        [0.23773344, 0.32330168],
        [0.23773344, 0.31498658],
        [0.33672689, 0.33821827],
        [0.38318684, 0.40261368],
        [0.36961207, 0.37032072],
        [0.30532235, 0.32875857],
        [0.31116033, 0.31498658],
        [0.34639613, 0.37008633],
        [0.34639613, 0.38417308],
        [0.38043224, 0.40495343],
        [0.37008633, 0.38417308],
        [0.36410526, 0.38557585],
        [0.40495343, 0.42797409],
        [0.36410526, 0.40118199],
        [0.31723113, 0.340506  ],
        [0.37033616, 0.37823567],
        [0.32446263, 0.33810974],
        [0.31723113, 0.32446263],
        [0.33810974, 0.37878755],
        [0.340506  , 0.3755613 ]]),
 array([[ 7,  8],
        [ 0,  4],
        [ 3,  8],
        [ 2,  8],
        [ 1,  3],
        [ 1,  8],
        [ 4,  7],
        [ 0,  8],
        [ 0,  3],
        [10, 12],
        [ 9, 12],
        [ 9, 14],
        [ 9, 10],
        [15,  9],
        [11, 10],
        [13, 12],
        [19, 21],
        [19, 20],
        [16, 18],
        [18, 16],
        [16, 19]], dtype=int64))

knn_clf.kneighbors(encodings, n_neighbors=1)

(array([[0.33233257],
        [0.31491284]]),
 array([[20],
        [12]], dtype=int64))
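The face encodings behind `knn_clf` are not available here, so the sketch below reproduces the two calls on invented 1-D data. It assumes `knn_clf` is a scikit-learn `KNeighborsClassifier`. The first output above has two columns, which suggests `n_neighbors=2`, so the sketch uses the same value.

Called without query points, `kneighbors()` returns the distances and indices of each training point's nearest other training points; a point is not counted as its own neighbor. Called with query points and `n_neighbors=1`, it returns the single closest training sample for each query.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Invented training data: four 1-D points with arbitrary labels
X_train = np.array([[0.0], [1.0], [3.0], [7.0]])
y_train = np.array([0, 0, 1, 1])

knn_clf = KNeighborsClassifier(n_neighbors=2).fit(X_train, y_train)

# No query: neighbors of each training point, excluding the point itself
dist_all, ind_all = knn_clf.kneighbors()
print(dist_all)  # shape (4, 2): distances to the 2 nearest other points
print(ind_all)   # shape (4, 2): their row indices in X_train

# Two query points: the single closest training sample for each
dist_q, ind_q = knn_clf.kneighbors([[2.9], [6.0]], n_neighbors=1)
print(dist_q, ind_q)
```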

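2. The three elements of the K-nearest-neighbor algorithm

The model used by K-nearest neighbors corresponds to a partition of the feature space. The algorithm has three basic elements: the choice of K, the distance metric, and the classification decision rule.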

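The choice of K strongly affects the result. With a small K, only training instances close to the input instance influence the prediction, but the model overfits easily. A larger K reduces the estimation error at the cost of a larger approximation error: training instances far from the input instance also influence the prediction and can make it wrong. In practice K is usually set to a fairly small value, and the best K is usually chosen by cross-validation. As the number of training instances tends to infinity, the error rate with K = 1 is at most twice the Bayes error rate. If K also tends to infinity (while K divided by the number of training instances tends to 0), the error rate approaches the Bayes error rate.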
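Choosing K by cross-validation, as recommended above, can be sketched with scikit-learn's `cross_val_score`. The Iris dataset, the range of odd candidate K values and 5-fold CV are choices made for this example, not prescriptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validated accuracy for each candidate K
cv_scores = {}
for k in range(1, 26, 2):  # odd K avoids voting ties in two-class problems
    model = KNeighborsClassifier(n_neighbors=k)
    cv_scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(cv_scores, key=cv_scores.get)
print(f"best K = {best_k}, mean CV accuracy = {cv_scores[best_k]:.3f}")
```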

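The classification decision rule is usually majority voting: the class of the input instance is decided by the majority class among its K nearest training instances.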
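A plain majority vote treats all K neighbors equally. The sketch below contrasts it with a distance-weighted vote, a common variant comparable to the `weights='distance'` option in the scikit-learn docstring quoted in this article. Both helper names are hypothetical. The small `eps` guard against zero distances is an assumption of this sketch; scikit-learn handles exact matches differently.

```python
from collections import Counter, defaultdict


def majority_vote(neighbor_labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(neighbor_labels).most_common(1)[0][0]


def weighted_vote(neighbor_labels, neighbor_dists, eps=1e-12):
    """Each neighbor votes with weight 1/distance (eps avoids division by zero)."""
    scores = defaultdict(float)
    for label, dist in zip(neighbor_labels, neighbor_dists):
        scores[label] += 1.0 / (dist + eps)
    return max(scores, key=scores.get)


# K = 3 neighbors: one very close "A", two farther "B"s
labels = ["A", "B", "B"]
dists = [0.1, 0.9, 1.0]
print(majority_vote(labels))         # B (2 votes to 1)
print(weighted_vote(labels, dists))  # A (weight 10 vs about 2.1)
```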

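The distance metric is usually the Lp distance; with p = 2 it is the Euclidean distance. Each attribute should be normalized before computing distances, so that attributes with large value ranges do not outweigh attributes with small value ranges.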
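For n features, the Lp distance between x_i and x_j is L_p(x_i, x_j) = (Σ_l |x_i^(l) − x_j^(l)|^p)^(1/p). Setting p = 1 gives the Manhattan distance and p = 2 the Euclidean distance.

The sketch below implements this distance and shows why normalization matters. It uses min-max scaling as one possible normalization; the article does not say which one it means. The age/income data is invented: before scaling, the income column alone decides which point is nearest.

```python
import numpy as np


def minkowski_distance(a, b, p=2):
    """Lp distance: (sum_l |a_l - b_l|**p) ** (1/p)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))


def min_max_scale(X):
    """Rescale every column to [0, 1]; constant columns end up at 0."""
    X = np.asarray(X, dtype=float)
    mins = X.min(axis=0)
    ranges = X.max(axis=0) - mins
    ranges[ranges == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mins) / ranges


def nearest_to_first(X, p=2):
    """Index of the row closest to row 0 (row 0 itself excluded)."""
    dists = [minkowski_distance(X[0], X[i], p) for i in range(1, len(X))]
    return 1 + int(np.argmin(dists))


# Invented data: [age in years, annual income in yuan]
X = np.array([[20, 30000], [60, 31000], [21, 40000], [40, 100000]])

print(minkowski_distance([0, 0], [3, 4], p=1))  # 7.0 (Manhattan)
print(minkowski_distance([0, 0], [3, 4], p=2))  # 5.0 (Euclidean)
print(nearest_to_first(X))                      # 1: raw income dominates
print(nearest_to_first(min_max_scale(X)))       # 2: age now counts too
```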


Applications of the k-nearest-neighbor (kNN) algorithm

1. Walking through the kNN code

? ? """Regression based on k-nearest neighbors. ? ? The target is predicted by local interpolation of the targets associated of the nearest neighbors in the training set. ? ? Read more in the :ref:`User Guide <regression>`. ? ?? ? ? Parameters ? ? ---------- ? ? n_neighbors : int, optional (default = 5) ? ? Number of neighbors to use by default for :meth:`kneighbors queries. ? ? weights : str or callable ? ? weight function used in prediction. ?Possible values: ? ?? ? ? - 'uniform' : uniform weights. ?All points in each neighborhood are weighted equally. ? ? - 'distance' : weight points by the inverse of their distance. ? ? in this case, closer neighbors of a query point will have a greater influence than neighbors which are further away. ? ? - [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights. ? ?? ? ? Uniform weights are used by default. ? ?? ? ? algorithm : {'auto', 'ball_tree', 'kd_tree', 'brute'}, optional ? ? Algorithm used to compute the nearest neighbors: ? ?? ? ? - 'ball_tree' will use :class:`BallTree` ? ? - 'kd_tree' will use :class:`KDTree` ? ? - 'brute' will use a brute-force search. ? ? - 'auto' will attempt to decide the most appropriate algorithm ? ? based on the values passed to :meth:`fit` method.	基于k近鄰的回歸。通過對訓練集中最近鄰相關的目標進行局部插值來預測目標。請參閱:ref: ' User Guide <regression> '。</regression> 參數 --------- n_neighbors: int，可選(默認= 5) kneighbors:meth: ' kneighbors查詢默認使用的鄰居數。權值:str或callable 用于預測的權函數。可能的值: -“均勻”:重量均勻。每個鄰域中的所有點的權值都是相等的。 -“距離”:權重點的距離的倒數。在這種情況下，查詢點附近的鄰居比遠處的鄰居有更大的影響。 - [callable]:一個用戶定義的函數，它接受一個距離數組，并返回一個包含權值的形狀相同的數組。默認情況下使用統一的權重。算法:{'auto'， 'ball_tree'， 'kd_tree'， 'brute'}，可選計算最近鄰的算法: - 'ball_tree'將使用:class: ' BallTree ' - 'kd_tree'將使用:class: ' KDTree ' -“蠻力”將使用蠻力搜索。 - 'auto'將嘗試決定最合適的算法基于傳遞給:meth: ' fit '方法的值。
? ? Note: fitting on sparse input will override the setting of this parameter, using brute force. ? ?? ? ? leaf_size : int, optional (default = 30) ? ? Leaf size passed to BallTree or KDTree. ?This can affect the speed of the construction and query, as well as the memory?required to store the tree. ?The optimal value depends on the nature of the problem. ? ?? ? ? p : integer, optional (default = 2) ? ? Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and? euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. ? ?? ? ? metric : string or callable, default 'minkowski' ? ? the distance metric to use for the tree. ?The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of the DistanceMetric class for a list of available metrics. ? ?? ? ? metric_params : dict, optional (default = None) ? ? Additional keyword arguments for the metric function. ? ?? ? ? n_jobs : int, optional (default = 1) ? ? The number of parallel jobs to run for neighbors search. ? ? If ``-1``, then the number of jobs is set to the number of CPU? cores. ? ? Doesn't affect :meth:`fit` method.	注意:擬合稀疏輸入將覆蓋該參數的設置，使用蠻力。 leaf_size: int，可選(默認值為30) 葉大小傳遞給BallTree或KDTree。這可能會影響構造和查詢的速度，以及存儲樹所需的內存。最優值取決于問題的性質。 p:整數，可選(默認= 2) Minkowski度規的功率參數。當p = 1時，這相當于在p = 2時使用manhattan_distance (l1)和euclidean_distance (l2)。對于任意p，使用minkowski_distance (l_p)。度量:字符串或可調用，默認'minkowski' 用于樹的距離度量。默認的度量是minkowski, p=2等于標準的歐幾里德度量。有關可用指標的列表，請參閱distancem類的文檔。 metric_params: dict，可選(默認= None) 度量函數的附加關鍵字參數。 n_jobs: int，可選(默認值為1) 要為鄰居搜索運行的并行作業的數量。如果' ' -1 ' '，則作業的數量被設置為CPU核心的數量。不影響:冰毒:'適合'方法。
? ? Examples ? ? -------- ? ? >>> X = [[0], [1], [2], [3]] ? ? >>> y = [0, 0, 1, 1] ? ? >>> from sklearn.neighbors import KNeighborsRegressor ? ? >>> neigh = KNeighborsRegressor(n_neighbors=2) ? ? >>> neigh.fit(X, y) # doctest: +ELLIPSIS ? ? KNeighborsRegressor(...) ? ? >>> print(neigh.predict([[1.5]])) ? ? [ 0.5] ? ?? ? ? See also ? ? -------- ? ? NearestNeighbors ? ? RadiusNeighborsRegressor ? ? KNeighborsClassifier ? ? RadiusNeighborsClassifier	例子 -------- >>> X = [[0]， [1]， [2]， [3]] >>> y = [0, 0, 1, 1] 從sklearn > > >。鄰居進口KNeighborsRegressor >>> neigh = KNeighborsRegressor(n_neighbors=2) > > >馬嘶聲。fit(X, y) # doctest: +省略號 KNeighborsRegressor (…) > > >打印(neigh.predict ([[1.5]])) [0.5] 另請參閱 -------- NearestNeighbors RadiusNeighborsRegressor KNeighborsClassifier RadiusNeighborsClassifier
? ? Notes ? ? ----- ? ? See :ref:`Nearest Neighbors <neighbors>` in the online? documentation for a discussion of the choice of ``algorithm`` and ``leaf_size``. ? ?? ? ? .. warning:: ? ?? ? ? Regarding the Nearest Neighbors algorithms, if it is found that? two neighbors, neighbor `k+1` and `k`, have identical distances but different labels, the results will depend on the ordering of the training data. ? ?? ? ? https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm ? ? """	筆記 ----- 參見:ref: ' Nearest Neighbors < Neighbors > ' in the online documentation，其中討論了' '算法' '和' ' leaf_size ' '的選擇。 . .警告:: 對于最近鄰算法，如果發現相鄰的‘k+1’和‘k’這兩個相鄰的距離相同，但是標簽不同，那么結果將取決于訓練數據的排序。 https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm ”“”

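class KNeighborsRegressor, found at: sklearn.neighbors.regression

# Module-level imports this excerpt relies on (relative imports inside
# sklearn/neighbors/regression.py in older releases; paths differ in newer versions)
import numpy as np
from .base import _get_weights, _check_weights, NeighborsBase, KNeighborsMixin
from .base import SupervisedFloatMixin
from ..base import RegressorMixin
from ..utils import check_array


class KNeighborsRegressor(NeighborsBase, KNeighborsMixin,
                          SupervisedFloatMixin,
                          RegressorMixin):
    def __init__(self, n_neighbors=5, weights='uniform',
                 algorithm='auto', leaf_size=30,
                 p=2, metric='minkowski', metric_params=None, n_jobs=1,
                 **kwargs):
        self._init_params(n_neighbors=n_neighbors,
                          algorithm=algorithm,
                          leaf_size=leaf_size, metric=metric, p=p,
                          metric_params=metric_params, n_jobs=n_jobs, **kwargs)
        # validates 'uniform', 'distance' or a callable
        self.weights = _check_weights(weights)

    def predict(self, X):
        """Predict the target for the provided data

        Parameters
        ----------
        X : array-like, shape (n_query, n_features), \
                or (n_query, n_indexed) if metric == 'precomputed'
            Test samples.

        Returns
        -------
        y : array of int, shape = [n_samples] or [n_samples, n_outputs]
            Target values
        """
        X = check_array(X, accept_sparse='csr')

        # distances and indices of the k nearest training samples per query
        neigh_dist, neigh_ind = self.kneighbors(X)

        # None for 'uniform', 1/distance for 'distance', or callable(neigh_dist)
        weights = _get_weights(neigh_dist, self.weights)

        _y = self._y
        if _y.ndim == 1:
            _y = _y.reshape((-1, 1))

        if weights is None:
            # uniform weights: plain mean of the neighbors' targets
            y_pred = np.mean(_y[neigh_ind], axis=1)
        else:
            # weighted mean of the neighbors' targets, one output column at a time
            y_pred = np.empty((X.shape[0], _y.shape[1]), dtype=np.float64)
            denom = np.sum(weights, axis=1)

            for j in range(_y.shape[1]):
                num = np.sum(_y[neigh_ind, j] * weights, axis=1)
                y_pred[:, j] = num / denom

        if self._y.ndim == 1:
            y_pred = y_pred.ravel()

        return y_pred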
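To check the reading of `predict` above, the sketch below recomputes its two branches by hand using the public `kneighbors` method. It then compares them with `KNeighborsRegressor.predict`. The data reuses the docstring example but with a different query point, 1.2, chosen here so that the two weightings give different answers. The manual formula for `weights='distance'` assumes no neighbor sits at distance zero, a case scikit-learn treats specially. The helper name `manual_and_library` is hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
query = np.array([[1.2]])


def manual_and_library(weights):
    """Return (manual prediction, library prediction) for the query point."""
    reg = KNeighborsRegressor(n_neighbors=2, weights=weights).fit(X, y)
    neigh_dist, neigh_ind = reg.kneighbors(query)
    if weights == "uniform":
        # "weights is None" branch: plain mean of the neighbors' targets
        manual = y[neigh_ind].mean(axis=1)
    else:
        # weighted branch: sum(y * w) / sum(w) with w = 1 / distance
        w = 1.0 / neigh_dist
        manual = (y[neigh_ind] * w).sum(axis=1) / w.sum(axis=1)
    return manual, reg.predict(query)


for weights in ("uniform", "distance"):
    manual, library = manual_and_library(weights)
    print(weights, manual, library)
# uniform: neighbors x=1 (y=0) and x=2 (y=1) give (0 + 1) / 2 = 0.5
# distance: weights 1/0.2 = 5 and 1/0.8 = 1.25 give 1.25 / 6.25 = 0.2
```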

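Classic cases of the k-nearest-neighbor (kNN) algorithm

1. Basic cases

ML kNN: multi-class prediction on the Iris dataset with kNN
ML kNN (two variants): price regression on the Boston housing dataset (506 samples, 13 features + 1 target) with two kNN variants (uniform-average regression and distance-weighted regression), comparing their performance
CV kNN: image-similarity visualization using ORB features + a kNN matcher and SIFT features + a FLANN matcher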
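As a rough stand-in for the first two cases listed above, the sketch below runs both with scikit-learn. It does multi-class kNN classification on the bundled Iris dataset, then compares uniform-average and distance-weighted kNN regression.

The Boston housing loader was removed in scikit-learn 1.2, so the regression half uses a synthetic `make_regression` dataset with the same shape (506 samples, 13 features). Its scores say nothing about the Boston results. The split ratio, random seeds, k=5 and the standardization step are assumptions of this sketch.

```python
from sklearn.datasets import load_iris, make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Case 1: multi-class classification on Iris (3 classes, 4 features)
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=33, stratify=y)
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
clf.fit(X_tr, y_tr)
iris_accuracy = clf.score(X_te, y_te)  # mean accuracy on the test split
print(f"Iris test accuracy: {iris_accuracy:.3f}")

# Case 2 (synthetic stand-in for Boston): uniform vs distance-weighted regression
Xr, yr = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(
    Xr, yr, test_size=0.25, random_state=33)
r2_scores = {}
for weights in ("uniform", "distance"):
    reg = make_pipeline(StandardScaler(),
                        KNeighborsRegressor(n_neighbors=5, weights=weights))
    reg.fit(Xr_tr, yr_tr)
    r2_scores[weights] = reg.score(Xr_te, yr_te)  # R^2 on the test split
    print(f"{weights:>8} kNN regression R^2: {r2_scores[weights]:.3f}")
```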

