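In-Depth Analysis of Kubernetes Critical Pod (Part 2)
In-Depth Analysis of Kubernetes Critical Pod (Part 1) covered how the Scheduler handles Critical Pods. This part looks at how the Kubelet Eviction Manager handles them, so that we understand whether Critical Pods are protected when kubelet evicts pods and, if so, how that protection works.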
Kubelet Eviction Manager Admit
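Inside syncLoop, kubelet calls syncLoopIteration in a loop, about once every 1s. Each iteration receives an event from one of the config change channel, the PLEG channel, the sync channel, the houseKeeping channel, or the liveness manager's update channel, and then invokes the corresponding event handler: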
- configCh: dispatch the pods for the config change to the appropriate handler callback for the event type
- plegCh: update the runtime cache; sync pod
- syncCh: sync all pods waiting for sync
- houseKeepingCh: trigger cleanup of pods
- liveness manager's update channel: sync pods that have failed or in which one or more containers have failed liveness checks
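One detail worth noting: the houseKeeping channel delivers an event once every houseKeeping period (10s), which triggers HandlePodCleanups to perform the following cleanup work:
- Stop the workers for no-longer existing pods (each pod has a corresponding worker, i.e. a goroutine).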
- killing unwanted pods
- removes the volumes of pods that should not be running and that have no containers running.
- Remove any orphaned mirror pods.
- Remove any cgroups in the hierarchy for pods that are no longer running.
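syncLoopIteration also defines what happens after kubelet restarts following a configuration change: kubelet runs admission on the Pods that are already running, and the admission result may cause a Pod to be rejected by this node.
HandlePodAdditions is the handler for events arriving on kubelet's configCh.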
    // HandlePodAdditions is the callback in SyncHandler for pods being added from a config source.
    func (kl *Kubelet) HandlePodAdditions(pods []*v1.Pod) {
        start := kl.clock.Now()
        sort.Sort(sliceutils.PodsByCreationTime(pods))
        for _, pod := range pods {
            ...
            if !kl.podIsTerminated(pod) {
                ...
                // Check if we can admit the pod; if not, reject it.
                if ok, reason, message := kl.canAdmitPod(activePods, pod); !ok {
                    kl.rejectPod(pod, reason, message)
                    continue
                }
            }
            ...
        }
    }

If the Pod's status is not Terminated, canAdmitPod is called to run the admission check on it. If the check rejects the Pod, the Pod's phase is set to Failed.
pkg/kubelet/kubelet.go:1643

    func (kl *Kubelet) canAdmitPod(pods []*v1.Pod, pod *v1.Pod) (bool, string, string) {
        // the kubelet will invoke each pod admit handler in sequence
        // if any handler rejects, the pod is rejected.
        // TODO: move out of disk check into a pod admitter
        // TODO: out of resource eviction should have a pod admitter call-out
        attrs := &lifecycle.PodAdmitAttributes{Pod: pod, OtherPods: pods}
        for _, podAdmitHandler := range kl.admitHandlers {
            if result := podAdmitHandler.Admit(attrs); !result.Admit {
                return false, result.Reason, result.Message
            }
        }
        return true, "", ""
    }

canAdmitPod invokes, in order, the admitHandlers that were registered when kubelet started. One of them is the admit handler of the kubelet eviction manager.
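The "first rejection wins" behaviour of canAdmitPod can be illustrated with a small standalone sketch. The types here (`admitResult`, `admitHandler`) and the `canAdmit` function are simplified, hypothetical stand-ins for the lifecycle types in the excerpt, and a pod is reduced to its name.

```go
package main

import "fmt"

// admitResult is a simplified stand-in for lifecycle.PodAdmitResult.
type admitResult struct {
	Admit   bool
	Reason  string
	Message string
}

// admitHandler is a hypothetical handler type: it inspects a pod (here just
// its name) and decides whether the node may run it.
type admitHandler func(podName string) admitResult

// canAdmit calls the handlers in order and stops at the first rejection,
// returning that handler's reason and message.
func canAdmit(handlers []admitHandler, podName string) (bool, string, string) {
	for _, h := range handlers {
		if r := h(podName); !r.Admit {
			return false, r.Reason, r.Message
		}
	}
	return true, "", ""
}

func main() {
	calls := 0
	allow := func(string) admitResult {
		calls++
		return admitResult{Admit: true}
	}
	deny := func(string) admitResult {
		calls++
		return admitResult{Reason: "Evicted", Message: "node is under pressure"}
	}

	// The third handler never runs because the second one rejects the pod.
	ok, reason, message := canAdmit([]admitHandler{allow, deny, allow}, "web-0")
	fmt.Println(ok, reason, message, calls)
}
```

Because evaluation stops at the first rejection, the order in which handlers are registered decides which reason a rejected pod reports.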
pkg/kubelet/eviction/eviction_manager.go:123

    // Admit rejects a pod if its not safe to admit for node stability.
    func (m *managerImpl) Admit(attrs *lifecycle.PodAdmitAttributes) lifecycle.PodAdmitResult {
        m.RLock()
        defer m.RUnlock()
        if len(m.nodeConditions) == 0 {
            return lifecycle.PodAdmitResult{Admit: true}
        }
        if utilfeature.DefaultFeatureGate.Enabled(features.ExperimentalCriticalPodAnnotation) && kubelettypes.IsCriticalPod(attrs.Pod) {
            return lifecycle.PodAdmitResult{Admit: true}
        }
        if hasNodeCondition(m.nodeConditions, v1.NodeMemoryPressure) {
            notBestEffort := v1.PodQOSBestEffort != v1qos.GetPodQOS(attrs.Pod)
            if notBestEffort {
                return lifecycle.PodAdmitResult{Admit: true}
            }
        }
        return lifecycle.PodAdmitResult{
            Admit:   false,
            Reason:  reason,
            Message: fmt.Sprintf(message, m.nodeConditions),
        }
    }

The eviction manager's Admit logic is as follows:
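- If the node has no conditions, the Pod is admitted.
- If the ExperimentalCriticalPodAnnotation feature gate is enabled and the Pod is a Critical Pod (it carries the critical annotation, or its priority is not lower than SystemCriticalPriority), the Pod is admitted.
  - The value of SystemCriticalPriority is 2 billion.
- If the node has the Memory Pressure condition and the Pod's QoS class is not best-effort, the Pod is admitted.
- In every other case admission fails, i.e. the Pod is not allowed to run on this node.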
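The four rules above can be condensed into a small decision function. This is a sketch that follows the excerpt, not the eviction manager itself: `podInfo`, `qosClass`, `nodeCondition`, and `admit` are hypothetical simplifications, and whether a pod is critical is passed in as a plain boolean rather than derived from annotations or priority.

```go
package main

import "fmt"

type qosClass string

const (
	qosGuaranteed qosClass = "Guaranteed"
	qosBurstable  qosClass = "Burstable"
	qosBestEffort qosClass = "BestEffort"
)

type nodeCondition string

const (
	memoryPressure nodeCondition = "MemoryPressure"
	diskPressure   nodeCondition = "DiskPressure"
)

// podInfo is a hypothetical, simplified view of a pod: only the two
// properties the admission rules look at.
type podInfo struct {
	Critical bool
	QoS      qosClass
}

func hasCondition(conds []nodeCondition, want nodeCondition) bool {
	for _, c := range conds {
		if c == want {
			return true
		}
	}
	return false
}

// admit applies the rules in the same order as the excerpt.
func admit(conds []nodeCondition, criticalGateEnabled bool, p podInfo) bool {
	// Rule 1: a node without conditions admits everything.
	if len(conds) == 0 {
		return true
	}
	// Rule 2: critical pods are admitted when the feature gate is on.
	if criticalGateEnabled && p.Critical {
		return true
	}
	// Rule 3: under memory pressure only best-effort pods are turned away.
	if hasCondition(conds, memoryPressure) && p.QoS != qosBestEffort {
		return true
	}
	// Rule 4: everything else is rejected.
	return false
}

func main() {
	underMemory := []nodeCondition{memoryPressure}
	fmt.Println(admit(underMemory, true, podInfo{Critical: true, QoS: qosBestEffort}))
	fmt.Println(admit(underMemory, false, podInfo{Critical: true, QoS: qosBestEffort}))
	fmt.Println(admit(underMemory, false, podInfo{QoS: qosBurstable}))
	fmt.Println(admit([]nodeCondition{diskPressure}, false, podInfo{QoS: qosGuaranteed}))
}
```

The second call in `main` shows why the feature gate matters: with the gate off, a best-effort critical pod is treated like any other best-effort pod and is rejected under memory pressure.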
Kubelet Eviction Manager SyncLoop
In addition, the syncLoop of the kubelet eviction manager also gives Critical Pods special treatment. The code is as follows.
pkg/kubelet/eviction/eviction_manager.go:226

    // synchronize is the main control loop that enforces eviction thresholds.
    // Returns the pod that was killed, or nil if no pod was killed.
    func (m *managerImpl) synchronize(diskInfoProvider DiskInfoProvider, podFunc ActivePodsFunc) []*v1.Pod {
        ...
        // we kill at most a single pod during each eviction interval
        for i := range activePods {
            pod := activePods[i]
            if utilfeature.DefaultFeatureGate.Enabled(features.ExperimentalCriticalPodAnnotation) &&
                kubelettypes.IsCriticalPod(pod) && kubepod.IsStaticPod(pod) {
                continue
            }
            ...
            return []*v1.Pod{pod}
        }
        glog.Infof("eviction manager: unable to evict any pods from the node")
        return nil
    }

When kubelet eviction is triggered, a Pod that meets all of the following conditions will not be killed by the kubelet eviction manager:
- The Pod's status is not Terminated;
- The ExperimentalCriticalPodAnnotation feature gate is enabled;
- The Pod is a Critical Pod;
- The Pod is a Static Pod.
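The skip logic in the eviction loop can be sketched as follows. The `pod` struct and `pickVictim` are hypothetical names for this illustration; the sketch assumes the input slice is already ranked for eviction and that the first pod that is not skipped is killed successfully, which glosses over the steps elided by `...` in the excerpt.

```go
package main

import "fmt"

// pod is a hypothetical, simplified pod with only the fields the skip
// check looks at.
type pod struct {
	Name     string
	Critical bool
	Static   bool
}

// pickVictim walks the ranked candidates and returns the first pod that is
// not protected. A pod is protected only when the feature gate is enabled
// AND the pod is critical AND the pod is static.
func pickVictim(ranked []pod, criticalGateEnabled bool) (pod, bool) {
	for _, p := range ranked {
		if criticalGateEnabled && p.Critical && p.Static {
			continue
		}
		// At most one pod is chosen per eviction pass.
		return p, true
	}
	return pod{}, false
}

func main() {
	ranked := []pod{
		{Name: "kube-proxy", Critical: true, Static: true},
		{Name: "dns", Critical: true, Static: false},
		{Name: "batch-job"},
	}
	victim, found := pickVictim(ranked, true)
	fmt.Println(victim.Name, found)
}
```

Note that a critical pod that is not static (the `dns` entry in this example) is not protected by this check and can still be chosen.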
Summary
From the analysis above, the key points of how the Kubelet Eviction Manager handles Critical Pods are:
- After kubelet restarts, the eviction manager's Admit flow treats Critical Pods specially: if the ExperimentalCriticalPodAnnotation feature gate is enabled, the Critical Pod is admitted to the node regardless of the node's conditions.
- When kubelet eviction is triggered, a Critical Pod that meets all of the following conditions will not be killed by the kubelet eviction manager:
  - The Pod's status is not Terminated;
  - The ExperimentalCriticalPodAnnotation feature gate is enabled;
  - The Pod is a Static Pod.