From f8a59446e87a436de851a928c0ee0706f33e28dd Mon Sep 17 00:00:00 2001
From: Damian Nowak
Date: Fri, 9 Feb 2018 10:39:40 -0600
Subject: [PATCH] Enable OOM killing

When etcd exceeds its memory limit, it becomes useless but keeps
running. We should let the OOM killer kill the etcd process in the
container, so systemd can spot the problem and restart etcd according
to the "Restart" setting in the etcd.service unit file. If the OOM
problem keeps repeating, i.e. it happens on every single restart,
systemd will eventually back off and stop restarting it anyway.

--restart=on-failure:5 in this file has no effect, because a memory
allocation error doesn't by itself cause the process to die.

Related: https://github.com/kubernetes-incubator/kubespray/blob/master/roles/etcd/templates/etcd-docker.service.j2

This kind of reverts a change introduced in #1860.
---
 roles/etcd/templates/etcd.j2 | 1 -
 1 file changed, 1 deletion(-)

diff --git a/roles/etcd/templates/etcd.j2 b/roles/etcd/templates/etcd.j2
index d916a7570..9ac08e073 100644
--- a/roles/etcd/templates/etcd.j2
+++ b/roles/etcd/templates/etcd.j2
@@ -9,7 +9,6 @@
   {% if etcd_memory_limit is defined %}
   --memory={{ etcd_memory_limit|regex_replace('Mi', 'M') }} \
   {% endif %}
-  --oom-kill-disable \
   {% if etcd_cpu_limit is defined %}
   --cpu-shares={{ etcd_cpu_limit|regex_replace('m', '') }} \
   {% endif %}