* etcd: throttle restart for availability
During upgrade, etcd member are restarted all at once.
This can impact the availability of the etcd cluster and subsequently of
the Kubernetes cluster.
Limit the concurrent restart so that the etcd cluster can keep quorum.
* Simplify etcd handlers
* containerd: refactor handlers to use 'listen'
* cri-dockerd: refactor handlers to use 'listen'
* cri-o: refactor handlers to use 'listen'
* docker: refactor handlers to use 'listen'
* etcd: refactor handlers to use 'listen'
* control-plane: refactor handlers to use 'listen'
* kubeadm: refactor handlers to use 'listen'
* node: refactor handlers to use 'listen'
* preinstall: refactor handlers to use 'listen'
* calico: refactor handlers to use 'listen'
* kube-router: refactor handlers to use 'listen'
* macvlan: refactor handlers to use 'listen'
* project: fix ansible-lint name
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* project: ignore jinja template error in names
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* project: capitalize ansible name
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* project: update notify after name capitalization
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
---------
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
We don't need to support upgrades from 2 year old installs,
just from the last major version.
Also changed most retried tasks to 1s delay instead of longer.
The current way to setup the etc cluster is messy and buggy.
- It checks for cluster is healthy before the cluster is even created.
- The unit files are started on handlers, not in the task, so you mess with "flush handlers".
- The join_member.yml is not used.
- etcd events cluster is not configured for kubeadm
- remove duplicate runs between running the role on etcd nodes and k8s nodes
* Adding yaml linter to ci check
* Minor linting fixes from yamllint
* Changing CI to install python pkgs from requirements.txt
- adding in a secondary requirements.txt for tests
- moving yamllint to tests requirements
etcd is crucial part of kubernetes cluster. Ansible restarts etcd on
reconfiguration. Backup helps operator to restore cluster manually in
case of any issues.
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
Migrate older inline= syntax to pure yml syntax for module args as to be consistant with most of the rest of the tasks
Cleanup some spacing in various files
Rename some files named yaml to yml for consistancy
in the etcd handler, the reload etcd action
was called after ansible waits for etcd to be
up, this means that the health checks which are
called immediately after fail (resulting in the etcd
role always failing and never finishing)
This patch changes the order to move the 'wait for etcd
up' resource after the 'reload etcd resource', ensuring that
the service is up before the health check is called.
etcd facts are generated in kubernetes/preinstall, so etcd nodes need
to be evaluated first before the rest of the deployment.
Moved several directory facts from kubernetes/node to
kubernetes/preinstall because they are not backward dependent.
Improved docker reload command to wait for etcd to be
up before proceeding. Switched reload to run restart
because it can't reload if it is not guaranteed to be
in running state.
* Enforce a etcd-proxy role to a k8s-cluster group members. This
provides an HA layout for all of the k8s cluster internal clients.
* Proxies to be run on each node in the group as a separate etcd
instances with a readwrite proxy mode and listen the given endpoint,
which is either the access_ip:2379 or the localhost:2379.
* A notion for the 'kube_etcd_multiaccess' is: ignore endpoints and
loadbalancers and use the etcd members IPs as a comma-separated
list. Otherwise, clients shall use the local endpoint provided by a
etcd-proxy instances on each etcd node. A Netwroking plugins always
use that access mode.
* Fix apiserver's etcd servers args to use the etcd_access_endpoint.
* Fix networking plugins flannel/calico to use the etcd_endpoint.
* Fix name env var for non masters to be set as well.
* Fix etcd_client_url was not used anywhere and other etcd_* facts
evaluation was duplicated in a few places.
* Define proxy modes only in the env file, if not a master. Del
an automatic proxy mode decisions for etcd nodes in init/unit scripts.
* Use Wants= instead of Requires= as "This is the recommended way to
hook start-up of one unit to the start-up of another unit"
* Make apiserver/calico Wants= etcd-proxy to keep it always up
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Co-authored-by: Matthew Mosesohn <mmosesohn@mirantis.com>
Running etcd in Docker reduces the number of individual file
downloads and services running on the host.
Note: etcd container v3.0.1 moves bindir to /usr/local/bin
Fixes: #298
fix etcd configuration for nodes
fix wrong calico checksums
using a var name etcd_bin_dir
fix etcd handlers for sysvinit
using a var name etcd_bin_dir
sysvinit script
review etcd configuration