* Clean up redondant defaulting
drain_{timeout,grace_period}_after_failure don't exist at this point, so
they always default.
* Remove useless facts
The drain_*_after_failure are never used
* Try both conntrack modules instead of checking kernel version
Depending on kernel distributor, the kernel version might not be a
correct indicator of the conntrack module use.
Instead, we check both (and use the first found).
* Use modproble.persistent rather than manual persistence
When installed as an ansible collection, roles in
ansible_play_role_names will be designated by their FQDN (i.e
'kubernetes-sigs.kubespray.<role-name>).
It means we need to check for both when checking for roles in the play.
* Validate systemd unit files
This ensure that we fail early if we have a bad systemd unit file
(syntax error, using a version not available in the local version, etc)
* Hack to check systemd version for service files validation
factory-reset.target was introduced in system 250, same version as the
aliasing feature we need for verifying systemd services with ansible.
So we only actually executes the validation if that target is present.
This is an horrible hack which should be reverted as soon as we drop
support for distributions with systemd<250.
* ansible: upgrade to version >= 2.15.5
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* tests: update requirements
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* contrib/openstack: fix wrong gitignore pattern
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* tests: add missing tzdata requirement
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* tests: remove some molecules tests
Those doesn't work in Ansible 2.15. Ansible can't load builtin now
apparently and these tests are not worth it.
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
---------
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
Sets ignore_unreachable: true to `Gather ansible_default_ipv4 from all hosts`
task from fallback_ips.yml
Without this scale.yml will fail if a single node in the cluster is down, which
for large clusters happens often.
Remove cri-o apt repo job has state present but need absent
Uninstall CRI-O packages job has undefined variable crio_packages
replaced by list of packages
* metallb --lb-class cmd arg to support multiple load balancer implementations
* removed loadbalancer_class from metallb_config; metallb_loadbalancer_class in role defaults
* Use RandomizedDelaySec to spread out control certificates renewal plane
If the number of control plane node is superior to 6, using (index * 10
minutes) will fail (03:60:00 is not a valid timestamp).
Compared to just fixing the jinja expression (to use a modulo for
example), this should avoid having two control planes certificates
update node being triggered at the same time.
* Make k8s-certs-renew.timer Persistent
If the control plane happens to be offline during the scheduled
certificates renewal (node failure or anything like that), we still want
the renewal to happen.
* containerd: refactor handlers to use 'listen'
* cri-dockerd: refactor handlers to use 'listen'
* cri-o: refactor handlers to use 'listen'
* docker: refactor handlers to use 'listen'
* etcd: refactor handlers to use 'listen'
* control-plane: refactor handlers to use 'listen'
* kubeadm: refactor handlers to use 'listen'
* node: refactor handlers to use 'listen'
* preinstall: refactor handlers to use 'listen'
* calico: refactor handlers to use 'listen'
* kube-router: refactor handlers to use 'listen'
* macvlan: refactor handlers to use 'listen'
It was not 'false', which made some tasks (e.g. using systemd-resolved
template) to effectively remove default search domains; caused DNS loop
after rebooting the node/restarting cluster, so localdns service didn't
run correctly.
This make native ansible features (dry-run, changed state) easier to
have, and should have a minimal performance impact, since it only runs
on the etcd members.
* Specify the runc path when we use the containerd container engine
and change the bin_dir path.
Signed-off-by: Jin Li <qlijin@gmail.com>
* Update roles/container-engine/containerd/templates/config.toml.j2
Co-authored-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
---------
Signed-off-by: Jin Li <qlijin@gmail.com>
Co-authored-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
The blockSize attribute from Calico IPPool resources cannot be changed
once set [1]. Consequently, we use the one currently defined when
configuring the existing IPPool, avoiding upgrade errors by trying to
change it.
In particular, this can be useful when calico_pool_blocksize default
changes in kubespray, which would otherwise force users to add an
explicit setting to their inventories.
[1]: https://docs.tigera.io/calico/latest/reference/resources/ippool#spec
This allows this task to work with a forks count > 10 and the default
configuration of sshd, which is to limit sessions to 10. (see
MaxSessions in sshd_config).
Since this is a delegate_to task, it connects to the same host (first
etcd) for each node in the cluster, thus easily going above 10.
Raising the ssh connection attempts allow for more robustness, without
decreasing the forks count or serialising the tasks, which could slow
the task (or the playbook as a whole, if decreasing forks).
* Migrate node-role.kubernetes.io/master to node-role.kubernetes.io/control-plane
* Migrate node-role.kubernetes.io/master to node-role.kubernetes.io/control-plane
* Migrate node-role.kubernetes.io/master to node-role.kubernetes.io/control-plane
Refactor NRI (Node Resource Interface) activation in CRI-O and
containerd. Introduce a shared variable, nri_enabled, to streamline
the process. Currently, enabling NRI requires a separate update of
defaults for each container runtime independently, without any
verification of NRI support for the specific version of containerd
or CRI-O in use.
With this commit, the previous approach is replaced. Now, a single
variable, nri_enabled, handles this functionality. Also, this commit
separates the responsibility of verifying NRI supported versions of
containerd and CRI-O from cluster administrators, and leaves it to
Ansible.
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
* [containerd] Add Configuration option for Node Resource Interface
Node Resource Interface (NRI) is a common is a common framework for
plugging domain or vendor-specific custom logic into container
runtime like containerd. With this commit, we introduce the
containerd_disable_nri configuration flag, providing cluster
administrators the flexibility to opt in or out (defaulted to 'out')
of this feature in containerd. In line with containerd's default
configuration, NRI is disabled by default in this containerd role
defaults.
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
* [cri-o] Add configuration option for Node Resource Interface
Node Resource Interface (NRI) is a common is a common framework for
plugging domain or vendor-specific custom logic into container
runtimes like containerd/crio. With this commit, we introduce the
crio_enable_nri configuration flag, providing cluster
administrators the flexibility to opt in or out (defaulted to 'out')
of this feature in cri-o runtime. In line with crio's default
configuration, NRI is disabled by default in this cri-o role
defaults.
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
---------
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
when people run playbook with option `--tags=kubelet`, the kubelet config may changed, because some variables used in task populating `kubelet-config.yml` could be different with running task(`Fetch facts`)