When installed as an Ansible collection, roles in
ansible_play_role_names are designated by their FQCN (i.e.
'kubernetes-sigs.kubespray.<role-name>').
This means we need to check for both forms when looking for roles in the play.
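
As a rough illustration of checking both forms, here is a minimal Python sketch. The helper name `role_in_play` is hypothetical, and the collection prefix is copied from the text above; the real check lives in Ansible conditionals, not in Python.

```python
# Prefix copied from the description above; treat it as an assumption.
COLLECTION_PREFIX = "kubernetes-sigs.kubespray."


def role_in_play(role_name, play_role_names):
    """Return True if the role is listed under its short or collection-qualified name.

    play_role_names stands in for the ansible_play_role_names variable.
    """
    candidates = {role_name, COLLECTION_PREFIX + role_name}
    return any(name in candidates for name in play_role_names)
```

For example, `role_in_play("etcd", ["kubernetes-sigs.kubespray.etcd"])` and `role_in_play("etcd", ["etcd"])` both match, while a different role that merely shares a prefix does not.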
* Validate systemd unit files
This ensures that we fail early if we have a bad systemd unit file
(syntax error, using a feature not available in the local systemd version, etc.)
* Hack to check systemd version for service files validation
factory-reset.target was introduced in systemd 250, the same version as the
aliasing feature we need for verifying systemd services with ansible.
So we only actually execute the validation if that target is present.
This is a horrible hack which should be reverted as soon as we drop
support for distributions with systemd<250.
* ansible: upgrade to version >= 2.15.5
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* tests: update requirements
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* contrib/openstack: fix wrong gitignore pattern
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* tests: add missing tzdata requirement
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* tests: remove some molecule tests
Those don't work in Ansible 2.15. Ansible apparently can't load builtin now,
and these tests are not worth it.
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
---------
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
Sets ignore_unreachable: true on the `Gather ansible_default_ipv4 from all hosts`
task from fallback_ips.yml.
Without this, scale.yml will fail if a single node in the cluster is down, which
happens often in large clusters.
The "Remove cri-o apt repo" task has state present but needs state absent.
The "Uninstall CRI-O packages" task uses the undefined variable crio_packages,
which is replaced by a list of packages.
* metallb --lb-class cmd arg to support multiple load balancer implementations
* removed loadbalancer_class from metallb_config; metallb_loadbalancer_class in role defaults
* Use RandomizedDelaySec to spread out control plane certificates renewal
If there are more than 6 control plane nodes, using (index * 10
minutes) will fail (03:60:00 is not a valid timestamp).
Compared to just fixing the jinja expression (to use a modulo for
example), this should avoid having the certificates renewal triggered
on two control plane nodes at the same time.
* Make k8s-certs-renew.timer Persistent
If the control plane happens to be offline during the scheduled
certificates renewal (node failure or anything like that), we still want
the renewal to happen.
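
The arithmetic behind the invalid timestamp can be sketched in Python. The function names and the exact formatting are hypothetical; only the 03:60:00 value and the (index * 10 minutes) offset come from the description above.

```python
def naive_start_time(index, hour=3, step_minutes=10):
    """Format a start time using an unbounded (index * step_minutes) minute offset."""
    return f"{hour:02d}:{index * step_minutes:02d}:00"


def modulo_start_time(index, hour=3, step_minutes=10):
    """Same as above, but wrap the minute offset with a modulo."""
    return f"{hour:02d}:{(index * step_minutes) % 60:02d}:00"


def is_valid_time(stamp):
    """Check that an HH:MM:SS string has in-range fields."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return 0 <= hours < 24 and 0 <= minutes < 60 and 0 <= seconds < 60
```

With a zero-based index, the seventh node (index 6) yields 03:60:00, which is invalid. The modulo variant stays valid but maps index 6 back onto the same slot as index 0, which is the collision a randomized delay is meant to avoid.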
* containerd: refactor handlers to use 'listen'
* cri-dockerd: refactor handlers to use 'listen'
* cri-o: refactor handlers to use 'listen'
* docker: refactor handlers to use 'listen'
* etcd: refactor handlers to use 'listen'
* control-plane: refactor handlers to use 'listen'
* kubeadm: refactor handlers to use 'listen'
* node: refactor handlers to use 'listen'
* preinstall: refactor handlers to use 'listen'
* calico: refactor handlers to use 'listen'
* kube-router: refactor handlers to use 'listen'
* macvlan: refactor handlers to use 'listen'
It was not 'false', which made some tasks (e.g. those using the
systemd-resolved template) effectively remove the default search domains.
This caused a DNS loop after rebooting the node or restarting the cluster,
so the localdns service didn't run correctly.
This makes native ansible features (dry-run, changed state) easier to
have, and should have a minimal performance impact, since it only runs
on the etcd members.
* Specify the runc path when we use the containerd container engine
and change the bin_dir path.
Signed-off-by: Jin Li <qlijin@gmail.com>
* Update roles/container-engine/containerd/templates/config.toml.j2
Co-authored-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
---------
Signed-off-by: Jin Li <qlijin@gmail.com>
Co-authored-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
The blockSize attribute from Calico IPPool resources cannot be changed
once set [1]. Consequently, we use the one currently defined when
configuring the existing IPPool, avoiding upgrade errors caused by trying to
change it.
In particular, this can be useful when calico_pool_blocksize default
changes in kubespray, which would otherwise force users to add an
explicit setting to their inventories.
[1]: https://docs.tigera.io/calico/latest/reference/resources/ippool#spec
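
The selection logic described above can be sketched in Python, assuming the existing IPPool has already been fetched and parsed into a dict. The function name `effective_block_size` is hypothetical and this is not the actual kubespray task.

```python
def effective_block_size(existing_pool, default_block_size):
    """Pick the blockSize to use when configuring an IPPool.

    existing_pool: the parsed IPPool resource as a dict, or None if the
    pool does not exist yet.
    default_block_size: the configured default (e.g. calico_pool_blocksize).
    """
    if existing_pool is not None:
        current = existing_pool.get("spec", {}).get("blockSize")
        if current is not None:
            # blockSize cannot be changed once set, so keep the current value.
            return current
    return default_block_size
```

A new pool gets the default, while an existing pool keeps its current value even if the default has changed since it was created.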
* modify variables.tf to accept AMI attributes via variables
* update README to guide users on utilizing variable-driven AMI configuration
* fix markdown lint error
This allows this task to work with a forks count > 10 and the default
configuration of sshd, which is to limit sessions to 10. (see
MaxSessions in sshd_config).
Since this is a delegate_to task, it connects to the same host (first
etcd) for each node in the cluster, thus easily going above 10.
Raising the ssh connection attempts allows for more robustness, without
decreasing the forks count or serialising the tasks, which could slow
the task (or the playbook as a whole, if decreasing forks).
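
To illustrate why more connection attempts help here, the following Python sketch retries a connection that is temporarily refused. It is a generic illustration, not how Ansible implements ssh retries; `connect_with_retries` and the `connect` callable are hypothetical names.

```python
def connect_with_retries(connect, attempts):
    """Call connect() up to `attempts` times, returning the first success.

    connect: a zero-argument callable that raises ConnectionError when the
    host refuses the session (for example because its session limit is hit).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return connect()
        except ConnectionError as error:
            # Another fork may be holding the session; try again.
            last_error = error
    raise last_error
```

A connection that is refused while other forks hold the available sessions can succeed on a later attempt, so the forks count does not have to be lowered.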