* Calico: align manifests with upstream
* allow enabling typha prometheus metrics
* Calico: enable eBPF support
* manage the kubernetes-services-endpoint configmap
* Calico: document the use of eBPF dataplane
* Calico: improve checks before deployment
* enforce disabling kube-proxy when using eBPF dataplane
* ensure calico_version is supported
* rename ansible groups to use _ instead of -
k8s-cluster -> k8s_cluster
k8s-node -> k8s_node
calico-rr -> calico_rr
no-floating -> no_floating
Note: kube-node,k8s-cluster groups in upgrade CI
need clean-up after v2.16 is tagged
* ensure old groups are mapped to the new ones
* AlmaLinux: ansible>2.9.19 is needed to know about AlmaLinux
* AlmaLinux: identify as a centos derrivative
* AlmaLinux: add AlmaLinux to checks for CentOS
* Use ansible_os_family to compare family and not distribution
* Remove contrib/vault
This is marked as broken since 2018 / 3dcb914607
This still reference apiserver.pem, not used since ddffdb63bf
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
* Finish nuking vault from the codebase
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
This replaces kube-master with kube_control_plane because of [1]:
The Kubernetes project is moving away from wording that is
considered offensive. A new working group WG Naming was created
to track this work, and the word "master" was declared as offensive.
A proposal was formalized for replacing the word "master" with
"control plane". This means it should be removed from source code,
documentation, and user-facing configuration from Kubernetes and
its sub-projects.
NOTE: The reason why this changes it to kube_control_plane not
kube-control-plane is for valid group names on ansible.
[1]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-cluster-lifecycle/kubeadm/2067-rename-master-label-taint/README.md#motivation
On CentOS 8 they seem to be ignored by default, but better be extra safe
This also make it easy to exclude other network plugin interfaces
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
By default Ansible stat module compute checksum, list extended attributes and find mime type
To find all stat invocations that really use one of those:
git grep -F stat. | grep -vE 'stat.(islnk|exists|lnk_source|writeable)'
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
In some environments, it might not be possible to ping the IP address
of the nodes, e.g., because ICMP echo is blocked.
This commit allows kubespray to be configured to disable the ping
check, while performing all other checks.
If crictl (and docker) binaries are deployed to the directories
that are not in standard PATH (e.g. /usr/local/bin), it is required
to specify full path to the binaries.
The task outputs the following warning:
TASK [kubernetes/preinstall : Enable ip forwarding]
[WARNING]: The value 1 (type int) in a string field was converted
to u'1' (type string). If this does not look like what you expect,
quote the entire value to ensure it does not change.
* fix flake8 errors in Kubespray CI - tox-inventory-builder
* Invalidate CRI-O kubic repo's cache
Signed-off-by: Victor Morales <v.morales@samsung.com>
* add support to configure pkg install retries
and use in CI job tf-ovh_ubuntu18-calico (due to it failing often)
* Switch Calico, Cilium and MetalLB image repos to Quay.io
Co-authored-by: Victor Morales <v.morales@samsung.com>
Co-authored-by: Barry Melbourne <9964974+bmelbourne@users.noreply.github.com>
* Enable Kata Containers for CRI-O runtime
Kata Containers is an OCI runtime where containers are run inside
lightweight VMs. This runtime has been enabled for containerd runtime
thru the kata_containers_enabled variable. This change enables Kata
Containers to CRI-O container runtime.
Signed-off-by: Victor Morales <v.morales@samsung.com>
* Set appropiate conmon_cgroup when crio_cgroup_manager is 'cgroupfs'
* Set manage_ns_lifecycle=true when KataContainers is enabed
* Add preinstall check for katacontainers
Signed-off-by: Victor Morales <v.morales@samsung.com>
Co-authored-by: Pasquale Toscano <pasqualetoscano90@gmail.com>
When stopping at the check of "Stop if ip var does not match local ips"
the error message is like:
fatal: [single-k8s]: FAILED! => {
"assertion": "ip in ansible_all_ipv4_addresses",
"changed": false,
"evaluated_to": false,
"msg": "Assertion failed"
}
That doesn't contain actual IP addresses and it is difficult to understand
what was wrong. This adds the error message which contain actual IP addresses
to investigate the issue if happens.
* calico: add constant calico_min_version_required
and verify current deployed version against it.
* calico: remove upgrade support with data migration
The tool was used pre v3.0.0 and is no longer needed.
* calico: remove old version support from tasks
* calico: remove old ver support from policy ctrl
* calico: remove old ver support from node
* canal: remove old ver support
* remove unused calicoctl download checksums
calico_min_version_required is the oldest version that can be installed
Older versions can be removed.
* remove podman cni plugin
* configure networkamanger global dns
* allow installation of python3-libselinux by disabling update repo temporary
* remove ipv4 section because it is not a valid configuration
* etcd: etcd-events doesn't depend on etcd_cluster_setup
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: remove condition already present on include_tasks
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: fix scaling up
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: use *access_addresses, do not delegate to etcd[0]
We want to wait for the full cluster to be healthy,
so use all the cluster addresses
Also we should be able to run the playbook when etcd[0] is down
(not tested), so do not delegate to etcd[0]
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: use failed_when for health check
unhealthy cluster is expected on first run, so use failed_when
instead of ignore_errors to remove scary red messages
Also use run_once
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* kubernetes/preinstall: ensure ansible_fqdn is up to date after changing /etc/hosts
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* kubernetes/master: regenerate apiserver cert if needed
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
The variable is defined in `kubernetes/preinstall` role and used in several roles. Since `kubernetes/preinstall` is not always included when `ansible-playbook` is run with tag selectors (see #5734 for reason), they will fail, or individual roles must copy the same fact definitions (as in #3846). Moving the definition to the always-included `kubespray-defaults` role will resolve the dependency problem.
- This solves issue #5721 & #5713 (dupes)
- Provide a cleaner default usage pattern for the download role
around etcd that supports 'host' and 'docker' properly
- Extract the 'etcdctl' as a separate task install piece and reuse it where
appropriate
- Update the kubeadm-etcd task to reflect the above change
* fedora coreos support
- bootstrap and new fact for
* fedora coreos support
- fix bootstrap condition
* fedora coreos support
- allow customize packages for fedora coreos bootstrap
* fedora coreos support
- prevent install ptyhon3 and epel via dnf for fedora coreos
* fedora coreos support
- handle all ostree like os in same way
* fedora coreos support
- handle all ostree like os in same way for crio
* fedora coreos support
- add fcos documentations
* download file
* download containers
* fix push image to nodes
* pull if none image on host
* fix
* improve docker image tag checks.
do not pull already cached images
* rebase fix merge conflict
* add support download_run_once when upgrade and scale cluster
add some test with download_run_once
* set default values to temp flag for every download cycle
* add save,load abilty for containerd and crio when download_run_once=true
* return redefine image save/load command to set_docker_image_facts.yml
* move set command to set_container_facts
* ctr in containerd_bin_dir
* fix order of ctr image export arguments
* temporary disable download_run_once for containerd and crio
due https://github.com/containerd/containerd/issues/4075
* remove unused files
* fix strict yaml linter warning and errors
* refactor logical conditions to pull and cache container images
* remove comment due lint check
* document role
* remove image_load_on_localhost, because cached images are always loaded to docker on remote sites
* remove XXX from debug output
* Fix incorrect assertion comparison for kube_network_node_prefix
* Ignore assertion comparison for kube_network_node_prefix when using calico
* Adding more var docs description for kube_network_node_prefix
* Fixing trailing whitespaces
* Fix python3-libselinux installation for RHEL/CentOS 8
In bootstrap-centos.yml we haven't gathered the facts,
so #5127 couldn't work
Minimum ansible version to run kubespray is 2.7.8,
so ansible_distribution_major_version is defined an there is no need to default it
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Restart NetworkManager for RHEL/CentOS 8
network.service doesn't exist anymore
# systemctl status network
Unit network.service could not be found.
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Add module_hotfixes=True to docker / containerd yum repo config
https://bugzilla.redhat.com/show_bug.cgi?id=1734081https://bugzilla.redhat.com/show_bug.cgi?id=1756473
Without this setting you end up with the following error:
# yum install docker-ce
Failed to set locale, defaulting to C
Last metadata expiration check: 0:03:21 ago on Thu Sep 26 22:00:05 2019.
Error:
Problem: package docker-ce-3:19.03.2-3.el7.x86_64 requires containerd.io >= 1.2.2-3, but none of the providers can be installed
- cannot install the best candidate for the job
- package containerd.io-1.2.2-3.3.el7.x86_64 is excluded
- package containerd.io-1.2.2-3.el7.x86_64 is excluded
- package containerd.io-1.2.4-3.1.el7.x86_64 is excluded
- package containerd.io-1.2.5-3.1.el7.x86_64 is excluded
- package containerd.io-1.2.6-3.3.el7.x86_64 is excluded
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Use python3-libselinux on RHEL8/Centos8
* The fact ansible_facts.distribution_major_version is not present on older Ansible version.
Default it to 0 in when not present and use libselinux-python as package to get current
default behaviour.
* Refactor calico-rr to run in k8s cluster with taint
Change-Id: I75a3169ff5b36ce8302fc7ef1c32d3eb697b5afa
* add preinstall checks
* rework calico/rr role
Change-Id: I2f0a7e6cb77cf91ad4a615923680760d2e5d9ca8
* add empty calico-rr group
Change-Id: I006c0a60db9b72d02245bf8fdfabcf982144a5ad
* add macvlan cni to kubespray
* macvlan: lint yaml files and fix sample config file
* macvlan: add OWNERS file
* add macvlan to README
* macvlan : CI first shoot
* macvlan : CI add full masquerade
* delegate retrive pod cidr to master only
* macvlan: add config for CI
* macvlan: add netchecker deployment
* Require minimum version of Kubernetes
* Remove checksums for kubernetes version 1.12
* Add kube_version to precheck output and add min required version to README
* Fix merge
* Fix defaults
* Fix typo in precheck
* Make local volume provisioner dir mode a variable
I need to change this for Nagios monitoring. Others may
need to as well. Had to close previous commits, sorry for
the spam.
* Make local volume provisioner dir mode a variable
I need to change this for Nagios monitoring. Others may
need to as well. Had to close previous commits, sorry for
the spam.
* Add support for arm images for hyperkube, kubeadm and cni_binary
* Add dummy etcd checksum for arm
This commit adds dummy etcd checksum for arm to avoid "no attribute" error
during setup.
* Add etcd host assert check
* Add 1.13.4 checksums of kubeadm and hyperkube for arm
* Update checksums of kubeadm and hyperkube for arm
* Add dummy checksums for calicoctl_binary_checksums dict
* disable gather_facts because it causes tests to fail
* Remove architecture check for etcd, due to unable to run tests
* Use K8s 1.14 and add kubeadm experimental control plane mode
This reverts commit d39c273d96.
* Cleanup kubeadm setup run on first master
* pin kubeadm_certificate_key in test
* Remove kubelet autolabel of kube-node, add symlink for pki dir
Change-Id: Id5e74dd667c60675dbfe4193b0bc9fb44380e1ca
Both kubedns and dnsmasq modes are long not maintained.
We should run dns_late steps at the end because sshd
makes DNS lookups during Ansible run and has 2s timeouts
for each failed lookup trying to connect to coredns before
it is ready.
This PR ensures that the e2fsprogs and xfsprogs packages are
installed on all Kubernetes nodes and that the packages are
the latest versions. It also ensures that the nodes can
create XFS filesystems when necessary, since not all distros
install xfsprogs by default.
e2fsprogs - ext2/ext3/ext4 file system utilities
xfsprogs - Utilities for managing the XFS filesystem
- Fixed an issue where storage class host directories were looped
through excessive target hosts
- Fixes examples in the LVP `README.md` to use nested dicts instead of a
list of dicts
* Makes local volume provisioner more dynamic
* Correct variable name in local storage provisioner defaults
* Updates external-provisioner readme
* Updates variable naming to be more clear, more documentation, fixes sample inventory
* Variable refactor, untangled some jinja2 loops
* Corrects variable name
* No variable substitution in dict keys, replaced with anchor
* Fixes default storage_classes dict, inline docs
* Fixes spelling in inline docs
* Addresses comments in review
* Updates all the defaults
* Fix failing CI task
* Fixes external provisioner daemonset
* Remove non-kubeadm deployment
* More cleanup
* More cleanup
* More cleanup
* More cleanup
* Fix gitlab
* Try stop gce first before absent to make the delete process work
* More cleanup
* Fix bug with checking if kubeadm has already run
* Fix bug with checking if kubeadm has already run
* More fixes
* Fix test
* fix
* Fix gitlab checkout untill kubespray 2.8 is on quay
* Fixed
* Add upgrade path from non-kubeadm to kubeadm. Revert ssl path
* Readd secret checking
* Do gitlab checks from v2.7.0 test upgrade path to 2.8.0
* fix typo
* Fix CI jobs to kubeadm again. Fix broken hyperkube path
* Fix gitlab
* Fix rotate tokens
* More fixes
* More fixes
* Fix tokens
When using resolvconf_mode host_resolvconf, there is an early DNS
config stage where Kubernetes cluster DNS is not injected for host
DNS intially. Later, the cluster DNS is enabled, but we do not
need to run every task from the kubernetes/preinstall role.
* warning on meta flush_handlers
* avoid rm
* avoid "Module remote_tmp /root/.ansible/tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To avoid this, create the remote_tmp dir with the correct permissions manually" warning on subsequent tasks using blockinfile
* is match
* failed
* version_compare
* succeeded
* skipped
* success
* version_compare becomes version since ansible 2.5
* ansible minimal version updated in doc and spec
* last version_compare
* [jjo] add kube-router support
Fixescloudnativelabs/kube-router#147.
* add kube-router as another network_plugin choice
* support most used kube-router flags via
`kube_router_foo` vars as other plugins
* implement replacing kube-proxy (--run-service-proxy=true) via
`kube_proxy_mode: none`, verified in a _non kubeadm_enabled_
install, should also work for recent kubeadm releases via
`skipKubeProxyInstall: true` config
* [jjo] address PR#3339 review from @woopstar
* add busybox image used by kube-router to downloads
* fix busybox download groups key
* rework kubeadm_enabled + kube_router_run_service_proxy
- verify it working ok w/the kubeadm_enabled and
kube_router_run_service_proxy true or false
- introduce `kube_proxy_remove` fact, to decouple logic
from kube_proxy_mode (which affects kubeadm configmap
settings, thus no-good to ab-use it to 'none')
* improve kube-router.md re: kubeadm_enabled and kube_router_run_service_proxy
* address @woopstar latest review
* add inventory/sample/group_vars/k8s-cluster/k8s-net-kube-router.yml
* fix kube_router_run_service_proxy conditional for kube-proxy removal
* fix kube_proxy_remove fact (w/ |bool), add some needed kube-proxy tags on my and existing changes
* update kube-router tolerations for 1.12 compatibility
* add PriorityClass to kube-router DaemonSet
The hosts(5) manpage clearly states that the first entry is the
"canonical name", or FQDN (Fully-Qualified Domain Name):
IP_address canonical_hostname [aliases...]
By using the alias as a first entry, `hostname -f` does not return the
correct domain which breaks all sorts of unrelated functionality (it
has impact over email server configuration, for example).
* [jjo] add DIND support to contrib/
- add contrib/dind with ansible playbook to
create "node" containers, and setup them to mimic
host nodes as much as possible (using Ubuntu images),
see contrib/dind/README.md
- nodes' /etc/hosts editing via `blockinfile` and
`lineinfile` need `unsafe_writes: yes` because /etc/hosts
are mounted by docker, and thus can't be handled atomically
(modify copy + rename)
* dind-host role: set node container hostname on creation
* add "Resulting deployment" section with some CLI outputs
* typo
* selectable node_distro: debian, ubuntu
* some fixes for node_distro: ubuntu
* cpu optimization: add early `pkill -STOP agetty`
* typo
* add centos dind support ;)
* add kubespray-dind.yaml, support fedora
- add kubespray-dind.yaml (former custom.yaml at README.md)
- rework README.md as per above
- use some YAML power to share distros' commonality
- add fedora support
* create unique /etc/machine-id and other updates
- create unique /etc/machine-id in each docker node,
used as seed for e.g. weave mac addresses
- with above, now netchecker 100% passes WoHooOO!
🎉🎉🎉
- updated README.md output from (1.12.1, verified
netcheck)
* minor typos
* fix centos node creation, needs earlier udevadm removal to avoid flaky facts, also verified netcheck Ok \o/
* add Q&D test-distros.sh, back to manual /etc/machine-id hack
* run-test-distros.sh cosmetics and minor fixes
* run-test-distros.sh: $rc fix and minor formatting changes
* run-test-distros.sh output cosmetics
- Local Volume StorageClass configuration is now manged by `local_volume_provisioner_storage_classes`, a list of maps that specifies local storage classes with `name` `host_dir` and `mount_dir` keys per entry
- Tasks and templates updated to loop through local volume storage classes
- Previous defaults for path/class names were not changed
- Fixed an issue where a `kubernetes/preinstall` was creating directories inconsistently with the `kubernetes-apps/external_provisioner/local_volume_provisioner` task
* calico upgrade to v3
* update calico_rr version
* add missing file
* change contents of main.yml as it was left old version
* enable network policy by default
* remove unneeded task
* Fix kubelet calico settings
* fix when statement
* switch back to node-kubeconfig.yaml