Commit Graph

5194 Commits (3d1653f95084b5326e038aa32224ae128dad5c0f)

Author SHA1 Message Date
kyrie cc0c3d73dc
fix reset/main.yml lsattr command error when kubelet has symbolic link (#11074)
Signed-off-by: KubeKyrie <shaolong.qin@daocloud.io>
2024-04-14 19:55:05 -07:00
kyrie dd0f42171f
fix kubespray-defaults: Check for bootstrap-os FQCN (#11073) 2024-04-14 18:21:11 -07:00
Barry M 1b870a1862
Update kubelet systemd service default allowed IP addresses for cluster hardening (#11061)
Signed-off-by: bmelbourne <barry.melbourne0@gmail.com>
2024-04-11 00:58:27 -07:00
J 8a423abd0f
Update Snapshot controller to v7.0.2 (#11041)
Upgrade Snapshot controller installed for all supported Kubernetes
versions to v7.0.2. Also update the manifests used to deploy the
Snapshot controller.
2024-04-10 20:38:08 -07:00
Barry M 3ec2e497c6
Update kubelet-csr-approver to v1.1.0 (#11070)
Signed-off-by: bmelbourne <barry.melbourne0@gmail.com>
2024-04-10 18:57:02 -07:00
Mathieu Parent 7844b8dbac
Promote nodelocaldns daemonset to system-node-critical (#11056)
As upstream
2024-04-09 19:48:01 -07:00
kyrie e87040d5ba
change debian8 network manage service from networking to systemd-networkd (#11058)
Signed-off-by: KubeKyrie <shaolong.qin@daocloud.io>
2024-04-09 06:50:39 -07:00
Sergey b2cce8d6dc
force update helm repo if exists on host (#11043) 2024-04-08 19:02:48 -07:00
Robert Volkmann 3067e565c0
Fix calico host local ipam (#11022)
* Prevent upgrade-ipam for host-local IPAM

Otherwise, the init container upgrade-ipam would clear the state of the host-local plugin, potentially causing it to reassign IPs that are still in use.

* USE_POD_CIDR required for host-local

4efd1bfd91/charts/calico/templates/calico-node.yaml (L279)
4efd1bfd91/charts/calico/templates/calico-typha.yaml (L133)
2024-04-03 00:52:31 -07:00
Nicolas Goudry c6fcbf6ee0
Remove access to cluster from anonymous users (#11016)
* feat: add user facing variable with default

* feat: remove rolebinding to anonymous users after init and upgrade

* feat: use file discovery for secondary control plane nodes

* feat: use file discovery for nodes

* fix: do not fail if rolebinding does not exist

* docs: add warning about kube_api_anonymous_auth

* style: improve readability of delegate_to parameter

* refactor: rename discovery kubeconfig file

* test: enable new variable in hardening and upgrade test cases

* docs: add option to config parameters

* test: multiple instances and upgrade
2024-04-02 23:54:12 -07:00
ERIK fdf5988ea8
revert crictl version (#11042)
Signed-off-by: bo.jiang <bo.jiang@daocloud.io>
2024-04-01 18:43:53 -07:00
Kay Yan a7d42824be
Merge pull request #11036 from mzaian/etcd-3512
[etcd] make etcd 3.5.12 default
2024-04-01 14:57:48 +08:00
peterw 9ef6678b7e
configure crio to use kube reserved cgroups (#11028) 2024-03-31 22:21:33 -07:00
Mohamed Omar Zaian 70a54451b1 [etcd] make etcd 3.5.12 default 2024-03-30 05:01:01 +01:00
Max Gautier c6758fe544
Cleanup of kubernetes/preinstall (#11010)
* Move fedora ansible python install to bootstrap-os

* /bin/dir is set in bootstrap-os

* Removing ansible_os_family workarounds

Support for these distributions was merged in Ansible, no need to
override it ourselves now.
https://github.com/ansible/ansible/pull/69324 openEuler
https://github.com/ansible/ansible/pull/77275/ UnionTech OS Server 20
https://github.com/ansible/ansible/pull/78232/ Kylin

* Don't unconditionally set VARIANT_ID=coreos in os-release

WTF, this is so wrong.
Furthermore, is_fedora_coreos is already handled in bootstrap-os

* Handle Clearlinux generically

Followup of 4eec302e86 (since we're using
package module anyway, let's get rid of the custom task)
2024-03-28 15:17:52 -07:00
itayporezky 10315590c7
Change hard-coded URLs to use variables (#11031) 2024-03-27 20:44:25 -07:00
Mohamed Omar Zaian 03ac02afe4
[kubernetes] Add hashes for kubernetes 1.29.3, 1.28.8, 1.27.12 (#11035) 2024-03-27 12:30:27 -07:00
Arthur Outhenin-Chalandre fd83ec9d91
kubespray-defaults: regenerate checksums and bump various versions (#10999)
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
2024-03-27 06:02:53 -07:00
Max Gautier c58497cde9
Refactor bootstrap-os (#10983)
* Remove leftover files for Coreos

Coreos was replaced by flatcar in 058438a25 but the file was copied
instead of moved.

* Remove workarounds for resolved ansible issues

* bootstrap: Use first_found to include per distro

Using ID and VARIANT_ID directly with first_found allows for fewer manual
includes.
Distro "families" are simply handled by symlinks.

* bootstrap: don't set ansible_python_interpreter

- Allows users to override the chosen python_interpreter with group_vars
  easily (group_vars have lesser precedence than facts)
- Allows us to use vars at the task scope to use a virtual env

Ansible python discovery has improved, so those workarounds should not
be necessary anymore.
Special workaround for Flatcar, due to upstream ansible not willing to
support it.
2024-03-27 05:58:53 -07:00
kyrie baf4842774
make kube-vip LeaderElection variables configurable (#11021)
Signed-off-by: KubeKyrie <shaolong.qin@daocloud.io>
2024-03-25 02:24:57 -07:00
Tom M e7d29715b4
Add kubelet_cpu_manager_policy_options (#11023) 2024-03-22 12:21:39 -07:00
ERIK 30da721f82
fix: config hostname as string type in kubeadmConf rendering (#10997)
Signed-off-by: bo.jiang <bo.jiang@daocloud.io>
2024-03-22 03:54:25 -07:00
Gary Miguel a1cf8291a9
spelling: scrapper -> scraper (#11015) 2024-03-15 07:34:30 -07:00
Max Gautier 7f6ca804a1
Upgrade ansible-core to 2.16.4 (#10984)
* upgrade ansible version

Needed for with_first_found to work correctly:
https://github.com/ansible/ansible/issues/70772 fixed in 2.16

* Remove unused google cloud cloud_playbook

* Fix dpkg_selection on non-existing packages

Needed since ansible-core>2.16, see:
f10d11bcdc
2024-03-14 02:12:45 -07:00
Clement Phu eff331ad32
Upgrade Nerdctl version to 1.7.4 (#10968) 2024-03-11 13:35:07 -07:00
Max Gautier 71fa66c08d
Delete old leftover script (#10996) 2024-03-11 13:28:00 -07:00
Ricky Kwan 69bf6639f3
Fix typo in selector (#10994) 2024-03-11 03:07:37 -07:00
Noam c275b3db37
update checksum for crio 1.29.1 (#10952)
* update checksum for crio 1.29.1

* update crio bin's names

* crio_conmon for 1.29

* remove unrequired change
2024-03-11 02:56:35 -07:00
Mohamed Omar Zaian 66eaba3775
[calico] Add hashes and make v3.27.2 default (#10960) 2024-03-10 00:20:17 -08:00
Kay Yan 90b0151caf
support node feature discovery (#10861)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
2024-03-05 08:36:08 -08:00
Clement Phu 04e40f2e6f
Add configuration to create cilium CNI plugin file when cilium>=1.14.0 (#10966) 2024-03-02 20:56:06 -08:00
Clement Phu 7a9def547e
Upgrade Helm to v3.14.2 (#10967) 2024-02-27 18:10:19 -08:00
Ludovic Logiou 26034b296e
Bump cinder-csi version and switch container registry (#10894)
* Bump cinder-csi version and switch container registry

Signed-off-by: Ludovic Logiou <ludovic.logiou@gmail.com>

* Update roles/kubespray-defaults/defaults/main/download.yml

Co-authored-by: Mohamed Omar Zaian <mohamedzaian@gmail.com>

---------

Signed-off-by: Ludovic Logiou <ludovic.logiou@gmail.com>
Co-authored-by: Mohamed Omar Zaian <mohamedzaian@gmail.com>
2024-02-22 05:06:40 -08:00
Ricky Kwan 5d822ad8cb
Support overriding cni directory owner (#10929) 2024-02-19 02:58:11 -08:00
ABW a0d2bda742
feat/add default ingress-nginx service (#10925)
feat/add default ingress-nginx service
2024-02-19 02:47:36 -08:00
R. P. Taylor 9442f28c60
do not disable SELinux surreptitiously (#10920) 2024-02-17 20:17:40 -08:00
Max Gautier 65b0604db7
download: Remove deleted kubeadm config field (#10931) 2024-02-16 05:08:43 -08:00
Mohamed Omar Zaian 082ac10fbb
[kubernetes] Add hashes for kubernetes 1.29.2, 1.28.7, 1.27.11 (#10919) 2024-02-16 01:40:58 -08:00
Max Gautier bf42ccee4e
Fix ingress-nginx controller election (#10913)
Under the original code, leader election failed for ingress controllers
as a result of mismatch between election-id in the controller config,
and the resourceName in the relevant rule of role 'ingress-nginx'.
This appeared in the controller logs.

To fix the issue, a command-line option was added to container
execution (--election-id=...).

Now, the election-id agrees with the resourceName provided in
the role-ingress-nginx.yml file. A comment in that file was
changed to reflect the new logic.

Co-authored-by: Vasilis Samoladas <vsam@softnet.tuc.gr>
Co-authored-by: Mohamed Omar Zaian <mohamedzaian@gmail.com>
2024-02-12 02:58:45 -08:00
Kundan Kumar bfbb3f8d33
updated ingress controller version (#10868) 2024-02-12 01:11:03 -08:00
Max Gautier ffda3656d1
Enable containerd 'discard_unpacked_layers' by default (#10905)
* containerd: Remove redundant 'default' filters

* containerd: enable 'discard_unpacked_layers' by default

This should help with containerd disk usage
2024-02-09 06:33:16 -08:00
Max Gautier f5474ec6cc
Don't try to set permissions recursively on cache+staging directory (#10900)
This should avoid permissions problems when the user creating the
directory and the user creating the content are different (when
containers images are saved by root for instances, because the user
can't use the container runtime).
2024-02-09 06:04:28 -08:00
Max Gautier 4b0a134bc9
Only download kubeadm images where needed (#10899)
* Refactor of kubeadm images listing

Instead of setting multiple facts, we directly create the dict we need from
kubeadm output.

* Remove useless 'default' filters in roles/download

* Only download kubeadm images where needed
2024-02-08 02:14:45 -08:00
flxbwr ad565ad922
Fix waiting for MetalLB controller (#10858)
The current method of waiting for the controller state is poorly implemented.
When the deployment version changes, which happens during upgrade_cluster in the previous ansible task "Kubernetes Apps | Install and configure MetalLB", the next ansible task "Kubernetes Apps | Wait for MetalLB controller to be running" may fail with an error.
2024-02-06 02:58:59 -08:00
Max Gautier 6f419aa18e
Revert "implement download mirrors support (#8474)" (#10884)
This reverts commit c6e5314fab.

The download mirrors support has had no users in kubespray for a
long time.
2024-02-06 00:48:29 -08:00
anders-elastisys c698790122
add nat_outgoing_ipv6 to calico defaults and docs (#10866) 2024-02-05 23:14:22 -08:00
Gianmarco Mameli 989ba207e9
task description modified (#10875) 2024-02-05 07:59:04 -08:00
Max Gautier f2bdd4bb2f
Fix logical error when checking for bootstrap-os (#10867)
Also remove some clutter along the way.
2024-02-05 07:58:55 -08:00
Kay Yan c9a44e4089
make docker 24.0 default (#10873)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
2024-02-04 21:55:19 -08:00
kyrie 0dbde7536f
make containerd 1.7.12 default and upgrade runc to v1.1.11 (#10862)
Signed-off-by: KubeKyrie <shaolong.qin@daocloud.io>
2024-02-01 04:06:08 -08:00
Victor Login 8d53c1723c
bump coredns version to 1.11.1 (#10719)
* update version coredns 1.11.1

* Update roles/kubespray-defaults/defaults/main/download.yml

Co-authored-by: Mohamed Omar Zaian <mohamedzaian@gmail.com>

---------

Co-authored-by: Mohamed Omar Zaian <mohamedzaian@gmail.com>
2024-02-01 03:28:20 -08:00
Mohamed Omar Zaian dce68e6839
[feat] Update metrics server to v0.7.0 (#10856) 2024-01-31 05:13:26 -08:00
Takuya Murakami 785366c2de
[kubernetes] Support kubernetes 1.29 (#10820)
* [kubernetes] Make kubernetes 1.29.1 default

* [cri-o]: support cri-o 1.29

Use "crio status" instead of "crio-status" for cri-o >=1.29.0

* Remove GAed feature gates SecCompDefault

The SecCompDefault feature gate was removed since k8s 1.29
https://github.com/kubernetes/kubernetes/pull/121246
2024-01-31 00:57:23 -08:00
Saber 1d119f1a3c
Fixed grammar (#10853) 2024-01-29 17:46:58 -08:00
Ugur Can Ozturk 7863fde552
[apiserver-kubelet/tracing]: add distributed tracing config variables (#10795)
* [apiserver-kubelet/tracing]: add distributed tracing config flags

Signed-off-by: Ugur Ozturk <ugurozturk918@gmail.com>

* [apiserver-kubelet/tracing]: add distributed tracing config flags - fix

Signed-off-by: Ugur Ozturk <ugurozturk918@gmail.com>

* [apiserver-kubelet/tracing]: add distributed tracing config flags - fix

Signed-off-by: Ugur Ozturk <ugurozturk918@gmail.com>

---------

Signed-off-by: Ugur Ozturk <ugurozturk918@gmail.com>
2024-01-25 10:24:35 +01:00
kimsehwan96 758d34a7d1 Fix typo mistake in roles/kubernetes/control-plane/tasks/define-first-kube-control.yml
- Fix 'Set fact joined_control_panes' into 'Set fact joined_control_planes'
2024-01-24 13:39:39 +01:00
Max Gautier c80f2cd573
Allow the DNS stack to be backward compatible with an old dns_domain (#10630)
Handle all old dns domains:
- for nodelocaldns: in the same server block as the current dns_domain
- for coredns: suffix rewrite of each of the old dns domains to the
  current one
2024-01-24 06:31:22 +01:00
Maxime Leroy ab0163a3ad
fix(kubernetes): taint nodes with kubectl (#10705)
Signed-off-by: Maxime Leroy <19607336+maxime1907@users.noreply.github.com>
2024-01-23 15:46:13 +01:00
Daniel Strufe 2eb588bed9
Update external huawei cloud controller to 0.26.6 (#10824)
* Update huaweicloud controller to 0.26.6

See <https://github.com/kubernetes-sigs/cloud-provider-huaweicloud/compare/v0.26.3...v0.26.6>

* Update huaweicloud sample to use 0.26.6
2024-01-23 09:28:00 +01:00
Louis Tu a88bad7947
Add scheduler plugins support (#10747)
Signed-off-by: tu1h <lihai.tu@daocloud.io>
2024-01-23 07:42:33 +01:00
Max Gautier 89d42a7716
Fix coredns_dual usage (#10821) 2024-01-22 18:36:16 +01:00
yun 13e1f33898
Correct the POLY1305 cipher suites by adding the suffix _SHA256 (#10641) 2024-01-22 18:00:52 +01:00
Alexander de2c4429a4
Enable configuring mountOptions, reclaimPolicy and volumeBindingMode … (#10450)
* Enable configuring mountOptions, reclaimPolicy and volumeBindingMode for cinder-csi StorageClasses

* Check if class.mount_options is defined at all, before generating the option list
2024-01-22 18:00:34 +01:00
Max Gautier 22bb0976d5
Adjust kubelet_event_record_qps to K8S default (#10826)
Also remove redundant check in the kubelet config template (we define a
default, so the setting will always be "true")
2024-01-22 17:49:14 +01:00
my-git9 5a405336ae
Support following k8s version selection pause image (#10756)
Signed-off-by: xin.li <xin.li@daocloud.io>
2024-01-22 17:28:09 +01:00
Yuhao Zhang 0e971a37aa
Offline control plane recover (#10660)
* ignore_unreachable for etcd dir cleanup

ignore_errors ignores errors that occur within the "file" module. However, when
the target node is offline, the playbook will still fail at this task
with node "unreachable" state. Setting "ignore_unreachable: true" allows
the playbook to bypass offline nodes and proceed with recovery
tasks on the remaining online nodes.

* Re-arrange control plane recovery runbook steps

* Remove suggestion to manually update IP addresses

The suggestion was added in 48a182844c 4
years ago. But a new task added 2 years ago, in
ee0f1e9d58, automatically updates the API
server arg with the updated etcd node IP addresses. This suggestion is no
longer needed.
2024-01-22 17:22:27 +01:00
Noam 3e7b568d3e
crictl allow setting grace period for stop containers upon reset (#10651)
* crictl allow setting different grace period for stop containers and pods

* correct grace period location
2024-01-22 17:11:08 +01:00
kyrie a45a40a398
update kube-version-min-required to v1.27 (#10817) 2024-01-22 14:26:12 +01:00
Takuya Murakami 4cb1f529d1
[kubernetes] Add hashes for kubernetes 1.29.0 and 1.29.1 (#10778)
* Add hashes of crictl and crio
* Add versions of etcd, crictl, crio and csi-snapshotter
2024-01-22 09:39:15 +01:00
Mohamed Omar Zaian 64447e745e
[kubernetes] Make kubernetes v1.28.6 default (#10810) 2024-01-19 09:07:27 +01:00
Max Gautier b7a83531e7
etcd: update to v3.5.10 (#10798) 2024-01-17 09:50:48 +01:00
Kay Yan a0a2f40295
add containerd config override_path (#10776) 2024-01-16 14:15:53 +01:00
lobiyed.karim 7b7c9f509e
Add PodDisruptionBudget for CoreDNS deployment. Allows users to control disruption behavior and set maximum unavailable pods (#10557) 2024-01-16 10:04:47 +01:00
Louis Tu 3f78bf9298
Fix incorrect ciliumcli binary (#10575)
Signed-off-by: tu1h <lihai.tu@daocloud.io>
2024-01-16 05:23:00 +01:00
Gaëtan Trellu 50fbfa2a9a
Fix PyYAML package name on SLES and openSUSE (#10794) 2024-01-15 04:21:08 +01:00
Gaëtan Trellu 747d8bb4c2
Fix ntp installation on SLES and openSUSE (#10786) 2024-01-12 04:03:35 +01:00
Serge Hartmann bb67d9524d
Fix crio_version version comparison (#10780)
Signed-off-by: serge Hartmann <serge.hartmann@gmail.com>
2024-01-11 11:49:35 +01:00
Kay Yan 8c09c3fda2
fix image pull in insecure-registry (#10775) 2024-01-09 10:20:16 +01:00
Louis Tu a656b7ed9a
Add kube_vip_lb_fwdmethod option for kube-vip (#10762)
Signed-off-by: tu1h <lihai.tu@daocloud.io>
2024-01-09 08:22:13 +01:00
Kay Yan 2e8b72e278
fix disable swap in centos (#10751) 2024-01-08 17:38:14 +01:00
Louis Tu ddf5c6ee12
Update coredns rolling update strategy (#10748)
Signed-off-by: tu1h <lihai.tu@daocloud.io>
2024-01-08 17:38:05 +01:00
Ryan Lonergan eda7ea5695
feat: add support for Cilium 1.14 (#10684)
* update cilium configmap template for new routing mode and tunnel-protocol options
Ryan Lonergan ryan.tlonergan@gmail.com

* add rbac for new cilium crd in 1.14
Ryan Lonergan ryan.tlonergan@gmail.com

* add conditional for cni-install.sh that's no longer included in cilium 1.14
Ryan Lonergan ryan.tlonergan@gmail.com

* Update roles/network_plugin/cilium/templates/cilium/ds.yml.j2

Co-authored-by: Cyclinder <qifeng.guo@daocloud.io>

---------

Co-authored-by: Cyclinder <qifeng.guo@daocloud.io>
2024-01-08 02:43:02 +01:00
刘旭 08c0b34270
[cert-manager] upgrade to v1.13.2 (#10616) 2024-01-05 04:45:10 +01:00
Romain 1a86b4cb6d
Fix download retry when get_url has no status_code. (#10613)
* Fix download retry when get_url has no status_code.

* Fix until clause in download role.
2024-01-04 04:00:47 +01:00
Mohamed Omar Zaian aea150e5dc
[kubernetes] Make kubernetes v1.28.5 default (#10739)
* Add hashes for kubernetes 1.29.0, 1.28.5, 1.27.9, 1.26.12
2023-12-21 17:30:45 +01:00
Andrei Costescu c3b674526d
Fix modprobe module on Flatcar (#10678)
* Fix modprobe module on Flatcar

* Add todo about upstream issue report
2023-12-21 16:16:34 +01:00
Kay Yan 565eab901b
remove containerd registries (#10738) 2023-12-21 10:01:12 +01:00
Max Gautier c3315ac742
systemd-resolved: use a drop-in for kubespray dns (#10732)
This avoids needlessly overriding things and makes cleanup easier.
Also simplifies the template a bit.
2023-12-21 09:52:14 +01:00
Olivier Levitt 29ea790c30
Fix calico-node in etcd mode (#10438)
* Calico : add ETCD endpoints to install-cni container

* Calico : remove nodename from configmap in etcd mode
2023-12-19 04:09:06 +01:00
Ugur Can Ozturk ae780e6a9b
[etcd]: add etcd distributed tracing flags (#10666)
* [etcd]: add etcd distributed tracing flags

Signed-off-by: Ugur Ozturk <ugurozturk918@gmail.com>

* [etcd]: add etcd distributed tracing flags - fix

Signed-off-by: Ugur Ozturk <ugurozturk918@gmail.com>

* [etcd]: add etcd distributed tracing flags - fix

Signed-off-by: Ugur Ozturk <ugurozturk918@gmail.com>

---------

Signed-off-by: Ugur Ozturk <ugurozturk918@gmail.com>
2023-12-19 04:00:10 +01:00
Max Gautier 471326f458
Remove PodSecurityPolicy support and references (#10723)
This has been removed from kubernetes since 1.25, time to cut some dead code.
2023-12-18 14:13:43 +01:00
Michael Kebe d435edefc4
Removed DEPRECATED --logtostderr from metrics-server (#10709)
The --logtostderr is deprecated.

https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components
2023-12-14 22:49:28 +01:00
刘旭 eb73f1d27d
support disable dns autoscaler when use CoreDNS (#10608) 2023-12-14 10:03:34 +01:00
Mohamed Omar Zaian ccb742c7ab
[containerd] add hashes for versions 1.6.25-26 and 1.7.9-11 make v1.7.11 default (#10671) 2023-12-12 17:53:32 +01:00
jandres - moscardo cb848fa7cb
New PR default node selector (#10607) 2023-12-12 14:51:26 +01:00
Max Gautier 8abf49ae13
Disable podCIDR allocation from control-plane when using calico (#10639)
* Disable control plane allocating podCIDR for nodes when using calico

Calico does not use the .spec.podCIDR field for its IP address
management.
Furthermore, it can cause false positives from the kube controller manager if
kube_network_node_prefix and calico_pool_blocksize are unaligned, which
is the case with the default shipped by kubespray.

If the subnets obtained from using kube_network_node_prefix are bigger,
this would result at some point in the control plane thinking it does
not have subnets left for a new node, while calico will work without
problems.

Explicitly set a default value of false for calico_ipam_host_local to
facilitate its use in templates.

* Don't default to kube_network_node_prefix for calico_pool_blocksize

They have different semantics: kube_network_node_prefix is intended to
be the size of the subnet for all pods on a node, while there can be
more than one calico block of the specified size (they are allocated on
demand).

Besides, this commit does not actually change anything, because the
current code is buggy: we don't ever default to
kube_network_node_prefix, since the variable is defined in the role
defaults.
2023-12-12 14:38:36 +01:00
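
The mismatch described in the commit above can be illustrated with a little subnet arithmetic. This is only a sketch: the pod CIDR, node prefix and block size below are assumed values chosen for illustration, not figures taken from the commit, and `count_subnets` is a hypothetical helper.

```python
import ipaddress

# Illustrative values only (assumptions, not taken from the commit message).
POD_CIDR = ipaddress.ip_network("10.233.64.0/18")
KUBE_NETWORK_NODE_PREFIX = 24   # per-node subnet size the control plane would allocate
CALICO_POOL_BLOCKSIZE = 26      # size of each Calico IPAM block, allocated on demand


def count_subnets(network, new_prefix):
    """Return how many subnets of the given prefix length fit in the network."""
    return 2 ** (new_prefix - network.prefixlen)


node_cidrs = count_subnets(POD_CIDR, KUBE_NETWORK_NODE_PREFIX)
calico_blocks = count_subnets(POD_CIDR, CALICO_POOL_BLOCKSIZE)

print(f"podCIDRs the control plane could hand out: {node_cidrs}")
print(f"Calico blocks available in the same pool: {calico_blocks}")
```

Under these assumed values the control plane would consider the pool exhausted once every per-node subnet is assigned, while Calico could still allocate blocks from the same range.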
Max Gautier 81a3f81aa1
Revert "Update etcd-servers for apiserver (#8253)" (#10652)
This reverts commit ee0f1e9d58.

Avoid restarting all api servers at once by changing their config.
2023-12-12 11:22:38 +01:00
Max Gautier 0fb404c775
etcd: use dynamic group for certs generation check (#10610)
We take advantage of group_by to create the list of nodes needing new
certs, instead of manually looping inside a Jinja template.

This should make the role more readable and less susceptible to
white space problems.
2023-12-12 11:22:29 +01:00
Max Gautier 51069223f5
Decouple kubespray-defaults from download (#10626)
* Decouple role kubespray-defaults from download

Avoids re-importing the download role on every invocation of
kubespray-defaults (and skipping everything).

This has a measurable effect on playbook performance.

* Update docs referring to moved download defaults
2023-12-11 16:56:17 +01:00
David Leadbeater 17b51240c9
Remove legacy crio packaging cleanup (#10702)
This has now been removed and results in a 404 when trying to remove the
old key, even if it's not present.
2023-12-11 15:41:13 +01:00
piwinkler eb628efbc4
Update 0040-verify-settings.yml (#10699)
remove embedded template
2023-12-11 10:56:13 +01:00
Max Gautier 2c3ea84e6f
Use systemd for disabling swap when it's used (#10587)
* Mask systemd swap.target to disable swap

This is a more generic way to disable swap, since it pulls .swap units
in systemd distributions; fstab is only one way to generate .swap units.

* Unconditionally disable swap

We only care to disable it (the "swapon" registered variable is not used
anywhere else).
This allows us to get rid of the ignore_errors, since this was added
because swapon.stdout does not exist in check_mode (see issue #6642).

* Don't explicitly disable swapOnZram

We're already masking the swap.target, which would pull the zram unit,
hence no need to handle zram-generator specifically.
2023-12-07 13:26:21 +01:00
Max Gautier 85f15900a4
Remove unneeded workaround for removing kubeadm DNS (#10695)
Kubeadm dns phase is correctly skipped.
This was a workaround for kubernetes/kubeadm#1557, which was actually
not a bug ; the correct fix was #4867
2023-12-07 12:54:15 +01:00
Mohamed Omar Zaian a9321aaf86
[calico] Add version 3.26.4 and make it default (#10669) 2023-12-06 03:05:33 +01:00
Kay Yan fe02d21d23
update nerdctl to v1.7.1 (#10685)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
2023-12-05 19:00:41 +01:00
Kay Yan 5160e7e20b
using ctr pull instead of nerdctl (#10687) 2023-12-05 16:00:55 +01:00
Alexander c440106eff
add dnsPolicy: ClusterFirstWithHostNet to DaemonSets with hostNetwork: true value to avoid DNSConfigFormat events (#10618) 2023-12-05 02:52:17 +01:00
Mohamed Omar Zaian 75fecf1542
Update nodelocaldns version (#10621) 2023-11-29 12:19:36 +01:00
Max Gautier 0d7bdc6cca
pre-upgrade cleanup (#10656)
* Clean up redundant defaulting

drain_{timeout,grace_period}_after_failure don't exist at this point, so
they always default.

* Remove useless facts

The drain_*_after_failure are never used
2023-11-28 22:49:56 +01:00
chansuke c87d70b04b [cert-manager] Upgrade to v1.12.6 2023-11-28 22:42:50 +01:00
Max Gautier 612cfdceb1
Check conntrack module presence instead of kernel version (#10662)
* Try both conntrack modules instead of checking kernel version

Depending on kernel distributor, the kernel version might not be a
correct indicator of the conntrack module use.
Instead, we check both (and use the first found).

* Use modprobe.persistent rather than manual persistence
2023-11-28 18:31:02 +01:00
ERIK 70bb19dd23
fix copy etcdctl retries (#10634)
Signed-off-by: bo.jiang <bo.jiang@daocloud.io>
2023-11-28 10:52:03 +01:00
Max Gautier 94d3f65f09
ipaddr (deprecated alias) => ansible.utils.ipaddr (#10650) 2023-11-28 09:56:55 +01:00
Valerii Kretinin cf3ac625da
revert env section deletion (#10655) 2023-11-28 09:47:46 +01:00
Max Gautier c2e3071a33
kubespray-defaults: Check for bootstrap-os FQCN (#10590)
When installed as an ansible collection, roles in
ansible_play_role_names will be designated by their FQCN (i.e.
'kubernetes-sigs.kubespray.<role-name>').

It means we need to check for both when checking for roles in the play.
2023-11-28 09:23:46 +01:00
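
The dual check described in the commit above can be sketched in Python. This is an analogy rather than the project's actual Ansible expression: `role_in_play` is a hypothetical helper, and the collection prefix is simply copied from the commit message.

```python
# Hypothetical helper; the prefix string is copied from the commit message.
COLLECTION_PREFIX = "kubernetes-sigs.kubespray"


def role_in_play(role_name, play_role_names, prefix=COLLECTION_PREFIX):
    """Return True if the role is listed under its short or fully qualified name."""
    candidates = {role_name, f"{prefix}.{role_name}"}
    return any(name in candidates for name in play_role_names)


# Roles as they might appear when run from a checkout vs. from a collection.
print(role_in_play("bootstrap-os", ["bootstrap-os", "download"]))
print(role_in_play("bootstrap-os", ["kubernetes-sigs.kubespray.bootstrap-os"]))
print(role_in_play("bootstrap-os", ["download"]))
```

Checking both spellings keeps the condition working whether the playbook is run from the repository or from an installed collection.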
Max Gautier 21e8b96e22
Drop the drain check for kubectl > v1.10.0 (#10657)
Older versions have been unsupported for a long time.
2023-11-28 03:14:51 +01:00
Samuel Liu 3acacc6150
add kube_apiserver_etcd_compaction_interval (#10644) 2023-11-27 05:37:33 +01:00
Mohamed Omar Zaian b321ca3e64
[kubernetes] Add hashes for kubernetes 1.28.4, 1.27.8, 1.26.11 (#10624) 2023-11-24 03:22:55 +01:00
AbhishekKr 6b1188e3dc
[fix] modprobe_nf_conntrack for new Linux Kernel, when using ipvs (#10625)
Signed-off-by: AbhishekKr <abhikumar163@gmail.com>
2023-11-20 09:48:06 +01:00
Max Gautier 0d4f57aa22
Validate systemd unit files (#10597)
* Validate systemd unit files

This ensures that we fail early if we have a bad systemd unit file
(syntax error, using a version not available in the local version, etc)

* Hack to check systemd version for service files validation

factory-reset.target was introduced in systemd 250, the same version as the
aliasing feature we need for verifying systemd services with ansible.
So we only actually execute the validation if that target is present.

This is a horrible hack which should be reverted as soon as we drop
support for distributions with systemd<250.
2023-11-17 20:01:23 +01:00
刘旭 bc5b38a771
support CoreDNS use host network and config dns port (#10617) 2023-11-17 14:41:53 +01:00
Lukáš Kubín f46910eac3
Add helm support for custom_cni deployment (#10529)
* Add helm support for custom_cni deployment

* Linting correction

* Ansible linting correction

* Add test packet with values

Signed-off-by: Lukáš Kubín <lukas.kubin@gmail.com>

* Add custom_cni configuration file with comments

Signed-off-by: Lukáš Kubín <lukas.kubin@gmail.com>

* Default values cleanup

Signed-off-by: Lukáš Kubín <lukas.kubin@gmail.com>

* Add details to custom_cni configuration file

Signed-off-by: Lukáš Kubín <lukas.kubin@gmail.com>

* Set correct yaml type of helm values

Signed-off-by: Lukáš Kubín <lukas.kubin@gmail.com>

* Set CNI filesystem ownership to root

Signed-off-by: Lukáš Kubín <lukas.kubin@gmail.com>

* Update cilium example parameter name

Signed-off-by: Lukáš Kubín <lukas.kubin@gmail.com>

---------

Signed-off-by: Lukáš Kubín <lukas.kubin@gmail.com>
2023-11-16 00:32:21 +01:00
Khanh Ngo Van Kim adb8ff14b9
fix: invalid version check in containerd jinja-template config (#10620) 2023-11-15 16:06:42 +01:00
Arthur Outhenin-Chalandre 7ba85710ad
Update to ansible 2.15 (#10481)
* ansible: upgrade to version >= 2.15.5

Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>

* tests: update requirements

Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>

* contrib/openstack: fix wrong gitignore pattern

Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>

* tests: add missing tzdata requirement

Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>

* tests: remove some molecules tests

Those don't work in Ansible 2.15. Ansible can't load builtin now
apparently and these tests are not worth it.

Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>

---------

Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
2023-11-15 09:39:09 +01:00
Noam cbd3a83a06
add option to enable cdi for containerd (#10603) 2023-11-14 17:20:19 +01:00
Eeo Jun eb015c0362
configure cluster-name for hubble relay (#10614) 2023-11-13 19:22:40 +01:00
Patrick O'Brien 17681a7e31
fallback_ips: ignore unreachable hosts (#10601)
Sets ignore_unreachable: true to `Gather ansible_default_ipv4 from all hosts`
task from fallback_ips.yml

Without this scale.yml will fail if a single node in the cluster is down, which
for large clusters happens often.
2023-11-10 21:07:18 +01:00
Mohamed Omar Zaian cca7615456
Update checksums (#10606) 2023-11-09 16:43:04 +01:00
Samuel Mutel a4b15690b8
fix: Same nameservers for resolv.conf and dhcp (#10548) 2023-11-08 16:57:45 +01:00
Louis Tu 32743868c7
Add cri-o criu support (#10479)
Signed-off-by: tu1h <lihai.tu@daocloud.io>
2023-11-08 16:57:32 +01:00
yun 7d221be408
Remove crio package configuration (#10584)
* Remove crio package configuration

* Remove crio package config directly without loop
2023-11-08 16:29:42 +01:00
Denis 2d75077d4a
fix: (#10197)
The "Remove cri-o apt repo" job has state present but needs absent
The "Uninstall CRI-O packages" job uses the undefined variable crio_packages,
replaced by a list of packages
2023-11-08 16:22:39 +01:00
borgiacis 802da0bcb0
Create variables for ipvs kernel modules (#10580)
* Create variables for ipvs kernel modules

* Corrected kubernetes role node task missing name

* Added changes as suggested during review by VannTen
2023-11-08 12:44:02 +01:00
Seal1998 6305dd39e9
Metallb --lb-class cmd arg to support multiple LoadBalancer implementations (#10550)
* metallb --lb-class cmd arg to support multiple load balancer implementations

* removed loadbalancer_class from metallb_config; metallb_loadbalancer_class in role defaults
2023-11-08 12:43:48 +01:00
Max Gautier b3f6d05131
Move control plane certs renewal "spread out" into the systemd timer (#10596)
* Use RandomizedDelaySec to spread out control plane certificates renewal

If there are more than 6 control plane nodes, using (index * 10
minutes) will fail (03:60:00 is not a valid timestamp).

Compared to just fixing the jinja expression (to use a modulo for
example), this should avoid having the certificates update
triggered on two control plane nodes at the same time.

* Make k8s-certs-renew.timer Persistent

If the control plane happens to be offline during the scheduled
certificates renewal (node failure or anything like that), we still want
the renewal to happen.
2023-11-08 12:35:20 +01:00
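
The arithmetic problem behind the commit above is easy to reproduce. The sketch below assumes a zero-based node index and a start time rendered as 03:MM:00; both function names are hypothetical and only mirror the "(index * 10 minutes)" scheme mentioned in the message.

```python
def staggered_start(index, step_minutes=10):
    """Render the per-node start time as 03:<index * step>:00 (assumed format)."""
    return f"03:{index * step_minutes:02d}:00"


def is_valid_clock_time(value):
    """Check that an HH:MM:SS string stays within normal clock ranges."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return 0 <= hours < 24 and 0 <= minutes < 60 and 0 <= seconds < 60


# Walk through seven nodes; the last one overflows the minutes field.
for index in range(7):
    start = staggered_start(index)
    print(index, start, "valid" if is_valid_clock_time(start) else "INVALID")
```

A randomized delay on the timer sidesteps this overflow, since no per-node timestamp has to be computed at all.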
Max Gautier 8ebeb88e57
Refactor "multi" handlers to use listen (#10542)
* containerd: refactor handlers to use 'listen'

* cri-dockerd: refactor handlers to use 'listen'

* cri-o: refactor handlers to use 'listen'

* docker: refactor handlers to use 'listen'

* etcd: refactor handlers to use 'listen'

* control-plane: refactor handlers to use 'listen'

* kubeadm: refactor handlers to use 'listen'

* node: refactor handlers to use 'listen'

* preinstall: refactor handlers to use 'listen'

* calico: refactor handlers to use 'listen'

* kube-router: refactor handlers to use 'listen'

* macvlan: refactor handlers to use 'listen'
2023-11-08 12:28:30 +01:00
Mohamed Omar Zaian f3332af3f2
[containerd] add hashes for version 1.7.8 (#10589) 2023-11-03 16:45:15 +01:00
Boris Barnier 870065517f
[kube-router] set version to 2.0.0 (#10503)
Signed-off-by: Boris Barnier <bozzo@users.noreply.github.com>
2023-11-02 11:19:57 +01:00
Mohamed Omar Zaian 267a8c6025
[ingress-nginx] upgrade to 1.9.4 (#10583) 2023-11-02 04:02:24 +01:00
Hedayat Vatankhah (هدایت) edff3f8afd
Set remove_default_searchdomains to false by default (#10554)
It was not 'false', which made some tasks (e.g. using the systemd-resolved
template) effectively remove default search domains; this caused a DNS loop
after rebooting the node/restarting the cluster, so the localdns service didn't
run correctly.
2023-11-01 03:33:57 +01:00
yun cdc8d17d0b
Check nameserver when dns is enabled (#10561) 2023-11-01 03:07:06 +01:00
Max Gautier 8f0e553e11
etcd/backup: native ansible modules instead of shell (#10540)
This makes native ansible features (dry-run, changed state) easier to
have, and should have a minimal performance impact, since it only runs
on the etcd members.
2023-10-30 20:05:28 +01:00
chansuke 5f9a7b9d49
[cert-manager] Upgrade to v1.12.5 (#10500) 2023-10-30 18:51:35 +01:00
qlijin af7bc17c9a
Specify the runc path when we use the containerd container engine and change the bin_dir path. (#10154)
* Specify the runc path when we use the containerd container engine
and change the bin_dir path.

Signed-off-by: Jin Li <qlijin@gmail.com>

* Update roles/container-engine/containerd/templates/config.toml.j2

Co-authored-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>

---------

Signed-off-by: Jin Li <qlijin@gmail.com>
Co-authored-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
2023-10-30 17:54:31 +01:00
yun becb6267fb
Set default remove_default_searchdomains to false (#10533) 2023-10-30 17:37:52 +01:00
Max Gautier 34754ccb38
Use calico_pool_blocksize from cluster when existing (#10516)
The blockSize attribute from Calico IPPool resources cannot be changed
once set [1]. Consequently, we use the one currently defined when
configuring the existing IPPool, avoiding upgrade errors by trying to
change it.

In particular, this can be useful when calico_pool_blocksize default
changes in kubespray, which would otherwise force users to add an
explicit setting to their inventories.

[1]: https://docs.tigera.io/calico/latest/reference/resources/ippool#spec
2023-10-30 17:37:43 +01:00
Mohamed Omar Zaian 7a0030b145
Change default cri-o versions for Kubernetes 1.26 (#10565) 2023-10-30 17:23:32 +01:00
Louis Tu fa9e41047e
Add kubectl alias support (#10552)
Signed-off-by: tu1h <lihai.tu@daocloud.io>
2023-10-30 17:23:19 +01:00
Mohamed Omar Zaian f5f1f9478c
[argocd] update argocd to v2.8.4 (#10568) 2023-10-30 12:54:26 +01:00
Mohamed Omar Zaian 6a70f02662
[helm] upgrade to 3.13.1 (#10567) 2023-10-30 04:32:52 +01:00