* etcd: throttle restart for availability
During upgrade, etcd member are restarted all at once.
This can impact the availability of the etcd cluster and subsequently of
the Kubernetes cluster.
Limit the concurrent restart so that the etcd cluster can keep quorum.
* Simplify etcd handlers
* Feat: add external OCI cloud controller manager template & variable
Signed-off-by: tico88612 <17496418+tico88612@users.noreply.github.com>
* Feat: add external OCI cloud controller manager workflow
Signed-off-by: tico88612 <17496418+tico88612@users.noreply.github.com>
* Feat: migrate external OCI CCM config check from OCI cloud provider
Signed-off-by: tico88612 <17496418+tico88612@users.noreply.github.com>
* cloud_controller: oracle: simpler asserts
Make the asserts check for Oracle Cloud Infrastructure external cloud
controller more compact, and hence readable.
Allows to put them back in the main tasks for less back and forth when
reading the code.
---------
Signed-off-by: tico88612 <17496418+tico88612@users.noreply.github.com>
Co-authored-by: Max Gautier <mg@max.gautier.name>
* Feat: bump CoreDNS version to v1.11.3
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
* Docs: update README.md CoreDNS version to v1.11.3
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
---------
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
Simplify registry mirror rendering in config.toml.
The map filter can extract the host list from mirrors so we can
just unique them and render them without needing to construct vars
for it.
For the registry mirror tls section, we can first extract mirrors
from the dict then filter on only the ones having skip_veridy defined
first and then filter on the ones having true (as the dict might not
have skip_verify defined and that would cause errors of undefined var).
This will speed up and simply the templating.
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
The fallback_ips tasks are essentially serializing the gathering of one
fact on all the hosts, which can have dramatic performance implications
on large clusters (several minutes).
This is essentially a reversal of 35f248dff0
Being able to run without refreshing the cache facts is not worth it.
We keep fallback_ip for now, simply changing the access to a normal
hostvars variable instead of a custom dictionnary.
* Update cluster-role for cilium to prevent errors in agent startup
ciliumloadbalancerippools permissions exists in the cilium helm chart for version 1.13.0
https://github.com/cilium/cilium/blob/v1.13.0/install/kubernetes/cilium/templates/cilium-agent/clusterrole.yaml#L71
The agent also needs permissions to read/watch secrets for bgp auth secrets when using CiliumBGPPeeringPolicy with a secret.
* Remove list/watch permissions for secrets
* Remove secrets from list/watch permissions
The old repository for these has been deleted, leaving the previous
configuration not possible to deploy, and even currently running clusters
fail after a restart as the DeameonSet has ImagePullPolicy: Always. More
details can be found here: kubernetes-sigs/vsphere-csi-driver#3053
As of writing, only CSI driver versions 3.1.2 to 3.3.1 is available in
this registry. This "officially" supports Kubernetes 1.26 to 1.30. Since
older drivers are not available, I have removed some feature-gating for
those unavailable versions while I was at it. For the cloud provider,
the `latest` image is now missing, and only 1.28.0 to 1.31.0 are
available. I've set the latest of these as the new default.
I also updated the documented default versions, as they were all out of
date and not aligned with actual code defaults.