* Improve control plane scale flow (#13)
* Added version 1.20.10 of K8s
* Setting first_kube_control_plane to a existing one
* Setting first_kube_control_plane to a existing one
* change first_kube_master for first_kube_control_plane
* Ansible-lint changes
kube-bench scan outputs warning related to Calico like:
* text: "Ensure that the Container Network Interface file
permissions are set to 644 or more restrictive (Manual)"
* text: "Ensure that the Container Network Interface file
ownership is set to root:root (Manual)"
This fixes these warnings.
When using Calico with:
- `calico_network_backend: vxlan`,
- `calico_ipip_mode: "Never"`,
- `calico_vxlan_mode: "Always"`,
the `FelixConfiguration` object has `ipipEnabled: true`, when it should be false:
This is caused by an error in the `| bool` conversion in the install task:
when `calico_ipip_mode` is `Never`,
`{{ calico_ipip_mode != 'Never' | bool }}` evaluates to `true`:
* Ansible: move to Ansible 3.4.0 which uses ansible-base 2.10.10
* Docs: add a note about ansible upgrade post 2.9.x
* CI: ensure ansible is removed before ansible 3.x is installed to avoid pip failures
* Ansible: use newer ansible-lint
* Fix ansible-lint 5.0.11 found issues
* syntax issues
* risky-file-permissions
* var-naming
* role-name
* molecule tests
* Mitogen: use 0.3.0rc1 which adds support for ansible 2.10+
* Pin ansible-base to 2.10.11 to get package fix on RHEL8
* Calico: align manifests with upstream
* allow enabling typha prometheus metrics
* Calico: enable eBPF support
* manage the kubernetes-services-endpoint configmap
* Calico: document the use of eBPF dataplane
* Calico: improve checks before deployment
* enforce disabling kube-proxy when using eBPF dataplane
* ensure calico_version is supported
* Calico: add v3.19.1 hashes
* enable liveness probe for calico-kube-controllers
3.19.1
* Calico: drop support for v3.16.x
* Calico: promote v3.18.3 as default
* add initial MetalLB docs
* metallb allow disabling the deployment of the metallb speaker
* calico>=3.18 allow using calico to advertise service loadbalancer IPs
* Document the use of MetalLB and Calico
* clean MetalLB docs
Since K8S 1.21, BoundServiceAccountTokenVolume feature gate is in beta stage, thus activated by default (anyone who follows CSI guidelines has enabled AllAlpha and faced the issue before 1.21).
With this feature, SA tokens are regenerated every hour.
As a consequence for Calico CNI, token in /etc/cni/net.d/calico-kubeconfig copied from /var/run/secrets/kubernetes.io/serviceaccount in install-cni initContainer expires after one hour and any pod creation fails due to unauthorization.
Calico pods need to be restarted so that /etc/cni/net.d/calico-kubeconfig is updated with the new SA token.
* rename ansible groups to use _ instead of -
k8s-cluster -> k8s_cluster
k8s-node -> k8s_node
calico-rr -> calico_rr
no-floating -> no_floating
Note: kube-node,k8s-cluster groups in upgrade CI
need clean-up after v2.16 is tagged
* ensure old groups are mapped to the new ones
* calico: drop support for version 3.15
* drop check for calico version >= 3.3, we are at 3.16 minimum now
* we moved to calico 3.16+ so we can default to /opt/cni/bin/install
* AlmaLinux: ansible>2.9.19 is needed to know about AlmaLinux
* AlmaLinux: identify as a centos derrivative
* AlmaLinux: add AlmaLinux to checks for CentOS
* Use ansible_os_family to compare family and not distribution
This PR is to move the cilium kvstore options to the configmap
rather than specifying them in the deployment as args. This
is not technically necessary but keeping all the options in
one place is probably not a bad idea.
Tested with cilium 1.9.5.
When attempting a fresh install without cilium_ipsec_enabled I ran
into the following error:
failed: [k8m01] (item={'name': 'cilium', 'file': 'cilium-secret.yml', 'type': 'secret', 'when': 'cilium_ipsec_enabled'}) =>
{"ansible_loop_var": "item", "changed": false, "item": {"file": "cilium-secret.yml", "name": "cilium", "type": "secret",
"when": "cilium_ipsec_enabled"},"msg": "AnsibleUndefinedVariable: 'cilium_ipsec_key' is undefined"}
Moving the when condition from the item level to the task level solved
the issue.
Starting with Cilium v1.9 the default ipam mode has changed to "Cluster
Scope". See:
https://docs.cilium.io/en/v1.9/concepts/networking/ipam/
With this ipam mode Cilium handles assigning subnets to nodes to use
for pod ip addresses. The default Kubespray deploy uses the Kube
Controller Manager for this (the --allocate-node-cidrs
kube-controller-manager flag is set). This makes the proper ipam mode
for kubespray using cilium v1.9+ "kubernetes".
Tested with Cilium 1.9.5.
This PR also mounts the cilium-config ConfigMap for this variable
to be read properly.
In the future we can probably remove the kvstore and kvstore-opt
Cilium Operator args since they can be in the ConfigMap. I will tackle
that after this merges.
When upgrading cilium from 1.8.8 to 1.9.5 I ran into the following
error:
level=error msg="Unable to update CRD" error="customresourcedefinitions.apiextensions.k8s.io
\"ciliumnodes.cilium.io\" is forbidden: User \"system:serviceaccount:kube-system:cilium-operator\"
cannot update resource \"customresourcedefinitions\" in API group \"apiextensions.k8s.io\" at the
cluster scope" name=CiliumNode/v2 subsys=k8s
The fix was to add the update verb to the clusterrole. I also added
create to match the clusterrole created by the cilium helm chart.
This replaces kube-master with kube_control_plane because of [1]:
The Kubernetes project is moving away from wording that is
considered offensive. A new working group WG Naming was created
to track this work, and the word "master" was declared as offensive.
A proposal was formalized for replacing the word "master" with
"control plane". This means it should be removed from source code,
documentation, and user-facing configuration from Kubernetes and
its sub-projects.
NOTE: The reason why this changes it to kube_control_plane not
kube-control-plane is for valid group names on ansible.
[1]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-cluster-lifecycle/kubeadm/2067-rename-master-label-taint/README.md#motivation
* Download Calico KDD CRDs
* Replace kustomize with lineinfile and use ansible assemble module
* Replace find+lineinfile by sed in shell module to avoid nested loop
* add condition on sed
* use block for kdd tasks + remove supernumerary kdd manifest apply in start "Start Calico resources"
On CentOS 8 they seem to be ignored by default, but better be extra safe
This also make it easy to exclude other network plugin interfaces
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
By default Ansible stat module compute checksum, list extended attributes and find mime type
To find all stat invocations that really use one of those:
git grep -F stat. | grep -vE 'stat.(islnk|exists|lnk_source|writeable)'
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
Previous check for presence of NM assumed "systemctl show
NetworkManager" would exit with a nonzero status code, which seems not
the case anymore with recent Flatcar Container Linux.
This new check also checks the activeness of network manager, as
`is-active` implies presence.
Signed-off-by Jorik Jonker <jorik@kippendief.biz>
TASK [network_plugin/calico : Calico | Configure calico network pool] **********
task path: /builds/kargo-ci/kubernetes-sigs-kubespray/roles/network_plugin/calico/tasks/install.yml:138
Friday 08 January 2021 17:10:12 +0000 (0:00:01.521) 0:11:36.885 ********
[WARNING]: The value {'kind': 'IPPool', 'apiVersion': 'projectcalico.org/v3',
'metadata': {'name': 'default-pool'}, 'spec': {'blockSize': 24, 'cidr':
'10.233.64.0/18', 'ipipMode': 'Always', 'vxlanMode': 'Never', 'natOutgoing':
True}} (type dict) in a string field was converted to "{'kind': 'IPPool',
'apiVersion': 'projectcalico.org/v3', 'metadata': {'name': 'default-pool'},
'spec': {'blockSize': 24, 'cidr': '10.233.64.0/18', 'ipipMode': 'Always',
'vxlanMode': 'Never', 'natOutgoing': True}}" (type string). If this does not
look like what you expect, quote the entire value to ensure it does not change.
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
If some settings were changed from the default but not commited into an inventory repo,
we risk breaking the cluster / cause downtime, so add some extra checks
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
We are currently setting the IP variable to hostIP,
Before https://github.com/projectcalico/node/pull/593 (not yet released)
Calico interpret that as hostIP/32
Using 'can-reach' we get the future behavior
This fixes vxlan and IPIP CrossSubnet modes
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* update files to handle multi-asn bgp peering conditions.
* put back in the serviceClusterIPs. Bad merge.
* remove extraneous environment var.
* update files as discussed with mirwan
* update titles.
* add not in.
* add a conditional for using bgp to advertise cluster ips.
Co-authored-by: marlow-h <mweston@habana.ai>
If crictl (and docker) binaries are deployed to the directories
that are not in standard PATH (e.g. /usr/local/bin), it is required
to specify full path to the binaries.
calico PODs are first started and then in a handler killed and
restarted for no reason, nothing has changed.
By using the existing variable 'calico_cni_config' (only defined when
calico has already started) the restart can be skipped.
* calico: add constant calico_min_version_required
and verify current deployed version against it.
* calico: remove upgrade support with data migration
The tool was used pre v3.0.0 and is no longer needed.
* calico: remove old version support from tasks
* calico: remove old ver support from policy ctrl
* calico: remove old ver support from node
* canal: remove old ver support
* remove unused calicoctl download checksums
calico_min_version_required is the oldest version that can be installed
Older versions can be removed.
* Add retries to update calico-rr data in etcd through calicoctl
* Update update-node yaml syntax
* Add comment to clarify ansible block loop
* Remove trailing space
* Added ability to set calico vxlan vni and port. defaults to calico's documented defaults.
* Check if calico_network_backend is defined prior to checking value
* Removed calico hidden defaults for vxlan port and vni
* Fixed FELIX_VXLANVNI typo
* Update CustomResourceDefinition for kubecontrollersconfigurations.crd.projectcalico.org to v1
* Align ClusterRole for kube-controllers with upstream (calico)
* Update the cilium svc proxy test to HA mode
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Fix cilium strict kube-proxy in HA
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Add a single global endpoint variable
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Add cilium docs about kube-proxy replacement
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Fix issues in docs
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Update calico_veth_mtu to FELIX_IPINIP variable
calico_veth_mtu is specified in the configuration, but since it only works for wireguard, modify it to work for IP-in-IP users.
* Update template with more cleaner expression
flannel, ovn and multus network plugins did not support all taint keys. This
update changes the tolerations to support them all.
According to the documentation:
```
There are two special cases: An empty key with operator Exists matches all keys,
values and effects which means this will tolerate everything. An empty effect matches
all effects with key key.
```
Usage of the empty `key` and `effect` ensures the network plugin daemonset will
be deployed on every nodes (ex: in case of custom taints, or NoExecute effect)
Since weave 2.5.1, `NoExecute` taint effect is no more supported,
this changes the daemonset tolerations to change this behavior.
Also remove the toleration key `CriticalAddonsOnly` not required anymore.
* added required permissions for querying endpointslice resources
* copy-pasted role permissions from cilium install manifests
* bumped cilium version to v1.7.2
* Support configuring the insert mode
Defaults to the upstream default https://docs.projectcalico.org/v3.9/reference/felix/configuration
so nothing should change for existing deployments.
This allows coexistence with other firewall management technologies.
* Add a note to the sample config
* Added in code to allow control over pull policy for local path provisioner
* change to imagePullPolicy to use globally used variable k8s_image_pull_policy
* removed unusued variable from defaults
* updated contiv-etcd and cinder-csi-controllerplugin to use k8s_image_pull_policy variable
Raises limit from 100 to 300 because the default is far too low
and the pod can handle 300 with the given resources.
Change-Id: Ib1eec10da3d09d198933fcfe87291587e58d7cdb
I've tested this update by deploying a containerd / etcd cluster on top CentOS7,
MetalLB + NGINX Ingress. Upgrade using upgrade-cluster.yml
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
Initially this was to fix a mis-indented approvers key. However, it turns
out that 'oilbeater' is not a member of kubernetes-sigs nor
kubernetes-incubator (the org this repo was migrated from). Thus this
OWNERS file is failing prow's validation check.
As a workaround I've opted to move them to emeritus_approver, which
isn't valiated and can be used as a hint for other approvers in this
repo