Commit Graph

337 Commits (316e5795433f9f080b73b1f155c4af3919a3078d)

Author SHA1 Message Date
Max Gautier 0f0e24be0f
etcd: throttle restart for availability (#11677)
* etcd: throttle restart for availability

During upgrade, etcd members are restarted all at once.
This can impact the availability of the etcd cluster and subsequently of
the Kubernetes cluster.

Limit the concurrent restart so that the etcd cluster can keep quorum.

* Simplify etcd handlers
2024-11-05 06:11:29 +00:00
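
The quorum reasoning in the commit above can be illustrated with a little arithmetic. This is only a sketch: the function names are hypothetical, and the actual throttle value used by the commit is not shown in this log.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def max_concurrent_restarts(members: int) -> int:
    """Members that can be down at once while the rest still hold quorum."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 3, 5, 7):
        print(f"members={n} quorum={quorum(n)} "
              f"restartable_at_once={max_concurrent_restarts(n)}")
```

Under this arithmetic a 3-member cluster tolerates one member restarting at a time, which is why restarting all members at once loses quorum.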
Kubernetes Prow Robot 3f027abae6
Merge pull request #11598 from VannTen/cleanup/fact_gathering
Do not serialize fact gathering for no_proxy
2024-10-31 10:59:26 +00:00
Max Gautier b4768cfa91
Always copy cert generation scripts to first etcd (#11612)
If we don't, existing installations would not pick up fixes to that script,
such as dc33a1971d.
2024-10-09 02:44:22 +01:00
Max Gautier 2826b357d4
Remove serialized collect of ansible_default_ipv4
The fallback_ips tasks are essentially serializing the gathering of one
fact on all the hosts, which can have dramatic performance implications
on large clusters (several minutes).

This is essentially a reversal of 35f248dff0
Being able to run without refreshing the cache facts is not worth it.

We keep fallback_ip for now, simply changing the access to a normal
hostvars variable instead of a custom dictionary.
2024-10-04 14:19:20 +02:00
Max Gautier 2ec1c93897
Test group membership with group_names
Testing for group membership with group names makes Kubespray more
tolerant towards the structure of the inventory.
Where 'inventory_hostname in groups["some_group"]' would fail if
"some_group" is not defined, '"some_group" in group_names' would not.
2024-09-21 14:09:09 +02:00
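
The difference between the two membership tests in the commit above can be mimicked in plain Python. This is an analogy rather than Ansible code: the inventory contents and helper names are hypothetical, and a Python `KeyError` stands in for the undefined-variable failure Ansible would report.

```python
# Hypothetical inventory in which "some_group" is not defined at all.
groups = {"etcd": ["node1", "node2"]}
# Groups the current host belongs to (what group_names would hold).
group_names = ["etcd"]
inventory_hostname = "node1"


def in_group_via_groups(group: str) -> bool:
    """Looks the group up first, so an undefined group raises KeyError."""
    return inventory_hostname in groups[group]


def in_group_via_group_names(group: str) -> bool:
    """Only searches the host's own group list, so it never raises."""
    return group in group_names


if __name__ == "__main__":
    print(in_group_via_group_names("some_group"))
    try:
        in_group_via_groups("some_group")
    except KeyError as err:
        print(f"lookup failed for undefined group: {err}")
```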
Bogdan Sass 4b324cb0f0
Rename master to control plane - non-breaking changes only (#11394)
K8s is moving away from the "master" terminology, so kubespray should follow the same naming conventions. See 65d886bb30/sig-architecture/naming/recommendations/001-master-control-plane.md
2024-09-06 07:56:19 +01:00
刘旭 3da6c4fc18
Allow configuring the etcd progress notify interval, with the default set to 5s (#11499) 2024-09-05 06:29:05 +01:00
Vlad Korolev 9a7b021eb8
Do not use ‘yes/no’ for boolean values (#11472)
Consistent boolean values in ansible playbooks
2024-08-28 06:30:56 +01:00
Lihai Tu 8208a3f04f
Rename systemd module to systemd_service (#11396)
Signed-off-by: tu1h <lihai.tu@daocloud.io>
2024-07-26 01:11:39 -07:00
Tom M. 242edd14ff
Fix etcd certificate to access address as SAN (#11388) 2024-07-25 18:49:23 -07:00
Bas 8f5f75211f
Improving yamllint configuration (#11389)
Signed-off-by: Bas Meijer <bas.meijer@enexis.nl>
2024-07-25 18:42:20 -07:00
Max Gautier d50f61eae5
pre-commit: apply autofixes hooks and fix the rest manually
- markdownlint (manual fix)
- end-of-file-fixer
- requirements-txt-fixer
- trailing-whitespace
2024-05-28 13:26:44 +02:00
Ugur Can Ozturk a512b861e0
[etcd/tracing]: fix etcd sampling rate flag (#11175)
Signed-off-by: Ugur Ozturk <ugurozturk918@gmail.com>
2024-05-13 03:14:39 -07:00
yun 13e1f33898
Correct the POLY1305 cipher suites by adding the suffix _SHA256 (#10641) 2024-01-22 18:00:52 +01:00
Ugur Can Ozturk ae780e6a9b
[etcd]: add etcd distributed tracing flags (#10666)
* [etcd]: add etcd distributed tracing flags

Signed-off-by: Ugur Ozturk <ugurozturk918@gmail.com>

* [etcd]: add etcd distributed tracing flags - fix

Signed-off-by: Ugur Ozturk <ugurozturk918@gmail.com>

* [etcd]: add etcd distributed tracing flags - fix

Signed-off-by: Ugur Ozturk <ugurozturk918@gmail.com>

---------

Signed-off-by: Ugur Ozturk <ugurozturk918@gmail.com>
2023-12-19 04:00:10 +01:00
Max Gautier 0fb404c775
etcd: use dynamic group for certs generation check (#10610)
We take advantage of group_by to create the list of nodes needing new
certs, instead of manually looping inside a Jinja template.

This should make the role more readable and less susceptible to
white space problems.
2023-12-12 11:22:29 +01:00
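
The idea of building a dynamic group instead of looping in a template can be sketched in Python. This is an illustration of the grouping step only: the `has_valid_cert` fact, the group labels, and the helper are hypothetical and do not reproduce the role's actual tasks.

```python
from collections import defaultdict


def group_by(hosts: dict, key) -> dict:
    """Partition host names into groups named by key(facts)."""
    result = defaultdict(list)
    for name, facts in hosts.items():
        result[key(facts)].append(name)
    return dict(result)


# Hypothetical per-host facts.
hosts = {
    "node1": {"has_valid_cert": True},
    "node2": {"has_valid_cert": False},
    "node3": {"has_valid_cert": False},
}

dynamic_groups = group_by(
    hosts,
    lambda facts: "certs_ok" if facts["has_valid_cert"] else "needs_certs",
)

if __name__ == "__main__":
    print(dynamic_groups["needs_certs"])
```

Later steps can then target the `needs_certs` group directly rather than recomputing the list inside a template.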
Max Gautier 0d4f57aa22
Validate systemd unit files (#10597)
* Validate systemd unit files

This ensures that we fail early if we have a bad systemd unit file
(syntax error, using a version not available in the local version, etc)

* Hack to check systemd version for service files validation

factory-reset.target was introduced in systemd 250, same version as the
aliasing feature we need for verifying systemd services with ansible.
So we only actually execute the validation if that target is present.

This is a horrible hack which should be reverted as soon as we drop
support for distributions with systemd<250.
2023-11-17 20:01:23 +01:00
Max Gautier 8ebeb88e57
Refactor "multi" handlers to use listen (#10542)
* containerd: refactor handlers to use 'listen'

* cri-dockerd: refactor handlers to use 'listen'

* cri-o: refactor handlers to use 'listen'

* docker: refactor handlers to use 'listen'

* etcd: refactor handlers to use 'listen'

* control-plane: refactor handlers to use 'listen'

* kubeadm: refactor handlers to use 'listen'

* node: refactor handlers to use 'listen'

* preinstall: refactor handlers to use 'listen'

* calico: refactor handlers to use 'listen'

* kube-router: refactor handlers to use 'listen'

* macvlan: refactor handlers to use 'listen'
2023-11-08 12:28:30 +01:00
Max Gautier 8f0e553e11
etcd/backup: native ansible modules instead of shell (#10540)
This makes native ansible features (dry-run, changed state) easier to
have, and should have a minimal performance impact, since it only runs
on the etcd members.
2023-10-30 20:05:28 +01:00
Max Gautier 0b2e5b2f82
Retries ssh connection for Gather node certs (#10515)
This allows this task to work with a forks count > 10 and the default
configuration of sshd, which is to limit sessions to 10. (see
MaxSessions in sshd_config).

Since this is a delegate_to task, it connects to the same host (first
etcd) for each node in the cluster, thus easily going above 10.

Raising the ssh connection attempts allows for more robustness, without
decreasing the forks count or serialising the tasks, which could slow
the task (or the playbook as a whole, if decreasing forks).
2023-10-19 05:04:29 +02:00
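
The retry approach described in the commit above follows a generic pattern: try again when the connection is refused instead of lowering concurrency. The sketch below shows that pattern in isolation. The helper names and the simulated session limit are hypothetical, and this is not the Ansible setting the commit changed.

```python
import time


def call_with_retries(connect, attempts: int = 3, delay: float = 0.0):
    """Call connect() up to `attempts` times, retrying on ConnectionError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return connect()
        except ConnectionError as err:
            last_error = err
            time.sleep(delay)
    raise last_error


def make_flaky_connect(failures: int):
    """Build a fake connect() that is refused `failures` times, then succeeds."""
    state = {"remaining": failures}

    def connect():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise ConnectionError("session limit reached")
        return "connected"

    return connect


if __name__ == "__main__":
    print(call_with_retries(make_flaky_connect(2), attempts=3))
```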
Samuel Liu e1881fae02
Install etcdutl file by default (#10385) 2023-08-23 07:04:22 -07:00
Francisco Orselli 7295d13d60
[EOS-11830] Use ETCD port 2381 for metrics (#10332) 2023-08-08 11:06:16 -07:00
Arthur Outhenin-Chalandre 36e5d742dc
Resolve ansible-lint name errors (#10253)
* project: fix ansible-lint name

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: ignore jinja template error in names

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: capitalize ansible name

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: update notify after name capitalization

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

---------

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
2023-07-26 07:36:22 -07:00
yangsenzk 13aa32278a
bugfix: fix grep command without -w option causing prefix matches when adding an etcd member (#10291) 2023-07-13 21:43:29 -07:00
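
The prefix-match problem fixed above is easy to reproduce. The sketch below approximates `grep` with and without `-w` using Python; the member names and the sample line are hypothetical, and the lookaround regex is an approximation of `-w` (match not adjacent to a letter, digit, or underscore).

```python
import re


def matches_substring(name: str, line: str) -> bool:
    """Plain substring search, comparable to grep without -w."""
    return name in line


def matches_whole_word(name: str, line: str) -> bool:
    """Match only when name is not adjacent to other word characters."""
    pattern = rf"(?<!\w){re.escape(name)}(?!\w)"
    return re.search(pattern, line) is not None


# Hypothetical member listing line for a member named "etcd10".
line = "8e9e05c52164694d, started, etcd10, https://10.0.0.10:2380"

if __name__ == "__main__":
    print(matches_substring("etcd1", line))
    print(matches_whole_word("etcd1", line))
    print(matches_whole_word("etcd10", line))
```

With the substring search, looking for `etcd1` wrongly reports a hit on the `etcd10` line, which is the kind of false match the `-w` option avoids.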
Arthur Outhenin-Chalandre 5d00b851ce
project: fix var-spacing ansible rule (#10266)
* project: fix var-spacing ansible rule

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: fix spacing on the beginning/end of jinja template

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: fix spacing of default filter

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: fix spacing between filter arguments

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: fix double space at beginning/end of jinja

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: fix remaining jinja[spacing] ansible-lint warning

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

---------

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
2023-07-04 20:36:54 -07:00
Arthur Outhenin-Chalandre f8f197e26b
Fix outdated tag and experimental ansible-lint rules (#10254)
* project: fix outdated tag and experimental

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: remove no longer useful noqa 301

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: replace unnamed-task by name[missing]

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: fix daemon-reload -> daemon_reload

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

---------

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
2023-06-30 02:51:57 -07:00
Arthur Outhenin-Chalandre 25cb90bc2d
Upgrade ansible (#10190)
* project: update all dependencies including ansible

Upgrade to ansible 7.x and ansible-core 2.14.x. There seems to be an issue
with ansible 8/ansible-core 2.15 so we remain on those versions for now.
It's quite a big bump already anyway.

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* tests: install aws galaxy collection

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* ansible-lint: disable various rules after ansible upgrade

Temporarily disable a bunch of linting actions following the ansible upgrade.
Those should be taken care of separately.

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve deprecated-module ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve no-free-form ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve schema[meta] ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve schema[playbook] ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve schema[tasks] ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve risky-file-permissions ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve risky-shell-pipe ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: remove deprecated warn args

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: use fqcn for non builtin tasks

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve syntax-check[missing-file] for contrib playbook

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: use arithmetic inside jinja to fix ansible 6 upgrade

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

---------

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
2023-06-26 03:15:45 -07:00
Kenichi Omichi 7afbdb3e1e
Drop canal network_plugin (#10100)
According to the canal github[1] the repo has not been maintained for over 5 years.
In addition, the README says
```
  Originally, we thought we might more deeply integrate the two projects
  (possibly even going as far as a rebranding!). However, over time it
  became clear that that wasn't really necessary to fulfil our goal of
  making them work well together. Ultimately, we decided to focus on
  adding features to both projects rather than doing work just to
  combine them.
```
So it is difficult for Kubespray to support canal in this situation.

[1]: https://github.com/projectcalico/canal
2023-05-18 03:40:33 -07:00
Kei Kori dc33a1971d
[etcd] fix make-ssl-etcd.sh.j2; move pem files only if any new certs exist (#9974) 2023-04-12 21:52:35 -07:00
Karl Fischer 6278b12af6 fixed clinet to client 2023-02-20 10:09:03 +01:00
Bas 2c93c997cf
pre-commit autocorrected files (#9750) 2023-02-06 01:35:16 -08:00
ERIK 20d99886ca
Update etcd log-level parameter name (#9540)
Signed-off-by: bo.jiang <bo.jiang@daocloud.io>
2022-12-05 01:05:03 -08:00
Samuel Liu dd4bc5fbfe
[etcd] Sometimes, we do not need to run etcd role on all nodes. (#9173)
* WIP: sometimes, we do not run etcd

* fix ansible lint

* like calico (kdd) CNI, no need to run etcd
2022-09-09 01:29:22 -07:00
ERIK 9ad2d24ad8
Add unsafe_show_logs switch (#9164)
Signed-off-by: bo.jiang <bo.jiang@daocloud.io>
2022-08-16 18:52:48 -07:00
emiran-orange 2b97b661d8
Move old etcd backup removal after etcd restart (#9147) 2022-08-05 08:09:59 -07:00
Kay Yan 1d0b3829ed
remove-etcd-unsupported-arch (#9049) 2022-07-04 05:39:24 -07:00
Alessio Greggi 97b4d79ed5
feat: make kubernetes owner parametrized (#8952)
* feat: make kubernetes owner parametrized

* docs: update hardening guide with configuration for CIS 1.1.19

* fix: set etcd data directory permissions to be compliant to CIS 1.1.12
2022-06-17 01:34:32 -07:00
Necatican Yıldırım dc1af5a9c5
[etcd] Add support for setting the request size limit (#8849)
* [etcd] Add extra documentation for `etcd_memory_limit` and `etcd_quota_backend_bytes`

Signed-off-by: necatican <necaticanyildirim@gmail.com>

* [etcd] Add support for setting ETCD_MAX_REQUEST_BYTES

Signed-off-by: necatican <necaticanyildirim@gmail.com>
2022-05-23 09:36:03 -07:00
Florian Ruynat 1c0df78278
Add ETCD_EXPERIMENTAL_INITIAL_CORRUPT_CHECK flag to etcd config (#8664) 2022-03-31 08:17:01 -07:00
Mac Chaffee 512d5e3348
Restart etcd if the etcd version changes (#8556)
Signed-off-by: Mac Chaffee <me@macchaffee.com>
2022-03-11 18:08:23 -08:00
Tom Janson ddef7e1139
missing "check_mode: no"s for several read-only tasks (#8584)
this is not complete -- there are almost certainly more instances of
this issue
2022-03-02 09:29:14 -08:00
Ilya Margolin e053ee4272
Check all places with `check_mode: no` for side effects (#8573)
and fix the one with side effect.

Also removes `notify` from this task as the task has `changed_when: false`
and notify is not going to fire.
2022-02-23 01:20:18 -08:00
zhengtianbao a16d427536
Set etcd-events listen port to 2383 (#8232) 2021-12-07 00:28:01 -08:00
Mathieu Parent 0263c649f4
Allow to scrape etcd metrics using a service (#8203)
Signed-off-by: Mathieu Parent <math.parent@gmail.com>
2021-11-17 23:53:01 -08:00
Florian Ruynat 9eacde212f
Fix quorum check when recovering broken etcd cluster (#8126) 2021-10-26 15:23:09 -07:00
Iago Santos 43958614e3
Fix kubespray flatcar ansible_os_family and ansible_distribution (#8029)
Closes https://github.com/kubernetes-sigs/kubespray/issues/8028

Signed-off-by: Iago Santos <iago.santos.pardo@adfinis.com>
2021-10-01 09:11:23 -07:00
Florian Ruynat 88c11b5946
Revert "etcd: enable v2 api only if needed (#8001)" (#8008)
This reverts commit c0e1211abe.
2021-09-23 10:43:14 -07:00
Max Gautier c0e1211abe
etcd: enable v2 api only if needed (#8001)
* etcd: enable v2 api only if needed

Only enable v2 API if we have a consumer (flannel)
This reduces the exposed surface for etcd.

* Fix bad group name
2021-09-22 12:36:32 -07:00
Cristian Calin 7516fe142f
Move to Ansible 3.4.0 (#7672)
* Ansible: move to Ansible 3.4.0 which uses ansible-base 2.10.10

* Docs: add a note about ansible upgrade post 2.9.x

* CI: ensure ansible is removed before ansible 3.x is installed to avoid pip failures

* Ansible: use newer ansible-lint

* Fix ansible-lint 5.0.11 found issues

* syntax issues
* risky-file-permissions
* var-naming
* role-name
* molecule tests

* Mitogen: use 0.3.0rc1 which adds support for ansible 2.10+

* Pin ansible-base to 2.10.11 to get package fix on RHEL8
2021-07-12 00:00:47 -07:00
Hari Hud f07e24db8f
Cleanup duplicate task in etcd role (#7598)
* Remove the duplicate task in etcd role

* Remove inessential delegate_to
2021-05-10 16:11:36 -07:00