Commit Graph

198 Commits (21289db1811acbe8735d10c23e4cd3b84e0d364d)

Author SHA1 Message Date
Max Gautier 0fb404c775
etcd: use dynamic group for certs generation check (#10610)
We take advantage of group_by to create the list of nodes needing new
certs, instead of manually looping inside a Jinja template.

This should make the role more readable and less susceptible to
white space problems.
2023-12-12 11:22:29 +01:00
Max Gautier 0d4f57aa22
Validate systemd unit files (#10597)
* Validate systemd unit files

This ensure that we fail early if we have a bad systemd unit file
(syntax error, using a version not available in the local version, etc)

* Hack to check systemd version for service files validation

factory-reset.target was introduced in system 250, same version as the
aliasing feature we need for verifying systemd services with ansible.
So we only actually executes the validation if that target is present.

This is an horrible hack which should be reverted as soon as we drop
support for distributions with systemd<250.
2023-11-17 20:01:23 +01:00
Max Gautier 0b2e5b2f82
Retries ssh connection for Gather node certs (#10515)
This allows this task to work with a forks count > 10 and the default
configuration of sshd, which is to limit sessions to 10. (see
MaxSessions in sshd_config).

Since this is a delegate_to task, it connects to the same host (first
etcd) for each node in the cluster, thus easily going above 10.

Raising the ssh connection attempts allow for more robustness, without
decreasing the forks count or serialising the tasks, which could slow
the task (or the playbook as a whole, if decreasing forks).
2023-10-19 05:04:29 +02:00
Samuel Liu e1881fae02
Install etcdutl file by default (#10385) 2023-08-23 07:04:22 -07:00
Arthur Outhenin-Chalandre 36e5d742dc
Resolve ansible-lint name errors (#10253)
* project: fix ansible-lint name

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: ignore jinja template error in names

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: capitalize ansible name

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: update notify after name capitalization

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

---------

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
2023-07-26 07:36:22 -07:00
yangsenzk 13aa32278a
bugfix: fix grep command without -w option causing prefix matched while adding one etcd member (#10291) 2023-07-13 21:43:29 -07:00
Arthur Outhenin-Chalandre 5d00b851ce
project: fix var-spacing ansible rule (#10266)
* project: fix var-spacing ansible rule

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: fix spacing on the beginning/end of jinja template

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: fix spacing of default filter

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: fix spacing between filter arguments

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: fix double space at beginning/end of jinja

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: fix remaining jinja[spacing] ansible-lint warning

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

---------

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
2023-07-04 20:36:54 -07:00
Arthur Outhenin-Chalandre f8f197e26b
Fix outdated tag and experimental ansible-lint rules (#10254)
* project: fix outdated tag and experimental

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: remove no longer useful noqa 301

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: replace unnamed-task by name[missing]

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: fix daemon-reload -> daemon_reload

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

---------

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
2023-06-30 02:51:57 -07:00
Arthur Outhenin-Chalandre 25cb90bc2d
Upgrade ansible (#10190)
* project: update all dependencies including ansible

Upgrade to ansible 7.x and ansible-core 2.14.x. There seems to be issue
with ansible 8/ansible-core 2.15 so we remain on those versions for now.
It's quite a big bump already anyway.

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* tests: install aws galaxy collection

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* ansible-lint: disable various rules after ansible upgrade

Temporarily disable a bunch of linting action following ansible upgrade.
Those should be taken care of separately.

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve deprecated-module ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve no-free-form ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve schema[meta] ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve schema[playbook] ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve schema[tasks] ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve risky-file-permissions ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve risky-shell-pipe ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: remove deprecated warn args

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: use fqcn for non builtin tasks

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve syntax-check[missing-file] for contrib playbook

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: use arithmetic inside jinja to fix ansible 6 upgrade

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

---------

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
2023-06-26 03:15:45 -07:00
Kenichi Omichi 7afbdb3e1e
Drop canal network_plugin (#10100)
According to the canal github[1] the repo is not maintained over 5 years.
In addition, the README says
```
  Originally, we thought we might more deeply integrate the two projects
  (possibly even going as far as a rebranding!). However, over time it
  became clear that that wasn't really necessary to fulfil our goal of
  making them work well together. Ultimately, we decided to focus on
  adding features to both projects rather than doing work just to
  combine them.
```
So it is difficult to support canal by Kubespray at this situation.

[1]: https://github.com/projectcalico/canal
2023-05-18 03:40:33 -07:00
Karl Fischer 6278b12af6 fixed clinet to client 2023-02-20 10:09:03 +01:00
Samuel Liu dd4bc5fbfe
[etcd] Sometimes, we do not need to run etcd role on all nodes. (#9173)
* WIP: sometimes,we not run etcd

* fix ansible lint

* like calico(kdd) cni, no need run etcd
2022-09-09 01:29:22 -07:00
ERIK 9ad2d24ad8
Add unsafe_show_logs switch (#9164)
Signed-off-by: bo.jiang <bo.jiang@daocloud.io>

Signed-off-by: bo.jiang <bo.jiang@daocloud.io>
2022-08-16 18:52:48 -07:00
Alessio Greggi 97b4d79ed5
feat: make kubernetes owner parametrized (#8952)
* feat: make kubernetes owner parametrized

* docs: update hardening guide with configuration for CIS 1.1.19

* fix: set etcd data directory permissions to be compliant to CIS 1.1.12
2022-06-17 01:34:32 -07:00
Mac Chaffee 512d5e3348
Restart etcd if the etcd version changes (#8556)
Signed-off-by: Mac Chaffee <me@macchaffee.com>
2022-03-11 18:08:23 -08:00
Tom Janson ddef7e1139
missing "check_mode: no"s for several read-only tasks (#8584)
this is not complete -- there are almost certainly more instances of
this issue
2022-03-02 09:29:14 -08:00
Ilya Margolin e053ee4272
Check all places with `check_mode: no` for side effects (#8573)
and fix the one with side effect.

Also removes `notify` from this task as the task has `changed_when: false`
and notify is not going to fire.
2022-02-23 01:20:18 -08:00
Florian Ruynat 9eacde212f
Fix quorum check when recovering broken etcd cluster (#8126) 2021-10-26 15:23:09 -07:00
Iago Santos 43958614e3
Fix kubespray flatcar ansible_os_family and ansible_distribution (#8029)
Closes https://github.com/kubernetes-sigs/kubespray/issues/8028

Signed-off-by: Iago Santos <iago.santos.pardo@adfinis.com>
2021-10-01 09:11:23 -07:00
Cristian Calin 7516fe142f
Move to Ansible 3.4.0 (#7672)
* Ansible: move to Ansible 3.4.0 which uses ansible-base 2.10.10

* Docs: add a note about ansible upgrade post 2.9.x

* CI: ensure ansible is removed before ansible 3.x is installed to avoid pip failures

* Ansible: use newer ansible-lint

* Fix ansible-lint 5.0.11 found issues

* syntax issues
* risky-file-permissions
* var-naming
* role-name
* molecule tests

* Mitogen: use 0.3.0rc1 which adds support for ansible 2.10+

* Pin ansible-base to 2.10.11 to get package fix on RHEL8
2021-07-12 00:00:47 -07:00
Hari Hud f07e24db8f
Cleanup duplicate task in etcd role (#7598)
* Remove the duplicate task in etcd role

* Remove inessential delegate_to
2021-05-10 16:11:36 -07:00
Cristian Calin 360aff4a57
Rename ansible groups to use _ instead of - (#7552)
* rename ansible groups to use _ instead of -

k8s-cluster -> k8s_cluster
k8s-node -> k8s_node
calico-rr -> calico_rr
no-floating -> no_floating

Note: kube-node,k8s-cluster groups in upgrade CI
      need clean-up after v2.16 is tagged

* ensure old groups are mapped to the new ones
2021-04-29 05:20:50 -07:00
Etienne Champetier de1d9df787
Only use stat get_checksum: yes when needed (#7270)
By default Ansible stat module compute checksum, list extended attributes and find mime type
To find all stat invocations that really use one of those:
git grep -F stat. | grep -vE 'stat.(islnk|exists|lnk_source|writeable)'

Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
2021-02-10 05:36:59 -08:00
David Louks aad78840a0
Updated etcd cert check tasks to detect when new cert gen is required (#7219)
* Added force_etcd_cert_refresh var to maintain existing functionality. Broke out etcd node cert syncing from member and admin cert sync logic. Now first etcd will sync node certs to other etcd members on every run to keep all etcds up to date after adding additional worker nodes to the cluster

* Updated etcd cert check tasks to better detect when new certificates need to be generated

* Move usage of force_etcd_cert_refresh var to gen_certs fact set

* Force etcd cert generation per server if force_etcd_cert_refresh is set to true

* Include gathering of node certs even if k8s-cluster member and in etcd group.

* Removed run_once due to when statement
2021-02-09 01:53:22 -08:00
Robin Elfrink 91fea7c956
Fix unintended SIGPIPEs. (#7214) 2021-01-27 01:07:40 -08:00
Samuel Liu 1a409dc7ae
Add download bin tasks (#7131)
* Add downlaod bin tasks

* Add tags never and etcd

* yamllint
2021-01-22 20:41:39 -08:00
Hannes Körber dbe02d398a
etcd: Fix permissions of /etc/ssl/etcd/ssl (#6908) 2020-12-09 00:48:49 -08:00
Emerson Ford f377d9f057
Set etcd_.*_addresses to use etcd_[events_]access_address instead of access_ip (#6936) 2020-12-02 13:55:00 -08:00
Sergey 6a4d322a7c
Do not install etcd and etcdctl on master with scale.yml playbook. (#6798)
Remove task with install etcdctl from etcd role when etcd_kubeadm_enabled=true
2020-10-06 07:04:20 -07:00
Florent Monbillard 80df4f8b01
Fix unintended SIGPIPE (#6721) 2020-09-22 11:14:42 -07:00
Maxime Guyot 648fcf3a2e
Fix E306 in roles/etcd (#6515) 2020-08-31 03:20:20 -07:00
Barry Melbourne 058438a25d
Remove support for CoreOS Container Linux (#6576) 2020-08-28 02:28:53 -07:00
Florian Ruynat 706c7cb4f1
etcd should not fail when adding an already existing member (#6587) 2020-08-27 02:33:01 -07:00
Maxime Guyot e70f27dd79
Add noqa and disable .ansible-lint global exclusions (#6410) 2020-07-27 06:24:17 -07:00
Florent Monbillard bf8c8976dd
Upgrade etcd to 3.4.3 (#5998) 2020-07-20 07:26:51 -07:00
Maxime Guyot 00fe3d5094
Explicitly set ETCDCTL_API and use ETCDCTL_ENDPOINTS (#6327) 2020-07-01 04:56:16 -07:00
Joel Seguillon 4c1e0b188d
Add .editorconfig file (#6307) 2020-06-29 12:39:59 -07:00
Etienne Champetier a35b6dc1af
Fix scaling (#5889)
* etcd: etcd-events doesn't depend on etcd_cluster_setup

Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>

* etcd: remove condition already present on include_tasks

Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>

* etcd: fix scaling up

Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>

* etcd: use *access_addresses, do not delegate to etcd[0]

We want to wait for the full cluster to be healthy,
so use all the cluster addresses
Also we should be able to run the playbook when etcd[0] is down
(not tested), so do not delegate to etcd[0]

Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>

* etcd: use failed_when for health check

unhealthy cluster is expected on first run, so use failed_when
instead of ignore_errors to remove scary red messages

Also use run_once

Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>

* kubernetes/preinstall: ensure ansible_fqdn is up to date after changing /etc/hosts

Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>

* kubernetes/master: regenerate apiserver cert if needed

Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
2020-04-08 01:27:43 -07:00
Xiaodu 63fa406c3c
Move host_architecture to kubespray-defaults (#5811)
The variable is defined in `kubernetes/preinstall` role and used in several roles. Since `kubernetes/preinstall` is not always included when `ansible-playbook` is run with tag selectors (see #5734 for reason), they will fail, or individual roles must copy the same fact definitions (as in #3846). Moving the definition to the always-included `kubespray-defaults` role will resolve the dependency problem.
2020-03-25 12:58:25 -07:00
Etienne Champetier 6ad6609872
Fix certificates checking when adding etcd node to existing k8s node (#5807)
Co-authored-by: alexkomrakov <alexkomrakov@gmail.com>
2020-03-25 12:46:25 -07:00
Stephen Schmidt 0379a52f03
Fix etcd install with docker and etcd_kubeadm_enabled (#5777)
- This solves issue #5721 & #5713 (dupes)
  - Provide a cleaner default usage pattern for the download role
    around etcd that supports 'host' and 'docker' properly
  - Extract the 'etcdctl' as a separate task install piece and reuse it where
    appropriate
  - Update the kubeadm-etcd task to reflect the above change
2020-03-24 08:12:47 -07:00
Sylvain Chateau 0ca7aa126b
added "Flatcar", "Flatcar Container Linux by Kinvolk" for all coreOS role (#5607) 2020-02-18 00:15:29 -08:00
Erwan Miran 339e36fbe6
Files to archive can be passed directly (#5571) 2020-02-12 07:50:51 -08:00
qvicksilver ac2135e450
Fix recover-control-plane to work with etcd 3.3.x and add CI (#5500)
* Fix recover-control-plane to work with etcd 3.3.x and add CI

* Set default values for testcase

* Add actual test jobs

* Attempt to satisty gitlab ci linter

* Fix ansible targets

* Set etcd_member_name as stated in the docs...

* Recovering from 0 masters is not supported yet

* Add other master to broken_kube-master group as well

* Increase number of retries to see if etcd needs more time to heal

* Make number of retries for ETCD loops configurable, increase it for recovery CI and document it
2020-02-11 01:38:01 -08:00
Erwan Miran 303c3654a1 Set pipefail in case tar fails (#5506) 2020-01-06 02:25:34 -08:00
Maxime Guyot b15d41a96a Add support to Ansible 2.9 (#5361) 2019-12-05 07:24:32 -08:00
Sergey 932935ecc7 fix wrong path in include install_host.yml in etcd role (#5256) 2019-10-13 18:16:34 -07:00
Matthew Mosesohn 7abf6a6958 Allow etcd member join by checking cluster health only on first etcd (#5032)
Change-Id: I9cc01cef3a437893225e2d9f58495826bbce7be9
2019-08-07 04:44:50 -07:00
Matthew Mosesohn 4348e78b24 Enable kubeadm etcd mode (#4818)
* Enable kubeadm etcd mode

Uses cert commands from kubeadm experimental control plane to
enable non-master nodes to obtain etcd certs.

Related story: PROD-29434

Change-Id: Idafa1d223e5c6ceadf819b6f9c06adf4c4f74178

* Add validation checks and exclude calico kdd mode

Change-Id: Ic234f5e71261d33191376e70d438f9f6d35f358c

* Move etcd mode test to ubuntu flannel HA job

Change-Id: I9af6fd80a1bbb1692ab10d6da095eb368f6bc732

* rename etcd_mode to etcd_kubeadm_enabled

Change-Id: Ib196d6c8a52f48cae370b026f7687ff9ca69c172
2019-06-20 11:12:51 -07:00
MarkusTeufelberger 73c2ff17dd Fix Ansible-lint error [E502] (#4743) 2019-05-16 00:27:43 -07:00