ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	8871de6d99	simplify monitor address setting this drops the following parameters: - monitor_address_block - monitor_interface - monitor_address The monitor address will be automatically set from `public_network` parameter. Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>	2024-03-08 13:02:44 +01:00
Seena Fallah	2b72ea991d	ceph-exporter: add installation role Signed-off-by: Seena Fallah <seenafallah@gmail.com>	2024-03-07 20:22:44 +01:00
Guillaume Abrioux	ebd0c6fce3	kickoff squid This adds the few required changes in order to fully support Ceph Squid. Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>	2024-02-16 13:17:44 +01:00
Guillaume Abrioux	1af387621d	drop rhcs references RHCS moved away from ceph-ansible. All RHCS references should be removed. Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>	2024-02-16 11:12:01 +01:00
Guillaume Abrioux	03f1e3f48e	drop iscsigw support This service is no longer maintained. Let's drop its support within ceph-ansible. Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>	2024-02-16 08:59:05 +01:00
Guillaume Abrioux	18da10bb7a	address Ansible linter errors This addresses all errors reported by the Ansible linter. Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>	2024-02-16 00:38:19 +01:00
Guillaume Abrioux	c58529fc04	update: update ceph release pre-check update this check in order to check for Ceph Squid Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>	2024-02-14 09:54:13 +01:00
Guillaume Abrioux	644fac8f2d	update: update the osd require-osd-release to squid This updates the `osd require-osd-release` call with `squid` instead of `reef`. Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>	2024-02-14 09:54:13 +01:00
Guillaume Abrioux	7909778d0e	add CentOS stream 9 support This adds the required changes in order to support CentOS stream 9. Also, this bumps the Ansible version support to 2.15 Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>	2024-02-14 09:54:13 +01:00
Guillaume Abrioux	5cd692dcdc	update: fix mgr upgrade issue for some reason, this task has to be done in 2 steps otherwise it fails. 1/ stop and disable the service 2/ mask it when done with a single task, the module says the service has been stopped while this isn't the case (Ansible systemd module bug?). it possibly relates to https://github.com/ansible/ansible/issues/68680 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2023-05-31 23:07:13 +02:00
Guillaume Abrioux	371592a8fb	common: v18/reef kickoff align with ceph/ceph/pull/47458 since it has been merged. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-10-07 16:39:56 +02:00
Guillaume Abrioux	82e0ae7e75	rolling_update: fix rbd-mirror play There's no service to stop/mask when the node being upgraded is a 'primary node' only (1 way replication). Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-08-03 13:09:42 +02:00
Guillaume Abrioux	7d848fa19e	Revert "upgrade: block upgrade when rgw multisite is active" This reverts commit `51bc8cb636`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-06-29 06:55:31 +02:00
Francesco Pantano	0e9b3902b0	Add ceph_infra tag to rolling_update When the upgrade from Ceph 4 to 5 is performed in the OpenStack context, ceph-ansible triggers the rolling_update playbook, which is supposed to roll out new Ceph containers. The ceph-infra role tries to take care of firewall, ntp config and logrotate; however, TripleO manages them through tripleo-heat-templates. This patch just adds an additional tag to skip the ceph-infra role in the OpenStack context. Closes: https://bugzilla.redhat.com/2090456 Signed-off-by: Francesco Pantano <fpantano@redhat.com>	2022-05-27 15:05:16 +02:00
Guillaume Abrioux	51bc8cb636	upgrade: block upgrade when rgw multisite is active With this commit, upgrading a cluster from Nautilus to Pacific with active rgw multisite replication will be blocked. This is because a lot of bugs are currently present in Pacific regarding RGW multisite. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2063702 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-03-20 08:13:38 +01:00
Guillaume Abrioux	817c03bc0e	update: speed up client play wip Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-12-15 08:42:23 +01:00
Guillaume Abrioux	e5edcc4214	update: move a set_fact ceph-facts roles makes decisions based on the fact `rolling_update` so it must be called before we run this role. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-11-03 08:48:09 +01:00
Guillaume Abrioux	82eee4303b	update: support --limit on monitor nodes Change needed in order to support --limit on mon nodes. Otherwise, a call to `hostvars[groups[mon_group_name][0]]['_current_monitor_address']` throws an error: ``` "The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute '_current_monitor_address'" ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304#c28 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-10-28 21:47:01 +02:00
Guillaume Abrioux	50a21d695e	rolling_update: modify default health_osd_check_* let's do more retries with a shorter delay. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-10-25 15:44:17 +02:00
Guillaume Abrioux	fc9f87c45f	rolling_update: fix pre and post osd upgrade play when using --limit osds, the play before and after osd upgrade are skipped because we use `hosts: "{{ mon_group_name \| default('mons') }}[0]"` using `hosts: "{{ osds_group_name \| default('osds') }}"` with `delegate_to` to the first monitor addresses this issue. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-10-25 14:23:00 +02:00
Guillaume Abrioux	e5cf9db2b0	update: support upgrading a subset of nodes It can be useful in a large cluster deployment to split the upgrade and only upgrade a group of nodes at a time. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-10-21 20:51:14 +02:00
Guillaume Abrioux	c14e9114ba	update: gather facts only one time this play doesn't need to gather facts from localhost Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-08-17 14:41:17 -04:00
Dimitri Savineau	386661699b	infra: use dedicated variables for balancer status The balancer status is registered during the cephadm-adopt, rolling_update and switch2container playbooks. But it is also used in the ceph-handler role which is included in those playbooks too. Even if the ceph-handler tasks are skipped for rolling_update and switch2container, the balancer_status variable is erased with the skip task result. play1: register: balancer_status play2: register: balancer_status <-- skipped play3: when: (balancer_status.stdout \| from_json)['active'] \| bool This leads to issue like: The conditional check '(balancer_status.stdout \| from_json)['active'] \| bool' failed. The error was: Unexpected templating type error occurred on ({% if (balancer_status.stdout \| from_json)['active'] \| bool %} True {% else %} False {% endif %}): expected string or buffer. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1982054 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-04 17:39:54 +02:00
Dimitri Savineau	06471a4b82	osds: use osd pool ls instead of osd dump command The ceph osd pool ls detail command is a subset of the ceph osd dump command. $ ceph osd dump --format json\|wc -c 10117 $ ceph osd pool ls detail --format json\|wc -c 4740 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-02 15:51:01 +02:00
Dimitri Savineau	e87a47cf0c	rolling_update: get ceph version when mons exist `eec3878` introduced a regression for upgrade scenarios where there's no monitor nodes at all (like ganesha standalone, external clients, etc..) TASK [get the ceph release being deployed] ********************************** task path: infrastructure-playbooks/rolling_update.yml:121 Thursday 29 July 2021 15:55:29 +0000 (0:00:00.484) 0:00:15.802 ******* fatal: [client0]: FAILED! => msg: '''dict object'' has no attribute ''mons''' Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-02 15:47:56 +02:00
Benoît Knecht	d7653dca95	infrastructure-playbooks: Get Ceph info in check mode In the `set osd flags` block, run the Ceph commands that gather information from the cluster (and don't make any changes to it) even when running in check mode. This allows the tasks that depend on the variables set by those tasks to succeed in check mode. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2021-07-28 14:04:54 +02:00
Guillaume Abrioux	eec38784ec	update: check the ceph release Check early which Ceph release is going to be deployed and fail if it doesn't correspond to the ceph-ansible version being used. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978643 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-26 18:11:22 +02:00
Dimitri Savineau	738fa9428a	common: remove unnecessary run_once statements `1303611` introduced tasks for disabling the pg_autoscaler on pools and the balancer but those tasks are already executed on the first monitor node so we don't need to add the run_once statement. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-21 09:55:21 -04:00
Dimitri Savineau	cf6e33346e	common: fix py2 pool_list from_json when skipped When using python 2 and the task with a loop is skipped then it generates an error. Unexpected templating type error occurred on ({{ (pool_list.stdout \| from_json)['pools'] }}): expected string or buffer Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-21 08:17:58 +02:00
Guillaume Abrioux	13036115e2	common: disable/enable pg_autoscaler The PG autoscaler can disrupt the PG checks so the idea here is to disable it and re-enable it back after the restart is done. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-20 07:37:07 +02:00
Dimitri Savineau	97148dd58c	rolling_update: check quorum state before upgrade If one of the monitors is out of the quorum then nothing prevents the upgrade playbook from running. We only check if we have at least three monitor nodes but we should also check if those monitor nodes are correctly present in the quorum. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1952571 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-12 18:16:22 +02:00
Guillaume Abrioux	c396122ad9	update: fail the playbook if straw2 conversion failed It's better to fail the playbook so the user is aware the straw2 migration has failed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-09 11:44:06 -04:00
Guillaume Abrioux	4eb4268dee	update: followup on pr #6689 add missing 'osd' command. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-09 10:01:45 +02:00
Guillaume Abrioux	eee576477c	update: convert straw bucket After an upgrade, the presence of straw buckets will produce the following warning (HEALTH_WARN): ``` crush map has legacy tunables (require firefly, min is hammer) ``` because straw bucket is a firefly feature it needs to be converted to straw2. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967964 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-09 08:28:46 +02:00
Dimitri Savineau	a05730b38a	rhcs: remove ISO install method Starting RHCS 5, there's no ISO available anymore. This removes all ISO variables and the ceph_repository_type variable. Closes: #6626 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-06-30 18:03:03 +02:00
Guillaume Abrioux	2c77d0094c	update: do not gather facts on each play There's no benefit to gather facts again on each play in rolling_update.yml Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-05-22 08:33:44 +02:00
Guillaume Abrioux	3db1ea7ec4	update: fix ceph-crash stop task This is a workaround for an issue in ansible. When trying to stop/mask/disable this service in one task, the stop didn't actually happen, the task doesn't fail but for some reason the container is still present and running. Then the task starting the service in the role ceph-crash fails because it can't start the container since it's already running with the same name. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1955393 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-05-04 13:06:47 +02:00
Guillaume Abrioux	e9ddb972fe	update: followup on `07029e1` Playbook must fail anyway, the `rescue` block has been introduced for unmasking the unit after the playbook has failed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-03-26 21:27:02 +01:00
Guillaume Abrioux	07029e1bf1	rolling_update: unmask monitor service after a failure if for some reason the playbook fails after the service was stopped, disabled and masked and before it got restarted, enabled and unmasked, the playbook leaves the service masked, which can confuse users and forces them to unmask the unit manually. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1917680 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-03-18 15:22:38 +01:00
Guillaume Abrioux	6ccc8b4722	update: convert legacy grafana-server groupname early If the legacy name `grafana-server` is still being used when upgrading from Nautilus to Pacific, the task that sets the fact `rolling_update` to `true` doesn't run on the node(s) included in that group. Indeed the play where we set this fact (`rolling_update`) only runs on the group `monitoring_group_name \| default('monitoring')`. As a workaround, we can run earlier the task which converts the `grafana-server` group name to `monitoring`. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935554 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-03-15 15:25:48 +01:00
Alex Schultz	a7f2fa73e6	Use ansible_facts It has come to our attention that using ansible_* vars that are populated with INJECT_FACTS_AS_VARS=True is not very performant. In order to be able to support setting that to off, we need to update the references to use ansible_facts[<thing>] instead of ansible_<thing>. Related: ansible#73654 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406 Signed-off-by: Alex Schultz <aschultz@redhat.com>	2021-03-08 20:54:02 +01:00
Dimitri Savineau	48a456dc8c	rolling_update: enforce ceph-container-engine When running the rolling_update.yml playbook and adding the dashboard component at the same time then the requirements (like container packages) aren't installed. This could lead to a failure in case of using authentication on the container registry because the playbook will try to login on the registry but podman/docker aren't yet installed. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1903504 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-10 08:17:11 +01:00
Dimitri Savineau	94af3c87d1	rolling_update: exclude clients from node-exporter Since `b105549` we don't install node-exporter on client nodes so we should also exclude the client node from the node-exporter upgrade. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-09 14:41:13 +01:00
Guillaume Abrioux	b9cdee40a2	update: update ceph release pattern in complete upgrade play since master is now deploying quincy, we must update this. Otherwise, it will fail like following: ``` Error EPERM: require_osd_release cannot be lowered once it has been set ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-06 00:34:14 +01:00
Guillaume Abrioux	44fbadb50c	rolling_update: pg check refactor There's no need to achieve this in two tasks. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-06 00:34:14 +01:00
Guillaume Abrioux	86a8889ee3	common: do not use pipefail when not needed Let's discard the ansible lint error 306 and add a "# noqa 306" on tasks where we don't need `set -o pipefail` Fixes: #6090 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-12-01 15:07:09 -05:00
Dimitri Savineau	5da593604a	library: add ceph_osd_flag module This adds ceph_osd_flag ansible module for replacing the command module usage with the ceph osd set/unset commands. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-12-01 10:29:11 +01:00
Dimitri Savineau	3baac5ad5b	library: add ceph_volume_simple_{activate,scan} This adds ceph_volume_simple_{activate,scan} ansible modules for replacing the command module usage with the ceph-volume simple activate/scan commands. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-25 10:09:42 +01:00
Guillaume Abrioux	97dd9218dd	lint: all tasks should be named Fix ansible-lint 502 error: [502] All tasks should be named Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-23 08:33:47 +01:00
Guillaume Abrioux	5450de58b3	lint: commands should not change things Fix ansible lint 301 error: [301] Commands should not change things if nothing needs doing Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-23 08:33:47 +01:00

240 Commits (7b9a459ce3aff02d8e1878ed06cbcea0dd2d886e)