ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Teoman ONAY	bb92e27707	shrink-osd fails when the OSD container is stopped ceph-volume simple scan cannot be executed as it is meant to be run inside the OSD container. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2164414 Signed-off-by: Teoman ONAY <tonay@ibm.com>	2023-03-16 10:48:25 +01:00
Teoman ONAY	a76ae5af16	Updates ceph systemd unit files and reloads systemd Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2147570 Signed-off-by: Teoman ONAY <tonay@ibm.com>	2023-01-31 14:19:08 +01:00
Guillaume Abrioux	3cca408738	switch-to-containers: ignore errors when stopping service There might be cases where it can break idempotency. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `09b8f7b2ea`)	2022-10-17 10:31:45 +02:00
Guillaume Abrioux	37db27eefa	switch-to-containers: fix rbd-mirror migration `--state=enabled` isn't a valid filter so the unit from the packaging never gets removed. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2134917 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `7664da58da`)	2022-10-15 07:09:13 +02:00
Guillaume Abrioux	64d912bad2	rbd-mirror: follow up on recent rbd-mirror refactor - ensure /var/lib/ceph/bootstrap-rbd-mirror exists - always install ceph-base on rbdmirror nodes (otherwise, ceph-crash isn't present) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `041435e1e3`) (cherry picked from commit `b634fb1cb3`) (cherry picked from commit `302da16c27`)	2022-08-08 13:05:23 +02:00
Guillaume Abrioux	4661c5679f	shrink-osd: use command instead of ceph_volume_simple_scan This module isn't available in RHCS 4 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2071035 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-07-18 10:11:27 +02:00
Guillaume Abrioux	f8e969c890	backup-and-restore: use archive/unarchive approach The current approach is too complex and causes too many permission issues. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `dffe7b47de`)	2022-07-11 14:05:47 +02:00
Guillaume Abrioux	1b09682f75	backup-and-restore: various fixes - preserve mode and ownership on main directories - make sure the directories are well present prior to restoring files. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `047af3a3f6`)	2022-07-11 14:05:43 +02:00
Guillaume Abrioux	1ecdc189b1	backup-and-restore: fix check on 'target_node' variable If the user doesn't pass a valid name (present in the inventory) the playbook fails as follows: ``` fatal: [localhost -> {{ target_node }}]: FAILED! => msg: |- The task includes an option with an undefined variable. The error was: "hostvars['10.70.46.40']" is undefined ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `b18a1aa3ca`)	2022-07-11 14:05:40 +02:00
Guillaume Abrioux	e28615809c	backup-and-restore: fix check on 'mode' variable Typical failure: ``` fatal: [localhost]: FAILED! => msg: |- The conditional check 'mode not in ['backup', 'restore']' failed. The error was: error while evaluating conditional (mode not in ['backup', 'restore']): 'mode' is undefined ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `848dd03fa6`)	2022-07-11 14:05:36 +02:00
Guillaume Abrioux	85049464ce	backup-and-restore: fix a typo Typo introduced during initial implementation. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e28c486e52`)	2022-07-11 14:05:33 +02:00
Guillaume Abrioux	2c2833b540	contrib: add a playbook this playbook can backup or restore some ceph files. (/etc/ceph, /var/lib/ceph, ...) Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ed0bba4d77`)	2022-07-11 14:05:27 +02:00
Teoman ONAY	6feb7646a1	Refresh /etc/ceph/osd json files content before zapping the disks If the physical disk to device path mapping has changed since the last ceph-volume simple scan (e.g. addition or removal of disks), a wrong disk could be deleted. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2071035 Signed-off-by: Teoman ONAY <tonay@redhat.com> (cherry picked from commit `64e08f2c0b`)	2022-07-11 09:16:34 +02:00
Guillaume Abrioux	28063d1d69	purge: reset-failed ceph-crash This ensures we always reset-failed the ceph-crash service. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2055992 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e368ee0fc9`)	2022-05-23 09:53:31 +02:00
Guillaume Abrioux	6485e1a69e	purge: remove ceph directories on client nodes Otherwise any ceph directories are left over on client nodes after the purge. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2024815 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `20035852a4`) (cherry picked from commit `346d4a1e1d`)	2022-05-19 18:00:13 +02:00
Guillaume Abrioux	36fceeaaff	update: speed up client play there's no need to run the roles ceph-facts, ceph-config and ceph-client altogether on client nodes in rolling update playbook. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2019831 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `817c03bc0e`) (cherry picked from commit `c0da98b1d6`)	2022-05-19 17:58:01 +02:00
Guillaume Abrioux	f4bd5c7d91	switch2containers: fail if less than 3 monitors This playbook doesn't support less than 3 monitors present in the inventory. Just like the rolling_update playbook, let's fail if less than 3 monitors are present. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2049132 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `f08129edf2`) (cherry picked from commit `b970ab6691`)	2022-05-19 17:53:53 +02:00
Guillaume Abrioux	dbe940f1a7	purge: ceph-crash purge fixes This fixes the service file removal and makes the playbook call `systemctl reset-failed` on the service because in Ceph Nautilus, ceph-crash doesn't handle `SIGTERM` signal. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2055992 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `2f11982590`) (cherry picked from commit `7a570c719e`)	2022-05-09 13:45:16 +02:00
Guillaume Abrioux	48a8b1cc34	update: move a set_fact The ceph-facts role makes decisions based on the fact `rolling_update` so it must be called before we run this role. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e5edcc4214`)	2021-11-03 11:50:50 +01:00
Guillaume Abrioux	a9a7c35a74	update: support --limit on monitor nodes Change needed in order to support --limit on mon nodes. Otherwise, a call to `hostvars[groups[mon_group_name][0]]['_current_monitor_address']` throws an error: ``` "The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute '_current_monitor_address'" ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304#c28 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `82eee4303b`)	2021-10-29 01:41:13 +02:00
Per Abildgaard Toft	c5e4851a3f	shrink-osd: fix regression because of a wrong regex `968891f449` introduced a regression. The regex is wrong because it doesn't allow shrinking osds with an id greater than 9 Fixes: #6950 Signed-off-by: Per Abildgaard Toft <per@minfejl.dk> (cherry picked from commit `84118a3063`)	2021-10-26 16:39:33 +02:00
Guillaume Abrioux	3f4abb09b4	shrink-osd: check osd id format This adds a check early in order to ensure the format of osd ids passed is correct. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2005734 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `968891f449`)	2021-10-26 16:39:33 +02:00
Guillaume Abrioux	2c9fc7f517	rolling_update: modify default health_osd_check_* let's do more retries with a shorter delay. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `50a21d695e`)	2021-10-26 16:39:09 +02:00
Guillaume Abrioux	3dd96da652	rolling_update: fix pre and post osd upgrade play when using --limit osds, the plays before and after the osd upgrade are skipped because we use `hosts: "{{ mon_group_name | default('mons') }}[0]"`. Using `hosts: "{{ osds_group_name | default('osds') }}"` with `delegate_to` to the first monitor addresses this issue. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `fc9f87c45f`)	2021-10-25 23:22:35 +02:00
Guillaume Abrioux	dc1a4c29ea	update: support upgrading a subset of nodes It can be useful in a large cluster deployment to split the upgrade and only upgrade a group of nodes at a time. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e5cf9db2b0`)	2021-10-25 23:22:35 +02:00
Seena Fallah	0a93de938b	purge: add remove_docker tag This can help to skip docker removal tasks Signed-off-by: Seena Fallah <seenafallah@gmail.com> (cherry picked from commit `ff39c8d70b`)	2021-09-14 20:50:06 +02:00
Seena Fallah	0ede37b2ec	purge: add container_binary needed for zap osds `container_binary` isn't set anymore in the purge osd play because of a regression introduced by `60aa70a`. The CI didn't catch it because the play purging node-exporter sets this variable for all nodes before we run the purge osd play. This commit fixes this regression. Signed-off-by: Seena Fallah <seenafallah@gmail.com> (cherry picked from commit `a51ce767ca`)	2021-09-09 14:40:53 +02:00
Dimitri Savineau	0d670c7942	purge-dashboard: remove cid files This adds the service cid file cleanup as supported in the classic purge playbook since `b9dd253` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `cddc23f511`)	2021-09-08 12:05:33 -04:00
Guillaume Abrioux	20583e83dd	containers: introduce target systemd unit This adds ceph-*.target systemd unit files support for containerized deployments. This also fixes a regression introduced by PR #6719 (rgw and nfs systemd units not getting purged) Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1962748 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `09ef465f62`)	2021-08-18 13:43:01 -04:00
Guillaume Abrioux	2d38d8266b	update: gather facts only one time this play doesn't need to gather facts from localhost Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c14e9114ba`)	2021-08-17 15:47:38 -04:00
Dimitri Savineau	712a9c4403	switch2container: fix mon quorum check This was reverted by `7ddbe74` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1990733 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-10 10:03:02 +02:00
VasishtaShastry	4ae9f321ac	Fixes typo in rgw-add-users-buckets playbook Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com> (cherry picked from commit `478d9fdcb6`)	2021-08-09 14:31:55 -04:00
Dimitri Savineau	e1e22933a7	add-osd: use container_exec_cmd fact from mon host Because we're delegating the task to the first monitor node, we need to be sure that the container_exec_cmd fact is the one from that node too otherwise we could have a mismatch on the ceph-mon container name. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1990772 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-09 15:48:23 +02:00
Dimitri Savineau	03ed9e111c	infra: use dedicated variables for balancer status The balancer status is registered during the cephadm-adopt, rolling_update and switch2container playbooks. But it is also used in the ceph-handler role which is included in those playbooks too. Even if the ceph-handler tasks are skipped for rolling_update and switch2container, the balancer_status variable is erased with the skip task result. play1: register: balancer_status play2: register: balancer_status <-- skipped play3: when: (balancer_status.stdout | from_json)['active'] | bool This leads to an issue like: The conditional check '(balancer_status.stdout | from_json)['active'] | bool' failed. The error was: Unexpected templating type error occurred on ({% if (balancer_status.stdout | from_json)['active'] | bool %} True {% else %} False {% endif %}): expected string or buffer. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1982054 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `386661699b`)	2021-08-04 11:48:13 -04:00
Dimitri Savineau	1044940304	osds: use osd pool ls instead of osd dump command The ceph osd pool ls detail command is a subset of the ceph osd dump command. $ ceph osd dump --format json|wc -c 10117 $ ceph osd pool ls detail --format json|wc -c 4740 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `06471a4b82`)	2021-08-03 14:03:35 -04:00
Dimitri Savineau	5c6921e553	rolling_update: get ceph version when mons exist `eec3878` introduced a regression for upgrade scenarios where there are no monitor nodes at all (like ganesha standalone, external clients, etc.) TASK [get the ceph release being deployed] ********************************** task path: infrastructure-playbooks/rolling_update.yml:121 Thursday 29 July 2021 15:55:29 +0000 (0:00:00.484) 0:00:15.802 ******* fatal: [client0]: FAILED! => msg: '''dict object'' has no attribute ''mons''' Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `e87a47cf0c`)	2021-08-03 12:42:28 -04:00
Benoît Knecht	bacaa654b1	infrastructure-playbooks: Get Ceph info in check mode In the `set osd flags` block, run the Ceph commands that gather information from the cluster (and don't make any changes to it) even when running in check mode. This allows the tasks that depend on the variables set by those tasks to succeed in check mode. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `d7653dca95`)	2021-08-02 15:54:34 +02:00
Guillaume Abrioux	77171216fb	update: check the ceph release Check early which Ceph release is going to be deployed and fail if it doesn't correspond to the ceph-ansible version being used. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978643 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `eec38784ec`)	2021-07-26 14:10:24 -04:00
Guillaume Abrioux	907fb08956	purge: support osd_auto_discovery This adds a task that zaps by osd id so we can support the scenario where osds were deployed with `osd_auto_discovery` set to true. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876860 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4144074a50`)	2021-07-26 17:53:06 +02:00
Guillaume Abrioux	3dcfbc2edf	purge: merge playbooks This refactor merges the two playbooks so we only have to maintain 1 playbook. (Symlink the old purge-container-cluster.yml playbook for backward compatibility). Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `17cd83bf3a`)	2021-07-26 17:53:06 +02:00
Guillaume Abrioux	e4fea521d9	purge: drop variables from 'hosts' sections Those variables are useless given that it is not possible to override them. Let's replace them with the hardcoded name instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `6b50401d0c`)	2021-07-26 17:53:06 +02:00
Guillaume Abrioux	cf812d06e3	purge: reindent playbook This commit reindents the playbook. Also improve readability by adding an extra line between plays. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `60aa70a128`)	2021-07-26 17:53:06 +02:00
Dimitri Savineau	f433d06a93	rolling_update: check quorum state before upgrade If one of the monitors is out of the quorum then nothing prevents the upgrade playbook from running. We only check if we have at least three monitor nodes but we should also check if those monitor nodes are correctly present in the quorum. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1952571 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `97148dd58c`)	2021-07-26 17:49:23 +02:00
Dimitri Savineau	c8ca73f620	infra: add playbook to purge dashboard/monitoring The dashboard/monitoring stack can be deployed via the dashboard_enabled variable. But there's nothing similar if we want to remove that part only and keep the ceph cluster up and running. The current purge playbooks remove everything. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `8e4ef7d6da`)	2021-07-26 17:48:32 +02:00
Dimitri Savineau	8e939dc377	common: remove unnecessary run_once statements `1303611` introduced tasks for disabling the pg_autoscaler on pools and the balancer but those tasks are already executed on the first monitor node so we don't need to add the run_once statement. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `738fa9428a`)	2021-07-21 10:03:36 -04:00
Dimitri Savineau	17b9ff03d2	common: fix py2 pool_list from_json when skipped When using python 2 and the task with a loop is skipped then it generates an error. Unexpected templating type error occurred on ({{ (pool_list.stdout | from_json)['pools'] }}): expected string or buffer Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `cf6e33346e`)	2021-07-21 09:54:46 -04:00
Guillaume Abrioux	f7882bbc02	common: disable/enable pg_autoscaler The PG autoscaler can disrupt the PG checks so the idea here is to disable it and re-enable it back after the restart is done. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `13036115e2`)	2021-07-21 09:40:18 -04:00
Guillaume Abrioux	f0cd3c4f48	update: fail the playbook if straw2 conversion failed It's better to fail the playbook so the user is aware the straw2 migration has failed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c396122ad9`)	2021-07-09 17:29:54 -04:00
Guillaume Abrioux	65ce69567a	update: followup on pr #6689 add missing 'osd' command. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4eb4268dee`)	2021-07-09 11:34:46 +02:00
Guillaume Abrioux	1179ea8b2f	update: convert straw bucket After an upgrade, the presence of straw buckets will produce the following warning (HEALTH_WARN): ``` crush map has legacy tunables (require firefly, min is hammer) ``` because straw bucket is a firefly feature it needs to be converted to straw2. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967964 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `eee576477c`)	2021-07-09 11:34:46 +02:00

692 Commits (e31363ea9b7b39d0ea34a26f693ed58874caa42d)