ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Dimitri Savineau	0d670c7942	purge-dashboard: remove cid files This adds the service cid file cleanup as supported in the classic purge playbook since `b9dd253` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `cddc23f511`)	2021-09-08 12:05:33 -04:00
Guillaume Abrioux	20583e83dd	containers: introduce target systemd unit This adds ceph-*.target systemd unit files support for containerized deployments. This also fixes a regression introduced by PR #6719 (rgw and nfs systemd units not getting purged) Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1962748 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `09ef465f62`)	2021-08-18 13:43:01 -04:00
Guillaume Abrioux	2d38d8266b	update: gather facts only one time this play doesn't need to gather facts from localhost Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c14e9114ba`)	2021-08-17 15:47:38 -04:00
Dimitri Savineau	712a9c4403	switch2container: fix mon quorum check This was reverted by `7ddbe74` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1990733 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-10 10:03:02 +02:00
VasishtaShastry	4ae9f321ac	Fixes typo in rgw-add-users-buckets playbook Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com> (cherry picked from commit `478d9fdcb6`)	2021-08-09 14:31:55 -04:00
Dimitri Savineau	e1e22933a7	add-osd: use container_exec_cmd fact from mon host Because we're delegating the task to the first monitor node, we need to be sure that the container_exec_cmd fact is the one from that node too otherwise we could have a mismatch on the ceph-mon container name. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1990772 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-09 15:48:23 +02:00
Dimitri Savineau	03ed9e111c	infra: use dedicated variables for balancer status The balancer status is registered during the cephadm-adopt, rolling_update and switch2container playbooks. But it is also used in the ceph-handler role which is included in those playbooks too. Even if the ceph-handler tasks are skipped for rolling_update and switch2container, the balancer_status variable is overwritten by the skipped task's result. play1: register: balancer_status play2: register: balancer_status <-- skipped play3: when: (balancer_status.stdout | from_json)['active'] | bool This leads to issues like: The conditional check '(balancer_status.stdout | from_json)['active'] | bool' failed. The error was: Unexpected templating type error occurred on ({% if (balancer_status.stdout | from_json)['active'] | bool %} True {% else %} False {% endif %}): expected string or buffer. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1982054 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `386661699b`)	2021-08-04 11:48:13 -04:00
Dimitri Savineau	1044940304	osds: use osd pool ls instead of osd dump command The ceph osd pool ls detail command is a subset of the ceph osd dump command. $ ceph osd dump --format json|wc -c 10117 $ ceph osd pool ls detail --format json|wc -c 4740 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `06471a4b82`)	2021-08-03 14:03:35 -04:00
Dimitri Savineau	5c6921e553	rolling_update: get ceph version when mons exist `eec3878` introduced a regression for upgrade scenarios where there's no monitor nodes at all (like ganesha standalone, external clients, etc..) TASK [get the ceph release being deployed] ********************************** task path: infrastructure-playbooks/rolling_update.yml:121 Thursday 29 July 2021 15:55:29 +0000 (0:00:00.484) 0:00:15.802 ******* fatal: [client0]: FAILED! => msg: '''dict object'' has no attribute ''mons''' Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `e87a47cf0c`)	2021-08-03 12:42:28 -04:00
Benoît Knecht	bacaa654b1	infrastructure-playbooks: Get Ceph info in check mode In the `set osd flags` block, run the Ceph commands that gather information from the cluster (and don't make any changes to it) even when running in check mode. This allows the tasks that depend on the variables set by those tasks to succeed in check mode. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `d7653dca95`)	2021-08-02 15:54:34 +02:00
Guillaume Abrioux	77171216fb	update: check the ceph release Check early which Ceph release is going to be deployed and fail if it doesn't correspond to the ceph-ansible version being used. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978643 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `eec38784ec`)	2021-07-26 14:10:24 -04:00
Guillaume Abrioux	907fb08956	purge: support osd_auto_discovery This adds a task that zaps by osd id so we can support the scenario where osds were deployed with `osd_auto_discovery` set to true. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876860 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4144074a50`)	2021-07-26 17:53:06 +02:00
Guillaume Abrioux	3dcfbc2edf	purge: merge playbooks This refactor merges the two playbooks so we only have to maintain 1 playbook. (Symlink the old purge-container-cluster.yml playbook for backward compatibility). Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `17cd83bf3a`)	2021-07-26 17:53:06 +02:00
Guillaume Abrioux	e4fea521d9	purge: drop variables from 'hosts' sections Those variables are useless given this is not possible to override them. Let's replace them with the hardcoded name instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `6b50401d0c`)	2021-07-26 17:53:06 +02:00
Guillaume Abrioux	cf812d06e3	purge: reindent playbook This commit reindents the playbook. Also improve readability by adding an extra line between plays. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `60aa70a128`)	2021-07-26 17:53:06 +02:00
Dimitri Savineau	f433d06a93	rolling_update: check quorum state before upgrade If one of the monitors is out of the quorum, nothing prevents the upgrade playbook from running. We only check that we have at least three monitor nodes, but we should also check that those monitor nodes are actually present in the quorum. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1952571 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `97148dd58c`)	2021-07-26 17:49:23 +02:00
Dimitri Savineau	c8ca73f620	infra: add playbook to purge dashboard/monitoring The dashboard/monitoring stack can be deployed via the dashboard_enabled variable. But there's nothing similar if we want to remove only that part and keep the ceph cluster up and running. The current purge playbooks remove everything. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `8e4ef7d6da`)	2021-07-26 17:48:32 +02:00
Dimitri Savineau	8e939dc377	common: remove unnecessary run_once statements `1303611` introduced tasks for disabling the pg_autoscaler on pools and the balancer but those tasks are already executed on the first monitor node so we don't need to add the run_once statement. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `738fa9428a`)	2021-07-21 10:03:36 -04:00
Dimitri Savineau	17b9ff03d2	common: fix py2 pool_list from_json when skipped When using python 2, a skipped task with a loop generates an error. Unexpected templating type error occurred on ({{ (pool_list.stdout | from_json)['pools'] }}): expected string or buffer Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `cf6e33346e`)	2021-07-21 09:54:46 -04:00
Guillaume Abrioux	f7882bbc02	common: disable/enable pg_autoscaler The PG autoscaler can disrupt the PG checks so the idea here is to disable it and re-enable it back after the restart is done. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `13036115e2`)	2021-07-21 09:40:18 -04:00
Guillaume Abrioux	f0cd3c4f48	update: fail the playbook if straw2 conversion failed It's better to fail the playbook so the user is aware the straw2 migration has failed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c396122ad9`)	2021-07-09 17:29:54 -04:00
Guillaume Abrioux	65ce69567a	update: followup on pr #6689 Add the missing 'osd' command. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4eb4268dee`)	2021-07-09 11:34:46 +02:00
Guillaume Abrioux	1179ea8b2f	update: convert straw bucket After an upgrade, the presence of straw buckets will produce the following warning (HEALTH_WARN): ``` crush map has legacy tunables (require firefly, min is hammer) ``` because straw bucket is a firefly feature it needs to be converted to straw2. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967964 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `eee576477c`)	2021-07-09 11:34:46 +02:00
Guillaume Abrioux	595a61c137	purge: add monitoring group in final cleanup play This adds the monitoring group in the "final cleanup play" so any cid files generated are well removed when purging the cluster. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1974536 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `037d8cd05e`)	2021-07-02 14:37:18 -04:00
Guillaume Abrioux	f0413c4a2b	update: do not gather facts on each play There's no benefit to gather facts again on each play in rolling_update.yml Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `2c77d0094c`)	2021-06-30 20:40:15 +02:00
Guillaume Abrioux	8802dcf05f	shrink-mgr: modify existing mgr check Do not rely on the inventory aliases in order to check if the selected manager to be removed is present. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967897 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `26a7256c4c`)	2021-06-30 16:12:07 +02:00
Dimitri Savineau	25ea12d31d	switch2container: run ceph-validate role This adds the ceph-validate role before starting the switch to a containerized deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1968177 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `fc160b3be1`)	2021-06-30 09:32:34 +02:00
Guillaume Abrioux	a391dad8e1	dashboard: fix typo introduced during backport during backport of `c8b92deba1` the pattern should have been s/monitoring_group_name/grafana_server_group_name/ Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1964907 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ac0a5c1e68`)	2021-05-26 18:51:18 +02:00
Guillaume Abrioux	0eed858952	fs2bs: use match filter in selectattr() `0990ae4109` changed the filter in selectattr() from 'match' to 'equalto' but due to an incompatibility with the Jinja2 version for python 2.7 on el7 we must stick to using 'match' filter. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d6745e9cd9`)	2021-05-26 09:58:19 +02:00
Guillaume Abrioux	abebf9b23e	fs2bs: fix wrong filter when setting osd_ids using 'match' filter in that task will lead to bad behavior if I have the following node names for instance: - node1 - node11 - node111 with `selectattr('name', 'match', inventory_hostname)` it will match 'node1' along with 'node11' and 'node111'. using 'equalto' filter will make sure we only match the target node. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1963066 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0990ae4109`)	2021-05-26 09:58:19 +02:00
Guillaume Abrioux	2d59f4579b	update: fix ceph-crash stop task This is a workaround for an issue in ansible. When trying to stop/mask/disable this service in one task, the stop didn't actually happen, the task doesn't fail but for some reason the container is still present and running. Then the task starting the service in the role ceph-crash fails because it can't start the container since it's already running with the same name. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1955393 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3db1ea7ec4`)	2021-05-05 09:47:32 +02:00
Guillaume Abrioux	650964a8c7	fs2bs: add a final play This removes the fact `skipped_nodes` which is useless when we run with `--limit` since it gets reset when a new iteration is made. Instead, let's print within a final play which node has been skipped reusing the `skip_this_node` fact. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3d4267051f`)	2021-04-28 08:55:34 +02:00
Guillaume Abrioux	b4340b71d9	switch-to-containers: only chown corresponding files When collocating daemons, if we chown all files under `/var/lib/ceph` it can cause issues for the collocated daemons that wouldn't have been migrated yet. This commit makes the playbook chown only the files corresponding to the daemon being migrated. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ddbc11c4a9`)	2021-04-15 05:24:44 +02:00
Guillaume Abrioux	3ef9690cd1	docker2podman: skip some role imports from handler when running docker-to-podman playbook, there's no need to call `ceph-config` and `ceph-rgw` from the role `ceph-handler`. It can even have side effects when coming from a baremetal cluster that was previously migrated using the switch-to-containers playbook. Indeed it might complain about missing .target systemd unit since they are removed during that migration. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1944999 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `70f19be367`)	2021-04-12 13:30:31 +02:00
Guillaume Abrioux	c0c90c6747	docker2podman: add documentation/header this adds a small documentation in the header of the playbook in order to explain what is the goal of this playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `36b4227dcd`)	2021-04-12 09:45:20 +02:00
Guillaume Abrioux	3bf2c45123	switch_to_containers: support iscsigws migration This adds the iscsigws migration to containers. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=<bz-number> Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `2c74c27321`)	2021-04-09 15:28:27 +02:00
Guillaume Abrioux	5fd299e358	update: followup on `07029e1` Playbook must fail anyway, the `rescue` block has been introduced for unmasking the unit after the playbook has failed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e9ddb972fe`)	2021-03-29 15:22:23 +02:00
Guillaume Abrioux	82b934cfc1	rolling_update: unmask monitor service after a failure If for some reason the playbook fails after the service was stopped, disabled and masked, and before it got restarted, enabled and unmasked, the playbook leaves the service masked, which can confuse users and forces them to unmask the unit manually. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1917680 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `07029e1bf1`)	2021-03-29 15:22:23 +02:00
Guillaume Abrioux	a8420d41c6	update: stop ceph-crash service before upgrading This adds the missing service stop task for ceph-crash upgrade workflow. It should have been added through commit `15872e3db1e342238636bc9c8e1aef6bd1d3dcd8` in stable-4.0 but at the time we backported this patch ceph-crash wasn't implemented yet so the ceph-crash related content in this patch was removed. Then, ceph-crash has been implemented later so we are still missing this part of the patch in stable-4.0. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1943471 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-03-26 16:18:50 +01:00
Alex Schultz	7ddbe74712	Use ansible_facts It has come to our attention that using ansible_* vars that are populated with INJECT_FACTS_AS_VARS=True is not very performant. In order to be able to support setting that to off, we need to update the references to use ansible_facts[<thing>] instead of ansible_<thing>. Related: ansible#73654 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406 Signed-off-by: Alex Schultz <aschultz@redhat.com> (cherry picked from commit `a7f2fa73e6`)	2021-03-26 00:16:58 +01:00
Guillaume Abrioux	2cd8c3637c	fix 'command -v' tasks `command -v` is a bash script which needs a shell to run. Fixes: #6325 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `14c472707c`)	2021-03-22 13:53:11 +01:00
Guillaume Abrioux	0d0723298f	purge: rm service-cid files This commit makes sure purge playbooks remove those file if for any reason they have been left. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1920900 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `b9dd253a4f`)	2021-03-11 13:52:48 +01:00
Guillaume Abrioux	932abbc8cf	switch2container: do not serialize the ceph-crash migration There's no need to slow down the playbook execution time by migrating all the `ceph-crash` instances in a serial way. Let's remove the `serial: 1` so the migration is achieved in a parallel way. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `980a5a7df4`)	2021-03-11 13:52:39 +01:00
Dimitri Savineau	8f26ffdbac	rolling_update: enforce ceph-container-engine When running the rolling_update.yml playbook and adding the dashboard component at the same time, the requirements (like container packages) aren't installed. This could lead to a failure when using authentication on the container registry, because the playbook will try to log in to the registry but podman/docker aren't installed yet. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1903504 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `48a456dc8c`)	2021-03-11 13:52:21 +01:00
Dimitri Savineau	3ba27c9387	rolling_update: exclude clients from node-exporter Since `b105549` we don't install node-exporter on client nodes so we should also exclude the client node from the node-exporter upgrade. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `94af3c87d1`)	2021-03-11 13:52:02 +01:00
Guillaume Abrioux	1b424ad5e9	purge: zap and destroy db and wal devices for lvm batch Those devices (db/wal) are never zapped in lvm batch deployment. Iterating over `dedicated_devices` and `bluestore_wal_devices` fixes this issue. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1922926 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `984191ac7f`)	2021-03-11 13:51:38 +01:00
Guillaume Abrioux	bb1f66cb51	switch2container: fix mon quorum check The current check makes no sense because it checks whether any monitor other than the one being played (either a previous one already converted or a next one that isn't yet converted) is present in the quorum. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1909011 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `175ffa1b88`)	2021-03-11 13:50:27 +01:00
Guillaume Abrioux	858048560e	update: fix require-osd-release task This commit fixes two issues in rolling_update.yml: - `container_exec_cmd_update_osd` is unset in the `complete osd upgrade` play so it never runs the command in a container. - the 'require-osd-release' task is never applied because the condition looks for luminous release. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1930164 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-18 22:22:06 +01:00
Guillaume Abrioux	b903446fa4	containers: use --cpus instead --cpu-quota When using docker 1.13.1, the current condition: ``` {% if (container_binary == 'docker' and ceph_docker_version.split('.')[0] is version_compare('13', '>=')) or container_binary == 'podman' -%} ``` is wrong because it compares the first digit (1) whereas it should compare the second one. It means we always use `--cpu-quota` although the documentation recommends using `--cpus` when the docker version is 1.13.1 or higher. From the doc: > --cpu-quota=<value> Impose a CPU CFS quota on the container. The number of > microseconds per --cpu-period that the container is limited to before > throttled. As such acting as the effective ceiling. > If you use Docker 1.13 or higher, use --cpus instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3e262e072b`)	2021-01-28 16:37:50 -05:00
Guillaume Abrioux	a36eee1852	fs2bs: skip migration when a mix of fs and bs is detected Since the default of `osd_objectstore` has changed as of 3.2, some deployments might have a mix of filestore and bluestore OSDs on a same node. In some specific cases, there's a possibility that a filestore OSD shares a journal/db device with a bluestore OSD. We shouldn't try to redeploy in this context because ceph-volume will complain. (either because in lvm batch you can't pass partition or about gpt header). The safest option is to skip the migration on the node when such a mix is detected or force all osds including those already using bluestore (option `force_filestore_to_bluestore=True` has to be passed as an extra var). If all OSDs are using filestore, then they will be migrated to bluestore. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1875777 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e66f12d138`)	2021-01-22 11:37:40 -05:00
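
Note on commit `abebf9b23e` above: the `match` test is anchored only at the start of the string, which is why `node1` also selects `node11` and `node111`. The following is a minimal Python sketch of that behaviour, assuming the test behaves like `re.match` (as Ansible's `match` test does). The helper names are hypothetical and not part of ceph-ansible. The third helper shows one possible way to keep a regex-based test while still matching a single node; it is an illustration only, not what either commit does.

```python
import re

node_names = ["node1", "node11", "node111"]
inventory_hostname = "node1"


def select_match(names, pattern):
    """Mimic a 'match' test: the pattern is anchored at the start only."""
    return [name for name in names if re.match(pattern, name)]


def select_equalto(names, target):
    """Mimic an 'equalto' test: plain string equality."""
    return [name for name in names if name == target]


def select_anchored_match(names, target):
    """Hypothetical alternative: escape the name and anchor the end too."""
    pattern = re.escape(target) + r"$"
    return [name for name in names if re.match(pattern, name)]


prefix_hits = select_match(node_names, inventory_hostname)
exact_hits = select_equalto(node_names, inventory_hostname)
anchored_hits = select_anchored_match(node_names, inventory_hostname)

print(prefix_hits)    # every name that starts with "node1"
print(exact_hits)     # only the target node
print(anchored_hits)  # only the target node
```

Commit `0eed858952` returns to `match` for compatibility with the Jinja2 version on el7, so the prefix behaviour shown by `select_match` is worth keeping in mind when host names share a common prefix.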


665 Commits (0d670c7942d919bab5ea4ad4c31ade1a61be0336)
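
Note on commit `b903446fa4` above: comparing only the first component of a version string such as `1.13.1` against `13` can never succeed, because the first component is `1`. The sketch below illustrates the pitfall in Python and shows one way to compare the whole version instead. It assumes purely numeric, dot-separated version strings, and the function names are hypothetical; it is not the template logic used by ceph-ansible.

```python
def first_component_at_least(version, minimum):
    """Mirror the flawed check: look at the first dotted component only."""
    return int(version.split(".")[0]) >= minimum


def version_at_least(version, minimum):
    """Compare all components as a tuple of integers.

    Assumes every component is numeric (no suffix such as '-ce').
    """
    parts = tuple(int(part) for part in version.split("."))
    return parts >= minimum


flawed_result = first_component_at_least("1.13.1", 13)
full_result = version_at_least("1.13.1", (1, 13))
older_result = version_at_least("1.12.6", (1, 13))

print(flawed_result)  # the first component is 1, so the check fails
print(full_result)    # (1, 13, 1) is not lower than (1, 13)
print(older_result)   # (1, 12, 6) is lower than (1, 13)
```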