ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Dimitri Savineau	f64a4258ea	switch2container: run ceph-validate role This adds the ceph-validate role before starting the switch to a containerized deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1968177 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `fc160b3be1`)	2021-06-30 09:29:58 +02:00
Guillaume Abrioux	16dc991351	shrink-mgr: modify existing mgr check Do not rely on the inventory aliases in order to check if the selected manager to be removed is present. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967897 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `26a7256c4c`)	2021-06-29 17:52:22 +02:00
Guillaume Abrioux	0856d3e47f	cephadm-adopt/rgw: add host target in svc_id If multi-realms were deployed with several instances belonging to the same realm and zone using the same port on different nodes, the service id expected by cephadm will be the same and therefore only one service will be deployed. We need to create a service called `<node>.<realm>.<zone>.<port>` to be sure the service name will be unique and well deployed on the expected node in order to preserve backward compatibility with the rgws instances that were deployed with ceph-ansible. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967455 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `31311b03ed`)	2021-06-29 15:18:49 +02:00
Guillaume Abrioux	aa332ac64d	cephadm-adopt: support rgw multisite adoption We need to support rgw multisite deployments. This commit makes the adoption playbook support this kind of deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967455 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `fc784fc44c`)	2021-06-24 09:48:27 +02:00
Guillaume Abrioux	93f1765259	update: block upgrade when nfs+rgw is deployed This is an unsupported configuration since there are issues with RGW+NFS upgraded from Nautilus to Pacific. This approach might be seen as a bit aggressive but it is preferable to wait before upgrading in that case. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970003 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-16 19:39:42 +02:00
Guillaume Abrioux	17f9780274	cephadm-adopt: fix mgr placement hosts task When no `[mgrs]` group is defined in the inventory, mgr daemons are implicitly collocated with monitors. This task currently relies on the length of the mgr group in order to tell cephadm to deploy mgr daemons. If there's no `[mgrs]` group defined in the inventory, it will ask cephadm to deploy 0 mgr daemons, which doesn't make sense and will throw an error. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970313 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `f9a73149a4`)	2021-06-14 13:55:45 +02:00
Guillaume Abrioux	8dda6d0b4d	fs2bs: use match filter in selectattr() `0990ae4109` changed the filter in selectattr() from 'match' to 'equalto' but due to an incompatibility with the Jinja2 version for python 2.7 on el7 we must stick to using 'match' filter. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d6745e9cd9`)	2021-05-26 09:15:43 +02:00
Guillaume Abrioux	b2759c0c51	fs2bs: fix wrong filter when setting osd_ids Using the 'match' filter in that task leads to bad behavior with node names such as node1, node11 and node111: with `selectattr('name', 'match', inventory_hostname)` it will match 'node1' along with 'node11' and 'node111'. Using the 'equalto' filter makes sure we only match the target node. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1963066 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0990ae4109`)	2021-05-25 20:50:10 +02:00
Guillaume Abrioux	d319da14c8	update: fix ceph-crash stop task This is a workaround for an issue in ansible. When trying to stop/mask/disable this service in one task, the stop didn't actually happen, the task doesn't fail but for some reason the container is still present and running. Then the task starting the service in the role ceph-crash fails because it can't start the container since it's already running with the same name. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1955393 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3db1ea7ec4`)	2021-05-04 15:59:46 +02:00
Guillaume Abrioux	747d259511	cephadm_adopt: fix ceph-crash migration ceph-ansible leaves a ceph-crash container in containerized deployment. It means we end up with 2 ceph-crash containers running after the migration playbook is complete. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1954614 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `22c18e82f0`)	2021-04-29 07:14:17 +02:00
Guillaume Abrioux	60c0fb8a7a	cephadm_adopt: fix rgw placement task Due to a recent breaking change in ceph, this command must be modified to add the <svc_id> parameter. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `1f40c12502`)	2021-04-27 15:17:28 +02:00
Guillaume Abrioux	a1f445cc73	cephadm_adopt: create a 'nfs-ganesha' pool When migrating from a cluster with no MDS nodes deployed, `{{ cephfs_data_pool.name }}` doesn't exist so we need to create a pool for storing nfs export objects. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1950403 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `bb7d37fb6a`)	2021-04-27 15:17:28 +02:00
Guillaume Abrioux	e332051b46	switch-to-containers: only chown corresponding files When collocating daemons, if we chown all files under `/var/lib/ceph` it can cause issues for the collocated daemons that wouldn't have been migrated yet. This commit makes the playbook chown only the files corresponding to the daemon being migrated. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ddbc11c4a9`)	2021-04-15 05:24:12 +02:00
Guillaume Abrioux	fd0da6f43c	fs2bs: add a final play This removes the fact `skipped_nodes` which is useless when we run with `--limit` since it gets reset when a new iteration is made. Instead, let's print within a final play which node has been skipped reusing the `skip_this_node` fact. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3d4267051f`)	2021-04-14 16:46:31 +02:00
Guillaume Abrioux	6b87d8c95c	cephadm_adopt: support nfs-ganesha adoption This commit adds the nfs-ganesha adoption support in the `cephadm-adopt.yml` playbook. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1944504 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `a9220654f5`)	2021-04-12 15:32:22 +02:00
Guillaume Abrioux	5aa9d0dfb4	cephadm_adopt: modify placement policy for rgw The adoption playbook should use `radosgw_num_instances` in order to determine how many rgw instances it should recreate. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1943170 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `1ffc4df6b6`)	2021-04-12 15:32:22 +02:00
Guillaume Abrioux	c2d40d4383	cephadm_adopt: fix a typo This play does nothing other than stopping/removing rgw daemons. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ee44d86072`)	2021-04-12 15:32:22 +02:00
Guillaume Abrioux	e84c42b33f	docker2podman: skip some role imports from handler when running docker-to-podman playbook, there's no need to call `ceph-config` and `ceph-rgw` from the role `ceph-handler`. It can even have side effects when coming from a baremetal cluster that was previously migrated using the switch-to-containers playbook. Indeed it might complain about missing .target systemd unit since they are removed during that migration. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1944999 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `70f19be367`)	2021-04-12 13:30:09 +02:00
Guillaume Abrioux	03793da772	docker2podman: add documentation/header this adds a small documentation in the header of the playbook in order to explain what is the goal of this playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `36b4227dcd`)	2021-04-12 09:44:14 +02:00
Guillaume Abrioux	9ab9b741f3	switch_to_containers: support iscsigws migration This adds the iscsigws migration to containers. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=<bz-number> Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `2c74c27321`)	2021-04-09 15:28:06 +02:00
Guillaume Abrioux	000b203ebf	update: followup on `07029e1` Playbook must fail anyway, the `rescue` block has been introduced for unmasking the unit after the playbook has failed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e9ddb972fe`)	2021-03-29 10:54:44 +02:00
Guillaume Abrioux	1fd0661d3e	rolling_update: unmask monitor service after a failure If for some reason the playbook fails after the service was stopped, disabled and masked but before it got restarted, enabled and unmasked, the playbook leaves the service masked, which can confuse users and forces them to unmask the unit manually. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1917680 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `07029e1bf1`)	2021-03-26 15:20:35 +01:00
Alex Schultz	56aac327dd	Use ansible_facts It has come to our attention that using ansible_* vars that are populated with INJECT_FACTS_AS_VARS=True is not very performant. In order to be able to support setting that to off, we need to update the references to use ansible_facts[<thing>] instead of ansible_<thing>. Related: ansible#73654 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406 Signed-off-by: Alex Schultz <aschultz@redhat.com> (cherry picked from commit `a7f2fa73e6`)	2021-03-26 00:04:49 +01:00
Guillaume Abrioux	a4d4f53080	fix 'command -v' tasks `command -v` is a bash script which needs a shell to run. Fixes: #6325 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `14c472707c`)	2021-03-22 13:52:39 +01:00
Guillaume Abrioux	8d25b4305e	adopt: convert legacy grafana-server groupname early This is a follow up on PR #6332 cephadm-adopt.yml playbook is affected by the same bug Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1938658 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `af95595c82`)	2021-03-18 08:56:44 +01:00
Guillaume Abrioux	c296824ae0	cephadm_adopt: fetch and write ceph minimal config This commit makes the playbook fetch the minimal current ceph configuration and write it later on monitoring nodes so `cephadm` can proceed with the adoption. When a monitoring stack was deployed on a dedicated node, it means no `ceph.conf` file was written, `cephadm` requires a `ceph.conf` in order to adopt the daemon present on the node. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1939887 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `b445df0479`)	2021-03-18 08:51:59 +01:00
Guillaume Abrioux	732e5b10b8	update: convert legacy grafana-server groupname early If the legacy name `grafana-server` is still being used when upgrading from Nautilus to Pacific, the task that sets the fact `rolling_update` to `true` doesn't run on the node(s) included in that group. Indeed the play where we set this fact (`rolling_update`) only runs on the group `monitoring_group_name | default('monitoring')`. As a workaround, we can run the task which converts the `grafana-server` group name to `monitoring` earlier. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935554 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `6ccc8b4722`)	2021-03-16 14:33:40 +01:00
Guillaume Abrioux	3326b6d54f	purge: rm service-cid files This commit makes sure purge playbooks remove those files if for any reason they have been left. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1920900 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `b9dd253a4f`)	2021-02-12 18:33:19 +01:00
Guillaume Abrioux	5803619a5d	switch2container: do not serialize the ceph-crash migration There's no need to slow down the playbook execution time by migrating all the `ceph-crash` instances in a serial way. Let's remove the `serial: 1` so the migration is achieved in a parallel way. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `980a5a7df4`)	2021-02-12 14:06:15 +01:00
Guillaume Abrioux	980a0dd00e	rolling_update: update specific pacific task update the 'require-osd-release' task. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-12 09:15:24 +01:00
Dimitri Savineau	950a6ae406	cephadm-adopt: remove prometheus workaround This was fixed by [1][2] [1] https://tracker.ceph.com/issues/45120 [2] https://github.com/ceph/ceph/commit/252d4b30 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-10 13:51:41 +01:00
Dimitri Savineau	48a456dc8c	rolling_update: enforce ceph-container-engine When running the rolling_update.yml playbook and adding the dashboard component at the same time, the requirements (like container packages) aren't installed. This could lead to a failure when using authentication on the container registry because the playbook will try to log in to the registry but podman/docker aren't yet installed. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1903504 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-10 08:17:11 +01:00
Dimitri Savineau	94af3c87d1	rolling_update: exclude clients from node-exporter Since `b105549` we don't install node-exporter on client nodes so we should also exclude the client node from the node-exporter upgrade. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-09 14:41:13 +01:00
Guillaume Abrioux	b9cdee40a2	update: update ceph release pattern in complete upgrade play Since master is now deploying quincy, we must update this. Otherwise, it will fail as follows: `Error EPERM: require_osd_release cannot be lowered once it has been set` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-06 00:34:14 +01:00
Guillaume Abrioux	44fbadb50c	rolling_update: pg check refactor There's no need to achieve this in two tasks. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-06 00:34:14 +01:00
Dimitri Savineau	76a663245d	cephadm-adopt: use ceph_osd_flag module There's no reason to not use the ceph_osd_flag module to set/unset osd flags. Also if there are no OSD nodes in the inventory then we don't need to execute the set/unset play. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-03 08:29:31 +01:00
Dimitri Savineau	36fc04eaab	purge-cluster: use parted ansible module Instead of doing some scripting via the shell module, we can use the parted ansible module to check the boot flag on partitions. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-03 08:28:22 +01:00
Guillaume Abrioux	984191ac7f	purge: zap and destroy db and wal devices for lvm batch Those devices (db/wal) are never zapped in lvm batch deployment. Iterating over `dedicated_devices` and `bluestore_wal_devices` fixes this issue. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1922926 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-01 13:01:58 -05:00
Dimitri Savineau	2734a12d44	cephadm-adopt: use radosgw modules for idempotency When rerunning the cephadm-adopt.yml playbook the radosgw realm, zonegroup and zone tasks will fail because the task isn't idempotent. Using the radosgw ansible modules solves that problem. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-29 21:07:39 +01:00
Dimitri Savineau	6886700a00	cephadm-adopt: make the playbook idempotent If the cephadm-adopt.yml fails during the first execution and some daemons have already been adopted by cephadm then we can't rerun the playbook because the old container won't exist anymore. Error: no container with name or ID ceph-mon-xxx found: no such container If the daemons are adopted then the old systemd unit doesn't exist anymore so any call to that unit with systemd will fail. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918424 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-29 21:07:39 +01:00
Guillaume Abrioux	e835b08f8f	fs2bs: remove a legacy fact since `cf7345f143`, we don't need to set this fact anymore. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-01-28 16:26:46 +01:00
Dimitri Savineau	13427eddac	cephadm-adopt: add grafana group conversion The grafana group conversion task wasn't present in the cephadm-adopt.yml playbook. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1917530 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-18 20:52:58 +01:00
Guillaume Abrioux	e66f12d138	fs2bs: skip migration when a mix of fs and bs is detected Since the default of `osd_objectstore` has changed as of 3.2, some deployments might have a mix of filestore and bluestore OSDs on the same node. In some specific cases, there's a possibility that a filestore OSD shares a journal/db device with a bluestore OSD. We shouldn't try to redeploy in this context because ceph-volume will complain (either because you can't pass a partition in lvm batch or about the gpt header). The safest option is to skip the migration on the node when such a mix is detected or force all osds including those already using bluestore (option `force_filestore_to_bluestore=True` has to be passed as an extra var). If all OSDs are using filestore, then they will be migrated to bluestore. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1875777 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-01-12 14:40:25 -05:00
Guillaume Abrioux	175ffa1b88	switch2container: fix mon quorum check The current check makes no sense because it checks whether any monitor other than the one being played (either a previous one already converted or a next one that isn't yet converted) is present in the quorum. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1909011 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-01-11 14:42:45 -05:00
Dimitri Savineau	5b6f907a72	cephadm: remove loop on host add tasks Instead of iterating over the host list to add the node/label to the host orchestrator configuration, we can do it in parallel. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-12-16 15:14:28 +01:00
Dimitri Savineau	0108c9f941	purge-container-cluster: always prune force Since podman 2.x, there's now a confirmation when running podman container prune command. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-12-09 14:46:45 -05:00
Dimitri Savineau	08f118077f	library: add cephadm_adopt module This adds cephadm_adopt ansible module for replacing the command module usage with the cephadm adopt command. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-12-02 09:15:44 +01:00
Guillaume Abrioux	86a8889ee3	common: do not use pipefail when not needed Let's discard the ansible lint error 306 and add a "# noqa 306" on tasks where we don't need `set -o pipefail` Fixes: #6090 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-12-01 15:07:09 -05:00
Dimitri Savineau	cf7345f143	consume ceph_volume module when possible We should always use the ceph_volume ansible module when possible. This patch replaces the ceph-volume inventory and lvm {list,zap} commands called via the command/shell modules with the corresponding calls to the ceph_volume module. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-12-01 17:54:10 +01:00
Dimitri Savineau	c3ed124d31	library: add cephadm_bootstrap module This adds cephadm_bootstrap ansible module for replacing the command module usage with the cephadm bootstrap command. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-12-01 10:30:05 +01:00
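
The `fs2bs` filter change in b2759c0c51 hinges on the difference between a regular-expression match anchored only at the start of the string and plain equality. The following is a minimal Python sketch of that difference, not the playbook code. It assumes the `match` test behaves like Python's `re.match`; the helper names `select_match` and `select_equalto` and the sample data are hypothetical stand-ins for `selectattr('name', 'match', ...)` and `selectattr('name', 'equalto', ...)`.

```python
import re

# Hypothetical entries reusing the node names from the commit message.
osds = [{"name": "node1"}, {"name": "node11"}, {"name": "node111"}]


def select_match(items, pattern):
    """Keep names where `pattern` matches at the start of the name (re.match)."""
    return [item["name"] for item in items if re.match(pattern, item["name"])]


def select_equalto(items, value):
    """Keep names that are exactly equal to `value`."""
    return [item["name"] for item in items if item["name"] == value]


# A start-anchored match on "node1" also selects node11 and node111.
print(select_match(osds, "node1"))
# Equality selects only the target node.
print(select_equalto(osds, "node1"))
# A pattern anchored at both ends narrows a regex match to the exact name.
print(select_match(osds, r"^node1$"))
```

Whether the playbook anchors its pattern after 8dda6d0b4d went back to `match` is not stated in this log; the last call only shows that a regex match can be made exact.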

709 Commits (f64a4258ea47a15ff7d8aa9cafa2ac88a239dcc0)
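
Commit 0856d3e47f explains why the rgw service id gains a host component. The following is a minimal Python sketch of the collision it describes, assuming each instance is identified by node, realm, zone and port. The `RgwInstance` type, the sample host names and both helper functions are hypothetical; only the `<node>.<realm>.<zone>.<port>` naming scheme comes from the commit message.

```python
from typing import NamedTuple


class RgwInstance(NamedTuple):
    # Hypothetical description of one rgw instance.
    node: str
    realm: str
    zone: str
    port: int


def svc_id_without_host(inst: RgwInstance) -> str:
    """Service id built from realm, zone and port only."""
    return f"{inst.realm}.{inst.zone}.{inst.port}"


def svc_id_with_host(inst: RgwInstance) -> str:
    """Service id following <node>.<realm>.<zone>.<port>."""
    return f"{inst.node}.{inst.realm}.{inst.zone}.{inst.port}"


# Two instances in the same realm and zone, same port, different nodes.
instances = [
    RgwInstance("rgw0", "realm1", "zone1", 8080),
    RgwInstance("rgw1", "realm1", "zone1", 8080),
]

ids_without_host = {svc_id_without_host(i) for i in instances}
ids_with_host = {svc_id_with_host(i) for i in instances}

# Without the host the two ids collapse into one; with it they stay distinct.
print(sorted(ids_without_host))
print(sorted(ids_with_host))
```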