ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	60aa70a128	purge: reindent playbook This commit reindents the playbook. Also improve readability by adding an extra line between plays. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-13 09:47:30 -04:00
Dimitri Savineau	a305296384	cephadm-adopt: enable osd memory autotune for HCI This enables the osd_memory_target_autotune option on HCI environment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1973149 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-12 18:17:37 +02:00
Dimitri Savineau	97148dd58c	rolling_update: check quorum state before upgrade If one of the monitors is out of quorum, nothing prevents the upgrade playbook from running. We only check that we have at least three monitor nodes, but we should also check that those monitor nodes are actually present in the quorum. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1952571 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-12 18:16:22 +02:00
Guillaume Abrioux	c396122ad9	update: fail the playbook if straw2 conversion failed It's better to fail the playbook so the user is aware the straw2 migration has failed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-09 11:44:06 -04:00
Guillaume Abrioux	4eb4268dee	update: followup on pr #6689 add missing 'osd' command. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-09 10:01:45 +02:00
Guillaume Abrioux	eee576477c	update: convert straw bucket After an upgrade, the presence of straw buckets will produce the following warning (HEALTH_WARN): ``` crush map has legacy tunables (require firefly, min is hammer) ``` because straw bucket is a firefly feature it needs to be converted to straw2. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967964 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-09 08:28:46 +02:00
Dimitri Savineau	aeb9f562e5	cephadm-adopt: set application on ganesha pool Set the nfs application to the ganesha pool. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1956840 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-08 20:35:58 +02:00
Dimitri Savineau	8e4ef7d6da	infra: add playbook to purge dashboard/monitoring The dashboard/monitoring stack can be deployed via the dashboard_enabled variable, but there's nothing similar if we want to remove only that part and keep the ceph cluster up and running. The current purge playbooks remove everything. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-06 09:02:37 +02:00
Guillaume Abrioux	3b804a61dd	cephadm_adopt: add any_errors_fatal on play Add any_errors_fatal: true in cephadm-adopt playbook. We should stop the playbook execution when a task throws an error. Otherwise it can lead to unexpected behavior. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1976179 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-02 22:15:07 +02:00
Guillaume Abrioux	037d8cd05e	purge: add monitoring group in final cleanup play This adds the monitoring group in the "final cleanup play" so any cid files generated are well removed when purging the cluster. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1974536 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-02 13:37:15 -04:00
Dimitri Savineau	a05730b38a	rhcs: remove ISO install method Starting RHCS 5, there's no ISO available anymore. This removes all ISO variables and the ceph_repository_type variable. Closes: #6626 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-06-30 18:03:03 +02:00
Guillaume Abrioux	26a7256c4c	shrink-mgr: modify existing mgr check Do not rely on the inventory aliases in order to check if the selected manager to be removed is present. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967897 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-29 14:53:19 +02:00
Guillaume Abrioux	31311b03ed	cephadm-adopt/rgw: add host target in svc_id If multi-realms were deployed with several instances belonging to the same realm and zone using the same port on different nodes, the service id expected by cephadm will be the same and therefore only one service will be deployed. We need to create a service called `<node>.<realm>.<zone>.<port>` to be sure the service name will be unique and well deployed on the expected node in order to preserve backward compatibility with the rgws instances that were deployed with ceph-ansible. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967455 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-29 14:41:09 +02:00
Dimitri Savineau	fc160b3be1	switch2container: run ceph-validate role This adds the ceph-validate role before starting the switch to a containerized deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1968177 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-06-28 18:06:53 +02:00
Guillaume Abrioux	fc784fc44c	cephadm-adopt: support rgw multisite adoption We need to support rgw multisite deployments. This commit makes the adoption playbook support this kind of deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967455 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-23 22:01:59 +02:00
Guillaume Abrioux	f9a73149a4	cephadm-adopt: fix mgr placement hosts task When no `[mgrs]` group is defined in the inventory, mgr daemons are implicitly collocated with monitors. This task currently relies on the length of the mgr group in order to tell cephadm to deploy mgr daemons. If there's no `[mgrs]` group defined in the inventory, it will ask cephadm to deploy 0 mgr daemons, which doesn't make sense and will throw an error. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970313 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-14 10:38:37 +02:00
Guillaume Abrioux	d6745e9cd9	fs2bs: use match filter in selectattr() `0990ae4109` changed the filter in selectattr() from 'match' to 'equalto' but due to an incompatibility with the Jinja2 version for python 2.7 on el7 we must stick to using 'match' filter. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-05-26 08:14:38 +02:00
Guillaume Abrioux	0990ae4109	fs2bs: fix wrong filter when setting osd_ids using 'match' filter in that task will lead to bad behavior if I have the following node names for instance: - node1 - node11 - node111 with `selectattr('name', 'match', inventory_hostname)` it will match 'node1' along with 'node11' and 'node111'. using 'equalto' filter will make sure we only match the target node. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1963066 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-05-25 16:59:30 +02:00
Guillaume Abrioux	2c77d0094c	update: do not gather facts on each play There's no benefit to gather facts again on each play in rolling_update.yml Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-05-22 08:33:44 +02:00
Guillaume Abrioux	3db1ea7ec4	update: fix ceph-crash stop task This is a workaround for an issue in ansible. When trying to stop/mask/disable this service in one task, the stop didn't actually happen, the task doesn't fail but for some reason the container is still present and running. Then the task starting the service in the role ceph-crash fails because it can't start the container since it's already running with the same name. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1955393 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-05-04 13:06:47 +02:00
Guillaume Abrioux	22c18e82f0	cephadm_adopt: fix ceph-crash migration ceph-ansible leaves a ceph-crash container in containerized deployment. It means we end up with 2 ceph-crash containers running after the migration playbook is complete. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1954614 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-28 19:53:01 +02:00
Guillaume Abrioux	1f40c12502	cephadm_adopt: fix rgw placement task Due to a recent breaking change in ceph, this command must be modified to add the <svc_id> parameter. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-27 13:37:56 +02:00
Guillaume Abrioux	bb7d37fb6a	cephadm_adopt: create a 'nfs-ganesha' pool When migrating from a cluster with no MDS nodes deployed, `{{ cephfs_data_pool.name }}` doesn't exist so we need to create a pool for storing nfs export objects. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1950403 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-27 13:37:56 +02:00
Guillaume Abrioux	ddbc11c4a9	switch-to-containers: only chown corresponding files When collocating daemons, if we chown all files under `/var/lib/ceph` it can cause issues for the collocated daemons that wouldn't have been migrated yet. This commit makes the playbook chown only the files corresponding to the daemon being migrated. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-14 21:32:20 +02:00
Guillaume Abrioux	3d4267051f	fs2bs: add a final play This removes the fact `skipped_nodes` which is useless when we run with `--limit` since it gets reset when a new iteration is made. Instead, let's print within a final play which node has been skipped reusing the `skip_this_node` fact. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-14 14:56:02 +02:00
Guillaume Abrioux	a9220654f5	cephadm_adopt: support nfs-ganesha adoption This commit adds the nfs-ganesha adoption support in the `cephadm-adopt.yml` playbook. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1944504 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-12 14:43:19 +02:00
Guillaume Abrioux	1ffc4df6b6	cephadm_adopt: modify placement policy for rgw The adoption playbook should use `radosgw_num_instances` to determine how many rgw instances it should recreate. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1943170 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-12 14:43:19 +02:00
Guillaume Abrioux	ee44d86072	cephadm_adopt: fix a typo This play does nothing other than stopping/removing rgw daemons. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-12 14:43:19 +02:00
Guillaume Abrioux	36b4227dcd	docker2podman: add documentation/header This adds short documentation in the header of the playbook to explain the goal of this playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-12 09:30:26 +02:00
Guillaume Abrioux	70f19be367	docker2podman: skip some role imports from handler when running docker-to-podman playbook, there's no need to call `ceph-config` and `ceph-rgw` from the role `ceph-handler`. It can even have side effects when coming from a baremetal cluster that was previously migrated using the switch-to-containers playbook. Indeed it might complain about missing .target systemd unit since they are removed during that migration. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1944999 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-09 15:28:50 +02:00
Guillaume Abrioux	2c74c27321	switch_to_containers: support iscsigws migration This adds the iscsigws migration to containers. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=<bz-number> Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-09 13:37:55 +02:00
Guillaume Abrioux	e9ddb972fe	update: followup on `07029e1` Playbook must fail anyway, the `rescue` block has been introduced for unmasking the unit after the playbook has failed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-03-26 21:27:02 +01:00
Guillaume Abrioux	14c472707c	fix 'command -v' tasks `command -v` is a shell builtin which needs a shell to run. Fixes: #6325 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-03-18 20:29:05 +01:00
Guillaume Abrioux	07029e1bf1	rolling_update: unmask monitor service after a failure If for some reason the playbook fails after the service was stopped, disabled and masked and before it got restarted, enabled and unmasked, the playbook leaves the service masked, which can confuse users and forces them to unmask the unit manually. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1917680 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-03-18 15:22:38 +01:00
Guillaume Abrioux	b445df0479	cephadm_adopt: fetch and write ceph minimal config This commit makes the playbook fetch the minimal current ceph configuration and write it later on monitoring nodes so `cephadm` can proceed with the adoption. When a monitoring stack was deployed on a dedicated node, it means no `ceph.conf` file was written, `cephadm` requires a `ceph.conf` in order to adopt the daemon present on the node. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1939887 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-03-17 17:39:12 +01:00
Guillaume Abrioux	af95595c82	adopt: convert legacy grafana-server groupname early This is a follow up on PR #6332 cephadm-adopt.yml playbook is affected by the same bug Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1938658 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-03-17 16:04:11 +01:00
Guillaume Abrioux	6ccc8b4722	update: convert legacy grafana-server groupname early If the legacy name `grafana-server` is still being used when upgrading from Nautilus to Pacific, the task that sets the fact `rolling_update` to `true` doesn't run on the node(s) included in that group. Indeed the play where we set this fact (`rolling_update`) only runs on the group `monitoring_group_name | default('monitoring')`. As a workaround, we can run earlier the task which converts the `grafana-server` group name to `monitoring`. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935554 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-03-15 15:25:48 +01:00
Alex Schultz	a7f2fa73e6	Use ansible_facts It has come to our attention that using ansible_* vars that are populated with INJECT_FACTS_AS_VARS=True is not very performant. In order to be able to support setting that to off, we need to update the references to use ansible_facts[<thing>] instead of ansible_<thing>. Related: ansible#73654 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406 Signed-off-by: Alex Schultz <aschultz@redhat.com>	2021-03-08 20:54:02 +01:00
Guillaume Abrioux	b9dd253a4f	purge: rm service-cid files This commit makes sure purge playbooks remove those file if for any reason they have been left. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1920900 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-12 10:01:31 +01:00
Guillaume Abrioux	980a5a7df4	switch2container: do not serialize the ceph-crash migration There's no need to slow down the playbook execution time by migrating all the `ceph-crash` instances in a serial way. Let's remove the `serial: 1` so the migration is achieved in a parallel way. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-11 21:36:23 +01:00
Dimitri Savineau	950a6ae406	cephadm-adopt: remove prometheus workaround This was fixed by [1][2] [1] https://tracker.ceph.com/issues/45120 [2] https://github.com/ceph/ceph/commit/252d4b30 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-10 13:51:41 +01:00
Dimitri Savineau	48a456dc8c	rolling_update: enforce ceph-container-engine When running the rolling_update.yml playbook and adding the dashboard component at the same time, the requirements (like container packages) aren't installed. This can lead to a failure when authentication is used on the container registry, because the playbook will try to log in to the registry while podman/docker aren't installed yet. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1903504 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-10 08:17:11 +01:00
Dimitri Savineau	94af3c87d1	rolling_update: exclude clients from node-exporter Since `b105549` we don't install node-exporter on client nodes so we should also exclude the client node from the node-exporter upgrade. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-09 14:41:13 +01:00
Guillaume Abrioux	b9cdee40a2	update: update ceph release pattern in complete upgrade play since master is now deploying quincy, we must update this. Otherwise, it will fail like following: ``` Error EPERM: require_osd_release cannot be lowered once it has been set ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-06 00:34:14 +01:00
Guillaume Abrioux	44fbadb50c	rolling_update: pg check refactor There's no need to achieve this in two tasks. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-06 00:34:14 +01:00
Dimitri Savineau	76a663245d	cephadm-adopt: use ceph_osd_flag module There's no reason to not use the ceph_osd_flag module to set/unset osd flags. Also if there's no OSD nodes in the inventory then we don't need to execute the set/unset play. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-03 08:29:31 +01:00
Dimitri Savineau	36fc04eaab	purge-cluster: use parted ansible module Instead of doing some scripting via the shell module, we can use the parted ansible module to check the boot flag on partitions. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-03 08:28:22 +01:00
Guillaume Abrioux	984191ac7f	purge: zap and destroy db and wal devices for lvm batch Those devices (db/wal) are never zapped in lvm batch deployment. Iterating over `dedicated_devices` and `bluestore_wal_devices` fixes this issue. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1922926 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-01 13:01:58 -05:00
Dimitri Savineau	2734a12d44	cephadm-adopt: use radosgw modules for idempotency When rerunning the cephadm-adopt.yml playbook the radosgw realm, zonegroup and zone tasks will fail because the task isn't idempotent. Using the radosgw ansible modules solves that problem. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-29 21:07:39 +01:00
Dimitri Savineau	6886700a00	cephadm-adopt: make the playbook idempotent If the cephadm-adopt.yml fails during the first execution and some daemons have already been adopted by cephadm then we can't rerun the playbook because the old container won't exist anymore. Error: no container with name or ID ceph-mon-xxx found: no such container If the daemons are adopted then the old systemd unit doesn't exist anymore so any call to that unit with systemd will fail. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918424 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-29 21:07:39 +01:00
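
The two fs2bs entries above (`0990ae4109` and `d6745e9cd9`) turn on the difference between the `match` and `equalto` tests in `selectattr()`. The following is a minimal sketch of that difference in plain Python, not the playbook's actual code: it assumes `match` behaves like a regular expression anchored only at the start of the string, and the `osds` list and helper names are hypothetical. The last call shows one possible way to keep `match` while avoiding prefix hits (anchoring the pattern at the end); the log does not say whether the follow-up commit does this.

```python
import re

# Hypothetical data; the node names are the ones used in the commit message.
osds = [
    {"id": 0, "name": "node1"},
    {"id": 1, "name": "node11"},
    {"id": 2, "name": "node111"},
]


def select_match(items, attr, pattern):
    """Stand-in for selectattr(attr, 'match', pattern).

    Assumption: the test is a regex anchored at the start of the value only,
    so any value that merely begins with the pattern is selected.
    """
    return [item for item in items if re.match(pattern, item[attr])]


def select_equalto(items, attr, value):
    """Stand-in for selectattr(attr, 'equalto', value): strict equality."""
    return [item for item in items if item[attr] == value]


# 'match' with a bare hostname also selects node11 and node111.
matched = [o["id"] for o in select_match(osds, "name", "node1")]

# 'equalto' selects only the target node.
exact = [o["id"] for o in select_equalto(osds, "name", "node1")]

# Anchoring the pattern at the end makes 'match' behave like an exact test.
anchored = [o["id"] for o in select_match(osds, "name", re.escape("node1") + "$")]

print(matched)   # [0, 1, 2]
print(exact)     # [0]
print(anchored)  # [0]
```

With node names that share a prefix, the unanchored pattern attributes every OSD to `node1`, which is the bad behavior the first commit describes.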


719 Commits (cd06e7c046b3e56920b1f9bdc1907429382bee5c)