The balancer status is registered during the cephadm-adopt, rolling_update
and switch2container playbooks. But it is also used in the ceph-handler role
which is included in those playbooks too.
Even if the ceph-handler tasks are skipped for rolling_update and
switch2container, the balancer_status variable is overwritten with the
skipped task result.
```
play1:
  register: balancer_status
play2:
  register: balancer_status   <-- skipped
play3:
  when: (balancer_status.stdout | from_json)['active'] | bool
```
This leads to issues like:
The conditional check '(balancer_status.stdout | from_json)['active'] | bool'
failed. The error was: Unexpected templating type error occurred on
({% if (balancer_status.stdout | from_json)['active'] | bool %} True
{% else %} False {% endif %}): expected string or buffer.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1982054
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
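For illustration only (not necessarily the fix applied in this commit), one
way to guard such a conditional in Ansible is to check that the registering
task wasn't skipped; task and variable names follow the description above:
```yaml
- name: disable balancer
  command: ceph balancer off
  run_once: true
  when:
    - balancer_status is not skipped
    - (balancer_status.stdout | from_json)['active'] | bool
```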
The ceph osd pool ls detail command output is a subset of the ceph osd dump
command output.
$ ceph osd dump --format json|wc -c
10117
$ ceph osd pool ls detail --format json|wc -c
4740
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
eec3878 introduced a regression for upgrade scenarios where there are no
monitor nodes at all (like ganesha standalone, external clients, etc.).
TASK [get the ceph release being deployed] ************************************
task path: infrastructure-playbooks/rolling_update.yml:121
Thursday 29 July 2021 15:55:29 +0000 (0:00:00.484) 0:00:15.802 *********
fatal: [client0]: FAILED! =>
msg: '''dict object'' has no attribute ''mons'''
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
In the `set osd flags` block, run the Ceph commands that gather information
from the cluster (and don't make any changes to it) even when running in check
mode.
This allows the tasks that depend on the variables set by those tasks to
succeed in check mode.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
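A minimal sketch of the pattern described above (the command shown is
simplified): forcing `check_mode: false` on a read-only command lets it run
and register its output even during a check-mode run.
```yaml
- name: get osd dump output even in check mode
  command: ceph osd dump --format json
  register: osd_dump
  changed_when: false
  check_mode: false
```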
Check early which Ceph release is going to be deployed and fail if it
doesn't correspond to the ceph-ansible version being used.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978643
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This adds a task that zaps by osd id so we can support the scenario
where osds were deployed with `osd_auto_discovery` set to true.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876860
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This refactor merges the two playbooks so we only have to maintain one
playbook.
(The old purge-container-cluster.yml playbook is symlinked for backward
compatibility.)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Those variables are useless given it is not possible to override them.
Let's replace them with the hardcoded name instead.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
1303611 introduced tasks for disabling the pg_autoscaler on pools and
the balancer, but those tasks are already executed on the first monitor
node, so we don't need to add the run_once statement.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using python 2, if the task with a loop is skipped then it generates
an error.
Unexpected templating type error occurred on
({{ (pool_list.stdout | from_json)['pools'] }}): expected string or buffer
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The PG autoscaler can disrupt the PG checks, so the idea here is to
disable it and re-enable it after the restart is done.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit reindents the playbook.
It also improves readability by adding an extra line between plays.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If one of the monitors is out of the quorum then nothing prevents the upgrade
playbook from running.
We only check that we have at least three monitor nodes, but we should also
check that those monitor nodes are actually present in the quorum.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1952571
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
After an upgrade, the presence of straw buckets will produce the
following warning (HEALTH_WARN):
```
crush map has legacy tunables (require firefly, min is hammer)
```
Because straw buckets are a firefly feature, they need to be converted to
straw2.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967964
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The dashboard/monitoring stack can be deployed via the dashboard_enabled
variable. But there's nothing similar if we want to remove that part only
and keep the ceph cluster up and running.
The current purge playbooks remove everything.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Add any_errors_fatal: true in cephadm-adopt playbook.
We should stop the playbook execution when a task throws an error.
Otherwise it can lead to unexpected behavior.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1976179
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This adds the monitoring group to the "final cleanup play" so any cid
files generated are properly removed when purging the cluster.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1974536
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Starting with RHCS 5, there's no ISO available anymore.
This removes all ISO variables and the ceph_repository_type variable.
Closes: #6626
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Do not rely on the inventory aliases in order to check if the selected
manager to be removed is present.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967897
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If multiple realms were deployed with several instances belonging to the same
realm and zone using the same port on different nodes, the service id
expected by cephadm will be the same and therefore only one service will
be deployed. We need to create a service called
`<node>.<realm>.<zone>.<port>` to be sure the service name is unique and
the service is deployed on the expected node, in order to preserve backward
compatibility with the rgw instances that were deployed with
ceph-ansible.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967455
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This adds the ceph-validate role before starting the switch to a containerized
deployment.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1968177
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We need to support rgw multisite deployments.
This commit makes the adoption playbook support this kind of deployment.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967455
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When no `[mgrs]` group is defined in the inventory, mgr daemons are
implicitly collocated with monitors.
This task currently relies on the length of the mgr group in order to
tell cephadm to deploy mgr daemons.
If there's no `[mgrs]` group defined in the inventory, it will ask
cephadm to deploy 0 mgr daemons, which doesn't make sense and will throw
an error.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970313
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
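An illustrative sketch (not necessarily the exact fix) of falling back to the
monitor group when no `[mgrs]` group is defined, so cephadm is never asked to
deploy zero mgr daemons; `mgr_group_name` and `mon_group_name` are the usual
ceph-ansible group name variables:
```yaml
- name: compute the number of mgr daemons to deploy
  set_fact:
    mgr_count: "{{ (groups.get(mgr_group_name, []) | length) or (groups.get(mon_group_name, []) | length) }}"

- name: apply the mgr spec
  command: "ceph orch apply mgr --placement={{ mgr_count }}"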
0990ae4109 changed the filter in
selectattr() from 'match' to 'equalto', but due to an incompatibility with
the Jinja2 version for python 2.7 on el7 we must stick to the 'match'
filter.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Using the 'match' filter in that task leads to bad behavior with node names
like the following, for instance:
- node1
- node11
- node111
With `selectattr('name', 'match', inventory_hostname)`, 'node1' will match
'node11' and 'node111' as well.
Using the 'equalto' filter makes sure we only match the target node.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1963066
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
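A small self-contained illustration of the difference between the two
filters, using the hypothetical node names above:
```yaml
- name: compare 'match' and 'equalto' in selectattr()
  vars:
    nodes:
      - { name: node1 }
      - { name: node11 }
      - { name: node111 }
  debug:
    msg:
      with_match: "{{ nodes | selectattr('name', 'match', 'node1') | map(attribute='name') | list }}"      # ['node1', 'node11', 'node111']
      with_equalto: "{{ nodes | selectattr('name', 'equalto', 'node1') | map(attribute='name') | list }}"  # ['node1']
```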
This is a workaround for an issue in ansible.
When trying to stop/mask/disable this service in one task, the stop
doesn't actually happen; the task doesn't fail but for some reason the
container is still present and running.
Then the task starting the service in the ceph-crash role fails because
it can't start the container since it's already running with the same
name.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1955393
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
ceph-ansible leaves a ceph-crash container in containerized deployment.
It means we end up with 2 ceph-crash containers running after the
migration playbook is complete.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1954614
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Due to a recent breaking change in ceph, this command must be modified
to add the <svc_id> parameter.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When migrating from a cluster with no MDS nodes deployed,
`{{ cephfs_data_pool.name }}` doesn't exist so we need to create a pool
for storing nfs export objects.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1950403
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When collocating daemons, if we chown all files under `/var/lib/ceph` it
can cause issues for the collocated daemons that haven't been migrated
yet.
This commit makes the playbook chown only the files corresponding to the
daemon being migrated.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This removes the fact `skipped_nodes`, which is useless when we run with
`--limit` since it gets reset when a new iteration is made.
Instead, let's print within a final play which nodes have been skipped,
reusing the `skip_this_node` fact.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The adoption playbook should use `radosgw_num_instances` in order to
determine how many rgw instances it should recreate.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1943170
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This adds a short documentation block in the header of the playbook in order
to explain its goal.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When running the docker-to-podman playbook, there's no need to call
`ceph-config` and `ceph-rgw` from the `ceph-handler` role.
It can even have side effects when coming from a baremetal cluster that
was previously migrated using the switch-to-containers playbook. Indeed,
it might complain about missing .target systemd units since they are
removed during that migration.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1944999
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The playbook must fail anyway; the `rescue` block has been introduced to
unmask the unit after the playbook has failed.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If for some reason the playbook fails after the service was
stopped, disabled and masked, and before it got restarted, enabled and
unmasked, the playbook leaves the service masked, which can confuse users
and forces them to unmask the unit manually.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1917680
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit makes the playbook fetch the minimal current ceph
configuration and write it later on monitoring nodes so `cephadm` can
proceed with the adoption.
When a monitoring stack was deployed on a dedicated node, no `ceph.conf`
file was written there; `cephadm` requires a `ceph.conf` in order
to adopt the daemons present on the node.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1939887
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If the legacy name `grafana-server` is still being used when upgrading
from Nautilus to Pacific, the task that sets the fact `rolling_update`
to `true` doesn't run on the node(s) included in that group. Indeed the
play where we set this fact (`rolling_update`) only runs on the group
`monitoring_group_name | default('monitoring')`.
As a workaround, we can run the task which converts the `grafana-server`
group name to `monitoring` earlier.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935554
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
It has come to our attention that using ansible_* vars that are
populated with INJECT_FACTS_AS_VARS=True is not very performant. In
order to be able to support setting that to off, we need to update the
references to use ansible_facts[<thing>] instead of ansible_<thing>.
Related: ansible#73654
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406
Signed-off-by: Alex Schultz <aschultz@redhat.com>
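For example (a generic illustration, not a specific task from the
repository):
```yaml
# before: relies on INJECT_FACTS_AS_VARS=True
- name: show the distribution (injected facts)
  debug:
    msg: "{{ ansible_distribution }} {{ ansible_hostname }}"

# after: works even with fact injection disabled
- name: show the distribution (ansible_facts)
  debug:
    msg: "{{ ansible_facts['distribution'] }} {{ ansible_facts['hostname'] }}"
```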
This commit makes sure purge playbooks remove those files if for any reason
they have been left behind.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1920900
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
There's no need to slow down the playbook execution time by migrating
all the `ceph-crash` instances in a serial way. Let's remove the
`serial: 1` so the migration is achieved in a parallel way.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When running the rolling_update.yml playbook and adding the dashboard
component at the same time, the requirements (like container packages)
aren't installed.
This could lead to a failure when using authentication on the container
registry, because the playbook will try to log in to the registry but
podman/docker isn't installed yet.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1903504
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since b105549 we don't install node-exporter on client nodes, so we should
also exclude the client nodes from the node-exporter upgrade.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since master is now deploying quincy, we must update this.
Otherwise it will fail like the following:
```
Error EPERM: require_osd_release cannot be lowered once it has been set
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
There's no reason to not use the ceph_osd_flag module to set/unset osd
flags.
Also, if there are no OSD nodes in the inventory then we don't need to
execute the set/unset play.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
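A sketch of the module-based approach, assuming the `ceph_osd_flag` module
accepts `name` and `state` parameters (`present` to set the flag, `absent`
to unset it):
```yaml
- name: set the noout flag
  ceph_osd_flag:
    name: noout
    state: present
  delegate_to: "{{ groups[mon_group_name][0] }}"

- name: unset the noout flag
  ceph_osd_flag:
    name: noout
    state: absent
  delegate_to: "{{ groups[mon_group_name][0] }}"
```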
Instead of doing some scripting via the shell module, we can use the
parted ansible module to check the boot flag on partitions.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
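An illustrative sketch (the device name is hypothetical) of reading
partition flags with the parted module instead of parsing shell output:
```yaml
- name: read the partition table of /dev/sdb
  parted:
    device: /dev/sdb
    unit: MiB
  register: sdb_info

- name: report whether the first partition carries the boot flag
  debug:
    msg: "boot flag found on /dev/sdb1"
  when: "'boot' in (sdb_info.partitions | first).flags"
```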
Those devices (db/wal) are never zapped in lvm batch deployment.
Iterating over `dedicated_devices` and `bluestore_wal_devices` fixes
this issue.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1922926
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When rerunning the cephadm-adopt.yml playbook, the radosgw realm,
zonegroup and zone tasks will fail because they aren't idempotent.
Using the radosgw ansible modules solves that problem.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
If the cephadm-adopt.yml fails during the first execution and some
daemons have already been adopted by cephadm then we can't rerun
the playbook because the old container won't exist anymore.
Error: no container with name or ID ceph-mon-xxx found: no such container
If the daemons are adopted then the old systemd unit doesn't exist anymore
so any call to that unit with systemd will fail.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918424
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since the default of `osd_objectstore` has changed as of 3.2, some
deployments might have a mix of filestore and bluestore OSDs on the same
node. In some specific cases, there's a possibility that a filestore OSD
shares a journal/db device with a bluestore OSD. We shouldn't try to
redeploy in this context because ceph-volume will complain (either
because in lvm batch you can't pass a partition, or because of a gpt header).
The safest option is to skip the migration on the node when such a mix
is detected, or to force the migration of all OSDs, including those already
using bluestore (the option `force_filestore_to_bluestore=True` has to be
passed as an extra var).
If all OSDs are using filestore, then they will be migrated to
bluestore.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1875777
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The current check makes no sense because it checks whether any monitor
other than the one being played (either a previous one already converted or
a next one not yet converted) is present in the quorum.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1909011
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Instead of iterating over the host list to add the node/label to the
host orchestrator configuration, we can do it in parallel.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This adds cephadm_adopt ansible module for replacing the command module
usage with the cephadm adopt command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Let's discard the ansible-lint error 306 and add a "# noqa 306" on tasks
where we don't need `set -o pipefail`.
Fixes: #6090
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
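For example, a task that pipes output but doesn't need `set -o pipefail`
can carry the noqa comment (exact comment placement may vary between
ansible-lint versions; the task shown is illustrative):
```yaml
- name: list ceph osd ids  # noqa 306
  shell: ls /var/lib/ceph/osd/ | sed 's/.*-//'
  register: osd_ids
  changed_when: false
```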
We should always use the ceph_volume ansible module when possible.
This patch replaces the ceph-volume inventory and lvm {list,zap} commands
called via the command/shell modules with the corresponding calls to the
ceph_volume module.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This adds cephadm_bootstrap ansible module for replacing the command module
usage with the cephadm bootstrap command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This adds ceph_osd_flag ansible module for replacing the command module
usage with the ceph osd set/unset commands.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This adds ceph_osd ansible module for replacing the command module
usage with the ceph osd destroy/down/in/out/purge/rm commands.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This adds ceph_mgr_module ansible module for replacing the command module
usage with the ceph mgr module enable/disable commands.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
`ceph.target` should be disabled only. Otherwise, in a collocation
scenario, you stop other collocated services in the OSD play, which isn't
what we want to do. Each daemon has its corresponding play for managing
the transition to containers.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1901865
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
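A minimal sketch of the disable-only approach with the systemd module,
leaving the target running so collocated daemons aren't stopped:
```yaml
- name: disable ceph.target without stopping it
  systemd:
    name: ceph.target
    enabled: false
```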
This adds ceph_volume_simple_{activate,scan} ansible modules for replacing
the command module usage with the ceph-volume simple activate/scan commands.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Adding a monitor is no longer possible because we generate a new mon
keyring each time the playbook is run.
Fixes: #5864
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Ignore ansible-lint errors 302, 303 and 505:
[302] Using command rather than an argument to e.g. file
[303] Using command rather than module
[505] referenced files must exist
They aren't relevant on these tasks.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
fa2bb3a only fixes the symlink owner/group issue in the OSD play. If the
OSDs are collocated with other services like MONs and MGRs then the
chown command will fail.
$ find /var/lib/ceph/osd/ceph-0 -not -user 167 -execdir chown 167:167 {} +
chown: cannot dereference './block': Permission denied
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896448
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When deploying ceph OSDs via packages, the ceph-osd@.service unit is
configured as enabled-runtime.
This means that each ceph-osd service will inherit that state.
The enabled-runtime systemd state doesn't survive a reboot.
For non-containerized deployments the OSDs still start after a reboot
because the ceph-volume@.service and/or ceph-osd.target units are doing
the job.
$ systemctl list-unit-files|egrep '^ceph-(volume|osd)'|column -t
ceph-osd@.service enabled-runtime
ceph-volume@.service enabled
ceph-osd.target enabled
When switching to containerized deployment we are stopping/disabling
ceph-osd@XX.service, ceph-volume and ceph.target and then removing the
systemd unit files.
But the new systemd units for the containerized ceph-osd service will still
inherit from the ceph-osd@.service unit file.
As a consequence, if an OSD host reboots after the playbook execution,
the ceph-osd services won't come back because they aren't enabled at
boot.
This patch also adds a reboot and testinfra run after running the switch
to container playbook.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881288
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
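An illustrative sketch (the `osd_ids` list is hypothetical) of making sure
the containerized OSD units are persistently enabled so they come back
after a reboot:
```yaml
- name: enable ceph-osd container units at boot
  systemd:
    name: "ceph-osd@{{ item }}"
    enabled: true
  loop: "{{ osd_ids }}"
```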
cec994b introduced a regression when a mgr is collocated with a mon.
During the mon upgrade, the mgr service is masked to avoid it being
restarted on package updates.
Then the start mgr task fails because the service is still masked.
Instead we should unmask it.
Fixes: #5983
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
bd611a7 introduced the new ceph_fs module but missed some tasks in
rolling_update and shrink-mds playbooks.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the cluster health, we're using the health structure in the
ceph status output.
To optimize this, we could use the ceph health command which contains
the same needed information.
$ ceph status -f json | wc -c
2001
$ ceph health -f json | wc -c
46
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
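A sketch of a health check based on `ceph health` rather than `ceph status`
(the failure condition shown is only an example):
```yaml
- name: get cluster health
  command: ceph health --format json
  register: ceph_health
  changed_when: false
  check_mode: false

- name: fail when the cluster is unhealthy
  fail:
    msg: "cluster is in HEALTH_ERR"
  when: (ceph_health.stdout | from_json)['status'] == 'HEALTH_ERR'
```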
The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the rgw/rbdmirror services status, we're only using the
servicemap structure in the ceph status output.
To optimize this, we could use the ceph service dump command which contains
the same needed information.
This command returns less information and is slightly faster than the ceph
status command.
$ ceph status -f json | wc -c
2001
$ ceph service dump -f json | wc -c
1105
$ time ceph status -f json > /dev/null
real 0m0.557s
user 0m0.516s
sys 0m0.040s
$ time ceph service dump -f json > /dev/null
real 0m0.454s
user 0m0.434s
sys 0m0.020s
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the quorum status, we're only using the quorum_names
structure in the ceph status output.
To optimize this, we could use the ceph quorum_status command which contains
the same needed information.
This command returns less information.
$ ceph status -f json | wc -c
2001
$ ceph quorum_status -f json | wc -c
957
$ time ceph status -f json > /dev/null
real 0m0.577s
user 0m0.538s
sys 0m0.029s
$ time ceph quorum_status -f json > /dev/null
real 0m0.544s
user 0m0.527s
sys 0m0.016s
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the pgs state, we're using the pgmap structure in the ceph
status output.
To optimize this, we could use the ceph pg stat command which contains
the same needed information.
This command returns less information (only about pgs) and is slightly
faster than the ceph status command.
$ ceph status -f json | wc -c
2000
$ ceph pg stat -f json | wc -c
240
$ time ceph status -f json > /dev/null
real 0m0.529s
user 0m0.503s
sys 0m0.024s
$ time ceph pg stat -f json > /dev/null
real 0m0.426s
user 0m0.409s
sys 0m0.016s
The data returned by the ceph status is even bigger when using the
nautilus release.
$ ceph status -f json | wc -c
35005
$ ceph pg stat -f json | wc -c
240
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Instead of using the ceph auth get command via the ansible command module,
we can use the ceph_key module with the info state.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This playbook isn't needed anymore, we can achieve this operation by
running main playbook with `--limit` option.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This adds the ceph_fs ansible module for replacing the command module
usage with the ceph fs command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit adds the `osd_auto_discovery` scenario support in the
filestore-to-bluestore playbook.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881523
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
This changes the default value of the grafana-server group name.
Some tasks are added in ceph-defaults in order to keep backward
compatibility.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Otherwise this will generate an ansible warning about the missing
filter.
[DEPRECATION WARNING]: evaluating xxx as a bare variable, this behaviour
will go away and you might need to add |bool to the expression in the
future.
Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will
be removed in version 2.12.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
In Pacific we are sure that users have already switched to msgr2 because
it was introduced in Nautilus.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
There's no need to have a copy of this file in the infrastructure-playbooks
directory.
Playbooks in that directory can be run from the root dir of
ceph-ansible.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
As of 2.10, group names containing a dash are invalid.
However, setting this option still makes it possible to use a dash in
group names and prevents this warning from showing up.
It might need to be definitively addressed in a future ansible release.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1880476
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When running the switch2container playbook on a Debian based system,
the systemd unit path isn't the same as on a Red Hat based system.
Because the systemd unit files aren't removed, the new container
systemd unit isn't taken into account.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Most ansible modules using a state parameter default to the present
value (when available) instead of requiring it as a mandatory option.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We already do this in the site-container.yml playbook because we don't
need docker/podman installed on all the client nodes; having the
container image only on the first client node is enough.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When running the rolling_update playbook with an inventory without
monitor nodes defined (like the external scenario), we can't retrieve
the cluster fsid from the running monitor.
In this scenario we have to pass this information manually (group_vars
or host_vars).
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1877426
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
In the OSP context, during the rolling update the playbook fails
with the following error:
'''
ERROR! The field 'hosts' has an invalid value, which includes an
undefined variable. The error was: list object has no element 0
'''
This PR just changes the hosts field, providing a valid mons group
value.
Closes: https://bugzilla.redhat.com/1876803
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
On DCN environments, or when multiple ceph clusters are configured,
we need to specify the cluster name before running the command, or
the rolling_update playbook will fail during minor updates.
Closes: https://bugzilla.redhat.com/1876447
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
There's no need to use `ignore_errors: true` on these tasks.
Using a loop on the task stopping mon daemons allows us to avoid
duplicating that task; `ignore_errors` isn't needed here because the
playbook won't fail if one of the IDs doesn't exist (shortname vs. fqdn).
Using the right condition on the task starting the mgr daemon allows us
to avoid using `ignore_errors: true` as well.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When using fqdn in the inventory host file, this task will fail because the
mds is registered with its shortname.
It means we must use `mds_to_kill_hostname` in this task.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1869837
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
ceph-volume can generate large logs at some point.
Debug logs, by definition, should be enabled only when debugging.
Let's make it customizable with a variable which is set to `False` by
default.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When running `infrastructure-playbooks/purge-cluster.yml` twice, it fails the
second time on the `ensure rbd devices are unmapped` task, because `rbdmap`
isn't installed anymore at that point.
This commit adds a check that ensures `rbdmap` is available, and skips the
`ensure rbd devices are unmapped` task if it isn't.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
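A sketch of the guard described above; the exact detection command is an
assumption:
```yaml
- name: check whether rbdmap is available
  shell: command -v rbdmap
  register: rbdmap_check
  failed_when: false
  changed_when: false

- name: ensure rbd devices are unmapped
  command: rbdmap unmap-all
  when: rbdmap_check.rc == 0
```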
This handles a missing /etc/ceph/osd by ensuring we actually found files in
`/etc/ceph/osd` before trying to slurp their content.
This also adds a missing `| default(False)` to avoid the following error:
```
fatal: [ceph01]: FAILED! =>
msg: |-
The conditional check 'ceph_osd_data_json[item.2]['encrypted'] | bool' failed. The error was: error while evaluating conditional (ceph_osd_data_json[item.2]['encrypted'] | bool): 'dict object' has no attribute 'encrypted'
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1862416
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
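A minimal self-contained illustration of why the `| default(False)` matters:
the dict below has no 'encrypted' key, yet the condition now evaluates
cleanly to false instead of raising an error.
```yaml
- name: show the guarded conditional
  vars:
    osd_metadata:
      data:
        uuid: abc123
  debug:
    msg: "this only prints for encrypted OSDs"
  when: osd_metadata['data']['encrypted'] | default(False) | bool
```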
The task "remove old systemd unit file" under "switching from
non-containerized to containerized ceph rgw" only removes
the ceph-radosgw@.service file. The task should also remove
the ceph-radosgw.target file, like the "remove old systemd unit
files" tasks for the mons, mgrs, osds, etc, in order to clean up
all of the unused systemd unit files.
Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
Otherwise it leaves an empty directory.
When shrinking and redeploying multiple OSDs you have no guarantee it
will reuse the same osd id.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
In addition to 155e2a2, the active mds daemon isn't stopped/started
correctly, as opposed to the other services, so that daemon doesn't come
back after the upgrade.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1861688
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The dashboard upgrade workflow should follow the same process as the ceph
upgrade, otherwise any systemd unit modification won't be applied on the
monitoring/dashboard stack.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859173
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
During the daemon upgrade we're
- stopping the service when it's not containerized
- running the daemon role
- starting the service when it's not containerized
- restarting the service when it's containerized
This implementation has multiple issues.
1/ We don't use the same service workflow when using containers
or baremetal.
2/ The explicit daemon start isn't required since we're already
doing this in the daemon role.
3/ Any non backward compatible change in the systemd unit template (for
containerized deployments) won't work due to the restart usage.
This patch refacts the rolling_update playbook by using the same service
stop task for both containerized and baremetal deployment at the start
of the upgrade play.
It removes the explicit service start task because it's already included
in the dedicated role.
The service restart tasks for containerized deployment are also
removed.
Finally, this adds the missing service stop task for ceph crash upgrade
workflow.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859173
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit introduces a new role `ceph-crash` in order to deploy
everything needed for the ceph-crash daemon.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Set the cephadm cmd as a fact instead of rewriting the same command
over and over.
This also fixes an issue when using docker as the container engine,
because the --docker cephadm parameter should be used before the
subcommand, not after.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This adds a new playbook for deploying ceph via cephadm.
This also adds a new dedicated tox file for CI purpose.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This is a partial revert of b38019e because we don't want to execute
the whole play on the monitor; otherwise, if we have some empty group
like rgws or mdss, the orchestrator commands will still be
executed.
Instead we should keep the real target group name at play level and
delegate the orchestrator commands to the monitor. The whole play
will be skipped if the group is empty.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Print a message at the end of the playbook to inform users that they
don't have to use ceph-ansible playbooks anymore, as everything else
needs to be done via cephadm (day 2 operations).
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When reporting the orchestrator service/daemon list at the end of the
playbook, we can use the --refresh option, otherwise we could get
an outdated output.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
After adopting a monitor we need to wait for that monitor to rejoin
the quorum before moving to the next node.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Like the rolling_update or switch2container playbooks, we need to set/unset
some osd flags before and after the OSD daemons adoption.
This also adds a task waiting for clean pgs at the end of each OSD node's
adoption.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
At the end of the playbook we can show the orchestrator status like
we do with the ceph status in initial deployment.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
It's better to use the --placement parameter when using ceph orch apply
commands to avoid confusion in the parameters.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
cephadm uses default values for the dashboard container images, which need
to be customized by ansible for upstream or downstream purposes.
This feature wasn't present when cephadm-adopt.yml was designed.
Also set the container_image_base variable for upgrade purposes.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
It looks like we can't run the ceph orch apply commands on nodes other
than monitors even if it used to work in the past.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The ceph config assimilate-conf command requires the client.admin
keyring which isn't present on all nodes most of the time.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
By default, ansible gathers facts from facter and ohai if they are installed
on the remote nodes. Given we don't need them, let's exclude these facts
from our fact gathering.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When rgw and osd are collocated, the current workflow prevents scaling out
the radosgw_num_instances parameter when rerunning the playbook.
The environment file used in the rgw systemd template is rendered when
executing the `ceph-rgw` role, but during a new run of the playbook (in
order to scale out rgw instances), handlers are triggered from the
`ceph-osd` role, which runs before `ceph-rgw`. It therefore tries to start
the new rgw daemon while its corresponding environment file hasn't been
rendered yet, and fails like the following:
```
ceph-radosgw@rgw.ceph4osd3.rgw1.service failed to run 'start-pre' task: No such file or directory
```
This commit moves the tasks generating this file to the `ceph-config` role
so it is generated early enough.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1851906
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If a failure occurs in ceph-validate, the upgrade playbook keeps running
whereas we expect it to fail.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit adds a new playbook for converting an existing ceph cluster
deployed by ceph-ansible to the cephadm orchestrator.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit skips the image pulling if podman isn't installed
on the machine.
In the OSP context, the podman installation is done later in the workflow,
which means all `podman pull` commands would fail.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1849559
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since we only have one scenario since nautilus, we can just move
the container start command from ceph-osd-run.sh to the systemd unit
service.
As a result, the ceph-osd-run.sh.j2 template and the
ceph_osd_docker_run_script_path variable are removed.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>