ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Dimitri Savineau	06471a4b82	osds: use osd pool ls instead of osd dump command The ceph osd pool ls detail command is a subset of the ceph osd dump command. $ ceph osd dump --format json|wc -c 10117 $ ceph osd pool ls detail --format json|wc -c 4740 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-02 15:51:01 +02:00
Benoît Knecht	d7653dca95	infrastructure-playbooks: Get Ceph info in check mode In the `set osd flags` block, run the Ceph commands that gather information from the cluster (and don't make any changes to it) even when running in check mode. This allows the tasks that depend on the variables set by those tasks to succeed in check mode. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2021-07-28 14:04:54 +02:00
Dimitri Savineau	cf6e33346e	common: fix py2 pool_list from_json when skipped When using python 2 and the task with a loop is skipped then it generates an error. Unexpected templating type error occurred on ({{ (pool_list.stdout | from_json)['pools'] }}): expected string or buffer Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-21 08:17:58 +02:00
Guillaume Abrioux	13036115e2	common: disable/enable pg_autoscaler The PG autoscaler can disrupt the PG checks so the idea here is to disable it and re-enable it back after the restart is done. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-20 07:37:07 +02:00
Dimitri Savineau	a305296384	cephadm-adopt: enable osd memory autotune for HCI This enables the osd_memory_target_autotune option on HCI environment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1973149 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-12 18:17:37 +02:00
Dimitri Savineau	aeb9f562e5	cephadm-adopt: set application on ganesha pool Set the nfs application to the ganesha pool. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1956840 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-08 20:35:58 +02:00
Guillaume Abrioux	3b804a61dd	cephadm_adopt: add any_errors_fatal on play Add any_errors_fatal: true in cephadm-adopt playbook. We should stop the playbook execution when a task throws an error. Otherwise it can lead to unexpected behavior. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1976179 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-02 22:15:07 +02:00
Guillaume Abrioux	31311b03ed	cephadm-adopt/rgw: add host target in svc_id If multi-realms were deployed with several instances belonging to the same realm and zone using the same port on different nodes, the service id expected by cephadm will be the same and therefore only one service will be deployed. We need to create a service called `<node>.<realm>.<zone>.<port>` to be sure the service name will be unique and well deployed on the expected node in order to preserve backward compatibility with the rgws instances that were deployed with ceph-ansible. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967455 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-29 14:41:09 +02:00
Guillaume Abrioux	fc784fc44c	cephadm-adopt: support rgw multisite adoption We need to support rgw multisite deployments. This commit makes the adoption playbook support this kind of deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967455 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-23 22:01:59 +02:00
Guillaume Abrioux	f9a73149a4	cephadm-adopt: fix mgr placement hosts task When no `[mgrs]` group is defined in the inventory, mgr daemons are implicitly collocated with monitors. This task currently relies on the length of the mgr group in order to tell cephadm to deploy mgr daemons. If there's no `[mgrs]` group defined in the inventory, it will ask cephadm to deploy 0 mgr daemons, which doesn't make sense and will throw an error. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970313 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-14 10:38:37 +02:00
Guillaume Abrioux	22c18e82f0	cephadm_adopt: fix ceph-crash migration ceph-ansible leaves a ceph-crash container in containerized deployment. It means we end up with 2 ceph-crash containers running after the migration playbook is complete. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1954614 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-28 19:53:01 +02:00
Guillaume Abrioux	1f40c12502	cephadm_adopt: fix rgw placement task Due to a recent breaking change in ceph, this command must be modified to add the <svc_id> parameter. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-27 13:37:56 +02:00
Guillaume Abrioux	bb7d37fb6a	cephadm_adopt: create a 'nfs-ganesha' pool When migrating from a cluster with no MDS nodes deployed, `{{ cephfs_data_pool.name }}` doesn't exist so we need to create a pool for storing nfs export objects. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1950403 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-27 13:37:56 +02:00
Guillaume Abrioux	a9220654f5	cephadm_adopt: support nfs-ganesha adoption This commit adds the nfs-ganesha adoption support in the `cephadm-adopt.yml` playbook. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1944504 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-12 14:43:19 +02:00
Guillaume Abrioux	1ffc4df6b6	cephadm_adopt: modify placement policy for rgw The adoption playbook should use `radosgw_num_instances` to determine how many rgw instances it should recreate. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1943170 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-12 14:43:19 +02:00
Guillaume Abrioux	ee44d86072	cephadm_adopt: fix a typo This play does nothing else than stopping/removing rgw daemons. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-12 14:43:19 +02:00
Guillaume Abrioux	b445df0479	cephadm_adopt: fetch and write ceph minimal config This commit makes the playbook fetch the minimal current ceph configuration and write it later on monitoring nodes so `cephadm` can proceed with the adoption. When a monitoring stack was deployed on a dedicated node, it means no `ceph.conf` file was written, `cephadm` requires a `ceph.conf` in order to adopt the daemon present on the node. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1939887 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-03-17 17:39:12 +01:00
Guillaume Abrioux	af95595c82	adopt: convert legacy grafana-server groupname early This is a follow up on PR #6332 cephadm-adopt.yml playbook is affected by the same bug Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1938658 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-03-17 16:04:11 +01:00
Alex Schultz	a7f2fa73e6	Use ansible_facts It has come to our attention that using ansible_* vars that are populated with INJECT_FACTS_AS_VARS=True is not very performant. In order to be able to support setting that to off, we need to update the references to use ansible_facts[<thing>] instead of ansible_<thing>. Related: ansible#73654 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406 Signed-off-by: Alex Schultz <aschultz@redhat.com>	2021-03-08 20:54:02 +01:00
Dimitri Savineau	950a6ae406	cephadm-adopt: remove prometheus workaround This was fixed by [1][2] [1] https://tracker.ceph.com/issues/45120 [2] https://github.com/ceph/ceph/commit/252d4b30 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-10 13:51:41 +01:00
Dimitri Savineau	76a663245d	cephadm-adopt: use ceph_osd_flag module There's no reason to not use the ceph_osd_flag module to set/unset osd flags. Also if there's no OSD nodes in the inventory then we don't need to execute the set/unset play. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-03 08:29:31 +01:00
Dimitri Savineau	2734a12d44	cephadm-adopt: use radosgw modules for idempotency When rerunning the cephadm-adopt.yml playbook the radosgw realm, zonegroup and zone tasks will fail because the task isn't idempotent. Using the radosgw ansible modules solves that problem. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-29 21:07:39 +01:00
Dimitri Savineau	6886700a00	cephadm-adopt: make the playbook idempotent If the cephadm-adopt.yml fails during the first execution and some daemons have already been adopted by cephadm then we can't rerun the playbook because the old container won't exist anymore. Error: no container with name or ID ceph-mon-xxx found: no such container If the daemons are adopted then the old systemd unit doesn't exist anymore so any call to that unit with systemd will fail. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918424 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-29 21:07:39 +01:00
Dimitri Savineau	13427eddac	cephadm-adopt: add grafana group conversion The grafana group conversion task wasn't present in the cephadm-adopt.yml playbook. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1917530 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-18 20:52:58 +01:00
Dimitri Savineau	5b6f907a72	cephadm: remove loop on host add tasks Instead of iterating over the host list to add the node/label to the host orchestrator configuration, we can do it in parallel. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-12-16 15:14:28 +01:00
Dimitri Savineau	08f118077f	library: add cephadm_adopt module This adds cephadm_adopt ansible module for replacing the command module usage with the cephadm adopt command. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-12-02 09:15:44 +01:00
Dimitri Savineau	cf7345f143	consume ceph_volume module when possible We should always use the ceph_volume ansible module when possible. This patch replaces the ceph-volume inventory and lvm {list,zap} commands called via the command/shell modules with the corresponding calls to the ceph_volume module. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-12-01 17:54:10 +01:00
Dimitri Savineau	eaf0ebfc85	library: add ceph_mgr_module module This adds ceph_mgr_module ansible module for replacing the command module usage with the ceph mgr module enable/disable commands. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-30 16:52:02 +01:00
Guillaume Abrioux	195d88fcda	lint: ignore 302,303,505 errors ignore 302,303 and 505 errors [302] Using command rather than an argument to e.g. file [303] Using command rather than module [505] referenced files must exist they aren't relevant on these tasks. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-23 08:33:47 +01:00
Dimitri Savineau	88f91d8c12	monitor: use quorum_status instead of ceph status The ceph status command returns a lot of information stored in variables and/or facts which could consume resources for nothing. When checking the quorum status, we're only using the quorum_names structure in the ceph status output. To optimize this, we could use the ceph quorum_status command which contains the same needed information. This command returns less information. $ ceph status -f json | wc -c 2001 $ ceph quorum_status -f json | wc -c 957 $ time ceph status -f json > /dev/null real 0m0.577s user 0m0.538s sys 0m0.029s $ time ceph quorum_status -f json > /dev/null real 0m0.544s user 0m0.527s sys 0m0.016s Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-03 09:05:33 +01:00
Dimitri Savineau	ee50588590	osds: use pg stat command instead of ceph status The ceph status command returns a lot of information stored in variables and/or facts which could consume resources for nothing. When checking the pgs state, we're using the pgmap structure in the ceph status output. To optimize this, we could use the ceph pg stat command which contains the same needed information. This command returns less information (only about pgs) and is slightly faster than the ceph status command. $ ceph status -f json | wc -c 2000 $ ceph pg stat -f json | wc -c 240 $ time ceph status -f json > /dev/null real 0m0.529s user 0m0.503s sys 0m0.024s $ time ceph pg stat -f json > /dev/null real 0m0.426s user 0m0.409s sys 0m0.016s The data returned by the ceph status is even bigger when using the nautilus release. $ ceph status -f json | wc -c 35005 $ ceph pg stat -f json | wc -c 240 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-03 09:05:33 +01:00
Dimitri Savineau	59ecddcdd0	keyring: use ceph_key module for auth get command Instead of using the ceph auth get command via the ansible command module, we can use the ceph_key module and the info state. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-02 17:17:29 +01:00
Guillaume Abrioux	eefe11d90c	defaults: change default grafana-server name This changes the default value of the grafana-server group name. It also adds some tasks in ceph-defaults in order to keep backward compatibility. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-09-29 07:42:26 +02:00
Dimitri Savineau	5ef965c4dc	cephadm: set the command as a fact Set the cephadm cmd as a fact instead of rewriting the same command over and over. This also fixes an issue when using docker as container engine because the --docker cephadm parameter should be used before the subcommand, not after. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-20 16:32:20 -04:00
Dimitri Savineau	9596494911	cephadm-adopt: delegate task for orch apply This is a partial revert of `b38019e` because we don't want to execute the whole play on the monitor; otherwise, if we have some empty group like rgws or mdss, the orchestrator commands will still be executed. Instead we should keep the real target group name at play level and delegate the orchestrator commands to the monitor. The whole play will be skipped if the group is empty. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-16 09:44:33 -04:00
Dimitri Savineau	75ae1b7e90	cephadm-adopt: inform users about cephadm Print a message at the end of the playbook to inform users that they don't have to use ceph-ansible playbooks anymore as everything else needs to be done via cephadm (day 2 operation). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-15 17:04:59 -04:00
Dimitri Savineau	7164426456	cephadm-adopt: refresh the service/daemon list When reporting the orchestrator service/daemon list at the end of the playbook, we can use the --refresh option otherwise we could have an outdated output. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-15 17:04:59 -04:00
Dimitri Savineau	ceac81cd24	Revert "cephadm-adopt: remove the cephadm script" This reverts commit `c3bbc6b13c`. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-15 17:04:59 -04:00
Dimitri Savineau	0c3a2b72ff	cephadm-adopt: wait for monitor in quorum After adopting a monitor we need to wait for that monitor to rejoin the quorum before moving to the next node. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-13 09:16:11 -04:00
Dimitri Savineau	d3b3c8948e	cephadm-adopt: add osd flags during adoption Like rolling_update or switch2container playbooks, we need to set/unset some osd flags before and after the OSD daemons adoption. This also adds a task for waiting for clean pgs at the end of an OSD node. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-13 09:16:11 -04:00
Dimitri Savineau	9fe2694711	cephadm-adopt: add iscsi support The iSCSI support has been added recently in cephadm. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-13 09:16:11 -04:00
Dimitri Savineau	c3bbc6b13c	cephadm-adopt: remove the cephadm script At the end of the process we don't need the cephadm script. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-13 09:16:11 -04:00
Dimitri Savineau	381201a394	cephadm-adopt: show orchestrator status At the end of the playbook we can show the orchestrator status like we do with the ceph status in initial deployment. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-13 09:16:11 -04:00
Dimitri Savineau	91a6c79e41	cephadm-adopt: use placement parameter It's better to use the --placement parameter when using ceph orch apply commands to avoid confusion in the parameters. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-10 14:05:15 -04:00
Dimitri Savineau	f2d997396e	cephadm-adopt: use custom dashboard images cephadm uses default values for dashboard container images which need to be customized by ansible for upstream or downstream purposes. This feature wasn't present when cephadm-adopt.yml was designed. Also set the container_image_base variable for upgrade purposes. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-10 16:00:24 +02:00
Dimitri Savineau	b38019e3ca	cephadm-adopt: run orch apply from monitors It looks like we can't run the ceph orch apply commands on nodes other than monitors even if it used to work in the past. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-10 16:00:24 +02:00
Dimitri Savineau	27efcbc0e5	cephadm-adopt: don't fail on systemd reset-failed If the systemd service exits successfully then we don't need to reset the failed state. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-10 16:00:24 +02:00
Dimitri Savineau	fd36433826	cephadm-adopt: copy client.admin keyring The ceph config assimilate-conf command requires the client.admin keyring which isn't present on all nodes most of the time. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-10 16:00:24 +02:00
Dimitri Savineau	c95adc564b	facts: explicitly disable facter and ohai By default, ansible gathers facts from facter and ohai if installed on the remote nodes, given we don't need them, let's exclude these facts from our facts gathering Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-02 17:46:12 +02:00
Dimitri Savineau	548ff26256	Add playbook for converting cluster to cephadm The commit adds a new playbook for converting an existing ceph cluster deployed by ceph-ansible to the cephadm orchestrator. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-06-29 09:21:38 -04:00
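
The service-id collision described in commit 31311b03ed can be illustrated with a small sketch. It is not taken from the playbook: the inventory data, the helper names, and the assumed earlier id format (`<realm>.<zone>.<port>`, without the node) are hypothetical and only serve to show why adding the node name makes the id unique.

```python
from collections import Counter

# Hypothetical inventory: two RGW instances in the same realm and zone,
# listening on the same port but running on different nodes.
instances = [
    {"node": "rgw0", "realm": "prod", "zone": "east", "port": 8080},
    {"node": "rgw1", "realm": "prod", "zone": "east", "port": 8080},
]


def svc_id_without_host(inst):
    # Assumed earlier format: the node name is not part of the id.
    return f"{inst['realm']}.{inst['zone']}.{inst['port']}"


def svc_id_with_host(inst):
    # Format named in the commit message: <node>.<realm>.<zone>.<port>
    return f"{inst['node']}.{inst['realm']}.{inst['zone']}.{inst['port']}"


def duplicates(ids):
    # Return the ids that occur more than once, sorted for stable output.
    return sorted(i for i, count in Counter(ids).items() if count > 1)


old_ids = [svc_id_without_host(i) for i in instances]
new_ids = [svc_id_with_host(i) for i in instances]

print("colliding ids without host:", duplicates(old_ids))
print("colliding ids with host:", duplicates(new_ids))
```

With the node name included, each instance maps to its own service id, so every instance can be deployed as a separate service on the expected node.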


100 Commits (189ddfd92a8b0b790afb614bcc60ffb5aa0c0395)