ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Dimitri Savineau	35ed9977aa	switch2container: chown symlink in mon/mgr plays `fa2bb3a` only fixes the symlink owner/group issue in the OSD play. If the OSDs are collocated with other services like MONs and MGRs then the chown command will fail. $ find /var/lib/ceph/osd/ceph-0 -not -user 167 -execdir chown 167:167 {} + chown: cannot dereference './block': Permission denied Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896448 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-16 13:40:57 -05:00
Dimitri Savineau	fa2bb3af86	switch2container: disable ceph-osd enabled-runtime When deploying the ceph OSD via the packages then the ceph-osd@.service unit is configured as enabled-runtime. This means that each ceph-osd service will inherit from that state. The enabled-runtime systemd state doesn't survive after a reboot. For non containerized deployment the OSDs are still starting after a reboot because there's the ceph-volume@.service and/or ceph-osd.target units that are doing the job. $ systemctl list-unit-files|egrep '^ceph-(volume|osd)'|column -t ceph-osd@.service enabled-runtime ceph-volume@.service enabled ceph-osd.target enabled When switching to containerized deployment we are stopping/disabling ceph-osd@XX.service, ceph-volume and ceph.target and then removing the systemd unit files. But the new systemd units for containerized ceph-osd service will still inherit from ceph-osd@.service unit file. As a consequence, if an OSD host is rebooting after the playbook execution then the ceph-osd services won't come back because they aren't enabled at boot. This patch also adds a reboot and testinfra run after running the switch to container playbook. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881288 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-12 20:05:39 +01:00
Dimitri Savineau	3e49258377	rolling_update: always run cv simple scan/activate There's no need to use a condition on the ceph release for the ceph-volume simple commands. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-10 14:01:10 +01:00
Dimitri Savineau	3d3ce26327	rolling_update: fix mgr start with mon collocation `cec994b` introduced a regression when a mgr is collocated with a mon. During the mon upgrade, the mgr service is masked to avoid being restarted on package updates. Then the start mgr task fails because the service is still masked. Instead we should unmask it. Fixes: #5983 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-03 09:10:17 +01:00
Dimitri Savineau	16afe90806	infrastructure: consume ceph_fs module `bd611a7` introduced the new ceph_fs module but missed some tasks in rolling_update and shrink-mds playbooks. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-03 09:06:17 +01:00
Dimitri Savineau	acddf4fb67	rolling_update: use ceph health instead of ceph -s The ceph status command returns a lot of information stored in variables and/or facts which could consume resources for nothing. When checking the cluster health, we're using the health structure in the ceph status output. To optimize this, we could use the ceph health command which contains the same needed information. $ ceph status -f json | wc -c 2001 $ ceph health -f json | wc -c 46 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-03 09:05:33 +01:00
Dimitri Savineau	3f9081931f	rgw/rbdmirror: use service dump instead of ceph -s The ceph status command returns a lot of information stored in variables and/or facts which could consume resources for nothing. When checking the rgw/rbdmirror services status, we're only using the servicemap structure in the ceph status output. To optimize this, we could use the ceph service dump command which contains the same needed information. This command returns less information and is slightly faster than the ceph status command. $ ceph status -f json | wc -c 2001 $ ceph service dump -f json | wc -c 1105 $ time ceph status -f json > /dev/null real 0m0.557s user 0m0.516s sys 0m0.040s $ time ceph service dump -f json > /dev/null real 0m0.454s user 0m0.434s sys 0m0.020s Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-03 09:05:33 +01:00
Dimitri Savineau	88f91d8c12	monitor: use quorum_status instead of ceph status The ceph status command returns a lot of information stored in variables and/or facts which could consume resources for nothing. When checking the quorum status, we're only using the quorum_names structure in the ceph status output. To optimize this, we could use the ceph quorum_status command which contains the same needed information. This command returns less information. $ ceph status -f json | wc -c 2001 $ ceph quorum_status -f json | wc -c 957 $ time ceph status -f json > /dev/null real 0m0.577s user 0m0.538s sys 0m0.029s $ time ceph quorum_status -f json > /dev/null real 0m0.544s user 0m0.527s sys 0m0.016s Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-03 09:05:33 +01:00
Dimitri Savineau	ee50588590	osds: use pg stat command instead of ceph status The ceph status command returns a lot of information stored in variables and/or facts which could consume resources for nothing. When checking the pgs state, we're using the pgmap structure in the ceph status output. To optimize this, we could use the ceph pg stat command which contains the same needed information. This command returns less information (only about pgs) and is slightly faster than the ceph status command. $ ceph status -f json | wc -c 2000 $ ceph pg stat -f json | wc -c 240 $ time ceph status -f json > /dev/null real 0m0.529s user 0m0.503s sys 0m0.024s $ time ceph pg stat -f json > /dev/null real 0m0.426s user 0m0.409s sys 0m0.016s The data returned by the ceph status is even bigger when using the nautilus release. $ ceph status -f json | wc -c 35005 $ ceph pg stat -f json | wc -c 240 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-03 09:05:33 +01:00
Dimitri Savineau	59ecddcdd0	keyring: use ceph_key module for auth get command Instead of using ceph auth get command via the ansible command module then we can use the ceph_key module and the info state. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-02 17:17:29 +01:00
Guillaume Abrioux	1cc9666c09	common: drop `fetch_directory` feature This commit drops the `fetch_directory` feature. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-10-21 13:22:16 +02:00
Guillaume Abrioux	20718582da	infrastructure-playbooks: drop add-osd playbook This playbook isn't needed anymore, we can achieve this operation by running main playbook with `--limit` option. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-10-06 08:54:16 +02:00
Dimitri Savineau	bd611a785b	library: add ceph_fs module This adds the ceph_fs ansible module for replacing the command module usage with the ceph fs command. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-10-06 08:02:58 +02:00
Guillaume Abrioux	8b1eeef18a	fs2bs: support `osd_auto_discovery` scenario This commit adds the `osd_auto_discovery` scenario support in the filestore-to-bluestore playbook. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881523 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>	2020-09-29 09:29:01 -04:00
Guillaume Abrioux	eefe11d90c	defaults: change default grafana-server name This changes the default value of the grafana-server group name. It also adds some tasks in ceph-defaults in order to keep backward compatibility. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-09-29 07:42:26 +02:00
Dimitri Savineau	50104650e7	add missing boolean filter Otherwise this will generate an ansible warning about the missing filter. [DEPRECATION WARNING]: evaluating xxx as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-09-28 20:45:01 +02:00
Dimitri Savineau	4808523403	rolling_update: remove msgr2 migration In Pacific we are sure that users have already completed the msgr2 migration because msgr2 was introduced in Nautilus. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-09-25 19:14:42 +02:00
Guillaume Abrioux	f906caa6da	ansible.cfg: remove cfg file in infrastructure-playbooks There's no need to have a copy of this file in the infrastructure-playbooks directory. Playbooks in that directory can be run from the root dir of ceph-ansible. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-09-24 14:03:49 -04:00
Guillaume Abrioux	6938ed1302	ansible.cfg: set force_valid_group_names param As of 2.10, group names containing a dash are invalid. However, setting this option makes it still possible to use a dash in group names and prevents this warning from showing up. It might need to be definitely addressed in a future ansible release. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1880476 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-09-24 14:03:49 -04:00
Dimitri Savineau	da4280e243	switch2container: chown symlink for devices If the OSD directory is using symlinks for referencing devices (like block, db, wal for bluestore and journal for filestore) then the chown command could fail to change the owner:group on some system. $ ls -hl /var/lib/ceph/osd/ceph-0/ total 28K lrwxrwxrwx 1 ceph ceph 92 Sep 15 01:53 block -> /dev/ceph-45113532-95ca-471b-bd75-51de46f1339c/osd-data-570a1aee-60c0-44c9-8036-ffed7d67a4e6 -rw------- 1 ceph ceph 37 Sep 15 01:53 ceph_fsid -rw------- 1 ceph ceph 37 Sep 15 01:53 fsid -rw------- 1 ceph ceph 55 Sep 15 01:53 keyring -rw------- 1 ceph ceph 6 Sep 15 01:53 ready -rw------- 1 ceph ceph 3 Sep 15 02:00 require_osd_release -rw------- 1 ceph ceph 10 Sep 15 01:53 type -rw------- 1 ceph ceph 2 Sep 15 01:53 whoami $ find /var/lib/ceph/osd/ceph-0 -not -user 167 -execdir chown 167:167 {} + chown: cannot dereference './block': Permission denied $ find /var/lib/ceph/osd/ceph-0 -not -user 167 /var/lib/ceph/osd/ceph-0/block Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-09-15 20:05:49 +02:00
Dimitri Savineau	c1af69a7e7	switch2container: remove deb systemd units When running the switch2container playbook on a Debian based system, the systemd unit path isn't the same as on a Red Hat based system. Because the systemd unit files aren't removed, the new container systemd unit isn't taken into account. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-09-15 20:05:49 +02:00
Guillaume Abrioux	5e91e0f3e2	purge: remove potential socket leftover This commit ensures we remove any socket left by ceph and the `ceph-osd-run.sh` script. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1861755 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-09-14 15:34:49 -04:00
Dimitri Savineau	abb4023d76	ceph_key: set state as optional Most ansible module using a state parameter default to the present value (when available) instead of using it as a mandatory option. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-09-14 14:12:21 -04:00
Dimitri Savineau	8ecbdc6ede	container: run engine/common roles on first client We already do this in the site-container.yml playbook because we don't need docker/podman installed on all client nodes and having the container image only on the first client node. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-09-10 13:19:44 -04:00
Dimitri Savineau	f63022dfec	ceph-facts: only get fsid when monitor are present When running the rolling_update playbook with an inventory without monitor nodes defined (like external scenario) then we can't retrieve the cluster fsid from the running monitor. In this scenario we have to pass this information manually (group_vars or host_vars). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1877426 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-09-10 13:19:44 -04:00
Francesco Pantano	e65f9a5c72	Fix hosts field in rolling_update playbook when mds are processed In the OSP context, during the rolling update the playbook fails with the following error: ''' ERROR! The field 'hosts' has an invalid value, which includes an undefined variable. The error was: list object has no element 0 ''' This PR just changes the hosts field, providing a valid mons group value. Closes: https://bugzilla.redhat.com/1876803 Signed-off-by: Francesco Pantano <fpantano@redhat.com>	2020-09-08 11:52:08 -04:00
Francesco Pantano	cb64df30b6	Add --cluster option on ceph require-osd-release command On DCN environments, or when multiple ceph clusters are configured, we need to specify the cluster name before running the command or the rolling_update playbook will fail during minor updates. Closes: https://bugzilla.redhat.com/1876447 Signed-off-by: Francesco Pantano <fpantano@redhat.com>	2020-09-07 16:31:14 +02:00
Guillaume Abrioux	cec994b973	rolling_update: remove 'ignore_errors' There's no need to use `ignore_errors: true` on these tasks. Using a loop on the task stopping mon daemons allows us to avoid duplicating this task, the `ignore_errors` isn't needed here because it won't fail the playbook if one of the ID doesn't exist (shortname vs. fqdn) Using the right condition on the task starting the mgr daemon allows us to avoid using an `ignore_errors: true` as well. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-08-21 09:22:36 -04:00
Guillaume Abrioux	51c382677d	shrink-mds: use mds_to_kill_hostname instead When using fqdn in inventory host file, this task will fail because the mds is registered with its shortname. It means we must use `mds_to_kill_hostname` in this task. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1869837 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-08-18 14:56:57 -04:00
Guillaume Abrioux	f77fa6e2a4	purge-cluster: use sysfs method for unmapping rbd devices This way we keep consistency with purge-container-cluster.yml playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-08-17 09:28:12 +02:00
Guillaume Abrioux	33a544644a	purge: import ceph-defaults in purge osd play Otherwise, `ceph_volume_debug` variable is undefined Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-08-11 15:03:20 +02:00
Guillaume Abrioux	448cc280b7	common: don't enable debug log on ceph-volume calls by default ceph-volume can generate large logs at some point. debug logs by definition should be enabled only when debugging. Let's make it customizable with a variable which is set to `False` by default. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-08-11 15:03:20 +02:00
Benoît Knecht	a57fd7a090	purge-cluster: check if rbdmap exists When running `infrastructure-playbooks/purge-cluster.yml` twice, it fails the second time on the `ensure rbd devices are unmapped` task, because `rbdmap` isn't installed anymore at that point. This commit adds a check that ensures `rbdmap` is available, and skips the `ensure rbd devices are unmapped` task if it isn't. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2020-08-06 09:35:03 +02:00
Guillaume Abrioux	c2e507b42d	purge-cluster: replace shell by command in a task There is no need to use `shell` here. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-08-05 09:37:41 +02:00
Benoît Knecht	fe8fbd3ee2	shrink-osd: various fixes This handles missing /etc/ceph/osd, by ensuring we actually found files in `/etc/ceph/osd` before trying to slurp their content. This also adds a missing `| default(False)` to avoid the following error: ``` fatal: [ceph01]: FAILED! => msg: |- The conditional check 'ceph_osd_data_json[item.2]['encrypted'] | bool' failed. The error was: error while evaluating conditional (ceph_osd_data_json[item.2]['encrypted'] | bool): 'dict object' has no attribute 'encrypted' ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1862416 Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2020-08-05 01:30:57 +02:00
Kevin Coakley	d19e6033b2	Remove ceph-radosgw.target when switching to containerize daemons The task "remove old systemd unit file" under "switching from non-containerized to containerized ceph rgw" only removes the ceph-radosgw@.service file. The task should also remove the ceph-radosgw.target file, like the "remove old systemd unit files" tasks for the mons, mgrs, osds, etc, in order to clean up all of the unused systemd unit files. Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>	2020-08-04 11:08:12 -04:00
Guillaume Abrioux	8933bfde33	shrink_osd: remove osd data directory Otherwise it leaves an empty directory. When shrinking and redeploying multiple OSDs you have no guarantee it will reuse the same osd id. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-08-03 14:46:56 +02:00
Dimitri Savineau	ec0a37a74f	rolling_update: restart mds after the upgrade In addition to `155e2a2`, the active mds daemon isn't stopped/started correctly, as opposed to the other services, so that daemon doesn't come back after the upgrade. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1861688 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-29 16:45:41 -04:00
Dimitri Savineau	a6209bd957	rolling_update: refact dashboard workflow The dashboard upgrade workflow should follow the same process as the ceph upgrade, otherwise any systemd unit modification won't be applied to the monitoring/dashboard stack. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859173 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-25 09:35:17 +02:00
Dimitri Savineau	155e2a23d5	rolling_update: stop/start instead of restart During the daemon upgrade we're - stopping the service when it's not containerized - running the daemon role - starting the service when it's not containerized - restarting the service when it's containerized This implementation has multiple issues. 1/ We don't use the same service workflow when using containers or baremetal. 2/ The explicit daemon start isn't required since we're already doing this in the daemon role. 3/ Any non backward changes in the systemd unit template (for containerized deployment) won't work due to the restart usage. This patch refactors the rolling_update playbook by using the same service stop task for both containerized and baremetal deployment at the start of the upgrade play. It removes the explicit service start task because it's already included in the dedicated role. The service restart tasks for containerized deployment are also removed. Finally, this adds the missing service stop task for the ceph crash upgrade workflow. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859173 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-25 09:35:17 +02:00
Guillaume Abrioux	9d2f2108e1	ceph-crash: introduce new role ceph-crash This commit introduces a new role `ceph-crash` in order to deploy everything needed for the ceph-crash daemon. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-07-21 20:22:12 +02:00
Dimitri Savineau	5ef965c4dc	cephadm: set the command as a fact Set the cephadm cmd as a fact instead of rewriting the same command over and over. This also fixes an issue when using docker as container engine because the --docker cephadm parameter should be used before the subcommand, not after. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-20 16:32:20 -04:00
Dimitri Savineau	957903d561	cephadm: add playbook This adds a new playbook for deploying ceph via cephadm. This also adds a new dedicated tox file for CI purpose. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-16 11:40:45 -04:00
Dimitri Savineau	9596494911	cephadm-adopt: delegate task for orch apply This is a partial revert of `b38019e` because we don't want to execute the whole play on the monitor, otherwise if we have some empty group like rgws or mdss then the orchestrator commands will still be executed. Instead we should keep the real target group name at play level and delegate the orchestrator commands to the monitor. The whole play will be skipped if the group is empty. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-16 09:44:33 -04:00
Dimitri Savineau	75ae1b7e90	cephadm-adopt: inform users about cephadm Print a message at the end of the playbook to inform users that they don't have to use ceph-ansible playbooks anymore as everything else needs to be done via cephadm (day 2 operation). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-15 17:04:59 -04:00
Dimitri Savineau	7164426456	cephadm-adopt: refresh the service/daemon list When reporting the orchestrator service/daemon list at the end of the playbook, we can use the --refresh option otherwise we could have an outdated output. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-15 17:04:59 -04:00
Dimitri Savineau	ceac81cd24	Revert "cephadm-adopt: remove the cephadm script" This reverts commit `c3bbc6b13c`. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-15 17:04:59 -04:00
Dimitri Savineau	0c3a2b72ff	cephadm-adopt: wait for monitor in quorum After adopting a monitor we need to wait that monitor to join back the quorum before moving to the next node. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-13 09:16:11 -04:00
Dimitri Savineau	d3b3c8948e	cephadm-adopt: add osd flags during adoption Like rolling_update or switch2container playbooks, we need to set/unset some osd flags before and after the OSD daemons adoption. This also adds a task for waiting for clean pgs at the end of an OSD node. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-13 09:16:11 -04:00
Dimitri Savineau	9fe2694711	cephadm-adopt: add iscsi support The iSCSI support has been added recently in cephadm. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-13 09:16:11 -04:00


644 Commits (35ed9977aac9afbcad4f726a865891f0e84b4680)