Commit Graph

686 Commits (ab857d8b54f9a70fe6e886984410181b8e964c02)

Dimitri Savineau ee50588590 osds: use pg stat command instead of ceph status
The ceph status command returns a lot of information that gets stored in
variables and/or facts, which can consume resources needlessly.
When checking the pgs state, we're using the pgmap structure in the ceph
status output.
To optimize this, we can use the ceph pg stat command, which contains
the same information we need.
This command returns less information (only about pgs) and is slightly
faster than the ceph status command.

$ ceph status -f json | wc -c
2000
$ ceph pg stat -f json | wc -c
240
$ time ceph status -f json > /dev/null

real	0m0.529s
user	0m0.503s
sys	0m0.024s
$ time ceph pg stat -f json > /dev/null

real	0m0.426s
user	0m0.409s
sys	0m0.016s

The data returned by ceph status is even bigger with the Nautilus
release.

$ ceph status -f json | wc -c
35005
$ ceph pg stat -f json | wc -c
240
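
A pg check built on this command could look like the following task (a
sketch only; the retry variables and the exact `pg_summary` keys of the
json output are assumptions, not the verbatim change):

```
- name: waiting for clean pgs...
  command: "ceph --cluster {{ cluster }} pg stat --format json"
  register: ceph_pg_stat
  delegate_to: "{{ groups[mon_group_name][0] }}"
  retries: "{{ health_osd_check_retries }}"
  delay: "{{ health_osd_check_delay }}"
  changed_when: false
  until: >
    ((ceph_pg_stat.stdout | from_json).pg_summary.num_pg_by_state
     | selectattr('name', 'search', '^active\+clean')
     | map(attribute='num') | list | sum)
    == (ceph_pg_stat.stdout | from_json).pg_summary.num_pgs
```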

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-11-03 09:05:33 +01:00
Dimitri Savineau 59ecddcdd0 keyring: use ceph_key module for auth get command
Instead of running the ceph auth get command via the ansible command
module, we can use the ceph_key module with the info state.
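
A minimal example of the replacement (the task name and registered
variable are illustrative):

```
- name: fetch the client.admin keyring
  ceph_key:
    name: client.admin
    cluster: "{{ cluster }}"
    state: info
  register: admin_keyring
```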

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-11-02 17:17:29 +01:00
Guillaume Abrioux 1cc9666c09 common: drop `fetch_directory` feature
This commit drops the `fetch_directory` feature.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-10-21 13:22:16 +02:00
Guillaume Abrioux 20718582da infrastructure-playbooks: drop add-osd playbook
This playbook isn't needed anymore; we can achieve the same operation by
running the main playbook with the `--limit` option
(e.g. `ansible-playbook site.yml --limit <osd-node>`).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-10-06 08:54:16 +02:00
Dimitri Savineau bd611a785b library: add ceph_fs module
This adds the ceph_fs ansible module, replacing usage of the command
module for the ceph fs command.
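
Hypothetical usage of the new module in place of `command: ceph fs new ...`
(parameter names and values are illustrative, not the module's exact
interface):

```
- name: create the cephfs filesystem
  ceph_fs:
    name: "{{ cephfs }}"
    cluster: "{{ cluster }}"
    data: "{{ cephfs_data_pool.name }}"
    metadata: "{{ cephfs_metadata_pool.name }}"
```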

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-10-06 08:02:58 +02:00
Guillaume Abrioux 8b1eeef18a fs2bs: support `osd_auto_discovery` scenario
This commit adds the `osd_auto_discovery` scenario support in the
filestore-to-bluestore playbook.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881523

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-29 09:29:01 -04:00
Guillaume Abrioux eefe11d90c defaults: change default grafana-server name
This changes the default value of the grafana-server group name.
Some tasks are added in ceph-defaults in order to keep backward
compatibility.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-29 07:42:26 +02:00
Dimitri Savineau 50104650e7 add missing boolean filter
Otherwise this will generate an ansible warning about the missing
filter.

[DEPRECATION WARNING]: evaluating xxx as a bare variable, this behaviour
will go away and you might need to add |bool to the expression in the
future.
Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will
be removed in version 2.12.
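
The fix boils down to making the boolean explicit on bare conditionals,
e.g. (illustrative task and variable):

```
- name: example conditional task
  debug:
    msg: "auto discovery is enabled"
  when: osd_auto_discovery | bool
```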

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-28 20:45:01 +02:00
Dimitri Savineau 4808523403 rolling_update: remove msgr2 migration
By Pacific we can be sure that users have already completed the msgr2
migration, since it was introduced in Nautilus.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-25 19:14:42 +02:00
Guillaume Abrioux f906caa6da ansible.cfg: remove cfg file in infrastructure-playbooks
There's no need to have a copy of this file in the
infrastructure-playbooks directory.
Playbooks in that directory can be run from the root directory of
ceph-ansible.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-24 14:03:49 -04:00
Guillaume Abrioux 6938ed1302 ansible.cfg: set force_valid_group_names param
As of 2.10, group names containing a dash are invalid.
However, setting the `force_valid_group_names` option in `ansible.cfg`
makes it still possible to use a dash in group names and prevents the
warning from showing up.
It might need to be addressed definitively in a future ansible release.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1880476

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-24 14:03:49 -04:00
Dimitri Savineau da4280e243 switch2container: chown symlink for devices
If the OSD directory is using symlinks for referencing devices (like
block, db, wal for bluestore and journal for filestore) then the chown
command could fail to change the owner:group on some systems.

$ ls -hl /var/lib/ceph/osd/ceph-0/
total 28K
lrwxrwxrwx 1 ceph ceph 92 Sep 15 01:53 block -> /dev/ceph-45113532-95ca-471b-bd75-51de46f1339c/osd-data-570a1aee-60c0-44c9-8036-ffed7d67a4e6
-rw------- 1 ceph ceph 37 Sep 15 01:53 ceph_fsid
-rw------- 1 ceph ceph 37 Sep 15 01:53 fsid
-rw------- 1 ceph ceph 55 Sep 15 01:53 keyring
-rw------- 1 ceph ceph  6 Sep 15 01:53 ready
-rw------- 1 ceph ceph  3 Sep 15 02:00 require_osd_release
-rw------- 1 ceph ceph 10 Sep 15 01:53 type
-rw------- 1 ceph ceph  2 Sep 15 01:53 whoami
$ find /var/lib/ceph/osd/ceph-0 -not -user 167 -execdir chown 167:167 {} +
chown: cannot dereference './block': Permission denied
$ find /var/lib/ceph/osd/ceph-0 -not -user 167
/var/lib/ceph/osd/ceph-0/block
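
Passing `-h` to chown makes it change the symlink itself instead of
dereferencing it; a sketch of the fixed command wrapped in a task:

```
- name: set proper ownership on osd directories
  command: find /var/lib/ceph/osd -not -user 167 -execdir chown -h 167:167 {} +
  changed_when: false
```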

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-15 20:05:49 +02:00
Dimitri Savineau c1af69a7e7 switch2container: remove deb systemd units
When running the switch2container playbook on a Debian-based system,
the systemd unit path isn't the same as on a Red Hat-based system.
Because the old systemd unit files aren't removed, the new container
systemd unit isn't taken into account.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-15 20:05:49 +02:00
Guillaume Abrioux 5e91e0f3e2 purge: remove potential socket leftover
This commit ensures we remove any socket left by ceph and by the
`ceph-osd-run.sh` script.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1861755

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-14 15:34:49 -04:00
Dimitri Savineau abb4023d76 ceph_key: set state as optional
Most ansible modules using a state parameter default to the present
value (when available) instead of making it a mandatory option.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-14 14:12:21 -04:00
Dimitri Savineau 8ecbdc6ede container: run engine/common roles on first client
We already do this in the site-container.yml playbook because we don't
need docker/podman installed on all the client nodes, only the
container image on the first client node.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-10 13:19:44 -04:00
Dimitri Savineau f63022dfec ceph-facts: only get fsid when monitor are present
When running the rolling_update playbook with an inventory without
monitor nodes defined (like the external scenario), we can't retrieve
the cluster fsid from a running monitor.
In this scenario we have to pass this information manually (group_vars
or host_vars).
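
An illustrative group_vars entry for that external scenario (the fsid
value is a placeholder):

```
# group_vars/all.yml
fsid: 00000000-0000-0000-0000-000000000000
generate_fsid: false
```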

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1877426

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-10 13:19:44 -04:00
Francesco Pantano e65f9a5c72 Fix hosts field in rolling_update playbook when mds are processed
In the OSP context, during the rolling update the playbook fails
with the following error:

'''
ERROR! The field 'hosts' has an invalid value, which includes an
undefined variable. The error was: list object has no element 0
'''

This PR just changes the hosts field, providing a valid mons group
value.

Closes: https://bugzilla.redhat.com/1876803
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
2020-09-08 11:52:08 -04:00
Francesco Pantano cb64df30b6 Add --cluster option on ceph require-osd-release command
In DCN environments, or when multiple ceph clusters are configured,
we need to specify the cluster name when running the command, or
the rolling_update playbook will fail during minor updates.
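
A sketch of the resulting call (the release value depends on the target
version):

```
- name: complete the upgrade
  command: "ceph --cluster {{ cluster }} osd require-osd-release nautilus"
  delegate_to: "{{ groups[mon_group_name][0] }}"
```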

Closes: https://bugzilla.redhat.com/1876447
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
2020-09-07 16:31:14 +02:00
Guillaume Abrioux cec994b973 rolling_update: remove 'ignore_errors'
There's no need to use `ignore_errors: true` on these tasks.

Using a loop on the task stopping mon daemons allows us to avoid
duplicating this task; the `ignore_errors` isn't needed here because the
task won't fail the playbook if one of the IDs doesn't exist (shortname
vs. fqdn).

Using the right condition on the task starting the mgr daemon allows us
to avoid using an `ignore_errors: true` as well.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-21 09:22:36 -04:00
Guillaume Abrioux 51c382677d shrink-mds: use mds_to_kill_hostname instead
When using fqdn in the inventory host file, this task will fail because
the mds is registered with its shortname.

It means we must use `mds_to_kill_hostname` in this task.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1869837

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-18 14:56:57 -04:00
Guillaume Abrioux f77fa6e2a4 purge-cluster: use sysfs method for unmapping rbd devices
This way we keep consistency with purge-container-cluster.yml playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-17 09:28:12 +02:00
Guillaume Abrioux 33a544644a purge: import ceph-defaults in purge osd play
Otherwise, the `ceph_volume_debug` variable is undefined.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-11 15:03:20 +02:00
Guillaume Abrioux 448cc280b7 common: don't enable debug log on ceph-volume calls by default
ceph-volume can generate large logs at some point.

debug logs by definition should be enabled only when debugging.

Let's make it customizable with a variable which is set to `False` by
default.
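
A sketch of the default and of how a ceph-volume call could consume it,
assuming the ceph_volume module reads a CEPH_VOLUME_DEBUG environment
variable (the `action: list` task is illustrative):

```
# ceph-defaults
ceph_volume_debug: false

# illustrative ceph_volume call
- name: list osd devices
  ceph_volume:
    action: list
  environment:
    CEPH_VOLUME_DEBUG: "{{ ceph_volume_debug }}"
```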

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-11 15:03:20 +02:00
Benoît Knecht a57fd7a090 purge-cluster: check if rbdmap exists
When running `infrastructure-playbooks/purge-cluster.yml` twice, it fails the
second time on the `ensure rbd devices are unmapped` task, because `rbdmap`
isn't installed anymore at that point.

This commit adds a check that ensures `rbdmap` is available, and skips the
`ensure rbd devices are unmapped` task if it isn't.
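
A sketch of the guard (task names illustrative):

```
- name: check if rbdmap is still available
  command: which rbdmap
  register: rbdmap_installed
  failed_when: false
  changed_when: false

- name: ensure rbd devices are unmapped
  command: rbdmap unmap-all
  when: rbdmap_installed.rc == 0
```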

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2020-08-06 09:35:03 +02:00
Guillaume Abrioux c2e507b42d purge-cluster: replace shell by command in a task
There is no need to use `shell` here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-05 09:37:41 +02:00
Benoît Knecht fe8fbd3ee2 shrink-osd: various fixes
This handles missing /etc/ceph/osd, by ensuring we actually found files in
`/etc/ceph/osd` before trying to slurp their content.

This also adds a missing `| default(False)` to avoid the following error:

```
fatal: [ceph01]: FAILED! =>
  msg: |-
    The conditional check 'ceph_osd_data_json[item.2]['encrypted'] | bool' failed. The error was: error while evaluating conditional (ceph_osd_data_json[item.2]['encrypted'] | bool): 'dict object' has no attribute 'encrypted'
```
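
The added default makes the conditional safe when the attribute is
missing:

```
when: ceph_osd_data_json[item.2]['encrypted'] | default(False) | bool
```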

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1862416

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2020-08-05 01:30:57 +02:00
Kevin Coakley d19e6033b2 Remove ceph-radosgw.target when switching to containerize daemons
The task "remove old systemd unit file" under "switching from
non-containerized to containerized ceph rgw" only removes
the ceph-radosgw@.service file. The task should also remove
the ceph-radosgw.target file, like the "remove old systemd unit
files" tasks for the mons, mgrs, osds, etc, in order to clean up
all of the unused systemd unit files.

Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
2020-08-04 11:08:12 -04:00
Guillaume Abrioux 8933bfde33 shrink_osd: remove osd data directory
Otherwise it leaves an empty directory.
When shrinking and redeploying multiple OSDs you have no guarantee it
will reuse the same osd id.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-03 14:46:56 +02:00
Dimitri Savineau ec0a37a74f rolling_update: restart mds after the upgrade
In addition to 155e2a2, the active mds daemon isn't stopped/started
correctly, as opposed to the other services, so the daemon doesn't come
back after the upgrade.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1861688

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-29 16:45:41 -04:00
Dimitri Savineau a6209bd957 rolling_update: refact dashboard workflow
The dashboard upgrade workflow should follow the same process as the
ceph upgrade, otherwise any systemd unit modification won't be applied
to the monitoring/dashboard stack.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859173

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-25 09:35:17 +02:00
Dimitri Savineau 155e2a23d5 rolling_update: stop/start instead of restart
During the daemon upgrade we're:
  - stopping the service when it's not containerized
  - running the daemon role
  - starting the service when it's not containerized
  - restarting the service when it's containerized

This implementation has multiple issues.

1/ We don't use the same service workflow when using containers
or baremetal.

2/ The explicit daemon start isn't required since we're already
doing this in the daemon role.

3/ Any non-backward-compatible change in the systemd unit template (for
containerized deployments) won't work due to the restart usage.

This patch refactors the rolling_update playbook by using the same service
stop task for both containerized and baremetal deployments at the start
of the upgrade play.
It removes the explicit service start task because it's already included
in the dedicated role.
The service restart tasks for containerized deployment are also
removed.

Finally, this adds the missing service stop task for the ceph crash
upgrade workflow.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859173

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-25 09:35:17 +02:00
Guillaume Abrioux 9d2f2108e1 ceph-crash: introduce new role ceph-crash
This commit introduces a new role `ceph-crash` in order to deploy
everything needed for the ceph-crash daemon.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-07-21 20:22:12 +02:00
Dimitri Savineau 5ef965c4dc cephadm: set the command as a fact
Set the cephadm cmd as a fact instead of rewriting the same command
over and over.
This also fixes an issue when using docker as the container engine,
because the --docker cephadm parameter must be used before the
subcommand, not after.
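
A sketch of the fact (the exact flag handling is illustrative):

```
- name: set cephadm command as a fact
  set_fact:
    cephadm_cmd: "cephadm {{ '--docker' if container_binary == 'docker' else '' }}"
```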

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-20 16:32:20 -04:00
Dimitri Savineau 957903d561 cephadm: add playbook
This adds a new playbook for deploying ceph via cephadm.

This also adds a new dedicated tox file for CI purposes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-16 11:40:45 -04:00
Dimitri Savineau 9596494911 cephadm-adopt: delegate task for orch apply
This is a partial revert of b38019e because we don't want to execute
the whole play on the monitor; otherwise, if we have some empty group
like rgws or mdss, the orchestrator commands would still be executed.
Instead we should keep the real target group name at the play level and
delegate the orchestrator commands to the monitor. The whole play
will be skipped if the group is empty.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-16 09:44:33 -04:00
Dimitri Savineau 75ae1b7e90 cephadm-adopt: inform users about cephadm
Print a message at the end of the playbook to inform users that they
don't have to use ceph-ansible playbooks anymore, as everything else
needs to be done via cephadm (day 2 operations).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-15 17:04:59 -04:00
Dimitri Savineau 7164426456 cephadm-adopt: refresh the service/daemon list
When reporting the orchestrator service/daemon list at the end of the
playbook, we can use the --refresh option, otherwise we could get
outdated output.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-15 17:04:59 -04:00
Dimitri Savineau ceac81cd24 Revert "cephadm-adopt: remove the cephadm script"
This reverts commit c3bbc6b13c.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-15 17:04:59 -04:00
Dimitri Savineau 0c3a2b72ff cephadm-adopt: wait for monitor in quorum
After adopting a monitor we need to wait for that monitor to rejoin
the quorum before moving to the next node.
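
A sketch of such a wait (retry values and the hostname lookup are
illustrative):

```
- name: wait for the monitor to rejoin the quorum
  command: ceph quorum_status --format json
  register: quorum_status
  retries: 30
  delay: 10
  changed_when: false
  until: ansible_hostname in (quorum_status.stdout | from_json)['quorum_names']
```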

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-13 09:16:11 -04:00
Dimitri Savineau d3b3c8948e cephadm-adopt: add osd flags during adoption
Like the rolling_update or switch2container playbooks, we need to
set/unset some osd flags before and after the OSD daemon adoption.
This also adds a task waiting for clean pgs at the end of an OSD node.
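
A sketch of the flag handling around the adoption (unsetting mirrors it
with `osd unset`):

```
- name: set osd flags
  command: "ceph --cluster {{ cluster }} osd set {{ item }}"
  loop:
    - noout
    - nodeep-scrub
```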

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-13 09:16:11 -04:00
Dimitri Savineau 9fe2694711 cephadm-adopt: add iscsi support
The iSCSI support has been added recently in cephadm.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-13 09:16:11 -04:00
Dimitri Savineau c3bbc6b13c cephadm-adopt: remove the cephadm script
At the end of the process we don't need the cephadm script.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-13 09:16:11 -04:00
Dimitri Savineau 381201a394 cephadm-adopt: show orchestrator status
At the end of the playbook we can show the orchestrator status like
we do with the ceph status in the initial deployment.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-13 09:16:11 -04:00
Dimitri Savineau 91a6c79e41 cephadm-adopt: use placement parameter
It's better to use the --placement parameter when using ceph orch apply
commands to avoid confusion in the parameters.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-10 14:05:15 -04:00
Dimitri Savineau f2d997396e cephadm-adopt: use custom dashboard images
cephadm uses default values for the dashboard container images, which
need to be customized by ansible for upstream or downstream purposes.
This feature wasn't present when cephadm-adopt.yml was designed.
Also set the container_image_base variable for upgrade purposes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-10 16:00:24 +02:00
Dimitri Savineau b38019e3ca cephadm-adopt: run orch apply from monitors
It looks like we can't run the ceph orch apply commands on nodes other
than monitors even if it used to work in the past.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-10 16:00:24 +02:00
Dimitri Savineau 27efcbc0e5 cephadm-adopt: don't fail on systemd reset-failed
If the systemd service exits successfully, we don't need to reset
the failed state.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-10 16:00:24 +02:00
Dimitri Savineau fd36433826 cephadm-adopt: copy client.admin keyring
The ceph config assimilate-conf command requires the client.admin
keyring which isn't present on all nodes most of the time.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-10 16:00:24 +02:00
Guillaume Abrioux cc0d9697c5 play: remove backward compatibility group name
It's time to remove this old group name.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-07-08 09:21:19 -04:00
Dimitri Savineau c95adc564b facts: explicitly disable facter and ohai
By default, ansible gathers facts from facter and ohai if they are
installed on the remote nodes. Given we don't need them, let's exclude
these facts from our fact gathering.
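
A sketch of the exclusion using the setup module's gather_subset:

```
- name: gather facts without facter and ohai
  setup:
    gather_subset:
      - 'all'
      - '!facter'
      - '!ohai'
```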

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-02 17:46:12 +02:00
Guillaume Abrioux 7dd68b9ac1 rgw: fix multi instances scaleout
When rgw and osd are collocated, the current workflow prevents scaling
out the radosgw_num_instances parameter when rerunning the playbook.

The environment file used in the rgw systemd template is rendered when
executing the `ceph-rgw` role, but during a new run of the playbook (in
order to scale out rgw instances), handlers are triggered from the
`ceph-osd` role, which runs before `ceph-rgw`. It therefore tries to
start the new rgw daemon while its corresponding environment file hasn't
been rendered yet, and fails as follows:

```
ceph-radosgw@rgw.ceph4osd3.rgw1.service failed to run 'start-pre' task: No such file or directory
```

This commit moves the tasks generating this file into the `ceph-config`
role so it is generated early enough.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1851906

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-07-02 10:39:50 -04:00
Guillaume Abrioux 8f9cdf4b10 rolling_update: add any_errors_fatal
If a failure occurs in ceph-validate, the upgrade playbook keeps
running, whereas we expect it to stop.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-06-29 12:58:53 -04:00
Dimitri Savineau 548ff26256 Add playbook for converting cluster to cephadm
This commit adds a new playbook for converting an existing ceph cluster
deployed by ceph-ansible to the cephadm orchestrator.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-06-29 09:21:38 -04:00
Guillaume Abrioux 37b20b6525 docker2podman: make images pulling optional
This commit skips the image pulling when podman isn't installed
on the machine.

In the OSP context, the podman installation is done later in the
workflow, which means all `podman pull` commands would fail.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1849559

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-06-22 12:19:38 -04:00
Dimitri Savineau 829990e60d ceph-osd: remove ceph-osd-run.sh script
Since we only have one scenario since nautilus, we can just move
the container start command from ceph-osd-run.sh to the systemd unit
service.
As a result, the ceph-osd-run.sh.j2 template and the
ceph_osd_docker_run_script_path variable are removed.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-06-18 17:51:13 +02:00
Guillaume Abrioux b91d60d384 switch_to_containers: don't set noup flag
We shouldn't set this flag when running switch_to_containers playbook.
Otherwise the playbook fails waiting for pgs to be clean.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843569

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-06-17 01:32:18 +02:00
Dimitri Savineau 50140c9b5d switch_to_container: fix osd systemd regex
The systemd LOAD and ACTIVE fields could have more than one space
between the two values.
This updates the systemd regex the same way we're using it in other
parts of the code.
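
Illustrative version of the tolerant pattern (`loaded +active` matches
one or more spaces):

```
- name: collect osd units
  shell: systemctl list-units | grep -E "loaded +active" | grep -oE "ceph-osd@[0-9]+.service"
  register: osd_units
  changed_when: false
```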

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843500

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-06-16 17:04:06 +02:00
Guillaume Abrioux 8aed824f71 switch_to_container: refact wait for pg check
There is no need to make this check with several steps.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-05-16 07:31:57 +02:00
Dimitri Savineau 252e78b4e4 docker2podman: manage dashboard nodes
The dashboard nodes (alertmanager, grafana, node-exporter, and prometheus)
were not managed during the docker-to-podman migration.

This adds the systemd container templates of those services to a dedicated
file (systemd.yml) in order to include it in the docker2podman playbook.

This also adds the pull of the dashboard container images from docker to
podman.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1829389

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-05-13 12:02:00 +02:00
Dimitri Savineau d38f21aeba docker2podman: pull images from docker daemon
The docker2podman playbook only installs the podman package and updates
the systemd units with the right container_binary value.

We never pull the container image, so if one service is restarted then
the container image will be pulled first before the service can start,
which could cause a longer downtime.

To avoid downloading the container image from the internet again, we can
just pull it from the local docker daemon.
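
podman can read images straight from the local docker storage via its
docker-daemon transport; an illustrative pull:

```
- name: pull ceph image from the docker daemon
  command: "podman pull docker-daemon:{{ ceph_docker_registry }}/{{ ceph_docker_image }}:{{ ceph_docker_image_tag }}"
  changed_when: false
```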

The container_{binding,package,service}_name variables are removed
because they are only used in the ceph-container-engine role, which
isn't called in this playbook.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-05-13 12:02:00 +02:00
Dimitri Savineau c0a213f928 rolling_update: fix rbdmirror group name
The rbdmirror group name was using the wrong variable definition.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-05-13 11:57:42 +02:00
Dimitri Savineau 2b9edba131 filestore-to-bluestore: fix py2 on skipped tasks
When using skipped variables with the from_json filter on python2, we
need to have a default value, otherwise the skipped task will fail.

Unexpected templating type error occurred on
({{ (ceph_volume_lvm_list.stdout | from_json) }}): expected string or
buffer
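
The added default keeps the template valid on skipped results:

```
{{ (ceph_volume_lvm_list.stdout | default('{}') | from_json) }}
```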

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790472

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-04-20 16:19:18 +02:00
Dimitri Savineau 7b620a22bc rolling_update: require_osd_release pacific
Since [1] we need to set pacific for the required OSD release during the
upgrade.

[1] https://github.com/ceph/ceph/commit/cc99c3bc

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-04-16 15:32:00 -04:00
Guillaume Abrioux 2cfaa056e0 switch-to-containers: set and unset osd flags
The workflow in this playbook should be the same as in rolling_update:
we should first set the noout and nodeep-scrub flags before migrating
the first osd, and unset the osd flags after the last osd is migrated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-04-06 17:00:00 +02:00
Guillaume Abrioux 6df7887f87 update: use tasks_from when including ceph-facts
When setting/unsetting osd flags, we can use `tasks_from` when importing
the `ceph-facts` role to save some time, given that we only need this
role for setting `container_binary`.
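
A sketch of the import (the task file name is an assumption):

```
- import_role:
    name: ceph-facts
    tasks_from: container_binary
```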

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-04-06 17:00:00 +02:00
Guillaume Abrioux 4a4f54f6ee docker2podman: call `container_options_facts.yml` on osd nodes
We must call the `container_options_facts.yml` tasks from the `ceph-osd`
role because ceph-osd-run.sh.j2 needs variables set in this file.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1819681

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-04-02 07:56:15 +02:00
Guillaume Abrioux 9219991441 remove *docker*.yml symlinks
This commit removes these two symlinks.
They were there for backward compatibility and were marked deprecated as
of stable-4.0.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-31 10:03:22 -04:00
Guillaume Abrioux 5e7962ccf6 purge-container: get *all* osds id
Adding `--all` to the `systemctl list-units` command in order to get
*all* osd ids on the node (including stopped osds). Otherwise, it will
purge the cluster but there will be leftovers after that.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1814542

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-31 09:37:30 -04:00
Dimitri Savineau 64701437de container: remove ulimit nofile parameter
Since Ceph Octopus is python3 only, we don't need to specify the max
open files anymore with the container engine.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-30 09:54:23 +02:00
Guillaume Abrioux a94035e957 purge-container: clean legacy code
This commit removes a register which isn't used in this playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-12 09:45:12 -04:00
Dimitri Savineau 38a683e5bf filestore-to-bluestore: stop ceph-volume services
We were only disabling the ceph-osd services, but not the ceph-volume
lvm services, during the filestore-to-bluestore migration.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-05 17:53:32 -05:00
Dimitri Savineau d1316ce77b shrink-rbdmirror: fix presence after removal
We should add retry/delay to check the presence of the rbdmirror daemon
in the cluster status because the status takes some time to be updated.
Also the metadata.hostname isn't a good key to check because it doesn't
reflect the ansible_hostname fact. We should use metadata.id instead.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-03 10:32:15 +01:00
Dimitri Savineau a664159061 shrink-mgr: fix systemd condition
This playbook was using the mds systemd condition.
Also, a command task was using a pipeline, which is not allowed.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-03 10:32:15 +01:00
Dimitri Savineau 08ac2e3034 shrink: don't use localhost node
The ceph-facts role runs on localhost, so if this node is using a
different OS/release than the ceph nodes, we can have a mismatch in the
docker/podman container binary.
This commit also reduces the scope of the ceph-facts role because we
only need the container_binary tasks.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-03 10:32:15 +01:00
Dimitri Savineau 9d3b49293d purge: stop rgw instances by iteration
It looks like the service module doesn't support wildcards anymore
for stopping/disabling multiple services.

fatal: [rgw0]: FAILED! => changed=false
  msg: 'This module does not currently support using glob patterns,
        found ''*'' in service name: ceph-radosgw@*'
...ignoring

Instead we should iterate over the rgw_instances list.
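
A sketch of the iteration (the unit naming follows the rgw instance
scheme; `item.instance_name` is an assumption):

```
- name: stop rgw instances
  service:
    name: "ceph-radosgw@rgw.{{ ansible_hostname }}.{{ item.instance_name }}"
    state: stopped
    enabled: no
  loop: "{{ rgw_instances }}"
```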

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-02 16:32:06 +01:00
Guillaume Abrioux a084a2a347 common: support OSDs with more than 2 digits
When running an environment with OSDs whose IDs have more than 2 digits,
some tasks don't match the systemd units and therefore the playbook can
fail.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1805643

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-02-27 09:48:36 +01:00
Guillaume Abrioux 1de2bf9991 shrink-osd: support shrinking ceph-disk prepared osds
This commit adds support for shrinking ceph-disk-prepared osds.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1796453

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-02-26 11:45:41 -05:00
Guillaume Abrioux 55970b18f1 shrink-osd: don't run ceph-facts entirely
We need to call ceph-facts only for setting `container_binary`.
Since this task has been isolated we can use `tasks_from` to only execute the
needed task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-02-26 11:45:41 -05:00
Dimitri Savineau 535da53d69 filestore-to-bluestore: reuse dedicated journal
If the filestore configuration was using a dedicated journal with either
a partition or an LV/VG, then we need to reuse it for the bluestore DB.

When filestore is using a raw device, we shouldn't destroy
everything (data + journal) but only the data, otherwise the journal
partition won't exist anymore.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790479

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-25 16:07:21 +01:00
Dimitri Savineau 195944b123 doc: update infra playbooks statements
We don't need to copy the infrastructure playbooks in the root
ceph-ansible directory.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-25 15:27:52 +01:00
Benoît Knecht 8b3df4e418 infrastructure-playbooks: Run shrink-osd tasks on monitor
Instead of running shrink-osd tasks on localhost and delegating most of
them to the first monitor, run all of them on the first monitor
directly.

This has the added advantage of becoming root on the monitor only, not
on localhost.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2020-02-19 11:16:30 -05:00
Dimitri Savineau 100e3a044e purge-cluster: update package list to remove
We only support python3, so all ceph python packages are renamed.
Some ceph packages were missing from the list (ceph-mon, ceph-osd or
rbd-mirror) and some don't exist anymore (ceph-fs-common, libcephfs1).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-17 11:33:15 +01:00
Guillaume Abrioux 3700aa5385 switch_to_containers: increase health check values
This commit increases the default values for the following variables
consumed by the
switch-from-non-containerized-to-containerized-ceph-daemons.yml
playbook.
This also moves these variables into the `ceph-defaults` role so the
user can set different values if needed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783223

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-02-07 14:59:14 -05:00
wujie1993 d8b0b3cbd9 purge: fix purge cluster failed
Fix the cluster purge failing when local container images do not exist.

Purge node-exporter and grafana-server only when dashboard_enabled is set to True.
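
A sketch of the added guard on the dashboard purge tasks (service names
illustrative):

```
- name: stop and disable the dashboard services
  service:
    name: "{{ item }}"
    state: stopped
    enabled: no
  loop:
    - node_exporter
    - grafana-server
  when: dashboard_enabled | bool
```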

Signed-off-by: wujie1993 qq594jj@gmail.com
2020-01-31 12:09:46 -05:00
Dimitri Savineau cd76054f76 filestore-to-bluestore: fix undefine osd_fsid_list
If the playbook is used on a host running bluestore OSDs, the
osd_fsid_list won't be filled, because the bluestore OSDs are reported
with 'type: block' by the ceph-volume lvm list command while we are
looking for 'type: data' (filestore).

TASK [zap ceph-volume prepared OSDs] *********
fatal: [xxxxx]: FAILED! =>
  msg: '''osd_fsid_list'' is undefined

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-28 02:42:39 +01:00
Dimitri Savineau 83c5a1d7a8 filestore-to-bluestore: skip bluestore osd nodes
If the OSD node is already using bluestore OSDs then we should skip
all the remaining tasks to avoid purging OSDs for nothing.
Instead we warn the user.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790472

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-27 18:08:00 +01:00
Dimitri Savineau a9c2300545 filestore-to-bluestore: don't fail when with no PV
When the PV is already removed from the devices, we should not fail,
to avoid errors like:

stderr: No PV found on device /dev/sdb.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-24 20:56:08 +01:00
Guillaume Abrioux e5812fe45b rolling_update: support upgrading 3.x + ceph-metrics on a dedicated node
When upgrading from RHCS 3.x, where ceph-metrics was deployed on a
dedicated node, to RHCS 4.0, it fails as follows:

```
fatal: [magna005]: FAILED! => changed=false
  gid: 0
  group: root
  mode: '0755'
  msg: 'chown failed: failed to look up user ceph'
  owner: root
  path: /etc/ceph
  secontext: unconfined_u:object_r:etc_t:s0
  size: 4096
  state: directory
  uid: 0
```

because we are trying to run `ceph-config` on this node, which doesn't
make sense; we should simply run this play on all groups except
`[grafana-server]`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1793885

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-22 11:29:36 -05:00
Dimitri Savineau bb3eae0c80 filestore-to-bluestore: fix osd_auto_discovery
When osd_auto_discovery is set, we need to refresh the
ansible_devices fact after the filestore OSD purge,
otherwise the devices fact won't be populated.
Also remove the gpt header on ceph_disk_osds_devices because
the devices list is empty at this point for osd_auto_discovery.
This adds the bool filter where needed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-22 09:36:09 +01:00
Dimitri Savineau f995b079a6 filestore-to-bluestore: --destroy with raw devices
We still need --destroy when using a raw device, otherwise we won't be
able to recreate the lvm stack on that device with bluestore.

Running command: /usr/sbin/vgcreate -s 1G --force --yes ceph-bdc67a84-894a-4687-b43f-bcd76317580a /dev/sdd
 stderr: Physical volume '/dev/sdd' is already in volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151'
  Unable to add physical volume '/dev/sdd' to volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151'
  /dev/sdd: physical volume not initialized.
--> Was unable to complete a new OSD, will rollback changes

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792227

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-21 11:37:39 -05:00
Guillaume Abrioux 3e262e072b containers: use --cpus instead --cpu-quota
When using docker 1.13.1, the current condition:

```
{% if (container_binary == 'docker' and ceph_docker_version.split('.')[0] is version_compare('13', '>=')) or container_binary == 'podman' -%}
```

is wrong because it compares the first digit (1) whereas it should
compare the second one.
It means we always use `--cpu-quota`, although the documentation
recommends using `--cpus` when the docker version is 1.13 or higher.

From the doc:
> --cpu-quota=<value>	Impose a CPU CFS quota on the container. The number of
> microseconds per --cpu-period that the container is limited to before
> throttled. As such acting as the effective ceiling.
> If you use Docker 1.13 or higher, use --cpus instead.
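
A sketch of the corrected condition comparing the full version string
(`cpu_limit` is a hypothetical variable):

```
{% if (container_binary == 'docker' and ceph_docker_version is version('1.13', '>=')) or container_binary == 'podman' -%}
--cpus={{ cpu_limit }}
{% else -%}
--cpu-quota={{ cpu_limit * 100000 }}
{% endif -%}
```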

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-16 13:51:43 -05:00
Guillaume Abrioux 3d0898aa5d shrink-mds: fix condition on fs deletion
The new ceph status registered in `ceph_status` will report `fsmap.up =
0` when it's the last mds, given that the check is done after we shrink
the mds; it means the condition is wrong. This also adds a condition so
we don't try to delete the fs if a standby node is going to rejoin the
cluster.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787543

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-15 10:40:14 +01:00
Guillaume Abrioux d853da2a68 update: remove legacy
This task is a code duplicate, probably legacy; let's remove it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-13 15:18:45 -05:00
Guillaume Abrioux 3496a0efa2 osd: support scaling up using --limit
This commit leaves add-osd.yml in place but marks the playbook as
deprecated.
Scaling up OSDs is now possible using --limit.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-13 09:59:08 -05:00
Guillaume Abrioux b0c491800a docker2podman: use set_fact to override variables
play vars have lower precedence than role vars and `set_fact`.
We must use a `set_fact` to reset these variables.
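
A sketch of the override:

```
- name: force container_binary to podman
  set_fact:
    container_binary: podman
```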

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-10 10:21:29 -05:00
Guillaume Abrioux 1c2ec9fb40 docker2podman: force systemd to reload config
This is needed after a change is made in systemd unit files.
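
A sketch of the reload:

```
- name: reload systemd configuration
  systemd:
    daemon_reload: true
```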

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-10 10:21:29 -05:00
Guillaume Abrioux d746575fd0 docker2podman: install podman
This commit adds a package installation task in order to install podman
during the docker-to-podman.yml migration playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-10 10:21:29 -05:00
Dimitri Savineau a09d1c38bf purge-iscsi-gateways: don't run all ceph-facts
We only need to have the container_binary fact. Because we're not
gathering the facts from all nodes, the purge fails trying to get
one of the grafana facts.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786686

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-10 15:46:15 +01:00
Dimitri Savineau 3f344fdefe rolling_update: run registry auth before upgrading
Some tasks in the rolling upgrade playbook use the new container image
and need the registry login to be executed first, otherwise the nodes
won't be able to pull the container image.

Unable to find image 'xxx.io/foo/bar:latest' locally
Trying to pull repository xxx.io/foo/bar ...
/usr/bin/docker-current: Get https://xxx.io/v2/foo/bar/manifests/latest:
unauthorized
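
A sketch of the login task run before any pull (variable names follow
the existing registry settings):

```
- name: container registry authentication
  command: "{{ container_binary }} login -u {{ ceph_docker_registry_username }} -p {{ ceph_docker_registry_password }} {{ ceph_docker_registry }}"
  no_log: true
  when: ceph_docker_registry_auth | bool
```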

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-09 16:14:33 -05:00