ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	9d04b8ca8b	ansible.cfg: remove cfg file in infrastructure-playbooks There's no need to have a copy of this file in the infrastructure-playbooks directory. Playbooks in that directory can be run from the root dir of ceph-ansible. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `f906caa6da`)	2020-09-25 11:12:39 -04:00
Guillaume Abrioux	113eadad72	ansible.cfg: set force_valid_group_names param As of 2.10, group names containing a dash are invalid. However, setting this option makes it still possible to use a dash in group names and prevent this warning to show up. It might need to be definitely addressed in a future ansible release. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1880476 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `6938ed1302`)	2020-09-25 11:12:39 -04:00
Dimitri Savineau	aaf1139242	switch2container: chown symlink for devices If the OSD directory is using symlinks for referencing devices (like block, db, wal for bluestore and journal for filestore) then the chown command could fail to change the owner:group on some system. $ ls -hl /var/lib/ceph/osd/ceph-0/ total 28K lrwxrwxrwx 1 ceph ceph 92 Sep 15 01:53 block -> /dev/ceph-45113532-95ca-471b-bd75-51de46f1339c/osd-data-570a1aee-60c0-44c9-8036-ffed7d67a4e6 -rw------- 1 ceph ceph 37 Sep 15 01:53 ceph_fsid -rw------- 1 ceph ceph 37 Sep 15 01:53 fsid -rw------- 1 ceph ceph 55 Sep 15 01:53 keyring -rw------- 1 ceph ceph 6 Sep 15 01:53 ready -rw------- 1 ceph ceph 3 Sep 15 02:00 require_osd_release -rw------- 1 ceph ceph 10 Sep 15 01:53 type -rw------- 1 ceph ceph 2 Sep 15 01:53 whoami $ find /var/lib/ceph/osd/ceph-0 -not -user 167 -execdir chown 167:167 {} + chown: cannot dereference './block': Permission denied $ find /var/lib/ceph/osd/ceph-0 -not -user 167 /var/lib/ceph/osd/ceph-0/block Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `da4280e243`)	2020-09-15 15:30:12 -04:00
Dimitri Savineau	8757fdfb4a	switch2container: remove deb systemd units When running the switch2container playbook on a Debian based system, the systemd unit path isn't the same as on a Red Hat based system. Because the systemd unit files aren't removed, the new container systemd unit isn't taken into account. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `c1af69a7e7`)	2020-09-15 15:30:12 -04:00
Guillaume Abrioux	edcdbe5601	purge: remove potential socket leftover This commit ensures we remove any socket left by ceph and the `ceph-osd-run.sh` script. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1861755 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5e91e0f3e2`)	2020-09-14 16:50:49 -04:00
Dimitri Savineau	23522a11e4	ceph_key: set state as optional Most ansible module using a state parameter default to the present value (when available) instead of using it as a mandatory option. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `abb4023d76`)	2020-09-14 15:37:56 -04:00
Dimitri Savineau	7745fd3560	container: run engine/common roles on first client We already do this in the site-container.yml playbook because we don't need docker/podman installed on all client nodes and having the container image only on the first client node. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `8ecbdc6ede`)	2020-09-10 20:57:16 +02:00
Dimitri Savineau	0c0a930374	ceph-facts: only get fsid when monitor are present When running the rolling_update playbook with an inventory without monitor nodes defined (like external scenario) then we can't retrieve the cluster fsid from the running monitor. In this scenario we have to pass this information manually (group_vars or host_vars). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1877426 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `f63022dfec`)	2020-09-10 20:57:16 +02:00
Francesco Pantano	8e3ecfd869	Add --cluster option on ceph require-osd-release command On DCN environments, or when multiple ceph cluster are configured, we need to specify the cluster name before running the command or the rolling_update playbook will fail during minor updates. Closes: https://bugzilla.redhat.com/1876447 Signed-off-by: Francesco Pantano <fpantano@redhat.com> (cherry picked from commit `cb64df30b6`)	2020-09-09 14:54:19 +02:00
Francesco Pantano	8dd8675080	Fix hosts field in rolling_update playbook when mds are processed In the OSP context, during the rolling update the playbook fails with the following error: ''' ERROR! The field 'hosts' has an invalid value, which includes an undefined variable. The error was: list object has no element 0 ''' This PR just change the hosts field providing a valid mons group value. Closes: https://bugzilla.redhat.com/1876803 Signed-off-by: Francesco Pantano <fpantano@redhat.com> (cherry picked from commit `e65f9a5c72`)	2020-09-09 14:53:44 +02:00
Guillaume Abrioux	3a8be20699	rolling_update: remove 'ignore_errors' There's no need to use `ignore_errors: true` on these tasks. Using a loop on the task stopping mon daemons allows us to avoid duplicating this task, the `ignore_errors` isn't needed here because it won't fail the playbook if one of the ID doesn't exist (shortname vs. fqdn) Using the right condition on the task starting the mgr daemon allows us to avoid using an `ignore_errors: true` as well. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `cec994b973`)	2020-08-21 16:33:15 +02:00
Guillaume Abrioux	81d116b0ac	shrink-mds: use mds_to_kill_hostname instead When using fqdn in inventory host file, this task will fail because the mds is registered with its shortname. It means we must use `mds_to_kill_hostname` in this task. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1869837 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `51c382677d`)	2020-08-18 15:09:57 -04:00
Guillaume Abrioux	004155d407	purge-cluster: use sysfs method for unmapping rbd devices This way we keep consistency with purge-container-cluster.yml playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `f77fa6e2a4`)	2020-08-17 09:50:08 -04:00
Guillaume Abrioux	56d2b62e00	purge: import ceph-defaults in purge osd play Otherwise, `ceph_volume_debug` variable is undefined Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `33a544644a`)	2020-08-12 22:57:10 +02:00
Guillaume Abrioux	8a7e4193db	common: don't enable debug log on ceph-volume calls by default ceph-volume can generate large logs at some point. debug logs by definition should be enabled only when debugging. Let's make it customizable with a variable which is set to `False` by default. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `448cc280b7`)	2020-08-12 22:57:10 +02:00
Benoît Knecht	5d06c0eda9	purge-cluster: check if rbdmap exists When running `infrastructure-playbooks/purge-cluster.yml` twice, it fails the second time on the `ensure rbd devices are unmapped` task, because `rbdmap` isn't installed anymore at that point. This commit adds a check that ensures `rbdmap` is available, and skips the `ensure rbd devices are unmapped` task if it isn't. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `a57fd7a090`)	2020-08-06 12:01:50 -04:00
Kevin Coakley	92b400f433	Remove ceph-radosgw.target when switching to containerize daemons The task "remove old systemd unit file" under "switching from non-containerized to containerized ceph rgw" only removes the ceph-radosgw@.service file. The task should also remove the ceph-radosgw.target file, like the "remove old systemd unit files" tasks for the mons, mgrs, osds, etc, in order to clean up all of the unused systemd unit files. Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu> (cherry picked from commit `d19e6033b2`)	2020-08-06 11:43:12 -04:00
Guillaume Abrioux	bd3439db75	shrink_osd: remove osd data directory Otherwise it leaves an empty directory. When shrinking and redeploying multiple OSDs you have no guarantee it will reuse the same osd id. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `8933bfde33`)	2020-08-06 13:09:38 +02:00
Benoît Knecht	ccefe7da9f	shrink-osd: various fixes This handles a missing /etc/ceph/osd by ensuring we actually found files in `/etc/ceph/osd` before trying to slurp their content. This also adds a missing `| default(False)` to avoid the following error: ``` fatal: [ceph01]: FAILED! => msg: |- The conditional check 'ceph_osd_data_json[item.2]['encrypted'] | bool' failed. The error was: error while evaluating conditional (ceph_osd_data_json[item.2]['encrypted'] | bool): 'dict object' has no attribute 'encrypted' ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1862416 Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `fe8fbd3ee2`)	2020-08-06 13:09:38 +02:00
Dimitri Savineau	1dd9c43efc	rolling_update: restart mds after the upgrade In addition to `155e2a2`, the active mds daemon isn't stopped/started correctly, unlike the other services, so that daemon doesn't come back after the upgrade. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1861688 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `ec0a37a74f`)	2020-07-29 17:43:36 -04:00
Dimitri Savineau	2ce60504bd	rolling_update: refact dashboard workflow The dashboard upgrade workflow should follow the same process as the ceph upgrade, otherwise any systemd unit modification won't be applied to the monitoring/dashboard stack. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859173 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `a6209bd957`)	2020-07-27 10:59:25 -04:00
Dimitri Savineau	8ea3fa1752	rolling_update: stop/start instead of restart During the daemon upgrade we're - stopping the service when it's not containerized - running the daemon role - starting the service when it's not containerized - restarting the service when it's containerized This implementation has multiple issues. 1/ We don't use the same service workflow when using containers or baremetal. 2/ The explicit daemon start isn't required since we're already doing this in the daemon role. 3/ Any non backward compatible changes in the systemd unit template (for containerized deployment) won't work due to the restart usage. This patch refactors the rolling_update playbook by using the same service stop task for both containerized and baremetal deployments at the start of the upgrade play. It removes the explicit service start task because it's already included in the dedicated role. The service restart tasks for containerized deployment are also removed. Finally, this adds the missing service stop task for the ceph crash upgrade workflow. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859173 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `155e2a23d5`)	2020-07-27 10:59:25 -04:00
Guillaume Abrioux	e6059fdcd3	ceph-crash: introduce new role ceph-crash This commit introduces a new role `ceph-crash` in order to deploy everything needed for the ceph-crash daemon. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `9d2f2108e1`)	2020-07-22 18:47:01 -04:00
Dimitri Savineau	0178114f3b	cephadm: set the command as a fact Set the cephadm cmd as a fact instead of rewriting the same command over and over. This also fixes an issue when using docker as container engine because the --docker cephadm parameter should be used before the subcommand, not after. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `5ef965c4dc`)	2020-07-20 22:48:07 -04:00
Dimitri Savineau	b7fd3bc844	cephadm: add playbook This adds a new playbook for deploying ceph via cephadm. This also adds a new dedicated tox file for CI purpose. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `957903d561`)	2020-07-16 12:00:14 -04:00
Dimitri Savineau	a22855319b	cephadm-adopt: delegate task for orch apply This is a partial revert of `b38019e` because we don't want to execute the whole play on the monitor, otherwise if we have some empty group like rgws or mdss then the orchestrator commands will still be executed. Instead we should keep the real target group name at play level and delegate the orchestrator commands to the monitor. The whole play will be skipped if the group is empty. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `9596494911`)	2020-07-16 10:50:53 -04:00
Dimitri Savineau	585b3e476c	cephadm-adopt: inform users about cephadm Print a message at the end of the playbook to inform users that they don't have to use ceph-ansible playbooks anymore, as everything else needs to be done via cephadm (day 2 operations). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `75ae1b7e90`)	2020-07-15 17:57:41 -04:00
Dimitri Savineau	4e4748b58d	cephadm-adopt: refresh the service/daemon list When reporting the orchestrator service/daemon list at the end of the playbook, we can use the --refresh option otherwise we could have an outdated output. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `7164426456`)	2020-07-15 17:57:41 -04:00
Dimitri Savineau	bc2aebaa26	Revert "cephadm-adopt: remove the cephadm script" This reverts commit `c3bbc6b13c`. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `ceac81cd24`)	2020-07-15 17:57:41 -04:00
Dimitri Savineau	48baf63bc2	cephadm-adopt: wait for monitor in quorum After adopting a monitor we need to wait for that monitor to rejoin the quorum before moving to the next node. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `0c3a2b72ff`)	2020-07-13 10:17:56 -04:00
Dimitri Savineau	980d1a8365	cephadm-adopt: add osd flags during adoption Like the rolling_update or switch2container playbooks, we need to set/unset some osd flags before and after the OSD daemons adoption. This also adds a task for waiting for clean pgs at the end of an OSD node. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `d3b3c8948e`)	2020-07-13 10:17:56 -04:00
Dimitri Savineau	f4a9f00f20	cephadm-adopt: add iscsi support The iSCSI support has been added recently in cephadm. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `9fe2694711`)	2020-07-13 10:17:56 -04:00
Dimitri Savineau	d8a8d74625	cephadm-adopt: remove the cephadm script At the end of the process we don't need the cephadm script. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `c3bbc6b13c`)	2020-07-13 10:17:56 -04:00
Dimitri Savineau	90f974abb0	cephadm-adopt: show orchestrator status At the end of the playbook we can show the orchestrator status like we do with the ceph status in initial deployment. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `381201a394`)	2020-07-13 10:17:56 -04:00
Dimitri Savineau	c5009101f1	cephadm-adopt: use placement parameter It's better to use the --placement parameter when using ceph orch apply commands to avoid confusion in the parameters. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `91a6c79e41`)	2020-07-10 14:53:39 -04:00
Dimitri Savineau	3b9ff9ae26	cephadm-adopt: use custom dashboard images cephadm uses default value for dashboard container images which need to be customized by ansible for upstream or downstream purpose. This feature wasn't present when cephadm-adopt.yml has been designed. Also set the container_image_base variable for upgrade purpose. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `f2d997396e`)	2020-07-10 11:08:30 -04:00
Dimitri Savineau	f4d62212c6	cephadm-adopt: run orch apply from monitors It looks like we can't run the ceph orch apply commands on nodes other than monitors even if it used to work in the past. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `b38019e3ca`)	2020-07-10 11:08:30 -04:00
Dimitri Savineau	9d6a33e114	cephadm-adopt: don't fail on systemd reset-failed If the systemd service exits successfully then we don't need to reset the failed state. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `27efcbc0e5`)	2020-07-10 11:08:30 -04:00
Dimitri Savineau	0af87be5fc	cephadm-adopt: copy client.admin keyring The ceph config assimilate-conf command requires the client.admin keyring which isn't present on all nodes most of the time. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `fd36433826`)	2020-07-10 11:08:30 -04:00
Guillaume Abrioux	cdf61540d8	rgw: fix multi instances scaleout When rgw and osd are collocated, the current workflow prevents from scaling out the radosgw_num_instances parameter when rerunning the playbook. The environment file used in the rgw systemd template is rendered when executing the `ceph-rgw` role but during a new run of the playbook (in order to scale out rgw instances), handlers are triggered from `ceph-osd` role which is run before `ceph-rgw`, therefore it tries to start the new rgw daemon whereas its corresponding environment file hasn't been rendered yet and fails like following: ``` ceph-radosgw@rgw.ceph4osd3.rgw1.service failed to run 'start-pre' task: No such file or directory ``` This commit moves the tasks generating this file in `ceph-config` role so it is generated early. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1851906 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `7dd68b9ac1`)	2020-07-03 06:37:34 +02:00
Dimitri Savineau	503bc893fb	facts: explicitly disable facter and ohai By default, ansible gathers facts from facter and ohai if installed on the remote nodes, given we don't need them, let's exclude these facts from our facts gathering Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `c95adc564b`)	2020-07-03 06:37:08 +02:00
Guillaume Abrioux	688d5eebf7	rolling_update: add any_errors_fatal If a failure occurs in ceph-validate, the upgrade playbook keeps running where we expect it to fail. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `8f9cdf4b10`)	2020-06-29 17:13:03 -04:00
Dimitri Savineau	c3e89983fc	Add playbook for converting cluster to cephadm The commit adds a new playbook for converting an existing ceph cluster deployed by ceph-ansible to the cephadm orchestrator. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `548ff26256`)	2020-06-29 09:45:22 -04:00
Dimitri Savineau	51cfb89501	ceph-osd: remove ceph-osd-run.sh script Since we only have one scenario since nautilus then we can just move the container start command from ceph-osd-run.sh to the systemd unit service. As a result, the ceph-osd-run.sh.j2 template and the ceph_osd_docker_run_script_path variable are removed. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `829990e60d`)	2020-06-23 17:35:24 +02:00
Guillaume Abrioux	a7fc4af06e	docker2podman: make images pulling optional This commit makes the images pulling skipped if podman isn't installed on the machine. In OSP context, the podman installation is done later in the workflow, it means all `podman pull` commands will fail. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1849559 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `37b20b6525`)	2020-06-22 14:19:44 -04:00
Guillaume Abrioux	4fe8e12484	switch_to_containers: don't set noup flag We shouldn't set this flag when running switch_to_containers playbook. Otherwise the playbook fails waiting for pgs to be clean. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843569 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `b91d60d384`)	2020-06-17 09:24:02 -04:00
Dimitri Savineau	b219b1abed	switch_to_container: fix osd systemd regex The systemd LOAD and ACTIVE fields could have more than one space between both values. This updates the systemd regex the same way we're using it in different parts of the code. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843500 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `50140c9b5d`)	2020-06-16 18:10:28 +02:00
Guillaume Abrioux	c67b3d3530	switch_to_container: refact wait for pg check There is no need to make this check with several steps. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `8aed824f71`)	2020-05-22 17:05:22 +02:00
Dimitri Savineau	e6bfdd2e44	rolling_update: fix rbdmirror group name The rbdmirror group name was using the wrong variable definition. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `c0a213f928`)	2020-05-13 16:41:23 -04:00
Dimitri Savineau	9a7af0ce6a	docker2podman: manage dashboard nodes The dashboard nodes (alertmanager, grafana, node-exporter, and prometheus) were not managed during the docker to podman migration. This adds the systemd container template of those services to a dedicated file (systemd.yml) in order to include it in the docker2podman playbook. This also adds the dashboard container images pull from docker to podman. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1829389 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `252e78b4e4`)	2020-05-13 16:41:11 -04:00

624 Commits (7e2e11320d95d13071c30742108f086526557d72)