ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Dimitri Savineau	b219b1abed	switch_to_container: fix osd systemd regex The systemd LOAD and ACTIVE fields could have more than one space between both values. This updates the systemd regex the same way we're using it in different parts of the code. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843500 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `50140c9b5d`)	2020-06-16 18:10:28 +02:00
Guillaume Abrioux	c67b3d3530	switch_to_container: refact wait for pg check There is no need to make this check with several steps. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `8aed824f71`)	2020-05-22 17:05:22 +02:00
Dimitri Savineau	e6bfdd2e44	rolling_update: fix rbdmirror group name The rbdmirror group name was using the wrong variable definition. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `c0a213f928`)	2020-05-13 16:41:23 -04:00
Dimitri Savineau	9a7af0ce6a	docker2podman: manage dashboard nodes The dashboard nodes (alertmanager, grafana, node-exporter, and prometheus) were not managed during the docker to podman migration. This adds the systemd container template of those services to a dedicated file (systemd.yml) in order to include it in the docker2podman playbook. This also adds the dashboard container images pull from docker to podman. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1829389 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `252e78b4e4`)	2020-05-13 16:41:11 -04:00
Dimitri Savineau	0114457e13	docker2podman: pull images from docker daemon The docker2podman playbook only installs the podman package and updates the systemd units with the right container_binary value. We never pull the container image so if one service is restarted then the container image will be pulled first before the service can start which could cause longer downtime. To avoid downloading the container image from the internet again we can just pull it from the local docker daemon. The container_{binding,package,service}_name variables are removed because they are only used in the ceph-container-engine role which isn't called in this playbook. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `d38f21aeba`)	2020-05-13 16:41:11 -04:00
Dimitri Savineau	1e351bcdd7	filestore-to-bluestore: fix py2 on skipped tasks When using skipped variables with from_json filter and python2 then we need to have a default value otherwise the skipped task will fail. Unexpected templating type error occurred on ({{ (ceph_volume_lvm_list.stdout | from_json) }}): expected string or buffer Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790472 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `2b9edba131`)	2020-04-20 13:38:06 -04:00
Guillaume Abrioux	9b2d55c007	switch-to-containers: set and unset osd flags The workflow in this playbook should be the same as in rolling_update, we should first set noout and nodeep-scrub flags before migrating the first osd and unset osd flags after the last osd is migrated. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `2cfaa056e0`)	2020-04-06 18:05:10 +02:00
Guillaume Abrioux	529d99a691	update: use tasks_from when including ceph-facts When setting/unsetting osd flags, we can use `tasks_from` when importing `ceph-facts` role to save some time given that we only need this role for setting `container_binary`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `6df7887f87`)	2020-04-06 18:05:10 +02:00
Guillaume Abrioux	3b7459b3d9	docker2podman: call `container_options_facts.yml` on osd nodes We must call `ceph-osd` role from `container_options_facts.yml` because ceph-osd-run.sh.j2 needs variables set in this file. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1819681 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4a4f54f6ee`)	2020-04-02 10:17:50 -04:00
Guillaume Abrioux	4a9007ce3c	remove docker.yml symlinks This commit removes these two symlinks. They were there for backward compatibility and were marked deprecated as of stable-4.0. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `9219991441`)	2020-03-31 11:59:20 -04:00
Guillaume Abrioux	5272a0d1fc	purge-container: get all osds id Adding `--all` to the `systemctl list-units` command in order to get all osds id on the node (including stopped osds). Otherwise, it will purge the cluster but there will be leftovers after that. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1814542 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5e7962ccf6`)	2020-03-31 10:58:48 -04:00
Dimitri Savineau	1b094acf24	container: remove ulimit nofile parameter Since Ceph Octopus is python3 only we don't need to specify the max open files anymore with the container engine. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `64701437de`)	2020-03-30 09:22:28 -04:00
Guillaume Abrioux	a94035e957	purge-container: clean legacy code This commit removes a register which isn't used in this playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-12 09:45:12 -04:00
Dimitri Savineau	38a683e5bf	filestore-to-bluestore: stop ceph-volume services We only disable the ceph-osd services but not the ceph-volume lvm services during the filestore to bluestore migration. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-05 17:53:32 -05:00
Dimitri Savineau	d1316ce77b	shrink-rbdmirror: fix presence after removal We should add retry/delay to check the presence of the rbdmirror daemon in the cluster status because the status takes some time to be updated. Also the metadata.hostname isn't a good key to check because it doesn't reflect the ansible_hostname fact. We should use metadata.id instead. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-03 10:32:15 +01:00
Dimitri Savineau	a664159061	shrink-mgr: fix systemd condition This playbook was using mds systemd condition. Also a command task was using pipeline which is not allowed. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-03 10:32:15 +01:00
Dimitri Savineau	08ac2e3034	shrink: don't use localhost node The ceph-facts are running on localhost so if this node is using a different OS/release than the ceph node we can have a mismatch between docker/podman container binary. This commit also reduces the scope of the ceph-facts role because we only need the container_binary tasks. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-03 10:32:15 +01:00
Dimitri Savineau	9d3b49293d	purge: stop rgw instances by iteration It looks like the service module doesn't support wildcards anymore for stopping/disabling multiple services. fatal: [rgw0]: FAILED! => changed=false msg: 'This module does not currently support using glob patterns, found '''' in service name: ceph-radosgw@' ...ignoring Instead we should iterate over the rgw_instances list. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-02 16:32:06 +01:00
Guillaume Abrioux	a084a2a347	common: support OSDs with more than 2 digits When running environment with OSDs having ID with more than 2 digits, some tasks don't match the system units and therefore, playbook can fail. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1805643 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-27 09:48:36 +01:00
Guillaume Abrioux	1de2bf9991	shrink-osd: support shrinking ceph-disk prepared osds This commit adds the ceph-disk prepared osds support Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1796453 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-26 11:45:41 -05:00
Guillaume Abrioux	55970b18f1	shrink-osd: don't run ceph-facts entirely We need to call ceph-facts only for setting `container_binary`. Since this task has been isolated we can use `tasks_from` to only execute the needed task. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-26 11:45:41 -05:00
Dimitri Savineau	535da53d69	filestore-to-bluestore: reuse dedicated journal If the filestore configuration was using a dedicated journal with either a partition or a LV/VG then we need to reuse this for bluestore DB. When filestore is using a raw devices then we shouldn't destroy everything (data + journal) but only data otherwise the journal partition won't exist anymore. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790479 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-25 16:07:21 +01:00
Dimitri Savineau	195944b123	doc: update infra playbooks statements We don't need to copy the infrastructure playbooks in the root ceph-ansible directory. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-25 15:27:52 +01:00
Benoît Knecht	8b3df4e418	infrastructure-playbooks: Run shrink-osd tasks on monitor Instead of running shrink-osd tasks on localhost and delegating most of them to the first monitor, run all of them on the first monitor directly. This has the added advantage of becoming root on the monitor only, not on localhost. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2020-02-19 11:16:30 -05:00
Dimitri Savineau	100e3a044e	purge-cluster: update package list to remove We only support python3 so renaming all ceph python packages. Some ceph packages were missing from the list (ceph-mon, ceph-osd or rbd-mirror) or didn't exist anymore (ceph-fs-common, libcephfs1). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 11:33:15 +01:00
Guillaume Abrioux	3700aa5385	switch_to_containers: increase health check values This commit increases the default values for the following variable consumed in switch-from-non-containerized-to-containerized-ceph-daemons.yml playbook. This also moves these variables in `ceph-defaults` role so the user can set different values if needed. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783223 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-07 14:59:14 -05:00
wujie1993	d8b0b3cbd9	purge: fix purge cluster failed Fix purge cluster failed when local container images does not exist. Purge node-exporter and grafana-server only when dashboard_enabled is set to True. Signed-off-by: wujie1993 qq594jj@gmail.com	2020-01-31 12:09:46 -05:00
Dimitri Savineau	cd76054f76	filestore-to-bluestore: fix undefine osd_fsid_list If the playbook is used on a host running bluestore OSDs then the osd_fsid_list won't be filled because the bluestore OSDs are reported with 'type: block' via ceph-volume lvm list command but we are looking for 'type: data' (filestore). TASK [zap ceph-volume prepared OSDs] ********* fatal: [xxxxx]: FAILED! => msg: '''osd_fsid_list'' is undefined Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-28 02:42:39 +01:00
Dimitri Savineau	83c5a1d7a8	filestore-to-bluestore: skip bluestore osd nodes If the OSD node is already using bluestore OSDs then we should skip all the remaining tasks to avoid purging OSD for nothing. Instead we warn the user. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790472 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-27 18:08:00 +01:00
Dimitri Savineau	a9c2300545	filestore-to-bluestore: don't fail when with no PV When the PV is already removed from the devices then we should not fail to avoid errors like: stderr: No PV found on device /dev/sdb. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-24 20:56:08 +01:00
Guillaume Abrioux	e5812fe45b	rolling_update: support upgrading 3.x + ceph-metrics on a dedicated node When upgrading from RHCS 3.x where ceph-metrics was deployed on a dedicated node to RHCS 4.0, it fails like following: ``` fatal: [magna005]: FAILED! => changed=false gid: 0 group: root mode: '0755' msg: 'chown failed: failed to look up user ceph' owner: root path: /etc/ceph secontext: unconfined_u:object_r:etc_t:s0 size: 4096 state: directory uid: 0 ``` because we are trying to run `ceph-config` on this node, it doesn't make sense so we should simply run this play on all groups except `[grafana-server]`. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1793885 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-22 11:29:36 -05:00
Dimitri Savineau	bb3eae0c80	filestore-to-bluestore: fix osd_auto_discovery When osd_auto_discovery is set then we need to refresh the ansible_devices fact after the filestore OSD purge otherwise the devices fact won't be populated. Also remove the gpt header on ceph_disk_osds_devices because the devices is empty at this point for osd_auto_discovery. Adding the bool filter when needed. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-22 09:36:09 +01:00
Dimitri Savineau	f995b079a6	filestore-to-bluestore: --destroy with raw devices We still need --destroy when using a raw device otherwise we won't be able to recreate the lvm stack on that device with bluestore. Running command: /usr/sbin/vgcreate -s 1G --force --yes ceph-bdc67a84-894a-4687-b43f-bcd76317580a /dev/sdd stderr: Physical volume '/dev/sdd' is already in volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151' Unable to add physical volume '/dev/sdd' to volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151' /dev/sdd: physical volume not initialized. --> Was unable to complete a new OSD, will rollback changes Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792227 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-21 11:37:39 -05:00
Guillaume Abrioux	3e262e072b	containers: use --cpus instead --cpu-quota When using docker 1.13.1, the current condition: ``` {% if (container_binary == 'docker' and ceph_docker_version.split('.')[0] is version_compare('13', '>=')) or container_binary == 'podman' -%} ``` is wrong because it compares the first digit (1) whereas it should compare the second one. It means we always use `--cpu-quota` although the documentation recommends using `--cpus` when docker version is 1.13.1 or higher. From the doc: > --cpu-quota=<value> Impose a CPU CFS quota on the container. The number of > microseconds per --cpu-period that the container is limited to before > throttled. As such acting as the effective ceiling. > If you use Docker 1.13 or higher, use --cpus instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-16 13:51:43 -05:00
Guillaume Abrioux	3d0898aa5d	shrink-mds: fix condition on fs deletion the new ceph status registered in `ceph_status` will report `fsmap.up` = 0 when it's the last mds given that it's done after we shrink the mds, it means the condition is wrong. Also adding a condition so we don't try to delete the fs if a standby node is going to rejoin the cluster. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787543 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-15 10:40:14 +01:00
Guillaume Abrioux	d853da2a68	update: remove legacy This task is a code duplicate, probably a legacy, let's remove it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-13 15:18:45 -05:00
Guillaume Abrioux	3496a0efa2	osd: support scaling up using --limit This commit leaves add-osd.yml in place but marks the playbook as deprecated. Scaling up OSDs is now possible using --limit Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-13 09:59:08 -05:00
Guillaume Abrioux	b0c491800a	docker2podman: use set_fact to override variables play vars have lower precedence than role vars and `set_fact`. We must use a `set_fact` to reset these variables. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-10 10:21:29 -05:00
Guillaume Abrioux	1c2ec9fb40	docker2podman: force systemd to reload config This is needed after a change is made in systemd unit files. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-10 10:21:29 -05:00
Guillaume Abrioux	d746575fd0	docker2podman: install podman This commit adds a package installation task in order to install podman during the docker-to-podman.yml migration playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-10 10:21:29 -05:00
Dimitri Savineau	a09d1c38bf	purge-iscsi-gateways: don't run all ceph-facts We only need to have the container_binary fact. Because we're not gathering the facts from all nodes then the purge fails trying to get one of the grafana fact. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786686 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-10 15:46:15 +01:00
Dimitri Savineau	3f344fdefe	rolling_update: run registry auth before upgrading There are some tasks using the new container image during the rolling upgrade playbook that need to execute the registry login first otherwise the nodes won't be able to pull the container image. Unable to find image 'xxx.io/foo/bar:latest' locally Trying to pull repository xxx.io/foo/bar ... /usr/bin/docker-current: Get https://xxx.io/v2/foo/bar/manifests/latest: unauthorized Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-09 16:14:33 -05:00
Dimitri Savineau	747555dfa6	shrink-rgw: refact global workflow Instead of running the ceph roles against localhost we should do it on the first mon. The ansible and inventory hostname of the rgw nodes could be different. Ensure that the rgw instance to remove is present in the cluster. Fix rgw service and directory path. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-09 19:02:17 +01:00
Guillaume Abrioux	0ae0a9ce28	shrink-mds: do not play ceph-facts entirely We only need to set `container_binary`. Let's use `tasks_from` option. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-08 10:39:27 -05:00
Guillaume Abrioux	77b39d235b	shrink-mds: use fact from delegated node The command is delegated on the first monitor so we must use the fact `container_binary` from this node. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-08 10:06:43 -05:00
Guillaume Abrioux	38278a6bb5	shrink-mds: fix filesystem removal task This commit deletes the filesystem when no more MDS is present after shrinking operation. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787543 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-08 10:06:43 -05:00
Guillaume Abrioux	2cfe5a04bf	shrink-mds: ensure max_mds is always honored This commit prevents shrinking an mds node when max_mds wouldn't be honored after that operation. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-08 10:06:43 -05:00
Dimitri Savineau	931a842f21	purge-iscsi-gateways: remove node from dashboard When using the ceph dashboard with iscsi gateways nodes we also need to remove the nodes from the ceph dashboard list. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786686 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-08 14:17:56 +01:00
Dimitri Savineau	42366f0a6c	purge-container-cluster: prune exited containers Remove all stopped/exited containers. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-08 11:13:46 +01:00
Guillaume Abrioux	e665d8e239	tests: upgrade from octopus to octopus on master we can't test upgrade from stable-4.0/CentOS 7 to master/CentOS 8. This commit refactors the upgrade so we test upgrade from master/CentOS 8 to master/CentOS 8 (octopus to octopus) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-08 11:13:46 +01:00
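
Commit 3e262e072b in the table above describes a version check that looks only at the first dotted component of the docker version, so `1.13.1` is treated as older than 13. The following Python sketch reproduces that pitfall and shows one possible way to compare versions correctly. The function names `legacy_check` and `supports_cpus` are hypothetical and not taken from ceph-ansible, and the actual fix applied in the commit is not shown in this log.

```python
def legacy_check(docker_version: str) -> bool:
    """Mimic the flawed condition: compare only the first component with 13."""
    return int(docker_version.split(".")[0]) >= 13


def supports_cpus(docker_version: str) -> bool:
    """Compare (major, minor) as integers against (1, 13).

    Assumption for this sketch: the version string starts with
    'MAJOR.MINOR', e.g. '1.13.1' or '18.09.1'.
    """
    major, minor = (int(part) for part in docker_version.split(".")[:2])
    return (major, minor) >= (1, 13)


if __name__ == "__main__":
    for version in ("1.12.6", "1.13.1", "18.09.1"):
        print(version, legacy_check(version), supports_cpus(version))
```

With docker 1.13.1 the legacy check returns `False` (so `--cpu-quota` would be chosen), while the tuple comparison returns `True`.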

578 Commits (c6e60db2fb942420e30c24fcda32d3ebf449d367)
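
Commits b219b1abed and a084a2a347 both concern matching OSD systemd units: the LOAD and ACTIVE columns may be separated by more than one space, and OSD IDs may have more than two digits. The Python sketch below illustrates a pattern that tolerates both cases. The regular expression, the function name `active_osd_ids`, and the sample listing are assumptions made for illustration only. They are not the expression used in the playbooks and not real `systemctl` output.

```python
import re
from typing import List

# Hypothetical pattern: any number of digits in the OSD id, and any amount of
# whitespace between the unit name, the LOAD column and the ACTIVE column.
UNIT_RE = re.compile(r"^\s*ceph-osd@(\d+)\.service\s+loaded\s+active\b")


def active_osd_ids(listing: str) -> List[int]:
    """Return the ids of OSD units reported as loaded and active."""
    ids = []
    for line in listing.splitlines():
        match = UNIT_RE.match(line)
        if match:
            ids.append(int(match.group(1)))
    return ids


# Made-up sample resembling `systemctl list-units` columns.
sample = """\
ceph-osd@3.service     loaded    active running Ceph object storage daemon osd.3
ceph-osd@127.service   loaded active   running Ceph object storage daemon osd.127
ceph-osd@9.service     loaded inactive dead    Ceph object storage daemon osd.9
"""

if __name__ == "__main__":
    print(active_osd_ids(sample))  # [3, 127]
```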