ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	8f9cdf4b10	rolling_update: add any_errors_fatal If a failure occurs in ceph-validate, the upgrade playbook keeps running where we expect it to fail. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-06-29 12:58:53 -04:00
Dimitri Savineau	548ff26256	Add playbook for converting cluster to cephadm The commit adds a new playbook for converting an existing ceph cluster deployed by ceph-ansible to the cephadm orchestrator. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-06-29 09:21:38 -04:00
Guillaume Abrioux	37b20b6525	docker2podman: make images pulling optional This commit makes the images pulling skipped if podman isn't installed on the machine. In OSP context, the podman installation is done later in the workflow, it means all `podman pull` commands will fail. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1849559 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-06-22 12:19:38 -04:00
Dimitri Savineau	829990e60d	ceph-osd: remove ceph-osd-run.sh script Since we only have one scenario since nautilus then we can just move the container start command from ceph-osd-run.sh to the systemd unit service. As a result, the ceph-osd-run.sh.j2 template and the ceph_osd_docker_run_script_path variable are removed. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-06-18 17:51:13 +02:00
Guillaume Abrioux	b91d60d384	switch_to_containers: don't set noup flag We shouldn't set this flag when running switch_to_containers playbook. Otherwise the playbook fails waiting for pgs to be clean. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843569 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-06-17 01:32:18 +02:00
Dimitri Savineau	50140c9b5d	switch_to_container: fix osd systemd regex The systemd LOAD and ACTIVE fileds could have more than one space between both values. This update the systemd regex the same way we're using it in different part of the code. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843500 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-06-16 17:04:06 +02:00
Guillaume Abrioux	8aed824f71	switch_to_container: refact wait for pg check There is no need to make this check with several steps. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-05-16 07:31:57 +02:00
Dimitri Savineau	252e78b4e4	docker2podman: manage dashboard nodes The dashboard nodes (alertmanager, grafana, node-exporter, and prometheus) were not manage during the docker to podman migration. This adds the systemd container template of those services to a dedicated file (systemd.yml) in order to include it in the docker2podman playbook. This also adds the dashboard container images pull from docker to podman. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1829389 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-05-13 12:02:00 +02:00
Dimitri Savineau	d38f21aeba	docker2podman: pull images from docker daemon The docker2podman playbook only installs the podman package and updates the systemd units with the right container_binary value. We never pull the container image so if one service is restarted then the container image will be pulled first before the service can start which could cause longer downstream. To avoid to download the container image from internet again we can just pull it from the local docker daemon. The container_{binding,package,service}_name variables are removed because they are only used in the ceph-container-engine role which isn't call in this playbook. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-05-13 12:02:00 +02:00
Dimitri Savineau	c0a213f928	rolling_update: fix rbdmirror group name The rbdmirror group name was using the wrong variable definition. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-05-13 11:57:42 +02:00
Dimitri Savineau	2b9edba131	filestore-to-bluestore: fix py2 on skipped tasks When using skipped variables with from_json filter and python2 then we need to have a default value otherwise the skipped task will fail. Unexpected templating type error occurred on ({{ (ceph_volume_lvm_list.stdout \| from_json) }}): expected string or buffer Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790472 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-04-20 16:19:18 +02:00
Dimitri Savineau	7b620a22bc	rolling_update: require_osd_release pacific Since [1] we need to set pacific for the required OSD release during the upgrade. [1] https://github.com/ceph/ceph/commit/cc99c3bc Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-04-16 15:32:00 -04:00
Guillaume Abrioux	2cfaa056e0	switch-to-containers: set and unset osd flags The workflow in this playbook should be the same than in rolling_update, we should first set noout and nodeep-scrub flags before migrating the first osd and unset osd flags after the last osd is migrated. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-04-06 17:00:00 +02:00
Guillaume Abrioux	6df7887f87	update: use tasks_from when including ceph-facts When setting/unsetting osd flags, we can use `tasks_from` when importing `ceph-facts` role to save some times given that we only need this role for setting `container_binary` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-04-06 17:00:00 +02:00
Guillaume Abrioux	4a4f54f6ee	docker2podman: call `container_options_facts.yml` on osd nodes We must call `ceph-osd` role from `container_options_facts.yml` because ceph-osd-run.sh.j2 needs variables set in this file. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1819681 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-04-02 07:56:15 +02:00
Guillaume Abrioux	9219991441	remove docker.yml symlinks This commits removes these two symlinks. They were there for backward compatibility and were marked deprecated as of stable-4.0 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-31 10:03:22 -04:00
Guillaume Abrioux	5e7962ccf6	purge-container: get all osds id Adding `--all` to the `systemctl list-units` command in order to get all osds id on the node (including stoppped osds). Otherwise, it will purge the cluster but there will be leftover after that. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1814542 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-31 09:37:30 -04:00
Dimitri Savineau	64701437de	container: remove ulimit nofile parameter Since Ceph Octopus is python3 only we don't need to specify the max open files anymore with the container engine. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-30 09:54:23 +02:00
Guillaume Abrioux	a94035e957	purge-container: clean legacy code This commit removes a register which isn't used in this playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-12 09:45:12 -04:00
Dimitri Savineau	38a683e5bf	filestore-to-bluestore: stop ceph-volume services We only disable the ceph-osd services but not the ceph-volume lvm services during the filestore to bluestore migration. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-05 17:53:32 -05:00
Dimitri Savineau	d1316ce77b	shrink-rbdmirror: fix presence after removal We should add retry/delay to check the presence of the rbdmirror daemon in the cluster status because the status takes some time to be updated. Also the metadata.hostname isn't a good key to check because it doesn't reflect the ansible_hostname fact. We should use metadata.id instead. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-03 10:32:15 +01:00
Dimitri Savineau	a664159061	shrink-mgr: fix systemd condition This playbook was using mds systemd condition. Also a command task was using pipeline which is not allowed. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-03 10:32:15 +01:00
Dimitri Savineau	08ac2e3034	shrink: don't use localhost node The ceph-facts are running on localhost so if this node is using a different OS/release that the ceph node we can have a mismatch between docker/podman container binary. This commit also reduces the scope of the ceph-facts role because we only need the container_binary tasks. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-03 10:32:15 +01:00
Dimitri Savineau	9d3b49293d	purge: stop rgw instances by iteration It looks like that the service module doesn't support wildcard anymore for stopping/disabling multiple services. fatal: [rgw0]: FAILED! => changed=false msg: 'This module does not currently support using glob patterns, found '''' in service name: ceph-radosgw@' ...ignoring Instead we should iterate over the rgw_instances list. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-02 16:32:06 +01:00
Guillaume Abrioux	a084a2a347	common: support OSDs with more than 2 digits When running environment with OSDs having ID with more than 2 digits, some tasks don't match the system units and therefore, playbook can fail. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1805643 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-27 09:48:36 +01:00
Guillaume Abrioux	1de2bf9991	shrink-osd: support shrinking ceph-disk prepared osds This commit adds the ceph-disk prepared osds support Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1796453 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-26 11:45:41 -05:00
Guillaume Abrioux	55970b18f1	shrink-osd: don't run ceph-facts entirely We need to call ceph-facts only for setting `container_binary`. Since this task has been isolated we can use `tasks_from` to only execute the needed task. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-26 11:45:41 -05:00
Dimitri Savineau	535da53d69	filestore-to-bluestore: reuse dedicated journal If the filestore configuration was using a dedicated journal with either a partition or a LV/VG then we need to reuse this for bluestore DB. When filestore is using a raw devices then we shouldn't destroy everything (data + journal) but only data otherwise the journal partition won't exist anymore. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790479 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-25 16:07:21 +01:00
Dimitri Savineau	195944b123	doc: update infra playbooks statements We don't need to copy the infrastructure playbooks in the root ceph-ansible directory. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-25 15:27:52 +01:00
Benoît Knecht	8b3df4e418	infrastructure-playbooks: Run shrink-osd tasks on monitor Instead of running shring-osd tasks on localhost and delegating most of them to the first monitor, run all of them on the first monitor directly. This has the added advantage of becoming root on the monitor only, not on localhost. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2020-02-19 11:16:30 -05:00
Dimitri Savineau	100e3a044e	purge-cluster: update package list to remove We only support python3 so renaming all ceph python packages. Some ceph packages were missing from the list (ceph-mon, ceph-osd or rbd-mirror) or didn't exist anymore (ceph-fs-common, libcephfs1). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 11:33:15 +01:00
Guillaume Abrioux	3700aa5385	switch_to_containers: increase health check values This commit increases the default values for the following variable consumed in switch-from-non-containerized-to-containerized-ceph-daemons.yml playbook. This also moves these variables in `ceph-defaults` role so the user can set different values if needed. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783223 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-07 14:59:14 -05:00
wujie1993	d8b0b3cbd9	purge: fix purge cluster failed Fix purge cluster failed when local container images does not exist. Purge node-exporter and grafana-server only when dashboard_enabled is set to True. Signed-off-by: wujie1993 qq594jj@gmail.com	2020-01-31 12:09:46 -05:00
Dimitri Savineau	cd76054f76	filestore-to-bluestore: fix undefine osd_fsid_list If the playbook is used on a host running bluestore OSDs then the osd_fsid_list won't be filled because the bluestore OSDs are reported with 'type: block' via ceph-volume lvm list command but we are looking for 'type: data' (filestore). TASK [zap ceph-volume prepared OSDs] ********* fatal: [xxxxx]: FAILED! => msg: '''osd_fsid_list'' is undefined Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-28 02:42:39 +01:00
Dimitri Savineau	83c5a1d7a8	filestore-to-bluestore: skip bluestore osd nodes If the OSD node is already using bluestore OSDs then we should skip all the remaining tasks to avoid purging OSD for nothing. Instead we warn the user. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790472 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-27 18:08:00 +01:00
Dimitri Savineau	a9c2300545	filestore-to-bluestore: don't fail when with no PV When the PV is already removed from the devices then we should not fail to avoid errors like: stderr: No PV found on device /dev/sdb. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-24 20:56:08 +01:00
Guillaume Abrioux	e5812fe45b	rolling_update: support upgrading 3.x + ceph-metrics on a dedicated node When upgrading from RHCS 3.x where ceph-metrics was deployed on a dedicated node to RHCS 4.0, it fails like following: ``` fatal: [magna005]: FAILED! => changed=false gid: 0 group: root mode: '0755' msg: 'chown failed: failed to look up user ceph' owner: root path: /etc/ceph secontext: unconfined_u:object_r:etc_t:s0 size: 4096 state: directory uid: 0 ``` because we are trying to run `ceph-config` on this node, it doesn't make sense so we should simply run this play on all groups except `[grafana-server]`. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1793885 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-22 11:29:36 -05:00
Dimitri Savineau	bb3eae0c80	filestore-to-bluestore: fix osd_auto_discovery When osd_auto_discovery is set then we need to refresh the ansible_devices fact between after the filestore OSD purge otherwise the devices fact won't be populated. Also remove the gpt header on ceph_disk_osds_devices because the devices is empty at this point for osd_auto_discovery. Adding the bool filter when needed. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-22 09:36:09 +01:00
Dimitri Savineau	f995b079a6	filestore-to-bluestore: --destroy with raw devices We still need --destroy when using a raw device otherwise we won't be able to recreate the lvm stack on that device with bluestore. Running command: /usr/sbin/vgcreate -s 1G --force --yes ceph-bdc67a84-894a-4687-b43f-bcd76317580a /dev/sdd stderr: Physical volume '/dev/sdd' is already in volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151' Unable to add physical volume '/dev/sdd' to volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151' /dev/sdd: physical volume not initialized. --> Was unable to complete a new OSD, will rollback changes Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792227 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-21 11:37:39 -05:00
Guillaume Abrioux	3e262e072b	containers: use --cpus instead --cpu-quota When using docker 1.13.1, the current condition: ``` {% if (container_binary == 'docker' and ceph_docker_version.split('.')[0] is version_compare('13', '>=')) or container_binary == 'podman' -%} ``` is wrong because it compares the first digit (1) whereas it should compare the second one. It means we always use `--cpu-quota` although documentation recommend using `--cpus` when docker version is 1.13.1 or higher. From the doc: > --cpu-quota=<value> Impose a CPU CFS quota on the container. The number of > microseconds per --cpu-period that the container is limited to before > throttled. As such acting as the effective ceiling. > If you use Docker 1.13 or higher, use --cpus instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-16 13:51:43 -05:00
Guillaume Abrioux	3d0898aa5d	shrink-mds: fix condition on fs deletion the new ceph status registered in `ceph_status` will report `fsmap.up` = 0 when it's the last mds given that it's done after we shrink the mds, it means the condition is wrong. Also adding a condition so we don't try to delete the fs if a standby node is going to rejoin the cluster. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787543 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-15 10:40:14 +01:00
Guillaume Abrioux	d853da2a68	update: remove legacy This task is a code duplicate, probably a legacy, let's remove it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-13 15:18:45 -05:00
Guillaume Abrioux	3496a0efa2	osd: support scaling up using --limit This commit lets add-osd.yml in place but mark the deprecation of the playbook. Scaling up OSDs is now possible using --limit Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-13 09:59:08 -05:00
Guillaume Abrioux	b0c491800a	docker2podman: use set_fact to override variables play vars have lower precedence than role vars and `set_fact`. We must use a `set_fact` to reset these variables. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-10 10:21:29 -05:00
Guillaume Abrioux	1c2ec9fb40	docker2podman: force systemd to reload config This is needed after a change is made in systemd unit files. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-10 10:21:29 -05:00
Guillaume Abrioux	d746575fd0	docker2podman: install podman This commit adds a package installation task in order to install podman during the docker-to-podman.yml migration playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-10 10:21:29 -05:00
Dimitri Savineau	a09d1c38bf	purge-iscsi-gateways: don't run all ceph-facts We only need to have the container_binary fact. Because we're not gathering the facts from all nodes then the purge fails trying to get one of the grafana fact. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786686 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-10 15:46:15 +01:00
Dimitri Savineau	3f344fdefe	rolling_update: run registry auth before upgrading There's some tasks using the new container image during the rolling upgrade playbook that needs to execute the registry login first otherwise the nodes won't be able to pull the container image. Unable to find image 'xxx.io/foo/bar:latest' locally Trying to pull repository xxx.io/foo/bar ... /usr/bin/docker-current: Get https://xxx.io/v2/foo/bar/manifests/latest: unauthorized Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-09 16:14:33 -05:00
Dimitri Savineau	747555dfa6	shrink-rgw: refact global workflow Instead of running the ceph roles against localhost we should do it on the first mon. The ansible and inventory hostname of the rgw nodes could be different. Ensure that the rgw instance to remove is present in the cluster. Fix rgw service and directory path. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-09 19:02:17 +01:00
Guillaume Abrioux	0ae0a9ce28	shrink-mds: do not play ceph-facts entirely We only need to set `container_binary`. Let's use `tasks_from` option. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-08 10:39:27 -05:00
Guillaume Abrioux	77b39d235b	shrink-mds: use fact from delegated node The command is delegated on the first monitor so we must use the fact `container_binary` from this node. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-08 10:06:43 -05:00
Guillaume Abrioux	38278a6bb5	shrink-mds: fix filesystem removal task This commit deletes the filesystem when no more MDS is present after shrinking operation. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787543 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-08 10:06:43 -05:00
Guillaume Abrioux	2cfe5a04bf	shrink-mds: ensure max_mds is always honored This commit prevent from shrinking an mds node when max_mds wouldn't be honored after that operation. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-08 10:06:43 -05:00
Dimitri Savineau	931a842f21	purge-iscsi-gateways: remove node from dashboard When using the ceph dashboard with iscsi gateways nodes we also need to remove the nodes from the ceph dashboard list. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786686 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-08 14:17:56 +01:00
Dimitri Savineau	42366f0a6c	purge-container-cluster: prune exited containers Remove all stopped/exited containers. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-08 11:13:46 +01:00
Guillaume Abrioux	e665d8e239	tests: upgrade from octopus to octopus on master we can't test upgrade from stable-4.0/CentOS 7 to master/CentOS 8. This commit refact the upgrade so we test upgrade from master/CentOS 8 to master/CentOS 8 (octopus to octopus) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-08 11:13:46 +01:00
Guillaume Abrioux	8056514134	filestore-to-bluestore: umount partitions before zapping them When an OSD is stopped, it leaves partitions mounted. We must umount them before zapping them, otherwise error like "Device is busy" will show up. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-19 09:22:25 +01:00
Guillaume Abrioux	8e6ef818a2	filestore-to-bluestore: ensure all dm are closed This commit adds a task to ensure device mappers are well closed when lvm batch scenario is used. Otherwise, OSDs can't be redeployed given that devices that are rejected by ceph-volume because they are locked. Adding a condition `devices \| default([]) \| length > 0` to remove these dm only when using lvm batch scenario. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-11 09:04:41 -05:00
Guillaume Abrioux	51d601193e	filestore-to-bluestore: force OSDs to be marked down Otherwise, sometimes it can take a while for an OSD to be seen as down and causes the `ceph osd purge` command to fail. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-11 09:04:41 -05:00
Guillaume Abrioux	e3305e6bb6	filestore-to-bluestore: do not use --destroy Do not use `--destroy` when zapping a device. Otherwise, it destroys VGs while they are still needed to redeploy the OSDs. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-11 09:04:41 -05:00
Guillaume Abrioux	4833b85e04	filestore-to-bluestore: add non containerized support This commit adds the non containerized context support to the filestore-to-bluestore.yml infrastructure playbook. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-11 09:04:41 -05:00
Guillaume Abrioux	6d9ca6b05b	shrink-osd: support fqdn in inventory When using fqdn in inventory, that playbook fails because of some tasks using the result of ceph osd tree (which returns shortname) to get some datas in hostvars[]. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779021 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-09 10:52:38 -05:00
Guillaume Abrioux	332c39376b	switch_to_containers: exclude clients nodes from facts gathering just like site.yml and rolling_update, let's exclude clients node from the fact gathering. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-09 10:49:13 -05:00
Guillaume Abrioux	c7708eb458	update: restart iscsigws daemons after upgrade In containerized context, containers aren't stopped early in the sequence. It means they aren't restarted after the upgrade because the task is just checking the daemon status is started (eg: `state: started`). This commit also removes the task which ensure services are started because it's already done in the role ceph-iscsigw. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-05 13:02:06 -05:00
Guillaume Abrioux	451c5ca934	upgrade: add dashboard deployment when upgrading from RHCS 3, dashboard has obviously never been deployed and it forces us to deploy it later manually. This commit adds the dashboard deployment as part of the upgrade to RHCS 4. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779092 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-05 13:02:06 -05:00
Dimitri Savineau	89f6cc54a2	purge-cluster: add podman support The podman support was added to the purge-container-cluster playbook but containers are always used for the dashboard even on non containerized deployment. This commits adds the podman support on purging the dashboard resources in the purge-cluster playbook. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-12-04 14:15:12 -05:00
Guillaume Abrioux	f5a81b1790	purge: fix symlink to purge-container-cluster ceph/ceph-ansible#4805 introduced a symlink to purge-container-cluster.yml playbook which is broken. This commit fixes it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-04 09:38:34 +01:00
Guillaume Abrioux	7bc7e3669d	purge: rename playbook (container) Since we now support podman, let's rename the playbook so it's more generic. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-03 11:10:21 -05:00
Guillaume Abrioux	b18476a1a6	purge: do not try to stop docker when binary is podman If the container binary is podman, we shouldn't try to stop docker here. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-03 13:29:52 +01:00
Guillaume Abrioux	fe5ffe589e	facts: isolate container_binary facts in order to be able to call container_binary without having to run the whole ceph-facts role. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-03 13:29:52 +01:00
Guillaume Abrioux	d23383a820	purge: remove docker_* task All containers are removed when systemd stops them. There is no need to call this module in purge container playbook. This commit also removes all docker_image task and remove all container images in the final cleanup play. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1776736 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-03 13:29:52 +01:00
Guillaume Abrioux	a43a872105	docker2podman: import ceph-handler role This is needed to avoid following error: ``` ERROR! The requested handler 'restart ceph mons' was not found in either the main handlers list nor in the listening handlers list ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-02 09:11:12 -05:00
Guillaume Abrioux	7fe0d55eff	docker2podman: do not hardcode group name let's use `client_group_name` instead of hardcoding the name. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-02 09:11:12 -05:00
Guillaume Abrioux	6526a25ab5	docker2podman: import ceph-defaults in first play We must import this role in the first play otherwise the first call to `client_group_name`fails. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-02 09:11:12 -05:00
Dimitri Savineau	39cfe0aa65	switch_to_containers: fix umount ceph partitions When a container is already running on a non containerized node then the umount ceph partition task is skipped. This is due to the container ps command which always returns 0 even if the filter matches nothing. We should run the umount task when: 1/ the container command is failing (not installed) : rc != 0 2/ the container command reports running ceph-osd containers : rc == 0 Also we should not fail on the ceph directory listing. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-12-02 09:19:50 +01:00
Guillaume Abrioux	0441812959	purge/update: remove backward compatibility legacy This was introduced in 3.1 and marked as deprecation We can definitely drop it in stable-4.0 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-11-27 10:27:43 -05:00
Guillaume Abrioux	c878e99589	update: only run post osd upgrade play on 1 mon There is no need to run these tasks n times from each monitor. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-11-20 09:22:19 -05:00
Guillaume Abrioux	548db78b95	update: use flags noout and nodeep-scrub only 1. set noout and nodeep-scrub flags, 2. upgrade each OSD node, one by one, wait for active+clean pgs 3. after all osd nodes are upgraded, unset flags Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-authored-by: Rachana Patel <racpatel@redhat.com>	2019-11-20 09:22:19 -05:00
Guillaume Abrioux	206ee589d6	update: reset flags before and after each osd node upgrade It might be possible at some point even with osd flags `noout` and `norebalance` set the PGs states can change depending on the amount of data written meantime. It means the check for PGs state will fail. This commit changes the way we set those flags: we set them before an OSD node upgrade and unset them before the PGs state check so they can recover. Fixes: #3961 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-11-08 09:10:52 -05:00
Guillaume Abrioux	3cfcc7a105	purge: use sysfs to unmap rbd devices in containerized context, using the binary provided in atomic os won't work because it's an old version provided by ceph-common based on 10.2.5. Using a container could be an idea but for large cluster with hundreds of client nodes, that would require to pull the image of each of them just to unmap the rbd devices. Let's use the sysfs method in order to avoid any issue related to ceph version that is shipped on the host. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766064 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-11-08 09:06:11 -05:00
Dimitri Savineau	34b03d1873	add-{mon,osd}: run raw install python tasks If the new mon/osd node doesn't have python installed then we need to execute the tasks from raw_install_python.yml. Closes: #4368 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-07 14:04:26 +01:00
Guillaume Abrioux	e9823f319b	update: add default values when setting fact This commit adds a default value in the `with_dict` because when using python 2.7, if a task using a `with_dict` has a condition, it is evaluated anyway whereas in python 3 it isn't. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766499 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-29 14:45:28 -04:00
Dimitri Savineau	2ca79fcc99	rolling_update: remove default filter on mds group There's no need to use the default filter on active/standby groups because if the group doesn't exist then the play is just skipped. Currently this generates warnings like: [WARNING]: Could not match supplied host pattern, ignoring: \| [WARNING]: Could not match supplied host pattern, ignoring: default([]) Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-28 15:02:50 +01:00
Dimitri Savineau	f1f2352c79	rolling_update: fix active mds host value The active mds host should be based on the inventory hostname and not on the ansible hostname. The value returns under the mdsmap structure is based on the OS hostname so we need to find the right node in the inventory with this value when doing operation on inventory nodes. Othewise we could see error like: The task includes an option with an undefined variable. The error was: "hostvars[foobar]" is undefined Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-28 15:02:50 +01:00
Dimitri Savineau	77b212833e	add-mon: add missing become flag Without the become flag set to true, we can't executed the roles successfully. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-25 23:05:25 +02:00
Dimitri Savineau	650bc0c3f0	rolling_update: fix reset mon_host variable mon_host should use the inventory hostname and not the node hostname. Fix creates an issue when the inventory and node hostname are different. Closes: #4670 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-25 23:04:35 +02:00
Dimitri Savineau	bfb1d6be12	add-{mon,osd}: add ceph-container-engine role The ceph-container-engine role is missing from both playbooks so the container engine (docker, podman) isn't install resulting in a failure on the added nodes. fatal: [xxxxx]: FAILED! => changed=false cmd: docker --version msg: '[Errno 2] No such file or directory' rc: 2 Closes: #4634 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-24 16:24:02 -04:00
Guillaume Abrioux	d06057ebd2	update: use right node when creating active mds group This must be consistent with what is used in `name` parameter. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-24 15:15:51 -04:00
Guillaume Abrioux	1122da7f4a	update: avoid skipping single mds deployment upgrade otherwise a single MDS would never be updated. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-24 09:28:36 +02:00
Guillaume Abrioux	5ec906c3af	update: skip mds deactivation when no mds in inventory Let's skip this part of the code if there's no mds node in the inventory. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-23 11:06:13 -04:00
Guillaume Abrioux	8d72ff8e5e	update: add missing quotes Add missing quote in order to keep consistency. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-21 09:19:34 -04:00
Guillaume Abrioux	25b98b2ce3	tests: add multimds coverage This commit makes the all_daemons scenario deploying 3 mds in order to cover the multimds case. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-18 13:43:13 -04:00
Guillaume Abrioux	c4fc8cc878	upgrade: fix standby_mdss group creation This commit fixes the standby_mdss group creation by using `{{ item }}`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-18 13:43:13 -04:00
Dimitri Savineau	8426856262	Move the dashboard playbook in the main directory The [group\|host]_vars directories are ignored for the dashboard playbook when the inventory file directory doesn't contain those directories. Closes: #4601 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1761612 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-17 21:54:09 -04:00
Guillaume Abrioux	71cebf80a6	update: follow new recommandation to upgrade mds cluster Refact the mds cluster upgrade code in order to follow the documented recommandation. See: https://github.com/ceph/ceph/blob/master/doc/cephfs/upgrading.rst Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1569689 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-16 11:23:12 -04:00
Dimitri Savineau	68a3dac7cd	Execute common roles once on all nodes The common roles don't need to be executed again on each group plays (like mons, osds, etc..). We only need to execute them during the first play. That wat, we will apply the changes on all nodes in parallel instead of doing it once per group. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-16 15:31:24 +02:00
Dimitri Savineau	5ae7304ace	dashboard: disable facts gathering This is already done in the main playbooks but absent in the dashboard playbook. The facts are already gathered during the first play of the main playbooks so we don't need to doing twice. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-11 14:51:33 -04:00
Guillaume Abrioux	0b245bd007	dashboard: if no host is available, let's just skip these plays. If there is no host available, let's just skip these plays. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1759917 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-09 13:27:09 -04:00
Dimitri Savineau	19edf707a5	switch_to_containers: umount osd lockbox partition When switching from a baremetal deployment to a containerized deployment we only umount the OSD data partition. If the OSD is encrypted (dmcrypt: true) then there's an additional partition (part number 5) used for the lockbox and mount in the /var/lib/ceph/osd-lockbox/ directory. Because this partition isn't umount then the containerized OSD aren't able to start. The partition is still mount by the system and can't be remount from the container. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-08 00:45:52 +02:00
Guillaume Abrioux	fa9b42e98e	switch_to_containers: do not re-set `ceph_uid` This commit refacts the way we set `ceph_uid` fact in `ceph-facts` and removes all `set_fact` tasks for `ceph_uid` in switch-to-containers playbook to avoid duplicated code. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-07 14:15:56 +02:00
Guillaume Abrioux	c5d0c90bb7	switch_to_containers: optimize ownership change As per https://github.com/ceph/ceph-ansible/pull/4323#issuecomment-538420164 using `find` command should be faster. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1757400 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-Authored-by: Giulio Fidente <gfidente@redhat.com>	2019-10-07 14:15:56 +02:00
Guillaume Abrioux	8138d4193c	update: import ceph-defaults role in first play Typical error: ``` fatal: [mon0]: FAILED! => msg: \|- The conditional check 'not delegate_facts_host \| bool or inventory_hostname in groups.get(client_group_name, [])' failed. The error was: error while evaluating conditional (not delegate_facts_host \| bool or inventory_hostname in groups.get(client_group_name, [])): 'client_group_name' is undefined ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-07 09:00:38 +02:00
Guillaume Abrioux	865d2eac9b	main: exclude client nodes from facts gathering when delegate_facts_host This commit excludes client nodes from facts gathering, they are not needed and can speed up this task. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-07 09:00:38 +02:00
Dimitri Savineau	cf47594b47	dashboard: remove useless block section The block section were used with the dashboard_enabled condition when the code was included in the main playbooks. Because this condition isn't present in the dashboard playbook anymore we can remove the block section. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-04 07:42:05 +02:00
Guillaume Abrioux	e08194dd67	rgw: refact tasks directory layout This commit moves containerized deployment related files to `./tasks/` directory. This is needed to make `docker-to-podman.yml` working since we use `tasks_from:` option. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-01 10:27:51 -04:00
Guillaume Abrioux	c69816c6b7	rbdmirror: refact tasks directory layout This commit moves containerized deployment related files to `./tasks/` directory. This is needed to make `docker-to-podman.yml` working since we use `tasks_from:` option. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-01 10:27:51 -04:00
Guillaume Abrioux	4636f3f7e2	iscsigw: refact tasks directory layout This commit moves containerized deployment related files to `./tasks/ directory. This is needed to make `docker-to-podman.yml` working since we use `tasks_from:` option. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-01 10:27:51 -04:00
Guillaume Abrioux	f2017dcda2	upgrade: add an infra playbook to migrate systemd units to podman this commit adds a new playbook to force systemd units for containers to use podman instead of docker. This is needed in the rhel8 upgrade context so after the base OS is upgraded containers can be started using podman. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-01 10:27:51 -04:00
Guillaume Abrioux	d84160a170	update: reset mon_host after mons upgrade after all mon are upgraded, let's reset mon_host which is used in the rest of the playbook for setting `container_exec_cmd` so we are sure to use the right value. Typical error: ``` failed: [mds0 -> mon0] (item={u'path': u'/var/lib/ceph/bootstrap-mds/ceph.keyring', u'name': u'client.bootstrap-mds', u'copy_key': True}) => changed=true ansible_loop_var: item cmd: - docker - exec - ceph-mon-mon2 - ceph - --cluster - ceph - auth - get - client.bootstrap-mds delta: '0:00:00.016294' end: '2019-09-27 13:54:58.828835' item: copy_key: true name: client.bootstrap-mds path: /var/lib/ceph/bootstrap-mds/ceph.keyring msg: non-zero return code rc: 1 start: '2019-09-27 13:54:58.812541' stderr: 'Error response from daemon: No such container: ceph-mon-mon2' stderr_lines: <omitted> stdout: '' stdout_lines: <omitted> ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-28 04:37:47 +02:00
Harald Jensås	e695efcaf7	Replace ipaddr() with ips_in_ranges() This change implements a filter_plugin that is used in the ceph-facts, ceph-validate roles and infrastucture-playbooks. The new filter plugin will return a list of all IP address that reside in any one of the given IP ranges. The new filter replaces the use of the ipaddr filter. ceph.conf already support a comma separated list of CIDRs for the public_network and cluster_network options. Changes: [1] and [2] introduced a regression in ceph-ansible where public_network can no longer be a comma separated list of cidrs. With this change a comma separated list of subnet CIDRs can also be used for monitor_address_block and radosgw_address_block. [1] commit: `d67230b2a2` [2] commit: `20e4852888` Related-To: https://bugs.launchpad.net/tripleo/+bug/1840030 Related-To: https://bugzilla.redhat.com/show_bug.cgi?id=1740283 Closes: #4333 Please backport to stable-4.0 Signed-off-by: Harald Jensås <hjensas@redhat.com>	2019-09-27 10:11:53 +02:00
Sam Choraria	7cc9f93680	rolling_update.yml: force ceph-volume scan on osds The rolling_update.yml playbook fails when scanning ceph-disk osds while deploying nautilus. The --force flag is required to scan existing osds and rewrite their json metadata. Signed-off-by: Sam Choraria <sam.choraria@bbc.co.uk>	2019-09-26 16:53:25 +02:00
Guillaume Abrioux	3f9ccdaa8a	infrastructure-playbooks: add filestore-to-bluestore.yml This playbook helps to migrate all osds on a node from filestore to bluestore backend. Note that ALL osd on the specified osd nodes will be shrinked and redeployed. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-26 11:35:24 +02:00
Guillaume Abrioux	c785ad3637	lv-create: fix a typo This commit fixes a typo. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-26 11:35:24 +02:00
Mehdy	9fa98d79fd	shrink-rgw.yml: fix confirmation play's name the confirmation play's name should confirm removing rgw instead of monitor Signed-off-by: Mehdy Khoshnoody <mehdy.khoshnoody@gmail.com>	2019-09-24 07:47:56 +02:00
Dimitri Savineau	734c0dc310	shrink-mon: search mon in the quorum_names list If we're looking at the mon hostname in the ceph status output then there's some scenarios where this could be true. If we collocate some services (mons, mgrs, etc..) then the hostname of the monitor to shrink will still be present in the ceph status (like in mgrs or other). Instead we should check the hostame only in the mon part of the output. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-09-18 14:35:02 +02:00
Guillaume Abrioux	5986b26a01	global: add newline at end of file This commit re-add a newline at end of files when it's missing. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-23 15:56:47 +02:00
Kevin Jones	47bf47c9d8	Set proper ownership command performance improvement By changing the set ownership command from using the file module in combination with a with_items loop to a raw chown command, we can achieve a 98% performance increase here. On a ceph cluster with a significant amount of directories and files in /var/lib/ceph, the file module has to run checks on ownership of all those directories and files to determine whether a change is needed. In this case, we just want to explicitly set the ownership of all these directories and files to the ceph_uid Added context note to all set proper ownership tasks Signed-off-by: Kevin Jones <kevinjones@redhat.com>	2019-08-22 10:26:47 +02:00
Guillaume Abrioux	5573f17e76	shrink-mon: refact 'verify the monitor is out of the cluster' task use `from_json` filter instead of a `\| python` so we can get rid of the `shell` module usage here. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-14 16:42:02 +02:00
Guillaume Abrioux	13815ad3ca	common: use discovered_interpreter_python fact in order to use the right binary name when using python cli in command or shell module. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-14 16:42:02 +02:00
Dimitri Savineau	16939eff9e	dashboard: run dashboard role on mgr/mon nodes We don't need to execute the ceph-dashboard role on the nodes present in the grafana-server group. This one is dedicated to the grafana and prometheus stack. The ceph-dashboard needs to executed where the ceph-mgr is running. It is either on the dedicated mgr nodes or if mgr and mon are collocated implicitly on the mon nodes. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-08-06 09:34:20 +02:00
Rishabh Dave	632a44bdf2	add a playbook the remove rgw from a given node Add a playbook named shrink-rgw.yml to infrastructure-playbooks/ that can remove a RGW from a node in an already deployed Ceph cluster. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-07-30 08:45:57 +02:00
Guillaume Abrioux	d67230b2a2	dashboard: use dedicated group only There's no need to add complexity and trying to fallback on other group. Let's deploy dashboard on all nodes present in grafana-server group. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-29 14:42:45 +02:00
Dimitri Savineau	43135840b1	dashboard: move code into a dedicated playbook Move dashboard, grafana/prometheus and node-exporter plays into a dedicated playbook in infrastructure-playbook directory. To avoid using 'dashboard_enabled \| bool' condition multiple time in the main playbook we can just import the dashboard playbook or not. This patch also allows to use an unique dashboard playbook for both baremetal and container playbooks. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-29 14:42:45 +02:00
Dimitri Savineau	07c6695d16	Remove NBSP characters Some NBSP are still present in the yaml files. Adding a test in travis CI. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-26 16:09:23 -04:00
Rishabh Dave	5aecdd3ba6	infra-playbooks: rewite a condition for better readability Use facility built-in in Ansible to check whether a command was executed successfully rather looking at its return value. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-07-25 16:21:34 +02:00
Guillaume Abrioux	916dc1f52f	shrink-rbdmirror: check if rbdmirror is well removed from cluster This commits adds a check to ensure the daemon has been removed from the cluster. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-15 11:22:17 +02:00
Rishabh Dave	c4824acb19	add a playbook that removes rbd-mirror from a node Add a playbook named "shrink-rbdmirror.yml" in infrastructure-playbooks/ that removes a RBD Mirror from a node in an already deployed Ceph cluster. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-07-15 11:22:17 +02:00
Rishabh Dave	f4ea75051b	add a playbook that removes manager from a node Add a playbook, named "shrink-mgr.yml", in infrastructure-playbooks/ that removes a MGR from a node in an already deployed Ceph cluster. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-07-09 14:37:02 +02:00
Guillaume Abrioux	7df62fde34	shrink-mds: refact post tasks This commit refacts the way we check the "mds_to_kill" node is well stopped. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-07-08 11:05:28 +02:00
Rishabh Dave	235b1fccc6	add a playbook that removes mds from a node Add a playbook, named "shrink-mds.yml", in infrastructure-playbooks/ that removes a MDS from a node in an already deployed Ceph cluster. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-07-08 11:05:28 +02:00
Mike Christie	b163206db7	igw: Support new ceph-iscsi package during purge The ceph-iscsi-config and ceph-iscsi-cli packages were combined into ceph-iscsi and its APIs changed. This fixes up the iscsi purge task to support the new API and old one. Signed-off-by: Mike Christie <mchristi@redhat.com>	2019-07-03 22:13:19 +02:00
Guillaume Abrioux	20e4852888	purge: ensure no ceph kernel thread is present This tries to first unmount any cephfs/nfs-ganesha mount point on client nodes, then unmap any mapped rbd devices and finally it tries to remove ceph kernel modules. If it fails it means some resources are still busy and should be cleaned manually before continuing to purge the cluster. This is done early in the playbook so the cluster stays untouched until everything is ready for that operation, otherwise if you try to redeploy a cluster it could end up by getting confused by leftover from previous deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1337915 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-24 10:05:11 +02:00
Guillaume Abrioux	6dce51183b	upgrade: accept HEALTH_OK and HEALTH_WARN as valid state `3a100cfa52` introduced a check which is a bit too restrictive, let's accept HEALTH_OK and HEALTH_WARN. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-21 10:08:56 +02:00
Guillaume Abrioux	3a100cfa52	rolling_update: fail early if cluster state is not OK starting an upgrade if the cluster isn't HEALTH_OK isn't a good idea. Let's check for the cluster status before trying to upgrade. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-18 12:45:01 -04:00
Guillaume Abrioux	51b2813e04	rolling_update: only mask and stop unit in mgr part Otherwise it fails like following: ``` fatal: [mon0]: FAILED! => changed=false msg: \|- Unable to enable service ceph-mgr@mon0: Failed to execute operation: Cannot send after transport endpoint shutdown ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-18 12:45:01 -04:00
Dimitri Savineau	da8b7ab7fb	remove ceph restapi references The ceph restapi configuration was only available until Luminous release so we don't need those leftovers for nautilus+. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-18 09:13:19 +02:00
Rishabh Dave	2034387f57	use pre_tasks and post_tasks in shrink-mon.yml too This commit should've been part of commit `2fb12ae554`. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-06-17 12:14:49 -04:00
Dimitri Savineau	44c63903ca	purge-cluster: clean all ceph repo files We currently only purge rh_storage yum repository file but depending on the ceph_repository value we are using, the ceph repository file could have a different name. Resolves: #4056 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-07 09:28:14 +02:00
guihecheng	59e702ec39	Add section for purging rgw loadbalancer in purge-cluster.yml Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>	2019-06-06 17:12:04 +02:00
L3D	ab54fe20ec	ansible: use 'bool' filter on boolean conditionals By running ceph-ansible there are a lot ``[DEPRECATION WARNING]`` like these: ``` [DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable, this behaviour will go away and you might need to add \|bool to the expression in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg. ``` Now appended ``\| bool`` on a lot of the affected variables. Sometimes the coding style from ``variable\|bool`` changed to ``variable \| bool`` (with spaces at the pipe). Closes: #4022 Signed-off-by: L3D <l3d@c3woc.de>	2019-06-06 10:21:17 +02:00
Dimitri Savineau	7503098ca0	remove ceph-agent role and references The ceph-agent role was used only for RHCS 2 (jewel) so it's not usefull anymore. The current code will fail on CentOS distribution because the rhscon package is only avaible on Red Hat with the RHCS 2 repository and this ceph release is supported on stable-3.0 branch. Resolves: #4020 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-03 13:35:50 +02:00
Guillaume Abrioux	55420d6253	roles: introduce `ceph-container-engine` role This commit splits the current `ceph-container-common` role. This introduces a new role `ceph-container-engine` which handles the tasks specific to the installation of containers tools (docker/podman). This is needed for the ceph-dashboard implementation for 2 main reasons: 1/ Since the ceph-dashboard stack is only containerized, we must install everything needed to run containers even in non containerized deployments. Splitting this role allows us to not have to call the full `ceph-container-common` role which would run a bunch of unneeded tasks that would have been skipped anyway. 2/ The current implementation would have required to run `ceph-container-common` on all ceph-clients nodes which would have been conflicting with `9d3517c670` (we don't want to run ceph-container-common on all client nodes, see mentioned commit for more details) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-22 13:02:10 +02:00
Guillaume Abrioux	72d8315299	switch to ansible 2.8 - remove private attribute with import_role. - update documentation. - update rpm spec requirement. - fix MagicMock python import in unit tests. Closes: #3765 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-20 13:04:58 +02:00
Dimitri Savineau	638604929b	purge-docker-cluster: don't remove data on atomic Because we don't manage the docker service on atomic (yet) via the ceph-container-common role then we can't stop docker dans remove the data. For now let's do that only for non atomic hosts. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-05-17 09:33:50 +02:00
Guillaume Abrioux	e74d80e72f	rename docker_exec_cmd variable This commit renames the `docker_exec_cmd` variable to `container_exec_cmd` so it's more generic. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
Zack Cerza	9b4339a2ba	purge-docker-cluster.yml: Default lvm_volumes We were failing when that variable is unset; purge-cluster.yml contains this workaround. Signed-off-by: Zack Cerza <zack@redhat.com>	2019-05-16 16:39:13 +02:00
Boris Ranto	2f141a6e80	Merge cephmetrics/dashboard-ansible repo This commit will merge dashboard-ansible installation scripts with ceph-ansible. This includes several new roles to setup ceph-dashboard and the underlying technologies like prometheus and grafana server. Signed-off-by: Boris Ranto & Zack Cerza <team-gmeno@redhat.com> Co-authored-by: Zack Cerza <zcerza@redhat.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
wumingqiao	5320aa11c4	shrink_osd: mark all osd(s) out in one command Signed-off-by: wumingqiao <wumingqiao@beyondcent.com>	2019-05-15 16:04:28 +02:00
Dimitri Savineau	168d7cd016	purge-docker-cluster: remove docker data We never clean the content of /var/lib/docker so we can still have some data present in this directory after run the purge playbook. Pip isn't used anymore. Also update the docker package name (especially the python binding one). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-05-14 10:55:43 +02:00
Dimitri Savineau	ea1f8f551c	gather-ceph-logs: fix logs list generation The shell module doesn't have a stdout_lines attributes. Instead of using the shell module, we can use the find modules. Also adding `become: false` to the local tmp directory creation otherwise we won't have enough right to fetch the files into this directory. Resolves: #3966 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-05-13 16:23:05 +02:00
Mike Christie	d7ef12910e	igw: Fix rolling update service ordering We must stop tcmu-runner after the other rbd-target-* services because they may need to interact with tcmu-runner during shutdown. There is also a bug in some kernels where IO can get stuck in the kernel and by stopping rbd-target-* first we can make sure all IO is flushed. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1659611 Signed-off-by: Mike Christie <mchristi@redhat.com>	2019-05-10 09:40:52 +02:00
Rishabh Dave	6e8fb2b3ea	remove infrastructure-playbooks/rgw-standalone.yml We don't need infrastructure-playbooks/rgw-standalone.yml since site.yml.sample and site-cotainer.yml.sample can add a new RGW node to an already deployed Ceph cluster. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-05-07 13:05:17 +02:00
letterwuyu	d57f6fcdc6	Fix comment content Signed-off-by: lishuhao letterwuyu@gmail.com	2019-05-07 10:54:44 +02:00
Dimitri Savineau	f1048627ea	rolling_update: restart all ceph-iscsi services Currently only rbd-target-gw service is restarted during an update. We also need to restart tcmu-runner and rbd-target-api services during the ceph iscsi upgrade. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1659611 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-24 07:47:23 +00:00
Rishabh Dave	739a662c80	improve coding style Keywords requiring only one item shouldn't express it by creating a list with single item. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-04-23 15:37:07 +02:00
Guillaume Abrioux	7eb42c9e8e	update: ensure tasks are executed on an upgraded mon These tasks must be run from a monitor which is upgraded otherwise it might fail. See: https://tracker.ceph.com/issues/39355 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-18 11:16:11 +02:00
Guillaume Abrioux	ed84325b1d	update: ensure ceph command returns 0 these commands could return something else than 0. Let's ensure all retries have been done before actually failing. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-18 11:16:11 +02:00
Guillaume Abrioux	543d1e2e41	update: set osd flags before upgrading any mon Typical error: ``` failed: [mon0 -> mon2] (item=noout) => changed=true cmd: - ceph - --cluster - ceph - osd - set - noout delta: '0:00:00.293756' end: '2019-04-17 06:31:57.552386' item: noout msg: non-zero return code rc: 1 start: '2019-04-17 06:31:57.258630' stderr: \|- Traceback (most recent call last): File "/bin/ceph", line 1222, in <module> retval = main() File "/bin/ceph", line 1146, in main sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli') File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs cmd['sig'] = parse_funcsig(cmd['sig']) File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig raise JsonFormat(s) ceph_argparse.JsonFormat: unknown type CephBool stderr_lines: - 'Traceback (most recent call last):' - ' File "/bin/ceph", line 1222, in <module>' - ' retval = main()' - ' File "/bin/ceph", line 1146, in main' - ' sigdict = parse_json_funcsigs(outbuf.decode(''utf-8''), ''cli'')' - ' File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs' - ' cmd[''sig''] = parse_funcsig(cmd[''sig''])' - ' File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig' - ' raise JsonFormat(s)' - 'ceph_argparse.JsonFormat: unknown type CephBool' stdout: '' stdout_lines: <omitted> ``` Having mixed versions of monitors seems to cause this error. Moving these tasks before any monitor gets upgraded seems to be enough to get around this issue. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-18 11:16:11 +02:00
Andrew Schoen	e2529dcd7f	rolling_update: ceph commands should use --cluster Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2019-04-18 10:55:11 +02:00
Andrew Schoen	67453853ff	rolling_update: set num_osds to the number of running osds We do this so that the ceph-config role can most accurately report the number of osds for the generation of the ceph.conf file. We don't want to use ceph-volume to determine the number of osds because in an upgrade to nautilus ceph-volume won't be able to accurately count osds created by ceph-disk. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2019-04-18 10:55:11 +02:00
Andrew Schoen	28c47e4d1b	rolling_update: migrate ceph-disk osds to ceph-volume When upgrading to nautlius run ``ceph-volume simple scan`` and ``ceph-volume simple activate --all`` to migrate any running ceph-disk osds to ceph-volume. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1656460 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2019-04-18 10:55:11 +02:00
Rishabh Dave	d5967af7fb	allow adding a monitor to a deployed cluster Add a playbook that deploys a new monitor on a new node, adds that node to the Ceph cluster and the monitor to the quorum and updates the ceph configuration file on OSD nodes. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-04-15 10:00:50 +02:00
Dimitri Savineau	eb658b3af6	purge-cluster: remove python-ceph-argparse package When using purge-cluster playbook with nautilus, there's still the python-ceph-argparse package installed on the host preventing to reinstall a ceph cluster with a different version (like luminous or mimic) Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-15 09:15:08 +02:00
Dimitri Savineau	150acba8c5	switch-from-non-containerized: stop all osds `e6bfb84` introduced a regression in the switch from non containerized to container deployment. We need to stop all previous OSDs services. We just don't need the ceph-disk pattern in the regex. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-11 16:26:53 -04:00
Guillaume Abrioux	a1254d767c	purge: remove references to ceph-disk as of stable-4.0, ceph-disk is no longer supported. These tasks aren't needed anymore. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Guillaume Abrioux	73aa788459	shrink-osd: remove legacy playbook as of stable-4.0, ceph-disk is no longer supported. Let's remove this legacy version of the playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Guillaume Abrioux	e6bfb843f4	switch_to_containers: remove ceph-disk references as of stable-4.0, ceph-disk is no longer supported. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Guillaume Abrioux	4d35e9eeed	osd: remove variable osd_scenario As of stable-4.0, the only valid scenario is `lvm`. Thus, this makes this variable useless. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Guillaume Abrioux	c1e4529b0e	update: fix undefined error when no mgr group is declared if mgr group isn't defined in inventory, that task will fail with undefined error. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 09:22:35 +02:00
Dimitri Savineau	57b4e76d11	rolling_update: Remove ceph aliases ceph aliases have been introduced in stable-3.2 during the ceph deployment. On master this has been removed but we don't handle this removal in the upgrade from stable-3.2 to master via the rolling_update playbook. Also remove the task from purge-docker-cluster missing from `d9e7835` Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-09 16:50:10 +02:00
Rishabh Dave	192fea0fec	add-osds: don't hardcode group names Instead of hardcoding group names, import ceph-defaults earlier. Also, rectify a minor mistake in vagrant_varaibles.yml for containerized version of add_osds. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-04-08 10:09:00 +02:00
Guillaume Abrioux	0180738313	purge: fix lvm-batch purge osd `lvm_volumes` and/or `devices` variable(s) can be undefined depending on the scenario chosen. These tasks should be run only if these variable are defined, otherwise it ends up with undefined variable errors. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-02 14:56:24 +02:00
Dimitri Savineau	7cc626b72d	purge-docker-cluster: Remove ceph-osd service The systemd ceph-osd@.service file used for starting the ceph osd containers is used in all osd_scenarios. Currently purging a containerized deployment using the lvm scenario didn't remove the ceph-osd systemd service. If the next deployment is a non-containerized deployment, the OSDs won't be online because the file is still present and override the one from the package. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-28 18:26:38 +00:00
Guillaume Abrioux	f55e2b08be	remove all NBSPs on master branch Similar to #3658 Since there's too many changes between master and stable branches let's commit directly in each branches instead of trying to backport this commit. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-28 11:57:55 +00:00
Dimitri Savineau	c8442f3705	rolling_update: Update systemd unit regex for nvme The systemd unit regex doesn't handle nvme devices (/dev/nvmeXn1). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1687828 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-26 12:01:00 +00:00
Guillaume Abrioux	78aac3e96a	update: followup on `edfdc49` all rgw instances should be stopped according to the multiple rgw instances support added in rolling_update.yml Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	f6e0185146	update: add containerized deployment upgrade support (L->N) Add a couple of fixes to allow containerized deployments upgrade support to upgrade from luminous/mimic to nautilus. - pass CEPH_CONTAINER_IMAGE and CEPH_CONTAINER_BINARY environment variable to the ceph_key module, - fix the docker exec command in 'waiting for the containerized monitor to join the quorum' task according to the `delegate_to` parameter, - override `docker_exec_cmd` in `ceph-facts` with `mon_host` when rolling_update is `True`, - do not run unnecessarily `create_mds_filesystems.yml` when performing an upgrade. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	1816b876ee	update: add missing hosts in facts gathering iscsigws were missing. The 'complete upgrade' couldn't complete because rolling_update was set to False for iscsigw nodes. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	45ba90c169	update: remove rbdmirror legacy task This task is no longer needed for next release. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	0ea0adf039	update: show all daemons version at the end Let's display all daemons version at the end of the playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	f31d6d9485	update: enable new nautilus-only functionality once the cluster is upgraded to nautilus, we can complete the process by disallowing pre-nautilus OSDs and enabling all new nautilus-only functionality Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	afdaa70a63	update: enable msgr2 protocol This commit enable the msgr2 protocol when the cluster is fully upgraded to nautilus Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	ef096dd021	update: ensure mgrs are upgraded after ALL monitors As of `1c760904b0`, ceph-ansible implicitly bootstrap managers on monitors. mgrs must be upgraded only after all monitors, therefore, this commit refact the way mgrs are upgraded to be sure we don't upgrade a mgr during the monitors upgrade. This commit also ensure we handle the case were we split managers on dedicated nodes. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	7fa2434f0f	update: ensure /var/lib/ceph/bootstrap-rbd-mirror is present This directory is created by ceph-config node by node. In the upgrade context we need it to be created on ALL monitors as soon as the first iteration because of the task right after which creates and sends the keyrings on all monitors. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	82764afe8d	update: mask systemd service units during upgrade This prevents the packaging from restarting services before we do need to restart them in the rolling update sequence. We want to handle services restart at rolling_update playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	8add55451c	update: set osd flags only once There is no need to set osd flags (noout, norebalance) each time we upgrade a mon. This commit moves up those tasks (before stopping the mon) so we don't need to delegate them. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	f7c6f4e0b6	update: fix tasks waiting for the node to join the quorum We actually want to ensure the node being upgraded is joining the quorum instead of the monitor picked up earlier. Indeed, the `mon_host`is used only in `delegate_to:` so we can still run ceph commands while the monitor being upgraded is stopped. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	32569b79e2	update: remove an old parameter in ceph_key module call the `containerized` parameter in ceph_key module doesn't exist anymore. This was making the module failing but was hidden because of the `ignore_errors: True`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Dimitri Savineau	b23c05ae52	add-osd.yml: Add become flag for ceph-validate The check_devices task fails if the ceph-validate role isn't executed as a privileged user (Permission denied). failed: [osd0] (item=/dev/sdb) => {"changed": false, "err": "Error: Error opening /dev/sdb: Permission denied\n", "item": "/dev/sdb", "msg": "Error while getting device information with parted script: '/sbin/parted -s -m /dev/sdb -- unit 'MiB' print'", "out": "", "rc": 1} Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-09 05:54:46 +00:00
Guillaume Abrioux	a440878533	add-osd: gather facts in second part of playbook otherwise, it will end up with error like following: ``` FAILED! => {"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_hostname'"} ``` because facts won't have been gathered. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1670663 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-04 14:44:27 +01:00
Guillaume Abrioux	47ebef374f	purge: fix rbd-mirror group name the default is rbdmirrors in ceph-defaults Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-01 20:31:14 +00:00
Guillaume Abrioux	a915308477	purge: fix rbd mirror purge as of `b70d54ac80` the service launched isn't ceph-rbd-mirror@admin.service. it's now `ceph-rbd-mirror@rbd-mirror.{{ ansible_hostname }}` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-01 20:31:14 +00:00
Guillaume Abrioux	3849f30f58	purge: do not remove /var/lib/apt/lists/* removing the content of this directory seems a bit agressive and cause a redeployment to fail after a purge on debian based distrubition. Typical error: ``` fatal: [mon0]: FAILED! => changed=false attempts: 3 msg: No package matching 'ceph' is available ``` The following task will consider the cache is still valid, so apt doesn't refresh it: ``` - name: update apt cache if cache_valid_time has expired apt: update_cache: yes cache_valid_time: 3600 register: result until: result is succeeded ``` since the task installing ceph packages has a `update_cache: no` it fails: ``` - name: install ceph for debian apt: name: "{{ debian_ceph_pkgs \| unique }}" update_cache: no state: "{{ (upgrade_ceph_packages\|bool) \| ternary('latest','present') }}" default_release: "{{ ceph_stable_release_uca \| default('') }}{{ ansible_distribution_release ~ '-backports' if ceph_origin == 'distro' and ceph_use_distro_backports else '' }}" register: result until: result is succeeded ``` /tmp/* isn't specific to ceph as well, so we shouldn't remove everything in this directory. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-01 20:31:14 +00:00
Guillaume Abrioux	89f77589fa	purge: fix purge of lvm devices using `shell` module seems to be the only way to make this task working on rhel based distribution AND debian based distributions. on ubuntu, using `command` ansible module fails like following (not due to `sudo` usage or not): ``` ok: [osd1] => changed=false cmd: command -v ceph-volume failed_when_result: false msg: '[Errno 2] No such file or directory: ''command'': ''command''' rc: 2 ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-01 20:31:14 +00:00
Guillaume Abrioux	69310a5cd6	switch_to_containers: support multiple rgw instances per host add multiple rgw instances per host in switch_to_containers playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-02-13 09:42:27 +01:00
Guillaume Abrioux	70f1eea9b2	switch_to_containers: remove non-containerized systemd unit files remove old systemd unit files (non-containerized) during the switch_to_containers transition. We have seen sometimes the unit started is the old one instead of the new systemd unit generated. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-02-13 09:42:27 +01:00
Guillaume Abrioux	4064035a54	switch_to_containers: use ceph binary from container use the ceph binary from the container instead of the host. If the ceph CLI version isn't compatible between host and container image, it can cause the CLI to hang. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-02-13 09:42:27 +01:00
Guillaume Abrioux	7e0a70f7a8	switch_to_containers: do not try to redeploy monitors `ceph-mon` tries to redeploy monitors because it assumes it was not yet deployed since `mon_socket_stat` and `ceph_mon_container_stat` are undefined (indeed, we stop the daemon before calling `ceph-mon` in the switch_to_containers playbook). Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-02-13 09:42:27 +01:00
John Fulton	37b5d1084a	Make python print statements python3 compatible The restart_osd_daemon.sh generated from the j2 template contains a python call which uses 'print x' instead of 'print(x)'. Add the missing parentheses to make this call compatible with both 2 and 3. Also add parentheses to other python print calls found in roles/ceph-client/defaults/main.yml and infrastructure-playbooks/cluster-os-migration.yml. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1671721 Signed-off-by: John Fulton <fulton@redhat.com>	2019-02-01 15:23:27 +00:00
Noah Watkins	9a43674d2e	shrink_osd: use cv zap by fsid to remove parts/lvs Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1569413 https://bugzilla.redhat.com/show_bug.cgi?id=1572933 Signed-off-by: Noah Watkins <noahwatkins@gmail.com>	2019-01-24 16:34:13 +01:00
Guillaume Abrioux	edfdc49488	rolling_update: support multiple rgw instance `1ac94c048f` introduced the support of multiple rgw instances on a single host but somehow has missed to implement this feature in rolling_update. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-01-22 13:45:38 +01:00
Giulio Fidente	ff8dbe114c	Preserve rolling_update backward compatibility with ansible < 2.5 Signed-off-by: Giulio Fidente <gfidente@redhat.com>	2019-01-21 14:05:45 +01:00
guihecheng	1ac94c048f	rgw: add support for multiple rgw instances on a single host With this, we could have multiple rgw instances on a single host with a single run, don't have to use rgw-standalone.yml which does not seems able to bind ports separately. If you want to have multiple rgw instances, just change 'radosgw_instances' to the number you want, which defaults to 1. Not compatible with Multi-Site yet. Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>	2019-01-18 11:12:28 +01:00
Guillaume Abrioux	268f2cef82	update: do not enforce `serial: 1` on client nodes There is no need to enforce `serial: 1` on client nodes. Let's make it parameterizable by introducing a new extra variable `client_update_batch`, if not filled this will default to `{{ ansible_forks }}`. NOTE: this is only usable as an extra variable passed with `-e client_update_batch=<num>` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650184 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-01-02 16:55:08 +00:00
Daniel-Pivonka	ba149972be	Example ceph_add_users_buckets playbook This is example playbook will show how to bulk add rgw users and buckets Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>	2018-12-20 14:23:25 +01:00
Guillaume Abrioux	d7e77012ef	retry on packages and repositories failures add register/until on all packaging related tasks to avoid non valid CI failure. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-19 14:48:27 +00:00
Noah Watkins	110049e825	playbook: report storage device inventory Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-12-18 10:51:31 +01:00
Andrew Schoen	ffd56177e7	purge-cluster: skip tasks that use ceph-volume if it's not installed This will allow the playbook to be idempotent. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1656935 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-12-13 11:27:27 +01:00
Guillaume Abrioux	a12de3e048	purge-container: move facts gathering after ceph-defaults role import This task has to be called after the role `ceph-defaults` has been played, otherwise, `mon_group_name` will never be known. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-12 16:50:24 +00:00
Guillaume Abrioux	d0b3cb7f85	purge-container: fix wrong syntax we want a default value for `mon_group_name`, not for `groups[mon_group_name]`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-12 11:33:57 +01:00
Guillaume Abrioux	0eb56e36f8	introduce new role ceph-facts sometimes we play the whole role `ceph-defaults` just to access the default value of some variables. It means we play the `facts.yml` part in this role while it's not desired. Splitting this role will speedup the playbook. Closes: #3282 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-12 11:18:01 +01:00
Guillaume Abrioux	ae7f3d66a6	purge-docker: do not call ceph-osd role calling ceph-osd role in purge playbook is not needed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-11 09:59:25 +01:00
Guillaume Abrioux	1a4a6ec855	purge: gather monitors facts in OSD purge the OSD part of the purge delegates commands on monitor node, we need to gather monitors facts to know the `ansible_hostname` fact that is used in the `docker_exec_cmd` fact. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-11 09:59:25 +01:00
Sébastien Han	62111ff53c	purge-container: gather fact before calling ceph-defaults ceph-defaults relies on facts so we must gather facts before running it. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-11 09:59:25 +01:00
Sébastien Han	fc6ebd8ebb	purge-cluster: add support for mon/mgr collocation Recently we introduced the default collocation of mon/mgr without the need of a dedicated mgrs section. This means we have to stop the mgr process on that machine too. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-11 09:59:25 +01:00
Sébastien Han	3a154fa0ad	purge-cluster: remove support for other init system We only support systemd and use the service module anyway. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-11 09:59:25 +01:00
Sébastien Han	325a159415	purge-docker-cluster: add support for mgr/mon collocation Recently we introduced the collocation of mon and mgr by default, so we don't need to have an explicit mgrs section for this. This means we have to remove the mgr container on the mon machines too. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-11 09:59:25 +01:00
Sébastien Han	2bcc00896f	purge-docker-cluste: add a task to check hosts It's useful when running on CI to see what might remain on the machines. So we list all the containers and images. We expect the list to be empty. We fail if we see containers running. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-11 09:59:25 +01:00
Sébastien Han	1751885bc9	purge-docker-cluster: add ceph-volume support This commits adds the support for purging cluster that were deployed with ceph-volume. It also separates nicely with a block intruction the work to do when lvm is used or not. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-11 09:59:25 +01:00
Rishabh Dave	2fb12ae554	use pre_tasks and post_tasks when necessary Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-12-05 08:17:10 +00:00
Rishabh Dave	e4f0af2b78	don't use private option for import_role Since sharing variables amongst roles has been made default since Ansible 2.6, private option has been deprecated; so stop using it. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-12-04 23:45:59 +00:00
Ramana Raja	cb784c601d	rolling_update: fail if less than 3 MONs ... for non-containerized deployments as well. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1655470 Signed-off-by: Ramana Raja <rraja@redhat.com>	2018-12-04 14:28:49 +00:00
Sébastien Han	896676ee80	fix json data type Json is a type structure which is always typed as a string, where before this we were declaring a dict, which is not a json valid structure. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-04 12:34:54 +01:00
Guillaume Abrioux	78116fa6db	purge: add iscsi support add iscsi support for both non containerized and containerized deployment in purge playbooks. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1651054 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-03 17:35:21 +01:00
Sébastien Han	1c760904b0	site: collocated mon and mgr by default This will speed up the deployment and also deploy mon and mgr collocated just as recommended. This won't prevent you of adding more and dedicaded machines for mgr if needed. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-03 14:39:43 +01:00
Sébastien Han	bb7bfca113	rolling-update: remove old condition This failure condition was only valid at the time where clusters didn't have ceph-mgr activated. Now since we collocate the ceph-mgr with the mon by default, if the daemon wasn't present it will be created during the upgrade. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-03 14:39:43 +01:00
Guillaume Abrioux	a952122c38	rolling_update: create missing keyring only on running mon try to create the potentially missing keys only on monitors that are actually running. The current node being played is stopped before this task. By the way, delegating the command on all nodes but the current node being played ensures that the generated keys will be present on all monitors. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-29 16:40:46 +00:00
Sébastien Han	61fb6972ec	rolling_update: default ceph json output to empty dict So we can avoid the following failure: The conditional check 'hostvars[mon_host]['ansible_hostname'] in (ceph_health_raw.stdout \| from_json)["quorum_names"] or hostvars[mon_host]['ansible_fqdn'] in (ceph_health_raw.stdout \| from_json)["quorum_names"] ' failed. The error was: No JSON object could be decoded We just need to set a default, the next iteration will have a more complete json since the command won't fail. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-29 10:46:15 +00:00
Guillaume Abrioux	73287f91bc	mgr: fix mgr keyring error on rolling_update when upgrading from RHCS 2.5 to 3.2, it fails because the task `create ceph mgr keyring(s) when mon is containerized` has a when condition `inventory_hostname == groups[mon_group_name]\|last`. First, this is incorrect because `inventory_hostname` is referring to a mgr node, it means this condition would have never been satisfied. Then, this condition + `serial: 1` makes the mgr keyring creating skipped on the first node. Further, the `ceph-mgr` role tries to copy the mgr keyring (it's not aware we are running `serial: 1`) this leads to a failure like the following: ``` TASK [ceph-mgr : copy ceph keyring(s) if needed] ************************************************************************************************************************************************************************************************************************************************************************* task path: /usr/share/ceph-ansible/roles/ceph-mgr/tasks/common.yml:10 Tuesday 27 November 2018 12:03:34 +0000 (0:00:00.296) 0:11:01.290 **** An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AnsibleFileNotFound: Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring' failed: [magna021] (item={u'dest': u'/var/lib/ceph/mgr/local-magna021/keyring', u'name': u'/etc/ceph/local.mgr.magna021.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "dest": "/var/lib/ceph/mgr/local-magna021/keyring", "name": "/etc/ceph/local.mgr.magna021.keyring"}, "msg": "Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'"} ``` The ceph_key module is idempotent, so there is no need to have such a condition. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1649957 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-27 18:19:56 +01:00
Sébastien Han	e5d5dffeb5	shrink-osd: add missing CEPH_BINARY We need to add the right binary to do the docker exec. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	4f57e44f9c	defaults: declare container_binary Always declare container_binary and assign it a correct value. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	49e0e19056	rolling_update: update ceph_key task for container Use the new way to create keys on containerized env as introduced by: 1098b71bda90db3dad19ac179f0ba900ccb0f953 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	2814d36c93	infra playbooks: use the right container binary Use podman or docker wether they are available or not. podman will be prioritized over docker if present. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Guillaume Abrioux	7c99b6df6d	update: fix a typo `hostvars[groups[mon_host]]['ansible_hostname']` seems to be a typo. That should be `hostvars[mon_host]['ansible_hostname']` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-26 18:22:20 +01:00
Guillaume Abrioux	af78173584	rolling_update: refact set_fact `mon_host` each monitor node should select another monitor which isn't itself. Otherwise, one node in the monitor group won't set this fact and causes failure. Typical error: ``` TASK [create potentially missing keys (rbd and rbd-mirror) when mon is containerized] * task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-update_docker_cluster/rolling_update.yml:200 Thursday 22 November 2018 14:02:30 +0000 (0:00:07.493) 0:02:50.005 *** fatal: [mon1]: FAILED! => {} MSG: The task includes an option with an undefined variable. The error was: 'dict object' has no attribute u'mon2' ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-26 18:22:20 +01:00
Sébastien Han	4e267bee4f	rolling_update: create rbd and rbd-mirror keyrings During an upgrade ceph won't create keys that were not existing on the previous version. So after the upgrade of let's Jewel to Luminous, once all the monitors have the new version they should get or create the keys. It's ok to have the task fails, especially for the rbd-mirror key, which only appears in Nautilus. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-26 18:22:20 +01:00
Sébastien Han	c14f9b78ff	switch: do not look for devices anymore It's easier lookup a directoriy instead of the block devices, especially because of ceph-volume and ceph-disk have a different way to handle devices. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-23 07:56:23 +00:00
Sébastien Han	cd56dad9fa	switch: disable all ceph units Prior to this commit we were only disabling ceph-osd units, but forgot the ceph.target which is controlling everything and will restart the ceph-osd units at each reboot. Now that everything gets disabled there won't be any conflicts between the old non-container and the new container units. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-23 07:56:23 +00:00
Sébastien Han	fe1d09925a	switch: do not mask systemd unit If we mask it we won't be able to start the OSD container since now the osd container use the osd ID as a name such as: ceph-osd@0 Fixes the error: Failed to execute operation: Cannot send after transport endpoint shutdown Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-23 07:56:23 +00:00
Guillaume Abrioux	c783bc70da	docker-common: rename role rename `ceph-docker-common` role to `ceph-container-common` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-12 10:51:48 +01:00
Rishabh Dave	90f222f6a5	add quotes around package names added in `da6f384` Add quotes around package names added in the commit `da6f384223` so that the difference between the Ansible variables and package names is clear. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-11-09 12:59:08 +00:00
Rishabh Dave	d72340abbe	pass the list of packages to package management modules Instead of looping over a list of packages or repeating the task separately for different packages, pass the list of packages to the task performing package management. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-11-09 12:59:08 +00:00
Sébastien Han	53910de43b	ceph_key: add fetch_initial_keys capability This is needed for Nautilus since the ceph-create-keys script goes away. (https://github.com/ceph/ceph/pull/21305) Now the module if called with 'state: fetch_initial_keys' will lookup keys generated by the monitor and write them down on the filesystem to the right location (/etc/ceph and /var/lib/ceph/boostrap*). This is not applicable to container since keys are generated by the container only. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-09 12:45:52 +01:00
Mike Christie	b523a44a1a	igw: stop tcmu-runner on iscsi purge When the iscsi purge playbook is run we stop the gw and api daemons but not tcmu-runner which I forgot on the previous PR. Fixes Red Hat BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1621255 Signed-off-by: Mike Christie <mchristi@redhat.com>	2018-11-09 10:02:16 +01:00
Noah Watkins	b848d2be4c	don't use "role" or "roles" to include roles see `3f62fc585f` Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-11-08 17:45:37 +01:00
Noah Watkins	9c47950961	Fix comments in shrink-osd-ceph-disk playbook Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-11-08 17:45:37 +01:00
Noah Watkins	f5dacbf7de	Add a ceph-volume aware shrink-osd playbook Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-11-08 17:45:37 +01:00
Noah Watkins	0782cfc546	Rename ceph-disk version of shrink-osd playbook This will be replaced by a ceph-volume aware verison. Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-11-08 17:45:37 +01:00
Sébastien Han	b82995df58	Revert "ceph_key: add fetch_initial_keys capability" This reverts commit `17883e09ba`.	2018-11-08 13:34:47 +00:00
Sébastien Han	17883e09ba	ceph_key: add fetch_initial_keys capability This is needed for Nautilus since the ceph-create-keys script goes away. (https://github.com/ceph/ceph/pull/21305) Now the module if called with 'state: fetch_initial_keys' will lookup keys generated by the monitor and write them down on the filesystem to the right location (/etc/ceph and /var/lib/ceph/boostrap*). This is not applicable to container since keys are generated by the container only. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-08 13:32:18 +00:00

... 3 4 5 6 7 ...

784 Commits (subset-debug-master)