ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	ba6bd3ca3d	docker2podman: call `container_options_facts.yml` on osd nodes We must call `ceph-osd` role from `container_options_facts.yml` because ceph-osd-run.sh.j2 needs variables set in this file. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1819681 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4a4f54f6ee`)	2020-04-02 11:01:14 -04:00
Guillaume Abrioux	32f879de32	purge-container: get all osds id Adding `--all` to the `systemctl list-units` command in order to get all osds id on the node (including stopped osds). Otherwise, it will purge the cluster but there will be leftovers after that. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1814542 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5e7962ccf6`)	2020-03-31 11:00:41 -04:00
Dimitri Savineau	e2f1a0ade8	doc: update infra playbooks statements We don't need to copy the infrastructure playbooks in the root ceph-ansible directory. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `195944b123`)	2020-03-16 14:43:35 +01:00
Dimitri Savineau	957156c0fe	filestore-to-bluestore: stop ceph-volume services We only disable the ceph-osd services but not the ceph-volume lvm services during the filestore to bluestore migration. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `38a683e5bf`)	2020-03-12 21:10:33 +01:00
Dimitri Savineau	928c792f8d	filestore-to-bluestore: reuse dedicated journal If the filestore configuration was using a dedicated journal with either a partition or a LV/VG then we need to reuse this for bluestore DB. When filestore is using a raw device then we shouldn't destroy everything (data + journal) but only data, otherwise the journal partition won't exist anymore. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790479 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `535da53d69`)	2020-03-12 21:10:33 +01:00
Dimitri Savineau	3b0ee83594	shrink-rbdmirror: fix presence after removal We should add retry/delay to check the presence of the rbdmirror daemon in the cluster status because the status takes some time to be updated. Also the metadata.hostname isn't a good key to check because it doesn't reflect the ansible_hostname fact. We should use metadata.id instead. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `d1316ce77b`)	2020-03-03 15:19:45 +01:00
Dimitri Savineau	4b07d97346	shrink-mgr: fix systemd condition This playbook was using mds systemd condition. Also a command task was using pipeline which is not allowed. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `a664159061`)	2020-03-03 15:19:45 +01:00
Dimitri Savineau	92b671bcbe	shrink: don't use localhost node The ceph-facts are running on localhost so if this node is using a different OS/release than the ceph node we can have a mismatch between docker/podman container binary. This commit also reduces the scope of the ceph-facts role because we only need the container_binary tasks. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `08ac2e3034`)	2020-03-03 15:19:45 +01:00
Dimitri Savineau	e037e99bd2	purge: stop rgw instances by iteration It looks like the service module doesn't support wildcard anymore for stopping/disabling multiple services. fatal: [rgw0]: FAILED! => changed=false msg: 'This module does not currently support using glob patterns, found '''' in service name: ceph-radosgw@' ...ignoring Instead we should iterate over the rgw_instances list. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `9d3b49293d`)	2020-03-03 10:31:48 +01:00
Guillaume Abrioux	5a51bd12dc	common: support OSDs with more than 2 digits When running environment with OSDs having ID with more than 2 digits, some tasks don't match the system units and therefore, playbook can fail. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1805643 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `a084a2a347`)	2020-02-28 11:06:47 -05:00
Guillaume Abrioux	d254a8b938	shrink-osd: support shrinking ceph-disk prepared osds This commit adds the ceph-disk prepared osds support Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1796453 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `1de2bf9991`)	2020-02-26 18:16:48 +01:00
Guillaume Abrioux	21851457d6	shrink-osd: don't run ceph-facts entirely We need to call ceph-facts only for setting `container_binary`. Since this task has been isolated we can use `tasks_from` to only execute the needed task. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `55970b18f1`)	2020-02-26 18:16:48 +01:00
Benoît Knecht	10b3bb2727	infrastructure-playbooks: Run shrink-osd tasks on monitor Instead of running shrink-osd tasks on localhost and delegating most of them to the first monitor, run all of them on the first monitor directly. This has the added advantage of becoming root on the monitor only, not on localhost. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `8b3df4e418`)	2020-02-24 16:51:33 -05:00
Guillaume Abrioux	1d2a395aaf	switch_to_containers: increase health check values This commit increases the default values for the following variable consumed in switch-from-non-containerized-to-containerized-ceph-daemons.yml playbook. This also moves these variables in `ceph-defaults` role so the user can set different values if needed. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783223 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3700aa5385`)	2020-02-10 12:57:17 -05:00
Guillaume Abrioux	cdc3e10cf3	purge/update: remove backward compatibility legacy This was introduced in 3.1 and marked as deprecated. We can definitely drop it in stable-4.0. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0441812959`)	2020-02-03 09:33:05 -05:00
Guillaume Abrioux	5c3ba0787c	switch_to_containers: exclude clients nodes from facts gathering just like site.yml and rolling_update, let's exclude clients node from the fact gathering. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `332c39376b`)	2020-02-03 09:32:20 -05:00
Dimitri Savineau	487be2675a	filestore-to-bluestore: skip bluestore osd nodes If the OSD node is already using bluestore OSDs then we should skip all the remaining tasks to avoid purging OSD for nothing. Instead we warn the user. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790472 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `83c5a1d7a8`)	2020-02-03 15:16:51 +01:00
Guillaume Abrioux	675b6788f4	update: remove legacy tasks These tasks should have been removed with backport #4756 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1793564 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-03 15:16:13 +01:00
wujie1993	dcd4b2955a	purge: fix purge cluster failed Fix purge cluster failure when local container images do not exist. Purge node-exporter and grafana-server only when dashboard_enabled is set to True. Signed-off-by: wujie1993 qq594jj@gmail.com (cherry picked from commit `d8b0b3cbd9`)	2020-02-03 15:14:56 +01:00
Dimitri Savineau	f982a70f02	filestore-to-bluestore: fix undefined osd_fsid_list If the playbook is used on a host running bluestore OSDs then the osd_fsid_list won't be filled because the bluestore OSDs are reported with 'type: block' via ceph-volume lvm list command but we are looking for 'type: data' (filestore). TASK [zap ceph-volume prepared OSDs] ********* fatal: [xxxxx]: FAILED! => msg: '''osd_fsid_list'' is undefined Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `cd76054f76`)	2020-01-28 22:21:49 -05:00
Dimitri Savineau	0a2927ce5e	filestore-to-bluestore: don't fail with no PV When the PV is already removed from the devices then we should not fail to avoid errors like: stderr: No PV found on device /dev/sdb. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `a9c2300545`)	2020-01-24 16:14:47 -05:00
Guillaume Abrioux	fd217d9f08	rolling_update: support upgrading 3.x + ceph-metrics on a dedicated node When upgrading from RHCS 3.x where ceph-metrics was deployed on a dedicated node to RHCS 4.0, it fails like following: ``` fatal: [magna005]: FAILED! => changed=false gid: 0 group: root mode: '0755' msg: 'chown failed: failed to look up user ceph' owner: root path: /etc/ceph secontext: unconfined_u:object_r:etc_t:s0 size: 4096 state: directory uid: 0 ``` because we are trying to run `ceph-config` on this node, it doesn't make sense so we should simply run this play on all groups except `[grafana-server]`. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1793885 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e5812fe45b`)	2020-01-22 18:28:54 +01:00
Dimitri Savineau	0abea70e29	filestore-to-bluestore: fix osd_auto_discovery When osd_auto_discovery is set then we need to refresh the ansible_devices fact after the filestore OSD purge, otherwise the devices fact won't be populated. Also remove the gpt header on ceph_disk_osds_devices because the devices list is empty at this point for osd_auto_discovery. Adding the bool filter when needed. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `bb3eae0c80`)	2020-01-22 10:06:17 +01:00
Dimitri Savineau	e4965e9ea9	filestore-to-bluestore: --destroy with raw devices We still need --destroy when using a raw device otherwise we won't be able to recreate the lvm stack on that device with bluestore. Running command: /usr/sbin/vgcreate -s 1G --force --yes ceph-bdc67a84-894a-4687-b43f-bcd76317580a /dev/sdd stderr: Physical volume '/dev/sdd' is already in volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151' Unable to add physical volume '/dev/sdd' to volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151' /dev/sdd: physical volume not initialized. --> Was unable to complete a new OSD, will rollback changes Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792227 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `f995b079a6`)	2020-01-21 18:26:55 +01:00
Guillaume Abrioux	0db611ebf8	shrink-mds: fix condition on fs deletion the new ceph status registered in `ceph_status` will report `fsmap.up` = 0 when it's the last mds given that it's done after we shrink the mds, it means the condition is wrong. Also adding a condition so we don't try to delete the fs if a standby node is going to rejoin the cluster. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787543 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3d0898aa5d`)	2020-01-15 11:28:12 +01:00
Guillaume Abrioux	2d85fab02d	osd: support scaling up using --limit This commit leaves add-osd.yml in place but marks the playbook as deprecated. Scaling up OSDs is now possible using --limit Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3496a0efa2`)	2020-01-14 09:12:34 -05:00
Guillaume Abrioux	e034a6da69	docker2podman: use set_fact to override variables play vars have lower precedence than role vars and `set_fact`. We must use a `set_fact` to reset these variables. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `b0c491800a`)	2020-01-10 17:41:27 +01:00
Guillaume Abrioux	02ec088568	docker2podman: force systemd to reload config This is needed after a change is made in systemd unit files. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `1c2ec9fb40`)	2020-01-10 17:41:27 +01:00
Guillaume Abrioux	34c4f5baac	docker2podman: install podman This commit adds a package installation task in order to install podman during the docker-to-podman.yml migration playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d746575fd0`)	2020-01-10 17:41:27 +01:00
Guillaume Abrioux	4c4b0edfec	update: only run post osd upgrade play on 1 mon There is no need to run these tasks n times from each monitor. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c878e99589`)	2020-01-10 17:16:51 +01:00
Guillaume Abrioux	6e47e96a02	update: use flags noout and nodeep-scrub only 1. set noout and nodeep-scrub flags, 2. upgrade each OSD node, one by one, wait for active+clean pgs 3. after all osd nodes are upgraded, unset flags Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-authored-by: Rachana Patel <racpatel@redhat.com> (cherry picked from commit `548db78b95`)	2020-01-10 17:16:51 +01:00
Dimitri Savineau	f00ee1244f	purge-iscsi-gateways: don't run all ceph-facts We only need to have the container_binary fact. Because we're not gathering the facts from all nodes then the purge fails trying to get one of the grafana fact. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786686 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `a09d1c38bf`)	2020-01-10 16:21:53 +01:00
Dimitri Savineau	f042ece9af	rolling_update: run registry auth before upgrading There are some tasks using the new container image during the rolling upgrade playbook that need to execute the registry login first, otherwise the nodes won't be able to pull the container image. Unable to find image 'xxx.io/foo/bar:latest' locally Trying to pull repository xxx.io/foo/bar ... /usr/bin/docker-current: Get https://xxx.io/v2/foo/bar/manifests/latest: unauthorized Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `3f344fdefe`)	2020-01-09 20:16:07 -05:00
Dimitri Savineau	84276f2fe3	shrink-rgw: refact global workflow Instead of running the ceph roles against localhost we should do it on the first mon. The ansible and inventory hostname of the rgw nodes could be different. Ensure that the rgw instance to remove is present in the cluster. Fix rgw service and directory path. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `747555dfa6`)	2020-01-09 21:39:23 +01:00
Guillaume Abrioux	6e7fe62ad5	shrink-osd: support fqdn in inventory When using fqdn in inventory, that playbook fails because of some tasks using the result of ceph osd tree (which returns shortname) to get some data in hostvars[]. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779021 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `6d9ca6b05b`)	2020-01-08 16:16:21 -05:00
Dimitri Savineau	e4798e22a8	purge-iscsi-gateways: remove node from dashboard When using the ceph dashboard with iscsi gateways nodes we also need to remove the nodes from the ceph dashboard list. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786686 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `931a842f21`)	2020-01-08 19:29:59 +01:00
Guillaume Abrioux	86bb734397	filestore-to-bluestore: umount partitions before zapping them When an OSD is stopped, it leaves partitions mounted. We must umount them before zapping them, otherwise error like "Device is busy" will show up. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `8056514134`)	2020-01-08 11:41:48 -05:00
Guillaume Abrioux	27b1fc8981	shrink-mds: do not play ceph-facts entirely We only need to set `container_binary`. Let's use `tasks_from` option. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0ae0a9ce28`)	2020-01-08 11:18:45 -05:00
Guillaume Abrioux	edbb207680	shrink-mds: use fact from delegated node The command is delegated on the first monitor so we must use the fact `container_binary` from this node. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `77b39d235b`)	2020-01-08 11:18:45 -05:00
Guillaume Abrioux	0eaa66f394	shrink-mds: fix filesystem removal task This commit deletes the filesystem when no more MDS is present after shrinking operation. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787543 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `38278a6bb5`)	2020-01-08 11:18:45 -05:00
Guillaume Abrioux	bfd26e7f78	shrink-mds: ensure max_mds is always honored This commit prevents shrinking an mds node when max_mds wouldn't be honored after that operation. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `2cfe5a04bf`)	2020-01-08 11:18:45 -05:00
Guillaume Abrioux	19068659c7	filestore-to-bluestore: ensure all dm are closed This commit adds a task to ensure device mappers are well closed when lvm batch scenario is used. Otherwise, OSDs can't be redeployed given that devices are rejected by ceph-volume because they are locked. Adding a condition `devices | default([]) | length > 0` to remove these dm only when using lvm batch scenario. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `8e6ef818a2`)	2019-12-11 16:37:21 +01:00
Guillaume Abrioux	99ac694cc0	filestore-to-bluestore: force OSDs to be marked down Otherwise, sometimes it can take a while for an OSD to be seen as down and causes the `ceph osd purge` command to fail. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `51d601193e`)	2019-12-11 16:37:21 +01:00
Guillaume Abrioux	586f6f6262	filestore-to-bluestore: do not use --destroy Do not use `--destroy` when zapping a device. Otherwise, it destroys VGs while they are still needed to redeploy the OSDs. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e3305e6bb6`)	2019-12-11 16:37:21 +01:00
Guillaume Abrioux	d2b1506712	filestore-to-bluestore: add non containerized support This commit adds the non containerized context support to the filestore-to-bluestore.yml infrastructure playbook. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4833b85e04`)	2019-12-11 16:37:21 +01:00
Guillaume Abrioux	5062d4094c	update: restart iscsigws daemons after upgrade In containerized context, containers aren't stopped early in the sequence. It means they aren't restarted after the upgrade because the task is just checking the daemon status is started (eg: `state: started`). This commit also removes the task which ensures services are started because it's already done in the role ceph-iscsigw. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c7708eb458`)	2019-12-11 08:48:34 -05:00
Guillaume Abrioux	fe8858af38	upgrade: add dashboard deployment when upgrading from RHCS 3, dashboard has obviously never been deployed and it forces us to deploy it later manually. This commit adds the dashboard deployment as part of the upgrade to RHCS 4. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779092 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `451c5ca934`)	2019-12-11 08:48:34 -05:00
Dimitri Savineau	3b26df8c75	purge-cluster: add podman support The podman support was added to the purge-container-cluster playbook but containers are always used for the dashboard even on non containerized deployment. This commit adds the podman support on purging the dashboard resources in the purge-cluster playbook. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `89f6cc54a2`)	2019-12-04 18:00:07 -05:00
Guillaume Abrioux	1c03d2b526	purge: rename playbook (container) Since we now support podman, let's rename the playbook so it's more generic. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `7bc7e3669d`)	2019-12-04 09:12:41 -05:00
Dimitri Savineau	98392be368	add-{mon,osd}: run raw install python tasks If the new mon/osd node doesn't have python installed then we need to execute the tasks from raw_install_python.yml. Closes: #4368 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `34b03d1873`)	2019-12-04 10:59:39 +01:00
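
Commits 32f879de32 and 5a51bd12dc both concern collecting OSD ids from systemd unit names, including stopped OSDs and ids with more than 2 digits. The following is only a minimal sketch of that idea, not the playbook code: the `ceph-osd@<id>.service` unit naming, the `osd_ids` helper and the sample unit list are assumptions made for illustration.

```python
import re

# Assumed unit naming: systemd template units "ceph-osd@<id>.service".
# "\d+" accepts ids of any length, so 3-digit ids such as 104 also match.
OSD_UNIT = re.compile(r"^ceph-osd@(\d+)\.service$")


def osd_ids(unit_names):
    """Return the sorted numeric OSD ids found in a list of unit names."""
    ids = []
    for name in unit_names:
        match = OSD_UNIT.match(name.strip())
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


# Hypothetical sample, standing in for unit names reported on a node.
units = [
    "ceph-osd@3.service",
    "ceph-osd@12.service",
    "ceph-osd@104.service",
    "ceph-mon@mon0.service",
]
print(osd_ids(units))  # [3, 12, 104]
```

A pattern limited to one or two digits would silently skip `ceph-osd@104.service`, which is the kind of mismatch the second commit message describes.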


558 Commits (6878aab0f94586e4ccd58fb1f3880c8821db0929)