ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	c5b7b37105	purge-cluster: clean some code Avoid using regexp to match device Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-12-20 17:42:45 +01:00
Guillaume Abrioux	eeedefdf02	purge-cluster: wipe disk using dd `bluestore_purge_osd_non_container` scenario is failing because it keeps old osd_uuid information on devices and causes `ceph-disk activate` to fail when trying to redeploy a new cluster after a purge. Typical error seen: ``` 2017-12-13 14:29:48.021288 7f6620651d00 -1 bluestore(/var/lib/ceph/tmp/mnt.2_3gh6/block) _check_or_set_bdev_label bdev /var/lib/ceph/tmp/mnt.2_3gh6/block fsid 770080e2-20db-450f-bc17-81b55f167982 does not match our fsid f33efff0-2f07-4203-ad8d-8a0844d6bda0 ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-12-20 17:42:45 +01:00
Guillaume Abrioux	aaaf980140	purge: fix bug on 'wait_for' task This task hangs because `{{ inventory_hostname }}` doesn't resolve to an actual IP address. Using `hostvars[inventory_hostname]['ansible_default_ipv4']['address']` should fix this because it will reach the node with its actual IP address. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-29 11:10:56 +01:00
Guillaume Abrioux	947766e294	purge-cluster: remove usage of `with_fileglob` `with_fileglob` loops over files on the machine where ansible-playbook is being run. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-21 08:24:11 +01:00
Sébastien Han	2837d0a22e	purge: do not reboot by default Rebooting servers is really intrusive and perhaps this is not what the operator wants. So we disable the reboot by default now. Note that the reboot might not happen all the time. It can be enabled by default by running the purge playbook with -e reboot_osd_node=True Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1505011 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-26 14:18:38 +02:00
Sébastien Han	24b82c2679	purge: fix journal purge Using a condition when osd_scenario == 'non-collocated' was wrong since these partitions can be collocated on a single device also. Removing the check ensures these partitions are purged. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1499871 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-10 09:57:39 +02:00
Guillaume Abrioux	f147b119ed	Merge pull request #2014 from ceph/fixes-2 infra: use the pg check in the right place	2017-10-09 20:14:06 +02:00
Sébastien Han	450108fab9	infra: add independent purge-iscsi-gateways.yml The current inclusion of purge-iscsi-gateways.yml in purge-cluster.yml is not working well and is blocking the CI too. So remove it from purge-cluster.yml and re-add the original purge-iscsi-gateways.yml. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-09 17:25:44 +02:00
Boris Ranto	64e272d818	purge-cluster: Do not use shell for rm The shell wildcard expansion of non-existing paths fails on zsh making the whole script fail. We can use file module with with_fileglob to alleviate the problem instead. Signed-off-by: Boris Ranto <branto@redhat.com>	2017-10-06 22:54:37 +02:00
Boris Ranto	f696cb7637	purge-cluster: Do not fail on systemd commands The systemd can't stop services if the unit files were removed before the cluster was purged. We should just ignore these. Signed-off-by: Boris Ranto <branto@redhat.com>	2017-10-06 22:52:56 +02:00
Sébastien Han	b6b24a5ca9	iscsi: fix wrong group name for iscsi Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1498490 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-05 17:25:32 +02:00
zhangwentao	86a6db0d58	purge-cluster: delete block partitions if using bluestore	2017-09-29 14:04:17 +08:00
Andrew Schoen	fccc604f4a	purge-cluster: default lvm_volumes if not defined Most osd scenarios do not use lvm_volumes, so default it in purge-cluster.yml if it's not defined. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-09-26 15:14:29 -05:00
Guillaume Abrioux	c80ba7a307	purge: implement mgr purge Until now, mgr nodes were not managed by purge-cluster.yml, which breaks scenarios like purge_cluster. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-24 21:18:50 +02:00
Sébastien Han	ba3e3b6cc7	purge: only purge specific directories for mon Handles the case when a mon is collocated with an OSD. Closes: https://github.com/ceph/ceph-ansible/issues/1877 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-13 17:07:04 -06:00
Sébastien Han	aa364264cd	resync ceph-iscsi-gw with old upstream Taken from https://github.com/pcuzner/ceph-iscsi-ansible/tree/tcmu-fixes Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1454945 and https://bugzilla.redhat.com/show_bug.cgi?id=1484083 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-12 18:06:10 -06:00
Sébastien Han	b9ced956d7	purge: get lockbox mountpoint and unmount it Prior command was avoiding the lockbox mountpoint and the playbook was failing with: rmtree failed: [Errno 30] Read-only file system: '/var/lib/ceph/osd-lockbox/4e9d8052-87c2-4fde-a56c-b8c108a3eefc/key-management-mode' Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-07 16:31:31 +02:00
Ben England	617d9ee75d	don't use devices var anymore, works for osd_auto_discover	2017-08-28 17:27:01 -04:00
Andrew Schoen	bed57572cc	purge-cluster: adds support for purging lvm osds This also adds a new testing scenario for purging lvm osds Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-08-23 10:33:35 -05:00
Sébastien Han	9c824b9818	purge: add ability to purge bluestore osd We now purge block db and/or wal partitions if we find any. Closes: https://github.com/ceph/ceph-ansible/issues/1770 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-21 18:08:18 +02:00
Sébastien Han	30991b1c0a	osd: simplify scenarios There are only two main scenarios now: * collocated: everything remains on the same device: - data, db, wal for bluestore - data and journal for filestore * non-collocated: dedicated device for some of the components Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-03 10:20:39 +02:00
Sébastien Han	fad9d0caec	Merge pull request #1690 from yanyixing/master fix: when osd device is a disk partition	2017-07-26 15:55:29 +02:00
yanyx	2e6233271e	fix: when osd device is a disk partition	2017-07-25 21:39:43 +08:00
Sébastien Han	0c18cf199e	purge: remove leftover unit files Closes https://github.com/ceph/ceph-ansible/issues/1672 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-07-25 13:26:28 +02:00
Andrew Schoen	5a3f95dfc1	purge-cluster: check for any running ceph process after purge Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-04-25 09:30:22 -05:00
Andrew Schoen	26bdd59f5d	purge-cluster: we don't support sysv or upstart anymore Now that ceph-ansible only supports > jewel we don't need to bother with sysv or upstart Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-04-21 15:14:38 -07:00
Andrew Schoen	7ca2bddcce	purge-cluster: do not need to check for running ceph processes Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-04-21 15:12:46 -07:00
Andrew Schoen	aac79df3b3	purge-cluster: no need to remove ceph.target The package uninstalls will stop ceph.target Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-04-21 15:11:03 -07:00
Daniel Lupescu	d5e56c481a	purge-cluster: fix grep match for NVMe and HP Smart Array devices raw_device would return invalid block device names for NVMe and HPSA devices which would cause sgdisk partition deletion to fail $ echo /dev/nvme1n1p3 | egrep -o '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p){1,2}' /dev/nvme1n1p $ echo /dev/cciss/c0d0p2 | egrep -o '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p){1,2}' /dev/cciss/c0d0p	2017-04-11 16:13:28 +03:00
Sébastien Han	c37aaa41f4	playbook: homogenize the way list osd ids Problem: too many different commands to do the same thing. The 'cut' command on infrastructure-playbooks/purge-cluster.yml was also wrong. This sed command from osixia in ceph-docker https://github.com/ceph/ceph-docker/pull/580/ addresses all the scenarios. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-03-30 11:51:38 +02:00
Andrew Schoen	4fe6607004	purge-cluster: do not set group name vars at playbook level This has the behavior of overriding custom values set in group_vars. I've added defaults to the rest of the group names so that if they are not overridden in group_vars then defaults will be used. See: https://bugzilla.redhat.com/show_bug.cgi?id=1354700 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-03-08 08:57:08 -06:00
Shengjing Zhu	32923fd217	fix grep match pattern for osd ids Some playbooks use [0-9]*, others use \d+$ The latter is more correct since cluster name may contain numbers. Signed-off-by: Shengjing Zhu <zsj950618@gmail.com>	2017-02-20 16:35:56 +08:00
Andrew Schoen	22f52a9dc6	purge-cluster: also purge dmcrypt dedicated journals See: https://bugzilla.redhat.com/show_bug.cgi?id=1414647 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-02-15 10:27:17 -06:00
Andrew Schoen	c5f561a4e9	purge-cluster: remove calamari-server package See: https://bugzilla.redhat.com/show_bug.cgi?id=1422134 Signed-off-by: Andrew Schoen <aschoen@redhat.com> Resolves rhbz#1422134	2017-02-14 09:24:02 -06:00
Andrew Schoen	865b4500dc	purge-cluster: set a default value for fetch_directory if not defined Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-02-08 06:25:43 -06:00
Andrew Schoen	adf6aee643	purge-cluster: remove all include tasks Including variables from role defaults or files in a group_vars directory relative to the playbook is a bad practice. We don't want to do this because including these defaults at the task level overrides values that would be set in a group_vars directory relative to the inventory file, which is the correct usage if you wish to override those default values. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-02-08 06:25:43 -06:00
Andrew Schoen	0476b24af1	purge-cluster: do not use ceph-detect-init We can not always ensure that ceph-detect-init will be present on the system. See: https://bugzilla.redhat.com/show_bug.cgi?id=1418980 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-02-08 06:24:44 -06:00
Sébastien Han	72cd9199ac	purge: ability to purge client role Signed-off-by: Sébastien Han <seb@redhat.com>	2017-02-07 22:14:18 +01:00
Sébastien Han	d5dd658cfa	purge: do not stop ceph.target on each daemon Doing this causes all the daemons to go down at the same time. In a scenario where we colocate a monitor and an OSD, the OSD will take some time to go down, which will make the 'umount' task fail. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-01-30 14:31:56 +01:00
Sébastien Han	cb57a359ba	purge: do not fail on purge ceph files On systems running docker there is an issue with lxcfs that results in the find command returning 1 even though it actually did the job. e.g: on a system with docker running, find /var will give us the following error: find: '/var/lib/lxcfs/cgroup/devices/lxc/x1/system.slice/systemd-update-utmp.service/devices.deny': Permission denied find: '/var/lib/lxcfs/cgroup/devices/lxc/x1/system.slice/dev-random.mount/devices.allow': Permission denied ... ... However ceph files got deleted so we ignore the error. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-01-30 14:31:56 +01:00
Sébastien Han	e371bd591c	purge: fix ubuntu purge when not using systemd We now rely on the cli tool ceph-detect-init which will tell us the init system in use on the distribution. We do this instead of the previous lookup for systemd unit files to call the right task depending on the init system. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-01-30 14:31:56 +01:00
Sébastien Han	0e2e270ab2	purge: allow purge to run multiple times with_items is evaluated before the when, so in a second run where the variable is empty it will fail with "'dict object' has no attribute 'stdout_lines'". To fix this we add a default array so with_items does not fail and the task is skipped with the when. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-01-30 14:31:56 +01:00
Andrew Schoen	d3cb8dba4e	purge-cluster: fix failure when raw_multi_journal is not defined Because the purge-cluster.yml playbook does not have access to the roles default vars, we cannot be sure that raw_multi_journal is defined. For example, if this was purging a dmcrypt journal then raw_multi_journal might not be defined at all in group_vars/all.yml or group_vars/osds.yml. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-01-27 05:23:17 -06:00
Andrew Schoen	b2a6f095f1	purge-cluster: fix syntax when deleting dmcrypt devices Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-01-26 11:28:30 -06:00
Sébastien Han	73ca1a7a00	purge: remove dm-crypt devices When running encrypted OSDs, an encrypted device mapper is used (because created by the cryptsetup tool). So before attempting to remove all the partitions on a device we must delete all the encrypted device mappers, then we can delete all the partitions. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-01-25 22:32:46 +01:00
Sébastien Han	adeb3decf3	purge: remove zap_block_devs variable The name of this variable was a bit confusing since its activation will zap all the block devices no matter which osd scenario we are using. Removing this variable and applying a condition on the OSD scenario is now feasible and easier since we import group_vars variable files for OSDs. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-01-18 10:55:01 +01:00
Sébastien Han	b7fcbe5ca2	purge: cosmetic cleanup Just applying our writing syntax convention in the playbook. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-01-18 10:53:21 +01:00
Andrew Schoen	dd8389cdf7	purge-cluster: do not include ceph-osd and ceph-common defaults for osds When purging OSDs we do not need to include these defaults as nothing in the following tasks uses them. Also, it has the side effect of overwriting any variables defined in group_vars files that are relative to the inventory you are using with the default values. That behavior was causing the CI tests to fail. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-01-10 16:57:58 -06:00
Andrew Schoen	321cea8ba9	purge-cluster: get journal partitions after zapping osd disks In my testing zapping the osd disks deleted the journal partitions, making the 'zap ceph journal partitions' task fail because the partitions it found previously do not exist anymore. This moves the task that finds the journal partitions after 'zap osd disks' to catch any partitions ceph-disk might have missed. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-01-03 15:57:17 -06:00
Andrew Schoen	c9e5914377	purge-cluster: use ignore_errors: true when including group_vars files Using failed_when will still throw an exception and stop the playbook if the file you're trying to include doesn't exist. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-01-03 15:57:17 -06:00

67 Commits (cfb75b8e299e2761b3e82a9afe15b3164bf07eb4)