ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	144c92b21f	purge: actually remove of /var/lib/ceph/* `38dc20e74b` introduced a bug in the purge playbooks because using `*` in `command` module doesn't work. `/var/lib/ceph/` files are not purged, which means there is a leftover. When trying to redeploy a cluster, it failed because the monitor daemon was detecting an existing keyring and therefore assumed a cluster already existed. Typical error (from container output): ``` Sep 26 13:18:16 mon0 docker[31316]: 2018-09-26 13:18:16 /entrypoint.sh: Existing mon, trying to rejoin cluster... Sep 26 13:18:16 mon0 docker[31316]: 2018-09-26 13:18:16.932393 7f15b0d74700 -1 auth: unable to find a keyring on /etc/ceph/test.client.admin.keyring,/etc/ceph/test.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:(2) No such file or directory Sep 26 13:18:23 mon0 docker[31316]: 2018-09-26 13:18:23 /entrypoint.sh: SUCCESS ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1633563 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-27 17:45:21 +02:00
Sébastien Han	38dc20e74b	purge: only purge /var/lib/ceph content Sometimes /var/lib/ceph is mounted on a device, so we won't be able to remove it (device busy); let's remove its content only. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1615872 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-03 10:51:24 +02:00
Jeffrey Zhang	85cc61a6d9	Use /var/lib/ceph/osd folder to filter osd mount point In some cases, the user may mount a partition on /var/lib/ceph; unmounting it will fail, and there is no need to do so anyway. Signed-off-by: Jeffrey Zhang <zhang.lei.fly@gmail.com>	2018-08-14 13:00:24 +00:00
Guillaume Abrioux	9801bde4d4	purge_cluster: fix dmcrypt purge dmcrypt devices aren't closed properly, therefore, it may fail when trying to redeploy after a purge. Typical errors: ``` ceph-disk: Cannot discover filesystem type: device /dev/sdb1: Command '/sbin/blkid' returned non-zero exit status 2 ``` ``` ceph-disk: Error: unable to read dm-crypt key: /var/lib/ceph/osd-lockbox/c6e01af1-ed8c-4d40-8be7-7fc0b4e104cf: /etc/ceph/dmcrypt-keys/c6e01af1-ed8c-4d40-8be7-7fc0b4e104cf.luks.key ``` Closing properly dmcrypt devices allows to redeploy without error. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-21 08:23:10 +02:00
Guillaume Abrioux	a9247c4de7	purge_cluster: wipe all partitions In order to ensure there is no leftover after having purged a cluster, we must wipe all partitions properly. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-17 08:37:17 +02:00
Guillaume Abrioux	9cad113e2f	purge_cluster: fix bug when building device list there is some leftover on devices when purging osds because of a invalid device list construction. typical error: ``` changed: [osd3] => (item=/dev/sda sda1) => { "changed": true, "cmd": "# if the disk passed is a raw device AND the boot system disk\n if parted -s \"/dev/sda sda1\" print | grep -sq boot; then\n echo \"Looks like /dev/sda sda1 has a boot partition,\"\n echo \"if you want to delete specific partitions point to the partition instead of the raw device\"\n echo \"Do not use your system disk!\"\n exit 1\n fi\n echo sgdisk -Z \"/dev/sda sda1\"\n echo dd if=/dev/zero of=\"/dev/sda sda1\" bs=1M count=200\n echo udevadm settle --timeout=600", "delta": "0:00:00.015188", "end": "2018-05-16 12:41:40.408597", "item": "/dev/sda sda1", "rc": 0, "start": "2018-05-16 12:41:40.393409" } STDOUT: sgdisk -Z /dev/sda sda1 dd if=/dev/zero of=/dev/sda sda1 bs=1M count=200 udevadm settle --timeout=600 STDERR: Error: Could not stat device /dev/sda sda1 - No such file or directory. ``` the devices list in the task `resolve parent device` isn't built properly because the command used to resolve the parent device doesn't return the expected output eg: ``` changed: [osd3] => (item=/dev/sda1) => { "changed": true, "cmd": "echo /dev/$(lsblk -no pkname \"/dev/sda1\")", "delta": "0:00:00.013634", "end": "2018-05-16 12:41:09.068166", "item": "/dev/sda1", "rc": 0, "start": "2018-05-16 12:41:09.054532" } STDOUT: /dev/sda sda1 ``` For instance, it will result with a devices list like: `['/dev/sda sda1', '/dev/sdb', '/dev/sdc sdc1']` where we expect to have: `['/dev/sda', '/dev/sdb', '/dev/sdc']` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-17 08:37:17 +02:00
Andrew Schoen	08f4875533	ceph_volume: refactor to not run ceph osd destroy This changes state to action and gives the options 'create' or 'zap'. The zap parameter is also removed. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-04-10 14:19:21 +02:00
Andrew Schoen	c6e8f8fb11	purge-cluster: no need to use objectstore for ceph_volume module When zapping objectstore is not required. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-04-10 14:19:21 +02:00
Andrew Schoen	c29a75ac7f	purge-cluster: use ceph_volume module to zap and destroy OSDs Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-04-10 14:19:21 +02:00
Guillaume Abrioux	dd0c98c5a2	common: do not use `shell` module when it is not needed There is no need here to use `shell` instead of `command` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-31 10:45:34 +01:00
Guillaume Abrioux	deaf273b25	syntax: change local_action syntax Use a nicer syntax for `local_action` tasks. We used to have oneliner like this: ``` local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }} ``` The usual syntax: ``` local_action: module: wait_for port: 22 host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}" state: started delay: 10 timeout: 500 ``` is nicer and kind of way to keep consistency regarding the whole playbook. This also fix a potential issue about missing quotation : ``` Traceback (most recent call last): File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module> main() File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin) File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command File "/usr/lib64/python2.7/shlex.py", line 279, in split return list(lex) File "/usr/lib64/python2.7/shlex.py", line 269, in next token = self.get_token() File "/usr/lib64/python2.7/shlex.py", line 96, in get_token raw = self.read_token() File "/usr/lib64/python2.7/shlex.py", line 172, in read_token raise ValueError, "No closing quotation" ValueError: No closing quotation ``` writing `local_action: shell echo {{ fsid }} | tee {{ fetch_directory }}/ceph_cluster_uuid.conf` can cause trouble because it's complaining with missing quotes, this fix solves this issue. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-31 10:45:34 +01:00
Guillaume Abrioux	c5b7b37105	purge-cluster: clean some code Avoid using regexp to match device Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-12-20 17:42:45 +01:00
Guillaume Abrioux	eeedefdf02	purge-cluster: wipe disk using dd `bluestore_purge_osd_non_container` scenario is failing because it keeps old osd_uuid information on devices and cause the `ceph-disk activate` to fail when trying to redeploy a new cluster after a purge. typical error seen : ``` 2017-12-13 14:29:48.021288 7f6620651d00 -1 bluestore(/var/lib/ceph/tmp/mnt.2_3gh6/block) _check_or_set_bdev_label bdev /var/lib/ceph/tmp/mnt.2_3gh6/block fsid 770080e2-20db-450f-bc17-81b55f167982 does not match our fsid f33efff0-2f07-4203-ad8d-8a0844d6bda0 ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-12-20 17:42:45 +01:00
Guillaume Abrioux	aaaf980140	purge: fix bug on 'wait_for' task This task hangs because `{{ inventory_hostname }}` doesn't resolve to an actual IP address. Using `hostvars[inventory_hostname]['ansible_default_ipv4']['address']` should fix this because it will reach the node with its actual IP address. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-29 11:10:56 +01:00
Guillaume Abrioux	947766e294	purge-cluster: remove usage of `with_fileglob` `with_fileglob` loops over files on the machine where ansible-playbook is being run. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-21 08:24:11 +01:00
Sébastien Han	2837d0a22e	purge: do not reboot by default Rebooting servers is really intrusive and perhaps this is not what the operator wants. So we disable the reboot by default now. Note that the reboot might not happen all the time. It can be enabled by default by running the purge playbook with -e reboot_osd_node=True Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1505011 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-26 14:18:38 +02:00
Sébastien Han	24b82c2679	purge: fix journal purge Using a condition when osd_scenario == 'non-collocated' was wrong since these partitions can be collocated on a single device also. Removing the check ensures these partitions are purged. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1499871 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-10 09:57:39 +02:00
Guillaume Abrioux	f147b119ed	Merge pull request #2014 from ceph/fixes-2 infra: use the pg check in the right place	2017-10-09 20:14:06 +02:00
Sébastien Han	450108fab9	infra: add independant purge-iscsi-gateways.yml The current inclusion of purge-iscsi-gateways.yml in purge-cluster.yml is not working well and blocking the CI too. So removing it from purge-cluster.yml and re-add the original purge-iscsi-gateways.yml. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-09 17:25:44 +02:00
Boris Ranto	64e272d818	purge-cluster: Do not use shell for rm The shell wildcard expansion of non-existing paths fails on zsh making the whole script fail. We can use file module with with_fileglob to alleviate the problem instead. Signed-off-by: Boris Ranto <branto@redhat.com>	2017-10-06 22:54:37 +02:00
Boris Ranto	f696cb7637	purge-cluster: Do not fail on systemd commands The systemd can't stop services if the unit files were removed before the cluster was purged. We should just ignore these. Signed-off-by: Boris Ranto <branto@redhat.com>	2017-10-06 22:52:56 +02:00
Sébastien Han	b6b24a5ca9	iscsi: fix wrong group name for iscsi Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1498490 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-05 17:25:32 +02:00
zhangwentao	86a6db0d58	purge-cluster: delete block partitions if using bluestore	2017-09-29 14:04:17 +08:00
Andrew Schoen	fccc604f4a	purge-cluster: default lvm_volumes if not defined Most osd scenarios do not use lvm_volumes, so default it in purge-cluster.yml if it's not defined. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-09-26 15:14:29 -05:00
Guillaume Abrioux	c80ba7a307	purge: implement mgr purge Until now, mgr nodes were not managed by purge-cluster.yml, which breaks scenarios like purge_cluster. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-24 21:18:50 +02:00
Sébastien Han	ba3e3b6cc7	purge: only purge specific directories for mon Handles the case when a mon is collocated with an OSD. Closes: https://github.com/ceph/ceph-ansible/issues/1877 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-13 17:07:04 -06:00
Sébastien Han	aa364264cd	resync ceph-iscsi-gw with old upstream Taken from https://github.com/pcuzner/ceph-iscsi-ansible/tree/tcmu-fixes Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1454945 and https://bugzilla.redhat.com/show_bug.cgi?id=1484083 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-12 18:06:10 -06:00
Sébastien Han	b9ced956d7	purge: get lockbox mountpoint and unmount it Prior command was avoiding the lockbox mountpoint and the playbook was failing with: rmtree failed: [Errno 30] Read-only file system: '/var/lib/ceph/osd-lockbox/4e9d8052-87c2-4fde-a56c-b8c108a3eefc/key-management-mode' Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-07 16:31:31 +02:00
Ben England	617d9ee75d	dont use devices var anymore, works for osd_auto_discover	2017-08-28 17:27:01 -04:00
Andrew Schoen	bed57572cc	purge-cluster: adds support for purging lvm osds This also adds a new testing scenario for purging lvm osds Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-08-23 10:33:35 -05:00
Sébastien Han	9c824b9818	purge: add ability to purge bluestore osd We now purge block db and/or wal partitions if we find any. Closes: https://github.com/ceph/ceph-ansible/issues/1770 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-21 18:08:18 +02:00
Sébastien Han	30991b1c0a	osd: simplify scenarios There are only two main scenarios now: * collocated: everything remains on the same device: - data, db, wal for bluestore - data and journal for filestore * non-collocated: dedicated device for some of the components Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-03 10:20:39 +02:00
Sébastien Han	fad9d0caec	Merge pull request #1690 from yanyixing/master fix: when osd device is a disk partition	2017-07-26 15:55:29 +02:00
yanyx	2e6233271e	fix: when osd device is a disk partition	2017-07-25 21:39:43 +08:00
Sébastien Han	0c18cf199e	purge: remove leftover unit files Closes https://github.com/ceph/ceph-ansible/issues/1672 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-07-25 13:26:28 +02:00
Andrew Schoen	5a3f95dfc1	purge-cluster: check for any running ceph process after purge Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-04-25 09:30:22 -05:00
Andrew Schoen	26bdd59f5d	purge-cluster: we don't support sysv or upstart anymore Now that ceph-ansible only supports > jewel we don't need to bother with sysv or upstart Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-04-21 15:14:38 -07:00
Andrew Schoen	7ca2bddcce	purge-cluster: do not need to check for running ceph processes Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-04-21 15:12:46 -07:00
Andrew Schoen	aac79df3b3	purge-cluster: no need to remove ceph.target The package uninstalls will stop ceph.target Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-04-21 15:11:03 -07:00
Daniel Lupescu	d5e56c481a	purge-cluster: fix grep match for NVMe and HP Smart Array devices raw_device would return invalid block device names for NVMe and HPSA devices which would cause sgdisk partition deletion to fail $ echo /dev/nvme1n1p3 | egrep -o '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p){1,2}' /dev/nvme1n1p $ echo /dev/cciss/c0d0p2 | egrep -o '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p){1,2}' /dev/cciss/c0d0p
	2017-04-11 16:13:28 +03:00
Sébastien Han	c37aaa41f4	playbook: homogenize the way list osd ids Problem: too many different commands to do the same thing. The 'cut' command on infrastructure-playbooks/purge-cluster.yml was also wrong. This sed command from osixia in ceph-docker https://github.com/ceph/ceph-docker/pull/580/ addresses all the scenarios. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-03-30 11:51:38 +02:00
Andrew Schoen	4fe6607004	purge-cluster: do not set group name vars at playbook level This has the behavior of overriding custom values set in group_vars. I've added defaults to the rest of the group names so that if they are not overridden in group_vars then defaults will be used. See: https://bugzilla.redhat.com/show_bug.cgi?id=1354700 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-03-08 08:57:08 -06:00
Shengjing Zhu	32923fd217	fix grep match pattern for osd ids Some playbooks use [0-9]*, others use \d+$ The latter is more correct since cluster name may contain numbers. Signed-off-by: Shengjing Zhu <zsj950618@gmail.com>	2017-02-20 16:35:56 +08:00
Andrew Schoen	22f52a9dc6	purge-cluster: also purge dmcrypt dedicated journals See: https://bugzilla.redhat.com/show_bug.cgi?id=1414647 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-02-15 10:27:17 -06:00
Andrew Schoen	c5f561a4e9	purge-cluster: remove calamari-server package See: https://bugzilla.redhat.com/show_bug.cgi?id=1422134 Signed-off-by: Andrew Schoen <aschoen@redhat.com> Resolves rhbz#1422134	2017-02-14 09:24:02 -06:00
Andrew Schoen	865b4500dc	purge-cluster: set a default value for fetch_directory if not defined Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-02-08 06:25:43 -06:00
Andrew Schoen	adf6aee643	purge-cluster: remove all include tasks Including variables from role defaults or files in a group_vars directory relative to the playbook is a bad practice. We don't want to do this because including these defaults at the task level overrides values that would be set in a group_vars directory relative to the inventory file, which is the correct usage if you wish to override those default values. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-02-08 06:25:43 -06:00
Andrew Schoen	0476b24af1	purge-cluster: do not use ceph-detect-init We can not always ensure that ceph-detect-init will be present on the system. See: https://bugzilla.redhat.com/show_bug.cgi?id=1418980 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-02-08 06:24:44 -06:00
Sébastien Han	72cd9199ac	purge: ability to purge client role Signed-off-by: Sébastien Han <seb@redhat.com>	2017-02-07 22:14:18 +01:00
Sébastien Han	d5dd658cfa	purge: do not stop ceph.target on each daemon Doing this causes all the daemons to go down at the same time. In a scenario where we colocate a monitor and an osd, the osd will take some time to go down, which will make the 'umount' task fail. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-01-30 14:31:56 +01:00
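
The device-name problem described in d5e56c481a can be reproduced outside the playbook. The following is a minimal sketch, assuming that Python's `re` module behaves like `egrep -o` for this particular pattern (it uses only constructs common to both). The pattern is copied from the commit message; the helper `raw_device` is hypothetical and only borrows the name of the variable mentioned there.

```python
import re

# Pattern quoted in the commit message of d5e56c481a (the pre-fix version).
OLD_PATTERN = re.compile(
    r"/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p){1,2}"
)


def raw_device(partition):
    """Hypothetical helper: return what the old pattern extracts, or None."""
    match = OLD_PATTERN.search(partition)
    return match.group(0) if match else None


# For sdX partitions the result is the parent disk; for NVMe and HP Smart
# Array partitions the trailing "p" separator is kept, so the result is not
# a valid block device name.
for part in ("/dev/sda1", "/dev/nvme1n1p3", "/dev/cciss/c0d0p2"):
    print(part, "->", raw_device(part))
```

The NVMe and cciss results match the output quoted in the commit message (`/dev/nvme1n1p` and `/dev/cciss/c0d0p`), which is why a later step such as sgdisk could not find the device.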


78 Commits (9fe86c22682c7e5eddc610d97c12d4e7eb254102)