ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Andrew Schoen	699c777e68	rolling update: fix undefined jewel_minor_update failure Variables set at the play level with ``vars`` do not carry over into the next play in the playbook. The var jewel_minor_update was set in a previous play but used in this one and was failing because it was not defined. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1544029 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-02-13 17:03:05 +01:00
Andrew Schoen	7c7017ebe6	infra: do not include host_vars/* in take-over-existing-cluster.yml These are better collected by ansible automatically. This would also fail if the host_var file didn't exist. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-02-12 11:48:47 +01:00
Guillaume Abrioux	3b2f6c34e4	purge-docker: fix ceph-osd-zap name container the `zap ceph osd disks` task should iterate over `resolved_parent_device` instead of `combined_devices_list` which contains only the base device name (vs. full path name in `combined_devices_list`). this fixes the issue where docker complains about container name because of illegal characters such as `/` : ``` "/usr/bin/docker-current: Error response from daemon: Invalid container name (ceph-osd-zap-magna074-/dev/sdb1), only [a-zA-Z0-9][a-zA-Z0-9_.-] are allowed.","See '/usr/bin/docker-current run --help'." "" ``` having the basename of the device path is enough for the container name. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1540137 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-02-02 22:09:11 +01:00
Guillaume Abrioux	dd0c98c5a2	common: do not use `shell` module when it is not needed There is no need here to use `shell` instead of `command` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-31 10:45:34 +01:00
Guillaume Abrioux	deaf273b25	syntax: change local_action syntax Use a nicer syntax for `local_action` tasks. We used to have oneliner like this: ``` local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }} ``` The usual syntax: ``` local_action: module: wait_for port: 22 host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}" state: started delay: 10 timeout: 500 ``` is nicer and kind of way to keep consistency regarding the whole playbook. This also fix a potential issue about missing quotation : ``` Traceback (most recent call last): File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module> main() File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin) File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command File "/usr/lib64/python2.7/shlex.py", line 279, in split return list(lex) File "/usr/lib64/python2.7/shlex.py", line 269, in next token = self.get_token() File "/usr/lib64/python2.7/shlex.py", line 96, in get_token raw = self.read_token() File "/usr/lib64/python2.7/shlex.py", line 172, in read_token raise ValueError, "No closing quotation" ValueError: No closing quotation ``` writing `local_action: shell echo {{ fsid }} \| tee {{ fetch_directory }}/ceph_cluster_uuid.conf` can cause trouble because it's complaining with missing quotes, this fix solves this issue. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-31 10:45:34 +01:00
Guillaume Abrioux	f372a4232e	purge: fix resolve parent device task This is a typo caused by leftover. It was previously written like this : `shell: echo /dev/$(lsblk -no pkname "{{ item }}") }}")` and has been rewritten to : `shell: $(lsblk --nodeps -no pkname "{{ item }}") }}")` because we are appending later the '/dev/' in the next task. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1540137 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-30 17:40:10 +01:00
Guillaume Abrioux	c7ec12d49c	upgrade: skip luminous tasks for jewel minor update These tasks are needed only when upgrading to luminous. They are not needed in Jewel minor upgrade and by the way, they fail because `ceph versions` command doesn't exist. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-25 18:30:34 +01:00
Sébastien Han	8af7459476	rolling update: add mgr exception for jewel minor updates When updating from a minor Jewel version to another, the playbook will fail on the task "fail if no mgr host is present in the inventory". This now can be worked around by running Ansible with -e jewel_minor_update=true Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1535382 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-01-18 14:06:05 +01:00
Guillaume Abrioux	55298fa80c	purge-container: use lsblk to resolve parent device Using `lsblk` to resolve the parent device is better than just removing the last char when passing it to the zap container. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-17 15:54:20 +01:00
Guillaume Abrioux	58eb045d2f	purge-container: remove awk usage in favor of blkid Avoid using `awk` to get the different devices from the partlabel. Using `blkid` is more readable. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-17 15:54:20 +01:00
Andrew Schoen	b613321c21	switch-to-containers: do not fail when stopping the nfs-ganesha service If we're working with a jewel cluster then this service will not exist. This is mainly a problem with CI testing because our tests are set up to work with both jewel and luminous, meaning that even though we want to test jewel we still have a nfs-ganesha host in the test causing these tasks to run. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-01-06 14:07:55 +01:00
Andrew Schoen	0b4b60e3c9	switch-to-containers: do not fail when stopping the ceph-mgr daemon If we are working with a jewel cluster ceph mgr does not exist and this makes the playbook fail. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-01-06 14:07:55 +01:00
Andrew Schoen	997edea271	rolling_update: do not fail the playbook if nfs-ganesha is not present The rolling update playbook was attempting to stop the nfs-ganesha service on nodes where jewel is still installed. The nfs-ganesha service did not exist in jewel so the task fails. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-01-06 14:07:55 +01:00
Guillaume Abrioux	c5b7b37105	purge-cluster: clean some code Avoid using regexp to match device Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-12-20 17:42:45 +01:00
Guillaume Abrioux	eeedefdf02	purge-cluster: wipe disk using dd `bluestore_purge_osd_non_container` scenario is failing because it keeps old osd_uuid information on devices and cause the `ceph-disk activate` to fail when trying to redeploy a new cluster after a purge. typical error seen : ``` 2017-12-13 14:29:48.021288 7f6620651d00 -1 bluestore(/var/lib/ceph/tmp/mnt.2_3gh6/block) _check_or_set_bdev_label bdev /var/lib/ceph/tmp/mnt.2_3gh6/block fsid 770080e2-20db-450f-bc17-81b55f167982 does not match our fsid f33efff0-2f07-4203-ad8d-8a0844d6bda0 ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-12-20 17:42:45 +01:00
Sébastien Han	200785832f	rolling_update: do not require root to answer question There is no need to ask for root on the local action. This will prompt for a password if the current user is not part of sudoers. That's unnecessary anyways. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1516947 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-12-19 14:04:55 +01:00
Guillaume Abrioux	aaaf980140	purge: fix bug on 'wait_for' task this task hangs because `{{ inventory_hostname }}` doesn't resolve to an actual ip address. Using `hostvars[inventory_hostname]['ansible_default_ipv4']['address']` should fix this because it will reach the node with its actual IP address. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-29 11:10:56 +01:00
Guillaume Abrioux	947766e294	purge-cluster: remove usage of `with_fileglob` `with_fileglob` loops over files on the machine where ansible-playbook is being run. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-21 08:24:11 +01:00
Guillaume Abrioux	d9c1b61092	purge-docker: remove osd disk prepare logs `with_fileglob` loops over files on the machine that runs the playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-16 14:27:36 +01:00
Sébastien Han	68566444e9	Merge pull request #2142 from squidboylan/master infra: fix take-over-existing-cluster.yml playbook	2017-11-13 22:06:16 +11:00
Guillaume Abrioux	fa675f2ead	purge-docker-cluster: ensure old logs are removed purge-docker-cluster must remove all osd_disk_prepare logs in `{{ ceph_osd_docker_run_script_path }}`, otherwise if you purge your cluster and try to redeploy it, osds will fail to start because they will try to retrieve a partition uuid which doesn't exist. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1510470 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-09 17:49:20 +01:00
Caleb Boylan	41d10a2f64	infra: fix take-over-existing-cluster.yml playbook The ansible inventory could have more than just ceph-ansible hosts, so we shouldn't use "hosts: all", also only grab one file when getting the ceph cluster name instead of failing when there is more than one file in /etc/ceph. Also fix location of the ceph.conf template	2017-11-06 15:00:30 -08:00
Sébastien Han	473673ab41	shrink-mon: fix typo in the code doc Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-27 11:59:22 +02:00
Sébastien Han	2837d0a22e	purge: do not reboot by default Rebooting servers is really intrusive and perhaps this is not what the operator wants. So we disable the reboot by default now. Note that the reboot might not happen all the time. It can be enabled by default by running the purge playbook with -e reboot_osd_node=True Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1505011 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-26 14:18:38 +02:00
Guillaume Abrioux	f90f2f3a04	purge: containers are not stopped During purge osd, the containers are not stopped because of a typo, as a result, all the devices can't be unmounted later. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-10-25 07:58:00 +02:00
Sébastien Han	4413511b66	all: backward compatibility between stable-2.2 and 3.0 stable-3.0 brought numerous changes in ceph-ansible variables, this PR aims to maintain backward compatibility for someone running stable-2.2 upgrading to stable-3.0 but keeps its groups_vars untouched. We will then determine the right options to make sure the upgrade works but we are expecting that new variables should be used. We will drop this in a near future, maybe 3.1 or 3.2. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-20 11:54:10 +02:00
Guillaume Abrioux	982326373b	upgrade: fix upgrade jewel to luminous for nfs nodes nfs nodes can't be upgraded from jewel to luminous because ceph-nfs role is skipped because of the condition `when: "ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Indeed, package is upgraded in `ceph-nfs` role, therefore, `ceph_release` is still set to the old version. It means the when can't be satisfied. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-10-19 20:54:23 +02:00
Guillaume Abrioux	70034451e9	upgrade: fix upgrade jewel to luminous for mgr nodes mgr nodes can't be upgraded from jewel to luminous because ceph-mgr role is skipped because of the condition `when: "ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Indeed, ceph-mgr package is upgraded in `ceph-mgr` role, therefore, `ceph_release` is still set to the old version. It means the when can't be satisfied. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit 302e563601cd6820b1ae44fabdfb1506688c7c9b)	2017-10-19 20:54:23 +02:00
Sébastien Han	d920d4839d	upgrade: support for rbd mirror and nfs - Add upgrade support for rbd mirror and nfs daemons. - Only works with systemd (remove sysvinit and upstart occurrences) - A bit of cleanup Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-17 10:54:47 +02:00
Sébastien Han	39bf102b64	switch: nicer way to check mon quorum re-use the same syntax as rolling_update.yml Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-17 10:54:36 +02:00
Sébastien Han	b685aceede	Merge pull request #2044 from major/avoid-jinja-in-when Remove jinja2 delimiters from `when` keys	2017-10-12 22:23:06 +02:00
Major Hayden	c01851325e	Remove jinja2 delimiters from `when` keys This patch changes the `when:` keys so that they have no jinja2 delimiters. This avoids Ansible warnings which could turn into errors in a future Ansible release.	2017-10-12 11:27:42 -05:00
Major Hayden	33b200d43a	Suppress yum/dnf/rpm command warnings Ansible throws warnings when using yum/dnf/rpm with the command module: [WARNING]: Consider using yum module rather than running yum This patch adds the `warn: no` argument to suppress the warnings in the Ansible output.	2017-10-12 08:38:05 -05:00
Sébastien Han	13bce287ad	infra: replace osd playbook This playbook can replace failed OSD in containerized and non-containerized env. The current limitation is that it won't allow you to choose between filestore/bluestore and will do collocation as well. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-12 11:53:30 +02:00
Sébastien Han	85e13a864c	purge-iscsi: fix group name Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1500281 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-11 12:52:12 +02:00
Sébastien Han	24b82c2679	purge: fix journal purge Using a condition when osd_scenario == 'non-collocated' was wrong since these partitions can be collocated on a single device also. Removing the check makes the purge of these partitions. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1499871 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-10 09:57:39 +02:00
Guillaume Abrioux	f147b119ed	Merge pull request #2014 from ceph/fixes-2 infra: use the pg check in the right place	2017-10-09 20:14:06 +02:00
Sébastien Han	450108fab9	infra: add independent purge-iscsi-gateways.yml The current inclusion of purge-iscsi-gateways.yml in purge-cluster.yml is not working well and blocking the CI too. So removing it from purge-cluster.yml and re-add the original purge-iscsi-gateways.yml. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-09 17:25:44 +02:00
Sébastien Han	774697ebd8	infra: use the pg check in the right place Use the pg check before doing the pg check, not on the quorum check. Also never quote int when doing comparison. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-09 17:25:41 +02:00
Sébastien Han	a3e7bcb13f	Merge pull request #2013 from ceph/wip-purge-cluster A couple of purge cluster fixes	2017-10-09 17:18:30 +02:00
Sébastien Han	33a3aa0dda	switch: check pgs only when num_pgs > 0 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-07 03:42:09 +02:00
Sébastien Han	05f26031ea	rolling_update: perform pg check when pgs_num > 0 If num_pgs = 0 the check will never return 0. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-07 03:39:09 +02:00
Sébastien Han	c3c63ae539	switch: rework and fix clean pg wait Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-07 03:39:09 +02:00
Sébastien Han	c693e95cbf	purge-docker: rework device detection we don't need "devices" and other device variable anymore, the playbook detects that for us. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-07 03:39:04 +02:00
Sébastien Han	2fb4981ca9	shrink-osd: admin key not needed for container shrink Also do some clean Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-07 00:20:43 +02:00
Boris Ranto	64e272d818	purge-cluster: Do not use shell for rm The shell wildcard expansion of non-existing paths fails on zsh making the whole script fail. We can use file module with with_fileglob to alleviate the problem instead. Signed-off-by: Boris Ranto <branto@redhat.com>	2017-10-06 22:54:37 +02:00
Boris Ranto	f696cb7637	purge-cluster: Do not fail on systemd commands The systemd can't stop services if the unit files were removed before the cluster was purged. We should just ignore these. Signed-off-by: Boris Ranto <branto@redhat.com>	2017-10-06 22:52:56 +02:00
Sébastien Han	b6b24a5ca9	iscsi: fix wrong group name for iscsi Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1498490 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-05 17:25:32 +02:00
Sébastien Han	f37e014a65	Merge pull request #1974 from ceph/mgr-upgrade-luminous upgrade: a support for mgrs	2017-10-03 19:57:31 +02:00
Sébastien Han	99466e79a1	upgrade: a support for mgrs Also we now play ceph-config to have everything being generated for new daemons bootstrap during upgrade. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1497959 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-03 16:57:31 +02:00
Sébastien Han	3bd341f6c0	osd: container use id instead of dev name Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1494127 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-03 14:44:00 +02:00
Sébastien Han	3c2c31a591	Merge pull request #1964 from vatelzh/master purge-cluster: delete block partitions if using bluestore	2017-10-02 12:10:26 +02:00
Sébastien Han	b9050d6229	update: fix var register Even if the task is skipped, ansible registers the var as 'skipped' so this breaks the task using this variable for its next usage. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-29 14:27:55 +02:00
zhangwentao	86a6db0d58	purge-cluster: delete block partitions if using bluestore	2017-09-29 14:04:17 +08:00
Sébastien Han	a0a5b174ba	rolling_update: clarify mon quorum command Cleaner. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-29 01:19:46 +02:00
Sébastien Han	bd5471b940	update: complete luminous upgrade Once we complete the upgrade to Luminous, we must issue a specific command. For more info read: http://ceph.com/community/new-luminous-upgrade-complete/ Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-28 21:05:00 +02:00
Sébastien Han	68f1f99ee9	update: nicer way to wait for clean pgs More comprehensive and friendly to read. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-28 14:46:26 +02:00
Andrew Schoen	fccc604f4a	purge-cluster: default lvm_volumes if not defined Most osd scenarios do not use lvm_volumes, so default it in purge-cluster.yml if it's not defined. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-09-26 15:14:29 -05:00
Guillaume Abrioux	fcb6454e04	rbd-mirror: fix systemd unit in purge-docker rbd-mirror containers are not stopped in purge-docker-cluster playbook because of the wrong name used. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-24 21:18:50 +02:00
Guillaume Abrioux	c80ba7a307	purge: implement mgr purge until now, mgr nodes are not managed by purge-cluster.yml, therefore it breaks scenario like purge_cluster. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-24 21:18:50 +02:00
Guillaume Abrioux	7195b08718	update: update rgw systemd unit name The old name is used in `rolling_update.yml` and `purge-docker-cluster.yml`, it breaks the `test_rgw_service_is_running()` test. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-24 14:58:55 +02:00
Sébastien Han	6bac613611	shrink: support for container We can now shrink mon and osds on containerized deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492115 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-20 16:25:07 +02:00
Sébastien Han	7fedc8ebf4	Merge pull request #1891 from ceph/clarify-update rolling_update: clarify update doc	2017-09-15 07:08:49 -06:00
Sébastien Han	fe1d84d395	Merge pull request #1892 from ceph/purge-dmcrypt-col purge: only purge specific directories for mon	2017-09-13 17:57:06 -06:00
Sébastien Han	ba3e3b6cc7	purge: only purge specific directories for mon Handles the case when a mon is collocated with an OSD. Closes: https://github.com/ceph/ceph-ansible/issues/1877 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-13 17:07:04 -06:00
Sébastien Han	82c4848ec4	Merge pull request #1885 from ceph/shrink-osd shrink-osd: fix when multiple osds	2017-09-13 16:12:49 -06:00
Sébastien Han	92f9be963b	rolling_update: clarify update doc Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1490188 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-13 15:46:29 -06:00
Sébastien Han	3031e51778	shrink-osd: fix when multiple osds The loop was not being built properly so we were always getting the last item as osd host. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1490355 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-13 15:20:11 -06:00
Sébastien Han	aa364264cd	resync ceph-iscsi-gw with old upstream Taken from https://github.com/pcuzner/ceph-iscsi-ansible/tree/tcmu-fixes Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1454945 and https://bugzilla.redhat.com/show_bug.cgi?id=1484083 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-12 18:06:10 -06:00
Sébastien Han	477f86e305	switch to container: fix ceph nfs The service is nfs-ganesha where ceph-nfs@{{ ansible_hostname }} will be the name of the container. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-08 22:43:50 +02:00
Sébastien Han	fdacac9fa0	switch: make osd collection idempotent This commits allows us to run switch-from-non-containerized-to-containerized-ceph-daemons.yml multiple times. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1489353 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-08 11:31:47 +02:00
Sébastien Han	e46440e19c	switch-from-non-containerized-to-containerized: fix devices If devices is passed through an extra var this register won't work so let's only register the var if devices is not defined. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1489099 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-07 23:18:14 +02:00
Sébastien Han	b9ced956d7	purge: get lockbox mountpoint and unmount it Prior command was avoiding the lockbox mountpoint and the playbook was failing with: rmtree failed: [Errno 30] Read-only file system: '/var/lib/ceph/osd-lockbox/4e9d8052-87c2-4fde-a56c-b8c108a3eefc/key-management-mode' Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-07 16:31:31 +02:00
Guillaume Abrioux	d987d26719	tests: force docker variable for switch-to-containers scenario we need to force the value of `docker` variable which is initially set to `false` since it's a migration from non-containerized to containerized cluster. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-06 18:03:52 +02:00
Sébastien Han	b7db600caa	switch-from-non-containerized-to-containerized: mask unit files We must mask the image so we are sure that even if the system reboots then the OSDs won't start. Also remove Ceph udev rules if found on the system prior to deploy containers. If we don't do this we are exposed to conflicts between udev rules and systemd unit files. Also, the CI will now test the migration from a non-containerized cluster to a containerized cluster. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-05 15:20:31 +02:00
Sébastien Han	579b95fd8a	shrink-mon: wait a little bit for the mon to be out Monitor removal from the monmap is not immediate, so let's wait a little bit and then fail if the monitor is still in the monmap. We try twice in total with 10 sec intervals. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-04 23:08:57 +02:00
Sébastien Han	54d7a81241	infra playbook: move untested scenario to a new dir Move untested/low-confidence playbooks into an untested-by-ci directory. Also removing this directory from the package build. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1461551 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-01 19:58:24 +02:00
Sébastien Han	298a63c437	shrink mon and osd Rework shrinking a monitor and an OSD playbook. Also adding test scenario. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1366807 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-01 19:12:00 +02:00
Sébastien Han	e0a264c7e9	osd: allow multi dedicated journals for containers Fix: https://bugzilla.redhat.com/show_bug.cgi?id=1475820 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-30 12:34:06 +02:00
Ben England	617d9ee75d	dont use devices var anymore, works for osd_auto_discover	2017-08-28 17:27:01 -04:00
Sébastien Han	0205f6d645	rolling_update: nicer way to set osd flags Prior to this patch, we were applying the osd flags like this: " General pre tasks Set flags Upgrade OSDs on a host Unset flags <-- this triggers pending scrub to start Set flags Upgrade OSDs on a hosts Unset flags <-- this triggers pending scrub to start . . . General post tasks " Now instead, we apply the flag once before starting the OSD update and unset them once the last OSD is finished. " General pre tasks Set flags and wait for any scrubs to finish Upgrade OSDs on a host Upgrade OSDs on a host . . . Unset flags General post tasks " Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1450754 Signed-off-by: Sébastien Han <seb@redhat.com> Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-08-25 18:21:28 +02:00
Sébastien Han	4a4a20f07d	rolling update: skip pg check if num_pgs = 0 In our test case we don't have any pgs, thus the check fails. The check always returns an empty array, which makes the comparison fail. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-24 08:50:49 +02:00
Alfredo Deza	e651469a2a	Merge pull request #1797 from ceph/purge-lvm adds purge support for the lvm_osds osd scenario	2017-08-23 14:28:29 -04:00
Sébastien Han	f2499ff5ac	Merge pull request #1788 from ceph/improve-switch switch-from-non-containerized-to-containerized: simplify	2017-08-23 19:47:26 +02:00
Sébastien Han	4f0ecb7f30	switch-from-non-containerized-to-containerized: simplify This commit eases the use of the infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml playbook. We basically run it with a couple of pre-tasks and then we let the playbook run the docker roles. It obviously expects to have proper variables configured in order to work. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-23 18:39:45 +02:00
Andrew Schoen	bed57572cc	purge-cluster: adds support for purging lvm osds This also adds a new testing scenario for purging lvm osds Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-08-23 10:33:35 -05:00
Sébastien Han	1ac0969c28	Merge pull request #1778 from ceph/fix-1770 purge: add ability to purge bluestore osd	2017-08-22 23:56:36 +02:00
Giulio Fidente	2c01de4350	Default cluster to ceph in switch to containers	2017-08-22 13:13:36 +02:00
Giulio Fidente	f0423b1804	Parse ceph_docker_registry in switch to containers Defaults it to docker.io as it was for backward compatibility.	2017-08-22 13:11:27 +02:00
Giulio Fidente	a59b84d5c9	Assume mon_docker_privileged false in switch to containers	2017-08-22 13:01:25 +02:00
Giulio Fidente	0106fa6835	Consume public_network vs ceph_mon_docker_subnet In the switch to containers migration there were broken references to ceph_mon_docker_subnet variable, replaced with public_network. Also fixes references to ceph_mon_docker_extra_env setting for it a default as it could be undefined.	2017-08-21 18:34:24 +02:00
Giulio Fidente	386303d42e	Extend set_uid fact to support RH Ceph images	2017-08-21 18:32:08 +02:00
Sébastien Han	9c824b9818	purge: add ability to purge bluestore osd We now purge block db and/or wal partitions if we find any. Closes: https://github.com/ceph/ceph-ansible/issues/1770 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-21 18:08:18 +02:00
Andrew Schoen	d2f4d3666f	Merge pull request #1725 from ceph/simplify-osd-scenario osd: simply osd scenario declaration	2017-08-03 09:31:57 -05:00
Sébastien Han	671f2cd4bc	Merge pull request #1738 from yanyixing/nvmepart fix for nvme part path	2017-08-03 13:37:10 +02:00
yanyx	d506fad056	fix for nvme part path	2017-08-03 17:37:52 +08:00
Sébastien Han	30991b1c0a	osd: simplify scenarios There are only two main scenarios now: * collocated: everything remains on the same device: - data, db, wal for bluestore - data and journal for filestore * non-collocated: dedicated device for some of the component Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-03 10:20:39 +02:00
Sébastien Han	fdc6aebd62	infrastructure-playbooks: update with ceph-defaults roles Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-02 17:12:20 +02:00
Guillaume Abrioux	7a333d05ce	Add handlers for containerized deployment Until now, there were no handlers for containerized deployments. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-08-02 17:12:20 +02:00
Guillaume Abrioux	5adbf0fdaa	Move role dependencies in site.yml/site-docker.yml This will give us more flexibility and avoid a lot of useless when skipping all tasks from a non-desired role. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-08-02 17:12:14 +02:00
Guillaume Abrioux	206c7a16d0	rolling_update: refact code Refact rolling_update playbook. Add ceph-client upgrade. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-08-02 11:10:51 +02:00
yanyx	d0a17b11b2	change the partition's ownership	2017-07-27 11:55:30 +08:00
Sébastien Han	fad9d0caec	Merge pull request #1690 from yanyixing/master fix: when osd device is a disk partition	2017-07-26 15:55:29 +02:00
yanyx	2e6233271e	fix: when osd device is a disk partition	2017-07-25 21:39:43 +08:00
Sébastien Han	0c18cf199e	purge: remove leftover unit files Closes https://github.com/ceph/ceph-ansible/issues/1672 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-07-25 13:26:28 +02:00
Guillaume Abrioux	828f88403e	Update: Avoid screen scraping in rolling update since luminous has revamped the `ceph -s` output, we need to avoid screen scraping. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-07-12 15:02:39 +02:00
Guillaume Abrioux	896d62d78b	Refact: remove ceph_mon_docker_interface variable remove `ceph_mon_docker_interface` and use `monitor_interface` instead for both containerized and non-containerized deployment. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-07-04 18:08:59 +02:00
Guillaume Abrioux	73141118d0	Make the new check PGs working with /bin/sh The new test in the checks PGs are no longer working on distributions where /bin/sh isn't linked to /bin/bash. Fix: #1619 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-06-22 17:59:38 +02:00
David Galloway	127b5ad9b4	infra: Create a backup of ceph.conf when taking over existing cluster Signed-off-by: David Galloway <dgallowa@redhat.com>	2017-06-21 09:53:09 -04:00
David Galloway	40ed2d7be6	infra: Fix ceph.conf creation when taking over existing cluster Fixes bug introduced in https://github.com/ceph/ceph-ansible/pull/1330 The "stat ceph.conf" task was basically using the stat module on a string instead of the ceph.conf filename. This caused the "generate ceph configuration file" task to fail. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1463382 Signed-off-by: David Galloway <dgallowa@redhat.com>	2017-06-21 09:52:01 -04:00
Andrew Schoen	e2104acb62	rolling_update: set health_mon_check_delay to 15 The old value of 10 did not give enough time for a containerized mon to pass the health check. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-06-13 08:56:44 -05:00
Guillaume Abrioux	5af9bb432c	rewrite check pgs clean tasks Avoid screen scraping by rewriting `waiting for clean pgs` tasks like it is done in `304de48`. Use the json output returned by `ceph -s` instead Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-06-13 09:48:56 +02:00
Andrew Schoen	59992c54cc	purge-docker-cluster: include ceph_docker_registry We need to include ceph_docker_registry when removing containers/images because if we don't it will assume docker.io which is not always where the image originated from, causing the playbook to fail. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-06-02 09:49:17 -05:00
Sébastien Han	fdc7866072	Merge pull request #1469 from ceph/refact_code Docker: Refact code	2017-06-02 12:40:25 +02:00
Andrew Schoen	f7677e4393	purge-docker-cluster: pip is only used on Debian We only need to purge packages installed by pip on Debian systems. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-05-31 09:03:44 -05:00
Andrew Schoen	8e322d4825	purge-docker-cluster: default raw_journal_devices to [] If we're purging a containerized cluster that did not use the raw_multi_journal OSD scenario then raw_journal_devices will not be defined which causes the playbook to fail. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1455187 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-05-25 07:30:25 -05:00
Guillaume Abrioux	ddfe019342	Refact code `ceph-docker-common`: At the moment there is a lot of duplicated tasks in each `./roles/ceph-<role>/tasks/docker/main.yml` that could be refactored in `./roles/ceph-docker-common/tasks/main.yml`. `_containerized_deployment` variables: All `_containerized_deployment` have been refactored to a single variable `containerized_deployment` duplicate `cephx` variables in `group_vars/*` have been removed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-05-24 15:55:41 +02:00
Sébastien Han	90389864d8	rolling-update: set/unset flags on the right container Problem: we are delegating the set/unset flag to a monitor node but we try to call an osd container Solution: use the right container name. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-05-22 09:38:08 +02:00
Sébastien Han	b93ffe637b	Merge pull request #1476 from WingkaiHo/improve-shrink-osd.yml improve shrink-osd.yml can shrink osd when disk damage	2017-04-27 11:01:27 +02:00
WingkaiHo	0b9f322ca0	improve shrink-osd.yml can shrink osd when disk damage	2017-04-27 10:26:26 +08:00
Andrew Schoen	5a3f95dfc1	purge-cluster: check for any running ceph process after purge Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-04-25 09:30:22 -05:00
Andrew Schoen	26bdd59f5d	purge-cluster: we don't support sysv or upstart anymore Now that ceph-ansible only supports > jewel we don't need to bother with sysv or upstart Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-04-21 15:14:38 -07:00
Andrew Schoen	7ca2bddcce	purge-cluster: do not need to check for running ceph processes Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-04-21 15:12:46 -07:00
Andrew Schoen	aac79df3b3	purge-cluster: no need to remove ceph.target The package uninstalls will stop ceph.target Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-04-21 15:11:03 -07:00
Sébastien Han	dfd8f4d96e	test: add mgr section to the host inventory file Without this, we don't test the mgr role so we need to add it. Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com> Signed-off-by: Sébastien Han <seb@redhat.com>	2017-04-15 00:16:10 +02:00
Sébastien Han	17ac1fd464	Merge pull request #1443 from WingkaiHo/osds-journal-migrate Migrate osd(s) journal to ssd	2017-04-13 16:45:57 +02:00
WingkaiHo	9fba41b4ce	Migrate osd(s) journal to ssd	2017-04-13 11:05:58 +08:00
Daniel Lupescu	d5e56c481a	purge-cluster: fix grep match for NVMe and HP Smart Array devices raw_device would return invalid block device names for NVMe and HPSA devices which would cause sgdisk partition deletion to fail $ echo /dev/nvme1n1p3 \| egrep -o '/dev/([hsv]d[a-z]{1,2}\|cciss/c[0-9]d[0-9]p\|nvme[0-9]n[0-9]p){1,2}' /dev/nvme1n1p $ echo /dev/cciss/c0d0p2 \| egrep -o '/dev/([hsv]d[a-z]{1,2}\|cciss/c[0-9]d[0-9]p\|nvme[0-9]n[0-9]p){1,2}' /dev/cciss/c0d0p	2017-04-11 16:13:28 +03:00
Sébastien Han	c37aaa41f4	playbook: homogenize the way list osd ids Problem: too many different commands to do the same thing. The 'cut' command on infrastructure-playbooks/purge-cluster.yml was also wrong. This sed command from osixia in ceph-docker https://github.com/ceph/ceph-docker/pull/580/ addresses all the scenarios. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-03-30 11:51:38 +02:00
Sébastien Han	35a90ae283	Merge pull request #1386 from WingkaiHo/master Create recover-osds-after-ssd-journal-failure.yml	2017-03-28 09:50:39 +02:00
Konstantin Shalygin	1662976fc0	Resolve issues when groups names not in default value.	2017-03-27 21:44:30 +07:00
WingkaiHo	ac1498b0d7	Merge https://github.com/ceph/ceph-ansible	2017-03-27 10:50:38 +08:00
WingkaiHo	ebb56ccebf	command module instead shell	2017-03-23 17:38:41 +08:00
WingkaiHo	2d44c1cee6	remove service enable	2017-03-23 15:28:14 +08:00
WingkaiHo	14c189fee5	break it into lines since you already use the string block syntax and fix disable it here and enable again in later task	2017-03-23 14:49:10 +08:00
WingkaiHo	62c37042fe	remove this detection and simply rely on {{ cluster }}	2017-03-23 09:22:06 +08:00
WingkaiHo	3d10c5981e	fix some spelling mistakes and writing format, use full device path for device name	2017-03-22 17:48:34 +08:00
WingkaiHo	1e670bdeb0	This assumes ceph as a cluster name. We need detect the name of the cluster	2017-03-22 10:09:06 +08:00
WingkaiHo	83a1ac0c67	This assumes ceph as a cluster name. We need detect the name of the cluster	2017-03-22 10:06:11 +08:00
WingkaiHo	19f9e200d7	Add auto detect the ceph cluster name	2017-03-22 10:00:44 +08:00
WingkaiHo	8602166f6e	Ansible will include host_vars/ansible_hostname.yml itself, no need this task IMO.	2017-03-21 13:50:27 +08:00
WingkaiHo	55725fd01d	fix some syntax error	2017-03-21 11:19:25 +08:00
WingKai Ho	7445113dc4	Create recover-osds-after-ssd-journal-failure.yml This playbook use to recover Ceph OSDs after ssd journal failure.	2017-03-21 11:08:25 +08:00
Anthony D'Atri	6c4911276e	Enhance clean PG check to catch active+clean+scrubbing and active+clean+scrubbing+deep Signed-off-by: Anthony D'Atri <anthony.datri@gmail.com>	2017-03-19 00:23:26 -07:00
Daniel Marks	77edd3d40a	Fixing tabs that are breaking the syntax check With the merge of PR #1336 the syntax check fails. This commit replaces the tabs with proper indentation.	2017-03-15 14:15:15 +01:00
Sébastien Han	38ab6de602	Merge pull request #1336 from WingkaiHo/master Load a variable file for devices partition	2017-03-15 11:55:26 +01:00
Sébastien Han	8320c14191	Merge pull request #1317 from ibotty/harmonize-docker-names harmonize docker names	2017-03-14 18:20:20 +01:00
Andrew Schoen	e81d690aa0	switch-to-containers: do not include group vars or role defaults Doing so will override any values set for these in the group_vars directory relative to the users inventory. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-03-08 08:57:09 -06:00
Andrew Schoen	cf702b05cf	purge-docker-cluster: do not include role defaults or group vars Doing so at playbook level overrides whatever values might be set for these in the user's group_vars directory that's relative to their inventory. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-03-08 08:57:09 -06:00
Andrew Schoen	aef54d89d9	switch-to-containers: do not set group name vars at playbook level Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-03-08 08:57:09 -06:00
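
Note on commit d5e56c481a: the two `egrep` transcripts in that message can be reproduced with the short Python sketch below. It applies the same pattern (the listing escapes `|` as `\|`) and shows that the match stops at the trailing `p`, which is not a valid block device name. The `parent_device` helper is hypothetical and is not taken from the commit; it only illustrates one way a partition suffix could be stripped, under the assumption that NVMe and cciss partitions end in `p<digits>` while `hd`/`sd`/`vd` partitions end in bare digits.

```python
import re

# Pattern quoted in commit d5e56c481a, with the table escaping removed.
OLD_PATTERN = re.compile(
    r"/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p){1,2}"
)


def old_raw_device(partition: str) -> str:
    """Return what `egrep -o` with the old pattern would print, or ''."""
    match = OLD_PATTERN.search(partition)
    return match.group(0) if match else ""


def parent_device(partition: str) -> str:
    """Hypothetical helper (an assumption, not the commit's actual fix).

    Strips the partition suffix to obtain the parent block device.
    """
    # NVMe and HP Smart Array: the partition suffix is 'p' followed by digits.
    match = re.fullmatch(
        r"(/dev/(?:nvme[0-9]+n[0-9]+|cciss/c[0-9]+d[0-9]+))p[0-9]+", partition
    )
    if match:
        return match.group(1)
    # hd/sd/vd devices: the partition suffix is digits only.
    match = re.fullmatch(r"(/dev/[hsv]d[a-z]{1,2})[0-9]+", partition)
    if match:
        return match.group(1)
    # Not recognised as a partition: return the input unchanged.
    return partition


if __name__ == "__main__":
    for dev in ("/dev/nvme1n1p3", "/dev/cciss/c0d0p2", "/dev/sdb1"):
        print(dev, "old:", old_raw_device(dev), "parent:", parent_device(dev))
```

For `/dev/nvme1n1p3` the old pattern yields `/dev/nvme1n1p`, matching the output quoted in the commit message, whereas the hypothetical helper yields `/dev/nvme1n1`.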

346 Commits (fe1d09925ae1525e99f22a3eab9ca1823c079bda)
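
Note on commits 5af9bb432c and 6c4911276e: the clean-PG check they describe (read the JSON status instead of scraping text, and treat scrubbing PGs as clean) might look roughly like the sketch below. It is not the playbook's actual task. It assumes the status JSON exposes `pgmap.num_pgs` and a `pgmap.pgs_by_state` list of `state_name`/`count` entries, and the sample document is hand-written for illustration, not captured from a cluster.

```python
import json

# States counted as clean, per the message of commit 6c4911276e.
CLEAN_STATES = {
    "active+clean",
    "active+clean+scrubbing",
    "active+clean+scrubbing+deep",
}


def all_pgs_clean(status: dict) -> bool:
    """Return True when every PG is in one of the clean states.

    Field names (pgmap, num_pgs, pgs_by_state, state_name, count) are
    assumptions about the JSON layout, not taken from the commit log.
    """
    pgmap = status["pgmap"]
    clean = sum(
        entry["count"]
        for entry in pgmap.get("pgs_by_state", [])
        if entry["state_name"] in CLEAN_STATES
    )
    return clean == pgmap["num_pgs"]


# Hand-written sample standing in for the JSON status output.
SAMPLE = json.loads("""
{
  "pgmap": {
    "num_pgs": 128,
    "pgs_by_state": [
      {"state_name": "active+clean", "count": 120},
      {"state_name": "active+clean+scrubbing+deep", "count": 8}
    ]
  }
}
""")

if __name__ == "__main__":
    print(all_pgs_clean(SAMPLE))
```

Comparing a summed count against `num_pgs` avoids matching on the human-readable status line, which is the screen scraping the first commit set out to remove.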