ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Paul Cuzner	2890b57cfc	Add privilege escalation to iscsi purge tasks Without the escalation, invocation from non-root users will fail when accessing the rados config object, or when attempting to log to /var/log Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1549004 Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-05-25 03:50:24 -07:00
Sébastien Han	da5b104098	rolling_update: fix get fsid for containers When running ansible2.4-update_docker_cluster there is an issue on the "get current fsid" task. The current task only works for non-containerized deployment but will run all the time (even for containerized). This currently results in the following error: TASK [get current fsid] ****************************************************** task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-luminous-ansible2.4-update_docker_cluster/rolling_update.yml:214 Tuesday 22 May 2018 22:48:32 +0000 (0:00:02.615) 0:11:01.035 ********* fatal: [mgr0 -> mon0]: FAILED! => { "changed": true, "cmd": [ "ceph", "--cluster", "test", "fsid" ], "delta": "0:05:00.260674", "end": "2018-05-22 22:53:34.555743", "rc": 1, "start": "2018-05-22 22:48:34.295069" } STDERR: 2018-05-22 22:48:34.495651 7f89482c6700 0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).connect protocol feature mismatch, my 83ffffffffffff < peer 481dff8eea4fffb missing 400000000000000 2018-05-22 22:48:34.495684 7f89482c6700 0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).fault This is not really representative on the real error since the 'ceph' cli is available on that machine. On other environments we will have something like "command not found: ceph". Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-23 04:44:12 +02:00
Guillaume Abrioux	9801bde4d4	purge_cluster: fix dmcrypt purge dmcrypt devices aren't closed properly, therefore, it may fail when trying to redeploy after a purge. Typical errors: ``` ceph-disk: Cannot discover filesystem type: device /dev/sdb1: Command '/sbin/blkid' returned non-zero exit status 2 ``` ``` ceph-disk: Error: unable to read dm-crypt key: /var/lib/ceph/osd-lockbox/c6e01af1-ed8c-4d40-8be7-7fc0b4e104cf: /etc/ceph/dmcrypt-keys/c6e01af1-ed8c-4d40-8be7-7fc0b4e104cf.luks.key ``` Closing properly dmcrypt devices allows to redeploy without error. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-21 08:23:10 +02:00
Guillaume Abrioux	415dc0a29b	take-over: fix bug when trying to override variable A customer has been facing an issue when trying to override `monitor_interface` in inventory host file. In his use case, all nodes had the same interface for `monitor_interface` name except one. Therefore, they tried to override this variable for that node in the inventory host file but the take-over-existing-cluster playbook was failing when trying to generate the new ceph.conf file because of undefined variable. Typical error: ``` fatal: [srvcto103cnodep01]: FAILED! => {"failed": true, "msg": "'dict object' has no attribute u'ansible_bond0.15'"} ``` Including variables like this `include_vars: group_vars/all.yml` prevent us from overriding anything in inventory host file because it overwrites everything you would have defined in inventory. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1575915 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-18 10:10:08 +02:00
Sébastien Han	49a4712485	switch: disable ceph-disk units During the transition from jewel non-container to container old ceph units are disabled. ceph-disk can still remain in some cases and will appear as 'loaded failed', this is not a problem although operators might not like to see these units failing. That's why we remove them if we find them. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1577846 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-17 08:48:28 +02:00
Guillaume Abrioux	a9247c4de7	purge_cluster: wipe all partitions In order to ensure there is no leftover after having purged a cluster, we must wipe all partitions properly. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-17 08:37:17 +02:00
Guillaume Abrioux	9cad113e2f	purge_cluster: fix bug when building device list there is some leftover on devices when purging osds because of a invalid device list construction. typical error: ``` changed: [osd3] => (item=/dev/sda sda1) => { "changed": true, "cmd": "# if the disk passed is a raw device AND the boot system disk\n if parted -s \"/dev/sda sda1\" print \| grep -sq boot; then\n echo \"Looks like /dev/sda sda1 has a boot partition,\"\n echo \"if you want to delete specific partitions point to the partition instead of the raw device\"\n echo \"Do not use your system disk!\"\n exit 1\n fi\n echo sgdisk -Z \"/dev/sda sda1\"\n echo dd if=/dev/zero of=\"/dev/sda sda1\" bs=1M count=200\n echo udevadm settle --timeout=600", "delta": "0:00:00.015188", "end": "2018-05-16 12:41:40.408597", "item": "/dev/sda sda1", "rc": 0, "start": "2018-05-16 12:41:40.393409" } STDOUT: sgdisk -Z /dev/sda sda1 dd if=/dev/zero of=/dev/sda sda1 bs=1M count=200 udevadm settle --timeout=600 STDERR: Error: Could not stat device /dev/sda sda1 - No such file or directory. ``` the devices list in the task `resolve parent device` isn't built properly because the command used to resolve the parent device doesn't return the expected output eg: ``` changed: [osd3] => (item=/dev/sda1) => { "changed": true, "cmd": "echo /dev/$(lsblk -no pkname \"/dev/sda1\")", "delta": "0:00:00.013634", "end": "2018-05-16 12:41:09.068166", "item": "/dev/sda1", "rc": 0, "start": "2018-05-16 12:41:09.054532" } STDOUT: /dev/sda sda1 ``` For instance, it will result with a devices list like: `['/dev/sda sda1', '/dev/sdb', '/dev/sdc sdc1']` where we expect to have: `['/dev/sda', '/dev/sdb', '/dev/sdc']` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-17 08:37:17 +02:00
Sébastien Han	d80a871a07	rolling_update: move osd flag section During a minor update from a jewel to a higher jewel version (10.2.9 to 10.2.10 for example) osd flags don't get applied because they were done in the mgr section which is skipped in jewel since this daemon does not exist. Moving the set flag section after all the mons have been updated solves that problem. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1548071 Co-authored-by: Tomas Petr <tpetr@redhat.com> Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-17 08:17:16 +02:00
Guillaume Abrioux	1b4c3f292d	rolling_update: fix dest path for mgr keys fetching the role `ceph-mgr` that is played later in the playbook fails because the destination path for the fetched keys is wrong. This patch fixes the destination path used in the task `fetch ceph mgr key(s)` so there is no mismatch. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-15 19:30:34 +02:00
Guillaume Abrioux	3b89f1bfb1	rolling_update: get fsid in mgr pre_task {{ fsid }} points to {{ cluster_uuid.stdout }} which is not defined in this part of the rolling_update playbook. Since we need to call {{ fsid }} we must get the fsid and register it to `cluster_uuid`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-15 09:01:42 +02:00
Sébastien Han	52fc8a0385	rolling_update: move mgr key creation Until all the mons have been updated to Luminous, there is no way to create a key. So we should do the key creation in the mon role only if we are not part of an update. If we are then the key creation is done after the mons upgrade to Luminous. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-15 09:01:42 +02:00
Guillaume Abrioux	adeecc51f8	switch: fix ceph_uid fact for osd In addition to b324c17 this commit fixes the ceph uid for osd role in the switch from non containerized to containerized playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-30 08:15:18 +02:00
Sébastien Han	5fa92804f9	switch: resolve device path so we can umount the osd data dir If we don't do this, umounting devices declared like this /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001 will fail like: umount: /dev/disk/by-id/ata-QEMU_HARDDISK_QM000011: mountpoint not found Since we append '1' (partition 1), this won't work. So we need to resolve the link to get something like /dev/sdb and then append 1 to /dev/sdb1 Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-30 08:15:18 +02:00
Sébastien Han	767abb5de0	switch: fix ceph_uid fact Latest is now centos not ubuntu anymore so the condition was wrong. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-30 08:15:18 +02:00
Sébastien Han	85732d11b9	mon/client: remove acl code Applying ACL on the keyrings is not used anymore so let's remove this code. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-23 18:34:58 +02:00
Sébastien Han	66c1ea8cd5	shrink-osd: ability to shrink NVMe drives Now if the service name contains nvme we know we need to remove the last 2 characters instead of 1. If nvme then osd_to_kill_disks is nvme0n1, we need nvme0 If ssd or hdd then osd_to_kill_disks is sda1, we need sda Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1561456 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-20 15:08:29 +02:00
Sébastien Han	641f141c0f	selinux: remove chcon calls We know bindmount with the :z option at the end of the -v command so this will basically run the exact same command as we used to run. So to speak: chcon -Rt svirt_sandbox_file_t /var/lib/ceph Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-19 14:59:37 +02:00
Sébastien Han	473939d215	infra: add playbook example for ceph_key module Helper playbook to manage CephX keys. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-11 12:18:34 +02:00
Andrew Schoen	08f4875533	ceph_volume: refactor to not run ceph osd destroy This changes state to action and gives the options 'create' or 'zap'. The zap parameter is also removed. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-04-10 14:19:21 +02:00
Andrew Schoen	c6e8f8fb11	purge-cluster: no need to use objectstore for ceph_volume module When zapping objectstore is not required. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-04-10 14:19:21 +02:00
Andrew Schoen	c29a75ac7f	purge-cluster: use ceph_volume module to zap and destroy OSDs Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-04-10 14:19:21 +02:00
Randy J. Martinez	d1f2d64b15	purge-docker: added conditionals needed to successfully re-run purge Added 'ignore_errors: true' to multiple lines which run docker commands; even in cases where docker is no longer installed. Because of this, certain tasks in the purge-docker-cluster.yml will cause the playbook to fail if re-run and stop the purge. This leaves behind a dirty environment, and a playbook which can no longer be run. Fix Regex line 275: Sometimes 'list-units' will output 4 spaces between loaded+active. The update will account for both scenarios. purge fetch_directory: in other roles fetch_directory is hard linked ex.: "{{ fetch_directory }}"/"{{ somedir }}". That being said, fetch_directory will never have a trailing slash in the all.yml so this task was never being run(causing failures when trying to re-deploy). Signed-off-by: Randy J. Martinez <ramartin@redhat.com>	2018-04-10 13:39:14 +02:00
Guillaume Abrioux	e32a177af8	purge-docker: remove redundant task The `remove_packages` prompt is redundant to the `ireallymeanit` prompt since it does exactly the same thing. I guess the only goal of this task was to make a break to warn user about `--skip-tags=with_pkg` feature. This warning should be part of the first prompt. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-03 11:54:42 +02:00
Andy McCrae	60d4b75f51	Cleanup plugins directories and references Having callback_plugins, and action plugins in random locations causes a lot of disparity. We should centralize this into one place in the plugins directory and fix up the ansible.cfg to reflect this. Additionally, since the ansible.cfg already reflects action_plugins, we don't need a link to action_plugins in the base of the repository.	2018-03-14 11:15:39 +01:00
jtudelag	691f7c5146	Adds handy ceph aliases for containerized installations. Same approach as openshift-ansible etcdctl: * https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/tasks/auxiliary/drop_etcdctl.yml * https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/etcdctl.sh	2018-03-08 13:56:39 +01:00
Guillaume Abrioux	c04e67347c	update: look for short and fqdn in ceph_health_raw According to hostname configuration, the task waiting for mons to be in quorum might fail. The idea here is to look for both shortname and fqdn in `ceph_health_raw` instead of just `ansible_hostname` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1546127 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-02-19 10:27:47 +01:00
Andrew Schoen	699c777e68	rolling update: fix undefined jewel_minor_update failure Variables set at the play level with ``vars`` do not carry over into the next play in the playbook. The var jewel_minor_update was set in a previous play but used in this one and was failing because it was not defined. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1544029 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-02-13 17:03:05 +01:00
Andrew Schoen	7c7017ebe6	infra: do not include host_vars/* in take-over-existing-cluster.yml These are better collected by ansible automatically. This would also fail if the host_var file didn't exist. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-02-12 11:48:47 +01:00
Guillaume Abrioux	3b2f6c34e4	purge-docker: fix ceph-osd-zap name container the `zap ceph osd disks` task should iterate on `resolved_parent_device` instead of `combined_devices_list` which contains only the base device name (vs. full path name in `combined_devices_list`). this fixes the issue where docker complains about container name because of illegal characters such as `/` : ``` "/usr/bin/docker-current: Error response from daemon: Invalid container name (ceph-osd-zap-magna074-/dev/sdb1), only [a-zA-Z0-9][a-zA-Z0-9_.-] are allowed.","See '/usr/bin/docker-current run --help'." "" ``` having the basename of the device path is enough for the container name. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1540137 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-02-02 22:09:11 +01:00
Guillaume Abrioux	dd0c98c5a2	common: do not use `shell` module when it is not needed There is no need here to use `shell` instead of `command` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-31 10:45:34 +01:00
Guillaume Abrioux	deaf273b25	syntax: change local_action syntax Use a nicer syntax for `local_action` tasks. We used to have oneliner like this: ``` local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }} ``` The usual syntax: ``` local_action: module: wait_for port: 22 host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}" state: started delay: 10 timeout: 500 ``` is nicer and kind of way to keep consistency regarding the whole playbook. This also fix a potential issue about missing quotation : ``` Traceback (most recent call last): File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module> main() File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin) File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command File "/usr/lib64/python2.7/shlex.py", line 279, in split return list(lex) File "/usr/lib64/python2.7/shlex.py", line 269, in next token = self.get_token() File "/usr/lib64/python2.7/shlex.py", line 96, in get_token raw = self.read_token() File "/usr/lib64/python2.7/shlex.py", line 172, in read_token raise ValueError, "No closing quotation" ValueError: No closing quotation ``` writing `local_action: shell echo {{ fsid }} \| tee {{ fetch_directory }}/ceph_cluster_uuid.conf` can cause trouble because it's complaining with missing quotes, this fix solves this issue. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-31 10:45:34 +01:00
Guillaume Abrioux	f372a4232e	purge: fix resolve parent device task This is a typo caused by leftover. It was previously written like this : `shell: echo /dev/$(lsblk -no pkname "{{ item }}") }}")` and has been rewritten to : `shell: $(lsblk --nodeps -no pkname "{{ item }}") }}")` because we are appending later the '/dev/' in the next task. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1540137 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-30 17:40:10 +01:00
Guillaume Abrioux	c7ec12d49c	upgrade: skip luminous tasks for jewel minor update These tasks are needed only when upgrading to luminous. They are not needed in Jewel minor upgrade and by the way, they fail because `ceph versions` command doesn't exist. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-25 18:30:34 +01:00
Sébastien Han	8af7459476	rolling update: add mgr exception for jewel minor updates When updating from a minor Jewel version to another, the playbook will fail on the task "fail if no mgr host is present in the inventory". This now can be worked around by running Ansible with -e jewel_minor_update=true Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1535382 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-01-18 14:06:05 +01:00
Guillaume Abrioux	55298fa80c	purge-container: use lsblk to resolve parent device Using `lsblk` to resolve the parent device is better than just removing the last char when passing it to the zap container. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-17 15:54:20 +01:00
Guillaume Abrioux	58eb045d2f	purge-container: remove awk usage in favor of blkid Avoid using `awk` to get the different devices from the partlabel. Using `blkid` is more readable. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-17 15:54:20 +01:00
Andrew Schoen	b613321c21	switch-to-containers: do not fail when stopping the nfs-ganesha service If we're working with a jewel cluster then this service will not exist. This is mainly a problem with CI testing because our tests are set up to work with both jewel and luminous, meaning that even though we want to test jewel we still have a nfs-ganesha host in the test causing these tasks to run. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-01-06 14:07:55 +01:00
Andrew Schoen	0b4b60e3c9	switch-to-containers: do not fail when stopping the ceph-mgr daemon If we are working with a jewel cluster ceph mgr does not exist and this makes the playbook fail. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-01-06 14:07:55 +01:00
Andrew Schoen	997edea271	rolling_update: do not fail the playbook if nfs-ganesha is not present The rolling update playbook was attempting to stop the nfs-ganesha service on nodes where jewel is still installed. The nfs-ganesha service did not exist in jewel so the task fails. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-01-06 14:07:55 +01:00
Guillaume Abrioux	c5b7b37105	purge-cluster: clean some code Avoid using regexp to match device Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-12-20 17:42:45 +01:00
Guillaume Abrioux	eeedefdf02	purge-cluster: wipe disk using dd `bluestore_purge_osd_non_container` scenario is failing because it keeps old osd_uuid information on devices and causes the `ceph-disk activate` to fail when trying to redeploy a new cluster after a purge. typical error seen : ``` 2017-12-13 14:29:48.021288 7f6620651d00 -1 bluestore(/var/lib/ceph/tmp/mnt.2_3gh6/block) _check_or_set_bdev_label bdev /var/lib/ceph/tmp/mnt.2_3gh6/block fsid 770080e2-20db-450f-bc17-81b55f167982 does not match our fsid f33efff0-2f07-4203-ad8d-8a0844d6bda0 ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-12-20 17:42:45 +01:00
Sébastien Han	200785832f	rolling_update: do not require root to answer question There is no need to ask for root on the local action. This will prompt for a password if the current user is not part of sudoers. That's unnecessary anyways. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1516947 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-12-19 14:04:55 +01:00
Guillaume Abrioux	aaaf980140	purge: fix bug on 'wait_for' task this task hangs because `{{ inventory_hostname }}` doesn't resolve to an actual ip address. Using `hostvars[inventory_hostname]['ansible_default_ipv4']['address']` should fix this because it will reach the node with its actual IP address. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-29 11:10:56 +01:00
Guillaume Abrioux	947766e294	purge-cluster: remove usage of `with_fileglob` `with_fileglob` loops over files on the machine where ansible-playbook is being run. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-21 08:24:11 +01:00
Guillaume Abrioux	d9c1b61092	purge-docker: remove osd disk prepare logs `with_fileglob` loops over files on the machine that runs the playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-16 14:27:36 +01:00
Sébastien Han	68566444e9	Merge pull request #2142 from squidboylan/master infra: fix take-over-existing-cluster.yml playbook	2017-11-13 22:06:16 +11:00
Guillaume Abrioux	fa675f2ead	purge-docker-cluster: ensure old logs are removed purge-docker-cluster must remove all osd_disk_prepare logs in `{{ ceph_osd_docker_run_script_path }}`, otherwise if you purge your cluster and try to redeploy it, osds will fail to start because they will try to find a partition uuid which doesn't exist. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1510470 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-09 17:49:20 +01:00
Caleb Boylan	41d10a2f64	infra: fix take-over-existing-cluster.yml playbook The ansible inventory could have more than just ceph-ansible hosts, so we shouldn't use "hosts: all", also only grab one file when getting the ceph cluster name instead of failing when there is more than one file in /etc/ceph. Also fix location of the ceph.conf template	2017-11-06 15:00:30 -08:00
Sébastien Han	473673ab41	shrink-mon: fix typo in the code doc Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-27 11:59:22 +02:00
Sébastien Han	2837d0a22e	purge: do not reboot by default Rebooting servers is really intrusive and perhaps this is not what the operator wants. So we disable the reboot by default now. Note that the reboot might not happen all the time. It can be enabled by default by running the purge playbook with -e reboot_osd_node=True Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1505011 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-26 14:18:38 +02:00
Guillaume Abrioux	f90f2f3a04	purge: containers are not stopped During purge osd, the containers are not stopped because of a typo, as a result, all the devices can't be unmounted later. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-10-25 07:58:00 +02:00
Sébastien Han	4413511b66	all: backward compatibility between stable-2.2 and 3.0 stable-3.0 brought numerous changes in ceph-ansible variables, this PR aims to maintain backward compatibility for someone running stable-2.2 upgrading to stable-3.0 but keeps its groups_vars untouched. We will then determine the right options to make sure the upgrade works but we are expecting that new variables should be used. We will drop this in a near future, maybe 3.1 or 3.2. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-20 11:54:10 +02:00
Guillaume Abrioux	982326373b	upgrade: fix upgrade jewel to luminous for nfs nodes nfs nodes can't be upgraded from jewel to luminous because ceph-nfs role is skipped because of the condition `when: "ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Indeed, package is upgraded in `ceph-nfs` role, therefore, `ceph_release` is still set to the old version. It means the when can't be satisfied. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-10-19 20:54:23 +02:00
Guillaume Abrioux	70034451e9	upgrade: fix upgrade jewel to luminous for mgr nodes mgr nodes can't be upgraded from jewel to luminous because ceph-mgr role is skipped because of the condition `when: "ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Indeed, ceph-mgr package is upgraded in `ceph-mgr` role, therefore, `ceph_release` is still set to the old version. It means the when can't be satisfied. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit 302e563601cd6820b1ae44fabdfb1506688c7c9b)	2017-10-19 20:54:23 +02:00
Sébastien Han	d920d4839d	upgrade: support for rbd mirror and nfs - Add upgrade support for rbd mirror and nfs daemons. - Only works with systemd (remove sysvinit and upstart occurrences) - A bit of cleanup Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-17 10:54:47 +02:00
Sébastien Han	39bf102b64	switch: nicer way to check mon quorum re-use the same syntax as rolling_update.yml Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-17 10:54:36 +02:00
Sébastien Han	b685aceede	Merge pull request #2044 from major/avoid-jinja-in-when Remove jinja2 delimiters from `when` keys	2017-10-12 22:23:06 +02:00
Major Hayden	c01851325e	Remove jinja2 delimiters from `when` keys This patch changes the `when:` keys so that they have no jinja2 delimiters. This avoids Ansible warnings which could turn into errors in a future Ansible release.	2017-10-12 11:27:42 -05:00
Major Hayden	33b200d43a	Suppress yum/dnf/rpm command warnings Ansible throws warnings when using yum/dnf/rpm with the command module: [WARNING]: Consider using yum module rather than running yum This patch adds the `warn: no` argument to suppress the warnings in the Ansible output.	2017-10-12 08:38:05 -05:00
Sébastien Han	13bce287ad	infra: replace osd playbook This playbook can replace failed OSD in containerized and non-containerized env. The current limitation is that it won't allow you to choose between filestore/bluestore and will do collocation as well. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-12 11:53:30 +02:00
Sébastien Han	85e13a864c	purge-iscsi: fix group name Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1500281 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-11 12:52:12 +02:00
Sébastien Han	24b82c2679	purge: fix journal purge Using a condition when osd_scenario == 'non-collocated' was wrong since these partitions can be collocated on a single device also. Removing the check makes the purge of these partitions. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1499871 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-10 09:57:39 +02:00
Guillaume Abrioux	f147b119ed	Merge pull request #2014 from ceph/fixes-2 infra: use the pg check in the right place	2017-10-09 20:14:06 +02:00
Sébastien Han	450108fab9	infra: add independent purge-iscsi-gateways.yml The current inclusion of purge-iscsi-gateways.yml in purge-cluster.yml is not working well and blocking the CI too. So removing it from purge-cluster.yml and re-adding the original purge-iscsi-gateways.yml. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-09 17:25:44 +02:00
Sébastien Han	774697ebd8	infra: use the pg check in the right place Use the pg check before doing the pg check, not on the quorum check. Also never quote int when doing comparison. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-09 17:25:41 +02:00
Sébastien Han	a3e7bcb13f	Merge pull request #2013 from ceph/wip-purge-cluster A couple of purge cluster fixes	2017-10-09 17:18:30 +02:00
Sébastien Han	33a3aa0dda	switch: check pgs only when num_pgs > 0 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-07 03:42:09 +02:00
Sébastien Han	05f26031ea	rolling_update: perform pg check when pgs_num > 0 If num_pgs = 0 the check will never return 0. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-07 03:39:09 +02:00
Sébastien Han	c3c63ae539	switch: rework and fix clean pg wait Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-07 03:39:09 +02:00
Sébastien Han	c693e95cbf	purge-docker: rework device detection we don't need "devices" and other device variable anymore, the playbook detects that for us. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-07 03:39:04 +02:00
Sébastien Han	2fb4981ca9	shrink-osd: admin key not needed for container shrink Also do some cleanup Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-07 00:20:43 +02:00
Boris Ranto	64e272d818	purge-cluster: Do not use shell for rm The shell wildcard expansion of non-existing paths fails on zsh making the whole script fail. We can use file module with with_fileglob to alleviate the problem instead. Signed-off-by: Boris Ranto <branto@redhat.com>	2017-10-06 22:54:37 +02:00
Boris Ranto	f696cb7637	purge-cluster: Do not fail on systemd commands The systemd can't stop services if the unit files were removed before the cluster was purged. We should just ignore these. Signed-off-by: Boris Ranto <branto@redhat.com>	2017-10-06 22:52:56 +02:00
Sébastien Han	b6b24a5ca9	iscsi: fix wrong group name for iscsi Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1498490 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-05 17:25:32 +02:00
Sébastien Han	f37e014a65	Merge pull request #1974 from ceph/mgr-upgrade-luminous upgrade: a support for mgrs	2017-10-03 19:57:31 +02:00
Sébastien Han	99466e79a1	upgrade: a support for mgrs Also we now play ceph-config to have everything being generated for new daemons bootstrap during upgrade. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1497959 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-03 16:57:31 +02:00
Sébastien Han	3bd341f6c0	osd: container use id instead of dev name Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1494127 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-03 14:44:00 +02:00
Sébastien Han	3c2c31a591	Merge pull request #1964 from vatelzh/master purge-cluster: delete block partitions if using bluestore	2017-10-02 12:10:26 +02:00
Sébastien Han	b9050d6229	update: fix var register Even if the task is skipped, ansible registers the var as 'skipped' so this task the task using this variable for its next usage. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-29 14:27:55 +02:00
zhangwentao	86a6db0d58	purge-cluster: delete block partitions if using bluestore	2017-09-29 14:04:17 +08:00
Sébastien Han	a0a5b174ba	rolling_update: clarify mon quorum command Cleaner. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-29 01:19:46 +02:00
Sébastien Han	bd5471b940	update: complete luminous upgrade Once we complete the upgrade to Luminous, we must issue a specific command. For more info read: http://ceph.com/community/new-luminous-upgrade-complete/ Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-28 21:05:00 +02:00
Sébastien Han	68f1f99ee9	update: nicer way to wait for clean pgs More comprehensive and friendly to read. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-28 14:46:26 +02:00
Andrew Schoen	fccc604f4a	purge-cluster: default lvm_volumes if not defined Most osd scenarios do not use lvm_volumes, so default it in purge-cluster.yml if it's not defined. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-09-26 15:14:29 -05:00
Guillaume Abrioux	fcb6454e04	rbd-mirror: fix systemd unit in purge-docker rbd-mirror containers are not stopped in purge-docker-cluster playbook because of the wrong name used. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-24 21:18:50 +02:00
Guillaume Abrioux	c80ba7a307	purge: implement mgr purge Until now, mgr nodes are not managed by purge-cluster.yml, therefore it breaks scenarios like purge_cluster. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-24 21:18:50 +02:00
Guillaume Abrioux	7195b08718	update: update rgw systemd unit name The old name is used in `rolling_update.yml` and `purge-docker-cluster.yml`, it breaks the `test_rgw_service_is_running()` test. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-24 14:58:55 +02:00
Sébastien Han	6bac613611	shrink: support for container We can now shrink mon and osds on containerized deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492115 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-20 16:25:07 +02:00
Sébastien Han	7fedc8ebf4	Merge pull request #1891 from ceph/clarify-update rolling_update: clarify update doc	2017-09-15 07:08:49 -06:00
Sébastien Han	fe1d84d395	Merge pull request #1892 from ceph/purge-dmcrypt-col purge: only purge specific directories for mon	2017-09-13 17:57:06 -06:00
Sébastien Han	ba3e3b6cc7	purge: only purge specific directories for mon Handles the case when a mon is collocated with an OSD. Closes: https://github.com/ceph/ceph-ansible/issues/1877 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-13 17:07:04 -06:00
Sébastien Han	82c4848ec4	Merge pull request #1885 from ceph/shrink-osd shrink-osd: fix when multiple osds	2017-09-13 16:12:49 -06:00
Sébastien Han	92f9be963b	rolling_update: clarify update doc Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1490188 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-13 15:46:29 -06:00
Sébastien Han	3031e51778	shrink-osd: fix when multiple osds The loop was being built properly so we were always getting the last item as osd host. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1490355 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-13 15:20:11 -06:00
Sébastien Han	aa364264cd	resync ceph-iscsi-gw with old upstream Taken from https://github.com/pcuzner/ceph-iscsi-ansible/tree/tcmu-fixes Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1454945 and https://bugzilla.redhat.com/show_bug.cgi?id=1484083 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-12 18:06:10 -06:00
Sébastien Han	477f86e305	switch to container: fix ceph nfs The service is nfs-ganesha where ceph-nfs@{{ ansible_hostname }} will be the name of the container. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-08 22:43:50 +02:00
Sébastien Han	fdacac9fa0	switch: make osd collection idempotent This commit allows us to run switch-from-non-containerized-to-containerized-ceph-daemons.yml multiple times. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1489353 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-08 11:31:47 +02:00
Sébastien Han	e46440e19c	switch-from-non-containerized-to-containerized: fix devices If devices is passed through an extra var this register won't work so let's only register the var if devices is not defined. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1489099 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-07 23:18:14 +02:00
Sébastien Han	b9ced956d7	purge: get lockbox mountpoint and unmount it Prior command was avoiding the lockbox mountpoint and the playbook was failing with: rmtree failed: [Errno 30] Read-only file system: '/var/lib/ceph/osd-lockbox/4e9d8052-87c2-4fde-a56c-b8c108a3eefc/key-management-mode' Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-07 16:31:31 +02:00
Guillaume Abrioux	d987d26719	tests: force docker variable for switch-to-containers scenario we need to force the value of `docker` variable which is initially set to `false` since it's a migration from non-containerized to containerized cluster. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-06 18:03:52 +02:00
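
Note on commit 9cad113e2f: the message reports a device list such as `['/dev/sda sda1', '/dev/sdb', '/dev/sdc sdc1']` where `['/dev/sda', '/dev/sdb', '/dev/sdc']` was expected, because the lookup printed more than one name for some partitions. The Python sketch below only illustrates that symptom and one way to normalize such output. It is not the playbook's actual fix, and `parent_device` and `raw_outputs` are hypothetical names introduced here for illustration.

```python
def parent_device(lookup_stdout: str) -> str:
    """Return the /dev path for the first device name in the lookup output.

    Hypothetical helper: it assumes the parent device name comes first,
    as in the sample output quoted in the commit message.
    """
    names = lookup_stdout.split()
    if not names:
        raise ValueError("lookup returned no device name")
    return "/dev/" + names[0]


# Sample outputs modeled on the commit message: some lookups yield two names.
raw_outputs = ["sda sda1", "sdb", "sdc sdc1"]

# Deduplicate and sort so each parent device appears exactly once.
devices = sorted({parent_device(out) for out in raw_outputs})
print(devices)
```

Keeping only the first whitespace-separated token means a stray partition name can no longer end up inside a device path such as `/dev/sda sda1`.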


322 Commits (680574ed4c86018387619cc108302759738f963b)