ceph-ansible


Author	SHA1	Message	Date
Guillaume Abrioux	b72d806f4c	mgr: fix mgr keyring error on rolling_update When upgrading from RHCS 2.5 to 3.2, rolling_update fails because the task `create ceph mgr keyring(s) when mon is containerized` has the when condition `inventory_hostname == groups[mon_group_name]|last`. First, this condition is incorrect because `inventory_hostname` refers to a mgr node, which means it would never have been satisfied. Second, this condition combined with `serial: 1` causes the mgr keyring creation to be skipped on the first node. The `ceph-mgr` role then tries to copy the mgr keyring (it is not aware that we are running with `serial: 1`), which leads to a failure like the following: ``` TASK [ceph-mgr : copy ceph keyring(s) if needed] ************ task path: /usr/share/ceph-ansible/roles/ceph-mgr/tasks/common.yml:10 Tuesday 27 November 2018 12:03:34 +0000 (0:00:00.296) 0:11:01.290 **** An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AnsibleFileNotFound: Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring' failed: [magna021] (item={u'dest': u'/var/lib/ceph/mgr/local-magna021/keyring', u'name': u'/etc/ceph/local.mgr.magna021.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "dest": "/var/lib/ceph/mgr/local-magna021/keyring", "name": "/etc/ceph/local.mgr.magna021.keyring"}, "msg": "Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'"} ``` The ceph_key module is idempotent, so there is no need for such a condition. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1649957 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `73287f91bc`)	2018-11-28 23:11:46 +01:00
Sébastien Han	44d0da0dd4	rolling_update: fix upgrade when using fqdn Clusters that were deployed using 'mon_use_fqdn' have a different unit name, which must be used during the upgrade; otherwise the upgrade fails while looking for a unit that does not exist. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1597516 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-19 13:06:56 +00:00
Guillaume Abrioux	40b7747af7	remove jewel support As of now, we should no longer support Jewel in ceph-ansible. The latest ceph-ansible release supporting Jewel is `stable-3.1`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-12 23:38:17 +00:00
Noah Watkins	306e308f13	Avoid using tests as filters Fixes the deprecation warning: [DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of using `result|search` use `result is search`. Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-10-10 04:26:33 +00:00
Guillaume Abrioux	79bd06ad28	rolling_update: add ceph-handler role Since the introduction of ceph-handler, it has to be added to the rolling_update playbook as well. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-05 13:48:04 +00:00
Guillaume Abrioux	179c4d00d7	rolling_update: ensure pgs_by_state has at least 1 entry The previous commit `c13a3c3` removed a condition. This commit brings that condition back; it is essential to ensure we don't hit a false positive in the `when` condition of the check PGs task. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-25 14:58:54 +00:00
Guillaume Abrioux	c13a3c3492	upgrade: consider all 'active+clean' states as valid pgs In a cluster with a large number of PGs, some of them can be expected to be scrubbing; this is a normal operation. Disallowing scrubbing forces users to set the noscrub flag before a rolling update, which is a problem because it pauses an important data integrity operation until the end of the rolling upgrade. This commit allows an upgrade even while PGs are scrubbing. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616066 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-25 12:12:06 +00:00
Sébastien Han	2e6e885bb7	rolling_upgrade: set sortbitwise properly Running 'osd set sortbitwise' when we detect version 12 of Ceph is wrong. When OSDs are being updated, even though the package is updated, they won't report their updated version (12) and will stick with 10 if the command is not applied. So we have to check whether OSDs are reporting version 10 and then run the command to unlock the OSDs. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-21 12:22:32 +00:00
Sébastien Han	77a3a682f3	iscsi group name preserve backward compatibility Recently we renamed the iscsi group_name to iscsigws; previously it was named iscsi-gws. Existing deployments with an iscsi-gws section in their host file must continue to work. This commit adds the old group name for backward compatibility; no error from Ansible should be expected, and if the hostgroup is not found nothing is played. Close: https://bugzilla.redhat.com/show_bug.cgi?id=1619167 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-20 23:52:19 +02:00
Sébastien Han	dad10e8f3f	rolling_update: register container osd units Before running the upgrade, let's call systemd to collect unit names instead of relying on the device list. This is more accurate and fixes the osd_auto_discovery scenario too. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-16 11:13:12 +02:00
Sébastien Han	b3266c5be2	rolling_update: set osd sortbitwise An upgrade from RHCS 2 to RHCS 3 fails if the cluster still has sortnibblewise set: it stays stuck on "TASK [waiting for clean pgs...]" because RHCS 3 OSDs will not start if nibblewise is set. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-07-24 17:19:02 +02:00
Vishal Kanaujia	44d514850a	Rolling upgrades: Migrate to ceph-key module This change moves ceph-mgr upgrades to using ceph-key library. Fixes: #2758 Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>	2018-07-03 18:22:14 +02:00
Sébastien Han	20c8065e48	ceph-iscsi: rename group iscsi_gws Let's try to avoid using dashes as testinfra needs to be able to read the groups. Typically, with iscsi-gws we can't add a marker for these iscsi nodes, using an underscore fixes the issue. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-08 10:21:54 +02:00
Guillaume Abrioux	232a16d77f	rolling_update: fix facts gathering delegation This is a follow-up to what was done in #2560. See #2560 and #2553 for details. Closes: #2708 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-06-06 16:36:30 +08:00
Vishal Kanaujia	08d9432454	Rolling upgrades should use norebalance flag for OSDs The rolling upgrades playbook should set the norebalance flag during OSD upgrades so that it waits only for recovery. Fixes: #2657 Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>	2018-06-04 10:59:01 +02:00
Sébastien Han	e91648a7af	rolling_update: add role ceph-iscsi-gw Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1575829 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-26 02:38:47 -07:00
Sébastien Han	da5b104098	rolling_update: fix get fsid for containers When running ansible2.4-update_docker_cluster, there is an issue with the "get current fsid" task. The current task only works for non-containerized deployments but runs every time (even for containerized ones). This currently results in the following error: TASK [get current fsid] ****************************************************** task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-luminous-ansible2.4-update_docker_cluster/rolling_update.yml:214 Tuesday 22 May 2018 22:48:32 +0000 (0:00:02.615) 0:11:01.035 ********* fatal: [mgr0 -> mon0]: FAILED! => { "changed": true, "cmd": [ "ceph", "--cluster", "test", "fsid" ], "delta": "0:05:00.260674", "end": "2018-05-22 22:53:34.555743", "rc": 1, "start": "2018-05-22 22:48:34.295069" } STDERR: 2018-05-22 22:48:34.495651 7f89482c6700 0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).connect protocol feature mismatch, my 83ffffffffffff < peer 481dff8eea4fffb missing 400000000000000 2018-05-22 22:48:34.495684 7f89482c6700 0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).fault This is not representative of the real error, since the 'ceph' CLI happens to be available on that machine. On other environments we will get something like "command not found: ceph". Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-23 04:44:12 +02:00
Sébastien Han	d80a871a07	rolling_update: move osd flag section During a minor update from one jewel version to a higher one (10.2.9 to 10.2.10, for example), OSD flags don't get applied because they were set in the mgr section, which is skipped on jewel since that daemon does not exist there. Moving the set flag section after all the mons have been updated solves that problem. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1548071 Co-authored-by: Tomas Petr <tpetr@redhat.com> Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-17 08:17:16 +02:00
Guillaume Abrioux	1b4c3f292d	rolling_update: fix dest path for mgr keys fetching The `ceph-mgr` role, which is played later in the playbook, fails because the destination path for the fetched keys is wrong. This patch fixes the destination path used in the task `fetch ceph mgr key(s)` so there is no mismatch. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-15 19:30:34 +02:00
Guillaume Abrioux	3b89f1bfb1	rolling_update: get fsid in mgr pre_task {{ fsid }} points to {{ cluster_uuid.stdout }} which is not defined in this part of the rolling_update playbook. Since we need to call {{ fsid }} we must get the fsid and register it to `cluster_uuid`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-15 09:01:42 +02:00
Sébastien Han	52fc8a0385	rolling_update: move mgr key creation Until all the mons have been updated to Luminous, there is no way to create a key. So we should create the key in the mon role only if we are not part of an update. If we are, the key creation is done after the mons are upgraded to Luminous. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-15 09:01:42 +02:00
Guillaume Abrioux	c04e67347c	update: look for short and fqdn in ceph_health_raw Depending on the hostname configuration, the task waiting for mons to be in quorum might fail. The idea here is to look for both the short name and the fqdn in `ceph_health_raw` instead of just `ansible_hostname`. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1546127 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-02-19 10:27:47 +01:00
Andrew Schoen	699c777e68	rolling update: fix undefined jewel_minor_update failure Variables set at the play level with ``vars`` do not carry over into the next play in the playbook. The var jewel_minor_update was set in a previous play but used in this one and was failing because it was not defined. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1544029 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-02-13 17:03:05 +01:00
Guillaume Abrioux	c7ec12d49c	upgrade: skip luminous tasks for jewel minor update These tasks are needed only when upgrading to luminous. They are not needed for a Jewel minor upgrade; moreover, they fail there because the `ceph versions` command doesn't exist. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-25 18:30:34 +01:00
Sébastien Han	8af7459476	rolling update: add mgr exception for jewel minor updates When updating from one minor Jewel version to another, the playbook fails on the task "fail if no mgr host is present in the inventory". This can now be worked around by running Ansible with -e jewel_minor_update=true Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1535382 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-01-18 14:06:05 +01:00
Andrew Schoen	997edea271	rolling_update: do not fail the playbook if nfs-ganesha is not present The rolling update playbook was attempting to stop the nfs-ganesha service on nodes where jewel is still installed. The nfs-ganesha service did not exist in jewel so the task fails. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-01-06 14:07:55 +01:00
Sébastien Han	200785832f	rolling_update: do not require root to answer question There is no need to ask for root on the local action. It prompts for a password if the current user is not part of sudoers, which is unnecessary anyway. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1516947 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-12-19 14:04:55 +01:00
Sébastien Han	4413511b66	all: backward compatibility between stable-2.2 and 3.0 stable-3.0 brought numerous changes to ceph-ansible variables; this PR aims to maintain backward compatibility for someone running stable-2.2 who upgrades to stable-3.0 but keeps their group_vars untouched. We then determine the right options to make sure the upgrade works, but we expect the new variables to be used. We will drop this in the near future, maybe in 3.1 or 3.2. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-20 11:54:10 +02:00
Guillaume Abrioux	982326373b	upgrade: fix upgrade jewel to luminous for nfs nodes nfs nodes can't be upgraded from jewel to luminous because the ceph-nfs role is skipped due to the condition `when: "ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Since the package is upgraded in the `ceph-nfs` role itself, `ceph_release` is still set to the old version, so the `when` condition can't be satisfied. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-10-19 20:54:23 +02:00
Guillaume Abrioux	70034451e9	upgrade: fix upgrade jewel to luminous for mgr nodes mgr nodes can't be upgraded from jewel to luminous because the ceph-mgr role is skipped due to the condition `when: "ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Since the ceph-mgr package is upgraded in the `ceph-mgr` role itself, `ceph_release` is still set to the old version, so the `when` condition can't be satisfied. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit 302e563601cd6820b1ae44fabdfb1506688c7c9b)	2017-10-19 20:54:23 +02:00
Sébastien Han	d920d4839d	upgrade: support for rbd mirror and nfs - Add upgrade support for rbd mirror and nfs daemons. - Only works with systemd (remove sysvinit and upstart occurrences) - A bit of cleanup Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-17 10:54:47 +02:00
Major Hayden	c01851325e	Remove jinja2 delimiters from `when` keys This patch changes the `when:` keys so that they have no jinja2 delimiters. This avoids Ansible warnings which could turn into errors in a future Ansible release.	2017-10-12 11:27:42 -05:00
Sébastien Han	774697ebd8	infra: use the pg check in the right place Use the pg check before doing the pg check, not on the quorum check. Also, never quote integers when doing comparisons. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-09 17:25:41 +02:00
Sébastien Han	05f26031ea	rolling_update: perform pg check when pgs_num > 0 If num_pgs = 0 the check will never return 0. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-07 03:39:09 +02:00
Sébastien Han	99466e79a1	upgrade: add support for mgrs Also, we now play ceph-config so that everything needed to bootstrap new daemons is generated during the upgrade. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1497959 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-03 16:57:31 +02:00
Sébastien Han	b9050d6229	update: fix var register Even if the task is skipped, ansible registers the var as 'skipped', which affects the next task that uses this variable. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-29 14:27:55 +02:00
Sébastien Han	a0a5b174ba	rolling_update: clarify mon quorum command Cleaner. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-29 01:19:46 +02:00
Sébastien Han	bd5471b940	update: complete luminous upgrade Once we complete the upgrade to Luminous, we must issue a specific command. For more info read: http://ceph.com/community/new-luminous-upgrade-complete/ Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-28 21:05:00 +02:00
Sébastien Han	68f1f99ee9	update: nicer way to wait for clean pgs More comprehensive and friendlier to read. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-28 14:46:26 +02:00
Guillaume Abrioux	7195b08718	update: update rgw systemd unit name The old name is used in `rolling_update.yml` and `purge-docker-cluster.yml`, it breaks the `test_rgw_service_is_running()` test. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-24 14:58:55 +02:00
Sébastien Han	92f9be963b	rolling_update: clarify update doc Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1490188 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-13 15:46:29 -06:00
Sébastien Han	e0a264c7e9	osd: allow multi dedicated journals for containers Fix: https://bugzilla.redhat.com/show_bug.cgi?id=1475820 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-30 12:34:06 +02:00
Sébastien Han	0205f6d645	rolling_update: nicer way to set osd flags Prior to this patch, we were applying the osd flags like this: " General pre tasks Set flags Upgrade OSDs on a host Unset flags <-- this triggers pending scrub to start Set flags Upgrade OSDs on a host Unset flags <-- this triggers pending scrub to start . . . General post tasks " Now, instead, we set the flags once before starting the OSD update and unset them once the last OSD is finished. " General pre tasks Set flags and wait for any scrubs to finish Upgrade OSDs on a host Upgrade OSDs on a host . . . Unset flags General post tasks " Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1450754 Signed-off-by: Sébastien Han <seb@redhat.com> Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-08-25 18:21:28 +02:00
Sébastien Han	4a4a20f07d	rolling update: skip pg check if num_pgs = 0 In our test case we don't have any pgs, thus the check fails. The check always returns an empty array, which makes the comparison fail. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-24 08:50:49 +02:00
Guillaume Abrioux	7a333d05ce	Add handlers for containerized deployment Until now, there were no handlers for containerized deployments. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-08-02 17:12:20 +02:00
Guillaume Abrioux	5adbf0fdaa	Move role dependencies in site.yml/site-docker.yml This will give us more flexibility and avoid a lot of useless `when` conditions that skip all tasks of an undesired role. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-08-02 17:12:14 +02:00
Guillaume Abrioux	206c7a16d0	rolling_update: refactor code Refactor the rolling_update playbook. Add ceph-client upgrade. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-08-02 11:10:51 +02:00
Guillaume Abrioux	828f88403e	Update: Avoid screen scraping in rolling update since luminous has revamped the `ceph -s` output, we need to avoid screen scraping. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-07-12 15:02:39 +02:00
Guillaume Abrioux	73141118d0	Make the new PG check work with /bin/sh The new test in the PG check no longer works on distributions where /bin/sh isn't linked to /bin/bash. Fix: #1619 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-06-22 17:59:38 +02:00
Andrew Schoen	e2104acb62	rolling_update: set health_mon_check_delay to 15 The old value of 10 did not give enough time for a containerized mon to pass the health check. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-06-13 08:56:44 -05:00


88 Commits (4cc1506303739f13bb7a6e1022646ef90e004c90)
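Several commits above (c13a3c3492, 179c4d00d7, 05f26031ea) shape the "waiting for clean pgs" check: every PG whose state begins with `active+clean` (for example `active+clean+scrubbing`) counts as clean, `pgs_by_state` must contain at least one entry to avoid a false positive, and the check only runs when `num_pgs` is greater than 0. The Python function below is a minimal sketch of that logic, not the playbook's actual Jinja2 expression. It assumes input shaped like the `pgmap` section of Luminous `ceph -s -f json` output (`num_pgs`, plus `pgs_by_state` entries carrying `state_name` and `count`); the name `pgs_are_clean` is hypothetical.

```python
import json


def pgs_are_clean(status_json: str) -> bool:
    """Hypothetical helper: True when every PG is in an active+clean state.

    Assumes the input resembles `ceph -s -f json` output, with a "pgmap"
    object holding "num_pgs" and a "pgs_by_state" list.
    """
    pgmap = json.loads(status_json)["pgmap"]
    states = pgmap.get("pgs_by_state", [])
    # An empty list sums to 0 and could match num_pgs == 0, which would be
    # a false positive, so require at least one entry.
    if not states:
        return False
    # Any state starting with "active+clean" counts as clean, so PGs that
    # are merely scrubbing ("active+clean+scrubbing") do not block the upgrade.
    clean = sum(
        entry["count"]
        for entry in states
        if entry["state_name"].startswith("active+clean")
    )
    return clean == pgmap["num_pgs"]


sample = json.dumps({"pgmap": {"num_pgs": 3, "pgs_by_state": [
    {"state_name": "active+clean", "count": 2},
    {"state_name": "active+clean+scrubbing", "count": 1},
]}})
print(pgs_are_clean(sample))  # prints True
```

In a playbook, a check like this would typically sit in a retry loop, and callers would skip it entirely when `num_pgs` is 0, as 05f26031ea does.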