this is kind of follow up on what has been made in #2560.
See #2560 and #2553 for details.
Closes: #2708
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The rolling upgrades playbook should have norebalance flag set for
OSDs upgrades to wait only for recovery.
Fixes: #2657
Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
When running ansible2.4-update_docker_cluster there is an issue on the
"get current fsid" task. The current task only works for
non-containerized deployment but will run all the time (even for
containerized). This currently results in the following error:
TASK [get current fsid] ********************************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-luminous-ansible2.4-update_docker_cluster/rolling_update.yml:214
Tuesday 22 May 2018 22:48:32 +0000 (0:00:02.615) 0:11:01.035 ***********
fatal: [mgr0 -> mon0]: FAILED! => {
"changed": true,
"cmd": [
"ceph",
"--cluster",
"test",
"fsid"
],
"delta": "0:05:00.260674",
"end": "2018-05-22 22:53:34.555743",
"rc": 1,
"start": "2018-05-22 22:48:34.295069"
}
STDERR:
2018-05-22 22:48:34.495651 7f89482c6700 0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).connect protocol feature mismatch, my 83ffffffffffff < peer 481dff8eea4fffb missing 400000000000000
2018-05-22 22:48:34.495684 7f89482c6700 0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).fault
This is not really representative on the real error since the 'ceph' cli is available on that machine.
On other environments we will have something like "command not found: ceph".
Signed-off-by: Sébastien Han <seb@redhat.com>
During a minor update from a jewel to a higher jewel version (10.2.9 to
10.2.10 for example) osd flags don't get applied because they were done
in the mgr section which is skipped in jewel since this daemons does not
exist.
Moving the set flag section after all the mons have been updated solves
that problem.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1548071
Co-authored-by: Tomas Petr <tpetr@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
the role `ceph-mgr` that is played later in the playbook fails because
the destination path for the fetched keys is wrong.
This patch fix the destination path used in the task `fetch ceph mgr
key(s)` so there is no mismatch.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
{{ fsid }} points to {{ cluster_uuid.stdout }} which is not defined in
this part of the rolling_update playbook.
Since we need to call {{ fsid }} we must get the fsid and register it to
`cluster_uuid`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Until all the mons haven't been updated to Luminous, there is no way to
create a key. So we should do the key creation in the mon role only if
we are not part of an update.
If we are then the key creation is done after the mons upgrade to
Luminous.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995
Signed-off-by: Sébastien Han <seb@redhat.com>
According to hostname configuration, the task waiting for mons to be in
quorum might fail.
The idea here is to look for both shortname and fqdn in
`ceph_health_raw` instead of just `ansible_hostname`
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1546127
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Variables set at the play level with ``vars`` do
not carry over into the next play in the playbook.
The var jewel_minor_update was set in a previous play but
used in this one and was failing because it was not defined.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1544029
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
These tasks are needed only when upgrading to luminous.
They are not needed in Jewel minor upgrade and by the way, they fail because
`ceph versions` command doesn't exist.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When update from a minor Jewel version to another, the playbook will
fail on the task "fail if no mgr host is present in the inventory".
This now can be worked around by running Ansible with_items
-e jewel_minor_update=true
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1535382
Signed-off-by: Sébastien Han <seb@redhat.com>
The rolling update playbook was attempting to stop the
nfs-ganesha service on nodes where jewel is still installed.
The nfs-ganesha service did not exist in jewel so the task fails.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
There is no need to ask for root on the local action. This will prompt
for a password the current user is not part of sudoers. That's
unnecessary anyways.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1516947
Signed-off-by: Sébastien Han <seb@redhat.com>
stable-3.0 brought numerous changes in ceph-ansible variables, this PR
aims to maintain backward compatibility for someone running stable-2.2
upgrading to stable-3.0 but keeps its groups_vars untouched.
We will then determine the right options to make sure the upgrade works
but we are expecting that new variables should be used.
We will drop this in a near future, maybe 3.1 or 3.2.
Signed-off-by: Sébastien Han <seb@redhat.com>
nfs nodes can't be upgraded from jewel to luminous because ceph-nfs role
is skipped because of the condition `when:
"ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Indeed,
package is upgraded in `ceph-nfs` role, therefore,
`ceph_release` is still set to the old version. It means the when can't
be satisfied.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
mgr nodes can't be upgraded from jewel to luminous because ceph-mgr role
is skipped because of the condition `when:
"ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Indeed,
ceph-mgr package is upgraded in `ceph-mgr` role, therefore,
`ceph_release` is still set to the old version. It means the when can't
be satisfied.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 302e563601cd6820b1ae44fabdfb1506688c7c9b)
- Add upgrade support for rbd mirror and nfs daemons.
- Only works with systemd (remove sysvinit and upstart occurence)
- A bit of cleanup
Signed-off-by: Sébastien Han <seb@redhat.com>
This patch changes the `when:` keys so that they have no jinja2
delimiters. This avoids Ansible warnings which could turn into
errors in a future Ansible release.
Use the pg check before doing the pg check, not on the quorum check.
Also never quote int when doing comparaison.
Signed-off-by: Sébastien Han <seb@redhat.com>
Also we now play ceph-config to have everything being generated for new
daemons bootstrap during upgrade.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1497959
Signed-off-by: Sébastien Han <seb@redhat.com>
Even if the task is skipped, ansible registers the var as 'skipped' so
this task the task using this variable for its next usage.
Signed-off-by: Sébastien Han <seb@redhat.com>
The old name is used in `rolling_update.yml` and
`purge-docker-cluster.yml`, it breaks the
`test_rgw_service_is_running()` test.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Prior to this patch, we were applying the osd flags like this:
"
General pre tasks
Set flags
Upgrade OSDs on a host
Unset flags <-- this triggers pending scrub to start
Set flags
Upgrade OSDs on a hosts
Unset flags <-- this triggers pending scrub to start
.
.
.
General post tasks
"
Now instead, we apply the flag once before starting the OSD update and
unset them once the last OSD is finished.
"
General pre tasks
Set flags and wait for any scrubs to finish
Upgrade OSDs on a host
Upgrade OSDs on a host
.
.
.
Unset flags
General post tasks
"
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1450754
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
In our test case we don't have any pgs, thus the check fails. The check
always returns an empty array, which makes the comparaison failing.
Signed-off-by: Sébastien Han <seb@redhat.com>
This will give us more flexibility and avoid a lot of useless when
skipping all tasks from a non-desired role.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The new test in the checks PGs are no longer working on distributions
where /bin/sh isn't linked to /bin/bash.
Fix: #1619
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Avoid screen scrapping by rewriting `waiting for clean pgs` tasks like it is
done in 304de48.
Use the json output returned by `ceph -s` instead
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
`ceph-docker-common`:
At the moment there is a lot of duplicated tasks in each
`./roles/ceph-<role>/tasks/docker/main.yml` that could be refactored in
`./roles/ceph-docker-common/tasks/main.yml`.
`*_containerized_deployment` variables:
All `*_containerized_deployment` have been refactored to a single
variable `containerized_deployment`
duplicate `cephx` variables in `group_vars/* have been removed.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Problem: we are delegating the set/unset flag to a monitor node but we
try to call an osd container
Solution: use the right container name.
Signed-off-by: Sébastien Han <seb@redhat.com>
Without this, we don't test the mgr role so we need to add it.
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Problem: too many different commands to do the same thing. The 'cut'
command on infrastructure-playbooks/purge-cluster.yml was also wrong.
This sed command from osixia in ceph-docker
https://github.com/ceph/ceph-docker/pull/580/ addresses all the
scenarios.
Signed-off-by: Sébastien Han <seb@redhat.com>
The task waiting for the monitor to join the quorum... , the result for ceph -s | grep monmap only contain monmap, not included quorum:
# ceph -s --cluster ceph | grep monmap
monmap e1: 3 mons at {sh-office-ceph-1=10.12.10.34:6789/0,sh-office-ceph-2=10.12.10.35:6789/0,sh-office-ceph-3=10.12.10.36:6789/0}
If want to get monitor, should use this:
# ceph -s --cluster ceph | grep election
election epoch 80, quorum 0,1 sh-office-ceph-1,sh-office-ceph-2
ceph verison: 10.2.5