Commit Graph

217 Commits (9b5d97adb95a788bc1fdedbba562a9c71a1808be)

Author SHA1 Message Date
Guillaume Abrioux 45ba90c169 update: remove rbdmirror legacy task
This task is no longer needed for next release.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 0ea0adf039 update: show all daemons version at the end
Let's display all daemons version at the end of the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux f31d6d9485 update: enable new nautilus-only functionality
once the cluster is upgraded to nautilus, we can complete the process by
disallowing pre-nautilus OSDs and enabling all new nautilus-only functionality

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux afdaa70a63 update: enable msgr2 protocol
This commit enable the msgr2 protocol when the cluster is fully upgraded
to nautilus

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux ef096dd021 update: ensure mgrs are upgraded after ALL monitors
As of 1c760904b0, ceph-ansible implicitly
bootstrap managers on monitors.
mgrs must be upgraded only after all monitors, therefore, this commit
refact the way mgrs are upgraded to be sure we don't upgrade a mgr
during the monitors upgrade.

This commit also ensure we handle the case were we split managers on
dedicated nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 7fa2434f0f update: ensure /var/lib/ceph/bootstrap-rbd-mirror is present
This directory is created by ceph-config node by node.
In the upgrade context we need it to be created on ALL monitors as soon
as the first iteration because of the task right after which creates and sends
the keyrings on all monitors.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 82764afe8d update: mask systemd service units during upgrade
This prevents the packaging from restarting services before we do need
to restart them in the rolling update sequence.
We want to handle services restart at rolling_update playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 8add55451c update: set osd flags only once
There is no need to set osd flags (noout, norebalance) each time we
upgrade a mon.

This commit moves up those tasks (before stopping the mon) so we don't need
to delegate them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux f7c6f4e0b6 update: fix tasks waiting for the node to join the quorum
We actually want to ensure the node being upgraded is joining the quorum
instead of the monitor picked up earlier.

Indeed, the `mon_host`is used only in `delegate_to:` so we can still run ceph
commands while the monitor being upgraded is stopped.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 32569b79e2 update: remove an old parameter in ceph_key module call
the `containerized` parameter in ceph_key module doesn't exist anymore.
This was making the module failing but was hidden because of the
`ignore_errors: True`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux edfdc49488 rolling_update: support multiple rgw instance
1ac94c048f introduced the support of
multiple rgw instances on a single host but somehow has missed to
implement this feature in rolling_update.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-01-22 13:45:38 +01:00
Giulio Fidente ff8dbe114c Preserve rolling_update backward compatibility with ansible < 2.5
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
2019-01-21 14:05:45 +01:00
Guillaume Abrioux 268f2cef82 update: do not enforce `serial: 1` on client nodes
There is no need to enforce `serial: 1` on client nodes.
Let's make it parameterizable by introducing a new *extra* variable
`client_update_batch`, if not filled this will default to `{{
ansible_forks }}`.

NOTE: this is only usable as an extra variable passed with
`-e client_update_batch=<num>`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650184

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-01-02 16:55:08 +00:00
Guillaume Abrioux 0eb56e36f8 introduce new role ceph-facts
sometimes we play the whole role `ceph-defaults` just to access the
default value of some variables. It means we play the `facts.yml` part
in this role while it's not desired. Splitting this role will speedup
the playbook.

Closes: #3282

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-12-12 11:18:01 +01:00
Rishabh Dave e4f0af2b78 don't use private option for import_role
Since sharing variables amongst roles has been made default since
Ansible 2.6, private option has been deprecated; so stop using it.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-12-04 23:45:59 +00:00
Ramana Raja cb784c601d rolling_update: fail if less than 3 MONs
... for non-containerized deployments as well.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1655470

Signed-off-by: Ramana Raja <rraja@redhat.com>
2018-12-04 14:28:49 +00:00
Sébastien Han 896676ee80 fix json data type
Json is a type structure which is always typed as a string, where before
this we were declaring a dict, which is not a json valid structure.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-12-04 12:34:54 +01:00
Sébastien Han 1c760904b0 site: collocated mon and mgr by default
This will speed up the deployment and also deploy mon and mgr collocated
just as recommended.
This won't prevent you of adding more and dedicaded machines for mgr if
needed.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-12-03 14:39:43 +01:00
Sébastien Han bb7bfca113 rolling-update: remove old condition
This failure condition was only valid at the time where clusters didn't
have ceph-mgr activated. Now since we collocate the ceph-mgr with the
mon by default, if the daemon wasn't present it will be created during
the upgrade.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-12-03 14:39:43 +01:00
Guillaume Abrioux a952122c38 rolling_update: create missing keyring only on running mon
try to create the potentially missing keys only on monitors that are
actually running.
The current node being played is stopped before this task.
By the way, delegating the command on all nodes but the current node
being played ensures that the generated keys will be present on all
monitors.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-29 16:40:46 +00:00
Sébastien Han 61fb6972ec rolling_update: default ceph json output to empty dict
So we can avoid the following failure:

The conditional check 'hostvars[mon_host]['ansible_hostname'] in (ceph_health_raw.stdout | from_json)["quorum_names"] or hostvars[mon_host]['ansible_fqdn'] in (ceph_health_raw.stdout | from_json)["quorum_names"]
' failed. The error was: No JSON object could be decoded

We just need to set a default, the next iteration will have a more
complete json since the command won't fail.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-29 10:46:15 +00:00
Guillaume Abrioux 73287f91bc mgr: fix mgr keyring error on rolling_update
when upgrading from RHCS 2.5 to 3.2, it fails because the task `create
ceph mgr keyring(s) when mon is containerized` has a when condition
`inventory_hostname == groups[mon_group_name]|last`.
First, this is incorrect because `inventory_hostname` is referring to a
mgr node, it means this condition would have never been satisfied.
Then, this condition + `serial: 1` makes the mgr keyring creating skipped on
the first node. Further, the `ceph-mgr` role tries to copy the mgr
keyring (it's not aware we are running `serial: 1`) this leads to a
failure like the following:

```
TASK [ceph-mgr : copy ceph keyring(s) if needed] ***************************************************************************************************************************************************************************************************************************************************************************
task path: /usr/share/ceph-ansible/roles/ceph-mgr/tasks/common.yml:10
Tuesday 27 November 2018  12:03:34 +0000 (0:00:00.296)       0:11:01.290 ******
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AnsibleFileNotFound: Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'
failed: [magna021] (item={u'dest': u'/var/lib/ceph/mgr/local-magna021/keyring', u'name': u'/etc/ceph/local.mgr.magna021.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "dest": "/var/lib/ceph/mgr/local-magna021/keyring", "name": "/etc/ceph/local.mgr.magna021.keyring"}, "msg": "Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'"}
```

The ceph_key module is idempotent, so there is no need to have such a
condition.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1649957

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-27 18:19:56 +01:00
Sébastien Han 4f57e44f9c defaults: declare container_binary
Always declare container_binary and assign it a correct value.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-27 16:47:40 +00:00
Sébastien Han 49e0e19056 rolling_update: update ceph_key task for container
Use the new way to create keys on containerized env as introduced by: 1098b71bda90db3dad19ac179f0ba900ccb0f953

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-27 16:47:40 +00:00
Sébastien Han 2814d36c93 infra playbooks: use the right container binary
Use podman or docker wether they are available or not. podman will be
prioritized over docker if present.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-27 16:47:40 +00:00
Guillaume Abrioux 7c99b6df6d update: fix a typo
`hostvars[groups[mon_host]]['ansible_hostname']` seems to be a typo.
That should be `hostvars[mon_host]['ansible_hostname']`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-26 18:22:20 +01:00
Guillaume Abrioux af78173584 rolling_update: refact set_fact `mon_host`
each monitor node should select another monitor which isn't itself.
Otherwise, one node in the monitor group won't set this fact and causes
failure.

Typical error:
```
TASK [create potentially missing keys (rbd and rbd-mirror) when mon is containerized] ***
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-update_docker_cluster/rolling_update.yml:200
Thursday 22 November 2018  14:02:30 +0000 (0:00:07.493)       0:02:50.005 *****
fatal: [mon1]: FAILED! => {}

MSG:

The task includes an option with an undefined variable. The error was: 'dict object' has no attribute u'mon2'
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-26 18:22:20 +01:00
Sébastien Han 4e267bee4f rolling_update: create rbd and rbd-mirror keyrings
During an upgrade ceph won't create keys that were not existing on the
previous version. So after the upgrade of let's Jewel to Luminous, once
all the monitors have the new version they should get or create the
keys. It's ok to have the task fails, especially for the rbd-mirror
key, which only appears in Nautilus.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-26 18:22:20 +01:00
Guillaume Abrioux c783bc70da docker-common: rename role
rename `ceph-docker-common` role to `ceph-container-common`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-12 10:51:48 +01:00
Rishabh Dave 3f62fc585f don't use "role" or "roles" to include roles
Since import_role and include_role are more readable, explicit (about
the nature of inclusion) and flexible (allows placibf inclusion
anywhere) amongst the tasks, use them instead of using roles or role
keyword. Besides, these keywords also allow more arguments.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-10-31 09:38:59 +01:00
Sébastien Han 44d0da0dd4 rolling_update: fix upgrade when using fqdn
CLusters that were deployed using 'mon_use_fqdn' have a different unit
name, so during the upgrade this must be used otherwise the upgrade will
fail, looking for a unit that does not exist.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1597516
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-19 13:06:56 +00:00
Guillaume Abrioux 40b7747af7 remove jewel support
As of now, we should no longer support Jewel in ceph-ansible.
The latest ceph-ansible release supporting Jewel is `stable-3.1`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-12 23:38:17 +00:00
Noah Watkins 306e308f13 Avoid using tests as filter
Fixes the deprecation warning:

  [DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of
  using `result|search` use `result is search`.

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
2018-10-10 04:26:33 +00:00
Guillaume Abrioux 79bd06ad28 rolling_update: add ceph-handler role
since the introduction of ceph-handler, it has to be added in
rolling_update playbook as well

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-05 13:48:04 +00:00
Guillaume Abrioux 179c4d00d7 rolling_update: ensure pgs_by_state has at least 1 entry
Previous commit c13a3c3 has removed a condition.

This commit brings back this condition which is essential to ensure we
won't hit a false positive result in the `when` condition for the check
PGs task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-25 14:58:54 +00:00
Guillaume Abrioux c13a3c3492 upgrade: consider all 'active+clean' states as valid pgs
In cluster with a large number of PGs, it can be expected some of them
scrubbing, it's a normal operation.
Preventing from scrubbing operation force to set noscrub flag before a
rolling update which is a problem because it pauses an important data
integrity operation until the end of the rolling upgrade.

This commit allows an upgrade even while PGs are scrubbing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616066

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-25 12:12:06 +00:00
Sébastien Han 2e6e885bb7 rolling_upgrade: set sortbitwise properly
Running 'osd set sortbitwise' when we detect a version 12 of Ceph is
wrong. When OSD are getting updated, even though the package is updated
they won't send their updated version (12) and will stick with 10 if the
command is not applied. So we have to check if OSD are sending a version
10 and then run the command to unlock the OSDs.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-21 12:22:32 +00:00
Sébastien Han 77a3a682f3 iscsi group name preserve backward compatibility
Recently we renamed the group_name for iscsi iscsigws where previously
it was named iscsi-gws. Existing deployments with a host file section
with iscsi-gws must continue to work.

This commit adds the old group name as a backoward compatility, no error
from Ansible should be expected, if the hostgroup is not found nothing
is played.

Close: https://bugzilla.redhat.com/show_bug.cgi?id=1619167
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-20 23:52:19 +02:00
Sébastien Han dad10e8f3f rolling_update: register container osd units
Before running the upgrade, let's call systemd to collect unit names
instead of relaying on the device list. This is more accurate and fix
the osd_auto_discovery scenario too.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 11:13:12 +02:00
Sébastien Han b3266c5be2 rolling_update: set osd sortbitwise
upgrade RHCS 2 -> RHCS 3 will fail if cluster has still set
sortnibblewise,
it stay stuck on "TASK [waiting for clean pgs...]" as RHCS 3 osds will
not start if nibblewise is set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-24 17:19:02 +02:00
Vishal Kanaujia 44d514850a Rolling upgrades: Migrate to ceph-key module
This change moves ceph-mgr upgrades to using ceph-key library.
Fixes: #2758

Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
2018-07-03 18:22:14 +02:00
Sébastien Han 20c8065e48 ceph-iscsi: rename group iscsi_gws
Let's try to avoid using dashes as testinfra needs to be able to read
the groups.
Typically, with iscsi-gws we can't add a marker for these iscsi nodes,
using an underscore fixes the issue.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Guillaume Abrioux 232a16d77f rolling_update: fix facts gathering delegation
this is kind of follow up on what has been made in #2560.
See #2560 and #2553 for details.

Closes: #2708

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-06 16:36:30 +08:00
Vishal Kanaujia 08d9432454 Rolling upgrades should use norebalance flag for OSDs
The rolling upgrades playbook should have norebalance flag set for
OSDs upgrades to wait only for recovery.

Fixes: #2657
Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
2018-06-04 10:59:01 +02:00
Sébastien Han e91648a7af rolling_update: add role ceph-iscsi-gw
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1575829
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-26 02:38:47 -07:00
Sébastien Han da5b104098 rolling_update: fix get fsid for containers
When running ansible2.4-update_docker_cluster there is an issue on the
"get current fsid" task. The current task only works for
non-containerized deployment but will run all the time (even for
containerized). This currently results in the following error:

TASK [get current fsid] ********************************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-luminous-ansible2.4-update_docker_cluster/rolling_update.yml:214
Tuesday 22 May 2018  22:48:32 +0000 (0:00:02.615)       0:11:01.035 ***********
fatal: [mgr0 -> mon0]: FAILED! => {
    "changed": true,
    "cmd": [
        "ceph",
        "--cluster",
        "test",
        "fsid"
    ],
    "delta": "0:05:00.260674",
    "end": "2018-05-22 22:53:34.555743",
    "rc": 1,
    "start": "2018-05-22 22:48:34.295069"
}

STDERR:

2018-05-22 22:48:34.495651 7f89482c6700  0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).connect protocol feature mismatch, my 83ffffffffffff < peer 481dff8eea4fffb missing 400000000000000
2018-05-22 22:48:34.495684 7f89482c6700  0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).fault

This is not really representative on the real error since the 'ceph' cli is available on that machine.
On other environments we will have something like "command not found: ceph".

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-23 04:44:12 +02:00
Sébastien Han d80a871a07 rolling_update: move osd flag section
During a minor update from a jewel to a higher jewel version (10.2.9 to
10.2.10 for example) osd flags don't get applied because they were done
in the mgr section which is skipped in jewel since this daemons does not
exist.
Moving the set flag section after all the mons have been updated solves
that problem.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1548071
Co-authored-by: Tomas Petr <tpetr@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-17 08:17:16 +02:00
Guillaume Abrioux 1b4c3f292d rolling_update: fix dest path for mgr keys fetching
the role `ceph-mgr` that is played later in the playbook fails because
the destination path for the fetched keys is wrong.
This patch fix the destination path used in the task `fetch ceph mgr
key(s)` so there is no mismatch.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-15 19:30:34 +02:00
Guillaume Abrioux 3b89f1bfb1 rolling_update: get fsid in mgr pre_task
{{ fsid }} points to {{ cluster_uuid.stdout }} which is not defined in
this part of the rolling_update playbook.
Since we need to call {{ fsid }} we must get the fsid and register it to
`cluster_uuid`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-15 09:01:42 +02:00
Sébastien Han 52fc8a0385 rolling_update: move mgr key creation
Until all the mons haven't been updated to Luminous, there is no way to
create a key. So we should do the key creation in the mon role only if
we are not part of an update.
If we are then the key creation is done after the mons upgrade to
Luminous.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-15 09:01:42 +02:00
Guillaume Abrioux c04e67347c update: look for short and fqdn in ceph_health_raw
According to hostname configuration, the task waiting for mons to be in
quorum might fail.
The idea here is to look for both shortname and fqdn in
`ceph_health_raw` instead of just `ansible_hostname`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1546127

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-02-19 10:27:47 +01:00
Andrew Schoen 699c777e68 rolling update: fix undefined jewel_minor_update failure
Variables set at the play level with ``vars`` do
not carry over into the next play in the playbook.

The var jewel_minor_update was set in a previous play but
used in this one and was failing because it was not defined.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1544029

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-02-13 17:03:05 +01:00
Guillaume Abrioux c7ec12d49c upgrade: skip luminous tasks for jewel minor update
These tasks are needed only when upgrading to luminous.
They are not needed in Jewel minor upgrade and by the way, they fail because
`ceph versions` command doesn't exist.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-01-25 18:30:34 +01:00
Sébastien Han 8af7459476 rolling update: add mgr exception for jewel minor updates
When update from a minor Jewel version to another, the playbook will
fail on the task "fail if no mgr host is present in the inventory".
This now can be worked around by running Ansible with_items

-e jewel_minor_update=true

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1535382
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-01-18 14:06:05 +01:00
Andrew Schoen 997edea271 rolling_update: do not fail the playbook if nfs-ganesha is not present
The rolling update playbook was attempting to stop the
nfs-ganesha service on nodes where jewel is still installed.
The nfs-ganesha service did not exist in jewel so the task fails.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-01-06 14:07:55 +01:00
Sébastien Han 200785832f rolling_update: do not require root to answer question
There is no need to ask for root on the local action. This will prompt
for a password the current user is not part of sudoers. That's
  unnecessary anyways.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1516947
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-12-19 14:04:55 +01:00
Sébastien Han 4413511b66 all: backward compatibility between stable-2.2 and 3.0
stable-3.0 brought numerous changes in ceph-ansible variables, this PR
aims to maintain backward compatibility for someone running stable-2.2
upgrading to stable-3.0 but keeps its groups_vars untouched.
We will then determine the right options to make sure the upgrade works
but we are expecting that new variables should be used.

We will drop this in a near future, maybe 3.1 or 3.2.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-20 11:54:10 +02:00
Guillaume Abrioux 982326373b upgrade: fix upgrade jewel to luminous for nfs nodes
nfs nodes can't be upgraded from jewel to luminous because ceph-nfs role
is skipped because of the condition `when:
"ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Indeed,
package is upgraded in `ceph-nfs` role, therefore,
`ceph_release` is still set to the old version. It means the when can't
be satisfied.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-10-19 20:54:23 +02:00
Guillaume Abrioux 70034451e9 upgrade: fix upgrade jewel to luminous for mgr nodes
mgr nodes can't be upgraded from jewel to luminous because ceph-mgr role
is skipped because of the condition `when:
"ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Indeed,
ceph-mgr package is upgraded in `ceph-mgr` role, therefore,
`ceph_release` is still set to the old version. It means the when can't
be satisfied.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 302e563601cd6820b1ae44fabdfb1506688c7c9b)
2017-10-19 20:54:23 +02:00
Sébastien Han d920d4839d upgrade: support for rbd mirror and nfs
- Add upgrade support for rbd mirror and nfs daemons.
- Only works with systemd (remove sysvinit and upstart occurence)
- A bit of cleanup

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-17 10:54:47 +02:00
Major Hayden c01851325e
Remove jinja2 delimiters from `when` keys
This patch changes the `when:` keys so that they have no jinja2
delimiters. This avoids Ansible warnings which could turn into
errors in a future Ansible release.
2017-10-12 11:27:42 -05:00
Sébastien Han 774697ebd8 infra: use the pg check in the right place
Use the pg check before doing the pg check, not on the quorum check.
Also never quote int when doing comparaison.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-09 17:25:41 +02:00
Sébastien Han 05f26031ea rolling_update: perform pg check when pgs_num > 0
If num_pgs = 0 the check will never return 0.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-07 03:39:09 +02:00
Sébastien Han 99466e79a1 upgrade: a support for mgrs
Also we now play ceph-config to have everything being generated for new
daemons bootstrap during upgrade.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1497959
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-03 16:57:31 +02:00
Sébastien Han b9050d6229 update: fix var register
Even if the task is skipped, ansible registers the var as 'skipped' so
this task the task using this variable for its next usage.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-09-29 14:27:55 +02:00
Sébastien Han a0a5b174ba rolling_update: clarify mon quorum command
Cleaner.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-09-29 01:19:46 +02:00
Sébastien Han bd5471b940 update: complete luminous upgrade
Once we complete the upgrade to Luminous, we must issue a specific
command. For more info read:
http://ceph.com/community/new-luminous-upgrade-complete/

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-09-28 21:05:00 +02:00
Sébastien Han 68f1f99ee9 update: nicer way to wait for clean pgs
More comprhensive and friendly to read.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-09-28 14:46:26 +02:00
Guillaume Abrioux 7195b08718 update: update rgw systemd unit name
The old name is used in `rolling_update.yml` and
`purge-docker-cluster.yml`, it breaks the
`test_rgw_service_is_running()` test.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-09-24 14:58:55 +02:00
Sébastien Han 92f9be963b rolling_update: clarify update doc
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1490188
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-09-13 15:46:29 -06:00
Sébastien Han e0a264c7e9 osd: allow multi dedicated journals for containers
Fix: https://bugzilla.redhat.com/show_bug.cgi?id=1475820
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-08-30 12:34:06 +02:00
Sébastien Han 0205f6d645 rolling_update: nicer way to set osd flags
Prior to this patch, we were applying the osd flags like this:

"
General pre tasks
Set flags
Upgrade OSDs on a host
Unset flags <-- this triggers pending scrub to start
Set flags
Upgrade OSDs on a hosts
Unset flags <-- this triggers pending scrub to start
.
.
.
General post tasks
"

Now instead, we apply the flag once before starting the OSD update and
unset them once the last OSD is finished.

"
General pre tasks
Set flags and wait for any scrubs to finish
Upgrade OSDs on a host
Upgrade OSDs on a host
.
.
.
Unset flags
General post tasks
"

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1450754
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-08-25 18:21:28 +02:00
Sébastien Han 4a4a20f07d rolling update: skip pg check if num_pgs = 0
In our test case we don't have any pgs, thus the check fails. The check
always returns an empty array, which makes the comparaison failing.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-08-24 08:50:49 +02:00
Guillaume Abrioux 7a333d05ce Add handlers for containerized deployment
Until now, there is no handlers for containerized deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-08-02 17:12:20 +02:00
Guillaume Abrioux 5adbf0fdaa Move role dependencies in site.yml/site-docker.yml
This will give us more flexibility and avoid a lot of useless when
skipping all tasks from a non-desired role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-08-02 17:12:14 +02:00
Guillaume Abrioux 206c7a16d0 rolling_update: refact code
Refact rolling_update playbook.
Add ceph-client upgrade.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-08-02 11:10:51 +02:00
Guillaume Abrioux 828f88403e Update: Avoid screen scraping in rolling update
since luminous has revamped the `ceph -s` output, we need to avoid screen
scraping.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-07-12 15:02:39 +02:00
Guillaume Abrioux 73141118d0 Make the new check PGs working with /bin/sh
The new test in the checks PGs are no longer working on distributions
where /bin/sh isn't linked to /bin/bash.

Fix: #1619
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-06-22 17:59:38 +02:00
Andrew Schoen e2104acb62 rolling_update: set health_mon_check_delay to 15
The old value of 10 did not give enough time for a containerized mon to
pass the health check.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-06-13 08:56:44 -05:00
Guillaume Abrioux 5af9bb432c rewrite check pgs clean tasks
Avoid screen scrapping by rewriting `waiting for clean pgs` tasks like it is
done in 304de48.

Use the json output returned by `ceph -s` instead

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-06-13 09:48:56 +02:00
Sébastien Han fdc7866072 Merge pull request #1469 from ceph/refact_code
Docker: Refact code
2017-06-02 12:40:25 +02:00
Guillaume Abrioux ddfe019342 Refact code
`ceph-docker-common`:
  At the moment there is a lot of duplicated tasks in each
  `./roles/ceph-<role>/tasks/docker/main.yml` that could be refactored in
  `./roles/ceph-docker-common/tasks/main.yml`.

`*_containerized_deployment` variables:
  All `*_containerized_deployment` have been refactored to a single
  variable `containerized_deployment`

duplicate `cephx` variables in `group_vars/* have been removed.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-05-24 15:55:41 +02:00
Sébastien Han 90389864d8 rolling-update: set/unset flags on the right container
Problem: we are delegating the set/unset flag to a monitor node but we
try to call an osd container

Solution: use the right container name.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-05-22 09:38:08 +02:00
Sébastien Han dfd8f4d96e test: add mgr section to the host inventory file
Without this, we don't test the mgr role so we need to add it.

Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-04-15 00:16:10 +02:00
Sébastien Han c37aaa41f4 playbook: homogenize the way list osd ids
Problem: too many different commands to do the same thing. The 'cut'
command on infrastructure-playbooks/purge-cluster.yml was also wrong.
This sed command from osixia in ceph-docker
https://github.com/ceph/ceph-docker/pull/580/ addresses all the
scenarios.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-03-30 11:51:38 +02:00
Konstantin Shalygin 1662976fc0
Resolve issues when groups names not in default value. 2017-03-27 21:44:30 +07:00
Anthony D'Atri 6c4911276e Enhance clean PG check to catch active+clean+scrubbing and active+clean+scrubbing+deep
Signed-off-by: Anthony D'Atri <anthony.datri@gmail.com>
2017-03-19 00:23:26 -07:00
Sébastien Han 8320c14191 Merge pull request #1317 from ibotty/harmonize-docker-names
harmonize docker names
2017-03-14 18:20:20 +01:00
Andrew Schoen 46f26bec13 rolling-update: do not set group name vars at playbook level
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-03-08 08:57:08 -06:00
yangyimincn 8b36cbac64 Update rolling_update.yml
The task waiting for the monitor to join the quorum... , the result for ceph -s | grep monmap only contain monmap, not included quorum:

# ceph -s --cluster ceph | grep monmap
     monmap e1: 3 mons at {sh-office-ceph-1=10.12.10.34:6789/0,sh-office-ceph-2=10.12.10.35:6789/0,sh-office-ceph-3=10.12.10.36:6789/0}

If want to get monitor, should use this:

# ceph -s --cluster ceph | grep election
            election epoch 80, quorum 0,1 sh-office-ceph-1,sh-office-ceph-2

ceph verison: 10.2.5
2017-02-28 16:56:02 +08:00
Tobias Florek 931027e6f7 harmonize docker names
Created containers now are named more or less in the form of

    <ansible role>-<ansible_hostname>
2017-02-23 09:15:05 +01:00
Andrew Schoen 5622c94e8b rolling-update: do not use upstart to stop mons when using systemd
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-02-21 12:31:26 -06:00
Sébastien Han 8f94bfb498 rolling-update: detect init system properly
Simply use the ansible_service_mgr fact.

Closes: #1286

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-02-08 08:52:05 +01:00
Guillaume Abrioux 76ddcbc271 Remove support of releases prior to Jewel.
According to #1216, we need to simply the code by removing the
support of anything before Jewel.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-01-31 11:00:54 +01:00
Sébastien Han 829e2b6598 Merge pull request #1077 from font/rolling_update
Support containerized rolling update
2016-11-22 16:56:46 +01:00
Sébastien Han 38e846e542 rolling_update: clarify "serial" usage
Prior to this commit the serial variable was poorly documented. Now we
are making clear that this value should be left untouched as the rolling
update mechanism should happen serially.

Solves: bz-1396742
Signed-off-by: Sébastien Han <seb@redhat.com>
2016-11-21 14:42:46 +01:00
Ivan Font 255e816e28 Rolling update changes for containerized deployments
Separate out systemd restart tasks for containerized and
non-containerized deployments

Signed-off-by: Ivan Font <ifont@redhat.com>
2016-11-17 11:25:25 -08:00
Ivan Font e72f08080d Warn user when upgrading cluster with only one mon
Signed-off-by: Ivan Font <ifont@redhat.com>
2016-11-17 11:25:25 -08:00
Ivan Font 3ff17f1c8f Support containerized rolling update
- Update rolling update playbook to support containerized deployments
  for mons, osds, mdss, and rgws
- Skip checking if existing cluster is running when performing a rolling
  update
- Fixed bug where we were failing to start the mds container because it
  was missing the admin keyring. The admin keyring was missing because
  it was not being pushed from the mon host to the ansible host due to
  the keyring not being available before running the copy_configs.yml
  task include file. Now we forcefully wait for the admin keyring to be
  generated before continuing with the copy_configs.yml task include file
- Skip pre_requisite.yml when running on atomic host. This technically
  no longer requires specifying to skip tasks containing the with_pkg tag
- Add missing variables to all.docker.sample
- Misc. cleanup

Signed-off-by: Ivan Font <ifont@redhat.com>
2016-11-17 11:25:25 -08:00
Alfredo Deza 60ce2311b8 rolling_update: bump retries for osd_check/retries to 20 minutes
Signed-off-by: Alfredo Deza <adeza@redhat.com>

Resolves: rhbz#1395073
2016-11-17 10:43:58 -05:00
Sébastien Han 81a72cb85d Merge pull request #1068 from ceph/v2.2
moving to ansible v2.2 compatibility
2016-11-16 16:33:40 +01:00
Andrew Schoen 5f44b118b8 rolling update: stop RGWs before upgrade and start afterwards
Signed-off-by: Andrew Schoen <aschoen@redhat.com>

Resolves: rhbz#1394929
2016-11-14 14:47:12 -06:00
Andrew Schoen ded9d9dfd3 rolling update: stop MDSs before upgrading and start afterwards
Signed-off-by: Andrew Schoen <aschoen@redhat.com>

Resolves: rhbz#1394929
2016-11-14 14:47:12 -06:00
Andrew Schoen 5429c5f8c5 rolling update: stop MONs before upgrading and start afterwards
Signed-off-by: Andrew Schoen <aschoen@redhat.com>

Resolves: rhbz#1394929
2016-11-14 14:47:12 -06:00
Andrew Schoen 66f09bdac4 rolling update: stop OSDs before upgrading
This avoids a bug where OSDs are sometimes restarted twice on
upgrades which leaves the OSD process running but not marked up.

See:

https://bugzilla.redhat.com/show_bug.cgi?id=1394928
https://bugzilla.redhat.com/show_bug.cgi?id=1391675
https://bugzilla.redhat.com/show_bug.cgi?id=1394929

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

Resolves: rhbz#1394929
2016-11-14 14:46:58 -06:00
Sébastien Han 991341f525 rolling_update: add variable to upgrade ceph
My stupid self removed this crucial variable here: 217ce3ca thinking it
was another hard coded variable import where this is actually the
trigger for the upgrade.

Closes: #1071

Signed-off-by: Sébastien Han <seb@redhat.com>
2016-11-04 17:31:02 +01:00
Sébastien Han a2fcd222d2 moving to ansible v2.2 compatibility
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-By: Julien Francoz julien@francoz.net
2016-11-04 10:09:38 +01:00
Andrew Schoen 8262ce5e40 rolling update: fix restarts of radosgw
Signed-off-by: Andrew Schoen <aschoen@redhat.com>

Resolves: rhbz#1391675
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2016-11-03 14:36:42 -05:00
Eduard Egorov 645b5efebf Fix hard-coded host group names in include tasks for group variables' file paths.
Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>
2016-11-01 12:21:40 +00:00
Andrew Schoen 0897c965ff rolling_update: define mon_group_name when upgrading the mons
see: https://bugzilla.redhat.com/show_bug.cgi?id=1389456

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

Resolves: rhbz#1389456
2016-10-27 14:17:56 -05:00
Sébastien Han b0989c700f rolling_update: fix wrong indent
Fixing: https://bugzilla.redhat.com/show_bug.cgi?id=1388295
Also add some notes in the README on how to run infrastructure
playbooks.

Signed-off-by: Sébastien Han <seb@redhat.com>
2016-10-26 12:51:08 -05:00
Andrew Schoen bebf412c92 infrastructure-playbooks: fix syntax errors in all playbooks
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2016-10-25 16:56:58 -05:00
Sébastien Han f49bf2832c rolling_update: improve variables import
we now have pointer to default role so we don't miss any of the
variables defined.

Signed-off-by: Sébastien Han <seb@redhat.com>
2016-10-06 14:08:04 +02:00
Sébastien Han ac2cb9ac2c upgrade: add custom timeout options
This commit introduces the ability to configure delays and retries for
cluster health checks, for both monitors and OSDs.

Signed-off-by: Sébastien Han <seb@redhat.com>
2016-10-03 11:27:02 +02:00
Sébastien Han 21356c653f rolling updates: remove mon compact command
Users have reported this task to hang. Since this command is not
required to perform the upgrade, we remove it.

Signed-off-by: Sébastien Han <seb@redhat.com>
2016-09-13 10:09:07 +02:00
Rachana Patel ad5805f03e rolling_update.yml will not work if cluster name is not 'ceph'. Adding --cluster will solve this problem
Fixes issue #969

Signed-off-by: Rachana Patel <rachana83.patel@gmail.com>
2016-09-07 15:38:58 -04:00
Sébastien Han fde819d1a8 create a directory for infrastructure playbooks
Since we have a couple of infrastructure related playbooks
(additionnally to the roles we are using to deploy Ceph), it makes sense
to have them located in a separate directory.

Signed-off-by: Sébastien Han <seb@redhat.com>
2016-08-17 11:53:34 +02:00