Commit Graph

121 Commits (4c3e77d8690a7be4fb89f7292c51f8644faaeafa)

Author SHA1 Message Date
Dimitri Savineau c8442f3705 rolling_update: Update systemd unit regex for nvme
The systemd unit regex doesn't handle nvme devices (/dev/nvmeXn1).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1687828

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-03-26 12:01:00 +00:00
Guillaume Abrioux 78aac3e96a update: followup on edfdc49
all rgw instances should be stopped according to the multiple rgw
instances support added in rolling_update.yml

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux f6e0185146 update: add containerized deployment upgrade support (L->N)
Add a couple of fixes to allow containerized deployments upgrade support
to upgrade from luminous/mimic to nautilus.

- pass CEPH_CONTAINER_IMAGE and CEPH_CONTAINER_BINARY environment
variable to the ceph_key module,
- fix the docker exec command in 'waiting for the containerized monitor
to join the quorum' task according to the `delegate_to` parameter,
- override `docker_exec_cmd` in `ceph-facts` with `mon_host` when
rolling_update is `True`,
- do not run unnecessarily `create_mds_filesystems.yml` when performing an
upgrade.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 1816b876ee update: add missing hosts in facts gathering
iscsigws were missing.
The 'complete upgrade' couldn't complete because rolling_update was set
to False for iscsigw nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 45ba90c169 update: remove rbdmirror legacy task
This task is no longer needed for next release.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 0ea0adf039 update: show all daemons version at the end
Let's display all daemons version at the end of the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux f31d6d9485 update: enable new nautilus-only functionality
once the cluster is upgraded to nautilus, we can complete the process by
disallowing pre-nautilus OSDs and enabling all new nautilus-only functionality

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux afdaa70a63 update: enable msgr2 protocol
This commit enable the msgr2 protocol when the cluster is fully upgraded
to nautilus

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux ef096dd021 update: ensure mgrs are upgraded after ALL monitors
As of 1c760904b0, ceph-ansible implicitly
bootstrap managers on monitors.
mgrs must be upgraded only after all monitors, therefore, this commit
refact the way mgrs are upgraded to be sure we don't upgrade a mgr
during the monitors upgrade.

This commit also ensure we handle the case were we split managers on
dedicated nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 7fa2434f0f update: ensure /var/lib/ceph/bootstrap-rbd-mirror is present
This directory is created by ceph-config node by node.
In the upgrade context we need it to be created on ALL monitors as soon
as the first iteration because of the task right after which creates and sends
the keyrings on all monitors.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 82764afe8d update: mask systemd service units during upgrade
This prevents the packaging from restarting services before we do need
to restart them in the rolling update sequence.
We want to handle services restart at rolling_update playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 8add55451c update: set osd flags only once
There is no need to set osd flags (noout, norebalance) each time we
upgrade a mon.

This commit moves up those tasks (before stopping the mon) so we don't need
to delegate them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux f7c6f4e0b6 update: fix tasks waiting for the node to join the quorum
We actually want to ensure the node being upgraded is joining the quorum
instead of the monitor picked up earlier.

Indeed, the `mon_host`is used only in `delegate_to:` so we can still run ceph
commands while the monitor being upgraded is stopped.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 32569b79e2 update: remove an old parameter in ceph_key module call
the `containerized` parameter in ceph_key module doesn't exist anymore.
This was making the module failing but was hidden because of the
`ignore_errors: True`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux edfdc49488 rolling_update: support multiple rgw instance
1ac94c048f introduced the support of
multiple rgw instances on a single host but somehow has missed to
implement this feature in rolling_update.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-01-22 13:45:38 +01:00
Giulio Fidente ff8dbe114c Preserve rolling_update backward compatibility with ansible < 2.5
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
2019-01-21 14:05:45 +01:00
Guillaume Abrioux 268f2cef82 update: do not enforce `serial: 1` on client nodes
There is no need to enforce `serial: 1` on client nodes.
Let's make it parameterizable by introducing a new *extra* variable
`client_update_batch`, if not filled this will default to `{{
ansible_forks }}`.

NOTE: this is only usable as an extra variable passed with
`-e client_update_batch=<num>`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650184

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-01-02 16:55:08 +00:00
Guillaume Abrioux 0eb56e36f8 introduce new role ceph-facts
sometimes we play the whole role `ceph-defaults` just to access the
default value of some variables. It means we play the `facts.yml` part
in this role while it's not desired. Splitting this role will speedup
the playbook.

Closes: #3282

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-12-12 11:18:01 +01:00
Rishabh Dave e4f0af2b78 don't use private option for import_role
Since sharing variables amongst roles has been made default since
Ansible 2.6, private option has been deprecated; so stop using it.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-12-04 23:45:59 +00:00
Ramana Raja cb784c601d rolling_update: fail if less than 3 MONs
... for non-containerized deployments as well.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1655470

Signed-off-by: Ramana Raja <rraja@redhat.com>
2018-12-04 14:28:49 +00:00
Sébastien Han 896676ee80 fix json data type
Json is a type structure which is always typed as a string, where before
this we were declaring a dict, which is not a json valid structure.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-12-04 12:34:54 +01:00
Sébastien Han 1c760904b0 site: collocated mon and mgr by default
This will speed up the deployment and also deploy mon and mgr collocated
just as recommended.
This won't prevent you of adding more and dedicaded machines for mgr if
needed.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-12-03 14:39:43 +01:00
Sébastien Han bb7bfca113 rolling-update: remove old condition
This failure condition was only valid at the time where clusters didn't
have ceph-mgr activated. Now since we collocate the ceph-mgr with the
mon by default, if the daemon wasn't present it will be created during
the upgrade.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-12-03 14:39:43 +01:00
Guillaume Abrioux a952122c38 rolling_update: create missing keyring only on running mon
try to create the potentially missing keys only on monitors that are
actually running.
The current node being played is stopped before this task.
By the way, delegating the command on all nodes but the current node
being played ensures that the generated keys will be present on all
monitors.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-29 16:40:46 +00:00
Sébastien Han 61fb6972ec rolling_update: default ceph json output to empty dict
So we can avoid the following failure:

The conditional check 'hostvars[mon_host]['ansible_hostname'] in (ceph_health_raw.stdout | from_json)["quorum_names"] or hostvars[mon_host]['ansible_fqdn'] in (ceph_health_raw.stdout | from_json)["quorum_names"]
' failed. The error was: No JSON object could be decoded

We just need to set a default, the next iteration will have a more
complete json since the command won't fail.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-29 10:46:15 +00:00
Guillaume Abrioux 73287f91bc mgr: fix mgr keyring error on rolling_update
when upgrading from RHCS 2.5 to 3.2, it fails because the task `create
ceph mgr keyring(s) when mon is containerized` has a when condition
`inventory_hostname == groups[mon_group_name]|last`.
First, this is incorrect because `inventory_hostname` is referring to a
mgr node, it means this condition would have never been satisfied.
Then, this condition + `serial: 1` makes the mgr keyring creating skipped on
the first node. Further, the `ceph-mgr` role tries to copy the mgr
keyring (it's not aware we are running `serial: 1`) this leads to a
failure like the following:

```
TASK [ceph-mgr : copy ceph keyring(s) if needed] ***************************************************************************************************************************************************************************************************************************************************************************
task path: /usr/share/ceph-ansible/roles/ceph-mgr/tasks/common.yml:10
Tuesday 27 November 2018  12:03:34 +0000 (0:00:00.296)       0:11:01.290 ******
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AnsibleFileNotFound: Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'
failed: [magna021] (item={u'dest': u'/var/lib/ceph/mgr/local-magna021/keyring', u'name': u'/etc/ceph/local.mgr.magna021.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "dest": "/var/lib/ceph/mgr/local-magna021/keyring", "name": "/etc/ceph/local.mgr.magna021.keyring"}, "msg": "Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'"}
```

The ceph_key module is idempotent, so there is no need to have such a
condition.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1649957

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-27 18:19:56 +01:00
Sébastien Han 4f57e44f9c defaults: declare container_binary
Always declare container_binary and assign it a correct value.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-27 16:47:40 +00:00
Sébastien Han 49e0e19056 rolling_update: update ceph_key task for container
Use the new way to create keys on containerized env as introduced by: 1098b71bda90db3dad19ac179f0ba900ccb0f953

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-27 16:47:40 +00:00
Sébastien Han 2814d36c93 infra playbooks: use the right container binary
Use podman or docker wether they are available or not. podman will be
prioritized over docker if present.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-27 16:47:40 +00:00
Guillaume Abrioux 7c99b6df6d update: fix a typo
`hostvars[groups[mon_host]]['ansible_hostname']` seems to be a typo.
That should be `hostvars[mon_host]['ansible_hostname']`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-26 18:22:20 +01:00
Guillaume Abrioux af78173584 rolling_update: refact set_fact `mon_host`
each monitor node should select another monitor which isn't itself.
Otherwise, one node in the monitor group won't set this fact and causes
failure.

Typical error:
```
TASK [create potentially missing keys (rbd and rbd-mirror) when mon is containerized] ***
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-update_docker_cluster/rolling_update.yml:200
Thursday 22 November 2018  14:02:30 +0000 (0:00:07.493)       0:02:50.005 *****
fatal: [mon1]: FAILED! => {}

MSG:

The task includes an option with an undefined variable. The error was: 'dict object' has no attribute u'mon2'
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-26 18:22:20 +01:00
Sébastien Han 4e267bee4f rolling_update: create rbd and rbd-mirror keyrings
During an upgrade ceph won't create keys that were not existing on the
previous version. So after the upgrade of let's Jewel to Luminous, once
all the monitors have the new version they should get or create the
keys. It's ok to have the task fails, especially for the rbd-mirror
key, which only appears in Nautilus.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-26 18:22:20 +01:00
Guillaume Abrioux c783bc70da docker-common: rename role
rename `ceph-docker-common` role to `ceph-container-common`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-12 10:51:48 +01:00
Rishabh Dave 3f62fc585f don't use "role" or "roles" to include roles
Since import_role and include_role are more readable, explicit (about
the nature of inclusion) and flexible (allows placibf inclusion
anywhere) amongst the tasks, use them instead of using roles or role
keyword. Besides, these keywords also allow more arguments.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-10-31 09:38:59 +01:00
Sébastien Han 44d0da0dd4 rolling_update: fix upgrade when using fqdn
CLusters that were deployed using 'mon_use_fqdn' have a different unit
name, so during the upgrade this must be used otherwise the upgrade will
fail, looking for a unit that does not exist.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1597516
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-19 13:06:56 +00:00
Guillaume Abrioux 40b7747af7 remove jewel support
As of now, we should no longer support Jewel in ceph-ansible.
The latest ceph-ansible release supporting Jewel is `stable-3.1`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-12 23:38:17 +00:00
Noah Watkins 306e308f13 Avoid using tests as filter
Fixes the deprecation warning:

  [DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of
  using `result|search` use `result is search`.

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
2018-10-10 04:26:33 +00:00
Guillaume Abrioux 79bd06ad28 rolling_update: add ceph-handler role
since the introduction of ceph-handler, it has to be added in
rolling_update playbook as well

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-05 13:48:04 +00:00
Guillaume Abrioux 179c4d00d7 rolling_update: ensure pgs_by_state has at least 1 entry
Previous commit c13a3c3 has removed a condition.

This commit brings back this condition which is essential to ensure we
won't hit a false positive result in the `when` condition for the check
PGs task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-25 14:58:54 +00:00
Guillaume Abrioux c13a3c3492 upgrade: consider all 'active+clean' states as valid pgs
In cluster with a large number of PGs, it can be expected some of them
scrubbing, it's a normal operation.
Preventing from scrubbing operation force to set noscrub flag before a
rolling update which is a problem because it pauses an important data
integrity operation until the end of the rolling upgrade.

This commit allows an upgrade even while PGs are scrubbing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616066

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-25 12:12:06 +00:00
Sébastien Han 2e6e885bb7 rolling_upgrade: set sortbitwise properly
Running 'osd set sortbitwise' when we detect a version 12 of Ceph is
wrong. When OSD are getting updated, even though the package is updated
they won't send their updated version (12) and will stick with 10 if the
command is not applied. So we have to check if OSD are sending a version
10 and then run the command to unlock the OSDs.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-21 12:22:32 +00:00
Sébastien Han 77a3a682f3 iscsi group name preserve backward compatibility
Recently we renamed the group_name for iscsi iscsigws where previously
it was named iscsi-gws. Existing deployments with a host file section
with iscsi-gws must continue to work.

This commit adds the old group name as a backoward compatility, no error
from Ansible should be expected, if the hostgroup is not found nothing
is played.

Close: https://bugzilla.redhat.com/show_bug.cgi?id=1619167
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-20 23:52:19 +02:00
Sébastien Han dad10e8f3f rolling_update: register container osd units
Before running the upgrade, let's call systemd to collect unit names
instead of relaying on the device list. This is more accurate and fix
the osd_auto_discovery scenario too.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 11:13:12 +02:00
Sébastien Han b3266c5be2 rolling_update: set osd sortbitwise
upgrade RHCS 2 -> RHCS 3 will fail if cluster has still set
sortnibblewise,
it stay stuck on "TASK [waiting for clean pgs...]" as RHCS 3 osds will
not start if nibblewise is set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-24 17:19:02 +02:00
Vishal Kanaujia 44d514850a Rolling upgrades: Migrate to ceph-key module
This change moves ceph-mgr upgrades to using ceph-key library.
Fixes: #2758

Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
2018-07-03 18:22:14 +02:00
Sébastien Han 20c8065e48 ceph-iscsi: rename group iscsi_gws
Let's try to avoid using dashes as testinfra needs to be able to read
the groups.
Typically, with iscsi-gws we can't add a marker for these iscsi nodes,
using an underscore fixes the issue.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Guillaume Abrioux 232a16d77f rolling_update: fix facts gathering delegation
this is kind of follow up on what has been made in #2560.
See #2560 and #2553 for details.

Closes: #2708

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-06 16:36:30 +08:00
Vishal Kanaujia 08d9432454 Rolling upgrades should use norebalance flag for OSDs
The rolling upgrades playbook should have norebalance flag set for
OSDs upgrades to wait only for recovery.

Fixes: #2657
Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
2018-06-04 10:59:01 +02:00
Sébastien Han e91648a7af rolling_update: add role ceph-iscsi-gw
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1575829
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-26 02:38:47 -07:00
Sébastien Han da5b104098 rolling_update: fix get fsid for containers
When running ansible2.4-update_docker_cluster there is an issue on the
"get current fsid" task. The current task only works for
non-containerized deployment but will run all the time (even for
containerized). This currently results in the following error:

TASK [get current fsid] ********************************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-luminous-ansible2.4-update_docker_cluster/rolling_update.yml:214
Tuesday 22 May 2018  22:48:32 +0000 (0:00:02.615)       0:11:01.035 ***********
fatal: [mgr0 -> mon0]: FAILED! => {
    "changed": true,
    "cmd": [
        "ceph",
        "--cluster",
        "test",
        "fsid"
    ],
    "delta": "0:05:00.260674",
    "end": "2018-05-22 22:53:34.555743",
    "rc": 1,
    "start": "2018-05-22 22:48:34.295069"
}

STDERR:

2018-05-22 22:48:34.495651 7f89482c6700  0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).connect protocol feature mismatch, my 83ffffffffffff < peer 481dff8eea4fffb missing 400000000000000
2018-05-22 22:48:34.495684 7f89482c6700  0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).fault

This is not really representative on the real error since the 'ceph' cli is available on that machine.
On other environments we will have something like "command not found: ceph".

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-23 04:44:12 +02:00