Commit Graph

310 Commits (ebc901c6af67300f7b7b8da1b2d0a74147798da5)

Author SHA1 Message Date
Guillaume Abrioux 741ef74629 update: fix a typo
`hostvars[groups[mon_host]]['ansible_hostname']` seems to be a typo.
That should be `hostvars[mon_host]['ansible_hostname']`
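For illustration, a minimal sketch of the corrected lookup (the task and its context are hypothetical, not the actual playbook content):

```yaml
# Illustrative only: index hostvars by the mon_host fact directly,
# not by a group lookup.
- name: show the resolved monitor hostname (sketch)
  debug:
    msg: "{{ hostvars[mon_host]['ansible_hostname'] }}"
```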

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7c99b6df6d)
2018-11-26 19:36:30 +00:00
Guillaume Abrioux 9022f83450 rolling_update: refact set_fact `mon_host`
Each monitor node should select another monitor which isn't itself.
Otherwise, one node in the monitor group won't set this fact, which causes
a failure.

Typical error:
```
TASK [create potentially missing keys (rbd and rbd-mirror) when mon is containerized] ***
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-update_docker_cluster/rolling_update.yml:200
Thursday 22 November 2018  14:02:30 +0000 (0:00:07.493)       0:02:50.005 *****
fatal: [mon1]: FAILED! => {}

MSG:

The task includes an option with an undefined variable. The error was: 'dict object' has no attribute u'mon2'
```
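A minimal sketch of how such a fact can be set so that every monitor picks a peer other than itself; the group name and filter chain here are assumptions, not the literal change:

```yaml
# Illustrative only: pick another monitor from the group, excluding the
# current node, so the fact is defined on every monitor.
- name: set_fact mon_host (sketch)
  set_fact:
    mon_host: "{{ groups['mons'] | difference([inventory_hostname]) | first }}"
```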

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit af78173584)
2018-11-26 19:36:30 +00:00
Sébastien Han 5c9aa5ed66 rolling_update: create rbd and rbd-mirror keyrings
During an upgrade Ceph won't create keys that did not exist in the
previous version. So after an upgrade from, let's say, Jewel to Luminous,
once all the monitors have the new version they should get or create the
keys. It's ok for the task to fail, especially for the rbd-mirror
key, which only appears in Nautilus.
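A sketch of tolerating that expected failure (the key name and command are assumptions, not the exact task):

```yaml
# Illustrative: try to create the key and ignore a failure on releases
# where the entity does not exist yet.
- name: create potentially missing rbd-mirror bootstrap key (sketch)
  command: ceph auth get-or-create client.bootstrap-rbd-mirror
  failed_when: false
```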

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4e267bee4f)
2018-11-26 19:36:30 +00:00
Sébastien Han d814644c4a rolling_update: fix upgrade when using fqdn
Clusters that were deployed using 'mon_use_fqdn' have a different unit
name, so during the upgrade this name must be used, otherwise the upgrade
will fail, looking for a unit that does not exist.
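A sketch of selecting the unit name based on that setting (only `mon_use_fqdn` is taken from the playbook context; the restart task is illustrative):

```yaml
# Illustrative: use the FQDN-based unit name when the cluster was
# deployed with mon_use_fqdn, the short hostname otherwise.
- name: restart ceph mon unit (sketch)
  systemd:
    name: "ceph-mon@{{ ansible_fqdn if mon_use_fqdn | default(false) | bool else ansible_hostname }}"
    state: restarted
```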

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1597516
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 44d0da0dd4)
2018-10-24 12:42:14 +00:00
Noah Watkins e089f46607 Stringify ceph_docker_image_tag
This could be a numeric input, but it is treated like a string, leading to
runtime errors.
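A sketch of the kind of cast involved (whether it is applied via set_fact or in the variable default is an assumption):

```yaml
# Illustrative: a tag such as 3.2 parsed from YAML is a float; force it
# to a string before it is used in image names or string operations.
- name: stringify ceph_docker_image_tag (sketch)
  set_fact:
    ceph_docker_image_tag: "{{ ceph_docker_image_tag | string }}"
```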

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1635823

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
(cherry picked from commit 8dcc8d1434)
2018-10-16 14:35:08 +02:00
Noah Watkins 75c9130865 Avoid using tests as filter
Fixes the deprecation warning:

  [DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of
  using `result|search` use `result is search`.
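For example, a hypothetical condition rewritten from the deprecated filter form to the test form:

```yaml
# Illustrative: `when: result.stdout | search('active')` (deprecated)
# becomes the test form below.
- name: act when the pattern is found (sketch)
  debug:
    msg: "pattern found"
  when: result.stdout is search('active')
```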

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
(cherry picked from commit 306e308f13)
2018-10-16 14:35:08 +02:00
Sébastien Han d0b03f6faa switch: copy initial mon keyring
We need to copy this key into /etc/ceph so when ceph-docker-common runs
it can fetch it to the Ansible server. Previously the task wasn't
failing because `fail_on_missing` was False before Ansible 2.5; it is now
True, hence the failure.
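A sketch of the fetch this refers to (paths and variable names are assumptions):

```yaml
# Illustrative: with Ansible >= 2.5, fail_on_missing defaults to True,
# so the keyring must exist under /etc/ceph before it is fetched.
- name: fetch the initial mon keyring (sketch)
  fetch:
    src: "/etc/ceph/{{ cluster }}.mon.keyring"
    dest: "{{ fetch_directory }}/"
    flat: yes
```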

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit bae0f41705)
2018-10-15 13:59:21 +02:00
Guillaume Abrioux da05c1fd31 switch: support migration when cluster is scrubbing
Similar to c13a3c3 we must allow scrubbing when running this playbook.

In clusters with a large number of PGs, some of them can be expected to be
scrubbing; it's a normal operation.
Preventing the scrubbing operation forces us to set the noscrub flag.

This commit allows switching from a non-containerized to a containerized
environment even while PGs are scrubbing.

Closes: #3182

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 54b02fe187)
2018-10-15 13:59:21 +02:00
Sébastien Han 513608cebe switch: allow switching big clusters (more than 99 osds)
The current regex had a limit of 99 OSDs; now this limit has been removed
and, regardless of the number of OSDs, they will all be collected.
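A sketch of the kind of pattern change implied, shown on a hypothetical unit-matching regex (the real expression in the playbook may differ):

```yaml
# Illustrative: a bounded quantifier such as [0-9]{1,2} matches at most
# two digits (99 OSDs); an unbounded one matches any OSD id.
- name: collect osd unit names (sketch)
  shell: systemctl list-units --no-legend | grep -oE 'ceph-osd@[0-9]+\.service'
  register: osd_units
  changed_when: false
```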

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1630430
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 9fccffa1ca)
(cherry picked from commit d5e57af23d)
2018-10-15 10:33:56 +02:00
Guillaume Abrioux 79a5725cf6 purge: actually remove /var/lib/ceph/*
38dc20e74b introduced a bug in the purge
playbooks because using `*` in the `command` module doesn't work (the glob
is not expanded).

`/var/lib/ceph/*` files are not purged, which means there is a leftover.

When trying to redeploy a cluster, it failed because the monitor daemon
detected an existing keyring and therefore assumed a cluster already
existed.

Typical error (from container output):

```
Sep 26 13:18:16 mon0 docker[31316]: 2018-09-26 13:18:16  /entrypoint.sh: Existing mon, trying to rejoin cluster...
Sep 26 13:18:16 mon0 docker[31316]: 2018-09-26 13:18:16.9323937f15b0d74700 -1 auth: unable to find a keyring on /etc/ceph/test.client.admin.keyring,/etc/ceph/test.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:(2) No such file or directory
Sep 26 13:18:23 mon0 docker[31316]: 2018-09-26 13:18:23  /entrypoint.sh:
SUCCESS
```
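A sketch of removing the content without relying on shell globbing (the module choice is an assumption, not the exact fix):

```yaml
# Illustrative: the command module does not expand '*', so enumerate the
# entries with find and remove them with the file module.
- name: find /var/lib/ceph content (sketch)
  find:
    path: /var/lib/ceph
    file_type: any
  register: ceph_data

- name: remove /var/lib/ceph content (sketch)
  file:
    path: "{{ item.path }}"
    state: absent
  with_items: "{{ ceph_data.files }}"
```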

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1633563

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 144c92b21f)
2018-09-27 21:42:43 +02:00
Guillaume Abrioux fdc2d7681d rolling_update: ensure pgs_by_state has at least 1 entry
Previous commit c13a3c3 removed a condition.

This commit brings back that condition, which is essential to ensure we
won't hit a false positive result in the `when` condition for the check
PGs task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 179c4d00d7)
2018-09-26 10:58:51 +00:00
Guillaume Abrioux f008f40628 upgrade: consider all 'active+clean' states as valid pgs
In clusters with a large number of PGs, some of them can be expected to be
scrubbing; it's a normal operation.
Preventing the scrubbing operation forces us to set the noscrub flag before
a rolling update, which is a problem because it pauses an important data
integrity operation until the end of the rolling upgrade.

This commit allows an upgrade even while PGs are scrubbing.
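A sketch of a check that treats every state starting with `active+clean` as valid (the registered `pgs_by_state` structure is an assumption):

```yaml
# Illustrative: active+clean+scrubbing and active+clean+scrubbing+deep
# no longer block the upgrade.
- name: fail if some pgs are not active+clean* (sketch)
  fail:
    msg: "some PGs are not in an active+clean state"
  when: >
    pgs_by_state | selectattr('name', 'match', 'active\+clean') | list | length
    != pgs_by_state | length
```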

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616066

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c13a3c3492)
2018-09-25 14:13:16 +00:00
Guillaume Abrioux 2975387373 shrink-osd: fix purge osd on containerized deployment
ce1dd8d introduced the purge osd on containers but it was incorrect.

The `resolve parent device` and `zap ceph osd disks` tasks must be delegated
to their respective OSD nodes.
Indeed, they were run on the Ansible node, which means it was trying to
resolve parent devices from that node when it should be done on the OSD
nodes.
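A sketch of the delegation involved (variable names and task content are illustrative):

```yaml
# Illustrative: run the resolution on the OSD node that owns the device,
# not on the node running the playbook.
- name: resolve parent device (sketch)
  command: lsblk -d -no pkname "{{ item }}"
  register: parent_device
  delegate_to: "{{ osd_node }}"
  with_items: "{{ osd_partitions | default([]) }}"
```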

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1612095

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4159326a18)
2018-09-14 18:22:12 +00:00
Sébastien Han 6db4fceba4 purge: only purge /var/lib/ceph content
Sometimes /var/lib/ceph is mounted on a device, so we won't be able to
remove it (device busy); let's remove its content only.
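A sketch of purging only the content (the exact invocation is an assumption):

```yaml
# Illustrative: remove everything below /var/lib/ceph but keep the
# directory itself, which may be a mount point; the shell module
# (unlike command) does expand the glob.
- name: purge /var/lib/ceph content only (sketch)
  shell: rm -rf /var/lib/ceph/*
```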

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1615872
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 38dc20e74b)
2018-09-04 14:06:49 +02:00
Sébastien Han b187c508e7 rolling_upgrade: set sortbitwise properly
Running 'osd set sortbitwise' when we detect version 12 of Ceph is
wrong. When OSDs are getting updated, even though the package is updated
they won't report their updated version (12) and will stick with 10 if the
command is not applied. So we have to check whether OSDs are reporting
version 10 and then run the command to unlock them.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2e6e885bb7)
2018-08-21 14:21:29 +00:00
Sébastien Han 4ef9d42e86 iscsi group name preserve backward compatibility
Recently we renamed the group_name for iscsi to iscsigws, where previously
it was named iscsi-gws. Existing deployments with a host file section
named iscsi-gws must continue to work.

This commit adds the old group name back for backward compatibility; no
error from Ansible should be expected, and if the hostgroup is not found
nothing is played.

Close: https://bugzilla.redhat.com/show_bug.cgi?id=1619167
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 77a3a682f3)
2018-08-21 00:04:37 +02:00
Sébastien Han 988b5a81d3 take-over-existing-cluster: do not call var_files
We were using var_files long ago when default variables were not in
ceph-defaults; now that the role exists this is not needed. Moreover, having
these two var files added:

- roles/ceph-defaults/defaults/main.yml
- group_vars/all.yml

will create collisions and override necessary variables.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1555305
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit b738706810)
2018-08-20 14:47:32 +02:00
Andrew Schoen f183be0328 lv-create: use copy instead of the template module
The copy module does in fact do variable interpolation so we do not need
to use the template module or keep a template in the source.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 04df3f0802)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Andrew Schoen 6081aea5a1 lv-create: add an example logfile_path config option in lv_vars.yml
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 131796f275)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Andrew Schoen 634cc14393 lv-teardown: fail silently if lv_vars.yml is not found
This allows the user to opt out of using lv_vars.yml and load configuration
from other sources.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit b0bfc17351)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Andrew Schoen 09e4ef3371 lv-teardown: set become: true at the playbook level
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 8424858b40)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Andrew Schoen 293aaaf758 lv-create: fail silently if lv_vars.yml is not found
If a user decides not to use the lv_vars.yml file, it should fail
silently so that configuration can be picked up from other places.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit e43eec57bb)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Andrew Schoen 2648751488 lv-create: set become: true at the playbook level
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit fde47be13c)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Andrew Schoen 9af842467e lv-create: use the template module to write log file
The copy module will not expand the template and render the variables
included, so we must use template.

Creating a temp file and using it locally means that you must run the
playbook with sudo privileges, which I don't think we want to require.
This introduces a logfile_path variable that the user can use to control
where the logfile is written, defaulting to the cwd.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 35301b35af)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Neha Ojha 7f44244d23 infrastructure-playbooks/vars/lv_vars.yaml: minor fixes
Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 909b38da82)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Neha Ojha db0e06cbb6 infrastructure-playbooks/lv-create.yml: use tempfile to create logfile
Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit f65f3ea89f)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Neha Ojha 89d950fd3c infrastructure-playbooks/lv-create.yml: add lvm_volumes to suggested paste
Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 65fdad0723)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Neha Ojha 1a0f7baf21 infrastructure-playbooks/lv-create.yml: copy without using a template file
Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 50a6d8141c)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Neha Ojha f1245e6011 infrastructure-playbooks/lv-create.yml: don't use action to copy
Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 186c4e11c7)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Neha Ojha 21902f0113 infrastructure-playbooks: standardize variable usage with a space after brackets
Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 9d43806df9)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Neha Ojha fb06c6cb80 vars/lv_vars.yaml: remove journal_device
Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit e0293de3e7)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Ali Maredia 10da777634 infrastructure-playbooks: playbooks for creating LVs for bucket indexes and journals
These playbooks create and tear down logical
volumes for OSD data on HDDs and for a bucket index and
journals on 1 NVMe device.

Users should follow the guidelines set in var/lv_vars.yaml

After the lv-create.yml playbook is run, output is
sent to /tmp/logfile.txt for copy and paste into
osds.yml

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 1f018d8612)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Sébastien Han 6f1499800f rolling_update: register container osd units
Before running the upgrade, let's call systemd to collect unit names
instead of relying on the device list. This is more accurate and fixes
the osd_auto_discovery scenario too.
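A sketch of collecting the unit names from systemd (standard systemctl options; the exact command in the playbook may differ):

```yaml
# Illustrative: ask systemd for the running ceph-osd container units
# instead of deriving them from the configured device list.
- name: collect ceph-osd unit names (sketch)
  shell: systemctl list-units --no-legend 'ceph-osd@*' | awk '{print $1}'
  register: osd_units
  changed_when: false
```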

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit dad10e8f3f)
2018-08-16 13:35:23 +00:00
Jeffrey Zhang 19c7ca1983 Use /var/lib/ceph/osd folder to filter osd mount point
In some cases, a user may mount a partition on /var/lib/ceph; unmounting
it would fail, and there is no need to do so anyway.
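A sketch of restricting the umount to OSD data mount points via the gathered mount facts (illustrative, not the exact task):

```yaml
# Illustrative: only unmount paths under /var/lib/ceph/osd/, leaving a
# possible /var/lib/ceph mount untouched.
- name: unmount osd data directories (sketch)
  mount:
    path: "{{ item.mount }}"
    state: unmounted
  with_items: "{{ ansible_mounts | selectattr('mount', 'match', '/var/lib/ceph/osd/') | list }}"
```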

Signed-off-by: Jeffrey Zhang <zhang.lei.fly@gmail.com>
(cherry picked from commit 85cc61a6d9)
2018-08-14 14:55:56 +00:00
Guillaume Abrioux 9a013ab333 tests: resync iscsigw group name with master
Let's align the name of that group in stable-3.1 with the master branch.

Not having the same group name on different branches is confusing and
makes some nightly jobs fail in the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-08-13 12:24:59 +02:00
Sébastien Han 4785799110 rolling_update: add role ceph-iscsi-gw
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1575829
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit e91648a7af)
2018-08-10 14:38:19 +02:00
Sébastien Han 31dd4eeecf rolling_update: set osd sortbitwise
Upgrading RHCS 2 -> RHCS 3 will fail if the cluster still has sortnibblewise
set; it stays stuck on "TASK [waiting for clean pgs...]" as RHCS 3 OSDs will
not start if nibblewise is set.
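A sketch of the flag being set; `ceph osd set sortbitwise` is the real CLI command, while the `cluster` variable and `mons` group name are assumptions:

```yaml
# Illustrative: switch the cluster to sortbitwise so Luminous / RHCS 3
# OSDs can start.
- name: set sortbitwise (sketch)
  command: ceph --cluster {{ cluster }} osd set sortbitwise
  delegate_to: "{{ groups['mons'][0] }}"
  run_once: true
```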

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit b3266c5be2)
2018-08-02 14:53:06 +00:00
Sébastien Han 36f24a8054 shrink-osd: purge osd on containerized deployment
Prior to this commit we were only stopping the container, but now we
also purge the devices.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1572933
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ce1dd8d2b3)
2018-07-23 14:32:03 +02:00
Guillaume Abrioux f8a4c57c03 common: switch from docker module to docker_container
As of Ansible 2.4, the `docker` module has been removed (it had been
deprecated since Ansible 2.1).
We must switch to `docker_container` instead.
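A sketch of the replacement module usage (container name and image variables are assumptions):

```yaml
# Illustrative: docker_container replaces the removed docker module.
- name: run the ceph mon container (sketch)
  docker_container:
    name: "ceph-mon-{{ ansible_hostname }}"
    image: "{{ ceph_docker_registry }}/{{ ceph_docker_image }}:{{ ceph_docker_image_tag }}"
    state: started
```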

See: https://docs.ansible.com/ansible/latest/modules/docker_module.html#docker-module

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d0746e0858)
2018-07-11 10:56:48 +02:00
Guillaume Abrioux c533556935 rolling_update: fix facts gathering delegation
This is kind of a follow-up on what was done in #2560.
See #2560 and #2553 for details.

Closes: #2708

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 232a16d77f)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-07 09:46:42 +02:00
Paul Cuzner bdff7204f2 Add privilege escalation to iscsi purge tasks
Without the escalation, invocations from non-root
users will fail when accessing the rados config
object, or when attempting to log to /var/log.
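A sketch of the escalation added (the play, object name, and pool are illustrative):

```yaml
# Illustrative: privilege escalation at the play level so rados and
# /var/log are accessible when the playbook is run by a non-root user.
- hosts: iscsigws
  become: true
  tasks:
    - name: read the iscsi gateway configuration object (sketch)
      command: rados -p rbd get gateway.conf /tmp/gateway.conf
      changed_when: false
```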

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1549004

Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
(cherry picked from commit 2890b57cfc)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-25 03:52:06 -07:00
Sébastien Han 37693870df rolling_update: fix get fsid for containers
When running ansible2.4-update_docker_cluster there is an issue with the
"get current fsid" task. The current task only works for
non-containerized deployments but runs all the time (even for
containerized ones). This currently results in the following error:

TASK [get current fsid] ********************************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-luminous-ansible2.4-update_docker_cluster/rolling_update.yml:214
Tuesday 22 May 2018  22:48:32 +0000 (0:00:02.615)       0:11:01.035 ***********
fatal: [mgr0 -> mon0]: FAILED! => {
    "changed": true,
    "cmd": [
        "ceph",
        "--cluster",
        "test",
        "fsid"
    ],
    "delta": "0:05:00.260674",
    "end": "2018-05-22 22:53:34.555743",
    "rc": 1,
    "start": "2018-05-22 22:48:34.295069"
}

STDERR:

2018-05-22 22:48:34.495651 7f89482c6700  0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).connect protocol feature mismatch, my 83ffffffffffff < peer 481dff8eea4fffb missing 400000000000000
2018-05-22 22:48:34.495684 7f89482c6700  0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).fault

This is not really representative of the real error, since the 'ceph' CLI is available on that machine.
In other environments we would get something like "command not found: ceph".

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit da5b104098)
2018-05-22 23:22:38 -07:00
Sébastien Han ddafad3f32 switch: disable ceph-disk units
During the transition from Jewel non-container to container, old ceph
units are disabled. ceph-disk units can still remain in some cases and will
appear as 'loaded failed'; this is not a problem, although operators
might not like to see these units failing. That's why we remove them if
we find them.
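A sketch of cleaning those units up (how the unit list is obtained is an assumption):

```yaml
# Illustrative: stop and disable any leftover ceph-disk units; tolerate
# failures since the units may not exist on every node.
- name: disable leftover ceph-disk units (sketch)
  systemd:
    name: "{{ item }}"
    state: stopped
    enabled: no
  failed_when: false
  with_items: "{{ ceph_disk_units.stdout_lines | default([]) }}"
```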

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1577846
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 49a4712485)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-22 17:08:01 -07:00
Guillaume Abrioux ec528b9278 purge_cluster: fix dmcrypt purge
dmcrypt devices aren't closed properly; therefore, it may fail when
trying to redeploy after a purge.

Typical errors:

```
ceph-disk: Cannot discover filesystem type: device /dev/sdb1: Command
'/sbin/blkid' returned non-zero exit status 2
```

```
ceph-disk: Error: unable to read dm-crypt key:
/var/lib/ceph/osd-lockbox/c6e01af1-ed8c-4d40-8be7-7fc0b4e104cf:
/etc/ceph/dmcrypt-keys/c6e01af1-ed8c-4d40-8be7-7fc0b4e104cf.luks.key
```

Properly closing dmcrypt devices allows redeploying without error.
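A sketch of the close operation (how the mapping names are discovered is an assumption):

```yaml
# Illustrative: close any dm-crypt mapping left open by the purged OSDs.
- name: close dm-crypt mappings (sketch)
  command: cryptsetup luksClose "{{ item }}"
  with_items: "{{ dmcrypt_mappings | default([]) }}"
  failed_when: false
```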

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9801bde4d4)
2018-05-22 16:44:06 +02:00
Guillaume Abrioux 17ee4e92f0 purge_cluster: wipe all partitions
In order to ensure there is no leftover after having purged a cluster,
we must wipe all partitions properly.
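A sketch of wiping a device and its partition table (how the device list is built is an assumption):

```yaml
# Illustrative: clear filesystem signatures and the GPT/MBR structures so
# nothing is left for a redeploy to trip over.
- name: wipe osd devices (sketch)
  shell: |
    wipefs --all "{{ item }}"
    sgdisk --zap-all "{{ item }}"
  with_items: "{{ devices | default([]) }}"
```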

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a9247c4de7)
2018-05-22 16:44:06 +02:00
Guillaume Abrioux 7d0e072da4 purge_cluster: fix bug when building device list
There is some leftover on devices when purging OSDs because of an invalid
device list construction.

typical error:
```
changed: [osd3] => (item=/dev/sda sda1) => {
    "changed": true,
    "cmd": "# if the disk passed is a raw device AND the boot system disk\n if parted -s \"/dev/sda sda1\" print | grep -sq boot; then\n echo \"Looks like /dev/sda sda1 has a boot partition,\"\n echo \"if you want to delete specific partitions point to the partition instead of the raw device\"\n echo \"Do not use your system disk!\"\n exit 1\n fi\n echo sgdisk -Z \"/dev/sda sda1\"\n echo dd if=/dev/zero of=\"/dev/sda sda1\" bs=1M count=200\n echo udevadm settle --timeout=600",
    "delta": "0:00:00.015188",
    "end": "2018-05-16 12:41:40.408597",
    "item": "/dev/sda sda1",
    "rc": 0,
    "start": "2018-05-16 12:41:40.393409"
}

STDOUT:

sgdisk -Z /dev/sda sda1
dd if=/dev/zero of=/dev/sda sda1 bs=1M count=200
udevadm settle --timeout=600

STDERR:

Error: Could not stat device /dev/sda sda1 - No such file or directory.
```

The devices list in the task `resolve parent device` isn't built
properly because the command used to resolve the parent device doesn't
return the expected output.

eg:

```
changed: [osd3] => (item=/dev/sda1) => {
    "changed": true,
    "cmd": "echo /dev/$(lsblk -no pkname \"/dev/sda1\")",
    "delta": "0:00:00.013634",
    "end": "2018-05-16 12:41:09.068166",
    "item": "/dev/sda1",
    "rc": 0,
    "start": "2018-05-16 12:41:09.054532"
}

STDOUT:

/dev/sda sda1
```

For instance, it will result in a devices list like:
`['/dev/sda sda1', '/dev/sdb', '/dev/sdc sdc1']`
where we expect to have:
`['/dev/sda', '/dev/sdb', '/dev/sdc']`
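One possible way to restrict lsblk to the device itself (not necessarily the exact fix that was applied):

```yaml
# Illustrative: --nodeps (-d) stops lsblk from listing children, so only
# the parent name (e.g. "sda") is returned for "/dev/sda1".
- name: resolve parent device (sketch)
  shell: echo /dev/$(lsblk -d -no pkname "{{ item }}")
  register: resolved_parent_device
  with_items: "{{ partition_list | default([]) }}"
```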

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9cad113e2f)
2018-05-22 16:44:06 +02:00
Guillaume Abrioux fb0304230b take-over: fix bug when trying to override variable
A customer has been facing an issue when trying to override
`monitor_interface` in the inventory host file.
In their use case, all nodes had the same interface name for
`monitor_interface` except one. Therefore, they tried to override
this variable for that node in the inventory host file, but the
take-over-existing-cluster playbook was failing when trying to generate
the new ceph.conf file because of an undefined variable.

Typical error:

```
fatal: [srvcto103cnodep01]: FAILED! => {"failed": true, "msg": "'dict object' has no attribute u'ansible_bond0.15'"}
```

Including variables like this, `include_vars: group_vars/all.yml`, prevents
us from overriding anything in the inventory host file because it
overwrites everything you would have defined in the inventory.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1575915

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 415dc0a29b)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-18 11:21:33 +02:00
Sébastien Han 1556b69e13 rolling_update: move osd flag section
During a minor update from a Jewel version to a higher Jewel version (10.2.9
to 10.2.10 for example) osd flags don't get applied because they were set
in the mgr section, which is skipped in Jewel since this daemon does not
exist.
Moving the set-flag section after all the mons have been updated solves
that problem.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1548071
Co-authored-by: Tomas Petr <tpetr@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit d80a871a07)
2018-05-17 11:54:12 +02:00
Guillaume Abrioux 6338f749a9 rolling_update: fix dest path for mgr keys fetching
The role `ceph-mgr` that is played later in the playbook fails because
the destination path for the fetched keys is wrong.
This patch fixes the destination path used in the task `fetch ceph mgr
key(s)` so there is no mismatch.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1b4c3f292d)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-15 19:35:09 +02:00
Guillaume Abrioux adeecc51f8 switch: fix ceph_uid fact for osd
In addition to b324c17, this commit fixes the ceph uid for the osd role in
the switch from non-containerized to containerized playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-30 08:15:18 +02:00