When update from a minor Jewel version to another, the playbook will
fail on the task "fail if no mgr host is present in the inventory".
This now can be worked around by running Ansible with_items
-e jewel_minor_update=true
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1535382
Signed-off-by: Sébastien Han <seb@redhat.com>
Using `lsblk` to resolv the parent device is better than just removing the last
char when passing it to the zap container.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Avoid using `awk` to get the different devices from the partlabel.
Using `blkid` is more readable.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If we're working with a jewel cluster then this service will not exist.
This is mainly a problem with CI testing because our tests are setup to
work with both jewel and luminous, meaning that eventhough we want to
test jewel we still have a nfs-ganesha host in the test causing these
tasks to run.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
The rolling update playbook was attempting to stop the
nfs-ganesha service on nodes where jewel is still installed.
The nfs-ganesha service did not exist in jewel so the task fails.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
`bluestore_purge_osd_non_container` scenario is failing because it
keeps old osd_uuid information on devices and cause the `ceph-disk activate`
to fail when trying to redeploy a new cluster after a purge.
typical error seen :
```
2017-12-13 14:29:48.021288 7f6620651d00 -1
bluestore(/var/lib/ceph/tmp/mnt.2_3gh6/block) _check_or_set_bdev_label
bdev /var/lib/ceph/tmp/mnt.2_3gh6/block fsid
770080e2-20db-450f-bc17-81b55f167982 does not match our fsid
f33efff0-2f07-4203-ad8d-8a0844d6bda0
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
There is no need to ask for root on the local action. This will prompt
for a password the current user is not part of sudoers. That's
unnecessary anyways.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1516947
Signed-off-by: Sébastien Han <seb@redhat.com>
this task hangs because `{{ inventory_hostname }}` doesn't resolv to an
actual ip address.
Using `hostvars[inventory_hostname]['ansible_default_ipv4']['address']`
should fix this because it will reach the node with its actual IP
address.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
purge-docker-cluster must remove all osd_disk_prepare logs in
`{{ ceph_osd_docker_run_script_path }}`, otherwise if you purge your
cluster and try to redeploy it, osds will fail to start since because it
will try to retrieve find a partition uuid which doesn't exist.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1510470
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The ansible inventory could have more than just ceph-ansible hosts, so
we shouldnt use "hosts: all", also only grab one file when getting
the ceph cluster name instead of failing when there is more than one
file in /etc/ceph. Also fix location of the ceph.conf template
Rebooting servers is really intrusive and perhaps this is not what the
operator wants. So we disable the reboot by default now. Note that the
reboot might not happen all the time.
It can be enabled by default by running the purge playbook with -e
reboot_osd_node=True
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1505011
Signed-off-by: Sébastien Han <seb@redhat.com>
During purge osd, the containers are not stopped because of a typo, as a
result, all the devices can't be unmounted later.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
stable-3.0 brought numerous changes in ceph-ansible variables, this PR
aims to maintain backward compatibility for someone running stable-2.2
upgrading to stable-3.0 but keeps its groups_vars untouched.
We will then determine the right options to make sure the upgrade works
but we are expecting that new variables should be used.
We will drop this in a near future, maybe 3.1 or 3.2.
Signed-off-by: Sébastien Han <seb@redhat.com>
nfs nodes can't be upgraded from jewel to luminous because ceph-nfs role
is skipped because of the condition `when:
"ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Indeed,
package is upgraded in `ceph-nfs` role, therefore,
`ceph_release` is still set to the old version. It means the when can't
be satisfied.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
mgr nodes can't be upgraded from jewel to luminous because ceph-mgr role
is skipped because of the condition `when:
"ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Indeed,
ceph-mgr package is upgraded in `ceph-mgr` role, therefore,
`ceph_release` is still set to the old version. It means the when can't
be satisfied.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 302e563601cd6820b1ae44fabdfb1506688c7c9b)
- Add upgrade support for rbd mirror and nfs daemons.
- Only works with systemd (remove sysvinit and upstart occurence)
- A bit of cleanup
Signed-off-by: Sébastien Han <seb@redhat.com>
This patch changes the `when:` keys so that they have no jinja2
delimiters. This avoids Ansible warnings which could turn into
errors in a future Ansible release.
Ansible throws warnings when using yum/dnf/rpm with the command
module:
[WARNING]: Consider using yum module rather than running yum
This patch adds the `warn: no` argument to suppress the warnings
in the Ansible output.
This playbook can replace failed OSD in containerized and
non-containerized env.
The current limitation is that it won't allow you to choose between
filestore/bluestore and will do collocation as well.
Signed-off-by: Sébastien Han <seb@redhat.com>
Using a condition when osd_scenario == 'non-collocated' was wrong since
these partitions can be collocated on a single device also. Removing the
check makes the purge of these partitions.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1499871
Signed-off-by: Sébastien Han <seb@redhat.com>
The current inclusion of purge-iscsi-gateways.yml in purge-cluster.yml
is not working well and blocking the CI too. So removing it from
purge-cluster.yml and re-add the original purge-iscsi-gateways.yml.
Signed-off-by: Sébastien Han <seb@redhat.com>
Use the pg check before doing the pg check, not on the quorum check.
Also never quote int when doing comparaison.
Signed-off-by: Sébastien Han <seb@redhat.com>
The shell wildcard expansion of non-existing paths fails on zsh making
the whole script fail. We can use file module with with_fileglob to
alleviate the problem instead.
Signed-off-by: Boris Ranto <branto@redhat.com>
The systemd can't stop services if the unit files were removed before
the cluster was purged. We should just ignore these.
Signed-off-by: Boris Ranto <branto@redhat.com>
Also we now play ceph-config to have everything being generated for new
daemons bootstrap during upgrade.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1497959
Signed-off-by: Sébastien Han <seb@redhat.com>
Even if the task is skipped, ansible registers the var as 'skipped' so
this task the task using this variable for its next usage.
Signed-off-by: Sébastien Han <seb@redhat.com>
rbd-mirror containers are not stopped in purge-docker-cluster playbook
because of the wrong name used.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
unti now, mgr nodes are not managed by purge-cluster.yml, therefore it
breaks scenario like purge_cluster.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The old name is used in `rolling_update.yml` and
`purge-docker-cluster.yml`, it breaks the
`test_rgw_service_is_running()` test.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commits allows us to run
switch-from-non-containerized-to-containerized-ceph-daemons.yml multiple
times.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1489353
Signed-off-by: Sébastien Han <seb@redhat.com>
If devices is passed through an extra var this register won't work so
let's only register the var is devices is not defined.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1489099
Signed-off-by: Sébastien Han <seb@redhat.com>
Prior command was avoiding the lockbox mountpoint and the playbook was
failing with:
rmtree failed: [Errno 30] Read-only file system:
'/var/lib/ceph/osd-lockbox/4e9d8052-87c2-4fde-a56c-b8c108a3eefc/key-management-mode'
Signed-off-by: Sébastien Han <seb@redhat.com>
we need to force the value of `docker` variable which is initially set
to `false` since it's a migration from non-containerized to
containerized cluster.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We must mask the image so we are sure that even if the system reboots
then the OSDs won't start.
Also remove Ceph udev rules if found on the system prior to deploy
containers. If we don't do this we are exposed to conflicts between udev
rules and sytemd unit files.
Also add the CI will now test the migration from a non-containerized cluster to a
containerized cluster.
Signed-off-by: Sébastien Han <seb@redhat.com>
Monitor removal from the monmap is not immediate, so let's wait a little
bit and then fail if the monitor is still in the monmap.
We try twice in total with 10 sec intervals.
Signed-off-by: Sébastien Han <seb@redhat.com>
Move untested/with few confidence playbooks in a untested-by-ci
directory.
Also removing this directory from the package build.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1461551
Signed-off-by: Sébastien Han <seb@redhat.com>
Prior to this patch, we were applying the osd flags like this:
"
General pre tasks
Set flags
Upgrade OSDs on a host
Unset flags <-- this triggers pending scrub to start
Set flags
Upgrade OSDs on a hosts
Unset flags <-- this triggers pending scrub to start
.
.
.
General post tasks
"
Now instead, we apply the flag once before starting the OSD update and
unset them once the last OSD is finished.
"
General pre tasks
Set flags and wait for any scrubs to finish
Upgrade OSDs on a host
Upgrade OSDs on a host
.
.
.
Unset flags
General post tasks
"
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1450754
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
In our test case we don't have any pgs, thus the check fails. The check
always returns an empty array, which makes the comparaison failing.
Signed-off-by: Sébastien Han <seb@redhat.com>
This commit eases the use of the
infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml
playbook. We basically run it with a couple of pre-tasks and then we let
the playbook run the docker roles.
It obviously expect to have proper variables configured in order to
work.
Signed-off-by: Sébastien Han <seb@redhat.com>
In the switch to containers migration there were broken references
to ceph_mon_docker_subnet variable, replaced with public_network.
Also fixes references to ceph_mon_docker_extra_env setting for it
a default as it could be undefined.
There is only two main scenarios now:
* collocated: everything remains on the same device:
- data, db, wal for bluestore
- data and journal for filestore
* non-collocated: dedicated device for some of the component
Signed-off-by: Sébastien Han <seb@redhat.com>
This will give us more flexibility and avoid a lot of useless when
skipping all tasks from a non-desired role.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
remove `ceph_mon_docker_interface` and use `monitor_interface` instead
for both containerized and non-containerized deployment.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>