Commit Graph

527 Commits (4535985188dcc656ff4da60318dc07b44eabf3a6)

Author SHA1 Message Date
Guillaume Abrioux 8e6ef818a2 filestore-to-bluestore: ensure all dm are closed
This commit adds a task to ensure device mappers are well closed when
lvm batch scenario is used.
Otherwise, OSDs can't be redeployed given that devices that are rejected
by ceph-volume because they are locked.

Adding a condition `devices | default([]) | length > 0` to remove these
dm only when using lvm batch scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-11 09:04:41 -05:00
Guillaume Abrioux 51d601193e filestore-to-bluestore: force OSDs to be marked down
Otherwise, sometimes it can take a while for an OSD to be seen as down
and causes the `ceph osd purge` command to fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-11 09:04:41 -05:00
Guillaume Abrioux e3305e6bb6 filestore-to-bluestore: do not use --destroy
Do not use `--destroy` when zapping a device.
Otherwise, it destroys VGs while they are still needed to redeploy the
OSDs.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-11 09:04:41 -05:00
Guillaume Abrioux 4833b85e04 filestore-to-bluestore: add non containerized support
This commit adds the non containerized context support to the
filestore-to-bluestore.yml infrastructure playbook.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-11 09:04:41 -05:00
Guillaume Abrioux 6d9ca6b05b shrink-osd: support fqdn in inventory
When using fqdn in inventory, that playbook fails because of some tasks
using the result of ceph osd tree (which returns shortname) to get
some datas in hostvars[].

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779021

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-09 10:52:38 -05:00
Guillaume Abrioux 332c39376b switch_to_containers: exclude clients nodes from facts gathering
just like site.yml and rolling_update, let's exclude clients node from
the fact gathering.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-09 10:49:13 -05:00
Guillaume Abrioux c7708eb458 update: restart iscsigws daemons after upgrade
In containerized context, containers aren't stopped early in the
sequence.
It means they aren't restarted after the upgrade because the task is
just checking the daemon status is started (eg: `state: started`).

This commit also removes the task which ensure services are started
because it's already done in the role ceph-iscsigw.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-05 13:02:06 -05:00
Guillaume Abrioux 451c5ca934 upgrade: add dashboard deployment
when upgrading from RHCS 3, dashboard has obviously never been deployed
and it forces us to deploy it later manually.
This commit adds the dashboard deployment as part of the upgrade to
RHCS 4.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779092

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-05 13:02:06 -05:00
Dimitri Savineau 89f6cc54a2 purge-cluster: add podman support
The podman support was added to the purge-container-cluster playbook but
containers are always used for the dashboard even on non containerized
deployment.
This commits adds the podman support on purging the dashboard resources
in the purge-cluster playbook.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-12-04 14:15:12 -05:00
Guillaume Abrioux f5a81b1790 purge: fix symlink to purge-container-cluster
ceph/ceph-ansible#4805 introduced a symlink to
purge-container-cluster.yml playbook which is broken.

This commit fixes it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-04 09:38:34 +01:00
Guillaume Abrioux 7bc7e3669d purge: rename playbook (container)
Since we now support podman, let's rename the playbook so it's more
generic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-03 11:10:21 -05:00
Guillaume Abrioux b18476a1a6 purge: do not try to stop docker when binary is podman
If the container binary is podman, we shouldn't try to stop docker here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-03 13:29:52 +01:00
Guillaume Abrioux fe5ffe589e facts: isolate container_binary facts
in order to be able to call container_binary without having to run the
whole ceph-facts role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-03 13:29:52 +01:00
Guillaume Abrioux d23383a820 purge: remove docker_* task
All containers are removed when systemd stops them.
There is no need to call this module in purge container playbook.

This commit also removes all docker_image task and remove all container
images in the final cleanup play.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1776736

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-03 13:29:52 +01:00
Guillaume Abrioux a43a872105 docker2podman: import ceph-handler role
This is needed to avoid following error:

```
ERROR! The requested handler 'restart ceph mons' was not found in either the main handlers list nor in the listening handlers list
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-02 09:11:12 -05:00
Guillaume Abrioux 7fe0d55eff docker2podman: do not hardcode group name
let's use `client_group_name` instead of hardcoding the name.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-02 09:11:12 -05:00
Guillaume Abrioux 6526a25ab5 docker2podman: import ceph-defaults in first play
We must import this role in the first play otherwise the first call to
`client_group_name`fails.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-02 09:11:12 -05:00
Dimitri Savineau 39cfe0aa65 switch_to_containers: fix umount ceph partitions
When a container is already running on a non containerized node then the
umount ceph partition task is skipped.
This is due to the container ps command which always returns 0 even if
the filter matches nothing.

We should run the umount task when:
1/ the container command is failing (not installed) : rc != 0
2/ the container command reports running ceph-osd containers : rc == 0

Also we should not fail on the ceph directory listing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-12-02 09:19:50 +01:00
Guillaume Abrioux 0441812959 purge/update: remove backward compatibility legacy
This was introduced in 3.1 and marked as deprecation
We can definitely drop it in stable-4.0

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-11-27 10:27:43 -05:00
Guillaume Abrioux c878e99589 update: only run post osd upgrade play on 1 mon
There is no need to run these tasks n times from each monitor.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-11-20 09:22:19 -05:00
Guillaume Abrioux 548db78b95 update: use flags noout and nodeep-scrub only
1. set noout and nodeep-scrub flags,
2. upgrade each OSD node, one by one, wait for active+clean pgs
3. after all osd nodes are upgraded, unset flags

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Rachana Patel <racpatel@redhat.com>
2019-11-20 09:22:19 -05:00
Guillaume Abrioux 206ee589d6 update: reset flags before and after each osd node upgrade
It might be possible at some point even with osd flags `noout` and
`norebalance` set the PGs states can change depending on the amount of data
written meantime. It means the check for PGs state will fail.

This commit changes the way we set those flags:
we set them before an OSD node upgrade and unset them before the PGs
state check so they can recover.

Fixes: #3961

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-11-08 09:10:52 -05:00
Guillaume Abrioux 3cfcc7a105 purge: use sysfs to unmap rbd devices
in containerized context, using the binary provided in atomic os won't
work because it's an old version provided by ceph-common based on
10.2.5.
Using a container could be an idea but for large cluster with hundreds
of client nodes, that would require to pull the image of each of them
just to unmap the rbd devices.

Let's use the sysfs method in order to avoid any issue related to ceph
version that is shipped on the host.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766064

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-11-08 09:06:11 -05:00
Dimitri Savineau 34b03d1873 add-{mon,osd}: run raw install python tasks
If the new mon/osd node doesn't have python installed then we need to
execute the tasks from raw_install_python.yml.

Closes: #4368

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-11-07 14:04:26 +01:00
Guillaume Abrioux e9823f319b update: add default values when setting fact
This commit adds a default value in the `with_dict` because when using
python 2.7, if a task using a `with_dict` has a condition, it is
evaluated anyway whereas in python 3 it isn't.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766499

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-29 14:45:28 -04:00
Dimitri Savineau 2ca79fcc99 rolling_update: remove default filter on mds group
There's no need to use the default filter on active/standby groups
because if the group doesn't exist then the play is just skipped.

Currently this generates warnings like:

[WARNING]: Could not match supplied host pattern, ignoring: |
[WARNING]: Could not match supplied host pattern, ignoring: default([])

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-28 15:02:50 +01:00
Dimitri Savineau f1f2352c79 rolling_update: fix active mds host value
The active mds host should be based on the inventory hostname and not on
the ansible hostname.
The value returns under the mdsmap structure is based on the OS hostname
so we need to find the right node in the inventory with this value when
doing operation on inventory nodes.

Othewise we could see error like:

The task includes an option with an undefined variable. The error was:
"hostvars[foobar]" is undefined

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-28 15:02:50 +01:00
Dimitri Savineau 77b212833e add-mon: add missing become flag
Without the become flag set to true, we can't executed the roles
successfully.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-25 23:05:25 +02:00
Dimitri Savineau 650bc0c3f0 rolling_update: fix reset mon_host variable
mon_host should use the inventory hostname and not the node hostname.
Fix creates an issue when the inventory and node hostname are different.

Closes: #4670

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-25 23:04:35 +02:00
Dimitri Savineau bfb1d6be12 add-{mon,osd}: add ceph-container-engine role
The ceph-container-engine role is missing from both playbooks so the
container engine (docker, podman) isn't install resulting in a failure
on the added nodes.

fatal: [xxxxx]: FAILED! => changed=false
  cmd: docker --version
  msg: '[Errno 2] No such file or directory'
  rc: 2

Closes: #4634

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-24 16:24:02 -04:00
Guillaume Abrioux d06057ebd2 update: use right node when creating active mds group
This must be consistent with what is used in `name` parameter.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-24 15:15:51 -04:00
Guillaume Abrioux 1122da7f4a update: avoid skipping single mds deployment upgrade
otherwise a single MDS would never be updated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-24 09:28:36 +02:00
Guillaume Abrioux 5ec906c3af update: skip mds deactivation when no mds in inventory
Let's skip this part of the code if there's no mds node in the
inventory.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-23 11:06:13 -04:00
Guillaume Abrioux 8d72ff8e5e update: add missing quotes
Add missing quote in order to keep consistency.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-21 09:19:34 -04:00
Guillaume Abrioux 25b98b2ce3 tests: add multimds coverage
This commit makes the all_daemons scenario deploying 3 mds in order to
cover the multimds case.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-18 13:43:13 -04:00
Guillaume Abrioux c4fc8cc878 upgrade: fix standby_mdss group creation
This commit fixes the standby_mdss group creation by using `{{ item }}`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-18 13:43:13 -04:00
Dimitri Savineau 8426856262 Move the dashboard playbook in the main directory
The [group|host]_vars directories are ignored for the dashboard playbook
when the inventory file directory doesn't contain those directories.

Closes: #4601
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1761612

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-17 21:54:09 -04:00
Guillaume Abrioux 71cebf80a6 update: follow new recommandation to upgrade mds cluster
Refact the mds cluster upgrade code in order to follow the documented
recommandation.
See: https://github.com/ceph/ceph/blob/master/doc/cephfs/upgrading.rst

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1569689

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-16 11:23:12 -04:00
Dimitri Savineau 68a3dac7cd Execute common roles once on all nodes
The common roles don't need to be executed again on each group plays
(like mons, osds, etc..).
We only need to execute them during the first play. That wat, we will
apply the changes on all nodes in parallel instead of doing it once per
group.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-16 15:31:24 +02:00
Dimitri Savineau 5ae7304ace dashboard: disable facts gathering
This is already done in the main playbooks but absent in the dashboard
playbook.
The facts are already gathered during the first play of the main
playbooks so we don't need to doing twice.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-11 14:51:33 -04:00
Guillaume Abrioux 0b245bd007 dashboard: if no host is available, let's just skip these plays.
If there is no host available, let's just skip these plays.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1759917

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-09 13:27:09 -04:00
Dimitri Savineau 19edf707a5 switch_to_containers: umount osd lockbox partition
When switching from a baremetal deployment to a containerized deployment
we only umount the OSD data partition.
If the OSD is encrypted (dmcrypt: true) then there's an additional
partition (part number 5) used for the lockbox and mount in the
/var/lib/ceph/osd-lockbox/ directory.
Because this partition isn't umount then the containerized OSD aren't
able to start. The partition is still mount by the system and can't be
remount from the container.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-08 00:45:52 +02:00
Guillaume Abrioux fa9b42e98e switch_to_containers: do not re-set `ceph_uid`
This commit refacts the way we set `ceph_uid` fact in `ceph-facts` and
removes all `set_fact` tasks for `ceph_uid` in switch-to-containers playbook
to avoid duplicated code.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-07 14:15:56 +02:00
Guillaume Abrioux c5d0c90bb7 switch_to_containers: optimize ownership change
As per https://github.com/ceph/ceph-ansible/pull/4323#issuecomment-538420164

using `find` command should be faster.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1757400

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-Authored-by: Giulio Fidente <gfidente@redhat.com>
2019-10-07 14:15:56 +02:00
Guillaume Abrioux 8138d4193c update: import ceph-defaults role in first play
Typical error:

```
fatal: [mon0]: FAILED! =>
  msg: |-
    The conditional check 'not delegate_facts_host | bool or inventory_hostname in groups.get(client_group_name, [])' failed. The error was: error while evaluating conditional (not delegate_facts_host | bool or inventory_hostname in groups.get(client_group_name, [])): 'client_group_name' is undefined
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-07 09:00:38 +02:00
Guillaume Abrioux 865d2eac9b main: exclude client nodes from facts gathering when delegate_facts_host
This commit excludes client nodes from facts gathering, they are not
needed and can speed up this task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-07 09:00:38 +02:00
Dimitri Savineau cf47594b47 dashboard: remove useless block section
The block section were used with the dashboard_enabled condition when
the code was included in the main playbooks.
Because this condition isn't present in the dashboard playbook anymore
we can remove the block section.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-04 07:42:05 +02:00
Guillaume Abrioux e08194dd67 rgw: refact tasks directory layout
This commit moves containerized deployment related files to `./tasks/`
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-01 10:27:51 -04:00
Guillaume Abrioux c69816c6b7 rbdmirror: refact tasks directory layout
This commit moves containerized deployment related files to `./tasks/`
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-01 10:27:51 -04:00
Guillaume Abrioux 4636f3f7e2 iscsigw: refact tasks directory layout
This commit moves containerized deployment related files to `./tasks/
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-01 10:27:51 -04:00