Commit Graph

522 Commits (86bb734397f7b1f77a005cdded39b229096dc8f4)

Author SHA1 Message Date
Guillaume Abrioux 86bb734397 filestore-to-bluestore: umount partitions before zapping them
When an OSD is stopped, it leaves partitions mounted.
We must umount them before zapping them, otherwise error like "Device is
busy" will show up.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8056514134)
2020-01-08 11:41:48 -05:00
Guillaume Abrioux 27b1fc8981 shrink-mds: do not play ceph-facts entirely
We only need to set `container_binary`.
Let's use `tasks_from` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0ae0a9ce28)
2020-01-08 11:18:45 -05:00
Guillaume Abrioux edbb207680 shrink-mds: use fact from delegated node
The command is delegated on the first monitor so we must use the fact
`container_binary` from this node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 77b39d235b)
2020-01-08 11:18:45 -05:00
Guillaume Abrioux 0eaa66f394 shrink-mds: fix filesystem removal task
This commit deletes the filesystem when no more MDS is present after
shrinking operation.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787543

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 38278a6bb5)
2020-01-08 11:18:45 -05:00
Guillaume Abrioux bfd26e7f78 shrink-mds: ensure max_mds is always honored
This commit prevent from shrinking an mds node when max_mds wouldn't be
honored after that operation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2cfe5a04bf)
2020-01-08 11:18:45 -05:00
Guillaume Abrioux 19068659c7 filestore-to-bluestore: ensure all dm are closed
This commit adds a task to ensure device mappers are well closed when
lvm batch scenario is used.
Otherwise, OSDs can't be redeployed given that devices that are rejected
by ceph-volume because they are locked.

Adding a condition `devices | default([]) | length > 0` to remove these
dm only when using lvm batch scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8e6ef818a2)
2019-12-11 16:37:21 +01:00
Guillaume Abrioux 99ac694cc0 filestore-to-bluestore: force OSDs to be marked down
Otherwise, sometimes it can take a while for an OSD to be seen as down
and causes the `ceph osd purge` command to fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 51d601193e)
2019-12-11 16:37:21 +01:00
Guillaume Abrioux 586f6f6262 filestore-to-bluestore: do not use --destroy
Do not use `--destroy` when zapping a device.
Otherwise, it destroys VGs while they are still needed to redeploy the
OSDs.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e3305e6bb6)
2019-12-11 16:37:21 +01:00
Guillaume Abrioux d2b1506712 filestore-to-bluestore: add non containerized support
This commit adds the non containerized context support to the
filestore-to-bluestore.yml infrastructure playbook.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4833b85e04)
2019-12-11 16:37:21 +01:00
Guillaume Abrioux 5062d4094c update: restart iscsigws daemons after upgrade
In containerized context, containers aren't stopped early in the
sequence.
It means they aren't restarted after the upgrade because the task is
just checking the daemon status is started (eg: `state: started`).

This commit also removes the task which ensure services are started
because it's already done in the role ceph-iscsigw.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c7708eb458)
2019-12-11 08:48:34 -05:00
Guillaume Abrioux fe8858af38 upgrade: add dashboard deployment
when upgrading from RHCS 3, dashboard has obviously never been deployed
and it forces us to deploy it later manually.
This commit adds the dashboard deployment as part of the upgrade to
RHCS 4.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779092

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 451c5ca934)
2019-12-11 08:48:34 -05:00
Dimitri Savineau 3b26df8c75 purge-cluster: add podman support
The podman support was added to the purge-container-cluster playbook but
containers are always used for the dashboard even on non containerized
deployment.
This commits adds the podman support on purging the dashboard resources
in the purge-cluster playbook.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 89f6cc54a2)
2019-12-04 18:00:07 -05:00
Guillaume Abrioux 1c03d2b526 purge: rename playbook (container)
Since we now support podman, let's rename the playbook so it's more
generic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7bc7e3669d)
2019-12-04 09:12:41 -05:00
Dimitri Savineau 98392be368 add-{mon,osd}: run raw install python tasks
If the new mon/osd node doesn't have python installed then we need to
execute the tasks from raw_install_python.yml.

Closes: #4368

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 34b03d1873)
2019-12-04 10:59:39 +01:00
Dimitri Savineau a325ff61e8 switch_to_containers: fix umount ceph partitions
When a container is already running on a non containerized node then the
umount ceph partition task is skipped.
This is due to the container ps command which always returns 0 even if
the filter matches nothing.

We should run the umount task when:
1/ the container command is failing (not installed) : rc != 0
2/ the container command reports running ceph-osd containers : rc == 0

Also we should not fail on the ceph directory listing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 39cfe0aa65)
2019-12-03 15:58:36 +01:00
Guillaume Abrioux 1e7fd9fe36 purge: do not try to stop docker when binary is podman
If the container binary is podman, we shouldn't try to stop docker here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b18476a1a6)
2019-12-03 09:57:11 -05:00
Guillaume Abrioux 6592caab08 facts: isolate container_binary facts
in order to be able to call container_binary without having to run the
whole ceph-facts role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fe5ffe589e)
2019-12-03 09:57:11 -05:00
Guillaume Abrioux 1f30327688 purge: remove docker_* task
All containers are removed when systemd stops them.
There is no need to call this module in purge container playbook.

This commit also removes all docker_image task and remove all container
images in the final cleanup play.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1776736

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d23383a820)
2019-12-03 09:57:11 -05:00
Guillaume Abrioux 88d060f6e1 docker2podman: import ceph-handler role
This is needed to avoid following error:

```
ERROR! The requested handler 'restart ceph mons' was not found in either the main handlers list nor in the listening handlers list
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a43a872105)
2019-12-03 10:44:48 +01:00
Guillaume Abrioux 3bd8129859 docker2podman: do not hardcode group name
let's use `client_group_name` instead of hardcoding the name.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7fe0d55eff)
2019-12-03 10:44:48 +01:00
Guillaume Abrioux c5145ccf25 docker2podman: import ceph-defaults in first play
We must import this role in the first play otherwise the first call to
`client_group_name`fails.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6526a25ab5)
2019-12-03 10:44:48 +01:00
Guillaume Abrioux 15b78ae252 purge: use sysfs to unmap rbd devices
in containerized context, using the binary provided in atomic os won't
work because it's an old version provided by ceph-common based on
10.2.5.
Using a container could be an idea but for large cluster with hundreds
of client nodes, that would require to pull the image of each of them
just to unmap the rbd devices.

Let's use the sysfs method in order to avoid any issue related to ceph
version that is shipped on the host.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766064

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3cfcc7a105)
2019-11-14 10:49:38 -05:00
Guillaume Abrioux e4c657d711 update: add default values when setting fact
This commit adds a default value in the `with_dict` because when using
python 2.7, if a task using a `with_dict` has a condition, it is
evaluated anyway whereas in python 3 it isn't.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766499

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e9823f319b)
2019-10-29 16:00:21 -04:00
Dimitri Savineau 56f0cf79d9 rolling_update: remove default filter on mds group
There's no need to use the default filter on active/standby groups
because if the group doesn't exist then the play is just skipped.

Currently this generates warnings like:

[WARNING]: Could not match supplied host pattern, ignoring: |
[WARNING]: Could not match supplied host pattern, ignoring: default([])

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2ca79fcc99)
2019-10-28 13:08:33 -04:00
Dimitri Savineau ba4059d15a rolling_update: fix active mds host value
The active mds host should be based on the inventory hostname and not on
the ansible hostname.
The value returns under the mdsmap structure is based on the OS hostname
so we need to find the right node in the inventory with this value when
doing operation on inventory nodes.

Othewise we could see error like:

The task includes an option with an undefined variable. The error was:
"hostvars[foobar]" is undefined

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f1f2352c79)
2019-10-28 13:08:33 -04:00
Dimitri Savineau b547ad9e71 rolling_update: fix reset mon_host variable
mon_host should use the inventory hostname and not the node hostname.
Fix creates an issue when the inventory and node hostname are different.

Closes: #4670

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 650bc0c3f0)
2019-10-26 08:20:54 -04:00
Dimitri Savineau ff3bea871d add-mon: add missing become flag
Without the become flag set to true, we can't executed the roles
successfully.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 77b212833e)
2019-10-26 08:18:27 -04:00
Guillaume Abrioux 3625ea6ef8 update: use right node when creating active mds group
This must be consistent with what is used in `name` parameter.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d06057ebd2)
2019-10-25 09:42:52 +02:00
Guillaume Abrioux 73d97f525e update: avoid skipping single mds deployment upgrade
otherwise a single MDS would never be updated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d8ab11d2f8)
2019-10-25 09:42:52 +02:00
Guillaume Abrioux c599af6724 update: skip mds deactivation when no mds in inventory
Let's skip this part of the code if there's no mds node in the
inventory.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ec906c3af)
2019-10-25 09:42:52 +02:00
Dimitri Savineau f36306ebf4 add-{mon,osd}: add ceph-container-engine role
The ceph-container-engine role is missing from both playbooks so the
container engine (docker, podman) isn't install resulting in a failure
on the added nodes.

fatal: [xxxxx]: FAILED! => changed=false
  cmd: docker --version
  msg: '[Errno 2] No such file or directory'
  rc: 2

Closes: #4634

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bfb1d6be12)
2019-10-24 20:01:04 -04:00
Guillaume Abrioux 4a5d3c3c2d update: add missing quotes
Add missing quote in order to keep consistency.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8d72ff8e5e)
2019-10-21 13:26:37 -04:00
Dimitri Savineau 703c834dab Move the dashboard playbook in the main directory
The [group|host]_vars directories are ignored for the dashboard playbook
when the inventory file directory doesn't contain those directories.

Closes: #4601
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1761612

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8426856262)
2019-10-18 19:32:42 -04:00
Guillaume Abrioux 9bc7f8a7d7 tests: add multimds coverage
This commit makes the all_daemons scenario deploying 3 mds in order to
cover the multimds case.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 25b98b2ce3)
2019-10-18 22:09:04 +02:00
Guillaume Abrioux bc3138eff4 upgrade: fix standby_mdss group creation
This commit fixes the standby_mdss group creation by using `{{ item }}`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c4fc8cc878)
2019-10-18 22:09:04 +02:00
Guillaume Abrioux c962d87def update: follow new recommandation to upgrade mds cluster
Refact the mds cluster upgrade code in order to follow the documented
recommandation.
See: https://github.com/ceph/ceph/blob/master/doc/cephfs/upgrading.rst

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1569689

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 71cebf80a6)
2019-10-16 12:59:08 -04:00
Dimitri Savineau 0b49538621 Execute common roles once on all nodes
The common roles don't need to be executed again on each group plays
(like mons, osds, etc..).
We only need to execute them during the first play. That wat, we will
apply the changes on all nodes in parallel instead of doing it once per
group.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 68a3dac7cd)
2019-10-16 10:41:32 -04:00
Dimitri Savineau fd759f97fa dashboard: disable facts gathering
This is already done in the main playbooks but absent in the dashboard
playbook.
The facts are already gathered during the first play of the main
playbooks so we don't need to doing twice.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5ae7304ace)
2019-10-14 09:45:11 +02:00
Guillaume Abrioux ebfe7f31ed dashboard: if no host is available, let's just skip these plays.
If there is no host available, let's just skip these plays.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1759917

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0b245bd007)
2019-10-09 14:47:36 -04:00
Dimitri Savineau 5f91be8740 switch_to_containers: umount osd lockbox partition
When switching from a baremetal deployment to a containerized deployment
we only umount the OSD data partition.
If the OSD is encrypted (dmcrypt: true) then there's an additional
partition (part number 5) used for the lockbox and mount in the
/var/lib/ceph/osd-lockbox/ directory.
Because this partition isn't umount then the containerized OSD aren't
able to start. The partition is still mount by the system and can't be
remount from the container.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 19edf707a5)
2019-10-08 00:57:05 +00:00
Guillaume Abrioux b325cc386e switch_to_containers: do not re-set `ceph_uid`
This commit refacts the way we set `ceph_uid` fact in `ceph-facts` and
removes all `set_fact` tasks for `ceph_uid` in switch-to-containers playbook
to avoid duplicated code.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fa9b42e98e)
2019-10-07 10:18:17 -04:00
Guillaume Abrioux 468aa5d63b switch_to_containers: optimize ownership change
As per https://github.com/ceph/ceph-ansible/pull/4323#issuecomment-538420164

using `find` command should be faster.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1757400

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-Authored-by: Giulio Fidente <gfidente@redhat.com>
(cherry picked from commit c5d0c90bb7)
2019-10-07 10:18:17 -04:00
Guillaume Abrioux 37fd0b179b update: import ceph-defaults role in first play
Typical error:

```
fatal: [mon0]: FAILED! =>
  msg: |-
    The conditional check 'not delegate_facts_host | bool or inventory_hostname in groups.get(client_group_name, [])' failed. The error was: error while evaluating conditional (not delegate_facts_host | bool or inventory_hostname in groups.get(client_group_name, [])): 'client_group_name' is undefined
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8138d4193c)
2019-10-07 11:21:23 +02:00
Guillaume Abrioux 9a4fcfabe1 main: exclude client nodes from facts gathering when delegate_facts_host
This commit excludes client nodes from facts gathering, they are not
needed and can speed up this task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 865d2eac9b)
2019-10-07 11:21:23 +02:00
Dimitri Savineau ec1c57f690 dashboard: remove useless block section
The block section were used with the dashboard_enabled condition when
the code was included in the main playbooks.
Because this condition isn't present in the dashboard playbook anymore
we can remove the block section.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cf47594b47)
2019-10-04 13:28:37 +02:00
Guillaume Abrioux 9a79ed1bf0 rgw: refact tasks directory layout
This commit moves containerized deployment related files to `./tasks/`
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e08194dd67)
2019-10-01 18:50:51 +02:00
Guillaume Abrioux 7f902994b3 rbdmirror: refact tasks directory layout
This commit moves containerized deployment related files to `./tasks/`
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c69816c6b7)
2019-10-01 18:50:51 +02:00
Guillaume Abrioux d7a06c67db iscsigw: refact tasks directory layout
This commit moves containerized deployment related files to `./tasks/
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4636f3f7e2)
2019-10-01 18:50:51 +02:00
Guillaume Abrioux b564c37696 upgrade: add an infra playbook to migrate systemd units to podman
this commit adds a new playbook to force systemd units for containers to
use podman instead of docker.
This is needed in the rhel8 upgrade context so after the base OS is upgraded
containers can be started using podman.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f2017dcda2)
2019-10-01 18:50:51 +02:00
Guillaume Abrioux 4afe1b748c update: reset mon_host after mons upgrade
after all mon are upgraded, let's reset mon_host which is used in the
rest of the playbook for setting `container_exec_cmd` so we are sure to
use the right value.

Typical error:

```
failed: [mds0 -> mon0] (item={u'path': u'/var/lib/ceph/bootstrap-mds/ceph.keyring', u'name': u'client.bootstrap-mds', u'copy_key': True}) => changed=true
  ansible_loop_var: item
  cmd:
  - docker
  - exec
  - ceph-mon-mon2
  - ceph
  - --cluster
  - ceph
  - auth
  - get
  - client.bootstrap-mds
  delta: '0:00:00.016294'
  end: '2019-09-27 13:54:58.828835'
  item:
    copy_key: true
    name: client.bootstrap-mds
    path: /var/lib/ceph/bootstrap-mds/ceph.keyring
  msg: non-zero return code
  rc: 1
  start: '2019-09-27 13:54:58.812541'
  stderr: 'Error response from daemon: No such container: ceph-mon-mon2'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d84160a170)
2019-09-28 09:01:16 +02:00