Commit Graph

5177 Commits (3987a82d29fe15c4dd975e198970924373c7aaeb)
 

Author SHA1 Message Date
Guillaume Abrioux a81830ddc0 osd: use _devices fact in lvm batch scenario
since fd1718f379, we must use `_devices`
when deploying with lvm batch scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5558664f37)
2020-01-14 15:33:15 +01:00
Guillaume Abrioux ffdfa634ac osd: do not run openstack_config during upgrade
There is no need to run this part of the playbook when upgrading the
cluter.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit af6875706a)
2020-01-14 09:12:34 -05:00
Guillaume Abrioux 51596e8b32 tests: use main playbook for add_osds job
This commit replaces the playbook used for add_osds job given
accordingly to the add-osd.yml playbook removal

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fef1cd4c4b)
2020-01-14 09:12:34 -05:00
Guillaume Abrioux 2d85fab02d osd: support scaling up using --limit
This commit lets add-osd.yml in place but mark the deprecation of the
playbook.
Scaling up OSDs is now possible using --limit

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3496a0efa2)
2020-01-14 09:12:34 -05:00
Dimitri Savineau dc797971ce ceph-facts: move grafana fact to dedicated file
We don't need to executed the grafana fact everytime but only during
the dashboard deployment.
Especially for ceph-grafana, ceph-prometheus and ceph-dashboard roles.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790303

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f940e695ab)
2020-01-13 16:28:23 -05:00
Guillaume Abrioux 266c4c7763 facts: fix osp/ceph external use case
d6da508a9b broke the osp/ceph external use case.

We must skip these tasks when no monitor is present in the inventory.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790508

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2592a1e1e8)
2020-01-13 21:07:01 +01:00
Guillaume Abrioux 532abbb9b2 defaults: change monitor|radosgw_address default values
To avoid confusion, let's change the default value from `0.0.0.0` to
`x.x.x.x`.
Users might think setting `0.0.0.0` will make the daemon binding on all
interfaces.

Fixes: #4827

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fc02fc98eb)
2020-01-13 14:55:23 -05:00
Guillaume Abrioux 9ed540da7e osd: ensure osd ids collected are well restarted
This commit refact the condition in the loop of that task so all
potential osd ids found are well started.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790212

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 58e6bfed2d)
2020-01-13 14:31:03 -05:00
Guillaume Abrioux fc7212b192 tests: add time command in vagrant_up.sh
monitor how long it takes to get all VMs up and running

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 16bcef4f28)
2020-01-10 17:41:27 +01:00
Guillaume Abrioux 2c96155c32 tests: retry to fire up VMs on vagrant failure
Add a script to retry several times to fire up VMs to avoid vagrant
failures.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 1ecb3a9352)
2020-01-10 17:41:27 +01:00
Guillaume Abrioux 7c2918d684 tests: add a docker2podman scenario
This commit adds a new scenario in order to test docker-to-podman.yml
migration playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit dc672e86ec)
2020-01-10 17:41:27 +01:00
Guillaume Abrioux e034a6da69 docker2podman: use set_fact to override variables
play vars have lower precedence than role vars and `set_fact`.
We must use a `set_fact` to reset these variables.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b0c491800a)
2020-01-10 17:41:27 +01:00
Guillaume Abrioux 02ec088568 docker2podman: force systemd to reload config
This is needed after a change is made in systemd unit files.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1c2ec9fb40)
2020-01-10 17:41:27 +01:00
Guillaume Abrioux 34c4f5baac docker2podman: install podman
This commit adds a package installation task in order to install podman
during the docker-to-podman.yml migration playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d746575fd0)
2020-01-10 17:41:27 +01:00
Guillaume Abrioux 4c4b0edfec update: only run post osd upgrade play on 1 mon
There is no need to run these tasks n times from each monitor.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c878e99589)
2020-01-10 17:16:51 +01:00
Guillaume Abrioux 6e47e96a02 update: use flags noout and nodeep-scrub only
1. set noout and nodeep-scrub flags,
2. upgrade each OSD node, one by one, wait for active+clean pgs
3. after all osd nodes are upgraded, unset flags

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Rachana Patel <racpatel@redhat.com>
(cherry picked from commit 548db78b95)
2020-01-10 17:16:51 +01:00
Dimitri Savineau bc0f16f270 ceph-validate: add rbdmirror validation
When ceph_rbd_mirror_configure is set to true we need to ensure that
the required variables aren't empty.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1760553

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4a065cebd7)
2020-01-10 11:11:37 -05:00
Dimitri Savineau f2e1941ef1 ceph-osd: wait for all osds once
cf8c6a3 moves the 'wait for all osds' task from openstack_config to the
main tasks list.
But the openstack_config code was executed only on the last OSD node.
We don't need to do this check on all OSD node so we need to add set
run_once to true on that task.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5bd1cf40eb)
2020-01-10 11:07:25 -05:00
Dimitri Savineau 9fa5b296ca ceph-osd: wait for all osd before crush rules
When creating crush rules with device class parameter we need to be sure
that all OSDs are up and running because the device class list is
is populated with this information.
This is now enable for all scenario not openstack_config only.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cf8c6a3849)
2020-01-10 11:07:25 -05:00
Dimitri Savineau bd016960cf ceph-osd: add device class to crush rules
This adds device class support to crush rules when using the class key
in the rule dict via the create-replicated sub command.
If the class key isn't specified then we use the create-simple sub
command for backward compatibility.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ef2cb99f73)
2020-01-10 11:07:25 -05:00
Dimitri Savineau 661b2c013a move crush rule creation from mon to osd role
If we want to create crush rules with the create-replicated sub command
and device class then we need to have the OSD created before the crush
rules otherwise the device classes won't exist.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ed36a11eab)
2020-01-10 11:07:25 -05:00
Dimitri Savineau f00ee1244f purge-iscsi-gateways: don't run all ceph-facts
We only need to have the container_binary fact. Because we're not
gathering the facts from all nodes then the purge fails trying to get
one of the grafana fact.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786686

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a09d1c38bf)
2020-01-10 16:21:53 +01:00
Dimitri Savineau f042ece9af rolling_update: run registry auth before upgrading
There's some tasks using the new container image during the rolling
upgrade playbook that needs to execute the registry login first otherwise
the nodes won't be able to pull the container image.

Unable to find image 'xxx.io/foo/bar:latest' locally
Trying to pull repository xxx.io/foo/bar ...
/usr/bin/docker-current: Get https://xxx.io/v2/foo/bar/manifests/latest:
unauthorized

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3f344fdefe)
2020-01-09 20:16:07 -05:00
Guillaume Abrioux d6921f798d config: exclude ceph-disk prepared osds in lvm batch report
We must exclude the devices already used and prepared by ceph-disk when
doing the lvm batch report. Otherwise it fails because ceph-volume
complains about GPT header.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786682

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fd1718f379)
2020-01-09 20:15:27 -05:00
Dimitri Savineau 09ccf22052 tests: use community repository
We don't need to use dev repository on stable branches.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-09 21:39:23 +01:00
Dimitri Savineau 84276f2fe3 shrink-rgw: refact global workflow
Instead of running the ceph roles against localhost we should do it
on the first mon.
The ansible and inventory hostname of the rgw nodes could be different.
Ensure that the rgw instance to remove is present in the cluster.
Fix rgw service and directory path.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 747555dfa6)
2020-01-09 21:39:23 +01:00
Guillaume Abrioux d6da508a9b mon: support replacing a mon
We must pick up a mon which actually exists in ceph-facts in order to
detect if a cluster is running. Otherwise, it will state no cluster is
already running which will end up deploying a new monitor isolated in a
new quorum.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 86f3eeb717)
2020-01-09 15:02:03 -05:00
Dimitri Savineau a3c2259bde ceph-iscsi: manage ipv6 in trusted_ip_list
Only the ipv4 addresses from the nodes running the dashboard mgr module
were added to the trusted_ip_list configuration file on the iscsigws
nodes.
This also add the iscsi gateways with ipv6 configuration to the ceph
dashboard.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787531

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 70eba66182)
2020-01-08 21:29:02 -05:00
Benoît Knecht 0cb1235962 ceph-rgw: Fix custom pool size setting
RadosGW pools can be created by setting

```yaml
rgw_create_pools:
  .rgw.root:
    pg_num: 512
    size: 2
```

for instance. However, doing so would create pools of size
`osd_pool_default_size` regardless of the `size` value. This was due to
the fact that the Ansible task used

```
{{ item.size | default(osd_pool_default_size) }}
```

as the pool size value, but `item.size` is always undefined; the
correct variable is `item.value.size`.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 3c31b19ab3)
2020-01-08 21:28:03 -05:00
Guillaume Abrioux e001ded6f6 handler: fix bug
411bd07d54 introduced a bug in handlers

using `handler_*_status` instead of `hostvars[item]['handler_*_status']`
causes handlers to be triggered in anycase even though
`handler_*_status` was set to `False` on a specific node.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 30200802d9)
2020-01-08 19:46:11 -05:00
Dimitri Savineau 56f1b232a5 ceph-nfs: add ganesha_t type to selinux
Since RHEL 8.1 we need to add the ganesha_t type to the permissive
SELinux list.
Otherwise the nfs-ganesha service won't start.
This was done on RHEL 7 previously and part of the nfs-ganesha-selinux
package on RHEL 8.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786110

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d758125290)
2020-01-08 16:23:41 -05:00
Guillaume Abrioux 6e7fe62ad5 shrink-osd: support fqdn in inventory
When using fqdn in inventory, that playbook fails because of some tasks
using the result of ceph osd tree (which returns shortname) to get
some datas in hostvars[].

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779021

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6d9ca6b05b)
2020-01-08 16:16:21 -05:00
Dimitri Savineau 701ade88c3 ceph-defaults: exclude rbd devices from discovery
The RBD devices aren't excluded from the devices list in the LVM auto
discovery scenario.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783908

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6f0556f015)
2020-01-08 16:15:26 -05:00
Dimitri Savineau af3c1b4c1a ceph-infra: replace hardcoded grafana group name
The grafana-server group name was hardcoded for the grafana/prometheus
firewalld tasks condition.
We should we the associated variable : grafana_server_group_name

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2c06678cde)
2020-01-08 16:15:09 -05:00
Dimitri Savineau 27530b1d3f ceph-infra: move dashboard into a dedicated file
Instead of using multiple dashboard_enabled condition in the
configure_firewall file we could just have the condition once
and include the dedicated tasks list.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f4c261ef90)
2020-01-08 16:15:09 -05:00
Dimitri Savineau 43ffcd7d28 ceph-infra: open dashboard port on monitor
When there's no mgr group defined in the ansible inventory then the
mgrs are deployed implicitly on the mons nodes.
If the dashboard is enabled then we need to open the dashboard port on
the node that is running the ceph mgr process (mgr or mon).
The current code only allow to open that port on the mgr nodes when they
are present explicitly in the inventory but not implicitly.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783520

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4535985188)
2020-01-08 16:15:09 -05:00
Guillaume Abrioux 3caba2c31c dashboard: use fqdn in external url
Force fqdn to be used in external url for prometheus and alertmanager.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765485

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 498bc45859)
2020-01-08 16:14:33 -05:00
Dimitri Savineau d625cefbac tests: use ceph iscsi stable repository
The ceph iscsi repository was still set to dev (shaman) instead of
using the stable ceph-iscsi repository.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-08 19:29:59 +01:00
Dimitri Savineau e4798e22a8 purge-iscsi-gateways: remove node from dashboard
When using the ceph dashboard with iscsi gateways nodes we also need to
remove the nodes from the ceph dashboard list.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786686

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 931a842f21)
2020-01-08 19:29:59 +01:00
Guillaume Abrioux 86bb734397 filestore-to-bluestore: umount partitions before zapping them
When an OSD is stopped, it leaves partitions mounted.
We must umount them before zapping them, otherwise error like "Device is
busy" will show up.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8056514134)
2020-01-08 11:41:48 -05:00
Guillaume Abrioux 27b1fc8981 shrink-mds: do not play ceph-facts entirely
We only need to set `container_binary`.
Let's use `tasks_from` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0ae0a9ce28)
2020-01-08 11:18:45 -05:00
Guillaume Abrioux edbb207680 shrink-mds: use fact from delegated node
The command is delegated on the first monitor so we must use the fact
`container_binary` from this node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 77b39d235b)
2020-01-08 11:18:45 -05:00
Guillaume Abrioux 4cf5c08cd8 facts: use correct python interpreter
that task is delegated on the first mon so we should always use the
`discovered_interpreter_python` from that node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5adb735c78)
2020-01-08 11:18:45 -05:00
Guillaume Abrioux 0eaa66f394 shrink-mds: fix filesystem removal task
This commit deletes the filesystem when no more MDS is present after
shrinking operation.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787543

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 38278a6bb5)
2020-01-08 11:18:45 -05:00
Guillaume Abrioux bfd26e7f78 shrink-mds: ensure max_mds is always honored
This commit prevent from shrinking an mds node when max_mds wouldn't be
honored after that operation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2cfe5a04bf)
2020-01-08 11:18:45 -05:00
Guillaume Abrioux d91c280232 ceph_volume: support filestore to bluestore migration
This commit adds the filestore to bluestore migration support in
ceph_volume module.

We must append to the executed command only the relevant options
according to what is passed in `osd_objectostore`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit aabba3baab)
2020-01-08 11:18:12 -05:00
Guillaume Abrioux 19068659c7 filestore-to-bluestore: ensure all dm are closed
This commit adds a task to ensure device mappers are well closed when
lvm batch scenario is used.
Otherwise, OSDs can't be redeployed given that devices that are rejected
by ceph-volume because they are locked.

Adding a condition `devices | default([]) | length > 0` to remove these
dm only when using lvm batch scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8e6ef818a2)
2019-12-11 16:37:21 +01:00
Guillaume Abrioux 99ac694cc0 filestore-to-bluestore: force OSDs to be marked down
Otherwise, sometimes it can take a while for an OSD to be seen as down
and causes the `ceph osd purge` command to fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 51d601193e)
2019-12-11 16:37:21 +01:00
Guillaume Abrioux 586f6f6262 filestore-to-bluestore: do not use --destroy
Do not use `--destroy` when zapping a device.
Otherwise, it destroys VGs while they are still needed to redeploy the
OSDs.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e3305e6bb6)
2019-12-11 16:37:21 +01:00
Guillaume Abrioux 9d89ae2c77 ceph_volume: add destroy option support
The zap action from ceph_volume module always implies `--destroy`.
This commit adds the destroy option support so we can ask ceph-volume to
not use `--destroy` when zapping a device.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0dcacdbed0)
2019-12-11 16:37:21 +01:00