Commit Graph

5227 Commits (5407e898a6d36cd5aecdc32f1c3d054003574973)
 

Author SHA1 Message Date
Guillaume Abrioux fd1718f379 config: exclude ceph-disk prepared osds in lvm batch report
We must exclude the devices already used and prepared by ceph-disk when
doing the lvm batch report. Otherwise, it fails because ceph-volume
complains about the GPT header.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786682

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-10 00:04:22 +01:00
Dimitri Savineau 3f344fdefe rolling_update: run registry auth before upgrading
Some tasks in the rolling upgrade playbook use the new container image,
so they need the registry login to run first; otherwise the nodes won't
be able to pull the container image.

Unable to find image 'xxx.io/foo/bar:latest' locally
Trying to pull repository xxx.io/foo/bar ...
/usr/bin/docker-current: Get https://xxx.io/v2/foo/bar/manifests/latest:
unauthorized
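Below is a minimal sketch of a registry login task that would run before any task pulling the new image. It is not the playbook's actual task. The names `container_binary`, `ceph_docker_registry`, `ceph_docker_registry_auth`, `ceph_docker_registry_username` and `ceph_docker_registry_password` are assumptions for illustration.

```yaml
# Sketch only: log in to the registry before any task pulls the new image.
# All variable names below are assumed, not confirmed by this commit.
- name: container registry authentication
  command: >-
    {{ container_binary }} login
    -u {{ ceph_docker_registry_username }}
    -p {{ ceph_docker_registry_password }}
    {{ ceph_docker_registry }}
  no_log: true   # keep the password out of the task output
  when: ceph_docker_registry_auth | bool
```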

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-09 16:14:33 -05:00
Dimitri Savineau 747555dfa6 shrink-rgw: refact global workflow
Instead of running the ceph roles against localhost, we should run them
on the first mon.
The ansible hostname and the inventory hostname of the rgw nodes can
differ.
Ensure that the rgw instance to remove is present in the cluster.
Fix the rgw service and directory paths.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-09 19:02:17 +01:00
Guillaume Abrioux 86f3eeb717 mon: support replacing a mon
We must pick a mon which actually exists in ceph-facts in order to
detect if a cluster is running. Otherwise, it concludes that no cluster
is running and ends up deploying a new monitor isolated in a new
quorum.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-09 12:59:12 -05:00
Guillaume Abrioux 30200802d9 handler: fix bug
411bd07d54 introduced a bug in handlers.

Using `handler_*_status` instead of `hostvars[item]['handler_*_status']`
causes handlers to be triggered in any case, even when
`handler_*_status` was set to `False` on a specific node.
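The difference can be sketched with a run_once handler that loops over the OSD nodes. A bare `handler_osd_status` reads the flag of the host the handler runs on. `hostvars[item][...]` reads the flag of each node in the loop. The restart script path and `osd_group_name` are assumptions here.

```yaml
# Sketch only: restart OSDs only on the nodes whose own flag is true.
# The script path and osd_group_name are assumed for illustration.
- name: restart ceph osds daemon(s)
  command: /usr/bin/env bash /tmp/restart_osd_daemon.sh
  delegate_to: "{{ item }}"
  run_once: true
  with_items: "{{ groups[osd_group_name] }}"
  # Look up the flag of the node being iterated, not of the current host.
  when: hostvars[item]['handler_osd_status'] | default(False) | bool
```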

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-08 17:11:42 -05:00
Benoît Knecht 3c31b19ab3 ceph-rgw: Fix custom pool size setting
RadosGW pools can be created by setting

```yaml
rgw_create_pools:
  .rgw.root:
    pg_num: 512
    size: 2
```

for instance. However, doing so created pools of size
`osd_pool_default_size` regardless of the `size` value, because
the Ansible task used

```
{{ item.size | default(osd_pool_default_size) }}
```

as the pool size value, but `item.size` is always undefined; the
correct variable is `item.value.size`.
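The next sketch shows why `item.value.size` is the right lookup. When iterating a dict with `with_dict`, each item exposes `.key` (the pool name) and `.value` (its settings). This is an assumed sketch, not the role's actual task, and the `cluster` variable is assumed.

```yaml
# Sketch only: with_dict yields item.key (pool name) and item.value (settings).
- name: set the rgw pool size
  command: >-
    ceph --cluster {{ cluster }} osd pool set {{ item.key }}
    size {{ item.value.size | default(osd_pool_default_size) }}
  with_dict: "{{ rgw_create_pools }}"
```

Take the `rgw_create_pools` example above and assume `cluster` is `ceph`. This task would then run `ceph --cluster ceph osd pool set .rgw.root size 2`.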

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2020-01-08 16:16:38 -05:00
Dimitri Savineau 70eba66182 ceph-iscsi: manage ipv6 in trusted_ip_list
Only the IPv4 addresses of the nodes running the dashboard mgr module
were added to the trusted_ip_list configuration file on the iscsigws
nodes.
This commit also adds the iscsi gateways with an IPv6 configuration to
the ceph dashboard.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787531

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-08 13:54:04 -05:00
Guillaume Abrioux 0ae0a9ce28 shrink-mds: do not play ceph-facts entirely
We only need to set `container_binary`.
Let's use the `tasks_from` option.
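A minimal sketch of loading a single task file from ceph-facts instead of the whole role. The task file name `container_binary` is an assumption.

```yaml
# Sketch only: run one task file of ceph-facts rather than the entire role.
# The tasks_from file name is assumed.
- name: set container_binary
  import_role:
    name: ceph-facts
    tasks_from: container_binary
```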

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-08 10:39:27 -05:00
Guillaume Abrioux 77b39d235b shrink-mds: use fact from delegated node
The command is delegated to the first monitor, so we must use the
`container_binary` fact from this node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-08 10:06:43 -05:00
Guillaume Abrioux 5adb735c78 facts: use correct python interpreter
That task is delegated to the first mon, so we should always use the
`discovered_interpreter_python` from that node.
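Below is a sketch of overriding the interpreter at task level so a delegated task uses the delegate's Python. The `ceph fsid` command, the `cluster` variable and `mon_group_name` are illustrative assumptions. The pattern only works once interpreter discovery has run on that mon.

```yaml
# Sketch only: a task delegated to the first mon uses that mon's interpreter.
- name: get the cluster fsid
  command: "ceph --cluster {{ cluster }} fsid"
  register: current_fsid
  changed_when: false
  delegate_to: "{{ groups[mon_group_name][0] }}"
  vars:
    ansible_python_interpreter: "{{ hostvars[groups[mon_group_name][0]]['discovered_interpreter_python'] }}"
```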

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-08 10:06:43 -05:00
Guillaume Abrioux 38278a6bb5 shrink-mds: fix filesystem removal task
This commit deletes the filesystem when no MDS remains after the
shrinking operation.
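A hedged sketch of such a removal task. The `cephfs` variable and the `remaining_mds_count` fact are hypothetical names used only for illustration.

```yaml
# Sketch only: remove the filesystem once the last MDS is gone.
# cephfs and remaining_mds_count are hypothetical.
- name: delete the filesystem when no MDS remains
  command: "ceph --cluster {{ cluster }} fs rm {{ cephfs }} --yes-i-really-mean-it"
  delegate_to: "{{ groups[mon_group_name][0] }}"
  when: remaining_mds_count | int == 0
```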

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787543

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-08 10:06:43 -05:00
Guillaume Abrioux 2cfe5a04bf shrink-mds: ensure max_mds is always honored
This commit prevents shrinking an mds node when max_mds would no longer
be honored after that operation.
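One way to express this check is sketched below. It reads `max_mds` from the JSON output of `ceph fs get` and fails when removing one MDS would leave fewer daemons than that value. The `cephfs` variable name is an assumption.

```yaml
# Sketch only: refuse to shrink when fewer MDS than max_mds would remain.
- name: get the filesystem information
  command: "ceph --cluster {{ cluster }} fs get {{ cephfs }} --format json"
  register: fs_info
  changed_when: false
  delegate_to: "{{ groups[mon_group_name][0] }}"

- name: fail if max_mds would not be honored
  fail:
    msg: "Removing this MDS would leave fewer MDS daemons than max_mds."
  when: (groups[mds_group_name] | length) - 1 < (fs_info.stdout | from_json).mdsmap.max_mds
```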

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-08 10:06:43 -05:00
Guillaume Abrioux 498bc45859 dashboard: use fqdn in external url
Force the fqdn to be used in the external url for prometheus and alertmanager.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765485

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-08 09:06:49 -05:00
Guillaume Abrioux fca6f788a0 Revert "nfs: do not run privileged nfs container"
This reverts commit d06158e9d9.

Otherwise ganesha consumers can't dynamically update exports using dbus.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1784562

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-08 14:18:21 +01:00
Dimitri Savineau 931a842f21 purge-iscsi-gateways: remove node from dashboard
When using the ceph dashboard with iscsi gateway nodes, we also need to
remove the nodes from the ceph dashboard list.
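A minimal sketch of that removal through the dashboard's iscsi gateway command. Using `ansible_hostname` as the gateway name is an assumption.

```yaml
# Sketch only: drop a gateway from the dashboard's iscsi gateway list.
# The gateway name (ansible_hostname) is assumed.
- name: remove the iscsi gateway from the ceph dashboard
  command: "ceph --cluster {{ cluster }} dashboard iscsi-gateway-rm {{ ansible_hostname }}"
  delegate_to: "{{ groups[mon_group_name][0] }}"
```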

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786686

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-08 14:17:56 +01:00
Guillaume Abrioux aabba3baab ceph_volume: support filestore to bluestore migration
This commit adds filestore to bluestore migration support to the
ceph_volume module.

We must append only the relevant options to the executed command,
according to the value passed in `osd_objectstore`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-08 11:48:21 +01:00
Dimitri Savineau 42366f0a6c purge-container-cluster: prune exited containers
Remove all stopped/exited containers.
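A sketch of the pruning step, assuming a `container_binary` fact that holds either `docker` or `podman`. Both engines support `container prune -f`.

```yaml
# Sketch only: remove all stopped/exited containers with the active engine.
- name: remove all stopped/exited containers
  command: "{{ container_binary }} container prune -f"
```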

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-08 11:13:46 +01:00
Dimitri Savineau 254ab54f80 ceph-iscsi: remove python rtslib shaman repository
The rtslib python library is now available in the distribution, so we
shouldn't have to use the shaman repository.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-08 11:13:46 +01:00
Guillaume Abrioux 4f2baaab8c tests: disable nfs testing
nfs-ganesha makes the CI fail because of an SELinux-related issue.

See:
- https://bugzilla.redhat.com/show_bug.cgi?id=1788563
- https://github.com/nfs-ganesha/nfs-ganesha/issues/527

Until we can get this fixed, let's disable nfs-ganesha testing
temporarily.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-08 11:13:46 +01:00
Guillaume Abrioux e665d8e239 tests: upgrade from octopus to octopus
On master we can't test an upgrade from stable-4.0/CentOS 7 to
master/CentOS 8.

This commit refactors the upgrade test so we test an upgrade from
master/CentOS 8 to master/CentOS 8 (octopus to octopus).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-08 11:13:46 +01:00
Dimitri Savineau 7b3e6b932c tests/functional: change docker to podman
Some docker commands were hardcoded in the test playbooks, and some
conditions only checked the atomic fact instead of the
containerized_deployment variable.
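A sketch of the pattern: use the `container_binary` fact instead of a hardcoded `docker`, and gate on `containerized_deployment`. The task itself is illustrative and not one of the actual test tasks.

```yaml
# Sketch only: engine-agnostic command, gated on containerized_deployment.
- name: list the running ceph containers
  command: "{{ container_binary }} ps --filter name=ceph"
  changed_when: false
  when: containerized_deployment | bool
```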

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-08 11:13:46 +01:00
Dimitri Savineau d758125290 ceph-nfs: add ganesha_t type to selinux
Since RHEL 8.1, we need to add the ganesha_t type to the permissive
SELinux list.
Otherwise, the nfs-ganesha service won't start.
This was previously done on RHEL 7 and was part of the
nfs-ganesha-selinux package on RHEL 8.
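A minimal sketch using Ansible's `selinux_permissive` module. It assumes the SELinux Python bindings are present on the target node. The `ansible_os_family` condition is illustrative.

```yaml
# Sketch only: mark the ganesha_t domain as permissive.
- name: add ganesha_t to the permissive domains
  selinux_permissive:
    name: ganesha_t
    permissive: true
  when: ansible_os_family == 'RedHat'
```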

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786110

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-08 11:13:46 +01:00
Dimitri Savineau de8f2a9f83 container: move lvm2 package installation
Before this patch, the lvm2 package was installed by the ceph-osd role.
However, we were running the ceph-volume command in the ceph-config role,
before ceph-osd. If lvm2 wasn't installed, the ceph-volume command
failed:

error checking path "/run/lock/lvm": stat /run/lock/lvm: no such file or
directory

This wasn't visible before because lvm2 was automatically installed as a
docker dependency, which is not the case with podman on CentOS 8.
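A sketch of installing lvm2 early, before any ceph-volume call. Where exactly the task lives in the container setup is not shown here.

```yaml
# Sketch only: make sure lvm2 is present before ceph-volume runs,
# since podman does not pull it in as a dependency.
- name: install lvm2
  package:
    name: lvm2
    state: present
```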

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-08 11:13:46 +01:00
Dimitri Savineau d4fd38c967 ceph-nfs: change ganesha CentOS repository
Since nfs-ganesha builds for CentOS 8 aren't available on shaman at the
moment, we can use the alternative repository at [1].

[1] https://download.nfs-ganesha.org/3/LATEST/CentOS

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-08 11:13:46 +01:00
Guillaume Abrioux 217d95abb2 common: add centos8 support
Ceph octopus only supports CentOS 8.

This commit adds CentOS 8 support:
  - update vagrant image in tox configurations.
  - add CentOS 8 repository for el8 dependencies.
  - CentOS 8 container engine is podman (same as RHEL 8).
  - don't use the epel mirror on sepia because it's epel7 only.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-08 11:13:46 +01:00
Stanley Lam 2ca3364109 ceph-rgw-loadbalancer: Modify keepalived master selection
Currently, the keepalived template only works when the system hostnames exactly match the Ansible inventory names. If they differ, all generated templates are set to BACKUP and no MASTER is assigned. Using inventory_hostname in the template file resolves this issue.
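The selection logic can be sketched as a fact computed from `inventory_hostname`, assuming the first host of the load balancer group becomes MASTER. `rgwloadbalancer_group_name` is an assumed variable name. The real change lives in the keepalived template.

```yaml
# Sketch only: derive the keepalived state from the inventory name,
# not the system hostname. The group variable name is assumed.
- name: set the keepalived state
  set_fact:
    keepalived_state: "{{ 'MASTER' if inventory_hostname == groups[rgwloadbalancer_group_name][0] else 'BACKUP' }}"
```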

Signed-off-by: Stanley Lam <stanleylam_604@hotmail.com>
2020-01-06 09:25:04 -05:00
Guillaume Abrioux 8056514134 filestore-to-bluestore: umount partitions before zapping them
When an OSD is stopped, it leaves partitions mounted.
We must umount them before zapping them; otherwise, errors like "Device
is busy" show up.
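A sketch of the umount step using the `mount` module with `state: unmounted`. The `osd_ids` list is hypothetical. The path follows the usual `/var/lib/ceph/osd/<cluster>-<id>` layout.

```yaml
# Sketch only: umount OSD data partitions before zapping them.
# osd_ids is a hypothetical list of OSD ids on this node.
- name: umount osd data partitions
  mount:
    path: "/var/lib/ceph/osd/{{ cluster }}-{{ item }}"
    state: unmounted
  with_items: "{{ osd_ids }}"
```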

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-19 09:22:25 +01:00
Dimitri Savineau 2c06678cde ceph-infra: replace hardcoded grafana group name
The grafana-server group name was hardcoded in the condition of the
grafana/prometheus firewalld tasks.
We should use the associated variable: grafana_server_group_name.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-12-18 16:09:14 +01:00
Dimitri Savineau f4c261ef90 ceph-infra: move dashboard into a dedicated file
Instead of repeating the dashboard_enabled condition on multiple tasks
in the configure_firewall file, we can evaluate the condition once and
include the dedicated task list.
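A sketch of the single conditional include. The included file name is hypothetical.

```yaml
# Sketch only: evaluate dashboard_enabled once and include a dedicated file.
# The file name is hypothetical.
- name: open dashboard ports
  include_tasks: dashboard_firewall.yml
  when: dashboard_enabled | bool
```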

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-12-18 16:09:14 +01:00
Dimitri Savineau 4535985188 ceph-infra: open dashboard port on monitor
When no mgr group is defined in the ansible inventory, the mgrs are
deployed implicitly on the mon nodes.
If the dashboard is enabled, we need to open the dashboard port on the
node that is running the ceph mgr process (mgr or mon).
The current code only opens that port on the mgr nodes when they are
explicitly present in the inventory, not when they are implicit.
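The condition can be sketched as follows. The port opens on explicit mgr nodes, or on mon nodes when no mgr group exists. The `dashboard_port`, `mgr_group_name` and `mon_group_name` variables are assumed.

```yaml
# Sketch only: open the dashboard port wherever a mgr actually runs.
- name: open dashboard port
  firewalld:
    port: "{{ dashboard_port }}/tcp"
    permanent: true
    immediate: true
    state: enabled
  when: >-
    inventory_hostname in groups.get(mgr_group_name, [])
    or (groups.get(mgr_group_name, []) | length == 0
    and inventory_hostname in groups.get(mon_group_name, []))
```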

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783520

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-12-18 16:09:14 +01:00
Dimitri Savineau b46839bad0 ceph-defaults: regenerate group_vars samples
In fc02fc9, the group_vars samples were regenerated only for the
monitor_address variable, not for radosgw_address.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-12-18 09:03:52 +01:00
Dimitri Savineau 6f0556f015 ceph-defaults: exclude rbd devices from discovery
The RBD devices aren't excluded from the devices list in the LVM auto
discovery scenario.
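A simplified sketch of auto-discovery that skips rbd devices, built from the `ansible_devices` fact. Any other exclusions the real discovery task applies are left out here.

```yaml
# Sketch only: build the device list from ansible_devices, skipping rbd*.
- name: auto discover devices, excluding rbd devices
  set_fact:
    devices: "{{ devices | default([]) + ['/dev/' + item.key] }}"
  with_dict: "{{ ansible_devices }}"
  when: item.key is not match('^rbd')
```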

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783908

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-12-18 09:03:19 +01:00
Guillaume Abrioux fc02fc98eb defaults: change monitor|radosgw_address default values
To avoid confusion, let's change the default value from `0.0.0.0` to
`x.x.x.x`.
Users might think that setting `0.0.0.0` makes the daemon bind to all
interfaces.

Fixes: #4827

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-12 09:58:33 +01:00
Guillaume Abrioux 8e6ef818a2 filestore-to-bluestore: ensure all dm are closed
This commit adds a task to ensure device mappers are properly closed
when the lvm batch scenario is used.
Otherwise, OSDs can't be redeployed because ceph-volume rejects the
devices as locked.

A condition `devices | default([]) | length > 0` is added so these dm
are removed only when using the lvm batch scenario.
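A sketch of such a task using `dmsetup remove`, gated by the condition above. `dm_names` is a hypothetical list of device-mapper names to close.

```yaml
# Sketch only: close leftover device mappers in the lvm batch scenario.
# dm_names is hypothetical.
- name: ensure all dm are closed
  command: "dmsetup remove {{ item }}"
  with_items: "{{ dm_names }}"
  when: devices | default([]) | length > 0
```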

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-11 09:04:41 -05:00
Guillaume Abrioux 51d601193e filestore-to-bluestore: force OSDs to be marked down
Otherwise, it can sometimes take a while for an OSD to be seen as down,
which causes the `ceph osd purge` command to fail.
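A sketch of marking OSDs down explicitly before the purge. `osd_ids` is a hypothetical list of OSD ids.

```yaml
# Sketch only: force OSDs down so the subsequent purge doesn't fail.
- name: mark osds down
  command: "ceph --cluster {{ cluster }} osd down osd.{{ item }}"
  with_items: "{{ osd_ids }}"
  delegate_to: "{{ groups[mon_group_name][0] }}"
```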

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-11 09:04:41 -05:00
Guillaume Abrioux e3305e6bb6 filestore-to-bluestore: do not use --destroy
Do not use `--destroy` when zapping a device.
Otherwise, it destroys VGs while they are still needed to redeploy the
OSDs.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-11 09:04:41 -05:00
Guillaume Abrioux 0dcacdbed0 ceph_volume: add destroy option support
The zap action from ceph_volume module always implies `--destroy`.
This commit adds the destroy option support so we can ask ceph-volume to
not use `--destroy` when zapping a device.
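A sketch of a zap call with the new option. Passing `destroy: false` asks the module not to add `--destroy`, so the VGs stay available to redeploy the OSDs. The device path is hypothetical.

```yaml
# Sketch only: zap without --destroy, keeping the VG in place.
# The device path is hypothetical.
- name: zap the device without destroying its VG
  ceph_volume:
    action: zap
    data: /dev/sdb
    destroy: false
```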

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-11 09:04:41 -05:00
Guillaume Abrioux 4833b85e04 filestore-to-bluestore: add non containerized support
This commit adds support for non-containerized deployments to the
filestore-to-bluestore.yml infrastructure playbook.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-11 09:04:41 -05:00
Guillaume Abrioux 40de34fb5e tests: add filestore_to_bluestore job
This commit adds a new job in order to test the
filestore-to-bluestore.yml infrastructure playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-11 09:04:41 -05:00
Philip Brown 9021c29b61 Add comment on auto-SSL cert generation
Fixes: #4830

Signed-off-by: Philip Brown <phil@bolthole.com>
2019-12-11 10:57:28 +01:00
Dimitri Savineau 68c6f39349 ceph-facts: set use_new_ceph_iscsi on iscsi nodes
We don't need to set the use_new_ceph_iscsi fact on nodes other than
those present in the iscsigws group.
Also remove the duplicate iscsi_gw_group_name condition, which is
already present on the include_task.
Finally, validate the ansible distribution as the first task.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-12-10 23:57:03 +01:00
Guillaume Abrioux 8d0dc34ebe defaults: fix a typo
s/above/below

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-10 09:32:02 -05:00
Guillaume Abrioux d682412e2a ansible.cfg: do not enforce PreferredAuthentications
There's no need to enforce PreferredAuthentications by default.
Users can still choose to override the ansible.cfg with any additional
parameter like this one to fit their infrastructure.

Fixes: #4826

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-09 17:15:09 -05:00
Guillaume Abrioux a234338eff defaults: add a comment
This commit isolates and adds an explicit comment about variables not
intended to be modified by the user.

Fixes: #4828

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-09 13:50:43 -05:00
Guillaume Abrioux 6d9ca6b05b shrink-osd: support fqdn in inventory
When using fqdns in the inventory, that playbook fails because some
tasks use the result of ceph osd tree (which returns short names) to
look up data in hostvars[].

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779021

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-09 10:52:38 -05:00
Guillaume Abrioux 332c39376b switch_to_containers: exclude clients nodes from facts gathering
Just like site.yml and rolling_update, let's exclude client nodes from
fact gathering.
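A sketch of a play that gathers facts everywhere except on clients, using an Ansible host pattern with exclusion. The group name `clients` is assumed.

```yaml
# Sketch only: gather facts on every node except the clients group.
- hosts: all:!clients
  gather_facts: false
  tasks:
    - name: gather facts
      setup:
```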

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-09 10:49:13 -05:00
Guillaume Abrioux d245eb7e7d dashboard: run node_export as privileged container
Typical error:

```
type=AVC msg=audit(1575367499.582:3210): avc:  denied  { search } for  pid=26680 comm="node_exporter" name="1" dev="proc" ino=11528 scontext=system_u:system_r:container_t:s0:c100,c1014 tcontext=system_u:system_r:init_t:s0 tclass=dir permissive=0
```

node_exporter needs to run as a privileged container to avoid the AVC
denied error, since it gathers a lot of information from the host.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1762168

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-09 09:40:13 -05:00
Dimitri Savineau 1a77dd7e91 ceph-validate: start with ansible version test
It doesn't make sense to start validating the configuration if the
ansible version isn't the right one.
This commit makes check_system the first task in the ceph-validate
role.
The ansible version test tasks are moved to the top of this file.
The iscsi kernel tests are also moved from check_system to the
check_iscsi file.
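A sketch of an early version check with Ansible's `version` test. The `2.8` minimum is an assumed value for illustration only.

```yaml
# Sketch only: fail fast on an unsupported Ansible version.
# The 2.8 minimum is assumed.
- name: fail on unsupported ansible version
  fail:
    msg: "Ansible version must be >= 2.8"
  when: ansible_version.full is version('2.8', '<')
```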

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-12-09 09:35:03 +01:00
Dimitri Savineau 12aa8f4025 ceph-facts: move ntp/chrony facts to ceph-infra
The ntp/chrony facts are only used in the ceph-infra role, so we don't
really need to set them in the ceph-facts role.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-12-05 19:46:59 +01:00
Guillaume Abrioux 0756fa467d defaults: change default value for dashboard_admin_password
A recent change in ceph/ceph prevents the password from containing the
username:

`Error EINVAL: Password cannot contain username.`
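A sketch of a validation task that catches this before the dashboard rejects the password. It assumes the `dashboard_admin_user` and `dashboard_admin_password` variables.

```yaml
# Sketch only: reject a dashboard password that contains the username.
- name: fail if dashboard_admin_password contains dashboard_admin_user
  fail:
    msg: "dashboard_admin_password must not contain dashboard_admin_user"
  when: dashboard_admin_user in dashboard_admin_password
```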

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-05 13:02:06 -05:00