Commit Graph

5142 Commits (ab16128506152bdf1b7f68dfaa2017afd9fecb34)
 

Author SHA1 Message Date
Dimitri Savineau 43ffcd7d28 ceph-infra: open dashboard port on monitor
When there's no mgr group defined in the ansible inventory then the
mgrs are deployed implicitly on the mons nodes.
If the dashboard is enabled then we need to open the dashboard port on
the node that is running the ceph mgr process (mgr or mon).
The current code only allow to open that port on the mgr nodes when they
are present explicitly in the inventory but not implicitly.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783520

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4535985188)
2020-01-08 16:15:09 -05:00
Guillaume Abrioux 3caba2c31c dashboard: use fqdn in external url
Force fqdn to be used in external url for prometheus and alertmanager.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765485

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 498bc45859)
2020-01-08 16:14:33 -05:00
Dimitri Savineau d625cefbac tests: use ceph iscsi stable repository
The ceph iscsi repository was still set to dev (shaman) instead of
using the stable ceph-iscsi repository.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-08 19:29:59 +01:00
Dimitri Savineau e4798e22a8 purge-iscsi-gateways: remove node from dashboard
When using the ceph dashboard with iscsi gateways nodes we also need to
remove the nodes from the ceph dashboard list.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786686

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 931a842f21)
2020-01-08 19:29:59 +01:00
Guillaume Abrioux 86bb734397 filestore-to-bluestore: umount partitions before zapping them
When an OSD is stopped, it leaves partitions mounted.
We must umount them before zapping them, otherwise error like "Device is
busy" will show up.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8056514134)
2020-01-08 11:41:48 -05:00
Guillaume Abrioux 27b1fc8981 shrink-mds: do not play ceph-facts entirely
We only need to set `container_binary`.
Let's use `tasks_from` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0ae0a9ce28)
2020-01-08 11:18:45 -05:00
Guillaume Abrioux edbb207680 shrink-mds: use fact from delegated node
The command is delegated on the first monitor so we must use the fact
`container_binary` from this node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 77b39d235b)
2020-01-08 11:18:45 -05:00
Guillaume Abrioux 4cf5c08cd8 facts: use correct python interpreter
that task is delegated on the first mon so we should always use the
`discovered_interpreter_python` from that node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5adb735c78)
2020-01-08 11:18:45 -05:00
Guillaume Abrioux 0eaa66f394 shrink-mds: fix filesystem removal task
This commit deletes the filesystem when no more MDS is present after
shrinking operation.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787543

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 38278a6bb5)
2020-01-08 11:18:45 -05:00
Guillaume Abrioux bfd26e7f78 shrink-mds: ensure max_mds is always honored
This commit prevent from shrinking an mds node when max_mds wouldn't be
honored after that operation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2cfe5a04bf)
2020-01-08 11:18:45 -05:00
Guillaume Abrioux d91c280232 ceph_volume: support filestore to bluestore migration
This commit adds the filestore to bluestore migration support in
ceph_volume module.

We must append to the executed command only the relevant options
according to what is passed in `osd_objectostore`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit aabba3baab)
2020-01-08 11:18:12 -05:00
Guillaume Abrioux 19068659c7 filestore-to-bluestore: ensure all dm are closed
This commit adds a task to ensure device mappers are well closed when
lvm batch scenario is used.
Otherwise, OSDs can't be redeployed given that devices that are rejected
by ceph-volume because they are locked.

Adding a condition `devices | default([]) | length > 0` to remove these
dm only when using lvm batch scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8e6ef818a2)
2019-12-11 16:37:21 +01:00
Guillaume Abrioux 99ac694cc0 filestore-to-bluestore: force OSDs to be marked down
Otherwise, sometimes it can take a while for an OSD to be seen as down
and causes the `ceph osd purge` command to fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 51d601193e)
2019-12-11 16:37:21 +01:00
Guillaume Abrioux 586f6f6262 filestore-to-bluestore: do not use --destroy
Do not use `--destroy` when zapping a device.
Otherwise, it destroys VGs while they are still needed to redeploy the
OSDs.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e3305e6bb6)
2019-12-11 16:37:21 +01:00
Guillaume Abrioux 9d89ae2c77 ceph_volume: add destroy option support
The zap action from ceph_volume module always implies `--destroy`.
This commit adds the destroy option support so we can ask ceph-volume to
not use `--destroy` when zapping a device.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0dcacdbed0)
2019-12-11 16:37:21 +01:00
Guillaume Abrioux d2b1506712 filestore-to-bluestore: add non containerized support
This commit adds the non containerized context support to the
filestore-to-bluestore.yml infrastructure playbook.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4833b85e04)
2019-12-11 16:37:21 +01:00
Guillaume Abrioux 78799ecf55 tests: add filestore_to_bluestore job
This commit adds a new job in order to test the
filestore-to-bluestore.yml infrastructure playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 40de34fb5e)
2019-12-11 16:37:21 +01:00
Guillaume Abrioux 9867937117 ansible.cfg: do not enforce PreferredAuthentications
There's no need to enforce PreferredAuthentications by default.
Users can still choose to override the ansible.cfg with any additional
parameter like this one to fit their infrastructure.

Fixes: #4826

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d682412e2a)
2019-12-11 08:56:44 -05:00
Guillaume Abrioux ba4787817e defaults: change default value for dashboard_admin_password
A recent change in ceph/ceph prevent from having username in the
password:

`Error EINVAL: Password cannot contain username.`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0756fa467d)
2019-12-11 08:48:34 -05:00
Guillaume Abrioux 5062d4094c update: restart iscsigws daemons after upgrade
In containerized context, containers aren't stopped early in the
sequence.
It means they aren't restarted after the upgrade because the task is
just checking the daemon status is started (eg: `state: started`).

This commit also removes the task which ensure services are started
because it's already done in the role ceph-iscsigw.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c7708eb458)
2019-12-11 08:48:34 -05:00
Guillaume Abrioux fe8858af38 upgrade: add dashboard deployment
when upgrading from RHCS 3, dashboard has obviously never been deployed
and it forces us to deploy it later manually.
This commit adds the dashboard deployment as part of the upgrade to
RHCS 4.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779092

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 451c5ca934)
2019-12-11 08:48:34 -05:00
Guillaume Abrioux 00bdb60663 defaults: add a comment
This commit isolates and adds an explicit comment about variables not
intended to be modified by the user.

Fixes: #4828

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a234338eff)
2019-12-10 13:45:18 -05:00
Guillaume Abrioux 6295a33912 dashboard: run node_export as privileged container
Typical error:

```
type=AVC msg=audit(1575367499.582:3210): avc:  denied  { search } for  pid=26680 comm="node_exporter" name="1" dev="proc" ino=11528 scontext=system_u:system_r:container_t:s0:c100,c1014 tcontext=system_u:system_r:init_t:s0 tclass=dir permissive=0
```

node_exporter needs to be run as privileged to avoid avc denied error
since it gathers lot of information on the host.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1762168

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d245eb7e7d)
2019-12-09 17:27:51 +01:00
Dimitri Savineau 0340929ed3 ceph-defaults: exclude md devices from discovery
The md devices (RAID software) aren't excluded from the devices list in
the auto discovery scenario.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1764601

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 014f51c2a4)
2019-12-09 09:32:21 +01:00
Guillaume Abrioux cfc10a8142 facts: avoid duplicated element in devices list
When using `osd_auto_discovery`, `devices` is built multiple times due
to multiple runs of `ceph-facts` role. It end up with duplicate
instances of a same device in the list.

Using `unique` filter when building the list fixes this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 23b1f43897)
2019-12-05 14:51:18 +01:00
Dimitri Savineau 3b26df8c75 purge-cluster: add podman support
The podman support was added to the purge-container-cluster playbook but
containers are always used for the dashboard even on non containerized
deployment.
This commits adds the podman support on purging the dashboard resources
in the purge-cluster playbook.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 89f6cc54a2)
2019-12-04 18:00:07 -05:00
Dimitri Savineau acf2476f09 tests: reduce max_mds from 3 to 2
Having max_mds value equals to the number of mds nodes generates a
warning in the ceph cluster status:

cluster:
id:     6d3e49a4-ab4d-4e03-a7d6-58913b8ec00a'
health: HEALTH_WARN'
        insufficient standby MDS daemons available'
(...)
services:
  mds:     cephfs:3 {0=mds1=up:active,1=mds0=up:active,2=mds2=up:active}'

Let's use 2 active and 1 standby mds.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4a6d19dae2)
2019-12-04 17:49:33 -05:00
Guillaume Abrioux 1c03d2b526 purge: rename playbook (container)
Since we now support podman, let's rename the playbook so it's more
generic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7bc7e3669d)
2019-12-04 09:12:41 -05:00
Dimitri Savineau 98392be368 add-{mon,osd}: run raw install python tasks
If the new mon/osd node doesn't have python installed then we need to
execute the tasks from raw_install_python.yml.

Closes: #4368

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 34b03d1873)
2019-12-04 10:59:39 +01:00
Dimitri Savineau f4e5f3ee9e ceph-grafana: remove ipv6 brakets on wait_for
The wait_for ansible module doesn't support the backets on IPv6 address
so need to remove them.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769710

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 55adc10be3)
2019-12-03 17:12:36 +01:00
Dimitri Savineau a3bfed88c9 ceph-defaults: pin prometheus container tags
In addition to the grafana container tag change, we need to do the same
for the prometheus container stack based on the release present in the
OSE 4.1 container image.

$ docker run --rm openshift4/ose-prometheus-node-exporter:v4.1 --version
node_exporter, version 0.17.0
  build user:       root@67fee13ed48f
  build date:       20191023-14:38:12
  go version:       go1.11.13
$ docker run --rm openshift4/ose-prometheus-alertmanager:4.1 --version
alertmanager, version 0.16.2
  build user:       root@70b79a3f29b6
  build date:       20191023-14:57:30
  go version:       go1.11.13
$ docker run --rm openshift4/ose-prometheus:4.1 --version
prometheus, version 2.7.2
  build user:       root@12da054778a3
  build date:       20191023-14:39:36
  go version:       go1.11.13

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3e29b8d5ff)
2019-12-03 16:19:16 +01:00
Dimitri Savineau a325ff61e8 switch_to_containers: fix umount ceph partitions
When a container is already running on a non containerized node then the
umount ceph partition task is skipped.
This is due to the container ps command which always returns 0 even if
the filter matches nothing.

We should run the umount task when:
1/ the container command is failing (not installed) : rc != 0
2/ the container command reports running ceph-osd containers : rc == 0

Also we should not fail on the ceph directory listing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 39cfe0aa65)
2019-12-03 15:58:36 +01:00
Guillaume Abrioux 1e7fd9fe36 purge: do not try to stop docker when binary is podman
If the container binary is podman, we shouldn't try to stop docker here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b18476a1a6)
2019-12-03 09:57:11 -05:00
Guillaume Abrioux 6592caab08 facts: isolate container_binary facts
in order to be able to call container_binary without having to run the
whole ceph-facts role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fe5ffe589e)
2019-12-03 09:57:11 -05:00
Guillaume Abrioux 1f30327688 purge: remove docker_* task
All containers are removed when systemd stops them.
There is no need to call this module in purge container playbook.

This commit also removes all docker_image task and remove all container
images in the final cleanup play.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1776736

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d23383a820)
2019-12-03 09:57:11 -05:00
Guillaume Abrioux ba2925df32 dashboard: use fqdn url for active alert
When using the shortname, the URL for active alert launches with short
hostname and fails to connect to the server.

This commit changes the template in order to use the fqdn.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765485

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a8d76d72d7)
2019-12-03 09:44:17 -05:00
Guillaume Abrioux e8ed36fdae dashboard: only print dashboard url of the grafana-server node
This commit makes the ceph-dashboard role only printing ceph-dashboard
URL of the nodes present in grafana-server group

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1762163

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cc0c1ce301)
2019-12-03 14:49:12 +01:00
Guillaume Abrioux 88d060f6e1 docker2podman: import ceph-handler role
This is needed to avoid following error:

```
ERROR! The requested handler 'restart ceph mons' was not found in either the main handlers list nor in the listening handlers list
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a43a872105)
2019-12-03 10:44:48 +01:00
Guillaume Abrioux 3bd8129859 docker2podman: do not hardcode group name
let's use `client_group_name` instead of hardcoding the name.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7fe0d55eff)
2019-12-03 10:44:48 +01:00
Guillaume Abrioux c5145ccf25 docker2podman: import ceph-defaults in first play
We must import this role in the first play otherwise the first call to
`client_group_name`fails.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6526a25ab5)
2019-12-03 10:44:48 +01:00
Guillaume Abrioux 1d055c0c0f tests: revert vagrant_variable file name detection
This commit reverts the following change:

fcf181342a (diff-23b6f443c01ea2efcb4f36eedfea9089R7-R14)

this is causing CI failures so this commit is intended to unlock the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5353ab8a23)
2019-11-25 14:52:57 +01:00
VasishtaShastry 4edef59feb Fixes failure of cephfs configuration using --limit
Configuration of cephfs with an existing cluster using --limit used to fail
at different tasks while running with site-docker.yml
This commit addresses both of those tasks

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1773489
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
(cherry picked from commit 72c43cc5d9)
2019-11-20 09:41:38 -05:00
Dimitri Savineau 225d4ceabc container: add always tag on gather fact tasks
If we execute the site-container.yml playbook with specific tags (like
ceph_update_config) then we need to be sure to gather the facts otherwise
we will see error like:

The task includes an option with an undefined variable. The error was:
'ansible_hostname' is undefined

This commit also adds missing 'gather_facts: false' to mons plays.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1754432

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d7fd769b6d)
2019-11-18 16:42:16 +01:00
VasishtaShastry e54b6be74e Evades validation of ceph_repository_type in containerized scenario
This will prevent failure of site-docker.yml with configs in doc.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769760

Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
Co-Authored-By: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9a1f1626c3)
2019-11-18 16:41:10 +01:00
Guillaume Abrioux 8be15a46f5 ceph_key: restore file mode after a key is fetched
when `import_key` is enabled, if the key already exists, it will only be
fetched using ceph cli, if the mode specified in the `ceph_key` task is
different from what is applied by the ceph cli, the mode isn't restored because
we don't call `module.set_fs_attributes_if_different()` before
`module.exit_json(**result)`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1734513

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b717b5f736)
2019-11-15 06:10:40 +01:00
Guillaume Abrioux 99cdcf9d29 tests: add coverage on purge playbook
This commit adds a playbook to be played before we run purge playbook,
it first creates an rbd image then map an rbd device on client0 so the
purge playbook will try to unmap it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit db77fbda15)
2019-11-14 10:49:38 -05:00
Guillaume Abrioux 15b78ae252 purge: use sysfs to unmap rbd devices
in containerized context, using the binary provided in atomic os won't
work because it's an old version provided by ceph-common based on
10.2.5.
Using a container could be an idea but for large cluster with hundreds
of client nodes, that would require to pull the image of each of them
just to unmap the rbd devices.

Let's use the sysfs method in order to avoid any issue related to ceph
version that is shipped on the host.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766064

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3cfcc7a105)
2019-11-14 10:49:38 -05:00
Guillaume Abrioux f3b248e6b9 mergify: remove mergify config on stable-4.0
This commit removes the mergify config on stable-4.0

At the moment there is no need to have a mergify config on this branch
given that we don't use it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-11-07 15:18:06 -05:00
Dimitri Savineau 3ebd71c8c2 ceph-osd: fix fs.aio-max-nr sysctl condition
[1] introduced a regression on the fs.aio-max-nr sysctl value condition.
The enable key isn't a boolean but a string because the expression isn't
evaluated.
This string output "(osd_objectstore == 'bluestore')" is always true
because item.enable condition only matches non empty string. So the
sysctl value was applyied for both filestore and bluestore backend.

[2] added the bool filter to the condition but the filter always returns
false on string and the sysctl wasn't applyed at all.

This commit fixes the enable key value by evaluating the value instead
of using the string.

[1] https://github.com/ceph/ceph-ansible/commit/08a2b58
[2] https://github.com/ceph/ceph-ansible/commit/ab54fe2

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ece46d33be)
2019-11-07 20:31:02 +01:00
Dimitri Savineau 54365389f8 tests/requirements: bump testinfra and pytest
The ansible ssh connections are now using the ssh backend instead of
paramiko starting testinfra 3.1 and persistent connections too.
pytest 4.6 is the latest release to be supported by python 2.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 02df2ab5ea)
2019-11-04 12:58:37 -05:00