Commit Graph

5424 Commits (a6cf646e455c1252412d79f02f1eb5c5ba9d9079)
 

Author SHA1 Message Date
Guillaume Abrioux f9d4eb8b41 facts: refact `ceph_uid` fact
There's no need to set this fact with a `set_fact`
We can achieve this in `ceph-defaults`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1875058

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bcc673f66c)
2020-09-21 13:49:03 -04:00
Dimitri Savineau 1385d2fdd0 ceph-facts: move facts to defaults value
There's no need to define a variable via a fact if we can do it via a
default value. Using a fact could be interesseting to override the
default value on some condition.

- ceph_uid could be set to 167 by default because it's only different on
non containerized deployment on Debian/Ubuntu.
- rbd_client_directory_{owner,group,mode} could be set to ceph,ceph,0770
by default install of null as we are doing in the facts.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1875058

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7f997e623a)
2020-09-21 13:49:03 -04:00
Dimitri Savineau 9412c44906 container: quote registry password
When using a quote in the registry password then we have the following
error:

The error was: ValueError: No closing quotation

To fix this we need to use the quote filter.

Close: https://bugzilla.redhat.com/show_bug.cgi?id=1880252

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6dcfdf17d4)
2020-09-18 15:21:32 -04:00
Guillaume Abrioux 1527b9b12a facts: fix 'set_fact rgw_instances with rgw multisite'
the current condition doesn't work, as soon as the first iteration is
done the condition makes next iterations skip since `rgw_instances` got
set with the first iteration.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859872

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ff19c1d851)
2020-09-18 10:35:28 -04:00
Dimitri Savineau 195ce88e26 ceph-infra: include iscsi nodes for logrotate
The iscsi nodes aren't included in the logrotate condition.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 85643edfe3)
2020-09-17 14:49:56 -04:00
Guillaume Abrioux c60a7ad4f6 infra: support log rotation for tcmu-runner
This commit adds the log rotation support for tcmu-runner.

ceph-container related PR: ceph/ceph-container#1726

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1873915

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f576c02ff7)
2020-09-16 22:37:18 -04:00
Dimitri Savineau fbc375387a container: add optional http(s) proxy option
When using a http(s) proxy with either docker or podman we can rely on
the HTTP_PROXY, HTTPS_PROXY and NO_PROXY environment variables.
But with ansible, even if those variables are defined in a source file
then they aren't loaded during the container pull/login tasks.
This implements the http(s) proxy support with docker/podman.
Both implementations are different:
  1/ docker doesn't rely en the environment variables with the CLI.
Thos are needed by the docker daemon via systemd.
  2/ podman uses the environment variables so we need to add them to
the login/pull tasks.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876692

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bda3581294)
2020-09-16 11:32:24 -04:00
Dimitri Savineau 13fb83fc93 ceph-prometheus: update pool stat counter
Since [1] The bytes_used pool counter in prometheus has been renamed
to stored.

Closes: #5781

[1] https://github.com/ceph/ceph/commit/71fe9149

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e54b924eaf)
2020-09-16 10:08:54 -04:00
Dimitri Savineau 05d4e76d42 switch2container: chown symlink for devices
If the OSD directory is using symlinks for referencing devices (like
block, db, wal for bluestore and journal for filestore) then the chown
command could fail to change the owner:group on some system.

$ ls -hl /var/lib/ceph/osd/ceph-0/
total 28K
lrwxrwxrwx 1 ceph ceph 92 Sep 15 01:53 block -> /dev/ceph-45113532-95ca-471b-bd75-51de46f1339c/osd-data-570a1aee-60c0-44c9-8036-ffed7d67a4e6
-rw------- 1 ceph ceph 37 Sep 15 01:53 ceph_fsid
-rw------- 1 ceph ceph 37 Sep 15 01:53 fsid
-rw------- 1 ceph ceph 55 Sep 15 01:53 keyring
-rw------- 1 ceph ceph  6 Sep 15 01:53 ready
-rw------- 1 ceph ceph  3 Sep 15 02:00 require_osd_release
-rw------- 1 ceph ceph 10 Sep 15 01:53 type
-rw------- 1 ceph ceph  2 Sep 15 01:53 whoami
$ find /var/lib/ceph/osd/ceph-0 -not -user 167 -execdir chown 167:167 {} +
chown: cannot dereference './block': Permission denied
$ find /var/lib/ceph/osd/ceph-0 -not -user 167
/var/lib/ceph/osd/ceph-0/block

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit da4280e243)
2020-09-15 15:30:21 -04:00
Dimitri Savineau dac0415d75 switch2container: remove deb systemd units
When running the switch2container playbook on a Debian based system
then the systemd unit path isn't the same than Red Hat based system.
Because the systemd unit files aren't removed then the new container
systemd unit isn't take in count.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c1af69a7e7)
2020-09-15 15:30:21 -04:00
Dimitri Savineau fd0b9491b6 ansible: bump to ansible 2.9
Prior this commit we were supporting both ansible 2.8 and 2.9.
Let's drop 2.8 now.

Closes: #5459
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1879178

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-15 13:13:09 -04:00
Guillaume Abrioux a88f911155 purge: remove potential socket leftover
This commit ensure we remove any socket left by ceph and the
`ceph-osd-run.sh` script.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1861755

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5e91e0f3e2)
2020-09-14 16:51:00 -04:00
Guillaume Abrioux f31258d604 tests: do not run node_exporter test on clients
We need to skip these tests on client nodes since we don't deploy
node_exporter on them anymore

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5650a6d7d0)
2020-09-14 16:13:25 -04:00
Dimitri Savineau 5cbbc904c1 node-exporter: exclude client nodes
We don't need to install node-exporter on client node because there's
no ceph services running on them.
This also makes sure we use the group name variables in the prometheus
service template instead of hardcoding the values.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b105549ed8)
2020-09-14 16:13:25 -04:00
Guillaume Abrioux edb7bdd911 Revert "Make 'disable ssl for dashboard task' idempotent."
This reverts commit f607857f2a.

> That commit [1] introduced a regression in the dashboard configuration
> because the ceph config get mgr xxxx command doesn't work with
> nautilus.
> In that release the get operation needs an entity.

> [1] f607857

Signed-off-by: Dimitri Savineau dsavinea@redhat.com
2020-09-11 09:37:23 -04:00
Guillaume Abrioux 44e3195ded facts: refact and optimize memory consumption
there's no need to run this task on all nodes.
This uses too much memory for nothing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1856981

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f0fe193d8e)
2020-09-11 09:37:23 -04:00
Guillaume Abrioux 448f36fbbd config: only add related rgw section
there's no need to add each rgw section on all rgw nodes.
With this commit, only related rgw section are rendered.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0a581a6e60)
2020-09-10 20:55:07 -04:00
Dimitri Savineau 6177a87185 ceph-iscsi: remove python rtslib shaman repository
The rtslib python library is now available in the distribution so we
shouldn't have to use the shaman repository

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 254ab54f80)
2020-09-10 20:38:34 -04:00
Dimitri Savineau 47f24ec047 Add CentOS 8 support for rpm deployment
We were only supporting CentOS 8 for containerized deployment.
Since Nautilus 14.2.10 we now have el8 rpm packages so we should be
able to deploy a nautilus ceph cluster with el8.
Note that the nfs-ganesha isn't supported because there's no el8 rpm
packages for nfs-ganesha V2.8.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-10 20:38:34 -04:00
Niko Smeds 67d505af82 Enable HAProxy backend checks for Ceph RGW
Add the `check` option to server definitions to enable basic HAProxy health
checks for Ceph RADOS gateway backends.

Currently traffic will be forwarded to unhealthly `radosgw.service` servers.
These changes resolve the issue.

Signed-off-by: Niko Smeds nikosmeds@gmail.com
(cherry picked from commit a951c1a3f0)
2020-09-10 20:38:01 -04:00
Guillaume Abrioux 97a2640714 dashboard: refact admin user creation task
this commit splits this task in order to avoid using a `shell` module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 54d3e9650f)
2020-09-10 20:37:42 -04:00
George Shuklin f607857f2a Make 'disable ssl for dashboard task' idempotent.
This should reduce number of 'changed' tasks during convergence test.

Signed-off-by: George Shuklin <george.shuklin@gmail.com>
(cherry picked from commit 73d4bb6bd6)
2020-09-10 20:37:26 -04:00
Rafał Wądołowski db71eabeef Comment out ceph_custom_key
Since there is a check if ceph_custom_key is defined, there is no reason
to define it by default.

Signed-off-by: Rafał Wądołowski <rwadolowski@cloudferro.com>
(cherry picked from commit 55cd6e83e4)
2020-09-10 20:37:15 -04:00
Anthony Rusdi 46e4d2aeeb ceph_custom_repo: define apt and rpm key for custom repo
This commit also remove the notify on new added debian repo,
force update_cache to yes and define sample ceph_custom_key vars.

Signed-off-by: Anthony Rusdi <33247310+antrusd@users.noreply.github.com>
(cherry picked from commit 4c592066b7)
2020-09-10 20:37:15 -04:00
Dimitri Savineau df70345e6a ceph-rgw: allow specifying crush rule on pool
We already support specifiying a custom crush rule during pool creation
in ceph-osd role but not in ceph-rgw role.
This patch adds the missing code to implement this feature.
Note this is only available for replicated pool not erasure. The rule
must also exist prior the pool creation.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1855439

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cb8f0237e1)
2020-09-10 20:36:54 -04:00
Dimitri Savineau 43da364188 container: run engine/common roles on first client
We already do this in the site-container.yml playbook because we don't
need docker/podman installed on all client nodes and having the
container image only on the first client node.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8ecbdc6ede)
2020-09-10 20:36:08 -04:00
Dimitri Savineau 62f70f96ca container: don't install the engine on all clients
We only need the container engine to be installed on the first clients
node in order to execute the pools/keys operation. We already do the
same worflow with the ceph-container-common role which pull the ceph
container image.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9805589ef9)
2020-09-10 20:36:08 -04:00
Dimitri Savineau 69b09f9336 Allow updating crush rule on existing pool
The crush rule value was only set once during the pool creation. It was
not possible to update the crush rule value by updating the value in the
configuration.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1847166

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-10 20:35:44 -04:00
Ali Maredia 30d08e1302 rgw: allow rgws to be concurrently with or without multisite
Allows rgws in a ceph cluster to be run with
multisite and without multisite at the same time.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 5c1f4b1a1e)
2020-09-10 20:35:28 -04:00
Guillaume Abrioux 851a89b8fc purge-cluster: use sysfs method for unmapping rbd devices
This way we keep consistency with purge-container-cluster.yml playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f77fa6e2a4)
2020-09-10 20:35:16 -04:00
Dimitri Savineau 0f7da8b9d1 pytest: register ceph_crash mark
Otherwise we see some pytest warning.

PytestUnknownMarkWarning: Unknown pytest.mark.ceph_crash - is this a typo?
You can register custom marks to avoid this warning - for details,
see https://docs.pytest.org/en/latest/mark.html

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 03d4620269)
2020-09-10 20:35:04 -04:00
Dimitri Savineau 182319d58c ceph-handler: add missing condition on ceph-crash
The ceph-crash tasks present in the ceph-handler role don't need to be
executed on all nodes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 18e3c7a0a2)
2020-09-10 20:35:04 -04:00
Guillaume Abrioux e0ad8194db crash: rm container in ExecPreStart even with docker
We should ensure the container is removed in `ExecPreStart` even when
`{{ container_binary }}` is docker.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 39bb279a53)
2020-09-10 20:35:04 -04:00
Guillaume Abrioux 66dde0034b ceph-crash: introduce new role ceph-crash
This commit introduces a new role `ceph-crash` in order to deploy
everything needed for the ceph-crash daemon.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9d2f2108e1)
2020-09-10 20:35:04 -04:00
Dimitri Savineau b745c76491 ceph-facts: only get fsid when monitor are present
When running the rolling_update playbook with an inventory without
monitor nodes defined (like external scenario) then we can't retrieve
the cluster fsid from the running monitor.
In this scenario we have to pass this information manually (group_vars
or host_vars).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1877426

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f63022dfec)
2020-09-10 17:42:28 -04:00
Dimitri Savineau d461631c86 tests: use grafana from quay.io
This changes the grafana container image regitry from docker.io to
quay.io to avoid rate limit.
This also adds the missing container image values for docker2podman
and podman scenarios.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit dd05d8ba90)
2020-09-10 21:37:06 +02:00
Guillaume Abrioux 2001039c0e tests: migrate to quay.ceph.io registry
in order to avoid docker.io rate limiting

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 218aedaab6)
2020-09-10 21:37:06 +02:00
Francesco Pantano 858e50da6b Add --cluster option on ceph require-osd-release command
On DCN environments, or when multiple ceph cluster are configured,
we need to specify the cluster name before running the command or
the rolling_update playbook will fail during minor updates.

Closes: https://bugzilla.redhat.com/1876447
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
(cherry picked from commit cb64df30b6)
2020-09-09 15:11:24 +02:00
Francesco Pantano 2691e385fb Fix hosts field in rolling_update playbook when mds are processed
In the OSP context, during the rolling update the playbook fails
with the following error:

'''
ERROR! The field 'hosts' has an invalid value, which includes an
undefined variable. The error was: list object has no element 0
'''

This PR just change the hosts field providing a valid mons group
value.

Closes: https://bugzilla.redhat.com/1876803
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
(cherry picked from commit e65f9a5c72)
2020-09-09 15:11:02 +02:00
Guillaume Abrioux 2754895b89 tests: move erasure pool testing in lvm_osds
This commit moves the erasure pool creation testing from `all_daemons`
to `lvm_osds` so we can decrease the number of osd nodes we spawn so the
OVH Jenkins slaves aren't less overwhelmed when a `all_daemons` based
scenario is being tested.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8476beb5b1)
2020-08-20 14:16:57 +02:00
John Fulton 5b73af9c34 Set default permission for prometheus config files
Regardless of the outcome of Ansible 2.9.12 issue 71200
we can set a default permission for these files.

Closes: https://github.com/ceph/ceph-ansible/issues/5677

Signed-off-by: John Fulton <fulton@redhat.com>
(cherry picked from commit 95dee6f1ca)
2020-08-18 21:39:56 -04:00
Guillaume Abrioux c7f6d15793 shrink-mds: use mds_to_kill_hostname instead
When using fqdn in inventory host file, this task will fail because the
mds is registered with its shortname.

It means we must use `mds_to_kill_hostname` in this task.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1869837

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 51c382677d)
2020-08-18 15:10:06 -04:00
Guillaume Abrioux 03931362dc mgr: enable pg_autoscaler by default
Otherwise, even though we set the pg autoscaler attribute on a pool, the
feature won't be working as expected.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1836431

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-18 14:49:31 -04:00
Guillaume Abrioux d84161db1a infra: only install logrotate on right nodes
For intsance, there is no need to install logrotate on clients nodes.

This also ensure logrotate is installed only for containerized
deployments since the packaging has an explicit dependency to logrotate

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8ed11ea3ee)
2020-08-18 11:10:38 -04:00
Guillaume Abrioux af36ee479b travis: enforce ansible-lint 4.2.0
Let's pin to 4.2.0

(because of ansible/ansible-lint/issues/966)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 04d77dcaeb)
2020-08-18 10:40:19 -04:00
Guillaume Abrioux 51b51b854a infra: add missing tag
This commit adds the missing `with_pkg` tag on the logrotate
installation task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e1cb385740)
2020-08-13 10:09:40 -04:00
Guillaume Abrioux 886e1d85c7 purge: import ceph-defaults in purge osd play
Otherwise, `ceph_volume_debug` variable is undefined

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 33a544644a)
2020-08-13 14:21:44 +02:00
Guillaume Abrioux 3987a82d29 infra: add log rotation support (containers)
This commit adds the log rotation support via logrotate in containerized
deployments.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1848388

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f1aa6cea21)
2020-08-13 14:21:44 +02:00
Guillaume Abrioux 88c9f6d969 common: don't enable debug log on ceph-volume calls by default
ceph-volume can generate large logs at some point.

debug logs by definition should be enabled only when debugging.

Let's make it customizable with a variable which is set to `False` by
default.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 448cc280b7)
2020-08-13 14:21:44 +02:00
Guillaume Abrioux fa0484d481 nfs: do not copy rgw keyring when `nfs_obj_gw` is true
This keyring shouldn't be copied when `nfs_obj_gw` is `True` if the
cluster doesn't contain a rgw node, which can be the case given we are
using `nfs_obj_gw` instead of `nfs_file_gw` (cephfs vs. object), the
deployment will fail trying to copy a key that doesn't exist.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit dd4b5b0328)
2020-08-12 14:58:13 -04:00