Commit Graph

5493 Commits (11b4bf5083639abea66a874ba86ac38a1b706ca6)
 

Author SHA1 Message Date
Dimitri Savineau a3f4e2b4d1 library/ceph_key: set no_log on secret
We don't need to show this information during the module execution.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-24 09:23:38 -04:00
Dmitriy Rabotyagov 297532ca41 Remove libjemalloc1 installation task
libjemalloc1 package is not required neither for ganesha dependency nor
for the package build process. So this task can be simply dropped.

Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@ya.ru>
2020-09-24 13:56:16 +02:00
Dimitri Savineau 6dcfdf17d4 container: quote registry password
When using a quote in the registry password then we have the following
error:

The error was: ValueError: No closing quotation

To fix this we need to use the quote filter.

Close: https://bugzilla.redhat.com/show_bug.cgi?id=1880252

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-18 11:14:00 -04:00
Guillaume Abrioux ff19c1d851 facts: fix 'set_fact rgw_instances with rgw multisite'
the current condition doesn't work, as soon as the first iteration is
done the condition makes next iterations skip since `rgw_instances` got
set with the first iteration.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859872

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-18 10:14:34 -04:00
Dimitri Savineau 7dfa205610 tests: disable container nfs testing
Looks like nfs-ganesha 3.3 and 4.-dev doesn't work with recent changes
in librgw 16.0.0.
The nfs-ganesha daemon is segfaulting and restart in a loop.

See https://tracker.ceph.com/issues/47520

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-17 16:51:33 -04:00
Dimitri Savineau 85643edfe3 ceph-infra: include iscsi nodes for logrotate
The iscsi nodes aren't included in the logrotate condition.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-17 20:34:56 +02:00
Guillaume Abrioux f576c02ff7 infra: support log rotation for tcmu-runner
This commit adds the log rotation support for tcmu-runner.

ceph-container related PR: ceph/ceph-container#1726

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1873915

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-16 20:23:22 -04:00
Dimitri Savineau e54b924eaf ceph-prometheus: update pool stat counter
Since [1] The bytes_used pool counter in prometheus has been renamed
to stored.

Closes: #5781

[1] https://github.com/ceph/ceph/commit/71fe9149

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-16 09:50:42 -04:00
Dimitri Savineau bda3581294 container: add optional http(s) proxy option
When using a http(s) proxy with either docker or podman we can rely on
the HTTP_PROXY, HTTPS_PROXY and NO_PROXY environment variables.
But with ansible, even if those variables are defined in a source file
then they aren't loaded during the container pull/login tasks.
This implements the http(s) proxy support with docker/podman.
Both implementations are different:
  1/ docker doesn't rely en the environment variables with the CLI.
Thos are needed by the docker daemon via systemd.
  2/ podman uses the environment variables so we need to add them to
the login/pull tasks.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876692

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-16 06:52:26 +02:00
Dimitri Savineau da4280e243 switch2container: chown symlink for devices
If the OSD directory is using symlinks for referencing devices (like
block, db, wal for bluestore and journal for filestore) then the chown
command could fail to change the owner:group on some system.

$ ls -hl /var/lib/ceph/osd/ceph-0/
total 28K
lrwxrwxrwx 1 ceph ceph 92 Sep 15 01:53 block -> /dev/ceph-45113532-95ca-471b-bd75-51de46f1339c/osd-data-570a1aee-60c0-44c9-8036-ffed7d67a4e6
-rw------- 1 ceph ceph 37 Sep 15 01:53 ceph_fsid
-rw------- 1 ceph ceph 37 Sep 15 01:53 fsid
-rw------- 1 ceph ceph 55 Sep 15 01:53 keyring
-rw------- 1 ceph ceph  6 Sep 15 01:53 ready
-rw------- 1 ceph ceph  3 Sep 15 02:00 require_osd_release
-rw------- 1 ceph ceph 10 Sep 15 01:53 type
-rw------- 1 ceph ceph  2 Sep 15 01:53 whoami
$ find /var/lib/ceph/osd/ceph-0 -not -user 167 -execdir chown 167:167 {} +
chown: cannot dereference './block': Permission denied
$ find /var/lib/ceph/osd/ceph-0 -not -user 167
/var/lib/ceph/osd/ceph-0/block

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-15 20:05:49 +02:00
Dimitri Savineau c1af69a7e7 switch2container: remove deb systemd units
When running the switch2container playbook on a Debian based system
then the systemd unit path isn't the same than Red Hat based system.
Because the systemd unit files aren't removed then the new container
systemd unit isn't take in count.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-15 20:05:49 +02:00
Guillaume Abrioux 5e91e0f3e2 purge: remove potential socket leftover
This commit ensure we remove any socket left by ceph and the
`ceph-osd-run.sh` script.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1861755

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-14 15:34:49 -04:00
Dimitri Savineau 3ba11c1434 tests/library: rename ceph_dashboard_user class
Rename the test class with the right information.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-14 15:17:20 -04:00
Guillaume Abrioux 5650a6d7d0 tests: do not run node_exporter test on clients
We need to skip these tests on client nodes since we don't deploy
node_exporter on them anymore

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-14 14:12:21 -04:00
Dimitri Savineau abb4023d76 ceph_key: set state as optional
Most ansible module using a state parameter default to the present
value (when available) instead of using it as a mandatory option.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-14 14:12:21 -04:00
Guillaume Abrioux f0fc59258a Revert "ceph_pool: use default size/min_size and rule_name"
This reverts commit 142934057f.

This is already handled in the ceph_pool module itself

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-14 14:12:21 -04:00
Dimitri Savineau 2c4af70abd dashboard: use run_once at block level
Instead of using run_once: true on each tasks in a block section, we
can use the run_once statement at the block level.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-14 13:47:36 +02:00
Dimitri Savineau b105549ed8 node-exporter: exclude client nodes
We don't need to install node-exporter on client node because there's
no ceph services running on them.
This also makes sure we use the group name variables in the prometheus
service template instead of hardcoding the values.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-14 13:46:51 +02:00
Dimitri Savineau 3a05aeb6cb ceph_pool: set state as optional
Most ansible module using a state parameter default to the present
value (when available) instead of using it as a mandatory option.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-11 10:26:15 +02:00
Dimitri Savineau ee6f0547ba library: add ceph_dashboard_user module
This adds the ceph_dashboard_user ansible module for replacing the
command module usage with the ceph dashboard ac-user-xxx command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-11 10:16:08 +02:00
Dimitri Savineau 142934057f ceph_pool: use default size/min_size and rule_name
Before [1] we were using default value for
  - size
  - min_size
  - rule_name

when the key wasn't present in the pool dict.

The commit [1] changed this by defaulting to omit.

This patch restores the original workflow by using facts:
  - osd_pool_default_size
  - osd_pool_default_min_size
  - ceph_osd_pool_default_crush_rule_name

[1] af9f6684f2

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-11 10:15:28 +02:00
Dimitri Savineau 78cb9f44bd tests: add quay registry for collocation baremetal
Even if the non containerized collocation scenario deploys ceph with
RPMs then we also deploy the dashboard/monitoring but with containers.
This requires to set the registry variable to ceph's quay.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-10 14:23:21 -04:00
Dimitri Savineau 8ecbdc6ede container: run engine/common roles on first client
We already do this in the site-container.yml playbook because we don't
need docker/podman installed on all client nodes and having the
container image only on the first client node.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-10 13:19:44 -04:00
Dimitri Savineau f63022dfec ceph-facts: only get fsid when monitor are present
When running the rolling_update playbook with an inventory without
monitor nodes defined (like external scenario) then we can't retrieve
the cluster fsid from the running monitor.
In this scenario we have to pass this information manually (group_vars
or host_vars).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1877426

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-10 13:19:44 -04:00
Dimitri Savineau 8dacbce68f ceph-rgw: use ceph_pool module
Since [1] we can use the ceph_pool module instead of using the command
module combined with ceph osd pool commands.

[1] bddcb439ce

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-10 15:16:58 +02:00
Dimitri Savineau 98c9afceb9 tests: use grafana from quay.io
This changes the grafana container image regitry from docker.io to
quay.io to avoid rate limit.
This also adds the missing container image values for docker2podman
and podman scenarios.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-09 10:35:02 -04:00
Guillaume Abrioux 657e6c8c3b tests: clean legacy
clean some legacies since quay.ceph.io migration

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-09 14:42:41 +02:00
Francesco Pantano e65f9a5c72 Fix hosts field in rolling_update playbook when mds are processed
In the OSP context, during the rolling update the playbook fails
with the following error:

'''
ERROR! The field 'hosts' has an invalid value, which includes an
undefined variable. The error was: list object has no element 0
'''

This PR just change the hosts field providing a valid mons group
value.

Closes: https://bugzilla.redhat.com/1876803
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
2020-09-08 11:52:08 -04:00
Francesco Pantano cb64df30b6 Add --cluster option on ceph require-osd-release command
On DCN environments, or when multiple ceph cluster are configured,
we need to specify the cluster name before running the command or
the rolling_update playbook will fail during minor updates.

Closes: https://bugzilla.redhat.com/1876447
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
2020-09-07 16:31:14 +02:00
Guillaume Abrioux 7348e9a253 tests: disable nfs-ganesha testing
This commit diables nfs-ganesha testing on master for non-containerized
deployment because the dev repos are broken at the moment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-07 12:54:29 +02:00
Guillaume Abrioux 2cbb7de3b2 tests: migrate to quay.ceph.io registry
in order to avoid docker.io rate limiting

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-07 12:54:29 +02:00
Dai Dang Van ae38b01d08 Fix typo shrink osd file name in day-2 docs
Signed-off-by: Dai Dang Van <daikk115@gmail.com>
2020-09-03 09:20:47 -04:00
Dimitri Savineau 4f308dcf4a tests: reenable ceph-iscsi testing
This re-adds the ceph-iscsi testing for both non containerized and
containerized deployment since the rados connection error on ceph
dev has been fixed [1].

[1] https://tracker.ceph.com/issues/47002

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-08-27 11:13:36 -04:00
Niko Smeds a951c1a3f0 Enable HAProxy backend checks for Ceph RGW
Add the `check` option to server definitions to enable basic HAProxy health
checks for Ceph RADOS gateway backends.

Currently traffic will be forwarded to unhealthly `radosgw.service` servers.
These changes resolve the issue.

Signed-off-by: Niko Smeds nikosmeds@gmail.com
2020-08-27 10:57:46 -04:00
Guillaume Abrioux cec994b973 rolling_update: remove 'ignore_errors'
There's no need to use `ignore_errors: true` on these tasks.

Using a loop on the task stopping mon daemons allows us to avoid
duplicating this task, the `ignore_errors` isn't needed here because it
won't fail the playbook if one of the ID doesn't exist (shortname vs. fqdn)

Using the right condition on the task starting the mgr daemon allows us
to avoid using an `ignore_errors: true` as well.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-21 09:22:36 -04:00
Guillaume Abrioux 13e2311cbe ceph_key: refact the code and minor fixes
This commit refactors the code to remove a duplicate condition and it
makes the `state: absent` code idempotent

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-21 09:44:47 +02:00
Guillaume Abrioux 27ca884d99 tests: add more coverage for test_ceph_key
This commit adds more coverage regarding the testing of ceph_key module

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-21 09:44:47 +02:00
Guillaume Abrioux 54d3e9650f dashboard: refact admin user creation task
this commit splits this task in order to avoid using a `shell` module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-21 09:22:11 +02:00
Guillaume Abrioux f0fe193d8e facts: refact and optimize memory consumption
there's no need to run this task on all nodes.
This uses too much memory for nothing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1856981

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-20 11:16:26 -04:00
Dimitri Savineau 6c11695fbe tests: reenable nfs-ganesha testing
This re-adds the nfs-ganesha testing in non containerized deployment.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-08-20 16:58:54 +02:00
George Shuklin 73d4bb6bd6 Make 'disable ssl for dashboard task' idempotent.
This should reduce number of 'changed' tasks during convergence test.

Signed-off-by: George Shuklin <george.shuklin@gmail.com>
2020-08-20 16:48:32 +02:00
Rafał Wądołowski 55cd6e83e4 Comment out ceph_custom_key
Since there is a check if ceph_custom_key is defined, there is no reason
to define it by default.

Signed-off-by: Rafał Wądołowski <rwadolowski@cloudferro.com>
2020-08-20 13:36:24 +02:00
Guillaume Abrioux 899d317196 iscsigw: add retry/until
In order to avoid failures that could be fixed by simply
retrying.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-20 13:25:05 +02:00
Guillaume Abrioux 8476beb5b1 tests: move erasure pool testing in lvm_osds
This commit moves the erasure pool creation testing from `all_daemons`
to `lvm_osds` so we can decrease the number of osd nodes we spawn so the
OVH Jenkins slaves aren't less overwhelmed when a `all_daemons` based
scenario is being tested.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-20 11:50:28 +02:00
John Fulton 95dee6f1ca Set default permission for prometheus config files
Regardless of the outcome of Ansible 2.9.12 issue 71200
we can set a default permission for these files.

Closes: https://github.com/ceph/ceph-ansible/issues/5677

Signed-off-by: John Fulton <fulton@redhat.com>
2020-08-18 15:49:31 -04:00
Guillaume Abrioux 51c382677d shrink-mds: use mds_to_kill_hostname instead
When using fqdn in inventory host file, this task will fail because the
mds is registered with its shortname.

It means we must use `mds_to_kill_hostname` in this task.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1869837

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-18 14:56:57 -04:00
Guillaume Abrioux 8ed11ea3ee infra: only install logrotate on right nodes
For intsance, there is no need to install logrotate on clients nodes.

This also ensure logrotate is installed only for containerized
deployments since the packaging has an explicit dependency to logrotate

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-18 10:56:09 -04:00
Guillaume Abrioux 04d77dcaeb travis: enforce ansible-lint 4.2.0
Let's pin to 4.2.0

(because of ansible/ansible-lint/issues/966)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-18 10:29:19 -04:00
Guillaume Abrioux 093e1dcb21 tests: remove hosts-ubuntu inventories
Since we've dropped ubuntu testing, we don't need these inventories
anymore. Let's remove this leftover.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-18 11:20:48 +02:00
Guillaume Abrioux bd9e126357 tests: disable iscsigw testing (container)
Temporarily disable iscsigw testing for containerized deployments
because it's broken upstream on ceph@master.
non-containerized deployments use stable build for iscsigw to get around
this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-18 11:20:48 +02:00