Commit Graph

2857 Commits (1b49988e4aa311559287eb51dc4040584ad1472a)

Author SHA1 Message Date
Guillaume Abrioux b5985d2e83 common: drop `fetch_directory` feature
This commit drops the `fetch_directory` feature.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1cc9666c09)
2020-10-21 18:28:25 -04:00
Guillaume Abrioux 76be9a4292 ceph-config: ceph.conf rendering refactor
This commit cleans up the `main.yml` task file of `ceph-config`.
It drops the local ceph.conf generation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 900c0f4492)
2020-10-21 18:28:25 -04:00
Guillaume Abrioux 3eed44907b iscsi: fix ownership on iscsi-gateway.cfg
This file is currently deployed with '0644' ownership making this file
readable by any user on the system.
Since it contains sensitive information it should be readable by the
owner only.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1890119

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a822f77300)
2020-10-21 18:27:50 -04:00
Guillaume Abrioux a6dac8c93d crash: refact caps definition
there is no need to use `{{ }}` syntax here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a8bd947c7d)
2020-10-20 09:09:14 +02:00
Benoît Knecht 5e67492ef4 ceph-osd: Fix check mode for start osds tasks
Correctly set `osd_ids_non_container.stdout_lines` to an empty list if it's
undefined (i.e. in check mode).

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 8b0023cb77)
2020-10-19 22:53:20 +02:00
Benoît Knecht 9f5ec22d34 ceph-mon: Fix check mode for deploy monitor tasks
Skip the `get initial keyring when it already exists` task when both commands
whose `stdout` output it requires have been skipped (e.g. when running in check
mode).

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 8f436ab5d8)
2020-10-19 22:53:20 +02:00
Gaudenz Steinlin f8a64ce452 ceph-crash: Only deploy key to targeted hosts
The current task installs the ceph-crash key to "most" hosts via
"delegate_to". This key is only used by the ceph-crash daemon and should
just be installed on all hosts targeted by this role. There is no need
for using a delegated task.

Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>
(cherry picked from commit 68cc93fb18)
2020-10-19 20:20:25 +02:00
Guillaume Abrioux 0c66f90968 ceph-osd: start osd after systemd overrides
The service should be started after the ceph-osd systemd overrides has
been added, otherwise, the latter isn't considered.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1860739

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 59d0f01992)
2020-10-15 13:52:35 +02:00
Dimitri Savineau 3f610811fe ceph-osd: don't start the OSD services twice
Using the + operation on two lists doesn't filter out the duplicate
keys.
Currently each OSDs is started (via systemd) twice.
Instead we could use the union filter.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4eaa65c362)
2020-10-14 09:57:52 -04:00
Guillaume Abrioux d258bf4d2d handler: refact check_socket_non_container
the `stat --printf=%n` returns something like following:

```
ok: [osd0] => changed=false
  cmd: |-
    stat --printf=%n /var/run/ceph/ceph-osd*.asok
  delta: '0:00:00.009388'
  end: '2020-10-06 06:18:28.109500'
  failed_when_result: false
  rc: 0
  start: '2020-10-06 06:18:28.100112'
  stderr: ''
  stderr_lines: <omitted>
  stdout: /var/run/ceph/ceph-osd.2.asok/var/run/ceph/ceph-osd.5.asok
  stdout_lines: <omitted>
```

it makes the next task "check if the ceph osd socket is in-use" grep
like this:

```
ok: [osd0] => changed=false
  cmd:
  - grep
  - -q
  - /var/run/ceph/ceph-osd.2.asok/var/run/ceph/ceph-osd.5.asok
  - /proc/net/unix
```

which will obviously fail because this path never exists. It makes the
OSD handler broken.

Let's use `find` module instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 46d4d97da9)
2020-10-09 13:55:28 +02:00
Benoît Knecht c733af9d43 Fix Ansible check mode for site.yml.sample playbook
Make sure the `site.yml.sample` playbook can be run in check mode by skipping
tasks that try to read the output of commands that have been skipped.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 54ba38e35e)
2020-10-07 07:06:19 +02:00
Dimitri Savineau 2185a2201d library: add radosgw_zone module
This adds radosgw_zone ansible module for replacing the command module
usage with the radosgw-admin zone command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1281e8bcc8)
2020-10-06 15:00:17 +02:00
Dimitri Savineau 5a371b3607 library: add radosgw_zonegroup module
This adds radosgw_zonegroup ansible module for replacing the command
module usage with the radosgw-admin zonegroup command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 65dbe0782e)
2020-10-06 15:00:17 +02:00
Dimitri Savineau 1643210ca6 library: add radosgw_realm module
This adds radosgw_realm ansible module for replacing the command module
usage with the radosgw-admin realm command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d171f4068d)
2020-10-06 15:00:17 +02:00
Dimitri Savineau 858169a27d library: add radosgw_user module
This adds radosgw_user ansible module for replacing the command module
usage with the radosgw-admin user command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 235c7e27cc)
2020-10-06 15:00:17 +02:00
Dimitri Savineau 4fc2d788b4 library: add ceph_fs module
This adds the ceph_fs ansible module for replacing the command module
usage with the ceph fs command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bd611a785b)
2020-10-06 14:59:49 +02:00
Guillaume Abrioux 2a3b563c7e rgw: fix multi instances scaleout in baremetal
When rgw and osd are collocated, the current workflow prevents from
scaling out the radosgw_num_instances parameter when rerunning the
playbook in baremetal deployments.

When ceph-osd notifies handlers, it means rgw handlers are triggered
too. The issue with this is that they are triggered before the role
ceph-rgw is run.
In the case a scaleout operation is expected on `radosgw_num_instances`
it causes an issue because keyrings haven't been created yet so the new
instances won't start.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881313

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a802fa2810)
2020-10-06 10:31:34 +02:00
Dimitri Savineau a5f19b7864 ceph_key: remove backward compatibility
It's time to remove this backward compatibility. Users had enough time
to convert their openstack_keys and key values.
We now fail in ceph-validate if the caps key isn't set.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c960362639)
2020-10-06 10:09:16 +02:00
Guillaume Abrioux 32be163360 ceph-osd: refact `docker_exec_start_osd`
This commit drops nested jinja construction in this set_fact task.
It also rename it to `container_exec_start_osd`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ff95fa9c32)
2020-10-06 09:54:50 +02:00
Guillaume Abrioux 80879df44d defaults: change defaults value
this commit changes defaults value in default pool definitions.

there's no need to define `pg_num`, `pgp_num`, `size` and `min_size`,
`ceph_pool` module will use the current default if needed.

This also drops the 3 following `set_fact` in `ceph-facts`:

- osd_pool_default_pg_num,
- osd_pool_default_pgp_num,
- osd_pool_default_size_num

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c101cb3931)
2020-10-02 09:32:53 +02:00
Guillaume Abrioux cb44f655fc ceph_pool: refact module
remove complexity about current defaults in running cluster

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 29fc115f4a)
2020-10-02 09:32:53 +02:00
Seena Fallah 10fc2d1d92 ceph-facts: add get default crush rule from running monitor
In case of deploying new monitor node to an existing cluster,
osd_pool_default_crush_rule should be taken from running monitor because
ceph-osd role won't be run and the new monitor will have different
osd_pool_default_crush_role from other monitors.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit ff9f4d138f)
2020-09-29 12:15:09 -04:00
Ali Maredia 9f58d4a3d1 rgw multisite: check connection for realm endpoint
This commit adds connection checks before realm pulls
Curls are performed on the endpoint being pulled from
the mons and the rgws

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1731158

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 902575369c)
2020-09-29 09:24:59 -04:00
Tyler Bishop e3284b20ac facts: support device aliases for (dedicated|bluestore_wal)_devices
Just likve `devices`, this commit adds the support for linux device aliases for
`dedicated_devices` and `bluestore_wal_devices`.

Signed-off-by: Tyler Bishop <tbishop@liquidweb.com>
(cherry picked from commit ee4b8804ae)
2020-09-29 09:24:17 -04:00
Dimitri Savineau 6294244c4f ceph-handler: set handler on xxx_stat result
In non containerized deployment we check if the service is running
via the socket file presence.
This is done via the xxx_socket_stat variable that check the file
socket in the /var/run/ceph/ directory.
In some scenarios, we could have the socket file still present in
that directory but not used by any process.
That's why we have the xxx_stat variable which clean those leftovers.

The problem here is that we're set the variable for the handlers status
(like handler_mon_status) based on xxx_socket_stat instead of xxx_stat.
That means we will trigger the handlers if there's an old socket file
present on the system without any process associated.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1866834

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 733596582d)
2020-09-29 09:23:45 -04:00
Dimitri Savineau 77c5115ad2 ceph-iscsi: create pool once from monitor
af9f6684 introduced a regression on the ceph iscsi pool creation
because it was delegated to the first monitor node before that change.
This patch restores the initial worflow.
When the iscsi node doesn't have the admin keyring then the pool
creation fails.
This commit also ensures that the pool creation is only executed once
when having multiple iscsi nodes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 501b8e0fd3)
2020-09-29 09:23:14 -04:00
Dimitri Savineau babc8a05fd ceph-mds: remove unused block condition
Since af9f6684 the cephfs pool(s) creation don't use the fs_pools_created
variable anymore because the ceph_pool module is idempotent.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1db4dc807c)
2020-09-28 20:39:25 -04:00
Seena Fallah 9b0f45431d ceph-facts: check for mon socket in its own host
delegate to its own host after checking mon socket to findout if mon socket is in-use or not.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit 69f7e35382)
2020-09-28 20:38:52 -04:00
Guillaume Abrioux 5538dd8b3b Revert "ceph-rgw: remove ceph_pool state and default value"
This reverts commit ba3512a8fc.

(cherry picked from commit bf7b044c9a)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-28 16:47:45 -04:00
Dimitri Savineau 7e2e11320d ceph-rgw: remove ceph_pool state and default value
Since the state is now optional and default values are handled in the
ceph_pool module itself.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ba3512a8fc)
2020-09-25 14:05:34 -04:00
Dimitri Savineau 8d49d97582 ceph-config: remove ceph_release from ceph.conf.j2
We don't use ceph_release variable in the ceph.conf jinja template.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 62bd41f0d4)
2020-09-25 13:37:36 -04:00
Dmitriy Rabotyagov f996a232d7 Remove libjemalloc1 installation task
libjemalloc1 package is not required neither for ganesha dependency nor
for the package build process. So this task can be simply dropped.

Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@ya.ru>
(cherry picked from commit 297532ca41)
2020-09-24 14:02:27 -04:00
Guillaume Abrioux b714e04d91 facts: refact `ceph_uid` fact
There's no need to set this fact with a `set_fact`
We can achieve this in `ceph-defaults`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bcc673f66c)
2020-09-21 12:26:19 -04:00
Dimitri Savineau c0654fa37a container: quote registry password
When using a quote in the registry password then we have the following
error:

The error was: ValueError: No closing quotation

To fix this we need to use the quote filter.

Close: https://bugzilla.redhat.com/show_bug.cgi?id=1880252

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6dcfdf17d4)
2020-09-18 14:25:44 -04:00
Guillaume Abrioux 7e29b22c76 facts: fix 'set_fact rgw_instances with rgw multisite'
the current condition doesn't work, as soon as the first iteration is
done the condition makes next iterations skip since `rgw_instances` got
set with the first iteration.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859872

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ff19c1d851)
2020-09-18 10:35:20 -04:00
Dimitri Savineau fb43400903 ceph-infra: include iscsi nodes for logrotate
The iscsi nodes aren't included in the logrotate condition.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 85643edfe3)
2020-09-17 14:49:35 -04:00
Guillaume Abrioux 3e53d2a811 infra: support log rotation for tcmu-runner
This commit adds the log rotation support for tcmu-runner.

ceph-container related PR: ceph/ceph-container#1726

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1873915

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f576c02ff7)
2020-09-16 21:35:48 -04:00
Dimitri Savineau 8f175eb60c container: add optional http(s) proxy option
When using a http(s) proxy with either docker or podman we can rely on
the HTTP_PROXY, HTTPS_PROXY and NO_PROXY environment variables.
But with ansible, even if those variables are defined in a source file
then they aren't loaded during the container pull/login tasks.
This implements the http(s) proxy support with docker/podman.
Both implementations are different:
  1/ docker doesn't rely en the environment variables with the CLI.
Thos are needed by the docker daemon via systemd.
  2/ podman uses the environment variables so we need to add them to
the login/pull tasks.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876692

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bda3581294)
2020-09-16 11:32:14 -04:00
Dimitri Savineau 95a073cb3b ceph-prometheus: update pool stat counter
Since [1] The bytes_used pool counter in prometheus has been renamed
to stored.

Closes: #5781

[1] https://github.com/ceph/ceph/commit/71fe9149

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e54b924eaf)
2020-09-16 10:08:46 -04:00
Dimitri Savineau 25ba7f5314 node-exporter: exclude client nodes
We don't need to install node-exporter on client node because there's
no ceph services running on them.
This also makes sure we use the group name variables in the prometheus
service template instead of hardcoding the values.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b105549ed8)
2020-09-14 16:13:11 -04:00
Dimitri Savineau 24698e7f4b dashboard: use run_once at block level
Instead of using run_once: true on each tasks in a block section, we
can use the run_once statement at the block level.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2c4af70abd)
2020-09-14 15:54:22 -04:00
Dimitri Savineau 23522a11e4 ceph_key: set state as optional
Most ansible module using a state parameter default to the present
value (when available) instead of using it as a mandatory option.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit abb4023d76)
2020-09-14 15:37:56 -04:00
Dimitri Savineau e785654632 ceph_pool: set state as optional
Most ansible module using a state parameter default to the present
value (when available) instead of using it as a mandatory option.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3a05aeb6cb)
2020-09-14 15:37:56 -04:00
Dimitri Savineau 6fd6b31305 library: add ceph_dashboard_user module
This adds the ceph_dashboard_user ansible module for replacing the
command module usage with the ceph dashboard ac-user-xxx command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ee6f0547ba)
2020-09-11 09:08:56 -04:00
Guillaume Abrioux 828817489c facts: refact and optimize memory consumption
there's no need to run this task on all nodes.
This uses too much memory for nothing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1856981

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f0fe193d8e)
2020-09-10 22:52:38 -04:00
Dimitri Savineau 593264e5f7 ceph-rgw: use ceph_pool module
Since [1] we can use the ceph_pool module instead of using the command
module combined with ceph osd pool commands.

[1] bddcb439ce

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8dacbce68f)
2020-09-10 21:44:19 -04:00
Dimitri Savineau 0c0a930374 ceph-facts: only get fsid when monitor are present
When running the rolling_update playbook with an inventory without
monitor nodes defined (like external scenario) then we can't retrieve
the cluster fsid from the running monitor.
In this scenario we have to pass this information manually (group_vars
or host_vars).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1877426

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f63022dfec)
2020-09-10 20:57:16 +02:00
Niko Smeds a41c572785 Enable HAProxy backend checks for Ceph RGW
Add the `check` option to server definitions to enable basic HAProxy health
checks for Ceph RADOS gateway backends.

Currently traffic will be forwarded to unhealthly `radosgw.service` servers.
These changes resolve the issue.

Signed-off-by: Niko Smeds nikosmeds@gmail.com
(cherry picked from commit a951c1a3f0)
2020-09-02 09:54:52 -04:00
Guillaume Abrioux 8fde5f7396 dashboard: refact admin user creation task
this commit splits this task in order to avoid using a `shell` module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 54d3e9650f)
2020-08-21 13:55:54 +02:00
George Shuklin c0d98878ff Make 'disable ssl for dashboard task' idempotent.
This should reduce number of 'changed' tasks during convergence test.

Signed-off-by: George Shuklin <george.shuklin@gmail.com>
(cherry picked from commit 73d4bb6bd6)
2020-08-20 17:16:45 +02:00
Rafał Wądołowski 21a37e23b3 Comment out ceph_custom_key
Since there is a check if ceph_custom_key is defined, there is no reason
to define it by default.

Signed-off-by: Rafał Wądołowski <rwadolowski@cloudferro.com>
(cherry picked from commit 55cd6e83e4)
2020-08-20 13:43:44 +02:00
John Fulton 489efd5689 Set default permission for prometheus config files
Regardless of the outcome of Ansible 2.9.12 issue 71200
we can set a default permission for these files.

Closes: https://github.com/ceph/ceph-ansible/issues/5677

Signed-off-by: John Fulton <fulton@redhat.com>
(cherry picked from commit 95dee6f1ca)
2020-08-18 18:04:17 -04:00
Guillaume Abrioux 3fad1677d6 infra: only install logrotate on right nodes
For intsance, there is no need to install logrotate on clients nodes.

This also ensure logrotate is installed only for containerized
deployments since the packaging has an explicit dependency to logrotate

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8ed11ea3ee)
2020-08-18 11:10:57 -04:00
Dimitri Savineau e9c6028eb9 ceph-rgw: allow specifying crush rule on pool
We already support specifiying a custom crush rule during pool creation
in ceph-osd role but not in ceph-rgw role.
This patch adds the missing code to implement this feature.
Note this is only available for replicated pool not erasure. The rule
must also exist prior the pool creation.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1855439

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cb8f0237e1)
2020-08-17 23:00:13 +02:00
Ali Maredia 63d991dc3d rgw: allow rgws to be concurrently with or without multisite
Allows rgws in a ceph cluster to be run with
multisite and without multisite at the same time.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 5c1f4b1a1e)
2020-08-17 13:56:45 +02:00
Guillaume Abrioux 2609da6ce7 infra: add missing tag
This commit adds the missing `with_pkg` tag on the logrotate
installation task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e1cb385740)
2020-08-13 10:09:31 -04:00
Guillaume Abrioux 29d4c42f80 infra: add log rotation support (containers)
This commit adds the log rotation support via logrotate in containerized
deployments.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1848388

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f1aa6cea21)
2020-08-12 22:57:10 +02:00
Guillaume Abrioux 8a7e4193db common: don't enable debug log on ceph-volume calls by default
ceph-volume can generate large logs at some point.

debug logs by definition should be enabled only when debugging.

Let's make it customizable with a variable which is set to `False` by
default.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 448cc280b7)
2020-08-12 22:57:10 +02:00
Guillaume Abrioux 223254e8bf nfs: do not copy rgw keyring when `nfs_obj_gw` is true
This keyring shouldn't be copied when `nfs_obj_gw` is `True` if the
cluster doesn't contain a rgw node, which can be the case given we are
using `nfs_obj_gw` instead of `nfs_file_gw` (cephfs vs. object), the
deployment will fail trying to copy a key that doesn't exist.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit dd4b5b0328)
2020-08-12 14:57:56 -04:00
raul 5fc3af5f4d rgw: support 1+ rgw instance in `radosgw_frontend_port`
Change the radosgw_frontend_port to take in account more than 1 RGW instance,
in it's original form `radosgw_frontend_port: radosgw_frontend_port | int`,
it configured the 8080 port to all instances, with the following modification
`radosgw_frontend_port: radosgw_frontend_port | int + item|int` we increase in
1 the port count.

Co-authored-by: Daniel Parkes <dparkes@redhat.com>
Signed-off-by: raul <rmahique@redhat.com>
(cherry picked from commit 110eaf5f9f)
2020-08-12 14:57:35 -04:00
Guillaume Abrioux e0dc56b73c config: only add related rgw section
there's no need to add each rgw section on all rgw nodes.
With this commit, only related rgw section are rendered.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0a581a6e60)
2020-08-05 09:50:09 -04:00
Dimitri Savineau f9a24e2541 dashboard: allow remote TLS cert/key copy
When using TLS on the ceph dashboard or grafana services, we can provide
the TLS certificate and key.
Those files should be present on the ansible controller and they will be
copyied to the right node(s).
In some situation, the TLS certificate and key could be already present
on the target node and not on the ansible controller.
For this scenario, we just need to copy the files locally (on each remote
host).

This patch adds the dashboard_tls_external variable (with default to
false) to allow users to achieve this scenario when configuring this
variable to true.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1860815

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0d0f1e71df)
2020-08-04 14:01:59 +02:00
Dimitri Savineau 56cf7168fa ceph-handler: remove iscsigws restart scripts
The iscsigws restart scripts for tcmu-runner and rbd-target-{api,gw}
services only call the systemctl restart command.
We don't really need to copy a shell script to do it when we can use
the ansible service module instead.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cbe79428e6)
2020-07-25 09:34:25 +02:00
Dimitri Savineau 2faed4c204 podman: always remove container on start
In case of failure, the systemd ExecStop isn't executed so the container
isn't removed. After a reboot of a failed node, the container doesn't
start because the old container is still present in created state.
We should always try to remove the container in ExecStartPre for this
situation.
A normal reboot doesn't trigger this issue and this also doesn't affect
nodes running containers via docker.
This behaviour was introduced by d43769d.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1858865

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 47b7c00287)
2020-07-24 12:47:01 -04:00
Dimitri Savineau d5974086dd ceph-facts: remove mds_name fact
The mds_name fact always gets the ansible_hostname value so we don't
need to have a dedicated fact for this and use the ansible_hostname fact
instead.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4e84b4beed)
2020-07-24 10:50:44 -04:00
Dimitri Savineau c694454f82 ceph-handler: add missing condition on ceph-crash
The ceph-crash tasks present in the ceph-handler role don't need to be
executed on all nodes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 18e3c7a0a2)
2020-07-22 18:47:01 -04:00
Guillaume Abrioux c0b32e4a79 crash: rm container in ExecPreStart even with docker
We should ensure the container is removed in `ExecPreStart` even when
`{{ container_binary }}` is docker.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 39bb279a53)
2020-07-22 18:47:01 -04:00
Guillaume Abrioux e6059fdcd3 ceph-crash: introduce new role ceph-crash
This commit introduces a new role `ceph-crash` in order to deploy
everything needed for the ceph-crash daemon.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9d2f2108e1)
2020-07-22 18:47:01 -04:00
Guillaume Abrioux bd12158a1c facts: fix broken facts when using --limit
This commit fixes these tasks when --limit is used.

It makes sure the fact is set on right nodes even when the playbook is
run with `--limit`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f8a951f50c)
2020-07-20 22:49:43 -04:00
Dimitri Savineau b11eeed833 ceph-dashboard: copy TLS cert/key on monitor
The ceph-dashboard role is executed on the mgr nodes so the TLS cert/key
files are copied to those nodes.
But we are running importing the cert/key files into the ceph
configuration on the monitor.

Closes: #5557

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2b8ebf1457)
2020-07-20 22:49:20 -04:00
Guillaume Abrioux c0b8edfea2 rgw: set container memory limit to 4g
This commit changes the container memory limit for rgw daemons.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1707488

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 86edae724f)
2020-07-10 10:10:32 -04:00
Dimitri Savineau 16c5c93411 ceph-nfs: change ganesha devel source
The download.nfs-ganesha.org source for nfs-ganesha on CentOS isn't
available anymore.
Let's switch back to shaman since we have builds available now.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1438ca0120)
2020-07-06 11:01:13 -04:00
Dimitri Savineau df3acbd974 ceph-defaults: update nfs-ganesha to 3.3
nfs-ganesha 3.3 is the latest 3.x release available for octopus so we
should update to this version.

https://download.ceph.com/nfs-ganesha/rpm-V3.3-stable/octopus

This will also match the version used in RHCS 5.

Ceph container already uses that version too.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 93754bd70c)
2020-07-03 09:05:12 +02:00
Dimitri Savineau 596f3fa161 radosgw: remove INST_PORT environment variable
This variable isn't consumed by the container so we can remove it.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1361e84a4e)
2020-07-03 06:37:34 +02:00
Guillaume Abrioux cdf61540d8 rgw: fix multi instances scaleout
When rgw and osd are collocated, the current workflow prevents from
scaling out the radosgw_num_instances parameter when rerunning the
playbook.

The environment file used in the rgw systemd template is rendered when
executing the `ceph-rgw` role but during a new run of the playbook (in
order to scale out rgw instances), handlers are triggered from `ceph-osd`
role which is run before `ceph-rgw`, therefore it tries to start the new
rgw daemon whereas its corresponding environment file hasn't been
rendered yet and fails like following:

```
ceph-radosgw@rgw.ceph4osd3.rgw1.service failed to run 'start-pre' task: No such file or directory
```

This commit moves the tasks generating this file in `ceph-config` role
so it is generated early.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1851906

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7dd68b9ac1)
2020-07-03 06:37:34 +02:00
Dimitri Savineau 48fd6e6b16 ceph-common: remove copr and sepia repositories
All EL8 dependencies are now present on EPEL 8 so we don't need the
additional repositories that were only a temporary solution.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3592ba1d61)
2020-06-30 17:08:53 +02:00
Dimitri Savineau e5eba9555b dashboard: configure mgr backend before restart
We need to set the mgr dashboard server ip address before restarting the
dashboard module otherwise we can try to bind the dashboard module on an
already used address.
We already do this configuration for the dashboard port value and ssl
setup so we should do the same for server address too.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1851455

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 03cd75845f)
2020-06-29 12:34:26 -04:00
George Shuklin cfc808804f Add container settings for Ubuntu 20 (the same as Ubuntu 18)
Signed-off-by: George Shuklin <george.shuklin@gmail.com>
(cherry picked from commit 3e87f53875)
2020-06-29 12:33:49 -04:00
Jonathan Rosser 28ee26ae58 Ansible tests are not filters
The use of "| success" and "| changed" are not valid syntax for modern
ansible releases.

Signed-off-by: Jonathan Rosser <jonathan.rosser@rd.bbc.co.uk>
(cherry picked from commit 42884e8175)
2020-06-26 13:36:42 -04:00
Jonathan Rosser 77002c12c8 Install python routes package as a dependancy rather than directly
This is now a dependancy of ceph-mgr so will be installed automatically
and does not need a specific task.

This change means that ceph-mgr installs correctly on Ubuntu Focal where
the python3-routes package is necessary.

Signed-off-by: Jonathan Rosser <jonathan.rosser@rd.bbc.co.uk>
(cherry picked from commit 92288c11c5)
2020-06-26 13:36:42 -04:00
Dimitri Savineau 7eddd89afa podman: Add Type and PIDFile value to unit files
This changes the way we are running the podman containers via systemd.
They are now in dettached mode and Type/PIDFile set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1834974

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d43769dc2a)
2020-06-23 17:35:24 +02:00
Dimitri Savineau 51cfb89501 ceph-osd: remove ceph-osd-run.sh script
Since we only have one scenario since nautilus then we can just move
the container start command from ceph-osd-run.sh to the systemd unit
service.
As a result, the ceph-osd-run.sh.j2 template and the
ceph_osd_docker_run_script_path variable are removed.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 829990e60d)
2020-06-23 17:35:24 +02:00
Guillaume Abrioux e1c8a0daf6 dashboard: copy self-signed generated crt to mons
This commit makes the playbook copying self-signed generated certificate
to monitors.
When mons and mgrs are deployed on dedicated nodes the playbook will
fail when trying to import certificate and key files since they are
generated on mgrs whereas we try to import them from a monitor.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846995

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b7539eb275)
2020-06-23 15:43:26 +02:00
Dimitri Savineau 5428a41fcf docker: Add Requires on docker service
When using docker container engine then the systemd unit scripts only
use a dependency on the docker daemon via the After parameter.
But if docker is restarted on a live system then the ceph systemd units
should wait for the docker daemon to be fully restarted.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846830

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bd22f1d1ec)
2020-06-22 17:30:28 -04:00
Guillaume Abrioux 4fe8e12484 switch_to_containers: don't set noup flag
We shouldn't set this flag when running switch_to_containers playbook.
Otherwise the playbook fails waiting for pgs to be clean.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843569

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b91d60d384)
2020-06-17 09:24:02 -04:00
Dimitri Savineau c6e60db2fb container: inspect Id field instead of RepoDigests
When a container image managed by podman isn't tag anymore then the
RepoDigests field when inspecting the image doesn't return any value.
This is different from docker workflow and it breaks the ceph-ansible
container upgrade when collocated multiple services and using a non
fix container tag (like latest or 4).

$ podman images
REPOSITORY              TAG      IMAGE ID       CREATED        SIZE
docker.io/ceph/daemon   latest   680c9c0d38c3   8 days ago     957 MB
<none>                  <none>   011ee108bfc9   2 months ago   1.01 GB

$ podman inspect 680c9c0d38c3 | jq .[0].RepoDigests[0]
"docker.io/ceph/daemon@sha256:20cf789235e23ddaf38e109b391d1496bb88011239d16862c4c106d0e05fea9e"
$ podman inspect 011ee108bfc9 | jq .[0].RepoDigests[0]
null

Because this field returns "null" then the ansible task trying to
determine this value is failing

-----------------------------
fatal: [foo]: FAILED! =>
  msg: |-
    The task includes an option with an undefined variable. The error
    was: None has no element 0

    The error appears to be in
    'roles/ceph-container-common/tasks/fetch_image.yml': line 137,
    column 3, but may be elsewhere in the file depending on the exact
    syntax problem.

    The offending line appears to be:

    - name: set_fact ceph_osd_image_repodigest_before_pulling
      ^ here
-----------------------------

We don't have this behaviour with docker.

$ docker images
REPOSITORY              TAG      IMAGE ID       CREATED        SIZE
docker.io/ceph/daemon   latest   680c9c0d38c3   8 days ago     928 MB
docker.io/ceph/daemon   <none>   011ee108bfc9   2 months ago   986 MB

$ docker inspect 680c9c0d38c3 | jq .[0].RepoDigests[0]
"docker.io/ceph/daemon@sha256:45e6f28bb67c81b826acb64fad5c0da1cac3dffb41a88992fe4ca2be79575fa6"
$ docker inspect 011ee108bfc9 | jq .[0].RepoDigests[0]
"docker.io/ceph/daemon@sha256:b393a73309d72e43ca7d65cd3519036007947671e373eb59aa75a46185c52231"

Instead we should just get the Id field.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1844496

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

(cherry picked from commit cdb30bd125)
2020-06-16 13:12:26 -04:00
Ali Maredia 5b76ba12f7 rgw multisite: add master zone endpoints to zonegroup
We were only adding the endpoints to the master zone but not to the
zonegroup.
This patch fixes the issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1839228

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 0175c205fa)
2020-06-09 12:29:56 -04:00
Ansible Deployment User 85df54a698 rgwloadbalancer undefined index variable
The vrrp_instances variable is using a loop with index but the index_var
wasn't defined.
As a result, the fact task was failing on this undefined index variable.

The task includes an option with an undefined variable. The error was:
'index' is undefined

Closes: #5395

Signed-off-by: Florian Faltermeier <florian.faltermeier@uibk.ac.at>
(cherry picked from commit 3f906e0c26)
2020-05-26 12:09:41 -04:00
Guillaume Abrioux 4453028862 common: introduce ceph_pool module calls
This commits calls the `ceph_pool` module for creating ceph pools
everywhere it's needed in the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit af9f6684f2)
2020-05-22 17:05:22 +02:00
Dimitri Savineau 3247b1eea9 ceph-nfs: add stable noarch repository
When using the stable nfs ganesha repository, we need have both arch
and noarch repositories enabled.
Currently the noarch repository is missing which cause the non
containerized deployment to fail.

Closes: #5375

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 44e1ebaaff)
2020-05-19 11:18:45 -04:00
Guillaume Abrioux 521c356f33 common: fix target_size_ratio task enablement
The condition on this task is wrong, we have to check whether
`target_size_ratio` is set in the pool definition instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8c7a48832c)
2020-05-19 15:15:03 +02:00
Guillaume Abrioux ec21d57d23 facts: always set ceph_run_cmd and ceph_admin_command
always set these facts on monitor nodes whatever we run with `--limit`.
Otherwise, playbook will fail when using `--limit` on nodes where these
facts are used on a delegated task to monitor.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e5e81843e9)
2020-05-15 09:56:10 -04:00
Dimitri Savineau 02e5167f2a ceph-nfs: bind mount ganesha log directory
The current ganesha log directory is only present in the container
and not bind mount on the host.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 222fe4abd8)
2020-05-13 16:41:49 -04:00
Dimitri Savineau 015fb8e0b9 dashboard: allow disabling grafana api ssl verify
When using an untrusted TLS certificate (like self-signed) on grafana
then the grafana dashboards update subcommand will fail.
One solution could be to trust the TLS certificate.
The other one is to disable the TLS verification on the grafana API.

Closes: #5324

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b20519efd0)
2020-05-13 16:41:36 -04:00
Dimitri Savineau 9a7af0ce6a docker2podman: manage dashboard nodes
The dashboard nodes (alertmanager, grafana, node-exporter, and prometheus)
were not manage during the docker to podman migration.

This adds the systemd container template of those services to a dedicated
file (systemd.yml) in order to include it in the docker2podman playbook.

This also adds the dashboard container images pull from docker to podman.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1829389

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 252e78b4e4)
2020-05-13 16:41:11 -04:00
Benoît Knecht 94a71258a8 ceph-validate: Expand templates in rgw_create_pools
Same fix as `ceph-rgw` for `rgw_create_pools` pool names that contain Jinja
templates.

See #5348 for details.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 444b46ea24)
2020-05-11 14:00:08 -04:00
Benoît Knecht 9268b34464 ceph-rgw: Make sure pool name templates are expanded
It is common to set templated pool names in `rgw_create_pools`, e.g.

```yaml
rgw_create_pools:
  "{{ rgw_zone }}.rgw.buckets.index":
    pg_num: 16
    size: 3
    type: replicated
```

This worked fine with Ansible 2.8, but broke in Ansible 2.9 due to a change in
the way `with_dict` works [1].

This commit replaces the use of `with_dict` with

```yaml
loop: "{{ rgw_create_pools | dict2items }}"
```

which works as intended and expands the template in the pool name.

[1]: https://docs.ansible.com/ansible/latest/porting_guides/porting_guide_2.9.html#loops

Closes #5348

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit d2b7670c7d)
2020-05-11 14:00:08 -04:00
Benoît Knecht da6e31a4c6 ceph-validate: Fix "fail on unsupported CentOS release"
The `dashboard_enabled` condition used a `true` filter (which doesn't exist)
instead of the `bool` filter.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit b7efca1785)
2020-05-08 12:50:14 -04:00
Dimitri Savineau 837657b959 ceph-rgw: use match instead of equalto from jinja2
The '==' jinja2 operator (or 'equalto') has been introduced in jinja2
2.8.
On EL7, jinja2 version is 2.7 so the operator isn't present creating
templating error like:

The error was: TemplateRuntimeError: no test named '=='

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1747206

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 34e6e8e06c)
2020-05-06 15:31:19 -04:00
Dimitri Savineau 4c3a21845c ceph-nfs: fix internal ganesha deployment
Since ea2b654d9 we're not running the rados command from the monitor
nodes but from the ganesha node. Unfortunately we don't have the
required keyring on that node to run the rados command as we don't
import the right keyring.
This commit restores the workflow for internal ganesha deployment like
before ea2b654d9 but keeps the rados commands from the ganesha node for
external deployment until we have a better design.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8a890306ad)
2020-05-06 13:30:12 -04:00
Dimitri Savineau ddd907c9ec ceph-nfs: fix keyring copy for external ganesha
Fix the condition on the keyring copy task that prevent the ganesha
keyring to be created in the /var/lib/ceph directory.
Also ensure that the directory exists first.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1831285

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 748ac4b928)
2020-05-06 13:30:12 -04:00
Guillaume Abrioux ab9795e1b9 nfs: fix 2 typo
The condition is missing an index here which makes the playbook failing.

Typical error:
```
The conditional check 'not item.get('skipped', False)' failed. The error was: error while evaluating conditional (not item.get('skipped', False)): 'list object' has no attribute 'get'",
```

Also, adds the missing '/keyring' on the `exec_cmd_nfs` fact.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1831342

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cf460274c7)
2020-05-06 13:30:12 -04:00
Dimitri Savineau c220e6f941 ceph-facts: fix IPv6 _radosgw_address interface
When using radosgw_interface and IPv6 setup then the _radosgw_address
fact doesn't use square brackets compared to the radosgw_address and
radosgw_address_block configuration.

Closes: #5325

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ed4f23d530)
2020-04-30 13:29:22 -04:00
fmount 1185867ae1 Refresh ceph dashboard user role
This change allows the operator to refresh the
ceph dashboard admin role on multiple ceph-ansible
executions.
In the current state the role is set only when the
user is created, and there's no way to change it if
the user exists.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1826002
Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit 5eb363e033)
2020-04-23 18:03:55 -04:00
Dimitri Savineau 777d65f0ac ceph-dashboard: fix mgr dashboard IPv6 fact
15ed9ee introduced a regression for the mgr dashboard daemon using
IPv6 since the mgr dashboard configuration doesn't support brackets.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1827299

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f1728929cd)
2020-04-23 16:23:20 -04:00
Dimitri Savineau 8a10918e49 Readd CentOS 7 with conditions
The CentOS 7 distribution could still be used be deploying ceph if
  - it's a containerized deployment
  - it's a non containerized deployment without the dashboard (due to
missing python3 libraries).

The ceph_stable_redhat_distro variable has been remove because we can
rely on the ansible_distribution_major_version fact instead.

The copr el8 repository configuration is only applied for CentOS 8.

The ceph-mgr-dashboard package is only installed when the
dashboard_enabled variable is set to true.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2547ab601a)
2020-04-23 16:07:14 -04:00
Guillaume Abrioux 476aac6eee mds: don't enable application pool on cephfs pools
this commit removes the task which enable application on cephfs pools.

See: https://tracker.ceph.com/issues/43761

Fixes: #5278

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 86dc6f8206)
2020-04-23 11:13:30 -04:00
ianwatsonrh 821d0079b1 typo: updating type check on rc
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1826884
Signed-off-by: ianwatsonrh <ianwatson@redhat.com>
(cherry picked from commit ccf6a7f153)
2020-04-23 09:58:55 -04:00
abaird-rh c5e48dc316 Updated use of deprecated filter
This was removed in Ansible 2.9.

[DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of
using `result|version_compare` use `result is version_compare`. This
feature will be removed in version 2.9. Deprecation warnings can be
disabled by setting deprecation_warnings=False in ansible.cfg.

Rename 'version_compare' to the function 'version'.

version_compose was renamed to version since ansible 2.5

Signed-off-by: abaird-rh <abaird@redhat.com>
(cherry picked from commit eb71244bfd)
2020-04-20 13:37:34 -04:00
Guillaume Abrioux a385f12505 mds: fix --limit run against mds nodes
This commit fixes --limit runs against mds nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 378405e328)
2020-04-14 11:51:28 -04:00
Guillaume Abrioux 65cb085af6 nfs: create empty rados index object for nfs standalone
This commit creates an empty rados index object even when deploying
standalone nfs-ganesha.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1822328

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ea2b654d95)
2020-04-14 11:50:51 -04:00
Paulo Matias 3c32ed285b Allow user to specify grafana_server_fqdn
This is needed to get a TLS certificate to validate correctly.

If unspecified, auto-detected grafana_server_addr is used.

Signed-off-by: Paulo Matias <matias@ufscar.br>
(cherry picked from commit 38ce02c2ea)
2020-04-14 11:39:06 -04:00
Paulo Matias 3f76f3abad Prometheus APIs are only available through plain http
Trying to access these APIs through TLS produces "Could not reach
external API" errors in Ceph dashboard.

Signed-off-by: Paulo Matias <matias@ufscar.br>
(cherry picked from commit dac8e1d0a9)
2020-04-14 11:39:06 -04:00
Guillaume Abrioux 73b370cacf osd: fix monitor_name error when scaling out OSDs
This commit fixes a bug when trying to scale out osd nodes with
`crush_rule_config` is enabled.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1822599

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4bcc52cb2a)
2020-04-09 15:14:07 -04:00
Dimitri Savineau 4428487abc ceph-validate: update RHEL requirement for RHCS
We were not testing the right ansible_distribution fact value for RHEL
distribution.
This commit also updates the minial RHEL version supported by RHCS.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5de74fe512)
2020-04-09 14:52:11 -04:00
Dimitri Savineau c07dc38763 ceph-mgr: add saml python lib for dashboard SSO
The dashboard SSO mgr module requires the saml python library to be
installed. This is only a valid scenario for RHCS deployment because
the saml python library isn't available in other classic repositories.
This package is present in RHCS Tools repository so we also need to
enable it on the mgr nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1820233

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6617d90733)
2020-04-06 16:51:57 +02:00
Guillaume Abrioux ed1f63134b osd: use default crush rule name when needed
When `rule_name` isn't set in `crush_rules` the osd pool creation will
fail.
This commit adds a new fact `ceph_osd_pool_default_crush_rule_name` with
the default crush rule name.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1817586

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1bb9860dfd)
2020-03-31 19:42:14 -04:00
Guillaume Abrioux 5b89635a50 tests: add more coverage in external_clients scenario
Run create_users_keys.yml in external_clients scenario

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8c1c34b201)
2020-03-31 19:42:14 -04:00
Guillaume Abrioux 4c5e0f7f78 osd: support changing default rule even when osd_crush_location isn't defined
Creating crush rules even with no crush hierarchy configuration is a
valid scenario so we shouldn't be bound to the first task result (which
configure crush hierarchy) to be able to add new crush rules.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816989

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5b0476385c)
2020-03-31 16:14:11 -04:00
Guillaume Abrioux 963e155476 nfs: update repo url
as of V3.2 / octopus, the url has changed.
This commit updates the url in the playbook accordingly.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-31 10:02:47 -04:00
Guillaume Abrioux cbdfc10dc6 defaults: change nfs_ganesha_stable_branch
In master, even though we are using dev repo, the value here should be closer
from the last stable released.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-31 10:02:47 -04:00
Guillaume Abrioux 9aa8e9aff1 common: add el8 repo configuration
This commit adds el8 repo configuration since we are still missing these
packages in el8.

This is only intended to make CI passing on stable-5.0 and *must* be
reverted once these packages will be available.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-31 10:02:47 -04:00
Guillaume Abrioux ddeb603e3e tests: update testing
This commit updates the testing so we test stable-5.0 against
ceph@octopus.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-31 10:02:47 -04:00
Dimitri Savineau 1b80174feb ceph-defaults: update ceph_stable_redhat_distro
Since octopus the ceph_stable_redhat_distro variable should be set to
el8 instead of el7.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-31 10:02:47 -04:00
Dimitri Savineau 1b094acf24 container: remove ulimit nofile parameter
Since Ceph Octopus is python3 only we don't need to specify the max open
files anymore with the container engine.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 64701437de)
2020-03-30 09:22:28 -04:00
Dimitri Savineau ef5d015b21 rhcs: drop debian support
Support for debian with RHCS has been dropped starting RHCS 4

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4ac99223b2)
2020-03-27 10:22:32 -04:00
Dimitri Savineau a6f7c0d335 rhcs: update release to 5 for octopus
RHCS 5 will be based on Ceph Octopus release and only supported on
RHEL 8.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 90ad110861)
2020-03-26 19:02:58 -04:00
Guillaume Abrioux 7105a12924 defaults: remove legacy comment
This is no longer true, let's remove this comment given that this option
is not ignored in containerized deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e551b5ba1a)
2020-03-26 12:08:11 -04:00
Dimitri Savineau abf40c59ab ceph-facts: fix rgw_instances_all fact
The rgw_instances_all fact is supposed to be the list of all radosgw
instances from all rgw nodes.
But the fact is always using the local rgw_instances variable so this
won't work on multiple nodes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0487d21938)
2020-03-25 15:23:08 -04:00
Guillaume Abrioux 1b0b7af119 osd: add a default value for 'default' in crush_rules
Let's default to `False` for the `default` attribute in `crush_rules`
variable.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1797774

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-24 08:41:43 -04:00
Dimitri Savineau df8f853c85 Add pacific release
Add the 16th ceph release: pacific.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-24 09:47:12 +01:00
Guillaume Abrioux 1a7f3caecb facts: fix typo
This commit fixes a typo in some task titles

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-23 14:03:52 -04:00
Guillaume Abrioux cc28d9ec26 nfs: fix nfs with external ceph cluster support
This commit refact and fix the nfs deployment with external ceph cluster
support.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1814942

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-19 18:21:16 -04:00
Dimitri Savineau fb69f6990c dashboard: allow to set read-only admin user
This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1810176

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-19 15:34:41 +01:00
Dimitri Savineau 5051e67f8f ceph-defaults: add registry name on dashboard vars
We don't use the registry name when using the community dashboard
container images (grafana, prometheus, alertmanager & node exporter).
This commit adds the docker.io registry explicitly in the default
dashboard container image name values.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-19 14:27:50 +01:00
Dimitri Savineau b97a4d5201 ceph-defaults: update grafana container tag
Since 8e8aa73 we're using grafana 5.4.3 in RHCS 4.1 via [1].
We should also update the grafana container tag from docker.io when
using the community release.

[1] registry.redhat.io/rhceph/rhceph-4-dashboard-rhel8:4

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-17 14:35:06 +01:00
petruha 73b3fadb0e ceph-facts: Fix system_secret_key variable handling
This commit fixes the system_secret_key variable not substitued by the
right value and always using the 'system_secret_key' string instead.

$ egrep 'system_(access|secret)_key' group_vars/all.yml
system_access_key: foofoofoofoofoofoofo
system_secret_key: barbarbarbarbarbarbarbarbarbarbarbarbarb

$ ansible-playbook -vv -i hosts site.yml.sample -e rgw_multisite=true
(...)
  - hostname: storage0
    endpoint: http://192.168.100.42:8080
    instance_name: rgw0
    radosgw_address: 192.168.50.3
    radosgw_frontend_port: 8085
    rgw_realm: canada
    rgw_zone: montreal
    rgw_zone_user: justin.trudeau
    rgw_zone_user_display_name: Justin Trudeau
    rgw_zonegroup: quebec
    system_access_key: foofoofoofoofoofoofo
    system_secret_key: system_secret_key

Fixes https://github.com/ceph/ceph-ansible/issues/5150

Signed-off-by: petruha <5363545+p37ruh4@users.noreply.github.com>
2020-03-16 17:38:52 -04:00
Guillaume Abrioux 152c2caa9f config: remove legacy option in ceph.conf.j2
This option has been deprecated (As of 0.51).
By the way, ceph-ansible already sets the
auth_{service,client,cluster}_required variables.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623586

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-16 09:49:36 -04:00
Dimitri Savineau 3626c688cf handler: add rgw multi-instances support
This commit adds the rgw multi-instances support in ceph-handler
(restart_rgw_daemons.sh.j2)

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-12 16:44:48 -04:00
Guillaume Abrioux 60a2e28189 rgw: add multi-instances support when deploying multisite
This commit adds the multi-instances when deploying rgw multisite

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-12 16:44:48 -04:00
Dimitri Savineau e8bf0a0cf2 ceph-infra: open radosgw ports for multi instances
When using the radosgw multi instances configuration then the firewall
rules aren't adapted to that setup.
We only open the port according to the radosgw_frontend_port variable
so only the first radosgw instance port will be opened in the firewall
configuration.
We should instead iterate over the rgw_instances list.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-12 16:44:48 -04:00
Dimitri Savineau e62532de46 update osd pool set size command
Since [1] we can't use osd pool without replicas (size: 1) by default.
We now need to set the mon_allow_pool_size_one flag to true in the ceph
configuration and add the --yes-i-really-mean-it flag to the osd pool
set size cli.

[1] https://github.com/ceph/ceph/commit/21508bd

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-11 11:25:42 +01:00
Guillaume Abrioux b3bbd6bb77 rgw: fix a typo in create_realm_zonegroup_zone_lists
This commit fixes a typo.

`s/realms/secondary_realms`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-10 14:13:30 +01:00
Guillaume Abrioux b3d943fe9f infra: add retries/until on firewalld start task
This commit make that task retrying 5 times to start the service
firewalld to avoid failure like following:

```
TASK [ceph-infra : start firewalld] ********************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-centos-container-purge/roles/ceph-infra/tasks/configure_firewall.yml:22
Monday 09 March 2020  08:58:48 +0000 (0:00:00.963)       0:02:16.457 **********
fatal: [osd4]: FAILED! => changed=false
  msg: |-
    Unable to enable service firewalld: Created symlink from /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service to /usr/lib/systemd/system/firewalld.service.
    Created symlink from /etc/systemd/system/multi-user.target.wants/firewalld.service to /usr/lib/systemd/system/firewalld.service.
    Failed to execute operation: Connection reset by peer
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-09 15:01:34 -04:00
Christian Berendt 608d7188a1 openstack_keys: use openstack_cinder_pool.name
Instead of volumes as a static string the openstack_cinder_pool.name
variable should be used as with the other keys.

Signed-off-by: Christian Berendt <berendt@betacloud-solutions.de>
2020-03-09 08:17:22 -04:00
Guillaume Abrioux 7a8a719e75 rgw: add retry/until on pools tasks
Sometimes, these task can timeout for some reason.
Adding these retries can help to avoid unexcepted failures.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-06 08:55:13 -05:00
Guillaume Abrioux eac207091b client: skip create_users_keys.yml when rolling_update
There's no need to run this part of the role when upgrading clients
node. Let's skip it when rolling_update.yml is being run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-04 13:06:32 -05:00
Ali Maredia 71f55bd54d rgw multisite: enable more than 1 realm per cluster
Make it so that more than one realm, zonegroup,
or zone can be created during a run of the rgw
multisite ansible playbooks.

The rgw hosts now need to be grouped into zones
and realms in the inventory.

.yml files need to be created in group_vars
for the realms and zones. Sample yaml files
are available.

Also remove multsite destroy playbook
and add --cluster before radosgw-admin commands

remove manually added rgw_zone_endpoints var
and have ceph-ansible automatically add the
correct endpoints of all the rgws in a rgw_zone
from the information provided in that rgws hostvars.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2020-03-04 12:58:13 -05:00
Guillaume Abrioux e17c79b871 osd: do not change pool size on erasure pool
This commit adds condition in order to not try to customize pools size
when its type is erasure.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-04 09:29:01 -05:00
Guillaume Abrioux 47adc2bb08 osd: add pg autoscaler support
This commit adds the pg autoscaler support.

The structure for pool definition has now two additional attributes
`pg_autoscale_mode` and `target_size_ratio`, eg:

```
test:
  name: "test"
  pg_num: "{{ osd_pool_default_pg_num }}"
  pgp_num: "{{ osd_pool_default_pg_num }}"
  rule_name: "replicated_rule"
  application: "rbd"
  type: 1
  erasure_profile: ""
  expected_num_objects: ""
  size: "{{ osd_pool_default_size }}"
  min_size: "{{ osd_pool_default_min_size }}"
  pg_autoscale_mode: False
  target_size_ratio": 0.1
```

when `pg_autoscale_mode` is `True` user has to set a decent value in
`target_size_ratio`.

Given that it's a new feature, it's still disabled by default.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1782253

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-04 09:29:01 -05:00
Guillaume Abrioux bf1f125d71 osd: refact osd pool creation
Currently, the command executed is wrong, eg:

```
  cmd:
  - podman
  - exec
  - ceph-mon-controller-0
  - ceph
  - --cluster
  - ceph
  - osd
  - pool
  - create
  - volumes
  - '32'
  - '32'
  - replicated_rule
  - '1'
  delta: '0:00:01.625525'
  end: '2020-02-27 16:41:05.232705'
  item:
```

From documentation, the osd pool creation command is :

```
ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \
     [crush-rule-name] [expected-num-objects]
ceph osd pool create {pool-name} {pg-num}  {pgp-num}   erasure \
     [erasure-code-profile] [crush-rule-name] [expected_num_objects]
```

it means we pass '1' (from item.type) as value for
`expected_num_objects` by default which is very likely not what we want.

Also, this commit modifies the default value when no `rule_name` is set
to use the existing variable `osd_pool_default_crush_rule`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808495

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-04 09:29:01 -05:00
Dimitri Savineau be8b315102 ceph-validate: add key format validation
If the user provides manually the key value for a specific keyring then
there's not valation on the content which could lead to unexpected
failures in the ceph_key module.

Closes: #5104

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-03 10:01:58 +01:00
Dimitri Savineau 9d3b49293d purge: stop rgw instances by iteration
It looks like that the service module doesn't support wildcard anymore
for stopping/disabling multiple services.

fatal: [rgw0]: FAILED! => changed=false
  msg: 'This module does not currently support using glob patterns,
        found ''*'' in service name: ceph-radosgw@*'
...ignoring

Instead we should iterate over the rgw_instances list.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-02 16:32:06 +01:00
Dimitri Savineau 90b1fc8fe9 ceph-infra: install firewalld python bindings
When using the firewalld ansible module we need to be sure that the
python bindings are installed.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-02 16:32:06 +01:00
Dimitri Savineau 45fb9241c0 ceph-infra: split firewalld tasks
Since ansible 2.9 the firewalld task could not be used with service and
source in the same time anymore.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-02 16:32:06 +01:00
Dimitri Savineau aefba82a2e Add ansible 2.9 support
This commit adds ansible 2.9 support in addition of 2.8.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-02 16:32:06 +01:00
Guillaume Abrioux 0326d992c2 osd: add journal option in ceph_volume call (batch)
This commit adds the journal option to the ceph_volume call when
scenario is lvm batch

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-02-28 17:29:59 -05:00
Guillaume Abrioux a084a2a347 common: support OSDs with more than 2 digits
When running environment with OSDs having ID with more than 2 digits,
some tasks don't match the system units and therefore, playbook can fail.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1805643

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-02-27 09:48:36 +01:00
Dimitri Savineau 44e750ee5d ceph-rgw: increase connection timeout to 10
5s as a connection timeout could be low in some setup. Let's increase
it to 10s.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-24 16:01:36 +01:00
Francesco Pantano 15ed9eebf1 Configure ceph dashboard backend and dashboard_frontend_vip
This change introduces a new set of tasks to configure the
ceph dashboard backend and listen just on the mgr related
subnet (and not on '*'). For the same reason the proper
server address is added in both prometheus and alertmanger
systemd units.
This patch also adds the "dashboard_frontend_vip" parameter
to make sure we're able to support the HA model when multiple
grafana instances are deployed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792230
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
2020-02-19 17:52:53 -05:00
Dimitri Savineau ac0f68ccf0 ceph-dashboard: update create/get rgw user tasks
Since [1] if a rgw user already exists then the radosgw-admin user create
command will return an error instead of modifying the current user.
We were already doing separated tasks for create and get operation but
only for multisite configuration but it's not enough.
Instead we should do the get task first and depending on the result
execute the create.
This commit also adds missing run_once and delegate_to statement.

[1] https://github.com/ceph/ceph/commit/269e9b9

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-18 10:22:21 +01:00
Sam Choraria 2a2656a985 ceph-rgw: allow SSL certificate content to supplied
Allow SSL certificate & key contents to be written to the path
specified by radosgw_frontend_ssl_certificate. This permits a
certificate to be deployed & renewal of expired certificates
through ceph-ansible.

Signed-off-by: Sam Choraria <sam.choraria@bbc.co.uk>
2020-02-17 16:22:11 +01:00
Dimitri Savineau c644ea9041 ceph-defaults: remove bootstrap_dirs_xxx vars
Both bootstrap_dirs_owner and bootstrap_dirs_group variables aren't
used anymore in the code.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-17 16:17:40 +01:00
Ali Maredia 1834c1e48d rgw: extend automatic rgw pool creation capability
Add support for erasure code pools.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1731148

Signed-off-by: Ali Maredia <amaredia@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-17 16:07:43 +01:00
Florian Faltermeier 9d081e2453 ceph-rgw-loadbalancer: Fix SSL newline issue
The ad7a5da commit introduced a regression when using TLS on haproxy
via the haproxy_frontend_ssl_certificate variable.
This cause the "stats socket" and the "tune.ssl.default-dh-param"
parameters to be on the same line resulting haproxy failing to start.

[ALERT] 351/140240 (21388) : parsing [xxxxx] : 'stats socket' : unknown
keyword 'tune.ssl.default-dh-param'. Registered
[ALERT] 351/140240 (21388) : Fatal errors found in configuration.

Fixes: #4869

Signed-off-by: Florian Faltermeier <florian.faltermeier@uibk.ac.at>
2020-02-17 16:05:42 +01:00
Dimitri Savineau 16e12bf2bb rgw: don't create user on secondary zones
The rgw user creation for the Ceph dashboard integration shouldn't be
created on secondary rgw zones.

Closes: #4707
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794351

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-17 15:08:11 +01:00
John Fulton e4bf4857f5 The _filtered_clients list should intersect with ansible_play_batch
Client configuration with --limit fails without this patch
because certain tasks are only done to the first host in the
_filtered_clients list and it's likely that first host will
not be included in what's sepcified with --limit. To fix this
the _filtered_clients list should be built from all clients
in the inventory that are also in the running play.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798781

Signed-off-by: John Fulton <fulton@redhat.com>
2020-02-17 11:29:18 +01:00
Dimitri Savineau 6dd9b25565 ceph-iscsi: don't use ceph_dev_xxx variables
Using ceph_dev_branch and ceph_dev_sha1 for configuring ceph-iscsi
repositories from shaman doesn't make sense because the ceph devel
branches and sha1 aren't compatible with ceph-iscsi devel.
Instead we could rely on the master branch and the latest sha1.
Currently it's not possible to using a custom ceph branch/sha1 value
with iscsi setup otherwise the repository setup will fail.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-17 10:56:52 +01:00
Dimitri Savineau 10951eeea8 ceph-nfs: fix ceph_nfs_ceph_user variable
The ceph_nfs_ceph_user variable is a string for the ceph-nfs role but a
list in ceph-client role.
6a6785b introduced a confusion between both variable type in the ceph-nfs
role for external ceph with ganesha.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1801319

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-17 10:56:05 +01:00
Dimitri Savineau 0a3e85e8ca ceph-nfs: add nfs-ganesha-rados-urls package
Since nfs-ganesha 2.8.3 the rados-urls library has been move to a
dedicated package.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-17 10:52:30 +01:00
Dimitri Savineau 1fc6b33714 ceph-{mon,osd}: move default crush variables
Since ed36a11 we move the crush rules creation code from the ceph-mon to
the ceph-osd role.
To keep the backward compatibility we kept the possibility to set the
crush variables on the mons side but we didn't move the default values.
As a result, when using crush_rule_config set to true and wanted to use
the default values for crush_rules then the crush rule ansible task
creation will fail.

"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute
'crush_rules'"

This patch move the default crush variables from ceph-mon to ceph-osd
role but also use those default values when nothing is defined on the
mons side.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798864

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-17 10:50:53 +01:00
Dimitri Savineau 15bd4cd189 ceph-grafana: fix grafana_{crt,key} condition
The grafana_{crt,key} aren't boolean variables but strings. The default
value is an empty string so we should do the conditional on the string
length instead of the bool filter

Closes: #5053

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-17 10:49:08 +01:00
Dimitri Savineau b9d975385c ceph-prometheus: add alertmanager HA config
When using multiple alertmanager nodes (via the grafana-server group)
then we need to specify the other peers in the configuration.

https://prometheus.io/docs/alerting/alertmanager/#high-availability
https://github.com/prometheus/alertmanager#high-availability

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792225

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-17 10:46:21 +01:00
Dimitri Savineau 5a03e0ee1c containers: add KillMode=none to systemd templates
Because we are relying on docker|podman for managing containers then we
don't need systemd to manage the process (like kill).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-13 16:11:33 +01:00
Dimitri Savineau c6e96699f7 dashboard: allow configuring multiple grafana host
When using multiple grafana hosts then we push set the grafana and
prometheus URL and push the dashboard layout to a single node.

grafana_server_addrs is the list of all grafana nodes and used during
the ceph-dashboard role (on mgr/mon nodes).
grafana_server_addr is the current grafana node used during the
ceph-grafana and ceph-prometheus role (on grafana-server nodes).

We don't have the grafana_server_addr fact duplication code between
external vs collocated nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1784011

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-10 11:18:45 -05:00
Guillaume Abrioux 3700aa5385 switch_to_containers: increase health check values
This commit increases the default values for the following variable
consumed in switch-from-non-containerized-to-containerized-ceph-daemons.yml
playbook.
This also moves these variables in `ceph-defaults` role so the user can
set different values if needed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783223

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-02-07 14:59:14 -05:00
Dimitri Savineau 298ba0bf03 ceph-facts: set devices osd_auto_discovery on OSDs
We only need to set the devices fact with osd_auto_discovery on OSD
nodes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-03 16:23:38 +01:00
Dimitri Savineau ed461544a7 ceph-facts: remove is_podman fact
This was used before the CentOS 8 requirement when using CentOS 7
atomic which has both docker and podman installed.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-03 10:11:03 -05:00
Mike Christie 77f3b5d51b iscsi: Fix crashes during rolling update
During a rolling update we will run the ceph iscsigw tasks that start
the daemons then run the configure_iscsi.yml tasks which can create
iscsi objects like targets, disks, clients, etc. The problem is that
once the daemons are started they will accept confifguration requests,
or may want to update the system themself. Those operations can then
conflict with the configure_iscsi.yml tasks that setup objects and we
can end up in crashes due to the kernel being in a unsupported state.

This could also happen during creation, but is less likely due to no
objects being setup yet, so there are no watchers or users accessing the
gws yet. The fix in this patch works for both update and initial setup.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1795806

Signed-off-by: Mike Christie <mchristi@redhat.com>
2020-01-31 11:15:36 -05:00
Dimitri Savineau 9b40a959b9 ceph-common: rhcs 4 repositories for rhel 7
RHCS 4 is available for both RHEL 7 and 8 so we should also enable the
cdn repositories for that distribution.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1796853

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-31 09:33:51 -05:00
Guillaume Abrioux e7bc079405 config: fix external client scenario
When no monitor group is present in the inventory, this task fails.
This affects only non-containerized deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-31 12:02:15 +01:00
Dimitri Savineau fa8aa8c864 ceph-container-engine: lvm2 on OSD nodes only
Since de8f2a9 the lvm2 package installation has been moved from ceph-osd
role to ceph-container-engine role.
But the scope wasn't limited to the OSD nodes only.
This commit fixes this behaviour.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-29 14:41:34 +01:00
Dimitri Savineau 2f07b85131 ceph-defaults: remove rgw from ceph_conf_overrides
The [rgw] section in the ceph.conf file or via the ceph_conf_overrides
variable doesn't exist and has no effect.
To apply overrides to all radosgw instances we should use either the
[global] or [client] sections.
Overrides per radosgw instance should still use the
[client.rgw.{instance-name}] section.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794552

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-29 14:11:14 +01:00
Guillaume Abrioux 8c3759f8ce dashboard: add quotes when passing password to the CLI
Otherwise, if the variables contains a '$' it will be interpreted as a BASH
variable.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-29 08:45:34 +01:00
Guillaume Abrioux 99328545de validate: fail if dashboard|grafana_admin_password aren't set
This commit adds a task to make sure user set a custom password for
`grafana_admin_password` and `dashboard_admin_password` variables.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1795509

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-29 08:45:34 +01:00
Dimitri Savineau 1fcafffdad ceph-facts: fix _container_exec_cmd fact value
When using different name between the inventory_hostname and the
ansible_hostname then the _container_exec_cmd fact will get a wrong
value based on the inventory_hostname instead of the ansible_hostname.
This happens when the ceph cluster is already running (update/upgrade).

Later the container exec commands will fail because the container name
is wrong.

We should always set the _container_exec_cmd based on the
ansible_hostname fact.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1795792

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-29 08:44:59 +01:00
Guillaume Abrioux 2f919f8971 fix calls to `container_exec_cmd` in ceph-osd role
We must call `container_exec_cmd` from the right monitor node otherwise
the value of the fact might mistmatch between the delegated node and the
node being played.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794900

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-27 15:30:45 -05:00
Dmitriy Rabotyagov 0961ab8e60 Ensure that ganesha log directory exists
Some ganesha packages do not create ganesha log directories
while it's expected to be created while changing it's permissions.
Additionally it's no much sense in doing that as a separate task,
so directory is created as correct permissions are set with creation of
the rest required directories.

Signed-off-by: Dmitriy Rabotyagov <drabotyagov@vexxhost.com>
2020-01-24 11:10:08 -05:00
Guillaume Abrioux eb9112d8fb handler: read container_exec_cmd value from first mon
Given that we delegate to the first monitor, we must read the value of
`container_exec_cmd` from this node.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792320

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-23 11:35:57 -05:00
Vytenis Sabaliauskas ed1eaa1f38 ceph-facts: Fix for 'running_mon is undefined' error, so that
fact 'running_mon' is set once 'grep' successfully exits with 'rc == 0'

Signed-off-by: Vytenis Sabaliauskas <vytenis.sabaliauskas@protonmail.com>
2020-01-23 16:27:11 +01:00
Guillaume Abrioux 483adb5d79 common: add a default value for ceph_directories_mode
Since this variable makes it possible to customize the mode for ceph
directories, let's make it a bit more explicit by adding a default value
in ceph-defaults.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-22 09:35:35 +01:00
Dimitri Savineau c9e1fe3d92 ceph-osd: set container objectstore env variables
Because we need to manage legacy ceph-disk based OSD with ceph-volume
then we need a way to know the osd_objectstore in the container.
This was done like this previously with ceph-disk so we should also
do it with ceph-volume.
Note that this won't have any impact for ceph-volume lvm based OSD.

Rename docker_env_args fact to container_env_args and move the container
condition on the include_tasks call.
Remove OSD_DMCRYPT env variable from the ceph-osd template because it's
now included in the container_env_args variable.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792122

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-20 13:59:44 -05:00
Benoît Knecht 3842aa1a30 ceph-rgw: Fix customize pool size "when" condition
In 3c31b19ab3, I fixed the `customize pool
size` task by replacing `item.size` with `item.value.size`. However, I
missed the same issue in the `when` condition.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2020-01-20 09:26:53 -05:00
Guillaume Abrioux 22865cde9c handler: fix call to container_exec_cmd in handler_osds
When unsetting the noup flag, we must call container_exec_cmd from the
delegated node (first mon member)
Also, adding a `run_once: true` because this task needs to be run only 1
time.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792320

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-20 09:25:56 -05:00
Dmitriy Rabotyagov 2478a7b948 Fix undefined running_mon
Since commit [1] running_mon introduced, it can be not defined
which results in fatal error [2]. This patch defines default value which
was used before patch [1]

Signed-off-by: Dmitriy Rabotyagov <drabotyagov@vexxhost.com>

[1] 8dcbcecd71
[2] https://zuul.opendev.org/t/openstack/build/c82a73aeabd64fd583694ed04b947731/log/job-output.txt#14011
2020-01-16 17:03:25 -05:00
Dmitriy Rabotyagov c81a213a6d Fix application for openstack_cephfs pools
RBD is invalid application for cephfs pools, so it was change to cephfs.

Signed-off-by: Dmitriy Rabotyagov <drabotyagov@vexxhost.com>
2020-01-16 16:27:53 -05:00
Dimitri Savineau 7f997e623a ceph-facts: move facts to defaults value
There's no need to define a variable via a fact if we can do it via a
default value. Using a fact could be interesseting to override the
default value on some condition.

- ceph_uid could be set to 167 by default because it's only different on
non containerized deployment on Debian/Ubuntu.
- rbd_client_directory_{owner,group,mode} could be set to ceph,ceph,0770
by default install of null as we are doing in the facts.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-16 13:57:11 -05:00
Dimitri Savineau e790b0851d group_vars: remove useless files
Delete legacy files that aren't used anymore.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-16 13:53:12 -05:00
Guillaume Abrioux 3e262e072b containers: use --cpus instead --cpu-quota
When using docker 1.13.1, the current condition:

```
{% if (container_binary == 'docker' and ceph_docker_version.split('.')[0] is version_compare('13', '>=')) or container_binary == 'podman' -%}
```

is wrong because it compares the first digit (1) whereas it should
compare the second one.
It means we always use `--cpu-quota` although documentation recommend
using `--cpus` when docker version is 1.13.1 or higher.

From the doc:
> --cpu-quota=<value>	Impose a CPU CFS quota on the container. The number of
> microseconds per --cpu-period that the container is limited to before
> throttled. As such acting as the effective ceiling.
> If you use Docker 1.13 or higher, use --cpus instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-16 13:51:43 -05:00
Guillaume Abrioux 8dcbcecd71 remove container_exec_cmd_mgr fact
Iterating over all monitors in order to delegate a `
{{ container_binary }}` fails when collocating mgrs with mons, because
ceph-facts reset `container_exec_cmd` to point to the first member of
the monitor group.

The idea is to force `container_exec_cmd` to be reset in ceph-mgr.
This commit also removes the `container_exec_cmd_mgr` fact.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1791282

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-15 14:03:49 -05:00