Commit Graph

2759 Commits (guits-no_log_fetch_initial_keys)

Author SHA1 Message Date
Dimitri Savineau 9252b75173 container: remove container_binding_name variable
The container_binding_name package was only mandatory when we were
using the docker modules (docker_image and docker_container) but since
we manage both docker and podman containers without using the dedicated
module then we can remove it.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-10-14 10:32:31 +02:00
Dimitri Savineau 4eaa65c362 ceph-osd: don't start the OSD services twice
Using the + operation on two lists doesn't filter out the duplicate
keys.
Currently each OSDs is started (via systemd) twice.
Instead we could use the union filter.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-10-14 10:30:39 +02:00
Guillaume Abrioux 46d4d97da9 handler: refact check_socket_non_container
the `stat --printf=%n` returns something like following:

```
ok: [osd0] => changed=false
  cmd: |-
    stat --printf=%n /var/run/ceph/ceph-osd*.asok
  delta: '0:00:00.009388'
  end: '2020-10-06 06:18:28.109500'
  failed_when_result: false
  rc: 0
  start: '2020-10-06 06:18:28.100112'
  stderr: ''
  stderr_lines: <omitted>
  stdout: /var/run/ceph/ceph-osd.2.asok/var/run/ceph/ceph-osd.5.asok
  stdout_lines: <omitted>
```

it makes the next task "check if the ceph osd socket is in-use" grep
like this:

```
ok: [osd0] => changed=false
  cmd:
  - grep
  - -q
  - /var/run/ceph/ceph-osd.2.asok/var/run/ceph/ceph-osd.5.asok
  - /proc/net/unix
```

which will obviously fail because this path never exists. It makes the
OSD handler broken.

Let's use `find` module instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-10-08 17:37:50 -04:00
Benoît Knecht 54ba38e35e Fix Ansible check mode for site.yml.sample playbook
Make sure the `site.yml.sample` playbook can be run in check mode by skipping
tasks that try to read the output of commands that have been skipped.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2020-10-07 00:29:44 +02:00
Dimitri Savineau 1281e8bcc8 library: add radosgw_zone module
This adds radosgw_zone ansible module for replacing the command module
usage with the radosgw-admin zone command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-10-06 10:07:58 +02:00
Dimitri Savineau 65dbe0782e library: add radosgw_zonegroup module
This adds radosgw_zonegroup ansible module for replacing the command
module usage with the radosgw-admin zonegroup command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-10-06 10:07:58 +02:00
Dimitri Savineau d171f4068d library: add radosgw_realm module
This adds radosgw_realm ansible module for replacing the command module
usage with the radosgw-admin realm command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-10-06 10:07:58 +02:00
Dimitri Savineau 235c7e27cc library: add radosgw_user module
This adds radosgw_user ansible module for replacing the command module
usage with the radosgw-admin user command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-10-06 10:07:58 +02:00
Dimitri Savineau bd611a785b library: add ceph_fs module
This adds the ceph_fs ansible module for replacing the command module
usage with the ceph fs command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-10-06 08:02:58 +02:00
Dimitri Savineau c960362639 ceph_key: remove backward compatibility
It's time to remove this backward compatibility. Users had enough time
to convert their openstack_keys and key values.
We now fail in ceph-validate if the caps key isn't set.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-10-06 07:59:38 +02:00
Guillaume Abrioux a802fa2810 rgw: fix multi instances scaleout in baremetal
When rgw and osd are collocated, the current workflow prevents from
scaling out the radosgw_num_instances parameter when rerunning the
playbook in baremetal deployments.

When ceph-osd notifies handlers, it means rgw handlers are triggered
too. The issue with this is that they are triggered before the role
ceph-rgw is run.
In the case a scaleout operation is expected on `radosgw_num_instances`
it causes an issue because keyrings haven't been created yet so the new
instances won't start.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881313

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-10-06 07:38:44 +02:00
Guillaume Abrioux ff95fa9c32 ceph-osd: refact `docker_exec_start_osd`
This commit drops nested jinja construction in this set_fact task.
It also rename it to `container_exec_start_osd`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-10-04 21:18:10 +02:00
Guillaume Abrioux c101cb3931 defaults: change defaults value
this commit changes defaults value in default pool definitions.

there's no need to define `pg_num`, `pgp_num`, `size` and `min_size`,
`ceph_pool` module will use the current default if needed.

This also drops the 3 following `set_fact` in `ceph-facts`:

- osd_pool_default_pg_num,
- osd_pool_default_pgp_num,
- osd_pool_default_size_num

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-10-02 07:42:40 +02:00
Guillaume Abrioux 29fc115f4a ceph_pool: refact module
remove complexity about current defaults in running cluster

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-10-02 07:42:40 +02:00
Seena Fallah ff9f4d138f ceph-facts: add get default crush rule from running monitor
In case of deploying new monitor node to an existing cluster,
osd_pool_default_crush_rule should be taken from running monitor because
ceph-osd role won't be run and the new monitor will have different
osd_pool_default_crush_role from other monitors.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2020-09-29 09:27:58 -04:00
Guillaume Abrioux eefe11d90c defaults: change default grafana-server name
This change default value of grafana-server group name.
Adding some tasks in ceph-defaults in order to keep backward
compatibility.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-29 07:42:26 +02:00
Ali Maredia 902575369c rgw multisite: check connection for realm endpoint
This commit adds connection checks before realm pulls
Curls are performed on the endpoint being pulled from
the mons and the rgws

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1731158

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2020-09-29 07:37:21 +02:00
Dimitri Savineau e11453c6f5 Remove unused centos docker tasks
The `enable extras on centos` task just doesn't work when using the
variable ceph_docker_enable_centos_extra_repo to true.

fatal: [xxx]; FAILED! => {"changed": false, "msg": "Parameter
'baseurl', 'metalink' or 'mirrorlist' is required."}

The CentOS extras repository is enabled by default so it's pretty
safe to remove this task and the associated variable.

This also removes the ceph_docker_on_openstack variable as it's a
leftover and it is unused.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-29 07:35:10 +02:00
Dimitri Savineau 733596582d ceph-handler: set handler on xxx_stat result
In non containerized deployment we check if the service is running
via the socket file presence.
This is done via the xxx_socket_stat variable that check the file
socket in the /var/run/ceph/ directory.
In some scenarios, we could have the socket file still present in
that directory but not used by any process.
That's why we have the xxx_stat variable which clean those leftovers.

The problem here is that we're set the variable for the handlers status
(like handler_mon_status) based on xxx_socket_stat instead of xxx_stat.
That means we will trigger the handlers if there's an old socket file
present on the system without any process associated.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1866834

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-29 07:32:10 +02:00
Dimitri Savineau 501b8e0fd3 ceph-iscsi: create pool once from monitor
af9f6684 introduced a regression on the ceph iscsi pool creation
because it was delegated to the first monitor node before that change.
This patch restores the initial worflow.
When the iscsi node doesn't have the admin keyring then the pool
creation fails.
This commit also ensures that the pool creation is only executed once
when having multiple iscsi nodes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-29 07:31:24 +02:00
Seena Fallah 69f7e35382 ceph-facts: check for mon socket in its own host
delegate to its own host after checking mon socket to findout if mon socket is in-use or not.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2020-09-29 00:21:12 +02:00
Dimitri Savineau 50104650e7 add missing boolean filter
Otherwise this will generate an ansible warning about the missing
filter.

[DEPRECATION WARNING]: evaluating xxx as a bare variable, this behaviour
will go away and you might need to add |bool to the expression in the
future.
Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will
be removed in version 2.12.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-28 20:45:01 +02:00
Guillaume Abrioux bf7b044c9a Revert "ceph-rgw: remove ceph_pool state and default value"
This reverts commit ba3512a8fc.
2020-09-28 16:56:33 +02:00
Dimitri Savineau 1db4dc807c ceph-mds: remove unused block condition
Since af9f6684 the cephfs pool(s) creation don't use the fs_pools_created
variable anymore because the ceph_pool module is idempotent.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-28 10:22:35 +02:00
Tyler Bishop ee4b8804ae facts: support device aliases for (dedicated|bluestore_wal)_devices
Just likve `devices`, this commit adds the support for linux device aliases for
`dedicated_devices` and `bluestore_wal_devices`.

Signed-off-by: Tyler Bishop <tbishop@liquidweb.com>
2020-09-25 19:59:45 +02:00
Dimitri Savineau ba3512a8fc ceph-rgw: remove ceph_pool state and default value
Since the state is now optional and default values are handled in the
ceph_pool module itself.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-25 19:18:07 +02:00
Dimitri Savineau 4808523403 rolling_update: remove msgr2 migration
In Pacific we're are sure that users already achieved the msgr2 because
that was introduced in Nautilus.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-25 19:14:42 +02:00
Dimitri Savineau 62bd41f0d4 ceph-config: remove ceph_release from ceph.conf.j2
We don't use ceph_release variable in the ceph.conf jinja template.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-25 19:13:57 +02:00
Dmitriy Rabotyagov 297532ca41 Remove libjemalloc1 installation task
libjemalloc1 package is not required neither for ganesha dependency nor
for the package build process. So this task can be simply dropped.

Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@ya.ru>
2020-09-24 13:56:16 +02:00
Dimitri Savineau 6dcfdf17d4 container: quote registry password
When using a quote in the registry password then we have the following
error:

The error was: ValueError: No closing quotation

To fix this we need to use the quote filter.

Close: https://bugzilla.redhat.com/show_bug.cgi?id=1880252

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-18 11:14:00 -04:00
Guillaume Abrioux ff19c1d851 facts: fix 'set_fact rgw_instances with rgw multisite'
the current condition doesn't work, as soon as the first iteration is
done the condition makes next iterations skip since `rgw_instances` got
set with the first iteration.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859872

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-18 10:14:34 -04:00
Dimitri Savineau 85643edfe3 ceph-infra: include iscsi nodes for logrotate
The iscsi nodes aren't included in the logrotate condition.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-17 20:34:56 +02:00
Guillaume Abrioux f576c02ff7 infra: support log rotation for tcmu-runner
This commit adds the log rotation support for tcmu-runner.

ceph-container related PR: ceph/ceph-container#1726

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1873915

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-16 20:23:22 -04:00
Dimitri Savineau e54b924eaf ceph-prometheus: update pool stat counter
Since [1] The bytes_used pool counter in prometheus has been renamed
to stored.

Closes: #5781

[1] https://github.com/ceph/ceph/commit/71fe9149

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-16 09:50:42 -04:00
Dimitri Savineau bda3581294 container: add optional http(s) proxy option
When using a http(s) proxy with either docker or podman we can rely on
the HTTP_PROXY, HTTPS_PROXY and NO_PROXY environment variables.
But with ansible, even if those variables are defined in a source file
then they aren't loaded during the container pull/login tasks.
This implements the http(s) proxy support with docker/podman.
Both implementations are different:
  1/ docker doesn't rely en the environment variables with the CLI.
Thos are needed by the docker daemon via systemd.
  2/ podman uses the environment variables so we need to add them to
the login/pull tasks.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876692

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-16 06:52:26 +02:00
Dimitri Savineau abb4023d76 ceph_key: set state as optional
Most ansible module using a state parameter default to the present
value (when available) instead of using it as a mandatory option.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-14 14:12:21 -04:00
Guillaume Abrioux f0fc59258a Revert "ceph_pool: use default size/min_size and rule_name"
This reverts commit 142934057f.

This is already handled in the ceph_pool module itself

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-14 14:12:21 -04:00
Dimitri Savineau 2c4af70abd dashboard: use run_once at block level
Instead of using run_once: true on each tasks in a block section, we
can use the run_once statement at the block level.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-14 13:47:36 +02:00
Dimitri Savineau b105549ed8 node-exporter: exclude client nodes
We don't need to install node-exporter on client node because there's
no ceph services running on them.
This also makes sure we use the group name variables in the prometheus
service template instead of hardcoding the values.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-14 13:46:51 +02:00
Dimitri Savineau 3a05aeb6cb ceph_pool: set state as optional
Most ansible module using a state parameter default to the present
value (when available) instead of using it as a mandatory option.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-11 10:26:15 +02:00
Dimitri Savineau ee6f0547ba library: add ceph_dashboard_user module
This adds the ceph_dashboard_user ansible module for replacing the
command module usage with the ceph dashboard ac-user-xxx command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-11 10:16:08 +02:00
Dimitri Savineau 142934057f ceph_pool: use default size/min_size and rule_name
Before [1] we were using default value for
  - size
  - min_size
  - rule_name

when the key wasn't present in the pool dict.

The commit [1] changed this by defaulting to omit.

This patch restores the original workflow by using facts:
  - osd_pool_default_size
  - osd_pool_default_min_size
  - ceph_osd_pool_default_crush_rule_name

[1] af9f6684f2

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-11 10:15:28 +02:00
Dimitri Savineau f63022dfec ceph-facts: only get fsid when monitor are present
When running the rolling_update playbook with an inventory without
monitor nodes defined (like external scenario) then we can't retrieve
the cluster fsid from the running monitor.
In this scenario we have to pass this information manually (group_vars
or host_vars).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1877426

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-10 13:19:44 -04:00
Dimitri Savineau 8dacbce68f ceph-rgw: use ceph_pool module
Since [1] we can use the ceph_pool module instead of using the command
module combined with ceph osd pool commands.

[1] bddcb439ce

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-10 15:16:58 +02:00
Guillaume Abrioux 657e6c8c3b tests: clean legacy
clean some legacies since quay.ceph.io migration

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-09 14:42:41 +02:00
Niko Smeds a951c1a3f0 Enable HAProxy backend checks for Ceph RGW
Add the `check` option to server definitions to enable basic HAProxy health
checks for Ceph RADOS gateway backends.

Currently traffic will be forwarded to unhealthly `radosgw.service` servers.
These changes resolve the issue.

Signed-off-by: Niko Smeds nikosmeds@gmail.com
2020-08-27 10:57:46 -04:00
Guillaume Abrioux 54d3e9650f dashboard: refact admin user creation task
this commit splits this task in order to avoid using a `shell` module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-21 09:22:11 +02:00
Guillaume Abrioux f0fe193d8e facts: refact and optimize memory consumption
there's no need to run this task on all nodes.
This uses too much memory for nothing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1856981

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-20 11:16:26 -04:00
George Shuklin 73d4bb6bd6 Make 'disable ssl for dashboard task' idempotent.
This should reduce number of 'changed' tasks during convergence test.

Signed-off-by: George Shuklin <george.shuklin@gmail.com>
2020-08-20 16:48:32 +02:00
Rafał Wądołowski 55cd6e83e4 Comment out ceph_custom_key
Since there is a check if ceph_custom_key is defined, there is no reason
to define it by default.

Signed-off-by: Rafał Wądołowski <rwadolowski@cloudferro.com>
2020-08-20 13:36:24 +02:00
Guillaume Abrioux 899d317196 iscsigw: add retry/until
In order to avoid failures that could be fixed by simply
retrying.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-20 13:25:05 +02:00
John Fulton 95dee6f1ca Set default permission for prometheus config files
Regardless of the outcome of Ansible 2.9.12 issue 71200
we can set a default permission for these files.

Closes: https://github.com/ceph/ceph-ansible/issues/5677

Signed-off-by: John Fulton <fulton@redhat.com>
2020-08-18 15:49:31 -04:00
Guillaume Abrioux 8ed11ea3ee infra: only install logrotate on right nodes
For intsance, there is no need to install logrotate on clients nodes.

This also ensure logrotate is installed only for containerized
deployments since the packaging has an explicit dependency to logrotate

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-18 10:56:09 -04:00
Dimitri Savineau cb8f0237e1 ceph-rgw: allow specifying crush rule on pool
We already support specifiying a custom crush rule during pool creation
in ceph-osd role but not in ceph-rgw role.
This patch adds the missing code to implement this feature.
Note this is only available for replicated pool not erasure. The rule
must also exist prior the pool creation.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1855439

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-08-17 22:59:06 +02:00
Ali Maredia 5c1f4b1a1e rgw: allow rgws to be concurrently with or without multisite
Allows rgws in a ceph cluster to be run with
multisite and without multisite at the same time.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2020-08-17 11:11:11 +02:00
Guillaume Abrioux e1cb385740 infra: add missing tag
This commit adds the missing `with_pkg` tag on the logrotate
installation task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-13 10:08:18 -04:00
Guillaume Abrioux f1aa6cea21 infra: add log rotation support (containers)
This commit adds the log rotation support via logrotate in containerized
deployments.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1848388

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-11 15:03:20 +02:00
Guillaume Abrioux 448cc280b7 common: don't enable debug log on ceph-volume calls by default
ceph-volume can generate large logs at some point.

debug logs by definition should be enabled only when debugging.

Let's make it customizable with a variable which is set to `False` by
default.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-11 15:03:20 +02:00
raul 110eaf5f9f rgw: support 1+ rgw instance in `radosgw_frontend_port`
Change the radosgw_frontend_port to take in account more than 1 RGW instance,
in it's original form `radosgw_frontend_port: radosgw_frontend_port | int`,
it configured the 8080 port to all instances, with the following modification
`radosgw_frontend_port: radosgw_frontend_port | int + item|int` we increase in
1 the port count.

Co-authored-by: Daniel Parkes <dparkes@redhat.com>
Signed-off-by: raul <rmahique@redhat.com>
2020-08-11 14:05:43 +02:00
Guillaume Abrioux dd4b5b0328 nfs: do not copy rgw keyring when `nfs_obj_gw` is true
This keyring shouldn't be copied when `nfs_obj_gw` is `True` if the
cluster doesn't contain a rgw node, which can be the case given we are
using `nfs_obj_gw` instead of `nfs_file_gw` (cephfs vs. object), the
deployment will fail trying to copy a key that doesn't exist.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-07 13:21:17 +02:00
Guillaume Abrioux 0a581a6e60 config: only add related rgw section
there's no need to add each rgw section on all rgw nodes.
With this commit, only related rgw section are rendered.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-03 14:47:27 +02:00
Dimitri Savineau 0d0f1e71df dashboard: allow remote TLS cert/key copy
When using TLS on the ceph dashboard or grafana services, we can provide
the TLS certificate and key.
Those files should be present on the ansible controller and they will be
copyied to the right node(s).
In some situation, the TLS certificate and key could be already present
on the target node and not on the ansible controller.
For this scenario, we just need to copy the files locally (on each remote
host).

This patch adds the dashboard_tls_external variable (with default to
false) to allow users to achieve this scenario when configuring this
variable to true.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1860815

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-08-03 13:39:47 +02:00
Dimitri Savineau 4e84b4beed ceph-facts: remove mds_name fact
The mds_name fact always gets the ansible_hostname value so we don't
need to have a dedicated fact for this and use the ansible_hostname fact
instead.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-23 17:02:43 +02:00
Dimitri Savineau cbe79428e6 ceph-handler: remove iscsigws restart scripts
The iscsigws restart scripts for tcmu-runner and rbd-target-{api,gw}
services only call the systemctl restart command.
We don't really need to copy a shell script to do it when we can use
the ansible service module instead.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-23 17:02:12 +02:00
Dimitri Savineau 47b7c00287 podman: always remove container on start
In case of failure, the systemd ExecStop isn't executed so the container
isn't removed. After a reboot of a failed node, the container doesn't
start because the old container is still present in created state.
We should always try to remove the container in ExecStartPre for this
situation.
A normal reboot doesn't trigger this issue and this also doesn't affect
nodes running containers via docker.
This behaviour was introduced by d43769d.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1858865

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-23 17:00:38 +02:00
Dimitri Savineau 18e3c7a0a2 ceph-handler: add missing condition on ceph-crash
The ceph-crash tasks present in the ceph-handler role don't need to be
executed on all nodes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-21 23:26:11 +02:00
Guillaume Abrioux 39bb279a53 crash: rm container in ExecPreStart even with docker
We should ensure the container is removed in `ExecPreStart` even when
`{{ container_binary }}` is docker.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-07-21 23:23:18 +02:00
Guillaume Abrioux 9d2f2108e1 ceph-crash: introduce new role ceph-crash
This commit introduces a new role `ceph-crash` in order to deploy
everything needed for the ceph-crash daemon.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-07-21 20:22:12 +02:00
Guillaume Abrioux d490968fc8 defaults: remove legacy
These variables aren't consummed anywhere else than in ceph-nfs role so
there is no need to have them in `ceph-defaults`'s defaults

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-07-21 09:39:15 +02:00
Guillaume Abrioux f8a951f50c facts: fix broken facts when using --limit
This commit fixes these tasks when --limit is used.

It makes sure the fact is set on right nodes even when the playbook is
run with `--limit`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-07-20 10:56:10 -04:00
Dimitri Savineau 2b8ebf1457 ceph-dashboard: copy TLS cert/key on monitor
The ceph-dashboard role is executed on the mgr nodes so the TLS cert/key
files are copied to those nodes.
But we are running importing the cert/key files into the ceph
configuration on the monitor.

Closes: #5557

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-20 16:16:35 +02:00
Guillaume Abrioux 86edae724f rgw: set container memory limit to 4g
This commit changes the container memory limit for rgw daemons.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1707488

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-07-09 15:31:10 +02:00
Guillaume Abrioux bcc673f66c facts: refact `ceph_uid` fact
There's no need to set this fact with a `set_fact`
We can achieve this in `ceph-defaults`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-07-09 13:37:29 +02:00
Dimitri Savineau 1438ca0120 ceph-nfs: change ganesha devel source
The download.nfs-ganesha.org source for nfs-ganesha on CentOS isn't
available anymore.
Let's switch back to shaman since we have builds available now.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-06 16:59:25 +02:00
Dimitri Savineau 93754bd70c ceph-defaults: update nfs-ganesha to 3.3
nfs-ganesha 3.3 is the latest 3.x release available for octopus so we
should update to this version.

https://download.ceph.com/nfs-ganesha/rpm-V3.3-stable/octopus

This will also match the version used in RHCS 5.

Ceph container already uses that version too.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-03 06:36:49 +02:00
Dimitri Savineau 1361e84a4e radosgw: remove INST_PORT environment variable
This variable isn't consumed by the container so we can remove it.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-02 16:52:29 +02:00
Guillaume Abrioux 7dd68b9ac1 rgw: fix multi instances scaleout
When rgw and osd are collocated, the current workflow prevents from
scaling out the radosgw_num_instances parameter when rerunning the
playbook.

The environment file used in the rgw systemd template is rendered when
executing the `ceph-rgw` role but during a new run of the playbook (in
order to scale out rgw instances), handlers are triggered from `ceph-osd`
role which is run before `ceph-rgw`, therefore it tries to start the new
rgw daemon whereas its corresponding environment file hasn't been
rendered yet and fails like following:

```
ceph-radosgw@rgw.ceph4osd3.rgw1.service failed to run 'start-pre' task: No such file or directory
```

This commit moves the tasks generating this file in `ceph-config` role
so it is generated early.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1851906

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-07-02 10:39:50 -04:00
Dimitri Savineau 3592ba1d61 ceph-common: remove copr and sepia repositories
All EL8 dependencies are now present on EPEL 8 so we don't need the
additional repositories that were only a temporary solution.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-06-30 08:35:19 +02:00
George Shuklin 3e87f53875 Add container settings for Ubuntu 20 (the same as Ubuntu 18)
Signed-off-by: George Shuklin <george.shuklin@gmail.com>
2020-06-29 12:18:58 -04:00
Dimitri Savineau 03cd75845f dashboard: configure mgr backend before restart
We need to set the mgr dashboard server ip address before restarting the
dashboard module otherwise we can try to bind the dashboard module on an
already used address.
We already do this configuration for the dashboard port value and ssl
setup so we should do the same for server address too.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1851455

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-06-29 14:59:01 +02:00
Jonathan Rosser 42884e8175 Ansible tests are not filters
The use of "| success" and "| changed" are not valid syntax for modern
ansible releases.

Signed-off-by: Jonathan Rosser <jonathan.rosser@rd.bbc.co.uk>
2020-06-26 12:26:25 -04:00
Jonathan Rosser 92288c11c5 Install python routes package as a dependancy rather than directly
This is now a dependancy of ceph-mgr so will be installed automatically
and does not need a specific task.

This change means that ceph-mgr installs correctly on Ubuntu Focal where
the python3-routes package is necessary.

Signed-off-by: Jonathan Rosser <jonathan.rosser@rd.bbc.co.uk>
2020-06-26 12:26:25 -04:00
Guillaume Abrioux b7539eb275 dashboard: copy self-signed generated crt to mons
This commit makes the playbook copying self-signed generated certificate
to monitors.
When mons and mgrs are deployed on dedicated nodes the playbook will
fail when trying to import certificate and key files since they are
generated on mgrs whereas we try to import them from a monitor.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846995

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-06-23 09:37:21 -04:00
Dimitri Savineau d43769dc2a podman: Add Type and PIDFile value to unit files
This changes the way we are running the podman containers via systemd.
They are now in dettached mode and Type/PIDFile set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1834974

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-06-23 09:37:50 +02:00
Dimitri Savineau bd22f1d1ec docker: Add Requires on docker service
When using docker container engine then the systemd unit scripts only
use a dependency on the docker daemon via the After parameter.
But if docker is restarted on a live system then the ceph systemd units
should wait for the docker daemon to be fully restarted.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846830

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-06-22 23:08:50 +02:00
Dimitri Savineau 829990e60d ceph-osd: remove ceph-osd-run.sh script
Since we only have one scenario since nautilus then we can just move
the container start command from ceph-osd-run.sh to the systemd unit
service.
As a result, the ceph-osd-run.sh.j2 template and the
ceph_osd_docker_run_script_path variable are removed.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-06-18 17:51:13 +02:00
Dimitri Savineau 0f8a61a3ae debian/uca: remove the handler notification
The "update apt cache" in the ceph-handler role was never called and the
handler trigger after adding the uca repository doesn't exist at all.
Instead of using a handler for that we can just set the update_cache
parameter to true like the other apt_repository tasks.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-06-17 10:14:03 +02:00
Guillaume Abrioux b91d60d384 switch_to_containers: don't set noup flag
We shouldn't set this flag when running switch_to_containers playbook.
Otherwise the playbook fails waiting for pgs to be clean.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843569

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-06-17 01:32:18 +02:00
Dimitri Savineau cdb30bd125 container: inspect Id field instead of RepoDigests
When a container image managed by podman isn't tag anymore then the
RepoDigests field when inspecting the image doesn't return any value.
This is different from docker workflow and it breaks the ceph-ansible
container upgrade when collocated multiple services and using a non
fix container tag (like latest or 4).

$ podman images
REPOSITORY              TAG      IMAGE ID       CREATED        SIZE
docker.io/ceph/daemon   latest   680c9c0d38c3   8 days ago     957 MB
<none>                  <none>   011ee108bfc9   2 months ago   1.01 GB

$ podman inspect 680c9c0d38c3 | jq .[0].RepoDigests[0]
"docker.io/ceph/daemon@sha256:20cf789235e23ddaf38e109b391d1496bb88011239d16862c4c106d0e05fea9e"
$ podman inspect 011ee108bfc9 | jq .[0].RepoDigests[0]
null

Because this field returns "null" then the ansible task trying to
determine this value is failing

-----------------------------
fatal: [foo]: FAILED! =>
  msg: |-
    The task includes an option with an undefined variable. The error
    was: None has no element 0

    The error appears to be in
    'roles/ceph-container-common/tasks/fetch_image.yml': line 137,
    column 3, but may be elsewhere in the file depending on the exact
    syntax problem.

    The offending line appears to be:

    - name: set_fact ceph_osd_image_repodigest_before_pulling
      ^ here
-----------------------------

We don't have this behaviour with docker.

$ docker images
REPOSITORY              TAG      IMAGE ID       CREATED        SIZE
docker.io/ceph/daemon   latest   680c9c0d38c3   8 days ago     928 MB
docker.io/ceph/daemon   <none>   011ee108bfc9   2 months ago   986 MB

$ docker inspect 680c9c0d38c3 | jq .[0].RepoDigests[0]
"docker.io/ceph/daemon@sha256:45e6f28bb67c81b826acb64fad5c0da1cac3dffb41a88992fe4ca2be79575fa6"
$ docker inspect 011ee108bfc9 | jq .[0].RepoDigests[0]
"docker.io/ceph/daemon@sha256:b393a73309d72e43ca7d65cd3519036007947671e373eb59aa75a46185c52231"

Instead we should just get the Id field.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1844496

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-06-16 17:06:25 +02:00
Ali Maredia 0175c205fa rgw multisite: add master zone endpoints to zonegroup
We were only adding the endpoints to the master zone but not to the
zonegroup.
This patch fixes the issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1839228

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2020-06-09 09:50:18 -04:00
Ansible Deployment User 3f906e0c26 rgwloadbalancer undefined index variable
The vrrp_instances variable is using a loop with index but the index_var
wasn't defined.
As a result, the fact task was failing on this undefined index variable.

The task includes an option with an undefined variable. The error was:
'index' is undefined

Closes: #5395

Signed-off-by: Florian Faltermeier <florian.faltermeier@uibk.ac.at>
2020-05-26 10:03:25 -04:00
Dimitri Savineau 44e1ebaaff ceph-nfs: add stable noarch repository
When using the stable nfs ganesha repository, we need have both arch
and noarch repositories enabled.
Currently the noarch repository is missing which cause the non
containerized deployment to fail.

Closes: #5375

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-05-16 07:34:08 +02:00
Guillaume Abrioux af9f6684f2 common: introduce ceph_pool module calls
This commits calls the `ceph_pool` module for creating ceph pools
everywhere it's needed in the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-05-16 07:31:57 +02:00
Guillaume Abrioux 8c7a48832c common: fix target_size_ratio task enablement
The condition on this task is wrong, we have to check whether
`target_size_ratio` is set in the pool definition instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-05-15 20:57:32 +02:00
Guillaume Abrioux e5e81843e9 facts: always set ceph_run_cmd and ceph_admin_command
always set these facts on monitor nodes whatever we run with `--limit`.
Otherwise, playbook will fail when using `--limit` on nodes where these
facts are used on a delegated task to monitor.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-05-15 10:53:15 +02:00
Dimitri Savineau 252e78b4e4 docker2podman: manage dashboard nodes
The dashboard nodes (alertmanager, grafana, node-exporter, and prometheus)
were not manage during the docker to podman migration.

This adds the systemd container template of those services to a dedicated
file (systemd.yml) in order to include it in the docker2podman playbook.

This also adds the dashboard container images pull from docker to podman.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1829389

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-05-13 12:02:00 +02:00
Dimitri Savineau b20519efd0 dashboard: allow disabling grafana api ssl verify
When using an untrusted TLS certificate (like self-signed) on grafana
then the grafana dashboards update subcommand will fail.
One solution could be to trust the TLS certificate.
The other one is to disable the TLS verification on the grafana API.

Closes: #5324

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-05-13 11:56:57 +02:00
Dimitri Savineau 222fe4abd8 ceph-nfs: bind mount ganesha log directory
The current ganesha log directory is only present in the container
and not bind mount on the host.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-05-13 11:55:38 +02:00
Benoît Knecht 444b46ea24 ceph-validate: Expand templates in rgw_create_pools
Same fix as `ceph-rgw` for `rgw_create_pools` pool names that contain Jinja
templates.

See #5348 for details.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2020-05-11 11:51:27 -04:00
Benoît Knecht d2b7670c7d ceph-rgw: Make sure pool name templates are expanded
It is common to set templated pool names in `rgw_create_pools`, e.g.

```yaml
rgw_create_pools:
  "{{ rgw_zone }}.rgw.buckets.index":
    pg_num: 16
    size: 3
    type: replicated
```

This worked fine with Ansible 2.8, but broke in Ansible 2.9 due to a change in
the way `with_dict` works [1].

This commit replaces the use of `with_dict` with

```yaml
loop: "{{ rgw_create_pools | dict2items }}"
```

which works as intended and expands the template in the pool name.

[1]: https://docs.ansible.com/ansible/latest/porting_guides/porting_guide_2.9.html#loops

Closes #5348

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2020-05-11 11:51:27 -04:00