Commit Graph

311 Commits (9125bba48dd39f10d566ac91be668ed1e5aa6678)

Author SHA1 Message Date
Guillaume Abrioux 09ef465f62 containers: introduce target systemd unit
This adds ceph-*.target systemd unit files support for containerized
deployments.
This also fixes a regression introduced by PR #6719 (rgw and nfs systemd
units not getting purged)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1962748

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-08-18 11:08:50 -04:00
Guillaume Abrioux 1db8fa8989 roles: remove leftover from pr #4319
pr #4319 introduced some uesless `become: true` on systemd tasks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-08-18 09:10:15 +02:00
Guillaume Abrioux 7511195738 common: do not log keyring secret
let's not display any keyring secret by default in ansible log.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1980744

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-08-11 17:33:34 +02:00
Teoman ONAY 9b5d97adb9 podman pids.max default value is 2048, docker's one is 4096 which are
sufficient for the default value (512) of rgw thread pool size.
But if its value is increased near to the pids-limit value,
it does not leave place for the other processes to spawn and run within
the container and the container crashes.

pids-limit set to unlimited regardless of the container engine.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987041

Signed-off-by: Teoman ONAY <tonay@redhat.com>
2021-08-04 10:20:25 +02:00
Dimitri Savineau ad05a08160 multisite: use node fqdn for endpoints when https
When the rgw_multisite_proto variable is set to https then we shoudn't use
the IP address in the zone endpoints list but the node FQDN to match the
TLS certificate CN.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1965504

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-07-22 21:22:12 +02:00
Dimitri Savineau 9758e3c513 container: set tcmalloc value by default
All ceph daemons need to have the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
environment variable set to 128MB by default in container setup.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970913

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-06-30 20:30:55 +02:00
Neelaksh Singh d18a9860cd Sensitive key data now hidden in output log
Fixes: #6529

Signed-off-by: Neelaksh Singh <neelaksh48@gmail.com>
2021-06-08 20:46:37 +02:00
Dimitri Savineau a670982a38 ceph-rgw: fix pg_autoscale_mode for pool
The pg_autoscale_mode for rgw pools introduced in 9f03a52 was wrong
and was missing a `value` keyword because `rgw_create_pools` is a
dict.

Fixes: #6516

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-05-06 10:15:13 +02:00
Guillaume Abrioux bab403b603 container/systemd: ensure /var/log/ceph exists
This adds a `ExecStartPre=-/usr/bin/mkdir -p /var/log/ceph` in all
systemd service templates for all ceph daemon.
This is specific to RHCS after a Leapp upgrade is done. Indeed, the
`/var/log/ceph` seems to be removed after the upgrade.
In order to work around this issue let's ensure the directory is present
before trying to start the containers with podman.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1949489

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-04-14 16:37:33 +02:00
Guillaume Abrioux 9f03a527ba rgw: supports pg_autoscale_mode option for pool creation
Support enabling/disabling the pg autoscaler for rgw pools.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-31 13:10:28 +02:00
Dimitri Savineau 5b86ac8801 library: add realm pull to radosgw_realm module
This adds the realm pull operation to the current radosgw_realm module.
The pull operation requires the url, access/secret key variables.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-03-12 18:21:37 +01:00
Alex Schultz a7f2fa73e6 Use ansible_facts
It has come to our attention that using ansible_* vars that are
populated with INJECT_FACTS_AS_VARS=True is not very performant.  In
order to be able to support setting that to off, we need to update the
references to use ansible_facts[<thing>] instead of ansible_<thing>.

Related: ansible#73654
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406
Signed-off-by: Alex Schultz <aschultz@redhat.com>
2021-03-08 20:54:02 +01:00
Guillaume Abrioux 931b87e830 rgw: fix a typo in multisite
if `rgw_zonegroupmaster` is not defined at the rgw instance level in
`rgw_instances` it will fallback to a wrong variable (`rgw_zonemaster`).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925247

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-10 07:01:21 +01:00
Guillaume Abrioux 8617081664 rgw: avoid useless call to ceph-rgw
since `ceph-rgw` may be called from `ceph-handler` in some contexts we
should avoid rerunning it unnecessarily.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-01-28 14:37:14 -05:00
Guillaume Abrioux 71a5e666e3 rgw: multisite refact
Add the possibility to deploy rgw multisite configuration with a mix of
secondary and primary zones on a same rgw node.
Before that, on a same node, all instances were either primary
zones *OR* secondary.

Now you can define a rgw instance like following:

```
rgw_instances:
  - instance_name: 'rgw0'
    rgw_zonemaster: false
    rgw_zonesecondary: true
    rgw_zonegroupmaster: false
    rgw_realm: 'france'
    rgw_zonegroup: 'zonegroup-france'
    rgw_zone: paris-00
    radosgw_address: "{{ _radosgw_address }}"
    radosgw_frontend_port: 8080
    rgw_zone_user: jacques.chirac
    rgw_zone_user_display_name: "Jacques Chirac"
    system_access_key: P9Eb6S8XNyo4dtZZUUMy
    system_secret_key: qqHCUtfdNnpHq3PZRHW5un9l0bEBM812Uhow0XfB
    endpoint: http://192.168.101.12:8080
```

Basically it's now possible to define `rgw_zonemaster`,
`rgw_zonesecondary` and `rgw_zonegroupmaster` at the intsance
level instead of the whole node level.

Also, this commit adds an option `deploy_secondary_zones` (default True)
which can be set to `False` in order to explicitly ask the playbook to
not deploy secondary zones in case where the corresponding endpoint are
not deployed yet.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1915478

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-01-27 15:46:43 +01:00
Guillaume Abrioux 513c8cfe55 rgw: support switching from single-site to multisite
When collocating rgw with either a mon, mgr or osd, switching from
single site to a multisite rgw setup failed because of the handlers
triggered between the ansible play of the collocated daemon and the play
of the rgw. Since the multisite changes are not yet applied the handlers
fail.
The idea here is to ensure we run the multisite configuration from the
ceph-handler role before the restart happens, this way it won't complain
because of non existing multisite configuration.

(Note: this is also valid when simply changing a multisite configuration)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1888630

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-01-06 09:58:45 -05:00
Fabien Brachere 4026ba9da1 library: add missing `target_size_ratio` parameter support in ceph_pool module
When creating a new pool, target_size_ratio was ignored by ansible module ceph_pool.py.
target_size_ratio is now used when pg_autoscale_mode is on.
Tests added to library tests.
This adds too the use in the role ceph-rgw.

Signed-off-by: Fabien Brachere <fabien.brachere@celeste.fr>
2020-12-16 15:10:27 +01:00
Dimitri Savineau d82249a8c0 ceph-rgw: add cluster parameter on ceph_ec_profile
81233dd introduced a regression with the ceph_ec_profile module call in
the ceph-rgw role due the missing cluster module parameter.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-12-12 06:54:46 +01:00
Dimitri Savineau 2e417ab901 library: add ceph_crush_rule module
This adds ceph_crush_rule ansible module for replacing the command
module usage with the ceph osd crush rule commands.
This module can manage both erasure and replicated crush rules.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-12-01 17:52:41 +01:00
Guillaume Abrioux c68b124ba8 container: remove `--ignore` from `podman rm` command
As of podman 2.0.5, `--ignore` param conflicts with `--storage`.
```
Nov 30 13:53:10 magna089 podman[164443]: Error: --storage conflicts with --volumes, --all, --latest, --ignore and --cidfile
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-11-30 12:24:11 -05:00
Guillaume Abrioux 81233dd963 rgw: call `ceph_ec_profile` when needed
Let's replace `command` tasks with `ceph_ec_profile` calls

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-11-24 10:38:28 +01:00
Guillaume Abrioux 5ba7824c55 container: force rm --storage on ExecStartPre
This is a workaround to avoid error like following:
```
Error: error creating container storage: the container name "ceph-mgr-magna022" is already in use by "4a5f674e113f837a0cc561dea5d2cd55d16ca159a647b7794ab06c4c276ef701"
```

that doesn't seem to be 100% reproducible but it shows up after a
reboot. The only workaround we came up with at the moment is to run
`podman rm --storage <container>` before starting it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1887716

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-11-16 10:38:40 -05:00
Dimitri Savineau 59ecddcdd0 keyring: use ceph_key module for auth get command
Instead of using ceph auth get command via the ansible command module
then we can use the ceph_key module and the info state.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-11-02 17:17:29 +01:00
Dimitri Savineau 16cd183b9c podman: force log driver to journald
Since we've changed to podman configuration using the detach mode and
systemd type to forking then the container logs aren't present in the
journald anymore.
The default conmon log driver is using k8s-file.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1890439

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-11-02 15:49:27 +01:00
Benoît Knecht 54ba38e35e Fix Ansible check mode for site.yml.sample playbook
Make sure the `site.yml.sample` playbook can be run in check mode by skipping
tasks that try to read the output of commands that have been skipped.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2020-10-07 00:29:44 +02:00
Dimitri Savineau 1281e8bcc8 library: add radosgw_zone module
This adds radosgw_zone ansible module for replacing the command module
usage with the radosgw-admin zone command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-10-06 10:07:58 +02:00
Dimitri Savineau 65dbe0782e library: add radosgw_zonegroup module
This adds radosgw_zonegroup ansible module for replacing the command
module usage with the radosgw-admin zonegroup command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-10-06 10:07:58 +02:00
Dimitri Savineau d171f4068d library: add radosgw_realm module
This adds radosgw_realm ansible module for replacing the command module
usage with the radosgw-admin realm command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-10-06 10:07:58 +02:00
Dimitri Savineau 235c7e27cc library: add radosgw_user module
This adds radosgw_user ansible module for replacing the command module
usage with the radosgw-admin user command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-10-06 10:07:58 +02:00
Guillaume Abrioux a802fa2810 rgw: fix multi instances scaleout in baremetal
When rgw and osd are collocated, the current workflow prevents from
scaling out the radosgw_num_instances parameter when rerunning the
playbook in baremetal deployments.

When ceph-osd notifies handlers, it means rgw handlers are triggered
too. The issue with this is that they are triggered before the role
ceph-rgw is run.
In the case a scaleout operation is expected on `radosgw_num_instances`
it causes an issue because keyrings haven't been created yet so the new
instances won't start.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881313

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-10-06 07:38:44 +02:00
Guillaume Abrioux 29fc115f4a ceph_pool: refact module
remove complexity about current defaults in running cluster

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-10-02 07:42:40 +02:00
Ali Maredia 902575369c rgw multisite: check connection for realm endpoint
This commit adds connection checks before realm pulls
Curls are performed on the endpoint being pulled from
the mons and the rgws

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1731158

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2020-09-29 07:37:21 +02:00
Dimitri Savineau 50104650e7 add missing boolean filter
Otherwise this will generate an ansible warning about the missing
filter.

[DEPRECATION WARNING]: evaluating xxx as a bare variable, this behaviour
will go away and you might need to add |bool to the expression in the
future.
Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will
be removed in version 2.12.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-28 20:45:01 +02:00
Guillaume Abrioux bf7b044c9a Revert "ceph-rgw: remove ceph_pool state and default value"
This reverts commit ba3512a8fc.
2020-09-28 16:56:33 +02:00
Dimitri Savineau ba3512a8fc ceph-rgw: remove ceph_pool state and default value
Since the state is now optional and default values are handled in the
ceph_pool module itself.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-25 19:18:07 +02:00
Dimitri Savineau 8dacbce68f ceph-rgw: use ceph_pool module
Since [1] we can use the ceph_pool module instead of using the command
module combined with ceph osd pool commands.

[1] bddcb439ce

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-10 15:16:58 +02:00
Dimitri Savineau cb8f0237e1 ceph-rgw: allow specifying crush rule on pool
We already support specifiying a custom crush rule during pool creation
in ceph-osd role but not in ceph-rgw role.
This patch adds the missing code to implement this feature.
Note this is only available for replicated pool not erasure. The rule
must also exist prior the pool creation.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1855439

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-08-17 22:59:06 +02:00
Dimitri Savineau 47b7c00287 podman: always remove container on start
In case of failure, the systemd ExecStop isn't executed so the container
isn't removed. After a reboot of a failed node, the container doesn't
start because the old container is still present in created state.
We should always try to remove the container in ExecStartPre for this
situation.
A normal reboot doesn't trigger this issue and this also doesn't affect
nodes running containers via docker.
This behaviour was introduced by d43769d.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1858865

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-23 17:00:38 +02:00
Guillaume Abrioux 86edae724f rgw: set container memory limit to 4g
This commit changes the container memory limit for rgw daemons.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1707488

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-07-09 15:31:10 +02:00
Dimitri Savineau 1361e84a4e radosgw: remove INST_PORT environment variable
This variable isn't consumed by the container so we can remove it.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-02 16:52:29 +02:00
Guillaume Abrioux 7dd68b9ac1 rgw: fix multi instances scaleout
When rgw and osd are collocated, the current workflow prevents from
scaling out the radosgw_num_instances parameter when rerunning the
playbook.

The environment file used in the rgw systemd template is rendered when
executing the `ceph-rgw` role but during a new run of the playbook (in
order to scale out rgw instances), handlers are triggered from `ceph-osd`
role which is run before `ceph-rgw`, therefore it tries to start the new
rgw daemon whereas its corresponding environment file hasn't been
rendered yet and fails like following:

```
ceph-radosgw@rgw.ceph4osd3.rgw1.service failed to run 'start-pre' task: No such file or directory
```

This commit moves the tasks generating this file in `ceph-config` role
so it is generated early.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1851906

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-07-02 10:39:50 -04:00
Dimitri Savineau d43769dc2a podman: Add Type and PIDFile value to unit files
This changes the way we are running the podman containers via systemd.
They are now in dettached mode and Type/PIDFile set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1834974

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-06-23 09:37:50 +02:00
Dimitri Savineau bd22f1d1ec docker: Add Requires on docker service
When using docker container engine then the systemd unit scripts only
use a dependency on the docker daemon via the After parameter.
But if docker is restarted on a live system then the ceph systemd units
should wait for the docker daemon to be fully restarted.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846830

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-06-22 23:08:50 +02:00
Ali Maredia 0175c205fa rgw multisite: add master zone endpoints to zonegroup
We were only adding the endpoints to the master zone but not to the
zonegroup.
This patch fixes the issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1839228

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2020-06-09 09:50:18 -04:00
Benoît Knecht d2b7670c7d ceph-rgw: Make sure pool name templates are expanded
It is common to set templated pool names in `rgw_create_pools`, e.g.

```yaml
rgw_create_pools:
  "{{ rgw_zone }}.rgw.buckets.index":
    pg_num: 16
    size: 3
    type: replicated
```

This worked fine with Ansible 2.8, but broke in Ansible 2.9 due to a change in
the way `with_dict` works [1].

This commit replaces the use of `with_dict` with

```yaml
loop: "{{ rgw_create_pools | dict2items }}"
```

which works as intended and expands the template in the pool name.

[1]: https://docs.ansible.com/ansible/latest/porting_guides/porting_guide_2.9.html#loops

Closes #5348

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2020-05-11 11:51:27 -04:00
Dimitri Savineau 34e6e8e06c ceph-rgw: use match instead of equalto from jinja2
The '==' jinja2 operator (or 'equalto') has been introduced in jinja2
2.8.
On EL7, jinja2 version is 2.7 so the operator isn't present creating
templating error like:

The error was: TemplateRuntimeError: no test named '=='

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1747206

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-05-06 14:23:10 -04:00
Guillaume Abrioux 60a2e28189 rgw: add multi-instances support when deploying multisite
This commit adds the multi-instances when deploying rgw multisite

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-12 16:44:48 -04:00
Dimitri Savineau e62532de46 update osd pool set size command
Since [1] we can't use osd pool without replicas (size: 1) by default.
We now need to set the mon_allow_pool_size_one flag to true in the ceph
configuration and add the --yes-i-really-mean-it flag to the osd pool
set size cli.

[1] https://github.com/ceph/ceph/commit/21508bd

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-11 11:25:42 +01:00
Guillaume Abrioux b3bbd6bb77 rgw: fix a typo in create_realm_zonegroup_zone_lists
This commit fixes a typo.

`s/realms/secondary_realms`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-10 14:13:30 +01:00
Guillaume Abrioux 7a8a719e75 rgw: add retry/until on pools tasks
Sometimes, these task can timeout for some reason.
Adding these retries can help to avoid unexcepted failures.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-06 08:55:13 -05:00