With podman version podman-3:4.2.0-4.module+el8.7.0+17064+3b31f55c and
later, when mgr fails to start if mon is already running.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2169767
Signed-off-by: Teoman ONAY <tonay@ibm.com>
Initially MONs and RGW binded /etc/pki/ca-trust/extracted using the :z flag
(introduced to solve an OSP TripleO issue on RHEL - #3638) but using
this flag prevents local services (like sssd) running on the host from accessing
the certificates/files in that folder.
Signed-off-by: Teoman ONAY <tonay@redhat.com>
In order to reduce need of module
internal maintenance and to join forces on plugin development,
it's proposed to switch to using upstream version of
config_template module.
As it's shipped as collection, it's installation for end-users
is trivial and aligns with general approach of shipping extra modules.
Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@ya.ru>
Update `After=` and `Wants=` parameters in container systemd units
and make them be aligned with the systemd units that come
from the packaging.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2027440
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The pools created by `ceph-rgw` (listed in `rgw_create_pools`) now support a
`ec_crush_device_class` option to specify which device class the EC pool should
use.
It default to being omitted, which means it will use OSDs from any device class
by default.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
This adds ceph-*.target systemd unit files support for containerized
deployments.
This also fixes a regression introduced by PR #6719 (rgw and nfs systemd
units not getting purged)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1962748
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
sufficient for the default value (512) of rgw thread pool size.
But if its value is increased near to the pids-limit value,
it does not leave place for the other processes to spawn and run within
the container and the container crashes.
pids-limit set to unlimited regardless of the container engine.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987041
Signed-off-by: Teoman ONAY <tonay@redhat.com>
When the rgw_multisite_proto variable is set to https then we shoudn't use
the IP address in the zone endpoints list but the node FQDN to match the
TLS certificate CN.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1965504
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
All ceph daemons need to have the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
environment variable set to 128MB by default in container setup.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970913
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The pg_autoscale_mode for rgw pools introduced in 9f03a52 was wrong
and was missing a `value` keyword because `rgw_create_pools` is a
dict.
Fixes: #6516
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This adds a `ExecStartPre=-/usr/bin/mkdir -p /var/log/ceph` in all
systemd service templates for all ceph daemon.
This is specific to RHCS after a Leapp upgrade is done. Indeed, the
`/var/log/ceph` seems to be removed after the upgrade.
In order to work around this issue let's ensure the directory is present
before trying to start the containers with podman.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1949489
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This adds the realm pull operation to the current radosgw_realm module.
The pull operation requires the url, access/secret key variables.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
It has come to our attention that using ansible_* vars that are
populated with INJECT_FACTS_AS_VARS=True is not very performant. In
order to be able to support setting that to off, we need to update the
references to use ansible_facts[<thing>] instead of ansible_<thing>.
Related: ansible#73654
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406
Signed-off-by: Alex Schultz <aschultz@redhat.com>
if `rgw_zonegroupmaster` is not defined at the rgw instance level in
`rgw_instances` it will fallback to a wrong variable (`rgw_zonemaster`).
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925247
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
since `ceph-rgw` may be called from `ceph-handler` in some contexts we
should avoid rerunning it unnecessarily.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add the possibility to deploy rgw multisite configuration with a mix of
secondary and primary zones on a same rgw node.
Before that, on a same node, all instances were either primary
zones *OR* secondary.
Now you can define a rgw instance like following:
```
rgw_instances:
- instance_name: 'rgw0'
rgw_zonemaster: false
rgw_zonesecondary: true
rgw_zonegroupmaster: false
rgw_realm: 'france'
rgw_zonegroup: 'zonegroup-france'
rgw_zone: paris-00
radosgw_address: "{{ _radosgw_address }}"
radosgw_frontend_port: 8080
rgw_zone_user: jacques.chirac
rgw_zone_user_display_name: "Jacques Chirac"
system_access_key: P9Eb6S8XNyo4dtZZUUMy
system_secret_key: qqHCUtfdNnpHq3PZRHW5un9l0bEBM812Uhow0XfB
endpoint: http://192.168.101.12:8080
```
Basically it's now possible to define `rgw_zonemaster`,
`rgw_zonesecondary` and `rgw_zonegroupmaster` at the intsance
level instead of the whole node level.
Also, this commit adds an option `deploy_secondary_zones` (default True)
which can be set to `False` in order to explicitly ask the playbook to
not deploy secondary zones in case where the corresponding endpoint are
not deployed yet.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1915478
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When collocating rgw with either a mon, mgr or osd, switching from
single site to a multisite rgw setup failed because of the handlers
triggered between the ansible play of the collocated daemon and the play
of the rgw. Since the multisite changes are not yet applied the handlers
fail.
The idea here is to ensure we run the multisite configuration from the
ceph-handler role before the restart happens, this way it won't complain
because of non existing multisite configuration.
(Note: this is also valid when simply changing a multisite configuration)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1888630
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When creating a new pool, target_size_ratio was ignored by ansible module ceph_pool.py.
target_size_ratio is now used when pg_autoscale_mode is on.
Tests added to library tests.
This adds too the use in the role ceph-rgw.
Signed-off-by: Fabien Brachere <fabien.brachere@celeste.fr>
81233dd introduced a regression with the ceph_ec_profile module call in
the ceph-rgw role due the missing cluster module parameter.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This adds ceph_crush_rule ansible module for replacing the command
module usage with the ceph osd crush rule commands.
This module can manage both erasure and replicated crush rules.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
As of podman 2.0.5, `--ignore` param conflicts with `--storage`.
```
Nov 30 13:53:10 magna089 podman[164443]: Error: --storage conflicts with --volumes, --all, --latest, --ignore and --cidfile
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This is a workaround to avoid error like following:
```
Error: error creating container storage: the container name "ceph-mgr-magna022" is already in use by "4a5f674e113f837a0cc561dea5d2cd55d16ca159a647b7794ab06c4c276ef701"
```
that doesn't seem to be 100% reproducible but it shows up after a
reboot. The only workaround we came up with at the moment is to run
`podman rm --storage <container>` before starting it.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1887716
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Instead of using ceph auth get command via the ansible command module
then we can use the ceph_key module and the info state.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since we've changed to podman configuration using the detach mode and
systemd type to forking then the container logs aren't present in the
journald anymore.
The default conmon log driver is using k8s-file.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1890439
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Make sure the `site.yml.sample` playbook can be run in check mode by skipping
tasks that try to read the output of commands that have been skipped.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
This adds radosgw_zone ansible module for replacing the command module
usage with the radosgw-admin zone command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This adds radosgw_zonegroup ansible module for replacing the command
module usage with the radosgw-admin zonegroup command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This adds radosgw_realm ansible module for replacing the command module
usage with the radosgw-admin realm command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This adds radosgw_user ansible module for replacing the command module
usage with the radosgw-admin user command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When rgw and osd are collocated, the current workflow prevents from
scaling out the radosgw_num_instances parameter when rerunning the
playbook in baremetal deployments.
When ceph-osd notifies handlers, it means rgw handlers are triggered
too. The issue with this is that they are triggered before the role
ceph-rgw is run.
In the case a scaleout operation is expected on `radosgw_num_instances`
it causes an issue because keyrings haven't been created yet so the new
instances won't start.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881313
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit adds connection checks before realm pulls
Curls are performed on the endpoint being pulled from
the mons and the rgws
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1731158
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Otherwise this will generate an ansible warning about the missing
filter.
[DEPRECATION WARNING]: evaluating xxx as a bare variable, this behaviour
will go away and you might need to add |bool to the expression in the
future.
Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will
be removed in version 2.12.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since [1] we can use the ceph_pool module instead of using the command
module combined with ceph osd pool commands.
[1] bddcb439ce
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We already support specifiying a custom crush rule during pool creation
in ceph-osd role but not in ceph-rgw role.
This patch adds the missing code to implement this feature.
Note this is only available for replicated pool not erasure. The rule
must also exist prior the pool creation.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1855439
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
In case of failure, the systemd ExecStop isn't executed so the container
isn't removed. After a reboot of a failed node, the container doesn't
start because the old container is still present in created state.
We should always try to remove the container in ExecStartPre for this
situation.
A normal reboot doesn't trigger this issue and this also doesn't affect
nodes running containers via docker.
This behaviour was introduced by d43769d.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1858865
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When rgw and osd are collocated, the current workflow prevents from
scaling out the radosgw_num_instances parameter when rerunning the
playbook.
The environment file used in the rgw systemd template is rendered when
executing the `ceph-rgw` role but during a new run of the playbook (in
order to scale out rgw instances), handlers are triggered from `ceph-osd`
role which is run before `ceph-rgw`, therefore it tries to start the new
rgw daemon whereas its corresponding environment file hasn't been
rendered yet and fails like following:
```
ceph-radosgw@rgw.ceph4osd3.rgw1.service failed to run 'start-pre' task: No such file or directory
```
This commit moves the tasks generating this file in `ceph-config` role
so it is generated early.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1851906
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This changes the way we are running the podman containers via systemd.
They are now in dettached mode and Type/PIDFile set.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1834974
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using docker container engine then the systemd unit scripts only
use a dependency on the docker daemon via the After parameter.
But if docker is restarted on a live system then the ceph systemd units
should wait for the docker daemon to be fully restarted.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846830
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We were only adding the endpoints to the master zone but not to the
zonegroup.
This patch fixes the issue.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1839228
Signed-off-by: Ali Maredia <amaredia@redhat.com>
It is common to set templated pool names in `rgw_create_pools`, e.g.
```yaml
rgw_create_pools:
"{{ rgw_zone }}.rgw.buckets.index":
pg_num: 16
size: 3
type: replicated
```
This worked fine with Ansible 2.8, but broke in Ansible 2.9 due to a change in
the way `with_dict` works [1].
This commit replaces the use of `with_dict` with
```yaml
loop: "{{ rgw_create_pools | dict2items }}"
```
which works as intended and expands the template in the pool name.
[1]: https://docs.ansible.com/ansible/latest/porting_guides/porting_guide_2.9.html#loopsCloses#5348
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>