2888c08 introduced a regression as the check_devices tasks file was
only included based on the devices variable.
But that file also validate some devices from the lvm_volumes variable.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1906022
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
All ceph daemons need to have the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
environment variable set to 128MB by default in container setup.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970913
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Starting RHCS 5, there's no ISO available anymore.
This removes all ISO variables and the ceph_repository_type variable.
Closes: #6626
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
It was requested for us to update our alerting definitions to include a
slow OSD Ops health check.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1951664
Signed-off-by: Boris Ranto <branto@redhat.com>
When running the switch-to-containers playbook with multisite enabled,
the fact "rgw_instances" is only set for the node being processed
(serial: 1), the consequence of that is that the set_fact of
'rgw_instances_all' can't iterate over all rgw node in order to look up
each 'rgw_instances_host'.
Adding a condition checking whether hostvars[item]["rgw_instances_host"]
is defined fixes this issue.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967926
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Enabling lvmetad in containerized deployments on el7 based OS might
cause issues.
This commit make it possible to disable this service if needed.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1955040
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When calling the `ceph_key` module with `state: info`, if the ceph
command called fails, the actual error is hidden by the module which
makes it pretty difficult to troubleshoot.
The current code always states that if rc is not equal to 0 the keyring
doesn't exist.
`state: info` should always return the actual rc, stdout and stderr.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1964889
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When monitors and rgw are collocated with multisite enabled, the
rolling_update playbook fails because during the workflow, we run some
radosgw-admin commands very early on the first mon even though this is
the monitor being upgraded, it means the container doesn't exist since
it was stopped.
This block is relevant only for scaling out rgw daemons or initial
deployment. In rolling_update workflow, it is not needed so let's skip
it.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970232
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When osd nodes are collocated in the clients group (HCI context for
instance), the current logic will exclude osd nodes since they are
present in the client group.
The best fix would be to exclude clients node only when they are not
member of another group but for now, as a workaround, we can enforce
the addition of osd nodes to fix this specific case.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1947695
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since we need to revert 33bfb10, this is an alternative to initial approach.
We can avoid maintaining this file since it is present in container
image. The idea is to simply get it from the image container and write
it to the host.
Fixes: #6501
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The pg_autoscale_mode for rgw pools introduced in 9f03a52 was wrong
and was missing a `value` keyword because `rgw_create_pools` is a
dict.
Fixes: #6516
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When dashboard_frontend_vip is provided, all the services should be
configured using the related VIP. A new VIP variable is added for
both prometheus and alertmanager: we're already able to properly
config the grafana vip using dashboard_frontend_vip variable.
This change adds the same variable for both prometheus and
alertmanager.
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
We should consider bumping ansible version for future releases, so let's
start testing against ansible 2.10
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The `set_fact rgw_ports` task was failing due to a templating error, because
`hostvars[item].rgw_instances` is a list, but it was treated as if it was a
dictionary.
Another issue was the fact that the `unique` filter only applied to the list
being appended to `rgw_ports` instead of the entire list, which means it was
possible to have duplicate items.
Lastly, `rgw_ports` would have been a list of integers, but the `seport` module
expects a list of strings.
This commit fixes all of the issues above, allowing the `ceph-rgw-loadbalancer`
role to work on systems with SELinux enabled.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
This adds a `ExecStartPre=-/usr/bin/mkdir -p /var/log/ceph` in all
systemd service templates for all ceph daemon.
This is specific to RHCS after a Leapp upgrade is done. Indeed, the
`/var/log/ceph` seems to be removed after the upgrade.
In order to work around this issue let's ensure the directory is present
before trying to start the containers with podman.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1949489
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
`configure_mirroring.yml` is called right after the daemon is started.
Sometimes, it can happen the first task in `configure_mirroring.yml` is
run while the daemon isn't yet ready, adding a retries/until on that
task should help to avoid causing the playbook to fail.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1944996
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
when running docker-to-podman playbook, there's no need to call
`ceph-config` and `ceph-rgw` from the role `ceph-handler`.
It can even have side effects when coming from a baremetal cluster that
was previously migrated using the switch-to-containers playbook. Indeed
it might complain about missing .target systemd unit since they are
removed during that migration.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1944999
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This moves some task from the `ceph-nfs` role in `ceph-common` since
some of them are needed in `ceph-rgwloadbalancer` role.
This avoids duplicated tasks.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This adds all rgw ports to the http_port_t selinux type so it
allows haproxy to connect to those ports in order to avoid AVC.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1923890
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
haproxy gets an AVC when configured to connect to port 8081
This commit adds a snippet regarding haproxy in a selinux environment
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1923890
Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com>
Pass the password variable via stdin for the registry login
authentication.
This allows to remove the no_log statement and see the task output
without displaying the password value.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Currently NFS Ganesha (ceph-nfs) consumes /etc/idmapd.conf, which
controls mapping of user/owner identities under NFSv4+. With
containerized service deployment, this file is an immutable part of the
container image and cannot be modified.
Here we provide group variables, and a taskk and templates for the
ceph-nfs role, to set the path of the idmap configuration file and
to make the most common adjustment to the contents of that file --
namely to set the 'Domain'. We default the path to /etc/ganesha/idmap.conf
so that we will not conflict with /etc/idmapd.conf on the controller nodes
where ganesha runs. NFSv4 clients, as used for example by the Cinder NFS
driver, consume /etc/idmapd.conf and may require different settings than
what is wanted for NFS Ganesha. Additionally, because we already bind
/etc/ganesha from the host into the ceph-nfs container, the file NFS
Ganesha consumes will no longer be an immutable part of the container.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925646
Signed-off-by: Tom Barron tpb@dyncloud.net
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit adds the parameter `--storage.tsdb.retention.time` to the
prometheus systemd unit template.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1928000
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
In commits 39649f0 and bf8cdad we switch from using the shaman /repos endpoint
to the /search endpoint for using the architecture filter.
In fact that filter is also available with the /repos endpoint, which requires
less ansible tasks.
This also adds back a condition remove in 5801171 on the ceph-iscsi
repository and that repository doesn't need to filter on the architecture
because the ceph-iscsi project is noarch.
Both ceph-iscsi and tcmu-runner shaman URLs were using the ceph_dev_branch
and ceph_dev_sha1 variables which doesn't make sense. Those variables are
only useful for the ceph core repository.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This adds the possibility to deploy the dashboard with igw nodes using
a dedicated subnet.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1926170
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
rbd-mirroring is not configured as adding peer is getting skipped.
Peer addition should not get skipped if its not added already
Closes - https://bugzilla.redhat.com/show_bug.cgi?id=1942444
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
As a continuation of a7f2fa73e6, this
change switches fact injection to off by default in the provided
ansible.cfg.
Signed-off-by: Alex Schultz <aschultz@redhat.com>
due to recent changes in shaman, we must fetch the right repo by
filtering on the desired architecture.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We can end up with an arm only repo unless we are specific about the
architecture we require. Brings the deb code in line with the rpm
equivalent.
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
when the group `_filtered_clients` is built, the order can change from
the original `clients` group which can cause issues since we run
`ceph-container-engine` on the first client only. It means later in the
playbook we can make call to the container CLI on a node where the
container engine wasn't installed.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
this updates the `ceph_repository_community` check in `ceph-validate`
with the right ceph release expected.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit updates the default version of nfs-ganesha to V3.5 which is the
latest version available upstream.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When collocating OSDs with other daemon, `num_osds` is incorrectly calculated
because `ceph-config` is called multiple times.
Indeed, the following code:
```
num_osds: "{{ lvm_list.stdout | default('{}') | from_json | length | int + num_osds | default(0) | int }}"
```
makes `num_osds` be incremented each time `ceph-config` is called.
We have to reset it in order to get the correct number of expected OSDs.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This adds the realm pull operation to the current radosgw_realm module.
The pull operation requires the url, access/secret key variables.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Skip the `get initial keyring when it already exists` task when both commands
whose `stdout` output it requires have been skipped (e.g. when running in check
mode).
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Sometimes it's useful to be able to skip the OSD creation step when
running ceph-ansible (cf #1777). The lvm scenario has a prepare_osd
tag on the relevant play. This commit adds the same tag to the
lvm-batch scenario.
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
It has come to our attention that using ansible_* vars that are
populated with INJECT_FACTS_AS_VARS=True is not very performant. In
order to be able to support setting that to off, we need to update the
references to use ansible_facts[<thing>] instead of ansible_<thing>.
Related: ansible#73654
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406
Signed-off-by: Alex Schultz <aschultz@redhat.com>