radosgw_civetweb_xxx variables are legacy variables and users should
have switched to radosgw_frontend_xxx variables instead.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The ceph osd pool ls detail command is a subset of the ceph osd dump
command.
$ ceph osd dump --format json|wc -c
10117
$ ceph osd pool ls detail --format json|wc -c
4740
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Run the Ceph commands that only gather information (without making any changes
to the cluster) when running Ansible in check mode.
This allows the tasks that depend on the variables set by those tasks to
succeed in check mode.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
The radosgw-sync-overview and rbd-details grafana dashboars were missing
from the list.
Closes: #6758
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using self-signed/untrusted CA certificates, alertmanager displays
an error in logs. With this commit this should make those messages
disappear.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1936299
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When the rgw_multisite_proto variable is set to https then we shoudn't use
the IP address in the zone endpoints list but the node FQDN to match the
TLS certificate CN.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1965504
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using python 2 and the task with a loop is skipped then it generates
an error.
Unexpected templating type error occurred on
({{ (pool_list.stdout | from_json)['pools'] }}): expected string or buffer
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The PG autoscaler can disrupt the PG checks so the idea here is to
disable it and re-enable it back after the restart is done.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Populating the ceph_mgr_modules list in the mgr_modules doesn't make sense
since that file is only executed if the list isn't empty or we're using the
dashboard.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We already have config override variables for existing block (like
ganesha_ceph_export_overrides, ganesha_log_overrides, etc...) or a
global one (ganesha_conf_overrides) but redefining the NFS_CORE_PARAM
block in that variable will erase all previous values (currently only
Bind_Addr).
ganesha_core_param_overrides: |
Enable_UDP = false;
NFS_Port = 2050;
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1941775
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When deploying dashboard with ssl certificates generated by
ceph-ansible, we enforce the CN to 'ceph-dashboard' which can makes
application such alertmanager complain like following:
`err="Post https://mgr0:8443/api/prometheus_receiver: x509: certificate is valid for ceph-dashboard, not mgr0" context_err="context deadline exceeded"`
The idea here is to add alternative names matching all mgr/mon instances
in the certificate so this error won't appear in logs.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978869
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This introduces a new variable `dashboard_network` in order to support
deploying the dashboard on a different subnet.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1927574
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Instead of reusing the condition 'inventory_hostname in groups[osds]'
on each device facts tasks then we can move all the tasks into a
dedicated file and set the condition on the import_tasks statement.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We currently don't check if the logical volume used in lvm_volumes list
for either bluestore data/db/wal or filestore data/journal exist.
We're only doing this on raw devices for batch scenario.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using dedicated devices for db/journal/wal objecstore with
ceph-volume lvm batch then we should also validate that those devices
exist and don't use a gpt partition table in addition of the devices
and lvm_volume.data variables.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Instead of using findmnt command to find the device associated to the
root mount point then we can use the ansible_mounts fact.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Instead of doing two parted calls we can check first if the device exist
and then test the partition table.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2888c08 introduced a regression as the check_devices tasks file was
only included based on the devices variable.
But that file also validate some devices from the lvm_volumes variable.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1906022
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
All ceph daemons need to have the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
environment variable set to 128MB by default in container setup.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970913
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Starting RHCS 5, there's no ISO available anymore.
This removes all ISO variables and the ceph_repository_type variable.
Closes: #6626
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
It was requested for us to update our alerting definitions to include a
slow OSD Ops health check.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1951664
Signed-off-by: Boris Ranto <branto@redhat.com>
When running the switch-to-containers playbook with multisite enabled,
the fact "rgw_instances" is only set for the node being processed
(serial: 1), the consequence of that is that the set_fact of
'rgw_instances_all' can't iterate over all rgw node in order to look up
each 'rgw_instances_host'.
Adding a condition checking whether hostvars[item]["rgw_instances_host"]
is defined fixes this issue.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967926
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Enabling lvmetad in containerized deployments on el7 based OS might
cause issues.
This commit make it possible to disable this service if needed.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1955040
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When calling the `ceph_key` module with `state: info`, if the ceph
command called fails, the actual error is hidden by the module which
makes it pretty difficult to troubleshoot.
The current code always states that if rc is not equal to 0 the keyring
doesn't exist.
`state: info` should always return the actual rc, stdout and stderr.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1964889
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When monitors and rgw are collocated with multisite enabled, the
rolling_update playbook fails because during the workflow, we run some
radosgw-admin commands very early on the first mon even though this is
the monitor being upgraded, it means the container doesn't exist since
it was stopped.
This block is relevant only for scaling out rgw daemons or initial
deployment. In rolling_update workflow, it is not needed so let's skip
it.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970232
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When osd nodes are collocated in the clients group (HCI context for
instance), the current logic will exclude osd nodes since they are
present in the client group.
The best fix would be to exclude clients node only when they are not
member of another group but for now, as a workaround, we can enforce
the addition of osd nodes to fix this specific case.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1947695
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since we need to revert 33bfb10, this is an alternative to initial approach.
We can avoid maintaining this file since it is present in container
image. The idea is to simply get it from the image container and write
it to the host.
Fixes: #6501
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The pg_autoscale_mode for rgw pools introduced in 9f03a52 was wrong
and was missing a `value` keyword because `rgw_create_pools` is a
dict.
Fixes: #6516
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When dashboard_frontend_vip is provided, all the services should be
configured using the related VIP. A new VIP variable is added for
both prometheus and alertmanager: we're already able to properly
config the grafana vip using dashboard_frontend_vip variable.
This change adds the same variable for both prometheus and
alertmanager.
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
We should consider bumping ansible version for future releases, so let's
start testing against ansible 2.10
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The `set_fact rgw_ports` task was failing due to a templating error, because
`hostvars[item].rgw_instances` is a list, but it was treated as if it was a
dictionary.
Another issue was the fact that the `unique` filter only applied to the list
being appended to `rgw_ports` instead of the entire list, which means it was
possible to have duplicate items.
Lastly, `rgw_ports` would have been a list of integers, but the `seport` module
expects a list of strings.
This commit fixes all of the issues above, allowing the `ceph-rgw-loadbalancer`
role to work on systems with SELinux enabled.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
This adds a `ExecStartPre=-/usr/bin/mkdir -p /var/log/ceph` in all
systemd service templates for all ceph daemon.
This is specific to RHCS after a Leapp upgrade is done. Indeed, the
`/var/log/ceph` seems to be removed after the upgrade.
In order to work around this issue let's ensure the directory is present
before trying to start the containers with podman.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1949489
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
`configure_mirroring.yml` is called right after the daemon is started.
Sometimes, it can happen the first task in `configure_mirroring.yml` is
run while the daemon isn't yet ready, adding a retries/until on that
task should help to avoid causing the playbook to fail.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1944996
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
when running docker-to-podman playbook, there's no need to call
`ceph-config` and `ceph-rgw` from the role `ceph-handler`.
It can even have side effects when coming from a baremetal cluster that
was previously migrated using the switch-to-containers playbook. Indeed
it might complain about missing .target systemd unit since they are
removed during that migration.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1944999
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This moves some task from the `ceph-nfs` role in `ceph-common` since
some of them are needed in `ceph-rgwloadbalancer` role.
This avoids duplicated tasks.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This adds all rgw ports to the http_port_t selinux type so it
allows haproxy to connect to those ports in order to avoid AVC.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1923890
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
haproxy gets an AVC when configured to connect to port 8081
This commit adds a snippet regarding haproxy in a selinux environment
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1923890
Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com>
Pass the password variable via stdin for the registry login
authentication.
This allows to remove the no_log statement and see the task output
without displaying the password value.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Currently NFS Ganesha (ceph-nfs) consumes /etc/idmapd.conf, which
controls mapping of user/owner identities under NFSv4+. With
containerized service deployment, this file is an immutable part of the
container image and cannot be modified.
Here we provide group variables, and a taskk and templates for the
ceph-nfs role, to set the path of the idmap configuration file and
to make the most common adjustment to the contents of that file --
namely to set the 'Domain'. We default the path to /etc/ganesha/idmap.conf
so that we will not conflict with /etc/idmapd.conf on the controller nodes
where ganesha runs. NFSv4 clients, as used for example by the Cinder NFS
driver, consume /etc/idmapd.conf and may require different settings than
what is wanted for NFS Ganesha. Additionally, because we already bind
/etc/ganesha from the host into the ceph-nfs container, the file NFS
Ganesha consumes will no longer be an immutable part of the container.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925646
Signed-off-by: Tom Barron tpb@dyncloud.net
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit adds the parameter `--storage.tsdb.retention.time` to the
prometheus systemd unit template.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1928000
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>