When running in check mode with one or more Ceph daemons that need to be
restarted, the `tmpdirpath.path` variable that several handlers rely on is
undefined, leading to fatal errors.
This commit ensures the tasks that require `tmpdirpath.path` are skipped when
it's undefined.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Update `After=` and `Wants=` parameters in container systemd units
and make them be aligned with the systemd units that come
from the packaging.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2027440
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The pools created by `ceph-rgw` (listed in `rgw_create_pools`) now support a
`ec_crush_device_class` option to specify which device class the EC pool should
use.
It default to being omitted, which means it will use OSDs from any device class
by default.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
since a variable encrypted with vault is no longer a string but a
encrypted object we can't use the filter | length, we have to convert it
to a string before.
Fixes: #6991
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Change needed in order to support --limit on mon nodes.
Otherwise, a call to `hostvars[groups[mon_group_name][0]]['_current_monitor_address']`
throws an error:
```
"The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute '_current_monitor_address'"
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304#c28
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
ceph_stable_release is a legacy from the time where a single branch of ceph-ansible supported more than one release of ceph
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
Unconfigured dashboard features can lead to empty tabs in the dashboard
containing no meaningful content. Allow users to disable dashboard features
they know will not be used.
A list of features to be disabled allows the user to define a streamlined
dashboard as standard across deployments. Defaults to disabling no features,
ensuring that users are sure they do not need the dashboard feature before
disabling it.
Signed-off-by: Alex Lambert <lamberta@microsoft.com>
After pacific release, ceph-volume has its own package.
ceph-ansible has to explicitly install it on osd nodes.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Because the ceph container images are now only pushed to the quay.io
registry then this updates the default registry value.
The docker.io registry can still be used but doesn't receive updated
container images.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We don't pull the monitoring container images (alertmanager, prometheus,
node-exporter and grafana) in a dedicated task like we're doing for the
ceph container image.
This means that the container image pull is done during the start of the
systemd service.
By doing this, pulling the image behind a proxy isn't working with podman.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1995574
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
It restricts access to the iSCSI API.
It can be left empty if the API isn't going to be access from outside the
gateway node
Even though this seems to be a limited use case, it's better to leave it
empty by default than having a meaningless default value.
We could make this variable mandatory but that would be a breaking
change. Let's just add a logic in the template in order to set this
variable in the configuration file only if it was specified by users.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1994930
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
This adds ceph-*.target systemd unit files support for containerized
deployments.
This also fixes a regression introduced by PR #6719 (rgw and nfs systemd
units not getting purged)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1962748
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since [1] multiple ceph dashboard commands have been removed and this is
breaking the current ceph-ansible dashboard with RGW automation.
This removes the following dashboard rgw commands:
- ceph dashboard set-rgw-api-access-key
- ceph dashboard set-rgw-api-secret-key
- ceph dashboard set-rgw-api-host
- ceph dashboard set-rgw-api-port
- ceph dashboard set-rgw-api-scheme
Which are replaced by `ceph dashboard set-rgw-credentials`
The RGW user creation task is also removed.
Finally moving the delegate_to statement from the rgw tasks at the block
level.
[1] https://github.com/ceph/ceph/pull/42252
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
With OpenSSL version prior 1.1.1 (like CentOS 7 with 1.0.2k), the -addext
doesn't exist.
As a solution, this uses the default openssl.cnf configuration file as a
template and add the subjectAltName in the v3_ca section. This temp openssl
configuration file is removed after the TLS certificate creation.
This patch also move the run_once statement at the block level.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978869
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
the current way the variable is built results in:
```
2021-08-03 04:18:23,020 - ceph.ceph - INFO - ok: [ceph-sangadi-4x-indpt6-node1-installer] => changed=false
ansible_facts:
subj_alt_names: |-
subjectAltName=ceph-sangadi-4x-indpt6-node1-installer/subjectAltName=10.0.210.223/subjectAltName=ceph-sangadi-4x-indpt6-node1-installersubjectAltName=ceph-sangadi-4x-indpt6-node2/subjectAltName=10.0.210.252/subjectAltName=ceph-sangadi-4x-indpt6-node2/
```
which is incorrect.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978869
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
sufficient for the default value (512) of rgw thread pool size.
But if its value is increased near to the pids-limit value,
it does not leave place for the other processes to spawn and run within
the container and the container crashes.
pids-limit set to unlimited regardless of the container engine.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987041
Signed-off-by: Teoman ONAY <tonay@redhat.com>
radosgw_civetweb_xxx variables are legacy variables and users should
have switched to radosgw_frontend_xxx variables instead.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The ceph osd pool ls detail command is a subset of the ceph osd dump
command.
$ ceph osd dump --format json|wc -c
10117
$ ceph osd pool ls detail --format json|wc -c
4740
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Run the Ceph commands that only gather information (without making any changes
to the cluster) when running Ansible in check mode.
This allows the tasks that depend on the variables set by those tasks to
succeed in check mode.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
The radosgw-sync-overview and rbd-details grafana dashboars were missing
from the list.
Closes: #6758
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using self-signed/untrusted CA certificates, alertmanager displays
an error in logs. With this commit this should make those messages
disappear.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1936299
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When the rgw_multisite_proto variable is set to https then we shoudn't use
the IP address in the zone endpoints list but the node FQDN to match the
TLS certificate CN.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1965504
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using python 2 and the task with a loop is skipped then it generates
an error.
Unexpected templating type error occurred on
({{ (pool_list.stdout | from_json)['pools'] }}): expected string or buffer
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The PG autoscaler can disrupt the PG checks so the idea here is to
disable it and re-enable it back after the restart is done.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Populating the ceph_mgr_modules list in the mgr_modules doesn't make sense
since that file is only executed if the list isn't empty or we're using the
dashboard.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We already have config override variables for existing block (like
ganesha_ceph_export_overrides, ganesha_log_overrides, etc...) or a
global one (ganesha_conf_overrides) but redefining the NFS_CORE_PARAM
block in that variable will erase all previous values (currently only
Bind_Addr).
ganesha_core_param_overrides: |
Enable_UDP = false;
NFS_Port = 2050;
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1941775
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When deploying dashboard with ssl certificates generated by
ceph-ansible, we enforce the CN to 'ceph-dashboard' which can makes
application such alertmanager complain like following:
`err="Post https://mgr0:8443/api/prometheus_receiver: x509: certificate is valid for ceph-dashboard, not mgr0" context_err="context deadline exceeded"`
The idea here is to add alternative names matching all mgr/mon instances
in the certificate so this error won't appear in logs.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978869
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This introduces a new variable `dashboard_network` in order to support
deploying the dashboard on a different subnet.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1927574
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Instead of reusing the condition 'inventory_hostname in groups[osds]'
on each device facts tasks then we can move all the tasks into a
dedicated file and set the condition on the import_tasks statement.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We currently don't check if the logical volume used in lvm_volumes list
for either bluestore data/db/wal or filestore data/journal exist.
We're only doing this on raw devices for batch scenario.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using dedicated devices for db/journal/wal objecstore with
ceph-volume lvm batch then we should also validate that those devices
exist and don't use a gpt partition table in addition of the devices
and lvm_volume.data variables.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Instead of using findmnt command to find the device associated to the
root mount point then we can use the ansible_mounts fact.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Instead of doing two parted calls we can check first if the device exist
and then test the partition table.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2888c08 introduced a regression as the check_devices tasks file was
only included based on the devices variable.
But that file also validate some devices from the lvm_volumes variable.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1906022
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
All ceph daemons need to have the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
environment variable set to 128MB by default in container setup.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970913
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Starting RHCS 5, there's no ISO available anymore.
This removes all ISO variables and the ceph_repository_type variable.
Closes: #6626
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
It was requested for us to update our alerting definitions to include a
slow OSD Ops health check.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1951664
Signed-off-by: Boris Ranto <branto@redhat.com>
When running the switch-to-containers playbook with multisite enabled,
the fact "rgw_instances" is only set for the node being processed
(serial: 1), the consequence of that is that the set_fact of
'rgw_instances_all' can't iterate over all rgw node in order to look up
each 'rgw_instances_host'.
Adding a condition checking whether hostvars[item]["rgw_instances_host"]
is defined fixes this issue.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967926
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Enabling lvmetad in containerized deployments on el7 based OS might
cause issues.
This commit make it possible to disable this service if needed.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1955040
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When calling the `ceph_key` module with `state: info`, if the ceph
command called fails, the actual error is hidden by the module which
makes it pretty difficult to troubleshoot.
The current code always states that if rc is not equal to 0 the keyring
doesn't exist.
`state: info` should always return the actual rc, stdout and stderr.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1964889
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When monitors and rgw are collocated with multisite enabled, the
rolling_update playbook fails because during the workflow, we run some
radosgw-admin commands very early on the first mon even though this is
the monitor being upgraded, it means the container doesn't exist since
it was stopped.
This block is relevant only for scaling out rgw daemons or initial
deployment. In rolling_update workflow, it is not needed so let's skip
it.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970232
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When osd nodes are collocated in the clients group (HCI context for
instance), the current logic will exclude osd nodes since they are
present in the client group.
The best fix would be to exclude clients node only when they are not
member of another group but for now, as a workaround, we can enforce
the addition of osd nodes to fix this specific case.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1947695
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since we need to revert 33bfb10, this is an alternative to initial approach.
We can avoid maintaining this file since it is present in container
image. The idea is to simply get it from the image container and write
it to the host.
Fixes: #6501
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The pg_autoscale_mode for rgw pools introduced in 9f03a52 was wrong
and was missing a `value` keyword because `rgw_create_pools` is a
dict.
Fixes: #6516
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When dashboard_frontend_vip is provided, all the services should be
configured using the related VIP. A new VIP variable is added for
both prometheus and alertmanager: we're already able to properly
config the grafana vip using dashboard_frontend_vip variable.
This change adds the same variable for both prometheus and
alertmanager.
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
We should consider bumping ansible version for future releases, so let's
start testing against ansible 2.10
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The `set_fact rgw_ports` task was failing due to a templating error, because
`hostvars[item].rgw_instances` is a list, but it was treated as if it was a
dictionary.
Another issue was the fact that the `unique` filter only applied to the list
being appended to `rgw_ports` instead of the entire list, which means it was
possible to have duplicate items.
Lastly, `rgw_ports` would have been a list of integers, but the `seport` module
expects a list of strings.
This commit fixes all of the issues above, allowing the `ceph-rgw-loadbalancer`
role to work on systems with SELinux enabled.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
This adds a `ExecStartPre=-/usr/bin/mkdir -p /var/log/ceph` in all
systemd service templates for all ceph daemon.
This is specific to RHCS after a Leapp upgrade is done. Indeed, the
`/var/log/ceph` seems to be removed after the upgrade.
In order to work around this issue let's ensure the directory is present
before trying to start the containers with podman.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1949489
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
`configure_mirroring.yml` is called right after the daemon is started.
Sometimes, it can happen the first task in `configure_mirroring.yml` is
run while the daemon isn't yet ready, adding a retries/until on that
task should help to avoid causing the playbook to fail.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1944996
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
when running docker-to-podman playbook, there's no need to call
`ceph-config` and `ceph-rgw` from the role `ceph-handler`.
It can even have side effects when coming from a baremetal cluster that
was previously migrated using the switch-to-containers playbook. Indeed
it might complain about missing .target systemd unit since they are
removed during that migration.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1944999
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This moves some task from the `ceph-nfs` role in `ceph-common` since
some of them are needed in `ceph-rgwloadbalancer` role.
This avoids duplicated tasks.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This adds all rgw ports to the http_port_t selinux type so it
allows haproxy to connect to those ports in order to avoid AVC.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1923890
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
haproxy gets an AVC when configured to connect to port 8081
This commit adds a snippet regarding haproxy in a selinux environment
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1923890
Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com>
Pass the password variable via stdin for the registry login
authentication.
This allows to remove the no_log statement and see the task output
without displaying the password value.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Currently NFS Ganesha (ceph-nfs) consumes /etc/idmapd.conf, which
controls mapping of user/owner identities under NFSv4+. With
containerized service deployment, this file is an immutable part of the
container image and cannot be modified.
Here we provide group variables, and a taskk and templates for the
ceph-nfs role, to set the path of the idmap configuration file and
to make the most common adjustment to the contents of that file --
namely to set the 'Domain'. We default the path to /etc/ganesha/idmap.conf
so that we will not conflict with /etc/idmapd.conf on the controller nodes
where ganesha runs. NFSv4 clients, as used for example by the Cinder NFS
driver, consume /etc/idmapd.conf and may require different settings than
what is wanted for NFS Ganesha. Additionally, because we already bind
/etc/ganesha from the host into the ceph-nfs container, the file NFS
Ganesha consumes will no longer be an immutable part of the container.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925646
Signed-off-by: Tom Barron tpb@dyncloud.net
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit adds the parameter `--storage.tsdb.retention.time` to the
prometheus systemd unit template.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1928000
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
In commits 39649f0 and bf8cdad we switch from using the shaman /repos endpoint
to the /search endpoint for using the architecture filter.
In fact that filter is also available with the /repos endpoint, which requires
less ansible tasks.
This also adds back a condition remove in 5801171 on the ceph-iscsi
repository and that repository doesn't need to filter on the architecture
because the ceph-iscsi project is noarch.
Both ceph-iscsi and tcmu-runner shaman URLs were using the ceph_dev_branch
and ceph_dev_sha1 variables which doesn't make sense. Those variables are
only useful for the ceph core repository.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This adds the possibility to deploy the dashboard with igw nodes using
a dedicated subnet.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1926170
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
rbd-mirroring is not configured as adding peer is getting skipped.
Peer addition should not get skipped if its not added already
Closes - https://bugzilla.redhat.com/show_bug.cgi?id=1942444
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
As a continuation of a7f2fa73e6, this
change switches fact injection to off by default in the provided
ansible.cfg.
Signed-off-by: Alex Schultz <aschultz@redhat.com>
due to recent changes in shaman, we must fetch the right repo by
filtering on the desired architecture.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We can end up with an arm only repo unless we are specific about the
architecture we require. Brings the deb code in line with the rpm
equivalent.
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
when the group `_filtered_clients` is built, the order can change from
the original `clients` group which can cause issues since we run
`ceph-container-engine` on the first client only. It means later in the
playbook we can make call to the container CLI on a node where the
container engine wasn't installed.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
this updates the `ceph_repository_community` check in `ceph-validate`
with the right ceph release expected.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit updates the default version of nfs-ganesha to V3.5 which is the
latest version available upstream.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When collocating OSDs with other daemon, `num_osds` is incorrectly calculated
because `ceph-config` is called multiple times.
Indeed, the following code:
```
num_osds: "{{ lvm_list.stdout | default('{}') | from_json | length | int + num_osds | default(0) | int }}"
```
makes `num_osds` be incremented each time `ceph-config` is called.
We have to reset it in order to get the correct number of expected OSDs.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This adds the realm pull operation to the current radosgw_realm module.
The pull operation requires the url, access/secret key variables.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Skip the `get initial keyring when it already exists` task when both commands
whose `stdout` output it requires have been skipped (e.g. when running in check
mode).
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Sometimes it's useful to be able to skip the OSD creation step when
running ceph-ansible (cf #1777). The lvm scenario has a prepare_osd
tag on the relevant play. This commit adds the same tag to the
lvm-batch scenario.
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
It has come to our attention that using ansible_* vars that are
populated with INJECT_FACTS_AS_VARS=True is not very performant. In
order to be able to support setting that to off, we need to update the
references to use ansible_facts[<thing>] instead of ansible_<thing>.
Related: ansible#73654
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406
Signed-off-by: Alex Schultz <aschultz@redhat.com>
http://docs.ceph.com/docs/nautilus/radosgw/frontends/ 404s so replace
it with a working "latest" docs link, and correct the spelling of
"additional" while I'm at it.
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
the `ceph_cmd` fact is missing the `--net=host` parameter.
Some tasks consuming this fact can fail like following:
```
Error: error configuring network namespace for container b8ec913db1fb694ae683faf202680de7a59c714a004e533aba87e8503d29261f: Missing CNI default network
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1931365
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The monitoring node running grafana needs the rhcs tools repostory
enabled in non containerized deployment to be able to install the
ceph-grafana-dashboards rpm package.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
if `rgw_zonegroupmaster` is not defined at the rgw instance level in
`rgw_instances` it will fallback to a wrong variable (`rgw_zonemaster`).
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925247
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>