When osd_auto_discovery is true the `devices` var will be empty (as the disks have holders).
Also in general there is no need to check for devices to list the devices with ceph-volume as we have `default({})` on the stdout in `num_osds` set fact in the next task
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
this should be set when rolling_update is true as well, otherwise, it will reset to default on the upgrade
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
in order to avoid the following error:
```
multiple RX peers are not currently supported
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2037646
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This `run_once: true` breaks multiple rbd-mirror daemons support
as it would make all rbd-mirror daemons use the same keyring.
Each rbd-mirror daemon needs its own keyring in order to start.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2037646
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
add cluster name to CEPH_ARGS env var to support custom cluster name
this can change as an arg to ceph-crash when https://github.com/ceph/ceph/pull/47836 released
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
If `osd_memory_target` is set in group_vars, the default value (4Gb)
should be overridden.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2118544
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When 'osd_memory_target' is overridden in ceph_conf_overrides.
The task that sets the fact `osd_memory_target` in the ceph-config role
should be skipped.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2056675#c11
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
"set_fact container_run_cmd" is not set when using --limit on MDS as facts
were not run on first MON.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2111017
Signed-off-by: Teoman ONAY <tonay@redhat.com>
Add missing `--cluster {{ cluster }}` on task
`set osd_memory_target` in the main.yml file of the
ceph-config role.
Also it moves the task after ceph configuration file is actually written.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Due to some changes [1] in nfs-ganesha-4, we now have to use `/var/run/ganesha/ganesha.pid`
[1] 52e15c30d0
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
use `include_tasks` instead of `import_tasks`.
Given that with `import_tasks` statements are preprocessed
and the tasks that defines it hasn't been run yet, it will fail
and complain like following:
```
The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute '_interface'
```
Using `include_tasks` instead fixes this.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit makes podman bindmount `/:/rootfs:ro` so the container can
collect data from the host.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2028775
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit fixes templating error that occurs when using auto osd discovery. Getting the len before converting the result to a list causes "object of type generator has no len()" error.
Signed-off-by: pinotelio <ahmadreza.mollapour@gmail.com>
Since the ISO install method removal, ceph-ansible isn't able
to detect wheter the user is deploying in a 'disconnected environment'.
By the way, given that ceph-ansible is available only for upgrading to RHCS 5,
this check doesn't make sense anymore, let's drop it.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2062147
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When deploying with --skip-tags=package-install (when there is no access to a repository), the playbook is still trying to update the package cache, which makes the playbook fail.
This change prevents the playbook to try to update the cache when the package-install tag is skipped.
Signed-off-by: Florent CARLI <florent.carli@rte-france.com>
When running the playbook with `--limit`, if the play targeted doesn't match
hosts present in the mgr group the playbook can fail.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2063029
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Initially MONs and RGW binded /etc/pki/ca-trust/extracted using the :z flag
(introduced to solve an OSP TripleO issue on RHEL - #3638) but using
this flag prevents local services (like sssd) running on the host from accessing
the certificates/files in that folder.
Signed-off-by: Teoman ONAY <tonay@redhat.com>
When using group of group, the playbook will apply undesired
labels on nodes.
This commit fixes it by applying only the expected labels.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2057528
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
By default cephadm uses root account to connect remotely
to other nodes in the cluster. This change allows to choose
another account.
This commit also allows to use a dedicated subnet for cephadm mgmt.
Signed-off-by: Teoman ONAY <tonay@redhat.com>
This construct doesn't work as intended since ansible/ansible#74212:
```
item.stdout | default('{}') | from_json
```
That PR made the `command` module return `stdout` even in check mode (setting
it to the empty string), so `default()` has no effect in that case and
`from_json()` fails to parse an empty string.
Instead, `default()` needs to be invoked with its second argument set to
`True`, so that it replaces any `False` value (such as an empty string) with
its first argument:
```
item.stdout | default('{}', True) | from_json
```
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Set a default value for `item.stdout` before passing it to `from_json()`. The
`when` condition doesn't prevent this template from being evaluated in check
mode, so it fails if `item.stdout` doesn't contain a valid JSON string.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
This construct doesn't work as intended since ansible/ansible#74212:
```
ceph_osd_ids.stdout | default('{}') | from_json
```
That PR made the `command` module return `stdout` even in check mode (setting
it to the empty string), so `default()` has no effect in that case and
`from_json()` fails to parse an empty string.
Instead, `default()` needs to be invoked with its second argument set to
`True`, so that it replaces any `False` value (such as an empty string) with
its first argument:
```
ceph_osd_ids.stdout | default('{}', True) | from_json
```
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
When installing grafana plugins, the container will make http requests.
This requires http proxy otherwise installation cannot be performed. Passed
the proxy vars from all.yml as env args.
Fixes: ceph#6484, ceph#6481
Signed-off-by: John Karasev <john.karasev@intel.com>
In order to reduce need of module
internal maintenance and to join forces on plugin development,
it's proposed to switch to using upstream version of
config_template module.
As it's shipped as collection, it's installation for end-users
is trivial and aligns with general approach of shipping extra modules.
Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@ya.ru>
When running in check mode with one or more Ceph daemons that need to be
restarted, the `tmpdirpath.path` variable that several handlers rely on is
undefined, leading to fatal errors.
This commit ensures the tasks that require `tmpdirpath.path` are skipped when
it's undefined.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Update `After=` and `Wants=` parameters in container systemd units
and make them be aligned with the systemd units that come
from the packaging.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2027440
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The pools created by `ceph-rgw` (listed in `rgw_create_pools`) now support a
`ec_crush_device_class` option to specify which device class the EC pool should
use.
It default to being omitted, which means it will use OSDs from any device class
by default.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
since a variable encrypted with vault is no longer a string but a
encrypted object we can't use the filter | length, we have to convert it
to a string before.
Fixes: #6991
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Change needed in order to support --limit on mon nodes.
Otherwise, a call to `hostvars[groups[mon_group_name][0]]['_current_monitor_address']`
throws an error:
```
"The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute '_current_monitor_address'"
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304#c28
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
ceph_stable_release is a legacy from the time where a single branch of ceph-ansible supported more than one release of ceph
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
Unconfigured dashboard features can lead to empty tabs in the dashboard
containing no meaningful content. Allow users to disable dashboard features
they know will not be used.
A list of features to be disabled allows the user to define a streamlined
dashboard as standard across deployments. Defaults to disabling no features,
ensuring that users are sure they do not need the dashboard feature before
disabling it.
Signed-off-by: Alex Lambert <lamberta@microsoft.com>
After pacific release, ceph-volume has its own package.
ceph-ansible has to explicitly install it on osd nodes.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Because the ceph container images are now only pushed to the quay.io
registry then this updates the default registry value.
The docker.io registry can still be used but doesn't receive updated
container images.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We don't pull the monitoring container images (alertmanager, prometheus,
node-exporter and grafana) in a dedicated task like we're doing for the
ceph container image.
This means that the container image pull is done during the start of the
systemd service.
By doing this, pulling the image behind a proxy isn't working with podman.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1995574
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
It restricts access to the iSCSI API.
It can be left empty if the API isn't going to be access from outside the
gateway node
Even though this seems to be a limited use case, it's better to leave it
empty by default than having a meaningless default value.
We could make this variable mandatory but that would be a breaking
change. Let's just add a logic in the template in order to set this
variable in the configuration file only if it was specified by users.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1994930
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
This adds ceph-*.target systemd unit files support for containerized
deployments.
This also fixes a regression introduced by PR #6719 (rgw and nfs systemd
units not getting purged)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1962748
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since [1] multiple ceph dashboard commands have been removed and this is
breaking the current ceph-ansible dashboard with RGW automation.
This removes the following dashboard rgw commands:
- ceph dashboard set-rgw-api-access-key
- ceph dashboard set-rgw-api-secret-key
- ceph dashboard set-rgw-api-host
- ceph dashboard set-rgw-api-port
- ceph dashboard set-rgw-api-scheme
Which are replaced by `ceph dashboard set-rgw-credentials`
The RGW user creation task is also removed.
Finally moving the delegate_to statement from the rgw tasks at the block
level.
[1] https://github.com/ceph/ceph/pull/42252
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
With OpenSSL version prior 1.1.1 (like CentOS 7 with 1.0.2k), the -addext
doesn't exist.
As a solution, this uses the default openssl.cnf configuration file as a
template and add the subjectAltName in the v3_ca section. This temp openssl
configuration file is removed after the TLS certificate creation.
This patch also move the run_once statement at the block level.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978869
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
the current way the variable is built results in:
```
2021-08-03 04:18:23,020 - ceph.ceph - INFO - ok: [ceph-sangadi-4x-indpt6-node1-installer] => changed=false
ansible_facts:
subj_alt_names: |-
subjectAltName=ceph-sangadi-4x-indpt6-node1-installer/subjectAltName=10.0.210.223/subjectAltName=ceph-sangadi-4x-indpt6-node1-installersubjectAltName=ceph-sangadi-4x-indpt6-node2/subjectAltName=10.0.210.252/subjectAltName=ceph-sangadi-4x-indpt6-node2/
```
which is incorrect.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978869
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
sufficient for the default value (512) of rgw thread pool size.
But if its value is increased near to the pids-limit value,
it does not leave place for the other processes to spawn and run within
the container and the container crashes.
pids-limit set to unlimited regardless of the container engine.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987041
Signed-off-by: Teoman ONAY <tonay@redhat.com>
radosgw_civetweb_xxx variables are legacy variables and users should
have switched to radosgw_frontend_xxx variables instead.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The ceph osd pool ls detail command is a subset of the ceph osd dump
command.
$ ceph osd dump --format json|wc -c
10117
$ ceph osd pool ls detail --format json|wc -c
4740
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Run the Ceph commands that only gather information (without making any changes
to the cluster) when running Ansible in check mode.
This allows the tasks that depend on the variables set by those tasks to
succeed in check mode.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
The radosgw-sync-overview and rbd-details grafana dashboars were missing
from the list.
Closes: #6758
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using self-signed/untrusted CA certificates, alertmanager displays
an error in logs. With this commit this should make those messages
disappear.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1936299
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When the rgw_multisite_proto variable is set to https then we shoudn't use
the IP address in the zone endpoints list but the node FQDN to match the
TLS certificate CN.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1965504
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using python 2 and the task with a loop is skipped then it generates
an error.
Unexpected templating type error occurred on
({{ (pool_list.stdout | from_json)['pools'] }}): expected string or buffer
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The PG autoscaler can disrupt the PG checks so the idea here is to
disable it and re-enable it back after the restart is done.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Populating the ceph_mgr_modules list in the mgr_modules doesn't make sense
since that file is only executed if the list isn't empty or we're using the
dashboard.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We already have config override variables for existing block (like
ganesha_ceph_export_overrides, ganesha_log_overrides, etc...) or a
global one (ganesha_conf_overrides) but redefining the NFS_CORE_PARAM
block in that variable will erase all previous values (currently only
Bind_Addr).
ganesha_core_param_overrides: |
Enable_UDP = false;
NFS_Port = 2050;
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1941775
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When deploying dashboard with ssl certificates generated by
ceph-ansible, we enforce the CN to 'ceph-dashboard' which can makes
application such alertmanager complain like following:
`err="Post https://mgr0:8443/api/prometheus_receiver: x509: certificate is valid for ceph-dashboard, not mgr0" context_err="context deadline exceeded"`
The idea here is to add alternative names matching all mgr/mon instances
in the certificate so this error won't appear in logs.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978869
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This introduces a new variable `dashboard_network` in order to support
deploying the dashboard on a different subnet.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1927574
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Instead of reusing the condition 'inventory_hostname in groups[osds]'
on each device facts tasks then we can move all the tasks into a
dedicated file and set the condition on the import_tasks statement.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We currently don't check if the logical volume used in lvm_volumes list
for either bluestore data/db/wal or filestore data/journal exist.
We're only doing this on raw devices for batch scenario.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using dedicated devices for db/journal/wal objecstore with
ceph-volume lvm batch then we should also validate that those devices
exist and don't use a gpt partition table in addition of the devices
and lvm_volume.data variables.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Instead of using findmnt command to find the device associated to the
root mount point then we can use the ansible_mounts fact.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Instead of doing two parted calls we can check first if the device exist
and then test the partition table.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2888c08 introduced a regression as the check_devices tasks file was
only included based on the devices variable.
But that file also validate some devices from the lvm_volumes variable.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1906022
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
All ceph daemons need to have the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
environment variable set to 128MB by default in container setup.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970913
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Starting RHCS 5, there's no ISO available anymore.
This removes all ISO variables and the ceph_repository_type variable.
Closes: #6626
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
It was requested for us to update our alerting definitions to include a
slow OSD Ops health check.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1951664
Signed-off-by: Boris Ranto <branto@redhat.com>
When running the switch-to-containers playbook with multisite enabled,
the fact "rgw_instances" is only set for the node being processed
(serial: 1), the consequence of that is that the set_fact of
'rgw_instances_all' can't iterate over all rgw node in order to look up
each 'rgw_instances_host'.
Adding a condition checking whether hostvars[item]["rgw_instances_host"]
is defined fixes this issue.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967926
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Enabling lvmetad in containerized deployments on el7 based OS might
cause issues.
This commit make it possible to disable this service if needed.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1955040
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When calling the `ceph_key` module with `state: info`, if the ceph
command called fails, the actual error is hidden by the module which
makes it pretty difficult to troubleshoot.
The current code always states that if rc is not equal to 0 the keyring
doesn't exist.
`state: info` should always return the actual rc, stdout and stderr.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1964889
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When monitors and rgw are collocated with multisite enabled, the
rolling_update playbook fails because during the workflow, we run some
radosgw-admin commands very early on the first mon even though this is
the monitor being upgraded, it means the container doesn't exist since
it was stopped.
This block is relevant only for scaling out rgw daemons or initial
deployment. In rolling_update workflow, it is not needed so let's skip
it.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970232
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>