Right now, under certain OS and Ansible versions, ie Rocky Linux and
ansible-core 2.17, `devices_check` variable is getting defined even if
task was skipped.
That results in set_fact to fail, as resulting variable has no `results`
key in it.
Structure of such variable looks like that:
```
"devices_check": {
"changed": false,
"false_condition": "osd_auto_discovery | default(False) | bool",
"skip_reason": "Conditional result was False",
"skipped": true
}
```
Checking for task not being skipped solves such issues.
Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
In some scenarios with NVMe, a device might be identified by
Ansible but could actually be a multipath device rather than an
actual device. We need to exclude these as Ceph cannot create
OSDs on them.
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
this drops the following parameters:
- monitor_address_block
- monitor_interface
- monitor_address
The monitor address will be automatically set from `public_network` parameter.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
* Make common params of container args in a var to avoid duplication
* The /var/lib/ceph/crash mount was missing after 637ca81c9c
* Add CEPH_USE_RANDOM_NONCE as it's needed when running inside container (can be removed for squid later)
* Add NODE_NAME as some part of ceph code relies on this var
* add default logging opts for
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
The current approach is extremely complex and introduced a lot
of spaghetti code. This doesn't offer a good user experience at all.
It's time to think to another approach (dedicated playbook) and drop
the current implementation in order to clean up the code.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
This refactor makes the 'name' argument not mandatory because when
'state' is 'info' we shouldn't need to pass it.
The second change is just a duplicate code removal.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
This adds the resquired changes in order to support
CentOS stream 9.
Also, this bumps the Ansible version support to 2.15
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
With ansible-core 2.15 it is not possible to pass argument of unexpected
type, as otherwise module will fail with:
`'None' is not a string and conversion is not allowed`
With that we want to only get all existing crush rules, so we can simply
supply an empty string as a name argument, which would satisfy
requirements and have same behaviour for previous ansible versions.
Alternative approach would be to stop making `name` as a required
argument to the module and use empty string as default value
when info state is used.
Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
There was multiple rgw frontends entries while there was just one
rgw instance on each host. The other entries were the details from
the other rgw hosts in the cluster
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2232282
Signed-off-by: Teoman ONAY <tonay@ibm.com>
* Exclude device from lvm_volumes while osd_auto_discovery is true
* Sum num_osds on both lvm_volumes and devices list
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
If a disk has a symlink it will be re-added to the devices lists one with resolved path and the other with a defined path.
We can rebuild the list from the readlink output cause readlink always return the correct path for all disks.
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
Exclude disks were defined in dedicated_devices and bluestore_wal_devices on osd_auto_discovery enabled.
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
When the following conditions are met:
- rgw is deployed,
- dashboard is deployed,
- playbook is called with --limit,
- a node being processed is collocated on either a mon or mgr.
The playbook fails because `rgw_instances` is undefined.
The idea here is to make sure this variable is always defined.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2063029
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
when these variables are defined in the inventory host file,
all tasks are skipped then because the node being played isn't
aware about the values from the rgw nodes.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2063029
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
use `include_tasks` instead of `import_tasks`.
Given that with `import_tasks` statements are preprocessed
and the tasks that defines it hasn't been run yet, it will fail
and complain like following:
```
The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute '_interface'
```
Using `include_tasks` instead fixes this.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit fixes templating error that occurs when using auto osd discovery. Getting the len before converting the result to a list causes "object of type generator has no len()" error.
Signed-off-by: pinotelio <ahmadreza.mollapour@gmail.com>
This construct doesn't work as intended since ansible/ansible#74212:
```
item.stdout | default('{}') | from_json
```
That PR made the `command` module return `stdout` even in check mode (setting
it to the empty string), so `default()` has no effect in that case and
`from_json()` fails to parse an empty string.
Instead, `default()` needs to be invoked with its second argument set to
`True`, so that it replaces any `False` value (such as an empty string) with
its first argument:
```
item.stdout | default('{}', True) | from_json
```
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Change needed in order to support --limit on mon nodes.
Otherwise, a call to `hostvars[groups[mon_group_name][0]]['_current_monitor_address']`
throws an error:
```
"The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute '_current_monitor_address'"
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304#c28
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The PG autoscaler can disrupt the PG checks so the idea here is to
disable it and re-enable it back after the restart is done.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Instead of reusing the condition 'inventory_hostname in groups[osds]'
on each device facts tasks then we can move all the tasks into a
dedicated file and set the condition on the import_tasks statement.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When running the switch-to-containers playbook with multisite enabled,
the fact "rgw_instances" is only set for the node being processed
(serial: 1), the consequence of that is that the set_fact of
'rgw_instances_all' can't iterate over all rgw node in order to look up
each 'rgw_instances_host'.
Adding a condition checking whether hostvars[item]["rgw_instances_host"]
is defined fixes this issue.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967926
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
when running docker-to-podman playbook, there's no need to call
`ceph-config` and `ceph-rgw` from the role `ceph-handler`.
It can even have side effects when coming from a baremetal cluster that
was previously migrated using the switch-to-containers playbook. Indeed
it might complain about missing .target systemd unit since they are
removed during that migration.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1944999
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
As a continuation of a7f2fa73e6, this
change switches fact injection to off by default in the provided
ansible.cfg.
Signed-off-by: Alex Schultz <aschultz@redhat.com>
It has come to our attention that using ansible_* vars that are
populated with INJECT_FACTS_AS_VARS=True is not very performant. In
order to be able to support setting that to off, we need to update the
references to use ansible_facts[<thing>] instead of ansible_<thing>.
Related: ansible#73654
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406
Signed-off-by: Alex Schultz <aschultz@redhat.com>
There's no need to set the rgw_instances_all fact for each node. We can
rely on run_once for that one.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The conversion fact task was only executed when the grafana_server_group_name
variable was explicitly set in the user configuration. If an user was using
the default value then the conversion wasn't executed.
This also adds back the default grafana_server_group_name value in case user
was using the default value and to avoid undefined variable error.
Instead of hardcoding the "monitoring" group name then we can reuse the
monitoring_group_name variable.
There's no need to override the monitoring_group_name variable, it's either
using the default value or the one defined by the user.
Finally removing the delegate_to statement on the add_host task since it's
always executed on the ansible controller.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1903732
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This adds ceph_crush_rule ansible module for replacing the command
module usage with the ceph osd crush rule commands.
This module can manage both erasure and replicated crush rules.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We don't need to use run_once on that task when having running monitors
otherwise the read task could be skip and the set task will fail.
The conditional check 'crush_rule_variable.rc == 0' failed. The error
was: error while evaluating conditional (crush_rule_variable.rc == 0):
'dict object' has no attribute 'rc'
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1898856
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The `osd_pool_default_crush_rule` is set based on `crush_rule_variable`, which
is the output of a `grep` command.
However, two consecutive tasks can set that variable, and if the second task is
skipped, it still overwrites the `crush_rule_variable`, leading the
`osd_pool_default_crush_rule` to be set to `ceph_osd_pool_default_crush_rule`
instead of the output of the first task.
This commit ensures that the fact is set right after the `crush_rule_variable`
is assigned, before it can be overwritten.
Closes#5912
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the rgw/rbdmirror services status, we're only using the
servicmap structure in the ceph status output.
To optimize this, we could use the ceph service dump command which contains
the same needed information.
This command returns less information and is slightly faster than the ceph
status command.
$ ceph status -f json | wc -c
2001
$ ceph service dump -f json | wc -c
1105
$ time ceph status -f json > /dev/null
real 0m0.557s
user 0m0.516s
sys 0m0.040s
$ time ceph service dump -f json > /dev/null
real 0m0.454s
user 0m0.434s
sys 0m0.020s
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
this commit changes defaults value in default pool definitions.
there's no need to define `pg_num`, `pgp_num`, `size` and `min_size`,
`ceph_pool` module will use the current default if needed.
This also drops the 3 following `set_fact` in `ceph-facts`:
- osd_pool_default_pg_num,
- osd_pool_default_pgp_num,
- osd_pool_default_size_num
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
In case of deploying new monitor node to an existing cluster,
osd_pool_default_crush_rule should be taken from running monitor because
ceph-osd role won't be run and the new monitor will have different
osd_pool_default_crush_role from other monitors.
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
This change default value of grafana-server group name.
Adding some tasks in ceph-defaults in order to keep backward
compatibility.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Otherwise this will generate an ansible warning about the missing
filter.
[DEPRECATION WARNING]: evaluating xxx as a bare variable, this behaviour
will go away and you might need to add |bool to the expression in the
future.
Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will
be removed in version 2.12.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>