Commit Graph

133 Commits (main)

Author SHA1 Message Date
Dmitriy Rabotyagov 9405558d03 Don't try to set devices fact when osd_auto_discovery was skipped
Right now, under certain OS and Ansible versions, ie Rocky Linux and
ansible-core 2.17, `devices_check` variable is getting defined even if
task was skipped.

That results in set_fact to fail, as resulting variable has no `results`
key in it.

Structure of such variable looks like that:
```
"devices_check": {
    "changed": false,
    "false_condition": "osd_auto_discovery | default(False) | bool",
    "skip_reason": "Conditional result was False",
    "skipped": true
}
```

Checking for task not being skipped solves such issues.

Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
2024-10-28 23:21:45 +01:00
Seena Fallah be9b458524 devices: test devices before collecting on auto discovery
In some scenarios with NVMe, a device might be identified by
Ansible but could actually be a multipath device rather than an
actual device. We need to exclude these as Ceph cannot create
OSDs on them.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2024-08-13 11:17:05 +02:00
Guillaume Abrioux d9e2de1d1d remove legacy task
this was for backward compatibility concerns, it's time to drop this
task.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
2024-03-20 20:22:34 +01:00
Guillaume Abrioux 8871de6d99 simplify monitor address setting
this drops the following parameters:

- monitor_address_block
- monitor_interface
- monitor_address

The monitor address will be automatically set from `public_network` parameter.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
2024-03-08 13:02:44 +01:00
Seena Fallah b1848ac957 ceph-facts: make set_radosgw_address optional
This can help to define custom rgw_instances with custom names and ports and addresses.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2024-02-28 09:25:00 +01:00
Seena Fallah 84e10bfd03 container: cleanup container systemd units
* Make common params of container args in a var to avoid duplication
* The /var/lib/ceph/crash mount was missing after 637ca81c9c
* Add CEPH_USE_RANDOM_NONCE as it's needed when running inside container (can be removed for squid later)
* Add NODE_NAME as some part of ceph code relies on this var
* add default logging opts for

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2024-02-19 23:14:26 +01:00
Guillaume Abrioux 03f1e3f48e drop iscsigw support
This service is no longer maintained.
Let's drop its support within ceph-ansible.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
2024-02-16 08:59:05 +01:00
Guillaume Abrioux 18da10bb7a address Ansible linter errors
This addresses all errors reported by the Ansible linter.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
2024-02-16 00:38:19 +01:00
Guillaume Abrioux 7d25a5d565 drop rgw multisite deployment support
The current approach is extremely complex and introduced a lot
of spaghetti code. This doesn't offer a good user experience at all.

It's time to think to another approach (dedicated playbook) and drop
the current implementation in order to clean up the code.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
2024-02-16 00:38:19 +01:00
Guillaume Abrioux b2273ef4b8 facts: remove legacy tasks
these tasks were there only for backward compatibility concerns.
It's time to drop them.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
2024-02-14 09:54:13 +01:00
Guillaume Abrioux e433d2b955 library/ceph_crush_rule: module refactor
This refactor makes the 'name' argument not mandatory because when
'state' is 'info' we shouldn't need to pass it.

The second change is just a duplicate code removal.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
2024-02-14 09:54:13 +01:00
Guillaume Abrioux 7909778d0e add CentOS stream 9 support
This adds the resquired changes in order to support
CentOS stream 9.

Also, this bumps the Ansible version support to 2.15

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
2024-02-14 09:54:13 +01:00
Dmitriy Rabotyagov b610297554 Do not pass NoneType as argument to ceph_crush_rule
With ansible-core 2.15 it is not possible to pass argument of unexpected
type, as otherwise module will fail with:
`'None' is not a string and conversion is not allowed`

With that we want to only get all existing crush rules, so we can simply
supply an empty string as a name argument, which would satisfy
requirements and have same behaviour for previous ansible versions.

Alternative approach would be to stop making `name` as a required
argument to the module and use empty string as default value
when info state is used.

Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
2023-12-06 19:13:39 +01:00
Teoman ONAY 490ca79ccc dashboad: rgw frontends entries in ceph.conf are incorrect
There was multiple rgw frontends entries while there was just one
rgw instance on each host. The other entries were the details from
the other rgw hosts in the cluster

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2232282

Signed-off-by: Teoman ONAY <tonay@ibm.com>
2023-09-26 10:37:14 +02:00
Teoman ONAY 72d4d207a0 Speed up the some facts settings by running them once
Signed-off-by: Teoman ONAY <tonay@ibm.com>
2023-09-26 10:37:14 +02:00
Teoman ONAY 8f3bdd8559 Add ipv6 libvirt support scenario in vagrant
Addition of ipv6 support in vagrant/libvirt and an all_daemons_ipv6 scenario.
Some typo fixes

Signed-off-by: Teoman ONAY <tonay@ibm.com>
2023-06-28 20:51:01 +02:00
Seena Fallah 80b1ed9d4a devices: allow using lvm_volumes with devices
* Exclude device from lvm_volumes while osd_auto_discovery is true
* Sum num_osds on both lvm_volumes and devices list

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2023-03-17 16:05:34 +01:00
Seena Fallah 1f7b3ac5a3 devices: remove duplicated disks after the readlink resolve
If a disk has a symlink it will be re-added to the devices lists one with resolved path and the other with a defined path.
We can rebuild the list from the readlink output cause readlink always return the correct path for all disks.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2023-02-28 14:50:49 +01:00
Seena Fallah 32b5678511 devices: exclude db disks on osd_auto_discovery enabled
Exclude disks were defined in dedicated_devices and bluestore_wal_devices on osd_auto_discovery enabled.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2023-02-22 15:06:37 +01:00
Guillaume Abrioux 45c2f0a90a dashboard: support --limit execution with rgw
When the following conditions are met:

- rgw is deployed,
- dashboard is deployed,
- playbook is called with --limit,
- a node being processed is collocated on either a mon or mgr.

The playbook fails because `rgw_instances` is undefined.
The idea here is to make sure this variable is always defined.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2063029

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2022-10-20 10:40:49 +02:00
Guillaume Abrioux 93df3e53ab facts: follow up on aa0cc93
when these variables are defined in the inventory host file,
all tasks are skipped then because the node being played isn't
aware about the values from the rgw nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2063029

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2022-10-20 10:40:49 +02:00
Guillaume Abrioux a99812aa92 facts: follow up on f6b49f78
f6b49f78a9 changed a call back to `ipwrap`
This fixes this.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2022-07-06 03:06:01 +02:00
Guillaume Abrioux 434793e2fe facts: fix set_radosgw_address.yml
use `include_tasks` instead of `import_tasks`.
Given that with `import_tasks` statements are preprocessed
and the tasks that defines it hasn't been run yet, it will fail
and complain like following:

```
The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute '_interface'
```

Using `include_tasks` instead fixes this.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2022-07-05 21:10:50 +02:00
Guillaume Abrioux f6b49f78a9 facts: fix deployments with different net interface names
Deployments when radosgws don't have the same names for
network interface.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2095605

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2022-07-05 10:01:40 +02:00
Guillaume Abrioux c1649862a9 common: move to `ansible.utils.ipwrap`
ipwrap has moved to ansible.utils

see
db4920ebf6

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2022-05-12 22:51:31 +02:00
pinotelio f288364c5c ceph-facts: fix ansible templating error for auto osd discovery
This commit fixes templating error that occurs when using auto osd discovery. Getting the len before converting the result to a list causes "object of type generator has no len()" error.

Signed-off-by: pinotelio <ahmadreza.mollapour@gmail.com>
2022-04-13 14:26:35 +02:00
Seena Fallah 9d87fd87cb ceph-facts: ignore mounted disks on osd auto discovery
Ignore disks with active mountpoint when osd_auto_discovery is true

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2022-02-21 17:15:30 +01:00
Benoît Knecht 7684d892c0 ceph-facts: Fix get_def_crush_rule_name.yml in check mode
This construct doesn't work as intended since ansible/ansible#74212:

```
item.stdout | default('{}') | from_json
```

That PR made the `command` module return `stdout` even in check mode (setting
it to the empty string), so `default()` has no effect in that case and
`from_json()` fails to parse an empty string.

Instead, `default()` needs to be invoked with its second argument set to
`True`, so that it replaces any `False` value (such as an empty string) with
its first argument:

```
item.stdout | default('{}', True) | from_json
```

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2022-02-07 14:13:19 +01:00
Danny Webb 189ff93372 make grafana network a configurable option
Signed-off-by: Danny Webb <danny.webb@thehutgroup.com>
2021-12-02 08:53:58 +01:00
Guillaume Abrioux 82eee4303b update: support --limit on monitor nodes
Change needed in order to support --limit on mon nodes.
Otherwise, a call to `hostvars[groups[mon_group_name][0]]['_current_monitor_address']`
throws an error:

```
"The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute '_current_monitor_address'"
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304#c28

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-10-28 21:47:01 +02:00
Guillaume Abrioux 13036115e2 common: disable/enable pg_autoscaler
The PG autoscaler can disrupt the PG checks so the idea here is to
disable it and re-enable it back after the restart is done.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-07-20 07:37:07 +02:00
Dimitri Savineau d704b05e52 ceph-facts: move device facts to its own file
Instead of reusing the condition 'inventory_hostname in groups[osds]'
on each device facts tasks then we can move all the tasks into a
dedicated file and set the condition on the import_tasks statement.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-07-02 14:02:30 +02:00
Guillaume Abrioux 8279d14d32 multisite: fix bug during switch2containers
When running the switch-to-containers playbook with multisite enabled,
the fact "rgw_instances" is only set for the node being processed
(serial: 1), the consequence of that is that the set_fact of
'rgw_instances_all' can't iterate over all rgw node in order to look up
each 'rgw_instances_host'.

Adding a condition checking whether hostvars[item]["rgw_instances_host"]
is defined fixes this issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967926

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-06-17 01:49:29 +02:00
Guillaume Abrioux 70f19be367 docker2podman: skip some role imports from handler
when running docker-to-podman playbook, there's no need to call
`ceph-config` and `ceph-rgw` from the role `ceph-handler`.
It can even have side effects when coming from a baremetal cluster that
was previously migrated using the switch-to-containers playbook. Indeed
it might complain about missing .target systemd unit since they are
removed during that migration.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1944999

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-04-09 15:28:50 +02:00
Guillaume Abrioux 0163ecc924 convert some missed `ansible_*`` calls to `ansible_facts['*']`
This converts some missed calls to `ansible_*` that were missed in
initial PR #6312

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-25 15:19:13 +01:00
Alex Schultz db031a4993 Disable facts by default in ansible.cfg
As a continuation of a7f2fa73e6, this
change switches fact injection to off by default in the provided
ansible.cfg.

Signed-off-by: Alex Schultz <aschultz@redhat.com>
2021-03-24 13:44:33 +01:00
Guillaume Abrioux ccd1cbb732 facts: fix nfs/external cluster scenario
These tasks shouldn't be run when at least 1 monitor isn't present in
the inventory.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1937997

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-17 16:05:48 +01:00
Alex Schultz a7f2fa73e6 Use ansible_facts
It has come to our attention that using ansible_* vars that are
populated with INJECT_FACTS_AS_VARS=True is not very performant.  In
order to be able to support setting that to off, we need to update the
references to use ansible_facts[<thing>] instead of ansible_<thing>.

Related: ansible#73654
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406
Signed-off-by: Alex Schultz <aschultz@redhat.com>
2021-03-08 20:54:02 +01:00
Dimitri Savineau 7208a39e57 ceph-facts: set rgw_instances_all fact once
There's no need to set the rgw_instances_all fact for each node. We can
rely on run_once for that one.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-02-01 13:49:12 +01:00
Dimitri Savineau 2aeab882f3 ceph-facts: fix grafana group conversion
The conversion fact task was only executed when the grafana_server_group_name
variable was explicitly set in the user configuration. If an user was using
the default value then the conversion wasn't executed.

This also adds back the default grafana_server_group_name value in case user
was using the default value and to avoid undefined variable error.

Instead of hardcoding the "monitoring" group name then we can reuse the
monitoring_group_name variable.

There's no need to override the monitoring_group_name variable, it's either
using the default value or the one defined by the user.

Finally removing the delegate_to statement on the add_host task since it's
always executed on the ansible controller.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1903732

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-12-10 16:51:16 +01:00
Dimitri Savineau 2e417ab901 library: add ceph_crush_rule module
This adds ceph_crush_rule ansible module for replacing the command
module usage with the ceph osd crush rule commands.
This module can manage both erasure and replicated crush rules.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-12-01 17:52:41 +01:00
Dimitri Savineau e150df789e ceph-facts: fix read osd pool default crush fact
We don't need to use run_once on that task when having running monitors
otherwise the read task could be skip and the set task will fail.

The conditional check 'crush_rule_variable.rc == 0' failed. The error
was: error while evaluating conditional (crush_rule_variable.rc == 0):
'dict object' has no attribute 'rc'

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1898856

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-11-18 12:55:43 -05:00
Benoît Knecht c5f7343a2f ceph-facts: Fix osd_pool_default_crush_rule fact
The `osd_pool_default_crush_rule` is set based on `crush_rule_variable`, which
is the output of a `grep` command.

However, two consecutive tasks can set that variable, and if the second task is
skipped, it still overwrites the `crush_rule_variable`, leading the
`osd_pool_default_crush_rule` to be set to `ceph_osd_pool_default_crush_rule`
instead of the output of the first task.

This commit ensures that the fact is set right after the `crush_rule_variable`
is assigned, before it can be overwritten.

Closes #5912

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2020-11-13 09:36:49 +01:00
Dimitri Savineau 3f9081931f rgw/rbdmirror: use service dump instead of ceph -s
The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the rgw/rbdmirror services status, we're only using the
servicmap structure in the ceph status output.
To optimize this, we could use the ceph service dump command which contains
the same needed information.
This command returns less information and is slightly faster than the ceph
status command.

$ ceph status -f json | wc -c
2001
$ ceph service dump -f json | wc -c
1105
$ time ceph status -f json > /dev/null

real	0m0.557s
user	0m0.516s
sys	0m0.040s
$ time ceph service dump -f json > /dev/null

real	0m0.454s
user	0m0.434s
sys	0m0.020s

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-11-03 09:05:33 +01:00
Guillaume Abrioux 1cc9666c09 common: drop `fetch_directory` feature
This commit drops the `fetch_directory` feature.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-10-21 13:22:16 +02:00
Guillaume Abrioux c101cb3931 defaults: change defaults value
this commit changes defaults value in default pool definitions.

there's no need to define `pg_num`, `pgp_num`, `size` and `min_size`,
`ceph_pool` module will use the current default if needed.

This also drops the 3 following `set_fact` in `ceph-facts`:

- osd_pool_default_pg_num,
- osd_pool_default_pgp_num,
- osd_pool_default_size_num

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-10-02 07:42:40 +02:00
Seena Fallah ff9f4d138f ceph-facts: add get default crush rule from running monitor
In case of deploying new monitor node to an existing cluster,
osd_pool_default_crush_rule should be taken from running monitor because
ceph-osd role won't be run and the new monitor will have different
osd_pool_default_crush_role from other monitors.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2020-09-29 09:27:58 -04:00
Guillaume Abrioux eefe11d90c defaults: change default grafana-server name
This change default value of grafana-server group name.
Adding some tasks in ceph-defaults in order to keep backward
compatibility.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-29 07:42:26 +02:00
Seena Fallah 69f7e35382 ceph-facts: check for mon socket in its own host
delegate to its own host after checking mon socket to findout if mon socket is in-use or not.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2020-09-29 00:21:12 +02:00
Dimitri Savineau 50104650e7 add missing boolean filter
Otherwise this will generate an ansible warning about the missing
filter.

[DEPRECATION WARNING]: evaluating xxx as a bare variable, this behaviour
will go away and you might need to add |bool to the expression in the
future.
Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will
be removed in version 2.12.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-28 20:45:01 +02:00