The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the quorum status, we're only using the quorum_names
structure in the ceph status output.
To optimize this, we could use the ceph quorum_status command which contains
the same needed information.
This command returns less information.
$ ceph status -f json | wc -c
2001
$ ceph quorum_status -f json | wc -c
957
$ time ceph status -f json > /dev/null
real 0m0.577s
user 0m0.538s
sys 0m0.029s
$ time ceph quorum_status -f json > /dev/null
real 0m0.544s
user 0m0.527s
sys 0m0.016s
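As a rough sketch of what the quorum check needs from this output, the snippet below parses a `ceph quorum_status -f json` payload and looks for a monitor in `quorum_names`. The helper name `mon_in_quorum` and the trimmed sample payload are illustrative assumptions, not code from this change.

```python
import json


def mon_in_quorum(quorum_status_json: str, mon_name: str) -> bool:
    """Return True if mon_name appears in the quorum_names list."""
    status = json.loads(quorum_status_json)
    return mon_name in status.get("quorum_names", [])


# Trimmed, made-up sample of `ceph quorum_status -f json` output.
sample = '{"election_epoch": 12, "quorum": [0, 1], "quorum_names": ["mon0", "mon1"]}'
print(mon_in_quorum(sample, "mon0"))
```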
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 88f91d8c12)
The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the pgs state, we're using the pgmap structure in the ceph
status output.
To optimize this, we could use the ceph pg stat command which contains
the same needed information.
This command returns less information (only about pgs) and is slightly
faster than the ceph status command.
$ ceph status -f json | wc -c
2000
$ ceph pg stat -f json | wc -c
240
$ time ceph status -f json > /dev/null
real 0m0.529s
user 0m0.503s
sys 0m0.024s
$ time ceph pg stat -f json > /dev/null
real 0m0.426s
user 0m0.409s
sys 0m0.016s
The data returned by the ceph status is even bigger when using the
nautilus release.
$ ceph status -f json | wc -c
35005
$ ceph pg stat -f json | wc -c
240
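A minimal sketch of a pgs state check built on this smaller output follows. It assumes the JSON exposes `num_pgs` and `num_pg_by_state`, possibly nested under `pg_summary` depending on the release; the function name and the exact `active+clean` comparison are illustrative, not the playbook's actual condition.

```python
import json


def all_pgs_active_clean(pg_stat_json: str) -> bool:
    """Return True when every PG is reported in the active+clean state."""
    data = json.loads(pg_stat_json)
    # Assumption: some releases nest the counters under "pg_summary".
    summary = data.get("pg_summary", data)
    num_pgs = summary.get("num_pgs", 0)
    clean = sum(
        state["num"]
        for state in summary.get("num_pg_by_state", [])
        if state["name"] == "active+clean"
    )
    return num_pgs > 0 and clean == num_pgs
```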
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ee50588590)
Improve the way we check that the OSDs have been created.
This replaces the ceph status command with the ceph osd stat command.
The osdmap structure isn't needed anymore.
$ ceph status -f json | wc -c
2001
$ ceph osd stat -f json | wc -c
132
$ time ceph status -f json > /dev/null
real 0m0.563s
user 0m0.526s
sys 0m0.036s
$ time ceph osd stat -f json > /dev/null
real 0m0.457s
user 0m0.411s
sys 0m0.045s
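The check can be sketched roughly as below, assuming the `ceph osd stat -f json` output carries `num_osds` and `num_up_osds` counters (some releases may wrap them in an `osdmap` object). The function name and the "all OSDs up" condition are assumptions made for illustration.

```python
import json


def osds_created(osd_stat_json: str) -> bool:
    """Return True when at least one OSD exists and all OSDs are up."""
    data = json.loads(osd_stat_json)
    # Assumption: some releases wrap the counters in an "osdmap" object.
    stats = data.get("osdmap", data)
    num_osds = stats.get("num_osds", 0)
    return num_osds > 0 and num_osds == stats.get("num_up_osds", 0)
```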
Signed-off-by: wangxiaotong <wangxiaotong@fiberhome.com>
(cherry picked from commit b9cb0f12e9)
Otherwise this task fails if no permission is set on the item.
Previously the code omitted the mode parameter if it was not set, but
this was lost with commit ab370b6ad8.
Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>
(cherry picked from commit 79ff79c422)
Since we changed the podman configuration to use the detach mode and the
systemd type to forking, the container logs aren't present in journald
anymore.
The default conmon log driver is k8s-file.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1890439
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 16cd183b9c)
After rolling updates performed with
`infrastructure-playbooks/rolling_updates.yml`, files located in
`/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }}` had mode 0755 (including
the keyring), making them world-readable.
This commit separates the task that configured permissions recursively on
`/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }}` into two separate tasks:
1. Set the ownership and mode of the directory itself;
2. Recursively set ownership in the directory, but don't modify the mode.
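The two steps above can be sketched in Python as follows. This only illustrates the split between "directory itself" and "contents"; the function name and its parameters are hypothetical, and the real change is implemented as Ansible tasks.

```python
import os


def fix_mon_dir(path: str, uid: int, gid: int, dir_mode: int) -> None:
    """Set owner and mode on the directory, then owner only on its contents."""
    # Step 1: ownership and mode of the directory itself.
    os.chown(path, uid, gid)
    os.chmod(path, dir_mode)
    # Step 2: ownership only for everything below it; modes are left alone,
    # so a keyring created with a restrictive mode keeps that mode.
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            os.chown(os.path.join(root, name), uid, gid, follow_symlinks=False)
```

Calling `fix_mon_dir(mon_dir, uid, gid, 0o755)` on a directory holding a keyring with mode 0600 leaves the keyring at 0600.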
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 0d76826bbb)
When using the curl command with an IPv6 address in brackets, we need
to pass the -g option, otherwise the command fails.
$ curl http://[fdc2:328:750b:6983::6]:8080
curl: (3) [globbing] error: bad range specification after pos 9
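A small sketch of the idea: add `-g` (curl's `--globoff`) only when the host is an IPv6 literal that has to be wrapped in brackets. The helper name `curl_command` is hypothetical and the plain `http://` scheme is just taken from the example above.

```python
import ipaddress


def curl_command(host: str, port: int) -> list:
    """Build a curl argv, adding -g (--globoff) for bracketed IPv6 hosts."""
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        is_ipv6 = False  # a hostname, not an IP literal
    if is_ipv6:
        return ["curl", "-g", f"http://[{host}]:{port}"]
    return ["curl", f"http://{host}:{port}"]
```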
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cdb7b09cd7)
This commit cleans up the `main.yml` task file of `ceph-config`.
It drops the local ceph.conf generation.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 900c0f4492)
This file is currently deployed with mode '0644', making it
readable by any user on the system.
Since it contains sensitive information it should be readable by the
owner only.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1890119
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a822f77300)
When running rhel8 containers on a rhel7 host, after zapping an OSD
there's a discrepancy with the lvmetad cache that needs to be refreshed.
Otherwise, the host still sees the LV, which can confuse the user.
If the user tries to redeploy an OSD, it will fail because the LV isn't
present and needs to be recreated.
e.g.:
```
stderr: lsblk: ceph-block-8/block-8: not a block device
stderr: blkid: error: ceph-block-8/block-8: No such file or directory
stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
usage: ceph-volume lvm prepare [-h] --data DATA [--data-size DATA_SIZE]
[--data-slots DATA_SLOTS] [--filestore]
[--journal JOURNAL]
[--journal-size JOURNAL_SIZE] [--bluestore]
[--block.db BLOCK_DB]
[--block.db-size BLOCK_DB_SIZE]
[--block.db-slots BLOCK_DB_SLOTS]
[--block.wal BLOCK_WAL]
[--block.wal-size BLOCK_WAL_SIZE]
[--block.wal-slots BLOCK_WAL_SLOTS]
[--osd-id OSD_ID] [--osd-fsid OSD_FSID]
[--cluster-fsid CLUSTER_FSID]
[--crush-device-class CRUSH_DEVICE_CLASS]
[--dmcrypt] [--no-systemd]
ceph-volume lvm prepare: error: Unable to proceed with non-existing device: ceph-block-8/block-8
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1886534
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0bb106045e)
Correctly set `osd_ids_non_container.stdout_lines` to an empty list if it's
undefined (i.e. in check mode).
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 8b0023cb77)
Skip the `get initial keyring when it already exists` task when both commands
whose `stdout` output it requires have been skipped (e.g. when running in check
mode).
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 8f436ab5d8)
The current task installs the ceph-crash key to "most" hosts via
"delegate_to". This key is only used by the ceph-crash daemon and should
just be installed on all hosts targeted by this role. There is no need
for using a delegated task.
Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>
(cherry picked from commit 68cc93fb18)
We don't need to run flake8 on ansible modules and their tests if we
don't have any modifications.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 00b7ee27df)
The service should be started after the ceph-osd systemd overrides have
been added, otherwise the overrides aren't taken into account.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1860739
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 59d0f01992)
Using the + operation on two lists doesn't filter out the duplicate
keys.
Currently each OSD is started (via systemd) twice.
Instead we could use the union filter.
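To illustrate the difference, here is a plain Python analogy of list concatenation versus an order-preserving union. The `union` helper below only mimics the behaviour described here and is not Ansible's implementation; the OSD ids are made up.

```python
def union(first: list, second: list) -> list:
    """Order-preserving union of two lists, dropping duplicates."""
    return list(dict.fromkeys(first + second))


ids_a = ["0", "1"]
ids_b = ["1", "2"]
print(ids_a + ids_b)        # duplicates are kept, so "1" would be started twice
print(union(ids_a, ids_b))  # each id appears once
```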
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4eaa65c362)
`stat --printf=%n` returns something like the following:
```
ok: [osd0] => changed=false
cmd: |-
stat --printf=%n /var/run/ceph/ceph-osd*.asok
delta: '0:00:00.009388'
end: '2020-10-06 06:18:28.109500'
failed_when_result: false
rc: 0
start: '2020-10-06 06:18:28.100112'
stderr: ''
stderr_lines: <omitted>
stdout: /var/run/ceph/ceph-osd.2.asok/var/run/ceph/ceph-osd.5.asok
stdout_lines: <omitted>
```
This makes the next task, "check if the ceph osd socket is in-use", run
grep like this:
```
ok: [osd0] => changed=false
cmd:
- grep
- -q
- /var/run/ceph/ceph-osd.2.asok/var/run/ceph/ceph-osd.5.asok
- /proc/net/unix
```
which will obviously fail because this path never exists. This breaks
the OSD handler.
Let's use the `find` module instead.
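The failure mode can be reproduced with a short Python sketch: names joined without a separator form a single bogus path, whereas listing the matches yields one usable path per socket. The helper `list_sockets` and the temporary directory are stand-ins for illustration only; the actual fix relies on Ansible's `find` module.

```python
import glob
import os
import tempfile


def list_sockets(run_dir: str, pattern: str = "ceph-osd*.asok") -> list:
    """Return each matching socket path as its own list entry."""
    return sorted(glob.glob(os.path.join(run_dir, pattern)))


with tempfile.TemporaryDirectory() as run_dir:
    for osd_id in (2, 5):
        open(os.path.join(run_dir, f"ceph-osd.{osd_id}.asok"), "w").close()
    paths = list_sockets(run_dir)
    # What `stat --printf=%n` effectively produced: names with no separator.
    concatenated = "".join(paths)
    print(os.path.exists(concatenated))
    print(all(os.path.exists(p) for p in paths))
```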
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 46d4d97da9)
Make sure the `site.yml.sample` playbook can be run in check mode by skipping
tasks that try to read the output of commands that have been skipped.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 54ba38e35e)
`all_daemons` scenario can't handle pools with `size: 3` because we have
1 osd node in root=HDD and two nodes in root=default.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e5713ea5d5)
This adds radosgw_zone ansible module for replacing the command module
usage with the radosgw-admin zone command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1281e8bcc8)
This adds radosgw_zonegroup ansible module for replacing the command
module usage with the radosgw-admin zonegroup command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 65dbe0782e)
This adds radosgw_realm ansible module for replacing the command module
usage with the radosgw-admin realm command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d171f4068d)
This adds radosgw_user ansible module for replacing the command module
usage with the radosgw-admin user command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 235c7e27cc)
This adds the ceph_fs ansible module for replacing the command module
usage with the ceph fs command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bd611a785b)
Currently the `ceph_key` module doesn't support using a different
keyring than `client.admin`.
This commit adds the possibility to use a different keyring.
Usage:
```
ceph_key:
name: "client.rgw.myrgw-node.rgw123"
cluster: "ceph"
user: "client.bootstrap-rgw"
user_key: /var/lib/ceph/bootstrap-rgw/ceph.keyring
dest: "/var/lib/ceph/radosgw/ceph-rgw.myrgw-node.rgw123/keyring"
caps:
osd: 'allow rwx'
mon: 'allow rw'
import_key: False
owner: "ceph"
group: "ceph"
mode: "0400"
```
Where:
`user` corresponds to `-n (--name)`
`user_key` corresponds to `-k (--keyring)`
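A hedged sketch of how these two parameters could translate into a command line is shown below. The helper name `ceph_base_command` is hypothetical and the module's real command construction may differ; only the `-n`, `-k` and `--cluster` options of the `ceph` CLI are assumed.

```python
def ceph_base_command(cluster: str, user: str, user_key: str) -> list:
    """Build the common `ceph` argv prefix for a given user and keyring."""
    return ["ceph", "-n", user, "-k", user_key, "--cluster", cluster]


cmd = ceph_base_command(
    "ceph",
    "client.bootstrap-rgw",
    "/var/lib/ceph/bootstrap-rgw/ceph.keyring",
) + ["auth", "get", "client.rgw.myrgw-node.rgw123"]
print(" ".join(cmd))
```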
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 12e6260266)
When rgw and osd are collocated, the current workflow prevents
scaling out the radosgw_num_instances parameter when rerunning the
playbook in baremetal deployments.
When ceph-osd notifies handlers, it means rgw handlers are triggered
too. The issue with this is that they are triggered before the role
ceph-rgw is run.
In the case a scaleout operation is expected on `radosgw_num_instances`
it causes an issue because keyrings haven't been created yet so the new
instances won't start.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881313
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a802fa2810)
It's time to remove this backward compatibility. Users had enough time
to convert their openstack_keys and key values.
We now fail in ceph-validate if the caps key isn't set.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c960362639)
This playbook isn't needed anymore; we can achieve this operation by
running the main playbook with the `--limit` option.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 20718582da)
This commit drops nested jinja construction in this set_fact task.
It also renames it to `container_exec_start_osd`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ff95fa9c32)
tests/conftest.py and the tests present in tests/functional/tests/ were
missed in the previous commit.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8596f1d52c)
This commit modifies all *.py files in ./tests/library/ so flake8
passes.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e49a5241f0)
(cherry picked from commit fb98f436848189e26480697b23f45b28f51a6ccd)
drop ricardochaves/python-lint action and use `run` steps instead.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f2d3432cad)
(cherry picked from commit 7378909c7b8d6a6285f14ea6c7c8987fae73939d)
This commit changes the default values in the default pool definitions.
There's no need to define `pg_num`, `pgp_num`, `size` and `min_size`;
the `ceph_pool` module will use the current defaults if needed.
This also drops the 3 following `set_fact` in `ceph-facts`:
- osd_pool_default_pg_num,
- osd_pool_default_pgp_num,
- osd_pool_default_size_num
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c101cb3931)
This commit modifies how the `pg_autoscaler` feature is handled by the
ceph_pool module.
1/ If a pool has the pg_autoscaler feature enabled, we shouldn't try to
update pg/pgp.
2/ Make it more readable
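Point 1/ can be sketched as follows: when the running pool reports the autoscaler as on, `pg_num` and `pgp_num` are left out of the set of attributes to update. The function name, the dict layout and the `"on"` value check are assumptions for illustration and do not reproduce the module's code.

```python
def pool_updates(running: dict, wanted: dict) -> dict:
    """Return the attributes that differ, skipping pg/pgp when autoscaling."""
    skip = set()
    if running.get("pg_autoscale_mode") == "on":
        # The autoscaler owns these values, so do not try to change them.
        skip = {"pg_num", "pgp_num"}
    return {
        key: value
        for key, value in wanted.items()
        if key not in skip and running.get(key) != value
    }
```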
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 740df379b7)
remove complexity about current defaults in running cluster
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 29fc115f4a)
This file is a leftover and should have been removed when we dropped the
validate module.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8603cba9ab)
This commit ensures all ceph-ansible modules pass flake8 properly.
Signed-off-by: Wong Hoi Sing Edison <hswong3i@gmail.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 268a39ca0e)
When deploying a new monitor node to an existing cluster,
osd_pool_default_crush_rule should be taken from a running monitor because
the ceph-osd role won't be run and the new monitor would otherwise have a
different osd_pool_default_crush_rule from the other monitors.
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit ff9f4d138f)
This commit adds the `osd_auto_discovery` scenario support in the
filestore-to-bluestore playbook.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881523
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8b1eeef18a)
This commit adds connection checks before realm pulls.
Curls are performed on the endpoint being pulled from
the mons and the rgws.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1731158
Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 902575369c)
Just like `devices`, this commit adds the support for linux device aliases for
`dedicated_devices` and `bluestore_wal_devices`.
Signed-off-by: Tyler Bishop <tbishop@liquidweb.com>
(cherry picked from commit ee4b8804ae)
In non containerized deployment we check if the service is running
via the socket file presence.
This is done via the xxx_socket_stat variable that checks the socket
file in the /var/run/ceph/ directory.
In some scenarios, we could have the socket file still present in
that directory but not used by any process.
That's why we have the xxx_stat variable which cleans those leftovers.
The problem here is that we're setting the variable for the handlers status
(like handler_mon_status) based on xxx_socket_stat instead of xxx_stat.
That means we will trigger the handlers if there's an old socket file
present on the system without any process associated.
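The distinction between "socket file present" and "socket actually in use" can be sketched like this. The helper names and the exact-match lookup in the `/proc/net/unix` contents are illustrative assumptions; the playbook itself uses registered task results rather than Python.

```python
def socket_in_use(socket_path: str, proc_net_unix: str) -> bool:
    """True if socket_path is listed in the given /proc/net/unix contents."""
    return any(
        line.split()[-1] == socket_path
        for line in proc_net_unix.splitlines()
        if line.strip()
    )


def handler_status(socket_exists: bool, in_use: bool) -> bool:
    """Report the daemon as running only when its socket is really in use."""
    return socket_exists and in_use
```

With a stale socket file, `handler_status(True, False)` is false, so no handler would be triggered for it.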
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1866834
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 733596582d)
af9f6684 introduced a regression on the ceph iscsi pool creation
because it was delegated to the first monitor node before that change.
This patch restores the initial workflow.
When the iscsi node doesn't have the admin keyring then the pool
creation fails.
This commit also ensures that the pool creation is only executed once
when having multiple iscsi nodes.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 501b8e0fd3)