Commit Graph

473 Commits (6485e1a69ed90b446c2cf2d9fdbf703cd8105d6d)

Author SHA1 Message Date
Guillaume Abrioux 26b396cb6c common: support setting pg autoscaler to off
The current implementation doesn't allow to disable the pg autoscaler
on created pools. This allows only 'on' or 'warn'.

With this commit, this is now possible to disable it.

Valid values would be ['on', 'yes', 'true', 'off', 'no', 'false']

```
openstack_glance_pool:
  name: "images"
  pg_num: 128
  pgp_num: 128
  rule_name: "replicated_rule"
  type: 1
  application: "rbd"
  size: 3
  pg_autoscale_mode: off
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2062621

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9d1ff8f236)
2022-05-09 13:44:31 +02:00
Guillaume Abrioux 20583e83dd containers: introduce target systemd unit
This adds ceph-*.target systemd unit files support for containerized
deployments.
This also fixes a regression introduced by PR #6719 (rgw and nfs systemd
units not getting purged)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1962748

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 09ef465f62)
2021-08-18 13:43:01 -04:00
Guillaume Abrioux c55c87d3c5 common: do not log keyring secret
let's not display any keyring secret by default in ansible log.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1980744

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7511195738)
2021-08-11 17:01:22 -04:00
Neelaksh Singh 5213612eaf Sensitive key data now hidden in output log
Fixes: #6529

Signed-off-by: Neelaksh Singh <neelaksh48@gmail.com>
(cherry picked from commit d18a9860cd)
2021-07-12 09:43:12 +02:00
Guillaume Abrioux 871ad6d71a osd: always allow setting target_size_ratio
We shouldn't prevent from setting target_size_ratio when the autoscaler
is set to 'warn'.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1906305

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-04-15 14:47:38 +02:00
Alex Schultz 7ddbe74712 Use ansible_facts
It has come to our attention that using ansible_* vars that are
populated with INJECT_FACTS_AS_VARS=True is not very performant.  In
order to be able to support setting that to off, we need to update the
references to use ansible_facts[<thing>] instead of ansible_<thing>.

Related: ansible#73654
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406
Signed-off-by: Alex Schultz <aschultz@redhat.com>
(cherry picked from commit a7f2fa73e6)
2021-03-26 00:16:58 +01:00
Guillaume Abrioux 607ef5a7d2 common: do not use pipefail when not needed
Let's discard the ansible lint error 306 and add a "# noqa 306" on tasks
where we don't need `set -o pipefail`

Fixes: #6090

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 86a8889ee3)
2020-12-16 14:05:45 +01:00
Guillaume Abrioux 6855feb604 ceph-osd: refact `docker_exec_start_osd`
This commit drops nested jinja construction in this set_fact task.
It also rename it to `container_exec_start_osd`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ff95fa9c32)
2020-12-16 14:05:45 +01:00
Dimitri Savineau 3f16132e44 library: add ceph_osd_flag module
This adds ceph_osd_flag ansible module for replacing the command module
usage with the ceph osd set/unset commands.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5da593604a)
2020-12-15 17:36:28 +01:00
Guillaume Abrioux 69b5b96f2d osd: add tag on 'wait for all osd to be up' task
This allows skipping this task if really desired.
Use it carefully. Use it at your own risk.

Fixes: #6073

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5c4ae5356d)
2020-12-15 17:32:09 +01:00
Seena Fallah 2485b35825 ceph-osd: use global crush_device_class in lvm_volumes
Use global crush_device_class variable if it's not set per OSD

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit 5e9444fa5c)
2020-12-15 17:30:55 +01:00
Dimitri Savineau f917bb015c ceph_key: set state as optional
Most ansible module using a state parameter default to the present
value (when available) instead of using it as a mandatory option.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit abb4023d76)
2020-12-01 09:53:26 -05:00
Guillaume Abrioux fe699897ed common: add a default value for ceph_directories_mode
Since this variable makes it possible to customize the mode for ceph
directories, let's make it a bit more explicit by adding a default value
in ceph-defaults.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 483adb5d79)
2020-11-19 21:14:02 -05:00
Guillaume Abrioux 0efc347a67 osd: ensure /var/lib/ceph/osd/{cluster}-{id} is present
This commit ensures that the `/var/lib/ceph/osd/{{ cluster }}-{{ osd_id }}` is
present before starting OSDs.

This is needed specificly when redeploying an OSD in case of OS upgrade
failure.
Since ceph data are still present on its devices then the node can be
redeployed, however those directories aren't present since they are
initially created by ceph-volume. We could recreate them manually but
for better user experience we can ask ceph-ansible to recreate them.

NOTE:
this only works for OSDs that were deployed with ceph-volume.
ceph-disk deployed OSDs would have to get those directories recreated
manually.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1898486

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 873fc8ec0f)
2020-11-19 21:14:02 -05:00
wangxiaotong 8bc0806f10 osds: use ceph osd stat instead of ceph status
Improve the checked way of the OSD created checking process.
This replaces the ceph status command by the ceph osd stat command.
The osdmap structure isn't needed anymore.

$ ceph status -f json | wc -c
2001
$ ceph osd stat -f json | wc -c
132
$ time ceph status -f json > /dev/null

real    0m0.563s
user    0m0.526s
sys     0m0.036s
$ time ceph osd stat -f json > /dev/null

real	0m0.457s
user	0m0.411s
sys	0m0.045s

Signed-off-by: wangxiaotong <wangxiaotong@fiberhome.com>
(cherry picked from commit b9cb0f12e9)
2020-11-03 14:38:49 -05:00
Guillaume Abrioux 61001660bb common: follow up on #5948
In addition to f7e2b2c608

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 371d854a5c)
2020-11-03 09:52:42 +01:00
Gaudenz Steinlin a1ff05b26e openstack: use ceph_keyring_permissions by default
Otherwise this task fails if no permission is set on the item.
Previously the code omited the mode parameter if it was not set, but
this was lost with commit ab370b6ad8.

Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>
(cherry picked from commit 79ff79c422)
2020-11-02 18:42:18 -05:00
Benoît Knecht d165dc676d ceph-osd: Fix check mode for start osds tasks
Correctly set `osd_ids_non_container.stdout_lines` to an empty list if it's
undefined (i.e. in check mode).

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 8b0023cb77)
2020-10-21 13:19:49 +02:00
Guillaume Abrioux 6a521155d5 ceph-osd: start osd after systemd overrides
The service should be started after the ceph-osd systemd overrides has
been added, otherwise, the latter isn't considered.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1860739

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 59d0f01992)
2020-10-15 09:34:58 -04:00
Dimitri Savineau 7c337d96d2 ceph-osd: don't start the OSD services twice
Using the + operation on two lists doesn't filter out the duplicate
keys.
Currently each OSDs is started (via systemd) twice.
Instead we could use the union filter.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4eaa65c362)
2020-10-14 10:00:21 -04:00
Benoît Knecht 69a6053114 Fix Ansible check mode for site.yml.sample playbook
Make sure the `site.yml.sample` playbook can be run in check mode by skipping
tasks that try to read the output of commands that have been skipped.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 54ba38e35e)
2020-10-07 07:06:54 +02:00
Dimitri Savineau 69b09f9336 Allow updating crush rule on existing pool
The crush rule value was only set once during the pool creation. It was
not possible to update the crush rule value by updating the value in the
configuration.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1847166

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-10 20:35:44 -04:00
Guillaume Abrioux 88c9f6d969 common: don't enable debug log on ceph-volume calls by default
ceph-volume can generate large logs at some point.

debug logs by definition should be enabled only when debugging.

Let's make it customizable with a variable which is set to `False` by
default.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 448cc280b7)
2020-08-13 14:21:44 +02:00
Dimitri Savineau a99c94ea11 ceph-osd: remove ceph-osd-run.sh script
Since we only have one scenario since nautilus then we can just move
the container start command from ceph-osd-run.sh to the systemd unit
service.
As a result, the ceph-osd-run.sh.j2 template and the
ceph_osd_docker_run_script_path variable are removed.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 829990e60d)
2020-06-23 17:35:01 +02:00
Guillaume Abrioux c847c2f117 switch_to_containers: don't set noup flag
We shouldn't set this flag when running switch_to_containers playbook.
Otherwise the playbook fails waiting for pgs to be clean.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843569

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b91d60d384)
2020-06-17 09:24:19 -04:00
Guillaume Abrioux d790375905 common: fix target_size_ratio task enablement
The condition on this task is wrong, we have to check whether
`target_size_ratio` is set in the pool definition instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8c7a48832c)
2020-06-03 13:22:57 -04:00
Guillaume Abrioux 86d5979269 osd: add a default value for 'default' in crush_rules
Let's default to `False` for the `default` attribute in `crush_rules`
variable.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1797774

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1b0b7af119)
2020-06-03 13:20:40 -04:00
Guillaume Abrioux 7acd9686ab osd: use default crush rule name when needed
When `rule_name` isn't set in `crush_rules` the osd pool creation will
fail.
This commit adds a new fact `ceph_osd_pool_default_crush_rule_name` with
the default crush rule name.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1817586

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1bb9860dfd)
2020-03-31 19:42:40 -04:00
Guillaume Abrioux 87e1f0cc6c osd: support changing default rule even when osd_crush_location isn't defined
Creating crush rules even with no crush hierarchy configuration is a
valid scenario so we shouldn't be bound to the first task result (which
configure crush hierarchy) to be able to add new crush rules.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816989

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5b0476385c)
2020-03-31 23:14:55 +02:00
Guillaume Abrioux 1a978ae545 osd: do not change pool size on erasure pool
This commit adds condition in order to not try to customize pools size
when its type is erasure.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e17c79b871)
2020-03-06 16:10:03 +01:00
Guillaume Abrioux 98783a17b3 osd: add pg autoscaler support
This commit adds the pg autoscaler support.

The structure for pool definition has now two additional attributes
`pg_autoscale_mode` and `target_size_ratio`, eg:

```
test:
  name: "test"
  pg_num: "{{ osd_pool_default_pg_num }}"
  pgp_num: "{{ osd_pool_default_pg_num }}"
  rule_name: "replicated_rule"
  application: "rbd"
  type: 1
  erasure_profile: ""
  expected_num_objects: ""
  size: "{{ osd_pool_default_size }}"
  min_size: "{{ osd_pool_default_min_size }}"
  pg_autoscale_mode: False
  target_size_ratio": 0.1
```

when `pg_autoscale_mode` is `True` user has to set a decent value in
`target_size_ratio`.

Given that it's a new feature, it's still disabled by default.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1782253

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 47adc2bb08)
2020-03-06 16:10:03 +01:00
Guillaume Abrioux ae06d684b8 osd: refact osd pool creation
Currently, the command executed is wrong, eg:

```
  cmd:
  - podman
  - exec
  - ceph-mon-controller-0
  - ceph
  - --cluster
  - ceph
  - osd
  - pool
  - create
  - volumes
  - '32'
  - '32'
  - replicated_rule
  - '1'
  delta: '0:00:01.625525'
  end: '2020-02-27 16:41:05.232705'
  item:
```

From documentation, the osd pool creation command is :

```
ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \
     [crush-rule-name] [expected-num-objects]
ceph osd pool create {pool-name} {pg-num}  {pgp-num}   erasure \
     [erasure-code-profile] [crush-rule-name] [expected_num_objects]
```

it means we pass '1' (from item.type) as value for
`expected_num_objects` by default which is very likely not what we want.

Also, this commit modifies the default value when no `rule_name` is set
to use the existing variable `osd_pool_default_crush_rule`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808495

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bf1f125d71)
2020-03-06 16:10:03 +01:00
Dimitri Savineau 5f7778d59a ceph-{mon,osd}: move default crush variables
Since ed36a11 we move the crush rules creation code from the ceph-mon to
the ceph-osd role.
To keep the backward compatibility we kept the possibility to set the
crush variables on the mons side but we didn't move the default values.
As a result, when using crush_rule_config set to true and wanted to use
the default values for crush_rules then the crush rule ansible task
creation will fail.

"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute
'crush_rules'"

This patch move the default crush variables from ceph-mon to ceph-osd
role but also use those default values when nothing is defined on the
mons side.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798864

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1fc6b33714)
2020-02-17 10:18:56 -05:00
Guillaume Abrioux 0d2af6ebf3 fix calls to `container_exec_cmd` in ceph-osd role
We must call `container_exec_cmd` from the right monitor node otherwise
the value of the fact might mistmatch between the delegated node and the
node being played.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794900

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2f919f8971)
2020-01-27 17:54:39 -05:00
Dimitri Savineau 6a51330892 ceph-osd: set container objectstore env variables
Because we need to manage legacy ceph-disk based OSD with ceph-volume
then we need a way to know the osd_objectstore in the container.
This was done like this previously with ceph-disk so we should also
do it with ceph-volume.
Note that this won't have any impact for ceph-volume lvm based OSD.

Rename docker_env_args fact to container_env_args and move the container
condition on the include_tasks call.
Remove OSD_DMCRYPT env variable from the ceph-osd template because it's
now included in the container_env_args variable.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792122

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c9e1fe3d92)
2020-01-20 15:36:11 -05:00
Guillaume Abrioux 1462423059 handler: fix call to container_exec_cmd in handler_osds
When unsetting the noup flag, we must call container_exec_cmd from the
delegated node (first mon member)
Also, adding a `run_once: true` because this task needs to be run only 1
time.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792320

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 22865cde9c)
2020-01-20 12:45:51 -05:00
Dimitri Savineau ff3a3ee5e9 container: move lvm2 package installation
Before this patch, the lvm2 package installation was done during the
ceph-osd role.
However we were running ceph-volume command in the ceph-config role
before ceph-osd. If lvm2 wasn't installed then the ceph-volume command
fails:

error checking path "/run/lock/lvm": stat /run/lock/lvm: no such file or
directory

This wasn't visible before because lvm2 was automatically installed as
docker dependency but it's not the same for podman on CentOS 8.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit de8f2a9f83)
2020-01-14 12:47:55 -05:00
Guillaume Abrioux a81830ddc0 osd: use _devices fact in lvm batch scenario
since fd1718f379, we must use `_devices`
when deploying with lvm batch scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5558664f37)
2020-01-14 15:33:15 +01:00
Guillaume Abrioux ffdfa634ac osd: do not run openstack_config during upgrade
There is no need to run this part of the playbook when upgrading the
cluter.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit af6875706a)
2020-01-14 09:12:34 -05:00
Guillaume Abrioux 2d85fab02d osd: support scaling up using --limit
This commit lets add-osd.yml in place but mark the deprecation of the
playbook.
Scaling up OSDs is now possible using --limit

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3496a0efa2)
2020-01-14 09:12:34 -05:00
Guillaume Abrioux 9ed540da7e osd: ensure osd ids collected are well restarted
This commit refact the condition in the loop of that task so all
potential osd ids found are well started.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790212

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 58e6bfed2d)
2020-01-13 14:31:03 -05:00
Dimitri Savineau f2e1941ef1 ceph-osd: wait for all osds once
cf8c6a3 moves the 'wait for all osds' task from openstack_config to the
main tasks list.
But the openstack_config code was executed only on the last OSD node.
We don't need to do this check on all OSD node so we need to add set
run_once to true on that task.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5bd1cf40eb)
2020-01-10 11:07:25 -05:00
Dimitri Savineau 9fa5b296ca ceph-osd: wait for all osd before crush rules
When creating crush rules with device class parameter we need to be sure
that all OSDs are up and running because the device class list is
is populated with this information.
This is now enable for all scenario not openstack_config only.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cf8c6a3849)
2020-01-10 11:07:25 -05:00
Dimitri Savineau bd016960cf ceph-osd: add device class to crush rules
This adds device class support to crush rules when using the class key
in the rule dict via the create-replicated sub command.
If the class key isn't specified then we use the create-simple sub
command for backward compatibility.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ef2cb99f73)
2020-01-10 11:07:25 -05:00
Dimitri Savineau 661b2c013a move crush rule creation from mon to osd role
If we want to create crush rules with the create-replicated sub command
and device class then we need to have the OSD created before the crush
rules otherwise the device classes won't exist.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ed36a11eab)
2020-01-10 11:07:25 -05:00
Dimitri Savineau 3ebd71c8c2 ceph-osd: fix fs.aio-max-nr sysctl condition
[1] introduced a regression on the fs.aio-max-nr sysctl value condition.
The enable key isn't a boolean but a string because the expression isn't
evaluated.
This string output "(osd_objectstore == 'bluestore')" is always true
because item.enable condition only matches non empty string. So the
sysctl value was applyied for both filestore and bluestore backend.

[2] added the bool filter to the condition but the filter always returns
false on string and the sysctl wasn't applyed at all.

This commit fixes the enable key value by evaluating the value instead
of using the string.

[1] https://github.com/ceph/ceph-ansible/commit/08a2b58
[2] https://github.com/ceph/ceph-ansible/commit/ab54fe2

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ece46d33be)
2019-11-07 20:31:02 +01:00
Dimitri Savineau 6d5125f2a4 lint: fix error [303,602,701,702]
[303] mktemp used in place of tempfile module
 [602] Don't compare to empty string
 [701] No 'galaxy_info' found
 [702] Use 'galaxy_tags' rather than 'categories'

This patch also changes the ansible log_path value via the
ANSIBLE_LOG_PATH environment variable in the travis configuration to
avoid warnings.

[WARNING]: log file at /home/travis/ansible/ansible.log is not writeable
and we cannot create it, aborting

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f7fd0b6d4f)
2019-10-21 15:55:54 -04:00
Guillaume Abrioux 13ca0531d8 common: improve keyrings generation
There is no need to get n * number of nodes the different keyrings.
Adding a `run_once: true` here avoid running a ceph command too many
times which could be impacting large cluster deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9bad239d77)
2019-10-02 14:34:27 +02:00
Guillaume Abrioux df5337535d container: isolate systemd tasks
This commit isolates the systemd unit files generation for containers into
separate yml files in order to be able importing each corresponding roles
without playing all tasks.
This is needed so we can run ceph-ansible to render systemd unit files
so they call podman instead of docker.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bd64167469)
2019-10-01 18:50:51 +02:00
Guillaume Abrioux e1d06f498c global: remove fetch_directory dependency
This commit drops the fetch_directory dependency.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ab370b6ad8)
2019-09-26 16:21:54 +02:00