[1] introduced a regression on the fs.aio-max-nr sysctl value condition.
The enable key isn't a boolean but a string because the expression isn't
evaluated.
This string output "(osd_objectstore == 'bluestore')" is always true
because item.enable condition only matches non empty string. So the
sysctl value was applyied for both filestore and bluestore backend.
[2] added the bool filter to the condition but the filter always returns
false on string and the sysctl wasn't applyed at all.
This commit fixes the enable key value by evaluating the value instead
of using the string.
[1] https://github.com/ceph/ceph-ansible/commit/08a2b58
[2] https://github.com/ceph/ceph-ansible/commit/ab54fe2
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ece46d33be)
Even if this improves ceph-disk/ceph-volume performances then it also
impact the ceph-osd process.
The ceph-osd process shouldn't use 1024:4096 value for the max open
files.
Removing the ulimit option from the container engine and doing this kind
of change on the container side [1].
[1] https://github.com/ceph/ceph-container/pull/1497
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9a996aef7f)
let's use `until` instead of doing test in bash using python oneliner
also, use `command` instead of `shell`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c76cd5ad84)
Since we change the way to run the OSD containers with the ID instead
of the device name, we lost the ability to use loop devices.
Loop devices are like nvme or cciss devices because the partitions are
referenced with an extra 'p' before the partition number.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1749097
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
On containerized deployment, the OSD entrypoint runs some ceph-volume
commands (lvm/simple scan and/or activate) which perform badly without
the ulimit option.
This option was added for all previous ceph-volume commands but not on
the ceph-osd container startup.
Also updating hard limit value to 4096 to reflect default baremetal
value.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1744390
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9a4ac46d19)
Otherwise it will fail when running rolling_update.yml playbook because
of `serial: 1` usage.
The task which copies the script is run against the current node being
played only whereas the task which runs the script is run against all
nodes in a loop, it ends up with the typical error:
```
2019-08-08 17:47:05,115 p=14905 u=ubuntu | failed: [magna023 -> magna030] (item=magna030) => {
"changed": true,
"cmd": [
"/usr/bin/env",
"bash",
"/tmp/systemd-device-to-id.sh"
],
"delta": "0:00:00.004339",
"end": "2019-08-08 17:46:59.059670",
"invocation": {
"module_args": {
"_raw_params": "/usr/bin/env bash /tmp/systemd-device-to-id.sh",
"_uses_shell": false,
"argv": null,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"warn": true
}
},
"item": "magna030",
"msg": "non-zero return code",
"rc": 127,
"start": "2019-08-08 17:46:59.055331",
"stderr": "bash: /tmp/systemd-device-to-id.sh: No such file or directory",
"stderr_lines": [
"bash: /tmp/systemd-device-to-id.sh: No such file or directory"
],
"stdout": "",
"stdout_lines": []
}
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1739209
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When creating OpenStack pools, we only check if the return code from
the pool list command isn't 0 (ie: if it doesn't exist). In that case,
the return code will be 2. That's why the next condition is rc != 0 for
the pool creation.
But in containerized deployment, the return code could be different if
there's a failure on the container engine command (like container not
running). In that case, the return code could but either 1 (docker) or
125 (podman) so we should fail at this point and not in the next tasks.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1732157
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d549fffdd2)
When using containerized deployment we have to create the systemd
service unit based on a template.
The current implementation with ceph-disk is using the device name
as paramater to the systemd service and for the container name too.
$ systemctl start ceph-osd@sdb
$ docker ps --filter 'name=ceph-osd-*'
CONTAINER ID IMAGE NAMES
065530d0a27f ceph/daemon:latest-luminous ceph-osd-strg0-sdb
This is the only scenario (compared to non containerized or
ceph-volume based deployment) that isn't using the OSD id.
$ systemctl start ceph-osd@0
$ docker ps --filter 'name=ceph-osd-*'
CONTAINER ID IMAGE NAMES
d34552ec157e ceph/daemon:latest-luminous ceph-osd-0
Also if the device mapping doesn't persist to system reboot (ie sdb
might be remapped to sde) then the OSD service won't come back after
the reboot.
This patch allows to use the OSD id with the ceph-osd systemd service
but requires to activate the OSD manually with ceph-disk first in
order to affect the ID to that OSD.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1670734
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Same behaviour than ceph-volume (b987534). The ceph-disk command runs
faster when using ulimit nofile with container cli.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The ceph-volume lvm list command takes ages to complete when having
a lot of LV devices on containerized deployment.
For instance, with 25 OSDs on a node it takes 3 mins 44s to list the
OSD.
Adding the max open files limit to the container engine cli when
executing the ceph-volume command seems to improve a lot thee
execution time ~30s.
This was impacting the OSDs creation with ceph-volume (both filestore
and bluestore) when using multiple LV devices.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b987534881)
Otherwise content in /run/udev is mislabeled and prevent some services
like NetworkManager from starting.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 80875adba7)
The RHCS documentation mentionned in the default values and
group_vars directory are referring to RHCS 2.x while it should be
3.x.
Revolves: https://bugzilla.redhat.com/show_bug.cgi?id=1702732
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We only need to set the wal dedicated device when there's three tiers
of storage used.
Currently the block.wal partition will also be created on the same
device than block.db.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1685253
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The PR #3916 was merged automatically by mergify even if there was a
confict in the ceph-osd-run.sh.j2 template.
This commit resolves the conflict.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
In containerized deployment the default osd cpu quota is too low
for production environment using NVMe devices.
This is causing performance degradation compared to bare-metal.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695880
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c17106874c)
# Conflicts:
# roles/ceph-osd/templates/ceph-osd-run.sh.j2
When using osd_scenario lvm, we never check if the lvm2 package is
present on the host.
When using containerized deployment and docker on CentOS/RedHat this
package will be automatically installed as a dependency but not for
Ubuntu distribution.
OSD deployed via ceph-volume require the lvmetad.socket to be active
and running.
Resolves: #3728
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 179fdfbc19)
Since all files in container image have moved to `/opt/ceph-container`
this check must look for new AND the old path so it's backward
compatible. Otherwise it could end up by templating an inconsistent
`ceph-osd-run.sh`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 987bdac963)
With 3e32dce we can run OSD containers with numactl support.
When using numactl command in a containerized deployment we need to
be sure that the corresponding package is installed on the host.
The package installation is only executed when the
ceph_osd_numactl_opts variable isn't empty.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b7f4e3e7c7)
introduce two new variables to make the check that 'wait for all osd to
be up' configurable.
It's possible that for some deployments, OSDs can take longer to be seen
as UP and IN.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1676763
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 21e5db8982)
The existing task checks that the number of OSDs is equal to the number of up OSDs before continuing.
The problem is that if none of the OSDs have been discovered yet, the task will exit immediately and subsequent pool creation will fail (num_osds = 0, num_up_osds = 0).
This is related to Bugzilla 1578086.
In this change, we also check that at least one OSD is present. In our testing, this results in the task correctly waiting for all OSDs to come up before continuing.
Signed-off-by: David Waiting <david_waiting@comcast.com>
(cherry picked from commit 3930791cb7)
In order to be able to retrieve udev information, we must expose its
socket. As per, https://github.com/ceph/ceph/pull/25201 ceph-volume will
start consuming udev output.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 997667a873)
without this, the command `ceph-volume lvm list --format json` hangs and
takes a very long time to complete.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7ade032807)
Applying and passing the OSD_BLUESTORE/FILESTORE on the fly is wrong for
existing clusters as their config will be changed.
Typically, if an OSD was prepared with ceph-disk on filestore and we
change the default objectstore to bluestore, the activation will fail.
The flag osd_objectstore should only be used for the preparation, not
activation. The activate in this case detects the osd objecstore which
prevents failures like the one described above.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4c51130198)
If an existing cluster runs this config, and has ceph-disk OSD, the
`expose_partitions` won't be expected by jinja since it's inside the
'old' if. We need it as part of the osd_scenario != 'lvm' condition.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1640273
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit bef522627e)
The code is now able (again) to start osds that where configured with
ceph-disk on a non-container scenario.
Closes: https://github.com/ceph/ceph-ansible/issues/3388
Signed-off-by: Sébastien Han <seb@redhat.com>
Add real default value for osd pool size customization.
Ceph itself has an `osd_pool_default_size` default value to `3`.
If users don't specify a pool size in various pools definition within
ceph-ansible, we should default to `3`.
By the way, this kind of condition isn't really clear:
```
when:
- rbd_pool_size | default ("")
```
we should try to get the customized value then default to what is in
`osd_pool_default_size` (which has its default value pointing to
`ceph_osd_pool_default_size` (`3`) as well) and compare it to
`ceph_osd_pool_default_size`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7774069d45)
`osd_pool_default_pg_num` parameter is set in `ceph-mon`.
When using ceph-ansible with `--limit` on a specifc group of nodes, it
will fail when trying to access this variables since it wouldn't be
defined.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1518696
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d4c0960f04)
since `ceph-volume` introduction, there is no need to split those tasks.
Let's refact this part of the code so it's clearer.
By the way, this was breaking rolling_update.yml when `openstack_config:
true` playbook because nothing ensured OSDs were started in ceph-osd role (In
`openstack_config.yml` there is a check ensuring all OSD are UP which was
obviously failing) and resulted with OSDs on the last OSD node not started
anyway.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f7fcc012e9)
Since we do not have enough data to put valid upper bounds for the memory
usage of these daemons, do not put artificial limits by default. This will
help us avoid failures like OOM kills due to low default values.
Whenever required, these limits can be manually enforced by the user.
More details in
https://bugzilla.redhat.com/show_bug.cgi?id=1638148
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1638148
Signed-off-by: Neha Ojha <nojha@redhat.com>
The playbook has various improvements:
* run ceph-validate role before doing anything
* run ceph-fetch-keys only on the first monitor of the inventory list
* set noup flag so PGs get distributed once all the new OSDs have been
added to the cluster and unset it when they are up and running
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1624962
Signed-off-by: Sébastien Han <seb@redhat.com>
As of now, we should no longer support Jewel in ceph-ansible.
The latest ceph-ansible release supporting Jewel is `stable-3.1`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit does a couple of things:
* Avoid code duplication
* Clarify the code
* add more unit tests
* add myself to the author of the module
Signed-off-by: Sébastien Han <seb@redhat.com>
This task was created for ceph-disk based deployments so it's not needed
when osd are prepared with ceph-volume.
Signed-off-by: Sébastien Han <seb@redhat.com>
We don't need to pass the device and discover the OSD ID. We have a
task that gathers all the OSD ID present on that machine, so we simply
re-use them and activate them. This also handles the situation when you
have multiple OSDs running on the same device.
Signed-off-by: Sébastien Han <seb@redhat.com>
We don't need to pass the hostname on the container name but we can keep
it simple and just call it ceph-osd-$id.
Signed-off-by: Sébastien Han <seb@redhat.com>
expose_partitions is only needed on ceph-disk OSDs so we don't need to
activate this code when running lvm prepared OSDs.
Signed-off-by: Sébastien Han <seb@redhat.com>
The batch option got recently added, while rebasing this patch it was
necessary to implement it. So now, the batch option can work on
containerized environments.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1630977
Signed-off-by: Sébastien Han <seb@redhat.com>