Commit Graph

546 Commits (8a907cb1caff23ee3698afc72dd6b8e12280b755)

Author SHA1 Message Date
Guillaume Abrioux 0d2af6ebf3 fix calls to `container_exec_cmd` in ceph-osd role
We must call `container_exec_cmd` from the right monitor node otherwise
the value of the fact might mistmatch between the delegated node and the
node being played.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794900

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2f919f8971)
2020-01-27 17:54:39 -05:00
Dimitri Savineau 6a51330892 ceph-osd: set container objectstore env variables
Because we need to manage legacy ceph-disk based OSD with ceph-volume
then we need a way to know the osd_objectstore in the container.
This was done like this previously with ceph-disk so we should also
do it with ceph-volume.
Note that this won't have any impact for ceph-volume lvm based OSD.

Rename docker_env_args fact to container_env_args and move the container
condition on the include_tasks call.
Remove OSD_DMCRYPT env variable from the ceph-osd template because it's
now included in the container_env_args variable.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792122

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c9e1fe3d92)
2020-01-20 15:36:11 -05:00
Guillaume Abrioux 1462423059 handler: fix call to container_exec_cmd in handler_osds
When unsetting the noup flag, we must call container_exec_cmd from the
delegated node (first mon member)
Also, adding a `run_once: true` because this task needs to be run only 1
time.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792320

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 22865cde9c)
2020-01-20 12:45:51 -05:00
Dimitri Savineau ff3a3ee5e9 container: move lvm2 package installation
Before this patch, the lvm2 package installation was done during the
ceph-osd role.
However we were running ceph-volume command in the ceph-config role
before ceph-osd. If lvm2 wasn't installed then the ceph-volume command
fails:

error checking path "/run/lock/lvm": stat /run/lock/lvm: no such file or
directory

This wasn't visible before because lvm2 was automatically installed as
docker dependency but it's not the same for podman on CentOS 8.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit de8f2a9f83)
2020-01-14 12:47:55 -05:00
Guillaume Abrioux a81830ddc0 osd: use _devices fact in lvm batch scenario
since fd1718f379, we must use `_devices`
when deploying with lvm batch scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5558664f37)
2020-01-14 15:33:15 +01:00
Guillaume Abrioux ffdfa634ac osd: do not run openstack_config during upgrade
There is no need to run this part of the playbook when upgrading the
cluter.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit af6875706a)
2020-01-14 09:12:34 -05:00
Guillaume Abrioux 2d85fab02d osd: support scaling up using --limit
This commit lets add-osd.yml in place but mark the deprecation of the
playbook.
Scaling up OSDs is now possible using --limit

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3496a0efa2)
2020-01-14 09:12:34 -05:00
Guillaume Abrioux 9ed540da7e osd: ensure osd ids collected are well restarted
This commit refact the condition in the loop of that task so all
potential osd ids found are well started.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790212

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 58e6bfed2d)
2020-01-13 14:31:03 -05:00
Dimitri Savineau f2e1941ef1 ceph-osd: wait for all osds once
cf8c6a3 moves the 'wait for all osds' task from openstack_config to the
main tasks list.
But the openstack_config code was executed only on the last OSD node.
We don't need to do this check on all OSD node so we need to add set
run_once to true on that task.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5bd1cf40eb)
2020-01-10 11:07:25 -05:00
Dimitri Savineau 9fa5b296ca ceph-osd: wait for all osd before crush rules
When creating crush rules with device class parameter we need to be sure
that all OSDs are up and running because the device class list is
is populated with this information.
This is now enable for all scenario not openstack_config only.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cf8c6a3849)
2020-01-10 11:07:25 -05:00
Dimitri Savineau bd016960cf ceph-osd: add device class to crush rules
This adds device class support to crush rules when using the class key
in the rule dict via the create-replicated sub command.
If the class key isn't specified then we use the create-simple sub
command for backward compatibility.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ef2cb99f73)
2020-01-10 11:07:25 -05:00
Dimitri Savineau 661b2c013a move crush rule creation from mon to osd role
If we want to create crush rules with the create-replicated sub command
and device class then we need to have the OSD created before the crush
rules otherwise the device classes won't exist.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ed36a11eab)
2020-01-10 11:07:25 -05:00
Dimitri Savineau 3ebd71c8c2 ceph-osd: fix fs.aio-max-nr sysctl condition
[1] introduced a regression on the fs.aio-max-nr sysctl value condition.
The enable key isn't a boolean but a string because the expression isn't
evaluated.
This string output "(osd_objectstore == 'bluestore')" is always true
because item.enable condition only matches non empty string. So the
sysctl value was applyied for both filestore and bluestore backend.

[2] added the bool filter to the condition but the filter always returns
false on string and the sysctl wasn't applyed at all.

This commit fixes the enable key value by evaluating the value instead
of using the string.

[1] https://github.com/ceph/ceph-ansible/commit/08a2b58
[2] https://github.com/ceph/ceph-ansible/commit/ab54fe2

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ece46d33be)
2019-11-07 20:31:02 +01:00
Dimitri Savineau 27eb40714c ceph-osd: Remove ulimit nofile on container start
Even if this improves ceph-disk/ceph-volume performances then it also
impact the ceph-osd process.
The ceph-osd process shouldn't use 1024:4096 value for the max open
files.
Removing the ulimit option from the container engine and doing this kind
of change on the container side [1].

[1] https://github.com/ceph/ceph-container/pull/1497

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9a996aef7f)
2019-10-31 14:42:30 -04:00
Dimitri Savineau 6d5125f2a4 lint: fix error [303,602,701,702]
[303] mktemp used in place of tempfile module
 [602] Don't compare to empty string
 [701] No 'galaxy_info' found
 [702] Use 'galaxy_tags' rather than 'categories'

This patch also changes the ansible log_path value via the
ANSIBLE_LOG_PATH environment variable in the travis configuration to
avoid warnings.

[WARNING]: log file at /home/travis/ansible/ansible.log is not writeable
and we cannot create it, aborting

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f7fd0b6d4f)
2019-10-21 15:55:54 -04:00
Guillaume Abrioux 13ca0531d8 common: improve keyrings generation
There is no need to get n * number of nodes the different keyrings.
Adding a `run_once: true` here avoid running a ceph command too many
times which could be impacting large cluster deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9bad239d77)
2019-10-02 14:34:27 +02:00
Guillaume Abrioux df5337535d container: isolate systemd tasks
This commit isolates the systemd unit files generation for containers into
separate yml files in order to be able importing each corresponding roles
without playing all tasks.
This is needed so we can run ceph-ansible to render systemd unit files
so they call podman instead of docker.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bd64167469)
2019-10-01 18:50:51 +02:00
Guillaume Abrioux e1d06f498c global: remove fetch_directory dependency
This commit drops the fetch_directory dependency.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ab370b6ad8)
2019-09-26 16:21:54 +02:00
Guillaume Abrioux 69ec26e045 osd: add wal_devices option support to ceph_volume module
This commit adds the `wal_devices` option support to the
ceph_volume module.
passing a devices list in `bluestore_wal_devices` will make ceph-volume
creating 1 vg using these devices to create block.wal partitions.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 09e04a9197)
2019-09-26 16:21:54 +02:00
Guillaume Abrioux a33791be25 osd: update doc text in defaults/main.yml
This commit removes ceph-disk references.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 70f1b37097)
2019-09-26 16:21:54 +02:00
Guillaume Abrioux d666e03b0c osd: add block_db_devices option support to ceph_volume module
This commit adds the `block_db_devices` option support to the
ceph_volume module.
passing a devices list in `dedicated_devices` will make ceph-volume
creating 1 vg using these devices to create block.db partitions for data
devices.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7b836eaa47)
2019-09-26 16:21:54 +02:00
Guillaume Abrioux 8f781198d6 lint: fix error [306], add pipefail on shell command using pipe
This commit fixes the error [306]:

`[306] Shells that use pipes should set the pipefail option`

using `/bin/bash` as executable because Debian/Ubuntu systems use `dash`
by default which doesn't have the `-o pipefail`. (See:
https://github.com/ansible/ansible-lint/issues/497#issue-424623501)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 102edaeb61)
2019-08-28 11:22:47 -04:00
Artur Fijalkowski 27014df45e global: make directories mode parameterizable
This commit makes it possible to parametrize the ceph directories modes.
So it changes hardocded mode for ceph related directories from 0755 to
customizable with `ceph_directories_mode` variable.

Closes: #2920

Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 011270ca69)
2019-08-23 11:39:23 +00:00
Dimitri Savineau 500c59c648 ceph-osd: Add ulimit nofile on container start
On containerized deployment, the OSD entrypoint runs some ceph-volume
commands (lvm/simple scan and/or activate) which perform badly without
the ulimit option.
This option was added for all previous ceph-volume commands but not on
the ceph-osd container startup.
Also updating hard limit value to 4096 to reflect default baremetal
value.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9a4ac46d19)
2019-08-22 22:50:17 +00:00
Guillaume Abrioux 19c7b650db osd: remove useless condition
just like `ceph_osd_pool_default_size`, a pool size might change after an
initial deployment. Having this condition prevents from customizing the
pool in that case.
This is not needed so let's remove it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 70cf2a5846)
2019-08-20 09:13:15 +02:00
Guillaume Abrioux f08408bf5c osd: refact 'wait for all osd to be up' task
let's use `until` instead of doing test in bash using python oneliner
also, use `command` instead of `shell`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 687087fd43)
2019-08-19 18:47:14 +00:00
Guillaume Abrioux 2f77704591 common: use discovered_interpreter_python fact
in order to use the right binary name when using python cli in command
or shell module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 13815ad3ca)
2019-08-19 18:47:14 +00:00
Dimitri Savineau 36e18e20d1 ceph-osd: check container engine rc for pools
When creating OpenStack pools, we only check if the return code from
the pool list command isn't 0 (ie: if it doesn't exist). In that case,
the return code will be 2. That's why the next condition is rc != 0 for
the pool creation.
But in containerized deployment, the return code could be different if
there's a failure on the container engine command (like container not
running). In that case, the return code could but either 1 (docker) or
125 (podman) so we should fail at this point and not in the next tasks.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1732157

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d549fffdd2)
2019-07-31 14:07:41 -04:00
Guillaume Abrioux 2295a4cf0a containers: improve logging
bindmount /var/log/ceph on all containers so it's possible to retrieve
logs from the host.

related ceph-container PR: ceph/ceph-container#1408

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710548

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 33eed78d17)
2019-07-02 11:27:34 -04:00
Dimitri Savineau 109883e7a5 ceph-osd: Add CONTAINER_IMAGE env variable
This environment variable was added in cb381b4 but was removed in
4d35e9e.
This commit reintroduces the change.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 02fbe76e62)
2019-06-27 17:34:24 -04:00
Dimitri Savineau 62d98971f2 ceph-volume: Set max open files limit on container
The ceph-volume lvm list command takes ages to complete when having
a lot of LV devices on containerized deployment.
For instance, with 25 OSDs on a node it takes 3 mins 44s to list the
OSD.
Adding the max open files limit to the container engine cli when
executing the ceph-volume command seems to improve a lot thee
execution time ~30s.

This was impacting the OSDs creation with ceph-volume (both filestore
and bluestore) when using multiple LV devices.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b987534881)
2019-06-20 20:00:53 -04:00
Dimitri Savineau 590f6026bb roles: Remove useless become (true) flag
We already set the become flag to true at a play level in the site*
playbooks so we don't need to set it at a task level.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7c3640177b)
2019-06-20 22:00:27 +00:00
Guillaume Abrioux c245c4e8eb osd: remove legacy task
`parted_results` isn't used anymore in the playbook.

By the way, `parted` seems to cause issue because it changes the
ownership on devices:

```
root@osd0 ~]# ls -l /dev/sdc*
brw-rw----. 1 root disk 8, 32 Jun 11 08:53 /dev/sdc
brw-rw----. 1 ceph ceph 8, 33 Jun 11 08:53 /dev/sdc1
brw-rw----. 1 ceph ceph 8, 34 Jun 11 08:53 /dev/sdc2

[root@osd0 ~]# parted -s /dev/sdc print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdc: 53.7GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name           Flags
 1      1049kB  1075MB  1074MB               ceph block.db
 2      1075MB  2149MB  1074MB               ceph block.db

[root@osd0 ~]# #We can see ownerships have changed from ceph:ceph to root:disk:
[root@osd0 ~]# ls -l /dev/sdc*
brw-rw----. 1 root disk 8, 32 Jun 11 08:57 /dev/sdc
brw-rw----. 1 root disk 8, 33 Jun 11 08:57 /dev/sdc1
brw-rw----. 1 root disk 8, 34 Jun 11 08:57 /dev/sdc2
[root@osd0 ~]#
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit eece362b38)
2019-06-19 08:41:25 +00:00
Dimitri Savineau e9edb5a92a podman: Add systemd dependency on network.target
When using podman, the systemd unit scripts don't have a dependency
on the network. So we're not sure that the network is up and running
when the containers are starting.
With docker this behaviour is already handled because the systemd
unit scripts depend on docker service which is started after the
network.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f49090df7e)
2019-06-07 16:06:26 +02:00
L3D 1daca1ba83 ansible: use 'bool' filter on boolean conditionals
By running ceph-ansible there are a lot ``[DEPRECATION WARNING]`` like these:
```
[DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable,
this behaviour will go away and you might need to add |bool to the expression
in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This
feature will be removed in version 2.12. Deprecation warnings can be disabled
by setting deprecation_warnings=False in ansible.cfg.
```

Now appended ``| bool`` on a lot of the affected variables.

Sometimes the coding style from ``variable|bool`` changed to ``variable | bool`` *(with spaces at the pipe)*.

Closes: #4022

Signed-off-by: L3D <l3d@c3woc.de>
(cherry picked from commit ab54fe20ec)
2019-06-07 16:05:51 +02:00
Guillaume Abrioux 61a52a97e3 ceph-osd: do not relabel /run/udev in containerized context
Otherwise content in /run/udev is mislabeled and prevent some services
like NetworkManager from starting.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 80875adba7)
2019-06-04 22:09:27 +00:00
Guillaume Abrioux e29fd842a6 rename docker_exec_cmd variable
This commit renames the `docker_exec_cmd` variable to
`container_exec_cmd` so it's more generic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e74d80e72f)
2019-05-17 16:05:58 +02:00
Rishabh Dave 9e6b2e3bc5 don't access other node's docker_exec_cmd variable
Except for some corner case, it's not correct to access some other
node's copy of variable docker_exec_cmd. Therefore replace
"hostvars[groups[mon_group_name][0]]['docker_exec_cmd']" by
"docker_exec_cmd".

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 89748d579a)
2019-05-07 17:56:30 +02:00
Rishabh Dave 06b3ab2a6b improve coding style
Keywords requiring only one item shouldn't express it by creating a
list with single item.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 739a662c80)

Conflicts:
	roles/ceph-mon/tasks/ceph_keys.yml
	roles/ceph-validate/tasks/check_devices.yml
2019-05-06 15:09:06 +00:00
Dimitri Savineau 4752327340 ansible: remove private and static attribute
This will be removed in ansible 2.8 and breaks the playbook execution
with this release.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ae266c6f2b)
2019-05-02 20:21:26 -04:00
Dimitri Savineau d8688e0eb9 ceph-osd: Increase cpu limit to 4
In containerized deployment the default osd cpu quota is too low
for production environment using NVMe devices.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695880

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c17106874c)
2019-04-30 12:11:42 -04:00
Rishabh Dave cad35d5c52 "when" keyword should precede "block" keyword
Otherwise the reader is forced to search for "when" when blocks are too
long.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit e0beaf123a)

Conflicts:
	roles/ceph-config/tasks/main.yml
	roles/ceph-container-common/tasks/pre_requisites/prerequisites.yml
	roles/ceph-validate/tasks/check_devices.yml
2019-04-24 16:25:43 +02:00
Andrew Schoen 1e0e50fc90 ceph-osd: do not run lvm batch tasks during update
When performing a rolling update do not try to create
any new osds with `ceph-volume lvm batch`. This is troublesome
because when upgrading to nautilus the devices list might contain
devices that are currently being used by ceph-disk and have GPT
headers on them, which will cause ceph-volume to fail when
trying to use such a device. Any devices originally created
by ceph-disk will need to be removed from the devices list
before any new osds can be created.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 5e3dfe5021)
2019-04-18 19:12:13 +02:00
Guillaume Abrioux 22d39591a4 osd: remove legacy file
this file is not used anymore, let's remove it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f899da3172)
2019-04-12 00:45:21 +00:00
Guillaume Abrioux 692b1a8b9f osd: remove ceph-disk scenarios files
these files aren't needed anymore since we only use lvm scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4f68462009)
2019-04-12 00:45:21 +00:00
Guillaume Abrioux 41e55a840f osd: remove dedicated_devices variable
This variable was related to ceph-disk scenarios.
Since we are entirely dropping ceph-disk support as of stable-4.0, let's
remove this variable.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f0416c8892)
2019-04-12 00:45:21 +00:00
Guillaume Abrioux 4a663e1fc0 osd: remove variable osd_scenario
As of stable-4.0, the only valid scenario is `lvm`.
Thus, this makes this variable useless.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4d35e9eeed)
2019-04-12 00:45:21 +00:00
Guillaume Abrioux 948a5e802e osd: remove legacy file
ceph_disk_cli_options_facts.yml is not used anymore, let's remove it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4d5637fd8a)
2019-04-12 00:45:21 +00:00
Sébastien Han 343a99c8b7 osd: default osd_scenario to lvm
osd_scenario has become obsolete and defaults to lvm. With lvm there is
no such things has collocated and non-collocated.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 52df15895b)
2019-04-12 00:45:21 +00:00
Sébastien Han 11c6655f57 osd: remove ceph-disk support
We don't support the preparation of OSD with ceph-disk. ceph-volume is
only supported. However, the start operation of OSD is still supported.
So let's say you change a config option, the handlers will be able to
restart all the OSDs via their respective systemd unit files.

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e2a5aa062e)
2019-04-12 00:45:21 +00:00