ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Dimitri Savineau	de8f2a9f83	container: move lvm2 package installation Before this patch, the lvm2 package installation was done during the ceph-osd role. However we were running ceph-volume command in the ceph-config role before ceph-osd. If lvm2 wasn't installed then the ceph-volume command fails: error checking path "/run/lock/lvm": stat /run/lock/lvm: no such file or directory This wasn't visible before because lvm2 was automatically installed as docker dependency but it's not the same for podman on CentOS 8. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-08 11:13:46 +01:00
Dimitri Savineau	5bd1cf40eb	ceph-osd: wait for all osds once `cf8c6a3` moves the 'wait for all osds' task from openstack_config to the main tasks list. But the openstack_config code was executed only on the last OSD node. We don't need to do this check on all OSD nodes, so we need to set run_once to true on that task. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-27 13:05:42 -05:00
Dimitri Savineau	cf8c6a3849	ceph-osd: wait for all osd before crush rules When creating crush rules with device class parameter we need to be sure that all OSDs are up and running because the device class list is populated with this information. This is now enabled for all scenarios, not openstack_config only. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-27 07:43:07 +01:00
Dimitri Savineau	ef2cb99f73	ceph-osd: add device class to crush rules This adds device class support to crush rules when using the class key in the rule dict via the create-replicated sub command. If the class key isn't specified then we use the create-simple sub command for backward compatibility. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-14 16:25:46 +01:00
Dimitri Savineau	ed36a11eab	move crush rule creation from mon to osd role If we want to create crush rules with the create-replicated sub command and device class then we need to have the OSD created before the crush rules otherwise the device classes won't exist. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-14 16:25:46 +01:00
Dimitri Savineau	ece46d33be	ceph-osd: fix fs.aio-max-nr sysctl condition [1] introduced a regression on the fs.aio-max-nr sysctl value condition. The enable key isn't a boolean but a string because the expression isn't evaluated. This string output "(osd_objectstore == 'bluestore')" is always true because item.enable condition only matches non empty string. So the sysctl value was applied for both filestore and bluestore backend. [2] added the bool filter to the condition but the filter always returns false on string and the sysctl wasn't applied at all. This commit fixes the enable key value by evaluating the value instead of using the string. [1] https://github.com/ceph/ceph-ansible/commit/08a2b58 [2] https://github.com/ceph/ceph-ansible/commit/ab54fe2 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-07 13:51:48 +01:00
Dimitri Savineau	9a996aef7f	ceph-osd: Remove ulimit nofile on container start Even if this improves ceph-disk/ceph-volume performances then it also impact the ceph-osd process. The ceph-osd process shouldn't use 1024:4096 value for the max open files. Removing the ulimit option from the container engine and doing this kind of change on the container side [1]. [1] https://github.com/ceph/ceph-container/pull/1497 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-31 10:42:09 -04:00
Dimitri Savineau	f7fd0b6d4f	lint: fix error [303,602,701,702] [303] mktemp used in place of tempfile module [602] Don't compare to empty string [701] No 'galaxy_info' found [702] Use 'galaxy_tags' rather than 'categories' This patch also changes the ansible log_path value via the ANSIBLE_LOG_PATH environment variable in the travis configuration to avoid warnings. [WARNING]: log file at /home/travis/ansible/ansible.log is not writeable and we cannot create it, aborting Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-15 10:07:52 +02:00
Guillaume Abrioux	9bad239d77	common: improve keyrings generation There is no need to get n * number of nodes the different keyrings. Adding a `run_once: true` here avoid running a ceph command too many times which could be impacting large cluster deployment. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-02 13:09:50 +02:00
Guillaume Abrioux	bd64167469	container: isolate systemd tasks This commit isolates the systemd unit files generation for containers into separate yml files in order to be able importing each corresponding roles without playing all tasks. This is needed so we can run ceph-ansible to render systemd unit files so they call podman instead of docker. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-01 10:27:51 -04:00
Guillaume Abrioux	ab370b6ad8	global: remove fetch_directory dependency This commit drops the fetch_directory dependency. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-26 11:35:24 +02:00
Guillaume Abrioux	09e04a9197	osd: add wal_devices option support to ceph_volume module This commit adds the `wal_devices` option support to the ceph_volume module. passing a devices list in `bluestore_wal_devices` will make ceph-volume creating 1 vg using these devices to create block.wal partitions. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-26 11:35:24 +02:00
Guillaume Abrioux	70f1b37097	osd: update doc text in defaults/main.yml This commit removes ceph-disk references. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-26 11:35:24 +02:00
Guillaume Abrioux	7b836eaa47	osd: add block_db_devices option support to ceph_volume module This commit adds the `block_db_devices` option support to the ceph_volume module. passing a devices list in `dedicated_devices` will make ceph-volume creating 1 vg using these devices to create block.db partitions for data devices. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-26 11:35:24 +02:00
Guillaume Abrioux	5986b26a01	global: add newline at end of file This commit re-add a newline at end of files when it's missing. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-23 15:56:47 +02:00
Artur Fijalkowski	011270ca69	global: make directories mode parameterizable This commit makes it possible to parametrize the ceph directories modes. So it changes hardcoded mode for ceph related directories from 0755 to customizable with `ceph_directories_mode` variable. Closes: #2920 Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-23 09:38:17 +02:00
Guillaume Abrioux	102edaeb61	lint: fix error [306], add pipefail on shell command using pipe This commit fixes the error [306]: `[306] Shells that use pipes should set the pipefail option` using `/bin/bash` as executable because Debian/Ubuntu systems use `dash` by default which doesn't have the `-o pipefail`. (See: https://github.com/ansible/ansible-lint/issues/497#issue-424623501) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-23 00:23:47 +02:00
Dimitri Savineau	9a4ac46d19	ceph-osd: Add ulimit nofile on container start On containerized deployment, the OSD entrypoint runs some ceph-volume commands (lvm/simple scan and/or activate) which perform badly without the ulimit option. This option was added for all previous ceph-volume commands but not on the ceph-osd container startup. Also updating hard limit value to 4096 to reflect default baremetal value. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-08-22 16:59:08 +02:00
Guillaume Abrioux	70cf2a5846	osd: remove useless condition just like `ceph_osd_pool_default_size`, a pool size might change after an initial deployment. Having this condition prevents from customizing the pool in that case. This is not needed so let's remove it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-19 16:17:22 +02:00
Guillaume Abrioux	687087fd43	osd: refact 'wait for all osd to be up' task let's use `until` instead of doing test in bash using python oneliner also, use `command` instead of `shell`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-14 16:42:02 +02:00
Guillaume Abrioux	13815ad3ca	common: use discovered_interpreter_python fact in order to use the right binary name when using python cli in command or shell module. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-14 16:42:02 +02:00
Guillaume Abrioux	a5e359ee80	osd: update the check for 'all osd to be up' the data structure has changed in octopus. eg: the path to `num_osds` is now `["osdmap"]["num_osds"]`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-14 16:42:02 +02:00
Dimitri Savineau	d549fffdd2	ceph-osd: check container engine rc for pools When creating OpenStack pools, we only check if the return code from the pool list command isn't 0 (ie: if it doesn't exist). In that case, the return code will be 2. That's why the next condition is rc != 0 for the pool creation. But in containerized deployment, the return code could be different if there's a failure on the container engine command (like container not running). In that case, the return code could be either 1 (docker) or 125 (podman) so we should fail at this point and not in the next tasks. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1732157 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-29 15:55:04 +02:00
Guillaume Abrioux	33eed78d17	containers: improve logging bindmount /var/log/ceph on all containers so it's possible to retrieve logs from the host. related ceph-container PR: ceph/ceph-container#1408 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710548 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-28 13:30:36 -04:00
Dimitri Savineau	02fbe76e62	ceph-osd: Add CONTAINER_IMAGE env variable This environment variable was added in `cb381b4` but was removed in `4d35e9e`. This commit reintroduces the change. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-27 16:38:02 +02:00
Dimitri Savineau	b987534881	ceph-volume: Set max open files limit on container The ceph-volume lvm list command takes ages to complete when having a lot of LV devices on containerized deployment. For instance, with 25 OSDs on a node it takes 3 mins 44s to list the OSD. Adding the max open files limit to the container engine cli when executing the ceph-volume command seems to improve a lot the execution time ~30s. This was impacting the OSDs creation with ceph-volume (both filestore and bluestore) when using multiple LV devices. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-20 22:37:40 +02:00
Dimitri Savineau	7c3640177b	roles: Remove useless become (true) flag We already set the become flag to true at a play level in the site* playbooks so we don't need to set it at a task level. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-19 10:31:32 +02:00
Guillaume Abrioux	eece362b38	osd: remove legacy task `parted_results` isn't used anymore in the playbook. By the way, `parted` seems to cause issue because it changes the ownership on devices: ``` [root@osd0 ~]# ls -l /dev/sdc* brw-rw----. 1 root disk 8, 32 Jun 11 08:53 /dev/sdc brw-rw----. 1 ceph ceph 8, 33 Jun 11 08:53 /dev/sdc1 brw-rw----. 1 ceph ceph 8, 34 Jun 11 08:53 /dev/sdc2 [root@osd0 ~]# parted -s /dev/sdc print Model: ATA QEMU HARDDISK (scsi) Disk /dev/sdc: 53.7GB Sector size (logical/physical): 512B/512B Partition Table: gpt Disk Flags: Number Start End Size File system Name Flags 1 1049kB 1075MB 1074MB ceph block.db 2 1075MB 2149MB 1074MB ceph block.db [root@osd0 ~]# #We can see ownerships have changed from ceph:ceph to root:disk: [root@osd0 ~]# ls -l /dev/sdc* brw-rw----. 1 root disk 8, 32 Jun 11 08:57 /dev/sdc brw-rw----. 1 root disk 8, 33 Jun 11 08:57 /dev/sdc1 brw-rw----. 1 root disk 8, 34 Jun 11 08:57 /dev/sdc2 [root@osd0 ~]# ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-18 12:45:01 -04:00
Dimitri Savineau	f49090df7e	podman: Add systemd dependency on network.target When using podman, the systemd unit scripts don't have a dependency on the network. So we're not sure that the network is up and running when the containers are starting. With docker this behaviour is already handled because the systemd unit scripts depend on docker service which is started after the network. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-07 09:28:58 +02:00
L3D	ab54fe20ec	ansible: use 'bool' filter on boolean conditionals By running ceph-ansible there are a lot ``[DEPRECATION WARNING]`` like these: ``` [DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg. ``` Now appended ``| bool`` on a lot of the affected variables. Sometimes the coding style from ``variable|bool`` changed to ``variable | bool`` (with spaces at the pipe). Closes: #4022 Signed-off-by: L3D <l3d@c3woc.de>	2019-06-06 10:21:17 +02:00
Guillaume Abrioux	80875adba7	ceph-osd: do not relabel /run/udev in containerized context Otherwise content in /run/udev is mislabeled and prevent some services like NetworkManager from starting. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-04 11:32:41 -04:00
Guillaume Abrioux	e74d80e72f	rename docker_exec_cmd variable This commit renames the `docker_exec_cmd` variable to `container_exec_cmd` so it's more generic. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
Rishabh Dave	89748d579a	don't access other node's docker_exec_cmd variable Except for some corner case, it's not correct to access some other node's copy of variable docker_exec_cmd. Therefore replace "hostvars[groups[mon_group_name][0]]['docker_exec_cmd']" by "docker_exec_cmd". Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-05-07 12:37:48 +02:00
Dimitri Savineau	ae266c6f2b	ansible: remove private and static attribute This will be removed in ansible 2.8 and breaks the playbook execution with this release. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-05-02 14:25:17 -04:00
Dimitri Savineau	c17106874c	ceph-osd: Increase cpu limit to 4 In containerized deployment the default osd cpu quota is too low for production environment using NVMe devices. This is causing performance degradation compared to bare-metal. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695880 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-24 17:59:42 +02:00
Rishabh Dave	739a662c80	improve coding style Keywords requiring only one item shouldn't express it by creating a list with single item. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-04-23 15:37:07 +02:00
Andrew Schoen	5e3dfe5021	ceph-osd: do not run lvm batch tasks during update When performing a rolling update do not try to create any new osds with `ceph-volume lvm batch`. This is troublesome because when upgrading to nautilus the devices list might contain devices that are currently being used by ceph-disk and have GPT headers on them, which will cause ceph-volume to fail when trying to use such a device. Any devices originally created by ceph-disk will need to be removed from the devices list before any new osds can be created. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2019-04-18 10:55:11 +02:00
Guillaume Abrioux	f899da3172	osd: remove legacy file this file is not used anymore, let's remove it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Guillaume Abrioux	4f68462009	osd: remove ceph-disk scenarios files these files aren't needed anymore since we only use lvm scenario. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Guillaume Abrioux	f0416c8892	osd: remove dedicated_devices variable This variable was related to ceph-disk scenarios. Since we are entirely dropping ceph-disk support as of stable-4.0, let's remove this variable. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Guillaume Abrioux	4d35e9eeed	osd: remove variable osd_scenario As of stable-4.0, the only valid scenario is `lvm`. Thus, this makes this variable useless. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Guillaume Abrioux	4d5637fd8a	osd: remove legacy file ceph_disk_cli_options_facts.yml is not used anymore, let's remove it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Sébastien Han	52df15895b	osd: default osd_scenario to lvm osd_scenario has become obsolete and defaults to lvm. With lvm there is no such thing as collocated and non-collocated. Signed-off-by: Sébastien Han <seb@redhat.com>	2019-04-11 11:57:02 -04:00
Sébastien Han	e2a5aa062e	osd: remove ceph-disk support We don't support the preparation of OSD with ceph-disk. ceph-volume is only supported. However, the start operation of OSD is still supported. So let's say you change a config option, the handlers will be able to restart all the OSDs via their respective systemd unit files. Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Dimitri Savineau	7e5e4229b7	ceph-volume: Add PYTHONIOENCODING env variable Since https://github.com/ceph/ceph/commit/77912c0 ceph-volume uses stdout encoding based on LC_CTYPE and PYTHONIOENCODING environment variables. Those variables aren't set when using ansible. Currently this commit breaks non containerized deployment on Ubuntu. TASK [use ceph-volume to create bluestore osds] ******************** cmd: - ceph-volume - --cluster - ceph - lvm - create - --bluestore - --data - /dev/sdb rc: 1 stderr: |- Traceback (most recent call last): (...) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 132: ordinal not in range(128) Note that the task is failing on ansible side due to the stdout decoding but the osd creation is successful. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-02 12:41:55 +02:00
Rishabh Dave	e0beaf123a	"when" keyword should precede "block" keyword Otherwise the reader is forced to search for "when" when blocks are too long. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-03-29 16:16:04 +00:00
Guillaume Abrioux	82764afe8d	update: mask systemd service units during upgrade This prevents the packaging from restarting services before we do need to restart them in the rolling update sequence. We want to handle services restart at rolling_update playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Dimitri Savineau	179fdfbc19	ceph-osd: Ensure lvm2 is installed When using osd_scenario lvm, we never check if the lvm2 package is present on the host. When using containerized deployment and docker on CentOS/RedHat this package will be automatically installed as a dependency but not for Ubuntu distribution. OSD deployed via ceph-volume require the lvmetad.socket to be active and running. Resolves: #3728 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-20 22:26:45 +00:00
Guillaume Abrioux	987bdac963	osd: backward compatibility with old disk_list.sh location Since all files in container image have moved to `/opt/ceph-container` this check must look for new AND the old path so it's backward compatible. Otherwise it could end up by templating an inconsistent `ceph-osd-run.sh`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-18 17:25:51 +00:00
Dimitri Savineau	b7f4e3e7c7	ceph-osd: Install numactl package when needed With `3e32dce` we can run OSD containers with numactl support. When using numactl command in a containerized deployment we need to be sure that the corresponding package is installed on the host. The package installation is only executed when the ceph_osd_numactl_opts variable isn't empty. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-12 07:43:06 +00:00


541 Commits (a09d1c38bf80e412265f58d732c554262ef23cc7)