ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Dimitri Savineau	50104650e7	add missing boolean filter Otherwise this will generate an ansible warning about the missing filter. [DEPRECATION WARNING]: evaluating xxx as a bare variable, this behaviour will go away and you might need to add \|bool to the expression in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-09-28 20:45:01 +02:00
Dimitri Savineau	abb4023d76	ceph_key: set state as optional Most ansible module using a state parameter default to the present value (when available) instead of using it as a mandatory option. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-09-14 14:12:21 -04:00
Guillaume Abrioux	f0fc59258a	Revert "ceph_pool: use default size/min_size and rule_name" This reverts commit `142934057f`. This is already handled in the ceph_pool module itself Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-09-14 14:12:21 -04:00
Dimitri Savineau	3a05aeb6cb	ceph_pool: set state as optional Most ansible module using a state parameter default to the present value (when available) instead of using it as a mandatory option. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-09-11 10:26:15 +02:00
Dimitri Savineau	142934057f	ceph_pool: use default size/min_size and rule_name Before [1] we were using default value for - size - min_size - rule_name when the key wasn't present in the pool dict. The commit [1] changed this by defaulting to omit. This patch restores the original workflow by using facts: - osd_pool_default_size - osd_pool_default_min_size - ceph_osd_pool_default_crush_rule_name [1] `af9f6684f2` Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-09-11 10:15:28 +02:00
Guillaume Abrioux	448cc280b7	common: don't enable debug log on ceph-volume calls by default ceph-volume can generate large logs at some point. debug logs by definition should be enabled only when debugging. Let's make it customizable with a variable which is set to `False` by default. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-08-11 15:03:20 +02:00
Dimitri Savineau	47b7c00287	podman: always remove container on start In case of failure, the systemd ExecStop isn't executed so the container isn't removed. After a reboot of a failed node, the container doesn't start because the old container is still present in created state. We should always try to remove the container in ExecStartPre for this situation. A normal reboot doesn't trigger this issue and this also doesn't affect nodes running containers via docker. This behaviour was introduced by `d43769d`. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1858865 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-23 17:00:38 +02:00
Dimitri Savineau	d43769dc2a	podman: Add Type and PIDFile value to unit files This changes the way we are running the podman containers via systemd. They are now in dettached mode and Type/PIDFile set. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1834974 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-06-23 09:37:50 +02:00
Dimitri Savineau	bd22f1d1ec	docker: Add Requires on docker service When using docker container engine then the systemd unit scripts only use a dependency on the docker daemon via the After parameter. But if docker is restarted on a live system then the ceph systemd units should wait for the docker daemon to be fully restarted. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846830 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-06-22 23:08:50 +02:00
Dimitri Savineau	829990e60d	ceph-osd: remove ceph-osd-run.sh script Since we only have one scenario since nautilus then we can just move the container start command from ceph-osd-run.sh to the systemd unit service. As a result, the ceph-osd-run.sh.j2 template and the ceph_osd_docker_run_script_path variable are removed. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-06-18 17:51:13 +02:00
Guillaume Abrioux	b91d60d384	switch_to_containers: don't set noup flag We shouldn't set this flag when running switch_to_containers playbook. Otherwise the playbook fails waiting for pgs to be clean. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843569 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-06-17 01:32:18 +02:00
Guillaume Abrioux	af9f6684f2	common: introduce ceph_pool module calls This commits calls the `ceph_pool` module for creating ceph pools everywhere it's needed in the playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-05-16 07:31:57 +02:00
Guillaume Abrioux	8c7a48832c	common: fix target_size_ratio task enablement The condition on this task is wrong, we have to check whether `target_size_ratio` is set in the pool definition instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-05-15 20:57:32 +02:00
Guillaume Abrioux	1bb9860dfd	osd: use default crush rule name when needed When `rule_name` isn't set in `crush_rules` the osd pool creation will fail. This commit adds a new fact `ceph_osd_pool_default_crush_rule_name` with the default crush rule name. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1817586 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-31 14:49:38 -04:00
Guillaume Abrioux	5b0476385c	osd: support changing default rule even when osd_crush_location isn't defined Creating crush rules even with no crush hierarchy configuration is a valid scenario so we shouldn't be bound to the first task result (which configure crush hierarchy) to be able to add new crush rules. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816989 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-31 14:26:48 -04:00
Dimitri Savineau	64701437de	container: remove ulimit nofile parameter Since Ceph Octopus is python3 only we don't need to specify the max open files anymore with the container engine. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-30 09:54:23 +02:00
Guillaume Abrioux	1b0b7af119	osd: add a default value for 'default' in crush_rules Let's default to `False` for the `default` attribute in `crush_rules` variable. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1797774 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-24 08:41:43 -04:00
Dimitri Savineau	e62532de46	update osd pool set size command Since [1] we can't use osd pool without replicas (size: 1) by default. We now need to set the mon_allow_pool_size_one flag to true in the ceph configuration and add the --yes-i-really-mean-it flag to the osd pool set size cli. [1] https://github.com/ceph/ceph/commit/21508bd Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-11 11:25:42 +01:00
Guillaume Abrioux	e17c79b871	osd: do not change pool size on erasure pool This commit adds condition in order to not try to customize pools size when its type is erasure. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-04 09:29:01 -05:00
Guillaume Abrioux	47adc2bb08	osd: add pg autoscaler support This commit adds the pg autoscaler support. The structure for pool definition has now two additional attributes `pg_autoscale_mode` and `target_size_ratio`, eg: ``` test: name: "test" pg_num: "{{ osd_pool_default_pg_num }}" pgp_num: "{{ osd_pool_default_pg_num }}" rule_name: "replicated_rule" application: "rbd" type: 1 erasure_profile: "" expected_num_objects: "" size: "{{ osd_pool_default_size }}" min_size: "{{ osd_pool_default_min_size }}" pg_autoscale_mode: False target_size_ratio": 0.1 ``` when `pg_autoscale_mode` is `True` user has to set a decent value in `target_size_ratio`. Given that it's a new feature, it's still disabled by default. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1782253 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-04 09:29:01 -05:00
Guillaume Abrioux	bf1f125d71	osd: refact osd pool creation Currently, the command executed is wrong, eg: ``` cmd: - podman - exec - ceph-mon-controller-0 - ceph - --cluster - ceph - osd - pool - create - volumes - '32' - '32' - replicated_rule - '1' delta: '0:00:01.625525' end: '2020-02-27 16:41:05.232705' item: ``` From documentation, the osd pool creation command is : ``` ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \ [crush-rule-name] [expected-num-objects] ceph osd pool create {pool-name} {pg-num} {pgp-num} erasure \ [erasure-code-profile] [crush-rule-name] [expected_num_objects] ``` it means we pass '1' (from item.type) as value for `expected_num_objects` by default which is very likely not what we want. Also, this commit modifies the default value when no `rule_name` is set to use the existing variable `osd_pool_default_crush_rule` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808495 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-04 09:29:01 -05:00
Guillaume Abrioux	0326d992c2	osd: add journal option in ceph_volume call (batch) This commit adds the journal option to the ceph_volume call when scenario is lvm batch Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-28 17:29:59 -05:00
Dimitri Savineau	1fc6b33714	ceph-{mon,osd}: move default crush variables Since `ed36a11` we move the crush rules creation code from the ceph-mon to the ceph-osd role. To keep the backward compatibility we kept the possibility to set the crush variables on the mons side but we didn't move the default values. As a result, when using crush_rule_config set to true and wanted to use the default values for crush_rules then the crush rule ansible task creation will fail. "msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'crush_rules'" This patch move the default crush variables from ceph-mon to ceph-osd role but also use those default values when nothing is defined on the mons side. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798864 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 10:50:53 +01:00
Dimitri Savineau	5a03e0ee1c	containers: add KillMode=none to systemd templates Because we are relying on docker\|podman for managing containers then we don't need systemd to manage the process (like kill). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-13 16:11:33 +01:00
Guillaume Abrioux	2f919f8971	fix calls to `container_exec_cmd` in ceph-osd role We must call `container_exec_cmd` from the right monitor node otherwise the value of the fact might mistmatch between the delegated node and the node being played. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794900 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-27 15:30:45 -05:00
Guillaume Abrioux	483adb5d79	common: add a default value for ceph_directories_mode Since this variable makes it possible to customize the mode for ceph directories, let's make it a bit more explicit by adding a default value in ceph-defaults. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-22 09:35:35 +01:00
Dimitri Savineau	c9e1fe3d92	ceph-osd: set container objectstore env variables Because we need to manage legacy ceph-disk based OSD with ceph-volume then we need a way to know the osd_objectstore in the container. This was done like this previously with ceph-disk so we should also do it with ceph-volume. Note that this won't have any impact for ceph-volume lvm based OSD. Rename docker_env_args fact to container_env_args and move the container condition on the include_tasks call. Remove OSD_DMCRYPT env variable from the ceph-osd template because it's now included in the container_env_args variable. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792122 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-20 13:59:44 -05:00
Guillaume Abrioux	22865cde9c	handler: fix call to container_exec_cmd in handler_osds When unsetting the noup flag, we must call container_exec_cmd from the delegated node (first mon member) Also, adding a `run_once: true` because this task needs to be run only 1 time. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792320 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-20 09:25:56 -05:00
Guillaume Abrioux	3e262e072b	containers: use --cpus instead --cpu-quota When using docker 1.13.1, the current condition: ``` {% if (container_binary == 'docker' and ceph_docker_version.split('.')[0] is version_compare('13', '>=')) or container_binary == 'podman' -%} ``` is wrong because it compares the first digit (1) whereas it should compare the second one. It means we always use `--cpu-quota` although documentation recommend using `--cpus` when docker version is 1.13.1 or higher. From the doc: > --cpu-quota=<value> Impose a CPU CFS quota on the container. The number of > microseconds per --cpu-period that the container is limited to before > throttled. As such acting as the effective ceiling. > If you use Docker 1.13 or higher, use --cpus instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-16 13:51:43 -05:00
Guillaume Abrioux	5558664f37	osd: use _devices fact in lvm batch scenario since `fd1718f379`, we must use `_devices` when deploying with lvm batch scenario. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-14 09:12:03 -05:00
Guillaume Abrioux	58e6bfed2d	osd: ensure osd ids collected are well restarted This commit refact the condition in the loop of that task so all potential osd ids found are well started. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790212 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-13 12:05:48 -05:00
Guillaume Abrioux	af6875706a	osd: do not run openstack_config during upgrade There is no need to run this part of the playbook when upgrading the cluter. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-13 09:59:08 -05:00
Guillaume Abrioux	3496a0efa2	osd: support scaling up using --limit This commit lets add-osd.yml in place but mark the deprecation of the playbook. Scaling up OSDs is now possible using --limit Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-13 09:59:08 -05:00
Dimitri Savineau	de8f2a9f83	container: move lvm2 package installation Before this patch, the lvm2 package installation was done during the ceph-osd role. However we were running ceph-volume command in the ceph-config role before ceph-osd. If lvm2 wasn't installed then the ceph-volume command fails: error checking path "/run/lock/lvm": stat /run/lock/lvm: no such file or directory This wasn't visible before because lvm2 was automatically installed as docker dependency but it's not the same for podman on CentOS 8. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-08 11:13:46 +01:00
Dimitri Savineau	5bd1cf40eb	ceph-osd: wait for all osds once `cf8c6a3` moves the 'wait for all osds' task from openstack_config to the main tasks list. But the openstack_config code was executed only on the last OSD node. We don't need to do this check on all OSD node so we need to add set run_once to true on that task. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-27 13:05:42 -05:00
Dimitri Savineau	cf8c6a3849	ceph-osd: wait for all osd before crush rules When creating crush rules with device class parameter we need to be sure that all OSDs are up and running because the device class list is is populated with this information. This is now enable for all scenario not openstack_config only. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-27 07:43:07 +01:00
Dimitri Savineau	ef2cb99f73	ceph-osd: add device class to crush rules This adds device class support to crush rules when using the class key in the rule dict via the create-replicated sub command. If the class key isn't specified then we use the create-simple sub command for backward compatibility. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-14 16:25:46 +01:00
Dimitri Savineau	ed36a11eab	move crush rule creation from mon to osd role If we want to create crush rules with the create-replicated sub command and device class then we need to have the OSD created before the crush rules otherwise the device classes won't exist. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-14 16:25:46 +01:00
Dimitri Savineau	ece46d33be	ceph-osd: fix fs.aio-max-nr sysctl condition [1] introduced a regression on the fs.aio-max-nr sysctl value condition. The enable key isn't a boolean but a string because the expression isn't evaluated. This string output "(osd_objectstore == 'bluestore')" is always true because item.enable condition only matches non empty string. So the sysctl value was applyied for both filestore and bluestore backend. [2] added the bool filter to the condition but the filter always returns false on string and the sysctl wasn't applyed at all. This commit fixes the enable key value by evaluating the value instead of using the string. [1] https://github.com/ceph/ceph-ansible/commit/08a2b58 [2] https://github.com/ceph/ceph-ansible/commit/ab54fe2 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-07 13:51:48 +01:00
Dimitri Savineau	9a996aef7f	ceph-osd: Remove ulimit nofile on container start Even if this improves ceph-disk/ceph-volume performances then it also impact the ceph-osd process. The ceph-osd process shouldn't use 1024:4096 value for the max open files. Removing the ulimit option from the container engine and doing this kind of change on the container side [1]. [1] https://github.com/ceph/ceph-container/pull/1497 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-31 10:42:09 -04:00
Dimitri Savineau	f7fd0b6d4f	lint: fix error [303,602,701,702] [303] mktemp used in place of tempfile module [602] Don't compare to empty string [701] No 'galaxy_info' found [702] Use 'galaxy_tags' rather than 'categories' This patch also changes the ansible log_path value via the ANSIBLE_LOG_PATH environment variable in the travis configuration to avoid warnings. [WARNING]: log file at /home/travis/ansible/ansible.log is not writeable and we cannot create it, aborting Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-15 10:07:52 +02:00
Guillaume Abrioux	9bad239d77	common: improve keyrings generation There is no need to get n * number of nodes the different keyrings. Adding a `run_once: true` here avoid running a ceph command too many times which could be impacting large cluster deployment. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-02 13:09:50 +02:00
Guillaume Abrioux	bd64167469	container: isolate systemd tasks This commit isolates the systemd unit files generation for containers into separate yml files in order to be able importing each corresponding roles without playing all tasks. This is needed so we can run ceph-ansible to render systemd unit files so they call podman instead of docker. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-01 10:27:51 -04:00
Guillaume Abrioux	ab370b6ad8	global: remove fetch_directory dependency This commit drops the fetch_directory dependency. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-26 11:35:24 +02:00
Guillaume Abrioux	09e04a9197	osd: add wal_devices option support to ceph_volume module This commit adds the `wal_devices` option support to the ceph_volume module. passing a devices list in `bluestore_wal_devices` will make ceph-volume creating 1 vg using these devices to create block.wal partitions. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-26 11:35:24 +02:00
Guillaume Abrioux	70f1b37097	osd: update doc text in defaults/main.yml This commit removes ceph-disk references. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-26 11:35:24 +02:00
Guillaume Abrioux	7b836eaa47	osd: add block_db_devices option support to ceph_volume module This commit adds the `block_db_devices` option support to the ceph_volume module. passing a devices list in `dedicated_devices` will make ceph-volume creating 1 vg using these devices to create block.db partitions for data devices. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-26 11:35:24 +02:00
Guillaume Abrioux	5986b26a01	global: add newline at end of file This commit re-add a newline at end of files when it's missing. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-23 15:56:47 +02:00
Artur Fijalkowski	011270ca69	global: make directories mode parameterizable This commit makes it possible to parametrize the ceph directories modes. So it changes hardocded mode for ceph related directories from 0755 to customizable with `ceph_directories_mode` variable. Closes: #2920 Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-23 09:38:17 +02:00
Guillaume Abrioux	102edaeb61	lint: fix error [306], add pipefail on shell command using pipe This commit fixes the error [306]: `[306] Shells that use pipes should set the pipefail option` using `/bin/bash` as executable because Debian/Ubuntu systems use `dash` by default which doesn't have the `-o pipefail`. (See: https://github.com/ansible/ansible-lint/issues/497#issue-424623501) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-23 00:23:47 +02:00
Dimitri Savineau	9a4ac46d19	ceph-osd: Add ulimit nofile on container start On containerized deployment, the OSD entrypoint runs some ceph-volume commands (lvm/simple scan and/or activate) which perform badly without the ulimit option. This option was added for all previous ceph-volume commands but not on the ceph-osd container startup. Also updating hard limit value to 4096 to reflect default baremetal value. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-08-22 16:59:08 +02:00
Guillaume Abrioux	70cf2a5846	osd: remove useless condition just like `ceph_osd_pool_default_size`, a pool size might change after an initial deployment. Having this condition prevents from customizing the pool in that case. This is not needed so let's remove it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-19 16:17:22 +02:00
Guillaume Abrioux	687087fd43	osd: refact 'wait for all osd to be up' task let's use `until` instead of doing test in bash using python oneliner also, use `command` instead of `shell`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-14 16:42:02 +02:00
Guillaume Abrioux	13815ad3ca	common: use discovered_interpreter_python fact in order to use the right binary name when using python cli in command or shell module. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-14 16:42:02 +02:00
Guillaume Abrioux	a5e359ee80	osd: update the check for 'all osd to be up' the data structure has changed in octopus. eg: the path to `num_osds` is now `["osdmap"]["num_osds"]`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-14 16:42:02 +02:00
Dimitri Savineau	d549fffdd2	ceph-osd: check container engine rc for pools When creating OpenStack pools, we only check if the return code from the pool list command isn't 0 (ie: if it doesn't exist). In that case, the return code will be 2. That's why the next condition is rc != 0 for the pool creation. But in containerized deployment, the return code could be different if there's a failure on the container engine command (like container not running). In that case, the return code could but either 1 (docker) or 125 (podman) so we should fail at this point and not in the next tasks. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1732157 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-29 15:55:04 +02:00
Guillaume Abrioux	33eed78d17	containers: improve logging bindmount /var/log/ceph on all containers so it's possible to retrieve logs from the host. related ceph-container PR: ceph/ceph-container#1408 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710548 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-28 13:30:36 -04:00
Dimitri Savineau	02fbe76e62	ceph-osd: Add CONTAINER_IMAGE env variable This environment variable was added in `cb381b4` but was removed in `4d35e9e`. This commit reintroduces the change. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-27 16:38:02 +02:00
Dimitri Savineau	b987534881	ceph-volume: Set max open files limit on container The ceph-volume lvm list command takes ages to complete when having a lot of LV devices on containerized deployment. For instance, with 25 OSDs on a node it takes 3 mins 44s to list the OSD. Adding the max open files limit to the container engine cli when executing the ceph-volume command seems to improve a lot thee execution time ~30s. This was impacting the OSDs creation with ceph-volume (both filestore and bluestore) when using multiple LV devices. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-20 22:37:40 +02:00
Dimitri Savineau	7c3640177b	roles: Remove useless become (true) flag We already set the become flag to true at a play level in the site* playbooks so we don't need to set it at a task level. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-19 10:31:32 +02:00
Guillaume Abrioux	eece362b38	osd: remove legacy task `parted_results` isn't used anymore in the playbook. By the way, `parted` seems to cause issue because it changes the ownership on devices: ``` root@osd0 ~]# ls -l /dev/sdc* brw-rw----. 1 root disk 8, 32 Jun 11 08:53 /dev/sdc brw-rw----. 1 ceph ceph 8, 33 Jun 11 08:53 /dev/sdc1 brw-rw----. 1 ceph ceph 8, 34 Jun 11 08:53 /dev/sdc2 [root@osd0 ~]# parted -s /dev/sdc print Model: ATA QEMU HARDDISK (scsi) Disk /dev/sdc: 53.7GB Sector size (logical/physical): 512B/512B Partition Table: gpt Disk Flags: Number Start End Size File system Name Flags 1 1049kB 1075MB 1074MB ceph block.db 2 1075MB 2149MB 1074MB ceph block.db [root@osd0 ~]# #We can see ownerships have changed from ceph:ceph to root:disk: [root@osd0 ~]# ls -l /dev/sdc* brw-rw----. 1 root disk 8, 32 Jun 11 08:57 /dev/sdc brw-rw----. 1 root disk 8, 33 Jun 11 08:57 /dev/sdc1 brw-rw----. 1 root disk 8, 34 Jun 11 08:57 /dev/sdc2 [root@osd0 ~]# ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-18 12:45:01 -04:00
Dimitri Savineau	f49090df7e	podman: Add systemd dependency on network.target When using podman, the systemd unit scripts don't have a dependency on the network. So we're not sure that the network is up and running when the containers are starting. With docker this behaviour is already handled because the systemd unit scripts depend on docker service which is started after the network. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-07 09:28:58 +02:00
L3D	ab54fe20ec	ansible: use 'bool' filter on boolean conditionals By running ceph-ansible there are a lot ``[DEPRECATION WARNING]`` like these: ``` [DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable, this behaviour will go away and you might need to add \|bool to the expression in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg. ``` Now appended ``\| bool`` on a lot of the affected variables. Sometimes the coding style from ``variable\|bool`` changed to ``variable \| bool`` (with spaces at the pipe). Closes: #4022 Signed-off-by: L3D <l3d@c3woc.de>	2019-06-06 10:21:17 +02:00
Guillaume Abrioux	80875adba7	ceph-osd: do not relabel /run/udev in containerized context Otherwise content in /run/udev is mislabeled and prevent some services like NetworkManager from starting. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-04 11:32:41 -04:00
Guillaume Abrioux	e74d80e72f	rename docker_exec_cmd variable This commit renames the `docker_exec_cmd` variable to `container_exec_cmd` so it's more generic. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
Rishabh Dave	89748d579a	don't access other node's docker_exec_cmd variable Except for some corner case, it's not correct to access some other node's copy of variable docker_exec_cmd. Therefore replace "hostvars[groups[mon_group_name][0]]['docker_exec_cmd']" by "docker_exec_cmd". Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-05-07 12:37:48 +02:00
Dimitri Savineau	ae266c6f2b	ansible: remove private and static attribute This will be removed in ansible 2.8 and breaks the playbook execution with this release. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-05-02 14:25:17 -04:00
Dimitri Savineau	c17106874c	ceph-osd: Increase cpu limit to 4 In containerized deployment the default osd cpu quota is too low for production environment using NVMe devices. This is causing performance degradation compared to bare-metal. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695880 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-24 17:59:42 +02:00
Rishabh Dave	739a662c80	improve coding style Keywords requiring only one item shouldn't express it by creating a list with single item. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-04-23 15:37:07 +02:00
Andrew Schoen	5e3dfe5021	ceph-osd: do not run lvm batch tasks during update When performing a rolling update do not try to create any new osds with `ceph-volume lvm batch`. This is troublesome because when upgrading to nautilus the devices list might contain devices that are currently being used by ceph-disk and have GPT headers on them, which will cause ceph-volume to fail when trying to use such a device. Any devices originally created by ceph-disk will need to be removed from the devices list before any new osds can be created. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2019-04-18 10:55:11 +02:00
Guillaume Abrioux	f899da3172	osd: remove legacy file this file is not used anymore, let's remove it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Guillaume Abrioux	4f68462009	osd: remove ceph-disk scenarios files these files aren't needed anymore since we only use lvm scenario. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Guillaume Abrioux	f0416c8892	osd: remove dedicated_devices variable This variable was related to ceph-disk scenarios. Since we are entirely dropping ceph-disk support as of stable-4.0, let's remove this variable. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Guillaume Abrioux	4d35e9eeed	osd: remove variable osd_scenario As of stable-4.0, the only valid scenario is `lvm`. Thus, this makes this variable useless. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Guillaume Abrioux	4d5637fd8a	osd: remove legacy file ceph_disk_cli_options_facts.yml is not used anymore, let's remove it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Sébastien Han	52df15895b	osd: default osd_scenario to lvm osd_scenario has become obsolete and defaults to lvm. With lvm there is no such things has collocated and non-collocated. Signed-off-by: Sébastien Han <seb@redhat.com>	2019-04-11 11:57:02 -04:00
Sébastien Han	e2a5aa062e	osd: remove ceph-disk support We don't support the preparation of OSD with ceph-disk. ceph-volume is only supported. However, the start operation of OSD is still supported. So let's say you change a config option, the handlers will be able to restart all the OSDs via their respective systemd unit files. Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Dimitri Savineau	7e5e4229b7	ceph-volume: Add PYTHONIOENCODING env variable Since https://github.com/ceph/ceph/commit/77912c0 ceph-volume uses stdout encoding based on LC_CTYPE and PYTHONIOENCODING environment variables. Thoses variables aren't set when using ansible. Currently this commit breaks non containerized deployment on Ubuntu. TASK [use ceph-volume to create bluestore osds] ******************** cmd: - ceph-volume - --cluster - ceph - lvm - create - --bluestore - --data - /dev/sdb rc: 1 stderr: \|- Traceback (most recent call last): (...) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 132: ordinal not in range(128) Note that the task is failing on ansible side due to the stdout decoding but the osd creation is successful. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-02 12:41:55 +02:00
Rishabh Dave	e0beaf123a	"when" keyword should precede "block" keyword Otherwise the reader is forced to search for "when" when blocks are too long. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-03-29 16:16:04 +00:00
Guillaume Abrioux	82764afe8d	update: mask systemd service units during upgrade This prevents the packaging from restarting services before we do need to restart them in the rolling update sequence. We want to handle services restart at rolling_update playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Dimitri Savineau	179fdfbc19	ceph-osd: Ensure lvm2 is installed When using osd_scenario lvm, we never check if the lvm2 package is present on the host. When using containerized deployment and docker on CentOS/RedHat this package will be automatically installed as a dependency but not for Ubuntu distribution. OSD deployed via ceph-volume require the lvmetad.socket to be active and running. Resolves: #3728 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-20 22:26:45 +00:00
Guillaume Abrioux	987bdac963	osd: backward compatibility with old disk_list.sh location Since all files in container image have moved to `/opt/ceph-container` this check must look for new AND the old path so it's backward compatible. Otherwise it could end up by templating an inconsistent `ceph-osd-run.sh`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-18 17:25:51 +00:00
Dimitri Savineau	b7f4e3e7c7	ceph-osd: Install numactl package when needed With `3e32dce` we can run OSD containers with numactl support. When using numactl command in a containerized deployment we need to be sure that the corresponding package is installed on the host. The package installation is only executed when the ceph_osd_numactl_opts variable isn't empty. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-12 07:43:06 +00:00
Guillaume Abrioux	b3eb9206fa	osd: support numactl options on OSD activate This commit adds OSD containers activate with numactl support. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1684146 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-11 10:14:50 +01:00
Dimitri Savineau	a089e1ec23	systemd/service: Set docker.service conditionally We don't need to set After=docker.service when the container_binary variable isn't set to docker. It doesn't break anything currently but it could be confusing when using podman. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-07 20:56:11 +00:00
Dimitri Savineau	4d32ecc980	Force osd pool min_size value to integer After `b8d580b` and `e9e5d5a` we could have either item.min_size or osd_pool_default_min_size using string instead of int causing the condition to be true when it's false. As a result, the task could try to set the pool min_size value to 0 which leads to: Error EINVAL: pool min_size must be between 1 and 1 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-05 19:48:09 +00:00
Dimitri Savineau	cb381b41fe	Add CONTAINER_IMAGE env var to ceph daemons Ceph daemons will set the CONTAINER_IMAGE environment variable value in the daemon metadata. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-05 15:07:05 +00:00
Guillaume Abrioux	e9e5d5a39a	fix pool min_size customization `b8d580b3f4` introduced a bug when `min_size` isn't set (default to 0). Typical error: ``` Error EINVAL: pool min_size must be between 1 and 1 ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-05 13:29:34 +00:00
Radu Toader	b8d580b3f4	Customize pools min_size Signed-off-by: Radu Toader <radu.m.toader@gmail.com>	2019-03-05 10:57:15 +00:00
Kevin Coakley	b11dc13476	Updated 7 ansible-lint issues in the ceph-mon, ceph-osd, and ceph-rgw roles The following lint issues have been resolved: [301] Commands should not change things if nothing needs doing /home/travis/build/ceph/ceph-ansible/roles/ceph-mon/tasks/ceph_keys.yml:2 [305] Use shell only when shell functionality is required /home/travis/build/ceph/ceph-ansible/roles/ceph-osd/tasks/start_osds.yml:47 [301] Commands should not change things if nothing needs doing /home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:2 [301] Commands should not change things if nothing needs doing /home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:7 [301] Commands should not change things if nothing needs doing /home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:14 [301] Commands should not change things if nothing needs doing /home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:19 [301] Commands should not change things if nothing needs doing /home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:24 Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>	2019-03-04 22:25:35 +00:00
Dimitri Savineau	45a7082712	lint: Fix spaces before and after variables ansible-lint reports: [206] Variables should have spaces after {{ and before }} Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-01 17:22:24 +00:00
Kevin Coakley	038401fef2	Add changed_when: false to the "get osd ids" statement The "get osd ids" statement only registers the osd_ids_non_container variable. Running "ls /var/lib/ceph/osd/ \| sed 's/.*-//'" should never produce a change on the system. Adding changed_when: false prevents irrelevant change messages from Ansible. Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>	2019-02-28 22:46:19 +00:00
Guillaume Abrioux	d5be83e504	osd: add ipc=host in systemd template for containers in addition to `15812970f0` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-02-28 13:14:09 +00:00
Dimitri Savineau	dc1c0dcee2	ceph-osd: Drop memory flag with bluestore Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-02-26 07:27:06 +00:00
Guillaume Abrioux	21e5db8982	osd: make the 'wait for all osd to be up' task configurable introduce two new variables to make the check that 'wait for all osd to be up' configurable. It's possible that for some deployments, OSDs can take longer to be seen as UP and IN. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1676763 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-02-20 16:06:04 +00:00
David Waiting	3930791cb7	ensure at least one osd is up The existing task checks that the number of OSDs is equal to the number of up OSDs before continuing. The problem is that if none of the OSDs have been discovered yet, the task will exit immediately and subsequent pool creation will fail (num_osds = 0, num_up_osds = 0). This is related to Bugzilla 1578086. In this change, we also check that at least one OSD is present. In our testing, this results in the task correctly waiting for all OSDs to come up before continuing. Signed-off-by: David Waiting <david_waiting@comcast.com>	2019-02-19 18:31:05 +00:00
Guillaume Abrioux	d4e31b90a6	Revert "osd: container remove --pid=host" This reverts commit `bb2bbeb941`. Looks like when not passing `--pid=host` we are facing some issues when deploying more than 2 OSDs in containerized environment. At the moment, we are still troubleshooting this issue but we prefer to revert this commit so it doesn't block any PR in the CI. As soon as we have a fix; we will push a new PR to remove `--pid=host` (a revert of revert...) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-02-14 10:34:37 +00:00
Guillaume Abrioux	8c8ec63633	container: use tmpfiles.d to creates /run/ceph instead of using `RuntimeDirectory` parameter in systemd unit files, let's use a systemd `tmpfiles.d` to ensure `/run/ceph`. Explanation: `podman` doesn't create the `/var/run/ceph` if it doesn't exist the time where the container is run while `docker` used to create it. In case of `switch_to_containers` scenario, `/run/ceph` gets created by a tmpfiles.d systemd file; when switching to containers, the systemd unit file complains because `/run/ceph` already exists The better fix would be to ensure `/usr/lib/tmpfiles.d/ceph-common.conf` is removed and only rely on `RuntimeDirectory` from systemd unit file parameter but we come from a non-containerized environment which is already running, it means `/run/ceph` is already created and when starting the unit to start the container, systemd will still complain and we can't simply remove the directory if daemons are collocated. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-02-13 09:42:27 +01:00
Sébastien Han	bb2bbeb941	osd: container remove --pid=host Let's try again with the Nautilus release. Closes: https://github.com/ceph/ceph-ansible/issues/1297 Signed-off-by: Sébastien Han <seb@redhat.com>	2019-02-07 12:13:51 +00:00
John Fulton	cc0bf197e1	Fix CNI error when net=host is not used on OSD calls Follow up fix that `410abd7` missed. Related: ceph#3561 Signed-off-by: John Fulton <fulton@redhat.com>	2019-02-05 22:49:01 +00:00

1 2 3 4 5 ...

624 Commits (main)