ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Benoît Knecht	69a6053114	Fix Ansible check mode for site.yml.sample playbook Make sure the `site.yml.sample` playbook can be run in check mode by skipping tasks that try to read the output of commands that have been skipped. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `54ba38e35e`)	2020-10-07 07:06:54 +02:00
Dimitri Savineau	69b09f9336	Allow updating crush rule on existing pool The crush rule value was only set once during the pool creation. It was not possible to update the crush rule value by updating the value in the configuration. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1847166 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-09-10 20:35:44 -04:00
Guillaume Abrioux	88c9f6d969	common: don't enable debug log on ceph-volume calls by default ceph-volume can generate large logs at some point. debug logs by definition should be enabled only when debugging. Let's make it customizable with a variable which is set to `False` by default. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `448cc280b7`)	2020-08-13 14:21:44 +02:00
Dimitri Savineau	d408c75d76	podman: always remove container on start In case of failure, the systemd ExecStop isn't executed so the container isn't removed. After a reboot of a failed node, the container doesn't start because the old container is still present in created state. We should always try to remove the container in ExecStartPre for this situation. A normal reboot doesn't trigger this issue and this also doesn't affect nodes running containers via docker. This behaviour was introduced by `d43769d`. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1858865 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `47b7c00287`)	2020-07-24 12:47:21 -04:00
Dimitri Savineau	eb3f065d03	podman: Add Type and PIDFile value to unit files This changes the way we are running the podman containers via systemd. They are now in dettached mode and Type/PIDFile set. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1834974 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `d43769dc2a`)	2020-06-23 17:35:01 +02:00
Dimitri Savineau	a99c94ea11	ceph-osd: remove ceph-osd-run.sh script Since we only have one scenario since nautilus then we can just move the container start command from ceph-osd-run.sh to the systemd unit service. As a result, the ceph-osd-run.sh.j2 template and the ceph_osd_docker_run_script_path variable are removed. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `829990e60d`)	2020-06-23 17:35:01 +02:00
Dimitri Savineau	09453e22f4	docker: Add Requires on docker service When using docker container engine then the systemd unit scripts only use a dependency on the docker daemon via the After parameter. But if docker is restarted on a live system then the ceph systemd units should wait for the docker daemon to be fully restarted. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846830 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `bd22f1d1ec`)	2020-06-22 19:11:20 -04:00
Guillaume Abrioux	c847c2f117	switch_to_containers: don't set noup flag We shouldn't set this flag when running switch_to_containers playbook. Otherwise the playbook fails waiting for pgs to be clean. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843569 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `b91d60d384`)	2020-06-17 09:24:19 -04:00
Guillaume Abrioux	d790375905	common: fix target_size_ratio task enablement The condition on this task is wrong, we have to check whether `target_size_ratio` is set in the pool definition instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `8c7a48832c`)	2020-06-03 13:22:57 -04:00
Guillaume Abrioux	86d5979269	osd: add a default value for 'default' in crush_rules Let's default to `False` for the `default` attribute in `crush_rules` variable. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1797774 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `1b0b7af119`)	2020-06-03 13:20:40 -04:00
abaird-rh	6878aab0f9	Updated use of deprecated filter This was removed in Ansible 2.9. [DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of using `result\|version_compare` use `result is version_compare`. This feature will be removed in version 2.9. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg. Rename 'version_compare' to the function 'version'. version_compose was renamed to version since ansible 2.5 Signed-off-by: abaird-rh <abaird@redhat.com> (cherry picked from commit `eb71244bfd`)	2020-04-20 13:37:42 -04:00
Guillaume Abrioux	7acd9686ab	osd: use default crush rule name when needed When `rule_name` isn't set in `crush_rules` the osd pool creation will fail. This commit adds a new fact `ceph_osd_pool_default_crush_rule_name` with the default crush rule name. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1817586 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `1bb9860dfd`)	2020-03-31 19:42:40 -04:00
Guillaume Abrioux	87e1f0cc6c	osd: support changing default rule even when osd_crush_location isn't defined Creating crush rules even with no crush hierarchy configuration is a valid scenario so we shouldn't be bound to the first task result (which configure crush hierarchy) to be able to add new crush rules. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816989 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5b0476385c`)	2020-03-31 23:14:55 +02:00
Guillaume Abrioux	1a978ae545	osd: do not change pool size on erasure pool This commit adds condition in order to not try to customize pools size when its type is erasure. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e17c79b871`)	2020-03-06 16:10:03 +01:00
Guillaume Abrioux	98783a17b3	osd: add pg autoscaler support This commit adds the pg autoscaler support. The structure for pool definition has now two additional attributes `pg_autoscale_mode` and `target_size_ratio`, eg: ``` test: name: "test" pg_num: "{{ osd_pool_default_pg_num }}" pgp_num: "{{ osd_pool_default_pg_num }}" rule_name: "replicated_rule" application: "rbd" type: 1 erasure_profile: "" expected_num_objects: "" size: "{{ osd_pool_default_size }}" min_size: "{{ osd_pool_default_min_size }}" pg_autoscale_mode: False target_size_ratio": 0.1 ``` when `pg_autoscale_mode` is `True` user has to set a decent value in `target_size_ratio`. Given that it's a new feature, it's still disabled by default. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1782253 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `47adc2bb08`)	2020-03-06 16:10:03 +01:00
Guillaume Abrioux	ae06d684b8	osd: refact osd pool creation Currently, the command executed is wrong, eg: ``` cmd: - podman - exec - ceph-mon-controller-0 - ceph - --cluster - ceph - osd - pool - create - volumes - '32' - '32' - replicated_rule - '1' delta: '0:00:01.625525' end: '2020-02-27 16:41:05.232705' item: ``` From documentation, the osd pool creation command is : ``` ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \ [crush-rule-name] [expected-num-objects] ceph osd pool create {pool-name} {pg-num} {pgp-num} erasure \ [erasure-code-profile] [crush-rule-name] [expected_num_objects] ``` it means we pass '1' (from item.type) as value for `expected_num_objects` by default which is very likely not what we want. Also, this commit modifies the default value when no `rule_name` is set to use the existing variable `osd_pool_default_crush_rule` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808495 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `bf1f125d71`)	2020-03-06 16:10:03 +01:00
Dimitri Savineau	3617543517	containers: add KillMode=none to systemd templates Because we are relying on docker\|podman for managing containers then we don't need systemd to manage the process (like kill). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `5a03e0ee1c`)	2020-02-18 12:10:35 -05:00
Dimitri Savineau	5f7778d59a	ceph-{mon,osd}: move default crush variables Since `ed36a11` we move the crush rules creation code from the ceph-mon to the ceph-osd role. To keep the backward compatibility we kept the possibility to set the crush variables on the mons side but we didn't move the default values. As a result, when using crush_rule_config set to true and wanted to use the default values for crush_rules then the crush rule ansible task creation will fail. "msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'crush_rules'" This patch move the default crush variables from ceph-mon to ceph-osd role but also use those default values when nothing is defined on the mons side. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798864 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `1fc6b33714`)	2020-02-17 10:18:56 -05:00
Guillaume Abrioux	0d2af6ebf3	fix calls to `container_exec_cmd` in ceph-osd role We must call `container_exec_cmd` from the right monitor node otherwise the value of the fact might mistmatch between the delegated node and the node being played. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794900 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `2f919f8971`)	2020-01-27 17:54:39 -05:00
Dimitri Savineau	6a51330892	ceph-osd: set container objectstore env variables Because we need to manage legacy ceph-disk based OSD with ceph-volume then we need a way to know the osd_objectstore in the container. This was done like this previously with ceph-disk so we should also do it with ceph-volume. Note that this won't have any impact for ceph-volume lvm based OSD. Rename docker_env_args fact to container_env_args and move the container condition on the include_tasks call. Remove OSD_DMCRYPT env variable from the ceph-osd template because it's now included in the container_env_args variable. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792122 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `c9e1fe3d92`)	2020-01-20 15:36:11 -05:00
Guillaume Abrioux	1462423059	handler: fix call to container_exec_cmd in handler_osds When unsetting the noup flag, we must call container_exec_cmd from the delegated node (first mon member) Also, adding a `run_once: true` because this task needs to be run only 1 time. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792320 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `22865cde9c`)	2020-01-20 12:45:51 -05:00
Dimitri Savineau	ff3a3ee5e9	container: move lvm2 package installation Before this patch, the lvm2 package installation was done during the ceph-osd role. However we were running ceph-volume command in the ceph-config role before ceph-osd. If lvm2 wasn't installed then the ceph-volume command fails: error checking path "/run/lock/lvm": stat /run/lock/lvm: no such file or directory This wasn't visible before because lvm2 was automatically installed as docker dependency but it's not the same for podman on CentOS 8. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `de8f2a9f83`)	2020-01-14 12:47:55 -05:00
Guillaume Abrioux	a81830ddc0	osd: use _devices fact in lvm batch scenario since `fd1718f379`, we must use `_devices` when deploying with lvm batch scenario. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5558664f37`)	2020-01-14 15:33:15 +01:00
Guillaume Abrioux	ffdfa634ac	osd: do not run openstack_config during upgrade There is no need to run this part of the playbook when upgrading the cluter. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `af6875706a`)	2020-01-14 09:12:34 -05:00
Guillaume Abrioux	2d85fab02d	osd: support scaling up using --limit This commit lets add-osd.yml in place but mark the deprecation of the playbook. Scaling up OSDs is now possible using --limit Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3496a0efa2`)	2020-01-14 09:12:34 -05:00
Guillaume Abrioux	9ed540da7e	osd: ensure osd ids collected are well restarted This commit refact the condition in the loop of that task so all potential osd ids found are well started. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790212 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `58e6bfed2d`)	2020-01-13 14:31:03 -05:00
Dimitri Savineau	f2e1941ef1	ceph-osd: wait for all osds once `cf8c6a3` moves the 'wait for all osds' task from openstack_config to the main tasks list. But the openstack_config code was executed only on the last OSD node. We don't need to do this check on all OSD node so we need to add set run_once to true on that task. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `5bd1cf40eb`)	2020-01-10 11:07:25 -05:00
Dimitri Savineau	9fa5b296ca	ceph-osd: wait for all osd before crush rules When creating crush rules with device class parameter we need to be sure that all OSDs are up and running because the device class list is is populated with this information. This is now enable for all scenario not openstack_config only. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `cf8c6a3849`)	2020-01-10 11:07:25 -05:00
Dimitri Savineau	bd016960cf	ceph-osd: add device class to crush rules This adds device class support to crush rules when using the class key in the rule dict via the create-replicated sub command. If the class key isn't specified then we use the create-simple sub command for backward compatibility. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `ef2cb99f73`)	2020-01-10 11:07:25 -05:00
Dimitri Savineau	661b2c013a	move crush rule creation from mon to osd role If we want to create crush rules with the create-replicated sub command and device class then we need to have the OSD created before the crush rules otherwise the device classes won't exist. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `ed36a11eab`)	2020-01-10 11:07:25 -05:00
Dimitri Savineau	3ebd71c8c2	ceph-osd: fix fs.aio-max-nr sysctl condition [1] introduced a regression on the fs.aio-max-nr sysctl value condition. The enable key isn't a boolean but a string because the expression isn't evaluated. This string output "(osd_objectstore == 'bluestore')" is always true because item.enable condition only matches non empty string. So the sysctl value was applyied for both filestore and bluestore backend. [2] added the bool filter to the condition but the filter always returns false on string and the sysctl wasn't applyed at all. This commit fixes the enable key value by evaluating the value instead of using the string. [1] https://github.com/ceph/ceph-ansible/commit/08a2b58 [2] https://github.com/ceph/ceph-ansible/commit/ab54fe2 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `ece46d33be`)	2019-11-07 20:31:02 +01:00
Dimitri Savineau	27eb40714c	ceph-osd: Remove ulimit nofile on container start Even if this improves ceph-disk/ceph-volume performances then it also impact the ceph-osd process. The ceph-osd process shouldn't use 1024:4096 value for the max open files. Removing the ulimit option from the container engine and doing this kind of change on the container side [1]. [1] https://github.com/ceph/ceph-container/pull/1497 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `9a996aef7f`)	2019-10-31 14:42:30 -04:00
Dimitri Savineau	6d5125f2a4	lint: fix error [303,602,701,702] [303] mktemp used in place of tempfile module [602] Don't compare to empty string [701] No 'galaxy_info' found [702] Use 'galaxy_tags' rather than 'categories' This patch also changes the ansible log_path value via the ANSIBLE_LOG_PATH environment variable in the travis configuration to avoid warnings. [WARNING]: log file at /home/travis/ansible/ansible.log is not writeable and we cannot create it, aborting Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `f7fd0b6d4f`)	2019-10-21 15:55:54 -04:00
Guillaume Abrioux	13ca0531d8	common: improve keyrings generation There is no need to get n * number of nodes the different keyrings. Adding a `run_once: true` here avoid running a ceph command too many times which could be impacting large cluster deployment. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `9bad239d77`)	2019-10-02 14:34:27 +02:00
Guillaume Abrioux	df5337535d	container: isolate systemd tasks This commit isolates the systemd unit files generation for containers into separate yml files in order to be able importing each corresponding roles without playing all tasks. This is needed so we can run ceph-ansible to render systemd unit files so they call podman instead of docker. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `bd64167469`)	2019-10-01 18:50:51 +02:00
Guillaume Abrioux	e1d06f498c	global: remove fetch_directory dependency This commit drops the fetch_directory dependency. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ab370b6ad8`)	2019-09-26 16:21:54 +02:00
Guillaume Abrioux	69ec26e045	osd: add wal_devices option support to ceph_volume module This commit adds the `wal_devices` option support to the ceph_volume module. passing a devices list in `bluestore_wal_devices` will make ceph-volume creating 1 vg using these devices to create block.wal partitions. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `09e04a9197`)	2019-09-26 16:21:54 +02:00
Guillaume Abrioux	a33791be25	osd: update doc text in defaults/main.yml This commit removes ceph-disk references. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `70f1b37097`)	2019-09-26 16:21:54 +02:00
Guillaume Abrioux	d666e03b0c	osd: add block_db_devices option support to ceph_volume module This commit adds the `block_db_devices` option support to the ceph_volume module. passing a devices list in `dedicated_devices` will make ceph-volume creating 1 vg using these devices to create block.db partitions for data devices. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `7b836eaa47`)	2019-09-26 16:21:54 +02:00
Guillaume Abrioux	8f781198d6	lint: fix error [306], add pipefail on shell command using pipe This commit fixes the error [306]: `[306] Shells that use pipes should set the pipefail option` using `/bin/bash` as executable because Debian/Ubuntu systems use `dash` by default which doesn't have the `-o pipefail`. (See: https://github.com/ansible/ansible-lint/issues/497#issue-424623501) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `102edaeb61`)	2019-08-28 11:22:47 -04:00
Artur Fijalkowski	27014df45e	global: make directories mode parameterizable This commit makes it possible to parametrize the ceph directories modes. So it changes hardocded mode for ceph related directories from 0755 to customizable with `ceph_directories_mode` variable. Closes: #2920 Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `011270ca69`)	2019-08-23 11:39:23 +00:00
Dimitri Savineau	500c59c648	ceph-osd: Add ulimit nofile on container start On containerized deployment, the OSD entrypoint runs some ceph-volume commands (lvm/simple scan and/or activate) which perform badly without the ulimit option. This option was added for all previous ceph-volume commands but not on the ceph-osd container startup. Also updating hard limit value to 4096 to reflect default baremetal value. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `9a4ac46d19`)	2019-08-22 22:50:17 +00:00
Guillaume Abrioux	19c7b650db	osd: remove useless condition just like `ceph_osd_pool_default_size`, a pool size might change after an initial deployment. Having this condition prevents from customizing the pool in that case. This is not needed so let's remove it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `70cf2a5846`)	2019-08-20 09:13:15 +02:00
Guillaume Abrioux	f08408bf5c	osd: refact 'wait for all osd to be up' task let's use `until` instead of doing test in bash using python oneliner also, use `command` instead of `shell`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `687087fd43`)	2019-08-19 18:47:14 +00:00
Guillaume Abrioux	2f77704591	common: use discovered_interpreter_python fact in order to use the right binary name when using python cli in command or shell module. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `13815ad3ca`)	2019-08-19 18:47:14 +00:00
Dimitri Savineau	36e18e20d1	ceph-osd: check container engine rc for pools When creating OpenStack pools, we only check if the return code from the pool list command isn't 0 (ie: if it doesn't exist). In that case, the return code will be 2. That's why the next condition is rc != 0 for the pool creation. But in containerized deployment, the return code could be different if there's a failure on the container engine command (like container not running). In that case, the return code could but either 1 (docker) or 125 (podman) so we should fail at this point and not in the next tasks. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1732157 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `d549fffdd2`)	2019-07-31 14:07:41 -04:00
Guillaume Abrioux	2295a4cf0a	containers: improve logging bindmount /var/log/ceph on all containers so it's possible to retrieve logs from the host. related ceph-container PR: ceph/ceph-container#1408 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710548 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `33eed78d17`)	2019-07-02 11:27:34 -04:00
Dimitri Savineau	109883e7a5	ceph-osd: Add CONTAINER_IMAGE env variable This environment variable was added in `cb381b4` but was removed in `4d35e9e`. This commit reintroduces the change. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `02fbe76e62`)	2019-06-27 17:34:24 -04:00
Dimitri Savineau	62d98971f2	ceph-volume: Set max open files limit on container The ceph-volume lvm list command takes ages to complete when having a lot of LV devices on containerized deployment. For instance, with 25 OSDs on a node it takes 3 mins 44s to list the OSD. Adding the max open files limit to the container engine cli when executing the ceph-volume command seems to improve a lot thee execution time ~30s. This was impacting the OSDs creation with ceph-volume (both filestore and bluestore) when using multiple LV devices. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `b987534881`)	2019-06-20 20:00:53 -04:00
Dimitri Savineau	590f6026bb	roles: Remove useless become (true) flag We already set the become flag to true at a play level in the site* playbooks so we don't need to set it at a task level. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `7c3640177b`)	2019-06-20 22:00:27 +00:00

1 2 3 4 5 ...

564 Commits (709deb90cc9a76cc51d784d7e4fd22dda243246f)