ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Karl-Heinz Preuß	da7b708636	fix broken ceph-fetch-keys role set fetch_directory variable in default/main.yml instead of using the defaults jinja filter in tasks/main.yml. Fixes: #6072 Signed-off-by: Karl-Heinz Preuß <karl-heinz.preuss@cms.hu-berlin.de> (cherry picked from commit `6ce34ef59f`)	2020-12-14 11:42:50 -05:00
Dimitri Savineau	41f7f9d020	Revert "config: Always use osd_memory_target if set" This reverts commit `4d1fdd2b05`. This breaks the backward compatibility with previous osd_memory_target calculation and we could have a value lower than the minimum value allowed (896M) which causes some ceph commands to fail (like ceph assimilate-conf). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `aa6e1f20ea`)	2020-12-14 02:41:45 +01:00
Dimitri Savineau	e96293024b	purge-container-cluster: always prune force Since podman 2.x, there's now a confirmation when running podman container prune command. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `0108c9f941`)	2020-12-09 16:45:30 -05:00
Dimitri Savineau	228407308c	tests/vagrant: update box version to CentOS 8.3 This updates the CentOS libvirt box version to 8.3 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `801e7a29cf`)	2020-12-09 16:45:30 -05:00
Jukka Nousiainen	302fa3b2f8	ceph-mon: No become during gen mon initial keyring Since the backing generate_secret() just hands out urandom output, running as privileged doesn't seem to be required. It's not desirable to provide sudo in some Ansible runner environments. Signed-off-by: Jukka Nousiainen <jukka.nousiainen@csc.fi> (cherry picked from commit `eb7473491b`)	2020-12-07 09:24:37 -05:00
Dimitri Savineau	17c4744579	rhcs: drop fetch_directory override Since the fetch_directory variable has been dropped then we don't need the override in rhcs file. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `a2cbab16a4`)	2020-12-03 12:10:07 -05:00
Guillaume Abrioux	6b04f1154f	common: do not use pipefail when not needed Let's discard the ansible lint error 306 and add a "# noqa 306" on tasks where we don't need `set -o pipefail` Fixes: #6090 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `86a8889ee3`)	2020-12-01 20:18:35 -05:00
Guillaume Abrioux	679d3e2d10	osd: add tag on 'wait for all osd to be up' task This allows skipping this task if really desired. Use it carefully. Use it at your own risk. Fixes: #6073 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5c4ae5356d`)	2020-12-01 11:04:37 +01:00
Guillaume Abrioux	7ab606bac5	iscsigw: remove `--cap-add=all` from `podman run` cmd As of podman `2.0.5`, `--cap-add` and `--privileged` are exclusive options. ``` Nov 30 13:56:30 magna089 podman[171677]: Error: invalid config provided: CapAdd and privileged are mutually exclusive options ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1902149 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d40dd764e0`)	2020-11-30 16:42:59 -05:00
Guillaume Abrioux	f1ae3dec72	container: remove `--ignore` from `podman rm` command As of podman 2.0.5, `--ignore` param conflicts with `--storage`. ``` Nov 30 13:53:10 magna089 podman[164443]: Error: --storage conflicts with --volumes, --all, --latest, --ignore and --cidfile ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c68b124ba8`)	2020-11-30 16:42:59 -05:00
Guillaume Abrioux	dc5aea52cf	switch2containers: do not stop ceph.target in osd play `ceph.target` should be disabled only. Otherwise, in collocation scenario you stop other collocated services in the OSD play which isn't what we want to do. Each daemon has its corresponding play for managing the transition to container. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1901865 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0b05620597`)	2020-11-30 10:11:57 +01:00
Dimitri Savineau	51bea82677	alertmanager/prometheus: fix owner/group Set the owner/group on alertmanager and prometheus directories and files to nobody and nogroup (uid and gid 65534) to avoid permission issues. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1901543 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `eb452d35bc`)	2020-11-27 14:55:39 -05:00
Guillaume Abrioux	0edbabbf4d	mon: refact initial keyring generation adding monitor is no longer possible because we generate a new mon keyring each time the playbook is run. Fixes: #5864 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `970c6a4ee6`)	2020-11-26 09:12:22 +01:00
Guillaume Abrioux	10551da173	mon: replace `command` task by `copy` We can achieve this task using `copy` module. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5ff2ca270f`)	2020-11-26 09:12:22 +01:00
Dimitri Savineau	1cf76da74a	ceph-iscsi: set the pool name in the config file When using a custom pool for iSCSI gateway then we need to set the pool name in the configuration otherwise the default rbd pool name will be used. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `40a87c4b92`)	2020-11-25 09:19:03 -05:00
Guillaume Abrioux	2a96eb81b7	tests: use github workflow for nbsp char check Let's use a github workflow instead of travis for this. With this commit we can get rid of Travis. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `94c37b9de8`)	2020-11-24 10:39:03 +01:00
Guillaume Abrioux	ed9a470113	lint: ignore 302,303,505 errors ignore 302,303 and 505 errors [302] Using command rather than an argument to e.g. file [303] Using command rather than module [505] referenced files must exist they aren't relevant on these tasks. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `195d88fcda`)	2020-11-24 10:39:03 +01:00
Guillaume Abrioux	805183bde3	lint: do not use 'local_action' Fix ansible-lint 504 error: [504] Do not use 'local_action', use 'delegate_to: localhost' Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c948b668eb`)	2020-11-24 10:39:03 +01:00
Guillaume Abrioux	6ef95e9cde	lint: trailing whitespace Fix ansible-lint 201 error: [201] Trailing whitespace Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `dfc7e6e4bd`)	2020-11-24 10:39:03 +01:00
Guillaume Abrioux	0c3adbc710	lint: all tasks should be named Fix ansible-lint 502 error: [502] All tasks should be named Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `97dd9218dd`)	2020-11-24 10:39:03 +01:00
Guillaume Abrioux	5375713d3e	lint: use shell only when shell functionality is required Fix ansible-lint 305 error: [305] Use shell only when shell functionality is required Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `11b4bf5083`)	2020-11-24 10:39:03 +01:00
Guillaume Abrioux	e83bcd9459	lint: don't compare to literal true/false Fix ansible lint 601 error: [601] Don't compare to literal True/False Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `2011e4dbc8`)	2020-11-24 10:39:03 +01:00
Guillaume Abrioux	35a44a4f5a	lint: variables should have spaces before and after Fix ansible lint 206 error: [206] Variables should have spaces before and after: {{ var_name }} Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `9fba6eecfa`)	2020-11-24 10:39:03 +01:00
Guillaume Abrioux	630e6be904	lint: commands should not change things Fix ansible lint 301 error: [301] Commands should not change things if nothing needs doing Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5450de58b3`)	2020-11-24 10:39:03 +01:00
Guillaume Abrioux	1d4cd3328a	lint: set pipefail on shell tasks Fix ansible lint 306 error: [306] Shells that use pipes should set the pipefail option Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `1879c26eb9`)	2020-11-24 10:39:03 +01:00
Guillaume Abrioux	ffc63ad5f5	tests: use github workflow for ansible-lint let's use github workflow instead of travis. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d4400f911a`)	2020-11-24 10:39:03 +01:00
Guillaume Abrioux	d86a159a79	osd: ensure /var/lib/ceph/osd/{cluster}-{id} is present This commit ensures that the `/var/lib/ceph/osd/{{ cluster }}-{{ osd_id }}` is present before starting OSDs. This is needed specifically when redeploying an OSD in case of OS upgrade failure. Since ceph data are still present on its devices then the node can be redeployed, however those directories aren't present since they are initially created by ceph-volume. We could recreate them manually but for better user experience we can ask ceph-ansible to recreate them. NOTE: this only works for OSDs that were deployed with ceph-volume. ceph-disk deployed OSDs would have to get those directories recreated manually. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1898486 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `873fc8ec0f`)	2020-11-19 11:52:20 -05:00
Dimitri Savineau	aa302f48de	ceph-facts: fix read osd pool default crush fact We don't need to use run_once on that task when having running monitors otherwise the read task could be skipped and the set task will fail. The conditional check 'crush_rule_variable.rc == 0' failed. The error was: error while evaluating conditional (crush_rule_variable.rc == 0): 'dict object' has no attribute 'rc' Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1898856 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `e150df789e`)	2020-11-18 17:01:05 -05:00
Dimitri Savineau	126230bbbd	tests: use github workflow for pytest Move the pytest testing from TravisCI to Github workflow. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `3e79f0322a`)	2020-11-18 10:49:22 -05:00
Guillaume Abrioux	703abb2572	tests: enforce pytest-rerunfailures version This commit enforces the pytest-rerunfailures installed so it's <9.0 This is to avoid the following error: ``` ERROR: pytest-rerunfailures 9.0 has requirement pytest>=5.0, but you'll have pytest 4.6.11 which is incompatible. ``` latest version of pytest-rerunfailures isn't compatible with the version of pytest we are using. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `19097026fb`)	2020-11-18 10:49:22 -05:00
Guillaume Abrioux	acdd43c0e2	containers: modify bindmount option This commit changes the bind mount option for the mount point `/var/lib/ceph` in the systemd template for mon and mgr containers. This is needed in case of collocating mon/mgr with osds using dmcrypt scenario. Once mon/mgr got converted to containers, the dmcrypt layer sub mount is still seen in `/var/lib/ceph`. For some reason it makes the corresponding devices busy so any other container can't open/close it. As a result, it prevents osds from starting properly. Since it only happens on the nodes converted before the OSD play, the idea is to bind mount `/var/lib/ceph` on mon and mgr with the `rshared` option so once the sub mount is unmounted, it is propagated inside the container so it doesn't see that mount point. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896392 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `f5ba6d9b01`)	2020-11-17 12:27:07 -05:00
Guillaume Abrioux	10dff6888c	container: force rm --storage on ExecStartPre This is a workaround to avoid error like following: ``` Error: error creating container storage: the container name "ceph-mgr-magna022" is already in use by "4a5f674e113f837a0cc561dea5d2cd55d16ca159a647b7794ab06c4c276ef701" ``` that doesn't seem to be 100% reproducible but it shows up after a reboot. The only workaround we came up with at the moment is to run `podman rm --storage <container>` before starting it. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1887716 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5ba7824c55`)	2020-11-16 16:37:37 -05:00
Dimitri Savineau	553381c326	switch2container: chown symlink in mon/mgr plays `fa2bb3a` only fix the symlink owner/group issue in the OSD play. If the OSDs are collocated with other services like MONs and MGRs then the chown command will fail. $ find /var/lib/ceph/osd/ceph-0 -not -user 167 -execdir chown 167:167 {} + chown: cannot dereference './block': Permission denied Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896448 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `35ed9977aa`)	2020-11-16 16:36:56 -05:00
Benoît Knecht	deaf60316a	ceph-facts: Fix osd_pool_default_crush_rule fact The `osd_pool_default_crush_rule` is set based on `crush_rule_variable`, which is the output of a `grep` command. However, two consecutive tasks can set that variable, and if the second task is skipped, it still overwrites the `crush_rule_variable`, leading the `osd_pool_default_crush_rule` to be set to `ceph_osd_pool_default_crush_rule` instead of the output of the first task. This commit ensures that the fact is set right after the `crush_rule_variable` is assigned, before it can be overwritten. Closes #5912 Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `c5f7343a2f`)	2020-11-13 10:42:03 -05:00
Gaudenz Steinlin	bb3cfd0481	config: Always use osd_memory_target if set The osd_memory_target variable was only used if it was higher than the calculated value based on the number of OSDs. This is changed to always use the value if it is set in the configuration. This allows this value to be intentionally set lower so that it does not have to be changed when more OSDs are added later. Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch> (cherry picked from commit `4d1fdd2b05`)	2020-11-13 09:27:11 -05:00
Dimitri Savineau	ee43555148	switch2container: disable ceph-osd enabled-runtime When deploying the ceph OSD via the packages then the ceph-osd@.service unit is configured as enabled-runtime. This means that each ceph-osd service will inherit from that state. The enabled-runtime systemd state doesn't survive after a reboot. For non containerized deployment the OSD are still starting after a reboot because there's the ceph-volume@.service and/or ceph-osd.target units that are doing the job. $ systemctl list-unit-files|egrep '^ceph-(volume|osd)'|column -t ceph-osd@.service enabled-runtime ceph-volume@.service enabled ceph-osd.target enabled When switching to containerized deployment we are stopping/disabling ceph-osd@XX.service, ceph-volume and ceph.target and then removing the systemd unit files. But the new systemd units for containerized ceph-osd service will still inherit from ceph-osd@.service unit file. As a consequence, if an OSD host is rebooting after the playbook execution then the ceph-osd service won't come back because they aren't enabled at boot. This patch also adds a reboot and testinfra run after running the switch to container playbook. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881288 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `fa2bb3af86`)	2020-11-12 17:04:30 -05:00
Guillaume Abrioux	46e2695cf1	main: followup on pr 6012 This tag can be set at the play level. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `2fa17520c4`)	2020-11-12 15:32:49 -05:00
Francesco Pantano	65a503bedc	Add ceph_client tag to execute or skip the playbook There are some use cases where there's a need to skip the execution of the ceph-ansible client role even though the client section of the inventory isn't empty. This can happen in contexts where the services are colocated or when a all-in-one deployment is performed. The purpose of this change is adding a 'ceph_client' tag to avoid altering the ceph-ansible execution flow but at the same time be able to include or exclude a set of tasks using this tag. Signed-off-by: Francesco Pantano <fpantano@redhat.com> (cherry picked from commit `fafd5f871a`)	2020-11-12 14:32:10 -05:00
Guillaume Abrioux	7b7f20c636	dashboard: change dashboard_grafana_api_no_ssl_verify default value This sets the `dashboard_grafana_api_no_ssl_verify` default value according to the length of `dashboard_crt` and `dashboard_key`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5cadfea42e`)	2020-11-04 11:02:05 -05:00
Guillaume Abrioux	36f550b3b4	dashboard: enable https by default see linked bz for details Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1889426 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `767d3c898e`)	2020-11-04 11:02:05 -05:00
Gaudenz Steinlin	44d43a8e4d	osd: Fix number of OSD calculation If some OSDs are to be created and others already exist the calculation only counted the to be created OSDs. This changes the calculation to take all OSDs into account. Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch> (cherry picked from commit `15044da030`)	2020-11-03 11:36:04 -05:00
Dimitri Savineau	0cb9e179f5	rolling_update: fix mgr start with mon collocation `cec994b` introduced a regression when a mgr is collocated with a mon. During the mon upgrade, the mgr service is masked to avoid to be restarted on packages update. Then the start mgr task is failing because the service is still masked. Instead we should unmask it. Fixes: #5983 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `3d3ce26327`)	2020-11-03 14:32:42 +01:00
Dimitri Savineau	d2114efa4d	infrastructure: consume ceph_fs module `bd611a7` introduced the new ceph_fs module but missed some tasks in rolling_update and shrink-mds playbooks. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `16afe90806`)	2020-11-03 14:32:25 +01:00
Dimitri Savineau	1c6bd9a383	rolling_update: use ceph health instead of ceph -s The ceph status command returns a lot of information stored in variables and/or facts which could consume resources for nothing. When checking the cluster health, we're using the health structure in the ceph status output. To optimize this, we could use the ceph health command which contains the same needed information. $ ceph status -f json | wc -c 2001 $ ceph health -f json | wc -c 46 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `acddf4fb67`)	2020-11-03 14:32:09 +01:00
Dimitri Savineau	9c70add661	rgw/rbdmirror: use service dump instead of ceph -s The ceph status command returns a lot of information stored in variables and/or facts which could consume resources for nothing. When checking the rgw/rbdmirror services status, we're only using the servicemap structure in the ceph status output. To optimize this, we could use the ceph service dump command which contains the same needed information. This command returns less information and is slightly faster than the ceph status command. $ ceph status -f json | wc -c 2001 $ ceph service dump -f json | wc -c 1105 $ time ceph status -f json > /dev/null real 0m0.557s user 0m0.516s sys 0m0.040s $ time ceph service dump -f json > /dev/null real 0m0.454s user 0m0.434s sys 0m0.020s Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `3f9081931f`)	2020-11-03 14:32:09 +01:00
Dimitri Savineau	3bba1fd203	monitor: use quorum_status instead of ceph status The ceph status command returns a lot of information stored in variables and/or facts which could consume resources for nothing. When checking the quorum status, we're only using the quorum_names structure in the ceph status output. To optimize this, we could use the ceph quorum_status command which contains the same needed information. This command returns less information. $ ceph status -f json | wc -c 2001 $ ceph quorum_status -f json | wc -c 957 $ time ceph status -f json > /dev/null real 0m0.577s user 0m0.538s sys 0m0.029s $ time ceph quorum_status -f json > /dev/null real 0m0.544s user 0m0.527s sys 0m0.016s Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `88f91d8c12`)	2020-11-03 14:32:09 +01:00
Dimitri Savineau	a8e2bc087f	osds: use pg stat command instead of ceph status The ceph status command returns a lot of information stored in variables and/or facts which could consume resources for nothing. When checking the pgs state, we're using the pgmap structure in the ceph status output. To optimize this, we could use the ceph pg stat command which contains the same needed information. This command returns less information (only about pgs) and is slightly faster than the ceph status command. $ ceph status -f json | wc -c 2000 $ ceph pg stat -f json | wc -c 240 $ time ceph status -f json > /dev/null real 0m0.529s user 0m0.503s sys 0m0.024s $ time ceph pg stat -f json > /dev/null real 0m0.426s user 0m0.409s sys 0m0.016s The data returned by the ceph status is even bigger when using the nautilus release. $ ceph status -f json | wc -c 35005 $ ceph pg stat -f json | wc -c 240 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `ee50588590`)	2020-11-03 14:32:09 +01:00
wangxiaotong	b4c1f325a8	osds: use ceph osd stat instead of ceph status Improve the checked way of the OSD created checking process. This replaces the ceph status command by the ceph osd stat command. The osdmap structure isn't needed anymore. $ ceph status -f json | wc -c 2001 $ ceph osd stat -f json | wc -c 132 $ time ceph status -f json > /dev/null real 0m0.563s user 0m0.526s sys 0m0.036s $ time ceph osd stat -f json > /dev/null real 0m0.457s user 0m0.411s sys 0m0.045s Signed-off-by: wangxiaotong <wangxiaotong@fiberhome.com> (cherry picked from commit `b9cb0f12e9`)	2020-11-03 14:32:09 +01:00
Guillaume Abrioux	04d47d68fd	common: follow up on #5948 In addition to `f7e2b2c608` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `371d854a5c`)	2020-11-03 09:43:51 +01:00
Gaudenz Steinlin	2550e44e2f	openstack: use ceph_keyring_permissions by default Otherwise this task fails if no permission is set on the item. Previously the code omitted the mode parameter if it was not set, but this was lost with commit `ab370b6ad8`. Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch> (cherry picked from commit `79ff79c422`)	2020-11-02 18:41:53 -05:00
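
Several of the commits listed above (for example `1c6bd9a383`) replace `ceph status` with a narrower command such as `ceph health` because only one part of the status output is actually consumed. The following is a minimal Python sketch of that idea, not the playbook's actual implementation: it inspects only the small health report. The function name `cluster_is_healthy`, the `status` field name, and the sample payload are assumptions made for illustration; the real JSON layout may differ between Ceph releases.

```python
import json


def cluster_is_healthy(health_json: str) -> bool:
    """Return True when the health report's status is HEALTH_OK.

    `health_json` is assumed to be the text printed by
    `ceph health -f json`; the "status" key is an assumption.
    """
    report = json.loads(health_json)
    return report.get("status") == "HEALTH_OK"


# Hypothetical sample payload, not captured from a real cluster.
sample = '{"status": "HEALTH_OK", "checks": {}, "mutes": []}'
print(cluster_is_healthy(sample))
```

Parsing the short health report avoids holding the much larger `ceph status` document in memory when only the health field is needed.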

5724 Commits (3dd918db23853b959a35253e1ba7e25b3befd2df)