ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	94c37b9de8	tests: use github workflow for nbsp char check Let's use a github workflow instead of travis for this. With this commit we can get rid of Travis. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-23 08:33:47 +01:00
Guillaume Abrioux	195d88fcda	lint: ignore 302,303,505 errors ignore 302,303 and 505 errors [302] Using command rather than an argument to e.g. file [303] Using command rather than module [505] referenced files must exist they aren't relevant on these tasks. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-23 08:33:47 +01:00
Guillaume Abrioux	c948b668eb	lint: do not use 'local_action' Fix ansible-lint 504 error: [504] Do not use 'local_action', use 'delegate_to: localhost' Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-23 08:33:47 +01:00
Guillaume Abrioux	dfc7e6e4bd	lint: trailing whitespace Fix ansible-lint 201 error: [201] Trailing whitespace Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-23 08:33:47 +01:00
Guillaume Abrioux	97dd9218dd	lint: all tasks should be named Fix ansible-lint 502 error: [502] All tasks should be named Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-23 08:33:47 +01:00
Guillaume Abrioux	11b4bf5083	lint: use shell only when shell functionality is required Fix ansible-lint 305 error: [305] Use shell only when shell functionality is required Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-23 08:33:47 +01:00
Guillaume Abrioux	2011e4dbc8	lint: don't compare to literal true/false Fix ansible lint 601 error: [601] Don't compare to literal True/False Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-23 08:33:47 +01:00
Guillaume Abrioux	9fba6eecfa	lint: variables should have spaces before and after Fix ansible lint 206 error: [206] Variables should have spaces before and after: {{ var_name }} Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-23 08:33:47 +01:00
Guillaume Abrioux	5450de58b3	lint: commands should not change things Fix ansible lint 301 error: [301] Commands should not change things if nothing needs doing Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-23 08:33:47 +01:00
Guillaume Abrioux	1879c26eb9	lint: set pipefail on shell tasks Fix ansible lint 306 error: [306] Shells that use pipes should set the pipefail option Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-23 08:33:47 +01:00
Guillaume Abrioux	d4400f911a	tests: use github workflow for ansible-lint let's use github workflow instead of travis. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-23 08:33:47 +01:00
Guillaume Abrioux	873fc8ec0f	osd: ensure /var/lib/ceph/osd/{cluster}-{id} is present This commit ensures that the `/var/lib/ceph/osd/{{ cluster }}-{{ osd_id }}` is present before starting OSDs. This is needed specifically when redeploying an OSD in case of OS upgrade failure. Since ceph data are still present on its devices then the node can be redeployed, however those directories aren't present since they are initially created by ceph-volume. We could recreate them manually but for better user experience we can ask ceph-ansible to recreate them. NOTE: this only works for OSDs that were deployed with ceph-volume. ceph-disk deployed OSDs would have to get those directories recreated manually. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1898486 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-19 09:20:28 +01:00
Dimitri Savineau	e150df789e	ceph-facts: fix read osd pool default crush fact We don't need to use run_once on that task when having running monitors otherwise the read task could be skipped and the set task will fail. The conditional check 'crush_rule_variable.rc == 0' failed. The error was: error while evaluating conditional (crush_rule_variable.rc == 0): 'dict object' has no attribute 'rc' Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1898856 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-18 12:55:43 -05:00
Dimitri Savineau	3e79f0322a	tests: use github workflow for pytest Move the pytest testing from TravisCI to Github workflow. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-17 17:52:03 +01:00
Guillaume Abrioux	f5ba6d9b01	containers: modify bindmount option This commit changes the bind mount option for the mount point `/var/lib/ceph` in the systemd template for mon and mgr containers. This is needed in case of collocating mon/mgr with osds using dmcrypt scenario. Once mon/mgr got converted to containers, the dmcrypt layer sub mount is still seen in `/var/lib/ceph`. For some reason it makes the corresponding devices busy so any other container can't open/close it. As a result, it prevents osds from starting properly. Since it only happens on the nodes converted before the OSD play, the idea is to bind mount `/var/lib/ceph` on mon and mgr with the `rshared` option so once the sub mount is unmounted, it is propagated inside the container so it doesn't see that mount point. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896392 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-17 09:19:23 -05:00
Dimitri Savineau	35ed9977aa	switch2container: chown symlink in mon/mgr plays `fa2bb3a` only fix the symlink owner/group issue in the OSD play. If the OSDs are collocated with other services like MONs and MGRs then the chown command will fail. $ find /var/lib/ceph/osd/ceph-0 -not -user 167 -execdir chown 167:167 {} + chown: cannot dereference './block': Permission denied Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896448 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-16 13:40:57 -05:00
Guillaume Abrioux	5ba7824c55	container: force rm --storage on ExecStartPre This is a workaround to avoid error like following: ``` Error: error creating container storage: the container name "ceph-mgr-magna022" is already in use by "4a5f674e113f837a0cc561dea5d2cd55d16ca159a647b7794ab06c4c276ef701" ``` that doesn't seem to be 100% reproducible but it shows up after a reboot. The only workaround we came up with at the moment is to run `podman rm --storage <container>` before starting it. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1887716 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-16 10:38:40 -05:00
Benoît Knecht	c5f7343a2f	ceph-facts: Fix osd_pool_default_crush_rule fact The `osd_pool_default_crush_rule` is set based on `crush_rule_variable`, which is the output of a `grep` command. However, two consecutive tasks can set that variable, and if the second task is skipped, it still overwrites the `crush_rule_variable`, leading the `osd_pool_default_crush_rule` to be set to `ceph_osd_pool_default_crush_rule` instead of the output of the first task. This commit ensures that the fact is set right after the `crush_rule_variable` is assigned, before it can be overwritten. Closes #5912 Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2020-11-13 09:36:49 +01:00
Gaudenz Steinlin	4d1fdd2b05	config: Always use osd_memory_target if set The osd_memory_target variable was only used if it was higher than the calculated value based on the number of OSDs. This is changed to always use the value if it is set in the configuration. This allows this value to be intentionally set lower so that it does not have to be changed when more OSDs are added later. Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>	2020-11-13 09:13:58 +01:00
Guillaume Abrioux	2fa17520c4	main: followup on pr 6012 This tag can be set at the play level. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-12 15:31:31 -05:00
Dimitri Savineau	fa2bb3af86	switch2container: disable ceph-osd enabled-runtime When deploying the ceph OSD via the packages then the ceph-osd@.service unit is configured as enabled-runtime. This means that each ceph-osd service will inherit from that state. The enabled-runtime systemd state doesn't survive after a reboot. For non containerized deployment the OSD are still starting after a reboot because there's the ceph-volume@.service and/or ceph-osd.target units that are doing the job. $ systemctl list-unit-files|egrep '^ceph-(volume|osd)'|column -t ceph-osd@.service enabled-runtime ceph-volume@.service enabled ceph-osd.target enabled When switching to containerized deployment we are stopping/disabling ceph-osd@XX.service, ceph-volume and ceph.target and then removing the systemd unit files. But the new systemd units for containerized ceph-osd service will still inherit from ceph-osd@.service unit file. As a consequence, if an OSD host is rebooting after the playbook execution then the ceph-osd service won't come back because they aren't enabled at boot. This patch also adds a reboot and testinfra run after running the switch to container playbook. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881288 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-12 20:05:39 +01:00
Francesco Pantano	fafd5f871a	Add ceph_client tag to execute or skip the playbook There are some use cases where there's a need to skip the execution of the ceph-ansible client role even though the client section of the inventory isn't empty. This can happen in contexts where the services are colocated or when an all-in-one deployment is performed. The purpose of this change is adding a 'ceph_client' tag to avoid altering the ceph-ansible execution flow but at the same time be able to include or exclude a set of tasks using this tag. Signed-off-by: Francesco Pantano <fpantano@redhat.com>	2020-11-12 13:44:49 +01:00
Dimitri Savineau	3e49258377	rolling_update: always run cv simple scan/activate There's no need to use a condition on the ceph release for the ceph-volume simple commands. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-10 14:01:10 +01:00
Guillaume Abrioux	5cadfea42e	dashboard: change dashboard_grafana_api_no_ssl_verify default value This sets the `dashboard_grafana_api_no_ssl_verify` default value according to the length of `dashboard_crt` and `dashboard_key`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-04 10:00:48 +01:00
Guillaume Abrioux	767d3c898e	dashboard: enable https by default see linked bz for details Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1889426 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-04 10:00:48 +01:00
Gaudenz Steinlin	15044da030	osd: Fix number of OSD calculation If some OSDs are to be created and others already exist the calculation only counted the to be created OSDs. This changes the calculation to take all OSDs into account. Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>	2020-11-03 14:33:35 +01:00
Dimitri Savineau	3d3ce26327	rolling_update: fix mgr start with mon collocation `cec994b` introduced a regression when a mgr is collocated with a mon. During the mon upgrade, the mgr service is masked to avoid being restarted on package updates. Then the start mgr task is failing because the service is still masked. Instead we should unmask it. Fixes: #5983 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-03 09:10:17 +01:00
Dimitri Savineau	16afe90806	infrastructure: consume ceph_fs module `bd611a7` introduced the new ceph_fs module but missed some tasks in rolling_update and shrink-mds playbooks. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-03 09:06:17 +01:00
Dimitri Savineau	acddf4fb67	rolling_update: use ceph health instead of ceph -s The ceph status command returns a lot of information stored in variables and/or facts which could consume resources for nothing. When checking the cluster health, we're using the health structure in the ceph status output. To optimize this, we could use the ceph health command which contains the same needed information. $ ceph status -f json | wc -c 2001 $ ceph health -f json | wc -c 46 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-03 09:05:33 +01:00
Dimitri Savineau	3f9081931f	rgw/rbdmirror: use service dump instead of ceph -s The ceph status command returns a lot of information stored in variables and/or facts which could consume resources for nothing. When checking the rgw/rbdmirror services status, we're only using the servicemap structure in the ceph status output. To optimize this, we could use the ceph service dump command which contains the same needed information. This command returns less information and is slightly faster than the ceph status command. $ ceph status -f json | wc -c 2001 $ ceph service dump -f json | wc -c 1105 $ time ceph status -f json > /dev/null real 0m0.557s user 0m0.516s sys 0m0.040s $ time ceph service dump -f json > /dev/null real 0m0.454s user 0m0.434s sys 0m0.020s Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-03 09:05:33 +01:00
Dimitri Savineau	88f91d8c12	monitor: use quorum_status instead of ceph status The ceph status command returns a lot of information stored in variables and/or facts which could consume resources for nothing. When checking the quorum status, we're only using the quorum_names structure in the ceph status output. To optimize this, we could use the ceph quorum_status command which contains the same needed information. This command returns less information. $ ceph status -f json | wc -c 2001 $ ceph quorum_status -f json | wc -c 957 $ time ceph status -f json > /dev/null real 0m0.577s user 0m0.538s sys 0m0.029s $ time ceph quorum_status -f json > /dev/null real 0m0.544s user 0m0.527s sys 0m0.016s Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-03 09:05:33 +01:00
Dimitri Savineau	ee50588590	osds: use pg stat command instead of ceph status The ceph status command returns a lot of information stored in variables and/or facts which could consume resources for nothing. When checking the pgs state, we're using the pgmap structure in the ceph status output. To optimize this, we could use the ceph pg stat command which contains the same needed information. This command returns less information (only about pgs) and is slightly faster than the ceph status command. $ ceph status -f json | wc -c 2000 $ ceph pg stat -f json | wc -c 240 $ time ceph status -f json > /dev/null real 0m0.529s user 0m0.503s sys 0m0.024s $ time ceph pg stat -f json > /dev/null real 0m0.426s user 0m0.409s sys 0m0.016s The data returned by the ceph status is even bigger when using the nautilus release. $ ceph status -f json | wc -c 35005 $ ceph pg stat -f json | wc -c 240 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-03 09:05:33 +01:00
wangxiaotong	b9cb0f12e9	osds: use ceph osd stat instead of ceph status Improve the way OSD creation is checked. This replaces the ceph status command by the ceph osd stat command. The osdmap structure isn't needed anymore. $ ceph status -f json | wc -c 2001 $ ceph osd stat -f json | wc -c 132 $ time ceph status -f json > /dev/null real 0m0.563s user 0m0.526s sys 0m0.036s $ time ceph osd stat -f json > /dev/null real 0m0.457s user 0m0.411s sys 0m0.045s Signed-off-by: wangxiaotong <wangxiaotong@fiberhome.com>	2020-11-03 09:05:33 +01:00
Guillaume Abrioux	371d854a5c	common: follow up on #5948 In addition to `f7e2b2c608` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-02 20:16:36 -05:00
Benoît Knecht	0d76826bbb	ceph-mon: Don't set monitor directory mode recursively After rolling updates performed with `infrastructure-playbooks/rolling_updates.yml`, files located in `/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }}` had mode 0755 (including the keyring), making them world-readable. This commit separates the task that configured permissions recursively on `/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }}` into two separate tasks: 1. Set the ownership and mode of the directory itself; 2. Recursively set ownership in the directory, but don't modify the mode. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2020-11-02 17:36:37 +01:00
Dimitri Savineau	2138a00a32	library: remove unused module import Move the import at the top of the file and remove unused module import. - E402 module level import not at top of file - F401 'xxxx' imported but unused This also removes the '# noqa E402' statement from the code. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-02 17:20:06 +01:00
Dimitri Savineau	b02589ad50	keyring: use ceph_key module for get-or-create cmd Instead of using ceph auth get-or-create command via the ansible command module then we can use the ceph_key module. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-02 17:17:29 +01:00
Dimitri Savineau	59ecddcdd0	keyring: use ceph_key module for auth get command Instead of using ceph auth get command via the ansible command module then we can use the ceph_key module and the info state. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-02 17:17:29 +01:00
Dimitri Savineau	7d3d51d6da	library/ceph_key: add output format parameter The ceph_key module currently only supports the json output for the info state. When using this state on an entity then we sometimes want the output as: - plain for copying it to another node. - json in order to get only a subset information of the entity (like the key or caps). This patch adds the output_format parameter which uses json as a default value for backward compatibility. It removes the internal and hardcoded variable also called output_format. In addition to json and plain outputs, there's also xml and yaml values available. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-02 17:17:29 +01:00
Gaudenz Steinlin	79ff79c422	openstack: use ceph_keyring_permissions by default Otherwise this task fails if no permission is set on the item. Previously the code omitted the mode parameter if it was not set, but this was lost with commit `ab370b6ad8`. Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>	2020-11-02 15:53:58 +01:00
Dimitri Savineau	16cd183b9c	podman: force log driver to journald Since we've changed to podman configuration using the detach mode and systemd type to forking then the container logs aren't present in the journald anymore. The default conmon log driver is using k8s-file. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1890439 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-02 15:49:27 +01:00
Dimitri Savineau	cdb7b09cd7	ceph-handler: fix curl ipv6 command with rgw When using the curl command with ipv6 address and brackets then we need to use the -g option otherwise the command fails. $ curl http://[fdc2:328:750b:6983::6]:8080 curl: (3) [globbing] error: bad range specification after pos 9 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-02 15:45:51 +01:00
Guillaume Abrioux	a822f77300	iscsi: fix ownership on iscsi-gateway.cfg This file is currently deployed with mode '0644', making it readable by any user on the system. Since it contains sensitive information it should be readable by the owner only. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1890119 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-10-21 16:10:48 +02:00
Guillaume Abrioux	1cc9666c09	common: drop `fetch_directory` feature This commit drops the `fetch_directory` feature. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-10-21 13:22:16 +02:00
Guillaume Abrioux	900c0f4492	ceph-config: ceph.conf rendering refactor This commit cleans up the `main.yml` task file of `ceph-config`. It drops the local ceph.conf generation. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-10-21 13:22:16 +02:00
Guillaume Abrioux	a8bd947c7d	crash: refact caps definition there is no need to use `{{ }}` syntax here. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-10-19 18:53:54 -04:00
Guillaume Abrioux	0bb106045e	ceph-volume: refresh lvm metadata cache When running rhel8 containers on a rhel7 host, after zapping an OSD there's a discrepancy with the lvmetad cache that needs to be refreshed. Otherwise, the host still sees the lv, which can confuse the user. If the user tries to redeploy an OSD, it will fail because the LV isn't present and needs to be recreated. ie: ``` stderr: lsblk: ceph-block-8/block-8: not a block device stderr: blkid: error: ceph-block-8/block-8: No such file or directory stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected. usage: ceph-volume lvm prepare [-h] --data DATA [--data-size DATA_SIZE] [--data-slots DATA_SLOTS] [--filestore] [--journal JOURNAL] [--journal-size JOURNAL_SIZE] [--bluestore] [--block.db BLOCK_DB] [--block.db-size BLOCK_DB_SIZE] [--block.db-slots BLOCK_DB_SLOTS] [--block.wal BLOCK_WAL] [--block.wal-size BLOCK_WAL_SIZE] [--block.wal-slots BLOCK_WAL_SLOTS] [--osd-id OSD_ID] [--osd-fsid OSD_FSID] [--cluster-fsid CLUSTER_FSID] [--crush-device-class CRUSH_DEVICE_CLASS] [--dmcrypt] [--no-systemd] ceph-volume lvm prepare: error: Unable to proceed with non-existing device: ceph-block-8/block-8 ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1886534 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-10-19 15:07:32 -04:00
Benoît Knecht	8b0023cb77	ceph-osd: Fix check mode for start osds tasks Correctly set `osd_ids_non_container.stdout_lines` to an empty list if it's undefined (i.e. in check mode). Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2020-10-19 20:22:08 +02:00
Benoît Knecht	8f436ab5d8	ceph-mon: Fix check mode for deploy monitor tasks Skip the `get initial keyring when it already exists` task when both commands whose `stdout` output it requires have been skipped (e.g. when running in check mode). Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2020-10-19 20:22:08 +02:00
Gaudenz Steinlin	68cc93fb18	ceph-crash: Only deploy key to targeted hosts The current task installs the ceph-crash key to "most" hosts via "delegate_to". This key is only used by the ceph-crash daemon and should just be installed on all hosts targeted by this role. There is no need for using a delegated task. Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>	2020-10-19 16:54:06 +02:00
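
Several of the commits above (`acddf4fb67`, `88f91d8c12`, `ee50588590`, `b9cb0f12e9`) replace `ceph status` with a narrower command that returns only the structure a task reads. The sketch below illustrates that idea for the health check. It is not taken from the playbooks: the two payloads are hand-written samples shaped like the JSON the commit messages describe (a `health` structure nested in the status output, the same information at the top level of the health output), and the helper names are hypothetical.

```python
import json

# Hand-written sample payloads (assumptions for illustration only, not
# captured output). SAMPLE_STATUS mimics `ceph status -f json`, where the
# health information is nested under "health"; SAMPLE_HEALTH mimics
# `ceph health -f json`, where it is the whole document.
SAMPLE_STATUS = json.dumps({
    "fsid": "00000000-0000-0000-0000-000000000000",
    "health": {"status": "HEALTH_OK", "checks": {}},
    "quorum_names": ["mon0", "mon1", "mon2"],
})
SAMPLE_HEALTH = json.dumps({"status": "HEALTH_OK", "checks": {}})


def health_from_status(raw: str) -> str:
    """Read the health status out of a full status document."""
    return json.loads(raw)["health"]["status"]


def health_from_health(raw: str) -> str:
    """Read the health status out of a health-only document."""
    return json.loads(raw)["status"]


if __name__ == "__main__":
    # Both paths yield the same answer; the health-only payload is smaller,
    # so less data ends up registered in variables or facts.
    print(health_from_status(SAMPLE_STATUS), len(SAMPLE_STATUS))
    print(health_from_health(SAMPLE_HEALTH), len(SAMPLE_HEALTH))
```

The same pattern would apply to the quorum, PG, and OSD checks, each reading from the output of its own dedicated command rather than from the full status document.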

5498 Commits (94c37b9de89ffd93449e77f7a90ad50b700fd0db)