ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	b79c2070f4	nfs: set idmap config for Ceph-NFS Currently NFS Ganesha (ceph-nfs) consumes /etc/idmapd.conf, which controls mapping of user/owner identities under NFSv4+. With containerized service deployment, this file is an immutable part of the container image and cannot be modified. Here we provide group variables, and a taskk and templates for the ceph-nfs role, to set the path of the idmap configuration file and to make the most common adjustment to the contents of that file -- namely to set the 'Domain'. We default the path to /etc/ganesha/idmap.conf so that we will not conflict with /etc/idmapd.conf on the controller nodes where ganesha runs. NFSv4 clients, as used for example by the Cinder NFS driver, consume /etc/idmapd.conf and may require different settings than what is wanted for NFS Ganesha. Additionally, because we already bind /etc/ganesha from the host into the ceph-nfs container, the file NFS Ganesha consumes will no longer be an immutable part of the container. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925646 Signed-off-by: Tom Barron tpb@dyncloud.net Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `2db2208e40`)	2021-04-02 13:18:52 +02:00
Guillaume Abrioux	b2cf677b71	dashboard: support prometheus storage.tsdb.retention.time parameter This commit adds the parameter `--storage.tsdb.retention.time` to the prometheus systemd unit template. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1928000 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `b60c61ce45`)	2021-04-02 13:17:59 +02:00
Guillaume Abrioux	653d180ec0	defaults: add a comment about `igw_network` This add a quick documentation in ceph-defaults about `igw_network` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c5728bdc63`)	2021-03-29 11:24:28 +02:00
Guillaume Abrioux	fe47a02134	dashboard: support igw nodes with dedicated subnet This adds the possibility to deploy the dashboard with igw nodes using a dedicated subnet. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1926170 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c33de174f1`)	2021-03-26 21:26:14 +01:00
VasishtaShastry	58a28656ff	Peer addition won't be skipped if remote is not in peer rbd-mirroring is not configured as adding peer is getting skipped. Peer addition should not get skipped if its not added already Closes - https://bugzilla.redhat.com/show_bug.cgi?id=1942444 Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com> (cherry picked from commit `006998e804`)	2021-03-26 19:14:35 +01:00
Guillaume Abrioux	9780490b2f	convert some missed `ansible_`` calls to `ansible_facts['']` This converts some missed calls to `ansible_*` that were missed in initial PR #6312 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0163ecc924`)	2021-03-26 00:16:58 +01:00
Alex Schultz	6229b3bdba	Disable facts by default in ansible.cfg As a continuation of `a7f2fa73e6`, this change switches fact injection to off by default in the provided ansible.cfg. Signed-off-by: Alex Schultz <aschultz@redhat.com> (cherry picked from commit `db031a4993`)	2021-03-26 00:16:58 +01:00
Alex Schultz	7ddbe74712	Use ansible_facts It has come to our attention that using ansible_* vars that are populated with INJECT_FACTS_AS_VARS=True is not very performant. In order to be able to support setting that to off, we need to update the references to use ansible_facts[<thing>] instead of ansible_<thing>. Related: ansible#73654 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406 Signed-off-by: Alex Schultz <aschultz@redhat.com> (cherry picked from commit `a7f2fa73e6`)	2021-03-26 00:16:58 +01:00
Guillaume Abrioux	7fd332e7fe	iscsi: fetch right repo from shaman due to recent changes in shaman, we must fetch the right repo by filtering on the desired architecture. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5801171b37`)	2021-03-25 14:11:11 +01:00
Guillaume Abrioux	bbf8b2fdf6	facts: fix nfs/external cluster scenario These tasks shouldn't be run when at least 1 monitor isn't present in the inventory. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1937997 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ccd1cbb732`)	2021-03-18 06:41:00 +01:00
Guillaume Abrioux	dc2a11ce3f	config: reset num_osds When collocating OSDs with other daemon, `num_osds` is incorrectly calculated because `ceph-config` is called multiple times. Indeed, the following code: ``` num_osds: "{{ lvm_list.stdout \| default('{}') \| from_json \| length \| int + num_osds \| default(0) \| int }}" ``` makes `num_osds` be incremented each time `ceph-config` is called. We have to reset it in order to get the correct number of expected OSDs. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `31a0f2653d`)	2021-03-17 17:35:52 +01:00
Dimitri Savineau	6921aafb2b	debian/uca: remove the handler notification The "update apt cache" in the ceph-handler role was never called and the handler trigger after adding the uca repository doesn't exist at all. Instead of using a handler for that we can just set the update_cache parameter to true like the other apt_repository tasks. Resolve merge conflict from cherry-picking this commit. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `09d6706697`)	2021-03-11 22:06:11 +01:00
Dimitri Savineau	735965ef9c	ceph-common: enable rhcs tools repo for monitoring The monitoring node running grafana needs the rhcs tools repostory enabled in non containerized deployment to be able to install the ceph-grafana-dashboards rpm package. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `e4dd0067c6`)	2021-03-11 13:52:21 +01:00
Tyler Bishop	ba76102952	facts: support device aliases for (dedicated\|bluestore_wal)_devices Just likve `devices`, this commit adds the support for linux device aliases for `dedicated_devices` and `bluestore_wal_devices`. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1919084 Signed-off-by: Tyler Bishop <tbishop@liquidweb.com> (cherry picked from commit `ee4b8804ae`)	2021-03-11 13:51:19 +01:00
Guillaume Abrioux	e3165f9a07	mon: fix cephx disabled deployment Due to missing condition on `cephx` variable, cephx disabled deployments are broken. This commit fixes this. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1910151 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4af0845702`)	2021-03-11 13:51:04 +01:00
Guillaume Abrioux	241418409d	common: ensure shaman returns right repo Due to recent changes in shaman, there's a chance it returns the wrong repository from architecture point of view. We can query shaman and ask for the correct architecture to get around this. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `39649f0ce8`)	2021-03-10 16:43:04 +01:00
Matthew Vernon	fdf437743c	Fix typo and broken link for documenting RGW frontends http://docs.ceph.com/docs/nautilus/radosgw/frontends/ 404s so replace it with a working "latest" docs link, and correct the spelling of "additional" while I'm at it. Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk> (cherry picked from commit `847611048e`)	2021-03-03 14:20:26 +01:00
Guillaume Abrioux	c3304c213b	dashboard: add missing parameter in `ceph_cmd` the `ceph_cmd` fact is missing the `--net=host` parameter. Some tasks consuming this fact can fail like following: ``` Error: error configuring network namespace for container b8ec913db1fb694ae683faf202680de7a59c714a004e533aba87e8503d29261f: Missing CNI default network ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1931365 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `f143b1a647`)	2021-03-03 12:57:08 +01:00
Guillaume Abrioux	b5d082c4bc	rgw: fix a typo in multisite if `rgw_zonegroupmaster` is not defined at the rgw instance level in `rgw_instances` it will fallback to a wrong variable (`rgw_zonemaster`). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925247 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `931b87e830`)	2021-02-10 08:32:24 +01:00
Guillaume Abrioux	920f07514a	rgw: quick fix in create_zone_user.yml typical error: ``` 2021-02-01 03:11:09,809 p=93834 u=cephuser n=ansible \| TASK [ceph-rgw : check if the realm system user already exists] ************************************************************************************************************************************************* 2021-02-01 03:11:09,809 p=93834 u=cephuser n=ansible \| Monday 01 February 2021 03:11:09 -0500 (0:00:00.084) 0:14:38.607 ***** 2021-02-01 03:11:09,836 p=93834 u=cephuser n=ansible \| fatal: [ceph-kvm-ms2-1611241931591-node7-rgw]: FAILED! => msg: \|- The task includes an option with an undefined variable. The error was: 'None' has no attribute 'realm' ``` This task should be skipped when `zone_users` is undefined. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1922998 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-01 11:28:57 -05:00
Dimitri Savineau	6278c5a4e3	ceph-mon: add ExecStartPre docker stop to systemd We already do that in the other systemd templates (mgr, mds, etc..) and would present to add workaround in other orchestration tool. This change is for containerized deployment only. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1882724 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `3749d297c7`)	2021-01-29 12:00:14 -05:00
Guillaume Abrioux	aeee3471e3	rgw: avoid useless call to ceph-rgw since `ceph-rgw` may be called from `ceph-handler` in some contexts we should avoid rerunning it unnecessarily. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `8617081664`)	2021-01-28 16:37:50 -05:00
Guillaume Abrioux	b903446fa4	containers: use --cpus instead --cpu-quota When using docker 1.13.1, the current condition: ``` {% if (container_binary == 'docker' and ceph_docker_version.split('.')[0] is version_compare('13', '>=')) or container_binary == 'podman' -%} ``` is wrong because it compares the first digit (1) whereas it should compare the second one. It means we always use `--cpu-quota` although documentation recommend using `--cpus` when docker version is 1.13.1 or higher. From the doc: > --cpu-quota=<value> Impose a CPU CFS quota on the container. The number of > microseconds per --cpu-period that the container is limited to before > throttled. As such acting as the effective ceiling. > If you use Docker 1.13 or higher, use --cpus instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3e262e072b`)	2021-01-28 16:37:50 -05:00
Guillaume Abrioux	14267fe0c4	rgw: multisite refact Add the possibility to deploy rgw multisite configuration with a mix of secondary and primary zones on a same rgw node. Before that, on a same node, all instances were either primary zones OR secondary. Now you can define a rgw instance like following: ``` rgw_instances: - instance_name: 'rgw0' rgw_zonemaster: false rgw_zonesecondary: true rgw_zonegroupmaster: false rgw_realm: 'france' rgw_zonegroup: 'zonegroup-france' rgw_zone: paris-00 radosgw_address: "{{ _radosgw_address }}" radosgw_frontend_port: 8080 rgw_zone_user: jacques.chirac rgw_zone_user_display_name: "Jacques Chirac" system_access_key: P9Eb6S8XNyo4dtZZUUMy system_secret_key: qqHCUtfdNnpHq3PZRHW5un9l0bEBM812Uhow0XfB endpoint: http://192.168.101.12:8080 ``` Basically it's now possible to define `rgw_zonemaster`, `rgw_zonesecondary` and `rgw_zonegroupmaster` at the intsance level instead of the whole node level. Also, this commit adds an option `deploy_secondary_zones` (default True) which can be set to `False` in order to explicitly ask the playbook to not deploy secondary zones in case where the corresponding endpoint are not deployed yet. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1915478 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `71a5e666e3`)	2021-01-28 16:37:50 -05:00
Dimitri Savineau	07d2160421	dashboard: manage password backward compatibility The ceph dashboard changed the way the password are provided via the CLI. This breaks the backward compatibility when using a recent ceph-ansible version with ceph release without that feature. This patch adds tasks for legacy workflow (ceph release without that feature) in ceph-dashboard role. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1915506 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-19 18:05:02 +01:00
Guillaume Abrioux	623ca14682	dashboard: configure passwords via stdin Due to recent changes in ceph, the few dashboard passwors must be passed via `-i` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ef975ef5ea`)	2021-01-19 18:05:02 +01:00
Mike Currin	360a2d2b30	Path for ceph config missing in crash template The path where ceph.conf is located (/etc/ceph) missing in the Docker container bind mounts, this throws errors Signed-off-by: Mike Currin <currin@gmail.com> (cherry picked from commit `4cbc9a48c9`)	2021-01-06 16:55:39 +01:00
Guillaume Abrioux	290d3ef369	rgw: support switching from single-site to multisite When collocating rgw with either a mon, mgr or osd, switching from single site to a multisite rgw setup failed because of the handlers triggered between the ansible play of the collocated daemon and the play of the rgw. Since the multisite changes are not yet applied the handlers fail. The idea here is to ensure we run the multisite configuration from the ceph-handler role before the restart happens, this way it won't complain because of non existing multisite configuration. (Note: this is also valid when simply changing a multisite configuration) Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1888630 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `513c8cfe55`)	2021-01-06 10:38:50 -05:00
Guillaume Abrioux	607ef5a7d2	common: do not use pipefail when not needed Let's discard the ansible lint error 306 and add a "# noqa 306" on tasks where we don't need `set -o pipefail` Fixes: #6090 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `86a8889ee3`)	2020-12-16 14:05:45 +01:00
Guillaume Abrioux	6855feb604	ceph-osd: refact `docker_exec_start_osd` This commit drops nested jinja construction in this set_fact task. It also rename it to `container_exec_start_osd` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ff95fa9c32`)	2020-12-16 14:05:45 +01:00
Dimitri Savineau	24a5b1bbb5	ceph-config: fix ceph-volume lvm batch report Since the major ceph-volume lvm batch refactoring, the report value is different. Before the refact, the report was a dict with the OSDs list to be created under the "osds" key. After the refact, the report is a list of dict. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `827b23353f`)	2020-12-15 17:26:01 -05:00
Dimitri Savineau	3f16132e44	library: add ceph_osd_flag module This adds ceph_osd_flag ansible module for replacing the command module usage with the ceph osd set/unset commands. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `5da593604a`)	2020-12-15 17:36:28 +01:00
Dimitri Savineau	e51f68fdbb	ceph-iscsi: set the pool name in the config file When using a custom pool for iSCSI gateway then we need to set the pool name in the configuration otherwise the default rbd pool name will be used. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `40a87c4b92`)	2020-12-15 17:33:24 +01:00
Guillaume Abrioux	63fa4c9484	containers: modify bindmount option This commit changes the bind mount option for the mount point `/var/lib/ceph` in the systemd template for mon and mgr containers. This is needed in case of collocating mon/mgr with osds using dmcrypt scenario. Once mon/mgr got converted to containers, the dmcrypt layer sub mount is still seen in `/var/lib/ceph`. For some reason it makes the corresponding devices busy so any other container can't open/close it. As a result, it prevents osds from starting properly. Since it only happens on the nodes converted before the OSD play, the idea is to bind mount `/var/lib/ceph` on mon and mgr with the `rshared` option so once the sub mount is unmounted, it is propagated inside the container so it doesn't see that mount point. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896392 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `f5ba6d9b01`)	2020-12-15 17:33:11 +01:00
Dimitri Savineau	fa06752e4b	alertmanager/prometheus: fix owner/group Set the owner/group on alertmanager and prometheus directories and files to nobody and nogroup (uid and gid 65534) to avoid permission issues. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1901543 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `eb452d35bc`)	2020-12-15 17:32:50 +01:00
Guillaume Abrioux	69b5b96f2d	osd: add tag on 'wait for all osd to be up' task This allows skipping this task if really desired. Use it carefully. Use it at your own risk. Fixes: #6073 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5c4ae5356d`)	2020-12-15 17:32:09 +01:00
Jukka Nousiainen	dca1534ee6	ceph-mon: No become during gen mon initial keyring Since the backing generate_secret() just hands out urandom output, running as privileged doesn't seem to be required. It's not desireable to provide sudo in some Ansible runner environments. Signed-off-by: Jukka Nousiainen <jukka.nousiainen@csc.fi> (cherry picked from commit `eb7473491b`)	2020-12-15 17:31:37 +01:00
Dimitri Savineau	9858d61a57	Revert "config: Always use osd_memory_target if set" This reverts commit `4d1fdd2b05`. This breaks the backward compatibility with previous osd_memory_target calculation and we could have a value lower than the minimum value allowed (896M) which causes some ceph commands to fail (like ceph assimilate-conf). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `aa6e1f20ea`)	2020-12-15 17:31:09 +01:00
Seena Fallah	2485b35825	ceph-osd: use global crush_device_class in lvm_volumes Use global crush_device_class variable if it's not set per OSD Signed-off-by: Seena Fallah <seenafallah@gmail.com> (cherry picked from commit `5e9444fa5c`)	2020-12-15 17:30:55 +01:00
Karl-Heinz Preuß	00793c9221	fix broken ceph-fetch-keys role set fetch_directory variable in default/main.yml instead of using the defaults jinja filter in tasks/main.yml. Fixes: #6072 Signed-off-by: Karl-Heinz Preuß <karl-heinz.preuss@cms.hu-berlin.de> (cherry picked from commit `6ce34ef59f`)	2020-12-15 17:30:42 +01:00
Dimitri Savineau	f18142fc2e	group_vars: remove useless files Delete legacy files that aren't used anymore. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `e790b0851d`)	2020-12-15 17:30:42 +01:00
Guillaume Abrioux	1fcf71dc33	common: drop `fetch_directory` feature This commit drops the `fetch_directory` feature. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `1cc9666c09`)	2020-12-15 17:30:42 +01:00
Guillaume Abrioux	dc7b9519f4	ceph-config: ceph.conf rendering refactor This commit cleans up the `main.yml` task file of `ceph-config`. It drops the local ceph.conf generation. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `900c0f4492`)	2020-12-15 17:30:42 +01:00
Guillaume Abrioux	d14723d5b4	mon: refact initial keyring generation adding monitor is no longer possible because we generate a new mon keyring each time the playbook is run. Fixes: #5864 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1902281 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `970c6a4ee6`)	2020-12-01 09:53:26 -05:00
Dimitri Savineau	f917bb015c	ceph_key: set state as optional Most ansible module using a state parameter default to the present value (when available) instead of using it as a mandatory option. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `abb4023d76`)	2020-12-01 09:53:26 -05:00
Guillaume Abrioux	0d22598806	iscsigw: remove `--cap-add=all` from `podman run` cmd As of podman `2.0.5`, `--cap-add` and `--privileged` are exclusive options. ``` Nov 30 13:56:30 magna089 podman[171677]: Error: invalid config provided: CapAdd and privileged are mutually exclusive options ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1902149 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d40dd764e0`)	2020-12-01 09:53:15 -05:00
Guillaume Abrioux	ef154613c8	container: remove `--ignore` from `podman rm` command As of podman 2.0.5, `--ignore` param conflicts with `--storage`. ``` Nov 30 13:53:10 magna089 podman[164443]: Error: --storage conflicts with --volumes, --all, --latest, --ignore and --cidfile ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c68b124ba8`)	2020-12-01 09:53:15 -05:00
Guillaume Abrioux	fe699897ed	common: add a default value for ceph_directories_mode Since this variable makes it possible to customize the mode for ceph directories, let's make it a bit more explicit by adding a default value in ceph-defaults. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `483adb5d79`)	2020-11-19 21:14:02 -05:00
Guillaume Abrioux	0efc347a67	osd: ensure /var/lib/ceph/osd/{cluster}-{id} is present This commit ensures that the `/var/lib/ceph/osd/{{ cluster }}-{{ osd_id }}` is present before starting OSDs. This is needed specificly when redeploying an OSD in case of OS upgrade failure. Since ceph data are still present on its devices then the node can be redeployed, however those directories aren't present since they are initially created by ceph-volume. We could recreate them manually but for better user experience we can ask ceph-ansible to recreate them. NOTE: this only works for OSDs that were deployed with ceph-volume. ceph-disk deployed OSDs would have to get those directories recreated manually. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1898486 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `873fc8ec0f`)	2020-11-19 21:14:02 -05:00
Dimitri Savineau	76a77f1c92	ceph-facts: fix read osd pool default crush fact We don't need to use run_once on that task when having running monitors otherwise the read task could be skip and the set task will fail. The conditional check 'crush_rule_variable.rc == 0' failed. The error was: error while evaluating conditional (crush_rule_variable.rc == 0): 'dict object' has no attribute 'rc' Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1898856 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `e150df789e`)	2020-11-18 17:01:14 -05:00
Guillaume Abrioux	ce86d695c2	container: force rm --storage on ExecStartPre This is a workaround to avoid error like following: ``` Error: error creating container storage: the container name "ceph-mgr-magna022" is already in use by "4a5f674e113f837a0cc561dea5d2cd55d16ca159a647b7794ab06c4c276ef701" ``` that doesn't seem to be 100% reproducible but it shows up after a reboot. The only workaround we came up with at the moment is to run `podman rm --storage <container>` before starting it. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1887716 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5ba7824c55`)	2020-11-16 16:37:46 -05:00
Gaudenz Steinlin	0f679e7b20	config: Always use osd_memory_target if set The osd_memory_target variable was only used if it was higher than the calculated value based on the number of OSDs. This is changed to always use the value if it is set in the configuration. This allows this value to be intentionally set lower so that it does not have to be changed when more OSDs are added later. Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch> (cherry picked from commit `4d1fdd2b05`)	2020-11-16 09:21:01 +01:00
Benoît Knecht	2ea3db269e	ceph-facts: Fix osd_pool_default_crush_rule fact The `osd_pool_default_crush_rule` is set based on `crush_rule_variable`, which is the output of a `grep` command. However, two consecutive tasks can set that variable, and if the second task is skipped, it still overwrites the `crush_rule_variable`, leading the `osd_pool_default_crush_rule` to be set to `ceph_osd_pool_default_crush_rule` instead of the output of the first task. This commit ensures that the fact is set right after the `crush_rule_variable` is assigned, before it can be overwritten. Closes #5912 Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `c5f7343a2f`)	2020-11-13 10:42:13 -05:00
Guillaume Abrioux	aadef08311	dashboard: change dashboard_grafana_api_no_ssl_verify default value This sets the `dashboard_grafana_api_no_ssl_verify` default value according to the length of `dashboard_crt` and `dashboard_key`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5cadfea42e`)	2020-11-05 09:02:15 +01:00
Guillaume Abrioux	c9b7e46213	dashboard: enable https by default see linked bz for details Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1889426 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `767d3c898e`)	2020-11-05 09:02:15 +01:00
Gaudenz Steinlin	f344a4810c	osd: Fix number of OSD calculation If some OSDs are to be created and others already exist the calculation only counted the to be created OSDs. This changes the calculation to take all OSDs into account. Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch> (cherry picked from commit `15044da030`)	2020-11-03 21:52:44 -05:00
Dimitri Savineau	bcd2797d11	rgw/rbdmirror: use service dump instead of ceph -s The ceph status command returns a lot of information stored in variables and/or facts which could consume resources for nothing. When checking the rgw/rbdmirror services status, we're only using the servicmap structure in the ceph status output. To optimize this, we could use the ceph service dump command which contains the same needed information. This command returns less information and is slightly faster than the ceph status command. $ ceph status -f json \| wc -c 2001 $ ceph service dump -f json \| wc -c 1105 $ time ceph status -f json > /dev/null real 0m0.557s user 0m0.516s sys 0m0.040s $ time ceph service dump -f json > /dev/null real 0m0.454s user 0m0.434s sys 0m0.020s Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `3f9081931f`)	2020-11-03 14:38:49 -05:00
Dimitri Savineau	69b51b5f19	monitor: use quorum_status instead of ceph status The ceph status command returns a lot of information stored in variables and/or facts which could consume resources for nothing. When checking the quorum status, we're only using the quorum_names structure in the ceph status output. To optimize this, we could use the ceph quorum_status command which contains the same needed information. This command returns less information. $ ceph status -f json \| wc -c 2001 $ ceph quorum_status -f json \| wc -c 957 $ time ceph status -f json > /dev/null real 0m0.577s user 0m0.538s sys 0m0.029s $ time ceph quorum_status -f json > /dev/null real 0m0.544s user 0m0.527s sys 0m0.016s Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `88f91d8c12`)	2020-11-03 14:38:49 -05:00
wangxiaotong	8bc0806f10	osds: use ceph osd stat instead of ceph status Improve the checked way of the OSD created checking process. This replaces the ceph status command by the ceph osd stat command. The osdmap structure isn't needed anymore. $ ceph status -f json \| wc -c 2001 $ ceph osd stat -f json \| wc -c 132 $ time ceph status -f json > /dev/null real 0m0.563s user 0m0.526s sys 0m0.036s $ time ceph osd stat -f json > /dev/null real 0m0.457s user 0m0.411s sys 0m0.045s Signed-off-by: wangxiaotong <wangxiaotong@fiberhome.com> (cherry picked from commit `b9cb0f12e9`)	2020-11-03 14:38:49 -05:00
Guillaume Abrioux	61001660bb	common: follow up on #5948 In addition to `f7e2b2c608` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `371d854a5c`)	2020-11-03 09:52:42 +01:00
Benoît Knecht	4a7186697e	ceph-mon: Don't set monitor directory mode recursively After rolling updates performed with `infrastructure-playbooks/rolling_updates.yml`, files located in `/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }}` had mode 0755 (including the keyring), making them world-readable. This commit separates the task that configured permissions recursively on `/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }}` into two separate tasks: 1. Set the ownership and mode of the directory itself; 2. Recursively set ownership in the directory, but don't modify the mode. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `0d76826bbb`)	2020-11-02 18:42:31 -05:00
Gaudenz Steinlin	a1ff05b26e	openstack: use ceph_keyring_permissions by default Otherwise this task fails if no permission is set on the item. Previously the code omited the mode parameter if it was not set, but this was lost with commit `ab370b6ad8`. Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch> (cherry picked from commit `79ff79c422`)	2020-11-02 18:42:18 -05:00
Dimitri Savineau	f344fe6f92	podman: force log driver to journald Since we've changed to podman configuration using the detach mode and systemd type to forking then the container logs aren't present in the journald anymore. The default conmon log driver is using k8s-file. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1890439 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `16cd183b9c`)	2020-11-02 17:46:48 -05:00
Dimitri Savineau	34a310e5f4	ceph-handler: fix curl ipv6 command with rgw When using the curl command with ipv6 address and brackets then we need to use the -g option otherwise the command fails. $ curl http://[fdc2:328:750b:6983::6]:8080 curl: (3) [globbing] error: bad range specification after pos 9 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `cdb7b09cd7`)	2020-11-02 17:44:22 -05:00
Guillaume Abrioux	12e06d07c8	iscsi: fix ownership on iscsi-gateway.cfg This file is currently deployed with '0644' ownership making this file readable by any user on the system. Since it contains sensitive information it should be readable by the owner only. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1890119 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `a822f77300`)	2020-10-21 18:27:59 -04:00
Benoît Knecht	d165dc676d	ceph-osd: Fix check mode for start osds tasks Correctly set `osd_ids_non_container.stdout_lines` to an empty list if it's undefined (i.e. in check mode). Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `8b0023cb77`)	2020-10-21 13:19:49 +02:00
Benoît Knecht	dd51ca530c	ceph-mon: Fix check mode for deploy monitor tasks Skip the `get initial keyring when it already exists` task when both commands whose `stdout` output it requires have been skipped (e.g. when running in check mode). Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `8f436ab5d8`)	2020-10-21 13:19:49 +02:00
Guillaume Abrioux	0a917b3917	crash: refact caps definition there is no need to use `{{ }}` syntax here. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `a8bd947c7d`)	2020-10-20 09:09:30 +02:00
Gaudenz Steinlin	950ad22f72	ceph-crash: Only deploy key to targeted hosts The current task installs the ceph-crash key to "most" hosts via "delegate_to". This key is only used by the ceph-crash daemon and should just be installed on all hosts targeted by this role. There is no need for using a delegated task. Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch> (cherry picked from commit `68cc93fb18`)	2020-10-19 20:20:46 +02:00
Guillaume Abrioux	6a521155d5	ceph-osd: start osd after systemd overrides The service should be started after the ceph-osd systemd overrides has been added, otherwise, the latter isn't considered. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1860739 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `59d0f01992`)	2020-10-15 09:34:58 -04:00
Dimitri Savineau	7c337d96d2	ceph-osd: don't start the OSD services twice Using the + operation on two lists doesn't filter out the duplicate keys. Currently each OSDs is started (via systemd) twice. Instead we could use the union filter. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `4eaa65c362`)	2020-10-14 10:00:21 -04:00
Guillaume Abrioux	709deb90cc	handler: refact check_socket_non_container the `stat --printf=%n` returns something like following: ``` ok: [osd0] => changed=false cmd: \|- stat --printf=%n /var/run/ceph/ceph-osd*.asok delta: '0:00:00.009388' end: '2020-10-06 06:18:28.109500' failed_when_result: false rc: 0 start: '2020-10-06 06:18:28.100112' stderr: '' stderr_lines: <omitted> stdout: /var/run/ceph/ceph-osd.2.asok/var/run/ceph/ceph-osd.5.asok stdout_lines: <omitted> ``` it makes the next task "check if the ceph osd socket is in-use" grep like this: ``` ok: [osd0] => changed=false cmd: - grep - -q - /var/run/ceph/ceph-osd.2.asok/var/run/ceph/ceph-osd.5.asok - /proc/net/unix ``` which will obviously fail because this path never exists. It makes the OSD handler broken. Let's use `find` module instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `46d4d97da9`)	2020-10-14 10:31:05 +02:00
Benoît Knecht	69a6053114	Fix Ansible check mode for site.yml.sample playbook Make sure the `site.yml.sample` playbook can be run in check mode by skipping tasks that try to read the output of commands that have been skipped. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `54ba38e35e`)	2020-10-07 07:06:54 +02:00
Guillaume Abrioux	52826caa51	rgw: fix multi instances scaleout in baremetal When rgw and osd are collocated, the current workflow prevents from scaling out the radosgw_num_instances parameter when rerunning the playbook in baremetal deployments. When ceph-osd notifies handlers, it means rgw handlers are triggered too. The issue with this is that they are triggered before the role ceph-rgw is run. In the case a scaleout operation is expected on `radosgw_num_instances` it causes an issue because keyrings haven't been created yet so the new instances won't start. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881313 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `a802fa2810`)	2020-10-06 09:21:58 -04:00
Seena Fallah	eebed2990d	ceph-facts: add get default crush rule from running monitor In case of deploying new monitor node to an existing cluster, osd_pool_default_crush_rule should be taken from running monitor because ceph-osd role won't be run and the new monitor will have different osd_pool_default_crush_role from other monitors. Signed-off-by: Seena Fallah <seenafallah@gmail.com> (cherry picked from commit `ff9f4d138f`)	2020-09-29 16:38:38 +02:00
Ali Maredia	b753e7db15	rgw multisite: check connection for realm endpoint This commit adds connection checks before realm pulls Curls are performed on the endpoint being pulled from the mons and the rgws Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1731158 Signed-off-by: Ali Maredia <amaredia@redhat.com> (cherry picked from commit `902575369c`)	2020-09-29 16:33:20 +02:00
Dimitri Savineau	fabaec6351	ceph-handler: set handler on xxx_stat result In non containerized deployment we check if the service is running via the socket file presence. This is done via the xxx_socket_stat variable that check the file socket in the /var/run/ceph/ directory. In some scenarios, we could have the socket file still present in that directory but not used by any process. That's why we have the xxx_stat variable which clean those leftovers. The problem here is that we're set the variable for the handlers status (like handler_mon_status) based on xxx_socket_stat instead of xxx_stat. That means we will trigger the handlers if there's an old socket file present on the system without any process associated. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1866834 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `733596582d`)	2020-09-29 16:33:08 +02:00
Seena Fallah	0dd5036f6c	ceph-facts: check for mon socket in its own host delegate to its own host after checking mon socket to findout if mon socket is in-use or not. Signed-off-by: Seena Fallah <seenafallah@gmail.com> (cherry picked from commit `69f7e35382`)	2020-09-29 16:32:54 +02:00
Guillaume Abrioux	f9a6f775e9	mds: support enabling pg autoscaler on rerun This commit add the pg autoscaler enablement support on ceph-ansible rerun. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1836431 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-09-29 16:32:29 +02:00
Dimitri Savineau	7ffd3baa95	ceph-config: remove ceph_release from ceph.conf.j2 We don't use ceph_release variable in the ceph.conf jinja template. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `62bd41f0d4`)	2020-09-29 16:32:17 +02:00
Dmitriy Rabotyagov	6d5c74aa98	Remove libjemalloc1 installation task libjemalloc1 package is not required neither for ganesha dependency nor for the package build process. So this task can be simply dropped. Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@ya.ru> (cherry picked from commit `297532ca41`)	2020-09-29 16:30:36 +02:00
Guillaume Abrioux	f9d4eb8b41	facts: refact `ceph_uid` fact There's no need to set this fact with a `set_fact` We can achieve this in `ceph-defaults` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1875058 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `bcc673f66c`)	2020-09-21 13:49:03 -04:00
Dimitri Savineau	1385d2fdd0	ceph-facts: move facts to defaults value There's no need to define a variable via a fact if we can do it via a default value. Using a fact could be interesseting to override the default value on some condition. - ceph_uid could be set to 167 by default because it's only different on non containerized deployment on Debian/Ubuntu. - rbd_client_directory_{owner,group,mode} could be set to ceph,ceph,0770 by default install of null as we are doing in the facts. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1875058 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `7f997e623a`)	2020-09-21 13:49:03 -04:00
Dimitri Savineau	9412c44906	container: quote registry password When using a quote in the registry password then we have the following error: The error was: ValueError: No closing quotation To fix this we need to use the quote filter. Close: https://bugzilla.redhat.com/show_bug.cgi?id=1880252 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `6dcfdf17d4`)	2020-09-18 15:21:32 -04:00
Guillaume Abrioux	1527b9b12a	facts: fix 'set_fact rgw_instances with rgw multisite' the current condition doesn't work, as soon as the first iteration is done the condition makes next iterations skip since `rgw_instances` got set with the first iteration. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859872 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ff19c1d851`)	2020-09-18 10:35:28 -04:00
Dimitri Savineau	195ce88e26	ceph-infra: include iscsi nodes for logrotate The iscsi nodes aren't included in the logrotate condition. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `85643edfe3`)	2020-09-17 14:49:56 -04:00
Guillaume Abrioux	c60a7ad4f6	infra: support log rotation for tcmu-runner This commit adds the log rotation support for tcmu-runner. ceph-container related PR: ceph/ceph-container#1726 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1873915 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `f576c02ff7`)	2020-09-16 22:37:18 -04:00
Dimitri Savineau	fbc375387a	container: add optional http(s) proxy option When using a http(s) proxy with either docker or podman we can rely on the HTTP_PROXY, HTTPS_PROXY and NO_PROXY environment variables. But with ansible, even if those variables are defined in a source file then they aren't loaded during the container pull/login tasks. This implements the http(s) proxy support with docker/podman. Both implementations are different: 1/ docker doesn't rely en the environment variables with the CLI. Thos are needed by the docker daemon via systemd. 2/ podman uses the environment variables so we need to add them to the login/pull tasks. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876692 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `bda3581294`)	2020-09-16 11:32:24 -04:00
Dimitri Savineau	13fb83fc93	ceph-prometheus: update pool stat counter Since [1] The bytes_used pool counter in prometheus has been renamed to stored. Closes: #5781 [1] https://github.com/ceph/ceph/commit/71fe9149 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `e54b924eaf`)	2020-09-16 10:08:54 -04:00
Dimitri Savineau	fd0b9491b6	ansible: bump to ansible 2.9 Prior this commit we were supporting both ansible 2.8 and 2.9. Let's drop 2.8 now. Closes: #5459 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1879178 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-09-15 13:13:09 -04:00
Dimitri Savineau	5cbbc904c1	node-exporter: exclude client nodes We don't need to install node-exporter on client node because there's no ceph services running on them. This also makes sure we use the group name variables in the prometheus service template instead of hardcoding the values. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `b105549ed8`)	2020-09-14 16:13:25 -04:00
Guillaume Abrioux	edb7bdd911	Revert "Make 'disable ssl for dashboard task' idempotent." This reverts commit `f607857f2a`. > That commit [1] introduced a regression in the dashboard configuration > because the ceph config get mgr xxxx command doesn't work with > nautilus. > In that release the get operation needs an entity. > [1] `f607857` Signed-off-by: Dimitri Savineau dsavinea@redhat.com	2020-09-11 09:37:23 -04:00
Guillaume Abrioux	44e3195ded	facts: refact and optimize memory consumption there's no need to run this task on all nodes. This uses too much memory for nothing. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1856981 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `f0fe193d8e`)	2020-09-11 09:37:23 -04:00
Guillaume Abrioux	448f36fbbd	config: only add related rgw section there's no need to add each rgw section on all rgw nodes. With this commit, only related rgw section are rendered. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0a581a6e60`)	2020-09-10 20:55:07 -04:00
Dimitri Savineau	6177a87185	ceph-iscsi: remove python rtslib shaman repository The rtslib python library is now available in the distribution so we shouldn't have to use the shaman repository Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `254ab54f80`)	2020-09-10 20:38:34 -04:00
Dimitri Savineau	47f24ec047	Add CentOS 8 support for rpm deployment We were only supporting CentOS 8 for containerized deployment. Since Nautilus 14.2.10 we now have el8 rpm packages so we should be able to deploy a nautilus ceph cluster with el8. Note that the nfs-ganesha isn't supported because there's no el8 rpm packages for nfs-ganesha V2.8. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-09-10 20:38:34 -04:00
Niko Smeds	67d505af82	Enable HAProxy backend checks for Ceph RGW Add the `check` option to server definitions to enable basic HAProxy health checks for Ceph RADOS gateway backends. Currently traffic will be forwarded to unhealthly `radosgw.service` servers. These changes resolve the issue. Signed-off-by: Niko Smeds nikosmeds@gmail.com (cherry picked from commit `a951c1a3f0`)	2020-09-10 20:38:01 -04:00
Guillaume Abrioux	97a2640714	dashboard: refact admin user creation task this commit splits this task in order to avoid using a `shell` module. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `54d3e9650f`)	2020-09-10 20:37:42 -04:00
George Shuklin	f607857f2a	Make 'disable ssl for dashboard task' idempotent. This should reduce number of 'changed' tasks during convergence test. Signed-off-by: George Shuklin <george.shuklin@gmail.com> (cherry picked from commit `73d4bb6bd6`)	2020-09-10 20:37:26 -04:00
Rafał Wądołowski	db71eabeef	Comment out ceph_custom_key Since there is a check if ceph_custom_key is defined, there is no reason to define it by default. Signed-off-by: Rafał Wądołowski <rwadolowski@cloudferro.com> (cherry picked from commit `55cd6e83e4`)	2020-09-10 20:37:15 -04:00

1 2 3 4 5 ...

2751 Commits (c67bfe84eb94a3432dab76a0ca97a71d2a1b20b9)