ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	b9dd253a4f	purge: rm service-cid files This commit makes sure the purge playbooks remove those files if, for any reason, they have been left behind. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1920900 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-12 10:01:31 +01:00
Guillaume Abrioux	980a5a7df4	switch2container: do not serialize the ceph-crash migration There's no need to slow down the playbook execution time by migrating all the `ceph-crash` instances in a serial way. Let's remove the `serial: 1` so the migration is achieved in a parallel way. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-11 21:36:23 +01:00
Guillaume Abrioux	682116023d	tests: increase `mon_max_pg_per_osd` We aren't deploying enough OSD daemons, so it fails as follows: ``` stderr: 'Error ERANGE: pool id 10 pg_num 256 size 2 would mean 1536 total pgs, which exceeds max 1500 (mon_max_pg_per_osd 250 * num_in_osds 6)' ``` Let's increase the value of `mon_max_pg_per_osd` in order to get around this issue in the CI. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-11 16:35:55 +01:00
Guillaume Abrioux	4e95180c80	doc: add a note about "latest" tags See the change for details. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-11 14:07:33 +01:00
Guillaume Abrioux	26acaf9eeb	mergify: add stable-6.0 backport configuration This adds the stable-6.0 backport configuration in mergify. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-10 14:57:01 +01:00
Dimitri Savineau	950a6ae406	cephadm-adopt: remove prometheus workaround This was fixed by [1][2] [1] https://tracker.ceph.com/issues/45120 [2] https://github.com/ceph/ceph/commit/252d4b30 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-10 13:51:41 +01:00
Dimitri Savineau	d42d584085	doc: update containerized deployment This adds more documentation on the configuration and usage of containerized deployment. Closes: #6198 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-10 13:50:53 +01:00
Guillaume Abrioux	7e5071856c	doc: update the documentation - mention `stable-6.0` requirements. - update some patterns. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-10 13:50:10 +01:00
Dimitri Savineau	48a456dc8c	rolling_update: enforce ceph-container-engine When running the rolling_update.yml playbook and adding the dashboard component at the same time, the requirements (like container packages) aren't installed. This could lead to a failure when using authentication on the container registry, because the playbook will try to log in to the registry while podman/docker aren't installed yet. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1903504 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-10 08:17:11 +01:00
Dimitri Savineau	e4dd0067c6	ceph-common: enable rhcs tools repo for monitoring The monitoring node running grafana needs the rhcs tools repository enabled in non-containerized deployments to be able to install the ceph-grafana-dashboards rpm package. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-10 08:17:11 +01:00
Guillaume Abrioux	2f1d287b1c	tests: pin ansible-lint version This commit pins the ansible-lint version to 4.3.7 as ceph-ansible isn't compatible with recent changes in 5.0.0 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-10 07:48:24 +01:00
Guillaume Abrioux	54bae480d2	tests: set `mon_max_pg_per_osd` in rgw_multisite Otherwise, the job fails when it tries to create a bucket with `s3cmd mb` command because we have too many PGs per OSD. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-10 07:01:21 +01:00
Guillaume Abrioux	931b87e830	rgw: fix a typo in multisite if `rgw_zonegroupmaster` is not defined at the rgw instance level in `rgw_instances` it will fallback to a wrong variable (`rgw_zonemaster`). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925247 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-10 07:01:21 +01:00
Dimitri Savineau	94af3c87d1	rolling_update: exclude clients from node-exporter Since `b105549` we don't install node-exporter on client nodes so we should also exclude the client node from the node-exporter upgrade. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-09 14:41:13 +01:00
Dimitri Savineau	58b101d9ff	docs: nautilus uses ansible 2.9 This updates the ansible release required to deploy nautilus with the stable-4.0 branch. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-09 12:46:38 +01:00
Dimitri Savineau	e7cdcfa342	dashboard: update with the new monitoring group Since `eefe11d` the grafana-server group has been renamed to monitoring but the dashboard playbook wasn't updated. This was still working due to the backward compatibility added in the ceph-facts role. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-09 12:45:02 +01:00
Dimitri Savineau	ed094ea07a	vagrant: remove centos/8 workaround The CentOS 8 vagrant box has finally been updated [1] with a recent version (the latest one 2011 which means CentOS 8.3). We don't need to download the vagrant libvirt box with a direct url anymore from the CentOS infrastructure. [1] https://app.vagrantup.com/centos/boxes/8 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-09 12:41:59 +01:00
Guillaume Abrioux	b9cdee40a2	update: update ceph release pattern in complete upgrade play Since master is now deploying quincy, we must update this. Otherwise, it will fail as follows: ``` Error EPERM: require_osd_release cannot be lowered once it has been set ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-06 00:34:14 +01:00
Guillaume Abrioux	39649f0ce8	common: ensure shaman returns right repo Due to recent changes in shaman, there's a chance it returns the wrong repository from architecture point of view. We can query shaman and ask for the correct architecture to get around this. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-06 00:34:14 +01:00
Guillaume Abrioux	44fbadb50c	rolling_update: pg check refactor There's no need to achieve this in two tasks. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-06 00:34:14 +01:00
Guillaume Abrioux	c1f627c465	validate: fix a typo fixes a typo Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-06 00:34:14 +01:00
Guillaume Abrioux	8eda590130	tests: remove legacy remove a legacy in tox environment definition Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-06 00:34:14 +01:00
Guillaume Abrioux	c3eadbc31a	tests: follow up on `7c9063b` `7c9063b1d2` broke some scenarios. This commit fixes them. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-06 00:34:14 +01:00
Dimitri Savineau	8939dddff4	library: fix idempotency in ceph_mgr_module The ceph mgr command output is printed on stderr instead of stdout, which prevents setting the changed flag to false when the module is already enabled. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-03 08:30:44 +01:00
Dimitri Savineau	76a663245d	cephadm-adopt: use ceph_osd_flag module There's no reason to not use the ceph_osd_flag module to set/unset osd flags. Also if there's no OSD nodes in the inventory then we don't need to execute the set/unset play. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-03 08:29:31 +01:00
Dimitri Savineau	36fc04eaab	purge-cluster: use parted ansible module Instead of doing some scripting via the shell module, we can use the parted ansible module to check the boot flag on partitions. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-03 08:28:22 +01:00
Dimitri Savineau	bc6948037f	library/cephadm_bootstrap: add registry support This adds the custom registry auth support when using a registry with authentication. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-03 08:27:28 +01:00
Dimitri Savineau	b1f37c4b3d	ceph-defaults: use https for download.ceph.com There's no reason to still use http on download.ceph.com instead of https. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-03 07:14:37 +01:00
Guillaume Abrioux	7c9063b1d2	tests: use lvm batch on osd2 (all_daemons) in order to test lvm batch in purge scenario. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-02 17:24:17 +01:00
Guillaume Abrioux	984191ac7f	purge: zap and destroy db and wal devices for lvm batch Those devices (db/wal) are never zapped in lvm batch deployment. Iterating over `dedicated_devices` and `bluestore_wal_devices` fixes this issue. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1922926 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-02-01 13:01:58 -05:00
Dimitri Savineau	7208a39e57	ceph-facts: set rgw_instances_all fact once There's no need to set the rgw_instances_all fact for each node. We can rely on run_once for that one. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-02-01 13:49:12 +01:00
Dimitri Savineau	195159ecef	library: retrieve realm id for zone/zonegroup When the zonegroup or the zone doesn't have a realm associated, it's not possible to modify that resource. This patch retrieves the current realm id and compares it to the realm id of the realm passed as a parameter. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-29 21:07:39 +01:00
Dimitri Savineau	2734a12d44	cephadm-adopt: use radosgw modules for idempotency When rerunning the cephadm-adopt.yml playbook the radosgw realm, zonegroup and zone tasks will fail because the task isn't idempotent. Using the radosgw ansible modules solves that problem. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-29 21:07:39 +01:00
Dimitri Savineau	523966d45f	tox: test cephadm-adopt.yml playbook idempotency Rerun the cephadm-adopt.yml playbook a second time for idempotency purpose. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-29 21:07:39 +01:00
Dimitri Savineau	ff9d314305	library: make cephadm_adopt module idempotent Rerunning the cephadm_adopt module on an already adopted daemon will fail because the cephadm adopt command isn't idempotent. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918424 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-29 21:07:39 +01:00
Dimitri Savineau	6886700a00	cephadm-adopt: make the playbook idempotent If the cephadm-adopt.yml fails during the first execution and some daemons have already been adopted by cephadm then we can't rerun the playbook because the old container won't exist anymore. Error: no container with name or ID ceph-mon-xxx found: no such container If the daemons are adopted then the old systemd unit doesn't exist anymore so any call to that unit with systemd will fail. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918424 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-29 21:07:39 +01:00
Dimitri Savineau	3749d297c7	ceph-mon: add ExecStartPre docker stop to systemd We already do that in the other systemd templates (mgr, mds, etc.), and this would prevent having to add workarounds in other orchestration tools. This change is for containerized deployment only. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1882724 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-29 09:03:34 +01:00
Guillaume Abrioux	8617081664	rgw: avoid useless call to ceph-rgw since `ceph-rgw` may be called from `ceph-handler` in some contexts we should avoid rerunning it unnecessarily. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-01-28 14:37:14 -05:00
Guillaume Abrioux	e835b08f8f	fs2bs: remove a legacy fact since `cf7345f143`, we don't need to set this fact anymore. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-01-28 16:26:46 +01:00
Guillaume Abrioux	71a5e666e3	rgw: multisite refact Add the possibility to deploy rgw multisite configuration with a mix of secondary and primary zones on the same rgw node. Before that, on a given node, all instances were either primary zones OR secondary. Now you can define a rgw instance as follows: ``` rgw_instances: - instance_name: 'rgw0' rgw_zonemaster: false rgw_zonesecondary: true rgw_zonegroupmaster: false rgw_realm: 'france' rgw_zonegroup: 'zonegroup-france' rgw_zone: paris-00 radosgw_address: "{{ _radosgw_address }}" radosgw_frontend_port: 8080 rgw_zone_user: jacques.chirac rgw_zone_user_display_name: "Jacques Chirac" system_access_key: P9Eb6S8XNyo4dtZZUUMy system_secret_key: qqHCUtfdNnpHq3PZRHW5un9l0bEBM812Uhow0XfB endpoint: http://192.168.101.12:8080 ``` Basically it's now possible to define `rgw_zonemaster`, `rgw_zonesecondary` and `rgw_zonegroupmaster` at the instance level instead of the whole node level. Also, this commit adds an option `deploy_secondary_zones` (default True) which can be set to `False` in order to explicitly ask the playbook not to deploy secondary zones in case the corresponding endpoints are not deployed yet. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1915478 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-01-27 15:46:43 +01:00
Guillaume Abrioux	fedb36688d	library: fix bug in radosgw_zone.py If for some reason `get_zonegroup()` returns a failure, we must handle and make the module exit properly instead of failing with the following python trace: ``` Traceback (most recent call last): File "./AnsiballZ_radosgw_zone.py", line 247, in <module> _ansiballz_main() File "./AnsiballZ_radosgw_zone.py", line 234, in _ansiballz_main exitcode = debug(sys.argv[1], zipped_mod, ANSIBALLZ_PARAMS) File "./AnsiballZ_radosgw_zone.py", line 202, in debug runpy.run_module(mod_name='ansible.modules.radosgw_zone', init_globals=None, run_name='__main__', alter_sys=True) File "/usr/lib64/python3.6/runpy.py", line 205, in run_module return _run_module_code(code, init_globals, run_name, mod_spec) File "/usr/lib64/python3.6/runpy.py", line 96, in _run_module_code mod_name, mod_spec, pkg_name, script_name) File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code exec(code, run_globals) File "/home/vagrant/.ansible/tmp/ansible-tmp-1610728441.41-685133-218973990589597/debug_dir/ansible/modules/radosgw_zone.py", line 467, in <module> main() File "/home/vagrant/.ansible/tmp/ansible-tmp-1610728441.41-685133-218973990589597/debug_dir/ansible/modules/radosgw_zone.py", line 463, in main run_module() File "/home/vagrant/.ansible/tmp/ansible-tmp-1610728441.41-685133-218973990589597/debug_dir/ansible/modules/radosgw_zone.py", line 425, in run_module zonegroup = json.loads(_out) File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads return _default_decoder.decode(s) File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-01-27 15:46:43 +01:00
Guillaume Abrioux	959140e785	library: move `fatal()` into ca_common.py this function is defined in various modules, let's move it to `ca_common.py` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-01-27 15:46:43 +01:00
Dimitri Savineau	bbcad9609c	grafana: update container tag to 6.7.4 This updates the grafana container tag to 6.7.4. The RHCS version is now based on the RHCS 5 container image which is also based on 6.7.4. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-27 15:08:31 +01:00
Dimitri Savineau	7d56771975	ceph-defaults: change default ceph container tag The "latest" ceph container tag references the latest stable release (octopus at the moment). "latest" is an alias on "latest-octopus". On the devel branch we should use "latest-master" tag instead. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-22 21:12:34 +01:00
Dimitri Savineau	13427eddac	cephadm-adopt: add grafana group conversion The grafana group conversion task wasn't present in the cephadm-adopt.yml playbook. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1917530 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-18 20:52:58 +01:00
Guillaume Abrioux	4af0845702	mon: fix cephx disabled deployment Due to missing condition on `cephx` variable, cephx disabled deployments are broken. This commit fixes this. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1910151 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-01-18 11:30:02 -05:00
Dimitri Savineau	6616908577	module_utils: don't add newline to the data When executing a command via the run_command method and passing some data through stdin, the default behavior is to append a newline. This breaks the value of the password used by our modules. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-18 11:29:30 -05:00
Dimitri Savineau	5a14510354	tests/library: remove duplicate parameter Remove duplicate fake_params parameter as it's already defined later as a dict (instead of an empty list). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-14 10:11:17 +01:00
Guillaume Abrioux	e66f12d138	fs2bs: skip migration when a mix of fs and bs is detected Since the default of `osd_objectstore` has changed as of 3.2, some deployments might have a mix of filestore and bluestore OSDs on a same node. In some specific cases, there's a possibility that a filestore OSD shares a journal/db device with a bluestore OSD. We shouldn't try to redeploy in this context because ceph-volume will complain. (either because in lvm batch you can't pass partition or about gpt header). The safest option is to skip the migration on the node when such a mix is detected or force all osds including those already using bluestore (option `force_filestore_to_bluestore=True` has to be passed as an extra var). If all OSDs are using filestore, then they will be migrated to bluestore. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1875777 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-01-12 14:40:25 -05:00
Guillaume Abrioux	ae196bf946	validate: check virtual_ips variable This commit checks that the length of `virtual_ips` doesn't exceed the length of `groups[rgwloadbalancer_group_name]`. It also ensures this variable is defined when `groups[rgwloadbalancer_group_name]` contains at least one node. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-01-12 11:03:12 +01:00
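
The two conditions described in commit ae196bf946 (`virtual_ips` must be defined when the load balancer group is non-empty, and must not be longer than that group) can be sketched in Python as below. This is only an illustration of the logic: the real check is implemented as Ansible tasks in ceph-ansible, and the function name `validate_virtual_ips` and the error strings are hypothetical.

```python
def validate_virtual_ips(virtual_ips, rgwloadbalancer_hosts):
    """Return a list of error messages; an empty list means the inputs pass.

    virtual_ips: list of IP strings, or None when the variable is undefined.
    rgwloadbalancer_hosts: hosts in the load balancer inventory group.
    Both the function name and the messages are hypothetical.
    """
    errors = []

    # The variable must be defined as soon as the group has at least one node.
    if rgwloadbalancer_hosts and not virtual_ips:
        errors.append("virtual_ips must be defined when the load balancer group is not empty")

    # There cannot be more virtual IPs than load balancer nodes.
    if virtual_ips and len(virtual_ips) > len(rgwloadbalancer_hosts):
        errors.append("virtual_ips has more entries than the load balancer group has nodes")

    return errors


if __name__ == "__main__":
    print(validate_virtual_ips(["192.168.1.10"], ["lb0", "lb1"]))
    print(validate_virtual_ips(None, ["lb0"]))
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in one pass.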

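The `ERANGE` failure quoted in commit 682116023d is plain arithmetic, reproduced below with the numbers from the error message. The helper names are hypothetical, and the split between existing and new placement groups assumes the reported total counts PG replicas (`pg_num` × `size`) summed over pools.

```python
import math


def max_total_pgs(mon_max_pg_per_osd, num_in_osds):
    """Upper bound on total PGs, as stated in the error message."""
    return mon_max_pg_per_osd * num_in_osds


def min_pg_per_osd_needed(total_pgs, num_in_osds):
    """Smallest mon_max_pg_per_osd for which total_pgs fits."""
    return math.ceil(total_pgs / num_in_osds)


# Values taken from the quoted error message.
limit = max_total_pgs(mon_max_pg_per_osd=250, num_in_osds=6)  # 250 * 6
requested_total = 1536

# Assumption: the new pool contributes pg_num * size replicas to the total.
new_pool_pgs = 256 * 2
existing_pgs = requested_total - new_pool_pgs

if __name__ == "__main__":
    print(f"limit={limit}, requested={requested_total}, over by {requested_total - limit}")
    print(f"existing={existing_pgs}, new pool={new_pool_pgs}")
    print(f"minimum mon_max_pg_per_osd: {min_pg_per_osd_needed(requested_total, 6)}")
```

With six OSDs, any `mon_max_pg_per_osd` of at least 256 would admit the 1536 PGs from the message; the commit does not state which value the CI uses.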
5708 Commits (3eba2a1584284363cae81118ed6dbe7649b03c19)