ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	73287f91bc	mgr: fix mgr keyring error on rolling_update when upgrading from RHCS 2.5 to 3.2, it fails because the task `create ceph mgr keyring(s) when mon is containerized` has a when condition `inventory_hostname == groups[mon_group_name]\|last`. First, this is incorrect because `inventory_hostname` is referring to a mgr node, it means this condition would have never been satisfied. Then, this condition + `serial: 1` makes the mgr keyring creating skipped on the first node. Further, the `ceph-mgr` role tries to copy the mgr keyring (it's not aware we are running `serial: 1`) this leads to a failure like the following: ``` TASK [ceph-mgr : copy ceph keyring(s) if needed] ************************************************************************************************************************************************************************************************************************************************************************* task path: /usr/share/ceph-ansible/roles/ceph-mgr/tasks/common.yml:10 Tuesday 27 November 2018 12:03:34 +0000 (0:00:00.296) 0:11:01.290 **** An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AnsibleFileNotFound: Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring' failed: [magna021] (item={u'dest': u'/var/lib/ceph/mgr/local-magna021/keyring', u'name': u'/etc/ceph/local.mgr.magna021.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "dest": "/var/lib/ceph/mgr/local-magna021/keyring", "name": "/etc/ceph/local.mgr.magna021.keyring"}, "msg": "Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'"} ``` The ceph_key module is idempotent, so there is no need to have such a condition. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1649957 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-27 18:19:56 +01:00
Sébastien Han	bc2daaeb71	ceph-osd fix batch with container binary Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	aa086f1a47	ceph_key: fix after rebase Fix the tests Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	80ba45793d	fix template generation Position the right condition on ceph_docker_version, activate it when the container_binary is 'docker'. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	00ebdeff78	container-common: remove leftover ntp is installation is managed by the ceph-infra role. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	e5d5dffeb5	shrink-osd: add missing CEPH_BINARY We need to add the right binary to do the docker exec. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Guillaume Abrioux	3684d421e4	defaults: play set_radosgw_address.yml only on rgw nodes This is not needed to play these tasks on nodes that are not in rgw group. Always playing this code makes `shrink_mon.yml` failing. Typical error: ``` TASK [ceph-defaults : set_fact _radosgw_address to radosgw_interface - ipv4] * task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-shrink_mon/roles/ceph-defaults/tasks/set_radosgw_address.yml:21 Thursday 22 November 2018 12:34:51 +0000 (0:00:00.154) 0:00:12.371 *** fatal: [localhost]: FAILED! => {} MSG: The task includes an option with an undefined variable. The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_eth1' ``` Indeed, `radosgw_interface` is the network interface on rgw only. It is expected that this same interface doesn't exist on `localhost`, so, when running `shrink_mon.yml`, the role `ceph-defaults` is called in `hosts: localhost` and causes the playbook to fail. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	4f57e44f9c	defaults: declare container_binary Always declare container_binary and assign it a correct value. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	ac3e18e4c1	ceph-defaults: use podman on Fedora only It seems Atomic 7.5 has podman already, however this is an old version (0.4). The podman integration is targetting RHEL 8, so Fedora is currently the closest to that. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	69d97f6480	site: symlink site-docker to site-container We deprecated site-docker to site-container so let's have a symlink for backward compatibility. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	49e0e19056	rolling_update: update ceph_key task for container Use the new way to create keys on containerized env as introduced by: 1098b71bda90db3dad19ac179f0ba900ccb0f953 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	2814d36c93	infra playbooks: use the right container binary Use podman or docker wether they are available or not. podman will be prioritized over docker if present. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	a4ad0d7720	site: choose the right container runtime binary We need to verify wether podman exists or not, if yes we use it instead of docker. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	f203031f88	iscsi: expose /dev/log in the container During its initialisation both rbd-target-api and rbd-target-gw try to open /dev/log for their syslog handler. If the device is not present the service fails to start. Thus expose /dev/log from the host in the container solves that problem. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	4e5d862bb7	testinfra: linting Make flake8 happy on the testinfra files. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	dcc765d7c7	testinfra: add support for podman Since we are now testing on docker and podman our functionnal tests must reflect that. So now, if we detect the podman binary we will use it, otherwise we default to docker. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	f5c2ca3710	ceph_key: fix rstrip for python 3 Removing bytes literals since rstrip only supports type String or None. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	7100cc5e77	test_lookup_ceph_initial_entities: fix The previous dict was missing 2 entities: * client.bootstrap-mgr * client.bootstrap-rbd-mirror So the test was failing since it expects 7 entities to match. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	d9ac9d466c	test_build_key_path_bootstrap_osd: fix The entity name is client.bootstrap-osd (as returned by Ceph), and not bootstrap-osd. The build_key_path function split 'client.bootstrap-osd' on the '.' so using bootstrap-osd fails with index out of range. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	1afa4c5c95	ceph_key: remove set-uid support The support of set-uid was remove from Ceph during the Nautilus cycle by the following commit: d6def8ba1126209f8dcb40e296977dc2b09a376e so this will not work anymore when deploying Nautilus clusters and above. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	f192bc92a2	ceph_key: use the right container runtime binary Rework all the ceph_key invocation to use either docker or podman binary. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	6cca37b683	client: do not use a dummy container anymore Since 84fcf4639140c390a7f1fcd790ba190503713f86 we now use the container binary cli to create ceph keys instead of creating a container and 'docker execing' into it. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	a96e910114	Add new container scenario Test with podman instead of docker and also support for python 3 only. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	bc6e652a1c	ceph_key: rework container support Previously, we were doing a 'docker exec' inside a mon container, this worked but this wasn't ideal since it required a mon to be up to generate keys. We must be able to generate a key without a running mon, e.g, when we create the initial key or simply when you want to generate a key from any node that is not a mon. Now, just like the ceph_volume module we use a 'docker run' command with the right binary as an entrypoint to perform the choosen action, this is more elegant and also only requires an env variable to be set in the playbook: CEPH_CONTAINER_IMAGE. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	a9b337ba66	handler: show unit logs on error This will tremendously help debugging daemons that fail on restart by showing the systemd unit logs. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 11:00:37 +00:00
Guillaume Abrioux	83a67648d8	validate: add nautilus release validate must accept ceph nautilus release. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-27 08:26:43 +00:00
Andrew Schoen	e13f32c1c5	ceph-volume: be idempotent when the batch strategy changes If you deploy with 2 HDDs and 1 SDD then each subsequent deploy both HDD drives will be filtered out, because they're already used by ceph. ceph-volume will report this as a 'strategy change' because the device list went from a mixed type of HDD and SDD to a single type of only SDD. This situation results in a non-zero exit code from ceph-volume. We want to handle this situation gracefully and report that nothing will be changed. A similar json structure to what would have been given by ceph-volume is returned in the 'stdout' key. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650306 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-11-26 23:23:50 +00:00
Sébastien Han	997667a873	osd: expose udev into the container In order to be able to retrieve udev information, we must expose its socket. As per, https://github.com/ceph/ceph/pull/25201 ceph-volume will start consuming udev output. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-26 18:57:12 +00:00
Guillaume Abrioux	7c99b6df6d	update: fix a typo `hostvars[groups[mon_host]]['ansible_hostname']` seems to be a typo. That should be `hostvars[mon_host]['ansible_hostname']` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-26 18:22:20 +01:00
Guillaume Abrioux	f290e49df8	tests: do not fully override previous ceph_conf_overrides We run an initial deployment with `osd_pool_default_size: 1` in `ceph_conf_overrides`. When re-running the playbook to test idempotency and handlers, we reset `ceph_conf_overrides`, we must append a new value instead of just overwritting it, otherwise, this can lead to error in the CI. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-26 18:22:20 +01:00
Guillaume Abrioux	af78173584	rolling_update: refact set_fact `mon_host` each monitor node should select another monitor which isn't itself. Otherwise, one node in the monitor group won't set this fact and causes failure. Typical error: ``` TASK [create potentially missing keys (rbd and rbd-mirror) when mon is containerized] * task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-update_docker_cluster/rolling_update.yml:200 Thursday 22 November 2018 14:02:30 +0000 (0:00:07.493) 0:02:50.005 *** fatal: [mon1]: FAILED! => {} MSG: The task includes an option with an undefined variable. The error was: 'dict object' has no attribute u'mon2' ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-26 18:22:20 +01:00
Sébastien Han	4e267bee4f	rolling_update: create rbd and rbd-mirror keyrings During an upgrade ceph won't create keys that were not existing on the previous version. So after the upgrade of let's Jewel to Luminous, once all the monitors have the new version they should get or create the keys. It's ok to have the task fails, especially for the rbd-mirror key, which only appears in Nautilus. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-26 18:22:20 +01:00
Sébastien Han	691f373543	ceph_key: add a get_key function When checking if a key exists we also have to ensure that the key exists on the filesystem, the key can change on Ceph but still have an outdated version on the filesystem. This solves this issue. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-26 18:22:20 +01:00
Sébastien Han	c14f9b78ff	switch: do not look for devices anymore It's easier lookup a directoriy instead of the block devices, especially because of ceph-volume and ceph-disk have a different way to handle devices. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-23 07:56:23 +00:00
Sébastien Han	cd56dad9fa	switch: disable all ceph units Prior to this commit we were only disabling ceph-osd units, but forgot the ceph.target which is controlling everything and will restart the ceph-osd units at each reboot. Now that everything gets disabled there won't be any conflicts between the old non-container and the new container units. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-23 07:56:23 +00:00
Sébastien Han	fe1d09925a	switch: do not mask systemd unit If we mask it we won't be able to start the OSD container since now the osd container use the osd ID as a name such as: ceph-osd@0 Fixes the error: Failed to execute operation: Cannot send after transport endpoint shutdown Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-23 07:56:23 +00:00
Guillaume Abrioux	5601af8de2	tests: change default pools size default pool size in our test should be explicitly set to 1 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-21 18:23:07 +00:00
Guillaume Abrioux	ed42262b37	client: change default pool size default pool size should match the real default that is defined in ceph itself. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-21 18:23:07 +00:00
Guillaume Abrioux	6d1fe32998	defaults: change default size for openstack pools default pool size should match the real default that is defined in ceph itself. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-21 18:23:07 +00:00
Guillaume Abrioux	fdc438dd0d	defaults: change for default pool size for cephfs_pools default pool size should match the real default that is defined in ceph itself. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-21 18:23:07 +00:00
Guillaume Abrioux	f1735e9bb0	defaults: add ceph related vars file This is to add a granularity level. We can have ceph specific variables that user shouldn't have to change here. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-21 15:42:50 +00:00
Guillaume Abrioux	7774069d45	refact osd pool size customization Add real default value for osd pool size customization. Ceph itself has an `osd_pool_default_size` default value to `3`. If users don't specify a pool size in various pools definition within ceph-ansible, we should default to `3`. By the way, this kind of condition isn't really clear: ``` when: - rbd_pool_size \| default ("") ``` we should try to get the customized value then default to what is in `osd_pool_default_size` (which has its default value pointing to `ceph_osd_pool_default_size` (`3`) as well) and compare it to `ceph_osd_pool_default_size`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-21 15:42:50 +00:00
Guillaume Abrioux	d4c0960f04	mon: move `osd_pool_default_pg_num` in `ceph-defaults` `osd_pool_default_pg_num` parameter is set in `ceph-mon`. When using ceph-ansible with `--limit` on a specifc group of nodes, it will fail when trying to access this variables since it wouldn't be defined. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1518696 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-21 15:42:50 +00:00
Guillaume Abrioux	68dde424f6	config: convert _osd_memory_target to int ceph.conf doesn't accept float value. Typical error seen: ``` $ sudo ceph daemon osd.2 config get osd_memory_target Can't get admin socket path: unable to get conf option admin_socket for osd.2: parse error setting 'osd_memory_target' to '7823740108,8' (strict_si_cast: unit prefix not recognized) ``` This commit ensures the value inserted in ceph.conf will be an integer. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-21 14:33:27 +00:00
Sébastien Han	e7b3d3e014	site: resync container playbook This PR https://github.com/ceph/ceph-ansible/pull/3251 forgot to create a symlink from site-docker.yml.sample to site-container.yml.sample. This commit resyncs and put the symlink in place. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-20 18:57:55 +01:00
Boris Ranto	dfab42a21f	defaults/facts: Use list instead of keys It is safer to use the list filter than the keys() method since the keys method does have some interoperability issues between python2 and python3 based ansible/jinja. Signed-off-by: Boris Ranto <branto@redhat.com>	2018-11-20 18:48:22 +01:00
Boris Ranto	c2b0cbd699	start_osds: Use list instead of keys If you use python3 based ansible then keys() returns a dict_keys object, not a list of keys. This breaks the installation on such a system. Using the list filter provides a more robust solution that should work on both python2 and python3 based ansible. You can find some more information about the issue, here: https://github.com/ansible/ansible/issues/19514 Signed-off-by: Boris Ranto <branto@redhat.com>	2018-11-20 18:48:22 +01:00
Valentin Lorentz	30ce7e84f4	Discover rbd facts. Signed-off-by: Valentin Lorentz <progval+git@progval.net>	2018-11-20 15:06:01 +01:00
Dan Mick	a2349f05ac	validate plugin: handle missing exception fields without traceback "missing variable" errors introduced by PR3058 would attempt to be reported, but since the exception contained no "path" definition, would cause a second exception in the Invalid exception handler. Make the exception handler verify that any field it tries to use exists, clean up its message formatting, and reduce the verbose level to see the literal error from notario in case more goes wrong in future. Signed-off-by: Dan Mick <dan.mick@redhat.com>	2018-11-19 22:01:07 +00:00
Neha Ojha	10538e9a23	osd_memory_target: standardize unit and fix calculation * The default value of osd_memory_target used by ceph is 4294967296 bytes, so use the same as ceph-ansible default. * Convert ansible_memtotal_mb to bytes to calculate osd_memory_target Signed-off-by: Neha Ojha <nojha@redhat.com>	2018-11-19 09:54:33 +00:00

... 17 18 19 20 21 ...

5081 Commits (87e1f0cc6c790e14893bfc18d9c37daad7126f38) All Branches Search

5081 Commits (87e1f0cc6c790e14893bfc18d9c37daad7126f38)

All Branches