ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	1f4cf61058	rolling_update: refact set_fact `mon_host` each monitor node should select another monitor which isn't itself. Otherwise, one node in the monitor group won't set this fact and causes failure. Typical error: ``` TASK [create potentially missing keys (rbd and rbd-mirror) when mon is containerized] * task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-update_docker_cluster/rolling_update.yml:200 Thursday 22 November 2018 14:02:30 +0000 (0:00:07.493) 0:02:50.005 *** fatal: [mon1]: FAILED! => {} MSG: The task includes an option with an undefined variable. The error was: 'dict object' has no attribute u'mon2' ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `af78173584`)	2018-11-29 01:49:05 +00:00
Sébastien Han	d4f1f12bd0	rolling_update: create rbd and rbd-mirror keyrings During an upgrade, ceph won't create keys that did not exist in the previous version. So after an upgrade from, say, Jewel to Luminous, once all the monitors have the new version they should get or create the keys. It's ok for the task to fail, especially for the rbd-mirror key, which only appears in Nautilus. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572 Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `4e267bee4f`)	2018-11-29 01:49:05 +00:00
Sébastien Han	ee96454980	ceph_key: add a get_key function When checking if a key exists we also have to ensure that the key exists on the filesystem, the key can change on Ceph but still have an outdated version on the filesystem. This solves this issue. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `691f373543`)	2018-11-29 01:49:05 +00:00
Sébastien Han	26ea96424c	switch: do not look for devices anymore It's easier to look up a directory instead of the block devices, especially because ceph-volume and ceph-disk have different ways to handle devices. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `c14f9b78ff`)	2018-11-29 00:31:47 +01:00
Sébastien Han	57ac7b94c0	switch: disable all ceph units Prior to this commit we were only disabling ceph-osd units, but forgot the ceph.target which is controlling everything and will restart the ceph-osd units at each reboot. Now that everything gets disabled there won't be any conflicts between the old non-container and the new container units. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `cd56dad9fa`)	2018-11-29 00:31:47 +01:00
Sébastien Han	8d0379b4d9	switch: do not mask systemd unit If we mask it we won't be able to start the OSD container, since the OSD container now uses the OSD ID as a name, such as: ceph-osd@0 Fixes the error: Failed to execute operation: Cannot send after transport endpoint shutdown Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `fe1d09925a`)	2018-11-29 00:31:47 +01:00
Sébastien Han	9b5a93e3a5	osd: re-introduce disk_list check This commit `4cc1506303 (diff-51bbe3572e46e3b219ad726da44b64ebL13)` accidentally removed this check. This is a must have for ceph-disk based containerized OSDs. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-29 00:31:13 +01:00
Guillaume Abrioux	659f2c60b5	validate: change default value for `radosgw_address` change default value of `radosgw_address` to keep consistency with `monitor_address`. Moreover, `ceph-validate` checks if the value is '0.0.0.0' to determine if it has to run `check_eth_rgw.yml`. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600227 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e4869ac8bd`)	2018-11-28 23:54:06 +01:00
Guillaume Abrioux	968e6f5854	tests: rgw_multisite allow clusters to talk to each other Adding this rule on the hypervisor will allow clusters to talk to each other. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `96ce8761ba`)	2018-11-28 23:53:58 +01:00
Guillaume Abrioux	133615471a	tests: set pool size to 1 in ceph-override.json Setting the pool size to 1 lets the CI cover the related code in the playbook without breaking the upgrade scenarios. Those scenarios were broken because of the check `TASK [waiting for clean pgs...]` in rolling_update.yml: since the pool size for `cephfs_metadata` and `cephfs_data` was updated to `2` in `ceph-override.json` and there are not enough OSDs to honor this size, some PGs are degraded and make that check fail. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3ac6619fb9`)	2018-11-28 23:11:46 +01:00
Guillaume Abrioux	4cc1506303	osd: commonize start_osd code Since the introduction of `ceph-volume`, there is no need to split those tasks. Let's refactor this part of the code so it's clearer. By the way, this was breaking rolling_update.yml with `openstack_config: true` because nothing ensured OSDs were started in the ceph-osd role (in `openstack_config.yml` there is a check ensuring all OSDs are UP, which was obviously failing), and it resulted in OSDs on the last OSD node not being started. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `f7fcc012e9`)	2018-11-28 23:11:46 +01:00
Guillaume Abrioux	b72d806f4c	mgr: fix mgr keyring error on rolling_update when upgrading from RHCS 2.5 to 3.2, it fails because the task `create ceph mgr keyring(s) when mon is containerized` has a when condition `inventory_hostname == groups[mon_group_name]\|last`. First, this is incorrect because `inventory_hostname` is referring to a mgr node, it means this condition would have never been satisfied. Then, this condition + `serial: 1` makes the mgr keyring creating skipped on the first node. Further, the `ceph-mgr` role tries to copy the mgr keyring (it's not aware we are running `serial: 1`) this leads to a failure like the following: ``` TASK [ceph-mgr : copy ceph keyring(s) if needed] ************************************************************************************************************************************************************************************************************************************************************************* task path: /usr/share/ceph-ansible/roles/ceph-mgr/tasks/common.yml:10 Tuesday 27 November 2018 12:03:34 +0000 (0:00:00.296) 0:11:01.290 **** An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AnsibleFileNotFound: Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring' failed: [magna021] (item={u'dest': u'/var/lib/ceph/mgr/local-magna021/keyring', u'name': u'/etc/ceph/local.mgr.magna021.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "dest": "/var/lib/ceph/mgr/local-magna021/keyring", "name": "/etc/ceph/local.mgr.magna021.keyring"}, "msg": "Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'"} ``` The ceph_key module is idempotent, so there is no need to have such a condition. 
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1649957 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `73287f91bc`)	2018-11-28 23:11:46 +01:00
Guillaume Abrioux	3ead8a2586	tests: apply dev_setup on the secondary cluster for rgw_multisite We must apply this playbook before deploying the secondary cluster. Otherwise, there will be a mismatch between the two deployed clusters. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3d8f4e6304`)	2018-11-28 12:56:57 +00:00
Sébastien Han	2fca8555cc	handler: show unit logs on error This will tremendously help debugging daemons that fail on restart by showing the systemd unit logs. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `a9b337ba66`)	2018-11-27 12:44:15 +00:00
Andrew Schoen	59524c7246	ceph-volume: be idempotent when the batch strategy changes If you deploy with 2 HDDs and 1 SSD, then on each subsequent deploy both HDD drives will be filtered out, because they're already used by ceph. ceph-volume will report this as a 'strategy change' because the device list went from a mixed type of HDD and SSD to a single type of only SSD. This situation results in a non-zero exit code from ceph-volume. We want to handle this situation gracefully and report that nothing will be changed. A similar json structure to what would have been given by ceph-volume is returned in the 'stdout' key. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650306 Signed-off-by: Andrew Schoen <aschoen@redhat.com> (cherry picked from commit `e13f32c1c5`)	2018-11-27 00:23:21 +00:00
Guillaume Abrioux	1a1886a442	config: convert _osd_memory_target to int ceph.conf doesn't accept float value. Typical error seen: ``` $ sudo ceph daemon osd.2 config get osd_memory_target Can't get admin socket path: unable to get conf option admin_socket for osd.2: parse error setting 'osd_memory_target' to '7823740108,8' (strict_si_cast: unit prefix not recognized) ``` This commit ensures the value inserted in ceph.conf will be an integer. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `68dde424f6`)	2018-11-21 15:35:55 +00:00
Guillaume Abrioux	abdc245ceb	infra: don't restart firewalld if unit is masked if firewalld.service systemd unit is masked, the handler will fail when trying to restart it. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650281 (cherry picked from commit `63b9835cbb`) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-19 17:32:44 +01:00
Neha Ojha	c96af4bac9	osd_memory_target: standardize unit and fix calculation * The default value of osd_memory_target used by ceph is 4294967296 bytes, so use the same as ceph-ansible default. * Convert ansible_memtotal_mb to bytes to calculate osd_memory_target Signed-off-by: Neha Ojha <nojha@redhat.com> (cherry picked from commit `10538e9a23`)	2018-11-19 10:51:05 +00:00
Guillaume Abrioux	f5d8701ed8	client: fix a typo in create_users_keys.yml `cd1e4ee024` introduced a typo. This commit fixes it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `393ab94728`)	2018-11-17 20:59:11 +00:00
Guillaume Abrioux	62d2ddafd4	validate: allow stable-3.2 to run with ansible 2.4 Although this is not officially supported, this commit allows `stable-3.2` to run against ansible 2.4. This should ease the transition in RHOSP. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-16 08:57:00 +00:00
Jason Dillaman	3b40e2bc87	igw: add support for IPv6 Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit `0aff0e9ede`) Conflicts: library/igw_purge.py: trivial resolution roles/ceph-iscsi-gw/library/igw_purge.py: trivial resolution	2018-11-13 17:35:58 +00:00
Mike Christie	702f2baccc	igw: open iscsi target port Open the port the iscsi target uses for iscsi traffic. Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `5ba7d1671e`)	2018-11-12 10:46:41 +00:00
Mike Christie	44ee5c7495	igw: use api_port variable for firewall port setting Don't hard code api port because it might be overridden by the user. Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `e2f1f81de4`)	2018-11-12 10:46:41 +00:00
Mike Christie	db576f6f0e	igw: fix firewall iscsi_group_name check The firewall setup for igw is not getting set up because iscsi_group_name does not exist. It should be iscsi_gw_group_name. Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `a4ff52842c`)	2018-11-12 10:46:41 +00:00
Mike Christie	c843ea1d92	igw: Fix default api port The default igw api port is 5000 in the manual setup docs and ceph-iscsi-config package so this syncs up ansible. Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `a10853c5f8`)	2018-11-12 10:46:41 +00:00
VasishtaShastry	f17140c03d	ceph-validate: added functions to accept true and false ceph-validate used to throw an error when flags were set as 'true' or 'false' instead of True and False. Now users can set the flags 'dmcrypt' and 'osd_auto_discovery' as 'true' or 'false'. Will fix - Bug 1638325 Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com> (cherry picked from commit `098f42f233`)	2018-11-09 16:47:57 +00:00
Rishabh Dave	a74f4204cd	remove configuration files for ceph packages on ubuntu clusters For apt-get, purge command needs to be used, instead of remove command, to remove related configuration files. Otherwise, packages might be shown as installed while running dpkg command even after removing them. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1640061 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `640cad3fd8`)	2018-11-09 16:50:25 +01:00
Mike Christie	77de54025b	igw: stop tcmu-runner on iscsi purge When the iscsi purge playbook is run we stop the gw and api daemons but not tcmu-runner which I forgot on the previous PR. Fixes Red Hat BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1621255 Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `b523a44a1a`)	2018-11-09 16:50:04 +01:00
Guillaume Abrioux	93cdbddd78	tests: test ooo_collocation against v3.0.3 ceph-container image Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `811f043947`)	2018-11-09 16:48:35 +01:00
Sébastien Han	12ce311da5	rbd-mirror: enable ceph-rbd-mirror.target Without this the daemon will never start after reboot. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `b7a791e902`)	2018-11-09 16:48:35 +01:00
Andrew Schoen	ee883aa9f2	validate: do not validate ceph_repository if deploying containers Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1630975 Signed-off-by: Andrew Schoen <aschoen@redhat.com> (cherry picked from commit `9cd8ecf0cc`)	2018-11-09 15:14:40 +00:00
Guillaume Abrioux	d5409109fb	rgw: move multisite default variables in ceph-defaults Move all rgw multisite variables in ceph-defaults so ceph-validate can go through them. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-30 17:41:35 +01:00
Guillaume Abrioux	f52344300a	tests: add more memory for rgw_multisite scenarios Adding more memory to VMs for rgw_multisite scenarios could avoid this error I have recently hit in the CI: (It is worth it to set 1024Mb since there are only 2 nodes in those scenarios.) ``` fatal: [osd0]: FAILED! => { "changed": false, "cmd": [ "docker", "run", "--rm", "--entrypoint", "/usr/bin/ceph", "docker.io/ceph/daemon:latest-luminous", "--version" ], "delta": "0:00:04.799084", "end": "2018-10-29 17:10:39.136602", "rc": 1, "start": "2018-10-29 17:10:34.337518" } STDERR: Traceback (most recent call last): File "/usr/bin/ceph", line 125, in <module> import rados ImportError: libceph-common.so.0: cannot map zero-fill pages: Cannot allocate memory ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-30 14:00:28 +01:00
Guillaume Abrioux	547e90f281	rgw: move multisite related tasks after docker/main.yml We must play this task after the container has started otherwise rgw_multisite tasks will fail. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-30 14:00:28 +01:00
Guillaume Abrioux	710e11668d	rgw: add rgw_multisite for containerized deployments run commands on containers when containerized deployments. (At the moment, all commands are run on the host only) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-30 14:00:28 +01:00
Guillaume Abrioux	37970a5b3c	tests: add rgw_multisite functional test Add a playbook that will upload a file on the master then try to get info from the secondary node, this way we can check if the replication is ok. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-30 14:00:28 +01:00
Guillaume Abrioux	4d464c1003	rgw: add testing scenario for rgw multisite This will set up 2 clusters with rgw multisite enabled. The first cluster will act as the 'master', the 2nd will be the secondary one. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-30 14:00:28 +01:00
Guillaume Abrioux	fe88c89c9c	validate: remove check on rgw_multisite_endpoint_addr definition since `rgw_multisite_endpoint_addr` has a default value to `{{ ansible_fqdn }}`, it shouldn't be mandatory to set this variable. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-30 14:00:28 +01:00
Ali Maredia	59e6d04f9b	rgw: add ceph-validate tasks for multisite, other fixes - updated README-MULTISITE - re-added destroy.yml - added tasks in ceph-validate to make sure the rgw multisite vars are set Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-10-30 14:00:28 +01:00
Guillaume Abrioux	77d5d128c3	rgw: add a dedicated variable for multisite endpoint We should give users the possibility to set the IP they want as multisite endpoint, setting the default value to `{{ ansible_fqdn }}` to not force them to set this variable. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-30 14:00:28 +01:00
Ali Maredia	474f151450	rgw: update rgw multisite tasks - remove destroy tasks - cleanup conditionals and syntax - remove unnecessary realm pulls - enable multisite to be tested in automated testing infra - add multisite related vars to main.yml and group_vars - update README-MULTISITE - ensure all `radosgw-admin` commands are being run on a mon Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-10-30 14:00:28 +01:00
Sébastien Han	9e87a5ae5e	travis: add ansible-galaxy integration This instructs Travis to notify Galaxy when a build completes. Since 3.0 the ansible-galaxy has the ability to build and push roles from repos with multiple roles. Closes: https://github.com/ceph/ceph-ansible/issues/3165 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-30 13:45:30 +01:00
Sébastien Han	49d4b65751	gitignore: add mergify and travis as exceptions Git must notice changes from .travis.yml and .mergify.yml Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-30 13:45:30 +01:00
Sébastien Han	b8a203bacf	contrib: rm script push-roles-to-ansible-galaxy.sh The script is not used anymore and soon Travis CI will do this job of pushing the role into the galaxy. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-30 13:45:30 +01:00
Sébastien Han	0e659caf77	cleanup repos's root Remove old files and move scripts to the contrib directory. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-30 10:37:48 +00:00
Maciej Naruszewicz	252d0f9cf2	ceph-volume: fix TypeError exception when setting osds-per-device > 1 osds-per-device needs to be passed to run_command as a string. Otherwise, expandvars method will try to iterate over an integer. Signed-off-by: Maciej Naruszewicz <maciej.naruszewicz@intel.com>	2018-10-29 21:56:37 +01:00
Sébastien Han	22aed97266	testinfra: change test osds for containers We do not use @<device> anymore so we don't need to perform the readlink check anymore. Also we are making an exception for ooo which is still using ceph-disk. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-29 18:31:17 +01:00
Sébastien Han	1df0a7acce	ceph_volume: add container support for batch https://tracker.ceph.com/issues/36363 has been resolved and the patch has been backported to luminous and mimic so let's enable the container support. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1541415 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-29 18:31:17 +01:00
Sébastien Han	1cdec4069a	test_osd: dynamically get the osd container Do not enforce the container name since this will fail when we have multiple VMs running OSDs. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-29 15:33:12 +01:00
Sébastien Han	876f6ced74	test: convert all the tests to use lvm ceph-disk is now deprecated in ceph-ansible so let's convert all the ci tests to use lvm instead of ceph-disk. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-29 15:33:12 +01:00
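
Note on commit 252d0f9cf2 (osds-per-device must be a string): the failure mode described there can be reproduced with the standard library alone. The sketch below is illustrative only and is not the ceph-ansible module code; `build_batch_command` and `expand_all` are hypothetical helper names, and `expand_all` only imitates a command runner that calls `os.path.expandvars` on every argument, which is the behaviour the commit message attributes to run_command.

```python
import os


def build_batch_command(devices, osds_per_device=1):
    """Build an argument list for a `ceph-volume lvm batch` call.

    Hypothetical helper: every element is kept as a string so that a
    runner which expands environment variables in each argument can
    process the whole list.
    """
    cmd = ["ceph-volume", "lvm", "batch"]
    if osds_per_device > 1:
        # str() is the important part: a bare integer here breaks expandvars.
        cmd.extend(["--osds-per-device", str(osds_per_device)])
    cmd.extend(devices)
    return cmd


def expand_all(args):
    """Imitate a runner that expands environment variables per argument."""
    return [os.path.expandvars(arg) for arg in args]


if __name__ == "__main__":
    cmd = build_batch_command(["/dev/sdb", "/dev/sdc"], osds_per_device=2)
    print(expand_all(cmd))

    try:
        # Passing the integer unconverted triggers the TypeError.
        expand_all(["--osds-per-device", 2])
    except TypeError as exc:
        print("TypeError:", exc)
```

Converting at the point where the argument list is built keeps the fix local: callers can continue to pass an integer, and the runner never sees a non-string argument.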


4229 Commits (8a74928a194b8ed6ac5cfff4ab1724f9226fee2c)
