ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Dimitri Savineau	54128db5cd	ceph-osd: Fix merge conflict from mergify The PR #3916 was merged automatically by mergify even though there was a conflict in the ceph-osd-run.sh.j2 template. This commit resolves the conflict. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-24 12:41:23 -04:00
Dimitri Savineau	3ae2a687ed	ceph-osd: Increase cpu limit to 4 In containerized deployment the default osd cpu quota is too low for production environment using NVMe devices. This is causing performance degradation compared to bare-metal. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695880 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `c17106874c`) # Conflicts: # roles/ceph-osd/templates/ceph-osd-run.sh.j2	2019-04-24 16:02:28 +00:00
Dimitri Savineau	c056ae7b8c	ansible.cfg: Add library path to configuration Ceph module path needs to be configured if we want to avoid issues like: no action detected in task. This often indicates a misspelled module name, or incorrect module path Currently the ansible-lint command in Travis CI complains about that. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1668478 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `a1a871cade`)	2019-04-24 07:49:48 +00:00
Matthew Vernon	1556d802ff	ceph-mon: increase timeout waiting for admin and bootstrap keys With a large and/or busy cluster, it can take significantly more than 30s for a restarted monitor to get to the point where `ceph-create-keys` returns successfully. A recent upgrade of our production cluster failed here because it took a couple of minutes for the newly-upgraded `mon` to be ready. So increase the timeout significantly. This patch is applied to stable-3.2, because the affected code is refactored in stable-4.0 and ceph-create-keys is no longer called. Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>	2019-04-12 17:03:39 +00:00
Dimitri Savineau	f3785ef7dd	tests: Add debug to ceph-override.json It's useful to have logs in debug mode enabled in order to have more information for developers. Also reindent the json file. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `d25af1b872`)	2019-04-11 15:38:14 +00:00
Dimitri Savineau	e3e6285aa9	tests/functional: use ceph-override.json symlink We don't need to have multiple ceph-override.json copies. We currently already have symlink to all_daemons/ceph-override.json so we can do it for all scenarios. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `a19054be18`)	2019-04-11 15:38:14 +00:00
Dimitri Savineau	56215d7688	ceph-mds: Set application pool to cephfs We don't need to use the cephfs variable for the application pool name because it's always cephfs. If the cephfs variable is set to something other than the default value it will break the application pool task. Resolves: #3790 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `d2efb7f02b`)	2019-04-11 15:38:14 +00:00
Guillaume Abrioux	c5c354a61a	remove all NBSPs char in stable-3.2 branch this can cause issues, let's replace all of these chars with real spaces. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-10 13:27:48 +02:00
Matthew Vernon	a8c9b65d13	UCA: Uncomment UCA variables in defaults, fix consequent breakage The Ubuntu Cloud Archive-related (UCA) defaults in roles/ceph-defaults/defaults/main.yml were commented out, which means if you set `ceph_repository` to "uca", you get undefined variable errors, e.g. ``` The task includes an option with an undefined variable. The error was: 'ceph_stable_repo_uca' is undefined The error appears to have been in '/nfs/users/nfs_m/mv3/software/ceph-ansible/roles/ceph-common/tasks/installs/debian_uca_repository.yml': line 6, column 3, but may be elsewhere in the file depending on the exact syntax problem. The offending line appears to be: - name: add ubuntu cloud archive repository ^ here ``` Unfortunately, uncommenting these results in some other breakage, because further roles were written that use the fact of `ceph_stable_release_uca` being defined as a proxy for "we're using UCA", so try and install packages from the bionic-updates/queens release, for example, which doesn't work. So there are a few `apt` tasks that need modifying to not use `ceph_stable_release_uca` unless `ceph_origin` is `repository` and `ceph_repository` is `uca`. Closes: #3475 Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk> (cherry picked from commit `9dd913cf8a`)	2019-04-09 16:54:37 +00:00
Dimitri Savineau	efa0083f3c	ceph-osd: Drop memory flag with bluestore Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `dc1c0dcee2`)	2019-04-09 13:26:20 +00:00
Dimitri Savineau	bbb8ca6643	mon/rgw: use last ipv6 address When using monitor_address_block or radosgw_address_block variables to configure the mon/rgw address we're getting the first ip address from the ansible facts present in that cidr. When there's VIP on that network the first filter could return the wrong value. This seems to affect only IPv6 setup because the VIP addresses are added to the ansible facts at the beginning of the list. This is the opposite (at the end) when using IPv4. This causes the mon/rgw processes to bind on the VIP address. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680155 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-09 06:17:27 +02:00
Guillaume Abrioux	e8a526c5e0	tests: fix update job jenkins sets CEPH_ANSIBLE_BRANCH to stable-3.2, this makes all nightly jobs fail. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-08 09:32:43 -04:00
Ali Maredia	e943288cae	rgw multisite: add more than 1 rgw to the master or secondary zone Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1664869 Signed-off-by: Ali Maredia <amaredia@redhat.com> (cherry picked from commit `37f46a8c5d`)	2019-04-06 08:50:30 +00:00
Guillaume Abrioux	f567f66085	tests: run lvm_setup.yml on secondary cluster otherwise ceph-osd fails: ``` ceph-volume lvm prepare: error: Unable to proceed with non-existing device: test_group/data-lv2 ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-06 08:44:53 +02:00
Dimitri Savineau	d1b3d18af1	radosgw: Raise cpu limit to 8 In containerized deployment the default radosgw quota is too low for production environment. This is causing performance degradation compared to bare-metal. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680171 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `d3ae9fd05f`)	2019-04-04 19:14:28 +02:00
Guillaume Abrioux	aba3d64b87	tests: do not deploy ceph@master in rgw_multisite deploying ceph@master in stable-3.2 is not possible. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-04 17:52:19 +02:00
Guillaume Abrioux	82ed220367	tests: add back testinfra testing `136bfe0` removed testinfra testing on all scenarios except all_daemons Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `8d106c2c58`)	2019-04-04 10:36:34 +00:00
Guillaume Abrioux	68a832e3c8	tests: pin pytest-xdist to 1.27.0 looks like newer version of pytest-xdist requires pytest>=4.4.0 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ba0a95211c`)	2019-04-04 10:36:34 +00:00
Guillaume Abrioux	7136f1734e	purge: fix lvm-batch purge osd `lvm_volumes` and/or `devices` variable(s) can be undefined depending on the scenario chosen. These tasks should be run only if these variables are defined, otherwise it ends up with undefined variable errors. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0180738313`)	2019-04-03 08:48:39 +02:00
Guillaume Abrioux	3421cb08d9	tests: test idempotency only on all_daemons job there's no need to test this on all scenarios. testing idempotency on all_daemons should be enough and allow us to save precious resources for the CI. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `136bfe096c`)	2019-04-02 15:28:28 +00:00
Dimitri Savineau	fa6d9c940a	rolling_update: Update systemd unit regex for nvme The systemd unit regex doesn't handle nvme devices (/dev/nvmeXn1). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1687828 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `c8442f3705`)	2019-04-01 15:22:24 +00:00
Guillaume Abrioux	f200f1ca87	tests: refact update scenario (stable-3.2) refact the update scenario like it has been made in master. (see `f0e616962`) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-01 16:35:24 +02:00
Dimitri Savineau	8e2cfd9d24	purge-docker-cluster: Remove ceph-osd service The systemd ceph-osd@.service file used for starting the ceph osd containers is used in all osd_scenarios. Currently purging a containerized deployment using the lvm scenario doesn't remove the ceph-osd systemd service. If the next deployment is a non-containerized deployment, the OSDs won't be online because the file is still present and overrides the one from the package. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `7cc626b72d`)	2019-04-01 09:10:29 +00:00
Dimitri Savineau	e08846c14c	tox: Fix container purge jobs On containerized CI jobs the playbook executed is purge-cluster.yml but it should be set to purge-docker-cluster.yml Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `bd0869cd01`)	2019-04-01 06:59:15 +00:00
Guillaume Abrioux	005cb09ba9	tests: add mgr and nfs nodes in all_daemons even not used, we need to fire up those VMs to be able to perform the upgrade in the CI. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-28 15:40:43 +01:00
Dimitri Savineau	e994dabaec	Add uca to ceph_repository choices validation Ubuntu cloud archive is configurable via ceph_repository variable but the uca choice isn't accepted. This commit fixes this issue and also validates the associated uca repository variables. Resolves: #3739 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `94505a3af2`)	2019-03-26 10:27:08 +00:00
Guillaume Abrioux	b92c826661	defaults: change default value for ceph_docker_image_tag Since nautilus has been released, it's now the latest stable release, it means the tag `latest` now refers to nautilus. `stable-3.2` isn't intended to deploy nautilus, therefore, we should change the default value for this variable to the latest release stable-3.2 is able to deploy (mimic). Closes: #3734 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-21 18:37:21 +00:00
Dimitri Savineau	e4a71eabd9	ceph-osd: Ensure lvm2 is installed When using osd_scenario lvm, we never check if the lvm2 package is present on the host. When using containerized deployment and docker on CentOS/RedHat this package will be automatically installed as a dependency but not for Ubuntu distribution. OSD deployed via ceph-volume require the lvmetad.socket to be active and running. Resolves: #3728 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `179fdfbc19`)	2019-03-20 22:59:28 +00:00
Bruceforce	567ad1826b	ceph_crush: fix rstrip for python 3 Removing bytes literals since rstrip only supports type String or None. Please backport to stable-3.2 Signed-off-by: Bruceforce <markus.greis@gmx.de> (cherry picked from commit `6d506dba1a`)	2019-03-20 01:20:30 +00:00
Bruceforce	2590d3cfba	ceph_volume: fix rstrip for python 3 Removing bytes literals since rstrip only supports type String or None. Signed-off-by: Bruceforce <markus.greis@gmx.de>	2019-03-19 18:53:50 +00:00
Phuong Nguyen	274bf3e038	Remove trailing forward slash in ceph_docker_registry variable from group_vars/rhcs.yml.sample file. Also fixed rhcs_edits.txt for variable ceph_docker_registry. Moved namespace to ceph_docker_image variable. Signed-off-by: Phuong Nguyen <pnguyen@redhat.com> (cherry picked from commit `3305309e87`)	2019-03-19 14:40:27 +00:00
Guillaume Abrioux	d3f6556041	osd: backward compatibility with old disk_list.sh location Since all files in container image have moved to `/opt/ceph-container` this check must look for new AND the old path so it's backward compatible. Otherwise it could end up by templating an inconsistent `ceph-osd-run.sh`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `987bdac963`)	2019-03-18 21:56:53 +00:00
Dimitri Savineau	46e8898093	ceph-validate: fail if there's no ipaddr available in monitor_address_block subnet When using monitor_address_block to determine the ip address of the monitor node, we need an ip address available in that cidr to be present in the ansible facts (ansible_all_ipv[46]_addresses). Currently we don't check if there's an ip address available during the ceph-validate role. As a result, the ceph-config role fails due to an empty list during ceph.conf template creation but the error isn't explicit. TASK [ceph-config : generate ceph.conf configuration file] ***** fatal: [0]: FAILED! => {"msg": "No first item, sequence was empty."} With this patch we will fail before the ceph deployment with an explicit failure message. Resolves: rhbz#1673687 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `5c39735be5`)	2019-03-18 18:31:18 +00:00
Gregory Orange	86e39a29c8	Change docker_container parameter network to network_mode Addressing "populate kv_store with custom ceph.conf": Unsupported parameters for (docker_container) module. Looking at https://docs.ansible.com/ansible/latest/modules/docker_container_module.html shows that the correct parameter is network_mode, not network. Signed-off-by: Gregory Orange <gregoryo2014@users.noreply.github.com>	2019-03-18 13:23:10 +00:00
Dimitri Savineau	bfa99cdd53	Set the default crush rule in ceph.conf Currently the default crush rule value is added to the ceph config on the mon nodes as an extra configuration applied after the template generation via the ansible ini module. This implies two behaviors: 1/ On each ceph-ansible run, the ceph.conf will be regenerated via ceph-config+template and then ceph-mon+ini_file. This leads to an unnecessary daemon restart. 2/ When other ceph daemons are collocated on the monitor nodes (like mgr or rgw), the default crush rule value will be erased by the ceph.conf template (mon -> mgr -> rgw). This patch adds the osd_pool_default_crush_rule config to the ceph template and only for the monitor nodes (like crush_rules.yml). The default crush rule id is read (if it exists) from the current ceph configuration. The default configuration is -1 (ceph default). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1638092 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `d8538ad4e1`)	2019-03-14 14:48:03 +00:00
Dimitri Savineau	ef9525482b	add-osd.yml: Add become flag for ceph-validate The check_devices task fails if the ceph-validate role isn't executed as a privileged user (Permission denied). failed: [osd0] (item=/dev/sdb) => {"changed": false, "err": "Error: Error opening /dev/sdb: Permission denied\n", "item": "/dev/sdb", "msg": "Error while getting device information with parted script: '/sbin/parted -s -m /dev/sdb -- unit 'MiB' print'", "out": "", "rc": 1} Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `b23c05ae52`)	2019-03-12 14:48:03 +01:00
Dimitri Savineau	2f3206abeb	ceph-osd: Install numactl package when needed With `3e32dce` we can run OSD containers with numactl support. When using numactl command in a containerized deployment we need to be sure that the corresponding package is installed on the host. The package installation is only executed when the ceph_osd_numactl_opts variable isn't empty. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `b7f4e3e7c7`)	2019-03-12 08:14:47 +00:00
Guillaume Abrioux	34086ec233	osd: support numactl options on OSD activate This commit adds OSD containers activate with numactl support. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1684146 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `b3eb9206fa`)	2019-03-11 09:50:29 +00:00
Guillaume Abrioux	224bab0d70	tests: add mgrs section in non_container-collocation No mgrs are deployed in this scenario, causing the testinfra jobs to fail. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-05 10:49:45 +01:00
Guillaume Abrioux	36fafadc67	tests: fix collocation scenario ceph_origin and ceph_repository are mandatory variables. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-05 10:49:45 +01:00
Guillaume Abrioux	e548a9ae7c	tests: use memory backend for cache fact force ansible to generate facts for each run. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4a1bafdc21`)	2019-03-05 08:40:11 +01:00
Guillaume Abrioux	1209fb1874	tests: pin testinfra version As of testinfra 2.0.0, the binary name is `py.test`. But let's pin the version to 1.19.0. Indeed, migrating to 2.0.0 requires our current testing to be reworked a bit. Since we don't have the bandwidth ATM for this, it's better to simply keep testing with testinfra 1.19.0. Note that I've replaced all `testinfra` occurrences by `py.test` anyway. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `b42250332a`)	2019-03-04 15:48:44 +00:00
Guillaume Abrioux	4dd46ec396	add-osd: gather facts in second part of playbook otherwise, it will end up with error like following: ``` FAILED! => {"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_hostname'"} ``` because facts won't have been gathered. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1670663 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `a440878533`)	2019-03-04 15:48:44 +00:00
Guillaume Abrioux	06ad7e0b57	purge: fix rbd-mirror group name the default is rbdmirrors in ceph-defaults Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `47ebef374f`)	2019-03-01 22:16:19 +00:00
Guillaume Abrioux	a8467d8f33	purge: fix rbd mirror purge as of `b70d54ac80` the service launched isn't ceph-rbd-mirror@admin.service. it's now `ceph-rbd-mirror@rbd-mirror.{{ ansible_hostname }}` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `a915308477`)	2019-03-01 22:16:19 +00:00
Guillaume Abrioux	5470e6fa42	purge: do not remove /var/lib/apt/lists/* removing the content of this directory seems a bit aggressive and causes a redeployment to fail after a purge on debian based distributions. Typical error: ``` fatal: [mon0]: FAILED! => changed=false attempts: 3 msg: No package matching 'ceph' is available ``` The following task will consider the cache is still valid, so apt doesn't refresh it: ``` - name: update apt cache if cache_valid_time has expired apt: update_cache: yes cache_valid_time: 3600 register: result until: result is succeeded ``` since the task installing ceph packages has a `update_cache: no` it fails: ``` - name: install ceph for debian apt: name: "{{ debian_ceph_pkgs | unique }}" update_cache: no state: "{{ (upgrade_ceph_packages|bool) | ternary('latest','present') }}" default_release: "{{ ceph_stable_release_uca | default('') }}{{ ansible_distribution_release ~ '-backports' if ceph_origin == 'distro' and ceph_use_distro_backports else '' }}" register: result until: result is succeeded ``` /tmp/* isn't specific to ceph as well, so we shouldn't remove everything in this directory. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3849f30f58`)	2019-03-01 22:16:19 +00:00
Guillaume Abrioux	255eab59ac	purge: fix purge of lvm devices using `shell` module seems to be the only way to make this task working on rhel based distribution AND debian based distributions. on ubuntu, using `command` ansible module fails like following (not due to `sudo` usage or not): ``` ok: [osd1] => changed=false cmd: command -v ceph-volume failed_when_result: false msg: '[Errno 2] No such file or directory: ''command'': ''command''' rc: 2 ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `89f77589fa`)	2019-03-01 22:16:19 +00:00
VasishtaShastry	2393d82306	Extends check_devices tasks to non-collocated and lvm-batch scenarios Tuned name of a task and error message to make it more user understandable Fixes BZ 1648168 - ceph-validate : devices are not validated in non-collocated and lvm_batch scenario Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1648168 Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com> (cherry picked from commit `34c25ef49b`)	2019-03-01 04:06:57 +00:00
ToprHarley	d1051c8e55	Convert interface names to underscores Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540881 Signed-off-by: Tomas Petr <tpetr@redhat.com> (cherry picked from commit `573adce7dd`)	2019-02-28 19:02:32 +00:00
Guillaume Abrioux	de3465b6a3	osd: add ipc=host in systemd template for containers in addition to `15812970f0` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d5be83e504`)	2019-02-28 13:48:39 +00:00
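
The two `rstrip` fixes listed above (`567ad1826b` and `2590d3cfba`) remove bytes literals because, on Python 3, `str.rstrip` accepts only a string or `None`. The following is a minimal sketch of that failure mode, not the code from the `ceph_crush` or `ceph_volume` modules; the sample output string and the function names are hypothetical.

```python
# Hypothetical command output, already decoded to str as on Python 3.
out = "osd.0\r\n"


def strip_with_bytes(text):
    """Reproduce the failure: str.rstrip rejects a bytes argument on Python 3."""
    try:
        return text.rstrip(b"\r\n")
    except TypeError as exc:
        return "TypeError: {}".format(exc)


def strip_with_str(text):
    """The fixed form: pass a str argument to str.rstrip."""
    return text.rstrip("\r\n")


failed = strip_with_bytes(out)
fixed = strip_with_str(out)
print(failed)
print(fixed)
```

Python 2 accepted the bytes form because `b"..."` and `"..."` are the same type there, which is probably why the problem only appeared after moving to Python 3.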
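
Commit `bbb8ca6643` above describes how taking the first address inside `monitor_address_block` can select a VIP on IPv6 hosts, where VIP addresses come first in the facts. The sketch below illustrates the first-versus-last choice with the standard `ipaddress` module. It is an assumption-laden illustration, not the Jinja filter chain ceph-ansible uses: the addresses, the CIDR and the function name are all hypothetical.

```python
import ipaddress


def addresses_in_block(addresses, cidr):
    """Return the addresses that fall inside the given CIDR, keeping fact order."""
    network = ipaddress.ip_network(cidr)
    return [a for a in addresses if ipaddress.ip_address(a) in network]


# Hypothetical IPv6 facts: the VIP (::100) is listed before the host address (::11).
ipv6_facts = ["2001:db8::100", "2001:db8::11", "fe80::1"]

matches = addresses_in_block(ipv6_facts, "2001:db8::/64")
first_choice = matches[0]   # the VIP, which the daemon should not bind to
last_choice = matches[-1]   # the host's own address

print(first_choice, last_choice)
```

With the order reversed, as the commit message reports for IPv4, the first match would be the host address, which is consistent with the issue appearing only on IPv6 setups.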

4269 Commits (c32d690a4cd1aa60987b18f36731696d12615c0a)
