ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Dimitri Savineau	2b492e3de1	ceph-handler: Fix OSD restart script There's two big issues with the current OSD restart script. 1/ We try to test if the ceph osd daemon socket exists but we use a wildcard for the socket name : /var/run/ceph/*.asok. This fails because we usually have multiple ceph osd sockets (or other ceph daemon collocated) present in /var/run/ceph directory. Currently the test fails with: bash: line xxx: [: too many arguments But it doesn't stop the script execution. Instead we can specify the full ceph osd socket name because we already know the OSD id. 2/ The container filter pattern is wrong and could match multiple containers, causing the script to fail. We use the filter with two different patterns. One is with the device name (sda, sdb, ..) and the other one is with the OSD id (ceph-osd-0, ceph-osd-15, ..). In both cases we could match more than needed. $ docker container ls CONTAINER ID IMAGE NAMES 958121a7cc7d ceph-daemon:latest ceph-osd-strg0-sda 589a982d43b5 ceph-daemon:latest ceph-osd-strg0-sdb 46c7240d71f3 ceph-daemon:latest ceph-osd-strg0-sdaa 877985ec3aca ceph-daemon:latest ceph-osd-strg0-sdab $ docker container ls -q -f "name=sda" 958121a7cc7d 46c7240d71f3 877985ec3aca $ docker container ls CONTAINER ID IMAGE NAMES 2db399b3ee85 ceph-daemon:latest ceph-osd-5 099dc13f08f1 ceph-daemon:latest ceph-osd-13 5d0c2fe8f121 ceph-daemon:latest ceph-osd-17 d6c7b89db1d1 ceph-daemon:latest ceph-osd-1 $ docker container ls -q -f "name=ceph-osd-1" 099dc13f08f1 5d0c2fe8f121 d6c7b89db1d1 Adding an extra '$' character at the end of the pattern solves the problem. Finally removing the get_container_osd_id function because it's not used in the script at all. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `45d46541cb`)	2019-06-21 14:49:55 -04:00
Dimitri Savineau	f4212b20e5	ceph-volume: Set max open files limit on container The ceph-volume lvm list command takes ages to complete when having a lot of LV devices on containerized deployment. For instance, with 25 OSDs on a node it takes 3 mins 44s to list the OSD. Adding the max open files limit to the container engine cli when executing the ceph-volume command seems to improve the execution time a lot (~30s). This was impacting the OSDs creation with ceph-volume (both filestore and bluestore) when using multiple LV devices. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `b987534881`)	2019-06-20 20:01:13 -04:00
Guillaume Abrioux	f29366b848	ceph-osd: do not relabel /run/udev in containerized context Otherwise content in /run/udev is mislabeled and prevent some services like NetworkManager from starting. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `80875adba7`)	2019-06-19 23:46:46 +02:00
Rishabh Dave	114078bfa1	ceph-infra: make chronyd default NTP daemon Since timesyncd is not available on RHEL-based OSs, change the default to chronyd for RHEL-based OSs. Also, chronyd is chrony on Ubuntu, so set the Ansible fact accordingly. Fixes: https://github.com/ceph/ceph-ansible/issues/3628 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `9d88d3199f`)	2019-06-18 10:46:34 +02:00
Rishabh Dave	93c7d8d79d	don't install NTPd on Atomic Since Atomic doesn't allow any installations and NTPd is not present on Atomic image we are using, abort when ntp_daemon_type is set to ntpd. https://github.com/ceph/ceph-ansible/issues/3572 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `bdff3e48fd`)	2019-06-18 10:46:34 +02:00
Dimitri Savineau	81de8a8106	remove ceph-agent role and references The ceph-agent role was used only for RHCS 2 (jewel) so it's not useful anymore. The current code will fail on CentOS distribution because the rhscon package is only available on Red Hat with the RHCS 2 repository and this ceph release is supported on stable-3.0 branch. Resolves: #4020 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `7503098ca0`)	2019-06-17 14:42:08 -04:00
Dimitri Savineau	ed9b594b80	tests: Update ansible ssh_args variable Because we're using vagrant, a ssh config file will be created for each node with options like user, host, port, identity, etc... But via tox we override ANSIBLE_SSH_ARGS to use this file. This removes the default value set in ansible.cfg. Also adding PreferredAuthentications=publickey because CentOS/RHEL servers are configured with GSSAPIAuthentication enabled for the ssh server, forcing the client to make a PTR DNS query. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `34f9d51178`)	2019-06-17 12:02:36 -04:00
Guillaume Abrioux	64659d2c82	iscsi: assign application (rbd) to pool 'rbd' if we don't assign the rbd application tag on this pool, the cluster will get `HEALTH_WARN` state like following: ``` HEALTH_WARN application not enabled on 1 pool(s) POOL_APP_NOT_ENABLED application not enabled on 1 pool(s) application not enabled on pool 'rbd' ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4cf17a6fdd`)	2019-06-13 14:43:25 +02:00
Dimitri Savineau	95f3908e44	ceph-handler: replace fuser by /proc/net/unix We're using fuser command to see if a process is using a ceph unix socket file. But the fuser command runs through every PID present in /proc/<PID> to see if one of them is using the file. On a system running thousands processes, the fuser command can take a long time to finish. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1717011 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `da9891da1e`)	2019-06-12 23:00:21 +02:00
Guillaume Abrioux	db90debcc7	validate: fail in check_devices at the right task see https://bugzilla.redhat.com/show_bug.cgi?id=1648168#c17 for details. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1648168#c17 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `771648304d`)	2019-06-10 08:09:58 +02:00
Guillaume Abrioux	62647e1935	spec: bring back possibility to install ceph with custom repo This can be seen as a regression for customers who were used to deploy in offline environment with custom repositories. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1673254 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c933645bf7`)	2019-06-07 17:29:57 +02:00
Dimitri Savineau	0b653ee5b4	update default rhcs values and docs The RHCS documentation mentioned in the default values and group_vars directory is referring to RHCS 2.x while it should be 3.x. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1702732 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-04 14:18:23 +02:00
Dimitri Savineau	b5fdf5fdcb	vagrant: Default box to centos/7 We don't use ceph/ubuntu-xenial anymore but only centos/7 and centos/atomic-host. Changing the default to centos/7. Resolves: #4036 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `24d0fd7003`)	2019-05-31 13:57:55 -04:00
Dimitri Savineau	8a74928a19	tox: Refact lvm_osds scenario The current lvm_osds only tests filestore on one OSD node. We also have bs_lvm_osds to test bluestore and encryption. Let's use only one scenario to test filestore/bluestore and with or without dmcrypt on four OSD nodes. Also use validate_dmcrypt_bool_value instead of types.boolean on dmcrypt validation via notario. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `52b9f3fb28`)	2019-05-10 11:24:32 +02:00
Mike Christie	0a24078bbb	igw: Fix rolling update service ordering We must stop tcmu-runner after the other rbd-target-* services because they may need to interact with tcmu-runner during shutdown. There is also a bug in some kernels where IO can get stuck in the kernel and by stopping rbd-target-* first we can make sure all IO is flushed. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1659611 Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `d7ef12910e`)	2019-05-10 11:12:50 +02:00
Guillaume Abrioux	900244e065	Revert "Revert "cv: support zap by osd fsid"" This reverts commit `addcc1e61a`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-10 09:13:10 +02:00
Guillaume Abrioux	f1b4874176	Revert "Revert "shrink_osd: use cv zap by fsid to remove parts/lvs"" This reverts commit `043ee8c158`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-10 09:13:10 +02:00
Guillaume Abrioux	5053f32c15	osds: allow passing devices by path ceph-volume didn't work when the devices were passed by path. Since it now supports it, let's allow this feature in ceph-ansible Closes: #3812 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `8f2c45dfd3`)	2019-05-09 14:21:43 +02:00
Guillaume Abrioux	addcc1e61a	Revert "cv: support zap by osd fsid" This reverts commit `8454f0144a`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-25 21:27:37 +02:00
Guillaume Abrioux	043ee8c158	Revert "shrink_osd: use cv zap by fsid to remove parts/lvs" This reverts commit `be59e0b451`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-25 21:27:37 +02:00
Dimitri Savineau	2fa8099fa7	osd: set default bluestore_wal_devices empty We only need to set the wal dedicated device when there's three tiers of storage used. Currently the block.wal partition will also be created on the same device than block.db. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1685253 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-25 07:13:38 +00:00
Dimitri Savineau	9ff19cc604	rolling_update: restart all ceph-iscsi services Currently only rbd-target-gw service is restarted during an update. We also need to restart tcmu-runner and rbd-target-api services during the ceph iscsi upgrade. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1659611 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `f1048627ea`)	2019-04-24 23:17:41 +00:00
Dimitri Savineau	7418999638	ceph-mds: Increase cpu limit to 4 In containerized deployment the default mds cpu quota is too low for production environment. This is causing performance degradation compared to bare-metal. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695850 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `1999cf3d19`)	2019-04-24 21:44:23 +00:00
Dimitri Savineau	54128db5cd	ceph-osd: Fix merge conflict from mergify The PR #3916 was merged automatically by mergify even if there was a conflict in the ceph-osd-run.sh.j2 template. This commit resolves the conflict. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-24 12:41:23 -04:00
Dimitri Savineau	3ae2a687ed	ceph-osd: Increase cpu limit to 4 In containerized deployment the default osd cpu quota is too low for production environment using NVMe devices. This is causing performance degradation compared to bare-metal. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695880 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `c17106874c`) # Conflicts: # roles/ceph-osd/templates/ceph-osd-run.sh.j2	2019-04-24 16:02:28 +00:00
Dimitri Savineau	c056ae7b8c	ansible.cfg: Add library path to configuration Ceph module path needs to be configured if we want to avoid issues like: no action detected in task. This often indicates a misspelled module name, or incorrect module path Currently the ansible-lint command in Travis CI complains about that. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1668478 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `a1a871cade`)	2019-04-24 07:49:48 +00:00
Matthew Vernon	1556d802ff	ceph-mon: increase timeout waiting for admin and bootstrap keys With a large and/or busy cluster, it can take significantly more than 30s for a restarted monitor to get to the point where `ceph-create-keys` returns successfully. A recent upgrade of our production cluster failed here because it took a couple of minutes for the newly-upgraded `mon` to be ready. So increase the timeout significantly. This patch is applied to stable-3.2, because the affected code is refactored in stable-4.0 and ceph-create-keys is no longer called. Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>	2019-04-12 17:03:39 +00:00
Dimitri Savineau	f3785ef7dd	tests: Add debug to ceph-override.json It's useful to have logs in debug mode enabled in order to have more information for developers. Also reindent the json file. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `d25af1b872`)	2019-04-11 15:38:14 +00:00
Dimitri Savineau	e3e6285aa9	tests/functional: use ceph-override.json symlink We don't need to have multiple ceph-override.json copies. We currently already have symlink to all_daemons/ceph-override.json so we can do it for all scenarios. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `a19054be18`)	2019-04-11 15:38:14 +00:00
Dimitri Savineau	56215d7688	ceph-mds: Set application pool to cephfs We don't need to use the cephfs variable for the application pool name because it's always cephfs. If the cephfs variable is set to something else than the default value it will break the application pool task. Resolves: #3790 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `d2efb7f02b`)	2019-04-11 15:38:14 +00:00
Guillaume Abrioux	c5c354a61a	remove all NBSPs char in stable-3.2 branch this can cause issues, let's replace all of these chars with real spaces. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-10 13:27:48 +02:00
Matthew Vernon	a8c9b65d13	UCA: Uncomment UCA variables in defaults, fix consequent breakage The Ubuntu Cloud Archive-related (UCA) defaults in roles/ceph-defaults/defaults/main.yml were commented out, which means if you set `ceph_repository` to "uca", you get undefined variable errors, e.g. ``` The task includes an option with an undefined variable. The error was: 'ceph_stable_repo_uca' is undefined The error appears to have been in '/nfs/users/nfs_m/mv3/software/ceph-ansible/roles/ceph-common/tasks/installs/debian_uca_repository.yml': line 6, column 3, but may be elsewhere in the file depending on the exact syntax problem. The offending line appears to be: - name: add ubuntu cloud archive repository ^ here ``` Unfortunately, uncommenting these results in some other breakage, because further roles were written that use the fact of `ceph_stable_release_uca` being defined as a proxy for "we're using UCA", so try and install packages from the bionic-updates/queens release, for example, which doesn't work. So there are a few `apt` tasks that need modifying to not use `ceph_stable_release_uca` unless `ceph_origin` is `repository` and `ceph_repository` is `uca`. Closes: #3475 Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk> (cherry picked from commit `9dd913cf8a`)	2019-04-09 16:54:37 +00:00
Dimitri Savineau	efa0083f3c	ceph-osd: Drop memory flag with bluestore Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `dc1c0dcee2`)	2019-04-09 13:26:20 +00:00
Dimitri Savineau	bbb8ca6643	mon/rgw: use last ipv6 address When using monitor_address_block or radosgw_address_block variables to configure the mon/rgw address we're getting the first ip address from the ansible facts present in that cidr. When there's VIP on that network the first filter could return the wrong value. This seems to affect only IPv6 setup because the VIP addresses are added to the ansible facts at the beginning of the list. This is the opposite (at the end) when using IPv4. This causes the mon/rgw processes to bind on the VIP address. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680155 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-09 06:17:27 +02:00
Guillaume Abrioux	e8a526c5e0	tests: fix update job jenkins sets CEPH_ANSIBLE_BRANCH to stable-3.2, this makes all nightly job failing. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-08 09:32:43 -04:00
Ali Maredia	e943288cae	rgw multisite: add more than 1 rgw to the master or secondary zone Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1664869 Signed-off-by: Ali Maredia <amaredia@redhat.com> (cherry picked from commit `37f46a8c5d`)	2019-04-06 08:50:30 +00:00
Guillaume Abrioux	f567f66085	tests: run lvm_setup.yml on secondary cluster otherwise ceph-osd fails: ``` ceph-volume lvm prepare: error: Unable to proceed with non-existing device: test_group/data-lv2 ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-06 08:44:53 +02:00
Dimitri Savineau	d1b3d18af1	radosgw: Raise cpu limit to 8 In containerized deployment the default radosgw quota is too low for production environment. This is causing performance degradation compared to bare-metal. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680171 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `d3ae9fd05f`)	2019-04-04 19:14:28 +02:00
Guillaume Abrioux	aba3d64b87	tests: do not deploy ceph@master in rgw_multisite deploying ceph@master in stable-3.2 is not possible. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-04 17:52:19 +02:00
Guillaume Abrioux	82ed220367	tests: add back testinfra testing `136bfe0` removed testinfra testing on all scenarios except all_daemons Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `8d106c2c58`)	2019-04-04 10:36:34 +00:00
Guillaume Abrioux	68a832e3c8	tests: pin pytest-xdist to 1.27.0 looks like newer version of pytest-xdist requires pytest>=4.4.0 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ba0a95211c`)	2019-04-04 10:36:34 +00:00
Guillaume Abrioux	7136f1734e	purge: fix lvm-batch purge osd `lvm_volumes` and/or `devices` variable(s) can be undefined depending on the scenario chosen. These tasks should be run only if these variables are defined, otherwise it ends up with undefined variable errors. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0180738313`)	2019-04-03 08:48:39 +02:00
Guillaume Abrioux	3421cb08d9	tests: test idempotency only on all_daemons job there's no need to test this on all scenarios. testing idempotency on all_daemons should be enough and allow us to save precious resources for the CI. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `136bfe096c`)	2019-04-02 15:28:28 +00:00
Dimitri Savineau	fa6d9c940a	rolling_update: Update systemd unit regex for nvme The systemd unit regex doesn't handle nvme devices (/dev/nvmeXn1). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1687828 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `c8442f3705`)	2019-04-01 15:22:24 +00:00
Guillaume Abrioux	f200f1ca87	tests: refact update scenario (stable-3.2) refact the update scenario like it has been made in master. (see `f0e616962`) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-01 16:35:24 +02:00
Dimitri Savineau	8e2cfd9d24	purge-docker-cluster: Remove ceph-osd service The systemd ceph-osd@.service file used for starting the ceph osd containers is used in all osd_scenarios. Currently purging a containerized deployment using the lvm scenario didn't remove the ceph-osd systemd service. If the next deployment is a non-containerized deployment, the OSDs won't be online because the file is still present and override the one from the package. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `7cc626b72d`)	2019-04-01 09:10:29 +00:00
Dimitri Savineau	e08846c14c	tox: Fix container purge jobs On containerized CI jobs the playbook executed is purge-cluster.yml but it should be set to purge-docker-cluster.yml Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `bd0869cd01`)	2019-04-01 06:59:15 +00:00
Guillaume Abrioux	005cb09ba9	tests: add mgr and nfs nodes in all_daemons even not used, we need to fire up those VMs to be able to perform the upgrade in the CI. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-28 15:40:43 +01:00
Dimitri Savineau	e994dabaec	Add uca to ceph_repository choices validation Ubuntu cloud archive is configurable via ceph_repository variable but the uca choice isn't accepted. This commit fixes this issue and also validates the associated uca repository variables. Resolves: #3739 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `94505a3af2`)	2019-03-26 10:27:08 +00:00
Guillaume Abrioux	b92c826661	defaults: change default value for ceph_docker_image_tag Since nautilus has been released, it's now the latest stable release, it means the tag `latest` now refers to nautilus. `stable-3.2` isn't intended to deploy nautilus, therefore, we should change the default value for this variable to the latest release stable-3.2 is able to deploy (mimic). Closes: #3734 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-21 18:37:21 +00:00
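
The container-name filter problem described in commit 2b492e3de1 can be sketched in a few lines. This is only an illustration, not the restart script itself: it uses Python's `re.search` to stand in for the container engine's name filter, on the assumption (suggested by the output quoted in the commit message) that the filter matches the pattern anywhere in the name. The helper `filter_by_name` is hypothetical; the container names are the ones listed in the commit message.

```python
import re


def filter_by_name(names, pattern):
    """Return the names in which `pattern` matches anywhere (unanchored search).

    Hypothetical stand-in for a container-name filter; not a real docker API.
    """
    rx = re.compile(pattern)
    return [name for name in names if rx.search(name)]


# Container names taken from the commit message of 2b492e3de1.
device_names = [
    "ceph-osd-strg0-sda",
    "ceph-osd-strg0-sdb",
    "ceph-osd-strg0-sdaa",
    "ceph-osd-strg0-sdab",
]
id_names = ["ceph-osd-5", "ceph-osd-13", "ceph-osd-17", "ceph-osd-1"]

# Without an end anchor, the pattern also matches longer names.
loose_dev = filter_by_name(device_names, "sda")
loose_id = filter_by_name(id_names, "ceph-osd-1")

# With a trailing '$', only the exact suffix matches.
strict_dev = filter_by_name(device_names, "sda$")
strict_id = filter_by_name(id_names, "ceph-osd-1$")

print(loose_dev, strict_dev)
print(loose_id, strict_id)
```

With the unanchored patterns, three containers are selected in each case; with the trailing `$`, exactly one is, which is the behaviour the commit relies on.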


4242 Commits (2b492e3de1f12e2c90b53e8c350028d833705701)