ceph-ansible

Commit Graph

Author	SHA1	Message	Date
fmount	e655038743	Set grafana_server_addr fact for ipv6 scenarios. As the bz1721914 describes, the grafana_server_addr fact is not defined if ip_version used is ipv6. This commit adds the ip_version condition to set correctly this fact. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1721914 Signed-off-by: fmount <fpantano@redhat.com>	2019-06-26 15:47:22 +02:00
Guillaume Abrioux	366b309c12	facts: fix bug in grafana_server_addr fact setting If no grafana-server group is defined while an mgr group is, that task will fail because `hostvars[groups[grafana_server_group_name][0]` can't return anything since `groups['grafana-server']` will be a non existing key. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-26 10:49:30 +02:00
Guillaume Abrioux	2b9fb377a8	nfs: add missing | bool filters To address this warning: ``` [DEPRECATION WARNING]: evaluating nfs_ganesha_dev as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-26 08:58:51 +02:00
Guillaume Abrioux	edb8d42596	nfs: remove duplicate task This task is already present in pre_requisite_non_container.yml Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-26 08:58:51 +02:00
Dimitri Savineau	45d46541cb	ceph-handler: Fix OSD restart script There are two big issues with the current OSD restart script. 1/ We try to test if the ceph osd daemon socket exists but we use a wildcard for the socket name: /var/run/ceph/*.asok. This fails because we usually have multiple ceph osd sockets (or other collocated ceph daemons) present in the /var/run/ceph directory. Currently the test fails with: bash: line xxx: [: too many arguments But it doesn't stop the script execution. Instead we can specify the full ceph osd socket name because we already know the OSD id. 2/ The container filter pattern is wrong and could match multiple containers, causing the script to fail. We use the filter with two different patterns. One is with the device name (sda, sdb, ..) and the other one is with the OSD id (ceph-osd-0, ceph-osd-15, ..). In both cases we could match more than needed. $ docker container ls CONTAINER ID IMAGE NAMES 958121a7cc7d ceph-daemon:latest ceph-osd-strg0-sda 589a982d43b5 ceph-daemon:latest ceph-osd-strg0-sdb 46c7240d71f3 ceph-daemon:latest ceph-osd-strg0-sdaa 877985ec3aca ceph-daemon:latest ceph-osd-strg0-sdab $ docker container ls -q -f "name=sda" 958121a7cc7d 46c7240d71f3 877985ec3aca $ docker container ls CONTAINER ID IMAGE NAMES 2db399b3ee85 ceph-daemon:latest ceph-osd-5 099dc13f08f1 ceph-daemon:latest ceph-osd-13 5d0c2fe8f121 ceph-daemon:latest ceph-osd-17 d6c7b89db1d1 ceph-daemon:latest ceph-osd-1 $ docker container ls -q -f "name=ceph-osd-1" 099dc13f08f1 5d0c2fe8f121 d6c7b89db1d1 Adding an extra '$' character at the end of the pattern solves the problem. Finally, remove the get_container_osd_id function because it's not used in the script at all. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-21 19:54:15 +02:00
Dimitri Savineau	dc187ea6fa	Change ansible_lsb by ansible_distribution_release The ansible_lsb fact is based on the lsb package (lsb-base, lsb-release or redhat-lsb-core). If the package isn't installed on the remote host then the fact isn't populated. -------- "ansible_lsb": {}, -------- Switching to the ansible_distribution_release fact instead. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-21 11:55:05 -04:00
fpantano	ba73dc7b21	Add higher retry/delay defaults to check the quorum status. As per bz1718981, this commit adds higher values to check the quorum status. This is helpful for several OSP deployments that fail during the scale up. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1718981 Signed-off-by: fpantano <fpantano@redhat.com>	2019-06-20 22:39:57 +02:00
Dimitri Savineau	b987534881	ceph-volume: Set max open files limit on container The ceph-volume lvm list command takes ages to complete when there are a lot of LV devices on a containerized deployment. For instance, with 25 OSDs on a node it takes 3 mins 44s to list the OSDs. Adding the max open files limit to the container engine cli when executing the ceph-volume command seems to improve the execution time a lot (~30s). This was impacting OSD creation with ceph-volume (both filestore and bluestore) when using multiple LV devices. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-20 22:37:40 +02:00
Guillaume Abrioux	46a2683944	facts: add a retry on get current fsid task Sometimes the following task fails: ``` TASK [ceph-facts : get current fsid] ***************************************** task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-centos-container-update/roles/ceph-facts/tasks/facts.yml:78 Wednesday 19 June 2019 18:12:49 +0000 (0:00:00.203) 0:02:39.995 **** fatal: [mon2 -> mon1]: FAILED! => changed=true cmd: - timeout - --foreground - -s - KILL - 600s - docker - exec - ceph-mon-mon1 - ceph - --cluster - ceph - daemon - mon.mon1 - config - get - fsid delta: '0:00:00.239339' end: '2019-06-19 18:12:49.812099' msg: non-zero return code rc: 22 start: '2019-06-19 18:12:49.572760' stderr: 'admin_socket: exception getting command descriptions: [Errno 2] No such file or directory' stderr_lines: <omitted> stdout: '' stdout_lines: <omitted> ``` Not sure exactly why, since just before this task mon1 seems to be up; otherwise it wouldn't have passed the task `waiting for the containerized monitor to join the quorum`. As a quick fix/workaround, let's add a retry which allows us to get around this situation: ``` TASK [ceph-facts : get current fsid] *************************************** task path: /home/jenkins-build/build/workspace/ceph-ansible-scenario/roles/ceph-facts/tasks/facts.yml:78 Thursday 20 June 2019 15:35:07 +0000 (0:00:00.201) 0:03:47.288 ******* FAILED - RETRYING: get current fsid (3 retries left). changed: [mon2 -> mon1] => changed=true attempts: 2 cmd: - timeout - --foreground - -s - KILL - 600s - docker - exec - ceph-mon-mon1 - ceph - --cluster - ceph - daemon - mon.mon1 - config - get - fsid delta: '0:00:00.290252' end: '2019-06-20 15:35:13.960188' rc: 0 start: '2019-06-20 15:35:13.669936' stderr: '' stderr_lines: <omitted> stdout: |- { "fsid": "153e159d-7ade-42a7-842c-4d04348b901e" } stdout_lines: <omitted> ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-20 13:13:04 -04:00
Dimitri Savineau	7c3640177b	roles: Remove useless become (true) flag We already set the become flag to true at a play level in the site* playbooks so we don't need to set it at a task level. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-19 10:31:32 +02:00
Guillaume Abrioux	eece362b38	osd: remove legacy task `parted_results` isn't used anymore in the playbook. By the way, `parted` seems to cause issue because it changes the ownership on devices: ``` root@osd0 ~]# ls -l /dev/sdc* brw-rw----. 1 root disk 8, 32 Jun 11 08:53 /dev/sdc brw-rw----. 1 ceph ceph 8, 33 Jun 11 08:53 /dev/sdc1 brw-rw----. 1 ceph ceph 8, 34 Jun 11 08:53 /dev/sdc2 [root@osd0 ~]# parted -s /dev/sdc print Model: ATA QEMU HARDDISK (scsi) Disk /dev/sdc: 53.7GB Sector size (logical/physical): 512B/512B Partition Table: gpt Disk Flags: Number Start End Size File system Name Flags 1 1049kB 1075MB 1074MB ceph block.db 2 1075MB 2149MB 1074MB ceph block.db [root@osd0 ~]# #We can see ownerships have changed from ceph:ceph to root:disk: [root@osd0 ~]# ls -l /dev/sdc* brw-rw----. 1 root disk 8, 32 Jun 11 08:57 /dev/sdc brw-rw----. 1 root disk 8, 33 Jun 11 08:57 /dev/sdc1 brw-rw----. 1 root disk 8, 34 Jun 11 08:57 /dev/sdc2 [root@osd0 ~]# ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-18 12:45:01 -04:00
Dimitri Savineau	34f9d51178	tests: Update ansible ssh_args variable Because we're using vagrant, an ssh config file will be created for each node with options like user, host, port, identity, etc. But via tox we're overriding ANSIBLE_SSH_ARGS to use this file. This removes the default value set in ansible.cfg. Also add PreferredAuthentications=publickey because CentOS/RHEL servers are configured with GSSAPIAuthentication enabled for the ssh server, forcing the client to make a PTR DNS query. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-17 09:24:24 +02:00
Rishabh Dave	9d88d3199f	ceph-infra: make chronyd default NTP daemon Since timesyncd is not available on RHEL-based OSs, change the default to chronyd for RHEL-based OSs. Also, chronyd is chrony on Ubuntu, so set the Ansible fact accordingly. Fixes: https://github.com/ceph/ceph-ansible/issues/3628 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-06-13 14:53:22 -04:00
Rishabh Dave	d1c266e6c7	ceph-infra: update cache for Ubuntu Ubuntu-based CI jobs often fail with error code 404 while installing NTP daemons. Updating cache beforehand should fix the issue. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-06-13 14:00:29 +02:00
Rishabh Dave	67071c3169	align cephfs pool creation The definitions of cephfs pools should match openstack pools. Signed-off-by: Rishabh Dave <ridave@redhat.com> Co-Authored-by: Simone Caronni <simone.caronni@teralytics.net>	2019-06-13 09:44:05 +02:00
Guillaume Abrioux	4cf17a6fdd	iscsi: assign application (rbd) to pool 'rbd' if we don't assign the rbd application tag on this pool, the cluster will get `HEALTH_WARN` state like following: ``` HEALTH_WARN application not enabled on 1 pool(s) POOL_APP_NOT_ENABLED application not enabled on 1 pool(s) application not enabled on pool 'rbd' ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-13 07:35:39 +02:00
Dimitri Savineau	da9891da1e	ceph-handler: replace fuser by /proc/net/unix We're using fuser command to see if a process is using a ceph unix socket file. But the fuser command runs through every PID present in /proc/<PID> to see if one of them is using the file. On a system running thousands processes, the fuser command can take a long time to finish. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1717011 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-12 19:31:21 +02:00
Guillaume Abrioux	905c2256bd	mon: enforce mon0 delegation for initial_mon_key register since this task is designed to be always run on the first monitor, let's enforce the container name accordingly otherwise it could fail like following: ``` fatal: [mon1 -> mon0]: FAILED! => changed=true cmd: - docker - exec - ceph-mon-mon1 - ceph - --cluster - ceph - --name - mon. - -k - /var/lib/ceph/mon/ceph-mon0/keyring - auth - get-key - mon. delta: '0:00:00.085025' end: '2019-06-12 06:12:27.677936' msg: non-zero return code rc: 1 start: '2019-06-12 06:12:27.592911' stderr: 'Error response from daemon: No such container: ceph-mon-mon1' stderr_lines: <omitted> stdout: '' stdout_lines: <omitted> ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-12 11:21:19 -04:00
Guillaume Abrioux	27856cc499	dashboard: add allow_embedding support Add a variable to support the allow_embedding support. See ceph/ceph-ansible/issues/4084 for details. Fixes: #4084 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-12 16:00:32 +02:00
Guillaume Abrioux	2c9cd9d9e7	dashboard: fix dashboard_url setting This setting must be set to something resolvable. See: ceph/ceph-ansible/issues/4085 for details Fixes: #4085 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-12 15:59:58 +02:00
Dimitri Savineau	d0840217f3	ceph-node-exporter: Fix systemd template `069076b` introduced a bug in the systemd unit script template. This commit fixes the options used by the node-exporter container. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-11 21:48:40 +02:00
Dimitri Savineau	dbf81b6b5b	ceph-node-exporter: use modprobe ansible module Instead of using the modprobe command from the path in the systemd unit script, we can use the modprobe ansible module. That way we don't have to manage the binary path based on the linux distribution. Resolves: #4072 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-11 21:40:50 +02:00
fmount	069076bbfd	Fix units and add ability to have a dedicated instance Few fixes on systemd unit templates for node_exporter and alertmanager container parameters. Added the ability to use a dedicated instance to deploy the dashboard components (prometheus and grafana). This commit also introduces the grafana_group_name variable to refer grafana group and keep consistency with the other groups. During the integration with TripleO some grafana/prometheus template variables resulted undefined. This commit adds the ability to check if the group exist and create, accordingly, different job groups in prometheus template. Signed-off-by: fmount <fpantano@redhat.com>	2019-06-10 18:18:46 +02:00
Guillaume Abrioux	771648304d	validate: fail in check_devices at the right task see https://bugzilla.redhat.com/show_bug.cgi?id=1648168#c17 for details. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1648168#c17 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-07 16:14:18 +02:00
Dimitri Savineau	f49090df7e	podman: Add systemd dependency on network.target When using podman, the systemd unit scripts don't have a dependency on the network. So we're not sure that the network is up and running when the containers are starting. With docker this behaviour is already handled because the systemd unit scripts depend on docker service which is started after the network. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-07 09:28:58 +02:00
guihecheng	35d40c65f8	Add role definitions of ceph-rgw-loadbalancer This adds support for an rgw loadbalancer based on HAProxy and Keepalived. We define a single role ceph-rgw-loadbalancer and include the HAProxy and Keepalived configurations in it. A single haproxy backend is used to balance all RGW instances and a single frontend is exported via a single port, default 80. Keepalived is used to maintain the high availability of all haproxy instances. You are free to use any number of VIPs. A single VIP is shared across all keepalived instances and there will be one master for one VIP, selected sequentially, and others serve as backups. This assumes that each keepalived instance is on the same node as one haproxy instance and we use a simple check script to detect the state of each haproxy instance and trigger the VIP failover upon its failure. Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>	2019-06-06 17:12:04 +02:00
L3D	ab54fe20ec	ansible: use 'bool' filter on boolean conditionals Running ceph-ansible produces a lot of ``[DEPRECATION WARNING]`` messages like these: ``` [DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg. ``` Now appended ``| bool`` on a lot of the affected variables. Sometimes the coding style changed from ``variable|bool`` to ``variable | bool`` (with spaces at the pipe). Closes: #4022 Signed-off-by: L3D <l3d@c3woc.de>	2019-06-06 10:21:17 +02:00
Dimitri Savineau	518ab794fb	container-common: support podman on Ubuntu Currently we're only able to use podman on ubuntu if podman's installation is done manually before the ceph-ansible execution because the deb package is present in an external repository. We already manage the docker-ce installation via an external repository so we should be able to allow the podman installation with the same mechanism too. https://github.com/containers/libpod/blob/master/install.md#ubuntu Resolves: #3947 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-05 14:07:34 +02:00
Guillaume Abrioux	80875adba7	ceph-osd: do not relabel /run/udev in containerized context Otherwise content in /run/udev is mislabeled and prevent some services like NetworkManager from starting. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-04 11:32:41 -04:00
Guillaume Abrioux	a78fb209b1	tests: test podman against atomic os instead rhel8 the rhel8 image used is an outdated beta version, it is not worth it to maintain this image upstream, since it's possible to test podman with a newer version of centos/atomic-host image. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-04 11:32:41 -04:00
Dimitri Savineau	616c484698	ceph-nfs: use template module for configuration `789cef7` introduces a regression in the ganesha configuration file generation. The new config_template module version broke it. But the ganesha.conf file isn't an ini file and doesn't really need to use the config_template module. Instead we can use the classic template module. Resolves: #4045 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-04 09:11:52 +02:00
Guillaume Abrioux	6e2e30db54	dashboard: move ceph-grafana-dashboards package installation This commit moves the package installation into ceph-dashboard role. This is needed to install the ceph dashboard json file in `/etc/grafana/dashboards/ceph-dashboard/`. Closes: #4026 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-03 13:36:38 +02:00
Guillaume Abrioux	14f5fc3c86	infra: refact dashboard firewall rules - There is no need to open ports 3000, 8234, 9283 on all nodes. - Add missing rule for alertmanager (port 9093) Closes: #4023 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-03 13:36:38 +02:00
Guillaume Abrioux	a2b6f44665	dashboard: append mgr modules to ceph_mgr_modules when `dashboard_enabled` is `True`, let's append `dashboard` and `prometheus` modules to `ceph_mgr_modules` so they are automatically loaded. Closes: #4026 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-03 13:36:38 +02:00
Dimitri Savineau	7503098ca0	remove ceph-agent role and references The ceph-agent role was used only for RHCS 2 (jewel) so it's not useful anymore. The current code will fail on CentOS distribution because the rhscon package is only available on Red Hat with the RHCS 2 repository and this ceph release is supported on stable-3.0 branch. Resolves: #4020 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-03 13:35:50 +02:00
Guillaume Abrioux	003aeea45a	validate: add a check for nfs standalone if `nfs_obj_gw` is True when deploying an internal ganesha with an external ceph cluster, `ceph_nfs_rgw_access_key` and `ceph_nfs_rgw_secret_key` must be provided so the ganesha configuration file can be generated. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-03 13:34:38 +02:00
Guillaume Abrioux	6a6785b719	nfs: support internal Ganesha with external ceph cluster This commit allows deploying an internal ganesha with an external ceph cluster. This requires defining `external_cluster_mon_ips` as a comma-separated list of external monitors. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710358 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-03 13:34:38 +02:00
Dimitri Savineau	daf92a9e1f	ceph-facts: generate fsid on mon node The fsid generation is done via a python command. When the ansible controller node only have python3 available (like RHEL 8) then the python command isn't necessarily present causing the fsid generation to fail. We already do some resource creation (like ceph keyring secret) with the python command too but from the mon node so we should do the same for fsid. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1714631 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-03 10:11:32 +02:00
Guillaume Abrioux	55420d6253	roles: introduce `ceph-container-engine` role This commit splits the current `ceph-container-common` role. This introduces a new role `ceph-container-engine` which handles the tasks specific to the installation of containers tools (docker/podman). This is needed for the ceph-dashboard implementation for 2 main reasons: 1/ Since the ceph-dashboard stack is only containerized, we must install everything needed to run containers even in non containerized deployments. Splitting this role allows us to not have to call the full `ceph-container-common` role which would run a bunch of unneeded tasks that would have been skipped anyway. 2/ The current implementation would have required to run `ceph-container-common` on all ceph-clients nodes which would have been conflicting with `9d3517c670` (we don't want to run ceph-container-common on all client nodes, see mentioned commit for more details) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-22 13:02:10 +02:00
Dimitri Savineau	f37edfa113	ceph-mgr: install python-routes for dashboard The ceph mgr dashboard requires routes python library to be installed on the system. Resolves: #3995 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-05-22 08:46:16 +02:00
Dimitri Savineau	622d9feae9	common: use gnupg instead of gpg gpg package isn't available for all Debian/Ubuntu distribution but gnupg is. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-05-21 17:53:58 +02:00
Dimitri Savineau	29b0d47c8c	ceph-prometheus: fix error in templates - remove trailing double quotes in jinja templates - add jinja filename without .j2 suffix Resolves: #4011 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-05-21 17:53:39 +02:00
Guillaume Abrioux	6ca7372a2d	config: fix ipv6 As of nautilus, if you set `ms bind ipv6 = True` you must explicitly set `ms bind ipv4 = False` too, otherwise OSDs will still try to pick up an IPv4 address. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710319 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-21 10:38:00 -04:00
Dimitri Savineau	0ee833432e	ceph-nfs: apply selinux fix anyway Because ansible_distribution_version doesn't return minor version on CentOS with ansible 2.8 we can apply the selinux anyway but only for CentOS/RHEL 7. Starting RHEL 8, there's a dedicated package for selinux called nfs-ganesha-selinux [1]. Also replace the command module + semanage by the selinux_permissive module. [1] https://github.com/nfs-ganesha/nfs-ganesha/commit/a7911f Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-05-20 13:04:58 +02:00
Dimitri Savineau	0c7fd79865	ceph-validate: use kernel validation for iscsi Ceph iSCSI gateway requires Red Hat Enterprise Linux or CentOS 7.5 or later. Because we can not check the ansible_distribution_version fact for CentOS with ansible 2.8 (returns only the major version) we can fallback by checking the kernel option. - CONFIG_TARGET_CORE=m - CONFIG_TCM_USER2=m - CONFIG_ISCSI_TARGET=m http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/ Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-05-20 13:04:58 +02:00
Guillaume Abrioux	72d8315299	switch to ansible 2.8 - remove private attribute with import_role. - update documentation. - update rpm spec requirement. - fix MagicMock python import in unit tests. Closes: #3765 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-20 13:04:58 +02:00
Dimitri Savineau	494746b7a6	common: install dependencies for apt modules When using a minimal Debian/Ubuntu distribution there's no ca-certificates and gpg packages installed so the apt modules will fail: Failed to find required executable gpg in paths: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin apt.cache.FetchFailedException: W:https://download.ceph.com/debian-luminous/dists/bionic/InRelease: No system certificates available. Try installing ca-certificates. Resolves: #3994 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-05-20 10:02:43 +02:00
Guillaume Abrioux	9f0d4d6847	dashboard: move defaults variables to ceph-defaults There is no need to have default values for these variables in each roles since there is no corresponding host groups Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
Guillaume Abrioux	e74d80e72f	rename docker_exec_cmd variable This commit renames the `docker_exec_cmd` variable to `container_exec_cmd` so it's more generic. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
Guillaume Abrioux	cc285c417a	dashboard: align the way containers are managed This commit aligns the way the different containers are managed with how it's currently done with the other ceph daemon. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
Guillaume Abrioux	cd5f3fca64	dashboard: convert dashboard_rgw_api_no_ssl_verify to a bool make `dashboard_rgw_api_no_ssl_verify` a bool variable since it seems to be used as one. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
Guillaume Abrioux	8bbcc46ae4	dashboard: remove legacy file this file seems to be no longer used, let's remove it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
Guillaume Abrioux	14f381200d	dashboard: set less permissive permissions on dashboard certificate/key use `0440` instead of `0644` is enough Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
Guillaume Abrioux	4405f50c85	dashboard: simplify config-key command since stable-4.0 isn't meant to deploy ceph releases prior to nautilus, there's no need to add this complexity here. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
Guillaume Abrioux	cdff0da7d4	dashboard: do not call ceph-container-common from other role use site.yml to deploy ceph-container-common in order to install docker even in non-containerized deployments since there's no RPM available to deploy the different applications needed for ceph-dashboard. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
Guillaume Abrioux	742bb6214c	dashboard: use existing variable to detect containerized deployment there is no need to add more complexity for this, let's use `containerized_deployment` in order to detect if we are running a containerized deployment. The idea is to use `container_exec_cmd` the same way we do in the rest of the playbook to run the different ceph commands needed to deploy the ceph-dashboard role. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
Guillaume Abrioux	6d9dbb1d39	facts: set container_binary fact in non-containerized deployment This is needed for the ceph-dashboard implementation since it requires to run containerized application which aren't packaged as RPMs. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
Guillaume Abrioux	3578d576a4	dashboard: rename template files add .j2 to all templates file related to dashboard roles. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
Boris Ranto	b4d1c3693b	dashboard: Support podman This adds support for podman in dashboard-related roles. It also drops the creation of custom network for the dashboard-related roles as this functionality works in a different way with podman. Signed-off-by: Boris Ranto <branto@redhat.com>	2019-05-16 16:39:13 +02:00
Boris Ranto	e737a1f83e	dashboard: Set ssl_server_port if it is supported We cannot use the old fashioned config-key way, here. It was not supported when the option was introduced (post 14.2.0). Since the option is not always supported we can simply ignore the potential failure on ceph clusters that do not support it. Signed-off-by: Boris Ranto <branto@redhat.com>	2019-05-16 16:39:13 +02:00
Boris Ranto	8f77caa932	dashboard: Add and copy alerting rules This commit adds a list of alerting rules for ceph-dashboard from the old cephmetrics project. It also installs the configuration file so that the rules get recognized by the prometheus server. Signed-off-by: Boris Ranto <branto@redhat.com>	2019-05-16 16:39:13 +02:00
Boris Ranto	2f141a6e80	Merge cephmetrics/dashboard-ansible repo This commit will merge dashboard-ansible installation scripts with ceph-ansible. This includes several new roles to setup ceph-dashboard and the underlying technologies like prometheus and grafana server. Signed-off-by: Boris Ranto & Zack Cerza <team-gmeno@redhat.com> Co-authored-by: Zack Cerza <zcerza@redhat.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
Dimitri Savineau	d2ad191eca	container-common: allow podman for other distros Currently podman installation is very tied to RHEL 8 even if we're able to install it on Debian/Ubuntu distribution. This patch changes the way we are starting or not the (fat) container daemon. Before the condition was based on the distribution release and now on the container_service_name variable. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-05-13 16:24:00 +02:00
Bruceforce	c3b0ee30a1	ceph-nfs: fixed with_items If we do this in one line we get the error described in #3968 fixes #3968 Signed-off-by: Bruceforce <markus.greis@gmx.de>	2019-05-13 16:23:43 +02:00
Bruceforce	29f2c953b4	ceph-nfs: fixed condition for "stable repos specific tasks" The old condition would resolve to "when": "nfs_ganesha_stable - ceph_repository == 'community'" now it is "when": [ "nfs_ganesha_stable", "ceph_repository == 'community'" ] Please backport to stable-4.0 Signed-off-by: Bruceforce <markus.greis@gmx.de>	2019-05-13 09:53:54 +02:00
Dimitri Savineau	ba49225eab	Update RHCS version with Nautilus RHCS 4 will be based on Nautilus and only usable on RHEL 8. Updated the default ceph_rhcs_version to 4 and update the rhcs repositories to rhcs 4 with RHEL 8. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-05-13 09:53:18 +02:00
Kevin Coakley	381c58ca3e	Set the rgw_create_pools pools application to rgw Set the application to rgw for pools created from rgw_create_pools. On Ceph Nautilus the health is set to HEALTH_WARN with the message "application not enabled on X pool(s)" if an application isn't specified for a pool. Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>	2019-05-13 09:48:25 +02:00
Rishabh Dave	121b5e4184	ceph-rbd-mirror: refactor tasks/main.yml Use blocks for similar tasks in main.yml. And move when keywords before block keywords. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-05-10 09:21:54 +02:00
Rishabh Dave	1a4dccdbb9	ceph-mds: group similar tasks in create_mds_filesystem.yml Group similar tasks together using block keyword. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-05-10 09:21:05 +02:00
Guillaume Abrioux	936c6fca78	facts: fix external cluster bug running an external ceph cluster deployment with (obviously) no monitors defined in inventory breaks with an undefined error because `_monitor_addresses` never get defined. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1707460 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-07 17:53:53 +02:00
Rishabh Dave	56bfec7c58	ceph-mgr: create keys for MGRs Add code in ceph-mgr for creating a keyring for the manager so that managers can be deployed on a separate node too. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-05-07 14:13:06 +02:00
Rishabh Dave	89748d579a	don't access other node's docker_exec_cmd variable Except for some corner case, it's not correct to access some other node's copy of variable docker_exec_cmd. Therefore replace "hostvars[groups[mon_group_name][0]]['docker_exec_cmd']" by "docker_exec_cmd". Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-05-07 12:37:48 +02:00
Gaudenz Steinlin	3c8987c7a5	Fix check mode support Adds "check_mode: no" to commands which register cluster state in a variable and don't modify anything. These commands have to run in order to support running the playbook in check mode. Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>	2019-05-07 09:49:20 +02:00
Dimitri Savineau	ae266c6f2b	ansible: remove private and static attribute This will be removed in ansible 2.8 and breaks the playbook execution with this release. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-05-02 14:25:17 -04:00
Dimitri Savineau	1999cf3d19	ceph-mds: Increase cpu limit to 4 In containerized deployment the default mds cpu quota is too low for production environment. This is causing performance degradation compared to bare-metal. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695850 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-24 20:33:02 +02:00
Dimitri Savineau	c17106874c	ceph-osd: Increase cpu limit to 4 In containerized deployment the default osd cpu quota is too low for production environment using NVMe devices. This is causing performance degradation compared to bare-metal. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695880 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-24 17:59:42 +02:00
Dimitri Savineau	4ae5ce399b	ceph-iscsi: start tcmu-runner for non-container Only rbd-target-api and rbd-target-gw were started/enabled for non containerized deployment. The issue doesn't happen with containerized setup. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-24 10:03:25 +02:00
Rishabh Dave	739a662c80	improve coding style Keywords requiring only one item shouldn't express it by creating a list with single item. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-04-23 15:37:07 +02:00
Radu Toader	b2f242660e	Allow CephFS pool to be created with specific rule_name, erasure_profile just like rbd pools Signed-off-by: Radu Toader <radu.m.toader@gmail.com>	2019-04-20 02:26:05 +00:00
Dimitri Savineau	8105a1cefb	ceph-container-common: modify requirement flow Until now it was not possible to install a specific container package because it was somehow hardcoded. This patch allows overriding the container package name (docker.io vs docker-ce) and refactors the package installation. This can be achieved via the container_package_name variable. Instead of using one task per distribution we can set the package and service name in vars. This allows a unified package task. Also refactor the debian_prerequisites tasks because the content was outdated. https://docs.docker.com/install/linux/docker-ce/debian/ https://docs.docker.com/install/linux/docker-ce/ubuntu/ Resolves: #3609 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-18 16:18:01 +02:00
Guillaume Abrioux	58f3851573	mds: remove legacy task this task has nothing to do in stable-4.0 and after. Let's remove it since stable-4.0 and after aren't intended to deploy luminous. Closes: #3873 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-18 15:55:45 +02:00
Kyle Bader	0bee90b201	rgw: add cpuset support 1/ The OSD already supports cpuset to be used for containerized deployments through the use of the ceph_osd_docker_cpuset_cpus variable. This adds similar support to the RGW service for containerized deployments by setting a new variable named ceph_rgw_docker_cpuset_cpus. Like the OSD, there are times where using distinct cores has advantages over using the CFS in kernel scheduler. ceph_rgw_docker_cpuset_cpus accepts a comma delimited set of CPU ids 2/ Add support for specifying --cpuset-mem variable to restrict the cgroup's memory allocations to a particular numa node, which should typically correspond with the cpu ids of that numa node that were provided with --cpuset-cpus. To ensure the correct cpu ids are used one can run `numactl --hardware` to list the nodes and which cpu ids correspond to each. Signed-off-by: Kyle Bader <kbader@redhat.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-18 15:55:19 +02:00
Dimitri Savineau	86315272c7	ceph-mgr: Add extra module packages Since Nautilus there's mgr extra modules not present in ceph-mgr package but in dedicated packages. Resolves: #3860 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-18 15:31:22 +02:00
Guillaume Abrioux	a4bc7bda51	update: refact msgr2 migration this commit refact the msgr2 protocol introduction. If it's a fresh install, let's go with v2 only. If we upgrade to nautilus, we should go with v2+v1 syntax to ensure nothing breaks. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-18 11:16:11 +02:00
Andrew Schoen	67453853ff	rolling_update: set num_osds to the number of running osds We do this so that the ceph-config role can most accurately report the number of osds for the generation of the ceph.conf file. We don't want to use ceph-volume to determine the number of osds because in an upgrade to nautilus ceph-volume won't be able to accurately count osds created by ceph-disk. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2019-04-18 10:55:11 +02:00
Andrew Schoen	5e3dfe5021	ceph-osd: do not run lvm batch tasks during update When performing a rolling update do not try to create any new osds with `ceph-volume lvm batch`. This is troublesome because when upgrading to nautilus the devices list might contain devices that are currently being used by ceph-disk and have GPT headers on them, which will cause ceph-volume to fail when trying to use such a device. Any devices originally created by ceph-disk will need to be removed from the devices list before any new osds can be created. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2019-04-18 10:55:11 +02:00
Dimitri Savineau	c8814d1331	ceph-iscsi-gw: Remove library directory The library directory that contains the custom ceph modules is present in the ceph-ansible root directory. All igw_* modules are already present there so we don't need the one present in roles/ceph-iscsi-gw/library. Also remove the associated spec file. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-18 10:37:57 +02:00
Dimitri Savineau	e471bce76b	allow using ansible 2.8 Currently we only support ansible 2.7 We plan to use 2.8 when it will be release so we have to support both 2.7 and 2.8. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1700548 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-17 16:57:37 +02:00
Guillaume Abrioux	edfa4310d3	defaults: refact package dependencies installation. Because `5c98e361df` could be seen as a non backward compatible change this commit reverts it and bring back package dependencies installation support. Let's just modify the default value instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-16 11:07:59 -04:00
Guillaume Abrioux	83df60cbc3	defaults: remove some package dependencies These packages aren't needed anymore. They were needed for ceph-init-detect, but ceph-init-detect doesn't exist anymore. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683885 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-15 11:28:58 -04:00
Rishabh Dave	96c180cc0e	check if mon daemon is installed before restarting it Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-04-15 10:00:50 +02:00
Guillaume Abrioux	edf1ee2073	mon: check if an initial monitor keyring already exists When adding a new monitor, we must reuse the existing initial monitor keyring. Otherwise, the new monitor will issue its 'mkfs' with a new monitor keyring and it will result with a mismatch between them. The new monitor will be unable to join the quorum in the end. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-authored-by: Rishabh Dave <ridave@redhat.com>	2019-04-15 10:00:50 +02:00
Guillaume Abrioux	f899da3172	osd: remove legacy file this file is not used anymore, let's remove it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Guillaume Abrioux	4f68462009	osd: remove ceph-disk scenarios files these files aren't needed anymore since we only use lvm scenario. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Guillaume Abrioux	f0416c8892	osd: remove dedicated_devices variable This variable was related to ceph-disk scenarios. Since we are entirely dropping ceph-disk support as of stable-4.0, let's remove this variable. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Guillaume Abrioux	4d35e9eeed	osd: remove variable osd_scenario As of stable-4.0, the only valid scenario is `lvm`. Thus, this makes this variable useless. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Guillaume Abrioux	4d5637fd8a	osd: remove legacy file ceph_disk_cli_options_facts.yml is not used anymore, let's remove it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Sébastien Han	2888c0825f	validate: only check device when they are devices We only validate the devices that are passed if there is a list of devices to validate. Signed-off-by: Sébastien Han <seb@redhat.com>	2019-04-11 11:57:02 -04:00
Sébastien Han	52df15895b	osd: default osd_scenario to lvm osd_scenario has become obsolete and defaults to lvm. With lvm there is no such thing as collocated and non-collocated. Signed-off-by: Sébastien Han <seb@redhat.com>	2019-04-11 11:57:02 -04:00
Sébastien Han	9ea1e49407	validate: print a message for old scenarios ceph-disk is not supported anymore, so all the newly created OSDs will be configured using ceph-volume. Signed-off-by: Sébastien Han <seb@redhat.com>	2019-04-11 11:57:02 -04:00
Sébastien Han	e2a5aa062e	osd: remove ceph-disk support We don't support the preparation of OSD with ceph-disk. ceph-volume is only supported. However, the start operation of OSD is still supported. So let's say you change a config option, the handlers will be able to restart all the OSDs via their respective systemd unit files. Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-11 11:57:02 -04:00
Dimitri Savineau	d2efb7f02b	ceph-mds: Set application pool to cephfs We don't need to use the cephfs variable for the application pool name because it's always cephfs. If the cephfs variable is set to something other than the default value it will break the application pool task. Resolves: #3790 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-11 15:15:41 +02:00
Guillaume Abrioux	7e0adca7a4	osds: allow passing devices by path ceph-volume didn't work when the devices were passed by path. Since it now supports it, let's allow this feature in ceph-ansible Closes: #3812 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-10 13:22:30 -04:00
Guillaume Abrioux	631e5d3144	mon: remove useless delegate_to Let's use a condition to run this task only on the first mon. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-10 00:39:25 +00:00
Dimitri Savineau	d17b1b48b6	rgw: change default frontend on nautilus As discussed in ceph/ceph#26599, beast is now the default frontend for rados gateway with nautilus release. Add rgw_thread_pool_size variable with 512 as default value and keep backward compatibility with num_threads option when using civetweb. Update radosgw_civetweb_num_threads to reflect rgw_thread_pool_size change. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-09 17:21:51 +02:00
Dimitri Savineau	37816570c6	container-common: Enable docker on boot for ubuntu docker daemon is automatically started during package installation but the service isn't enabled on boot. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-09 16:50:10 +02:00
Matthew Vernon	9dd913cf8a	UCA: Uncomment UCA variables in defaults, fix consequent breakage The Ubuntu Cloud Archive-related (UCA) defaults in roles/ceph-defaults/defaults/main.yml were commented out, which means if you set `ceph_repository` to "uca", you get undefined variable errors, e.g. ``` The task includes an option with an undefined variable. The error was: 'ceph_stable_repo_uca' is undefined The error appears to have been in '/nfs/users/nfs_m/mv3/software/ceph-ansible/roles/ceph-common/tasks/installs/debian_uca_repository.yml': line 6, column 3, but may be elsewhere in the file depending on the exact syntax problem. The offending line appears to be: - name: add ubuntu cloud archive repository ^ here ``` Unfortunately, uncommenting these results in some other breakage, because further roles were written that use the fact of `ceph_stable_release_uca` being defined as a proxy for "we're using UCA", so try and install packages from the bionic-updates/queens release, for example, which doesn't work. So there are a few `apt` tasks that need modifying to not use `ceph_stable_release_uca` unless `ceph_origin` is `repository` and `ceph_repository` is `uca`. Closes: #3475 Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>	2019-04-09 13:44:00 +02:00
Dimitri Savineau	fd4b0ec7eb	ceph-facts: use last ipv6 address for mon/rgw When using monitor_address_block or radosgw_address_block variables to configure the mon/rgw address we're getting the first ip address from the ansible facts present in that cidr. When there's VIP on that network the first filter could return the wrong value. This seems to affect only IPv6 setup because the VIP addresses are added to the ansible facts at the beginning of the list. This is the opposite (at the end) when using IPv4. This causes the mon/rgw processes to bind on the VIP address. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680155 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-09 06:17:05 +02:00
François Lafont	4c3e77d869	ceph-rgw: Fix bad paths which depend on the clustername The path of the RGW environment file (in the /var/lib/ceph/radosgw/ directory) depends on the Ceph clustername. It was not taken into account in the Ansible role `ceph-rgw`. Signed-off-by: flaf <francois.lafont.1978@gmail.com>	2019-04-09 06:16:31 +02:00
Guillaume Abrioux	cbfdbab177	mgr: manage mgr modules when mgr and mon are collocated When mgrs are implicitly collocated on monitors (no mgrs in mgrs group). That include was skipped because of this condition : `inventory_hostname == groups[mgr_group_name][0]` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-09 06:12:29 +02:00
Guillaume Abrioux	f596cc1711	mgr: wait for all mgr to be available before managing mgr modules, we must ensure all mgr are available otherwise we can hit failure like following: ``` stdout:Error ENOENT: all mgr daemons do not support module 'restful', pass --force to force enablement ``` It happens because all mgr are not yet available when trying to manage with mgr modules. Closes: #3100 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-09 06:12:29 +02:00
Rishabh Dave	c0dfa9b61a	allow adding a MDS to already deployed cluster Add a tox scenario that adds an new MDS node as a part of already deployed Ceph cluster and deploys MDS there. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-04-08 13:33:28 +02:00
Ali Maredia	37f46a8c5d	rgw multisite: add more than 1 rgw to the master or secondary zone Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1664869 Signed-off-by: Ali Maredia <amaredia@redhat.com>	2019-04-06 08:01:19 +02:00
Dimitri Savineau	d3ae9fd05f	radosgw: Raise cpu limit to 8 In containerized deployment the default radosgw quota is too low for production environment. This is causing performance degradation compared to bare-metal. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680171 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-04 18:50:48 +02:00
fpantano	afbb90e4ac	Check ceph_health_raw.stdout value as string during mon bootstrap According to rdo testing https://review.rdoproject.org/r/#/c/18721 a check on the output of the ceph_health value is added to allow the playbook to make several attempts (according to the retry/delay variables) when waiting for the cluster quorum or when the container bootstrap has not finished. It avoids the failure of the command execution when it doesn't receive a valid json object to decode (because the cluster is too slow to bootstrap compared to ceph-ansible task execution). Signed-off-by: fpantano <fpantano@redhat.com>	2019-04-03 20:55:05 +00:00
Dimitri Savineau	7e5e4229b7	ceph-volume: Add PYTHONIOENCODING env variable Since https://github.com/ceph/ceph/commit/77912c0 ceph-volume uses stdout encoding based on LC_CTYPE and PYTHONIOENCODING environment variables. Those variables aren't set when using ansible. Currently this commit breaks non containerized deployment on Ubuntu. TASK [use ceph-volume to create bluestore osds] ******************** cmd: - ceph-volume - --cluster - ceph - lvm - create - --bluestore - --data - /dev/sdb rc: 1 stderr: \|- Traceback (most recent call last): (...) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 132: ordinal not in range(128) Note that the task is failing on ansible side due to the stdout decoding but the osd creation is successful. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-02 12:41:55 +02:00
Rishabh Dave	4241b6403f	merge task blocks if their execution is based on same conditions Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-03-29 16:16:04 +00:00
Rishabh Dave	e0beaf123a	"when" keyword should precede "block" keyword Otherwise the reader is forced to search for "when" when blocks are too long. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-03-29 16:16:04 +00:00
Guillaume Abrioux	f55e2b08be	remove all NBSPs on master branch Similar to #3658 Since there's too many changes between master and stable branches let's commit directly in each branches instead of trying to backport this commit. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-28 11:57:55 +00:00
Dimitri Savineau	40a8e1160c	container: Add python3-docker on Ubuntu bionic When installing python-minimal on Ubuntu bionic, this will add the /usr/bin/python symlink to the default python interpreter. On bionic, this isn't python2 but python3. $ /usr/bin/python --version Python 3.6.7 The python docker library is only installed for python2 which causes issues when running the purge-docker-cluster playbook. This playbook uses the ansible docker modules and requires to have python bindings installed on the remote host. Without the bindings we can see python error reported by the docker module. msg: Failed to import docker or docker-py - No module named 'docker'. Try `pip install docker` or `pip install docker-py` (Python 2.6) Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-28 08:03:58 +00:00
Guillaume Abrioux	6f47c20c3a	rgw: fix a typo `ee2d52d33d` introduced a typo. This commit fixes it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	3c4f464c54	rgw: cleanup legacy task this task was here for backward compatibility. It's time to remove it in the next release. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	9134624578	rgw: add a retry on pool related tasks sometimes those tasks might fail because of a timeout. I've been facing this several times in the CI, adding this retry might help and won't hurt in any case. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	f6e0185146	update: add containerized deployment upgrade support (L->N) Add a couple of fixes to allow containerized deployments upgrade support to upgrade from luminous/mimic to nautilus. - pass CEPH_CONTAINER_IMAGE and CEPH_CONTAINER_BINARY environment variable to the ceph_key module, - fix the docker exec command in 'waiting for the containerized monitor to join the quorum' task according to the `delegate_to` parameter, - override `docker_exec_cmd` in `ceph-facts` with `mon_host` when rolling_update is `True`, - do not run unnecessarily `create_mds_filesystems.yml` when performing an upgrade. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	7386249c71	facts: retrieve fsid during rolling_update playbook otherwise it generates a new cluster fsid and makes the upgrade failing Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	5c3ce4ca77	mon: fetch initial keyring even when running rolling_update otherwise, the task to copy mgr keyring fails during the rolling_update. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	afdaa70a63	update: enable msgr2 protocol This commit enable the msgr2 protocol when the cluster is fully upgraded to nautilus Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	82764afe8d	update: mask systemd service units during upgrade This prevents the packaging from restarting services before we do need to restart them in the rolling update sequence. We want to handle services restart at rolling_update playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	b4f14aba8e	ceph_key: `lookup_ceph_initial_entities` shouldn't fail on update As of nautilus, the initial keyrings list has changed, it means when upgrading from Luminous or Mimic, it is expected there's a mismatch between what is found on the cluster and the expected initial keyring list hardcoded in ceph_key module. We shouldn't fail when upgrading to nautilus. str_to_bool() took from ceph-volume. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-Authored-by: Alfredo Deza <adeza@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	e99305c684	handlers: do not trigger handlers on rolling_update rolling_update playbook already takes care of stopping/starting services during the sequence. There's no need to trigger potential unwanted services restart. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Dimitri Savineau	179fdfbc19	ceph-osd: Ensure lvm2 is installed When using osd_scenario lvm, we never check if the lvm2 package is present on the host. When using containerized deployment and docker on CentOS/RedHat this package will be automatically installed as a dependency but not for Ubuntu distribution. OSD deployed via ceph-volume require the lvmetad.socket to be active and running. Resolves: #3728 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-20 22:26:45 +00:00
Guillaume Abrioux	987bdac963	osd: backward compatibility with old disk_list.sh location Since all files in container image have moved to `/opt/ceph-container` this check must look for new AND the old path so it's backward compatible. Otherwise it could end up by templating an inconsistent `ceph-osd-run.sh`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-18 17:25:51 +00:00
Dimitri Savineau	5c39735be5	ceph-validate: fail if there's no ipaddr available in monitor_address_block subnet When using monitor_address_block to determine the ip address of the monitor node, we need an ip address available in that cidr to be present in the ansible facts (ansible_all_ipv[46]_addresses). Currently we don't check if there's an ip address available during the ceph-validate role. As a result, the ceph-config role fails due to an empty list during ceph.conf template creation but the error isn't explicit. TASK [ceph-config : generate ceph.conf configuration file] ***** fatal: [0]: FAILED! => {"msg": "No first item, sequence was empty."} With this patch we will fail before the ceph deployment with an explicit failure message. Resolves: rhbz#1673687 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-18 16:35:36 +00:00
Dimitri Savineau	a7b1e35a16	ceph-common: Install yum plugin priorities When using community repository we need to set the priority on the ceph repositories because we could have some conflict with EPEL packages. In order to set the priority on the ceph repositories, we need to install the yum-plugin-priorities package. http://docs.ceph.com/docs/master/install/get-packages/#rpm-packages Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-16 06:24:55 +00:00
wumingqiao	31617afca9	ceph-mgr: run mgr_modules.yml only on the first mgr host the task will be delegated to mons[0] for all mgr hosts, so we can just run it on the first host and have the same effect. Signed-off-by: wumingqiao <wumingqiao@beyondcent.com>	2019-03-14 20:16:33 +00:00
Dimitri Savineau	d8538ad4e1	Set the default crush rule in ceph.conf Currently the default crush rule value is added to the ceph config on the mon nodes as an extra configuration applied after the template generation via the ansible ini module. This implies two behaviors: 1/ On each ceph-ansible run, the ceph.conf will be regenerated via ceph-config+template and then ceph-mon+ini_file. This leads to a non necessary daemons restart. 2/ When other ceph daemons are collocated on the monitor nodes (like mgr or rgw), the default crush rule value will be erased by the ceph.conf template (mon -> mgr -> rgw). This patch adds the osd_pool_default_crush_rule config to the ceph template and only for the monitor nodes (like crush_rules.yml). The default crush rule id is read (if exist) from the current ceph configuration. The default configuration is -1 (ceph default). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1638092 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-14 08:56:52 +00:00
Dimitri Savineau	b7f4e3e7c7	ceph-osd: Install numactl package when needed With `3e32dce` we can run OSD containers with numactl support. When using numactl command in a containerized deployment we need to be sure that the corresponding package is installed on the host. The package installation is only executed when the ceph_osd_numactl_opts variable isn't empty. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-12 07:43:06 +00:00
Guillaume Abrioux	b3eb9206fa	osd: support numactl options on OSD activate This commit adds OSD containers activate with numactl support. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1684146 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-11 10:14:50 +01:00
Dimitri Savineau	a089e1ec23	systemd/service: Set docker.service conditionally We don't need to set After=docker.service when the container_binary variable isn't set to docker. It doesn't break anything currently but it could be confusing when using podman. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-07 20:56:11 +00:00
Dimitri Savineau	d6e71d769c	common: Use rhsm_repository module for RHCS Instead of using subscription-manager with command module we can use the rhsm_repository ansible module. This module already uses repos list feature to determine if a repository is enabled or not. That way this module is idempotent so we don't need changed_when: false anymore. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-07 19:15:42 +00:00
Dimitri Savineau	53514a5b50	common: Add noarch to community repository The ceph stable community repository only enables the basearch packages url. Adding the noarch url because starting with nautilus release, some packages are added there and useful for mgr or grafana. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-06 00:25:11 +00:00
Dimitri Savineau	4d32ecc980	Force osd pool min_size value to integer After `b8d580b` and `e9e5d5a` we could have either item.min_size or osd_pool_default_min_size using string instead of int causing the condition to be true when it's false. As a result, the task could try to set the pool min_size value to 0 which leads to: Error EINVAL: pool min_size must be between 1 and 1 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-05 19:48:09 +00:00
Dimitri Savineau	cb381b41fe	Add CONTAINER_IMAGE env var to ceph daemons Ceph daemons will set the CONTAINER_IMAGE environment variable value in the daemon metadata. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-05 15:07:05 +00:00
Guillaume Abrioux	e9e5d5a39a	fix pool min_size customization `b8d580b3f4` introduced a bug when `min_size` isn't set (default to 0). Typical error: ``` Error EINVAL: pool min_size must be between 1 and 1 ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-05 13:29:34 +00:00
Radu Toader	b8d580b3f4	Customize pools min_size Signed-off-by: Radu Toader <radu.m.toader@gmail.com>	2019-03-05 10:57:15 +00:00
Radu Toader	2048255f61	When creating pool, read pool.application and make the call to ceph osd pool enable application Signed-off-by: Radu Toader <radu.m.toader@gmail.com>	2019-03-05 09:16:03 +00:00
Kevin Coakley	b11dc13476	Updated 7 ansible-lint issues in the ceph-mon, ceph-osd, and ceph-rgw roles The following lint issues have been resolved: [301] Commands should not change things if nothing needs doing /home/travis/build/ceph/ceph-ansible/roles/ceph-mon/tasks/ceph_keys.yml:2 [305] Use shell only when shell functionality is required /home/travis/build/ceph/ceph-ansible/roles/ceph-osd/tasks/start_osds.yml:47 [301] Commands should not change things if nothing needs doing /home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:2 [301] Commands should not change things if nothing needs doing /home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:7 [301] Commands should not change things if nothing needs doing /home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:14 [301] Commands should not change things if nothing needs doing /home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:19 [301] Commands should not change things if nothing needs doing /home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:24 Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>	2019-03-04 22:25:35 +00:00
Guillaume Abrioux	359f8a9a4a	nfs: fix systemd template service for ubuntu `mkdir` is located in `/bin` on Ubuntu. Let's use some jinja to support Ubuntu. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-04 19:54:25 +00:00
Dimitri Savineau	45a7082712	lint: Fix spaces before and after variables ansible-lint reports: [206] Variables should have spaces after {{ and before }} Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-01 17:22:24 +00:00
VasishtaShastry	34c25ef49b	Extends check_devices tasks to non-collocated and lvm-batch scenarios Tuned name of a task and error message to make it more user understandable Fixes BZ 1648168 - ceph-validate : devices are not validated in non-collocated and lvm_batch scenario Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1648168 Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>	2019-03-01 02:13:51 +00:00

2401 Commits (89fbbab6161c4962d1cb368d0a297a39807c1e5c)