ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Mike Christie	702f2baccc	igw: open iscsi target port Open the port the iscsi target uses for iscsi traffic. Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `5ba7d1671e`)	2018-11-12 10:46:41 +00:00
Mike Christie	44ee5c7495	igw: use api_port variable for firewall port setting Don't hard code api port because it might be overridden by the user. Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `e2f1f81de4`)	2018-11-12 10:46:41 +00:00
Mike Christie	db576f6f0e	igw: fix firewall iscsi_group_name check The firewall setup for igw is not getting set up because iscsi_group_name does not exist. It should be iscsi_gw_group_name. Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `a4ff52842c`)	2018-11-12 10:46:41 +00:00
Mike Christie	c843ea1d92	igw: Fix default api port The default igw api port is 5000 in the manual setup docs and ceph-iscsi-config package so this syncs up ansible. Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `a10853c5f8`)	2018-11-12 10:46:41 +00:00
Sébastien Han	12ce311da5	rbd-mirror: enable ceph-rbd-mirror.target Without this the daemon will never start after reboot. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `b7a791e902`)	2018-11-09 16:48:35 +01:00
Guillaume Abrioux	d5409109fb	rgw: move multisite default variables in ceph-defaults Move all rgw multisite variables in ceph-defaults so ceph-validate can go through them. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-30 17:41:35 +01:00
Guillaume Abrioux	547e90f281	rgw: move multisite related tasks after docker/main.yml We must play this task after the container has started otherwise rgw_multisite tasks will fail. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-30 14:00:28 +01:00
Guillaume Abrioux	710e11668d	rgw: add rgw_multisite for containerized deployments run commands on containers when containerized deployments. (At the moment, all commands are run on the host only) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-30 14:00:28 +01:00
Guillaume Abrioux	fe88c89c9c	validate: remove check on rgw_multisite_endpoint_addr definition since `rgw_multisite_endpoint_addr` has a default value to `{{ ansible_fqdn }}`, it shouldn't be mandatory to set this variable. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-30 14:00:28 +01:00
Ali Maredia	59e6d04f9b	rgw: add ceph-validate tasks for multisite, other fixes - updated README-MULTISITE - re-added destroy.yml - added tasks in ceph-validate to make sure the rgw multisite vars are set Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-10-30 14:00:28 +01:00
Guillaume Abrioux	77d5d128c3	rgw: add a dedicated variable for multisite endpoint We should give users the possibility to set the IP they want as multisite endpoint, setting the default value to `{{ ansible_fqdn }}` to not force them to set this variable. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-30 14:00:28 +01:00
Ali Maredia	474f151450	rgw: update rgw multisite tasks - remove destroy tasks - cleanup conditionals and syntax - remove unnecessary realm pulls - enable multisite to be tested in automated testing infra - add multisite related vars to main.yml and group_vars - update README-MULTISITE - ensure all `radosgw-admin` commands are being run on a mon Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-10-30 14:00:28 +01:00
Guillaume Abrioux	748342f5b6	roles: fix _docker_memory_limit default value append 'm' suffix to specify the unit size used in all `_docker_memory_limit`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-29 14:59:09 +01:00
Neha Ojha	b7e4d4eb84	roles: do not limit docker_memory_limit for various daemons Since we do not have enough data to put valid upper bounds for the memory usage of these daemons, do not put artificial limits by default. This will help us avoid failures like OOM kills due to low default values. Whenever required, these limits can be manually enforced by the user. More details in https://bugzilla.redhat.com/show_bug.cgi?id=1638148 Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1638148 Signed-off-by: Neha Ojha <nojha@redhat.com>	2018-10-29 14:59:09 +01:00
Sébastien Han	0e63f0f3c9	Merge branch 'master' into wip-rm-calamari	2018-10-29 14:50:37 +01:00
Sébastien Han	5ab90b358c	nfs: do not create the nfs user if already present Check if the user exists and skip its creation if true. Closes: https://github.com/ceph/ceph-ansible/issues/3254 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-26 16:24:38 +00:00
Guillaume Abrioux	4d698ce831	ceph-infra: reload firewall after rules are added we ensure that firewalld is installed and running before adding any rule. It no longer makes sense not to reload firewalld once the rules are added. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-23 09:53:09 +00:00
Rishabh Dave	ee2d52d33d	allow custom pool size Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1596339 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-10-22 16:00:21 +02:00
Guillaume Abrioux	48cfc60722	defaults: set default `configure_firewall` to `True` Let's configure firewalld by default. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526400 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-19 15:12:45 +02:00
Guillaume Abrioux	8fa437b7bd	iscsi: fix networking issue on containerized env The iscsi-gw containers can't reach monitors without `--net=host` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-19 00:12:43 +00:00
Guillaume Abrioux	e77c36ad17	infra: move restart fw handler in ceph-infra role Move the handler to restart firewall in ceph-infra role. Closes: #3243 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-19 00:12:43 +00:00
Sébastien Han	fbd878c8d5	infra: rename osd-configure to add-osd and improve it The playbook has various improvements: * run ceph-validate role before doing anything * run ceph-fetch-keys only on the first monitor of the inventory list * set noup flag so PGs get distributed once all the new OSDs have been added to the cluster and unset it when they are up and running Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1624962 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-17 11:26:11 +00:00
Sébastien Han	680574ed4c	ceph-fetch-keys: refact This commit simplifies the usage of the ceph-fetch-keys role. The role now has a nicer way to find various ceph keys and fetch them on the ansible server. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1624962 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-17 11:26:11 +00:00
Andy McCrae	3e0fa3bc18	Add ability to use a different client container Currently a throw-away container is built to run ceph client commands to setup users, pools & auth keys. This utilises the same base ceph container which has all the ceph services inside it. This PR allows the use of a separate container if the deployer wishes - but defaults to use the same full ceph container. This can be used for different architectures or distributions, which may support the Ceph client, but not Ceph server, and allows the deployer to build and specify a separate client container if need be. Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>	2018-10-16 23:28:35 +00:00
Guillaume Abrioux	f0b2d82695	infra: fix wrong condition on firewalld start task a non skipped task won't have the `skipped` attribute, so `start firewalld` task will complain about that. Indeed, `skipped` and `rc` attributes won't exist since the first task `check firewalld installation on redhat or suse` won't be skipped in case of non-containerized deployment. Fixes: #3236 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1541840 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-16 16:24:42 +00:00
Christian Berendt	ac37a0d0cd	ceph-defaults: set ceph_stable_openstack_release_uca to queens Liberty is no longer available in the UCA. The last available release there is currently Queens. Signed-off-by: Christian Berendt <berendt@betacloud-solutions.de>	2018-10-16 12:56:32 +00:00
Guillaume Abrioux	b953965399	handler: remove some leftover in restart_*_daemon.sh.j2 Remove some legacy in those restart script. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-16 11:53:55 +00:00
Nan Li	55334baa0c	docker-ce is used in aarch64 instead of docker engine Signed-off-by: Nan Li <herbert.nan@linaro.org>	2018-10-15 18:38:40 +02:00
Guillaume Abrioux	60bc1e38db	handler: fix osd containers handler `ceph_osd_container_stat` might not be set on other osd node. We must ensure we are on the last node before trying to evaluate `ceph_osd_container_stat`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-15 10:30:40 +02:00
Guillaume Abrioux	40b7747af7	remove jewel support As of now, we should no longer support Jewel in ceph-ansible. The latest ceph-ansible release supporting Jewel is `stable-3.1`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-12 23:38:17 +00:00
Sébastien Han	31a0438cb2	ceph_volume: refactor This commit does a couple of things: * Avoid code duplication * Clarify the code * add more unit tests * add myself to the author of the module Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-10 16:08:41 -04:00
Sébastien Han	bfe689094e	osd: do not run when lvm scenario This task was created for ceph-disk based deployments so it's not needed when osd are prepared with ceph-volume. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-10 16:08:41 -04:00
Sébastien Han	2bea8d8ecf	handler: add support for ceph-volume containerized restart The restart script wasn't working with the current new addition of ceph-volume in container where now OSDs have the OSD id name in the container name. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-10 16:08:41 -04:00
Sébastien Han	790f52f934	ceph-handler: change osd container check Now that the container is named ceph-osd@<id> looking for something that contains a host is not necessary. This is also backward compatible as it will continue to match container names with hostname in them. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-10 16:08:41 -04:00
Sébastien Han	0580328340	validate: add warning for ceph-disk ceph-disk will be removed in 3.3 and we encourage to start using ceph-volume as of 3.2. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-10 16:08:41 -04:00
Sébastien Han	a948677de1	osd: ceph-volume activate, just pass the OSD_ID We don't need to pass the device and discover the OSD ID. We have a task that gathers all the OSD ID present on that machine, so we simply re-use them and activate them. This also handles the situation when you have multiple OSDs running on the same device. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-10 16:08:41 -04:00
Sébastien Han	5f35910ee1	osd: change unit template for ceph-volume container We don't need to pass the hostname on the container name but we can keep it simple and just call it ceph-osd-$id. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-10 16:08:41 -04:00
Sébastien Han	ece9e9812e	osd: do not use expose_partitions on lvm expose_partitions is only needed on ceph-disk OSDs so we don't need to activate this code when running lvm prepared OSDs. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-10 16:08:41 -04:00
Sébastien Han	e39fc4f6ce	ceph_volume: add container support for batch command The batch option got recently added, while rebasing this patch it was necessary to implement it. So now, the batch option can work on containerized environments. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1630977 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-10 16:08:41 -04:00
Sébastien Han	3ddcc9af16	ceph_volume: try to get rid of the dummy container If we run on a containerized deployment we pass an env variable which contains the container image. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-10 16:08:41 -04:00
Sébastien Han	aa2c1b27e3	ceph-osd: ceph-volume container support Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-10 16:08:41 -04:00
Guillaume Abrioux	678e155328	infra: fix a typo in filename configure_firewall is missing its dot. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-10 12:39:04 -04:00
Guillaume Abrioux	f666902d52	infra: add tags for each subcomponent This way we can skip one specific component if needed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-10 15:44:33 +00:00
Guillaume Abrioux	f8a7ffb085	infra: add firewall configuration for containerized deployment firewalld is available on atomic so there is no reason to not apply firewall configuration. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-10 15:44:33 +00:00
Guillaume Abrioux	0fb8812e47	infra: update firewall rules, add cluster_network for osds At the moment, all daemons accept connections from 0.0.0.0. We should at least restrict to public_network and add cluster_network for OSDs. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1541840 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-10 15:44:33 +00:00
Guillaume Abrioux	b3a71eeb08	ceph-infra: add new role ceph-infra this role manages ceph infra services such as ntp, firewall, ... Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-10 15:44:33 +00:00
Noah Watkins	8dcc8d1434	Stringify ceph_docker_image_tag This could be a numeric input, but is treated like a string leading to runtime errors. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1635823 Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-10-10 04:26:33 +00:00
Noah Watkins	306e308f13	Avoid using tests as filter Fixes the deprecation warning: [DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of using `result|search` use `result is search`. Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-10-10 04:26:33 +00:00
Andrew Schoen	ada03d064d	ceph-validate: remove versions checks for bluestore and lvm scenario These checks will never pass unless ceph_stable_release is passed and ceph-defaults is run before ceph-validate. Additionally, we don't want to support deploying jewel upstream at ceph-ansible master. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1637537 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-10-09 13:30:42 -04:00
Andrew Schoen	436dc8c5e1	ceph-config: allow the batch --report to fail when getting the OSD num Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-10-09 10:09:50 -04:00
Andrew Schoen	40f82319dd	ceph-config: use 'lvm list' to find num_osds for an existing cluster This makes finding num_osds idempotent for clusters that were deployed using 'lvm batch'. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-10-09 10:09:50 -04:00
Andrew Schoen	8afef3d0de	ceph-config: use the ceph_volume module to get num_osds for lvm batch This gives us an accurate number of how many osds will be created. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-10-09 10:09:50 -04:00
Andrew Schoen	c453ea25c0	ceph-osd: use journal_size and block_db_size for lvm batch Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-10-09 10:09:50 -04:00
Andrew Schoen	71ce539da5	ceph-defaults: add the block_db_size option This is used in the lvm osd scenario for the 'lvm batch' subcommand of ceph-volume. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-10-09 10:09:50 -04:00
Guillaume Abrioux	3e2cdcc735	common: remove check_firewall code Check firewall isn't working as expected and might break deployments. This part of the code will be reworked soon. Let's focus on configure_firewall code for now. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1541840 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-06 14:32:17 +02:00
Guillaume Abrioux	be31c15ccd	follow up on `b5d2ea2` Add some missed statements Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-06 14:32:17 +02:00
Rishabh Dave	b5d2ea269f	don't use "static" field while including tasks Instead used "import_tasks" and "include_tasks" to tell whether tasks must be included statically or dynamically. Fixes: https://github.com/ceph/ceph-ansible/issues/2998 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-10-04 07:44:28 +00:00
Guillaume Abrioux	6130bc841d	config: look up for monitor_address_block in hostvars `monitor_address_block` should be read from hostvars[host] instead of current node being played. eg: Let's assume we have: ``` [mons] ceph-mon0 monitor_address=192.168.1.10 ceph-mon1 monitor_interface=eth1 ceph-mon2 monitor_address_block=192.168.1.0/24 ``` the ceph.conf generation task will end up with: ``` fatal: [ceph-mon0]: FAILED! => {} MSG: 'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_interface' ``` the reason is that it will assume `monitor_address_block` isn't defined even on ceph-mon2 because it looks for `monitor_address_block` instead of `hostvars[host]['monitor_address_block']`, therefore it enters in the condition as default value: ``` {%- else -%} {% set interface = 'ansible_' + (monitor_interface | replace('-', '_')) %} {% if ip_version == 'ipv4' -%} {{ hostvars[host][interface][ip_version]['address'] }} {%- elif ip_version == 'ipv6' -%} [{{ hostvars[host][interface][ip_version][0]['address'] }}] {%- endif %} {%- endif %} ``` `monitor_interface` is set with default value `'interface'` so the `interface` variable is built with 'ansible_' + 'interface'. This makes ansible throw a confusing message about `'ansible_interface'`. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1635303 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-02 22:41:05 +02:00
Benjamin Cherian	85071e6e53	Add support for different NTP daemons Allow user to choose between timesyncd, chronyd and ntpd Installation will default to timesyncd since it is distributed as part of the systemd installation for most distros. Added note indicating NTP daemon type is not used for containerized deployments. Fixes issue #3086 on Github Signed-off-by: Benjamin Cherian <benjamin_cherian@amat.com>	2018-10-02 13:18:08 +00:00
Mike Christie	eddb95941b	igw: valid client CHAP settings. The linux kernel target layer, LIO, does not support the iscsi target to mix ACLs that have chap enabled and disabled under the same tpg. This patch adds a check and fails if this type of setup is detected. This fixes Red Hat BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1615088 Signed-off-by: Mike Christie <mchristi@redhat.com>	2018-10-01 18:23:03 +02:00
Sébastien Han	4db6a213f7	add ceph-handler role The role contains all the handlers for Ceph services. We decided to leave ceph-defaults role with variables and a few facts only. This is useful when organizing the site.yml files and also adding the known variables to infrastructure-playbooks. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-28 15:15:49 +00:00
Sébastien Han	145aef9fed	defaults: do not disable THP on bluestore As per #1013 it appears that BS will soon use THP to lower TLB misses, also disabling THP hasn't demonstrated any gains so far. Closes: https://github.com/ceph/ceph-ansible/issues/1013 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-27 21:23:49 +00:00
Sébastien Han	dc3319c3c4	default: use bluestore as default object store All tooling in Ceph is defaulting to use the bluestore objectstore for provisioning OSDs, there is no good reason for ceph-ansible to continue to default to filestore. Closes: https://github.com/ceph/ceph-ansible/issues/3149 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1633508 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-27 21:23:49 +00:00
Rishabh Dave	380168dadc	don't use "include" to include tasks Use "import_tasks" or "include_tasks" instead. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-09-27 17:53:40 +02:00
Giulio Fidente	6126210e0e	Fix version check in ceph.conf template We need to look for ceph_release when comparing with release names, not ceph_version. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1631789 Signed-off-by: Giulio Fidente <gfidente@redhat.com>	2018-09-24 13:08:27 +02:00
Matthew Vernon	806461ac6e	restart_osd_daemon.sh.j2 - use `+` rather than `{1,}` in regex `+` is more idiomatic for "one or more" in a regex than `{1,}`; the latter was introduced in a previous fix for an incorrect `{1,2}` restriction. Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>	2018-09-24 10:33:46 +00:00
Matthew Vernon	04f4991648	restart_osd_daemon.sh.j2 - consider active+clean+* pgs as OK After restarting each OSD, restart_osd_daemon.sh checks that the cluster is in a good state before moving on to the next one. One of the checks it does is that the number of pgs in the state "active+clean" is equal to the total number of pgs in the cluster. On large clusters (e.g. we have 173,696 pgs), it is likely that at least one pg will be scrubbing and/or deep-scrubbing at any one time. These pgs are in state "active+clean+scrubbing" or "active+clean+scrubbing+deep", so the script was erroneously not including them in the "good" count. Similar concerns apply to "active+clean+snaptrim" and "active+clean+snaptrim_wait". Fix this by considering as good any pg whose state contains active+clean. Do this as an integer comparison to num_pgs in pgmap. (could this be backported to at least stable-3.0 please?) Closes: #2008 Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>	2018-09-24 10:33:46 +00:00
Matthew Vernon	aa97ecf048	restart_osd_daemon.sh.j2 - Reset RETRIES between calls of check_pgs Previously RETRIES was set (by default to 40) once at the start of the script; this meant that it would only ever wait for up to 40 lots of 30s across all the OSDs on a host before bombing out. In fact, we want to be prepared to wait for the same amount of time after each OSD restart for the clusters' pgs to be happy again before continuing. Closes: #3154 Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>	2018-09-24 08:20:32 +00:00
John Spray	26bfef4107	Remove Calamari-related pieces ...with the exception of the purge operation, since removing Calamari would still be useful for an old cluster. Signed-off-by: John Spray <john.spray@redhat.com>	2018-09-21 11:00:18 +01:00
Andrew Schoen	16ccac83fe	ceph-config: calculate num_osds for the lvm batch scenario For now our best guess is to count the number of devices and multiply by osds_per_device. Ideally we'd like to run ceph-volume lvm batch --report and get the number of OSDs that way, but currently we need a ceph.conf in place already before we can do that. There is a tracker ticket that would allow us to get around the need for a ceph.conf: http://tracker.ceph.com/issues/36088 Fixes: https://github.com/ceph/ceph-ansible/issues/3135 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-09-20 15:41:52 +00:00
Guillaume Abrioux	6d6fd514e0	config: set default _rgw_hostname value to respective host the default value for _rgw_hostname was taken from the current node being played while it should be taken from the respective node in the loop. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622505 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-18 20:10:34 +02:00
Andrew Schoen	8afad35f5a	ceph-config: default devices and lvm_volumes when setting num_osds This avoids errors when the osd scenario chosen does not require setting devices or lvm_volumes. The default values for these are not set because they exist in the ceph-osd role, not ceph-defaults. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-09-18 17:02:33 +00:00
Neha Ojha	27027a17d3	osd: add osd memory target option BlueStore's cache is sized conservatively by default, so that it does not overwhelm under-provisioned servers. The default is 1G for HDD, and 3G for SSD. To replace the page cache, as much memory as possible should be given to BlueStore. This is required for good performance. Since ceph-ansible knows how much memory a host has, it can set `bluestore cache size = max(total host memory / num OSDs on this host * safety factor, 1G)` Due to fragmentation and other memory use not included in bluestore's cache, a safety factor of 0.5 for dedicated nodes and 0.2 for hyperconverged nodes is recommended. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1595003 Signed-off-by: Neha Ojha <nojha@redhat.com> Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-18 10:12:46 +00:00
Mike Christie	8fcd63cc50	igw: enable and start rbd-target-api The commit: commit `1164cdc002` Author: Guillaume Abrioux <gabrioux@redhat.com> Date: Thu Aug 2 11:58:47 2018 +0200 iscsigw: install ceph-iscsi-cli package installs the cli package but does not start and enable the rbd-target-api daemon needed for gwcli to communicate with the igw nodes. This patch just enables and starts it for the non-container setup. The container setup is already doing this. This fixes bz https://bugzilla.redhat.com/show_bug.cgi?id=1613963 Signed-off-by: Mike Christie <mchristi@redhat.com>	2018-09-13 19:35:45 +00:00
Guillaume Abrioux	a6f77340fd	nfs: ignore error on semanage command for ganesha_t As of rhel 7.6, it has been decided it doesn't make sense to confine `ganesha_t` anymore. It means this domain won't exist anymore. Let's add a `failed_when: false` so the deployment does not fail when trying to run this command. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1626070 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-13 13:06:47 +02:00
Andrew Schoen	b36f3e06b5	ceph_volume: adds the osds_per_device parameter If this is set to anything other than the default value of 1 then the --osds-per-device flag will be used by the batch command to define how many osds will be created per device. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-09-12 20:27:14 +00:00
Guillaume Abrioux	1c88c444a3	mon: fix `ExecStartPre` option in systemd unit file This command line is not supported. According to official documentation: ``` Note that shell command lines are not directly supported. If shell command lines are to be used, they need to be passed explicitly to a shell implementation of some kind. ``` We must run this using /bin/sh instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-11 10:48:21 +02:00
Guillaume Abrioux	9ff26e80f2	defaults: add a default value to rgw_hostname let's add ansible_hostname as a default value for rgw_hostname if no hostname in servicemap matches ansible_fqdn. Fixes: #3063 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622505 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-10 12:07:44 +02:00
Guillaume Abrioux	ecbd3e4558	Revert "client: add quotes to the dict values" This commit is adding quotes that make keyring unusable eg: ``` client.john key: AQAN0RdbAAAAABAAH5D3WgMN9Rxw3M8jkpMIfg== caps: [mds] '' caps: [mgr] 'allow *' caps: [mon] 'allow rw' caps: [osd] 'allow rw' ``` Trying to import such a keyring and use it will result in: ``` Error EACCES: access denied ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623417 This reverts commit `424815501a`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-07 17:21:55 +00:00
Tom Barron	bf8f589958	run rados cmd in container if containerized deployment When ceph-nfs is deployed containerized and ceph-common is not installed on the host the start_nfs task fails because the rados command is missing on the host. Run rados commands from a ceph container instead so that they will succeed. Signed-off-by: Tom Barron <tpb@dyncloud.net>	2018-09-03 17:06:00 +00:00
Markos Chandras	217f35dbdb	roles: ceph-rgw: Enable the ceph-radosgw target If the ceph-radosgw target is not enabled, then enabling the ceph-radosgw@ service has no effect since nothing will pull it on the next reboot. As such, we need to ensure that the target is enabled. Signed-off-by: Markos Chandras <mchandras@suse.de>	2018-09-03 15:48:58 +02:00
Andy McCrae	772e6b9be2	Dont run client dummy container on non-x86_64 hosts The dummy client container currently wont work on non-x86_64 hosts. This PR creates a filtered client group that contains only hosts that are x86_64 - which can then be the group to run the dummy container against. This is for the specific case of a containerized_deployment where there is a mixture of non-x86_64 hosts and x86_64 hosts. As such the filtered group will contain all hosts when running with containerized_deployment: false. Currently ppc64le is not supported for Ceph server components. Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>	2018-08-31 11:34:00 +00:00
Sébastien Han	9ba670567e	remove warning for unsupported variables As promised, these will go unsupported for 3.1 so let's actually remove them :). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622729 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-28 13:31:57 -07:00
Sébastien Han	6d7fa99ff7	defaults: fix rgw_hostname A couple of things were wrong in the initial commit: * ceph_release_num[ceph_release] >= ceph_release_num['luminous'] will never work since the ceph_release fact is set in the roles after. So either ceph-common or ceph-docker-common set it * we can easily re-use the initial command to check if a cluster is running, it's more elegant than running it twice. * set the fact rgw_hostname on rgw nodes only Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1618678 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-22 17:46:00 +02:00
Andy McCrae	18684b7209	Sync config_template with base plugin The config_template plugin exists in the ceph-common role so that config_template will still work with ansible galaxy. This PR syncs the config_template module from the base of the repo in plugins/actions to the ceph-common role. Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>	2018-08-21 16:10:33 +00:00
Sébastien Han	8c70a5b197	osd: fix ceph_release We need ceph_release in the condition, not ceph_stable_release Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1619255 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-20 20:14:56 +02:00
Markos Chandras	126e2e3f92	roles: ceph-defaults: Check if 'rgw' attribute exists for rgw_hostname If there are no services on the cluster, then the 'rgw' could be missing and the task is failing with the following problem: msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'rgw' We fix this by checking the existence of the 'rgw' attribute. If it's missing, we skip the task since the role already contains code to set a good default rgw_hostname. Signed-off-by: Markos Chandras <mchandras@suse.de>	2018-08-20 11:37:45 +02:00
Markos Chandras	37e50114de	roles: ceph-defaults: Delegate cluster information task to monitor node Since commit `f422efb1d6` ("config: ensure rgw section has the correct name") we observe the following failures in new Ceph deployment with OpenStack-Ansible fatal: [aio1_ceph-rgw_container-fc588f0a]: FAILED! => {"changed": false, "cmd": "ceph --cluster ceph -s -f json", "msg": "[Errno 2] No such file or directory" This is because the task executes 'ceph' but at this point no package installation has happened. Packages are normally installed in the 'ceph-common' role which runs after the 'ceph-defaults' one. Since we are looking to obtain cluster information, the task should be delegated to a monitor node similar to other tasks in that role Signed-off-by: Markos Chandras <mchandras@suse.de>	2018-08-20 11:37:45 +02:00
Dardo D Kleiner	f6519e4003	mgr: improve/fix disabled modules check Follow up on `36942af698` "disabled_modules" is always a list, it's the items in the list that can be dicts in mimic. Many ways to fix this, here's one. Signed-off-by: Dardo D Kleiner <dardokleiner@gmail.com>	2018-08-20 11:23:58 +02:00
Sébastien Han	3149b2564f	Revert "osd: generate device list for osd_auto_discovery on rolling_update" This reverts commit `e84f11e99e`. This commit was giving a new failure later during the rolling_update process. Basically, this was modifying the list of devices and started impacting the ceph-osd itself. The modification to accommodate the osd_auto_discovery parameter should happen outside of the ceph-osd. Also we are trying to not play ceph-osd role during the rolling_update process so we can speed up the upgrade. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-16 11:13:12 +02:00
Markos Chandras	7172737f13	roles: ceph-defaults: Set ceph_uid on SUSE distributions The ceph_uid is also '167' on SUSE systems so extend the existing task. Signed-off-by: Markos Chandras <mchandras@suse.de>	2018-08-13 19:02:57 +00:00
Guillaume Abrioux	36942af698	mgr: backward compatibility for module management Follow up on `3abc253fec` The structure had even changed within `luminous` release. It was first: ``` { "enabled_modules": [ "balancer", "dashboard", "restful", "status" ], "disabled_modules": [ "influx", "localpool", "prometheus", "selftest", "zabbix" ] } ``` Then it changed for: ``` { "enabled_modules": [ "status" ], "disabled_modules": [ "balancer", "dashboard", "influx", "localpool", "prometheus", "restful", "selftest", "zabbix" ] } ``` and finally: ``` { "enabled_modules": [ "status" ], "disabled_modules": [ { "name": "balancer", "can_run": true, "error_string": "" }, { "name": "dashboard", "can_run": true, "error_string": "" } ] } ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-08-13 13:25:06 +00:00
Guillaume Abrioux	8b5e3cd999	validate: fail if fqdn deployment attempted fqdn configuration possibility caused a lot of trouble, it's adding a lot of complexity because of multiple cases and the relation between ceph-ansible and ceph-container. Moreover, there is no benefit for such a feature. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613155 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-08-13 10:04:24 +02:00
Guillaume Abrioux	f422efb1d6	config: ensure rgw section has the correct name the ceph.conf.j2 always assumes the hostname used to register the radosgw in the servicemap is equivalent to `{{ ansible_hostname }}` which returns the shortname form. We need to detect which form of the hostname was used in case of already deployed cluster and update the ceph.conf accordingly. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1580408 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-08-13 10:04:24 +02:00
Guillaume Abrioux	db29b5b84d	config: clean template, remove useless conditions there is no need to have all these conditions. for instance, assuming `mds_group_name` is set to 'mdss': - `if groups[mds_group_name] is defined` checks if `'mdss'` is present in `{{ groups }}` - `if {{ mds_group_name }} in group_names` checks if the current node is part of the group `'mdss'` - `if inventory_hostname in groups.get(mds_group_name, [])` checks if the current node is part of the group 'mdss' The third condition is enough to cover the need of ensuring we are running on a mds node. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-08-13 10:04:24 +02:00
Sébastien Han	4c9e24a90f	mon: fix calamari initialisation If calamari is already installed and ceph has been upgraded to a higher version the initialisation will fail later. So if we detect the calamari-server is too old compared to ceph_rhcs_version we try to update it. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1601755 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-10 14:14:23 +02:00
Andrew Schoen	6423ab4ad3	lvm: fix condition when selecting which scenario to run devices and lvm_volumes will always be defined, so we need to instead check their length before deciding to run the scenario. This fixes the failure here: https://2.jenkins.ceph.com/job/ceph-ansible-prs-luminous-bluestore_lvm_osds/86/consoleFull#1667273050b5dd38fa-a56e-4233-a5ca-584604e56e3a Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-10 11:46:12 +02:00
Sébastien Han	e84f11e99e	osd: generate device list for osd_auto_discovery on rolling_update rolling_update relies on the list of devices when performing the restart of the OSDs. The task that is building the devices list out of the ansible_devices dict only runs when there are no partitions on the drives. However during an upgrade the OSDs are already configured, they have been prepared and have partitions so this task won't run and thus the devices list will be empty, skipping the restart during rolling_update. We now run the same task under different requirements when rolling_update is true and build a list when: * osd_auto_discovery is true * rolling_update is true * ansible_devices exists * no dm/lv are part of the discovery * the device is not removable * the device has more than 1 sector Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-10 09:19:40 +02:00
Andrew Schoen	3592c68cca	ceph-osd: adds crush_device_class config option This is used with the lvm osd scenario. When using devices you need the option to set the crush device class for all of the OSDs that are created from those devices. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-09 09:41:58 -04:00
Andrew Schoen	6d431ec22d	ceph-volume: implement the 'lvm batch' subcommand This adds the action 'batch' to the ceph-volume module so that we can run the new 'ceph-volume lvm batch' subcommand. A functional test is also included. If devices is defined and osd_scenario is lvm then the 'ceph-volume lvm batch' command will be used to create the OSDs. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-09 09:41:58 -04:00
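
Commits `36942af698` and `f6519e4003` above describe how the `disabled_modules` list changed shape between releases: its items were plain module names at first and dicts with a `name` key later. The following is a minimal Python sketch of one way to handle both shapes. It is an illustration only, not the code from those commits, and the helper names `module_names` and `is_disabled` are hypothetical.

```python
def module_names(entries):
    """Return plain module names from a list whose items are str or dict."""
    names = []
    for entry in entries:
        if isinstance(entry, dict):
            # Newer shape: {"name": "balancer", "can_run": true, ...}
            names.append(entry["name"])
        else:
            # Older shape: the item is the module name itself
            names.append(entry)
    return names


def is_disabled(module, module_listing):
    """Check whether `module` appears in the listing's disabled_modules.

    `module_listing` is assumed to be the already-parsed JSON structure
    shown in commit 36942af698.
    """
    return module in module_names(module_listing.get("disabled_modules", []))
```

Because the list itself is always a list, only the items need a type check, which matches the observation in `f6519e4003`.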

2022 Commits (e548a9ae7c1598e18856bd374c208ab53686c6f4)