ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Boris Ranto	21758fcee8	dashboard: Use upstream default port We are currently using incorrect dashboard default port. The upstream uses 8443 instead of 8234 by default. This should get us closer to the upstream project. Signed-off-by: Boris Ranto <branto@redhat.com>	2019-07-10 09:17:36 +02:00
Guillaume Abrioux	a781ce881c	iscsi: refact deprecated variables This commit moves some old variables into ceph-defaults so we can move the `use_new_ceph_iscsi` fact in ceph-facts role in order. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-03 22:13:19 +02:00
Giulio Fidente	d526803c6c	Add radosgw_frontend_ssl_certificate parameter This is necessary when configuring RGW with SSL because in addition to passing specific frontend options, civetweb appends the 's' character to the binding port and beast uses ssl_endpoint instead of endpoint. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1722071 Signed-off-by: Giulio Fidente <gfidente@redhat.com>	2019-07-02 14:14:37 -04:00
Dimitri Savineau	dc187ea6fa	Change ansible_lsb by ansible_distribution_release The ansible_lsb fact is based on the lsb package (lsb-base, lsb-release or redhat-lsb-core). If the package isn't installed on the remote host then the fact isn't populated. -------- "ansible_lsb": {}, -------- Switching to the ansible_distribution_release fact instead. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-21 11:55:05 -04:00
fpantano	ba73dc7b21	Add higher retry/delay defaults to check the quorum status. As per bz1718981, this commit adds higher values to check the quorum status. This is helpful for several OSP deployments that fail during the scale up. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1718981 Signed-off-by: fpantano <fpantano@redhat.com>	2019-06-20 22:39:57 +02:00
Rishabh Dave	9d88d3199f	ceph-infra: make chronyd default NTP daemon Since timesyncd is not available on RHEL-based OSs, change the default to chronyd for RHEL-based OSs. Also, chronyd is chrony on Ubuntu, so set the Ansible fact accordingly. Fixes: https://github.com/ceph/ceph-ansible/issues/3628 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-06-13 14:53:22 -04:00
Rishabh Dave	67071c3169	align cephfs pool creation The definitions of cephfs pools should match openstack pools. Signed-off-by: Rishabh Dave <ridave@redhat.com> Co-Authored-by: Simone Caronni <simone.caronni@teralytics.net>	2019-06-13 09:44:05 +02:00
Guillaume Abrioux	27856cc499	dashboard: add allow_embedding support Add a variable to support the allow_embedding support. See ceph/ceph-ansible/issues/4084 for details. Fixes: #4084 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-12 16:00:32 +02:00
fmount	069076bbfd	Fix units and add ability to have a dedicated instance Few fixes on systemd unit templates for node_exporter and alertmanager container parameters. Added the ability to use a dedicated instance to deploy the dashboard components (prometheus and grafana). This commit also introduces the grafana_group_name variable to refer grafana group and keep consistency with the other groups. During the integration with TripleO some grafana/prometheus template variables resulted undefined. This commit adds the ability to check if the group exist and create, accordingly, different job groups in prometheus template. Signed-off-by: fmount <fpantano@redhat.com>	2019-06-10 18:18:46 +02:00
guihecheng	35d40c65f8	Add role definitions of ceph-rgw-loadbalancer This add support for rgw loadbalancer based on HAProxy and Keepalived. We define a single role ceph-rgw-loadbalancer and include HAProxy and Keepalived configurations all in this. A single haproxy backend is used to balance all RGW instances and a single frontend is exported via a single port, default 80. Keepalived is used to maintain the high availability of all haproxy instances. You are free to use any number of VIPs. A single VIP is shared across all keepalived instances and there will be one master for one VIP, selected sequentially, and others serve as backups. This assumes that each keepalived instance is on the same node as one haproxy instance and we use a simple check script to detect the state of each haproxy instance and trigger the VIP failover upon its failure. Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>	2019-06-06 17:12:04 +02:00
Guillaume Abrioux	6a6785b719	nfs: support internal Ganesha with external ceph cluster This commits allows to deploy an internal ganesha with an external ceph cluster. This requires to define `external_cluster_mon_ips` with a comma separated list of external monitors. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710358 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-03 13:34:38 +02:00
Guillaume Abrioux	9f0d4d6847	dashboard: move defaults variables to ceph-defaults There is no need to have default values for these variables in each roles since there is no corresponding host groups Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
Guillaume Abrioux	e74d80e72f	rename docker_exec_cmd variable This commit renames the `docker_exec_cmd` variable to `container_exec_cmd` so it's more generic. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
Boris Ranto	b4d1c3693b	dashboard: Support podman This adds support for podman in dashboard-related roles. It also drops the creation of custom network for the dashboard-related roles as this functionality works in a different way with podman. Signed-off-by: Boris Ranto <branto@redhat.com>	2019-05-16 16:39:13 +02:00
Boris Ranto	2f141a6e80	Merge cephmetrics/dashboard-ansible repo This commit will merge dashboard-ansible installation scripts with ceph-ansible. This includes several new roles to setup ceph-dashboard and the underlying technologies like prometheus and grafana server. Signed-off-by: Boris Ranto & Zack Cerza <team-gmeno@redhat.com> Co-authored-by: Zack Cerza <zcerza@redhat.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
Dimitri Savineau	ba49225eab	Update RHCS version with Nautilus RHCS 4 will be based on Nautilus and only usable on RHEL 8. Updated the default ceph_rhcs_version to 4 and update the rhcs repositories to rhcs 4 with RHEL 8. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-05-13 09:53:18 +02:00
Radu Toader	b2f242660e	Allow CephFS pool to be created with specific rule_name, erasure_profile just like rbd pools Signed-off-by: Radu Toader <radu.m.toader@gmail.com>	2019-04-20 02:26:05 +00:00
Guillaume Abrioux	a4bc7bda51	update: refact msgr2 migration this commit refact the msgr2 protocol introduction. If it's a fresh install, let's go with v2 only. If we upgrade to nautilus, we should go with v2+v1 syntax to ensure nothing breaks. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-18 11:16:11 +02:00
Dimitri Savineau	e471bce76b	allow using ansible 2.8 Currently we only support ansible 2.7 We plan to use 2.8 when it will be release so we have to support both 2.7 and 2.8. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1700548 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-17 16:57:37 +02:00
Guillaume Abrioux	edfa4310d3	defaults: refact package dependencies installation. Because `5c98e361df` could be seen as a non backward compatible change this commit reverts it and bring back package dependencies installation support. Let's just modify the default value instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-16 11:07:59 -04:00
Guillaume Abrioux	83df60cbc3	defaults: remove some package dependencies These packages aren't needed anymore. They were needed for ceph-init-detect buti as of ceph-init-detect doesn't exist anymore. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683885 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-15 11:28:58 -04:00
Dimitri Savineau	d17b1b48b6	rgw: change default frontend on nautilus As discussed in ceph/ceph#26599, beast is now the default frontend for rados gateway with nautilus release. Add rgw_thread_pool_size variable with 512 as default value and keep backward compatibility with num_threads option when using civetweb. Update radosgw_civetweb_num_threads to reflect rgw_thread_pool_size change. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-09 17:21:51 +02:00
Matthew Vernon	9dd913cf8a	UCA: Uncomment UCA variables in defaults, fix consequent breakage The Ubuntu Cloud Archive-related (UCA) defaults in roles/ceph-defaults/defaults/main.yml were commented out, which means if you set `ceph_repository` to "uca", you get undefined variable errors, e.g. ``` The task includes an option with an undefined variable. The error was: 'ceph_stable_repo_uca' is undefined The error appears to have been in '/nfs/users/nfs_m/mv3/software/ceph-ansible/roles/ceph-common/tasks/installs/debian_uca_repository.yml': line 6, column 3, but may be elsewhere in the file depending on the exact syntax problem. The offending line appears to be: - name: add ubuntu cloud archive repository ^ here ``` Unfortunately, uncommenting these results in some other breakage, because further roles were written that use the fact of `ceph_stable_release_uca` being defined as a proxy for "we're using UCA", so try and install packages from the bionic-updates/queens release, for example, which doesn't work. So there are a few `apt` tasks that need modifying to not use `ceph_stable_release_uca` unless `ceph_origin` is `repository` and `ceph_repository` is `uca`. Closes: #3475 Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>	2019-04-09 13:44:00 +02:00
Ali Maredia	37f46a8c5d	rgw multisite: add more than 1 rgw to the master or secondary zone Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1664869 Signed-off-by: Ali Maredia <amaredia@redhat.com>	2019-04-06 08:01:19 +02:00
Dimitri Savineau	d8538ad4e1	Set the default crush rule in ceph.conf Currently the default crush rule value is added to the ceph config on the mon nodes as an extra configuration applied after the template generation via the ansible ini module. This implies two behaviors: 1/ On each ceph-ansible run, the ceph.conf will be regenerated via ceph-config+template and then ceph-mon+ini_file. This leads to a non necessary daemons restart. 2/ When other ceph daemons are collocated on the monitor nodes (like mgr or rgw), the default crush rule value will be erased by the ceph.conf template (mon -> mgr -> rgw). This patch adds the osd_pool_default_crush_rule config to the ceph template and only for the monitor nodes (like crush_rules.yml). The default crush rule id is read (if exist) from the current ceph configuration. The default configuration is -1 (ceph default). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1638092 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-14 08:56:52 +00:00
Radu Toader	b8d580b3f4	Customize pools min_size Signed-off-by: Radu Toader <radu.m.toader@gmail.com>	2019-03-05 10:57:15 +00:00
Guillaume Abrioux	8f42007272	facts: fix auto_discovery exclude the previous approach was wrong. checking if `item.key` is in `osd_auto_discovery_exclude` (`['dm-', 'loop']`) is incorrect because it will obviously not match. Therefore, the condition will return `True` whatever the device we are checking. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-02-26 03:16:33 +00:00
Guillaume Abrioux	83d7ef777e	osd: add possibility to exclude device in osd_auto_discovery Add a new `osd_auto_discovery_exclude` to give the possibility of excluding some devices in auto_discovery scenario. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-02-25 10:05:34 +00:00
Guillaume Abrioux	fdca29f2a7	facts: set timeout_command fact in ceph-defaults - also add `--foreground` which seems to fix some issue we are facing when using timeout with `podman`. - use this fact in the `is ceph running already?` task. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-02-05 18:14:28 +01:00
Ramana Raja	dfff89ce67	Install nfs-ganesha stable v2.7 nfs-ganesha v2.5 and 2.6 have hit EOL. Install nfs-ganesha v2.7 stable that is currently being maintained. Signed-off-by: Ramana Raja <rraja@redhat.com>	2019-01-30 14:57:26 +01:00
guihecheng	1ac94c048f	rgw: add support for multiple rgw instances on a single host With this, we could have multiple rgw instances on a single host with a single run, don't have to use rgw-standalone.yml which does not seems able to bind ports separately. If you want to have multiple rgw instances, just change 'radosgw_instances' to the number you want, which defaults to 1. Not compatible with Multi-Site yet. Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>	2019-01-18 11:12:28 +01:00
jtudelag	23ad5fd9cb	Clarify RGWs configuration when using ceph_conf_overrides. To avoid future misconfigurations, clarify that the only valid scheme is [client.rgw.] instead of [client.radosgw.].	2018-12-20 13:55:03 +00:00
Guillaume Abrioux	0eb56e36f8	introduce new role ceph-facts sometimes we play the whole role `ceph-defaults` just to access the default value of some variables. It means we play the `facts.yml` part in this role while it's not desired. Splitting this role will speedup the playbook. Closes: #3282 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-12 11:18:01 +01:00
Sébastien Han	8e2585b6c7	ceph-defaults: do not use podman only on atomic We want to test podman on f29 non-atomic, atomic is not a hard requirement. However, if you want to get podman then you will have to install it first before running the playbook. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-07 15:37:07 +01:00
Guillaume Abrioux	1cff1f9806	revert infra: don't restart firewalld if unit is masked If firewalld unit is masked, setting `configure_firewall: false` is enough Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-04 15:52:08 +00:00
Guillaume Abrioux	fead0813b4	remove kv store support the next stable release will drop this feature. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-30 13:45:12 +00:00
Guillaume Abrioux	e4869ac8bd	validate: change default value for `radosgw_address` change default value of `radosgw_address` to keep consistency with `monitor_address`. Moreover, `ceph-validate` checks if the value is '0.0.0.0' to determine if it has to run `check_eth_rgw.yml`. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600227 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-28 23:13:38 +01:00
Guillaume Abrioux	3684d421e4	defaults: play set_radosgw_address.yml only on rgw nodes This is not needed to play these tasks on nodes that are not in rgw group. Always playing this code makes `shrink_mon.yml` failing. Typical error: ``` TASK [ceph-defaults : set_fact _radosgw_address to radosgw_interface - ipv4] * task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-shrink_mon/roles/ceph-defaults/tasks/set_radosgw_address.yml:21 Thursday 22 November 2018 12:34:51 +0000 (0:00:00.154) 0:00:12.371 *** fatal: [localhost]: FAILED! => {} MSG: The task includes an option with an undefined variable. The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_eth1' ``` Indeed, `radosgw_interface` is the network interface on rgw only. It is expected that this same interface doesn't exist on `localhost`, so, when running `shrink_mon.yml`, the role `ceph-defaults` is called in `hosts: localhost` and causes the playbook to fail. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	4f57e44f9c	defaults: declare container_binary Always declare container_binary and assign it a correct value. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	ac3e18e4c1	ceph-defaults: use podman on Fedora only It seems Atomic 7.5 has podman already, however this is an old version (0.4). The podman integration is targetting RHEL 8, so Fedora is currently the closest to that. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	a96e910114	Add new container scenario Test with podman instead of docker and also support for python 3 only. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Guillaume Abrioux	6d1fe32998	defaults: change default size for openstack pools default pool size should match the real default that is defined in ceph itself. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-21 18:23:07 +00:00
Guillaume Abrioux	fdc438dd0d	defaults: change for default pool size for cephfs_pools default pool size should match the real default that is defined in ceph itself. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-21 18:23:07 +00:00
Guillaume Abrioux	f1735e9bb0	defaults: add ceph related vars file This is to add a granularity level. We can have ceph specific variables that user shouldn't have to change here. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-21 15:42:50 +00:00
Guillaume Abrioux	7774069d45	refact osd pool size customization Add real default value for osd pool size customization. Ceph itself has an `osd_pool_default_size` default value to `3`. If users don't specify a pool size in various pools definition within ceph-ansible, we should default to `3`. By the way, this kind of condition isn't really clear: ``` when: - rbd_pool_size \| default ("") ``` we should try to get the customized value then default to what is in `osd_pool_default_size` (which has its default value pointing to `ceph_osd_pool_default_size` (`3`) as well) and compare it to `ceph_osd_pool_default_size`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-21 15:42:50 +00:00
Guillaume Abrioux	d4c0960f04	mon: move `osd_pool_default_pg_num` in `ceph-defaults` `osd_pool_default_pg_num` parameter is set in `ceph-mon`. When using ceph-ansible with `--limit` on a specifc group of nodes, it will fail when trying to access this variables since it wouldn't be defined. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1518696 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-21 15:42:50 +00:00
Boris Ranto	dfab42a21f	defaults/facts: Use list instead of keys It is safer to use the list filter than the keys() method since the keys method does have some interoperability issues between python2 and python3 based ansible/jinja. Signed-off-by: Boris Ranto <branto@redhat.com>	2018-11-20 18:48:22 +01:00
Neha Ojha	10538e9a23	osd_memory_target: standardize unit and fix calculation * The default value of osd_memory_target used by ceph is 4294967296 bytes, so use the same as ceph-ansible default. * Convert ansible_memtotal_mb to bytes to calculate osd_memory_target Signed-off-by: Neha Ojha <nojha@redhat.com>	2018-11-19 09:54:33 +00:00
Guillaume Abrioux	63b9835cbb	infra: don't restart firewalld if unit is masked if firewalld.service systemd unit is masked, the handler will fail when trying to restart it. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650281 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-16 09:30:17 +00:00
Sébastien Han	f9ddc27cd5	lint: meta add company info Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-08 10:22:02 +00:00
Sébastien Han	094ae8baf1	lint: do not use local_action Use delegate_to: localhost instead. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-08 10:22:02 +00:00
Sébastien Han	aaceeff14e	lint: use galaxy_tags instead of categories Update the meta Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-08 10:22:02 +00:00
Sébastien Han	2cd0d2f1e6	lint: yaml space before and after {{ }} Fix tasks using variables that did not have space before and after {{ }} Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-08 10:22:02 +00:00
Guillaume Abrioux	404712ef01	defaults: add a fact '_current_monitor_address' So we don't have to loop over `_monitor_addresses` when we need the monitor address of the current node being played. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-31 14:16:10 +01:00
Guillaume Abrioux	a2b2028212	config: remove complex jinja logic in ceph.conf.j2 using consecutive set_fact in the playbook instead of complex jinja syntax makes ceph.conf.j2 more readable. By the way, jinja can be painful to debug at some point. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-31 14:16:10 +01:00
Guillaume Abrioux	34275ac847	rgw: move multisite default variables in ceph-defaults Move all rgw multisite variables in ceph-defaults so ceph-validate can go through them. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-30 17:41:05 +01:00
Guillaume Abrioux	d8d3e55006	remove restapi role As of `mimic`, restapi is no longer available because of manager daemon. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-30 14:19:13 +01:00
Rishabh Dave	ee2d52d33d	allow custom pool size Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1596339 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-10-22 16:00:21 +02:00
Guillaume Abrioux	48cfc60722	defaults: set default `configure_firewall` to `True` Let's configure firewalld by default. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526400 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-19 15:12:45 +02:00
Andy McCrae	3e0fa3bc18	Add ability to use a different client container Currently a throw-away container is built to run ceph client commands to setup users, pools & auth keys. This utilises the same base ceph container which has all the ceph services inside it. This PR allows the use of a separate container if the deployer wishes - but defaults to use the same full ceph container. This can be used for different architectures or distributions, which may support the the Ceph client, but not Ceph server, and allows the deployer to build and specify a separate client container if need be. Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>	2018-10-16 23:28:35 +00:00
Christian Berendt	ac37a0d0cd	ceph-defaults: set ceph_stable_openstack_release_uca to queens Liberty is no longer available in the UCA. The last available release there is currently Queens. Signed-off-by: Christian Berendt <berendt@betacloud-solutions.de>	2018-10-16 12:56:32 +00:00
Guillaume Abrioux	40b7747af7	remove jewel support As of now, we should no longer support Jewel in ceph-ansible. The latest ceph-ansible release supporting Jewel is `stable-3.1`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-12 23:38:17 +00:00
Noah Watkins	8dcc8d1434	Stringify ceph_docker_image_tag This could be a numeric input, but is treated like a string leading to runtime errors. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1635823 Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-10-10 04:26:33 +00:00
Noah Watkins	306e308f13	Avoid using tests as filter Fixes the deprecation warning: [DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of using `result\|search` use `result is search`. Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-10-10 04:26:33 +00:00
Andrew Schoen	71ce539da5	ceph-defaults: add the block_db_size option This is used in the lvm osd scenario for the 'lvm batch' subcommand of ceph-volume. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-10-09 10:09:50 -04:00
Guillaume Abrioux	3e2cdcc735	common: remove check_firewall code Check firewall isn't working as expected and might break deployments. This part of the code will be reworked soon. Let's focus on configure_firewall code for now. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1541840 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-06 14:32:17 +02:00
Guillaume Abrioux	be31c15ccd	follow up on `b5d2ea2` Add some missed statements Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-06 14:32:17 +02:00
Benjamin Cherian	85071e6e53	Add support for different NTP daemons Allow user to choose between timesyncd, chronyd and ntpd Installation will default to timesyncd since it is distributed as part of the systemd installation for most distros. Added note indicating NTP daemon type is not used for containerized deployments. Fixes issue #3086 on Github Signed-off-by: Benjamin Cherian <benjamin_cherian@amat.com>	2018-10-02 13:18:08 +00:00
Sébastien Han	4db6a213f7	add ceph-handler role The role contains all the handlers for Ceph services. We decided to leave ceph-defaults role with variables and a few facts only. This is useful when organizing the site.yml files and also adding the known variables to infrastructure-playbooks. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-28 15:15:49 +00:00
Sébastien Han	145aef9fed	defaults: do not disable THP on bluestore As per #1013 it appears that BS will soon use THP to lower TLB misses, also disabling THP hasn't demonstrated any gains so far. Closes: https://github.com/ceph/ceph-ansible/issues/1013 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-27 21:23:49 +00:00
Sébastien Han	dc3319c3c4	default: use bluestore as default object store All tooling in Ceph is defaulting to use the bluestore objectstore for provisioning OSDs, there is no good reason for ceph-ansible to continue to default to filestore. Closes: https://github.com/ceph/ceph-ansible/issues/3149 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1633508 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-27 21:23:49 +00:00
Rishabh Dave	380168dadc	don't use "include" to include tasks Use "import_tasks" or "include_tasks" instead. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-09-27 17:53:40 +02:00
Matthew Vernon	806461ac6e	restart_osd_daemon.sh.j2 - use `+` rather than `{1,}` in regex `+` is more idiomatic for "one or more" in a regex than `{1,}`; the latter was introduced in a previous fix for an incorrect `{1,2}` restriction. Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>	2018-09-24 10:33:46 +00:00
Matthew Vernon	04f4991648	restart_osd_daemon.sh.j2 - consider active+clean+* pgs as OK After restarting each OSD, restart_osd_daemon.sh checks that the cluster is in a good state before moving on to the next one. One of the checks it does is that the number of pgs in the state "active+clean" is equal to the total number of pgs in the cluster. On large clusters (e.g. we have 173,696 pgs), it is likely that at least one pg will be scrubbing and/or deep-scrubbing at any one time. These pgs are in state "active+clean+scrubbing" or "active+clean+scrubbing+deep", so the script was erroneously not including them in the "good" count. Similar concerns apply to "active+clean+snaptrim" and "active+clean+snaptrim_wait". Fix this by considering as good any pg whose state contains active+clean. Do this as an integer comparison to num_pgs in pgmap. (could this be backported to at least stable-3.0 please?) Closes: #2008 Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>	2018-09-24 10:33:46 +00:00
Matthew Vernon	aa97ecf048	restart_osd_daemon.sh.j2 - Reset RETRIES between calls of check_pgs Previously RETRIES was set (by default to 40) once at the start of the script; this meant that it would only ever wait for up to 40 lots of 30s across all the OSDs on a host before bombing out. In fact, we want to be prepared to wait for the same amount of time after each OSD restart for the clusters' pgs to be happy again before continuing. Closes: #3154 Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>	2018-09-24 08:20:32 +00:00
Neha Ojha	27027a17d3	osd: add osd memory target option BlueStore's cache is sized conservatively by default, so that it does not overwhelm under-provisioned servers. The default is 1G for HDD, and 3G for SSD. To replace the page cache, as much memory as possible should be given to BlueStore. This is required for good performance. Since ceph-ansible knows how much memory a host has, it can set `bluestore cache size = max(total host memory / num OSDs on this host * safety factor, 1G)` Due to fragmentation and other memory use not included in bluestore's cache, a safety factor of 0.5 for dedicated nodes and 0.2 for hyperconverged nodes is recommended. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1595003 Signed-off-by: Neha Ojha <nojha@redhat.com> Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-18 10:12:46 +00:00
Guillaume Abrioux	9ff26e80f2	defaults: add a default value to rgw_hostname let's add ansible_hostname as a default value for rgw_hostname if no hostname in servicemap matches ansible_fqdn. Fixes: #3063 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622505 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-10 12:07:44 +02:00
Sébastien Han	9ba670567e	remove warning for unsupported variables As promised, these will go unsupported for 3.1 so let's actually remove them :). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622729 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-28 13:31:57 -07:00
Sébastien Han	6d7fa99ff7	defaults: fix rgw_hostname A couple if things were wrong in the initial commit: * ceph_release_num[ceph_release] >= ceph_release_num['luminous'] will never work since the ceph_release fact is set in the roles after. So either ceph-common or ceph-docker-common set it * we can easily re-use the initial command to check if a cluster is running, it's more elegant than running it twice. * set the fact rgw_hostname on rgw nodes only Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1618678 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-22 17:46:00 +02:00
Markos Chandras	126e2e3f92	roles: ceph-defaults: Check if 'rgw' attribute exists for rgw_hostname If there are no services on the cluster, then the 'rgw' could be missing and the task is failing with the following problem: msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'rgw' We fix this by checking the existence of the 'rgw' attribute. If it's missing, we skip the task since the role already contains code to set a good default rgw_hostname. Signed-off-by: Markos Chandras <mchandras@suse.de>	2018-08-20 11:37:45 +02:00
Markos Chandras	37e50114de	roles: ceph-defaults: Delegate cluster information task to monitor node Since commit `f422efb1d6` ("config: ensure rgw section has the correct name") we observe the following failures in new Ceph deployment with OpenStack-Ansible fatal: [aio1_ceph-rgw_container-fc588f0a]: FAILED! => {"changed": false, "cmd": "ceph --cluster ceph -s -f json", "msg": "[Errno 2] No such file or directory" This is because the task executes 'ceph' but at this point no package installation has happened. Packages are normally installed in the 'ceph-common' role which runs after the 'ceph-defaults' one. Since we are looking to obtain cluster information, the task should be delegated to a monitor node similar to other tasks in that role Signed-off-by: Markos Chandras <mchandras@suse.de>	2018-08-20 11:37:45 +02:00
Markos Chandras	7172737f13	roles: ceph-defaults: Set ceph_uid on SUSE distributions The ceph_uid is also '167' on SUSE systems so extend the existing task. Signed-off-by: Markos Chandras <mchandras@suse.de>	2018-08-13 19:02:57 +00:00
Guillaume Abrioux	8b5e3cd999	validate: fail if fqdn deployment attempted fqdn configuration possibility caused a lot of trouble, it's adding a lot of complexity because of multiple cases and the relation between ceph-ansible and ceph-container. Moreover, there is no benefit for such a feature. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613155 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-08-13 10:04:24 +02:00
Guillaume Abrioux	f422efb1d6	config: ensure rgw section has the correct name the ceph.conf.j2 always assumes the hostname used to register the radosgw in the servicemap is equivalent to `{{ ansible_hostname }}` which returns the shortname form. We need to detect which form of the hostname was used in case of already deployed cluster and update the ceph.conf accordingly. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1580408 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-08-13 10:04:24 +02:00
Sébastien Han	4d64dd4686	rgw: ability to use ceph-ansible vars into containers Since the container now simply reads the ceph.conf, we remove all the unnecessary options. Also this PR is the foundation to support multiple backend, such as the new 'beast' from Ceph Mimic. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1582411 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-09 14:13:17 +02:00
Artur Fijalkowski	52d9d406b1	Fix in regular expression matching OSD ID on non-contenerized deployment. restart_osd_daemon.sh is used to discover and restart all OSDs on a host. To do it the scripts loops the list of ceph-osd@ services in the system. This commit fixes bug in the regular expression responsile for extraction of OSDs - prior version uses `[0-9]{1,2}` expression which is ignoring all OSDS which numbers are greater than 99 (thus longer than 2 digits). Fix removed upper limit of digits in the number. This problem existed in two places in the script. Closes: #2964 Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com>	2018-08-06 15:53:49 +00:00
Guillaume Abrioux	0a6ff6bbf8	defaults: backward compatibility with fqdn deployments This commit ensures we are backward compatible with fqdn deployments. Since ceph-container enforces deployment to be done with shortname, we must keep backward compatibility with clusters already deployed with fqdn configuration Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-08-06 10:14:58 +00:00
Guillaume Abrioux	1a626d3c61	nfs: change default stable branch for nfs-ganesha repo Since `V2.6-stable` is available and has packages for `mimic`, let's update this default value accordingly so nfs nodes can be deployed with mimic. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-07-13 08:20:27 +00:00
Sébastien Han	b9f7df7ba2	common: remove hdparm As of Kraken, the journal code does not use the hdparm command anymore so we can remove it from our package dependency list. Fixes: https://github.com/ceph/ceph-ansible/issues/1402 Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit f6910efa24389c264062963b2054c7cd29ffebb3)	2018-07-07 08:53:47 +00:00
Sébastien Han	103c279c21	ceph-defaults: add default application to pool We now add a default 'rbd' application type to each pool we create. This will remove the warning: " application not enabled on N pool(s) " Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590275 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-07-02 10:28:34 +00:00
George Shuklin	653b483fc3	Add ceph_keyring_permissions variable to control permissions for keyring files in /etc/ceph. Default value is the same as it was (0600), but this variable allows user to override it (f.e. set it to 0640). Signed-off-by: George Shuklin <george.shuklin@gmail.com>	2018-06-28 15:48:39 +00:00
Ha Phan	b7b8aba47b	Generate a copy of ceph.conf locally Refers to #2697 This change creates a copy of `ceph.conf` in ansible server. Signed-off-by: Ha Phan <thanhha.work@gmail.com>	2018-06-28 07:39:30 +00:00
Christian Zunker	48394597c9	reset failed count of ceph-mgr Depending on your setup, ceph-mgr might get restarted multiple times. When this is done to fast, systemd will prevent further restarts because of configured limits in the ceph-mgr systemd unit file. Resetting the failure count will prevent this problem. The reset is done before the restart so in case of a real problem during the restart it still fails. Fixes: #2768 Signed-off-by: Christian Zunker <christian.zunker@codecentric.cloud>	2018-06-20 13:59:16 +02:00
Sébastien Han	2e8412734a	common: ability to enable/disable fw configuration Prior to this patch if you were running on a Red Hat system, ceph-ansible would try to configure firewalld for you without the operators's consent. Now you can enable or disable the fw configuration by setting configure_firewall to either true or false. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-11 21:51:59 +02:00
Sébastien Han	20c8065e48	ceph-iscsi: rename group iscsi_gws Let's try to avoid using dashes as testinfra needs to be able to read the groups. Typically, with iscsi-gws we can't add a marker for these iscsi nodes, using an underscore fixes the issue. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-08 10:21:54 +02:00
Sébastien Han	91bf53ee93	ceph-iscsi: support for containerize deployment We now have the ability to deploy a containerized version of ceph-iscsi. The result is similar to the non-containerized version, you simply have 3 containers running for the following services: * rbd-target-api * rbd-target-gw * tcmu-runner Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508144 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-08 10:21:54 +02:00
Guillaume Abrioux	5eacc8f8d8	tests: add a dummy value for 'dev' release Functional tests are broken when testing against 'dev' release (ceph). Adding a dummy value here will make it possible to run ceph-ansible CI against dev ceph release. Typical error: ``` > if request.node.get_marker("from_luminous") and ceph_release_num[ceph_stable_release] < ceph_release_num['luminous']: E KeyError: 'dev' ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit fd1487d93f21b609a637053f5b33cd2a4e408d00)	2018-06-07 13:59:17 +02:00
Patrick Donnelly	91f9da530f	change max_mds default to 1 Otherwise, with the removal of mds_allow_multimds, the default of 3 will be set on every new FS. Introduced by: `c8573fe0d7` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1583020 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2018-06-06 12:16:42 +08:00
Sébastien Han	db50aec13d	ceph-common: add firewall rules for ceph-mgr Prior to this commit the firewall tasks were not opening the ceph-mgr ports. This would lead to unclean configuration since the ceph-mgr daemons can not connect to the OSDs. Thi commit opens the right ports on the ceph-mgr nodes to talk with the OSDs. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526400 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-04 12:11:41 +02:00
Andrew Schoen	c2423e2c48	ceph-defaults: add the nautilus 14.x entry to ceph_release_num The first 14.x tag has been cut so this needs to be added so that version detection will still work on the master branch of ceph. Fixes: https://github.com/ceph/ceph-ansible/issues/2671 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-06-01 16:51:23 +02:00
Guillaume Abrioux	c68126d6fd	mdss: do not make pg_num a mandatory params When playing ceph-mds role, mon nodes have set a fact with the default pg num for osd pools, we can simply default to this value for cephfs pools (`cephfs_pools` variable). At the moment the variable definition for `cephfs_pools` looks like: ``` cephfs_pools: - { name: "{{ cephfs_data }}", pgs: "" } - { name: "{{ cephfs_metadata }}", pgs: "" } ``` and we have a task in `ceph-validate` to ensure `pgs` has been set to a valid value. We could simply avoid this check by setting the default value of `pgs` to `hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num']` and let to users the possibility to override this value. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1581164 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-30 16:20:34 +02:00
Guillaume Abrioux	564a662baf	osds: move openstack pools creation in ceph-osd When deploying a large number of OSD nodes it can be an issue because the protection check [1] won't pass since it tries to create pools before all OSDs are active. The idea here is to move openstack pools creation at the end of `ceph-osd` role. [1] `e59258943b/src/mon/OSDMonitor.cc (L5673)` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-24 09:39:38 -07:00
Luigi Toscano	43e96c1f98	ceph-radosgw: disable NSS PKI db when SSL is disabled The NSS PKI database is needed only if radosgw_keystone_ssl is explicitly set to true, otherwise the SSL integration is not enabled. It is worth noting that the PKI support was removed from Keystone starting from the Ocata release, so some code paths should be changed anyway. Also, remove radosgw_keystone, which is not useful anymore. This variable was used until `fcba2c801a`. Now profiles drives the setting of rgw keystone *. Signed-off-by: Luigi Toscano <ltoscano@redhat.com>	2018-05-23 23:24:09 -07:00
Subhachandra Chandra	c7e269fcf5	Fix restarting OSDs twice during a rolling update. During a rolling update, OSDs are restarted twice currently. Once, by the handler in roles/ceph-defaults/handlers/main.yml and a second time by tasks in the rolling_update playbook. This change turns off restarts by the handler. Further, the restart initiated by the rolling_update playbook is more efficient as it restarts all the OSDs on a host as one operation and waits for them to rejoin the cluster. The restart task in the handler restarts one OSD at a time and waits for it to join the cluster.	2018-05-22 19:23:07 +02:00
Andrew Schoen	645f61c351	ceph-defaults: remove backwards compat for containerized_deployment The validation module does not get config options with the template syntax rendered, so we're gonna remove that and just default it to False. The backwards compat was schedule to be removed in 3.1 anyway. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-05-18 17:58:24 +02:00
Andrew Schoen	f84c2ba27b	ceph-defaults: fix failing tasks when osd_scenario was not set correctly When devices is not defined because you want to use the 'lvm' osd_scenario but you've made a mistake selecting that scenario these tasks should not fail. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-05-18 17:58:24 +02:00
Andrew Schoen	1f15a81c48	ceph-defaults: move cephfs vars from the ceph-mon role We're doing this so we can validate this in the ceph-validate role Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-05-18 17:58:24 +02:00
Sébastien Han	2f43e9dab5	defaults: restart_osd_daemon unit spaces Extra space in systemctl list-units can cause restart_osd_daemon.sh to fail It looks like if you have more services enabled in the node space between "loaded" and "active" get more space as compared to one space given in command the command[1]. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1573317 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-18 17:53:47 +02:00
Simone Caronni	b12bf62c36	Make sure the restart_mds_daemon script is created with the correct MDS name	2018-05-08 20:53:15 +02:00
Sébastien Han	65ba85aff6	Expose /var/run/ceph Useful for softwares that do data collection/monitoring like collectd. They can connect to the socket and then retrieve information. Even though the sockets are exposed now, I'm keeping the docker exec to check the socket, this will allow newer version of ceph-ansible to work with older versions. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1563280 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-20 15:48:32 +02:00
Sébastien Han	bf1e70e8cf	default: extent ceph_uid and gid We now have the ability to detect the uid/gid of the ceph user depending on the distribution we are running on and so we are doing non-container deployements. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-20 15:48:32 +02:00
Sébastien Han	f3656ad167	move create ceph initial directories to default This is needed for both non-container and container deployments. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-20 15:48:32 +02:00
Sébastien Han	641f141c0f	selinux: remove chcon calls We know bindmount with the :z option at the end of the -v command so this will basically run the exact same command as we used to run. So to speak: chcon -Rt svirt_sandbox_file_t /var/lib/ceph Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-19 14:59:37 +02:00
Randy J. Martinez	127a643fd0	ceph-defaults: fix ceph_uid fact on container deployments Red Hat is now using tags[3,latest] for image rhceph/rhceph-3-rhel7. Because of this, the ceph_uid conditional passes for Debian when 'ceph_docker_image_tag: latest' on RH deployments. I've added an additional task to check for rhceph image specifically, and also updated the RH family task for ceph/daemon [centos\|fedora]tags. Signed-off-by: Randy J. Martinez <ramartin@redhat.com>	2018-04-17 16:54:51 +02:00
Guillaume Abrioux	899b0eb451	defaults: check only 1 time if there is a running cluster There is no need to check for a running cluster n*nodes time in `ceph-defaults` so let's add a `run_once: true` to save some resources and time. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-16 11:23:00 +02:00
Douglas Fuller	c8573fe0d7	Remove deprecated allow_multimds allow_multimds will be officially deprecated in Mimic, specify it only for all versions of Ceph where it was declared stable. Going forward, specify only max_mds. Signed-off-by: Douglas Fuller <dfuller@redhat.com>	2018-04-12 10:29:17 +02:00
Sébastien Han	82ccbdafbc	ceph-defaults: bring backward compatibility for old syntax If people keep on using the mon_cap, osd_cap etc the playbook will translate this old syntax on the flight. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-11 12:18:34 +02:00
Guillaume Abrioux	66c4118dcd	defaults: fix backward compatibility backward compatibility with `ceph_mon_docker_interface` and `ceph_mon_docker_subnet` was not working since there wasn't lookup on `monitor_interface` and `public_network` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-10 00:19:11 +02:00
Sébastien Han	bb60f2fea4	ceph-defaults: fix ceoh_uid for container image tag latest According to our recent change, we now use "CentOS" as a latest container image. We need to reflect this on the ceph_uid. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-09 13:54:55 +02:00
Attila Fazekas	ecd3563c21	Deploying without managed monitors failed Tripleo deployment failed when the monitors not manged by tripleo itself with: FAILED! => {"msg": "list object has no element 0"} The failing play item was introduced by `f46217b69a` . fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1552327 Signed-off-by: Attila Fazekas <afazekas@redhat.com>	2018-04-04 18:16:46 +02:00
Guillaume Abrioux	dcf6a246a4	defaults: remove `run_once: true` when creating fetch_directory because of `serial: 1`, it can be an issue when the playbook is being run on client nodes. Since the refact of `ceph-client` we skip the role `ceph-defaults` on every node except the first client node, it means that the task is not going to be played because of `run_once: true`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-04 10:51:17 +02:00
Guillaume Abrioux	cf27c5e941	move selinux check to `ceph-defaults` This check is alone in `ceph-docker-common` since a previous code refactor. Moving this check in `ceph-defaults` allows us to run `ceph-clients` without having to run `ceph-docker-common` even in non-containerized deployment. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-04 10:51:17 +02:00
Andrew Schoen	6cffbd5409	ceph-defaults: set is_atomic variable This variable is needed for containerized clusters and is required for the ceph-docker-common role. Typically the is_atomic variable is set in site-docker.yml.sample though so if ceph-docker-common is used outside of that playbook it needs set in another way. Moving the creation of the variable inside this role means playbooks don't need to worry about setting it. fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1558252 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-03-21 19:16:11 +01:00
Sébastien Han	18402b636f	defaults: add useful info if daemon are not restarted properly If OSDs don't restart normally we now also dump info of the crush map, crush rules, crush tree and pools. If the monitors don't restart normally we also print the socket status by calling mon_status and quorum_status. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
Sébastien Han	3261ab23b8	osd: remove old crush_location implementation This was causing a lot of pain with the handlers. Also the implementation was not ideal since we were assembling files. Everything can now be done with the ceph_crush module so let's remove that. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-06 15:24:31 +00:00
Andy McCrae	04ca685ba7	Remove vars that are no longer used As part of `fcba2c801a` these vars were removed and no longer do anything: radosgw_dns_name radosgw_resolve_cname This patch removes them from the group_vars files and defaults/main.yml	2018-03-06 09:16:25 +01:00
Sébastien Han	165d9dec10	remove kernel.pid_max This is now managed by Ceph packages. See: https://github.com/ceph/ceph/pull/18544/files http://tracker.ceph.com/issues/21929 Closes: https://github.com/ceph/ceph-ansible/issues/2410 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-02-23 13:57:57 +01:00
Andy McCrae	59a4335a56	Restart services if handler called This patch fixes an issue where if hosts have different service lists, it will prevent restarting changes on services that run later on. For example, hostA in the mons and rgws group would initiate a config change and restart of services on all mons and rgws hosts, even though a separate hostB (which is only in the rgws group) has not had its configuration changed yet. Additionally, when the second host has its coniguration changed as part of the ceph-rgw role, it will not initiate a restart since its inventory name != the first hosts. To fix this we should run the restart once (using run_once: True) as long as the host has called the handler. This will ensure that even if only 1 host has called the handler it will initiate a restart on all hosts that have called the handler. Additionally, we add a var that is set when the handler runs, this will ensure that only hosts that have called the handler get restarted. Includes minor fix to remove unrequired "inventory_hostname in play_hosts" when: clause. This is no longer required since the handlers were changed. The host calling the handler will be in play_hosts already.	2018-02-16 10:40:20 +01:00
Sébastien Han	c816a9282c	container: osd remove run_once When used along with delegate, run_once does not belong well. Thus, using \| last always brings the desired result. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-02-14 02:01:29 +01:00
Sébastien Han	22f843e3d4	default: define 'osd_scenario' variable osd_scenario does not exist in the ceph-default role so if we try to play ceph-default on an OSD node, the playbook will fail with undefined variable. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-02-08 17:42:12 +01:00
Guillaume Abrioux	deaf273b25	syntax: change local_action syntax Use a nicer syntax for `local_action` tasks. We used to have oneliner like this: ``` local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }} ``` The usual syntax: ``` local_action: module: wait_for port: 22 host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}" state: started delay: 10 timeout: 500 ``` is nicer and kind of way to keep consistency regarding the whole playbook. This also fix a potential issue about missing quotation : ``` Traceback (most recent call last): File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module> main() File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin) File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command File "/usr/lib64/python2.7/shlex.py", line 279, in split return list(lex) File "/usr/lib64/python2.7/shlex.py", line 269, in next token = self.get_token() File "/usr/lib64/python2.7/shlex.py", line 96, in get_token raw = self.read_token() File "/usr/lib64/python2.7/shlex.py", line 172, in read_token raise ValueError, "No closing quotation" ValueError: No closing quotation ``` writing `local_action: shell echo {{ fsid }} \| tee {{ fetch_directory }}/ceph_cluster_uuid.conf` can cause trouble because it's complaining with missing quotes, this fix solves this issue. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-31 10:45:34 +01:00
Sébastien Han	6f9dd26caa	config: remove any spaces in public_network or cluster_network With two public networks configured - we found that with "NETWORK_ADDR_1, NETWORK_ADDR_2" install process consistently became broken, trying to find docker registry on second network, and not finding mon container. but without spaces "NETWORK_ADDR_1,NETWORK_ADDR_2" install succeeds so, containerized install is more peculiar with formatting of this line Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1534003 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-01-30 17:47:15 +01:00
Andy McCrae	481173f203	Add default for radosgw_keystone_ssl This should default to False. The default for Keystone is not to use PKI keys, additionally, anybody using this setting had to have been manually setting it before. Fixes: #2111	2018-01-30 11:30:23 +01:00
Guillaume Abrioux	f1232b33fd	Revert "monitor_interface: document need to use monitor_address when using IPv6" This reverts commit `10b91661ce`. This reverts also the same comment added in `1359869497`	2018-01-29 14:43:24 +01:00
Guillaume Abrioux	ec16cbdb1a	defaults: avoid getting stuck (ceph --connect-timeout) Sometime the playbook gets stuck because even with `--connect-timeout=` option, the connexion to the existing ceph cluster never timeout. As a workaround, using `timeout` command provided by coreutils will actually timeout if we can't connect to the cluster. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1537003 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-25 10:15:59 +01:00
Guillaume Abrioux	900f447c82	containers: fix bug when looking for existing cluster When containerized deployment, `docker_exec_cmd` is not set before the task which try to retrieve the current fsid is played, it means it considers there is no existing fsid and try to generate a new one. Typical error: ``` ok: [mon0 -> mon0] => { "changed": false, "cmd": [ "ceph", "--connect-timeout", "3", "--cluster", "test", "fsid" ], "delta": "0:00:00.179909", "end": "2018-01-09 10:36:58.759846", "failed": false, "failed_when_result": false, "rc": 1, "start": "2018-01-09 10:36:58.579937" } STDERR: Error initializing cluster client: Error('error calling conf_read_file: errno EINVAL',) ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-10 16:23:18 +01:00
Guillaume Abrioux	acfbebe67e	defaults: rename check_socket files for containers When containerized deployment, we are not looking for a socket but for a running container. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-10 15:44:47 +01:00
Sébastien Han	7eaf444328	default: look for the right return code on socket stat in-use As reported in https://github.com/ceph/ceph-ansible/issues/2254, the check with fuser is not ideal. If fuser is not available the return code is 127. Here we want to make sure that we looking for the correct return code, so 1. Closes: https://github.com/ceph/ceph-ansible/issues/2254 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-12-14 16:59:14 +01:00
Eduard Egorov	a8a2c13f6a	firewall: add mds, nfs, restapi and iscsi ports, remove 'configure_firewall' variable used for conditional execution. Include the task only on rpm-based systems. Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>	2017-12-12 23:44:55 +01:00
Eduard Egorov	6a5e0da30d	firewall: configure firewalld if it's already installed on the host (#2192 ). Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>	2017-12-12 23:44:55 +01:00
Major Hayden	5676fa23b1	Convert interface names to underscores for facts If a deployer uses an interface name with a dash/hyphen in it, such as 'br-storage' for the monitor_interface group_var, the ceph.conf.j2 template fails to find the right facts. It looks for 'ansible_br-storage' but only 'ansible_br_storage' exists. This patch converts the interface name to underscores when the template does the fact lookup.	2017-12-12 09:03:40 +01:00
Guillaume Abrioux	6a9b5c9632	defaults: fix CI issue with ceph_uid fact The CI complains because of `ceph_uid` fact which doesn't exist since the docker image tag used in the CI doesn't match with this condition. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-12-12 09:02:37 +01:00
John Fulton	ffae294288	Set tighter permissions on keyrings when containerized During a containerized deployment, set the permissions of ceph.client.admin.keyring and other keyrings to chmod 600 and chown it to ceph.	2017-12-06 19:22:28 -05:00
Guillaume Abrioux	b26a840002	handlers: restart daemons only if docker is running In case where docker CLI is available but docker is not running, we don't want to trigger the restart of the daemons. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-27 14:59:30 +01:00
Sébastien Han	cc264d6ba6	Merge pull request #2151 from hwoarang/add-opensuse Add openSUSE Leap 42.3 support	2017-11-16 14:35:28 +01:00
Markos Chandras	849786967a	ceph-common: Add initial support for openSUSE Leap distributions openSUSE Leap 42.3 provides support for Ceph Luminous in both the distribution package and the latest available version in the OBS repository so add these as the only available installation methods for openSUSE. Signed-off-by: Markos Chandras <mchandras@suse.de>	2017-11-14 10:51:22 +00:00
Guillaume Abrioux	44df3f9102	defaults: fix rgw restart script in handlers Like `80d32dec`, the path to the fact is not correct. In any case, we will retrieve the IP address in hostvars, the variable is the way we get the interface name according where it has been set (eg.: inventory host file vs. group_vars/) Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1510906 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-13 16:30:03 +01:00
Sébastien Han	7b0743be52	Merge pull request #2144 from ceph/quick_fix_lvm osd: skip some set_fact when osd_scenario=lvm	2017-11-13 21:50:37 +11:00
Guillaume Abrioux	238754a844	osd: skip some set_fact when osd_scenario=lvm these tasks are not needed when using `osd_scenario: lvm` Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1509230 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-07 15:30:08 +01:00
Arano-kai	5cde3175ae	FIX: run restart scripts in `noexec` /tmp - One can not run scripts directly in place, that mounted with `noexec` option. But one can run scripts as arguments for `bash/sh`. Signed-off-by: Arano-kai <captcha.is.evil@gmail.com>	2017-11-06 16:02:47 +02:00

1 2 3 4 5 ...

311 Commits (ca1c987e4ca360b8702a077ecacec79f0a8843f5)