ceph-ansible

Author	SHA1	Message	Date
Ha Phan	b7b8aba47b	Generate a copy of ceph.conf locally Refers to #2697 This change creates a copy of `ceph.conf` on the Ansible server. Signed-off-by: Ha Phan <thanhha.work@gmail.com>	2018-06-28 07:39:30 +00:00
Christian Zunker	48394597c9	reset failed count of ceph-mgr Depending on your setup, ceph-mgr might get restarted multiple times. When this happens too fast, systemd will prevent further restarts because of the limits configured in the ceph-mgr systemd unit file. Resetting the failure count prevents this problem. The reset is done before the restart, so in case of a real problem during the restart it still fails. Fixes: #2768 Signed-off-by: Christian Zunker <christian.zunker@codecentric.cloud>	2018-06-20 13:59:16 +02:00
Sébastien Han	2e8412734a	common: ability to enable/disable fw configuration Prior to this patch, if you were running on a Red Hat system, ceph-ansible would try to configure firewalld for you without the operator's consent. Now you can enable or disable the fw configuration by setting configure_firewall to either true or false. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-11 21:51:59 +02:00
Sébastien Han	20c8065e48	ceph-iscsi: rename group iscsi_gws Let's try to avoid using dashes, as testinfra needs to be able to read the groups. For example, with iscsi-gws we can't add a marker for these iSCSI nodes; using an underscore fixes the issue. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-08 10:21:54 +02:00
Sébastien Han	91bf53ee93	ceph-iscsi: support for containerize deployment We now have the ability to deploy a containerized version of ceph-iscsi. The result is similar to the non-containerized version, you simply have 3 containers running for the following services: * rbd-target-api * rbd-target-gw * tcmu-runner Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508144 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-08 10:21:54 +02:00
Guillaume Abrioux	5eacc8f8d8	tests: add a dummy value for 'dev' release Functional tests are broken when testing against 'dev' release (ceph). Adding a dummy value here will make it possible to run ceph-ansible CI against dev ceph release. Typical error: ``` > if request.node.get_marker("from_luminous") and ceph_release_num[ceph_stable_release] < ceph_release_num['luminous']: E KeyError: 'dev' ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit fd1487d93f21b609a637053f5b33cd2a4e408d00)	2018-06-07 13:59:17 +02:00
Patrick Donnelly	91f9da530f	change max_mds default to 1 Otherwise, with the removal of mds_allow_multimds, the default of 3 will be set on every new FS. Introduced by: `c8573fe0d7` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1583020 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2018-06-06 12:16:42 +08:00
Sébastien Han	db50aec13d	ceph-common: add firewall rules for ceph-mgr Prior to this commit, the firewall tasks were not opening the ceph-mgr ports. This would lead to an unclean configuration, since the ceph-mgr daemons cannot connect to the OSDs. This commit opens the right ports on the ceph-mgr nodes to talk with the OSDs. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526400 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-04 12:11:41 +02:00
Andrew Schoen	c2423e2c48	ceph-defaults: add the nautilus 14.x entry to ceph_release_num The first 14.x tag has been cut so this needs to be added so that version detection will still work on the master branch of ceph. Fixes: https://github.com/ceph/ceph-ansible/issues/2671 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-06-01 16:51:23 +02:00
Guillaume Abrioux	c68126d6fd	mdss: do not make pg_num a mandatory params When playing the ceph-mds role, mon nodes have already set a fact with the default pg num for osd pools, so we can simply default to this value for cephfs pools (`cephfs_pools` variable). At the moment the variable definition for `cephfs_pools` looks like: ``` cephfs_pools: - { name: "{{ cephfs_data }}", pgs: "" } - { name: "{{ cephfs_metadata }}", pgs: "" } ``` and we have a task in `ceph-validate` to ensure `pgs` has been set to a valid value. We could simply avoid this check by setting the default value of `pgs` to `hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num']` and letting users override this value. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1581164 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-30 16:20:34 +02:00
Guillaume Abrioux	564a662baf	osds: move openstack pools creation in ceph-osd When deploying a large number of OSD nodes it can be an issue because the protection check [1] won't pass since it tries to create pools before all OSDs are active. The idea here is to move openstack pools creation at the end of `ceph-osd` role. [1] `e59258943b/src/mon/OSDMonitor.cc (L5673)` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-24 09:39:38 -07:00
Luigi Toscano	43e96c1f98	ceph-radosgw: disable NSS PKI db when SSL is disabled The NSS PKI database is needed only if radosgw_keystone_ssl is explicitly set to true; otherwise the SSL integration is not enabled. It is worth noting that PKI support was removed from Keystone starting with the Ocata release, so some code paths should be changed anyway. Also, remove radosgw_keystone, which is not useful anymore. This variable was used until `fcba2c801a`. Now profiles drive the setting of the rgw keystone * options. Signed-off-by: Luigi Toscano <ltoscano@redhat.com>	2018-05-23 23:24:09 -07:00
Subhachandra Chandra	c7e269fcf5	Fix restarting OSDs twice during a rolling update. During a rolling update, OSDs are currently restarted twice: once by the handler in roles/ceph-defaults/handlers/main.yml and a second time by tasks in the rolling_update playbook. This change turns off restarts by the handler. Further, the restart initiated by the rolling_update playbook is more efficient, as it restarts all the OSDs on a host as one operation and waits for them to rejoin the cluster, whereas the restart task in the handler restarts one OSD at a time and waits for it to join the cluster.	2018-05-22 19:23:07 +02:00
Andrew Schoen	645f61c351	ceph-defaults: remove backwards compat for containerized_deployment The validation module does not get config options with the template syntax rendered, so we're going to remove that and just default it to False. The backwards compat was scheduled to be removed in 3.1 anyway. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-05-18 17:58:24 +02:00
Andrew Schoen	f84c2ba27b	ceph-defaults: fix failing tasks when osd_scenario was not set correctly When devices is not defined because you want to use the 'lvm' osd_scenario, but you've made a mistake selecting that scenario, these tasks should not fail. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-05-18 17:58:24 +02:00
Andrew Schoen	1f15a81c48	ceph-defaults: move cephfs vars from the ceph-mon role We're doing this so we can validate this in the ceph-validate role Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-05-18 17:58:24 +02:00
Sébastien Han	2f43e9dab5	defaults: restart_osd_daemon unit spaces Extra spaces in the systemctl list-units output can cause restart_osd_daemon.sh to fail. It looks like, when more services are enabled on the node, the gap between "loaded" and "active" gets wider than the single space expected by the command[1]. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1573317 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-18 17:53:47 +02:00
Simone Caronni	b12bf62c36	Make sure the restart_mds_daemon script is created with the correct MDS name	2018-05-08 20:53:15 +02:00
Sébastien Han	65ba85aff6	Expose /var/run/ceph Useful for software that does data collection/monitoring, like collectd. It can connect to the socket and then retrieve information. Even though the sockets are exposed now, I'm keeping the docker exec to check the socket; this will allow newer versions of ceph-ansible to work with older versions. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1563280 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-20 15:48:32 +02:00
Sébastien Han	bf1e70e8cf	default: extent ceph_uid and gid We now have the ability to detect the uid/gid of the ceph user depending on the distribution we are running on, including when we are doing non-container deployments. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-20 15:48:32 +02:00
Sébastien Han	f3656ad167	move create ceph initial directories to default This is needed for both non-container and container deployments. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-20 15:48:32 +02:00
Sébastien Han	641f141c0f	selinux: remove chcon calls We now bind-mount with the :z option at the end of the -v argument, so this will basically run the exact same command as we used to run, namely: chcon -Rt svirt_sandbox_file_t /var/lib/ceph Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-19 14:59:37 +02:00
Randy J. Martinez	127a643fd0	ceph-defaults: fix ceph_uid fact on container deployments Red Hat is now using tags[3,latest] for image rhceph/rhceph-3-rhel7. Because of this, the ceph_uid conditional passes for Debian when 'ceph_docker_image_tag: latest' on RH deployments. I've added an additional task to check for the rhceph image specifically, and also updated the RH family task for ceph/daemon [centos|fedora] tags. Signed-off-by: Randy J. Martinez <ramartin@redhat.com>	2018-04-17 16:54:51 +02:00
Guillaume Abrioux	899b0eb451	defaults: check only 1 time if there is a running cluster There is no need to check for a running cluster n*nodes times in `ceph-defaults`, so let's add a `run_once: true` to save some resources and time. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-16 11:23:00 +02:00
Douglas Fuller	c8573fe0d7	Remove deprecated allow_multimds allow_multimds will be officially deprecated in Mimic, so specify it only for the versions of Ceph where it was declared stable. Going forward, specify only max_mds. Signed-off-by: Douglas Fuller <dfuller@redhat.com>	2018-04-12 10:29:17 +02:00
Sébastien Han	82ccbdafbc	ceph-defaults: bring backward compatibility for old syntax If people keep using mon_cap, osd_cap, etc., the playbook will translate this old syntax on the fly. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-11 12:18:34 +02:00
Guillaume Abrioux	66c4118dcd	defaults: fix backward compatibility Backward compatibility with `ceph_mon_docker_interface` and `ceph_mon_docker_subnet` was not working, since there was no lookup on `monitor_interface` and `public_network`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-10 00:19:11 +02:00
Sébastien Han	bb60f2fea4	ceph-defaults: fix ceoh_uid for container image tag latest According to our recent change, we now use "CentOS" as the latest container image. We need to reflect this in ceph_uid. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-09 13:54:55 +02:00
Attila Fazekas	ecd3563c21	Deploying without managed monitors failed TripleO deployment failed when the monitors were not managed by TripleO itself, with: FAILED! => {"msg": "list object has no element 0"} The failing play item was introduced by `f46217b69a`. fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1552327 Signed-off-by: Attila Fazekas <afazekas@redhat.com>	2018-04-04 18:16:46 +02:00
Guillaume Abrioux	dcf6a246a4	defaults: remove `run_once: true` when creating fetch_directory Because of `serial: 1`, it can be an issue when the playbook is run on client nodes. Since the refactor of `ceph-client`, we skip the role `ceph-defaults` on every node except the first client node, which means the task is not going to be played because of `run_once: true`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-04 10:51:17 +02:00
Guillaume Abrioux	cf27c5e941	move selinux check to `ceph-defaults` This check has been alone in `ceph-docker-common` since a previous code refactor. Moving this check to `ceph-defaults` allows us to run `ceph-clients` without having to run `ceph-docker-common`, even in non-containerized deployments. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-04 10:51:17 +02:00
Andrew Schoen	6cffbd5409	ceph-defaults: set is_atomic variable This variable is needed for containerized clusters and is required for the ceph-docker-common role. Typically the is_atomic variable is set in site-docker.yml.sample, so if ceph-docker-common is used outside of that playbook, it needs to be set in another way. Moving the creation of the variable inside this role means playbooks don't need to worry about setting it. fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1558252 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-03-21 19:16:11 +01:00
Sébastien Han	18402b636f	defaults: add useful info if daemon are not restarted properly If OSDs don't restart normally we now also dump info of the crush map, crush rules, crush tree and pools. If the monitors don't restart normally we also print the socket status by calling mon_status and quorum_status. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
Sébastien Han	3261ab23b8	osd: remove old crush_location implementation This was causing a lot of pain with the handlers. Also the implementation was not ideal since we were assembling files. Everything can now be done with the ceph_crush module so let's remove that. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-06 15:24:31 +00:00
Andy McCrae	04ca685ba7	Remove vars that are no longer used As part of `fcba2c801a` these vars were removed and no longer do anything: radosgw_dns_name radosgw_resolve_cname This patch removes them from the group_vars files and defaults/main.yml	2018-03-06 09:16:25 +01:00
Sébastien Han	165d9dec10	remove kernel.pid_max This is now managed by Ceph packages. See: https://github.com/ceph/ceph/pull/18544/files http://tracker.ceph.com/issues/21929 Closes: https://github.com/ceph/ceph-ansible/issues/2410 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-02-23 13:57:57 +01:00
Andy McCrae	59a4335a56	Restart services if handler called This patch fixes an issue where, if hosts have different service lists, restarts are skipped for configuration changes on services that run later on. For example, hostA in the mons and rgws groups would initiate a config change and restart of services on all mons and rgws hosts, even though a separate hostB (which is only in the rgws group) has not had its configuration changed yet. Additionally, when the second host has its configuration changed as part of the ceph-rgw role, it will not initiate a restart, since its inventory name != the first host's. To fix this, we run the restart once (using run_once: True) as long as the host has called the handler. This ensures that even if only 1 host has called the handler, it will initiate a restart on all hosts that have called the handler. Additionally, we add a var that is set when the handler runs; this ensures that only hosts that have called the handler get restarted. Includes a minor fix to remove the unneeded "inventory_hostname in play_hosts" when: clause. This is no longer required since the handlers were changed: the host calling the handler will be in play_hosts already.	2018-02-16 10:40:20 +01:00
Sébastien Han	c816a9282c	container: osd remove run_once When used along with delegate, run_once does not play well. Thus, using | last always brings the desired result. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-02-14 02:01:29 +01:00
Sébastien Han	22f843e3d4	default: define 'osd_scenario' variable osd_scenario does not exist in the ceph-default role so if we try to play ceph-default on an OSD node, the playbook will fail with undefined variable. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-02-08 17:42:12 +01:00
Guillaume Abrioux	deaf273b25	syntax: change local_action syntax Use a nicer syntax for `local_action` tasks. We used to have one-liners like this: ``` local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }} ``` The usual syntax: ``` local_action: module: wait_for port: 22 host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}" state: started delay: 10 timeout: 500 ``` is nicer and keeps the whole playbook consistent. This also fixes a potential issue with a missing quotation: ``` Traceback (most recent call last): File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module> main() File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin) File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command File "/usr/lib64/python2.7/shlex.py", line 279, in split return list(lex) File "/usr/lib64/python2.7/shlex.py", line 269, in next token = self.get_token() File "/usr/lib64/python2.7/shlex.py", line 96, in get_token raw = self.read_token() File "/usr/lib64/python2.7/shlex.py", line 172, in read_token raise ValueError, "No closing quotation" ValueError: No closing quotation ``` Writing `local_action: shell echo {{ fsid }} | tee {{ fetch_directory }}/ceph_cluster_uuid.conf` can cause trouble because it fails with a missing-quotation error; this fix solves the issue. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-31 10:45:34 +01:00
Sébastien Han	6f9dd26caa	config: remove any spaces in public_network or cluster_network With two public networks configured, we found that with "NETWORK_ADDR_1, NETWORK_ADDR_2" the install process consistently broke, trying to find the docker registry on the second network and not finding the mon container. Without spaces, "NETWORK_ADDR_1,NETWORK_ADDR_2", the install succeeds, so the containerized install is more sensitive to the formatting of this line. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1534003 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-01-30 17:47:15 +01:00
Andy McCrae	481173f203	Add default for radosgw_keystone_ssl This should default to False. The default for Keystone is not to use PKI keys; additionally, anybody using this setting must already have been setting it manually. Fixes: #2111	2018-01-30 11:30:23 +01:00
Guillaume Abrioux	f1232b33fd	Revert "monitor_interface: document need to use monitor_address when using IPv6" This reverts commit `10b91661ce`. This reverts also the same comment added in `1359869497`	2018-01-29 14:43:24 +01:00
Guillaume Abrioux	ec16cbdb1a	defaults: avoid getting stuck (ceph --connect-timeout) Sometimes the playbook gets stuck because, even with the `--connect-timeout=` option, the connection to the existing Ceph cluster never times out. As a workaround, using the `timeout` command provided by coreutils will actually time out if we can't connect to the cluster. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1537003 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-25 10:15:59 +01:00
Guillaume Abrioux	900f447c82	containers: fix bug when looking for existing cluster In a containerized deployment, `docker_exec_cmd` is not set before the task that tries to retrieve the current fsid is played, so the playbook considers there is no existing fsid and tries to generate a new one. Typical error: ``` ok: [mon0 -> mon0] => { "changed": false, "cmd": [ "ceph", "--connect-timeout", "3", "--cluster", "test", "fsid" ], "delta": "0:00:00.179909", "end": "2018-01-09 10:36:58.759846", "failed": false, "failed_when_result": false, "rc": 1, "start": "2018-01-09 10:36:58.579937" } STDERR: Error initializing cluster client: Error('error calling conf_read_file: errno EINVAL',) ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-10 16:23:18 +01:00
Guillaume Abrioux	acfbebe67e	defaults: rename check_socket files for containers In a containerized deployment, we are not looking for a socket but for a running container. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-10 15:44:47 +01:00
Sébastien Han	7eaf444328	default: look for the right return code on socket stat in-use As reported in https://github.com/ceph/ceph-ansible/issues/2254, the check with fuser is not ideal. If fuser is not available, the return code is 127. Here we want to make sure that we are looking for the correct return code, which is 1. Closes: https://github.com/ceph/ceph-ansible/issues/2254 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-12-14 16:59:14 +01:00
Eduard Egorov	a8a2c13f6a	firewall: add mds, nfs, restapi and iscsi ports, remove 'configure_firewall' variable used for conditional execution. Include the task only on rpm-based systems. Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>	2017-12-12 23:44:55 +01:00
Eduard Egorov	6a5e0da30d	firewall: configure firewalld if it's already installed on the host (#2192). Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>	2017-12-12 23:44:55 +01:00
Major Hayden	5676fa23b1	Convert interface names to underscores for facts If a deployer uses an interface name with a dash/hyphen in it, such as 'br-storage' for the monitor_interface group_var, the ceph.conf.j2 template fails to find the right facts. It looks for 'ansible_br-storage' but only 'ansible_br_storage' exists. This patch converts the interface name to underscores when the template does the fact lookup.	2017-12-12 09:03:40 +01:00
Guillaume Abrioux	6a9b5c9632	defaults: fix CI issue with ceph_uid fact The CI complains because the `ceph_uid` fact doesn't exist, since the docker image tag used in the CI doesn't match this condition. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-12-12 09:02:37 +01:00
John Fulton	ffae294288	Set tighter permissions on keyrings when containerized During a containerized deployment, set the permissions of ceph.client.admin.keyring and other keyrings to mode 600 and chown them to ceph.	2017-12-06 19:22:28 -05:00
Guillaume Abrioux	b26a840002	handlers: restart daemons only if docker is running In cases where the docker CLI is available but docker is not running, we don't want to trigger the restart of the daemons. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-27 14:59:30 +01:00
Sébastien Han	cc264d6ba6	Merge pull request #2151 from hwoarang/add-opensuse Add openSUSE Leap 42.3 support	2017-11-16 14:35:28 +01:00
Markos Chandras	849786967a	ceph-common: Add initial support for openSUSE Leap distributions openSUSE Leap 42.3 provides support for Ceph Luminous in both the distribution package and the latest available version in the OBS repository so add these as the only available installation methods for openSUSE. Signed-off-by: Markos Chandras <mchandras@suse.de>	2017-11-14 10:51:22 +00:00
Guillaume Abrioux	44df3f9102	defaults: fix rgw restart script in handlers As in `80d32dec`, the path to the fact is not correct. In any case, we retrieve the IP address from hostvars; the variable only determines how we get the interface name, depending on where it has been set (e.g. inventory host file vs. group_vars/). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1510906 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-13 16:30:03 +01:00
Sébastien Han	7b0743be52	Merge pull request #2144 from ceph/quick_fix_lvm osd: skip some set_fact when osd_scenario=lvm	2017-11-13 21:50:37 +11:00
Guillaume Abrioux	238754a844	osd: skip some set_fact when osd_scenario=lvm these tasks are not needed when using `osd_scenario: lvm` Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1509230 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-07 15:30:08 +01:00
Arano-kai	5cde3175ae	FIX: run restart scripts in `noexec` /tmp - One cannot run scripts directly from a filesystem mounted with the `noexec` option, but one can run them as arguments to `bash`/`sh`. Signed-off-by: Arano-kai <captcha.is.evil@gmail.com>	2017-11-06 16:02:47 +02:00
Sébastien Han	6ea92756c0	Merge pull request #2117 from ceph/rm-dup default: remove dup variable	2017-10-27 13:49:52 +02:00
Sébastien Han	d2575c7f5e	default: remove dup variable ceph_repository_type was declared multiple times. This commit fixes this. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-27 11:46:15 +02:00
Sébastien Han	d6a0d2f9be	Merge pull request #2071 from jtaleric/master Docker image pull retry	2017-10-27 09:49:03 +02:00
Sébastien Han	5a10b048b0	Merge pull request #2105 from major/really-fix-always-run Really fix always run	2017-10-27 09:33:47 +02:00
Joe Talerico	ab58764288	Docker image pull retry This change sets a default timeout of 300s for the image pull. If the image pull times out (300s), we will retry 3 times by default. fixes 1954	2017-10-25 13:37:10 -04:00
Major Hayden	f73232caa4	Use check_mode instead of always_run This patch changes the `always_run: yes` task option to `check_mode: no` to avoid Ansible warnings.	2017-10-25 09:53:34 -05:00
Major Hayden	c2b5118c1b	Revert "Avoid deprecated always_run" This reverts commit `620fb37dd4`.	2017-10-25 09:48:09 -05:00
Andy McCrae	7f6c39102d	Option to set TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES Use "ceph_tcmalloc_max_total_thread_cache" to set the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES value inside /etc/default/ceph for Debian installs, or /etc/sysconfig/ceph for Red Hat/CentOS installs. By default this is set to 0, so the default package value will be used; if specified, this value will be changed to match the variable, and ceph osd services will be restarted.	2017-10-25 14:38:36 +01:00
Sébastien Han	4413511b66	all: backward compatibility between stable-2.2 and 3.0 stable-3.0 brought numerous changes in ceph-ansible variables; this PR aims to maintain backward compatibility for someone running stable-2.2 who upgrades to stable-3.0 but keeps their group_vars untouched. We then determine the right options to make sure the upgrade works, but we expect the new variables to be used. We will drop this in the near future, maybe in 3.1 or 3.2. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-20 11:54:10 +02:00
Sébastien Han	c527515502	Merge pull request #2000 from ceph/merge-osd-scenarios [skip ci] ci: new osd scenarios	2017-10-19 09:18:02 +02:00
Sébastien Han	1579f1c5b1	Merge pull request #2073 from ceph/fix_rbd_handler [skip ci] rbd: fix restart script for jewel	2017-10-18 11:12:05 +02:00
Guillaume Abrioux	c2850b11be	rbd: fix restart script for jewel In Jewel, we don't use the bootstrap-rbd keyring for rbd-mirror nodes, so the socket path/name differs depending on which Ceph release you are deploying. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-10-18 11:10:49 +02:00
Sébastien Han	a53aa9e8b4	ci: new osd scenarios This commit adds new osd scenarios; it aims to simplify the CI setup and bring better coverage of the OSD scenarios. We decided to differentiate between filestore and bluestore, thinking ahead to when filestore won't be supported anymore. So we now have two classes of tests: * Filestore * Bluestore In each of those classes we have container and non-container. Then for each we test the following: * collocated * collocated dmcrypt * non-collocated * non-collocated dmcrypt * auto discovery collocated * auto discovery collocated dmcrypt This gives us nice coverage and also reduces the footprint on the CI. We are now up to 4 scenarios, each containing 6 OSD VMs. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-18 09:26:06 +02:00
Sébastien Han	90b75185d5	defaults: fix handlers for collocation When doing collocation the condition "inventory_hostname in play_hosts" is breaking the restart workflow. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-17 19:23:16 +02:00
Christian Berendt	4c380c9ef8	Cleanup readme files in roles directories The contents of the README files are no longer up to date. Documentation for all roles is located below the docs directory.	2017-10-17 11:22:06 +02:00
Major Hayden	620fb37dd4	Avoid deprecated always_run The `always_run` key is deprecated and being removed in Ansible 2.4. Using it causes a warning to be displayed: [DEPRECATION WARNING]: always_run is deprecated. This patch changes all instances of `always_run` to use the `always` tag, which causes the task to run each time the playbook runs.	2017-10-12 08:29:44 -05:00
Sébastien Han	7054abef99	Merge pull request #2009 from ceph/fix-clean-pg [skip ci] handler: do not test if pgs_num = 0	2017-10-07 03:39:26 +02:00
Sébastien Han	9f1bd3d6dd	handler: add serial restart back We now restart daemons on each machine in a serialized fashion. Closes: https://github.com/ceph/ceph-ansible/issues/1989 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-07 03:39:10 +02:00
Sébastien Han	ac29e8f977	Merge pull request #1983 from jprovaznik/suffix Allow to override systemd service instance id	2017-10-06 22:40:57 +02:00
Sébastien Han	779f642fa8	use get to check stdout_lines During the initial play, the docker command does not exist, so the command has no stdout_lines. Using get allows us to fix this by falling back to an empty array if the command fails. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-06 16:57:46 +02:00
Sébastien Han	d5ae0a3340	handler: do not test if pgs_num = 0 We don't need to wait if there are no PGs. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-06 16:57:46 +02:00
Jan Provaznik	3c16af5ef2	Allow to override systemd service instance id It's useful to have constant service instance id when ceph-nfs is managed by pacemaker.	2017-10-06 08:20:37 +02:00
Sébastien Han	425ecb3c7d	common: fix ga verison for debian rhcs Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-05 18:45:30 +02:00
Sébastien Han	639389b9cd	Merge pull request #1985 from ceph/debian-rhcs [skip ci] common: fix rhcs installation on debian	2017-10-05 18:42:46 +02:00
Sébastien Han	9193e88878	common: fix rhcs installation on debian * Change version from 2 to 3. * Use ceph_rhcs_cdn_debian_repo_version to use other repositories along with ceph_rhcs_cdn_debian_repo. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-05 17:42:21 +02:00
Sébastien Han	b6b24a5ca9	iscsi: fix wrong group name for iscsi Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1498490 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-05 17:25:32 +02:00
Sébastien Han	cac7d034bf	defaults: fix check socket non-container handler Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-04 15:33:52 +02:00
Sébastien Han	5968cf09b1	ci: add collocation scenario Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-04 11:19:12 +02:00
Sébastien Han	0ce76113bf	Merge pull request #1956 from ceph/osd-container-id Osd container	2017-10-03 18:52:24 +02:00
Sébastien Han	3bd341f6c0	osd: container use id instead of dev name Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1494127 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-03 14:44:00 +02:00
Guillaume Abrioux	081f226106	defaults: change running order in main.yml The task which sets `ceph_current_fsid` in `facts.yml`, in the case of a containerized deployment, will definitely fail because `docker_exec_cmd` is not set yet. This commit simply makes `facts.yml` play after `check_socket.yml` so `docker_exec_cmd` is set properly. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-10-02 18:42:43 +02:00
Sébastien Han	e121bc58e9	defaults: add missing handlers for rbd mirorr and mgr Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-29 02:38:24 +02:00
Sébastien Han	048b55be4a	defaults: only run socket checks on their specific roles Running the socket check on all the hosts will override the default value of docker_exec_cmd, leaving it with the last value (currently rbd-mirror), as a result the subsequent docker_exec_cmd usage for the :x Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-29 02:38:24 +02:00
Sébastien Han	341c9e077b	nfs: fix container setup and re-arrange files Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-29 02:38:24 +02:00
Sébastien Han	8b6456dc8a	handler: enhance socket detection We have seen issues with leftover sockets. So now, if a socket is found, we also check whether a process is using it. If so, we can run the handler; if not, we remove it and continue the playbook. Signed-off-by: Sébastien Han <seb@redhat.com> Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-25 13:44:51 +02:00
Sébastien Han	8f71c08e7b	handler: display ceph status properly Fix bash error, doing ceph "$CEPH_CLI" -s gives us ceph '--name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/test.keyring --cluster test' -s which results in a wrongly formatted command. Removing the double quotes expands the array properly. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-22 17:45:35 +02:00
Sébastien Han	adf5017924	config: remove max open file This is only used by the old sysvinit scripts Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-20 23:06:01 +02:00
Sébastien Han	d100b4e596	name includes and set_fact for clarity When Ansible is not run with verbose options it's difficult to see which include and/or set_fact does what. So adding a name for each clarifies. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-18 23:39:46 +02:00
Sébastien Han	ed3003cf41	defaults: restart docker daemon higher delay Use default delay since the mon (in particular) can take more time to restart. Solves error with: STDERR: Error response from daemon: No such container: ceph-mon-mon0 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-14 13:38:11 -06:00
Sébastien Han	aa364264cd	resync ceph-iscsi-gw with old upstream Taken from https://github.com/pcuzner/ceph-iscsi-ansible/tree/tcmu-fixes Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1454945 and https://bugzilla.redhat.com/show_bug.cgi?id=1484083 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-12 18:06:10 -06:00
Sébastien Han	d46d453b83	Merge pull request #1780 from ceph/wip-rgw-nfs Wip RGW NFS	2017-09-08 19:26:02 +02:00
Ali Maredia	55724c6e93	nfs-ganesha: add dev, stable, and rhcs nfs-ganesha's for ceph-nfs role Signed-off-by: Ali Maredia <amaredia@redhat.com>	2017-09-08 09:13:20 -04:00
Sébastien Han	12f6e53090	defaults: do not restart unconfigured (yet) daemons In a collocated scenario, where you might put a rgw, a mds and a mon on the same node, you don't want the handler to blindly restart all the daemons on the node. Indeed, some of them might not be configured yet. This implements more precise socket detection for each daemon type. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1488813 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-08 12:02:37 +02:00
Sébastien Han	3dd47a45cb	ceph-defaults: fix handlers for mds and rgw The way we handle the restart for both mds and rgw is not ideal: it will try to restart the daemon on hosts that don't run the daemon, resulting in a service file being created (see bug description). Now we restart each daemon precisely and in a serialized fashion. Note: the current implementation does NOT support multiple mds or rgw on the same node. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1469781 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-31 19:02:21 +02:00
Sébastien Han	ae2fd45994	common: refactor installation method The installation process is now described as follows: * you still have to choose a 'ceph_origin' installation method. The origin can be a 'repository' (add a new repository), distro (it will use the packages provided by the native repo source of your distribution), local (only available on redhat system, it installs locally built packages). This option is not well tested, so use it carefully * if ceph_origin == 'repository' you will have to decide what kind of repository you want to enable: - community: corresponds to the stable upstream/community version - enterprise: corresponds to the stable enterprise/downstream version (basically you are a red hat customer) - dev: it will install ceph from packages built out of the github development branches Signed-off-by: Sébastien Han <seb@redhat.com> Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com> Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-08-30 10:52:01 +02:00
Sébastien Han	5743916092	common: add mimic release facts Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-29 17:21:37 +02:00
Sébastien Han	29753da05c	handler: default to empty array if task skipped with_items is evaluated before the when condition so if the task that registers the 'results' is skipped the task will fail with: {"failed": true, "msg": "'dict object' has no attribute 'results'"} Defaulting to an empty array fixes the issue. Reverts: `abdd66619e` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1482061 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-25 18:39:00 +02:00
Sébastien Han	1f4082f200	update meta for ansible galaxy Closes: https://github.com/ceph/ceph-ansible/issues/1637 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-25 00:05:44 +02:00
Guillaume Abrioux	539197a2fc	Introduce new role ceph-config. This will give us more flexibility and the possibility to deploy a client node for an external ceph-cluster. related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1469426 Fixes: #1670 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-08-24 11:33:03 +02:00
Sébastien Han	f2499ff5ac	Merge pull request #1788 from ceph/improve-switch switch-from-non-containerized-to-containerized: simplify	2017-08-23 19:47:26 +02:00
Sébastien Han	4f0ecb7f30	switch-from-non-containerized-to-containerized: simplify This commit eases the use of the infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml playbook. We basically run it with a couple of pre-tasks and then we let the playbook run the docker roles. It obviously expects proper variables to be configured in order to work. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-23 18:39:45 +02:00
Sébastien Han	d9b3d4a981	Merge pull request #1731 from SirishaGuduru/rgw-civetwebIP-conf Common: changed civetweb line in rgw section(conf)	2017-08-23 15:33:08 +02:00
SirishaGuduru	1359869497	Common: changed civetweb line in rgw section(conf) Resolves issue: Multiple RGW Ceph.conf Issue #1258 In multi-RGW setup, in ceph.conf the RGW sections contain identical bind IP in civetweb line. So this modification fixes that issue and puts the right IP for each RGW. Signed-off-by: SirishaGuduru SGuduru@walmartlabs.com Modified ceph-defaults and ran generate_group_vars_sample.sh group_vars/osds.yml.sample and group_vars/rhcs.yml.sample are not part of the changes. But they got modified when generate_group_vars_sample.sh is run to generate group_vars/ all.yml.sample. Uncommented added variables in ceph-defaults Updated tests by adding value for radosgw_interface Added radosgw_interface to centos cluster tests Modified ceph-rgw role, rebased and ran generate_group_vars_sample.sh In ceph-rgw role removed check_mandatory_vars.yml. Rebased on master. Ran generate_group_vars_sample.sh and then the below files got modified.	2017-08-23 15:03:37 +05:30
Sébastien Han	abdd66619e	ceph-defaults: fix handler for osd container Problem: task "check for a ceph socket in containerized deployment" will be skipped if we are not an OSD. with_items are still evaluated before when conditions so if the task was skipped the dict will be empty and then fail. Adding a "not skipped" condition skips the execution of the task. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1482061 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-22 11:56:05 +02:00
Sébastien Han	19ae8b42e6	resync group_vars files Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-22 11:31:49 +02:00
Andrew Schoen	be78bc1a90	ceph-defaults: fix containerized osd restarts This needs to check `containerized_deployment` because socket_osd_container is undefined otherwise. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-08-04 06:38:38 -05:00
Sébastien Han	7559a2deff	common: automate setting up online repositories for ceph deployments on debian nodes This commit automates the process of setting up online repositories for Red Hat Ceph Storage on Debian nodes. The manual steps are currently described here: https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/installation_guide_for_ubuntu/prerequisites#online_repositories If you are an RHCS customer and run a Debian based system you can now access packages through the Red Hat CDN. To do so, set ceph_rhcs and ceph_rhcs_cdn_install to true. Then set your customer credentials in ceph_rhcs_cdn_debian_repo. Replace customername:customerpasswd with your details. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1434175 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-03 17:15:07 +02:00
Guillaume Abrioux	7a333d05ce	Add handlers for containerized deployment Until now, there were no handlers for containerized deployments. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-08-02 17:12:20 +02:00
Guillaume Abrioux	fc6b6e9859	Move basics facts to `ceph-defaults` Move `fsid`,`monitor_name`,`docker_exec_cmd` and `ceph_release` set_fact to `ceph-defaults` role. It will allow to reuse these facts without having to play `ceph-common` or `ceph-docker-common`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-08-02 17:12:20 +02:00
Guillaume Abrioux	7322526838	Add new role `ceph-defaults` Add a new role `ceph-defaults`. This role aims to handle all defaults vars for `ceph-docker-common` and `ceph-common` and set basic facts (eg. `fsid`) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-08-02 14:46:57 +02:00
Guillaume Abrioux	1d003aa887	merge docker-common and common defaults vars Merge `ceph-docker-common` and `ceph-common` defaults vars in `ceph-defaults` role. Remove redundant variables declaration in `ceph-mon` and `ceph-osd` roles. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-08-02 14:46:51 +02:00

320 Commits (b2273ef4b8084a8ae32124d71f509a1160f4a4e3)