ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	564a662baf	osds: move openstack pools creation in ceph-osd When deploying a large number of OSD nodes it can be an issue because the protection check [1] won't pass since it tries to create pools before all OSDs are active. The idea here is to move openstack pools creation at the end of `ceph-osd` role. [1] `e59258943b/src/mon/OSDMonitor.cc (L5673)` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-24 09:39:38 -07:00
Vishal Kanaujia	ef5f52b1f3	Skip GPT header creation for lvm osd scenario The LVM lvcreate fails if the disk already has a GPT header. We create GPT header regardless of OSD scenario. The fix is to skip header creation for lvm scenario. fixes: https://github.com/ceph/ceph-ansible/issues/2592 Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>	2018-05-23 11:44:09 -07:00
Andrew Schoen	32bac6b491	ceph-validate: move var checks from ceph-osd into this role Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-05-18 17:58:24 +02:00
Andy McCrae	08a2b58d39	Allow os_tuning_params to overwrite fs.aio-max-nr The order of fs.aio-max-nr (which is hard-coded to 1048576) means that if you set fs.aio-max-nr in os_tuning_params it will effectively be ignored for bluestore scenarios. To resolve this we should move the setting of fs.aio-max-nr above the setting of os_tuning_params, in this way the operator can define the value of fs.aio-max-nr to be something other than 1048576 if they want to. Additionally, we can make the sysctl settings happen in 1 task rather than multiple.	2018-05-11 10:49:37 +01:00
Sébastien Han	65ba85aff6	Expose /var/run/ceph Useful for software that does data collection/monitoring like collectd. It can connect to the socket and then retrieve information. Even though the sockets are exposed now, I'm keeping the docker exec to check the socket, this will allow newer versions of ceph-ansible to work with older versions. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1563280 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-20 15:48:32 +02:00
Sébastien Han	641f141c0f	selinux: remove chcon calls We know bindmount with the :z option at the end of the -v command so this will basically run the exact same command as we used to run. So to speak: chcon -Rt svirt_sandbox_file_t /var/lib/ceph Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-19 14:59:37 +02:00
Sébastien Han	d2a2793cb0	refactor the way we copy keys This commit does a couple of things: * use a common.yml file that contains things that can be played on both container and non-container * refactor the ability to copy the admin key to the nodes Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-18 16:46:33 +02:00
Sébastien Han	5bbbce527e	osd: do not do anything if the dev has a partition Regardless if the partition is 'ceph' or something else, we don't want to be as strict as checking for a particular partition. If the drive has a partition, we just don't do anything. This solves the case where the server reboots, disks get a different /dev/sda (node) allocation. In this case, prior to restarting the server /dev/sda was an OSD, but now it's /dev/sdb and the other way around. In such a scenario, we will try to prepare the OSD and create a new partition, so let's not mess around with devices that have partitions. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1498303 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-13 19:11:15 +02:00
vasishta p shastry	e1a1f81b6f	osd: to support copy_admin_key	2018-04-11 14:21:15 +02:00
Sébastien Han	e3275c1ca1	osd: add fs.aio-max-nr tuning The number of osds per nodes is limited by aio-max-nr, default is low, so we need to increase it. Full story: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-August/020408.html Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1553407 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-15 14:06:26 +01:00
Sébastien Han	f432819c1e	osd: apply sysctl right away Without sysctl_set: yes the sysctl tuning will only get applied to sysctl.conf but not on the fly. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-15 14:06:26 +01:00
Sébastien Han	0f8a4251ba	move system tuning to osd role The changes from these tasks only apply to osd nodes so there is no reason to have them in ceph-common. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-15 14:06:26 +01:00
Sébastien Han	3261ab23b8	osd: remove old crush_location implementation This was causing a lot of pain with the handlers. Also the implementation was not ideal since we were assembling files. Everything can now be done with the ceph_crush module so let's remove that. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-06 15:24:31 +00:00
Caleb Boylan	0be60456ce	osd: Add support for multipath disks Multipath disks have partitions with a different format than what ceph-ansible currently supports, this update makes ceph-ansible aware of that format so multipath disks can be used as OSDs Signed-off-by: Caleb Boylan <caleb.boylan@ormuco.com>	2018-02-09 18:06:25 +01:00
Guillaume Abrioux	deaf273b25	syntax: change local_action syntax Use a nicer syntax for `local_action` tasks. We used to have oneliner like this: ``` local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }} ``` The usual syntax: ``` local_action: module: wait_for port: 22 host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}" state: started delay: 10 timeout: 500 ``` is nicer and keeps the whole playbook consistent. This also fixes a potential issue with missing quotation marks: ``` Traceback (most recent call last): File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module> main() File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin) File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command File "/usr/lib64/python2.7/shlex.py", line 279, in split return list(lex) File "/usr/lib64/python2.7/shlex.py", line 269, in next token = self.get_token() File "/usr/lib64/python2.7/shlex.py", line 96, in get_token raw = self.read_token() File "/usr/lib64/python2.7/shlex.py", line 172, in read_token raise ValueError, "No closing quotation" ValueError: No closing quotation ``` writing `local_action: shell echo {{ fsid }} | tee {{ fetch_directory }}/ceph_cluster_uuid.conf` can cause trouble because it complains about missing quotes, this fix solves this issue. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-31 10:45:34 +01:00
Sébastien Han	5132cc3de4	Do not search osd ids if ceph-volume Description of problem: The 'get osd id' task goes through all the 10 times (and its respective timeouts) to make sure that the number of OSDs in the osd directory match the number of devices. This happens always, regardless if the setup and deployment is correct. Version-Release number of selected component (if applicable): Surely the latest. But any ceph-ansible version that contains ceph-volume support is affected. How reproducible: 100% Steps to Reproduce: 1. Use ceph-volume (LVM) to deploy OSDs 2. Avoid using anything in the 'devices' section 3. Deploy the cluster Actual results: TASK [ceph-osd : get osd id _uses_shell=True, _raw_params=ls /var/lib/ceph/osd/ | sed 's/.*-//'] *** task path: /Users/alfredo/python/upstream/ceph/src/ceph-volume/ceph_volume/tests/functional/lvm/.tox/xenial-filestore-dmcrypt/tmp/ceph-ansible/roles/ceph-osd/tasks/start_osds.yml:6 FAILED - RETRYING: get osd id (10 retries left). FAILED - RETRYING: get osd id (9 retries left). FAILED - RETRYING: get osd id (8 retries left). FAILED - RETRYING: get osd id (7 retries left). FAILED - RETRYING: get osd id (6 retries left). FAILED - RETRYING: get osd id (5 retries left). FAILED - RETRYING: get osd id (4 retries left). FAILED - RETRYING: get osd id (3 retries left). FAILED - RETRYING: get osd id (2 retries left). FAILED - RETRYING: get osd id (1 retries left). ok: [osd0] => { "attempts": 10, "changed": false, "cmd": "ls /var/lib/ceph/osd/ | sed 's/.*-//'", "delta": "0:00:00.002717", "end": "2018-01-21 18:10:31.237933", "failed": true, "failed_when_result": false, "rc": 0, "start": "2018-01-21 18:10:31.235216" } STDOUT: 0 1 2 Expected results: There aren't any (or just a few) timeouts while the OSDs are found Additional info: This is happening because the check is mapping the number of "devices" defined for ceph-disk (in this case it would be 0) to match the number of OSDs found. Basically this line: until: osd_id.stdout_lines|length == devices|unique|length Means in this 2 OSD case it is trying to ensure the following incorrect condition: until: 2 == 0 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1537103	2018-01-30 14:44:38 +01:00
Andrew Schoen	79473badfe	ceph-osd: adds dmcrypt to the lvm scenario Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-01-24 14:10:08 +01:00
Andrew Schoen	6cbb56a3b6	ceph-osd: adds the crush_device_class param to the lvm scenario Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-01-17 13:49:29 +01:00
Sébastien Han	6db4aea453	osd: skip devices marked as '/dev/dead' On a non-collocated scenario, if a drive is faulty we can't really remove it from the list of 'devices' without messing up or having to re-arrange the order of the 'dedicated_devices'. We want to keep this device list ordered. This will prevent the activation failing on a device that we know is failing but we can't remove it yet to not mess up the dedicated_devices mapping with devices. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-01-11 17:34:32 +01:00
Guillaume Abrioux	70401f955b	container: trigger handlers on systemd file change When a systemd unit file is changed we should trigger handlers to restart the services. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-10 16:46:42 +01:00
Guillaume Abrioux	895949d6c4	osd: fix check gpt the gpt label creation doesn't work even with parted module. This commit fixes the gpt label creation by using parted command instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-12-20 17:42:45 +01:00
Konstantin Shalygin	d7dadc3e7b	ceph-osd: respect nvme partitions when device is a disk.	2017-12-12 09:03:18 +01:00
Andrew Schoen	788c3f351a	ceph-osd: adds osd_objectstore to the name when using the ceph_volume module This allows for easier debugging if verbosity is not set high enough. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-12-11 09:58:06 -06:00
Andrew Schoen	5e3d8dbf63	ceph-osd: use the cluster param with the ceph_volume module Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-12-11 09:58:06 -06:00
Andrew Schoen	423166f671	ceph-osd: use the new ceph_volume module for the lvm scenario Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-12-11 09:58:06 -06:00
Andy McCrae	4f1e854c79	Use parted module instead of command	2017-12-11 17:33:40 +10:00
Guillaume Abrioux	b449b16edd	Merge pull request #2215 from squidboylan/support_loopback_devices Add support for using loopback devices as OSDs	2017-11-28 14:04:47 +01:00
Caleb Boylan	8f02bb007f	Add support for using loopback devices as OSDs This is particularly useful in CI environments where you don't have the option of adding extra devices or volumes to the host. It is also a simple change to support loopback devices	2017-11-27 16:02:36 -08:00
Guillaume Abrioux	1cba626484	osd: remove leftover and fix a typo This task was originally needed to fix a docker installation issue (see: #1030). This has been fixed, therefore it can be removed. Fixes: #2199 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-21 11:11:34 +01:00
Guillaume Abrioux	efe06be10f	osd: ensure a gpt label is set on device ceph-disk prepare will fail on jewel if a GPT label is not present on device. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-17 17:32:23 +01:00
Sébastien Han	932345ab2a	osd: remove leftover from osd partition We used to support osds that are a partition. This is long gone so removing this task. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-11-16 14:58:40 +01:00
Sébastien Han	b1c1322357	osd: remove failed_when on activation There is no need to continue if the activation fails. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-11-16 14:57:49 +01:00
Sébastien Han	80d3a242d0	osd: fix bad activation for dmcrypt We were activating dmcrypt devices with the wrong command. Basically the first task execute the wrong activate command. The task fails but continues because of the 'failed_when: false'. Then the right activation sequence is being done by the next task. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-11-16 14:55:08 +01:00
Andrew Schoen	3c604f1115	lvm: support --data as a raw device or partition in ceph-volume Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-11-15 09:36:17 -06:00
Andrew Schoen	04f02910a9	lvm: ensure the data_vg exists before using it Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-11-15 09:36:17 -06:00
Guillaume Abrioux	aa0b1ed118	tests: remove OSD_FORCE_ZAP variable from tests according to ceph/ceph-container#840, this variable is no longer needed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-14 17:55:01 +01:00
Guillaume Abrioux	0369bd59e2	Merge pull request #2146 from mslovy/wip-fix-crush-location osd: fix crush location for non-containerized deployment	2017-11-13 12:23:44 +01:00
Guillaume Abrioux	c06faf2deb	Merge pull request #2154 from ceph/fix_auto_discover osd: avoid using non desired loop device in autodiscovery	2017-11-10 01:19:20 +01:00
Guillaume Abrioux	591d77220e	osd: always run disk_list test there is no need to have a condition on this task, this test should be always run since the result will be interpreted later. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-09 11:51:16 +01:00
Guillaume Abrioux	43975a7332	osd: avoid using non desired loop device in autodiscovery This will prevent ceph-ansible from using a loop device while it shouldn't in auto_discovery mode. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-09 10:26:24 +01:00
Guillaume Abrioux	d5dfc63c89	osd: fix automatic prepare when auto_discover Use `devices` variable instead of `ansible_devices`, otherwise it means we are not using the devices which have been 'auto discovered' Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-08 10:20:44 +01:00
yaoning	d82a09dddd	fix crush location for non-containerized deployment crush location only set for containerized deployment Signed-off-by: yaoning <yaoning@unitedstack.com>	2017-11-08 12:05:10 +11:00
Sébastien Han	0930f14915	osd: do not use dm when osd_auto_discovery The current code will also return lvm devices such as /dev/dm-2, this kind of device type is not supported by ceph-disk at the moment. Now we just ignore them. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-11-08 11:33:10 +11:00
Sébastien Han	d4ed9a2064	osd: enhance backward compatibility During the initial implementation of this 'old' thing we were falling into this issue without noticing https://github.com/moby/moby/issues/30341 and were blindly using --rm, now this is fixed the prepare container disappears and thus activation fails. I'm fixing this for old jewel images. Also this fixes the machine reboot case where the docker logs are purged. In the old scenario, we now store the log locally in the same directory as the ceph-osd-run.sh script. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-11-03 11:15:23 +01:00
Sébastien Han	faccd0acf0	Merge pull request #2100 from ceph/lvm-bluestore ceph-volume lvm bluestore support	2017-10-27 17:36:16 +02:00
Alfredo Deza	517a2b3feb	ceph-osd skip lvm creation if they are already in use Signed-off-by: Alfredo Deza <adeza@redhat.com>	2017-10-27 11:33:54 -04:00
Sébastien Han	5a10b048b0	Merge pull request #2105 from major/really-fix-always-run Really fix always run	2017-10-27 09:33:47 +02:00
Sébastien Han	07e2a783f8	Merge pull request #2084 from ceph/backward-osd-2.4 osd: bring backward compatibility with old Jewel images	2017-10-25 17:33:49 +02:00
Major Hayden	f73232caa4	Use check_mode instead of always_run This patch changes the `always_run: yes` task option to `check_mode: no` to avoid Ansible warnings.	2017-10-25 09:53:34 -05:00
Major Hayden	c2b5118c1b	Revert "Avoid deprecated always_run" This reverts commit `620fb37dd4`.	2017-10-25 09:48:09 -05:00
Alfredo Deza	d3b427e169	ceph-osd lvm scenarios are no longer limited to filestore Signed-off-by: Alfredo Deza <adeza@redhat.com>	2017-10-25 08:23:45 -04:00
Alfredo Deza	df05e63c10	ceph-osd use --cluster in ceph-volume calls Signed-off-by: Alfredo Deza <adeza@redhat.com>	2017-10-25 08:23:45 -04:00
Alfredo Deza	628d98a92c	ceph-osd add the CEPH_VOLUME_DEBUG env var to all ceph-volume commands Signed-off-by: Alfredo Deza <adeza@redhat.com>	2017-10-25 06:50:22 -04:00
Alfredo Deza	bbc3672253	ceph-osd: lvm support for bluestore Signed-off-by: Alfredo Deza <adeza@redhat.com>	2017-10-25 06:46:39 -04:00
John Fulton	7a7ddab6c2	Require osd_scenario parameter to be provided in containerized deploy Fixes: #2095	2017-10-23 15:16:03 +00:00
Sébastien Han	968ef04324	osd: bring backward compatibility with old Jewel images There was a huge resync from luminous to jewel in ceph-docker: https://github.com/ceph/ceph-docker/pull/797 This change brought a new handy function to discover partitions tied to an OSD. This function doesn't exist in the old image so the ceph-osd-run.sh script breaks when trying to deploy Jewel OSD with that old Jewel image version. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-20 16:26:41 +02:00
Sébastien Han	a53aa9e8b4	ci: new osd scenarios This commit adds new osd scenarios, it aims to simplify the CI setup and brings a better coverage on the OSD scenarios. We decided to differentiate between filestore and bluestore, thinking ahead when filestore won't be supported anymore. So we now have two classes of tests: * Filestore * Bluestore In each of those classes we have container and non-container. Then for each we test the following: * collocated * collocated dmcrypt * non-collocated * non-collocated dmcrypt * auto discovery collocated * auto discovery collocated dmcrypt This gives us a nice coverage and also reduces the footprint on the CI. We are now up to 4 scenarios, each containing 6 OSD VMs. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-18 09:26:06 +02:00
Major Hayden	c01851325e	Remove jinja2 delimiters from `when` keys This patch changes the `when:` keys so that they have no jinja2 delimiters. This avoids Ansible warnings which could turn into errors in a future Ansible release.	2017-10-12 11:27:42 -05:00
Major Hayden	620fb37dd4	Avoid deprecated always_run The `always_run` key is deprecated and being removed in Ansible 2.4. Using it causes a warning to be displayed: [DEPRECATION WARNING]: always_run is deprecated. This patch changes all instances of `always_run` to use the `always` tag, which causes the task to run each time the playbook runs.	2017-10-12 08:29:44 -05:00
Sébastien Han	c693e95cbf	purge-docker: rework device detection we don't need "devices" and other device variable anymore, the playbook detects that for us. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-07 03:39:04 +02:00
Guillaume Abrioux	6b027557e6	osd: fix `set_fact build dedicated_devices` Use an intermediate variable to build the final `dedicated_devices` list to avoid duplicate entry in that array. (We need a 1:1 relation between `dedicated_devices` and `devices` since we are using a `with_together` later. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-10-06 15:00:32 +02:00
Sébastien Han	29888649e5	osd: do not do unique on dedicated_devices This is needed later, if we do unique, only the first OSD will get a journal. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-05 18:20:18 +02:00
Michel Rode	b462b68e65	Fixing path to osd_fragment.yml	2017-10-05 14:42:10 +02:00
Guillaume Abrioux	70e2787fe2	docker: fix keyrings copied on all nodes All keyring are getting copied to all nodes. This commit fixes a leftover from a previous code refactor. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1498583 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-10-05 09:23:22 +02:00
Guillaume Abrioux	784cc73da0	set docker_exec_cmd fact early in each role This is to ensure `docker_exec_cmd` fact is set with the correct value in case of daemons collocation Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-10-04 11:31:09 +02:00
Sébastien Han	3bd341f6c0	osd: container use id instead of dev name Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1494127 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-03 14:44:00 +02:00
Sébastien Han	ba42894516	osd: do not copy admin key on collocated scenario ceph-disk used to have a bug requiring the admin key to store the encrypted key in the mon kv store. This was reported in: http://tracker.ceph.com/issues/17849 Fixed and backported here: https://github.com/ceph/ceph/pull/11996 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-03 14:44:00 +02:00
Guillaume Abrioux	466f6f35b7	Use systemd module instead of service. Using systemd module allows us to do in one task what we did in three tasks: - enable unit file, - issue a `daemon-reload`, - start the service Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-29 14:54:00 +02:00
Guillaume Abrioux	913ad53709	docker: add condition to run selinux tasks only on rhel os family This fixes the error : ``` The conditional check 'sestatus.stdout != 'Disabled'' failed. ``` that occurs when running on non rhel based system since the `sestatus` fact is registered only on rhel based distribution. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-29 02:35:07 +02:00
Sébastien Han	cb05172605	docker: we don't need to copy the ceph.conf on all the nodes We generate the ceph.conf on all the nodes through the ceph-docker-common so there is no need to push it to the Ansible file. Also this is breaking the ceph.conf template generation since we only generate sections based on the host the ansible task is running on. For example, what's typically happening, we bootstrap the monitor, we get a ceph.conf generated for a mon only, we go on an osd, we generate the ceph.conf with osd section (done by ceph-docker-common) but this gets overwritten by the copy_config task of the ceph-osd role. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-20 16:33:29 +02:00
Sébastien Han	d100b4e596	name includes and set_fact for clarity When Ansible is not run with verbose options it's difficult to see which include and/or set_fact does what. So adding a name for each clarifies. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-18 23:39:46 +02:00
Sébastien Han	66d41f342d	Merge pull request #1889 from ceph/client-containers client: ability to create keys and pool with no ceph binaries	2017-09-18 17:27:32 +02:00
Sébastien Han	660893e70e	osd: add meaningful message for journal_size Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-13 23:49:15 -06:00
Sébastien Han	ef8d37dd0d	Merge pull request #1800 from ceph/wip-osd-start-fix ceph-osd: Fix osd start sequence	2017-09-13 17:20:10 -06:00
Sébastien Han	f67b47d056	Merge pull request #1882 from ceph/multi-journal osd: drop support for device partition	2017-09-13 11:43:48 -06:00
Sébastien Han	ac62437609	Merge pull request #1883 from ceph/quick_refact osd: refact include of `activate_osds.yml`	2017-09-12 22:11:31 -06:00
Sébastien Han	fdf924401f	osd: drop support for device partition We have been struggling with this, it's still broken and breaking other things too now. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1490283 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-12 17:42:07 -06:00
Guillaume Abrioux	49ad8528e5	osd: refact include of `activate_osds.yml` remove duplicate code. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-12 16:53:11 -06:00
Guillaume Abrioux	0f506f4f0a	Docker: split the task 'copy ceph configs&keys' All keys are copied to all nodes. This commit split that task in each roles so keys are copied to their respective nodes. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1488999 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-11 21:14:13 +02:00
Sébastien Han	3753e6cfa7	ceph-osd: fix autodetection activation Prior to this patch this activation sequence for autodetection was always skipped because we were asking to activate on device without partitions, which doesn't make sense. We also fix the way we lookup for a device, since the data partition is always numbered 1, we take the min element of the dict. Closes: https://github.com/ceph/ceph-ansible/issues/1782 Signed-off-by: Sébastien Han <seb@redhat.com> Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-07 17:47:37 +02:00
Sébastien Han	1dd976d28e	ceph-osd: do not re-prepare if already prepared I forgot to re-add the partition check while refactoring the osd Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-05 09:51:57 +02:00
Andrew Schoen	fcba9d17f0	ceph-osd: add support for --journal vg/lv for lvm osds This also updates the tests Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-08-30 15:55:16 -05:00
Sébastien Han	e0a264c7e9	osd: allow multi dedicated journals for containers Fix: https://bugzilla.redhat.com/show_bug.cgi?id=1475820 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-30 12:34:06 +02:00
Boris Ranto	5f1b8fcd75	ceph-osd: Fix osd start sequence The script can fail to get the osd id because the osds are activated by udev and it can take a while for them to activate. This commit fixes that by trying to get all the osds per node in a loop. This commit also makes the osd services enabled so that they are available after reboot. Signed-off-by: Boris Ranto <branto@redhat.com>	2017-08-25 13:40:04 +02:00
Andrew Schoen	758c31b1cd	ceph-osd: ceph-volume requires --data to be in vg/lv format Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-08-23 13:43:31 -05:00
Andrew Schoen	594d5e017a	ceph-osd: restructure lvm_volumes variable for more flexibility The lvm_volumes variable is now a list of dictionaries that represent each OSD you'd like to deploy using ceph-volume. Each dictionary must have the following keys: data, journal and data_vg. Each dictionary also can optionally provide a journal_vg key. The 'data' key represents the lv name used for the OSD and the 'data_vg' key is the vg name that the given lv resides on. The 'journal' key is either an lv, device or partition. The 'journal_vg' key is optional and must be the vg name for the journal lv if given. This key is mainly used for purging of the journal lv if purge-cluster.yml is run. For example: lvm_volumes: - data: data_lv1 journal: journal_lv1 data_vg: vg1 journal_vg: vg2 - data: data_lv2 journal: /dev/sdc data_vg: vg1 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-08-23 10:14:14 -05:00
Sébastien Han	07821d9bb1	Merge pull request #1786 from ceph/re-arrange-skipped mon, osd: fix skipped condition	2017-08-22 19:44:48 +02:00
Sébastien Han	a359fc35b4	mon, osd: fix skipped condition To be properly evaluated the "skipped" conditions must always have the first place on the list of condition, otherwise the other conditions are evaluated before and make the task fail. Closes: https://github.com/ceph/ceph-ansible/issues/1733 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-22 18:34:51 +02:00
Andy McCrae	4671b9e74e	Allow ceph service systemd overrides to be specified ceph services can fail to start under certain circumstances (for example, when running in a container) because the default systemd service configuration causes namespace issues. To work around this we can override the system service settings by placing an overrides file in the ceph-<service>@.service.d directory. This can be generic so as to allow any potential changes required to the ceph-<service> service files. The overrides file is only setup when the "ceph_<service>_systemd_overrides" config_template override variable is specified. The available service systemd override files are as follows: ceph_mds_systemd_overrides ceph_mgr_systemd_overrides ceph_mon_systemd_overrides ceph_osd_systemd_overrides ceph_rbd_mirror_systemd_overrides ceph_rgw_systemd_overrides	2017-08-16 17:57:06 +01:00
Andrew Schoen	1d5f876729	ceph-osd: devices is not required when osd_scenario == lvm Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-08-04 06:38:37 -05:00
Andrew Schoen	e597628be9	lvm: update scenario for new osd_scenario variable Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-08-04 06:38:36 -05:00
Andrew Schoen	3b5a06bb3c	lvm-osds: reorder mandatory vars checks Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-08-04 06:13:10 -05:00
Andrew Schoen	96c92a154e	lvm-osds: check for osd_objectstore == 'filestore' ceph-volume currently only has support for filestore, not bluestore Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-08-04 06:13:10 -05:00
Andrew Schoen	61d63f8468	lvm-osds: make task name and files consistent Removes capitalization and newlines to keep these files consistent in style with the existing tasks. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-08-04 06:13:10 -05:00
Andrew Schoen	63b7e3d36c	lvm_osds: ensure osd daemons are started Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-08-04 06:13:09 -05:00
Andrew Schoen	b93794bed4	adds a new 'lvm_osds' osd scenario This scenario will create OSDs using ceph-volume and is only available in ceph releases greater than Luminous. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-08-04 06:13:09 -05:00
Sébastien Han	30991b1c0a	osd: simplify scenarios There are only two main scenarios now: * collocated: everything remains on the same device: - data, db, wal for bluestore - data and journal for filestore * non-collocated: dedicated device for some of the components Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-03 10:20:39 +02:00
Sébastien Han	63cbcc8260	osd: fail check mount partition if not skipped We forgot to handle the case where "check if any of the raw partitions are mounted" task gets skipped. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-07-27 11:39:19 +02:00
Sébastien Han	8ac7d2e4c9	osd: do not enable osd@id unit file ceph-disk is responsible for enabling the unit file if needed. Actually since https://github.com/ceph/ceph/pull/12241 it seems that it's not even needed. In the event of a restart, udev rules will be triggered and they will ceph-disk activate the device too so the 'enabled' is not needed. Closes: https://github.com/ceph/ceph-ansible/issues/1142 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-07-26 17:17:57 +02:00
Sébastien Han	33c1f0cb03	osd: refactor osd scenarios We have multiple issues with ceph-disk's cli with bluestore and Ceph releases. This is mainly due to cli changes with Luminous. Luminous introduced the --bluestore and --filestore options, which do not exist on releases older than Luminous. The default store being bluestore on Luminous, simply checking for the store is not enough so we have to build a specific command line for ceph-disk depending on the Ceph version we are running and the desired osd_store. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-07-24 13:48:08 +02:00
yanyx	7e56b5c531	ceph-osd: when ceph release >= luminous add --filestore config	2017-07-14 09:53:59 +08:00
Guillaume Abrioux	94c3756167	Tests: Add bluestore scenarios Since we started testing against Luminous, we need to add more scenarios testing. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-07-12 15:02:32 +02:00
Guillaume Abrioux	a517ab5583	Osd: Force filestore and bluestore usage In Luminous, ceph-disk defaults to bluestore so all our scenarios are using bluestore, we need to force testing both. Signed-off-by: Sébastien Han <seb@redhat.com> Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-07-12 11:30:30 +02:00
Douglas Fuller	e5d06a449f	osd: validate devices variable input Fail with a sane message if the devices or raw_journal_devices variables are strings instead of lists during manual device assignment. Signed-off-by: Douglas Fuller <dfuller@redhat.com>	2017-07-07 13:37:29 +00:00
Sébastien Han	d2320e412e	osd: docker, refactor ceph-osd-run.sh.j2 Easier to read and enhance. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-07-06 15:49:14 +02:00
Sébastien Han	7d657ac643	osd: ability to set db and wal to bluestore This commit refactors how we deploy bluestore. We have existing scenarios that we don't want to change too much. This commit eases the user experience by now changing the way you use scenarios. Bluestore is just a different interface to store objects but the scenarios more or less remain the same. If you set osd_objectstore == 'bluestore' along with journal_collocation: true, you will get an OSD running bluestore with DB and WAL partitions on the same device. If you set osd_objectstore == 'bluestore' along with raw_multi_journal: true, you will get an OSD running bluestore with a dedicated drive for the rocksdb DB, then the remaining drives (used with 'devices') will have WAL and DATA collocated. If you set osd_objectstore == 'bluestore' along with raw_multi_journal: true and declare bluestore_wal_devices you will get an OSD running bluestore with a dedicated drive for rocksdb db, a dedicated drive partition for rocksdb WAL and a dedicated drive for DATA. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-07-04 19:07:16 +02:00
Sébastien Han	fc0e54c59e	osd: remove redundant options to enable bluestore There is no need for 2 variables to enable bluestore, prior to this patch one had to do the following to activate bluestore: osd_objectstore: bluestore bluestore: true Now you just need to set `osd_objectstore: bluestore`. Fixes: https://github.com/ceph/ceph-ansible/issues/1475 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-07-04 18:22:03 +02:00
Douglas Fuller	6915dfcf81	ansible: fail if user selects OSD auto detection and raw devices are mounted Signed-off-by: Douglas Fuller <dfuller@redhat.com>	2017-06-29 17:02:17 +00:00
Guillaume Abrioux	3dfeffab43	Fix followup on refact code (1469) In addition to `7bb04a5`, these lines are no longer needed and can even cause playbook failures. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-06-26 15:53:41 +02:00
Sébastien Han	7bb04a5970	docker: refactor followup Followup on https://github.com/ceph/ceph-ansible/pull/1469 where we merged most of the container code from roles/ceph-/task/docker/.yml into roles/ceph-docker-common/tasks/ It seems that we forgot to remove the original files. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-06-26 13:21:36 +02:00
Guillaume Abrioux	ddfe019342	Refact code `ceph-docker-common`: At the moment there is a lot of duplicated tasks in each `./roles/ceph-<role>/tasks/docker/main.yml` that could be refactored in `./roles/ceph-docker-common/tasks/main.yml`. `_containerized_deployment` variables: All `_containerized_deployment` have been refactored to a single variable `containerized_deployment` duplicate `cephx` variables in `group_vars/* have been removed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-05-24 15:55:41 +02:00
Austin Workman	22033bd1bf	Fixing partition detection regex for FusionIO devices.	2017-05-23 14:39:39 -05:00
Sébastien Han	6bdadc4363	Revert "docker: Retry OSD disk prepare to workaround race condition"	2017-05-18 16:03:16 +02:00
Andrew Schoen	58618aa778	Merge pull request #1531 from ceph/wip-1495 docker: Retry OSD disk prepare to workaround race condition	2017-05-17 09:36:07 -05:00
Guillaume Abrioux	1e7010de7f	Docker: rm container before retry of ceph osd prepare In addition to `196fa7e` this commit check if a container has been already launched and delete it before retrying the ceph osd prepare process. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-05-17 10:10:49 +02:00
Pascal Watteel	e4ef8bb87f	added support for Sandisk FusionIO devices Signed-off-by: Pascal Watteel <pascal.watteel@emc.com>	2017-05-16 12:00:21 +02:00
David Galloway	196fa7ef39	docker: Retry osd disk prep to workaround race condition Fixes: https://github.com/ceph/ceph-ansible/issues/1495 Signed-off-by: David Galloway <dgallowa@redhat.com>	2017-05-11 16:19:11 -04:00
Alberto Murillo	5218df5ef3	Add clearlinux to supported platforms Signed-off-by: Alberto Murillo Silva <alberto.murillo.silva@intel.com>	2017-04-24 09:34:23 -05:00
Gregory Meno	eb0c83db5f	remove osd directory scenario Proof-of-concept clusters or actual production clusters will never want to use this. We also do not test it anywhere for this same reason. Signed-off-by: Gregory Meno <gmeno@redhat.com>	2017-04-21 15:50:32 -07:00
Matthew Vernon	bc846b7da6	Only assemble {{ cluster }}.conf and osd.conf Ansible's assemble module by default will put all files in the src directory together into dest. We only want to put {{ cluster }}.conf and osd.conf together, not anything that might have found its way into /etc/ceph/ceph.d (e.g. files left by the sysadmin taking backups before an ansible run). So specify a regexp that matches only those two files. Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>	2017-04-11 13:27:19 +01:00
Daniel Horak	ce06dc1460	osd autodiscovery mode: fix holders detection Small fix for (probably copy&paste) issue from `42ffe6301`. Signed-off-by: Daniel Horak <dahorak@redhat.com>	2017-04-06 09:11:32 +02:00
Sébastien Han	42ffe63017	osd: autodiscovery mode, use holders to detect device As reported in https://github.com/ceph/ceph-ansible/issues/1403 when devices are held by lvm and `osd_auto_discovery` is set to true, it's not enough to check for a partition count = 0 since Ansible does not report. This patch also looks for 'holders' which in a case of lvm corresponds to the name of the pv. Now we also look for holders = 0. Fixes: #1403 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-04-04 10:37:14 +02:00
Sébastien Han	c37aaa41f4	playbook: homogenize the way list osd ids Problem: too many different commands to do the same thing. The 'cut' command on infrastructure-playbooks/purge-cluster.yml was also wrong. This sed command from osixia in ceph-docker https://github.com/ceph/ceph-docker/pull/580/ addresses all the scenarios. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-03-30 11:51:38 +02:00
Guillaume Abrioux	589d6812ca	ceph-docker: fix bootstrap directories permissions Make bootstrap directories permissions work for both RedHat and Debian os families. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Fix: #1338	2017-03-22 11:18:56 +01:00
Sébastien Han	8320c14191	Merge pull request #1317 from ibotty/harmonize-docker-names harmonize docker names	2017-03-14 18:20:20 +01:00
Guillaume Abrioux	66b59ea9c6	docker: Fix #1303 Install package from official repos rather than pip when using RHEL. This commit fix https://bugzilla.redhat.com/show_bug.cgi?id=1420855 Also this commit Refact all `roles/ceph-*/tasks/docker/pre_requisite.yml` to avoid a lot of duplicated code. Fix: #1303 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-03-03 10:49:13 +01:00
Tobias Florek	931027e6f7	harmonize docker names Created containers now are named more or less in the form of <ansible role>-<ansible_hostname>	2017-02-23 09:15:05 +01:00
Sébastien Han	b91d227b99	docker: make ceph docker osd script path Since distro will not allow /usr/share to be writable (e.g: atomic) so we let the operator decide where to put that script. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-02-21 15:56:09 -05:00
Sébastien Han	73cf0378c2	docker: osd, do not use privileged container anymore Oh yeah! This patch adds more fine grained control on how we run the activation osd container. We now use --device to give read, write and mknod access to a specific device to be consumed by Ceph. We also use SYS_ADMIN cap to allow mount operations, ceph-disk needs to temporarily mount the osd data directory during the activation sequence. This patch also enables the support of dedicated journal devices when deploying ceph-docker with ceph-ansible. Depends on https://github.com/ceph/ceph-docker/pull/478 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-02-21 15:54:36 -05:00
Sébastien Han	dd548c6034	docker: osd, do not skip on failure If the systemd unit file can not be generated we should fail, same for systemd enable and reload. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-02-21 15:54:36 -05:00
Sébastien Han	c2f1dca823	docker: use a better method to pull images We changed the way we declare image. Prior to this patch we must have a "user/image:tag" format, which is incompatible with non docker-hub registry where you usually don't have a "user". On the docker hub a "user" is also identified as a namespace, so for Ceph the user was "ceph". Variables have been simplified with only: * ceph_docker_image * ceph_docker_image_tag 1. For docker hub images: ceph_docker_name: "ceph/daemon" will give you the 'daemon' image of the 'ceph' user. 2. For non docker hub images: ceph_docker_name: "daemon" will simply give you the "daemon" image. Infrastructure playbooks have been modified as well. The file group_vars/all.docker.yml.sample has been removed as well. It is hard to maintain since we have to generate it manually. If you want to configure specific variables for a specific daemon simply edit group_vars/$DAEMON.yml Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1420207 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-02-09 17:57:18 +01:00
Sébastien Han	55abf69481	Merge pull request #1267 from ceph/container-systemd Container systemd	2017-02-03 14:02:53 +01:00
Sébastien Han	40709c8336	docker: use systemd to manage container Since we now only support systemd as an init system we can finally treat containers as processes using systemd and this for all the distros. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-02-01 17:12:46 +01:00
Sébastien Han	5578b9bc7b	osd: clarify osd scenario prepare sequence we now use the name of the scenario in the prepare task. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-02-01 13:59:35 +01:00
Guillaume Abrioux	76ddcbc271	Remove support of releases prior to Jewel. According to #1216, we need to simplify the code by removing the support of anything before Jewel. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-01-31 11:00:54 +01:00
Sébastien Han	6f53774ee9	osd: make sure osd directory exists Sometimes users for testing, tend to delete the whole /var/lib/ceph and then run ansible again, OSD will never come up if we do not create their directory. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-01-30 14:31:56 +01:00
Andrew Schoen	0c55a35963	ceph-osd: use ceph_docker_registry when preparing OSDs Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-01-16 11:39:13 -06:00
Andrew Schoen	9449dbf083	use ceph_docker_registry in all the roles instead of docker.io This allows for ceph-ansible to use other docker registries. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-01-16 10:42:42 -06:00
Andrew Schoen	99d66e09d9	Merge pull request #1153 from ceph/cluster-name-test test: add cluster name support test scenario	2016-12-16 13:10:52 -06:00
Sébastien Han	2d8ac4a586	docker: only use systemd to manage containers Prior to this patch we had several ways to run containers, we could use ansible's docker module on some distro and on containers distros we were using systemd. We strongly believe treating containers as services with systemd is the right approach so this patch generalizes to all the distros. These days most of the distros are running systemd so it's a fair assumption. Signed-off-by: Sébastien Han <seb@redhat.com>	2016-12-16 19:37:05 +01:00
Sébastien Han	ce7431a227	docker: add support for cluster name We need to honour the cluster name that was chosen by ceph-ansible and pass it to ceph-docker. Signed-off-by: Sébastien Han <seb@redhat.com>	2016-12-16 14:31:21 +01:00
Sébastien Han	faabfdcefe	Merge pull request #1178 from zhsj/dev-partition Add prepare osd with partition devices in raw_multi_journal	2016-12-15 22:50:23 +01:00
Shengjing Zhu	a1b00e96db	enable prepare osd with partition devices in raw_multi_journal Address #895 Signed-off-by: Shengjing Zhu <zsj950618@gmail.com>	2016-12-15 22:03:38 +08:00
Sébastien Han	81baa6bb73	osd: docker change required variables for check when running a containerized deployment, some variables are not applicable thus should not be checked. Signed-off-by: Sébastien Han <seb@redhat.com>	2016-12-15 14:50:29 +01:00
Sébastien Han	1de8176bf4	common: move mandatory variables to their respective roles Signed-off-by: Sébastien Han <seb@redhat.com>	2016-12-09 14:45:05 +01:00
Andrew Schoen	bbbd8ff148	ceph-osd: no need to use playbook_dir when fetching configs for docker This causes a bug when fetch_directory is not relative to the playbook directory. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2016-12-01 10:28:54 -06:00
Sébastien Han	153837c195	Merge pull request #1125 from guits/master Use 'package' module instead of yum, apt and dnf	2016-11-30 15:50:23 +01:00
Guillaume Abrioux	07b953f420	Refact temporary vars in ceph-common defaults. These variables were defined here to be sure that `roles/ceph-common/tasks/checks/check_mandatory_vars.yml` has all variables defined.	2016-11-30 14:36:56 +01:00
Guillaume Abrioux	76220ed719	Use 'package' module instead of yum, apt and dnf Refactor the code using 'package' module Fix Issue #520 (However it doesn't cover all cases because some cases are not refactorable. Ex: because of diverging packages name between distribution)	2016-11-29 17:29:11 +01:00
Daniel Marks	ba0f16f485	Better --check compatibility for ceph-osd role Carefully chosen "always_run: true" parameters for read-only tasks that register variables. This enables --check runs (at least on deployed clusters).	2016-11-27 15:00:10 +01:00

433 Commits (dc797971cec798a47ab0a7b9ad3cde209c369865)