ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Sébastien Han	65ba85aff6	Expose /var/run/ceph Useful for softwares that do data collection/monitoring like collectd. They can connect to the socket and then retrieve information. Even though the sockets are exposed now, I'm keeping the docker exec to check the socket, this will allow newer version of ceph-ansible to work with older versions. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1563280 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-20 15:48:32 +02:00
Sébastien Han	641f141c0f	selinux: remove chcon calls We know bindmount with the :z option at the end of the -v command so this will basically run the exact same command as we used to run. So to speak: chcon -Rt svirt_sandbox_file_t /var/lib/ceph Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-19 14:59:37 +02:00
Sébastien Han	d2a2793cb0	refactor the way we copy keys This commit does a couple of things: * use a common.yml file that contains things that can be played on both container and non-container * refactor the ability to copy the admin key to the nodes Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-18 16:46:33 +02:00
Sébastien Han	5bbbce527e	osd: do not do anything if the dev has a partition Regardless if the partition is 'ceph' or something else, we don't want to be as strick as checking for a particular partition. If the drive has a partition, we just don't do anything. This solves the case where the server reboots, disks get a different /dev/sda (node) allocation. In this case, prior to restarting the server /dev/sda was an OSD, but now it's /dev/sdb and the other way around. In such scenario, we will try to prepare the OSD and create a new partition, so let's not mess around with devices that have partitions. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1498303 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-13 19:11:15 +02:00
vasishta p shastry	e1a1f81b6f	osd: to support copy_admin_key	2018-04-11 14:21:15 +02:00
Sébastien Han	e3275c1ca1	osd: add fs.aio-max-nr tuning The number of osds per nodes is limited by aio-max-nr, default is low, so we need to increase it. Full story: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-August/020408.html Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1553407 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-15 14:06:26 +01:00
Sébastien Han	f432819c1e	osd: apply systcl right away Without sysctl_set: yes the sysctm tuning will only get applied on the systctl.conf but not on the fly. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-15 14:06:26 +01:00
Sébastien Han	0f8a4251ba	move system tuning to osd role The changes from these tasks only apply to osd nodes so there is no reason to have them in ceph-common. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-15 14:06:26 +01:00
Sébastien Han	3261ab23b8	osd: remove old crush_location implementation This was causing a lot of pain with the handlers. Also the implementation was not ideal since we were assembling files. Everything can now be done with the ceph_crush module so let's remove that. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-06 15:24:31 +00:00
Caleb Boylan	0be60456ce	osd: Add support for multipath disks Multipath disks have partitions with a different format than what ceph-ansible currently supports, this update makes ceph-ansible aware of that format so multipath disks can be used as OSDs Signed-off-by: Caleb Boylan <caleb.boylan@ormuco.com>	2018-02-09 18:06:25 +01:00
Guillaume Abrioux	deaf273b25	syntax: change local_action syntax Use a nicer syntax for `local_action` tasks. We used to have oneliner like this: ``` local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }} ``` The usual syntax: ``` local_action: module: wait_for port: 22 host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}" state: started delay: 10 timeout: 500 ``` is nicer and kind of way to keep consistency regarding the whole playbook. This also fix a potential issue about missing quotation : ``` Traceback (most recent call last): File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module> main() File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin) File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command File "/usr/lib64/python2.7/shlex.py", line 279, in split return list(lex) File "/usr/lib64/python2.7/shlex.py", line 269, in next token = self.get_token() File "/usr/lib64/python2.7/shlex.py", line 96, in get_token raw = self.read_token() File "/usr/lib64/python2.7/shlex.py", line 172, in read_token raise ValueError, "No closing quotation" ValueError: No closing quotation ``` writing `local_action: shell echo {{ fsid }} \| tee {{ fetch_directory }}/ceph_cluster_uuid.conf` can cause trouble because it's complaining with missing quotes, this fix solves this issue. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-31 10:45:34 +01:00
Sébastien Han	5132cc3de4	Do not search osd ids if ceph-volume Description of problem: The 'get osd id' task goes through all the 10 times (and its respective timeouts) to make sure that the number of OSDs in the osd directory match the number of devices. This happens always, regardless if the setup and deployment is correct. Version-Release number of selected component (if applicable): Surely the latest. But any ceph-ansible version that contains ceph-volume support is affected. How reproducible: 100% Steps to Reproduce: 1. Use ceph-volume (LVM) to deploy OSDs 2. Avoid using anything in the 'devices' section 3. Deploy the cluster Actual results: TASK [ceph-osd : get osd id _uses_shell=True, _raw_params=ls /var/lib/ceph/osd/ \| sed 's/.-//'] ********************************************************************************************************************************************* task path: /Users/alfredo/python/upstream/ceph/src/ceph-volume/ceph_volume/tests/functional/lvm/.tox/xenial-filestore-dmcrypt/tmp/ceph-ansible/roles/ceph-osd/tasks/start_osds.yml:6 FAILED - RETRYING: get osd id (10 retries left). FAILED - RETRYING: get osd id (9 retries left). FAILED - RETRYING: get osd id (8 retries left). FAILED - RETRYING: get osd id (7 retries left). FAILED - RETRYING: get osd id (6 retries left). FAILED - RETRYING: get osd id (5 retries left). FAILED - RETRYING: get osd id (4 retries left). FAILED - RETRYING: get osd id (3 retries left). FAILED - RETRYING: get osd id (2 retries left). FAILED - RETRYING: get osd id (1 retries left). ok: [osd0] => { "attempts": 10, "changed": false, "cmd": "ls /var/lib/ceph/osd/ \| sed 's/.*-//'", "delta": "0:00:00.002717", "end": "2018-01-21 18:10:31.237933", "failed": true, "failed_when_result": false, "rc": 0, "start": "2018-01-21 18:10:31.235216" } STDOUT: 0 1 2 Expected results: There aren't any (or just a few) timeouts while the OSDs are found Additional info: This is happening because the check is mapping the number of "devices" defined for ceph-disk (in this case it would be 0) to match the number of OSDs found. Basically this line: until: osd_id.stdout_lines\|length == devices\|unique\|length Means in this 2 OSD case it is trying to ensure the following incorrect condition: until: 2 == 0 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1537103	2018-01-30 14:44:38 +01:00
Andrew Schoen	79473badfe	ceph-osd: adds dmcrypt to the lvm scenario Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-01-24 14:10:08 +01:00
Andrew Schoen	6cbb56a3b6	ceph-osd: adds the crush_device_class param to the lvm scenario Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-01-17 13:49:29 +01:00
Sébastien Han	6db4aea453	osd: skip devices marked as '/dev/dead' On a non-collocated scenario, if a drive is faulty we can't really remove it from the list of 'devices' without messing up or having to re-arrange the order of the 'dedicated_devices'. We want to keep this device list ordered. This will prevent the activation failing on a device that we know is failing but we can't remove it yet to not mess up the dedicated_devices mapping with devices. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-01-11 17:34:32 +01:00
Guillaume Abrioux	70401f955b	container: trigger handlers on systemd file change When a systemd unit file is changed we should trigger handlers to restart the services. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-10 16:46:42 +01:00
Guillaume Abrioux	895949d6c4	osd: fix check gpt the gpt label creation doesn't work even with parted module. This commit fixes the gpt label creation by using parted command instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-12-20 17:42:45 +01:00
Konstantin Shalygin	d7dadc3e7b	ceph-osd: respect nvme partitions when device is a disk.	2017-12-12 09:03:18 +01:00
Andrew Schoen	788c3f351a	ceph-osd: adds osd_objectstore to the name when using the ceph_volume module This allows for easier debugging if verbosity is not set high enough. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-12-11 09:58:06 -06:00
Andrew Schoen	5e3d8dbf63	ceph-osd: use the cluster param with the ceph_volume module Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-12-11 09:58:06 -06:00
Andrew Schoen	423166f671	ceph-osd: use the new ceph_volume module for the lvm scenario Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-12-11 09:58:06 -06:00
Andy McCrae	4f1e854c79	Use parted module instead of command	2017-12-11 17:33:40 +10:00
Guillaume Abrioux	b449b16edd	Merge pull request #2215 from squidboylan/support_loopback_devices Add support for using loopback devices as OSDs	2017-11-28 14:04:47 +01:00
Caleb Boylan	8f02bb007f	Add support for using loopback devices as OSDs This is particularly useful in CI environments where you dont have the option of adding extra devices or volumes to the host. It is also a simple change to support loopback devices	2017-11-27 16:02:36 -08:00
Guillaume Abrioux	1cba626484	osd: remove leftover and fix a typo This task was originally needed to fix a docker installation issue (see: #1030). This has been fixed, therefore it can be removed. Fixes: #2199 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-21 11:11:34 +01:00
Guillaume Abrioux	efe06be10f	osd: ensure a gpt label is set on device ceph-disk prepare will fail on jewel if a GPT label is not present on device. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-17 17:32:23 +01:00
Sébastien Han	932345ab2a	osd: remove leftover from osd partition We used to support osds that are a partition. This is long gone so removing this task. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-11-16 14:58:40 +01:00
Sébastien Han	b1c1322357	osd: remove failed_when on activation There is no need to continue if the activation fails. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-11-16 14:57:49 +01:00
Sébastien Han	80d3a242d0	osd: fix bad activation for dmcrypt We were activating dmcrypt devices with the wrong command. Basically the first task execute the wrong activate command. The task fails but continues because of the 'failed_when: false'. Then the right activation sequence is being done by the next task. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-11-16 14:55:08 +01:00
Andrew Schoen	3c604f1115	lvm: support --data as a raw device or partition in ceph-volume Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-11-15 09:36:17 -06:00
Andrew Schoen	04f02910a9	lvm: ensure the data_vg exists before using it Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-11-15 09:36:17 -06:00
Guillaume Abrioux	aa0b1ed118	tests: remove OSD_FORCE_ZAP variable from tests according to ceph/ceph-container#840, this variable is no longer needed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-14 17:55:01 +01:00
Guillaume Abrioux	0369bd59e2	Merge pull request #2146 from mslovy/wip-fix-crush-location osd: fix crush location for non-containerized deployment	2017-11-13 12:23:44 +01:00
Guillaume Abrioux	c06faf2deb	Merge pull request #2154 from ceph/fix_auto_discover osd: avoid using non desired loop device in autodiscovery	2017-11-10 01:19:20 +01:00
Guillaume Abrioux	591d77220e	osd: always run disk_list test there is no need to have a condition on this task, this test should be always run since the result will be interpreted later. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-09 11:51:16 +01:00
Guillaume Abrioux	43975a7332	osd: avoid using non desired loop device in autodiscovery This will prevent ceph-ansible from using a loop device while it shouldn't in auto_discovery mode. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-09 10:26:24 +01:00
Guillaume Abrioux	d5dfc63c89	osd: fix automatic prepare when auto_discover Use `devices` variable instead of `ansible_devices`, otherwise it means we are not using the devices which have been 'auto discovered' Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-08 10:20:44 +01:00
yaoning	d82a09dddd	fix crush location for non-containerized deployment crush location only set for containerized deployment Signed-off-by: yaoning <yaoning@unitedstack.com>	2017-11-08 12:05:10 +11:00
Sébastien Han	0930f14915	osd: do not use dm when osd_auto_discovery The current code will also return lvm devices such as /dev/dm-2, this kind of device type is not supported by ceph-disk at the moment. Now we just ignore them. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-11-08 11:33:10 +11:00
Sébastien Han	d4ed9a2064	osd: enhance backward compatibility During the initial implementation of this 'old' thing we were falling into this issue without noticing https://github.com/moby/moby/issues/30341 and where blindly using --rm, now this is fixed the prepare container disappears and thus activation fail. I'm fixing this for old jewel images. Also this fixes the machine reboot case where the docker logs are purgend. In the old scenario, we now store the log locally in the same directory as the ceph-osd-run.sh script. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-11-03 11:15:23 +01:00
Sébastien Han	faccd0acf0	Merge pull request #2100 from ceph/lvm-bluestore ceph-volume lvm bluestore support	2017-10-27 17:36:16 +02:00
Alfredo Deza	517a2b3feb	ceph-osd skip lvm creation if they are already in use Signed-off-by: Alfredo Deza <adeza@redhat.com>	2017-10-27 11:33:54 -04:00
Sébastien Han	5a10b048b0	Merge pull request #2105 from major/really-fix-always-run Really fix always run	2017-10-27 09:33:47 +02:00
Sébastien Han	07e2a783f8	Merge pull request #2084 from ceph/backward-osd-2.4 osd: bring backward compatibility with old Jewel images	2017-10-25 17:33:49 +02:00
Major Hayden	f73232caa4	Use check_mode instead of always_run This patch changes the `always_run: yes` task option to `check_mode: no` to avoid Ansible warnings.	2017-10-25 09:53:34 -05:00
Major Hayden	c2b5118c1b	Revert "Avoid deprecated always_run" This reverts commit `620fb37dd4`.	2017-10-25 09:48:09 -05:00
Alfredo Deza	d3b427e169	ceph-osd lvm scnearios are no longer limited to filestore Signed-off-by: Alfredo Deza <adeza@redhat.com>	2017-10-25 08:23:45 -04:00
Alfredo Deza	df05e63c10	ceph-osd use --cluster in ceph-volume calls Signed-off-by: Alfredo Deza <adeza@redhat.com>	2017-10-25 08:23:45 -04:00
Alfredo Deza	628d98a92c	ceph-osd add the CEPH_VOLUME_DEBUG env var to all ceph-volume commands Signed-off-by: Alfredo Deza <adeza@redhat.com>	2017-10-25 06:50:22 -04:00
Alfredo Deza	bbc3672253	ceph-osd: lvm support for bluestore Signed-off-by: Alfredo Deza <adeza@redhat.com>	2017-10-25 06:46:39 -04:00

1 2 3 4 5 ...

329 Commits (cfe8e51d992b820de7be12604c4430e7ba18c4c5)