Commit Graph

504 Commits (28c47e4d1bb22b0c89f67b315a732b624f650f4b)

Author SHA1 Message Date
Sébastien Han a96e910114 Add new container scenario
Test with podman instead of docker and also support for python 3 only.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-27 16:47:40 +00:00
Sébastien Han 997667a873 osd: expose udev into the container
In order to be able to retrieve udev information, we must expose its
socket. As per, https://github.com/ceph/ceph/pull/25201 ceph-volume will
start consuming udev output.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-26 18:57:12 +00:00
Guillaume Abrioux 7774069d45 refact osd pool size customization
Add real default value for osd pool size customization.
Ceph itself has an `osd_pool_default_size` default value to `3`.

If users don't specify a pool size in various pools definition within
ceph-ansible, we should default to `3`.

By the way, this kind of condition isn't really clear:
```
when:
  - rbd_pool_size | default ("")
```

we should try to get the customized value then default to what is in
`osd_pool_default_size` (which has its default value pointing to
`ceph_osd_pool_default_size` (`3`) as well) and compare it to
`ceph_osd_pool_default_size`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-21 15:42:50 +00:00
Guillaume Abrioux d4c0960f04 mon: move `osd_pool_default_pg_num` in `ceph-defaults`
`osd_pool_default_pg_num` parameter is set in `ceph-mon`.
When using ceph-ansible with `--limit` on a specifc group of nodes, it
will fail when trying to access this variables since it wouldn't be
defined.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1518696

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-21 15:42:50 +00:00
Boris Ranto c2b0cbd699 start_osds: Use list instead of keys
If you use python3 based ansible then keys() returns a dict_keys object,
not a list of keys. This breaks the installation on such a system. Using
the list filter provides a more robust solution that should work on both
python2 and python3 based ansible. You can find some more information
about the issue, here:

https://github.com/ansible/ansible/issues/19514

Signed-off-by: Boris Ranto <branto@redhat.com>
2018-11-20 18:48:22 +01:00
Noah Watkins 64dee9be0c Remove outdated documentation
Fixes BZ
https://bugzilla.redhat.com/show_bug.cgi?id=1640525

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
2018-11-15 22:26:19 +00:00
Guillaume Abrioux f7fcc012e9 osd: commonize start_osd code
since `ceph-volume` introduction, there is no need to split those tasks.

Let's refact this part of the code so it's clearer.

By the way, this was breaking rolling_update.yml when `openstack_config:
true` playbook because nothing ensured OSDs were started in ceph-osd role (In
`openstack_config.yml` there is a check ensuring all OSD are UP which was
obviously failing) and resulted with OSDs on the last OSD node not started
anyway.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-12 10:51:48 +01:00
Sébastien Han 72cae542da lint: Don't compare to empty string
description = 'Use `when: var` rather than `when: var != ""` (or ' \ 'conversely `when: not var` rather than `when: var == ""`)'

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-08 10:22:02 +00:00
Sébastien Han f9ddc27cd5 lint: meta add company info
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-08 10:22:02 +00:00
Sébastien Han 094ae8baf1 lint: do not use local_action
Use delegate_to: localhost instead.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-08 10:22:02 +00:00
Sébastien Han 037bab2922 lint: line length should not exceed 160 chars
Line was too long

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-08 10:22:02 +00:00
Sébastien Han ca7ed7dd81 galaxy roles: polish metadata
Update the meta with the relavant support such as:

* ansible version: min 2.4
* distro supported (tested on) centos 7

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-31 17:48:58 +01:00
Sébastien Han a882ad7ade lint: use command instead of shell
Use command when the tasks does not have any pipes or wilcards.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-31 14:18:36 +01:00
Sébastien Han 53972ee672 lint: add changed_when to command
Calling command should have changed_when false otherwise each time it
runs it will show as 'changed' and this is irrelevant.
Commands should not change things if nothing needs doing

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-31 14:18:36 +01:00
Rishabh Dave 8edbda96df use blocks directives to group tasks
Using block directives simplifies the playbooks and makes them more
readable.

Fixes: https://github.com/ceph/ceph-ansible/issues/2835
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-10-31 09:37:43 +01:00
Sébastien Han d209fc9d02 lint yaml
Fix [error] too many blank lines (1 > 0) (empty-lines)

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-30 14:41:36 +01:00
Guillaume Abrioux 748342f5b6 roles: fix *_docker_memory_limit default value
append 'm' suffix to specify the unit size used in all
`*_docker_memory_limit`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-29 14:59:09 +01:00
Neha Ojha b7e4d4eb84 roles: do not limit docker_memory_limit for various daemons
Since we do not have enough data to put valid upper bounds for the memory
usage of these daemons, do not put artificial limits by default. This will
help us avoid failures like OOM kills due to low default values.

Whenever required, these limits can be manually enforced by the user.

More details in
https://bugzilla.redhat.com/show_bug.cgi?id=1638148

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1638148
Signed-off-by: Neha Ojha <nojha@redhat.com>
2018-10-29 14:59:09 +01:00
Rishabh Dave ee2d52d33d allow custom pool size
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1596339
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-10-22 16:00:21 +02:00
Sébastien Han fbd878c8d5 infra: rename osd-configure to add-osd and improve it
The playbook has various improvements:

* run ceph-validate role before doing anything
* run ceph-fetch-keys only on the first monitor of the inventory list
* set noup flag so PGs get distributed once all the new OSDs have been
added to the cluster and unset it when they are up and running

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1624962
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-17 11:26:11 +00:00
Guillaume Abrioux 40b7747af7 remove jewel support
As of now, we should no longer support Jewel in ceph-ansible.
The latest ceph-ansible release supporting Jewel is `stable-3.1`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-12 23:38:17 +00:00
Sébastien Han 31a0438cb2 ceph_volume: refactor
This commit does a couple of things:

* Avoid code duplication
* Clarify the code
* add more unit tests
* add myself to the author of the module

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han bfe689094e osd: do not run when lvm scenario
This task was created for ceph-disk based deployments so it's not needed
when osd are prepared with ceph-volume.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han a948677de1 osd: ceph-volume activate, just pass the OSD_ID
We don't need to pass the device and discover the OSD ID. We have a
task that gathers all the OSD ID present on that machine, so we simply
re-use them and activate them. This also handles the situation when you
have multiple OSDs running on the same device.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han 5f35910ee1 osd: change unit template for ceph-volume container
We don't need to pass the hostname on the container name but we can keep
it simple and just call it ceph-osd-$id.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han ece9e9812e osd: do not use expose_partitions on lvm
expose_partitions is only needed on ceph-disk OSDs so we don't need to
activate this code when running lvm prepared OSDs.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han e39fc4f6ce ceph_volume: add container support for batch command
The batch option got recently added, while rebasing this patch it was
necessary to implement it. So now, the batch option can work on
containerized environments.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1630977
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han 3ddcc9af16 ceph_volume: try to get ride of the dummy container
If we run on a containerized deployment we pass an env variable which
contains the container image.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han aa2c1b27e3 ceph-osd: ceph-volume container support
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Noah Watkins 306e308f13 Avoid using tests as filter
Fixes the deprecation warning:

  [DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of
  using `result|search` use `result is search`.

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
2018-10-10 04:26:33 +00:00
Andrew Schoen c453ea25c0 ceph-osd: use journal_size and block_db_size for lvm batch
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-10-09 10:09:50 -04:00
Rishabh Dave b5d2ea269f don't use "static" field while including tasks
Instead used "import_tasks" and "include_tasks" to tell whether tasks
must be included statically or dynamically.

Fixes: https://github.com/ceph/ceph-ansible/issues/2998
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-10-04 07:44:28 +00:00
Rishabh Dave 380168dadc don't use "include" to include tasks
Use "import_tasks" or "include_tasks" instead.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-09-27 17:53:40 +02:00
Andrew Schoen b36f3e06b5 ceph_volume: adds the osds_per_device parameter
If this is set to anything other than the default value of 1 then the
--osds-per-device flag will be used by the batch command to define how
many osds will be created per device.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-09-12 20:27:14 +00:00
Sébastien Han 9ba670567e remove warning for unsupported variables
As promised, these will go unsupported for 3.1 so let's actually remove
them :).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622729
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-28 13:31:57 -07:00
Sébastien Han 8c70a5b197 osd: fix ceph_release
We need ceph_release in the condition, not ceph_stable_release

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1619255
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-20 20:14:56 +02:00
Sébastien Han 3149b2564f Revert "osd: generate device list for osd_auto_discovery on rolling_update"
This reverts commit e84f11e99e.

This commit was giving a new failure later during the rolling_update
process. Basically, this was modifying the list of devices and started
impacting the ceph-osd itself. The modification to accomodate the
osd_auto_discovery parameter should happen outside of the ceph-osd.

Also we are trying to not play ceph-osd role during the rolling_update
process so we can speed up the upgrade.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 11:13:12 +02:00
Andrew Schoen 6423ab4ad3 lvm: fix condition when selecting which scenario to run
devices and lvm_volumes will always be defined, so we need to instead
check it's length before deciding to run the scenario.

This fixes the failure here:
https://2.jenkins.ceph.com/job/ceph-ansible-prs-luminous-bluestore_lvm_osds/86/consoleFull#1667273050b5dd38fa-a56e-4233-a5ca-584604e56e3a

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-08-10 11:46:12 +02:00
Sébastien Han e84f11e99e osd: generate device list for osd_auto_discovery on rolling_update
rolling_update relies on the list of devices when performing the restart
of the OSDs. The task that is builind the devices list out of the
ansible_devices dict only runs when there are no partitions on the
drives. However during an upgrade the OSD are already configured, they
have been prepared and have partitions so this task won't run and thus
the devices list will be empty, skipping the restart during
rolling_update. We now run the same task under different requirements
when rolling_update is true and build a list when:

* osd_auto_discovery is true
* rolling_update is true
* ansible_devices exists
* no dm/lv are part of the discovery
* the device is not removable
* the device has more than 1 sector

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-10 09:19:40 +02:00
Andrew Schoen 3592c68cca ceph-osd: adds crush_device_class config option
This is used with the lvm osd scenario. When using devices you need the
option to set the crush device class for all of the OSDs that are
created from those devices.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-08-09 09:41:58 -04:00
Andrew Schoen 6d431ec22d ceph-volume: implement the 'lvm batch' subcommand
This adds the action 'batch' to the ceph-volume module so that we can
run the new 'ceph-volume lvm batch' subcommand. A functional test is
also included.

If devices is defind and osd_scenario is lvm then the 'ceph-volume lvm
batch' command will be used to create the OSDs.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-08-09 09:41:58 -04:00
Sébastien Han 2ca8c51906 osd: do not remove expose_partition container
The container runs with --rm which means it will be deleted by Docker
when exiting. Also 'docker rm -f' is not idempotent and returns 1 if the
container does not exist.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1609007
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-30 10:38:15 +02:00
Guillaume Abrioux 053709da97 ceph-osds: backward compatibility with jewel for osp pools creation
If we want to be backward compatible with release prior to luminous, we
have to set the rule name accordingly to default values used in jewel.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-26 18:47:10 +00:00
Sébastien Han abdb53e16a ceph-osd: trigger osd container restart on script change
The script ceph-osd-run.sh holds the config options to start the
container, if one of these options are modified we must restart the
container. This was not the case before becauase the 'notify' flag
wasn't present.

Closing: https://bugzilla.redhat.com/show_bug.cgi?id=1596061
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-28 17:54:13 +02:00
George Shuklin 653b483fc3 Add ceph_keyring_permissions variable to control permissions for
keyring files in /etc/ceph. Default value is the same as it was (0600),
but this variable allows user to override it (f.e. set it to 0640).

Signed-off-by: George Shuklin <george.shuklin@gmail.com>
2018-06-28 15:48:39 +00:00
Sébastien Han a9ed3579ae mon/osd: bump container memory limit
As discussed with the cores, the current limits are too low and should
be bumped to higher value.
So now by default monitors get 3GB and OSDs get 5GB.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591876
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-17 11:20:27 -04:00
Konstantin Shalygin 3a07568496 ceph-osd: set 'openstack_keys_tmp' only when 'openstack_config' is defined.
If 'openstack_config' is false this task shouldn't be executed.

Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
2018-06-11 13:03:55 +02:00
Guillaume Abrioux 433ecc7cbc osd: copy openstack keys over to all mon
When configuring openstack, the created keyrings aren't copied over to
all monitors nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1588093

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 13:58:57 +08:00
Guillaume Abrioux aae37b44f5 mons: move set_fact of openstack_keys in ceph-osd
Since the openstack_config.yml has been moved to `ceph-osd` we must move
this `set_fact` in ceph-osd otherwise the tasks in
`openstack_config.yml` using `openstack_keys` will actually use the
defaults value from `ceph-defaults`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1585139

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-01 17:12:01 +02:00
Guillaume Abrioux 9d5265fe11 osds: wait for osds to be up before creating pools
This is a follow up on #2628.
Even with the openstack pools creation moved later in the playbook,
there is still an issue because OSDs are not all UP when trying to
create pools.

Adding a task which checks for all OSDs to be UP with a `retries/until`
condition should definitively fix this issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-01 15:46:52 +02:00
Guillaume Abrioux 34e646e767 osds: do not set docker_exec_cmd fact
in `ceph-osd` there is no need to set `docker_exec_cmd` since the only
place where this fact is used is in `openstack_config.yml` which
delegate all docker command to a monitor node. It means we need the
`docker_exec_cmd` fact that has been set referring to `ceph-mon-*`
containers, this fact is already set earlier in `ceph-defaults`.

By the way, when collocating an OSD with a MON it fails because the container
`ceph-osd-{{ ansible_hostname }}` doesn't exist.

Removing this task will allow to collocate an OSD with a MON.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1584179

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-30 16:17:29 +02:00
Guillaume Abrioux 3a0e168a76 mdss: move cephfs pools creation in ceph-mds
When deploying a large number of OSD nodes it can be an issue because the
protection check [1] won't pass since it tries to create pools before all
OSDs are active.

The idea here is to move cephfs pools creation in `ceph-mds` role.

[1] e59258943b/src/mon/OSDMonitor.cc (L5673)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-24 09:39:38 -07:00
Guillaume Abrioux 564a662baf osds: move openstack pools creation in ceph-osd
When deploying a large number of OSD nodes it can be an issue because the
protection check [1] won't pass since it tries to create pools before all
OSDs are active.

The idea here is to move openstack pools creation at the end of `ceph-osd` role.

[1] e59258943b/src/mon/OSDMonitor.cc (L5673)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-24 09:39:38 -07:00
Vishal Kanaujia ef5f52b1f3 Skip GPT header creation for lvm osd scenario
The LVM lvcreate fails if the disk already has a GPT header.
We create GPT header regardless of OSD scenario. The fix is to
skip header creation for lvm scenario.

fixes: https://github.com/ceph/ceph-ansible/issues/2592

Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
2018-05-23 11:44:09 -07:00
Andrew Schoen 32bac6b491 ceph-validate: move var checks from ceph-osd into this role
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andy McCrae 08a2b58d39 Allow os_tuning_params to overwrite fs.aio-max-nr
The order of fs.aio-max-nr (which is hard-coded to 1048576) means that
if you set fs.aio-max-nr in os_tuning_params it will effectively be
ignored for bluestore scenarios.

To resolve this we should move the setting of fs.aio-max-nr above the
setting of os_tuning_params, in this way the operator can define the
value of fs.aio-max-nr to be something other than 1048576 if they want
to.

Additionally, we can make the sysctl settings happen in 1 task rather
than multiple.
2018-05-11 10:49:37 +01:00
Guillaume Abrioux 7b387b506a osd: clean legacy syntax in ceph-osd-run.sh.j2
Quick clean on a legacy syntax due to e0a264c7e

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-09 07:29:33 +02:00
Sébastien Han 65ba85aff6 Expose /var/run/ceph
Useful for softwares that do data collection/monitoring like collectd.
They can connect to the socket and then retrieve information.

Even though the sockets are exposed now, I'm keeping the docker exec to
check the socket, this will allow newer version of ceph-ansible to work
with older versions.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1563280
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-20 15:48:32 +02:00
Sébastien Han 641f141c0f selinux: remove chcon calls
We know bindmount with the :z option at the end of the -v command so
this will basically run the exact same command as we used to run. So to
speak:

chcon -Rt svirt_sandbox_file_t /var/lib/ceph

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-19 14:59:37 +02:00
Sébastien Han d2a2793cb0 refactor the way we copy keys
This commit does a couple of things:

* use a common.yml file that contains things that can be played on both
container and non-container

* refactor the ability to copy the admin key to the nodes

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-18 16:46:33 +02:00
Sébastien Han 5bbbce527e osd: do not do anything if the dev has a partition
Regardless if the partition is 'ceph' or something else, we don't want
to be as strick as checking for a particular partition.
If the drive has a partition, we just don't do anything.

This solves the case where the server reboots, disks get a different
/dev/sda (node) allocation. In this case, prior to restarting the server
/dev/sda was an OSD, but now it's /dev/sdb and the other way around.
In such scenario, we will try to prepare the OSD and create a new
partition, so let's not mess around with devices that have partitions.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1498303
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-13 19:11:15 +02:00
vasishta p shastry e1a1f81b6f osd: to support copy_admin_key 2018-04-11 14:21:15 +02:00
Alfredo Deza 3fcf966803 ceph-osd note that some scenarios use ceph-disk vs. ceph-volume
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2018-03-29 09:11:33 +02:00
Ning Yao 691ddf5349 cleanup osd.conf.j2 in ceph-osd
osd crush location is set by ceph_crush in the library,
osd.conf.j2 is not used any more.

Signed-off-by: Ning Yao <yaoning@unitedstack.com>
2018-03-26 15:57:37 +08:00
Sébastien Han e3275c1ca1 osd: add fs.aio-max-nr tuning
The number of osds per nodes is limited by aio-max-nr, default is low,
so we need to increase it.

Full story:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-August/020408.html

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1553407
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-15 14:06:26 +01:00
Sébastien Han f432819c1e osd: apply systcl right away
Without     sysctl_set: yes the sysctm tuning will only get applied on
the systctl.conf but not on the fly.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-15 14:06:26 +01:00
Sébastien Han 0f8a4251ba move system tuning to osd role
The changes from these tasks only apply to osd nodes so there is no
reason to have them in ceph-common.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-15 14:06:26 +01:00
Sébastien Han 3261ab23b8 osd: remove old crush_location implementation
This was causing a lot of pain with the handlers. Also the
implementation was not ideal since we were assembling files. Everything
can now be done with the ceph_crush module so let's remove that.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-06 15:24:31 +00:00
Caleb Boylan 0be60456ce osd: Add support for multipath disks
Multipath disks have partitions with a different format than what
ceph-ansible currently supports, this update makes ceph-ansible
aware of that format so multipath disks can be used as OSDs

Signed-off-by: Caleb Boylan <caleb.boylan@ormuco.com>
2018-02-09 18:06:25 +01:00
Guillaume Abrioux e537779bb3 osd: fix osd restart when dmcrypt
This commit fixes a bug that occurs especially for dmcrypt scenarios.

There is an issue where the 'disk_list' container can't reach the ceph
cluster because it's not launched with `--net=host`.

If this container can't reach the cluster, it will hang on this step
(when trying to retrieve the dm-crypt key) :

```
+common_functions.sh:448: open_encrypted_part(): ceph --cluster abc12 --name \
client.osd-lockbox.9138767f-7445-49e0-baad-35e19adca8bb --keyring \
/var/lib/ceph/osd-lockbox/9138767f-7445-49e0-baad-35e19adca8bb/keyring \
config-key get dm-crypt/osd/9138767f-7445-49e0-baad-35e19adca8bb/luks
+common_functions.sh:452: open_encrypted_part(): base64 -d
+common_functions.sh:452: open_encrypted_part(): cryptsetup --key-file \
-luksOpen /dev/sdb1 9138767f-7445-49e0-baad-35e19adca8bb
```

It means the `ceph-run-osd.sh` script won't be able to start the
`osd_disk_activate` process in ceph-container because he won't have
filled the `$DOCKER_ENV` environment variable properly.

Adding `--net=host` to the 'disk_list' container fixes this issue.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1543284

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-02-08 15:45:13 +01:00
Guillaume Abrioux deaf273b25 syntax: change local_action syntax
Use a nicer syntax for `local_action` tasks.
We used to have oneliner like this:
```
local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }}
```

The usual syntax:
```
    local_action:
      module: wait_for
      port: 22
      host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}"
      state: started
      delay: 10
      timeout: 500
```
is nicer and kind of way to keep consistency regarding the whole
playbook.

This also fix a potential issue about missing quotation :

```
Traceback (most recent call last):
  File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module>
    main()
  File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main
    rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin)
  File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command
  File "/usr/lib64/python2.7/shlex.py", line 279, in split
    return list(lex)                                                                                                                                                                                                                                                                                                            File "/usr/lib64/python2.7/shlex.py", line 269, in next
    token = self.get_token()
  File "/usr/lib64/python2.7/shlex.py", line 96, in get_token
    raw = self.read_token()
  File "/usr/lib64/python2.7/shlex.py", line 172, in read_token
    raise ValueError, "No closing quotation"
ValueError: No closing quotation
```

writing `local_action: shell echo {{ fsid }} | tee {{ fetch_directory }}/ceph_cluster_uuid.conf`
can cause trouble because it's complaining with missing quotes, this fix solves this issue.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-01-31 10:45:34 +01:00
Sébastien Han 5132cc3de4 Do not search osd ids if ceph-volume
Description of problem: The 'get osd id' task goes through all the 10 times (and its respective timeouts) to make sure that the number of OSDs in the osd directory match the number of devices.

This happens always, regardless if the setup and deployment is correct.

Version-Release number of selected component (if applicable): Surely the latest. But any ceph-ansible version that contains ceph-volume support is affected.

How reproducible: 100%

Steps to Reproduce:
1. Use ceph-volume (LVM) to deploy OSDs
2. Avoid using anything in the 'devices' section
3. Deploy the cluster

Actual results:
TASK [ceph-osd : get osd id _uses_shell=True, _raw_params=ls /var/lib/ceph/osd/ | sed 's/.*-//'] **********************************************************************************************************************************************
task path: /Users/alfredo/python/upstream/ceph/src/ceph-volume/ceph_volume/tests/functional/lvm/.tox/xenial-filestore-dmcrypt/tmp/ceph-ansible/roles/ceph-osd/tasks/start_osds.yml:6
FAILED - RETRYING: get osd id (10 retries left).
FAILED - RETRYING: get osd id (9 retries left).
FAILED - RETRYING: get osd id (8 retries left).
FAILED - RETRYING: get osd id (7 retries left).
FAILED - RETRYING: get osd id (6 retries left).
FAILED - RETRYING: get osd id (5 retries left).
FAILED - RETRYING: get osd id (4 retries left).
FAILED - RETRYING: get osd id (3 retries left).
FAILED - RETRYING: get osd id (2 retries left).
FAILED - RETRYING: get osd id (1 retries left).
ok: [osd0] => {
    "attempts": 10,
    "changed": false,
    "cmd": "ls /var/lib/ceph/osd/ | sed 's/.*-//'",
    "delta": "0:00:00.002717",
    "end": "2018-01-21 18:10:31.237933",
    "failed": true,
    "failed_when_result": false,
    "rc": 0,
    "start": "2018-01-21 18:10:31.235216"
}

STDOUT:

0
1
2

Expected results:
There aren't any (or just a few) timeouts while the OSDs are found

Additional info:
This is happening because the check is mapping the number of "devices" defined for ceph-disk (in this case it would be 0) to match the number of OSDs found.

Basically this line:

    until: osd_id.stdout_lines|length == devices|unique|length

Means in this 2 OSD case it is trying to ensure the following incorrect condition:

    until: 2 == 0

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1537103
2018-01-30 14:44:38 +01:00
Andrew Schoen 79473badfe ceph-osd: adds dmcrypt to the lvm scenario
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-01-24 14:10:08 +01:00
Guillaume Abrioux 9306a1789c osds: change default value for `dedicated_devices`
This is to keep backward compatibility with stable-2.2 and satisfy the
check "verify dedicated devices have been provided" in
`check_mandatory_vars.yml`. This check is looking for
`dedicated_devices` so we need to default it's value to
`raw_journal_devices` when `raw_multi_journal` is set to `True`.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1536098

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-01-22 18:02:51 +01:00
Andrew Schoen fb4a6dc9a4 docs for the crush_device_class option of lvm_volumes
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-01-17 13:49:29 +01:00
Andrew Schoen 6cbb56a3b6 ceph-osd: adds the crush_device_class param to the lvm scenario
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-01-17 13:49:29 +01:00
Sébastien Han 6db4aea453 osd: skip devices marked as '/dev/dead'
On a non-collocated scenario, if a drive is faulty we can't really
remove it from the list of 'devices' without messing up or having to
re-arrange the order of the 'dedicated_devices'. We want to keep this
device list ordered. This will prevent the activation failing on a
device that we know is failing but we can't remove it yet to not mess up
the dedicated_devices mapping with devices.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-01-11 17:34:32 +01:00
Guillaume Abrioux 70401f955b container: trigger handlers on systemd file change
When a systemd unit file is changed we should trigger handlers to
restart the services.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-01-10 16:46:42 +01:00
Sébastien Han 97f520bc74 containers: bump memory limit
A default value of 4GB for MDS is more appropriate and 3GB for OSD also.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1531607
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-01-09 11:26:50 +01:00
Guillaume Abrioux 895949d6c4 osd: fix check gpt
the gpt label creation doesn't work even with parted module.
This commit fixes the gpt label creation by using parted command
instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-12-20 17:42:45 +01:00
Sébastien Han bbc79765f3 osd: best effort if no device is found during activation
We have a scenario when we switch from non-container to containers. This
means we don't know anything about the ceph partitions associated to an
OSD. Normally in a containerized context we have files containing the
preparation sequence. From these files we can get the capabilities of
each OSD. As a last resort we use a ceph-disk call inside a dummy bash
container to discover the ceph journal on the current osd.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1525612
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-12-19 14:40:48 +01:00
Christian Berendt 50a848dc40 Rename fact docker_version to ceph_docker_version
The name docker_version is very generic and is also used by other
roles. As a result, there may be name conflicts. To avoid this a
ceph_ prefix should be used for this fact. Since it is an internal
fact renaming is not a problem.
2017-12-15 20:12:21 +01:00
John Fulton 8cba44262c Add flags for OSD 'docker run --cpuset-{cpus,mems}'
Add the variables ceph_osd_docker_cpuset_cpus and
ceph_osd_docker_cpuset_mems, so that a user may specify
the CPUs and memory nodes of NUMA systems on which OSD
containers are run.

Provides a example in osds.yaml.sample to guide user
based on sample `lscpu` output since cpuset-mems refers
to the memory by NUMA node only while cpuset-cpus can
refer to individual vCPUs within a NUMA node.
2017-12-14 16:39:35 +01:00
Konstantin Shalygin d7dadc3e7b ceph-osd: respect nvme partitions when device is a disk. 2017-12-12 09:03:18 +01:00
Andrew Schoen 788c3f351a ceph-osd: adds osd_objectstore to the name when using the ceph_volume module
This allows for easier debugging if verbosity is not set high enough.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-12-11 09:58:06 -06:00
Andrew Schoen 5e3d8dbf63 ceph-osd: use the cluster param with the ceph_volume module
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-12-11 09:58:06 -06:00
Andrew Schoen 423166f671 ceph-osd: use the new ceph_volume module for the lvm scenario
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-12-11 09:58:06 -06:00
Andy McCrae 4f1e854c79 Use parted module instead of command 2017-12-11 17:33:40 +10:00
Guillaume Abrioux b449b16edd
Merge pull request #2215 from squidboylan/support_loopback_devices
Add support for using loopback devices as OSDs
2017-11-28 14:04:47 +01:00
Caleb Boylan 8f02bb007f Add support for using loopback devices as OSDs
This is particularly useful in CI environments where you dont have
the option of adding extra devices or volumes to the host. It is also
a simple change to support loopback devices
2017-11-27 16:02:36 -08:00
Guillaume Abrioux 1cba626484 osd: remove leftover and fix a typo
This task was originally needed to fix a docker installation issue
(see: #1030). This has been fixed, therefore it can be removed.

Fixes: #2199

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-11-21 11:11:34 +01:00
Guillaume Abrioux efe06be10f osd: ensure a gpt label is set on device
ceph-disk prepare will fail on jewel if a GPT label is not present on
device.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-11-17 17:32:23 +01:00
Sébastien Han 932345ab2a osd: remove leftover from osd partition
We used to support osds that are a partition. This is long gone so
removing this task.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-11-16 14:58:40 +01:00
Sébastien Han b1c1322357 osd: remove failed_when on activation
There is no need to continue if the activation fails.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-11-16 14:57:49 +01:00
Sébastien Han 80d3a242d0 osd: fix bad activation for dmcrypt
We were activating dmcrypt devices with the wrong command. Basically the
first task execute the wrong activate command. The task fails but
continues because of the 'failed_when: false'. Then the right activation
sequence is being done by the next task.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-11-16 14:55:08 +01:00
Sébastien Han cc264d6ba6
Merge pull request #2151 from hwoarang/add-opensuse
Add openSUSE Leap 42.3 support
2017-11-16 14:35:28 +01:00
Andrew Schoen 3c604f1115 lvm: support --data as a raw device or partition in ceph-volume
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-11-15 09:36:17 -06:00
Andrew Schoen 04f02910a9 lvm: ensure the data_vg exists before using it
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-11-15 09:36:17 -06:00
Guillaume Abrioux aa0b1ed118 tests: remove OSD_FORCE_ZAP variable from tests
according to ceph/ceph-container#840, this variable is no longer needed.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-11-14 17:55:01 +01:00
Markos Chandras fb46950373 ceph-osd: Add support for openSUSE Leap distributions
Add support for openSUSE Leap distributions

Signed-off-by: Markos Chandras <mchandras@suse.de>
2017-11-14 10:51:23 +00:00