Commit Graph

352 Commits (17d1ff61d591827de03da1094c7d50955a63282c)

Author SHA1 Message Date
Guillaume Abrioux c06faf2deb
Merge pull request #2154 from ceph/fix_auto_discover
osd: avoid using non desired loop device in autodiscovery
2017-11-10 01:19:20 +01:00
Guillaume Abrioux 591d77220e osd: always run disk_list test
there is no need to have a condition on this task, this test should be
always run since the result will be interpreted later.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-11-09 11:51:16 +01:00
Guillaume Abrioux 43975a7332 osd: avoid using non desired loop device in autodiscovery
This will prevent ceph-ansible from using a loop device while it
shouldn't in auto_discovery mode.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-11-09 10:26:24 +01:00
Guillaume Abrioux d5dfc63c89 osd: fix automatic prepare when auto_discover
Use `devices` variable instead of `ansible_devices`, otherwise it means
we are not using the devices which have been 'auto discovered'

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-11-08 10:20:44 +01:00
Sébastien Han 0930f14915 osd: do not use dm when osd_auto_discovery
The current code will also return lvm devices such as /dev/dm-2, this
kind of device type is not supported by ceph-disk at the moment. Now we
just ignore them.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-11-08 11:33:10 +11:00
Guillaume Abrioux 39b584e540 osd: fix a typo in roles/ceph-osd/defaults/main.yml
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-11-07 10:06:16 +01:00
Sébastien Han d4ed9a2064 osd: enhance backward compatibility
During the initial implementation of this 'old' thing we were falling
into this issue without noticing
https://github.com/moby/moby/issues/30341 and where blindly using --rm,
now this is fixed the prepare container disappears and thus activation
fail.
I'm fixing this for old jewel images.

Also this fixes the machine reboot case where the docker logs are
purgend. In the old scenario, we now store the log locally in the same
directory as the ceph-osd-run.sh script.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-11-03 11:15:23 +01:00
Sébastien Han faccd0acf0 Merge pull request #2100 from ceph/lvm-bluestore
ceph-volume lvm bluestore support
2017-10-27 17:36:16 +02:00
Alfredo Deza 517a2b3feb ceph-osd skip lvm creation if they are already in use
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2017-10-27 11:33:54 -04:00
Sébastien Han 5a10b048b0 Merge pull request #2105 from major/really-fix-always-run
Really fix always run
2017-10-27 09:33:47 +02:00
Sébastien Han 5f9e50dabe Merge pull request #2103 from andymcc/tcmalloc_settings
Option to set TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
2017-10-25 17:36:04 +02:00
Sébastien Han 07e2a783f8 Merge pull request #2084 from ceph/backward-osd-2.4
osd: bring backward compatibility with old Jewel images
2017-10-25 17:33:49 +02:00
Major Hayden f73232caa4
Use check_mode instead of always_run
This patch changes the `always_run: yes` task option to
`check_mode: no` to avoid Ansible warnings.
2017-10-25 09:53:34 -05:00
Major Hayden c2b5118c1b
Revert "Avoid deprecated always_run"
This reverts commit 620fb37dd4.
2017-10-25 09:48:09 -05:00
Andy McCrae 7f6c39102d Option to set TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
Use "ceph_tcmalloc_max_total_thread_cache" to set the
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES value inside /etc/default/ceph for
Debian installs, or /etc/sysconfig/ceph for Red Hat/CentOS installs.

By default this is set to 0, so the default package value will be used,
if specified this value will be changed to match the variable, and ceph
osd services will be restarted.
2017-10-25 14:38:36 +01:00
Alfredo Deza d3b427e169 ceph-osd lvm scnearios are no longer limited to filestore
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2017-10-25 08:23:45 -04:00
Alfredo Deza df05e63c10 ceph-osd use --cluster in ceph-volume calls
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2017-10-25 08:23:45 -04:00
Alfredo Deza 628d98a92c ceph-osd add the CEPH_VOLUME_DEBUG env var to all ceph-volume commands
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2017-10-25 06:50:22 -04:00
Alfredo Deza b89309e2a3 ceph-osd update the examples in defaults for lvm bluestore
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2017-10-25 06:46:39 -04:00
Alfredo Deza bbc3672253 ceph-osd: lvm support for bluestore
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2017-10-25 06:46:39 -04:00
Guillaume Abrioux f21859656b Merge pull request #2102 from yanyixing/fix_miss_word
add the miss word
2017-10-25 10:49:38 +02:00
Yixing Yan b6296c13ac update sample file 2017-10-25 16:39:08 +08:00
John Fulton 7a7ddab6c2 Require osd_scenario parameter to be provided in containerized deploy
Fixes: #2095
2017-10-23 15:16:03 +00:00
Sébastien Han 968ef04324 osd: bring backward compatibility with old Jewel images
There was a huge resync from luminous to jewel in ceph-docker:
https://github.com/ceph/ceph-docker/pull/797

This change brought a new handy function to discover partitions tight to
an OSD. This function doesn't exist in the old image so the
ceph-osd-run.sh script breaks when trying to deploy Jewel OSD with that
old Jewel image version.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-20 16:26:41 +02:00
Sébastien Han 4413511b66 all: backward compatibility between stable-2.2 and 3.0
stable-3.0 brought numerous changes in ceph-ansible variables, this PR
aims to maintain backward compatibility for someone running stable-2.2
upgrading to stable-3.0 but keeps its groups_vars untouched.
We will then determine the right options to make sure the upgrade works
but we are expecting that new variables should be used.

We will drop this in a near future, maybe 3.1 or 3.2.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-20 11:54:10 +02:00
Sébastien Han c527515502 Merge pull request #2000 from ceph/merge-osd-scenarios
[skip ci] ci: new osd scenarios
2017-10-19 09:18:02 +02:00
Sébastien Han a53aa9e8b4 ci: new osd scenarios
This commit add new osd scenarios, it aims to simplify the CI setup and
brings a better coverage on the OSD scenarios.
We decided to differentiate between filestore and bluestore, thinking
ahead when filestore won't be supported anymore.
So we now have two classes of tests:

* Filestore
* Bluestore

In each of those classes we have container and non-container.
Then for each we test the following:

* collocated
* collocated dmcrypt
* non-collocated
* non-collocated dmcrypt
* auto discovery collocated
* auto discovery collocated dmcrypt

This gives us a nice coverage and also reduces the footprint on the CI.
We are now up to 4 scenarios, each containing 6 OSD VMs.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-18 09:26:06 +02:00
Christian Berendt 4c380c9ef8 Cleanup readme files in roles directories
The contents of the README files are no longer up to date.
Documentation for all roles is located below the docs directory.
2017-10-17 11:22:06 +02:00
Christian Berendt cf901f0171 In docker start scripts replace \u00a0 with \u0020
This will solve the following issue when starting docker containers on ubuntu:

invalid argument "1\u00a0" for --cpus=1 : failed to parse 1  as a rational number

Closes-bug: #2056
2017-10-16 15:16:48 +02:00
Major Hayden c01851325e
Remove jinja2 delimiters from `when` keys
This patch changes the `when:` keys so that they have no jinja2
delimiters. This avoids Ansible warnings which could turn into
errors in a future Ansible release.
2017-10-12 11:27:42 -05:00
Major Hayden 620fb37dd4
Avoid deprecated always_run
The `always_run` key is deprecated and being removed in Ansible 2.4.
Using it causes a warning to be displayed:

    [DEPRECATION WARNING]: always_run is deprecated.

This patch changes all instances of `always_run` to use the `always`
tag, which causes the task to run each time the playbook runs.
2017-10-12 08:29:44 -05:00
Sébastien Han d0a9e57bfc osd: rollback bindmount of /run/udev
This is causing unknown issues when trying to start a dmcrypt container.
Basically the container is stuck at mount opening the LUKS device. This
is still unknown why this is causing trouble but we need to move
forward. Also, this doesn't seem to help in any ways to fix the race
condition we've seen.

Here is the log for dmcrypt:

cryptsetup 1.7.4 processing "cryptsetup --debug --verbose --key-file
key luksClose fbf8887d-8694-46ca-b9ff-be79a668e2a9"
Running command close.
Locking memory.
Installing SIGINT/SIGTERM handler.
Unblocking interruption on signal.
Allocating crypt device context by device
fbf8887d-8694-46ca-b9ff-be79a668e2a9.
Initialising device-mapper backend library.
dm version   [ opencount flush ]   [16384] (*1)
dm versions   [ opencount flush ]   [16384] (*1)
Detected dm-crypt version 1.14.1, dm-ioctl version 4.35.0.
Device-mapper backend running with UDEV support enabled.
dm status fbf8887d-8694-46ca-b9ff-be79a668e2a9  [ opencount flush ]
[16384] (*1)
Releasing device-mapper backend.
Trying to open and read device /dev/sdc1 with direct-io.
Allocating crypt device /dev/sdc1 context.
Trying to open and read device /dev/sdc1 with direct-io.
Initialising device-mapper backend library.
dm table fbf8887d-8694-46ca-b9ff-be79a668e2a9  [ opencount flush
securedata ]   [16384] (*1)
Trying to open and read device /dev/sdc1 with direct-io.
Crypto backend (gcrypt 1.5.3) initialized in cryptsetup library
version 1.7.4.
Detected kernel Linux 3.10.0-693.el7.x86_64 x86_64.
Reading LUKS header of size 1024 from device /dev/sdc1
Key length 32, device size 1943016847 sectors, header size 2050
sectors.
Deactivating volume fbf8887d-8694-46ca-b9ff-be79a668e2a9.
dm status fbf8887d-8694-46ca-b9ff-be79a668e2a9  [ opencount flush ]
[16384] (*1)
Udev cookie 0xd4d14e4 (semid 32769) created
Udev cookie 0xd4d14e4 (semid 32769) incremented to 1
Udev cookie 0xd4d14e4 (semid 32769) incremented to 2
Udev cookie 0xd4d14e4 (semid 32769) assigned to REMOVE task(2) with
flags         (0x0)
dm remove fbf8887d-8694-46ca-b9ff-be79a668e2a9  [ opencount flush
retryremove ]   [16384] (*1)
fbf8887d-8694-46ca-b9ff-be79a668e2a9: Stacking NODE_DEL [verify_udev]
Udev cookie 0xd4d14e4 (semid 32769) decremented to 1
Udev cookie 0xd4d14e4 (semid 32769) waiting for zero

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-11 13:21:37 +02:00
Sébastien Han bf99751ce1 osd: bindmount /run/udev
Ensures that "udevadm" is able to check the status of udev's event queue.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-09 17:25:45 +02:00
Sébastien Han c693e95cbf purge-docker: rework device detection
we don't need "devices" and other device variable anymore, the playbook
detects that for us.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-07 03:39:04 +02:00
Guillaume Abrioux 6b027557e6 osd: fix `set_fact build dedicated_devices`
Use an intermediate variable to build the final `dedicated_devices` list
to avoid duplicate entry in that array. (We need a 1:1 relation between
`dedicated_devices` and `devices` since we are using a `with_together`
later.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-10-06 15:00:32 +02:00
Sébastien Han 29888649e5 osd: do not do unique on dedicated_devices
This is needed later, if we do unique, only the first OSD will get a
journal.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-05 18:20:18 +02:00
Michel Rode b462b68e65 Fixing path to osd_fragment.yml 2017-10-05 14:42:10 +02:00
Guillaume Abrioux 70e2787fe2 docker: fix keyrings copied on all nodes
All keyring are getting copied to all nodes.
This commit fixes a leftover from a previous code refactor.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1498583

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-10-05 09:23:22 +02:00
Guillaume Abrioux 784cc73da0 set docker_exec_cmd fact early in each role
This is to ensure `docker_exec_cmd` fact is set with the correct value
in case of daemons collocation

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-10-04 11:31:09 +02:00
Sébastien Han 3bd341f6c0 osd: container use id instead of dev name
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1494127
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-03 14:44:00 +02:00
Sébastien Han ba42894516 osd: do not copy admin key on collocated scenario
ceph-disk used to have a bug requiring the admin key to store the
encrypted key in the mon kv store. This was reported in:
http://tracker.ceph.com/issues/17849

Fixed and backported here: https://github.com/ceph/ceph/pull/11996

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-03 14:44:00 +02:00
Sébastien Han 46a01df434 osd: add cluster name support
I forgot to add cluster name support so some partition were never
mounted correctly.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-09-29 20:30:54 +02:00
Guillaume Abrioux 466f6f35b7 Use systemd module instead of service.
Using systemd module allows us to do in one task what we did in three
tasks:

- enable unit file,
- issue a `daemon-reload`,
- start the service

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-09-29 14:54:00 +02:00
Guillaume Abrioux 913ad53709 docker: add condition to run selinux tasks only on rhel os family
This fixes the error :

```
The conditional check 'sestatus.stdout != 'Disabled'' failed.
```

that occurs when running on non rhel based system since the
`sestatus` fact is registered only on rhel based distribution.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-09-29 02:35:07 +02:00
Sébastien Han 45797ab968 osd: fix container reboot
It's sad but we can not rely on the prepare container anymore since the
log are flushed after reboot. So inpecting the container does not return
anything.
Now, instead we use a ephemeral container to look up for the
journal/block.db/block.wal (depending if filestore or bluestore) and
build the activate command accordingly.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-09-25 13:34:47 +02:00
Sébastien Han cb05172605 docker: we don't need to copy the ceph.conf on all the nodes
We generate the ceph.conf on all the nodes through the
ceph-docker-common so there is no need to push it to the Ansible file.

Also this is breaking the ceph.conf template generation since we only
generate sections based on the host the ansible task is running on.

For example, what's typically happening, we bootstrap the monitor, we
get a ceph.conf generated for a mon only, we go on an osd, we generate
the ceph.conf with osd section (done by ceph-docker-common) but this
gets overwritten by the copy_config task of the ceph-osd role.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-09-20 16:33:29 +02:00
Sébastien Han d100b4e596 name includes and set_fact for clarity
When Ansible is not run with verbose options it's difficult to see which
include and/or set_fact does what. So adding a name for each clarifies.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-09-18 23:39:46 +02:00
Sébastien Han 66d41f342d Merge pull request #1889 from ceph/client-containers
client: ability to create keys and pool with no ceph binaries
2017-09-18 17:27:32 +02:00
Sébastien Han 660893e70e osd: add meaningful message for journal_size
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-09-13 23:49:15 -06:00
Sébastien Han ef8d37dd0d Merge pull request #1800 from ceph/wip-osd-start-fix
ceph-osd: Fix osd start sequence
2017-09-13 17:20:10 -06:00