Commit Graph

4269 Commits (c32d690a4cd1aa60987b18f36731696d12615c0a)
 

Author SHA1 Message Date
Guillaume Abrioux c32d690a4c mgr: add a check task for all mgr to be up
This can't be backported from master since there was too many
modifications meantime.

When mgr aren't all ready, sometimes the following error can show up:

```
stderr: 'Error ENOENT: all mgr daemons do not support module ''status'', pass --force to force enablement'
```

This commit adds a check so all mgr are available when we try to enable
modules.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-08-22 17:11:19 +02:00
Guillaume Abrioux 12e61d190e validate: fail if gpt header found on unprepared devices
ceph-volume will complain if gpt headers are found on devices.
This commit checks whether a gpt header is present on devices passed in
`devices` variable and fail early.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1730541

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 487d701685)
2019-08-22 16:59:34 +02:00
Guillaume Abrioux b84d2cdd31 release-note: add two deprecations warning and removal
In `stable-4.0`, the group name `iscsi-gws` will go away and some rgw
systemd service names will disappear as well:

(`ceph-rgw@<hostname>.service`, `ceph-radosgw@<hostname>.service`,
`ceph-radosgw@radosgw.<hostname>.service`,
`ceph-radosgw@radosgw.gateway.service`)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-08-20 11:53:42 +02:00
Guillaume Abrioux af1f41bbd9 validate: do not validate devices or lvm_volumes in osd_auto_discovery case
we shouldn't validate these two variables when `osd_auto_discovery` is
set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1644623

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 243edfbc96)
2019-08-20 11:02:38 +02:00
Guillaume Abrioux 787a6e879e update: use ids to restart osds instead of device name
we must use the ids instead of device names in the tasks executed in
`post_tasks` for the osd rolling update otherwise it ends up with old
systemd units enabled.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1739209

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-08-13 13:42:58 +02:00
Guillaume Abrioux 81906344ee osd: copy systemd-device-to-id.sh on all osd nodes before running it
Otherwise it will fail when running rolling_update.yml playbook because
of `serial: 1` usage.
The task which copies the script is run against the current node being
played only whereas the task which runs the script is run against all
nodes in a loop, it ends up with the typical error:

```
2019-08-08 17:47:05,115 p=14905 u=ubuntu |  failed: [magna023 -> magna030] (item=magna030) => {
    "changed": true,
    "cmd": [
        "/usr/bin/env",
        "bash",
        "/tmp/systemd-device-to-id.sh"
    ],
    "delta": "0:00:00.004339",
    "end": "2019-08-08 17:46:59.059670",
    "invocation": {
        "module_args": {
            "_raw_params": "/usr/bin/env bash /tmp/systemd-device-to-id.sh",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "warn": true
        }
    },
    "item": "magna030",
    "msg": "non-zero return code",
    "rc": 127,
    "start": "2019-08-08 17:46:59.055331",
    "stderr": "bash: /tmp/systemd-device-to-id.sh: No such file or directory",
    "stderr_lines": [
        "bash: /tmp/systemd-device-to-id.sh: No such file or directory"
    ],
    "stdout": "",
    "stdout_lines": []
}
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1739209

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-08-12 21:57:29 +02:00
Guillaume Abrioux 5b29144bbd tests: deploy mgr on a dedicated node (all_daemons scenario)
let's deploy mgr on a dedicated node.
This makes update job failing on stable-4.0 branch since there's a
mismatch between the two inventories.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-08-08 13:43:29 +02:00
Guillaume Abrioux a4f4dd7535 osd: add 'osd blacklist' cap for osp keyrings
This commits adds the `osd blacklist` cap on all OSP clients keyrings.

Fixes: #2296

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2d955757ee)
2019-08-07 10:43:04 +02:00
Dimitri Savineau 343eec7a53 shrink-osd: Stop ceph-disk container based on ID
Since bedc0ab we now manage ceph-osd systemd unit scripts based on ID
instead of device name but it was not present in the shrink-osd
playbook (ceph-disk version).
To keep backward compatibility on deployment that didn't do yet the
transition on OSD id then we should stop unit scripts for both device
and ID.
This commit adds the ulimit nofile container option to get better
performance on ceph-disk commands.
It also fixes an issue when the OSD id matches multiple OSD ids with
the same first digit.

$ ceph-disk list | grep osd.1
 /dev/sdb1 ceph data, prepared, cluster ceph, osd.1, block /dev/sdb2
 /dev/sdg1 ceph data, prepared, cluster ceph, osd.12, block /dev/sdg2

Finally removing the shrinked OSD directory.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-08-06 09:38:52 +02:00
Dimitri Savineau d12e6e626d rgw: add beast frontend
Allow to configure the rgw beast frontend in addition to civetweb
(default value).
Add rgw_thread_pool_size variable with 512 as default value and keep
backward compatibility with num_threads option when using civetweb.
Update radosgw_civetweb_num_threads to reflect rgw_thread_pool_size
change.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1733406

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d17b1b48b6)
2019-08-01 10:10:09 +02:00
Dimitri Savineau 4dffcfb429 ceph-osd: check container engine rc for pools
When creating OpenStack pools, we only check if the return code from
the pool list command isn't 0 (ie: if it doesn't exist). In that case,
the return code will be 2. That's why the next condition is rc != 0 for
the pool creation.
But in containerized deployment, the return code could be different if
there's a failure on the container engine command (like container not
running). In that case, the return code could but either 1 (docker) or
125 (podman) so we should fail at this point and not in the next tasks.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1732157

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d549fffdd2)
2019-07-31 14:08:22 -04:00
Dimitri Savineau bf8bd4c0f1 tests: Update ooo-collocation scenario
The ooo-collocation scenario was still using an old container image and
doesn't match the requirement on latest stable-3.2 code. We need to use
at least the container image v3.2.5.
Also updating the OSD tests to reflect the changes introduced by the
commit bedc0ab because we don't have the OSD systemd unit script using
device name anymore.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-07-30 08:27:13 +02:00
Dimitri Savineau 5463d730ee Remove NBSP characters
Some NBSP are still present in the yaml files.
Adding a test in travis CI.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 07c6695d16)
2019-07-26 16:23:38 -04:00
Dimitri Savineau bedc0ab69d ceph-osd: use OSD id with systemd ceph-disk
When using containerized deployment we have to create the systemd
service unit based on a template.
The current implementation with ceph-disk is using the device name
as paramater to the systemd service and for the container name too.

$ systemctl start ceph-osd@sdb
$ docker ps --filter 'name=ceph-osd-*'
CONTAINER ID IMAGE                        NAMES
065530d0a27f ceph/daemon:latest-luminous  ceph-osd-strg0-sdb

This is the only scenario (compared to non containerized or
ceph-volume based deployment) that isn't using the OSD id.

$ systemctl start ceph-osd@0
$ docker ps --filter 'name=ceph-osd-*'
CONTAINER ID IMAGE                        NAMES
d34552ec157e ceph/daemon:latest-luminous  ceph-osd-0

Also if the device mapping doesn't persist to system reboot (ie sdb
might be remapped to sde) then the OSD service won't come back after
the reboot.

This patch allows to use the OSD id with the ceph-osd systemd service
but requires to activate the OSD manually with ceph-disk first in
order to affect the ID to that OSD.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1670734

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-07-26 16:07:22 -04:00
Dimitri Savineau df46d10c27 ceph-infra: update handler with daemon variable
Both ntp and chrony daemon use variable for the service name because it
could be different depending on the GNU/Linux distribution.
This has been update in 9d88d3199 for chrony but only for the start part
not for the handler.
The commit fixes this for both ntp and chrony.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0ae0193144)
2019-07-12 10:50:04 -04:00
Ramana Raja 9097f9847c Install nfs-ganesha stable v2.7
nfs-ganesha v2.5 and 2.6 have hit EOL. Install nfs-ganesha v2.7
stable that is currently being maintained.

Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit dfff89ce67)
2019-07-10 22:09:14 +02:00
Guillaume Abrioux 1716eea5e3 validate: improve message printed in check_devices.yml
The message prints the whole content of the registered variable in the
playbook, this is not needed and makes the message pretty unclear and
unreadable.

```
"msg": "{'_ansible_parsed': True, 'changed': False, '_ansible_no_log': False, u'err': u'Error: Could not stat device /dev/sdf - No such file or directory.\\n', 'item': u'/dev/sdf', '_ansible_item_result': True, u'failed': False, '_ansible_item_label': u'/dev/sdf', u'msg': u\"Error while getting device information with parted script: '/sbin/parted -s -m /dev/sdf -- unit 'MiB' print'\", u'rc': 1, u'invocation': {u'module_args': {u'part_start': u'0%', u'part_end': u'100%', u'name': None, u'align': u'optimal', u'number': None, u'label': u'msdos', u'state': u'info', u'part_type': u'primary', u'flags': None, u'device': u'/dev/sdf', u'unit': u'MiB'}}, 'failed_when_result': False, '_ansible_ignore_errors': None, u'out': u''} is not a block special file!"
```

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1719023

(cherry picked from commit e6dc3ebd8c)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-07-10 09:37:12 -04:00
Guillaume Abrioux d739f41549 shrink-osd: (ceph-disk only) remove prepare container
When shrinking an OSD, its corresponding 'prepare container' should be
removed otherwise it prevent from redeploying a new osd because of this
leftover.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-07-09 09:04:19 -04:00
Guillaume Abrioux 4b49013369 shrink-osd: (ceph-disk only) remove gpt header
Removing the gpt header on devices will ease ceph-disk to ceph-volume
migration when using shrink-osd + add-osd playbooks.
ceph-disk requires GPT header where ceph-volume will complain if GPT
header is present.
That won't break ceph-disk (re)deployment since we check and add the GPT
header if needed when deploying ceph-disk ODs.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613735

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-07-09 09:04:19 -04:00
Dimitri Savineau 94cdef2757 ceph-handler: Fix rgw socket in restart script
If the SOCKET variable isn't defined in the script then the test
command won't fail because the return code is 0

$ test -S
$ echo $?
0

There multiple issues in that script:
  - The default SOCKET value isn't defined.
  - Update the wget parameters because the command is doing a loop.
We now use the same option than curl.
  - The check_rest function doesn't test the radosgw at all due to
a wrong test command (test against a string) and always returns 0.
This needs to use the DOCKER_EXEC variable in order to execute the
command.

$ test 'wget http://192.168.100.11:8080'
$ echo $?
0

Resolves: #3926

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c90f605b51)
2019-07-08 10:38:35 -04:00
Dimitri Savineau 9cc5d1e903 ceph-handler: Fix radosgw_address default value
The rgw restart script set the RGW_IP variable depending on ansible
variables:
  - radosgw_address
  - radosgw_address_block
  - radosgw_interface

Those variables have default values defined in ceph-defaults role:

radosgw_interface: interface
radosgw_address: 0.0.0.0
radosgw_address_block: subnet

But in the rgw restart script we always use the radosgw_address value
instead of the radosgw_interface when defined because we aren't testing
the right default value.
As a consequence, the RGW_IP variable will be set to 0.0.0.0 even if
the ip address associated to the radosgw_interface variable is set
correctly. This causes the check_rest function to fail.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-07-07 07:24:38 +02:00
Gabriel Ramirez 9c31811c33 validate.py: Fix alphabetical order on uca
Alphabetized ceph_repository_uca keys due to errors validating when
using UCA/queens repository on Ubuntu 16.04

An exception occurred during task execution. To see the full
traceback, use -vvv. The error was:
SchemaError: -> ceph_stable_repo_uca  schema item is not
alphabetically ordered

Closes: #4154

Signed-off-by: Gabriel Ramirez <gabrielramirez1109@gmail.com>
(cherry picked from commit 82262c6e8c)
2019-06-25 11:36:04 -04:00
Dimitri Savineau 14f2d616ee ceph-nfs: use template module for configuration
789cef7 introduces a regression in the ganesha configuration file
generation. The new config_template module version broke it.
But the ganesha.conf file isn't an ini file and doesn't really
need to use the config_template module. Instead we can use the
classic template module.

Resolves: #4045

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 616c484698)
2019-06-24 20:47:25 +02:00
Guillaume Abrioux 8b91905dff purge: ensure no ceph kernel thread is present
This tries to first unmount any cephfs/nfs-ganesha mount point on client
nodes, then unmap any mapped rbd devices and finally it tries to remove
ceph kernel modules.
If it fails it means some resources are still busy and should be cleaned
manually before continuing to purge the cluster.
This is done early in the playbook so the cluster stays untouched until
everything is ready for that operation, otherwise if you try to redeploy
a cluster it could end up by getting confused by leftover from previous
deployment.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1337915

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 20e4852888)
2019-06-24 15:36:21 +02:00
Guillaume Abrioux 520f4e9914 add-osd: fix error in validate execution role
ceph-facts should be run before we play ceph-validate since it has
reference to facts that are set in ceph-facts role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-06-24 14:36:18 +02:00
Guillaume Abrioux 27aad73471 tests: add nfs-ganesha testing
This was removed because of broken repositories which made the CI
failing. That doesn't make sense anymore so adding back it

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-06-24 10:35:42 +02:00
Dimitri Savineau d08af0a654 ceph-disk: Set max open files limit on container
Same behaviour than ceph-volume (b987534). The ceph-disk command runs
faster when using ulimit nofile with container cli.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-06-24 10:06:11 +02:00
Dimitri Savineau 2b492e3de1 ceph-handler: Fix OSD restart script
There's two big issues with the current OSD restart script.

1/ We try to test if the ceph osd daemon socket exists but we use a
wildcard for the socket name : /var/run/ceph/*.asok.
This fails because we usually have multiple ceph osd sockets (or
other ceph daemon collocated) present in /var/run/ceph directory.
Currently the test fails with:

bash: line xxx: [: too many arguments

But it doesn't stop the script execution.
Instead we can specify the full ceph osd socket name because we
already know the OSD id.

2/ The container filter pattern is wrong and could matches multiple
containers resulting the script to fail.
We use the filter with two different patterns. One is with the device
name (sda, sdb, ..) and the other one is with the OSD id (ceph-osd-0,
ceph-osd-15, ..).
In both case we could match more than needed.

$ docker container ls
CONTAINER ID IMAGE              NAMES
958121a7cc7d ceph-daemon:latest ceph-osd-strg0-sda
589a982d43b5 ceph-daemon:latest ceph-osd-strg0-sdb
46c7240d71f3 ceph-daemon:latest ceph-osd-strg0-sdaa
877985ec3aca ceph-daemon:latest ceph-osd-strg0-sdab
$ docker container ls -q -f "name=sda"
958121a7cc7d
46c7240d71f3
877985ec3aca

$ docker container ls
CONTAINER ID IMAGE              NAMES
2db399b3ee85 ceph-daemon:latest ceph-osd-5
099dc13f08f1 ceph-daemon:latest ceph-osd-13
5d0c2fe8f121 ceph-daemon:latest ceph-osd-17
d6c7b89db1d1 ceph-daemon:latest ceph-osd-1
$ docker container ls -q -f "name=ceph-osd-1"
099dc13f08f1
5d0c2fe8f121
d6c7b89db1d1

Adding an extra '$' character at the end of the pattern solves the
problem.

Finally removing the get_container_osd_id function because it's not
used in the script at all.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 45d46541cb)
2019-06-21 14:49:55 -04:00
Dimitri Savineau f4212b20e5 ceph-volume: Set max open files limit on container
The ceph-volume lvm list command takes ages to complete when having
a lot of LV devices on containerized deployment.
For instance, with 25 OSDs on a node it takes 3 mins 44s to list the
OSD.
Adding the max open files limit to the container engine cli when
executing the ceph-volume command seems to improve a lot thee
execution time ~30s.

This was impacting the OSDs creation with ceph-volume (both filestore
and bluestore) when using multiple LV devices.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b987534881)
2019-06-20 20:01:13 -04:00
Guillaume Abrioux f29366b848 ceph-osd: do not relabel /run/udev in containerized context
Otherwise content in /run/udev is mislabeled and prevent some services
like NetworkManager from starting.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 80875adba7)
2019-06-19 23:46:46 +02:00
Rishabh Dave 114078bfa1 ceph-infra: make chronyd default NTP daemon
Since timesyncd is not available on RHEL-based OSs, change the default
to chronyd for RHEL-based OSs. Also, chronyd is chrony on Ubuntu, so
set the Ansible fact accordingly.

Fixes: https://github.com/ceph/ceph-ansible/issues/3628
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 9d88d3199f)
2019-06-18 10:46:34 +02:00
Rishabh Dave 93c7d8d79d don't install NTPd on Atomic
Since Atomic doesn't allow any installations and NTPd is not present
on Atomic image we are using, abort when ntp_daemon_type is set to ntpd.

https://github.com/ceph/ceph-ansible/issues/3572
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit bdff3e48fd)
2019-06-18 10:46:34 +02:00
Dimitri Savineau 81de8a8106 remove ceph-agent role and references
The ceph-agent role was used only for RHCS 2 (jewel) so it's not
usefull anymore.
The current code will fail on CentOS distribution because the rhscon
package is only avaible on Red Hat with the RHCS 2 repository and
this ceph release is supported on stable-3.0 branch.

Resolves: #4020

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7503098ca0)
2019-06-17 14:42:08 -04:00
Dimitri Savineau ed9b594b80 tests: Update ansible ssh_args variable
Because we're using vagrant, a ssh config file will be created for
each nodes with options like user, host, port, identity, etc...
But via tox we're override ANSIBLE_SSH_ARGS to use this file. This
remove the default value set in ansible.cfg.

Also adding PreferredAuthentications=publickey because CentOS/RHEL
servers are configured with GSSAPIAuthenticationis enabled for ssh
server forcing the client to make a PTR DNS query.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 34f9d51178)
2019-06-17 12:02:36 -04:00
Guillaume Abrioux 64659d2c82 iscsi: assign application (rbd) to pool 'rbd'
if we don't assign the rbd application tag on this pool,
the cluster will get `HEALTH_WARN` state like following:

```
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
    application not enabled on pool 'rbd'
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4cf17a6fdd)
2019-06-13 14:43:25 +02:00
Dimitri Savineau 95f3908e44 ceph-handler: replace fuser by /proc/net/unix
We're using fuser command to see if a process is using a ceph unix
socket file. But the fuser command runs through every PID present in
/proc/<PID> to see if one of them is using the file.
On a system running thousands processes, the fuser command can take
a long time to finish.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1717011

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit da9891da1e)
2019-06-12 23:00:21 +02:00
Guillaume Abrioux db90debcc7 validate: fail in check_devices at the right task
see https://bugzilla.redhat.com/show_bug.cgi?id=1648168#c17 for details.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1648168#c17

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 771648304d)
2019-06-10 08:09:58 +02:00
Guillaume Abrioux 62647e1935 spec: bring back possibility to install ceph with custom repo
This can be seen as a regression for customers who were used to deploy
in offline environment with custom repositories.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1673254

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c933645bf7)
2019-06-07 17:29:57 +02:00
Dimitri Savineau 0b653ee5b4 update default rhcs values and docs
The RHCS documentation mentionned in the default values and
group_vars directory are referring to RHCS 2.x while it should be
3.x.

Revolves: https://bugzilla.redhat.com/show_bug.cgi?id=1702732

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-06-04 14:18:23 +02:00
Dimitri Savineau b5fdf5fdcb vagrant: Default box to centos/7
We don't use ceph/ubuntu-xenial anymore but only centos/7 and
centos/atomic-host.
Changing the default to centos/7.

Resolves: #4036

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 24d0fd7003)
2019-05-31 13:57:55 -04:00
Dimitri Savineau 8a74928a19 tox: Refact lvm_osds scenario
The current lvm_osds only tests filestore on one OSD node.
We also have bs_lvm_osds to test bluestore and encryption.
Let's use only one scenario to test filestore/bluestore and with or
without dmcrypt on four OSD nodes.
Also use validate_dmcrypt_bool_value instead of types.boolean on
dmcrypt validation via notario.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 52b9f3fb28)
2019-05-10 11:24:32 +02:00
Mike Christie 0a24078bbb igw: Fix rolling update service ordering
We must stop tcmu-runner after the other rbd-target-* services
because they may need to interact with tcmu-runner during shutdown.
There is also a bug in some kernels where IO can get stuck in the
kernel and by stopping rbd-target-* first we can make sure all IO is
flushed.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1659611

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit d7ef12910e)
2019-05-10 11:12:50 +02:00
Guillaume Abrioux 900244e065 Revert "Revert "cv: support zap by osd fsid""
This reverts commit addcc1e61a.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-10 09:13:10 +02:00
Guillaume Abrioux f1b4874176 Revert "Revert "shrink_osd: use cv zap by fsid to remove parts/lvs""
This reverts commit 043ee8c158.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-10 09:13:10 +02:00
Guillaume Abrioux 5053f32c15 osds: allow passing devices by path
ceph-volume didn't work when the devices where passed by path.
Since it now support it, let's allow this feature in ceph-ansible

Closes: #3812

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8f2c45dfd3)
2019-05-09 14:21:43 +02:00
Guillaume Abrioux addcc1e61a Revert "cv: support zap by osd fsid"
This reverts commit 8454f0144a.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-25 21:27:37 +02:00
Guillaume Abrioux 043ee8c158 Revert "shrink_osd: use cv zap by fsid to remove parts/lvs"
This reverts commit be59e0b451.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-25 21:27:37 +02:00
Dimitri Savineau 2fa8099fa7 osd: set default bluestore_wal_devices empty
We only need to set the wal dedicated device when there's three tiers
of storage used.
Currently the block.wal partition will also be created on the same
device than block.db.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1685253

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-25 07:13:38 +00:00
Dimitri Savineau 9ff19cc604 rolling_update: restart all ceph-iscsi services
Currently only rbd-target-gw service is restarted during an update.
We also need to restart tcmu-runner and rbd-target-api services
during the ceph iscsi upgrade.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1659611

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f1048627ea)
2019-04-24 23:17:41 +00:00
Dimitri Savineau 7418999638 ceph-mds: Increase cpu limit to 4
In containerized deployment the default mds cpu quota is too low
for production environment.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695850

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1999cf3d19)
2019-04-24 21:44:23 +00:00