Commit Graph

4353 Commits (cf748e729fbe4b8d22a14624822fc18354b28cd2)
 

Author SHA1 Message Date
Guillaume Abrioux cf748e729f update: remove legacy tasks
These tasks should have been removed with backport #4756

Note:
This should have been backported from master but it's not possible
because of too many changes between master and stable-3.2

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1740463

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-29 09:25:15 -05:00
Dimitri Savineau 13e0f7d341 ceph-defaults: remove rgw from ceph_conf_overrides
The [rgw] section, whether in the ceph.conf file or via the
ceph_conf_overrides variable, isn't a valid section and has no effect.
To apply overrides to all radosgw instances we should use either the
[global] or [client] section.
Overrides per radosgw instance should still use the
[client.rgw.{instance-name}] section.
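
For illustration, a hedged sketch of what such overrides could look
like (the instance name and option values are hypothetical, not taken
from this commit):

```
ceph_conf_overrides:
  # applies to all radosgw instances
  client:
    rgw_dynamic_resharding: false
  # applies to a single instance
  client.rgw.rgw0:
    rgw_frontends: "civetweb port=8080"
```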

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794552

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2f07b85131)
2020-01-29 14:19:17 +01:00
Guillaume Abrioux 726b3f220b defaults: change monitor|radosgw_address default values
To avoid confusion, let's change the default value from `0.0.0.0` to
`x.x.x.x`.
Users might think setting `0.0.0.0` will make the daemon bind on all
interfaces.

Fixes: #4827

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fc02fc98eb)
2020-01-14 17:22:35 +01:00
Dimitri Savineau 071b950325 tox: allow copy admin key for purge scenario
This is enabled in the group_vars/clients file but it's overridden in
extra vars by tox.
Let's do it like that for now.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-13 14:50:29 -05:00
Guillaume Abrioux 01095f1f4c tests: add coverage on purge playbook
This commit adds a playbook to be played before we run the purge
playbook; it first creates an rbd image and maps an rbd device on
client0 so the purge playbook will have to unmap it.
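
A minimal sketch of such a pre-purge playbook, assuming a `clients`
group and illustrative image/pool names:

```
- hosts: clients[0]
  become: true
  tasks:
    - name: create an rbd image
      command: rbd create purge-test --size 128 --pool rbd

    - name: map the rbd image so purge has something to unmap
      command: rbd map purge-test --pool rbd
```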

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit db77fbda15)
2020-01-13 14:50:29 -05:00
Guillaume Abrioux 5db0b239f6 purge: use sysfs to unmap rbd devices
In a containerized context, using the binary provided by the Atomic OS
won't work because it's an old version provided by ceph-common based
on 10.2.5.
Using a container could be an idea, but for large clusters with
hundreds of client nodes that would require pulling the image on each
of them just to unmap the rbd devices.

Let's use the sysfs method in order to avoid any issue related to the
ceph version shipped on the host.
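
A sketch of the sysfs approach (task names are illustrative): the
kernel rbd driver exposes mapped devices under /sys/bus/rbd/devices,
and writing a device id to /sys/bus/rbd/remove unmaps it without
touching any rbd binary on the host.

```
- name: list mapped rbd device ids
  find:
    paths: /sys/bus/rbd/devices
    file_type: any
  register: rbd_device_paths

- name: unmap rbd devices via sysfs
  shell: echo "{{ item.path | basename }}" > /sys/bus/rbd/remove
  loop: "{{ rbd_device_paths.files }}"
```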

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766064

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3cfcc7a105)
2020-01-13 14:50:29 -05:00
Guillaume Abrioux bcd7fee18d update: only run post osd upgrade play on 1 mon
There is no need to run these tasks n times from each monitor.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c878e99589)
2020-01-13 13:42:01 -05:00
Guillaume Abrioux 09f295e89c update: use flags noout and nodeep-scrub only
1. set noout and nodeep-scrub flags,
2. upgrade each OSD node, one by one, wait for active+clean pgs
3. after all osd nodes are upgraded, unset flags
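
A minimal sketch of the flag handling (delegation and group name
follow ceph-ansible conventions; the unset step mirrors it with
`osd unset`):

```
- name: set osd flags before the upgrade
  command: "ceph --cluster {{ cluster }} osd set {{ item }}"
  loop:
    - noout
    - nodeep-scrub
  delegate_to: "{{ groups[mon_group_name][0] }}"
  run_once: true
```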

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Rachana Patel <racpatel@redhat.com>
(cherry picked from commit 548db78b95)
2020-01-13 13:42:01 -05:00
Dimitri Savineau 7ce33f4865 ceph-defaults: exclude rbd devices from discovery
The RBD devices aren't excluded from the devices list in the LVM auto
discovery scenario.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783908

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6f0556f015)
2020-01-13 12:06:35 -05:00
Dimitri Savineau aea4257807 ceph-osd: wait for all osds once
cf8c6a3 moves the 'wait for all osds' task from openstack_config to the
main tasks list.
But the openstack_config code was executed only on the last OSD node.
We don't need to do this check on every OSD node, so we set run_once
to true on that task.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5bd1cf40eb)
2020-01-13 16:54:01 +01:00
Dimitri Savineau 9a42fe580f ceph-osd: wait for all osd before crush rules
When creating crush rules with the device class parameter we need to
be sure that all OSDs are up and running because the device class list
is populated with this information.
This is now enabled for all scenarios, not only openstack_config.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cf8c6a3849)
2020-01-13 16:54:01 +01:00
Dimitri Savineau 8b2659bf6d rolling_update: create crush rule after osd play
When upgrading from jewel to luminous we can execute the crush rule
tasks only once the 'osd require-osd-release luminous' command has
been run.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-13 16:54:01 +01:00
Dimitri Savineau af57597df6 ceph-osd: add device class to crush rules
This adds device class support to crush rules when using the class key
in the rule dict via the create-replicated sub command.
If the class key isn't specified then we use the create-simple sub
command for backward compatibility.
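
For illustration, a rule dict carrying the class key might look like
this (names and values are hypothetical); with `class` present the
role uses `create-replicated`, without it `create-simple`:

```
crush_rules:
  - name: replicated_hdd_rule
    root: default
    type: host
    class: hdd
```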

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ef2cb99f73)
2020-01-13 16:54:01 +01:00
Dimitri Savineau 0ac43d83f4 move crush rule creation from mon to osd role
If we want to create crush rules with the create-replicated sub command
and device class then we need to have the OSDs created before the
crush rules, otherwise the device classes won't exist.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ed36a11eab)
2020-01-13 16:54:01 +01:00
Dimitri Savineau 255be99bc5 ceph-validate: add rbdmirror validation
When ceph_rbd_mirror_configure is set to true we need to ensure that
the required variables aren't empty.
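
A hedged sketch of the kind of check this adds (the variable names are
assumptions, not taken from the commit):

```
- name: fail on empty rbd mirror variables
  fail:
    msg: "{{ item }} must be set when ceph_rbd_mirror_configure is true"
  loop:
    - ceph_rbd_mirror_pool
    - ceph_rbd_mirror_remote_cluster
  when:
    - ceph_rbd_mirror_configure | bool
    - not (vars[item] | default(''))
```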

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1760553

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4a065cebd7)
2020-01-13 16:53:32 +01:00
Dimitri Savineau 2436044369 switch_to_containers: set GUID on lockbox part
The ceph lockbox partition (partition number 5) used with non-lvm
scenarios in non-containerized deployments doesn't have a valid
PARTUUID.
The value is set to 00000000-0000-0000-0000-000000000000 for each OSD
device.

$ blkid -t PARTLABEL="ceph lockbox" -o value -s PARTUUID
00000000-0000-0000-0000-000000000000
00000000-0000-0000-0000-000000000000
00000000-0000-0000-0000-000000000000
00000000-0000-0000-0000-000000000000
00000000-0000-0000-0000-000000000000

When switching to a containerized deployment we manually mount the
lockbox partition by using the PARTUUID.
Unfortunately, because we usually have multiple OSDs on the same node,
we can't have the right symlink in /dev/disk/by-partuuid because it
will point to only one partition.

/dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000 -> ../../sdb5

After the switch_to_containers playbook only one OSD will restart
correctly and the others will try to access the wrong device, causing
errors like 'xxxx is still in use'.

When deploying with containers and dmcrypt OSDs we force a PARTUUID
value during the ceph-disk prepare task.
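
A sketch of forcing a unique PARTUUID on the lockbox partition
(sgdisk's `R` placeholder generates a random GUID; the device loop is
illustrative):

```
- name: set a random PARTUUID on the ceph lockbox partition
  command: "sgdisk --partition-guid=5:R {{ item }}"
  loop: "{{ devices }}"
```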

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-13 16:52:55 +01:00
Dimitri Savineau 58ffae3117 ceph-mds: allow directory fragmentation
We need to explicitly enable the allow_dirfrags flag on the cephfs
filesystem after upgrading to Luminous.
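
A sketch of the corresponding task, assuming ceph-ansible's usual
`docker_exec_cmd`, `cluster` and `cephfs` variables:

```
- name: allow directory fragmentation on the cephfs filesystem
  command: "{{ docker_exec_cmd | default('') }} ceph --cluster {{ cluster }} fs set {{ cephfs }} allow_dirfrags true"
  run_once: true
```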

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1776233

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-13 16:52:11 +01:00
Guillaume Abrioux 881056fa9d facts: avoid duplicated element in devices list
When using `osd_auto_discovery`, `devices` is built multiple times due
to multiple runs of the `ceph-facts` role. It ends up with duplicate
instances of the same device in the list.

Using the `unique` filter when building the list fixes this issue.
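
A sketch of the fix (the surrounding set_fact shape is illustrative):
piping the accumulated list through `unique` makes repeated role runs
idempotent.

```
- name: resolve devices
  set_fact:
    devices: "{{ (devices | default([]) + ['/dev/' + item.key]) | unique }}"
  with_dict: "{{ ansible_devices }}"
```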

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 23b1f43897)
2020-01-13 15:47:02 +01:00
Guillaume Abrioux 195c49eaa9 tests: add shrink-osd-legacy testing
This commit reintroduces testing against ceph-disk deployed osds.

In stable-3.2, which is the most common version used by customers
(downstream pov), a bunch of OSDs are still deployed using ceph-disk.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-09 09:24:22 +01:00
Guillaume Abrioux ca728dcd70 shrink-osd: support fqdn in inventory
When using fqdns in the inventory, that playbook fails because some
tasks use the result of ceph osd tree (which returns shortnames) to
look up data in hostvars[].

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779021

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6d9ca6b05b)
2020-01-09 09:24:22 +01:00
Dimitri Savineau 193ce4f572 ceph-iscsi: add ceph-iscsi stable repositories
This commit adds support for the ceph-iscsi stable repository when
using ceph_repository: community, instead of always using the devel
repositories.
We're still using the devel repositories for rtslib and tcmu-runner in
both cases (dev and community).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-08 17:47:52 +01:00
Guillaume Abrioux d606ad0bac ansible.cfg: do not enforce PreferredAuthentications
There's no need to enforce PreferredAuthentications by default.
Users can still choose to override the ansible.cfg with any additional
parameter like this one to fit their infrastructure.

Fixes: #4826

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d682412e2a)
2019-12-11 08:56:05 -05:00
Dimitri Savineau 56a7537f48 ceph-osd: update systemd unit script
The systemd unit script wasn't updated with the new container name
format (without the hostname).
We now have the same start/stop docker commands for all scenarios.
During the device to id OSD migration we need to be sure that the
old containers with the hostname are stopped.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1780688

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-12-10 23:59:13 +01:00
Dimitri Savineau c409d6e960 tests: add lvm-auto-discovery scenario
This adds the lvm-auto-discovery scenario.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-12-09 09:32:55 +01:00
Dimitri Savineau 9f9b952473 ceph-defaults: exclude md devices from discovery
The md devices (RAID software) aren't excluded from the devices list in
the auto discovery scenario.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1764601

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 014f51c2a4)
2019-12-09 09:32:55 +01:00
Guillaume Abrioux 4f6925890c facts: fix auto_discovery exclude
The previous approach was wrong.
Checking if `item.key` is in `osd_auto_discovery_exclude` (`['dm-',
'loop']`) is incorrect because it will obviously never match. Therefore,
the condition will return `True` whatever device we are checking.
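
A sketch of a working test (the set_fact shape is illustrative): match
the device name against each excluded prefix instead of testing list
membership.

```
- name: add discovered device
  set_fact:
    devices: "{{ devices | default([]) + ['/dev/' + item.key] }}"
  with_dict: "{{ ansible_devices }}"
  when: not (item.key | regex_search('^(' ~ osd_auto_discovery_exclude | join('|') ~ ')'))
```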

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8f42007272)
2019-12-09 09:32:55 +01:00
Guillaume Abrioux f6fea33b40 osd: add possibility to exclude device in osd_auto_discovery
Add a new `osd_auto_discovery_exclude` variable to give the
possibility of excluding some devices in the auto_discovery scenario.
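
Example use, with the exclusions mentioned elsewhere in this branch
(device-mapper and loop devices):

```
osd_auto_discovery: true
osd_auto_discovery_exclude:
  - dm-
  - loop
```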

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 83d7ef777e)
2019-12-09 09:32:55 +01:00
Andrew Schoen 690860affc ceph-facts: generate devices when osd_auto_discovery is true
This task used to live in ceph-osd, but we need it defined here so
that ceph-config can use it when trying to determine the number of
osds.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 88eda479a9)
2019-12-09 09:32:55 +01:00
Dimitri Savineau 825429658b tests: reduce max_mds from 3 to 2
Having the max_mds value equal to the number of mds nodes generates a
warning in the ceph cluster status:

cluster:
  id:     6d3e49a4-ab4d-4e03-a7d6-58913b8ec00a
  health: HEALTH_WARN
          insufficient standby MDS daemons available
(...)
services:
  mds:     cephfs:3 {0=mds1=up:active,1=mds0=up:active,2=mds2=up:active}

Let's use 2 active and 1 standby mds.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4a6d19dae2)
2019-12-04 17:57:33 -05:00
Dimitri Savineau b08ac9cd44 switch_to_containers: fix umount ceph partitions
When a container is already running on a non containerized node then the
umount ceph partition task is skipped.
This is due to the container ps command which always returns 0 even if
the filter matches nothing.

We should run the umount task when:
1/ the container command is failing (not installed) : rc != 0
2/ the container command reports running ceph-osd containers : rc == 0

Also we should not fail on the ceph directory listing.
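
A sketch of the resulting condition (the registered variable and the
mountpoint list are assumptions):

```
- name: umount ceph osd partitions
  mount:
    path: "{{ item }}"
    state: unmounted
  loop: "{{ ceph_osd_mountpoints }}"
  when: container_ps.rc != 0 or
        (container_ps.rc == 0 and container_ps.stdout | length > 0)
```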

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 39cfe0aa65)
2019-12-03 15:58:57 +01:00
Guillaume Abrioux cbfa01f697 tests: fix update scenario (container)
The path to the inventory isn't correct because we are missing the variable
`CONTAINER_DIR` here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-02 10:48:56 -05:00
Guillaume Abrioux 4d004bd5f6 tests: revert vagrant_variable file name detection
This commit reverts the following change:

fcf181342a (diff-23b6f443c01ea2efcb4f36eedfea9089R7-R14)

This change is causing CI failures, so this commit is intended to
unlock the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5353ab8a23)
2019-11-25 15:33:10 +01:00
Dimitri Savineau cb0926262d rolling_update: don't enable ceph-mon unit
On non-containerized deployments the ceph-mon hostname/fqdn systemd
services are stopped at the beginning of the mon upgrade.
But the enabled parameter is set to true for both tasks, so even if
we're not using the fqdn it will enable the systemd unit based on it.
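
Per the description, a sketch of the corrected task: stop the
fqdn-based unit without enabling it as a side effect.

```
- name: stop ceph mon systemd unit (fqdn)
  systemd:
    name: "ceph-mon@{{ ansible_fqdn }}"
    state: stopped
  failed_when: false
```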

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1649617

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-11-21 09:19:43 +01:00
Dimitri Savineau 25ac0efddd container: add always tag on gather fact tasks
If we execute the site-container.yml playbook with specific tags (like
ceph_update_config) then we need to be sure to gather the facts,
otherwise we will see errors like:

The task includes an option with an undefined variable. The error was:
'ansible_hostname' is undefined

This commit also adds missing 'gather_facts: false' to mons plays.
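
A sketch of such a gather-facts task with the always tag, so it runs
regardless of the tags passed on the command line:

```
- name: gather facts
  setup:
  tags: always
```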

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1754432

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d7fd769b6d)
2019-11-18 16:42:47 +01:00
VasishtaShastry c67de5a342 Evades validation of ceph_repository_type in containerized scenario
This will prevent site-docker.yml from failing with the configs from
the docs.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769760

Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
Co-Authored-By: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9a1f1626c3)
2019-11-18 16:41:34 +01:00
Guillaume Abrioux c6bc4c4976 ceph_key: restore file mode after a key is fetched
When `import_key` is enabled and the key already exists, it is only
fetched using the ceph cli. If the mode specified in the `ceph_key`
task differs from what the ceph cli applies, the mode isn't restored
because we don't call `module.set_fs_attributes_if_different()` before
`module.exit_json(**result)`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1734513

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b717b5f736)
2019-11-15 06:11:11 +01:00
Noah Watkins 146d144045 Remove outdated documentation
Fixes BZ
https://bugzilla.redhat.com/show_bug.cgi?id=1640525

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
2019-11-13 16:04:55 +01:00
Guillaume Abrioux 4b1a810906 mergify: remove mergify config on stable-3.2
This commit removes the mergify config on stable-3.2

At the moment there is no need to have a mergify config on this branch
given that we don't use it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-11-07 15:22:29 -05:00
Dimitri Savineau b47f7763fc ceph-osd: fix fs.aio-max-nr sysctl condition
[1] introduced a regression on the fs.aio-max-nr sysctl value condition.
The enable key isn't a boolean but a string because the expression
isn't evaluated.
The string "(osd_objectstore == 'bluestore')" is always truthy because
the item.enable condition only checks for a non-empty string, so the
sysctl value was applied for both the filestore and bluestore backends.

[2] added the bool filter to the condition, but the filter always
returns false on a string, so the sysctl wasn't applied at all.

This commit fixes the enable key by evaluating the expression instead
of using the literal string.

[1] https://github.com/ceph/ceph-ansible/commit/08a2b58
[2] https://github.com/ceph/ceph-ansible/commit/ab54fe2
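
A sketch of the fixed entry (the value shown is illustrative):
templating the expression makes enable an evaluated boolean rather
than a literal string.

```
os_tuning_params:
  - name: fs.aio-max-nr
    value: 1048576
    enable: "{{ osd_objectstore == 'bluestore' }}"
```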

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ece46d33be)
2019-11-07 20:38:33 +01:00
Harald Jensås e8ed6655f3 Support comma-delimited subnets in firewall
ceph.conf supports a comma-separated list of
subnet CIDRs for the public_network and the
cluster network. ceph-ansible should support
setting up the firewall for this configuration.
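
A sketch of the firewall side (zone name assumed): split the
comma-delimited public_network and apply the source rule once per
subnet.

```
- name: add public_network subnets to the firewall
  firewalld:
    zone: public
    source: "{{ item }}"
    permanent: true
    immediate: true
    state: enabled
  loop: "{{ public_network.split(',') | map('trim') | list }}"
```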

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1767392
Closes: #4425
Related: #4333
https://docs.ceph.com/docs/nautilus/rados/configuration/network-config-ref/#network-config-settings

Signed-off-by: Harald Jensås <hjensas@redhat.com>
(cherry picked from commit d94229204d)
2019-11-01 11:00:18 -04:00
Dimitri Savineau dd4a4cbb66 ceph-infra: Remove restart firewalld handler
There's no need to restart the firewalld service when a new rule is
added, thanks to the usage of the immediate flag.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b7338d438a)
2019-11-01 11:00:18 -04:00
Dimitri Savineau 4cd53bfbe5 ceph-osd: Remove ulimit nofile on container start
Even if this improves ceph-disk/ceph-volume performance, it also
impacts the ceph-osd process.
The ceph-osd process shouldn't use the 1024:4096 value for the max
open files.
Remove the ulimit option from the container engine and do this kind of
change on the container side [1].

[1] https://github.com/ceph/ceph-container/pull/1497

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9a996aef7f)
2019-10-31 14:42:41 -04:00
Guillaume Abrioux a5a231b0b6 update: add default values when setting fact
This commit adds a default value in the with_dict because, when using
python 2.7, if a task using a with_dict has a condition, the with_dict
is evaluated anyway, whereas in python 3 it isn't.
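
A hedged sketch of the pattern (the variable names are assumptions):

```
- name: set mon host fact
  set_fact:
    mon_hosts: "{{ mon_hosts | default([]) + [item.value] }}"
  with_dict: "{{ monitor_addresses | default({}) }}"
  when: inventory_hostname in groups.get('mons', [])
```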

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766499

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-29 16:00:39 -04:00
Dimitri Savineau 8acb42dc61 rolling_update: remove default filter on mds group
There's no need to use the default filter on active/standby groups
because if the group doesn't exist then the play is just skipped.

Currently this generates warnings like:

[WARNING]: Could not match supplied host pattern, ignoring: |
[WARNING]: Could not match supplied host pattern, ignoring: default([])

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2ca79fcc99)
2019-10-28 13:08:43 -04:00
Dimitri Savineau bd79b4480a rolling_update: fix active mds host value
The active mds host should be based on the inventory hostname and not
on the ansible hostname.
The value returned under the mdsmap structure is based on the OS
hostname, so we need to find the right node in the inventory with this
value when doing operations on inventory nodes.

Otherwise we could see errors like:

The task includes an option with an undefined variable. The error was:
"hostvars[foobar]" is undefined

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f1f2352c79)
2019-10-28 13:08:43 -04:00
Guillaume Abrioux 4b667b2f37 update: skip mds deactivation when no mds in inventory
Let's skip this part of the code if there's no mds node in the
inventory.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ec906c3af)
2019-10-25 08:57:47 -04:00
Dimitri Savineau f3fc97caa0 openstack_config: fix docker exec command
container_exec_cmd should be replaced by docker_exec_cmd.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765110

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-24 14:13:52 -04:00
Guillaume Abrioux 1884506189 update: follow new recommendation to upgrade mds cluster
Refactor the mds cluster upgrade code in order to follow the
documented recommendation.
See: https://github.com/ceph/ceph/blob/luminous/doc/cephfs/upgrading.rst

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1569689

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 71cebf80a6)
2019-10-21 15:44:38 -04:00
Dimitri Savineau 52bba29a7f tests: fix the size on the second data LV
The commit replaces the pv/vg/lv commands used with the ansible command
module by the lvg and lvol modules.
This also fixes the size of the second data LV because we were only using
50% of the remaining space instead of 100%.

With a 50G device, the result was:
  - data-lv1 was 25G
  - data-lv2 was 12.5G
Instead of:
  - data-lv1 was 25G
  - data-lv2 was 25G
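
A sketch of the lvol-based sizing (vg/lv names are illustrative):
`100%FREE` takes all remaining space instead of half of it.

```
- name: create the second data LV
  lvol:
    vg: test_group
    lv: data-lv2
    size: 100%FREE
```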

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2c03c6fcd3)
2019-10-18 14:49:57 -04:00
Guillaume Abrioux 8dc40711bb common: do not override ceph_release when using custom repo
Otherwise it fails like the following:

```
TASK [ceph-mds : allow multimds] **************************************************************************************************************************************************
Monday 22 July 2019  16:37:38 +0800 (0:00:03.269)       0:13:25.651 ***********
fatal: [rhel7u6clone1]: FAILED! => {"msg": "The conditional check 'ceph_release_num[ceph_release] == ceph_release_num.luminous' failed. The error was: error while evaluating conditional (ceph_release_num[ceph_release] == ceph_release_num.luminous): 'dict object' has no attribute u'dummy'\n\nThe error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mds/tasks/create_mds_filesystems.yml': line 43, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: allow multimds\n  ^ here\n"}
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1645379

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4e9504c939)
2019-10-17 20:10:57 -04:00