Commit Graph

1872 Commits (ebc901c6af67300f7b7b8da1b2d0a74147798da5)

Author SHA1 Message Date
Guillaume Abrioux 3b74a6919c client: do not rely on copy_admin_key to import keys
Relying on `copy_admin_key` to import created keys on client nodes makes
us obliged to copy admin key on those nodes which is not something we might
want.
We should use the fact `condition_copy_admin_key` which will be set to
`True` when the delegated node is a mon which means we can import keys
without taking care of admin keyring.

Fixes: #2867

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ef5fcd0b6)
2018-07-13 09:24:14 +00:00
Guillaume Abrioux aee48a40a4 mgr: fix condition to add modules to ceph-mgr
Follow up on #2784

We must check in the generated fact `_disabled_ceph_mgr_modules` to
enable disabled mgr module.
Otherwise, this task will be skipped because it's not comparing the
right list.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600155

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ce5ac930c5)
2018-07-13 06:44:13 +00:00
Guillaume Abrioux f8a4c57c03 common: switch from docker module to docker_container
As of ansible 2.4, `docker` module has been removed (was deprecated
since ansible 2.1).
We must switch to `docker_container` instead.

See: https://docs.ansible.com/ansible/latest/modules/docker_module.html#docker-module

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d0746e0858)
2018-07-11 10:56:48 +02:00
Guillaume Abrioux 722e7412b6 mon: ensure socker is purged when mon is stopped
On containerized deployment, if a mon is stopped, the socket is not
purged and can cause failure when a cluster is redeployed after the
purge playbook has been run.

Typical error:

```
fatal: [osd0]: FAILED! => {}

MSG:

'dict object' has no attribute 'osd_pool_default_pg_num'
```

the fact is not set because of this previous failure earlier:

```
ok: [mon0] => {
    "changed": false,
    "cmd": "docker exec ceph-mon-mon0 ceph --cluster test daemon mon.mon0 config get osd_pool_default_pg_num",
    "delta": "0:00:00.217382",
    "end": "2018-07-09 22:25:53.155969",
    "failed_when_result": false,
    "rc": 22,
    "start": "2018-07-09 22:25:52.938587"
}

STDERR:

admin_socket: exception getting command descriptions: [Errno 111] Connection refused

MSG:

non-zero return code
```

This failure happens when the ceph-mon service is stopped, indeed, since
the socket isn't purged, it's a leftover which is confusing the process.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9f54b3b4a7)
2018-07-11 10:56:48 +02:00
Sébastien Han 8d95479502 ceph-config: do not log cluster log on container
The container image recently merged both cluster and mon log into a
single stream. Following this, we now see this warning coming from the
container image:

2018-06-19 13:44:01.542990 7ff75b024700  1 mon.vm02@1(peon).log
v57928205 unable to write to '/var/log/ceph/ceph.log' for channel
'cluster': (2) No such file or directory

So we now tell the mon to not log cluster log on the filesystem.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591771
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 713b9fcf9b)
2018-07-05 18:13:02 +00:00
Sébastien Han d565f28ef5 ceph-common: fix rhcs condition
We forgot to add mgr_group_name when checking for the mon repo, thus the
conditional on the next task was failing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1598185
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit fcf11ecc35)
2018-07-04 17:18:10 +02:00
Guillaume Abrioux b3abd67713 mgr: fix enabling of mgr module on mimic
The data structure has slightly changed on mimic.

Prior to mimic, it used to be:

```
{
    "enabled_modules": [
        "status"
    ],
    "disabled_modules": [
        "balancer",
        "dashboard",
        "influx",
        "localpool",
        "prometheus",
        "restful",
        "selftest",
        "zabbix"
    ]
}
```

From mimic it looks like this:

```
{
    "enabled_modules": [
        "status"
    ],
    "disabled_modules": [
        {
            "name": "balancer",
            "can_run": true,
            "error_string": ""
        },
        {
            "name": "dashboard",
            "can_run": true,
            "error_string": ""
        }
    ]
}
```

This means we can't simply check if `item` is in `item in
_ceph_mgr_modules.disabled_modules`

the idea here is to use filter `map(attribute='name')` to build a list
when deploying mimic.

Fixes: #2766

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3abc253fec)
2018-07-03 23:47:48 +00:00
Sébastien Han 837d5a44a8 ceph-client: do not kill the dummy container
The container runs for 300 sec, then dies and removes itself thanks to
the '--rm' option, so there is no point of removing it. Also this is
causing failure under some circonstances.

Closing: https://bugzilla.redhat.com/show_bug.cgi?id=1568157
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 63658c05c7)
2018-07-03 18:39:56 +00:00
Sébastien Han ceded7e2c1 ceph-defaults: add default application to pool
We now add a default 'rbd' application type to each pool we create. This
will remove the warning: "  application not enabled on N pool(s) "

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590275
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 103c279c21)
2018-07-02 13:29:26 +00:00
Sébastien Han d6f265f51d ceph-mds: enable application pool
We now enable the application type 'cephfs' for each cephfs pools we
create.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590275
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit a629408967)
2018-07-02 13:29:26 +00:00
Vasu Kulkarni 6431976e22 Enable monitor repo for mgr nodes and Tools repo for iscsi/nfs/clients
Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
(cherry picked from commit 1d454b611f)
2018-07-02 09:46:53 +02:00
Ha Phan 2b4710943d ceph-mon: Generate initial keyring
Minor fix so that initial keyring can be generated using python3.

Signed-off-by: Ha Phan <thanhha.work@gmail.com>
(cherry picked from commit a7b7735b6f)
2018-06-29 10:55:26 +02:00
Sébastien Han 863e99ea32 systemd: remove changed_when: false
When using a module there is no need to apply this Ansible option. The
module will handle the idempotency on its own. So the module decides
wether or not the task has changed during the execution.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f623997271)

# Conflicts:
#	roles/ceph-iscsi-gw/tasks/container/containerized.yml
2018-06-29 07:21:10 +00:00
Sébastien Han 89945f8d77 ceph-osd: trigger osd container restart on script change
The script ceph-osd-run.sh holds the config options to start the
container, if one of these options are modified we must restart the
container. This was not the case before becauase the 'notify' flag
wasn't present.

Closing: https://bugzilla.redhat.com/show_bug.cgi?id=1596061
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit abdb53e16a)
2018-06-29 07:21:10 +00:00
Sébastien Han 06275d229a mon: honour mon_docker_net_host option
--net=host was hardcoded in the startup line so even though
mon_docker_net_host was set to False the net option would always be
activated.
mon_docker_net_host is set to True by default so this commit does not
change the behaviour.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 322e2de7d2)
2018-06-27 21:17:09 +00:00
Guillaume Abrioux 045ba36ea3 common: remove duplicate include of configure_firewall_rpm.yml
408ef69 has introduced a duplicated task, something went wrong with the
backport from 24ef47b (probably a conflict merge hasn't been solved properly).

It's better now to commit directly in stable-3.1 to definitely solve
this issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-19 15:04:26 +02:00
Sébastien Han 7a099bb561 common: start firewalld if configure_firewall
Currently we expect that if configure_firewall is set to True to have
firewalld enabled and running. Let's enforce that.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit bea4027f0c)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-19 15:04:26 +02:00
Sébastien Han 60c06a7afe mon/osd: bump container memory limit
As discussed with the cores, the current limits are too low and should
be bumped to higher value.
So now by default monitors get 3GB and OSDs get 5GB.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591876
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit a9ed3579ae)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-18 13:54:42 +02:00
Patrick Donnelly 2bcc1c71c0 ceph-mds: do not enable multimds on jewel
Multiple active MDS became stable in Luminous.

Introduced-by: c8573fe0d7
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 9ce81ae845)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-13 20:01:37 +02:00
Guillaume Abrioux 35af792fa1 client: try to kill dummy container only on first client node
The 'dummy' container is created only on first client node, it means we
must seek to destroy this container only on this node, otherwise this
can cause failure like following :
```
fatal: [192.168.24.8]: FAILED! => {"changed": false, "cmd": ["docker", "rm",
"-f", "ceph-create-keys"], "delta": "0:00:00.023692", "end": "2018-06-12
20:56:07.261278", "msg": "non-zero return code", "rc": 1, "start":
"2018-06-12 20:56:07.237586", "stderr": "Error response from daemon: No such
container: ceph-create-keys", "stderr_lines": ["Error response from daemon: No
such container: ceph-create-keys"], "stdout": "", "stdout_lines": []}

```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590746

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 51cf3b7fa0)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-13 17:42:28 +02:00
Konstantin Shalygin 5137bc263a ceph-osd: set 'openstack_keys_tmp' only when 'openstack_config' is defined.
If 'openstack_config' is false this task shouldn't be executed.

Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
(cherry picked from commit 3a07568496)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-12 08:23:14 +02:00
Sébastien Han 408ef69f9a common: ability to enable/disable fw configuration
Prior to this patch if you were running on a Red Hat system,
ceph-ansible would try to configure firewalld for you without the
operators's consent.
Now you can enable or disable the fw configuration by setting
configure_firewall to either true or false.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2e8412734a)
2018-06-11 23:10:16 +02:00
Guillaume Abrioux 18e794217d client: keyrings aren't created when single client node
combining `run_once: true` with `inventory_hostname ==
groups.get(client_group_name) | first` might cause bug when the only
node being run is not the first in the group.

In a deployment with a single client node it might cause issue because
sometimes keyring won't be created since the task could be definitively
skipped.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1588093

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 090ecff94e)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-08 15:34:34 +02:00
Guillaume Abrioux c35203da88 client: add a default value for keyring file
Potential error if someone doesnt pass the mode in `keys` dict for
client nodes:

```
fatal: [client2]: FAILED! => {}

MSG:

The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'mode'

The error appears to have been in '/home/guits/ceph-ansible/roles/ceph-client/tasks/create_users_keys.yml': line 117, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: get client cephx keys
  ^ here

exception type: <class 'ansible.errors.AnsibleUndefinedVariable'>
exception: 'dict object' has no attribute 'mode'

```

adding a default value will avoid the deployment failing for this.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8a653cacd5)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 18:27:00 +02:00
Guillaume Abrioux 7bcb005e6b client: use dummy created container when there is no mon in inventory
the `docker_exec_cmd` fact set in client role when there is no monitor
in inventory is wrong, `ceph-client-{{ hostname }}` is never created so
it will fail anyway.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7b156deb67)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-07 13:48:33 +02:00
Guillaume Abrioux 9d50874d38 osd: copy openstack keys over to all mon
When configuring openstack, the created keyrings aren't copied over to
all monitors nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1588093

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 433ecc7cbc)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-07 10:59:46 +02:00
Patrick Donnelly 4c5042ae28 change max_mds default to 1
Otherwise, with the removal of mds_allow_multimds, the default of 3 will be set
on every new FS.

Introduced by: c8573fe0d7

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1583020
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 91f9da530f)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-06 10:47:04 +02:00
Guillaume Abrioux a558d8aef3 rgw: refact rgw pools creation
Refact of 8704144e31
There is no need to have duplicated tasks for this. The rgw pools
creation should be delegated on a monitor node se we don't have to care
if the admin keyring is present on rgw node.
By the way, only one task is needed to create the pools, we just need to
use the `docker_exec_cmd` fact already defined in `ceph-defaults` to
achieve it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1550281

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2cf06b515f)
2018-06-06 11:30:29 +08:00
jtudelag 36b2c4a527 rgws: renames create_pools variable with rgw_create_pools.
Renamed to be consistent with the role (rgw) and have a meaningful name.

Signed-off-by: Jorge Tudela <jtudelag@redhat.com>
(cherry picked from commit 600e1e2c26)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-05 18:56:24 +02:00
jtudelag 1d94d12c9f Adds RGWs pool creation to containerized installation.
ceph command has to be executed from one of the monitor containers
if not admin copy present in RGWs. Task has to be delegated then.

Adds test to check proper RGW pool creation for Docker container scenarios.

Signed-off-by: Jorge Tudela <jtudelag@redhat.com>
(cherry picked from commit 8704144e31)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-05 18:56:24 +02:00
Andy McCrae c90535ecce Fix template reference for ganesha.conf
We can simply reference the template name since it exists within the
role that we are calling. We don't need to check the ANSIBLE_ROLE_PATH
or playbooks directory for the file.

Signed-off-by: Lionel Sausin <ls@initiatives.fr>
2018-06-04 10:21:17 +02:00
Andrew Schoen 53dfd050c5 ceph-defaults: add the nautilus 14.x entry to ceph_release_num
The first 14.x tag has been cut so this needs to be added so that
version detection will still work on the master branch of ceph.

Fixes: https://github.com/ceph/ceph-ansible/issues/2671

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit c2423e2c48)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-01 19:21:59 +02:00
Guillaume Abrioux 28319698e2 mons: move set_fact of openstack_keys in ceph-osd
Since the openstack_config.yml has been moved to `ceph-osd` we must move
this `set_fact` in ceph-osd otherwise the tasks in
`openstack_config.yml` using `openstack_keys` will actually use the
defaults value from `ceph-defaults`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1585139

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit aae37b44f5)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-01 18:37:18 +02:00
Guillaume Abrioux 9c91bb8b2c osds: wait for osds to be up before creating pools
This is a follow up on #2628.
Even with the openstack pools creation moved later in the playbook,
there is still an issue because OSDs are not all UP when trying to
create pools.

Adding a task which checks for all OSDs to be UP with a `retries/until`
condition should definitively fix this issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9d5265fe11)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-01 17:07:35 +02:00
Sébastien Han 2ac720d2c2 rgw: container add option to configure multi-site zone
You can now use RGW_ZONE and RGW_ZONEGROUP on each rgw host from your
inventory and assign them a value. Once the rgw container starts it'll
pick the info and add itself to the right zone.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1551637
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 1c084efb3c)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-31 11:10:01 +02:00
Guillaume Abrioux 4f0850adf1 mon: remove check on pg_num for cephfs_pools
It should have been backported from 29a9dff but for better clarity I
think it's better to create a new commit for this.

c68126d6 aims to not make `pgs` attribute mandatory for each element of
`cephfs_pools`. Therefore, we must remove the check in
`roles/ceph-mon/tasks/check_mandatory_vars.yml`.
This task has been removed by 29a9dff but I've chosen to not backport
this commit since it's part of a bunch of commits belonging to a PR
implementing `ceph-validate` role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-30 21:45:03 +02:00
Guillaume Abrioux 4328e0b42a mdss: do not make pg_num a mandatory params
When playing ceph-mds role, mon nodes have set a fact with the default
pg num for osd pools, we can simply default to this value for cephfs
pools (`cephfs_pools` variable).

At the moment the variable definition for `cephfs_pools` looks like:

```
cephfs_pools:
  - { name: "{{ cephfs_data }}", pgs: "" }
  - { name: "{{ cephfs_metadata }}", pgs: "" }
```

and we have a task in `ceph-validate` to ensure `pgs` has been set to a
valid value.

We could simply avoid this check by setting the default value of `pgs`
to `hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num']` and
let to users the possibility to override this value.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1581164

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c68126d6fd)
2018-05-30 21:45:03 +02:00
Guillaume Abrioux 6ee4b228ba osds: do not set docker_exec_cmd fact
in `ceph-osd` there is no need to set `docker_exec_cmd` since the only
place where this fact is used is in `openstack_config.yml` which
delegate all docker command to a monitor node. It means we need the
`docker_exec_cmd` fact that has been set referring to `ceph-mon-*`
containers, this fact is already set earlier in `ceph-defaults`.

By the way, when collocating an OSD with a MON it fails because the container
`ceph-osd-{{ ansible_hostname }}` doesn't exist.

Removing this task will allow to collocate an OSD with a MON.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1584179

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 34e646e767)
2018-05-30 20:20:43 +02:00
Guillaume Abrioux 220d528e8b mds: move mds fs pools creation
When collocating mds on monitor node, the cephpfs will fail
because `docker_exec_cmd` is reset to `ceph-mds-monXX` which is
incorrect because we need to delegate the task on `ceph-mon-monXX`.
In addition, it wouldn't have worked since `ceph-mds-monXX` container
isn't started yet.

Moving the task earlier in the `ceph-mds` role will fix this issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 608ea947a9)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-25 03:57:31 -07:00
Andrew Schoen 49f6d3cbec ceph-defaults: move cephfs vars from the ceph-mon role
We're doing this so we can validate this in the ceph-validate role

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 1f15a81c48)
2018-05-24 21:29:42 +02:00
Guillaume Abrioux 683bec9eb2 mdss: move cephfs pools creation in ceph-mds
When deploying a large number of OSD nodes it can be an issue because the
protection check [1] won't pass since it tries to create pools before all
OSDs are active.

The idea here is to move cephfs pools creation in `ceph-mds` role.

[1] e59258943b/src/mon/OSDMonitor.cc (L5673)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3a0e168a76)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-24 21:29:42 +02:00
Guillaume Abrioux 873abdbf0c osds: move openstack pools creation in ceph-osd
When deploying a large number of OSD nodes it can be an issue because the
protection check [1] won't pass since it tries to create pools before all
OSDs are active.

The idea here is to move openstack pools creation at the end of `ceph-osd` role.

[1] e59258943b/src/mon/OSDMonitor.cc (L5673)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 564a662baf)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-24 21:29:42 +02:00
Luigi Toscano d7f0ea33c9 ceph-radosgw: disable NSS PKI db when SSL is disabled
The NSS PKI database is needed only if radosgw_keystone_ssl
is explicitly set to true, otherwise the SSL integration is
not enabled.

It is worth noting that the PKI support was removed from Keystone
starting from the Ocata release, so some code paths should be
changed anyway.

Also, remove radosgw_keystone, which is not useful anymore.
This variable was used until fcba2c801a.
Now profiles drives the setting of rgw keystone *.

Signed-off-by: Luigi Toscano <ltoscano@redhat.com>
(cherry picked from commit 43e96c1f98)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-24 15:41:42 +02:00
Vishal Kanaujia 0e0bd09b1f Skip GPT header creation for lvm osd scenario
The LVM lvcreate fails if the disk already has a GPT header.
We create GPT header regardless of OSD scenario. The fix is to
skip header creation for lvm scenario.

fixes: https://github.com/ceph/ceph-ansible/issues/2592

Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
(cherry picked from commit ef5f52b1f3)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-23 13:37:09 -07:00
Subhachandra Chandra 747b545af4 Fix restarting OSDs twice during a rolling update.
During a rolling update, OSDs are restarted twice currently. Once, by the
handler in roles/ceph-defaults/handlers/main.yml and a second time by tasks
in the rolling_update playbook. This change turns off restarts by the handler.
Further, the restart initiated by the rolling_update playbook is more
efficient as it restarts all the OSDs on a host as one operation and waits
for them to rejoin the cluster. The restart task in the handler restarts one
OSD at a time and waits for it to join the cluster.

(cherry picked from commit c7e269fcf5)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-22 23:22:38 -07:00
Sébastien Han 831491f7d6 defaults: restart_osd_daemon unit spaces
Extra space in systemctl list-units can cause restart_osd_daemon.sh to
fail

It looks like if you have more services enabled in the node space
between "loaded" and "active" get more space as compared to one space
given in command the command[1].

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1573317
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2f43e9dab5)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-22 09:43:10 +02:00
Michael Vollman e1aa85f04c Do nothing when mgr module is in good state
Check whether a mgr module is supposed to be disabled before disabling
it and whether it is already enabled before enabling it.

Signed-off-by: Michael Vollman <michael.b.vollman@gmail.com>
(cherry picked from commit ed050bf3f6)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-18 16:53:39 +02:00
Guillaume Abrioux 861f4b876b client: remove default value for pg_num in pools creation
trying to set the default value for pg_num to
`hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num'])` will
break in case of external client nodes deployment.
the `pg_num` attribute should be mandatory and be tested in future
`ceph-validate` role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f60b049ae5)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-16 09:44:36 +02:00
Sébastien Han 1de4756338 rolling_update: move mgr key creation
Until all the mons haven't been updated to Luminous, there is no way to
create a key. So we should do the key creation in the mon role only if
we are not part of an update.
If we are then the key creation is done after the mons upgrade to
Luminous.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 52fc8a0385)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-15 19:39:03 +02:00
Sébastien Han 9ca1d1d571 Revert "mon: fix mgr keyring creation when upgrading from jewel"
This reverts commit 259fae931d.

(cherry picked from commit e810fb217f)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-15 19:39:03 +02:00
Guillaume Abrioux 7c7f517bba iscsi-gw: fix issue when trying to mask target
trying to mask target when `/etc/systemd/system/target.service` doesn't
exist seems to be a bug.
There is no need to mask a unit file which doesn't exist.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a145caf947)
2018-05-15 10:21:41 +02:00
Sébastien Han 0bb7e6dd8c iscsi: add python-rtslib repository
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 8c7c11b774)
2018-05-15 10:21:41 +02:00
Andy McCrae 54ef0496da Allow os_tuning_params to overwrite fs.aio-max-nr
The order of fs.aio-max-nr (which is hard-coded to 1048576) means that
if you set fs.aio-max-nr in os_tuning_params it will effectively be
ignored for bluestore scenarios.

To resolve this we should move the setting of fs.aio-max-nr above the
setting of os_tuning_params, in this way the operator can define the
value of fs.aio-max-nr to be something other than 1048576 if they want
to.

Additionally, we can make the sysctl settings happen in 1 task rather
than multiple.

(cherry picked from commit 08a2b58d39)
2018-05-14 11:05:43 +02:00
Gregory Meno b6ea36e98e adds missing state needed to upgrade nfs-ganesha
in tasks for os_family Red Hat we were missing this

fixes: bz1575859
Signed-off-by: Gregory Meno <gmeno@redhat.com>
(cherry picked from commit 26f6a65042)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-10 11:49:53 -07:00
Guillaume Abrioux 259fae931d mon: fix mgr keyring creation when upgrading from jewel
On containerized deployment,
when upgrading from jewel to luminous, mgr keyring creation fails because the
command to create mgr keyring is executed on a container that is still
running jewel since the container is restarted later to run the new
image, therefore, it fails with bad entity error.

To get around this situation, we can delegate the command to create
these keyrings on the first monitor when we are running the playbook on the last monitor.
That way we ensure we will issue the command on a container that has
been well restarted with the new image.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-09 10:29:48 -07:00
Guillaume Abrioux 7b387b506a osd: clean legacy syntax in ceph-osd-run.sh.j2
Quick clean on a legacy syntax due to e0a264c7e

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-09 07:29:33 +02:00
Simone Caronni b12bf62c36 Make sure the restart_mds_daemon script is created with the correct MDS name 2018-05-08 20:53:15 +02:00
Sébastien Han 07ca91b5cb common: enable Tools repo for rhcs clients
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574458
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-08 16:12:30 +02:00
Andy McCrae e99351b95b Fix install of nfs-ganesha-ceph for Debian/SuSE
The Debian and SuSE installs for nfs-ganesha on the non-rhcs repository
requires you to allow_unauthenticated for Debian, and disable_gpg_check
for SuSE. The nfs-ganesha-rgw package already does this, but the
nfs-ganesha-ceph package will fail to install because of this same
issue.

This PR moves the installations to happen when the appropriate flags are
set to True (nfs_obj_gw & nfs_file_gw), but does it per distro (one for
SuSE and one for Debian) so that the appropriate flag can be passed to
ignore the GPG check.
2018-05-04 15:13:59 +02:00
Ramana Raja 31762dede3 ceph-nfs: disable attribute caching
When 'ceph_nfs_disable_caching' is set to True, disable attribute
caching done by Ganesha for all Ganesha exports.

Signed-off-by: Ramana Raja <rraja@redhat.com>
2018-05-04 09:47:54 +02:00
Sébastien Han 4a186237e6 common: copy iso files if rolling_update
If we are in a middle of an update we want to get the new package
version being installed so the task that copies the repo files should
not be skipped.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1572032
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-03 17:18:55 +02:00
Andy McCrae d142be0422 Move apt cache update to individual task per role
The apt-cache update can fail due to transient issues related to the
action being a network operation. To reduce the impact of these
transient failures this patch adds a retry to the update_cache task.

However, the apt_repository tasks which would perform an apt_update
won't retry the apt_update on a failure in the same way, as such this PR
moves the apt_update into an individual task, once per role.

Finally, the apt_repository tasks no longer have a changed_when: false,
and the apt_cache update is only performed once per role, if the
repositories change. Otherwise the cache is updated on the "apt" install
tasks if the cache_timeout has been reached.
2018-05-03 14:02:15 +02:00
Guillaume Abrioux 6fe8df627b client: fix pool creation
the value in `docker_exec_client_cmd` doesn't allow to check for
existing pools because it's set with a wrong value for the entrypoint
that is going to be used.
It means the check were going to fail anyway even if pools actually exist.

Using jinja syntax to set `docker_exec_cmd` allows to handle the case
where you don't have monitors in your inventory.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-03 08:22:40 +02:00
Sébastien Han 43e23ffe4d mon: change application pool support
If openstack_pools contains an application key it will be used to apply
this application pool type to a pool.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1562220
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-30 09:42:58 +02:00
Guillaume Abrioux 75ed437d4e check if pools already exist before creating them
Add a task to check if pools already exist before we create them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-30 08:15:18 +02:00
Guillaume Abrioux a68091c923 tests: update the type for the rule used in pools
As of ceph 12.2.5 the type of the parameter `type` is not a name anymore but
an id, therefore an `int` is expected otherwise it will fail with the
following error

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-30 08:15:18 +02:00
Sébastien Han 12eebc31fb mon/client: honor key mode when copying it to other nodes
The last mon creates the keys with a particular mode, while copying them
to the other mons (first and second) we must re-use the mode that was
set.

The same applies for the client node, the slurp preserves the initial
'item' so we can get the mode for the copy.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-23 18:34:58 +02:00
Sébastien Han 74494253fa mon: remove redundant copy task
We had twice the same task, also one was overriding the mode.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-23 18:34:58 +02:00
Sébastien Han 85732d11b9 mon/client: remove acl code
Applying ACL on the keyrings is not used anymore so let's remove this
code.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-23 18:34:58 +02:00
Sébastien Han cfe8e51d99 mon/client: apply mode from ceph_key
Do not use a dedicated task for this but use the ceph_key module
capability to set file mode.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-23 18:34:58 +02:00
Di Xu 113eb25424 add AArch64 to supported architecture
works on AArch64 platform
2018-04-23 10:23:21 +02:00
Sébastien Han 949507d304 mon: remove mgr key from ceph_config_keys
This key is created after the last mon is up so there is no need to try
to push it from the first mon. The initia mon container is not creating
the mgr key, ansible does. So this key will never exist.
The key will go into the fetch dir once the last mon is up, then when
the ceph-mgr plays it will try to get it from the fetch directory.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-23 10:17:24 +02:00
Sébastien Han 35c1eb7183 mon: remove mon map from ceph_config_keys
During the initial bootstrap of the first mon, the monmap file is
destroyed so it's not available and ansible will never find it.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-23 10:17:24 +02:00
Sébastien Han 65ba85aff6 Expose /var/run/ceph
Useful for softwares that do data collection/monitoring like collectd.
They can connect to the socket and then retrieve information.

Even though the sockets are exposed now, I'm keeping the docker exec to
check the socket, this will allow newer version of ceph-ansible to work
with older versions.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1563280
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-20 15:48:32 +02:00
Sébastien Han bf1e70e8cf default: extent ceph_uid and gid
We now have the ability to detect the uid/gid of the ceph user depending
on the distribution we are running on and so we are doing non-container
deployements.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-20 15:48:32 +02:00
Sébastien Han f3656ad167 move create ceph initial directories to default
This is needed for both non-container and container deployments.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-20 15:48:32 +02:00
Sébastien Han 641f141c0f selinux: remove chcon calls
We know bindmount with the :z option at the end of the -v command so
this will basically run the exact same command as we used to run. So to
speak:

chcon -Rt svirt_sandbox_file_t /var/lib/ceph

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-19 14:59:37 +02:00
Sébastien Han 90e47c5fb0 client: add a --rm option to run the container
This fixes the case where the playbook died and never removed the
container. So now, once the container exits it will remove itself from
the container list.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1568157
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-19 14:59:37 +02:00
Sébastien Han 6c742376fd client: import the key in ceph is copy_admin_key is true
If the user has set copy_admin_key to true we assume he/she wants to
import the key in Ceph and not only create the key on the filesystem.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-18 17:46:54 +02:00
Sébastien Han 424815501a client: add quotes to the dict values
ceph-authtool does not support raw arguements so we have to quote caps
declaration like this allow 'bla bla' instead of allow bla bla

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1568157
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-18 17:46:54 +02:00
Sébastien Han d2a2793cb0 refactor the way we copy keys
This commit does a couple of things:

* use a common.yml file that contains things that can be played on both
container and non-container

* refactor the ability to copy the admin key to the nodes

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-18 16:46:33 +02:00
Randy J. Martinez 127a643fd0 ceph-defaults: fix ceph_uid fact on container deployments
Red Hat is now using tags[3,latest] for image rhceph/rhceph-3-rhel7.
Because of this, the ceph_uid conditional passes for Debian
when 'ceph_docker_image_tag: latest' on RH deployments.
I've added an additional task to check for rhceph image specifically,
and also updated the RH family task for ceph/daemon [centos|fedora]tags.

Signed-off-by: Randy J. Martinez <ramartin@redhat.com>
2018-04-17 16:54:51 +02:00
Sébastien Han a98885a71e rhcs: re-add apt-pining
When installing rhcs on Debian systems the red hat repos must have the
highest priority so we avoid packages conflicts and install the rhcs
version.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1565850
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-17 16:07:06 +02:00
Guillaume Abrioux 899b0eb451 defaults: check only 1 time if there is a running cluster
There is no need to check for a running cluster n*nodes time in
`ceph-defaults` so let's add a `run_once: true` to save some resources
and time.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-16 11:23:00 +02:00
Sébastien Han 5bbbce527e osd: do not do anything if the dev has a partition
Regardless if the partition is 'ceph' or something else, we don't want
to be as strick as checking for a particular partition.
If the drive has a partition, we just don't do anything.

This solves the case where the server reboots, disks get a different
/dev/sda (node) allocation. In this case, prior to restarting the server
/dev/sda was an OSD, but now it's /dev/sdb and the other way around.
In such scenario, we will try to prepare the OSD and create a new
partition, so let's not mess around with devices that have partitions.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1498303
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-13 19:11:15 +02:00
Sébastien Han 37117071eb common: add tools repo for iscsi gw
To install iscsi gw packages we need to enable the tools repo.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1547849
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-12 13:38:34 +02:00
Douglas Fuller c8573fe0d7 Remove deprecated allow_multimds
allow_multimds will be officially deprecated in Mimic, specify it
only for all versions of Ceph where it was declared stable. Going
forward, specify only max_mds.

Signed-off-by: Douglas Fuller <dfuller@redhat.com>
2018-04-12 10:29:17 +02:00
vasishta p shastry 020e66c1b4 Fixed a typo (extra space) 2018-04-11 14:21:15 +02:00
vasishta p shastry e1a1f81b6f osd: to support copy_admin_key 2018-04-11 14:21:15 +02:00
vasishta p shastry db3a5ce6d9 mds: to support copy_admin_keyring 2018-04-11 14:21:15 +02:00
vasishta p shastry 6b59416f75 nfs: to support copy_admin_key - containerized 2018-04-11 14:21:15 +02:00
Ali Maredia 01c58695fc nfs: ensure nfs-server server is stopped
NFS-ganesha cannot start is the nfs-server service
is running. This commit stops nfs-server in case it
is running on a (debian, redhat, suse) node before
the nfs-ganesha service starts up

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508506

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2018-04-11 14:00:48 +02:00
Ramana Raja 4a430ae29a ceph-nfs: allow disabling ganesha caching
Add a variable, ceph_nfs_disable_caching, that if set to true
disables ganesha's directory and attribute caching as much as
possible.

Also, disable caching done by ganesha, when 'nfs_file_gw'
variable is true, i.e., when Ganesha is used as CephFS's gateway.
This is the recommended Ganesha setting as libcephfs already caches
information. And doing so helps avoid cache incoherency issues
especially with clustered ganesha over CephFS.

Fixes: https://tracker.ceph.com/issues/23393

Signed-off-by: Ramana Raja <rraja@redhat.com>
2018-04-11 13:56:40 +02:00
Sébastien Han 82ccbdafbc ceph-defaults: bring backward compatibility for old syntax
If people keep on using the mon_cap, osd_cap etc the playbook will
translate this old syntax on the flight.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-11 12:18:34 +02:00
Sébastien Han 9657e4d6fa ceph_key: use ceph_key in the playbook
Replaced all the occurence of raw command using the 'command' module
with the ceph_key module instead.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-11 12:18:34 +02:00
Guillaume Abrioux 66c4118dcd defaults: fix backward compatibility
backward compatibility with `ceph_mon_docker_interface` and
`ceph_mon_docker_subnet` was not working since there wasn't lookup on
`monitor_interface` and `public_network`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-10 00:19:11 +02:00
Ken Dreyer 3752cc6f38 common: upgrade/install ceph-test RPM first
Prior to this change, if a user had ceph-test-12.2.1 installed, and
upgraded to ceph v12.2.3 or newer, the RPM upgrade process would
fail.

The problem is that the ceph-test RPM did not depend on an exact version
of ceph-common until v12.2.3.

In Ceph v12.2.3, ceph-{osdomap,kvstore,monstore}-tool binaries moved
from ceph-test into ceph-base. When ceph-test is not yet up-to-date, Yum
encounters package conflicts between the older ceph-test and newer
ceph-base.

When all users have upgraded beyond Ceph < 12.2.3, this is no longer
relevant.
2018-04-09 18:09:52 +02:00
Sébastien Han bb60f2fea4 ceph-defaults: fix ceoh_uid for container image tag latest
According to our recent change, we now use "CentOS" as a latest
container image. We need to reflect this on the ceph_uid.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-09 13:54:55 +02:00
Zack Cerza 0123d790cd Use the CentOS repo for Red Hat dev packages
No use even trying to use something that doesn't exist.

Signed-off-by: Zack Cerza <zack@redhat.com>
2018-04-09 10:05:57 +02:00
Attila Fazekas ecd3563c21 Deploying without managed monitors failed
Tripleo deployment failed when the monitors not manged
by tripleo itself with:
    FAILED! => {"msg": "list object has no element 0"}

The failing play item was introduced by
 f46217b69a .

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1552327

Signed-off-by: Attila Fazekas <afazekas@redhat.com>
2018-04-04 18:16:46 +02:00