combining `run_once: true` with `inventory_hostname ==
groups.get(client_group_name) | first` might cause bug when the only
node being run is not the first in the group.
In a deployment with a single client node it might cause issue because
sometimes keyring won't be created since the task could be definitively
skipped.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1588093
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 090ecff94e)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Potential error if someone doesnt pass the mode in `keys` dict for
client nodes:
```
fatal: [client2]: FAILED! => {}
MSG:
The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'mode'
The error appears to have been in '/home/guits/ceph-ansible/roles/ceph-client/tasks/create_users_keys.yml': line 117, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: get client cephx keys
^ here
exception type: <class 'ansible.errors.AnsibleUndefinedVariable'>
exception: 'dict object' has no attribute 'mode'
```
adding a default value will avoid the deployment failing for this.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8a653cacd5)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
the `docker_exec_cmd` fact set in client role when there is no monitor
in inventory is wrong, `ceph-client-{{ hostname }}` is never created so
it will fail anyway.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7b156deb67)
Signed-off-by: Sébastien Han <seb@redhat.com>
When configuring openstack, the created keyrings aren't copied over to
all monitors nodes.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1588093
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 433ecc7cbc)
Signed-off-by: Sébastien Han <seb@redhat.com>
Otherwise, with the removal of mds_allow_multimds, the default of 3 will be set
on every new FS.
Introduced by: c8573fe0d7
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1583020
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 91f9da530f)
Signed-off-by: Sébastien Han <seb@redhat.com>
Refact of 8704144e31
There is no need to have duplicated tasks for this. The rgw pools
creation should be delegated on a monitor node se we don't have to care
if the admin keyring is present on rgw node.
By the way, only one task is needed to create the pools, we just need to
use the `docker_exec_cmd` fact already defined in `ceph-defaults` to
achieve it.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1550281
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2cf06b515f)
Renamed to be consistent with the role (rgw) and have a meaningful name.
Signed-off-by: Jorge Tudela <jtudelag@redhat.com>
(cherry picked from commit 600e1e2c26)
Signed-off-by: Sébastien Han <seb@redhat.com>
ceph command has to be executed from one of the monitor containers
if not admin copy present in RGWs. Task has to be delegated then.
Adds test to check proper RGW pool creation for Docker container scenarios.
Signed-off-by: Jorge Tudela <jtudelag@redhat.com>
(cherry picked from commit 8704144e31)
Signed-off-by: Sébastien Han <seb@redhat.com>
We can simply reference the template name since it exists within the
role that we are calling. We don't need to check the ANSIBLE_ROLE_PATH
or playbooks directory for the file.
Signed-off-by: Lionel Sausin <ls@initiatives.fr>
The first 14.x tag has been cut so this needs to be added so that
version detection will still work on the master branch of ceph.
Fixes: https://github.com/ceph/ceph-ansible/issues/2671
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit c2423e2c48)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since the openstack_config.yml has been moved to `ceph-osd` we must move
this `set_fact` in ceph-osd otherwise the tasks in
`openstack_config.yml` using `openstack_keys` will actually use the
defaults value from `ceph-defaults`.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1585139
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit aae37b44f5)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This is a follow up on #2628.
Even with the openstack pools creation moved later in the playbook,
there is still an issue because OSDs are not all UP when trying to
create pools.
Adding a task which checks for all OSDs to be UP with a `retries/until`
condition should definitively fix this issue.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9d5265fe11)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
You can now use RGW_ZONE and RGW_ZONEGROUP on each rgw host from your
inventory and assign them a value. Once the rgw container starts it'll
pick the info and add itself to the right zone.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1551637
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 1c084efb3c)
Signed-off-by: Sébastien Han <seb@redhat.com>
It should have been backported from 29a9dff but for better clarity I
think it's better to create a new commit for this.
c68126d6 aims to not make `pgs` attribute mandatory for each element of
`cephfs_pools`. Therefore, we must remove the check in
`roles/ceph-mon/tasks/check_mandatory_vars.yml`.
This task has been removed by 29a9dff but I've chosen to not backport
this commit since it's part of a bunch of commits belonging to a PR
implementing `ceph-validate` role.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When playing ceph-mds role, mon nodes have set a fact with the default
pg num for osd pools, we can simply default to this value for cephfs
pools (`cephfs_pools` variable).
At the moment the variable definition for `cephfs_pools` looks like:
```
cephfs_pools:
- { name: "{{ cephfs_data }}", pgs: "" }
- { name: "{{ cephfs_metadata }}", pgs: "" }
```
and we have a task in `ceph-validate` to ensure `pgs` has been set to a
valid value.
We could simply avoid this check by setting the default value of `pgs`
to `hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num']` and
let to users the possibility to override this value.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1581164
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c68126d6fd)
in `ceph-osd` there is no need to set `docker_exec_cmd` since the only
place where this fact is used is in `openstack_config.yml` which
delegate all docker command to a monitor node. It means we need the
`docker_exec_cmd` fact that has been set referring to `ceph-mon-*`
containers, this fact is already set earlier in `ceph-defaults`.
By the way, when collocating an OSD with a MON it fails because the container
`ceph-osd-{{ ansible_hostname }}` doesn't exist.
Removing this task will allow to collocate an OSD with a MON.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1584179
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 34e646e767)
When collocating mds on monitor node, the cephpfs will fail
because `docker_exec_cmd` is reset to `ceph-mds-monXX` which is
incorrect because we need to delegate the task on `ceph-mon-monXX`.
In addition, it wouldn't have worked since `ceph-mds-monXX` container
isn't started yet.
Moving the task earlier in the `ceph-mds` role will fix this issue.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 608ea947a9)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We're doing this so we can validate this in the ceph-validate role
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 1f15a81c48)
When deploying a large number of OSD nodes it can be an issue because the
protection check [1] won't pass since it tries to create pools before all
OSDs are active.
The idea here is to move cephfs pools creation in `ceph-mds` role.
[1] e59258943b/src/mon/OSDMonitor.cc (L5673)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3a0e168a76)
Signed-off-by: Sébastien Han <seb@redhat.com>
When deploying a large number of OSD nodes it can be an issue because the
protection check [1] won't pass since it tries to create pools before all
OSDs are active.
The idea here is to move openstack pools creation at the end of `ceph-osd` role.
[1] e59258943b/src/mon/OSDMonitor.cc (L5673)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 564a662baf)
Signed-off-by: Sébastien Han <seb@redhat.com>
The NSS PKI database is needed only if radosgw_keystone_ssl
is explicitly set to true, otherwise the SSL integration is
not enabled.
It is worth noting that the PKI support was removed from Keystone
starting from the Ocata release, so some code paths should be
changed anyway.
Also, remove radosgw_keystone, which is not useful anymore.
This variable was used until fcba2c801a.
Now profiles drives the setting of rgw keystone *.
Signed-off-by: Luigi Toscano <ltoscano@redhat.com>
(cherry picked from commit 43e96c1f98)
Signed-off-by: Sébastien Han <seb@redhat.com>
The LVM lvcreate fails if the disk already has a GPT header.
We create GPT header regardless of OSD scenario. The fix is to
skip header creation for lvm scenario.
fixes: https://github.com/ceph/ceph-ansible/issues/2592
Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
(cherry picked from commit ef5f52b1f3)
Signed-off-by: Sébastien Han <seb@redhat.com>
During a rolling update, OSDs are restarted twice currently. Once, by the
handler in roles/ceph-defaults/handlers/main.yml and a second time by tasks
in the rolling_update playbook. This change turns off restarts by the handler.
Further, the restart initiated by the rolling_update playbook is more
efficient as it restarts all the OSDs on a host as one operation and waits
for them to rejoin the cluster. The restart task in the handler restarts one
OSD at a time and waits for it to join the cluster.
(cherry picked from commit c7e269fcf5)
Signed-off-by: Sébastien Han <seb@redhat.com>
Extra space in systemctl list-units can cause restart_osd_daemon.sh to
fail
It looks like if you have more services enabled in the node space
between "loaded" and "active" get more space as compared to one space
given in command the command[1].
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1573317
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2f43e9dab5)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Check whether a mgr module is supposed to be disabled before disabling
it and whether it is already enabled before enabling it.
Signed-off-by: Michael Vollman <michael.b.vollman@gmail.com>
(cherry picked from commit ed050bf3f6)
Signed-off-by: Sébastien Han <seb@redhat.com>
trying to set the default value for pg_num to
`hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num'])` will
break in case of external client nodes deployment.
the `pg_num` attribute should be mandatory and be tested in future
`ceph-validate` role.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f60b049ae5)
Signed-off-by: Sébastien Han <seb@redhat.com>
Until all the mons haven't been updated to Luminous, there is no way to
create a key. So we should do the key creation in the mon role only if
we are not part of an update.
If we are then the key creation is done after the mons upgrade to
Luminous.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 52fc8a0385)
Signed-off-by: Sébastien Han <seb@redhat.com>
trying to mask target when `/etc/systemd/system/target.service` doesn't
exist seems to be a bug.
There is no need to mask a unit file which doesn't exist.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a145caf947)
The order of fs.aio-max-nr (which is hard-coded to 1048576) means that
if you set fs.aio-max-nr in os_tuning_params it will effectively be
ignored for bluestore scenarios.
To resolve this we should move the setting of fs.aio-max-nr above the
setting of os_tuning_params, in this way the operator can define the
value of fs.aio-max-nr to be something other than 1048576 if they want
to.
Additionally, we can make the sysctl settings happen in 1 task rather
than multiple.
(cherry picked from commit 08a2b58d39)
in tasks for os_family Red Hat we were missing this
fixes: bz1575859
Signed-off-by: Gregory Meno <gmeno@redhat.com>
(cherry picked from commit 26f6a65042)
Signed-off-by: Sébastien Han <seb@redhat.com>
On containerized deployment,
when upgrading from jewel to luminous, mgr keyring creation fails because the
command to create mgr keyring is executed on a container that is still
running jewel since the container is restarted later to run the new
image, therefore, it fails with bad entity error.
To get around this situation, we can delegate the command to create
these keyrings on the first monitor when we are running the playbook on the last monitor.
That way we ensure we will issue the command on a container that has
been well restarted with the new image.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The Debian and SuSE installs for nfs-ganesha on the non-rhcs repository
requires you to allow_unauthenticated for Debian, and disable_gpg_check
for SuSE. The nfs-ganesha-rgw package already does this, but the
nfs-ganesha-ceph package will fail to install because of this same
issue.
This PR moves the installations to happen when the appropriate flags are
set to True (nfs_obj_gw & nfs_file_gw), but does it per distro (one for
SuSE and one for Debian) so that the appropriate flag can be passed to
ignore the GPG check.
When 'ceph_nfs_disable_caching' is set to True, disable attribute
caching done by Ganesha for all Ganesha exports.
Signed-off-by: Ramana Raja <rraja@redhat.com>
If we are in a middle of an update we want to get the new package
version being installed so the task that copies the repo files should
not be skipped.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1572032
Signed-off-by: Sébastien Han <seb@redhat.com>
The apt-cache update can fail due to transient issues related to the
action being a network operation. To reduce the impact of these
transient failures this patch adds a retry to the update_cache task.
However, the apt_repository tasks which would perform an apt_update
won't retry the apt_update on a failure in the same way, as such this PR
moves the apt_update into an individual task, once per role.
Finally, the apt_repository tasks no longer have a changed_when: false,
and the apt_cache update is only performed once per role, if the
repositories change. Otherwise the cache is updated on the "apt" install
tasks if the cache_timeout has been reached.
the value in `docker_exec_client_cmd` doesn't allow to check for
existing pools because it's set with a wrong value for the entrypoint
that is going to be used.
It means the check were going to fail anyway even if pools actually exist.
Using jinja syntax to set `docker_exec_cmd` allows to handle the case
where you don't have monitors in your inventory.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If openstack_pools contains an application key it will be used to apply
this application pool type to a pool.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1562220
Signed-off-by: Sébastien Han <seb@redhat.com>
As of ceph 12.2.5 the type of the parameter `type` is not a name anymore but
an id, therefore an `int` is expected otherwise it will fail with the
following error
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The last mon creates the keys with a particular mode, while copying them
to the other mons (first and second) we must re-use the mode that was
set.
The same applies for the client node, the slurp preserves the initial
'item' so we can get the mode for the copy.
Signed-off-by: Sébastien Han <seb@redhat.com>
This key is created after the last mon is up so there is no need to try
to push it from the first mon. The initia mon container is not creating
the mgr key, ansible does. So this key will never exist.
The key will go into the fetch dir once the last mon is up, then when
the ceph-mgr plays it will try to get it from the fetch directory.
Signed-off-by: Sébastien Han <seb@redhat.com>