In order to ensure there is no leftover after having purged a cluster,
we must wipe all partitions properly.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
there is some leftover on devices when purging osds because of a invalid
device list construction.
typical error:
```
changed: [osd3] => (item=/dev/sda sda1) => {
"changed": true,
"cmd": "# if the disk passed is a raw device AND the boot system disk\n if parted -s \"/dev/sda sda1\" print | grep -sq boot; then\n echo \"Looks like /dev/sda sda1 has a boot partition,\"\n echo \"if you want to delete specific partitions point to the partition instead of the raw device\"\n echo \"Do not use your system disk!\"\n exit 1\n fi\n echo sgdisk -Z \"/dev/sda sda1\"\n echo dd if=/dev/zero of=\"/dev/sda sda1\" bs=1M count=200\n echo udevadm settle --timeout=600",
"delta": "0:00:00.015188",
"end": "2018-05-16 12:41:40.408597",
"item": "/dev/sda sda1",
"rc": 0,
"start": "2018-05-16 12:41:40.393409"
}
STDOUT:
sgdisk -Z /dev/sda sda1
dd if=/dev/zero of=/dev/sda sda1 bs=1M count=200
udevadm settle --timeout=600
STDERR:
Error: Could not stat device /dev/sda sda1 - No such file or directory.
```
the devices list in the task `resolve parent device` isn't built
properly because the command used to resolve the parent device doesn't
return the expected output
eg:
```
changed: [osd3] => (item=/dev/sda1) => {
"changed": true,
"cmd": "echo /dev/$(lsblk -no pkname \"/dev/sda1\")",
"delta": "0:00:00.013634",
"end": "2018-05-16 12:41:09.068166",
"item": "/dev/sda1",
"rc": 0,
"start": "2018-05-16 12:41:09.054532"
}
STDOUT:
/dev/sda sda1
```
For instance, it will result with a devices list like:
`['/dev/sda sda1', '/dev/sdb', '/dev/sdc sdc1']`
where we expect to have:
`['/dev/sda', '/dev/sdb', '/dev/sdc']`
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
During a minor update from a jewel to a higher jewel version (10.2.9 to
10.2.10 for example) osd flags don't get applied because they were done
in the mgr section which is skipped in jewel since this daemons does not
exist.
Moving the set flag section after all the mons have been updated solves
that problem.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1548071
Co-authored-by: Tomas Petr <tpetr@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Add a new "make tag" command. This automates some common operations:
1) Automatically determine the next Git tag version number to create.
For example:
"3.2.0beta1 -> "3.2.0beta2"
"3.2.0rc1 -> "3.2.0rc2"
"3.2.0" -> "3.2.1"
2) Create the Git tag, and print instructions for the user to push it to
GitHub.
3) Sanity check that HEAD is a stable-* branch or master (bail on
everything else).
4) Sanity check that HEAD is not already tagged.
Note, we will still need to tag manually once each time we change the
format, for example when moving from tagging "betas" to tagging "rcs",
or "rcs" to "stable point releases".
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
When pushing a PR it might be handy to set the [skip ci] flag if we know
upfront the content should not trigger the CI.
Now you can add [skip ci] as $4 in your command line.
Signed-off-by: Sébastien Han <seb@redhat.com>
To make the package installation more efficient we should install
packages as a list rather than as individual tasks or using a
"with_items" loop. The package managers can handle a list passed to them
to install in one go.
We can use a specified list and substitute any packages that are not to
be installed with the ceph-common package, which is installed on every
package install, then apply the unique filter to the package install
list.
There is no need to stat for created mgr keyrings since they are created
anyway when deploying a ceph cluster > jewel. In case of a jewel
deployment we won't enter that block.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This file is a leftover from PR ceph/ceph-ansible#2516
It is not used anymore so it can be removed.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
the role `ceph-mgr` that is played later in the playbook fails because
the destination path for the fetched keys is wrong.
This patch fix the destination path used in the task `fetch ceph mgr
key(s)` so there is no mismatch.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
{{ fsid }} points to {{ cluster_uuid.stdout }} which is not defined in
this part of the rolling_update playbook.
Since we need to call {{ fsid }} we must get the fsid and register it to
`cluster_uuid`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Until all the mons haven't been updated to Luminous, there is no way to
create a key. So we should do the key creation in the mon role only if
we are not part of an update.
If we are then the key creation is done after the mons upgrade to
Luminous.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995
Signed-off-by: Sébastien Han <seb@redhat.com>
trying to mask target when `/etc/systemd/system/target.service` doesn't
exist seems to be a bug.
There is no need to mask a unit file which doesn't exist.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The order of fs.aio-max-nr (which is hard-coded to 1048576) means that
if you set fs.aio-max-nr in os_tuning_params it will effectively be
ignored for bluestore scenarios.
To resolve this we should move the setting of fs.aio-max-nr above the
setting of os_tuning_params, in this way the operator can define the
value of fs.aio-max-nr to be something other than 1048576 if they want
to.
Additionally, we can make the sysctl settings happen in 1 task rather
than multiple.
Prior to this change, if we created entirely new Git tags patterns like
"3.2.0alpha" or "3.2.0foobar", the Makefile would incorrectly translate
the Git tag name into a Name-Version-Release that would prevent upgrades
to "newer" versions.
This happened for example in
https://bugs.centos.org/view.php?id=14593, "Incorrect naming scheme for
a build of ceph-ansible prevents subsequent updates to be installed"
If we encounter a new Git tag format that we cannot parse,
pessimistically bail out early instead of trying to build an RPM.
The purpose of this safeguard is to prevent Jenkins from building RPMs
that cannot be easily upgraded.
trying to set the default value for pg_num to
`hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num'])` will
break in case of external client nodes deployment.
the `pg_num` attribute should be mandatory and be tested in future
`ceph-validate` role.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since we now do backports on stable-3.0 and stable-3.1 we have to use
the name of the stable branch in the backport branch name. If we don't
do this we will end up with conflicting branch names.
Signed-off-by: Sébastien Han <seb@redhat.com>
On containerized deployment,
when upgrading from jewel to luminous, mgr keyring creation fails because the
command to create mgr keyring is executed on a container that is still
running jewel since the container is restarted later to run the new
image, therefore, it fails with bad entity error.
To get around this situation, we can delegate the command to create
these keyrings on the first monitor when we are running the playbook on the last monitor.
That way we ensure we will issue the command on a container that has
been well restarted with the new image.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The Debian and SuSE installs for nfs-ganesha on the non-rhcs repository
requires you to allow_unauthenticated for Debian, and disable_gpg_check
for SuSE. The nfs-ganesha-rgw package already does this, but the
nfs-ganesha-ceph package will fail to install because of this same
issue.
This PR moves the installations to happen when the appropriate flags are
set to True (nfs_obj_gw & nfs_file_gw), but does it per distro (one for
SuSE and one for Debian) so that the appropriate flag can be passed to
ignore the GPG check.
there is no need to gather facts with O(N^2) way.
Only one node should gather facts from other node.
Fixes: #2553
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When 'ceph_nfs_disable_caching' is set to True, disable attribute
caching done by Ganesha for all Ganesha exports.
Signed-off-by: Ramana Raja <rraja@redhat.com>
If we are in a middle of an update we want to get the new package
version being installed so the task that copies the repo files should
not be skipped.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1572032
Signed-off-by: Sébastien Han <seb@redhat.com>
The apt-cache update can fail due to transient issues related to the
action being a network operation. To reduce the impact of these
transient failures this patch adds a retry to the update_cache task.
However, the apt_repository tasks which would perform an apt_update
won't retry the apt_update on a failure in the same way, as such this PR
moves the apt_update into an individual task, once per role.
Finally, the apt_repository tasks no longer have a changed_when: false,
and the apt_cache update is only performed once per role, if the
repositories change. Otherwise the cache is updated on the "apt" install
tasks if the cache_timeout has been reached.
the value in `docker_exec_client_cmd` doesn't allow to check for
existing pools because it's set with a wrong value for the entrypoint
that is going to be used.
It means the check were going to fail anyway even if pools actually exist.
Using jinja syntax to set `docker_exec_cmd` allows to handle the case
where you don't have monitors in your inventory.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If openstack_pools contains an application key it will be used to apply
this application pool type to a pool.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1562220
Signed-off-by: Sébastien Han <seb@redhat.com>
As of ceph 12.2.5 the type of the parameter `type` is not a name anymore but
an id, therefore an `int` is expected otherwise it will fail with the
following error
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
In addition to b324c17 this commit fix the ceph uid for osd role in the
switch from non containerized to containerized playbook.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If we don't do this, umounting devices declared like this
/dev/disk/by-id/ata-QEMU_HARDDISK_QM00001
will fail like:
umount: /dev/disk/by-id/ata-QEMU_HARDDISK_QM000011: mountpoint not found
Since we append '1' (partition 1), this won't work.
So we need to resolved the link to get something like /dev/sdb and then
append 1 to /dev/sdb1
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
The last mon creates the keys with a particular mode, while copying them
to the other mons (first and second) we must re-use the mode that was
set.
The same applies for the client node, the slurp preserves the initial
'item' so we can get the mode for the copy.
Signed-off-by: Sébastien Han <seb@redhat.com>
You can now create keys and set file mode on them. Use the 'mode'
parameter for that, mode must be in octal so 0644.
Signed-off-by: Sébastien Han <seb@redhat.com>
This key is created after the last mon is up so there is no need to try
to push it from the first mon. The initia mon container is not creating
the mgr key, ansible does. So this key will never exist.
The key will go into the fetch dir once the last mon is up, then when
the ceph-mgr plays it will try to get it from the fetch directory.
Signed-off-by: Sébastien Han <seb@redhat.com>
During the initial bootstrap of the first mon, the monmap file is
destroyed so it's not available and ansible will never find it.
Signed-off-by: Sébastien Han <seb@redhat.com>
Useful for softwares that do data collection/monitoring like collectd.
They can connect to the socket and then retrieve information.
Even though the sockets are exposed now, I'm keeping the docker exec to
check the socket, this will allow newer version of ceph-ansible to work
with older versions.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1563280
Signed-off-by: Sébastien Han <seb@redhat.com>
We now have the ability to detect the uid/gid of the ceph user depending
on the distribution we are running on and so we are doing non-container
deployements.
Signed-off-by: Sébastien Han <seb@redhat.com>