Problem: too many different commands to do the same thing. The 'cut'
command in infrastructure-playbooks/purge-cluster.yml was also wrong.
This sed command from osixia in ceph-docker
https://github.com/ceph/ceph-docker/pull/580/ addresses all the
scenarios.
Signed-off-by: Sébastien Han <seb@redhat.com>
Doing so will override any values set for these in the group_vars
directory relative to the user's inventory.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Doing so at the playbook level overrides whatever values might be set
for these in the user's group_vars directory relative to their
inventory.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This has the behavior of overriding custom values set in group_vars.
I've added defaults to the rest of the group names so that if they are
not overridden in group_vars then defaults will be used.
See: https://bugzilla.redhat.com/show_bug.cgi?id=1354700
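A sketch of the pattern (variable names follow the mon/osd convention
used in the roles; treat names and values as illustrative):

    mon_group_name: mons   # default, used unless overridden in group_vars
    osd_group_name: osds   # default, used unless overridden in group_vars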
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
When Ansible does not load the file host_vars/{{ ansible_hostname }}.yml or host_vars/default.yml it reports a syntax error, so add the "skip" keyword to fix it.
Exit the playbook if the user has not defined devices in either host_vars/{{ ansible_hostname }}.yml or host_vars/default.yml.
Load the device partition file from the host_vars directory:
1) If the user defines host_vars/hostname.yml, load the device partitions from that file.
2) Otherwise fall back to host_vars/default.yml.
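A minimal sketch of the include using the "skip" keyword (task name
illustrative):

    - name: load devices partition file
      include_vars: "{{ item }}"
      with_first_found:
        - files:
            - "host_vars/{{ ansible_hostname }}.yml"
            - "host_vars/default.yml"
          skip: true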
In the task 'waiting for the monitor to join the quorum...', the result of ceph -s | grep monmap only contains the monmap, not the quorum:
# ceph -s --cluster ceph | grep monmap
monmap e1: 3 mons at {sh-office-ceph-1=10.12.10.34:6789/0,sh-office-ceph-2=10.12.10.35:6789/0,sh-office-ceph-3=10.12.10.36:6789/0}
To check the monitor quorum, we should use this instead:
# ceph -s --cluster ceph | grep election
election epoch 80, quorum 0,1 sh-office-ceph-1,sh-office-ceph-2
ceph version: 10.2.5
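The wait task can then be written along these lines (a sketch; retries,
delay and variable names are illustrative):

    - name: waiting for the monitor to join the quorum...
      # tune retries/delay as needed
      shell: ceph -s --cluster {{ cluster }} | grep election | grep -sq {{ ansible_hostname }}
      register: result
      until: result.rc == 0
      retries: 5
      delay: 10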
We now run the container and wait until it dies. Prior to this we were
stopping it before completion, so not all the devices were zapped.
Signed-off-by: Sébastien Han <seb@redhat.com>
Some playbooks use [0-9]*, others use \d+$.
The latter is more correct since the cluster name may contain numbers.
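For example, with a cluster named "ceph2" and OSD id 12 (made-up names):
# echo ceph2-12 | grep -o '[0-9]*' | head -1
2
# echo ceph2-12 | grep -oP '\d+$'
12
The unanchored pattern picks up the digit inside the cluster name, while
the anchored \d+$ returns the OSD id.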
Signed-off-by: Shengjing Zhu <zsj950618@gmail.com>
This is to allow ceph-installer to use this playbook and to ensure that
you have the correct keys locally when bootstrapping.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
We changed the way we declare the image.
Prior to this patch we had to use a "user/image:tag"
format, which is incompatible with non-Docker Hub registries where you
usually don't have a "user". On Docker Hub a "user" is also
identified as a namespace, so for Ceph the user was "ceph".
Variables have been simplified with only:
* ceph_docker_image
* ceph_docker_image_tag
1. For Docker Hub images: ceph_docker_image: "ceph/daemon" will give
you the 'daemon' image of the 'ceph' user.
2. For non-Docker Hub images: ceph_docker_image: "daemon" will simply
give you the "daemon" image.
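For example (tag values illustrative):

    # Docker Hub image, "user" is part of the image name:
    ceph_docker_image: "ceph/daemon"
    ceph_docker_image_tag: "latest"

    # Other registry, no namespace:
    ceph_docker_image: "daemon"
    ceph_docker_image_tag: "latest"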
Infrastructure playbooks have been modified as well.
The file group_vars/all.docker.yml.sample has also been removed. It was
hard to maintain since we had to generate it manually. If you want to
configure specific variables for a specific daemon, simply edit
group_vars/$DAEMON.yml
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1420207
Signed-off-by: Sébastien Han <seb@redhat.com>
Including variables from role defaults or files in a group_vars
directory relative to the playbook is a bad practice. We don't want to
do this because including these defaults at the task level overrides
values that would be set in a group_vars directory relative to the
inventory file, which is the correct usage if you wish to override
those default values.
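The removed pattern looked roughly like this (path illustrative):

    # loading role defaults at the task level overrides inventory group_vars
    - name: include default vars
      include_vars: roles/ceph-osd/defaults/main.yml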
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
According to #1216, we need to simplify the code by removing support
for anything before Jewel.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Doing this causes all the daemons to go down at the same time. In a
scenario where we colocate a monitor and an OSD, the OSDs will take
some time to go down, which will make the 'umount' task fail.
Signed-off-by: Sébastien Han <seb@redhat.com>
On systems running Docker there is an issue with lxcfs that results in
the find command returning 1 even though it actually did the job.
e.g.: on a system with Docker running, find /var will give us the
following error:
find: '/var/lib/lxcfs/cgroup/devices/lxc/x1/system.slice/systemd-update-utmp.service/devices.deny': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/lxc/x1/system.slice/dev-random.mount/devices.allow': Permission denied
...
...
However, the Ceph files did get deleted, so we ignore the error.
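One way to ignore it, sketched (task name and path illustrative):

    - name: remove remaining ceph files
      shell: find /var/lib/ceph -name 'ceph*' -delete
      failed_when: false  # lxcfs can make find return 1 even though the files were removed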
Signed-off-by: Sébastien Han <seb@redhat.com>
We now rely on the CLI tool ceph-detect-init, which tells us the init
system in use on the distribution. We do this instead of the previous
lookup for systemd unit files, and call the right task depending on the
init system.
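A sketch of the lookup (include file name illustrative):

    - name: detect init system
      command: ceph-detect-init
      register: init_system

    - include: stop-daemons-systemd.yml
      when: init_system.stdout == 'systemd'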
Signed-off-by: Sébastien Han <seb@redhat.com>
with_items is evaluated before the when, so in a second run where the
variable is empty it will fail with "'dict object' has no attribute
'stdout_lines'". To fix this we add a default array so with_items does
not fail and the task is skipped by the when.
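The fix, sketched (variable name and command illustrative):

    - name: zap ceph journal partitions
      shell: sgdisk --zap-all {{ item }}
      with_items: "{{ journal_partitions.stdout_lines | default([]) }}"
      when: journal_partitions.stdout_lines is defined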
Signed-off-by: Sébastien Han <seb@redhat.com>
Because the purge-cluster.yml playbook does not have access to the roles'
default vars, we cannot be sure that raw_multi_journal is defined. For
example, if this was purging a dmcrypt journal then raw_multi_journal
might not be defined at all in group_vars/all.yml or
group_vars/osds.yml.
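Tasks that consume it therefore need a guard along these lines (sketch):

    when: raw_multi_journal is defined and raw_multi_journal | bool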
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This playbook was still referencing the old version of the
ceph_*_docker_extra_env but only for Ceph MONs and Ceph NFS. This
playbook was not kept up-to-date when updating the
ceph_*_docker_extra_env variables to add the '-e' option to docker.
That's because the addition of '-e' breaks this playbook, as it requires
a comma-separated list of variables for the 'env:' docker module
parameter. Therefore this change just makes the playbook consistently
broken by referencing the same variable throughout.
When running encrypted OSDs, an encrypted device mapper is used
(created by the cryptsetup tool). So before attempting to remove all the
partitions on a device we must first delete all the encrypted device
mappers; only then can we delete the partitions.
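Sketch of the ordering (device and variable names illustrative):

    - name: remove the encrypted device mappers
      command: dmsetup remove {{ item }}
      with_items: "{{ encrypted_mappers }}"  # collected beforehand

    - name: zap the partitions
      command: sgdisk --zap-all {{ item }}
      with_items: "{{ devices }}"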
Signed-off-by: Sébastien Han <seb@redhat.com>
The name of this variable was a bit confusing since enabling it will
zap all the block devices no matter which OSD scenario we are using.
Removing this variable and applying a condition on the OSD scenario is
now feasible and easier since we import group_vars variable files for
OSDs.
Signed-off-by: Sébastien Han <seb@redhat.com>
When purging OSDs we do not need to include these defaults as nothing in
the following tasks uses them. Also, it has the side effect of
overwriting, with the default values, any variables defined in
group_vars files relative to the inventory you are using. That behavior
was causing the CI tests to fail.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
In my testing zapping the osd disks deleted the journal
partitions, making the 'zap ceph journal partitions' task fail because
the partitions it found previously no longer exist.
This moves the task that finds the journal partitions after 'zap osd disks'
to catch any partitions ceph-disk might have missed.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Using failed_when will still throw an exception and stop the playbook if
the file you're trying to include doesn't exist.
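Illustration of the problem (path illustrative):

    - include_vars: group_vars/osds.yml
      failed_when: false  # the play still aborts if the file does not exist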
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
The libcephfs version was bumped to 2, so we need to check for that as
well when we're removing all Ceph packages.
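A sketch of the package list change (surrounding entries omitted):

    - libcephfs1
    - libcephfs2  # added: soname bumped to 2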
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Prior to this patch we were just looking for any *.conf file, which
could sometimes result in multiple matches. The new command looks for a
.conf file that must contain the [global] and 'fsid' patterns. This will
reliably get us the ceph.conf file. We cannot use the name ceph.conf
directly because the cluster may have a different name.
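The new lookup is along these lines (filename illustrative):
# grep -l '\[global\]' /etc/ceph/*.conf | xargs grep -l fsid
/etc/ceph/mycluster.conf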
Signed-off-by: Sébastien Han <seb@redhat.com>
* ignore yml files in general
* refactor based on commit f8e043b6ea5ac4e886532d4f2f675c507b44b955 that
changed directory layouts
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ec5c6f5da566611c4e0b88f925cbd26dc90368d6)
Prior to this commit the serial variable was poorly documented. Now we
make it clear that this value should be left untouched, as the rolling
update mechanism must happen serially.
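The documented setting, sketched (comment wording illustrative):

    # DO NOT change this value, rolling updates must happen serially
    serial: 1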
Solves: bz-1396742
Signed-off-by: Sébastien Han <seb@redhat.com>
libfcgi is dead upstream (http://tracker.ceph.com/issues/16784)
The RGW developers intend to remove libfcgi support entirely before the
Luminous release.
Since libfcgi gets little-to-no developer attention or testing, remove
it entirely from ceph-ansible.
- Update rolling update playbook to support containerized deployments
for mons, osds, mdss, and rgws
- Skip checking if existing cluster is running when performing a rolling
update
- Fixed bug where we were failing to start the mds container because it
was missing the admin keyring. The admin keyring was missing because
it was not being pushed from the mon host to the ansible host due to
the keyring not being available before running the copy_configs.yml
task include file. Now we forcefully wait for the admin keyring to be
generated before continuing with the copy_configs.yml task include file
- Skip pre_requisite.yml when running on Atomic Host. This means we
technically no longer need to specify skipping tasks tagged with_pkg
- Add missing variables to all.docker.sample
- Misc. cleanup
Signed-off-by: Ivan Font <ifont@redhat.com>
My stupid self removed this crucial variable here: 217ce3ca, thinking it
was another hard-coded variable import, whereas it is actually the
trigger for the upgrade.
Closes: #1071
Signed-off-by: Sébastien Han <seb@redhat.com>
- Updates to allow running infrastructure-playbooks either from within
its own directory or from the root directory of ceph-ansible.
Signed-off-by: Ivan Font <ifont@redhat.com>
- Separated out one large playbook into multiple playbooks to run
host-type by host-type i.e. mdss, rgws, rbdmirrors, nfss, osds, mons.
- Combined common tasks into one shared task for all hosts where
applicable
- Fixed various bugs
Signed-off-by: Ivan Font <ivan.font@redhat.com>
Prior to this change we were purging all the partitions on the device
when using the raw_journal_devices scenario.
This was breaking deployments where other partitions are used for other
purposes (e.g. the OS).
Signed-off-by: Sébastien Han <seb@redhat.com>
This commit introduces the ability to configure delays and retries for
cluster health checks, for both monitors and OSDs.
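A sketch of the new knobs (variable names and values illustrative):

    handler_health_mon_check_retries: 5
    handler_health_mon_check_delay: 10
    handler_health_osd_check_retries: 40
    handler_health_osd_check_delay: 30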
Signed-off-by: Sébastien Han <seb@redhat.com>
Users have reported this task to hang. Since this command is not
required to perform the upgrade, we remove it.
Signed-off-by: Sébastien Han <seb@redhat.com>
- Add all relevant group_vars files in the containerized purge cluster
playbook and ignore errors if a file does not exist.
- Also fix indentation issues.
Signed-off-by: Ivan Font <ivan.font@redhat.com>
Since we have a couple of infrastructure-related playbooks (in addition
to the roles we use to deploy Ceph), it makes sense to have them located
in a separate directory.
Signed-off-by: Sébastien Han <seb@redhat.com>