The dummy client container currently won't work on non-x86_64 hosts.
This PR creates a filtered client group that contains only hosts
that are x86_64, which can then be used as the group to run the
dummy container against.
This is for the specific case of a containerized_deployment where
there is a mixture of non-x86_64 hosts and x86_64 hosts. As such
the filtered group will contain all hosts when running with
containerized_deployment: false.
Currently ppc64le is not supported for Ceph server components.
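A minimal sketch of how such a filtered group can be built (the group name and condition here are illustrative, not necessarily the exact ones used by the playbook):
```
# Sketch only: build a "filtered_clients" group that contains every client
# host for non-containerized deployments, but only the x86_64 clients when
# containerized_deployment is true.
- name: add client hosts to the filtered_clients group
  add_host:
    name: "{{ item }}"
    groups: filtered_clients
  with_items: "{{ groups.get(client_group_name, []) }}"
  when: not containerized_deployment | bool or hostvars[item]['ansible_architecture'] == 'x86_64'
  run_once: true
```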
Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>
We have been supporting multiple devices for journals in containerized
deployments for a while now and forgot about this.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622393
Signed-off-by: Sébastien Han <seb@redhat.com>
Add a message for when PV creation fails.
This message alerts users that FS/GPT/RAID
signatures could still be on the device and be the
reason for the failure.
`wipefs -a $device` needs to be run to fix this issue.
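A rough sketch of the kind of hint this adds (the task and variable names are illustrative):
```
# Sketch only: if pvcreate failed, point the user at leftover signatures
# and the wipefs command that clears them.
- name: create a physical volume on the device
  command: "pvcreate {{ device }}"
  register: pvcreate_result
  failed_when: false

- name: explain why PV creation failed
  fail:
    msg: >
      PV creation failed on {{ device }}: FS/GPT/RAID signatures may still be
      present on the device. Run 'wipefs -a {{ device }}' and retry.
  when: pvcreate_result.rc != 0
```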
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Same problem again... ceph_release_num[ceph_release] is only set in the
ceph-docker-common/common roles, so putting the condition on that role
will never work. Removing the condition.
The downside of this is that we will be installing packages and then skipping
the role on the node.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622210
Signed-off-by: Sébastien Han <seb@redhat.com>
If we play site-docker.yml, we are already in a
containerized_deployment. So the condition is not needed.
Signed-off-by: Sébastien Han <seb@redhat.com>
There is no point in mixing hosts running on Atomic and hosts running on
CentOS, so let's run the containerized scenarios on Atomic only.
This solves this error here:
```
fatal: [client2]: FAILED! => {
"failed": true
}
MSG:
The conditional check 'ceph_current_status.rc == 0' failed. The error was: error while evaluating conditional (ceph_current_status.rc == 0): 'dict object' has no attribute 'rc'
The error appears to have been in '/home/jenkins-build/build/workspace/ceph-ansible-nightly-luminous-stable-3.1-ooo_collocation/roles/ceph-defaults/tasks/facts.yml': line 74, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: set_fact ceph_current_status (convert to json)
^ here
```
From https://2.jenkins.ceph.com/view/ceph-ansible-stable3.1/job/ceph-ansible-nightly-luminous-stable-3.1-ooo_collocation/37/consoleFull#1765217701b5dd38fa-a56e-4233-a5ca-584604e56e3a
What's happening here is that all the hosts except the clients are running Atomic, so here: https://github.com/ceph/ceph-ansible/blob/master/site-docker.yml.sample#L62
the condition skips all the nodes except the clients; thus, when running ceph-defaults, the task "is ceph running already?" is skipped, but the set_fact task shown in the error needs the rc of the skipped task.
This is not an error from the playbook, it's a CI setup issue.
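For reference, the mechanism behind the failure looks roughly like this (reconstructed from the error above, not the exact tasks): when the registering task is skipped, its registered variable has no `rc` key, so the follow-up condition cannot be evaluated.
```
# Sketch only: the first task is skipped on some hosts, so its registered
# variable has no 'rc' key there and the second condition blows up.
- name: is ceph running already?
  command: ceph --cluster ceph -s -f json
  register: ceph_current_status
  when: is_atomic | bool   # illustrative condition, skipped on the clients

- name: set_fact ceph_current_status (convert to json)
  set_fact:
    ceph_current_status: "{{ ceph_current_status.stdout | from_json }}"
  when: ceph_current_status.rc == 0   # 'rc' does not exist when the task above was skipped
```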
Signed-off-by: Sébastien Han <seb@redhat.com>
A couple of things were wrong in the initial commit:
* ceph_release_num[ceph_release] >= ceph_release_num['luminous'] will
never work since the ceph_release fact is only set later, in the roles;
either ceph-common or ceph-docker-common sets it
* we can easily re-use the initial command to check if a cluster is
running, it's more elegant than running it twice.
* set the fact rgw_hostname on rgw nodes only
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1618678
Signed-off-by: Sébastien Han <seb@redhat.com>
This file is not needed; if you want to generate a key you can run:
python -c "import os ; import struct ; import time; import base64 ; key
= os.urandom(16) ; header =
struct.pack('<hiih',1,int(time.time()),0,len(key)) ;
print(base64.b64encode(header + key).decode())"
Signed-off-by: Sébastien Han <seb@redhat.com>
The config_template plugin exists in the ceph-common role so that
config_template will still work with ansible galaxy.
This PR syncs the config_template module from the base of the repo in
plugins/actions to the ceph-common role.
Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>
Running 'osd set sortbitwise' when we detect a version 12 of Ceph is
wrong. When OSDs are getting updated, even though the package is updated
they won't send their updated version (12) and will stick with 10 if the
command is not applied. So we have to check if OSDs are sending a version
10 and then run the command to unlock the OSDs.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943
Signed-off-by: Sébastien Han <seb@redhat.com>
Recently we renamed the group_name for the iscsi gateways to iscsigws, where
previously it was named iscsi-gws. Existing deployments with a host file
section named iscsi-gws must continue to work.
This commit adds the old group name back for backward compatibility; no error
from Ansible should be expected, and if the hostgroup is not found nothing
is played.
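A minimal sketch of the pattern (the play content is illustrative): listing both group names means an inventory that only defines the legacy iscsi-gws section still matches, and an inventory that defines neither simply matches no host.
```
# Sketch only: target both the new and the legacy group name; a missing
# group results in an empty host list, not an error.
- hosts:
    - iscsigws
    - iscsi-gws
  gather_facts: false
  roles: []
```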
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1619167
Signed-off-by: Sébastien Han <seb@redhat.com>
We were using var_files long ago when default variables were not in
ceph-defaults; now that the role exists this is not needed. Moreover, having
these two var files added:
- roles/ceph-defaults/defaults/main.yml
- group_vars/all.yml
will create collisions and override necessary variables.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1555305
Signed-off-by: Sébastien Han <seb@redhat.com>
If there are no services on the cluster, then the 'rgw' key could be missing
and the task fails with the following problem:
msg: "The task includes an option with an undefined variable.
The error was: 'dict object' has no attribute 'rgw'"
We fix this by checking the existence of the 'rgw' attribute. If it's
missing, we skip the task since the role already contains code to set
a good default rgw_hostname.
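A minimal sketch of the guard, assuming the cluster status was registered in a variable named `ceph_current_status` (not the exact task from the role):
```
# Sketch only: only read the rgw entry of the servicemap when it exists;
# otherwise skip and let the role fall back to its default rgw_hostname.
- name: set_fact rgw_hostname from the servicemap
  set_fact:
    rgw_hostname: "{{ (ceph_current_status.servicemap.services.rgw.daemons | list) | first }}"
  when:
    - ceph_current_status.servicemap is defined
    - ceph_current_status.servicemap.services is defined
    - ceph_current_status.servicemap.services.rgw is defined
```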
Signed-off-by: Markos Chandras <mchandras@suse.de>
Since commit f422efb1d6 ("config: ensure
rgw section has the correct name") we observe the following failures in
new Ceph deployments with OpenStack-Ansible:
fatal: [aio1_ceph-rgw_container-fc588f0a]: FAILED! => {"changed": false,
"cmd": "ceph --cluster ceph -s -f json", "msg": "[Errno 2] No such file
or directory"
This is because the task executes 'ceph' but at this point no package
installation has happened. Packages are normally installed in the
'ceph-common' role which runs after the 'ceph-defaults' one.
Since we are looking to obtain cluster information, the task should be
delegated to a monitor node, similarly to other tasks in that role.
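A minimal sketch of the delegation (variable names follow ceph-ansible conventions, but this is not the exact task):
```
# Sketch only: run the status query on the first monitor, where the ceph CLI
# is known to be installed, and keep the result on the current host.
- name: is ceph running already?
  command: "ceph --cluster {{ cluster }} -s -f json"
  register: ceph_current_status
  changed_when: false
  failed_when: false
  delegate_to: "{{ groups[mon_group_name][0] }}"
```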
Signed-off-by: Markos Chandras <mchandras@suse.de>
Follow up on 36942af698
"disabled_modules" is always a list, it's the items in the list that
can be dicts in mimic. Many ways to fix this, here's one.
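One way of normalizing it, as a sketch (the actual fix may differ): extract the `name` attribute when the items are dicts, and keep the list as-is otherwise.
```
# Sketch only: in mimic each item is a dict such as {'name': 'dashboard', ...},
# on older releases it is a plain string; normalize to a list of names.
- name: set_fact _disabled_mgr_modules
  set_fact:
    _disabled_mgr_modules: >-
      {{ disabled_modules | map(attribute='name') | list
         if disabled_modules | length > 0 and disabled_modules[0] is mapping
         else disabled_modules }}
```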
Signed-off-by: Dardo D Kleiner <dardokleiner@gmail.com>
The copy module does in fact do variable interpolation so we do not need
to use the template module or keep a template in the source.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Using an explicitly named testing environment name allows us to have a
specific [testenv] block for this test. This greatly simplifies how it will
work as it doesn't really need anything from the ceph cluster tests.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
If a user decides not to use the lv_vars.yml file then it should fail
silently so that configuration can be picked up from other places.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
The copy module will not expand the template and render the variables
included, so we must use template.
Creating a temp file and using it locally means that you must run the
playbook with sudo privileges, which I don't think we want to require.
This introduces a logfile_path variable that the user can use to control
where the logfile is written to, defaulting to the cwd.
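A rough sketch of the idea, with a hypothetical template name (`logfile.j2`) for illustration:
```
# Sketch only: render the logfile with the template module so variables are
# expanded, and let logfile_path override the destination; default to the
# directory the playbook is run from.
- name: write the lv-create log file
  become: false
  delegate_to: localhost
  template:
    src: templates/logfile.j2
    dest: "{{ logfile_path | default(lookup('env', 'PWD')) }}/logfile.txt"
```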
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
These playbooks create and tear down logical
volumes for OSD data on HDDs and for a bucket index and
journals on 1 NVMe device.
Users should follow the guidelines set in var/lv_vars.yaml
After the lv-create.yml playbook is run, output is
sent to /tmp/logfile.txt for copy and paste into
osds.yml
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Before running the upgrade, let's call systemd to collect unit names
instead of relying on the device list. This is more accurate and fixes
the osd_auto_discovery scenario too.
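A rough sketch of collecting the unit names (the exact shell pipeline in the playbook may differ):
```
# Sketch only: ask systemd for the running ceph-osd@* units instead of
# deriving them from the devices list.
- name: collect running ceph-osd unit names
  shell: systemctl list-units --type=service --state=running --no-legend 'ceph-osd@*' | awk '{print $1}'
  register: osd_units
  changed_when: false
```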
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626
Signed-off-by: Sébastien Han <seb@redhat.com>
This reverts commit e84f11e99e.
This commit was causing a new failure later during the rolling_update
process. Basically, it was modifying the list of devices and started
impacting the ceph-osd role itself. The modification to accommodate the
osd_auto_discovery parameter should happen outside of ceph-osd.
Also, we are trying not to play the ceph-osd role during the rolling_update
process so we can speed up the upgrade.
Signed-off-by: Sébastien Han <seb@redhat.com>
In some cases, a user may mount a partition on /var/lib/ceph; unmounting
it would then fail, and there is no need to do so anyway.
Signed-off-by: Jeffrey Zhang <zhang.lei.fly@gmail.com>
The fqdn configuration option caused a lot of trouble: it adds a lot of
complexity because of the multiple cases and the relation between
ceph-ansible and ceph-container. Moreover, there is no benefit to such
a feature.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613155
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The ceph.conf.j2 template always assumes the hostname used to register the
radosgw in the servicemap is equivalent to `{{ ansible_hostname }}`,
which returns the shortname form.
We need to detect which form of the hostname was used in the case of an
already deployed cluster and update the ceph.conf accordingly.
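A minimal sketch of the detection, assuming the cluster status was registered as `ceph_current_status` (not the exact task):
```
# Sketch only: pick the hostname form (short or fqdn) that the running
# radosgw actually used when registering in the servicemap.
- name: set_fact rgw_hostname from the servicemap
  set_fact:
    rgw_hostname: "{{ item }}"
  with_items: "{{ ceph_current_status.servicemap.services.rgw.daemons | default({}) | list }}"
  when: item in [ansible_hostname, ansible_fqdn]
```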
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1580408
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
There is no need to have all these conditions.
For instance, assuming `mds_group_name` is set to 'mdss':
- `if groups[mds_group_name] is defined` checks if `'mdss'` is present in `{{ groups }}`
- `if {{ mds_group_name }} in group_names` checks if the current node is part
of the group `'mdss'`
- `if inventory_hostname in groups.get(mds_group_name, [])` checks if
the current node is part of the group 'mdss'
The third condition is enough to cover the need of ensuring we are
running on a mds node.
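As a sketch, the single remaining check looks like this (the task itself is illustrative):
```
# Sketch only: the one condition that is sufficient.
- name: a task that must only run on mds nodes
  debug:
    msg: "running on an mds node"
  when: inventory_hostname in groups.get(mds_group_name, [])
```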
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If calamari is already installed and ceph has been upgraded to a higher
version, the initialisation will fail later. So if we detect that the
calamari-server is too old compared to ceph_rhcs_version we try to update
it.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1601755
Signed-off-by: Sébastien Han <seb@redhat.com>
rolling_update relies on the list of devices when performing the restart
of the OSDs. The task that is building the devices list out of the
ansible_devices dict only runs when there are no partitions on the
drives. However, during an upgrade the OSDs are already configured, they
have been prepared and have partitions, so this task won't run and thus
the devices list will be empty, skipping the restart during
rolling_update. We now run the same task under different requirements
when rolling_update is true and build a list (see the sketch after this list) when:
* osd_auto_discovery is true
* rolling_update is true
* ansible_devices exists
* no dm/lv are part of the discovery
* the device is not removable
* the device has more than 1 sector
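A rough sketch of what such a task can look like (the conditions are approximations of the requirements above, not the exact task from the role):
```
# Sketch only: rebuild the devices list from ansible_devices during a
# rolling update with osd_auto_discovery, keeping only fixed disks that
# are not dm/lv devices and that actually have sectors.
- name: set_fact devices for rolling_update
  set_fact:
    devices: "{{ devices | default([]) + ['/dev/' + item.key] }}"
  with_dict: "{{ ansible_devices | default({}) }}"
  when:
    - rolling_update | bool
    - osd_auto_discovery | bool
    - item.value.holders | length == 0
    - "'dm-' not in item.key"
    - item.value.removable == "0"
    - item.value.sectors | int > 1
```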
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626
Signed-off-by: Sébastien Han <seb@redhat.com>