Backport of https://github.com/ceph/ceph-ansible/pull/2984 to stable-3.1.
From upstream commit:
commit 1164cdc002
Author: Guillaume Abrioux <gabrioux@redhat.com>
Date: Thu Aug 2 11:58:47 2018 +0200
iscsigw: install ceph-iscsi-cli package
That commit installs the CLI package but does not enable and start the
rbd-target-api daemon that gwcli needs to communicate with the iSCSI
gateway (igw) nodes. This change just enables and starts it.
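A minimal sketch of that step (the task name and module options are illustrative, not the exact backported task):
```
# Make sure the API daemon used by gwcli is enabled and running on the
# iSCSI gateway nodes. Task name and options are illustrative.
- name: start and enable rbd-target-api
  systemd:
    name: rbd-target-api
    state: started
    enabled: yes
```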
This fixes Red Hat BZ
https://bugzilla.redhat.com/show_bug.cgi?id=1613963.
Signed-off-by: Mike Christie <mchristi@redhat.com>
The fqdn configuration possibility caused a lot of trouble: it adds a lot
of complexity because of the multiple cases and the relation between
ceph-ansible and ceph-container. Moreover, there is no benefit to such
a feature.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613155
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
ceph.conf.j2 always assumes the hostname used to register the
radosgw in the servicemap is equivalent to `{{ ansible_hostname }}`,
which returns the shortname form.
We need to detect which form of the hostname was used on an already
deployed cluster and update ceph.conf accordingly.
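A rough sketch of the idea (task names, the `rgw_hostname` fact and the detection logic are assumptions, not the literal change; containerized handling is omitted):
```
# Ask the cluster which name the rgw registered in the servicemap and reuse
# that form in ceph.conf, falling back to the shortname otherwise.
- name: check which hostname form the rgw registered in the servicemap
  command: "ceph --cluster {{ cluster }} service dump -f json"
  register: servicemap
  changed_when: false
  delegate_to: "{{ groups[mon_group_name][0] }}"

- name: set rgw_hostname
  set_fact:
    rgw_hostname: "{{ ansible_fqdn if ansible_fqdn in servicemap.stdout else ansible_hostname }}"
```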
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1580408
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f422efb1d6)
let's align the name of that group in stable-3.1 with the master branch.
Not having the same group name on different branches is confusing and
makes some nightly jobs fail in the CI.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
rolling_update relies on the list of devices when performing the restart
of the OSDs. The task that is building the devices list out of the
ansible_devices dict only runs when there are no partitions on the
drives. However, during an upgrade the OSDs are already configured: they
have been prepared and have partitions, so this task won't run and the
devices list will be empty, which skips the restart during
rolling_update. We now run the same task under different requirements
when rolling_update is true, and build a list when:
* osd_auto_discovery is true
* rolling_update is true
* ansible_devices exists
* no dm/lv are part of the discovery
* the device is not removable
* the device has more than 1 sector
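A minimal sketch of such a task (condition expressions and names are illustrative, not the literal backported code):
```
# Build the devices list from ansible_devices during a rolling update so the
# OSD restart has something to iterate over. Conditions mirror the list above.
- name: build devices list for rolling_update
  set_fact:
    devices: "{{ devices | default([]) + ['/dev/' + item.key] }}"
  with_dict: "{{ ansible_devices | default({}) }}"
  when:
    - rolling_update | bool
    - osd_auto_discovery | bool
    - item.key is not match("dm-")
    - item.value.holders | length == 0
    - item.value.removable == "0"
    - item.value.sectors | int > 1
```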
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit e84f11e99e)
If calamari is already installed and ceph has been upgraded to a higher
version, the initialisation will fail later. So if we detect that the
calamari-server is too old compared to ceph_rhcs_version, we try to
update it.
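A sketch of the idea (the real change compares versions against ceph_rhcs_version; the tasks below are illustrative and simply upgrade the package when it is present):
```
# Only touch calamari-server when it is already installed; otherwise leave
# the node alone. The version comparison is simplified to "upgrade to latest".
- name: check whether calamari-server is installed
  command: rpm -q calamari-server
  register: calamari_installed
  failed_when: false
  changed_when: false

- name: upgrade calamari-server if it is present
  package:
    name: calamari-server
    state: latest
  when: calamari_installed.rc == 0
```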
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1601755
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4c9e24a90f)
The include does not need a condition on containerized_deployment since
we are already in an include that has the same condition.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 5a89479abe)
copy_configs.yml was not included anywhere and is a leftover, so let's remove it.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 3bce117de2)
Since the container now simply reads the ceph.conf, we remove all the
unnecessary options.
This PR is also the foundation to support multiple backends, such as the
new 'beast' from Ceph Mimic.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1582411
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4d64dd4686)
# Conflicts:
# roles/ceph-rgw/tasks/docker/main.yml
When we deploy a Jewel cluster on Ubuntu with ceph_test: True, we're
unable to upgrade that cluster to Luminous.
"apt-get install ceph-common" fails to upgrade to luminous if a jewel ceph-test package is installed:
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
ceph-base : Breaks: ceph-test (< 12.2.2-14) but 10.2.11-1xenial is to be installed
ceph-mon : Breaks: ceph-test (< 12.2.2-14) but 10.2.11-1xenial is to be installed
In ceph-ansible master, we resolve this whole class of problems by
installing all the packages in one operation (see
b338fafd90).
For the stable-3.1 branch, take a less-invasive approach, and upgrade
ceph-test prior to any other package. This matches the approach I took
for RPMs in 3752cc6f38, before we had the
better solution in b338fafd90.
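A minimal sketch of that approach (not the literal task added in the backport):
```
# Upgrade ceph-test first so apt can honour the Breaks relationship before
# the remaining ceph packages are upgraded.
- name: upgrade ceph-test before the other ceph packages
  apt:
    name: ceph-test
    state: latest
    update_cache: yes
  when: ansible_os_family == 'Debian'
```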
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1610997
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
In environments where we wish to have manual/greater control over
how the bootstrap keyrings are used, we need to be able to externally
define what the mgr keyring secret will be and have ceph-ansible
use it, instead of it being autogenerated.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1610213
Signed-off-by: Graeme Gillies <ggillies@akamai.com>
(cherry picked from commit a46025820d)
This is not needed since this is already covered by docker_cluster and
centos_cluster scenarios.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 50be3fd9e8)
This was fixed by
578aa5c2d5
for non-containerized deployments; we need to apply the same fix for containers.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 77d4023fbe)
restart_osd_daemon.sh is used to discover and restart all OSDs on a
host. To do this, the script loops over the list of ceph-osd@ services on
the system. This commit fixes a bug in the regular expression responsible
for extracting the OSD IDs: the prior version used the `[0-9]{1,2}`
expression, which ignores all OSDs whose IDs are greater than 99 (thus
longer than 2 digits). The fix removes the upper limit on the number of
digits. This problem existed in two places in the script.
Closes: #2964
Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com>
(cherry picked from commit 52d9d406b1)
This commit ensures we are backward compatible with fqdn deployments.
Since ceph-container enforces deployments to be done with the shortname,
we must keep backward compatibility with clusters already deployed with
an fqdn configuration.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0a6ff6bbf8)
Upgrading RHCS 2 -> RHCS 3 will fail if the cluster still has
sortnibblewise set: it stays stuck on "TASK [waiting for clean pgs...]"
because RHCS 3 OSDs will not start while nibblewise sorting is in effect.
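A sketch of the corresponding fix (task name and delegation are illustrative; the point is to flip the cluster to bitwise sorting before waiting for clean PGs):
```
# Set the sortbitwise flag on the cluster before the upgrade proceeds, so the
# RHCS 3 / Luminous OSDs are able to start.
- name: set sortbitwise before upgrading the OSDs
  command: "ceph --cluster {{ cluster }} osd set sortbitwise"
  delegate_to: "{{ groups[mon_group_name][0] }}"
  run_once: true
```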
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit b3266c5be2)
This was introduced by
59ee2e8d3b
and made our socket checks impossible to run: the PID could be found,
but the cctid could not.
This happens during upgrades to mimic and on clusters running mimic.
So let's force the admin socket back to the way it was so we can properly
check for existing instances. The $cluster-$name.$pid.$cctid.asok form
is only needed when running multiple instances of the same daemon,
something ceph-ansible cannot do at the time of writing.
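For illustration only (this is not the backported change itself), "the way it was" corresponds to the historical, predictable socket path form, shown here via ceph_conf_overrides:
```
# Pin the admin socket back to the $cluster-$name.asok form so the socket
# checks can find it without knowing pid/cctid.
ceph_conf_overrides:
  global:
    admin socket: "$run_dir/$cluster-$name.asok"
```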
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1610220
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ea9e60d48d)
`test_rbd_mirror_is_up()` is failing on update scenarios because it
assumes `ceph_stable_release` is still set to the value of the
original ceph release, which means it doesn't enter the right branch of
the condition and fails.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d8281e50f1)
We were not passing the ceph conf info to the rbd image removal
command, so if the cluster name was not the default, the igw purge would
fail because the rbd rm command failed.
This just fixes the bug by passing in the ceph conf info, which has the
cluster name to use.
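A rough illustration of the point (the task and variable below are hypothetical and not the purge code itself, which lives in a Python module):
```
# Hypothetical task and variable, for illustration only: the rbd command must
# be told which cluster/conf to use, otherwise it defaults to 'ceph'.
- name: remove the iscsi gateway backing rbd images
  command: "rbd --cluster {{ cluster }} rm {{ item }}"
  with_items: "{{ rbd_images_to_remove | default([]) }}"
```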
This fixes Red Hat bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1601949
Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit d572a9a602)
Instead of failing the entire purge operation when the rbd command fails,
just log an error. This allows the higher level target and config
cleanup to complete, and the user only has to manually delete the rbd
images.
Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 6f72f96dad)
The container runs with --rm which means it will be deleted by Docker
when exiting. Also 'docker rm -f' is not idempotent and returns 1 if the
container does not exist.
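A sketch of how such a removal can be made tolerant (illustrative; the container name is an assumption, and the backport may simply drop the task since --rm already cleans up):
```
# Removing a container that may already be gone must not fail the play.
- name: remove the ceph-osd container if it is still present
  command: "docker rm -f ceph-osd-{{ ansible_hostname }}"
  failed_when: false
  changed_when: false
```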
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1609007
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2ca8c51906)
If we want to be backward compatible with releases prior to luminous, we
have to set the rule name according to the default values used in jewel.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 053709da97)
rbd-mirror can't start when deploying jewel because it needs the admin
keyring.
Bringing back this task restores backward compatibility for jewel
deployments.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1ecbbbdcfa)
Let's not deploy the common roles on iscsigw nodes for jewel deployments.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a1ca2c8fd3)
jewel used to create a default `rbd` pool in the default crush root
`default`; we need at least 1 OSD to satisfy the PGs for this pool,
otherwise the cluster will be in HEALTH_ERR state because of
`pgs stuck unclean`/`pgs stuck inactive`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 578aa5c2d5)
The problem is that rbd-target-gw needs the rbd pool to be created, the
keyring to be copied over, and the iscsi-gateway.cfg to be set up before
the rbd-target-gw service is started.
In the master branch this is fixed by this commit:
commit 91bf53ee93
Author: Sébastien Han <seb@redhat.com>
Date: Fri Mar 23 11:24:56 2018 +0800
ceph-iscsi: support for containerize deployment
where the needed setup tasks are done in common.yml, which runs before
prerequisites.yml.
To avoid porting all those changes to 3.1, this patch just moves the
rbd-target-gw startup to configure_iscsi.yml, after everything has
been set up.
This fixes Red Hat BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1601325
Signed-off-by: Mike Christie <mchristi@redhat.com>
In containerized deployments we now inherit from radosgw_civetweb_options
when bootstrapping the container.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1582411
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit e2ea5bac51)
When distributing the ceph-nfs role, creation of the rados index object
fails because it assumes client.admin is available locally.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1607970
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
(cherry picked from commit e85e5ea781)
Let's create a dedicated environment for these scenarios; there is no
need to deploy everything.
Doing so will also save some time.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b89cc1746f)
In addition to ceph/ceph-build#1082, let's set the ansible version in
each ceph-ansible branch's respective requirements.txt.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since no latest-bis-jewel image exists, it's using latest-bis, which
points to ceph mimic. In our testing, using it for idempotency/handlers
tests means upgrading from jewel to mimic, which is not what we want to do.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 05852b0301)
Since `V2.6-stable` is available and has packages for `mimic`, let's
update this default value accordingly so nfs nodes can be deployed with
mimic.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1a626d3c61)
Relying on `copy_admin_key` to import the created keys on client nodes
forces us to copy the admin key to those nodes, which is not something we
necessarily want.
We should use the fact `condition_copy_admin_key`, which is set to
`True` when the delegated node is a mon, meaning we can import keys
without having to deal with the admin keyring.
Fixes: #2867
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ef5fcd0b6)
Follow-up on #2784.
We must check the generated fact `_disabled_ceph_mgr_modules` to
enable disabled mgr modules.
Otherwise, this task is skipped because it is not comparing the
right list.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600155
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ce5ac930c5)
Since the ooo_collocation scenario is supposed to be the same scenario as
the one tested by OSP, and it does not pass `rgw_create_pools`, the test
`test_docker_rgw_tuning_pools_are_set` will fail:
```
> pools = node["vars"]["rgw_create_pools"]
E KeyError: 'rgw_create_pools'
```
Skipping this test when `node["vars"]["rgw_create_pools"]` is not defined
fixes this failure.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1c3dae4a90)
The CI deploys an iscsigw node anyway, but the gateway itself is not
deployed; let's skip the test accordingly.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2d560b562a)
These tests are skipped on bluestore OSD scenarios.
They were going to fail anyway since they run on mon nodes while
`devices` is defined in the inventory for each OSD node, which means
`num_devices * num_osd_hosts` returns `0`.
The result is that the test expects to have 0 OSDs up.
The idea here is to move these tests so they run on OSD nodes.
Each OSD node checks that its own OSDs are up: if a node has 2 devices
defined in the `devices` variable, we check for 2 OSDs to be up on that
node. If each node has all of its OSDs up, we can say all OSDs are up.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fe79a5d240)
At the moment, a lot of tests are skipped when daemons are collocated.
Our tests consider that a node belongs to only 1 group, while in certain
scenarios it can belong to multiple groups.
Also pin pytest to 3.6.1 so we can use `request.node.iter_markers()`.
Co-Authored-by: Alfredo Deza <adeza@redhat.com>
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d83b24d271)
We must initialize the `children` variable in `_get_osd_id_from_host()`;
otherwise, if for any reason the deployment has failed and results in an
OSD host with no OSD registered, we never enter the condition, so
`children` is never set and the function tries to return something
undefined.
Typical error:
```
E UnboundLocalError: local variable 'children' referenced before assignment
```
Fixes: #2860
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9a65ec231d)
Avoid duplicating tests unnecessarily just because of the docker exec
syntax. Using the same logic as in the playbook with `docker_exec_cmd`
allows us to execute the same test on both containerized and
non-containerized environments.
The idea is to set a `docker_exec_cmd` variable to the
'docker exec <container-name>' string when containerized and
to '' when not containerized.
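The playbook-side pattern being mirrored looks roughly like this (container name and task names are illustrative):
```
# Prefix ceph commands with 'docker exec <container>' only when the deployment
# is containerized; otherwise the prefix stays empty.
- name: set docker_exec_cmd when containerized
  set_fact:
    docker_exec_cmd: "docker exec ceph-mon-{{ ansible_hostname }}"
  when: containerized_deployment | bool

- name: get ceph status
  command: "{{ docker_exec_cmd | default('') }} ceph --cluster {{ cluster }} -s"
  changed_when: false
```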
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f2e57a56db)