3771 Commits (988b5a81d3aafbe581436100d7b8c6ee7ea8ffe2)
 

Author SHA1 Message Date
Sébastien Han 988b5a81d3 take-over-existing-cluster: do not call var_files
We were using var_files long ago when default variables were not in
ceph-defaults; now that the role exists this is not needed. Moreover, having
these two var files added:

- roles/ceph-defaults/defaults/main.yml
- group_vars/all.yml

will create collisions and override necessary variables.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1555305
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit b738706810)
2018-08-20 14:47:32 +02:00
Markos Chandras b2de642c8e roles: ceph-defaults: Delegate cluster information task to monitor node
Since commit f422efb1d6 ("config: ensure
rgw section has the correct name") we observe the following failures in
new Ceph deployment with OpenStack-Ansible

fatal: [aio1_ceph-rgw_container-fc588f0a]: FAILED! => {"changed": false,
"cmd": "ceph --cluster ceph -s -f json", "msg": "[Errno 2] No such file
or directory"

This is because the task executes 'ceph' but at this point no package
installation has happened. Packages are normally installed in the
'ceph-common' role which runs after the 'ceph-defaults' one.

Since we are looking to obtain cluster information, the task should be
delegated to a monitor node similar to other tasks in that role

Signed-off-by: Markos Chandras <mchandras@suse.de>
(cherry picked from commit 37e50114de)
2018-08-20 14:18:07 +02:00
Markos Chandras e9433afd6c roles: ceph-defaults: Check if 'rgw' attribute exists for rgw_hostname
If there are no services on the cluster, then the 'rgw' could be missing
and the task is failing with the following problem:

msg": "The task includes an option with an undefined variable.
The error was: 'dict object' has no attribute 'rgw'

We fix this by checking the existence of the 'rgw' attribute. If it's
missing, we skip the task since the role already contains code to set
a good default rgw_hostname.
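
The existence check can be illustrated with a short Python sketch. It is not the Ansible task itself: the helper `rgw_daemons` is hypothetical, and the nesting `servicemap -> services -> rgw -> daemons` is an assumed layout for the cluster status JSON.

```python
def rgw_daemons(status):
    """Return the rgw daemon map from a status dict, or None if absent.

    The key layout used here is an assumption made for illustration.
    """
    services = status.get("servicemap", {}).get("services", {})
    if "rgw" not in services:
        # No rgw service registered: skip, so the caller keeps its
        # default rgw_hostname instead of failing on a missing key.
        return None
    return services["rgw"].get("daemons", {})
```

Returning `None` rather than raising mirrors the "skip the task" behaviour described above.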

Signed-off-by: Markos Chandras <mchandras@suse.de>
(cherry picked from commit 126e2e3f92)
2018-08-20 14:18:07 +02:00
Dardo D Kleiner 2c77e1ac4e mgr: improve/fix disabled modules check
Follow up on 36942af698

"disabled_modules" is always a list, it's the items in the list that
can be dicts in mimic.  Many ways to fix this, here's one.

Signed-off-by: Dardo D Kleiner <dardokleiner@gmail.com>
(cherry picked from commit f6519e4003)
2018-08-20 11:49:30 +00:00
Andrew Schoen f183be0328 lv-create: use copy instead of the template module
The copy module does in fact do variable interpolation so we do not need
to use the template module or keep a template in the source.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 04df3f0802)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Andrew Schoen 1decd53eb0 tests: cat the contents of lv-create.log in infra_lv_create
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit f5a4c89869)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Andrew Schoen 6081aea5a1 lv-create: add an example logfile_path config option in lv_vars.yml
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 131796f275)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Andrew Schoen c119150946 tests: adds a testing scenario for lv-create and lv-teardown
Using an explicitly named testing environment name allows us to have a
specific [testenv] block for this test. This greatly simplifies how it will
work as it doesn't really need anything from the ceph cluster tests.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 810cc47892)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Andrew Schoen 634cc14393 lv-teardown: fail silently if lv_vars.yml is not found
This allows users to opt out of using lv_vars.yml and load configuration
from other sources.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit b0bfc17351)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Andrew Schoen 09e4ef3371 lv-teardown: set become: true at the playbook level
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 8424858b40)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Andrew Schoen 293aaaf758 lv-create: fail silently if lv_vars.yml is not found
If a user decides not to use the lv_vars.yml file then it should fail
silently so that configuration can be picked up from other places.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit e43eec57bb)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Andrew Schoen 2648751488 lv-create: set become: true at the playbook level
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit fde47be13c)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Andrew Schoen 9af842467e lv-create: use the template module to write log file
The copy module will not expand the template and render the variables
included, so we must use template.

Creating a temp file and using it locally means that you must run the
playbook with sudo privileges, which I don't think we want to require.
This introduces a logfile_path variable that the user can use to control
where the logfile is written to, defaulting to the cwd.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 35301b35af)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Neha Ojha 7f44244d23 infrastructure-playbooks/vars/lv_vars.yaml: minor fixes
Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 909b38da82)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Neha Ojha db0e06cbb6 infrastructure-playbooks/lv-create.yml: use tempfile to create logfile
Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit f65f3ea89f)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Neha Ojha 89d950fd3c infrastructure-playbooks/lv-create.yml: add lvm_volumes to suggested paste
Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 65fdad0723)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Neha Ojha 1a0f7baf21 infrastructure-playbooks/lv-create.yml: copy without using a template file
Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 50a6d8141c)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Neha Ojha f1245e6011 infrastructure-playbooks/lv-create.yml: don't use action to copy
Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 186c4e11c7)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Neha Ojha 21902f0113 infrastructure-playbooks: standardize variable usage with a space after brackets
Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 9d43806df9)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Neha Ojha fb06c6cb80 vars/lv_vars.yaml: remove journal_device
Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit e0293de3e7)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Ali Maredia 10da777634 infrastructure-playbooks: playbooks for creating LVs for bucket indexes and journals
These playbooks create and tear down logical
volumes for OSD data on HDDs and for a bucket index and
journals on 1 NVMe device.

Users should follow the guidelines set in var/lv_vars.yaml

After the lv-create.yml playbook is run, output is
sent to /tmp/logfile.txt for copy and paste into
osds.yml

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 1f018d8612)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 17:01:41 +02:00
Sébastien Han 28fc45e346 Revert "osd: generate device list for osd_auto_discovery on rolling_update"
This reverts commit e84f11e99e.

This commit was giving a new failure later during the rolling_update
process. Basically, this was modifying the list of devices and started
impacting the ceph-osd itself. The modification to accommodate the
osd_auto_discovery parameter should happen outside of the ceph-osd.

Also we are trying to not play ceph-osd role during the rolling_update
process so we can speed up the upgrade.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 3149b2564f)
2018-08-16 13:35:23 +00:00
Sébastien Han 6f1499800f rolling_update: register container osd units
Before running the upgrade, let's call systemd to collect unit names
instead of relying on the device list. This is more accurate and fixes
the osd_auto_discovery scenario too.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit dad10e8f3f)
2018-08-16 13:35:23 +00:00
Sébastien Han 51de29046b contrib: fix generate group_vars samples
For ceph-iscsi-gw and ceph-rbd-mirror roles the group_name are named
differently (by default) than the role name so we have to change the
script to generate the correct name.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1602327
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 315ab08b16)
2018-08-14 17:51:41 +00:00
Jeffrey Zhang 19c7ca1983 Use /var/lib/ceph/osd folder to filter osd mount point
In some cases, a user may mount a partition at /var/lib/ceph; unmounting
it will fail, and there is no need to do so anyway.
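
A minimal Python sketch of the filtering idea, for illustration only. The helper `osd_mount_points` and the sample paths are hypothetical and are not taken from the playbook.

```python
OSD_PREFIX = "/var/lib/ceph/osd/"

def osd_mount_points(mount_points):
    """Keep only mount points located under the OSD data folder."""
    return [path for path in mount_points if path.startswith(OSD_PREFIX)]
```

With this filter a partition mounted directly at /var/lib/ceph is left alone, while a path such as /var/lib/ceph/osd/ceph-0 is still selected.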

Signed-off-by: Jeffrey Zhang <zhang.lei.fly@gmail.com>
(cherry picked from commit 85cc61a6d9)
2018-08-14 14:55:56 +00:00
Mike Christie c44638ae7e stable 3.1 igw: add api setting support
Port the parts of this upstream commit:

commit 91bf53ee93
Author: Sébastien Han <seb@redhat.com>
Date:   Fri Mar 23 11:24:56 2018 +0800

   ceph-iscsi: support for containerize deployment

that allows configuration of
API settings in roles/ceph-iscsi-gw/templates/iscsi-gateway.cfg.j2
using the iscsi-gws.yml.

This fixes Red Hat BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1613963

Signed-off-by: Mike Christie <mchristi@redhat.com>
2018-08-14 10:23:12 +02:00
Mike Christie 2b76e3771d stable 3.1 igw: enable and start rbd-target-api
Backport
https://github.com/ceph/ceph-ansible/pull/2984
to stable 3.1.

From upstream commit:

commit 1164cdc002
Author: Guillaume Abrioux <gabrioux@redhat.com>
Date:   Thu Aug 2 11:58:47 2018 +0200

    iscsigw: install ceph-iscsi-cli package

installs the cli package but does not start and enable the
rbd-target-api daemon needed for gwcli to communicate with the igw
nodes. This just enables and starts it.

This fixes Red Hat BZ
https://bugzilla.redhat.com/show_bug.cgi?id=1613963.

Signed-off-by: Mike Christie <mchristi@redhat.com>
2018-08-14 10:23:12 +02:00
Sébastien Han e7596d565f group_vars: resync missing options
resync group_vars file with the defaults/main.yml files.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2dd75a1e6e)
2018-08-13 18:55:06 +02:00
Guillaume Abrioux 904a0a4017 fail if fqdn deployment attempted
The possibility of fqdn configuration caused a lot of trouble; it adds a
lot of complexity because of multiple cases and the relation between
ceph-ansible and ceph-container. Moreover, there is no benefit for such
a feature.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613155
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-08-13 18:55:06 +02:00
Guillaume Abrioux 97cf08e897 config: ensure rgw section has the correct name
the ceph.conf.j2 always assumes the hostname used to register the
radosgw in the servicemap is equivalent to `{{ ansible_hostname }}`
which returns the shortname form.

We need to detect which form of the hostname was used in case of already
deployed cluster and update the ceph.conf accordingly.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1580408

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f422efb1d6)
2018-08-13 18:55:06 +02:00
Guillaume Abrioux 95c28e78d1 mgr: backward compatibility for module management
Follow up on 3abc253fec

The structure had even changed within `luminous` release.
It was first:

```
{
    "enabled_modules": [
        "balancer",
        "dashboard",
        "restful",
        "status"
    ],
    "disabled_modules": [
        "influx",
        "localpool",
        "prometheus",
        "selftest",
        "zabbix"
    ]
}
```
Then it changed for:

```
{
  "enabled_modules": [
      "status"
  ],
  "disabled_modules": [
      "balancer",
      "dashboard",
      "influx",
      "localpool",
      "prometheus",
      "restful",
      "selftest",
      "zabbix"
  ]
}
```

and finally:
```
{
  "enabled_modules": [
      "status"
  ],
  "disabled_modules": [
      {
          "name": "balancer",
          "can_run": true,
          "error_string": ""
      },
      {
          "name": "dashboard",
          "can_run": true,
          "error_string": ""
      }
  ]
}
```
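
One way to cope with all three layouts is to normalize every `disabled_modules` entry to its name. The Python sketch below is illustrative only: `disabled_module_names` is a hypothetical helper, not the check used in the role.

```python
import json

def disabled_module_names(module_json):
    """Return the disabled module names from JSON text in any layout above.

    Entries are plain strings in the two older layouts and dicts with a
    "name" key in the latest one.
    """
    data = json.loads(module_json)
    names = []
    for entry in data.get("disabled_modules", []):
        names.append(entry["name"] if isinstance(entry, dict) else entry)
    return names
```

Because the list itself is present in every layout, only the per-item type needs to be checked.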

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 36942af698)
2018-08-13 16:05:21 +00:00
Guillaume Abrioux 9a013ab333 tests: resync iscsigw group name with master
let's align the name of that group in stable-3.1 with master branch.

Not having the same group name on different branches is confusing and
makes some nightly jobs fail in the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-08-13 12:24:59 +02:00
Guillaume Abrioux 32ef06e80f tests: fix a typo in testinfra for iscsigws and jewel scenario
group name for iscsi-gw nodes in testing is `iscsi-gws`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-08-13 12:24:59 +02:00
Sébastien Han 8ea9d14050 osd: generate device list for osd_auto_discovery on rolling_update
rolling_update relies on the list of devices when performing the restart
of the OSDs. The task that is building the devices list out of the
ansible_devices dict only runs when there are no partitions on the
drives. However during an upgrade the OSDs are already configured, they
have been prepared and have partitions so this task won't run and thus
the devices list will be empty, skipping the restart during
rolling_update. We now run the same task under different requirements
when rolling_update is true and build a list when:

* osd_auto_discovery is true
* rolling_update is true
* ansible_devices exists
* no dm/lv are part of the discovery
* the device is not removable
* the device has more than 1 sector
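
The conditions above can be sketched in Python as a filter over an `ansible_devices`-like dict. This is an illustration, not the Ansible task: the helper `discover_devices` is hypothetical, and it assumes that device-mapper entries are named `dm-*` and that the `removable` and `sectors` facts are exposed as strings.

```python
def discover_devices(ansible_devices, osd_auto_discovery, rolling_update):
    """Build a sorted /dev/... list from an ansible_devices-like dict."""
    if not (osd_auto_discovery and rolling_update and ansible_devices):
        return []
    devices = []
    for name, facts in sorted(ansible_devices.items()):
        if name.startswith("dm-"):
            continue  # assumed naming for dm/lv entries
        if str(facts.get("removable", "0")) != "0":
            continue  # removable media is excluded
        if int(facts.get("sectors", 0)) <= 1:
            continue  # the device must have more than 1 sector
        devices.append("/dev/" + name)
    return devices
```

Unlike the original task, this filter does not look at partitions, which is why it still yields devices on an already deployed cluster.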

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit e84f11e99e)
2018-08-10 16:30:40 +02:00
Sébastien Han 4785799110 rolling_update: add role ceph-iscsi-gw
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1575829
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit e91648a7af)
2018-08-10 14:38:19 +02:00
Sébastien Han 12083bdab4 mon: fix calamari initialisation
If calamari is already installed and ceph has been upgraded to a higher
version the initialisation will fail later. So if we detect the
calamari-server is too old compared to ceph_rhcs_version we try to update
it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1601755
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4c9e24a90f)
2018-08-10 14:15:16 +02:00
Sébastien Han 651058bd1b rgw: remove useless condition
The include does not need a condition on containerized_deployment since
we are already in an include that has the same condition.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 5a89479abe)
2018-08-09 15:38:17 +02:00
Sébastien Han eba9547a6e rgw: remove unused file
copy_configs.yml was not included and is a leftover so let's remove it.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 3bce117de2)
2018-08-09 15:38:17 +02:00
Sébastien Han a16dc0e1de rgw: ability to use ceph-ansible vars into containers
Since the container now simply reads the ceph.conf, we remove all the
unnecessary options.

Also this PR is the foundation to support multiple backends, such as the
new 'beast' from Ceph Mimic.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1582411
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4d64dd4686)

# Conflicts:
#	roles/ceph-rgw/tasks/docker/main.yml
2018-08-09 15:38:17 +02:00
Ken Dreyer 1a2c6a3572 common: upgrade/install ceph-test deb first
When we deploy a Jewel cluster on Ubuntu with ceph_test: True, we're
unable to upgrade that cluster to Luminous.

"apt-get install ceph-common" fails to upgrade to luminous if a jewel ceph-test package is installed:

  Some packages could not be installed. This may mean that you have
  requested an impossible situation or if you are using the unstable
  distribution that some required packages have not yet been created
  or been moved out of Incoming.
  The following information may help to resolve the situation:

  The following packages have unmet dependencies:
   ceph-base : Breaks: ceph-test (< 12.2.2-14) but 10.2.11-1xenial is to be installed
   ceph-mon : Breaks: ceph-test (< 12.2.2-14) but 10.2.11-1xenial is to be installed

In ceph-ansible master, we resolve this whole class of problem by
installing all the packages in one operation (see
b338fafd90).

For the stable-3.1 branch, take a less-invasive approach, and upgrade
ceph-test prior to any other package. This matches the approach I took
for RPMs in 3752cc6f38, before we had the
better solution in b338fafd90.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1610997
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
2018-08-09 14:39:33 +02:00
Graeme Gillies 19958f5c27 Allow mgr bootstrap keyring to be defined
In environments where we wish to have manual/greater control over
how the bootstrap keyrings are used, we need to be able to externally
define what the mgr keyring secret will be and have ceph-ansible
use it, instead of it being autogenerated

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1610213

Signed-off-by: Graeme Gillies <ggillies@akamai.com>
(cherry picked from commit a46025820d)
2018-08-09 08:25:27 +00:00
Sébastien Han b00d2d0439 Resync rhcs_edits.txt
We were missing an option so let's add it back.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1519835
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 19518656a7)
2018-08-08 15:54:32 +02:00
Sébastien Han a31ce962f7 test: remove osd_crush_location from shrink scenarios
This is not needed since this is already covered by docker_cluster and
centos_cluster scenarios.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 50be3fd9e8)
2018-08-07 19:09:58 +00:00
Sébastien Han b76c7c3afe test: follow up on osd_crush_location for containers
This was fixed by
578aa5c2d5
on non-container, we need to apply the same fix for containers.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 77d4023fbe)
2018-08-07 19:09:58 +00:00
Guillaume Abrioux 9403a3df09 iscsigw: install ceph-iscsi-cli package
Install ceph-iscsi-cli in order to provide the `gwcli` command tool.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1602785

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1164cdc002)
2018-08-07 09:46:25 +02:00
Artur Fijalkowski 290035171f Fix in regular expression matching OSD ID on non-containerized
deployment.
restart_osd_daemon.sh is used to discover and restart all OSDs on a
host. To do so, the script loops over the list of ceph-osd@ services in
the system. This commit fixes a bug in the regular expression responsible
for extracting OSD IDs: the prior version used the `[0-9]{1,2}` expression,
which ignored all OSDs whose numbers are greater than 99 (thus
longer than 2 digits). The fix removes the upper limit on the number of
digits. This problem existed in two places in the script.
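
The effect of the digit limit can be reproduced with a small Python sketch. The unit names, the surrounding patterns, and the helper `osd_ids` are illustrative assumptions; they are not the contents of restart_osd_daemon.sh, which is a shell script.

```python
import re

# Hypothetical systemd unit names on an OSD host.
UNITS = ["ceph-osd@7.service", "ceph-osd@42.service", "ceph-osd@123.service"]

OLD_PATTERN = r"ceph-osd@([0-9]{1,2})\.service"  # at most two digits
NEW_PATTERN = r"ceph-osd@([0-9]+)\.service"      # any number of digits

def osd_ids(units, pattern):
    """Return the OSD IDs of the units whose full name matches `pattern`."""
    regex = re.compile(pattern)
    ids = []
    for unit in units:
        match = regex.fullmatch(unit)
        if match:
            ids.append(int(match.group(1)))
    return ids

print(osd_ids(UNITS, OLD_PATTERN))  # OSD 123 is missing
print(osd_ids(UNITS, NEW_PATTERN))  # all three OSDs are found
```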

Closes: #2964

Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com>
(cherry picked from commit 52d9d406b1)
2018-08-06 18:50:39 +00:00
Guillaume Abrioux 706d0b8289 defaults: backward compatibility with fqdn deployments
This commit ensures we are backward compatible with fqdn deployments.
Since ceph-container enforces deployment to be done with shortname, we
must keep backward compatibility with clusters already deployed with
fqdn configuration

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0a6ff6bbf8)
2018-08-06 14:09:35 +00:00
Sébastien Han 31dd4eeecf rolling_update: set osd sortbitwise
An upgrade from RHCS 2 to RHCS 3 will fail if the cluster still has
sortnibblewise set:
it stays stuck on "TASK [waiting for clean pgs...]" as RHCS 3 OSDs will
not start if nibblewise is set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit b3266c5be2)
2018-08-02 14:53:06 +00:00
Sébastien Han 2d5ed5ef8e config: enforce socket name
This was introduced by
59ee2e8d3b
and made our socket checks impossible to run. The PID could be found,
but the cctid cannot.
This happens during upgrade to mimic and on cluster running on mimic.

So let's force the admin socket the way it was so we can properly check
for existing instances. Also, the line $cluster-$name.$pid.$cctid.asok
is only needed when running multiple instances of the same daemon,
something ceph-ansible cannot do at the time of writing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1610220
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ea9e60d48d)
2018-08-02 12:34:48 +00:00
Guillaume Abrioux 826da2c385 tests: support update scenarios in test_rbd_mirror_is_up()
`test_rbd_mirror_is_up()` is failing on update scenarios because it
assumes the `ceph_stable_release` is still set to the value of the
original ceph release, so it won't enter the right branch of the
condition and fails.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d8281e50f1)
2018-08-02 10:06:55 +00:00