Commit Graph

3668 Commits (c65ea7e9d7962a7b4c2872fa0170a48214feb882)
 

Author SHA1 Message Date
Andrew Schoen c65ea7e9d7 site-docker: validate config before pulling container images
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen 890e265fd3 validate: adds a CEPH_RELEASES constant
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen d30a99c350 validate: add support for containerized_deployment
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen 5d64eb79c1 validate: show an error and stop the playbook when notario is missing
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen 62d6f2d84a site-docker.yml: add config validation play
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen a80a109ac9 site.yml: the validation play must use become: true
The ceph-defaults role expects this.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen 12bdb8ef87 docs: add instructions for installing ansible and notario
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen ef48ed4e5a adds a requiremnts.txt file for the project
With the addition of the validate module we need to ensure
that notario is installed. This will be done with the use
of this requirments.txt file and pip.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen dea1ea93d5 tests: use notario>=0.0.13 when testing
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen f84c2ba27b ceph-defaults: fix failing tasks when osd_scenario was not set correctly
When devices is not defined because you want to use the 'lvm'
osd_scenario but you've made a mistake selecting that scenario these
tasks should not fail.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen 91f65e2420 validate: improve error messages when config fails validation
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen d83bdce8a9 site.yml: abort playbook when it fails during config validation
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen 1f15a81c48 ceph-defaults: move cephfs vars from the ceph-mon role
We're doing this so we can validate this in the ceph-validate role

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen ffe05872ac validate: only validate cephfs_pools on mon nodes
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen 760a1afc21 validate: only validate osd config options on osd hosts
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen 4325ccc857 validate: only check mon and rgw config if the node is in those groups
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen b2b905f47e site.yml: remove the testing task that fails the playbook run
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen 48c2a4fda8 validate: check rados config options
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen 377fe81c10 validate: make sure ceph_stable_release is set to the correct value
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen ba7f09c0a7 ceph-validate: move var checks from ceph-common into this role
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen 32bac6b491 ceph-validate: move var checks from ceph-osd into this role
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen 29a9dffc83 ceph-validate: move ceph-mon config checks into this role
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen d87a32347f adds a new ceph-validate role
This will be used to validate config given to ceph-ansible.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen 692ab26734 validate: validate osd_scenarios
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen 4baa8389e0 validate: check monitor options
validates monitor_address, monitor_address_block and monitor_interface

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen 4008d700a4 site.yml: move validate task to it's own play
This needs to be in it's own play with ceph-defaults included
so that I can validate things that might be defaulted in that
role.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen 9f68dad2ff validate: first pass at validating the install options
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Alfredo Deza 0ace2e9534 site: add validation task
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2018-05-18 17:58:24 +02:00
Alfredo Deza 86a32071e8 rpm: add python-notario as a dependency for validation
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2018-05-18 17:58:24 +02:00
Alfredo Deza e33608ec16 library: add a placeholder module for the validate action plugin
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2018-05-18 17:58:24 +02:00
Alfredo Deza 36dc7c7862 plugins create an action plugin for validation using notario
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2018-05-18 17:58:24 +02:00
Sébastien Han 2f43e9dab5 defaults: restart_osd_daemon unit spaces
Extra space in systemctl list-units can cause restart_osd_daemon.sh to
fail

It looks like if you have more services enabled in the node space
between "loaded" and "active" get more space as compared to one space
given in command the command[1].

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1573317
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-18 17:53:47 +02:00
Michael Vollman ed050bf3f6 Do nothing when mgr module is in good state
Check whether a mgr module is supposed to be disabled before disabling
it and whether it is already enabled before enabling it.

Signed-off-by: Michael Vollman <michael.b.vollman@gmail.com>
2018-05-18 15:21:45 +02:00
Guillaume Abrioux 415dc0a29b take-over: fix bug when trying to override variable
A customer has been facing an issue when trying to override
`monitor_interface` in inventory host file.
In his use case, all nodes had the same interface for
`monitor_interface` name except one. Therefore, they tried to override
this variable for that node in the inventory host file but the
take-over-existing-cluster playbook was failing when trying to generate
the new ceph.conf file because of undefined variable.

Typical error:

```
fatal: [srvcto103cnodep01]: FAILED! => {"failed": true, "msg": "'dict object' has no attribute u'ansible_bond0.15'"}
```

Including variables like this `include_vars: group_vars/all.yml` prevent
us from overriding anything in inventory host file because it
overwrites everything you would have defined in inventory.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1575915

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-18 10:10:08 +02:00
Ha Phan fa8e2e7522 Adding mgr_vms variable 2018-05-17 17:30:27 +02:00
Andy McCrae f45662e270 Fix template reference for ganesha.conf
We can simply reference the template name since it exists within the
role that we are calling. We don't need to check the ANSIBLE_ROLE_PATH
or playbooks directory for the file.
2018-05-17 15:23:52 +02:00
Sébastien Han 49a4712485 switch: disable ceph-disk units
During the transition from jewel non-container to container old ceph
units are disabled. ceph-disk can still remain in some cases and will
appear as 'loaded failed', this is not a problem although operators
might not like to see these units failing. That's why we remove them if
we find them.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1577846
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-17 08:48:28 +02:00
Guillaume Abrioux a9247c4de7 purge_cluster: wipe all partitions
In order to ensure there is no leftover after having purged a cluster,
we must wipe all partitions properly.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-17 08:37:17 +02:00
Guillaume Abrioux 9cad113e2f purge_cluster: fix bug when building device list
there is some leftover on devices when purging osds because of a invalid
device list construction.

typical error:
```
changed: [osd3] => (item=/dev/sda sda1) => {
    "changed": true,
    "cmd": "# if the disk passed is a raw device AND the boot system disk\n if parted -s \"/dev/sda sda1\" print | grep -sq boot; then\n echo \"Looks like /dev/sda sda1 has a boot partition,\"\n echo \"if you want to delete specific partitions point to the partition instead of the raw device\"\n echo \"Do not use your system disk!\"\n exit 1\n fi\n echo sgdisk -Z \"/dev/sda sda1\"\n echo dd if=/dev/zero of=\"/dev/sda sda1\" bs=1M count=200\n echo udevadm settle --timeout=600",
    "delta": "0:00:00.015188",
    "end": "2018-05-16 12:41:40.408597",
    "item": "/dev/sda sda1",
    "rc": 0,
    "start": "2018-05-16 12:41:40.393409"
}

STDOUT:

sgdisk -Z /dev/sda sda1
dd if=/dev/zero of=/dev/sda sda1 bs=1M count=200
udevadm settle --timeout=600

STDERR:

Error: Could not stat device /dev/sda sda1 - No such file or directory.
```

the devices list in the task `resolve parent device` isn't built
properly because the command used to resolve the parent device doesn't
return the expected output

eg:

```
changed: [osd3] => (item=/dev/sda1) => {
    "changed": true,
    "cmd": "echo /dev/$(lsblk -no pkname \"/dev/sda1\")",
    "delta": "0:00:00.013634",
    "end": "2018-05-16 12:41:09.068166",
    "item": "/dev/sda1",
    "rc": 0,
    "start": "2018-05-16 12:41:09.054532"
}

STDOUT:

/dev/sda sda1
```

For instance, it will result with a devices list like:
`['/dev/sda sda1', '/dev/sdb', '/dev/sdc sdc1']`
where we expect to have:
`['/dev/sda', '/dev/sdb', '/dev/sdc']`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-17 08:37:17 +02:00
Sébastien Han d80a871a07 rolling_update: move osd flag section
During a minor update from a jewel to a higher jewel version (10.2.9 to
10.2.10 for example) osd flags don't get applied because they were done
in the mgr section which is skipped in jewel since this daemons does not
exist.
Moving the set flag section after all the mons have been updated solves
that problem.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1548071
Co-authored-by: Tomas Petr <tpetr@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-17 08:17:16 +02:00
Ken Dreyer fcea568495 Makefile: add "make tag" command
Add a new "make tag" command. This automates some common operations:

1) Automatically determine the next Git tag version number to create.
   For example:
   "3.2.0beta1 -> "3.2.0beta2"
   "3.2.0rc1 -> "3.2.0rc2"
   "3.2.0" -> "3.2.1"

2) Create the Git tag, and print instructions for the user to push it to
   GitHub.

3) Sanity check that HEAD is a stable-* branch or master (bail on
   everything else).

4) Sanity check that HEAD is not already tagged.

Note, we will still need to tag manually once each time we change the
format, for example when moving from tagging "betas" to tagging "rcs",
or "rcs" to "stable point releases".

Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-16 19:05:38 +02:00
Sébastien Han a55ff1cfe2 contrib: check for lt 3 arguments
The script now supports 3 or 4 arguments so we need to check if the
script has less 3 args.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-16 19:03:33 +02:00
Sébastien Han 8ba4ffa7e5 contruib: ability to set a prefix on backport script
When pushing a PR it might be handy to set the [skip ci] flag if we know
upfront the content should not trigger the CI.

Now you can add [skip ci] as $4 in your command line.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-16 10:41:48 +02:00
Andy McCrae 226f80c22b Install packages as a list
To make the package installation more efficient we should install
packages as a list rather than as individual tasks or using a
"with_items" loop. The package managers can handle a list passed to them
to install in one go.

We can use a specified list and substitute any packages that are not to
be installed with the ceph-common package, which is installed on every
package install, then apply the unique filter to the package install
list.
2018-05-16 09:59:00 +02:00
Guillaume Abrioux f749830897 mon: refactor of mgr key fetching
There is no need to stat for created mgr keyrings since they are created
anyway when deploying a ceph cluster > jewel. In case of a jewel
deployment we won't enter that block.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-16 09:44:58 +02:00
Guillaume Abrioux 926be51b44 mgr: delete copy_configs.yml (containerized)
This file is a leftover from PR ceph/ceph-ansible#2516
It is not used anymore so it can be removed.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-15 19:30:34 +02:00
Guillaume Abrioux 1b4c3f292d rolling_update: fix dest path for mgr keys fetching
the role `ceph-mgr` that is played later in the playbook fails because
the destination path for the fetched keys is wrong.
This patch fix the destination path used in the task `fetch ceph mgr
key(s)` so there is no mismatch.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-15 19:30:34 +02:00
Guillaume Abrioux 3b89f1bfb1 rolling_update: get fsid in mgr pre_task
{{ fsid }} points to {{ cluster_uuid.stdout }} which is not defined in
this part of the rolling_update playbook.
Since we need to call {{ fsid }} we must get the fsid and register it to
`cluster_uuid`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-15 09:01:42 +02:00
Sébastien Han 52fc8a0385 rolling_update: move mgr key creation
Until all the mons haven't been updated to Luminous, there is no way to
create a key. So we should do the key creation in the mon role only if
we are not part of an update.
If we are then the key creation is done after the mons upgrade to
Luminous.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-15 09:01:42 +02:00
Sébastien Han e810fb217f Revert "mon: fix mgr keyring creation when upgrading from jewel"
This reverts commit 259fae931d.
2018-05-15 09:01:42 +02:00