Commit Graph

5359 Commits (95a073cb3b54022db46da662d92f48250ba50507)
 

Author SHA1 Message Date
Guillaume Abrioux 688d5eebf7 rolling_update: add any_errors_fatal
If a failure occurs in ceph-validate, the upgrade playbook keeps running
where we expect it to fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8f9cdf4b10)
2020-06-29 17:13:03 -04:00
Dimitri Savineau e5eba9555b dashboard: configure mgr backend before restart
We need to set the mgr dashboard server ip address before restarting the
dashboard module otherwise we can try to bind the dashboard module on an
already used address.
We already do this configuration for the dashboard port value and ssl
setup so we should do the same for server address too.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1851455

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 03cd75845f)
2020-06-29 12:34:26 -04:00
George Shuklin cfc808804f Add container settings for Ubuntu 20 (the same as Ubuntu 18)
Signed-off-by: George Shuklin <george.shuklin@gmail.com>
(cherry picked from commit 3e87f53875)
2020-06-29 12:33:49 -04:00
Dimitri Savineau c3e89983fc Add playbook for converting cluster to cephadm
The commit adds a new playbook for converting an existing ceph cluster
deployed by ceph-ansible to the cephadm orchestrator.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 548ff26256)
2020-06-29 09:45:22 -04:00
Jan Fajerski 5505b713f2 lvm_setup: lookup device from inventory, default to /dev/sd* names
This fixes a long standing fail in ceph-volumes lvm test suite.
Otherwise the default behaviour should not change.

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
(cherry picked from commit 1fe8e819f9)
2020-06-27 09:31:47 -04:00
Jonathan Rosser 28ee26ae58 Ansible tests are not filters
The use of "| success" and "| changed" are not valid syntax for modern
ansible releases.

Signed-off-by: Jonathan Rosser <jonathan.rosser@rd.bbc.co.uk>
(cherry picked from commit 42884e8175)
2020-06-26 13:36:42 -04:00
Jonathan Rosser 77002c12c8 Install python routes package as a dependancy rather than directly
This is now a dependancy of ceph-mgr so will be installed automatically
and does not need a specific task.

This change means that ceph-mgr installs correctly on Ubuntu Focal where
the python3-routes package is necessary.

Signed-off-by: Jonathan Rosser <jonathan.rosser@rd.bbc.co.uk>
(cherry picked from commit 92288c11c5)
2020-06-26 13:36:42 -04:00
Dimitri Savineau 7eddd89afa podman: Add Type and PIDFile value to unit files
This changes the way we are running the podman containers via systemd.
They are now in dettached mode and Type/PIDFile set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1834974

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d43769dc2a)
2020-06-23 17:35:24 +02:00
Dimitri Savineau 51cfb89501 ceph-osd: remove ceph-osd-run.sh script
Since we only have one scenario since nautilus then we can just move
the container start command from ceph-osd-run.sh to the systemd unit
service.
As a result, the ceph-osd-run.sh.j2 template and the
ceph_osd_docker_run_script_path variable are removed.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 829990e60d)
2020-06-23 17:35:24 +02:00
Guillaume Abrioux e1c8a0daf6 dashboard: copy self-signed generated crt to mons
This commit makes the playbook copying self-signed generated certificate
to monitors.
When mons and mgrs are deployed on dedicated nodes the playbook will
fail when trying to import certificate and key files since they are
generated on mgrs whereas we try to import them from a monitor.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846995

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b7539eb275)
2020-06-23 15:43:26 +02:00
Guillaume Abrioux e4f972004b ceph_volume: make zap function idempotent
This commit makes the zap function idempotent, especially when using
lvm_volumes variable.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1845668

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3f47236470)
2020-06-23 09:50:01 +02:00
Dimitri Savineau 5428a41fcf docker: Add Requires on docker service
When using docker container engine then the systemd unit scripts only
use a dependency on the docker daemon via the After parameter.
But if docker is restarted on a live system then the ceph systemd units
should wait for the docker daemon to be fully restarted.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846830

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bd22f1d1ec)
2020-06-22 17:30:28 -04:00
Guillaume Abrioux a7fc4af06e docker2podman: make images pulling optional
This commit makes the images pulling skipped if podman isn't installed
on the machine.

In OSP context, the podman installation is done later in the workflow,
it means all `podman pull` commands will fail.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1849559

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 37b20b6525)
2020-06-22 14:19:44 -04:00
Guillaume Abrioux 06cfbb10d4 requirements: exclude ansible 2.9.10
ansible 2.9.10 seems to have introduced a bug.

See https://github.com/ansible/ansible/issues/70168

This commit excludes this version from ceph-ansible requirements.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1525990f39)
2020-06-22 13:05:33 -04:00
Dimitri Savineau 1d93b166fb travis: use tests/requirements.txt
Explicitly install ansible-lint pytest pytest-cov via pip results of a
specific pytest version (4.3.1) which is not supported for pytest-cov
(2.10).
Because we are already defining a specific pytest version in the tests
requirements then we can install all the python dependencies from that
file and remove this from the pip install command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 296aa13b3c)
2020-06-19 18:18:44 -04:00
Dimitri Savineau bc1cc666e4 docs: Add upgrade operation.
This commit adds a chapter about the ceph upgrade process.

Closes: #5393

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e41487dbce)
2020-06-18 17:58:44 +02:00
Dimitri Savineau 4a8c4446a4 update the release note.
This updates the release note for ceph_pool module.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-06-18 17:56:15 +02:00
Dimitri Savineau b7f38ab1a3 library/ceph_pool: set name parameter as required
The name parameter is required.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d67759611e)
2020-06-17 12:00:59 -04:00
Guillaume Abrioux 4fe8e12484 switch_to_containers: don't set noup flag
We shouldn't set this flag when running switch_to_containers playbook.
Otherwise the playbook fails waiting for pgs to be clean.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843569

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b91d60d384)
2020-06-17 09:24:02 -04:00
Dimitri Savineau c6e60db2fb container: inspect Id field instead of RepoDigests
When a container image managed by podman isn't tag anymore then the
RepoDigests field when inspecting the image doesn't return any value.
This is different from docker workflow and it breaks the ceph-ansible
container upgrade when collocated multiple services and using a non
fix container tag (like latest or 4).

$ podman images
REPOSITORY              TAG      IMAGE ID       CREATED        SIZE
docker.io/ceph/daemon   latest   680c9c0d38c3   8 days ago     957 MB
<none>                  <none>   011ee108bfc9   2 months ago   1.01 GB

$ podman inspect 680c9c0d38c3 | jq .[0].RepoDigests[0]
"docker.io/ceph/daemon@sha256:20cf789235e23ddaf38e109b391d1496bb88011239d16862c4c106d0e05fea9e"
$ podman inspect 011ee108bfc9 | jq .[0].RepoDigests[0]
null

Because this field returns "null" then the ansible task trying to
determine this value is failing

-----------------------------
fatal: [foo]: FAILED! =>
  msg: |-
    The task includes an option with an undefined variable. The error
    was: None has no element 0

    The error appears to be in
    'roles/ceph-container-common/tasks/fetch_image.yml': line 137,
    column 3, but may be elsewhere in the file depending on the exact
    syntax problem.

    The offending line appears to be:

    - name: set_fact ceph_osd_image_repodigest_before_pulling
      ^ here
-----------------------------

We don't have this behaviour with docker.

$ docker images
REPOSITORY              TAG      IMAGE ID       CREATED        SIZE
docker.io/ceph/daemon   latest   680c9c0d38c3   8 days ago     928 MB
docker.io/ceph/daemon   <none>   011ee108bfc9   2 months ago   986 MB

$ docker inspect 680c9c0d38c3 | jq .[0].RepoDigests[0]
"docker.io/ceph/daemon@sha256:45e6f28bb67c81b826acb64fad5c0da1cac3dffb41a88992fe4ca2be79575fa6"
$ docker inspect 011ee108bfc9 | jq .[0].RepoDigests[0]
"docker.io/ceph/daemon@sha256:b393a73309d72e43ca7d65cd3519036007947671e373eb59aa75a46185c52231"

Instead we should just get the Id field.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1844496

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

(cherry picked from commit cdb30bd125)
2020-06-16 13:12:26 -04:00
Dimitri Savineau b219b1abed switch_to_container: fix osd systemd regex
The systemd LOAD and ACTIVE fileds could have more than one space between
both values.
This update the systemd regex the same way we're using it in different
part of the code.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843500

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 50140c9b5d)
2020-06-16 18:10:28 +02:00
Ali Maredia 5b76ba12f7 rgw multisite: add master zone endpoints to zonegroup
We were only adding the endpoints to the master zone but not to the
zonegroup.
This patch fixes the issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1839228

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 0175c205fa)
2020-06-09 12:29:56 -04:00
Ansible Deployment User 85df54a698 rgwloadbalancer undefined index variable
The vrrp_instances variable is using a loop with index but the index_var
wasn't defined.
As a result, the fact task was failing on this undefined index variable.

The task includes an option with an undefined variable. The error was:
'index' is undefined

Closes: #5395

Signed-off-by: Florian Faltermeier <florian.faltermeier@uibk.ac.at>
(cherry picked from commit 3f906e0c26)
2020-05-26 12:09:41 -04:00
Guillaume Abrioux c67b3d3530 switch_to_container: refact wait for pg check
There is no need to make this check with several steps.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8aed824f71)
2020-05-22 17:05:22 +02:00
Guillaume Abrioux 0c6f5b6891 tests: report coverage status for unittests
This commit adds pytest-cov usage in unittests

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8d556b0787)
2020-05-22 17:05:22 +02:00
Guillaume Abrioux 33897f9d92 ceph_pool: add tests
Add unit tests for ceph_pool module

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 886b5256fd)
2020-05-22 17:05:22 +02:00
Guillaume Abrioux 96df2c116b ceph_pool: support setting application at pool creation
This commit adds the required changes in order to support
setting application pool at initial pool creation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fa3aa5a03c)
2020-05-22 17:05:22 +02:00
Guillaume Abrioux a49f6caa6d ceph_pool: refact exec_commands()
We never multiple ceph command at a time, so there's no need to have this design.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c4b7d89c18)
2020-05-22 17:05:22 +02:00
Guillaume Abrioux 09a8d5d71e tests: update pools definitions
setting attributes with empty string is a bad user input.
Also, removing `rule_name` attribute when creating a code erasure pool.
(this rule isnt intended for code erasure pool type).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 83faf94351)
2020-05-22 17:05:22 +02:00
Guillaume Abrioux 4453028862 common: introduce ceph_pool module calls
This commits calls the `ceph_pool` module for creating ceph pools
everywhere it's needed in the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit af9f6684f2)
2020-05-22 17:05:22 +02:00
Guillaume Abrioux 9303f15c5b library: add ceph_pool module
This commit adds a new module `ceph_pool`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bddcb439ce)
2020-05-22 17:05:22 +02:00
Dimitri Savineau 27e206e8e0 doc: Add a release note
This adds a release note for the Ceph Octopus release used in the
stable-5.0 branch.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-05-22 15:33:23 +02:00
Dimitri Savineau 3247b1eea9 ceph-nfs: add stable noarch repository
When using the stable nfs ganesha repository, we need have both arch
and noarch repositories enabled.
Currently the noarch repository is missing which cause the non
containerized deployment to fail.

Closes: #5375

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 44e1ebaaff)
2020-05-19 11:18:45 -04:00
Guillaume Abrioux 521c356f33 common: fix target_size_ratio task enablement
The condition on this task is wrong, we have to check whether
`target_size_ratio` is set in the pool definition instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8c7a48832c)
2020-05-19 15:15:03 +02:00
Dimitri Savineau c27d3d9150 tests/library: parametrize ceph_volume objecstore
This adds the objectstore testing for both filestore and bluestore on
the ceph_volume module.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5407e898a6)
2020-05-15 10:20:39 -04:00
Dimitri Savineau 2d396a2311 tests/library: define container cmd once
In containerized deployment, the ceph_volume module will always uses
the same container command prefix for all actions.
Instead of duplicate this code in all container tests we can define it
once.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a8e458c452)
2020-05-15 10:20:39 -04:00
Guillaume Abrioux ec21d57d23 facts: always set ceph_run_cmd and ceph_admin_command
always set these facts on monitor nodes whatever we run with `--limit`.
Otherwise, playbook will fail when using `--limit` on nodes where these
facts are used on a delegated task to monitor.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e5e81843e9)
2020-05-15 09:56:10 -04:00
Guillaume Abrioux c7e16aeced test: set sitepackages=false in tox
Otherwise it might try to use the system installed version of ansible
when there's one available.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6d9acb5e6d)
2020-05-14 14:44:30 +02:00
Dimitri Savineau 02e5167f2a ceph-nfs: bind mount ganesha log directory
The current ganesha log directory is only present in the container
and not bind mount on the host.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 222fe4abd8)
2020-05-13 16:41:49 -04:00
Dimitri Savineau 015fb8e0b9 dashboard: allow disabling grafana api ssl verify
When using an untrusted TLS certificate (like self-signed) on grafana
then the grafana dashboards update subcommand will fail.
One solution could be to trust the TLS certificate.
The other one is to disable the TLS verification on the grafana API.

Closes: #5324

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b20519efd0)
2020-05-13 16:41:36 -04:00
Dimitri Savineau e6bfdd2e44 rolling_update: fix rbdmirror group name
The rbdmirror group name was using the wrong variable definition.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c0a213f928)
2020-05-13 16:41:23 -04:00
Dimitri Savineau 9a7af0ce6a docker2podman: manage dashboard nodes
The dashboard nodes (alertmanager, grafana, node-exporter, and prometheus)
were not manage during the docker to podman migration.

This adds the systemd container template of those services to a dedicated
file (systemd.yml) in order to include it in the docker2podman playbook.

This also adds the dashboard container images pull from docker to podman.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1829389

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 252e78b4e4)
2020-05-13 16:41:11 -04:00
Dimitri Savineau 0114457e13 docker2podman: pull images from docker daemon
The docker2podman playbook only installs the podman package and updates
the systemd units with the right container_binary value.

We never pull the container image so if one service is restarted then
the container image will be pulled first before the service can start
which could cause longer downstream.

To avoid to download the container image from internet again we can just
pull it from the local docker daemon.

The container_{binding,package,service}_name variables are removed
because they are only used in the ceph-container-engine role which
isn't call in this playbook.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d38f21aeba)
2020-05-13 16:41:11 -04:00
Benoît Knecht 94a71258a8 ceph-validate: Expand templates in rgw_create_pools
Same fix as `ceph-rgw` for `rgw_create_pools` pool names that contain Jinja
templates.

See #5348 for details.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 444b46ea24)
2020-05-11 14:00:08 -04:00
Benoît Knecht 9268b34464 ceph-rgw: Make sure pool name templates are expanded
It is common to set templated pool names in `rgw_create_pools`, e.g.

```yaml
rgw_create_pools:
  "{{ rgw_zone }}.rgw.buckets.index":
    pg_num: 16
    size: 3
    type: replicated
```

This worked fine with Ansible 2.8, but broke in Ansible 2.9 due to a change in
the way `with_dict` works [1].

This commit replaces the use of `with_dict` with

```yaml
loop: "{{ rgw_create_pools | dict2items }}"
```

which works as intended and expands the template in the pool name.

[1]: https://docs.ansible.com/ansible/latest/porting_guides/porting_guide_2.9.html#loops

Closes #5348

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit d2b7670c7d)
2020-05-11 14:00:08 -04:00
Benoît Knecht da6e31a4c6 ceph-validate: Fix "fail on unsupported CentOS release"
The `dashboard_enabled` condition used a `true` filter (which doesn't exist)
instead of the `bool` filter.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit b7efca1785)
2020-05-08 12:50:14 -04:00
Ali Maredia 257b96634e docs: minor fixes to README-MULTISITE.md
Make all of the hosts start at 1 and not 0,
also make some minor changes in scenario 3 to
remova an inconsistency.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit bd1440f2cd)
2020-05-08 12:14:22 -04:00
Dimitri Savineau 837657b959 ceph-rgw: use match instead of equalto from jinja2
The '==' jinja2 operator (or 'equalto') has been introduced in jinja2
2.8.
On EL7, jinja2 version is 2.7 so the operator isn't present creating
templating error like:

The error was: TemplateRuntimeError: no test named '=='

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1747206

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 34e6e8e06c)
2020-05-06 15:31:19 -04:00
Dimitri Savineau 4c3a21845c ceph-nfs: fix internal ganesha deployment
Since ea2b654d9 we're not running the rados command from the monitor
nodes but from the ganesha node. Unfortunately we don't have the
required keyring on that node to run the rados command as we don't
import the right keyring.
This commit restores the workflow for internal ganesha deployment like
before ea2b654d9 but keeps the rados commands from the ganesha node for
external deployment until we have a better design.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8a890306ad)
2020-05-06 13:30:12 -04:00
Dimitri Savineau ddd907c9ec ceph-nfs: fix keyring copy for external ganesha
Fix the condition on the keyring copy task that prevent the ganesha
keyring to be created in the /var/lib/ceph directory.
Also ensure that the directory exists first.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1831285

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 748ac4b928)
2020-05-06 13:30:12 -04:00