Commit Graph

289 Commits (d4024eddbb191a698668140b87a8390c468f74c1)

Author SHA1 Message Date
Guillaume Abrioux f1ae3dec72 container: remove `--ignore` from `podman rm` command
As of podman 2.0.5, `--ignore` param conflicts with `--storage`.
```
Nov 30 13:53:10 magna089 podman[164443]: Error: --storage conflicts with --volumes, --all, --latest, --ignore and --cidfile
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c68b124ba8)
2020-11-30 16:42:59 -05:00
Guillaume Abrioux 10dff6888c container: force rm --storage on ExecStartPre
This is a workaround to avoid error like following:
```
Error: error creating container storage: the container name "ceph-mgr-magna022" is already in use by "4a5f674e113f837a0cc561dea5d2cd55d16ca159a647b7794ab06c4c276ef701"
```

that doesn't seem to be 100% reproducible but it shows up after a
reboot. The only workaround we came up with at the moment is to run
`podman rm --storage <container>` before starting it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1887716

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ba7824c55)
2020-11-16 16:37:37 -05:00
Dimitri Savineau ec8903f4af podman: force log driver to journald
Since we've changed to podman configuration using the detach mode and
systemd type to forking then the container logs aren't present in the
journald anymore.
The default conmon log driver is using k8s-file.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1890439

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 16cd183b9c)
2020-11-02 17:46:39 -05:00
Benoît Knecht c733af9d43 Fix Ansible check mode for site.yml.sample playbook
Make sure the `site.yml.sample` playbook can be run in check mode by skipping
tasks that try to read the output of commands that have been skipped.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 54ba38e35e)
2020-10-07 07:06:19 +02:00
Dimitri Savineau 2185a2201d library: add radosgw_zone module
This adds radosgw_zone ansible module for replacing the command module
usage with the radosgw-admin zone command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1281e8bcc8)
2020-10-06 15:00:17 +02:00
Dimitri Savineau 5a371b3607 library: add radosgw_zonegroup module
This adds radosgw_zonegroup ansible module for replacing the command
module usage with the radosgw-admin zonegroup command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 65dbe0782e)
2020-10-06 15:00:17 +02:00
Dimitri Savineau 1643210ca6 library: add radosgw_realm module
This adds radosgw_realm ansible module for replacing the command module
usage with the radosgw-admin realm command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d171f4068d)
2020-10-06 15:00:17 +02:00
Dimitri Savineau 858169a27d library: add radosgw_user module
This adds radosgw_user ansible module for replacing the command module
usage with the radosgw-admin user command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 235c7e27cc)
2020-10-06 15:00:17 +02:00
Guillaume Abrioux 2a3b563c7e rgw: fix multi instances scaleout in baremetal
When rgw and osd are collocated, the current workflow prevents from
scaling out the radosgw_num_instances parameter when rerunning the
playbook in baremetal deployments.

When ceph-osd notifies handlers, it means rgw handlers are triggered
too. The issue with this is that they are triggered before the role
ceph-rgw is run.
In the case a scaleout operation is expected on `radosgw_num_instances`
it causes an issue because keyrings haven't been created yet so the new
instances won't start.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881313

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a802fa2810)
2020-10-06 10:31:34 +02:00
Guillaume Abrioux cb44f655fc ceph_pool: refact module
remove complexity about current defaults in running cluster

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 29fc115f4a)
2020-10-02 09:32:53 +02:00
Ali Maredia 9f58d4a3d1 rgw multisite: check connection for realm endpoint
This commit adds connection checks before realm pulls
Curls are performed on the endpoint being pulled from
the mons and the rgws

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1731158

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 902575369c)
2020-09-29 09:24:59 -04:00
Guillaume Abrioux 5538dd8b3b Revert "ceph-rgw: remove ceph_pool state and default value"
This reverts commit ba3512a8fc.

(cherry picked from commit bf7b044c9a)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-09-28 16:47:45 -04:00
Dimitri Savineau 7e2e11320d ceph-rgw: remove ceph_pool state and default value
Since the state is now optional and default values are handled in the
ceph_pool module itself.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ba3512a8fc)
2020-09-25 14:05:34 -04:00
Dimitri Savineau 593264e5f7 ceph-rgw: use ceph_pool module
Since [1] we can use the ceph_pool module instead of using the command
module combined with ceph osd pool commands.

[1] bddcb439ce

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8dacbce68f)
2020-09-10 21:44:19 -04:00
Dimitri Savineau e9c6028eb9 ceph-rgw: allow specifying crush rule on pool
We already support specifiying a custom crush rule during pool creation
in ceph-osd role but not in ceph-rgw role.
This patch adds the missing code to implement this feature.
Note this is only available for replicated pool not erasure. The rule
must also exist prior the pool creation.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1855439

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cb8f0237e1)
2020-08-17 23:00:13 +02:00
Dimitri Savineau 2faed4c204 podman: always remove container on start
In case of failure, the systemd ExecStop isn't executed so the container
isn't removed. After a reboot of a failed node, the container doesn't
start because the old container is still present in created state.
We should always try to remove the container in ExecStartPre for this
situation.
A normal reboot doesn't trigger this issue and this also doesn't affect
nodes running containers via docker.
This behaviour was introduced by d43769d.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1858865

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 47b7c00287)
2020-07-24 12:47:01 -04:00
Guillaume Abrioux c0b8edfea2 rgw: set container memory limit to 4g
This commit changes the container memory limit for rgw daemons.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1707488

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 86edae724f)
2020-07-10 10:10:32 -04:00
Dimitri Savineau 596f3fa161 radosgw: remove INST_PORT environment variable
This variable isn't consumed by the container so we can remove it.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1361e84a4e)
2020-07-03 06:37:34 +02:00
Guillaume Abrioux cdf61540d8 rgw: fix multi instances scaleout
When rgw and osd are collocated, the current workflow prevents from
scaling out the radosgw_num_instances parameter when rerunning the
playbook.

The environment file used in the rgw systemd template is rendered when
executing the `ceph-rgw` role but during a new run of the playbook (in
order to scale out rgw instances), handlers are triggered from `ceph-osd`
role which is run before `ceph-rgw`, therefore it tries to start the new
rgw daemon whereas its corresponding environment file hasn't been
rendered yet and fails like following:

```
ceph-radosgw@rgw.ceph4osd3.rgw1.service failed to run 'start-pre' task: No such file or directory
```

This commit moves the tasks generating this file in `ceph-config` role
so it is generated early.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1851906

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7dd68b9ac1)
2020-07-03 06:37:34 +02:00
Dimitri Savineau 7eddd89afa podman: Add Type and PIDFile value to unit files
This changes the way we are running the podman containers via systemd.
They are now in dettached mode and Type/PIDFile set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1834974

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d43769dc2a)
2020-06-23 17:35:24 +02:00
Dimitri Savineau 5428a41fcf docker: Add Requires on docker service
When using docker container engine then the systemd unit scripts only
use a dependency on the docker daemon via the After parameter.
But if docker is restarted on a live system then the ceph systemd units
should wait for the docker daemon to be fully restarted.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846830

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bd22f1d1ec)
2020-06-22 17:30:28 -04:00
Ali Maredia 5b76ba12f7 rgw multisite: add master zone endpoints to zonegroup
We were only adding the endpoints to the master zone but not to the
zonegroup.
This patch fixes the issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1839228

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 0175c205fa)
2020-06-09 12:29:56 -04:00
Benoît Knecht 9268b34464 ceph-rgw: Make sure pool name templates are expanded
It is common to set templated pool names in `rgw_create_pools`, e.g.

```yaml
rgw_create_pools:
  "{{ rgw_zone }}.rgw.buckets.index":
    pg_num: 16
    size: 3
    type: replicated
```

This worked fine with Ansible 2.8, but broke in Ansible 2.9 due to a change in
the way `with_dict` works [1].

This commit replaces the use of `with_dict` with

```yaml
loop: "{{ rgw_create_pools | dict2items }}"
```

which works as intended and expands the template in the pool name.

[1]: https://docs.ansible.com/ansible/latest/porting_guides/porting_guide_2.9.html#loops

Closes #5348

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit d2b7670c7d)
2020-05-11 14:00:08 -04:00
Dimitri Savineau 837657b959 ceph-rgw: use match instead of equalto from jinja2
The '==' jinja2 operator (or 'equalto') has been introduced in jinja2
2.8.
On EL7, jinja2 version is 2.7 so the operator isn't present creating
templating error like:

The error was: TemplateRuntimeError: no test named '=='

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1747206

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 34e6e8e06c)
2020-05-06 15:31:19 -04:00
Guillaume Abrioux 60a2e28189 rgw: add multi-instances support when deploying multisite
This commit adds the multi-instances when deploying rgw multisite

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-12 16:44:48 -04:00
Dimitri Savineau e62532de46 update osd pool set size command
Since [1] we can't use osd pool without replicas (size: 1) by default.
We now need to set the mon_allow_pool_size_one flag to true in the ceph
configuration and add the --yes-i-really-mean-it flag to the osd pool
set size cli.

[1] https://github.com/ceph/ceph/commit/21508bd

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-11 11:25:42 +01:00
Guillaume Abrioux b3bbd6bb77 rgw: fix a typo in create_realm_zonegroup_zone_lists
This commit fixes a typo.

`s/realms/secondary_realms`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-10 14:13:30 +01:00
Guillaume Abrioux 7a8a719e75 rgw: add retry/until on pools tasks
Sometimes, these task can timeout for some reason.
Adding these retries can help to avoid unexcepted failures.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-06 08:55:13 -05:00
Ali Maredia 71f55bd54d rgw multisite: enable more than 1 realm per cluster
Make it so that more than one realm, zonegroup,
or zone can be created during a run of the rgw
multisite ansible playbooks.

The rgw hosts now need to be grouped into zones
and realms in the inventory.

.yml files need to be created in group_vars
for the realms and zones. Sample yaml files
are available.

Also remove multsite destroy playbook
and add --cluster before radosgw-admin commands

remove manually added rgw_zone_endpoints var
and have ceph-ansible automatically add the
correct endpoints of all the rgws in a rgw_zone
from the information provided in that rgws hostvars.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2020-03-04 12:58:13 -05:00
Dimitri Savineau 9d3b49293d purge: stop rgw instances by iteration
It looks like that the service module doesn't support wildcard anymore
for stopping/disabling multiple services.

fatal: [rgw0]: FAILED! => changed=false
  msg: 'This module does not currently support using glob patterns,
        found ''*'' in service name: ceph-radosgw@*'
...ignoring

Instead we should iterate over the rgw_instances list.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-02 16:32:06 +01:00
Dimitri Savineau 44e750ee5d ceph-rgw: increase connection timeout to 10
5s as a connection timeout could be low in some setup. Let's increase
it to 10s.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-24 16:01:36 +01:00
Sam Choraria 2a2656a985 ceph-rgw: allow SSL certificate content to supplied
Allow SSL certificate & key contents to be written to the path
specified by radosgw_frontend_ssl_certificate. This permits a
certificate to be deployed & renewal of expired certificates
through ceph-ansible.

Signed-off-by: Sam Choraria <sam.choraria@bbc.co.uk>
2020-02-17 16:22:11 +01:00
Ali Maredia 1834c1e48d rgw: extend automatic rgw pool creation capability
Add support for erasure code pools.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1731148

Signed-off-by: Ali Maredia <amaredia@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-17 16:07:43 +01:00
Dimitri Savineau 5a03e0ee1c containers: add KillMode=none to systemd templates
Because we are relying on docker|podman for managing containers then we
don't need systemd to manage the process (like kill).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-13 16:11:33 +01:00
Guillaume Abrioux 483adb5d79 common: add a default value for ceph_directories_mode
Since this variable makes it possible to customize the mode for ceph
directories, let's make it a bit more explicit by adding a default value
in ceph-defaults.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-22 09:35:35 +01:00
Benoît Knecht 3842aa1a30 ceph-rgw: Fix customize pool size "when" condition
In 3c31b19ab3, I fixed the `customize pool
size` task by replacing `item.size` with `item.value.size`. However, I
missed the same issue in the `when` condition.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2020-01-20 09:26:53 -05:00
Guillaume Abrioux 3e262e072b containers: use --cpus instead --cpu-quota
When using docker 1.13.1, the current condition:

```
{% if (container_binary == 'docker' and ceph_docker_version.split('.')[0] is version_compare('13', '>=')) or container_binary == 'podman' -%}
```

is wrong because it compares the first digit (1) whereas it should
compare the second one.
It means we always use `--cpu-quota` although documentation recommend
using `--cpus` when docker version is 1.13.1 or higher.

From the doc:
> --cpu-quota=<value>	Impose a CPU CFS quota on the container. The number of
> microseconds per --cpu-period that the container is limited to before
> throttled. As such acting as the effective ceiling.
> If you use Docker 1.13 or higher, use --cpus instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-16 13:51:43 -05:00
Benoît Knecht 3c31b19ab3 ceph-rgw: Fix custom pool size setting
RadosGW pools can be created by setting

```yaml
rgw_create_pools:
  .rgw.root:
    pg_num: 512
    size: 2
```

for instance. However, doing so would create pools of size
`osd_pool_default_size` regardless of the `size` value. This was due to
the fact that the Ansible task used

```
{{ item.size | default(osd_pool_default_size) }}
```

as the pool size value, but `item.size` is always undefined; the
correct variable is `item.value.size`.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2020-01-08 16:16:38 -05:00
Guillaume Abrioux 9bad239d77 common: improve keyrings generation
There is no need to get n * number of nodes the different keyrings.
Adding a `run_once: true` here avoid running a ceph command too many
times which could be impacting large cluster deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-02 13:09:50 +02:00
Guillaume Abrioux e08194dd67 rgw: refact tasks directory layout
This commit moves containerized deployment related files to `./tasks/`
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-01 10:27:51 -04:00
Guillaume Abrioux bd64167469 container: isolate systemd tasks
This commit isolates the systemd unit files generation for containers into
separate yml files in order to be able importing each corresponding roles
without playing all tasks.
This is needed so we can run ceph-ansible to render systemd unit files
so they call podman instead of docker.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-01 10:27:51 -04:00
Guillaume Abrioux ab370b6ad8 global: remove fetch_directory dependency
This commit drops the fetch_directory dependency.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-09-26 11:35:24 +02:00
Dimitri Savineau 42082c0a27 lint: fix error [201,206]
[201] Trailing whitespace
 [206] Variables should have spaces before and after: {{ var_name }}

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-08-29 14:28:35 -04:00
Artur Fijalkowski 011270ca69 global: make directories mode parameterizable
This commit makes it possible to parametrize the ceph directories modes.
So it changes hardocded mode for ceph related directories from 0755 to
customizable with `ceph_directories_mode` variable.

Closes: #2920

Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-08-23 09:38:17 +02:00
guihecheng a0590cae9d rgw/multisite: assign 'rgw_zone' to the exact section in ceph.conf
since the following commit:
  commit 1ac94c048f
  rgw: add support for multiple rgw instances on a single host

we have multi-instance rgw support on a single host and
the config section name of the rgw changed from
[client.rgw.$(hostname)] -> [client.rgw.$(hostname).rgwX]
when X is the sequence number: 0,1,2,...
So we should assign 'rgw_zone' item to the exact rgw instance
config section in ceph.conf

Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>
2019-08-23 08:14:10 +02:00
Guillaume Abrioux 19950b5170 container: rename docker directories
Those 2 directories should be renamed to be more generic (docker vs.
podman).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-07-24 16:31:46 +02:00
Giulio Fidente d526803c6c Add radosgw_frontend_ssl_certificate parameter
This is necessary when configuring RGW with SSL because
in addition to passing specific frontend options, civetweb
appends the 's' character to the binding port and beast uses
ssl_endpoint instead of endpoint.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1722071
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
2019-07-02 14:14:37 -04:00
Guillaume Abrioux 33eed78d17 containers: improve logging
bindmount /var/log/ceph on all containers so it's possible to retrieve
logs from the host.

related ceph-container PR: ceph/ceph-container#1408

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710548

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-06-28 13:30:36 -04:00
Dimitri Savineau 7c3640177b roles: Remove useless become (true) flag
We already set the become flag to true at a play level in the site*
playbooks so we don't need to set it at a task level.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-06-19 10:31:32 +02:00
Dimitri Savineau f49090df7e podman: Add systemd dependency on network.target
When using podman, the systemd unit scripts don't have a dependency
on the network. So we're not sure that the network is up and running
when the containers are starting.
With docker this behaviour is already handled because the systemd
unit scripts depend on docker service which is started after the
network.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-06-07 09:28:58 +02:00