Commit Graph

112 Commits (f0cd3c4f48971056e3b838e4798891a350e9d23a)

Author SHA1 Message Date
Dimitri Savineau 77f32a3302 container: set tcmalloc value by default
All ceph daemons need to have the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
environment variable set to 128MB by default in container setup.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970913

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9758e3c513)
2021-07-01 15:46:19 +02:00
Guillaume Abrioux ddd7c42c2b container/systemd: ensure /var/log/ceph exists
This adds a `ExecStartPre=-/usr/bin/mkdir -p /var/log/ceph` in all
systemd service templates for all ceph daemon.
This is specific to RHCS after a Leapp upgrade is done. Indeed, the
`/var/log/ceph` seems to be removed after the upgrade.
In order to work around this issue let's ensure the directory is present
before trying to start the containers with podman.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1949489

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bab403b603)
2021-04-14 20:46:09 +02:00
Alex Schultz 7ddbe74712 Use ansible_facts
It has come to our attention that using ansible_* vars that are
populated with INJECT_FACTS_AS_VARS=True is not very performant.  In
order to be able to support setting that to off, we need to update the
references to use ansible_facts[<thing>] instead of ansible_<thing>.

Related: ansible#73654
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406
Signed-off-by: Alex Schultz <aschultz@redhat.com>
(cherry picked from commit a7f2fa73e6)
2021-03-26 00:16:58 +01:00
Guillaume Abrioux b903446fa4 containers: use --cpus instead --cpu-quota
When using docker 1.13.1, the current condition:

```
{% if (container_binary == 'docker' and ceph_docker_version.split('.')[0] is version_compare('13', '>=')) or container_binary == 'podman' -%}
```

is wrong because it compares the first digit (1) whereas it should
compare the second one.
It means we always use `--cpu-quota` although documentation recommend
using `--cpus` when docker version is 1.13.1 or higher.

From the doc:
> --cpu-quota=<value>	Impose a CPU CFS quota on the container. The number of
> microseconds per --cpu-period that the container is limited to before
> throttled. As such acting as the effective ceiling.
> If you use Docker 1.13 or higher, use --cpus instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3e262e072b)
2021-01-28 16:37:50 -05:00
Guillaume Abrioux 63fa4c9484 containers: modify bindmount option
This commit changes the bind mount option for the mount point
`/var/lib/ceph` in the systemd template for mon and mgr containers. This
is needed in case of collocating mon/mgr with osds using dmcrypt
scenario.
Once mon/mgr got converted to containers, the dmcrypt layer sub mount is
still seen in `/var/lib/ceph`. For some reason it makes the
corresponding devices busy so any other container can't open/close it.
As a result, it prevents osds from starting properly.

Since it only happens on the nodes converted before the OSD play, the idea is
to bind mount `/var/lib/ceph` on mon and mgr with the `rshared` option
so once the sub mount is unmounted, it is propagated inside the
container so it doesn't see that mount point.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896392

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f5ba6d9b01)
2020-12-15 17:33:11 +01:00
Dimitri Savineau f917bb015c ceph_key: set state as optional
Most ansible module using a state parameter default to the present
value (when available) instead of using it as a mandatory option.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit abb4023d76)
2020-12-01 09:53:26 -05:00
Guillaume Abrioux ef154613c8 container: remove `--ignore` from `podman rm` command
As of podman 2.0.5, `--ignore` param conflicts with `--storage`.
```
Nov 30 13:53:10 magna089 podman[164443]: Error: --storage conflicts with --volumes, --all, --latest, --ignore and --cidfile
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c68b124ba8)
2020-12-01 09:53:15 -05:00
Guillaume Abrioux fe699897ed common: add a default value for ceph_directories_mode
Since this variable makes it possible to customize the mode for ceph
directories, let's make it a bit more explicit by adding a default value
in ceph-defaults.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 483adb5d79)
2020-11-19 21:14:02 -05:00
Guillaume Abrioux ce86d695c2 container: force rm --storage on ExecStartPre
This is a workaround to avoid error like following:
```
Error: error creating container storage: the container name "ceph-mgr-magna022" is already in use by "4a5f674e113f837a0cc561dea5d2cd55d16ca159a647b7794ab06c4c276ef701"
```

that doesn't seem to be 100% reproducible but it shows up after a
reboot. The only workaround we came up with at the moment is to run
`podman rm --storage <container>` before starting it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1887716

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ba7824c55)
2020-11-16 16:37:46 -05:00
Dimitri Savineau f344fe6f92 podman: force log driver to journald
Since we've changed to podman configuration using the detach mode and
systemd type to forking then the container logs aren't present in the
journald anymore.
The default conmon log driver is using k8s-file.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1890439

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 16cd183b9c)
2020-11-02 17:46:48 -05:00
Benoît Knecht 69a6053114 Fix Ansible check mode for site.yml.sample playbook
Make sure the `site.yml.sample` playbook can be run in check mode by skipping
tasks that try to read the output of commands that have been skipped.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 54ba38e35e)
2020-10-07 07:06:54 +02:00
Guillaume Abrioux 03931362dc mgr: enable pg_autoscaler by default
Otherwise, even though we set the pg autoscaler attribute on a pool, the
feature won't be working as expected.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1836431

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-18 14:49:31 -04:00
Dimitri Savineau d408c75d76 podman: always remove container on start
In case of failure, the systemd ExecStop isn't executed so the container
isn't removed. After a reboot of a failed node, the container doesn't
start because the old container is still present in created state.
We should always try to remove the container in ExecStartPre for this
situation.
A normal reboot doesn't trigger this issue and this also doesn't affect
nodes running containers via docker.
This behaviour was introduced by d43769d.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1858865

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 47b7c00287)
2020-07-24 12:47:21 -04:00
Dimitri Savineau eb3f065d03 podman: Add Type and PIDFile value to unit files
This changes the way we are running the podman containers via systemd.
They are now in dettached mode and Type/PIDFile set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1834974

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d43769dc2a)
2020-06-23 17:35:01 +02:00
Dimitri Savineau 09453e22f4 docker: Add Requires on docker service
When using docker container engine then the systemd unit scripts only
use a dependency on the docker daemon via the After parameter.
But if docker is restarted on a live system then the ceph systemd units
should wait for the docker daemon to be fully restarted.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846830

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bd22f1d1ec)
2020-06-22 19:11:20 -04:00
abaird-rh 6878aab0f9 Updated use of deprecated filter
This was removed in Ansible 2.9.

[DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of
using `result|version_compare` use `result is version_compare`. This
feature will be removed in version 2.9. Deprecation warnings can be
disabled by setting deprecation_warnings=False in ansible.cfg.

Rename 'version_compare' to the function 'version'.

version_compose was renamed to version since ansible 2.5

Signed-off-by: abaird-rh <abaird@redhat.com>
(cherry picked from commit eb71244bfd)
2020-04-20 13:37:42 -04:00
Dimitri Savineau 6a2272b9c0 ceph-mgr: add saml python lib for dashboard SSO
The dashboard SSO mgr module requires the saml python library to be
installed. This is only a valid scenario for RHCS deployment because
the saml python library isn't available in other classic repositories.
This package is present in RHCS Tools repository so we also need to
enable it on the mgr nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1820233

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6617d90733)
2020-04-06 11:00:01 -04:00
Dimitri Savineau 3617543517 containers: add KillMode=none to systemd templates
Because we are relying on docker|podman for managing containers then we
don't need systemd to manage the process (like kill).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5a03e0ee1c)
2020-02-18 12:10:35 -05:00
Dmitriy Rabotyagov 8d311a537d Fix undefined running_mon
Since commit [1] running_mon introduced, it can be not defined
which results in fatal error [2]. This patch defines default value which
was used before patch [1]

Signed-off-by: Dmitriy Rabotyagov <drabotyagov@vexxhost.com>

[1] 8dcbcecd71
[2] https://zuul.opendev.org/t/openstack/build/c82a73aeabd64fd583694ed04b947731/log/job-output.txt#14011

(cherry picked from commit 2478a7b948)
2020-01-16 18:28:12 -05:00
Guillaume Abrioux cae24dd85a remove container_exec_cmd_mgr fact
Iterating over all monitors in order to delegate a `
{{ container_binary }}` fails when collocating mgrs with mons, because
ceph-facts reset `container_exec_cmd` to point to the first member of
the monitor group.

The idea is to force `container_exec_cmd` to be reset in ceph-mgr.
This commit also removes the `container_exec_cmd_mgr` fact.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1791282

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8dcbcecd71)
2020-01-15 21:10:54 +01:00
Guillaume Abrioux 50738ff5c0 mgr: do not copy all keyrings on all mgr
There is no need to loop over all mgr nodes to set this fact, it's even
breaking deployments because it tries to copy all mgr keyring on all
mgr.

Closes: #4602

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cb80231725)
2019-10-16 06:45:33 +02:00
Guillaume Abrioux 5568692340 mgr: improve mgr keyring creation
Delegating on remote node isn't necessary here since we are already
iterating over the right nodes.

Closes: #4518

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 161170524d)
2019-10-11 14:51:16 -04:00
Guillaume Abrioux 13ca0531d8 common: improve keyrings generation
There is no need to get n * number of nodes the different keyrings.
Adding a `run_once: true` here avoid running a ceph command too many
times which could be impacting large cluster deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9bad239d77)
2019-10-02 14:34:27 +02:00
Guillaume Abrioux df5337535d container: isolate systemd tasks
This commit isolates the systemd unit files generation for containers into
separate yml files in order to be able importing each corresponding roles
without playing all tasks.
This is needed so we can run ceph-ansible to render systemd unit files
so they call podman instead of docker.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bd64167469)
2019-10-01 18:50:51 +02:00
Guillaume Abrioux e1d06f498c global: remove fetch_directory dependency
This commit drops the fetch_directory dependency.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ab370b6ad8)
2019-09-26 16:21:54 +02:00
Guillaume Abrioux a3cbb59c05 lint: fix error [301], add `changed_when: false` when needed
This commit fixes the error [301]:

`[301] Commands should not change things if nothing needs doing`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 327d564106)
2019-08-28 11:22:47 -04:00
Artur Fijalkowski 27014df45e global: make directories mode parameterizable
This commit makes it possible to parametrize the ceph directories modes.
So it changes hardocded mode for ceph related directories from 0755 to
customizable with `ceph_directories_mode` variable.

Closes: #2920

Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 011270ca69)
2019-08-23 11:39:23 +00:00
Guillaume Abrioux 0f90ffe9df mgr: refact 'wait for all mgr to be up' task
There's no need to use `shell` module here.
Instead of using `| python -c`, let's use `from_json` filter.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5b9b841108)
2019-08-08 15:57:54 +02:00
Guillaume Abrioux e2b41a17c0 mgr: fix a typo
this tasks isn't using the right container_exec_cmd, that's delegating
to the wrong node.
Let's use the right fact to fix this command.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ec33ee7574)
2019-07-29 15:46:58 +02:00
Guillaume Abrioux 2295a4cf0a containers: improve logging
bindmount /var/log/ceph on all containers so it's possible to retrieve
logs from the host.

related ceph-container PR: ceph/ceph-container#1408

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710548

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 33eed78d17)
2019-07-02 11:27:34 -04:00
Guillaume Abrioux bcfed47009 dashboard: move ceph-grafana-dashboards package installation
This commit moves the package installation into ceph-dashboard role.
This is needed to install ceph dasboard json file in
`/etc/grafana/dashboards/ceph-dashboard/`.

Closes: #4026

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6e2e30db54)
2019-06-26 12:03:21 -04:00
Guillaume Abrioux 28e1ce0d8c dashboard: append mgr modules to ceph_mgr_modules
when `dashboard_enabled` is `True`, let's append `dashboard` and
`prometheus` modules to `ceph_mgr_modules` so they are automatically
loaded.

Closes: #4026

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a2b6f44665)
2019-06-26 12:03:21 -04:00
Dimitri Savineau 590f6026bb roles: Remove useless become (true) flag
We already set the become flag to true at a play level in the site*
playbooks so we don't need to set it at a task level.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7c3640177b)
2019-06-20 22:00:27 +00:00
Dimitri Savineau e9edb5a92a podman: Add systemd dependency on network.target
When using podman, the systemd unit scripts don't have a dependency
on the network. So we're not sure that the network is up and running
when the containers are starting.
With docker this behaviour is already handled because the systemd
unit scripts depend on docker service which is started after the
network.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f49090df7e)
2019-06-07 16:06:26 +02:00
L3D 1daca1ba83 ansible: use 'bool' filter on boolean conditionals
By running ceph-ansible there are a lot ``[DEPRECATION WARNING]`` like these:
```
[DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable,
this behaviour will go away and you might need to add |bool to the expression
in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This
feature will be removed in version 2.12. Deprecation warnings can be disabled
by setting deprecation_warnings=False in ansible.cfg.
```

Now appended ``| bool`` on a lot of the affected variables.

Sometimes the coding style from ``variable|bool`` changed to ``variable | bool`` *(with spaces at the pipe)*.

Closes: #4022

Signed-off-by: L3D <l3d@c3woc.de>
(cherry picked from commit ab54fe20ec)
2019-06-07 16:05:51 +02:00
Dimitri Savineau 27bd7df5cf ceph-mgr: install python-routes for dashboard
The ceph mgr dashboard requires routes python library to be installed
on the system.

Resolves: #3995

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f37edfa113)
2019-05-22 13:07:17 +02:00
Guillaume Abrioux e29fd842a6 rename docker_exec_cmd variable
This commit renames the `docker_exec_cmd` variable to
`container_exec_cmd` so it's more generic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e74d80e72f)
2019-05-17 16:05:58 +02:00
Boris Ranto 5ac7559736 Merge cephmetrics/dashboard-ansible repo
This commit will merge dashboard-ansible installation scripts with
ceph-ansible. This includes several new roles to setup ceph-dashboard
and the underlying technologies like prometheus and grafana server.

Signed-off-by: Boris Ranto & Zack Cerza <team-gmeno@redhat.com>
Co-authored-by: Zack Cerza <zcerza@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2f141a6e80)
2019-05-17 16:05:58 +02:00
Rishabh Dave df95900913 ceph-mgr: create keys for MGRs
Add code in ceph-mgr for creating a keyring for manager in so that
managers can be deployed on a separate node too.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 56bfec7c58)
2019-05-07 15:12:29 +02:00
Gaudenz Steinlin 29650e71d8 Fix check mode support
Adds "check_mode: no" to commands which register cluster state in a
variable and don't modify anything. These commands have to run in order
to support running the playbook in check mode.

Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>
(cherry picked from commit 3c8987c7a5)
2019-05-07 13:07:45 +02:00
Rishabh Dave 06b3ab2a6b improve coding style
Keywords requiring only one item shouldn't express it by creating a
list with single item.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 739a662c80)

Conflicts:
	roles/ceph-mon/tasks/ceph_keys.yml
	roles/ceph-validate/tasks/check_devices.yml
2019-05-06 15:09:06 +00:00
Dimitri Savineau 2d3c636fa8 ceph-mgr: Add extra module packages
Since Nautilus there's mgr extra modules not present in ceph-mgr
package but in dedicated packages.

Resolves: #3860

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 86315272c7)
2019-04-18 19:10:31 +02:00
Matthew Vernon a4d75c6ea6 UCA: Uncomment UCA variables in defaults, fix consequent breakage
The Ubuntu Cloud Archive-related (UCA) defaults in
roles/ceph-defaults/defaults/main.yml were commented out, which means
if you set `ceph_repository` to "uca", you get undefined variable
errors, e.g.

```
The task includes an option with an undefined variable. The error was: 'ceph_stable_repo_uca' is undefined

The error appears to have been in '/nfs/users/nfs_m/mv3/software/ceph-ansible/roles/ceph-common/tasks/installs/debian_uca_repository.yml': line 6, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: add ubuntu cloud archive repository
  ^ here

```

Unfortunately, uncommenting these results in some other breakage,
because further roles were written that use the fact of
`ceph_stable_release_uca` being defined as a proxy for "we're using
UCA", so try and install packages from the bionic-updates/queens
release, for example, which doesn't work. So there are a few `apt` tasks
that need modifying to not use `ceph_stable_release_uca` unless
`ceph_origin` is `repository` and `ceph_repository` is `uca`.

Closes: #3475
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 9dd913cf8a)
2019-04-10 03:50:27 +00:00
Guillaume Abrioux bf672f14fe mgr: manage mgr modules when mgr and mon are collocated
When mgrs are implicitly collocated on monitors (no mgrs in mgrs group).
That include was skipped because of this condition :

`inventory_hostname == groups[mgr_group_name][0]`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cbfdbab177)
2019-04-09 10:59:32 +02:00
Guillaume Abrioux 3272c2347f mgr: wait for all mgr to be available
before managing mgr modules, we must ensure all mgr are available
otherwise we can hit failure like following:

```
stdout:Error ENOENT: all mgr daemons do not support module 'restful', pass --force to force enablement
```

It happens because all mgr are not yet available when trying to manage
with mgr modules.

Closes: #3100

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f596cc1711)
2019-04-09 10:59:32 +02:00
Guillaume Abrioux 82764afe8d update: mask systemd service units during upgrade
This prevents the packaging from restarting services before we do need
to restart them in the rolling update sequence.
We want to handle services restart at rolling_update playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
wumingqiao 31617afca9 ceph-mgr: run mgr_modules.yml only on the first mgr host
the task will be delegated to mons[0] for all mgr hosts, so we can just run it on the first host and have the same effect.

Signed-off-by: wumingqiao <wumingqiao@beyondcent.com>
2019-03-14 20:16:33 +00:00
Dimitri Savineau a089e1ec23 systemd/service: Set docker.service conditionally
We don't need to set After=docker.service when the container_binary
variable isn't set to docker.
It doesn't break anything currently but it could be confusing when
using podman.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-03-07 20:56:11 +00:00
Dimitri Savineau cb381b41fe Add CONTAINER_IMAGE env var to ceph daemons
Ceph daemons will set the CONTAINER_IMAGE environment variable value
in the daemon metadata.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-03-05 15:07:05 +00:00
Guillaume Abrioux 8c8ec63633 container: use tmpfiles.d to creates /run/ceph
instead of using `RuntimeDirectory` parameter in systemd unit files,
let's use a systemd `tmpfiles.d` to ensure `/run/ceph`.

Explanation:

`podman` doesn't create the `/var/run/ceph` if it doesn't exist the time
where the container is run while `docker` used to create it.
In case of `switch_to_containers` scenario, `/run/ceph` gets created by
a tmpfiles.d systemd file; when switching to containers, the systemd
unit file complains because `/run/ceph` already exists

The better fix would be to ensure `/usr/lib/tmpfiles.d/ceph-common.conf`
is removed and only rely on `RuntimeDirectory` from systemd unit file parameter
but we come from a non-containerized environment which is already running,
it means `/run/ceph` is already created and when starting the unit to
start the container, systemd will still complain and we can't simply
remove the directory if daemons are collocated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-13 09:42:27 +01:00