* Make common params of container args in a var to avoid duplication
* The /var/lib/ceph/crash mount was missing after 637ca81c9c
* Add CEPH_USE_RANDOM_NONCE as it's needed when running inside container (can be removed for squid later)
* Add NODE_NAME as some part of ceph code relies on this var
* add default logging opts for
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
Add --security-opt label=disable to all containers
accessing /var/lib/ceph. podman selinux relabeling behavious changed
since version podman-3:4.2.0-1 which prevent some containers to access
files in these subdirectories.
Signed-off-by: Teoman ONAY <tonay@ibm.com>
With podman version podman-3:4.2.0-4.module+el8.7.0+17064+3b31f55c and
later, when mgr fails to start if mon is already running.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2169767
Signed-off-by: Teoman ONAY <tonay@ibm.com>
In order to reduce need of module
internal maintenance and to join forces on plugin development,
it's proposed to switch to using upstream version of
config_template module.
As it's shipped as collection, it's installation for end-users
is trivial and aligns with general approach of shipping extra modules.
Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@ya.ru>
Update `After=` and `Wants=` parameters in container systemd units
and make them be aligned with the systemd units that come
from the packaging.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2027440
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This adds ceph-*.target systemd unit files support for containerized
deployments.
This also fixes a regression introduced by PR #6719 (rgw and nfs systemd
units not getting purged)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1962748
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
sufficient for the default value (512) of rgw thread pool size.
But if its value is increased near to the pids-limit value,
it does not leave place for the other processes to spawn and run within
the container and the container crashes.
pids-limit set to unlimited regardless of the container engine.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987041
Signed-off-by: Teoman ONAY <tonay@redhat.com>
All ceph daemons need to have the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
environment variable set to 128MB by default in container setup.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970913
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This adds a `ExecStartPre=-/usr/bin/mkdir -p /var/log/ceph` in all
systemd service templates for all ceph daemon.
This is specific to RHCS after a Leapp upgrade is done. Indeed, the
`/var/log/ceph` seems to be removed after the upgrade.
In order to work around this issue let's ensure the directory is present
before trying to start the containers with podman.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1949489
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
It has come to our attention that using ansible_* vars that are
populated with INJECT_FACTS_AS_VARS=True is not very performant. In
order to be able to support setting that to off, we need to update the
references to use ansible_facts[<thing>] instead of ansible_<thing>.
Related: ansible#73654
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406
Signed-off-by: Alex Schultz <aschultz@redhat.com>
As of podman 2.0.5, `--ignore` param conflicts with `--storage`.
```
Nov 30 13:53:10 magna089 podman[164443]: Error: --storage conflicts with --volumes, --all, --latest, --ignore and --cidfile
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This is a workaround to avoid error like following:
```
Error: error creating container storage: the container name "ceph-mgr-magna022" is already in use by "4a5f674e113f837a0cc561dea5d2cd55d16ca159a647b7794ab06c4c276ef701"
```
that doesn't seem to be 100% reproducible but it shows up after a
reboot. The only workaround we came up with at the moment is to run
`podman rm --storage <container>` before starting it.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1887716
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Instead of using ceph auth get-or-create command via the ansible command
module then we can use the ceph_key module.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Instead of using ceph auth get command via the ansible command module
then we can use the ceph_key module and the info state.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since we've changed to podman configuration using the detach mode and
systemd type to forking then the container logs aren't present in the
journald anymore.
The default conmon log driver is using k8s-file.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1890439
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This adds the ceph_fs ansible module for replacing the command module
usage with the ceph fs command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Otherwise this will generate an ansible warning about the missing
filter.
[DEPRECATION WARNING]: evaluating xxx as a bare variable, this behaviour
will go away and you might need to add |bool to the expression in the
future.
Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will
be removed in version 2.12.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since af9f6684 the cephfs pool(s) creation don't use the fs_pools_created
variable anymore because the ceph_pool module is idempotent.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Most ansible module using a state parameter default to the present
value (when available) instead of using it as a mandatory option.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Before [1] we were using default value for
- size
- min_size
- rule_name
when the key wasn't present in the pool dict.
The commit [1] changed this by defaulting to omit.
This patch restores the original workflow by using facts:
- osd_pool_default_size
- osd_pool_default_min_size
- ceph_osd_pool_default_crush_rule_name
[1] af9f6684f2
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The mds_name fact always gets the ansible_hostname value so we don't
need to have a dedicated fact for this and use the ansible_hostname fact
instead.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
In case of failure, the systemd ExecStop isn't executed so the container
isn't removed. After a reboot of a failed node, the container doesn't
start because the old container is still present in created state.
We should always try to remove the container in ExecStartPre for this
situation.
A normal reboot doesn't trigger this issue and this also doesn't affect
nodes running containers via docker.
This behaviour was introduced by d43769d.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1858865
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This changes the way we are running the podman containers via systemd.
They are now in dettached mode and Type/PIDFile set.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1834974
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using docker container engine then the systemd unit scripts only
use a dependency on the docker daemon via the After parameter.
But if docker is restarted on a live system then the ceph systemd units
should wait for the docker daemon to be fully restarted.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846830
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commits calls the `ceph_pool` module for creating ceph pools
everywhere it's needed in the playbook.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The condition on this task is wrong, we have to check whether
`target_size_ratio` is set in the pool definition instead.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
this commit removes the task which enable application on cephfs pools.
See: https://tracker.ceph.com/issues/43761Fixes: #5278
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When `rule_name` isn't set in `crush_rules` the osd pool creation will
fail.
This commit adds a new fact `ceph_osd_pool_default_crush_rule_name` with
the default crush rule name.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1817586
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since [1] we can't use osd pool without replicas (size: 1) by default.
We now need to set the mon_allow_pool_size_one flag to true in the ceph
configuration and add the --yes-i-really-mean-it flag to the osd pool
set size cli.
[1] https://github.com/ceph/ceph/commit/21508bd
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit adds condition in order to not try to customize pools size
when its type is erasure.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit adds the pg autoscaler support.
The structure for pool definition has now two additional attributes
`pg_autoscale_mode` and `target_size_ratio`, eg:
```
test:
name: "test"
pg_num: "{{ osd_pool_default_pg_num }}"
pgp_num: "{{ osd_pool_default_pg_num }}"
rule_name: "replicated_rule"
application: "rbd"
type: 1
erasure_profile: ""
expected_num_objects: ""
size: "{{ osd_pool_default_size }}"
min_size: "{{ osd_pool_default_min_size }}"
pg_autoscale_mode: False
target_size_ratio": 0.1
```
when `pg_autoscale_mode` is `True` user has to set a decent value in
`target_size_ratio`.
Given that it's a new feature, it's still disabled by default.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1782253
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Currently, the command executed is wrong, eg:
```
cmd:
- podman
- exec
- ceph-mon-controller-0
- ceph
- --cluster
- ceph
- osd
- pool
- create
- volumes
- '32'
- '32'
- replicated_rule
- '1'
delta: '0:00:01.625525'
end: '2020-02-27 16:41:05.232705'
item:
```
From documentation, the osd pool creation command is :
```
ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \
[crush-rule-name] [expected-num-objects]
ceph osd pool create {pool-name} {pg-num} {pgp-num} erasure \
[erasure-code-profile] [crush-rule-name] [expected_num_objects]
```
it means we pass '1' (from item.type) as value for
`expected_num_objects` by default which is very likely not what we want.
Also, this commit modifies the default value when no `rule_name` is set
to use the existing variable `osd_pool_default_crush_rule`
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808495
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Because we are relying on docker|podman for managing containers then we
don't need systemd to manage the process (like kill).
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since this variable makes it possible to customize the mode for ceph
directories, let's make it a bit more explicit by adding a default value
in ceph-defaults.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When using docker 1.13.1, the current condition:
```
{% if (container_binary == 'docker' and ceph_docker_version.split('.')[0] is version_compare('13', '>=')) or container_binary == 'podman' -%}
```
is wrong because it compares the first digit (1) whereas it should
compare the second one.
It means we always use `--cpu-quota` although documentation recommend
using `--cpus` when docker version is 1.13.1 or higher.
From the doc:
> --cpu-quota=<value> Impose a CPU CFS quota on the container. The number of
> microseconds per --cpu-period that the container is limited to before
> throttled. As such acting as the effective ceiling.
> If you use Docker 1.13 or higher, use --cpus instead.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Configuration of cephfs with an existing cluster using --limit used to fail
at different tasks while running with site-docker.yml
This commit addresses both of those tasks
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1773489
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
[303] mktemp used in place of tempfile module
[602] Don't compare to empty string
[701] No 'galaxy_info' found
[702] Use 'galaxy_tags' rather than 'categories'
This patch also changes the ansible log_path value via the
ANSIBLE_LOG_PATH environment variable in the travis configuration to
avoid warnings.
[WARNING]: log file at /home/travis/ansible/ansible.log is not writeable
and we cannot create it, aborting
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit removes some legacy tasks.
These tasks aren't needed, they cause the playbook to fail when
collocating daemons.
Closes: #4553
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
There is no need to get n * number of nodes the different keyrings.
Adding a `run_once: true` here avoid running a ceph command too many
times which could be impacting large cluster deployment.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>