Commit Graph

2677 Commits (a36eee18525c7b84380040dd7e908165a47a3a28)

Author SHA1 Message Date
Dimitri Savineau 07d2160421 dashboard: manage password backward compatibility
The ceph dashboard changed the way the password are provided via the
CLI.
This breaks the backward compatibility when using a recent ceph-ansible
version with ceph release without that feature.
This patch adds tasks for legacy workflow (ceph release without that
feature) in ceph-dashboard role.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1915506

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-01-19 18:05:02 +01:00
Guillaume Abrioux 623ca14682 dashboard: configure passwords via stdin
Due to recent changes in ceph, the few dashboard passwors
must be passed via `-i`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ef975ef5ea)
2021-01-19 18:05:02 +01:00
Mike Currin 360a2d2b30 Path for ceph config missing in crash template
The path where ceph.conf is located (/etc/ceph) missing in the Docker container bind mounts, this throws errors

Signed-off-by: Mike Currin <currin@gmail.com>
(cherry picked from commit 4cbc9a48c9)
2021-01-06 16:55:39 +01:00
Guillaume Abrioux 290d3ef369 rgw: support switching from single-site to multisite
When collocating rgw with either a mon, mgr or osd, switching from
single site to a multisite rgw setup failed because of the handlers
triggered between the ansible play of the collocated daemon and the play
of the rgw. Since the multisite changes are not yet applied the handlers
fail.
The idea here is to ensure we run the multisite configuration from the
ceph-handler role before the restart happens, this way it won't complain
because of non existing multisite configuration.

(Note: this is also valid when simply changing a multisite configuration)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1888630

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 513c8cfe55)
2021-01-06 10:38:50 -05:00
Guillaume Abrioux 607ef5a7d2 common: do not use pipefail when not needed
Let's discard the ansible lint error 306 and add a "# noqa 306" on tasks
where we don't need `set -o pipefail`

Fixes: #6090

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 86a8889ee3)
2020-12-16 14:05:45 +01:00
Guillaume Abrioux 6855feb604 ceph-osd: refact `docker_exec_start_osd`
This commit drops nested jinja construction in this set_fact task.
It also rename it to `container_exec_start_osd`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ff95fa9c32)
2020-12-16 14:05:45 +01:00
Dimitri Savineau 24a5b1bbb5 ceph-config: fix ceph-volume lvm batch report
Since the major ceph-volume lvm batch refactoring, the report value
is different.
Before the refact, the report was a dict with the OSDs list to be created
under the "osds" key.
After the refact, the report is a list of dict.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 827b23353f)
2020-12-15 17:26:01 -05:00
Dimitri Savineau 3f16132e44 library: add ceph_osd_flag module
This adds ceph_osd_flag ansible module for replacing the command module
usage with the ceph osd set/unset commands.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5da593604a)
2020-12-15 17:36:28 +01:00
Dimitri Savineau e51f68fdbb ceph-iscsi: set the pool name in the config file
When using a custom pool for iSCSI gateway then we need to set the pool
name in the configuration otherwise the default rbd pool name will be
used.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 40a87c4b92)
2020-12-15 17:33:24 +01:00
Guillaume Abrioux 63fa4c9484 containers: modify bindmount option
This commit changes the bind mount option for the mount point
`/var/lib/ceph` in the systemd template for mon and mgr containers. This
is needed in case of collocating mon/mgr with osds using dmcrypt
scenario.
Once mon/mgr got converted to containers, the dmcrypt layer sub mount is
still seen in `/var/lib/ceph`. For some reason it makes the
corresponding devices busy so any other container can't open/close it.
As a result, it prevents osds from starting properly.

Since it only happens on the nodes converted before the OSD play, the idea is
to bind mount `/var/lib/ceph` on mon and mgr with the `rshared` option
so once the sub mount is unmounted, it is propagated inside the
container so it doesn't see that mount point.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896392

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f5ba6d9b01)
2020-12-15 17:33:11 +01:00
Dimitri Savineau fa06752e4b alertmanager/prometheus: fix owner/group
Set the owner/group on alertmanager and prometheus directories and
files to nobody and nogroup (uid and gid 65534) to avoid permission
issues.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1901543

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit eb452d35bc)
2020-12-15 17:32:50 +01:00
Guillaume Abrioux 69b5b96f2d osd: add tag on 'wait for all osd to be up' task
This allows skipping this task if really desired.
Use it carefully. Use it at your own risk.

Fixes: #6073

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5c4ae5356d)
2020-12-15 17:32:09 +01:00
Jukka Nousiainen dca1534ee6 ceph-mon: No become during gen mon initial keyring
Since the backing generate_secret() just hands out urandom output,
running as privileged doesn't seem to be required. It's not
desireable to provide sudo in some Ansible runner environments.

Signed-off-by: Jukka Nousiainen <jukka.nousiainen@csc.fi>
(cherry picked from commit eb7473491b)
2020-12-15 17:31:37 +01:00
Dimitri Savineau 9858d61a57 Revert "config: Always use osd_memory_target if set"
This reverts commit 4d1fdd2b05.

This breaks the backward compatibility with previous osd_memory_target
calculation and we could have a value lower than the minimum value allowed
(896M) which causes some ceph commands to fail (like ceph assimilate-conf).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit aa6e1f20ea)
2020-12-15 17:31:09 +01:00
Seena Fallah 2485b35825 ceph-osd: use global crush_device_class in lvm_volumes
Use global crush_device_class variable if it's not set per OSD

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit 5e9444fa5c)
2020-12-15 17:30:55 +01:00
Karl-Heinz Preuß 00793c9221 fix broken ceph-fetch-keys role
set fetch_directory variable in default/main.yml instead of using the
defaults jinja filter in tasks/main.yml.

Fixes: #6072

Signed-off-by: Karl-Heinz Preuß <karl-heinz.preuss@cms.hu-berlin.de>
(cherry picked from commit 6ce34ef59f)
2020-12-15 17:30:42 +01:00
Dimitri Savineau f18142fc2e group_vars: remove useless files
Delete legacy files that aren't used anymore.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e790b0851d)
2020-12-15 17:30:42 +01:00
Guillaume Abrioux 1fcf71dc33 common: drop `fetch_directory` feature
This commit drops the `fetch_directory` feature.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1cc9666c09)
2020-12-15 17:30:42 +01:00
Guillaume Abrioux dc7b9519f4 ceph-config: ceph.conf rendering refactor
This commit cleans up the `main.yml` task file of `ceph-config`.
It drops the local ceph.conf generation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 900c0f4492)
2020-12-15 17:30:42 +01:00
Guillaume Abrioux d14723d5b4 mon: refact initial keyring generation
adding monitor is no longer possible because we generate a new mon
keyring each time the playbook is run.

Fixes: #5864
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1902281

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 970c6a4ee6)
2020-12-01 09:53:26 -05:00
Dimitri Savineau f917bb015c ceph_key: set state as optional
Most ansible module using a state parameter default to the present
value (when available) instead of using it as a mandatory option.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit abb4023d76)
2020-12-01 09:53:26 -05:00
Guillaume Abrioux 0d22598806 iscsigw: remove `--cap-add=all` from `podman run` cmd
As of podman `2.0.5`, `--cap-add` and `--privileged` are exclusive
options.

```
Nov 30 13:56:30 magna089 podman[171677]: Error: invalid config provided: CapAdd and privileged are mutually exclusive options
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1902149

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d40dd764e0)
2020-12-01 09:53:15 -05:00
Guillaume Abrioux ef154613c8 container: remove `--ignore` from `podman rm` command
As of podman 2.0.5, `--ignore` param conflicts with `--storage`.
```
Nov 30 13:53:10 magna089 podman[164443]: Error: --storage conflicts with --volumes, --all, --latest, --ignore and --cidfile
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c68b124ba8)
2020-12-01 09:53:15 -05:00
Guillaume Abrioux fe699897ed common: add a default value for ceph_directories_mode
Since this variable makes it possible to customize the mode for ceph
directories, let's make it a bit more explicit by adding a default value
in ceph-defaults.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 483adb5d79)
2020-11-19 21:14:02 -05:00
Guillaume Abrioux 0efc347a67 osd: ensure /var/lib/ceph/osd/{cluster}-{id} is present
This commit ensures that the `/var/lib/ceph/osd/{{ cluster }}-{{ osd_id }}` is
present before starting OSDs.

This is needed specificly when redeploying an OSD in case of OS upgrade
failure.
Since ceph data are still present on its devices then the node can be
redeployed, however those directories aren't present since they are
initially created by ceph-volume. We could recreate them manually but
for better user experience we can ask ceph-ansible to recreate them.

NOTE:
this only works for OSDs that were deployed with ceph-volume.
ceph-disk deployed OSDs would have to get those directories recreated
manually.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1898486

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 873fc8ec0f)
2020-11-19 21:14:02 -05:00
Dimitri Savineau 76a77f1c92 ceph-facts: fix read osd pool default crush fact
We don't need to use run_once on that task when having running monitors
otherwise the read task could be skip and the set task will fail.

The conditional check 'crush_rule_variable.rc == 0' failed. The error
was: error while evaluating conditional (crush_rule_variable.rc == 0):
'dict object' has no attribute 'rc'

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1898856

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e150df789e)
2020-11-18 17:01:14 -05:00
Guillaume Abrioux ce86d695c2 container: force rm --storage on ExecStartPre
This is a workaround to avoid error like following:
```
Error: error creating container storage: the container name "ceph-mgr-magna022" is already in use by "4a5f674e113f837a0cc561dea5d2cd55d16ca159a647b7794ab06c4c276ef701"
```

that doesn't seem to be 100% reproducible but it shows up after a
reboot. The only workaround we came up with at the moment is to run
`podman rm --storage <container>` before starting it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1887716

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ba7824c55)
2020-11-16 16:37:46 -05:00
Gaudenz Steinlin 0f679e7b20 config: Always use osd_memory_target if set
The osd_memory_target variable was only used if it was higher than the
calculated value based on the number of OSDs. This is changed to always
use the value if it is set in the configuration. This allows this value
to be intentionally set lower so that it does not have to be changed
when more OSDs are added later.

Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>
(cherry picked from commit 4d1fdd2b05)
2020-11-16 09:21:01 +01:00
Benoît Knecht 2ea3db269e ceph-facts: Fix osd_pool_default_crush_rule fact
The `osd_pool_default_crush_rule` is set based on `crush_rule_variable`, which
is the output of a `grep` command.

However, two consecutive tasks can set that variable, and if the second task is
skipped, it still overwrites the `crush_rule_variable`, leading the
`osd_pool_default_crush_rule` to be set to `ceph_osd_pool_default_crush_rule`
instead of the output of the first task.

This commit ensures that the fact is set right after the `crush_rule_variable`
is assigned, before it can be overwritten.

Closes #5912

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit c5f7343a2f)
2020-11-13 10:42:13 -05:00
Guillaume Abrioux aadef08311 dashboard: change dashboard_grafana_api_no_ssl_verify default value
This sets the `dashboard_grafana_api_no_ssl_verify` default value
according to the length of `dashboard_crt` and `dashboard_key`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5cadfea42e)
2020-11-05 09:02:15 +01:00
Guillaume Abrioux c9b7e46213 dashboard: enable https by default
see linked bz for details

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1889426

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 767d3c898e)
2020-11-05 09:02:15 +01:00
Gaudenz Steinlin f344a4810c osd: Fix number of OSD calculation
If some OSDs are to be created and others already exist the calculation
only counted the to be created OSDs. This changes the calculation to
take all OSDs into account.

Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>
(cherry picked from commit 15044da030)
2020-11-03 21:52:44 -05:00
Dimitri Savineau bcd2797d11 rgw/rbdmirror: use service dump instead of ceph -s
The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the rgw/rbdmirror services status, we're only using the
servicmap structure in the ceph status output.
To optimize this, we could use the ceph service dump command which contains
the same needed information.
This command returns less information and is slightly faster than the ceph
status command.

$ ceph status -f json | wc -c
2001
$ ceph service dump -f json | wc -c
1105
$ time ceph status -f json > /dev/null

real	0m0.557s
user	0m0.516s
sys	0m0.040s
$ time ceph service dump -f json > /dev/null

real	0m0.454s
user	0m0.434s
sys	0m0.020s

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3f9081931f)
2020-11-03 14:38:49 -05:00
Dimitri Savineau 69b51b5f19 monitor: use quorum_status instead of ceph status
The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the quorum status, we're only using the quorum_names
structure in the ceph status output.
To optimize this, we could use the ceph quorum_status command which contains
the same needed information.
This command returns less information.

$ ceph status -f json  | wc -c
2001
$ ceph quorum_status -f json  | wc -c
957
$ time ceph status -f json > /dev/null

real	0m0.577s
user	0m0.538s
sys	0m0.029s
$ time ceph quorum_status -f json > /dev/null

real	0m0.544s
user	0m0.527s
sys	0m0.016s

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 88f91d8c12)
2020-11-03 14:38:49 -05:00
wangxiaotong 8bc0806f10 osds: use ceph osd stat instead of ceph status
Improve the checked way of the OSD created checking process.
This replaces the ceph status command by the ceph osd stat command.
The osdmap structure isn't needed anymore.

$ ceph status -f json | wc -c
2001
$ ceph osd stat -f json | wc -c
132
$ time ceph status -f json > /dev/null

real    0m0.563s
user    0m0.526s
sys     0m0.036s
$ time ceph osd stat -f json > /dev/null

real	0m0.457s
user	0m0.411s
sys	0m0.045s

Signed-off-by: wangxiaotong <wangxiaotong@fiberhome.com>
(cherry picked from commit b9cb0f12e9)
2020-11-03 14:38:49 -05:00
Guillaume Abrioux 61001660bb common: follow up on #5948
In addition to f7e2b2c608

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 371d854a5c)
2020-11-03 09:52:42 +01:00
Benoît Knecht 4a7186697e ceph-mon: Don't set monitor directory mode recursively
After rolling updates performed with
`infrastructure-playbooks/rolling_updates.yml`, files located in
`/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }}` had mode 0755 (including
the keyring), making them world-readable.

This commit separates the task that configured permissions recursively on
`/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }}` into two separate tasks:

1. Set the ownership and mode of the directory itself;
2. Recursively set ownership in the directory, but don't modify the mode.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 0d76826bbb)
2020-11-02 18:42:31 -05:00
Gaudenz Steinlin a1ff05b26e openstack: use ceph_keyring_permissions by default
Otherwise this task fails if no permission is set on the item.
Previously the code omited the mode parameter if it was not set, but
this was lost with commit ab370b6ad8.

Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>
(cherry picked from commit 79ff79c422)
2020-11-02 18:42:18 -05:00
Dimitri Savineau f344fe6f92 podman: force log driver to journald
Since we've changed to podman configuration using the detach mode and
systemd type to forking then the container logs aren't present in the
journald anymore.
The default conmon log driver is using k8s-file.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1890439

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 16cd183b9c)
2020-11-02 17:46:48 -05:00
Dimitri Savineau 34a310e5f4 ceph-handler: fix curl ipv6 command with rgw
When using the curl command with ipv6 address and brackets then we need
to use the -g option otherwise the command fails.

$ curl http://[fdc2:328:750b:6983::6]:8080
curl: (3) [globbing] error: bad range specification after pos 9

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cdb7b09cd7)
2020-11-02 17:44:22 -05:00
Guillaume Abrioux 12e06d07c8 iscsi: fix ownership on iscsi-gateway.cfg
This file is currently deployed with '0644' ownership making this file
readable by any user on the system.
Since it contains sensitive information it should be readable by the
owner only.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1890119

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a822f77300)
2020-10-21 18:27:59 -04:00
Benoît Knecht d165dc676d ceph-osd: Fix check mode for start osds tasks
Correctly set `osd_ids_non_container.stdout_lines` to an empty list if it's
undefined (i.e. in check mode).

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 8b0023cb77)
2020-10-21 13:19:49 +02:00
Benoît Knecht dd51ca530c ceph-mon: Fix check mode for deploy monitor tasks
Skip the `get initial keyring when it already exists` task when both commands
whose `stdout` output it requires have been skipped (e.g. when running in check
mode).

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 8f436ab5d8)
2020-10-21 13:19:49 +02:00
Guillaume Abrioux 0a917b3917 crash: refact caps definition
there is no need to use `{{ }}` syntax here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a8bd947c7d)
2020-10-20 09:09:30 +02:00
Gaudenz Steinlin 950ad22f72 ceph-crash: Only deploy key to targeted hosts
The current task installs the ceph-crash key to "most" hosts via
"delegate_to". This key is only used by the ceph-crash daemon and should
just be installed on all hosts targeted by this role. There is no need
for using a delegated task.

Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>
(cherry picked from commit 68cc93fb18)
2020-10-19 20:20:46 +02:00
Guillaume Abrioux 6a521155d5 ceph-osd: start osd after systemd overrides
The service should be started after the ceph-osd systemd overrides has
been added, otherwise, the latter isn't considered.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1860739

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 59d0f01992)
2020-10-15 09:34:58 -04:00
Dimitri Savineau 7c337d96d2 ceph-osd: don't start the OSD services twice
Using the + operation on two lists doesn't filter out the duplicate
keys.
Currently each OSDs is started (via systemd) twice.
Instead we could use the union filter.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4eaa65c362)
2020-10-14 10:00:21 -04:00
Guillaume Abrioux 709deb90cc handler: refact check_socket_non_container
the `stat --printf=%n` returns something like following:

```
ok: [osd0] => changed=false
  cmd: |-
    stat --printf=%n /var/run/ceph/ceph-osd*.asok
  delta: '0:00:00.009388'
  end: '2020-10-06 06:18:28.109500'
  failed_when_result: false
  rc: 0
  start: '2020-10-06 06:18:28.100112'
  stderr: ''
  stderr_lines: <omitted>
  stdout: /var/run/ceph/ceph-osd.2.asok/var/run/ceph/ceph-osd.5.asok
  stdout_lines: <omitted>
```

it makes the next task "check if the ceph osd socket is in-use" grep
like this:

```
ok: [osd0] => changed=false
  cmd:
  - grep
  - -q
  - /var/run/ceph/ceph-osd.2.asok/var/run/ceph/ceph-osd.5.asok
  - /proc/net/unix
```

which will obviously fail because this path never exists. It makes the
OSD handler broken.

Let's use `find` module instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 46d4d97da9)
2020-10-14 10:31:05 +02:00
Benoît Knecht 69a6053114 Fix Ansible check mode for site.yml.sample playbook
Make sure the `site.yml.sample` playbook can be run in check mode by skipping
tasks that try to read the output of commands that have been skipped.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 54ba38e35e)
2020-10-07 07:06:54 +02:00
Guillaume Abrioux 52826caa51 rgw: fix multi instances scaleout in baremetal
When rgw and osd are collocated, the current workflow prevents from
scaling out the radosgw_num_instances parameter when rerunning the
playbook in baremetal deployments.

When ceph-osd notifies handlers, it means rgw handlers are triggered
too. The issue with this is that they are triggered before the role
ceph-rgw is run.
In the case a scaleout operation is expected on `radosgw_num_instances`
it causes an issue because keyrings haven't been created yet so the new
instances won't start.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881313

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a802fa2810)
2020-10-06 09:21:58 -04:00