Commit Graph

5497 Commits (df98746378e50cbc059bb60da2fada4de91cfbf0)
 

Author SHA1 Message Date
Guillaume Abrioux df98746378 rgw: multisite refact
Add the possibility to deploy rgw multisite configuration with a mix of
secondary and primary zones on a same rgw node.
Before that, on a same node, all instances were either primary
zones *OR* secondary.

Now you can define a rgw instance like following:

```
rgw_instances:
  - instance_name: 'rgw0'
    rgw_zonemaster: false
    rgw_zonesecondary: true
    rgw_zonegroupmaster: false
    rgw_realm: 'france'
    rgw_zonegroup: 'zonegroup-france'
    rgw_zone: paris-00
    radosgw_address: "{{ _radosgw_address }}"
    radosgw_frontend_port: 8080
    rgw_zone_user: jacques.chirac
    rgw_zone_user_display_name: "Jacques Chirac"
    system_access_key: P9Eb6S8XNyo4dtZZUUMy
    system_secret_key: qqHCUtfdNnpHq3PZRHW5un9l0bEBM812Uhow0XfB
    endpoint: http://192.168.101.12:8080
```

Basically it's now possible to define `rgw_zonemaster`,
`rgw_zonesecondary` and `rgw_zonegroupmaster` at the intsance
level instead of the whole node level.

Also, this commit adds an option `deploy_secondary_zones` (default True)
which can be set to `False` in order to explicitly ask the playbook to
not deploy secondary zones in case where the corresponding endpoint are
not deployed yet.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1915478

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 71a5e666e3)
2021-01-28 16:37:32 -05:00
Guillaume Abrioux 32ac435962 library: fix bug in radosgw_zone.py
If for some reason `get_zonegroup()` returns a failure, we must handle
and make the module exit properly instead of failing with the following
python trace:

```
Traceback (most recent call last):
  File "./AnsiballZ_radosgw_zone.py", line 247, in <module>
    _ansiballz_main()
  File "./AnsiballZ_radosgw_zone.py", line 234, in _ansiballz_main
    exitcode = debug(sys.argv[1], zipped_mod, ANSIBALLZ_PARAMS)
  File "./AnsiballZ_radosgw_zone.py", line 202, in debug
    runpy.run_module(mod_name='ansible.modules.radosgw_zone', init_globals=None, run_name='__main__', alter_sys=True)
  File "/usr/lib64/python3.6/runpy.py", line 205, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/vagrant/.ansible/tmp/ansible-tmp-1610728441.41-685133-218973990589597/debug_dir/ansible/modules/radosgw_zone.py", line 467, in <module>
    main()
  File "/home/vagrant/.ansible/tmp/ansible-tmp-1610728441.41-685133-218973990589597/debug_dir/ansible/modules/radosgw_zone.py", line 463, in main
    run_module()
  File "/home/vagrant/.ansible/tmp/ansible-tmp-1610728441.41-685133-218973990589597/debug_dir/ansible/modules/radosgw_zone.py", line 425, in run_module
    zonegroup = json.loads(_out)
  File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fedb36688d)
2021-01-28 16:37:32 -05:00
Dimitri Savineau c7d204ce37 ceph-defaults: change default ceph container tag
The "latest" ceph container tag references the latest stable release
(octopus at the moment). "latest" is an alias on "latest-octopus".
On the devel branch we should use "latest-master" tag instead.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7d56771975)
2021-01-22 17:39:39 -05:00
Dimitri Savineau c4f11809c4 module_utils: don't add newline to the data
When executing a command via the run_command method and passing some
data with stdin then the default behavior is to add append a newline.
This breaks the value of password used by our modules.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6616908577)
2021-01-18 14:46:53 -05:00
Dimitri Savineau fdda54eeb4 dashboard: manage password backward compatibility
The ceph dashboard changed the way the password are provided via the
CLI.
This breaks the backward compatibility when using a recent ceph-ansible
version with ceph release without that feature.
This patch adds tasks for legacy workflow (ceph release without that
feature) in both ceph-dashboard role and ceph_dashboard_user module.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-01-18 14:46:53 -05:00
Guillaume Abrioux fb03dfda30 dashboard: configure passwords via stdin
Due to recent changes in ceph, the few dashboard passwors
must be passed via `-i`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ef975ef5ea)
2021-01-18 14:46:53 -05:00
Guillaume Abrioux fa5fbe6960 library: refact ceph_dashboard_user
refact this module due to recent changes in ceph pacific.
The password must be passed with `-i` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2725db3e9f)
2021-01-18 14:46:53 -05:00
Guillaume Abrioux 11736265a1 mon: fix cephx disabled deployment
Due to missing condition on `cephx` variable, cephx disabled deployments
are broken.
This commit fixes this.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1910151

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4af0845702)
2021-01-18 13:50:57 -05:00
Guillaume Abrioux af95c34c6b fs2bs: skip migration when a mix of fs and bs is detected
Since the default of `osd_objectstore` has changed as of 3.2, some
deployments might have a mix of filestore and bluestore OSDs on a same
node. In some specific cases, there's a possibility that a filestore OSD
shares a journal/db device with a bluestore OSD. We shouldn't try to
redeploy in this context because ceph-volume will complain. (either
because in lvm batch you can't pass partition or about gpt header).
The safest option is to skip the migration on the node when such a mix
is detected or force all osds including those already using bluestore
(option `force_filestore_to_bluestore=True` has to be passed as an extra var).
If all OSDs are using filestore, then they will be migrated to
bluestore.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1875777

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e66f12d138)
2021-01-13 12:36:31 +01:00
Guillaume Abrioux 8d59a25a55 switch2container: fix mon quorum check
The current check makes no sense because it checks any of other monitor
than the one being played (either a previous one already converted or a
next that isn't yet converted) is present on the quorum.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1909011

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 175ffa1b88)
2021-01-11 18:42:38 -05:00
Mike Currin 54f7983be2 Path for ceph config missing in crash template
The path where ceph.conf is located (/etc/ceph) missing in the Docker container bind mounts, this throws errors

Signed-off-by: Mike Currin <currin@gmail.com>
(cherry picked from commit 4cbc9a48c9)
2021-01-06 16:55:12 +01:00
Guillaume Abrioux 46fac7db28 rgw: support switching from single-site to multisite
When collocating rgw with either a mon, mgr or osd, switching from
single site to a multisite rgw setup failed because of the handlers
triggered between the ansible play of the collocated daemon and the play
of the rgw. Since the multisite changes are not yet applied the handlers
fail.
The idea here is to ensure we run the multisite configuration from the
ceph-handler role before the restart happens, this way it won't complain
because of non existing multisite configuration.

(Note: this is also valid when simply changing a multisite configuration)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1888630

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 513c8cfe55)
2021-01-06 10:29:59 -05:00
Dimitri Savineau 9587c5b15c library: remove containerized parameter from cv
The ceph-volume module relies on environment variables to determine if
the command should be executed within a container or not.
The containerized parameter isn't used anymore and we can remove it.

Fixes: #6153

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 613ab11b9b)
2021-01-06 10:52:56 +01:00
Dimitri Savineau 7d088320df cephadm: remove loop on host add tasks
Instead of iterate over the host list for adding the node/label to the
host orchestrator configuration then we can do it parallelly.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5b6f907a72)
2020-12-16 18:37:39 -05:00
Dimitri Savineau b44297b7e6 library: add cephadm_bootstrap module
This adds cephadm_bootstrap ansible module for replacing the command module
usage with the cephadm bootstrap command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c3ed124d31)
2020-12-16 17:39:36 -05:00
Dimitri Savineau 32f593e5a1 library: add cephadm_adopt module
This adds cephadm_adopt ansible module for replacing the command module
usage with the cephadm adopt command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 08f118077f)
2020-12-16 17:39:26 -05:00
Fabien Brachere ba3db6be9f library: add missing `target_size_ratio` parameter support in ceph_pool module
When creating a new pool, target_size_ratio was ignored by ansible module ceph_pool.py.
target_size_ratio is now used when pg_autoscale_mode is on.
Tests added to library tests.
This adds too the use in the role ceph-rgw.

Signed-off-by: Fabien Brachere <fabien.brachere@celeste.fr>
(cherry picked from commit 4026ba9da1)
2020-12-16 10:57:33 -05:00
Dimitri Savineau a2704581b1 ceph-config: fix ceph-volume lvm batch report
Since the major ceph-volume lvm batch refactoring, the report value
is different.
Before the refact, the report was a dict with the OSDs list to be created
under the "osds" key.
After the refact, the report is a list of dict.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 827b23353f)
2020-12-15 17:25:49 -05:00
Dimitri Savineau d4024eddbb library: add ceph_osd_flag module
This adds ceph_osd_flag ansible module for replacing the command module
usage with the ceph osd set/unset commands.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5da593604a)
2020-12-15 14:12:44 +01:00
Dimitri Savineau 1f1ca3ec8a monitoring: use config_template module for config
The alertmanager, grafana and prometheus configuration file are
generated with the template module which doesn't allow for using
config overrides.
Instead we could use the config_template plugin action and add a
new variable for overrides (one for each component).

With this patch, one should be able to add configuration to
prometheus with the following:

---
alertmanager_conf_overrides:
  global:
    smtp_smarthost: 'localhost:25'
...

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1902999

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

(cherry picked from commit 5a41026347)
2020-12-14 13:28:12 -05:00
Seena Fallah e1314de3d9 ceph-osd: use global crush_device_class in lvm_volumes
Use global crush_device_class variable if it's not set per OSD

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit 5e9444fa5c)
2020-12-14 13:28:00 -05:00
Guillaume Abrioux bed91982ea tests: force box removal
This avoids interactive mode for `vagrant box remove`.
This can happen for some reason when there's leftover from previous
deployment (VMs not destroyed as expected)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 011c97786b)
2020-12-14 12:29:14 -05:00
Guillaume Abrioux cb56bb9f21 tests: rgw_multisite playbook test refactor
Currently we create an object from the primary sites but we try to read
that object still from the master which doesn't make sense, we should
try to read it from a secondary site.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e2ea403d5e)
2020-12-14 12:29:14 -05:00
Karl-Heinz Preuß da7b708636 fix broken ceph-fetch-keys role
set fetch_directory variable in default/main.yml instead of using the
defaults jinja filter in tasks/main.yml.

Fixes: #6072

Signed-off-by: Karl-Heinz Preuß <karl-heinz.preuss@cms.hu-berlin.de>
(cherry picked from commit 6ce34ef59f)
2020-12-14 11:42:50 -05:00
Dimitri Savineau 41f7f9d020 Revert "config: Always use osd_memory_target if set"
This reverts commit 4d1fdd2b05.

This breaks the backward compatibility with previous osd_memory_target
calculation and we could have a value lower than the minimum value allowed
(896M) which causes some ceph commands to fail (like ceph assimilate-conf).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit aa6e1f20ea)
2020-12-14 02:41:45 +01:00
Dimitri Savineau e96293024b purge-container-cluster: always prune force
Since podman 2.x, there's now a confirmation when running podman
container prune command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0108c9f941)
2020-12-09 16:45:30 -05:00
Dimitri Savineau 228407308c tests/vagrant: update box version to CentOS 8.3
This updates the CentOS libvirt box version to 8.3

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 801e7a29cf)
2020-12-09 16:45:30 -05:00
Jukka Nousiainen 302fa3b2f8 ceph-mon: No become during gen mon initial keyring
Since the backing generate_secret() just hands out urandom output,
running as privileged doesn't seem to be required. It's not
desireable to provide sudo in some Ansible runner environments.

Signed-off-by: Jukka Nousiainen <jukka.nousiainen@csc.fi>
(cherry picked from commit eb7473491b)
2020-12-07 09:24:37 -05:00
Dimitri Savineau 17c4744579 rhcs: drop fetch_directory override
Since the fetch_directory variable has been dropped then we don't need
the override in rhcs file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a2cbab16a4)
2020-12-03 12:10:07 -05:00
Guillaume Abrioux 6b04f1154f common: do not use pipefail when not needed
Let's discard the ansible lint error 306 and add a "# noqa 306" on tasks
where we don't need `set -o pipefail`

Fixes: #6090

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 86a8889ee3)
2020-12-01 20:18:35 -05:00
Guillaume Abrioux 679d3e2d10 osd: add tag on 'wait for all osd to be up' task
This allows skipping this task if really desired.
Use it carefully. Use it at your own risk.

Fixes: #6073

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5c4ae5356d)
2020-12-01 11:04:37 +01:00
Guillaume Abrioux 7ab606bac5 iscsigw: remove `--cap-add=all` from `podman run` cmd
As of podman `2.0.5`, `--cap-add` and `--privileged` are exclusive
options.

```
Nov 30 13:56:30 magna089 podman[171677]: Error: invalid config provided: CapAdd and privileged are mutually exclusive options
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1902149

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d40dd764e0)
2020-11-30 16:42:59 -05:00
Guillaume Abrioux f1ae3dec72 container: remove `--ignore` from `podman rm` command
As of podman 2.0.5, `--ignore` param conflicts with `--storage`.
```
Nov 30 13:53:10 magna089 podman[164443]: Error: --storage conflicts with --volumes, --all, --latest, --ignore and --cidfile
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c68b124ba8)
2020-11-30 16:42:59 -05:00
Guillaume Abrioux dc5aea52cf switch2containers: do not stop ceph.target in osd play
`ceph.target` should be disabled only. Otherwise, in collocation
scenario you stop other collocated services in the OSD play which isn't
what we want to do. Each daemon has its corresponding play for managing
the transition to container.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1901865

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0b05620597)
2020-11-30 10:11:57 +01:00
Dimitri Savineau 51bea82677 alertmanager/prometheus: fix owner/group
Set the owner/group on alertmanager and prometheus directories and
files to nobody and nogroup (uid and gid 65534) to avoid permission
issues.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1901543

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit eb452d35bc)
2020-11-27 14:55:39 -05:00
Guillaume Abrioux 0edbabbf4d mon: refact initial keyring generation
adding monitor is no longer possible because we generate a new mon
keyring each time the playbook is run.

Fixes: #5864

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 970c6a4ee6)
2020-11-26 09:12:22 +01:00
Guillaume Abrioux 10551da173 mon: replace `command` task by `copy`
We can achieve this task using `copy` module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ff2ca270f)
2020-11-26 09:12:22 +01:00
Dimitri Savineau 1cf76da74a ceph-iscsi: set the pool name in the config file
When using a custom pool for iSCSI gateway then we need to set the pool
name in the configuration otherwise the default rbd pool name will be
used.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 40a87c4b92)
2020-11-25 09:19:03 -05:00
Guillaume Abrioux 2a96eb81b7 tests: use github workflow for nbsp char check
Let's use a github workflow instead of travis for this.

With this commit we can get rid of Travis.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 94c37b9de8)
2020-11-24 10:39:03 +01:00
Guillaume Abrioux ed9a470113 lint: ignore 302,303,505 errors
ignore 302,303 and 505 errors

[302] Using command rather than an argument to e.g. file
[303] Using command rather than module
[505] referenced files must exist

they aren't relevant on these tasks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 195d88fcda)
2020-11-24 10:39:03 +01:00
Guillaume Abrioux 805183bde3 lint: do not use 'local_action'
Fix ansible-lint 504 error:

[504] Do not use 'local_action', use 'delegate_to: localhost'

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c948b668eb)
2020-11-24 10:39:03 +01:00
Guillaume Abrioux 6ef95e9cde lint: trailing whitespace
Fix ansible-lint 201 error:

[201] Trailing whitespace

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit dfc7e6e4bd)
2020-11-24 10:39:03 +01:00
Guillaume Abrioux 0c3adbc710 lint: all tasks should be named
Fix ansible-lint 502 error:

[502] All tasks should be named

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 97dd9218dd)
2020-11-24 10:39:03 +01:00
Guillaume Abrioux 5375713d3e lint: use shell only when shell functionality is required
Fix ansible-lint 305 error:

[305] Use shell only when shell functionality is required

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 11b4bf5083)
2020-11-24 10:39:03 +01:00
Guillaume Abrioux e83bcd9459 lint: don't compare to literal true/false
Fix ansible lint 601 error:

[601] Don't compare to literal True/False

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2011e4dbc8)
2020-11-24 10:39:03 +01:00
Guillaume Abrioux 35a44a4f5a lint: variables should have spaces before and after
Fix ansible lint 206 error:

[206] Variables should have spaces before and after: {{ var_name }}

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9fba6eecfa)
2020-11-24 10:39:03 +01:00
Guillaume Abrioux 630e6be904 lint: commands should not change things
Fix ansible lint 301 error:

[301] Commands should not change things if nothing needs doing

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5450de58b3)
2020-11-24 10:39:03 +01:00
Guillaume Abrioux 1d4cd3328a lint: set pipefail on shell tasks
Fix ansible lint 306 error:

[306] Shells that use pipes should set the pipefail option

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1879c26eb9)
2020-11-24 10:39:03 +01:00
Guillaume Abrioux ffc63ad5f5 tests: use github workflow for ansible-lint
let's use github workflow instead of travis.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d4400f911a)
2020-11-24 10:39:03 +01:00
Guillaume Abrioux d86a159a79 osd: ensure /var/lib/ceph/osd/{cluster}-{id} is present
This commit ensures that the `/var/lib/ceph/osd/{{ cluster }}-{{ osd_id }}` is
present before starting OSDs.

This is needed specificly when redeploying an OSD in case of OS upgrade
failure.
Since ceph data are still present on its devices then the node can be
redeployed, however those directories aren't present since they are
initially created by ceph-volume. We could recreate them manually but
for better user experience we can ask ceph-ansible to recreate them.

NOTE:
this only works for OSDs that were deployed with ceph-volume.
ceph-disk deployed OSDs would have to get those directories recreated
manually.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1898486

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 873fc8ec0f)
2020-11-19 11:52:20 -05:00