Commit Graph

5553 Commits (0ab159242a2b925d7c28d7d149df653ddcdb4df9)
 

Author SHA1 Message Date
Dimitri Savineau 8f26ffdbac rolling_update: enforce ceph-container-engine
When running the rolling_update.yml playbook and adding the dashboard
component in the same time then the requirement (like container packages)
aren't installed.
This could lead to a failure in case of using authentication on the
container registry because the playbook will try to login on the registry
but podman/docker aren't yet installed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1903504
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 48a456dc8c)
2021-03-11 13:52:21 +01:00
Dimitri Savineau 735965ef9c ceph-common: enable rhcs tools repo for monitoring
The monitoring node running grafana needs the rhcs tools repostory
enabled in non containerized deployment to be able to install the
ceph-grafana-dashboards rpm package.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e4dd0067c6)
2021-03-11 13:52:21 +01:00
Dimitri Savineau 3ba27c9387 rolling_update: exclude clients from node-exporter
Since b105549 we don't install node-exporter on client nodes so we should
also exclude the client node from the node-exporter upgrade.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 94af3c87d1)
2021-03-11 13:52:02 +01:00
Guillaume Abrioux 1b424ad5e9 purge: zap and destroy db and wal devices for lvm batch
Those devices (db/wal) are never zapped in lvm batch deployment.
Iterating over `dedicated_devices` and `bluestore_wal_devices` fixes
this issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1922926

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 984191ac7f)
2021-03-11 13:51:38 +01:00
Tyler Bishop ba76102952 facts: support device aliases for (dedicated|bluestore_wal)_devices
Just likve `devices`, this commit adds the support for linux device aliases for
`dedicated_devices` and `bluestore_wal_devices`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1919084

Signed-off-by: Tyler Bishop <tbishop@liquidweb.com>
(cherry picked from commit ee4b8804ae)
2021-03-11 13:51:19 +01:00
Guillaume Abrioux e3165f9a07 mon: fix cephx disabled deployment
Due to missing condition on `cephx` variable, cephx disabled deployments
are broken.
This commit fixes this.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1910151

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4af0845702)
2021-03-11 13:51:04 +01:00
Guillaume Abrioux bb1f66cb51 switch2container: fix mon quorum check
The current check makes no sense because it checks any of other monitor
than the one being played (either a previous one already converted or a
next that isn't yet converted) is present on the quorum.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1909011

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 175ffa1b88)
2021-03-11 13:50:27 +01:00
Guillaume Abrioux 241418409d common: ensure shaman returns right repo
Due to recent changes in shaman, there's a chance it returns the wrong
repository from architecture point of view.
We can query shaman and ask for the correct architecture to get around
this.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 39649f0ce8)
2021-03-10 16:43:04 +01:00
Matthew Vernon fdf437743c Fix typo and broken link for documenting RGW frontends
http://docs.ceph.com/docs/nautilus/radosgw/frontends/ 404s so replace
it with a working "latest" docs link, and correct the spelling of
"additional" while I'm at it.

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 847611048e)
2021-03-03 14:20:26 +01:00
Florian Haas 6fe14c6d01 requirements.txt: Move the six dependency into the general requirements
config_template.py depends on six, which isn't listed in the default
requirements.txt. This previously frequently wasn't a problem, because
six used to be a standard package being installed into a venv, and
lots of other projects depended on it.

It also does get installed for unit and integration tests via
tests/requirements.txt, so any broken dependency on six wouldn't be
detected by tox runs.

However, as other projects and distributions have phased out Python
2.7 support the dependency on six becomes less common. Thus, as long
as ceph-ansible does require it for config_template.py, add it to the
base requirements.

Signed-off-by: Florian Haas <florian@citynetwork.eu>
(cherry picked from commit d49ea9818b)
2021-03-03 13:22:29 +01:00
Guillaume Abrioux c3304c213b dashboard: add missing parameter in `ceph_cmd`
the `ceph_cmd` fact is missing the `--net=host` parameter.

Some tasks consuming this fact can fail like following:

```
Error: error configuring network namespace for container b8ec913db1fb694ae683faf202680de7a59c714a004e533aba87e8503d29261f: Missing CNI default network
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1931365

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f143b1a647)
2021-03-03 12:57:08 +01:00
Guillaume Abrioux 858048560e update: fix require-osd-release task
This commit fixes two issues in rolling_update.yml:

- `container_exec_cmd_update_osd` is unset in the `complete osd upgrade`
play so it never runs the command in a container.
- the 'require-osd-release' task is never applied because the condition
  looks for luminous release.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1930164

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-18 22:22:06 +01:00
Guillaume Abrioux 158224503d defaults: update rhcs dashboard images versions
The current dashboard images deployed have a bad health index.
Updating to a newer version fixes this issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925350

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a16ae693d8)
2021-02-18 18:22:28 +01:00
Guillaume Abrioux 3c0a5a0b61 doc: add a note about "latest" tags
See the change for details.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4e95180c80)
2021-02-11 16:50:43 +01:00
Dimitri Savineau a43960790f doc: update containerized deployment
This adds more documentation to the configuration and usage of
containerizerd deployment.

Closes: #6198

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d42d584085)
2021-02-11 16:50:43 +01:00
Guillaume Abrioux 55d0c79046 tests: install correct ansible-lint version
We need to pin the ansible-lint version depending on python version
being used.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-10 08:32:24 +01:00
Guillaume Abrioux 8fada83589 tests: set `mon_max_pg_per_osd` in rgw_multisite
Otherwise, the job fails when it tries to create a bucket with `s3cmd mb`
command because we have too many PGs per OSD.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 54bae480d2)
2021-02-10 08:32:24 +01:00
Guillaume Abrioux b5d082c4bc rgw: fix a typo in multisite
if `rgw_zonegroupmaster` is not defined at the rgw instance level in
`rgw_instances` it will fallback to a wrong variable (`rgw_zonemaster`).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925247

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 931b87e830)
2021-02-10 08:32:24 +01:00
Guillaume Abrioux 920f07514a rgw: quick fix in create_zone_user.yml
typical error:

```
2021-02-01 03:11:09,809 p=93834 u=cephuser n=ansible | TASK [ceph-rgw : check if the realm system user already exists] ***************************************************************************************************************************************************
2021-02-01 03:11:09,809 p=93834 u=cephuser n=ansible | Monday 01 February 2021  03:11:09 -0500 (0:00:00.084)       0:14:38.607 *******
2021-02-01 03:11:09,836 p=93834 u=cephuser n=ansible | fatal: [ceph-kvm-ms2-1611241931591-node7-rgw]: FAILED! =>
  msg: |-
    The task includes an option with an undefined variable. The error was: 'None' has no attribute 'realm'
```

This task should be skipped when `zone_users` is undefined.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1922998

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-01 11:28:57 -05:00
Dimitri Savineau 6278c5a4e3 ceph-mon: add ExecStartPre docker stop to systemd
We already do that in the other systemd templates (mgr, mds, etc..)
and would present to add workaround in other orchestration tool.
This change is for containerized deployment only.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1882724

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3749d297c7)
2021-01-29 12:00:14 -05:00
Guillaume Abrioux aeee3471e3 rgw: avoid useless call to ceph-rgw
since `ceph-rgw` may be called from `ceph-handler` in some contexts we
should avoid rerunning it unnecessarily.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8617081664)
2021-01-28 16:37:50 -05:00
Guillaume Abrioux b903446fa4 containers: use --cpus instead --cpu-quota
When using docker 1.13.1, the current condition:

```
{% if (container_binary == 'docker' and ceph_docker_version.split('.')[0] is version_compare('13', '>=')) or container_binary == 'podman' -%}
```

is wrong because it compares the first digit (1) whereas it should
compare the second one.
It means we always use `--cpu-quota` although documentation recommend
using `--cpus` when docker version is 1.13.1 or higher.

From the doc:
> --cpu-quota=<value>	Impose a CPU CFS quota on the container. The number of
> microseconds per --cpu-period that the container is limited to before
> throttled. As such acting as the effective ceiling.
> If you use Docker 1.13 or higher, use --cpus instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3e262e072b)
2021-01-28 16:37:50 -05:00
Guillaume Abrioux 14267fe0c4 rgw: multisite refact
Add the possibility to deploy rgw multisite configuration with a mix of
secondary and primary zones on a same rgw node.
Before that, on a same node, all instances were either primary
zones *OR* secondary.

Now you can define a rgw instance like following:

```
rgw_instances:
  - instance_name: 'rgw0'
    rgw_zonemaster: false
    rgw_zonesecondary: true
    rgw_zonegroupmaster: false
    rgw_realm: 'france'
    rgw_zonegroup: 'zonegroup-france'
    rgw_zone: paris-00
    radosgw_address: "{{ _radosgw_address }}"
    radosgw_frontend_port: 8080
    rgw_zone_user: jacques.chirac
    rgw_zone_user_display_name: "Jacques Chirac"
    system_access_key: P9Eb6S8XNyo4dtZZUUMy
    system_secret_key: qqHCUtfdNnpHq3PZRHW5un9l0bEBM812Uhow0XfB
    endpoint: http://192.168.101.12:8080
```

Basically it's now possible to define `rgw_zonemaster`,
`rgw_zonesecondary` and `rgw_zonegroupmaster` at the intsance
level instead of the whole node level.

Also, this commit adds an option `deploy_secondary_zones` (default True)
which can be set to `False` in order to explicitly ask the playbook to
not deploy secondary zones in case where the corresponding endpoint are
not deployed yet.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1915478

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 71a5e666e3)
2021-01-28 16:37:50 -05:00
Guillaume Abrioux a36eee1852 fs2bs: skip migration when a mix of fs and bs is detected
Since the default of `osd_objectstore` has changed as of 3.2, some
deployments might have a mix of filestore and bluestore OSDs on a same
node. In some specific cases, there's a possibility that a filestore OSD
shares a journal/db device with a bluestore OSD. We shouldn't try to
redeploy in this context because ceph-volume will complain. (either
because in lvm batch you can't pass partition or about gpt header).
The safest option is to skip the migration on the node when such a mix
is detected or force all osds including those already using bluestore
(option `force_filestore_to_bluestore=True` has to be passed as an extra var).
If all OSDs are using filestore, then they will be migrated to
bluestore.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1875777

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e66f12d138)
2021-01-22 11:37:40 -05:00
Dimitri Savineau 07d2160421 dashboard: manage password backward compatibility
The ceph dashboard changed the way the password are provided via the
CLI.
This breaks the backward compatibility when using a recent ceph-ansible
version with ceph release without that feature.
This patch adds tasks for legacy workflow (ceph release without that
feature) in ceph-dashboard role.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1915506

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-01-19 18:05:02 +01:00
Guillaume Abrioux 623ca14682 dashboard: configure passwords via stdin
Due to recent changes in ceph, the few dashboard passwors
must be passed via `-i`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ef975ef5ea)
2021-01-19 18:05:02 +01:00
Dimitri Savineau 4335fed787 library: remove containerized parameter from cv
The ceph-volume module relies on environment variables to determine if
the command should be executed within a container or not.
The containerized parameter isn't used anymore and we can remove it.

Fixes: #6153

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 613ab11b9b)
2021-01-06 16:56:20 +01:00
Mike Currin 360a2d2b30 Path for ceph config missing in crash template
The path where ceph.conf is located (/etc/ceph) missing in the Docker container bind mounts, this throws errors

Signed-off-by: Mike Currin <currin@gmail.com>
(cherry picked from commit 4cbc9a48c9)
2021-01-06 16:55:39 +01:00
Guillaume Abrioux 290d3ef369 rgw: support switching from single-site to multisite
When collocating rgw with either a mon, mgr or osd, switching from
single site to a multisite rgw setup failed because of the handlers
triggered between the ansible play of the collocated daemon and the play
of the rgw. Since the multisite changes are not yet applied the handlers
fail.
The idea here is to ensure we run the multisite configuration from the
ceph-handler role before the restart happens, this way it won't complain
because of non existing multisite configuration.

(Note: this is also valid when simply changing a multisite configuration)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1888630

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 513c8cfe55)
2021-01-06 10:38:50 -05:00
Guillaume Abrioux 607ef5a7d2 common: do not use pipefail when not needed
Let's discard the ansible lint error 306 and add a "# noqa 306" on tasks
where we don't need `set -o pipefail`

Fixes: #6090

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 86a8889ee3)
2020-12-16 14:05:45 +01:00
Guillaume Abrioux 6855feb604 ceph-osd: refact `docker_exec_start_osd`
This commit drops nested jinja construction in this set_fact task.
It also rename it to `container_exec_start_osd`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ff95fa9c32)
2020-12-16 14:05:45 +01:00
Dimitri Savineau 49522f46b1 workflow/pytest: update python matrix version
On this branch we should test pytest against python 2.7 and 3.6.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-12-16 14:05:45 +01:00
Guillaume Abrioux dc4523a0c1 tests: use github workflow for nbsp char check
Let's use a github workflow instead of travis for this.

With this commit we can get rid of Travis.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 94c37b9de8)
2020-12-16 14:05:45 +01:00
Guillaume Abrioux ba312a5b5d lint: ignore 302,303,505 errors
ignore 302,303 and 505 errors

[302] Using command rather than an argument to e.g. file
[303] Using command rather than module
[505] referenced files must exist

they aren't relevant on these tasks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 195d88fcda)
2020-12-16 14:05:45 +01:00
Guillaume Abrioux 8a8a082693 lint: do not use 'local_action'
Fix ansible-lint 504 error:

[504] Do not use 'local_action', use 'delegate_to: localhost'

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c948b668eb)
2020-12-16 14:05:45 +01:00
Guillaume Abrioux ace031e86e lint: trailing whitespace
Fix ansible-lint 201 error:

[201] Trailing whitespace

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit dfc7e6e4bd)
2020-12-16 14:05:45 +01:00
Guillaume Abrioux 72fc8877cb lint: all tasks should be named
Fix ansible-lint 502 error:

[502] All tasks should be named

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 97dd9218dd)
2020-12-16 14:05:45 +01:00
Guillaume Abrioux ab62d27c44 lint: use shell only when shell functionality is required
Fix ansible-lint 305 error:

[305] Use shell only when shell functionality is required

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 11b4bf5083)
2020-12-16 14:05:45 +01:00
Guillaume Abrioux 2a0e07cfd7 lint: don't compare to literal true/false
Fix ansible lint 601 error:

[601] Don't compare to literal True/False

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2011e4dbc8)
2020-12-16 14:05:45 +01:00
Guillaume Abrioux 87d53fea08 lint: variables should have spaces before and after
Fix ansible lint 206 error:

[206] Variables should have spaces before and after: {{ var_name }}

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9fba6eecfa)
2020-12-16 14:05:45 +01:00
Guillaume Abrioux 35e738c681 lint: commands should not change things
Fix ansible lint 301 error:

[301] Commands should not change things if nothing needs doing

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5450de58b3)
2020-12-16 14:05:45 +01:00
Guillaume Abrioux 92b261df89 lint: set pipefail on shell tasks
Fix ansible lint 306 error:

[306] Shells that use pipes should set the pipefail option

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1879c26eb9)
2020-12-16 14:05:45 +01:00
Guillaume Abrioux 81a293e5f1 tests: use github workflow for ansible-lint
let's use github workflow instead of travis.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d4400f911a)
2020-12-16 14:05:45 +01:00
Dimitri Savineau 24a5b1bbb5 ceph-config: fix ceph-volume lvm batch report
Since the major ceph-volume lvm batch refactoring, the report value
is different.
Before the refact, the report was a dict with the OSDs list to be created
under the "osds" key.
After the refact, the report is a list of dict.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 827b23353f)
2020-12-15 17:26:01 -05:00
Dimitri Savineau 3f16132e44 library: add ceph_osd_flag module
This adds ceph_osd_flag ansible module for replacing the command module
usage with the ceph osd set/unset commands.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5da593604a)
2020-12-15 17:36:28 +01:00
Dimitri Savineau e51f68fdbb ceph-iscsi: set the pool name in the config file
When using a custom pool for iSCSI gateway then we need to set the pool
name in the configuration otherwise the default rbd pool name will be
used.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 40a87c4b92)
2020-12-15 17:33:24 +01:00
Guillaume Abrioux 63fa4c9484 containers: modify bindmount option
This commit changes the bind mount option for the mount point
`/var/lib/ceph` in the systemd template for mon and mgr containers. This
is needed in case of collocating mon/mgr with osds using dmcrypt
scenario.
Once mon/mgr got converted to containers, the dmcrypt layer sub mount is
still seen in `/var/lib/ceph`. For some reason it makes the
corresponding devices busy so any other container can't open/close it.
As a result, it prevents osds from starting properly.

Since it only happens on the nodes converted before the OSD play, the idea is
to bind mount `/var/lib/ceph` on mon and mgr with the `rshared` option
so once the sub mount is unmounted, it is propagated inside the
container so it doesn't see that mount point.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896392

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f5ba6d9b01)
2020-12-15 17:33:11 +01:00
Dimitri Savineau fa06752e4b alertmanager/prometheus: fix owner/group
Set the owner/group on alertmanager and prometheus directories and
files to nobody and nogroup (uid and gid 65534) to avoid permission
issues.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1901543

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit eb452d35bc)
2020-12-15 17:32:50 +01:00
Guillaume Abrioux 1ac034a802 switch2containers: do not stop ceph.target in osd play
`ceph.target` should be disabled only. Otherwise, in collocation
scenario you stop other collocated services in the OSD play which isn't
what we want to do. Each daemon has its corresponding play for managing
the transition to container.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1901865

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0b05620597)
2020-12-15 17:32:23 +01:00
Guillaume Abrioux 69b5b96f2d osd: add tag on 'wait for all osd to be up' task
This allows skipping this task if really desired.
Use it carefully. Use it at your own risk.

Fixes: #6073

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5c4ae5356d)
2020-12-15 17:32:09 +01:00