`command -v` is a bash script which needs a shell to run.
Fixes: #6325
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 14c472707c)
When collocating OSDs with other daemon, `num_osds` is incorrectly calculated
because `ceph-config` is called multiple times.
Indeed, the following code:
```
num_osds: "{{ lvm_list.stdout | default('{}') | from_json | length | int + num_osds | default(0) | int }}"
```
makes `num_osds` be incremented each time `ceph-config` is called.
We have to reset it in order to get the correct number of expected OSDs.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 31a0f2653d)
In order to avoid false positive in the CI that I've been unable to
reproduce.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f7fd1c2298)
While working on the previous PR, I found a couple of typos in the
docs. This fixes those.
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 8b1474ab75)
The "update apt cache" in the ceph-handler role was never called and the
handler trigger after adding the uca repository doesn't exist at all.
Instead of using a handler for that we can just set the update_cache
parameter to true like the other apt_repository tasks.
Resolve merge conflict from cherry-picking this commit.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 09d6706697)
When asking `ceph-volume` to report only in `lvm batch` context, there's
a bug described in bz1896803 [1] when `--yes` is passed (which by the
way isn't necessary with `--report`).
This commit ensure `--yes` isn't passed to `ceph-volume` when `--report`
is used.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1896803
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896803
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fe6d6ba622)
This commit makes sure purge playbooks remove those file if for any reason they
have been left.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1920900
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b9dd253a4f)
There's no need to slow down the playbook execution time by migrating
all the `ceph-crash` instances in a serial way. Let's remove the
`serial: 1` so the migration is achieved in a parallel way.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 980a5a7df4)
When running the rolling_update.yml playbook and adding the dashboard
component in the same time then the requirement (like container packages)
aren't installed.
This could lead to a failure in case of using authentication on the
container registry because the playbook will try to login on the registry
but podman/docker aren't yet installed.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1903504
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 48a456dc8c)
The monitoring node running grafana needs the rhcs tools repostory
enabled in non containerized deployment to be able to install the
ceph-grafana-dashboards rpm package.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e4dd0067c6)
Since b105549 we don't install node-exporter on client nodes so we should
also exclude the client node from the node-exporter upgrade.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 94af3c87d1)
Those devices (db/wal) are never zapped in lvm batch deployment.
Iterating over `dedicated_devices` and `bluestore_wal_devices` fixes
this issue.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1922926
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 984191ac7f)
Just likve `devices`, this commit adds the support for linux device aliases for
`dedicated_devices` and `bluestore_wal_devices`.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1919084
Signed-off-by: Tyler Bishop <tbishop@liquidweb.com>
(cherry picked from commit ee4b8804ae)
Due to missing condition on `cephx` variable, cephx disabled deployments
are broken.
This commit fixes this.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1910151
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4af0845702)
The current check makes no sense because it checks any of other monitor
than the one being played (either a previous one already converted or a
next that isn't yet converted) is present on the quorum.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1909011
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 175ffa1b88)
Due to recent changes in shaman, there's a chance it returns the wrong
repository from architecture point of view.
We can query shaman and ask for the correct architecture to get around
this.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 39649f0ce8)
http://docs.ceph.com/docs/nautilus/radosgw/frontends/ 404s so replace
it with a working "latest" docs link, and correct the spelling of
"additional" while I'm at it.
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 847611048e)
config_template.py depends on six, which isn't listed in the default
requirements.txt. This previously frequently wasn't a problem, because
six used to be a standard package being installed into a venv, and
lots of other projects depended on it.
It also does get installed for unit and integration tests via
tests/requirements.txt, so any broken dependency on six wouldn't be
detected by tox runs.
However, as other projects and distributions have phased out Python
2.7 support the dependency on six becomes less common. Thus, as long
as ceph-ansible does require it for config_template.py, add it to the
base requirements.
Signed-off-by: Florian Haas <florian@citynetwork.eu>
(cherry picked from commit d49ea9818b)
the `ceph_cmd` fact is missing the `--net=host` parameter.
Some tasks consuming this fact can fail like following:
```
Error: error configuring network namespace for container b8ec913db1fb694ae683faf202680de7a59c714a004e533aba87e8503d29261f: Missing CNI default network
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1931365
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f143b1a647)
This commit fixes two issues in rolling_update.yml:
- `container_exec_cmd_update_osd` is unset in the `complete osd upgrade`
play so it never runs the command in a container.
- the 'require-osd-release' task is never applied because the condition
looks for luminous release.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1930164
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The current dashboard images deployed have a bad health index.
Updating to a newer version fixes this issue.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925350
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a16ae693d8)
This adds more documentation to the configuration and usage of
containerizerd deployment.
Closes: #6198
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d42d584085)
Otherwise, the job fails when it tries to create a bucket with `s3cmd mb`
command because we have too many PGs per OSD.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 54bae480d2)
if `rgw_zonegroupmaster` is not defined at the rgw instance level in
`rgw_instances` it will fallback to a wrong variable (`rgw_zonemaster`).
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925247
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 931b87e830)
typical error:
```
2021-02-01 03:11:09,809 p=93834 u=cephuser n=ansible | TASK [ceph-rgw : check if the realm system user already exists] ***************************************************************************************************************************************************
2021-02-01 03:11:09,809 p=93834 u=cephuser n=ansible | Monday 01 February 2021 03:11:09 -0500 (0:00:00.084) 0:14:38.607 *******
2021-02-01 03:11:09,836 p=93834 u=cephuser n=ansible | fatal: [ceph-kvm-ms2-1611241931591-node7-rgw]: FAILED! =>
msg: |-
The task includes an option with an undefined variable. The error was: 'None' has no attribute 'realm'
```
This task should be skipped when `zone_users` is undefined.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1922998
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We already do that in the other systemd templates (mgr, mds, etc..)
and would present to add workaround in other orchestration tool.
This change is for containerized deployment only.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1882724
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3749d297c7)
since `ceph-rgw` may be called from `ceph-handler` in some contexts we
should avoid rerunning it unnecessarily.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8617081664)
When using docker 1.13.1, the current condition:
```
{% if (container_binary == 'docker' and ceph_docker_version.split('.')[0] is version_compare('13', '>=')) or container_binary == 'podman' -%}
```
is wrong because it compares the first digit (1) whereas it should
compare the second one.
It means we always use `--cpu-quota` although documentation recommend
using `--cpus` when docker version is 1.13.1 or higher.
From the doc:
> --cpu-quota=<value> Impose a CPU CFS quota on the container. The number of
> microseconds per --cpu-period that the container is limited to before
> throttled. As such acting as the effective ceiling.
> If you use Docker 1.13 or higher, use --cpus instead.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3e262e072b)
Add the possibility to deploy rgw multisite configuration with a mix of
secondary and primary zones on a same rgw node.
Before that, on a same node, all instances were either primary
zones *OR* secondary.
Now you can define a rgw instance like following:
```
rgw_instances:
- instance_name: 'rgw0'
rgw_zonemaster: false
rgw_zonesecondary: true
rgw_zonegroupmaster: false
rgw_realm: 'france'
rgw_zonegroup: 'zonegroup-france'
rgw_zone: paris-00
radosgw_address: "{{ _radosgw_address }}"
radosgw_frontend_port: 8080
rgw_zone_user: jacques.chirac
rgw_zone_user_display_name: "Jacques Chirac"
system_access_key: P9Eb6S8XNyo4dtZZUUMy
system_secret_key: qqHCUtfdNnpHq3PZRHW5un9l0bEBM812Uhow0XfB
endpoint: http://192.168.101.12:8080
```
Basically it's now possible to define `rgw_zonemaster`,
`rgw_zonesecondary` and `rgw_zonegroupmaster` at the intsance
level instead of the whole node level.
Also, this commit adds an option `deploy_secondary_zones` (default True)
which can be set to `False` in order to explicitly ask the playbook to
not deploy secondary zones in case where the corresponding endpoint are
not deployed yet.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1915478
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 71a5e666e3)
Since the default of `osd_objectstore` has changed as of 3.2, some
deployments might have a mix of filestore and bluestore OSDs on a same
node. In some specific cases, there's a possibility that a filestore OSD
shares a journal/db device with a bluestore OSD. We shouldn't try to
redeploy in this context because ceph-volume will complain. (either
because in lvm batch you can't pass partition or about gpt header).
The safest option is to skip the migration on the node when such a mix
is detected or force all osds including those already using bluestore
(option `force_filestore_to_bluestore=True` has to be passed as an extra var).
If all OSDs are using filestore, then they will be migrated to
bluestore.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1875777
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e66f12d138)
The ceph dashboard changed the way the password are provided via the
CLI.
This breaks the backward compatibility when using a recent ceph-ansible
version with ceph release without that feature.
This patch adds tasks for legacy workflow (ceph release without that
feature) in ceph-dashboard role.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1915506
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Due to recent changes in ceph, the few dashboard passwors
must be passed via `-i`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ef975ef5ea)
The ceph-volume module relies on environment variables to determine if
the command should be executed within a container or not.
The containerized parameter isn't used anymore and we can remove it.
Fixes: #6153
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 613ab11b9b)
The path where ceph.conf is located (/etc/ceph) missing in the Docker container bind mounts, this throws errors
Signed-off-by: Mike Currin <currin@gmail.com>
(cherry picked from commit 4cbc9a48c9)
When collocating rgw with either a mon, mgr or osd, switching from
single site to a multisite rgw setup failed because of the handlers
triggered between the ansible play of the collocated daemon and the play
of the rgw. Since the multisite changes are not yet applied the handlers
fail.
The idea here is to ensure we run the multisite configuration from the
ceph-handler role before the restart happens, this way it won't complain
because of non existing multisite configuration.
(Note: this is also valid when simply changing a multisite configuration)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1888630
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 513c8cfe55)
Let's discard the ansible lint error 306 and add a "# noqa 306" on tasks
where we don't need `set -o pipefail`
Fixes: #6090
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 86a8889ee3)
This commit drops nested jinja construction in this set_fact task.
It also rename it to `container_exec_start_osd`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ff95fa9c32)
Let's use a github workflow instead of travis for this.
With this commit we can get rid of Travis.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 94c37b9de8)
ignore 302,303 and 505 errors
[302] Using command rather than an argument to e.g. file
[303] Using command rather than module
[505] referenced files must exist
they aren't relevant on these tasks.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 195d88fcda)
Fix ansible-lint 504 error:
[504] Do not use 'local_action', use 'delegate_to: localhost'
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c948b668eb)
Fix ansible-lint 502 error:
[502] All tasks should be named
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 97dd9218dd)
Fix ansible-lint 305 error:
[305] Use shell only when shell functionality is required
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 11b4bf5083)
Fix ansible lint 206 error:
[206] Variables should have spaces before and after: {{ var_name }}
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9fba6eecfa)
Fix ansible lint 301 error:
[301] Commands should not change things if nothing needs doing
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5450de58b3)