Commit Graph

4420 Commits (67163113ea80a2e6f3d90293b17d6e7212e7949a)
 

Author SHA1 Message Date
Guillaume Abrioux 67163113ea debug commit
dnm

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-11-26 09:22:20 +01:00
Guillaume Abrioux bb515496ca containers: modify bindmount option
This commit changes the bind mount option for the mount point
`/var/lib/ceph` in the systemd template for mon and mgr containers. This
is needed when collocating mon/mgr with osds using the dmcrypt
scenario.
Once mon/mgr are converted to containers, the dmcrypt layer sub mount is
still seen in `/var/lib/ceph`. For some reason it makes the
corresponding devices busy so no other container can open/close them.
As a result, it prevents osds from starting properly.

Since it only happens on the nodes converted before the OSD play, the idea is
to bind mount `/var/lib/ceph` on mon and mgr with the `rshared` option
so that once the sub mount is unmounted, the unmount is propagated inside
the container, which then no longer sees that mount point.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896392

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f5ba6d9b01)
2020-11-23 10:09:28 -05:00
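
A minimal sketch of the bind mount described in the commit above (bb515496ca), assuming the option is passed as a volume suffix on the container command line. `rshared` is a standard bind-propagation mode in docker and podman; the `container_binary` default and the `<image>` placeholder are hypothetical, and the command is only printed, not run:

```shell
#!/bin/sh
# Hypothetical default; the real template takes this from an Ansible variable.
container_binary=${container_binary:-docker}
mount_point=/var/lib/ceph

# rshared: mount and unmount events under the mount point propagate
# recursively between the host and the container, in both directions.
volume_arg="-v ${mount_point}:${mount_point}:rshared"

# Print the command line instead of running it, so no container engine is needed.
cmd="$container_binary run --rm $volume_arg <image>"
echo "$cmd"
```

With the default private propagation, a sub mount that was present when the container started would stay visible inside it even after being unmounted on the host, which matches the busy-device symptom the commit describes.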
Dimitri Savineau 6c8df0523e switch2container: chown symlink in mon/mgr plays
fa2bb3a only fixes the symlink owner/group issue in the OSD play. If the
OSDs are collocated with other services like MONs and MGRs then the
chown command will fail.

$ find /var/lib/ceph/osd/ceph-0 -not -user 167 -execdir chown 167:167 {} +
chown: cannot dereference './block': Permission denied

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896448

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 35ed9977aa)
2020-11-16 16:37:16 -05:00
Guillaume Abrioux 8b6aedc871 mon: fix force peer addition task
When using `monitor_interface`, if nodes don't have the same interface names,
this task fails as follows:

```
fatal: [argo010]: FAILED! => {
    "msg": "The task includes an option with an undefined variable. The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_enp1s0f0'\n\nThe error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mon/tasks/docker/main.yml': line 19, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: ipv4 - force peer addition as potential bootstrap peer for cluster bringup - monitor_interface\n  ^ here\n"
}

```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876551

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-10-13 12:02:27 -04:00
Guillaume Abrioux f99a4a7305 osd: add missing param to the container cli calls
This adds some missing parameters to the container cli calls in
ceph-osd-run.sh.j2

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1885558

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-10-07 11:38:45 -04:00
Dimitri Savineau be1d98f425 ceph-osd: add missing container_binary
90f3f61 introduced the docker-to-podman.yml playbook but the
ceph-osd-run.sh.j2 template still has docker hardcoded in places instead
of using the container_binary variable.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-10-07 14:47:40 +02:00
Dimitri Savineau 045d4612d6 library/ceph_key: set no_log on secret
We don't need to show this information during the module execution.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a3f4e2b4d1)
2020-09-29 10:49:35 -04:00
Kefu Chai 05725183b6 docs: update URLs to point to the RTD links
Fixes #5798
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit f3a78371d9)
2020-09-25 10:47:46 -04:00
Dimitri Savineau de98f9ab8e facts: refact `ceph_uid` fact
There's no need to set this fact with a set_fact.
We can achieve this in ceph-defaults.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1875058

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-09-21 13:49:12 -04:00
Dimitri Savineau 17b1427084 switch2container: chown symlink for devices
If the OSD directory is using symlinks for referencing devices (like
block, db, wal for bluestore and journal for filestore) then the chown
command could fail to change the owner:group on some systems.

$ ls -hl /var/lib/ceph/osd/ceph-0/
total 28K
lrwxrwxrwx 1 ceph ceph 92 Sep 15 01:53 block -> /dev/ceph-45113532-95ca-471b-bd75-51de46f1339c/osd-data-570a1aee-60c0-44c9-8036-ffed7d67a4e6
-rw------- 1 ceph ceph 37 Sep 15 01:53 ceph_fsid
-rw------- 1 ceph ceph 37 Sep 15 01:53 fsid
-rw------- 1 ceph ceph 55 Sep 15 01:53 keyring
-rw------- 1 ceph ceph  6 Sep 15 01:53 ready
-rw------- 1 ceph ceph  3 Sep 15 02:00 require_osd_release
-rw------- 1 ceph ceph 10 Sep 15 01:53 type
-rw------- 1 ceph ceph  2 Sep 15 01:53 whoami
$ find /var/lib/ceph/osd/ceph-0 -not -user 167 -execdir chown 167:167 {} +
chown: cannot dereference './block': Permission denied
$ find /var/lib/ceph/osd/ceph-0 -not -user 167
/var/lib/ceph/osd/ceph-0/block

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit da4280e243)
2020-09-17 14:57:16 -04:00
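
The failure in the commit above (17b1427084) comes from `chown` following the `block` symlink to a target it cannot reach. A minimal sketch of the usual remedy, assuming the fix is to change the link itself with `chown -h` (`--no-dereference`). The temporary directory and the dangling target are hypothetical stand-ins for the OSD directory, the unreachable target here produces a "No such file or directory" error rather than the "Permission denied" shown in the commit, and the current user's IDs replace 167:167 so the snippet runs without root:

```shell
#!/bin/sh
# Hypothetical stand-in for /var/lib/ceph/osd/ceph-0.
osd_dir=$(mktemp -d)

# A symlink whose target cannot be resolved, similar in effect to `block`.
ln -s /nonexistent/device "$osd_dir/block"

owner="$(id -u):$(id -g)"

# Plain chown dereferences the link and fails because the target is unreachable.
if chown "$owner" "$osd_dir/block" 2>/dev/null; then
    plain_result=ok
else
    plain_result=failed
fi

# -h (--no-dereference) changes the ownership of the link itself.
if chown -h "$owner" "$osd_dir/block"; then
    nodereference_result=ok
else
    nodereference_result=failed
fi

echo "plain chown: $plain_result"
echo "chown -h:    $nodereference_result"

rm -rf "$osd_dir"
```

The same flag can be added to the `find ... -execdir chown` invocation quoted in the commit message, so that symlinks are re-owned without touching the devices they point to.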
Dimitri Savineau 042b9e81de switch2container: remove deb systemd units
When running the switch2container playbook on a Debian-based system,
the systemd unit path isn't the same as on a Red Hat-based system.
Because the systemd unit files aren't removed, the new container
systemd unit isn't taken into account.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c1af69a7e7)
2020-09-17 14:57:16 -04:00
Guillaume Abrioux 470c1d821c tests: migrate to quay.ceph.io registry
in order to avoid docker.io rate limiting

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2001039c0e)
2020-09-11 00:59:14 +02:00
RPietrzak 84edd510d7 Remove 'run_once: true' from wait 'for all osd to be up' task in ceph-osd/tasks/main.yml role.
Together with the condition 'ansible_play_hosts_all | last', it causes that task to be skipped on the first host.

Signed-off-by: RPietrzak <rp.pietrzak@gmail.com>
2020-08-21 15:58:31 +02:00
Guillaume Abrioux e08a5fe555 tests: followup on bff2114
remove same node for containerized deployments

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-20 14:33:44 +02:00
Guillaume Abrioux bff2114934 tests: remove 1 osd node for upgrade scenario
This node was needed for the upgrade job in stable-4.0.
Since we moved the erasure code pool testing to lvm_osds, we don't need
to fire up that node anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-20 10:10:25 +02:00
Guillaume Abrioux c373bfa00c osd: move systemd rendering task
This commit moves the systemd rendering task into `systemd.yml` file.
Otherwise, when running docker to podman playbook, the systemd unit file
isn't updated as it should be.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1870141

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-19 11:22:07 -04:00
Guillaume Abrioux 8a154ae14a osd: change lvm bindmount
This commit makes the bindmount a bit more generic; otherwise it
currently causes the OSDs to fail to start in an OSP FFU upgrade
(with a RHEL7 > RHEL8 OS upgrade).
The docker2podman playbook is run from the ceph-ansible stable-3.2 branch
against RHEL7 nodes where `/var/run/lvmetad.socket` exists, but once the
system is upgraded to RHEL8, this socket doesn't exist anymore, which
prevents OSDs from starting after the reboot.

As a workaround we can make this bindmount a bit more generic like what
is done in `stable-4.0` branch by mounting `/run/lvm` instead.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1866252

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-05 09:23:39 -04:00
Dimitri Savineau 70469a69e4 docker2podman: set disk_list for non lvm scenario
When using non lvm scenarios (collocated or non-collocated) then the
disk_list variable isn't set because this is done during the ceph-osd
role (start_osds.yml) which isn't executed in the docker2podman
playbook.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1862046

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-30 12:02:06 -04:00
Dimitri Savineau 54b55c8bac tests: pin pytest-forked to 1.2.0
The pytest-forked 1.3.0 release isn't compatible with the pytest release
we are using in that branch.

-----------------------
pytest-forked 1.3.0 requires pytest>=3.10, but you'll have pytest 3.6.1
which is incompatible.
-----------------------

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-29 10:35:22 -04:00
Dimitri Savineau a16bb2b630 README-MULTISITE: fix old conflict
The automatic backport [1] done by mergify has merged the backport PR
even if a conflict was present in the documentation.

[1] https://github.com/ceph/ceph-ansible/pull/3803

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-07 21:18:00 +02:00
Dimitri Savineau da5c093ba4 facts: explicitly disable facter and ohai
By default, ansible gathers facts from facter and ohai if they are installed on
the remote nodes. Given we don't need them, let's exclude these facts
from our facts gathering.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c95adc564b)
2020-07-07 14:13:23 -04:00
Dimitri Savineau 4e3301d361 ceph-osd: exit gracefully when no data partition
When using collocated or non-collocated osd_scenarios (ceph-disk) and
trying to determine the OSD_DEVICE from the OSD_ID passed to the systemd
unit, we can be in a situation where the OSD hasn't been activated
but the OSD ID exists.
This means the data partition isn't in an activated state and the ceph-disk
list command won't show the OSD ID on the data partition.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1850377

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-07 18:18:14 +02:00
Guillaume Abrioux 90f3f61548 infra: introduce docker to podman playbook
This isn't backported from master because there are too many changes
between stable-3.2 and other newer branches.

NOTE:
This playbook *doesn't* add podman support in stable-3.2 at all.
This is a tripleO dedicated playbook which is intended to be run
early during the FFU workflow in order to prepare the OS upgrade.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1853457

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-07-07 12:11:09 -04:00
Guillaume Abrioux 6daa2c9d22 doc: add a note about deprecated branches
This commit adds a note about the `stable-3.0` and `stable-3.1` branches,
which are deprecated and not maintained anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bbe30bcc69)
2020-07-03 14:45:05 +02:00
Guillaume Abrioux 15f03e66b2 doc: add a note about containerized deployments
This commit updates the documentation to add a note about containerized
deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e61488507b)
2020-07-03 14:45:05 +02:00
Guillaume Abrioux 2b6528561d doc: fix warning treated as an error
Typical error:

```
Warning, treated as error:
/home/jenkins-build/build/workspace/ceph-ansible-docs-pull-requests/docs/source/day-2/upgrade.rst:2:Title underline too short.
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5c254861bd)
2020-07-03 09:45:39 +02:00
Guillaume Abrioux 8b8fa74db7 switch_to_containers: don't set noup flag
We shouldn't set this flag when running switch_to_containers playbook.
Otherwise the playbook fails waiting for pgs to be clean.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843569

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b91d60d384)
2020-06-29 15:25:01 +02:00
Guillaume Abrioux b2e1dcc0f4 switch-to-containers: set and unset osd flags
The workflow in this playbook should be the same as in rolling_update:
we should first set noout and nodeep-scrub flags before migrating the
first osd and unset osd flags after the last osd is migrated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2cfaa056e0)
2020-06-29 15:25:01 +02:00
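
The flag handling described in the commit above (b2e1dcc0f4) can be sketched as follows. `ceph osd set` and `ceph osd unset` are standard Ceph CLI commands; the `CEPH` variable, the `migrate_osd` function and the OSD IDs are hypothetical placeholders, and the default value only echoes the commands so the sketch runs without a cluster:

```shell
#!/bin/sh
# Command used to reach the cluster. Hypothetical default: print instead of execute.
CEPH=${CEPH:-echo ceph}

# Placeholder for the per-OSD migration work done by the playbook.
migrate_osd() {
    echo "migrating osd.$1"
}

run_migration() {
    # Set the flags once, before the first OSD is touched.
    $CEPH osd set noout
    $CEPH osd set nodeep-scrub

    for osd_id in "$@"; do
        migrate_osd "$osd_id"
    done

    # Unset the flags once, after the last OSD is migrated.
    $CEPH osd unset noout
    $CEPH osd unset nodeep-scrub
}

output=$(run_migration 0 1 2)
echo "$output"
```

Setting the flags around the whole loop, rather than per OSD, keeps the cluster from marking OSDs out or starting deep scrubs while OSDs are being restarted one after another.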
Guillaume Abrioux 1cf3a57a6c Revert "switch-to-containers: set and unset osd flags"
This reverts commit 5a4134098a.

We need to provide a tag for RHCS 3.3z6 without this commit.
2020-06-25 17:08:10 +02:00
Guillaume Abrioux 693e534ee9 Revert "switch_to_containers: don't set noup flag"
This reverts commit b7ec4a995b.

We need to provide a tag for RHCS 3.3z6 without this commit.
2020-06-25 17:07:25 +02:00
Dimitri Savineau a2556f084d docker: Add Requires on docker service
When using the docker container engine, the systemd unit scripts only
use a dependency on the docker daemon via the After parameter.
But if docker is restarted on a live system then the ceph systemd units
should wait for the docker daemon to be fully restarted.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846830

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bd22f1d1ec)
2020-06-22 21:08:13 -04:00
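
A minimal sketch of the difference the commit above (a2556f084d) relies on, assuming the dependency is declared in the `[Unit]` section of the ceph unit files. `After=` and `Requires=` are standard systemd directives; the description text and the temporary file are hypothetical, and the snippet only writes and inspects the fragment rather than installing it:

```shell
#!/bin/sh
# Write a hypothetical [Unit] fragment to a temporary file for inspection.
unit_file=$(mktemp)

cat > "$unit_file" <<'EOF'
[Unit]
Description=Ceph OSD (illustrative fragment)
# Ordering only: start this unit after docker.service.
After=docker.service
# Requirement: stop or restart this unit when docker.service is stopped or restarted.
Requires=docker.service
EOF

# Show the dependency lines that matter here.
deps=$(grep -E '^(After|Requires)=' "$unit_file")
echo "$deps"

rm -f "$unit_file"
```

`After=` alone controls start order but does not tie the unit's lifecycle to docker, which is why a docker restart on a live system could leave the ceph units out of step with the daemon.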
Dimitri Savineau 9a9ef7bc97 docs: Add upgrade operation.
This commit adds a chapter about the ceph upgrade process.

Closes: #5393

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e41487dbce)
2020-06-18 18:02:09 +02:00
Guillaume Abrioux b7ec4a995b switch_to_containers: don't set noup flag
We shouldn't set this flag when running switch_to_containers playbook.
Otherwise the playbook fails waiting for pgs to be clean.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843569

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b91d60d384)
2020-06-18 09:56:28 +02:00
Guillaume Abrioux 5a4134098a switch-to-containers: set and unset osd flags
The workflow in this playbook should be the same as in rolling_update:
we should first set noout and nodeep-scrub flags before migrating the
first osd and unset osd flags after the last osd is migrated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2cfaa056e0)
2020-06-18 09:56:28 +02:00
Guillaume Abrioux 6f3d696742 clients: move dummy container creation
This commit moves the dummy container creation task right before the
cephx keys creation task so it can't be run out of time.

Also, this commit makes the dummy container run forever.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1828105

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-04-27 13:31:52 -04:00
ianwatsonrh 2666c54b3a typo: updating type check on rc
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1827271
Signed-off-by: ianwatsonrh <ianwatson@redhat.com>
(cherry picked from commit ccf6a7f153)
2020-04-23 11:36:59 -04:00
Guillaume Abrioux 8c9be9c179 doc: add day-2 operations documentation
This commit is the first of a series describing all day-2 operations
that are possible via ceph-ansible using a set of playbooks provided in the
`infrastructure-playbooks` directory.

Fixes: #5061

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7e800303e9)
2020-04-23 13:29:32 +02:00
Rishabh Dave 34cf0e5301 library/ceph_volume: look for error messages in stderr
Error messages were moved from stdout to stderr here -
b8d6dcbe9f (diff-20f7c578a4e69ec61a5869d706567a24R137).

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1793542
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 4249d1e02d)
2020-04-20 13:36:57 -04:00
Dimitri Savineau 65b0e9bb5d ceph-validate: update RHEL requirement for RHCS
We were not testing the right ansible_distribution fact value for RHEL
distribution.
This commit also updates the minimal RHEL version supported by RHCS.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5de74fe512)
2020-04-14 11:27:21 -04:00
Guillaume Abrioux a51331beb9 add-osd: refact the playbook
There's no need to have two plays anymore since we now set/unset osd
flags in `ceph-osd` role.

Also, this commit causes the `ceph-facts` role to be called after
`ceph-defaults`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-04-07 11:19:53 -04:00
Guillaume Abrioux 724620ed3d add-osd: fix fact gathering in add-osd
This commit makes this playbook gather facts from all other nodes but
clients.
When collocating OSDs on other nodes it can fail as follows:

```
fatal: [vm252-11]: FAILED! => {
    "msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_hostname'"
}
```

In that case, a fact from a RGW node is called when rendering the
`ceph.conf.j2` but it fails because facts are gathered only from mon and
osd nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1806765

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-04-07 11:19:53 -04:00
Guillaume Abrioux 8ccf91c1f0 add-osd: unset noup flag after last osd is deployed
This commit fixes a bug when using the `add-osd.yml` playbook.
The `noup` flag is set early but it never got unset before the "wait for pgs
clean" check, so the playbook always fails because OSDs are never
seen UP.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816023

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-04-07 11:19:53 -04:00
Guillaume Abrioux a8f5e43624 ceph_key: fetch key when needed
Fetch the key when it is present in the cluster but not on the node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ccfa249919)
2020-04-03 16:19:03 -04:00
Guillaume Abrioux 323d4f8f0b ceph_key: fix idempotency when no secret is passed
553584cbd0 introduced a regression: when no
secret is passed, it overwrites the secret each time the task is run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 003defec03)
2020-04-03 16:19:03 -04:00
Guillaume Abrioux b107dcf80b ceph_key: remove 'update' state
With this change, the state `present` is enough to update a keyring.
If the keyring already exists, it will be updated if caps or secret
passed to the module are different.
If the keyring doesn't exist, it will be created.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808367

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 553584cbd0)
2020-04-03 16:19:03 -04:00
Dimitri Savineau edfeb98593 tests: add mgr nodes to shrink_mon inventory
Since 306ce82 we explicitly fail when there's no mgr node present in the
inventory.

fatal: [mon0]: FAILED! => {
    "changed": false
}

MSG:

Please add a mgr host to your inventory.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-04-02 22:02:35 +02:00
Guillaume Abrioux d4ffe21225 osd: support changing default rule even when osd_crush_location isn't defined
Creating crush rules even with no crush hierarchy configuration is a
valid scenario so we shouldn't be bound to the first task result (which
configures the crush hierarchy) to be able to add new crush rules.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816989

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5b0476385c)
2020-03-31 23:04:03 +02:00
Dimitri Savineau 586c6e8afe Add site-container.yml symlink
This adds a symlink to the site-docker.yml.sample playbook.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-31 23:00:49 +02:00
Guillaume Abrioux 3b1794a0fd switch_to_containers: exclude clients nodes from facts gathering
Just like site.yml and rolling_update, let's exclude client nodes from
the fact gathering.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 332c39376b)
(cherry picked from commit 5c3ba0787c)
2020-03-30 11:10:29 -04:00
Guillaume Abrioux cfe77bc51f main: exclude client nodes from facts gathering when delegate_facts_host
This commit excludes client nodes from facts gathering: they are not
needed, and excluding them can speed up this task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 865d2eac9b)
2020-03-30 11:10:29 -04:00