otherwise a bunch of jobs will fail like following:
```
[WARNING]: Unable to parse /home/jenkins-build/build/workspace/ceph-ansible-nightly-luminous-ubuntu-container-stable-3.2-bluestore_lvm_osds/tests/functional/bs-lvm-osds/container/hosts-ubuntu as an inventory source
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
As of testinfra 2.0.0, the binary name is `py.test`.
But let's pin the version to 1.19.0.
Indeed, migrating to 2.0.0 requires our current testing to be reworked a bit.
Since we don't have the bandwidth ATM for this, it's better to simply
keep testing with testinfra 1.19.0.
Note that I've replaced all `testinfra` occurences by `py.test` anyway.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
otherwise, it will end up with error like following:
```
FAILED! => {"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_hostname'"}
```
because facts won't have been gathered.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1670663
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
as of b70d54ac80 the service launched isn't
ceph-rbd-mirror@admin.service.
it's now `ceph-rbd-mirror@rbd-mirror.{{ ansible_hostname }}`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
removing the content of this directory seems a bit agressive and cause a
redeployment to fail after a purge on debian based distrubition.
Typical error:
```
fatal: [mon0]: FAILED! => changed=false
attempts: 3
msg: No package matching 'ceph' is available
```
The following task will consider the cache is still valid, so apt
doesn't refresh it:
```
- name: update apt cache if cache_valid_time has expired
apt:
update_cache: yes
cache_valid_time: 3600
register: result
until: result is succeeded
```
since the task installing ceph packages has a `update_cache: no` it
fails:
```
- name: install ceph for debian
apt:
name: "{{ debian_ceph_pkgs | unique }}"
update_cache: no
state: "{{ (upgrade_ceph_packages|bool) | ternary('latest','present') }}"
default_release: "{{ ceph_stable_release_uca | default('') }}{{ ansible_distribution_release ~ '-backports' if ceph_origin == 'distro' and ceph_use_distro_backports else '' }}"
register: result
until: result is succeeded
```
/tmp/* isn't specific to ceph as well, so we shouldn't remove everything
in this directory.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
using `shell` module seems to be the only way to make this task working
on rhel based distribution AND debian based distributions.
on ubuntu, using `command` ansible module fails like following
(not due to `sudo` usage or not):
```
ok: [osd1] => changed=false
cmd: command -v ceph-volume
failed_when_result: false
msg: '[Errno 2] No such file or directory: ''command'': ''command'''
rc: 2
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Tuned name of a task and error message to make it more user understandable
Fixes BZ 1648168 - ceph-validate : devices are not validated in non-collocated and lvm_batch scenario
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1648168
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
The "get osd ids" statement only registers the osd_ids_non_container variable. Running "ls /var/lib/ceph/osd/ | sed 's/.*-//'" should never produce a change on the system. Adding changed_when: false prevents irrelevant change messages from Ansible.
Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
this is needed to properly handle semaphore synchronization for udev
actions via dmcrypt/cryptsetup.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683770
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Referring to BZ#1683290, as dsavineau suggests, being this
bug tripleO specific, removed the ubuntu section and removed
useless mountpoints.
Signed-off-by: fpantano <fpantano@redhat.com>
There's no need to set the client_admin_ceph_authtool_cap variable
via a set_fact task.
Instead we can set this in the role defaults.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The administrator keyring needs full capabilities on mds like mon,
osd and mgr.
Whithout this, the client.admin key won't be able to run commands
against mds (like ceph tell mds.0 session ls)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1672878
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Set directories to 755 and files to 644 to /var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }} recursively instead of setting files and directories to 755 recursively. The ceph mon process writes files to this path with permissions 644. This update stops ansible from updating the permissions in /var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }} every time ceph mon writes a file and increases idempotency.
Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
Let's bootstrap mgrs on monitors only if there's no mgrs section in
inventory hostfile.
Closes: #3613
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
the previous approach was wrong.
checking if `item.key` is in `osd_auto_discovery_exclude` (`['dm-',
'loop']`) is incorrect because it will obviously not match. Therefore,
the condition will return `True` whatever the device we are checking.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
when the following failure is thrown
```
rhel-8.0.0-beta-1.7- [=== ] --- B/s | 0 B --:-- ETArhel-8.0.0-beta-1.7-appstream 0.0 B/s | 0 B 00:00
rhel-8.0.0-beta-1.7- [=== ] --- B/s | 0 B --:-- ETArhel-8.0.0-beta-1.7-baseos 0.0 B/s | 0 B 00:00
rhel-8.0.0-beta-1.7- [ === ] --- B/s | 0 B --:-- ETArhel-8.0.0-beta-1.7-builder 0.0 B/s | 0 B 00:00
Failed to synchronize cache for repo 'rhel-8.0.0-beta-1.7-appstream', ignoring this repo.
Failed to synchronize cache for repo 'rhel-8.0.0-beta-1.7-baseos', ignoring this repo.
Failed to synchronize cache for repo 'rhel-8.0.0-beta-1.7-builder', ignoring this repo.
No match for argument: python3
Error: Unable to find a match
```
dnf returns 0 anyway.
Let's ensure the pattern 'Failed' isn't present in the output.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add a new `osd_auto_discovery_exclude` to give the possibility of
excluding some devices in auto_discovery scenario.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
There's no need to restart firewalld service when a new rule is
added due to the usage of the immediate flag.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
I didn't use the `ceph/ubuntu-bionic` image because it's broken at the
time of writing this commit. I'll switch back to `ceph/ubuntu-bionic` as
soon as it will be fixed.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We shouldn't reset `ceph_release` with `ceph_stable_release` when
`ceph_repository` is `rhcs`
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1645379
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
introduce two new variables to make the check that 'wait for all osd to
be up' configurable.
It's possible that for some deployments, OSDs can take longer to be seen
as UP and IN.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1676763
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Change the default dist value to el8 for consistency, since many of the
upcoming ceph-ansible builds will be based on RHEL 8.
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
The existing task checks that the number of OSDs is equal to the number of up OSDs before continuing.
The problem is that if none of the OSDs have been discovered yet, the task will exit immediately and subsequent pool creation will fail (num_osds = 0, num_up_osds = 0).
This is related to Bugzilla 1578086.
In this change, we also check that at least one OSD is present. In our testing, this results in the task correctly waiting for all OSDs to come up before continuing.
Signed-off-by: David Waiting <david_waiting@comcast.com>
Ceph module path needs to be configured if we want to avoid issues
like:
no action detected in task. This often indicates a misspelled module
name, or incorrect module path
Currently the ansible-lint command in Travis CI complains about that.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This reverts commit bb2bbeb941.
Looks like when not passing `--pid=host` we are facing some issues when
deploying more than 2 OSDs in containerized environment.
At the moment, we are still troubleshooting this issue but we prefer to
revert this commit so it doesn't block any PR in the CI.
As soon as we have a fix; we will push a new PR to remove `--pid=host`
(a revert of revert...)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>