upgrade RHCS 2 -> RHCS 3 will fail if cluster has still set
sortnibblewise,
it stay stuck on "TASK [waiting for clean pgs...]" as RHCS 3 osds will
not start if nibblewise is set.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943
Signed-off-by: Sébastien Han <seb@redhat.com>
Let's create a dedicated environment for these scenarios, there is no
need to deploy everything.
By the way, doing so will save some times.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Ansible 2.4 is currently end-of-life.
Ansible 2.5 will go end-of-life after Ansible 2.7 is released.
Fixes: #2901
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The group_vars/all file is not available on 'ooo-collocation' scenario,
it's making the `dev_setup.yml` failing because this path is hardcoded.
The idea here is to check if the pattern 'ooo-collocation' is present in
`change_dir` variable so we can set this path properly according to the
scenario being run.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
`test_rbd_mirror_is_up()` is failing on update scenarios because it
assumes the `ceph_stable_release` is still set to the value of the
original ceph release, it means it won't enter in the right part of the
condition and fails.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
In addition to ceph/ceph-build#1082
Let's set the ansible version in each ceph-ansible branch's respective
requirements.txt.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Do not run device validation on every hosts, only on OSD nodes.
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
We know make sure that:
* devices are actually block special files
* length of dedicated_device is identical to devices
Signed-off-by: Sébastien Han <seb@redhat.com>
since no latest-bis-jewel exists, it's using latest-bis which points to
ceph mimic. In our testing, using it for idempotency/handlers tests
means upgrading from jewel to mimic which is not what we want do.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since `V2.6-stable` is available and has packages for `mimic`, let's
update this default value accordingly so nfs nodes can be deployed with
mimic.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
34f70428 has introduced a fix using `command` module while this could
have been achieved by using `lvol` module.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Relying on `copy_admin_key` to import created keys on client nodes makes
us obliged to copy admin key on those nodes which is not something we might
want.
We should use the fact `condition_copy_admin_key` which will be set to
`True` when the delegated node is a mon which means we can import keys
without taking care of admin keyring.
Fixes: #2867
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The original_basename option in the copy module changed to be
_original_basename in Ansible 2.6+, this PR resyncs the config_template
module to allow this to work with both Ansible 2.6+ and before.
Additionally, this PR removes the _v1_config_template.py file, since
ceph-ansible no longer supports versions of Ansible before version 2,
and so we shouldn't continue to carry that code.
Closes: #2843
Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>
Follow up on #2784
We must check in the generated fact `_disabled_ceph_mgr_modules` to
enable disabled mgr module.
Otherwise, this task will be skipped because it's not comparing the
right list.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600155
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
On containerized deployment, if a mon is stopped, the socket is not
purged and can cause failure when a cluster is redeployed after the
purge playbook has been run.
Typical error:
```
fatal: [osd0]: FAILED! => {}
MSG:
'dict object' has no attribute 'osd_pool_default_pg_num'
```
the fact is not set because of this previous failure earlier:
```
ok: [mon0] => {
"changed": false,
"cmd": "docker exec ceph-mon-mon0 ceph --cluster test daemon mon.mon0 config get osd_pool_default_pg_num",
"delta": "0:00:00.217382",
"end": "2018-07-09 22:25:53.155969",
"failed_when_result": false,
"rc": 22,
"start": "2018-07-09 22:25:52.938587"
}
STDERR:
admin_socket: exception getting command descriptions: [Errno 111] Connection refused
MSG:
non-zero return code
```
This failure happens when the ceph-mon service is stopped, indeed, since
the socket isn't purged, it's a leftover which is confusing the process.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We must initialize `children` variable in `_get_osd_id_from_host()`,
otherwise, if for any reason the deployment has failed and result with
an osd host with no OSD registered, we won't enter in the condition,
therefore, `children` is never set and the function tries to return
something undefined.
Typical error:
```
E UnboundLocalError: local variable 'children' referenced before assignment
```
Fixes: #2860
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When you delete a zone without removing from zonegroup, the period update would
fail since that command needs to load the zone and zonegroup to be able to
update the master. Period update would fail with an error like this:
radosgw-admin period update --commit
-1 Cannot find zone id= (name=), switching to local zonegroup configuration
-1 Cannot find zone id= (name=)
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
As of Kraken, the journal code does not use the hdparm command anymore
so we can remove it from our package dependency list.
Fixes: https://github.com/ceph/ceph-ansible/issues/1402
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f6910efa24389c264062963b2054c7cd29ffebb3)
We should test ceph-ansible against the latest ansible stable version on
master.
This commit also remove the pinning to 1.7.1 version of testinfra
because ansible 2.5 requires a newer version.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The container image recently merged both cluster and mon log into a
single stream. Following this, we now see this warning coming from the
container image:
2018-06-19 13:44:01.542990 7ff75b024700 1 mon.vm02@1(peon).log
v57928205 unable to write to '/var/log/ceph/ceph.log' for channel
'cluster': (2) No such file or directory
So we now tell the mon to not log cluster log on the filesystem.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591771
Signed-off-by: Sébastien Han <seb@redhat.com>
We forgot to add mgr_group_name when checking for the mon repo, thus the
conditional on the next task was failing.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1598185
Signed-off-by: Sébastien Han <seb@redhat.com>
The data structure has slightly changed on mimic.
Prior to mimic, it used to be:
```
{
"enabled_modules": [
"status"
],
"disabled_modules": [
"balancer",
"dashboard",
"influx",
"localpool",
"prometheus",
"restful",
"selftest",
"zabbix"
]
}
```
From mimic it looks like this:
```
{
"enabled_modules": [
"status"
],
"disabled_modules": [
{
"name": "balancer",
"can_run": true,
"error_string": ""
},
{
"name": "dashboard",
"can_run": true,
"error_string": ""
}
]
}
```
This means we can't simply check if `item` is in `item in
_ceph_mgr_modules.disabled_modules`
the idea here is to use filter `map(attribute='name')` to build a list
when deploying mimic.
Fixes: #2766
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The container runs for 300 sec, then dies and removes itself thanks to
the '--rm' option, so there is no point of removing it. Also this is
causing failure under some circonstances.
Closing: https://bugzilla.redhat.com/show_bug.cgi?id=1568157
Signed-off-by: Sébastien Han <seb@redhat.com>
We now add a default 'rbd' application type to each pool we create. This
will remove the warning: " application not enabled on N pool(s) "
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590275
Signed-off-by: Sébastien Han <seb@redhat.com>
Some fixes have gone into
git.openstack.org/openstack/ansible-config_template to deal with a few
bugs we have run into.
This PR brings the ceph-ansible config_template version up to the same
as the ansible-config_template openstack repo.
Closes: #2742
Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>
The script ceph-osd-run.sh holds the config options to start the
container, if one of these options are modified we must restart the
container. This was not the case before becauase the 'notify' flag
wasn't present.
Closing: https://bugzilla.redhat.com/show_bug.cgi?id=1596061
Signed-off-by: Sébastien Han <seb@redhat.com>
When using a module there is no need to apply this Ansible option. The
module will handle the idempotency on its own. So the module decides
wether or not the task has changed during the execution.
Signed-off-by: Sébastien Han <seb@redhat.com>
keyring files in /etc/ceph. Default value is the same as it was (0600),
but this variable allows user to override it (f.e. set it to 0640).
Signed-off-by: George Shuklin <george.shuklin@gmail.com>
During 226f80c22b only Debian package
installs had the correct state set to ensure packages were upgraded when
the "upgrade_ceph_packages" var was set to true.
Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>
--net=host was hardcoded in the startup line so even though
mon_docker_net_host was set to False the net option would always be
activated.
mon_docker_net_host is set to True by default so this commit does not
change the behaviour.
Signed-off-by: Sébastien Han <seb@redhat.com>
avoid duplicating test unnecessarily just because of docker exec syntax.
Using the same logic than in the playbook with `docker_exec_cmd` allow us
to execute the same test on both containerized and non containerized environment.
The idea is to set a variable `docker_exec_cmd` with the
'docker exec <container-name>' string when containerized and
set it to '' when non containerized.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
these tests are skipped on bluestore osds scenarios.
they were going to fail anyway since they are run on mon nodes and
`devices` is defined in inventory for each osd node. It means
`num_devices * num_osd_hosts` returns `0`.
The result is that the test expects to have 0 OSDs up.
The idea here is to move these tests so they are run on OSD nodes.
Each OSD node checks their respective OSD to be UP, if an OSD has 2
devices defined in `devices` variable, it means we are checking for 2
OSD to be up on that node, if each node has all its OSD up, we can say
all OSD are up.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>