Sometime the playbook gets stuck because even with `--connect-timeout=`
option, the connexion to the existing ceph cluster never timeout.
As a workaround, using `timeout` command provided by coreutils will
actually timeout if we can't connect to the cluster.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1537003
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When containerized deployment, `docker_exec_cmd` is not set before the
task which try to retrieve the current fsid is played, it means it
considers there is no existing fsid and try to generate a new one.
Typical error:
```
ok: [mon0 -> mon0] => {
"changed": false,
"cmd": [
"ceph",
"--connect-timeout",
"3",
"--cluster",
"test",
"fsid"
],
"delta": "0:00:00.179909",
"end": "2018-01-09 10:36:58.759846",
"failed": false,
"failed_when_result": false,
"rc": 1,
"start": "2018-01-09 10:36:58.579937"
}
STDERR:
Error initializing cluster client: Error('error calling conf_read_file: errno EINVAL',)
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The CI complains because of `ceph_uid` fact which doesn't exist since
the docker image tag used in the CI doesn't match with this condition.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit add new osd scenarios, it aims to simplify the CI setup and
brings a better coverage on the OSD scenarios.
We decided to differentiate between filestore and bluestore, thinking
ahead when filestore won't be supported anymore.
So we now have two classes of tests:
* Filestore
* Bluestore
In each of those classes we have container and non-container.
Then for each we test the following:
* collocated
* collocated dmcrypt
* non-collocated
* non-collocated dmcrypt
* auto discovery collocated
* auto discovery collocated dmcrypt
This gives us a nice coverage and also reduces the footprint on the CI.
We are now up to 4 scenarios, each containing 6 OSD VMs.
Signed-off-by: Sébastien Han <seb@redhat.com>
The `always_run` key is deprecated and being removed in Ansible 2.4.
Using it causes a warning to be displayed:
[DEPRECATION WARNING]: always_run is deprecated.
This patch changes all instances of `always_run` to use the `always`
tag, which causes the task to run each time the playbook runs.
The task which sets `ceph_current_fsid` in `facts.yml` in case of containerized
deployment, will definitely fail because `docker_exec_cmd` is not set
yet.
This commits simply makes `facts.yml` played after `check_socket.yml` so
`docker_exec_cmd` is set properly.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Running the socket check on all the hosts will override the default
value of docker_exec_cmd, leaving it with the last value (currently
rbd-mirror), as a result the subsequent docker_exec_cmd usage for the
:x
Signed-off-by: Sébastien Han <seb@redhat.com>
We have seen issues with leftover socker. So now, if a socket is found
we also check if it's accessed by a process. If so, we can run the
handler, if not we remove it and continue the playbook.
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
When Ansible is not run with verbose options it's difficult to see which
include and/or set_fact does what. So adding a name for each clarifies.
Signed-off-by: Sébastien Han <seb@redhat.com>
In a collocated scenario, where you might put a rgw, a mds and a mon on
the same node you don't want the handler blindly restart all the daemons
on the node. Indeed some of them might not be configured yet.
Implementing a more precise socket detection, for each daemon type.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1488813
Signed-off-by: Sébastien Han <seb@redhat.com>
This will give us more flexibility and the possibility to deploy a client node
for an external ceph-cluster.
related BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1469426Fixes: #1670
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Move `fsid`,`monitor_name`,`docker_exec_cmd` and `ceph_release` set_fact
to `ceph-defaults` role.
It will allow to reuse these facts without having to play `ceph-common`
or `ceph-docker-common`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>