Commit Graph

33 Commits (90e47c5fb0c95f4b1a17cdf2a019bdcebc77a773)

Author SHA1 Message Date
Randy J. Martinez 127a643fd0 ceph-defaults: fix ceph_uid fact on container deployments
Red Hat is now using tags[3,latest] for image rhceph/rhceph-3-rhel7.
Because of this, the ceph_uid conditional passes for Debian
when 'ceph_docker_image_tag: latest' on RH deployments.
I've added an additional task to check for rhceph image specifically,
and also updated the RH family task for ceph/daemon [centos|fedora]tags.

Signed-off-by: Randy J. Martinez <ramartin@redhat.com>
2018-04-17 16:54:51 +02:00
Guillaume Abrioux 899b0eb451 defaults: check only 1 time if there is a running cluster
There is no need to check for a running cluster n*nodes time in
`ceph-defaults` so let's add a `run_once: true` to save some resources
and time.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-16 11:23:00 +02:00
Sébastien Han 82ccbdafbc ceph-defaults: bring backward compatibility for old syntax
If people keep on using the mon_cap, osd_cap etc the playbook will
translate this old syntax on the flight.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-11 12:18:34 +02:00
Sébastien Han bb60f2fea4 ceph-defaults: fix ceoh_uid for container image tag latest
According to our recent change, we now use "CentOS" as a latest
container image. We need to reflect this on the ceph_uid.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-09 13:54:55 +02:00
Attila Fazekas ecd3563c21 Deploying without managed monitors failed
Tripleo deployment failed when the monitors not manged
by tripleo itself with:
    FAILED! => {"msg": "list object has no element 0"}

The failing play item was introduced by
 f46217b69a .

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1552327

Signed-off-by: Attila Fazekas <afazekas@redhat.com>
2018-04-04 18:16:46 +02:00
Guillaume Abrioux dcf6a246a4 defaults: remove `run_once: true` when creating fetch_directory
because of `serial: 1`, it can be an issue when the playbook is being
run on client nodes.
Since the refact of `ceph-client` we skip the role `ceph-defaults` on
every node except the first client node, it means that the task is not
going to be played because of `run_once: true`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-04 10:51:17 +02:00
Guillaume Abrioux cf27c5e941 move selinux check to `ceph-defaults`
This check is alone in `ceph-docker-common` since a previous code
refactor.
Moving this check in `ceph-defaults` allows us to run `ceph-clients`
without having to run `ceph-docker-common` even in non-containerized
deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-04 10:51:17 +02:00
Andrew Schoen 6cffbd5409 ceph-defaults: set is_atomic variable
This variable is needed for containerized clusters and is required for
the ceph-docker-common role. Typically the is_atomic variable is set in
site-docker.yml.sample though so if ceph-docker-common is used outside
of that playbook it needs set in another way. Moving the creation of
the variable inside this role means playbooks don't need to worry
about setting it.

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1558252

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-03-21 19:16:11 +01:00
Sébastien Han 22f843e3d4 default: define 'osd_scenario' variable
osd_scenario does not exist in the ceph-default role so if we try to
play ceph-default on an OSD node, the playbook will fail with undefined
variable.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-02-08 17:42:12 +01:00
Guillaume Abrioux deaf273b25 syntax: change local_action syntax
Use a nicer syntax for `local_action` tasks.
We used to have oneliner like this:
```
local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }}
```

The usual syntax:
```
    local_action:
      module: wait_for
      port: 22
      host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}"
      state: started
      delay: 10
      timeout: 500
```
is nicer and kind of way to keep consistency regarding the whole
playbook.

This also fix a potential issue about missing quotation :

```
Traceback (most recent call last):
  File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module>
    main()
  File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main
    rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin)
  File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command
  File "/usr/lib64/python2.7/shlex.py", line 279, in split
    return list(lex)                                                                                                                                                                                                                                                                                                            File "/usr/lib64/python2.7/shlex.py", line 269, in next
    token = self.get_token()
  File "/usr/lib64/python2.7/shlex.py", line 96, in get_token
    raw = self.read_token()
  File "/usr/lib64/python2.7/shlex.py", line 172, in read_token
    raise ValueError, "No closing quotation"
ValueError: No closing quotation
```

writing `local_action: shell echo {{ fsid }} | tee {{ fetch_directory }}/ceph_cluster_uuid.conf`
can cause trouble because it's complaining with missing quotes, this fix solves this issue.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-01-31 10:45:34 +01:00
Guillaume Abrioux ec16cbdb1a defaults: avoid getting stuck (ceph --connect-timeout)
Sometime the playbook gets stuck because even with `--connect-timeout=`
option, the connexion to the existing ceph cluster never timeout.

As a workaround, using `timeout` command provided by coreutils will
actually timeout if we can't connect to the cluster.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1537003

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-01-25 10:15:59 +01:00
Guillaume Abrioux 900f447c82 containers: fix bug when looking for existing cluster
When containerized deployment, `docker_exec_cmd` is not set before the
task which try to retrieve the current fsid is played, it means it
considers there is no existing fsid and try to generate a new one.

Typical error:

```
ok: [mon0 -> mon0] => {
    "changed": false,
    "cmd": [
        "ceph",
        "--connect-timeout",
        "3",
        "--cluster",
        "test",
        "fsid"
    ],
    "delta": "0:00:00.179909",
    "end": "2018-01-09 10:36:58.759846",
    "failed": false,
    "failed_when_result": false,
    "rc": 1,
    "start": "2018-01-09 10:36:58.579937"
}

STDERR:

Error initializing cluster client: Error('error calling conf_read_file: errno EINVAL',)
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-01-10 16:23:18 +01:00
Guillaume Abrioux acfbebe67e defaults: rename check_socket files for containers
When containerized deployment, we are not looking for a socket but for a
running container.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-01-10 15:44:47 +01:00
Sébastien Han 7eaf444328 default: look for the right return code on socket stat in-use
As reported in https://github.com/ceph/ceph-ansible/issues/2254, the
check with fuser is not ideal. If fuser is not available the return code
is 127. Here we want to make sure that we looking for the correct return
code, so 1.

Closes: https://github.com/ceph/ceph-ansible/issues/2254
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-12-14 16:59:14 +01:00
Guillaume Abrioux 6a9b5c9632 defaults: fix CI issue with ceph_uid fact
The CI complains because of `ceph_uid` fact which doesn't exist since
the docker image tag used in the CI doesn't match with this condition.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-12-12 09:02:37 +01:00
John Fulton ffae294288 Set tighter permissions on keyrings when containerized
During a containerized deployment, set the permissions
of ceph.client.admin.keyring and other keyrings to
chmod 600 and chown it to ceph.
2017-12-06 19:22:28 -05:00
Guillaume Abrioux 238754a844 osd: skip some set_fact when osd_scenario=lvm
these tasks are not needed when using `osd_scenario: lvm`

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1509230

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-11-07 15:30:08 +01:00
Major Hayden f73232caa4
Use check_mode instead of always_run
This patch changes the `always_run: yes` task option to
`check_mode: no` to avoid Ansible warnings.
2017-10-25 09:53:34 -05:00
Major Hayden c2b5118c1b
Revert "Avoid deprecated always_run"
This reverts commit 620fb37dd4.
2017-10-25 09:48:09 -05:00
Sébastien Han a53aa9e8b4 ci: new osd scenarios
This commit add new osd scenarios, it aims to simplify the CI setup and
brings a better coverage on the OSD scenarios.
We decided to differentiate between filestore and bluestore, thinking
ahead when filestore won't be supported anymore.
So we now have two classes of tests:

* Filestore
* Bluestore

In each of those classes we have container and non-container.
Then for each we test the following:

* collocated
* collocated dmcrypt
* non-collocated
* non-collocated dmcrypt
* auto discovery collocated
* auto discovery collocated dmcrypt

This gives us a nice coverage and also reduces the footprint on the CI.
We are now up to 4 scenarios, each containing 6 OSD VMs.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-18 09:26:06 +02:00
Major Hayden 620fb37dd4
Avoid deprecated always_run
The `always_run` key is deprecated and being removed in Ansible 2.4.
Using it causes a warning to be displayed:

    [DEPRECATION WARNING]: always_run is deprecated.

This patch changes all instances of `always_run` to use the `always`
tag, which causes the task to run each time the playbook runs.
2017-10-12 08:29:44 -05:00
Sébastien Han cac7d034bf defaults: fix check socket non-container handler
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-04 15:33:52 +02:00
Sébastien Han 5968cf09b1 ci: add collocation scenario
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-04 11:19:12 +02:00
Sébastien Han 0ce76113bf Merge pull request #1956 from ceph/osd-container-id
Osd container
2017-10-03 18:52:24 +02:00
Sébastien Han 3bd341f6c0 osd: container use id instead of dev name
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1494127
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-03 14:44:00 +02:00
Guillaume Abrioux 081f226106 defaults: change running order in main.yml
The task which sets `ceph_current_fsid` in `facts.yml` in case of containerized
deployment, will definitely fail because `docker_exec_cmd` is not set
yet.
This commits simply makes `facts.yml` played after `check_socket.yml` so
`docker_exec_cmd` is set properly.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-10-02 18:42:43 +02:00
Sébastien Han 048b55be4a defaults: only run socket checks on their specific roles
Running the socket check on all the hosts will override the default
value of docker_exec_cmd, leaving it with the last value (currently
rbd-mirror), as a result the subsequent docker_exec_cmd usage for the
:x

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-09-29 02:38:24 +02:00
Sébastien Han 8b6456dc8a handler: enhance socket detection
We have seen issues with leftover socker. So now, if a socket is found
we also check if it's accessed by a process. If so, we can run the
handler, if not we remove it and continue the playbook.

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-09-25 13:44:51 +02:00
Sébastien Han d100b4e596 name includes and set_fact for clarity
When Ansible is not run with verbose options it's difficult to see which
include and/or set_fact does what. So adding a name for each clarifies.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-09-18 23:39:46 +02:00
Sébastien Han 12f6e53090 defaults: do not restart unconfigured (yet) daemons
In a collocated scenario, where you might put a rgw, a mds and a mon on
the same node you don't want the handler blindly restart all the daemons
on the node. Indeed some of them might not be configured yet.
Implementing a more precise socket detection, for each daemon type.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1488813
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-09-08 12:02:37 +02:00
Guillaume Abrioux 539197a2fc Introduce new role ceph-config.
This will give us more flexibility and the possibility to deploy a client node
for an external ceph-cluster.

related BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1469426

Fixes: #1670

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-08-24 11:33:03 +02:00
Guillaume Abrioux 7a333d05ce Add handlers for containerized deployment
Until now, there is no handlers for containerized deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-08-02 17:12:20 +02:00
Guillaume Abrioux fc6b6e9859 Move basics facts to `ceph-defaults`
Move `fsid`,`monitor_name`,`docker_exec_cmd` and `ceph_release` set_fact
to `ceph-defaults` role.
It will allow to reuse these facts without having to play `ceph-common`
or `ceph-docker-common`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-08-02 17:12:20 +02:00