Commit Graph

4967 Commits (d23383a820ed23e675af28e80b161e17d17fe40d)
 

Author SHA1 Message Date
Guillaume Abrioux 500256cdab validate: fix ntp_daemon_type check in validate
is_atomic is defined in ceph-facts or very early in main playbook.

In non containerized deployment, is_atomic is only set in ceph-facts
which is played after ceph-validate.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-14 10:34:37 +00:00
Guillaume Abrioux 9c10affb69 site.yml: run ceph-validate before facts/defaults roles
ceph-validate must be run before any other role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-14 10:34:37 +00:00
Guillaume Abrioux ac4aded4aa tests: fix ubuntu-container-all_daemons
the public_network subnet used for this scenario was wrong.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-14 10:34:37 +00:00
Guillaume Abrioux 76303b457c container: create ceph-common.conf tmpfiles.d if it doesn't exist
Otherwise the task will fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-14 10:34:37 +00:00
Guillaume Abrioux b24202f6a4 facts: move two set_fact into ceph-facts
those two set_fact tasks should be moved in ceph-facts.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-14 10:34:37 +00:00
Guillaume Abrioux 69310a5cd6 switch_to_containers: support multiple rgw instances per host
add multiple rgw instances per host in switch_to_containers playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-13 09:42:27 +01:00
Guillaume Abrioux 70f1eea9b2 switch_to_containers: remove non-containerized systemd unit files
remove old systemd unit files (non-containerized) during the
switch_to_containers transition.

We have seen sometimes the unit started is the old one instead of the
new systemd unit generated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-13 09:42:27 +01:00
Guillaume Abrioux 4064035a54 switch_to_containers: use ceph binary from container
use the ceph binary from the container instead of the host.
If the ceph CLI version isn't compatible between host and container
image, it can cause the CLI to hang.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-13 09:42:27 +01:00
Guillaume Abrioux 8c8ec63633 container: use tmpfiles.d to creates /run/ceph
instead of using `RuntimeDirectory` parameter in systemd unit files,
let's use a systemd `tmpfiles.d` to ensure `/run/ceph`.

Explanation:

`podman` doesn't create the `/var/run/ceph` if it doesn't exist the time
where the container is run while `docker` used to create it.
In case of `switch_to_containers` scenario, `/run/ceph` gets created by
a tmpfiles.d systemd file; when switching to containers, the systemd
unit file complains because `/run/ceph` already exists

The better fix would be to ensure `/usr/lib/tmpfiles.d/ceph-common.conf`
is removed and only rely on `RuntimeDirectory` from systemd unit file parameter
but we come from a non-containerized environment which is already running,
it means `/run/ceph` is already created and when starting the unit to
start the container, systemd will still complain and we can't simply
remove the directory if daemons are collocated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-13 09:42:27 +01:00
Guillaume Abrioux 7e0a70f7a8 switch_to_containers: do not try to redeploy monitors
`ceph-mon` tries to redeploy monitors because it assumes it was not yet
deployed since `mon_socket_stat` and `ceph_mon_container_stat` are
undefined (indeed, we stop the daemon before calling `ceph-mon` in the
switch_to_containers playbook).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-13 09:42:27 +01:00
Guillaume Abrioux ad3a489847 tests: rename switch_to_containers
rename switch_to_containers scenario

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-13 09:42:27 +01:00
Rishabh Dave 05ea783eff fix mistake in task that aborts when ntpd is chosen on Atomic
Since it's already confusing whether ntp_daemon_type should be "ntp" or
"ntpd", fix the mistake in the title of the task that aborts if
ntp_daemon_type is set to "ntpd" and OS being used is Atomic.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2019-02-12 09:09:27 +01:00
Guillaume Abrioux 54f5dc3aab doc: resync group_vars sample files
resync group_vars sample files with their corresponding original files.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-11 17:19:27 +01:00
Rishabh Dave bdff3e48fd don't install NTPd on Atomic
Since Atomic doesn't allow any installations and NTPd is not present
on Atomic image we are using, abort when ntp_daemon_type is set to ntpd.

https://github.com/ceph/ceph-ansible/issues/3572
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2019-02-11 12:02:30 +01:00
Sébastien Han c69c8c9ac1 mon: do not hardcode ceph uid
167 is the ceph uid for Red Hat based system, thus trying to deploy a
monitor on Debian fail since the ceph user id on that system is 64045.
This commit uses the ceph_uid variable which contains the right uid
based on system/container detection.

Closes: https://github.com/ceph/ceph-ansible/issues/3589
Signed-off-by: Sébastien Han <seb@redhat.com>
2019-02-11 09:09:40 +00:00
Leah Neukirchen 4fe7f37849 Fix uses of default(omit) with string concatenation
When {{omit}} is concatenated with another string, it expands to something
like __omit_place_holder__63eea0d96dd6ed867b95405e11d87dddf61f448d.
However, in these use-cases we need an empty string.

Regression introduced in d53f55e807.

Signed-off-by: Leah Neukirchen <leah.neukirchen@mayflower.de>
2019-02-08 16:18:15 +00:00
Patrick C. F. Ernzer c605ff6a68 setup_ntp: call handler to disable ntpd if chronyd used
The task setup chronyd called the handler disable chronyd, which of
course defeats the purpose.

Changing the task to disable ntpd instead fixes the issue of chronyd
being disabled after it got enabled.

Fixes: #3582

Signed-off-by: Patrick C. F. Ernzer pcfe@redhat.com
2019-02-08 12:04:44 +01:00
Guillaume Abrioux d4b3c1d409 iscsi-gws: remove a leftover
remove leftover introduced by 9d590f4

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-08 01:11:42 +01:00
Guillaume Abrioux 9d590f4339 iscsi: fix permission denied error
Typical error:
```
fatal: [iscsi-gw0]: FAILED! =>
  msg: 'an error occurred while trying to read the file ''/home/guits/ceph-ansible/tests/functional/all_daemons/fetch/e5f4ab94-c099-4781-b592-dbd440a9d6f3/iscsi-gateway.key'': [Errno 13] Permission denied: b''/home/guits/ceph-ansible/tests/functional/all_daemons/fetch/e5f4ab94-c099-4781-b592-dbd440a9d6f3/iscsi-gateway.key'''
```

`become: True` is not needed on the following task:

`copy crt file(s) to gateway nodes`.

Since it's already set in the main playbook (site.yml/site-container.yml)

The thing is that the files get generated in the 'fetch_directory' with
root user because there is a 'delegate_to' + we run the playbook with
`become: True` (from main playbook).

The idea here is to create files under ansible user so we can open them
later to copy them on the remote machine.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-07 17:57:22 +01:00
Sébastien Han bb2bbeb941 osd: container remove --pid=host
Let's try again with the Nautilus release.

Closes: https://github.com/ceph/ceph-ansible/issues/1297
Signed-off-by: Sébastien Han <seb@redhat.com>
2019-02-07 12:13:51 +00:00
Guillaume Abrioux 708e13e7bb repo: update gitignore file
- do not ignore raw_install_python.yml

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-06 15:06:27 +00:00
Guillaume Abrioux b37c4adb32 ansible: increase fact cache timeout
10m seems a bit low, indeed, a complete run can take more than 1h.
Let's increase it to 2h

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-06 08:58:52 +00:00
John Fulton cc0bf197e1 Fix CNI error when net=host is not used on OSD calls
Follow up fix that 410abd7 missed.

Related: ceph#3561

Signed-off-by: John Fulton <fulton@redhat.com>
2019-02-05 22:49:01 +00:00
John Fulton 719a25b571 Create Ceph Initial Dirs earlier
Include tasks from create_ceph_initial_dirs earlier during
ceph config role.

Fixes: #3568
Signed-off-by: John Fulton <fulton@redhat.com>
2019-02-05 18:38:05 +00:00
John Fulton dab3f6ee3f Fix CNI error when net=host is not used in some podman calls
With 'podman version 1.0.0' on RHEL8 beta the 'get ceph version' and
'ceph monitor mkfs' commands fail [1] with "error configuring network
namespace for container Missing CNI default network".

When net=host is added these errors are resolved. net=host is used in
many other calls (grep -R net=host | wc -l --> 38).

Fixes: #3561
Signed-off-by: John Fulton <fulton@redhat.com>
(cherry picked from commit 410abd7745)
2019-02-05 18:14:28 +01:00
Guillaume Abrioux a31058374b site-container: import role ceph-facts
when ceph-container-common notifies handlers because a new container
image has been pulled, ceph-handler will throw an error because of
undefined variables since they are set in ceph-facts role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-05 18:14:28 +01:00
Guillaume Abrioux 914d94cae8 set RuntimeDirectory in all systemd unit templates
/var/run/ceph resides in a non persistent filesystem (tmpfs)
After a reboot, all daemons won't start because this directory will be
missing.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-05 18:14:28 +01:00
Guillaume Abrioux ac7f4b3a01 tests: increase amount of memory for all vms
double the amount of memory from 512m to 1024m.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-05 18:14:28 +01:00
Guillaume Abrioux 7ade032807 osd: bind mount /var/run/udev/
without this, the command `ceph-volume lvm list --format json` hangs and
takes a very long time to complete.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-05 18:14:28 +01:00
Guillaume Abrioux ff5509295a tests: remove useless test
`test_mon_host_line_has_correct_value()` will cover this test in
anycase. It doesn't worth to have a dedicated test for this.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-05 18:14:28 +01:00
Guillaume Abrioux efc051d17c tests: update test_mon_host_line_has_correct_value()
since msgr2 introduction, this test must be updated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-05 18:14:28 +01:00
Guillaume Abrioux 0d72fe9b30 tests: add a rhel8 scenario testing
test upstream with rhel8 vagrant image

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-05 18:14:28 +01:00
Guillaume Abrioux fdca29f2a7 facts: set timeout_command fact in ceph-defaults
- also add `--foreground` which seems to fix some issue we are facing when
using timeout with `podman`.
- use this fact in the `is ceph running already?` task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-05 18:14:28 +01:00
Guillaume Abrioux 16efdbc59b podman: support podman installation on rhel8
Add required changes to support podman on rhel8

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1667101

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-05 18:14:28 +01:00
Patrick C. F. Ernzer fd222a8bbf ansible.cfg: change log_path to directory used by fact_caching_connection
Since fact_caching_connection uses a directory in $HOME already, write the
ansible.log to the same directory.

Fixes: #3509

Signed-off-by: Patrick C. F. Ernzer <pcfe@redhat.com>
2019-02-05 13:53:28 +00:00
John Fulton 37b5d1084a Make python print statements python3 compatible
The restart_osd_daemon.sh generated from the j2 template
contains a python call which uses 'print x' instead of
'print(x)'. Add the missing parentheses to make this call
compatible with both 2 and 3.

Also add parentheses to other python print calls found
in roles/ceph-client/defaults/main.yml and
infrastructure-playbooks/cluster-os-migration.yml.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1671721
Signed-off-by: John Fulton <fulton@redhat.com>
2019-02-01 15:23:27 +00:00
Andrew Schoen 7b411b93d5 tests: do not run lvm_setup.yml on lvm_auto_discovery tests
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-02-01 12:28:12 +01:00
Andrew Schoen 70a4368bc5 ceph-config: do not always assume containers when calculating num_osds
CEPH_CONTAINER_IMAGE should be None if containerized_deployment
is False.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-02-01 12:28:12 +01:00
Andrew Schoen e0dcd9f2c7 tests: fix Vagrantfile symlink for lvm-auto-discovery tests
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-02-01 12:28:12 +01:00
Andrew Schoen 03d61a8842 docs: using osd_auto_discovery and osd_scenario lvm
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-02-01 12:28:12 +01:00
Andrew Schoen fc9502039d tests: adds the lvm_auto_discovery container testing scenario
This tests osd_auto_discovery: True, containerized_deployment: True
and the lvm osd scenario

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-02-01 12:28:12 +01:00
Andrew Schoen 3d4beaf952 tests: create as many drives for virtualbox as libvirt
This just ensures that virtualbox and libvirt are making
the same amount of devices for tests.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-02-01 12:28:12 +01:00
Andrew Schoen c53ccab0b2 tests: adds the lvm_auto_discovery scenario
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-02-01 12:28:12 +01:00
Andrew Schoen 88eda479a9 ceph-facts: generate devices when osd_auto_discovery is true
This task used to live in ceph-osd, but we need it defined here to that
ceph-config can use it when trying to determine the number of osds.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-02-01 12:28:12 +01:00
Andrew Schoen c5b082848f validate: do not validate lvm config if osd_auto_discovery is true
If osd_auto_discovery is set with the lvm scenario it's expected for
lvm_volumes and devices to be empty.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-02-01 12:28:12 +01:00
Guillaume Abrioux 46db5a1e38 tests: ensure iptables rule is inserted for rgw_multisite job
wip

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-01 10:20:28 +01:00
Guillaume Abrioux 773a7608d1 tests: run dev_setup and lvm_setup on secondary cluster for rgw_multisite
Otherwise, the deployment of the second cluster fails.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-01 10:20:28 +01:00
John Fulton cba9b23363 Do not timeout podman/docker pull if timeout value is '0'
If user sets "docker_pull_timeout: '0'" then do not use the
timeout command when running podman/docker pull. Also, use
"timeout -s KILL"; without KILL, podman on RHEL8 beta does
not timeout and deployment can hang.

Related: https://bugzilla.redhat.com/show_bug.cgi?id=1670625
Signed-off-by: John Fulton <fulton@redhat.com>
2019-01-31 15:35:09 +00:00
Guillaume Abrioux fe1528adb4 config: support num_osds fact setting in containerized deployment
This part of the code must be supported in containerized deployment

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1664112

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-01-31 14:30:23 +00:00
Guillaume Abrioux c0ad91957c tests: add missing hosts file for ubuntu testing
we need it for dev-ubuntu-container-all_daemons job

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-01-31 08:19:17 +01:00