Commit Graph

4844 Commits (4636f3f7e29267398495dc37db2fa511dae6596b)
 

Author SHA1 Message Date
Guillaume Abrioux ed84325b1d update: ensure ceph command returns 0
these commands could return something else than 0.
Let's ensure all retries have been done before actually failing.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-18 11:16:11 +02:00
Guillaume Abrioux 543d1e2e41 update: set osd flags before upgrading any mon
Typical error:

```
failed: [mon0 -> mon2] (item=noout) => changed=true
  cmd:
  - ceph
  - --cluster
  - ceph
  - osd
  - set
  - noout
  delta: '0:00:00.293756'
  end: '2019-04-17 06:31:57.552386'
  item: noout
  msg: non-zero return code
  rc: 1
  start: '2019-04-17 06:31:57.258630'
  stderr: |-
    Traceback (most recent call last):
      File "/bin/ceph", line 1222, in <module>
        retval = main()
      File "/bin/ceph", line 1146, in main
        sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
      File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs
        cmd['sig'] = parse_funcsig(cmd['sig'])
      File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig
        raise JsonFormat(s)
    ceph_argparse.JsonFormat: unknown type CephBool
  stderr_lines:
  - 'Traceback (most recent call last):'
  - '  File "/bin/ceph", line 1222, in <module>'
  - '    retval = main()'
  - '  File "/bin/ceph", line 1146, in main'
  - '    sigdict = parse_json_funcsigs(outbuf.decode(''utf-8''), ''cli'')'
  - '  File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs'
  - '    cmd[''sig''] = parse_funcsig(cmd[''sig''])'
  - '  File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig'
  - '    raise JsonFormat(s)'
  - 'ceph_argparse.JsonFormat: unknown type CephBool'
  stdout: ''
  stdout_lines: <omitted>
```

Having mixed versions of monitors seems to cause this error.
Moving these tasks before any monitor gets upgraded seems to be enough
to get around this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-18 11:16:11 +02:00
Guillaume Abrioux a4bc7bda51 update: refact msgr2 migration
this commit refact the msgr2 protocol introduction.

If it's a fresh install, let's go with v2 only.
If we upgrade to nautilus, we should go with v2+v1 syntax to ensure
nothing breaks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-18 11:16:11 +02:00
Andrew Schoen e2529dcd7f rolling_update: ceph commands should use --cluster
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-04-18 10:55:11 +02:00
Andrew Schoen 67453853ff rolling_update: set num_osds to the number of running osds
We do this so that the ceph-config role can most accurately
report the number of osds for the generation of the ceph.conf
file.

We don't want to use ceph-volume to determine the number of
osds because in an upgrade to nautilus ceph-volume won't be able to
accurately count osds created by ceph-disk.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-04-18 10:55:11 +02:00
Andrew Schoen 5e3dfe5021 ceph-osd: do not run lvm batch tasks during update
When performing a rolling update do not try to create
any new osds with `ceph-volume lvm batch`. This is troublesome
because when upgrading to nautilus the devices list might contain
devices that are currently being used by ceph-disk and have GPT
headers on them, which will cause ceph-volume to fail when
trying to use such a device. Any devices originally created
by ceph-disk will need to be removed from the devices list
before any new osds can be created.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-04-18 10:55:11 +02:00
Andrew Schoen 399a821439 tests: adds the migrate_ceph_disk_to_ceph_volume scenario
This test deploys a luminous cluster with ceph-disk created osds
and then upgrades to nautilus and migrates those osds to ceph-volume.
The nodes are then rebooted and cluster state verified.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-04-18 10:55:11 +02:00
Andrew Schoen 28c47e4d1b rolling_update: migrate ceph-disk osds to ceph-volume
When upgrading to nautlius run ``ceph-volume simple scan`` and
``ceph-volume simple activate --all`` to migrate any running
ceph-disk osds to ceph-volume.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1656460

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-04-18 10:55:11 +02:00
Dimitri Savineau c8814d1331 ceph-iscsi-gw: Remove library directory
The library directory that contain the custom ceph modules in present
in the ceph-ansible root directory.
All igw_* mocules are already present there so we don't need the one
present in roles/ceph-iscsi-gw/library.
Also remove the associated spec file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-18 10:37:57 +02:00
Dimitri Savineau f601549a8a test_osds: remove scenario leftover
Since there's only only scenario available we don't need lvm_scenario
and no_lvm_scenario.
Also add missing assert for ceph-volume tests.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-17 17:28:12 +02:00
Dimitri Savineau e471bce76b allow using ansible 2.8
Currently we only support ansible 2.7
We plan to use 2.8 when it will be release so we have to support both
2.7 and 2.8.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1700548

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-17 16:57:37 +02:00
Dimitri Savineau 9f99f539f7 tests/functional/setup: change mount options
In the CI jobs we can change the mount options of the main partition
to avoid extra operations on disk.
Adding jmespath to tests/requirements.txt due to the json_query
filter usage.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-17 08:23:07 +02:00
Dimitri Savineau c84a74592a test_mons: test mon listening on port 3300
Since nautilus and msgr2 the monitors also bind on port 3300 in
addition of 6789.
This patch updates test_mons to reflect that change.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-17 08:19:48 +02:00
Guillaume Abrioux edfa4310d3 defaults: refact package dependencies installation.
Because 5c98e361df could be seen as a non
backward compatible change this commit reverts it and bring back package
dependencies installation support.
Let's just modify the default value instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-16 11:07:59 -04:00
Guillaume Abrioux 83df60cbc3 defaults: remove some package dependencies
These packages aren't needed anymore.
They were needed for ceph-init-detect buti as of ceph-init-detect doesn't exist
anymore.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683885

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-15 11:28:58 -04:00
Rishabh Dave d5967af7fb allow adding a monitor to a deployed cluster
Add a playbook that deploys a new monitor on a new node, adds that node
to the Ceph cluster and the monitor to the quorum and updates the ceph
configuration file on OSD nodes.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2019-04-15 10:00:50 +02:00
Rishabh Dave 96c180cc0e check if mon daemon is installed before restarting it
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2019-04-15 10:00:50 +02:00
Guillaume Abrioux edf1ee2073 mon: check if an initial monitor keyring already exists
When adding a new monitor, we must reuse the existing initial monitor
keyring. Otherwise, the new monitor will issue its 'mkfs' with a new
monitor keyring and it will result with a mismatch between them. The
new monitor will be unable to join the quorum in the end.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Rishabh Dave <ridave@redhat.com>
2019-04-15 10:00:50 +02:00
Dimitri Savineau eb658b3af6 purge-cluster: remove python-ceph-argparse package
When using purge-cluster playbook with nautilus, there's still the
python-ceph-argparse package installed on the host preventing to
reinstall a ceph cluster with a different version (like luminous or
mimic)

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-15 09:15:08 +02:00
Dimitri Savineau 2c8b585edb docs: Update ceph.conf supported section
[rgw] isn't a valide section.
[client.rgw.{instance_name] should be used instead.

Resolves: #3841

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-15 09:09:02 +02:00
Dimitri Savineau 150acba8c5 switch-from-non-containerized: stop all osds
e6bfb84 introduced a regression in the switch from non containerized
to container deployment.
We need to stop all previous OSDs services. We just don't need the
ceph-disk pattern in the regex.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-11 16:26:53 -04:00
Guillaume Abrioux a1254d767c purge: remove references to ceph-disk
as of stable-4.0, ceph-disk is no longer supported.
These tasks aren't needed anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux 73aa788459 shrink-osd: remove legacy playbook
as of stable-4.0, ceph-disk is no longer supported.
Let's remove this legacy version of the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux e6bfb843f4 switch_to_containers: remove ceph-disk references
as of stable-4.0, ceph-disk is no longer supported.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux f899da3172 osd: remove legacy file
this file is not used anymore, let's remove it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux 3519281b44 tests: pass osd_scenario value to lvm_setup.yml
we must pass the value of osd_scenario from the stable-3.2 branch which
is used for the initial deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux 83e84c6a4a tests: remove test_journal_collocation.py in OSD testing
this test is related to ceph-disk which is dropped as of stable-4.0

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux bb15c19519 resync sample file
d17b1b48b6 introduced a change that hasn't been reported in sample files

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux 4f68462009 osd: remove ceph-disk scenarios files
these files aren't needed anymore since we only use lvm scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux f0416c8892 osd: remove dedicated_devices variable
This variable was related to ceph-disk scenarios.
Since we are entirely dropping ceph-disk support as of stable-4.0, let's
remove this variable.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux 4d35e9eeed osd: remove variable osd_scenario
As of stable-4.0, the only valid scenario is `lvm`.
Thus, this makes this variable useless.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux 4d5637fd8a osd: remove legacy file
ceph_disk_cli_options_facts.yml is not used anymore, let's remove it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Sébastien Han 2888c0825f validate: only check device when they are devices
We only validate the devices that are passed if there is a list of
devices to validate.

Signed-off-by: Sébastien Han <seb@redhat.com>
2019-04-11 11:57:02 -04:00
Sébastien Han 72211d4a24 plugin: validate.py do not check osd_scenario
osd_scenario now defaults to lvm and should not be changed. So we don't
need to test it.

Signed-off-by: Sébastien Han <seb@redhat.com>
2019-04-11 11:57:02 -04:00
Sébastien Han e2467272f8 plugin: validate lint
Make python linter happy.

Signed-off-by: Sébastien Han <seb@redhat.com>
2019-04-11 11:57:02 -04:00
Sébastien Han f8e1542423 doc: update osd scenario
This commits adds documentation for the lvm scenario and the deprecation
of collocated and non-collocated scenario.

Signed-off-by: Sébastien Han <seb@redhat.com>
2019-04-11 11:57:02 -04:00
Sébastien Han 52df15895b osd: default osd_scenario to lvm
osd_scenario has become obsolete and defaults to lvm. With lvm there is
no such things has collocated and non-collocated.

Signed-off-by: Sébastien Han <seb@redhat.com>
2019-04-11 11:57:02 -04:00
Sébastien Han 9ea1e49407 validate: print a message for old scenarios
ceph-disk is not supported anymore, so all the newly created OSDs will
be configured using ceph-volume.

Signed-off-by: Sébastien Han <seb@redhat.com>
2019-04-11 11:57:02 -04:00
Sébastien Han e2a5aa062e osd: remove ceph-disk support
We don't support the preparation of OSD with ceph-disk. ceph-volume is
only supported. However, the start operation of OSD is still supported.
So let's say you change a config option, the handlers will be able to
restart all the OSDs via their respective systemd unit files.

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Dimitri Savineau d25af1b872 tests: Add debug to ceph-override.json
It's usefull to have logs in debug mode enabled in order to have
more information for developpers.
Also reindent to json file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-11 15:15:41 +02:00
Dimitri Savineau a19054be18 tests/functional: use ceph-override.json symlink
We don't need to have multiple ceph-override.json copies. We
currently already have symlink to all_daemons/ceph-override.json so
we can do it for all scenarios.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-11 15:15:41 +02:00
Dimitri Savineau d2efb7f02b ceph-mds: Set application pool to cephfs
We don't need to use the cephfs variable for the application pool
name because it's always cephfs.
If the cephfs variable is set to something else than the default
value it will break the appplication pool task.

Resolves: #3790

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-11 15:15:41 +02:00
Guillaume Abrioux c1e4529b0e update: fix undefined error when no mgr group is declared
if mgr group isn't defined in inventory, that task will fail with
undefined error.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 09:22:35 +02:00
Guillaume Abrioux 7e0adca7a4 osds: allow passing devices by path
ceph-volume didn't work when the devices where passed by path.
Since it now support it, let's allow this feature in ceph-ansible

Closes: #3812

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-10 13:22:30 -04:00
Guillaume Abrioux 631e5d3144 mon: remove useless delegate_to
Let's use a condition to run this task only on the first mon.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-10 00:39:25 +00:00
Dimitri Savineau d17b1b48b6 rgw: change default frontend on nautilus
As discussed in ceph/ceph#26599, beast is now the default frontend
for rados gateway with nautilus release.
Add rgw_thread_pool_size variable with 512 as default value and keep
backward compatibility with num_threads option when using civetweb.
Update radosgw_civetweb_num_threads to reflect rgw_thread_pool_size
change.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-09 17:21:51 +02:00
Dimitri Savineau 37816570c6 container-common: Enable docker on boot for ubuntu
docker daemon is automatically started during package installation
but the service isn't enabled on boot.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-09 16:50:10 +02:00
Dimitri Savineau 57b4e76d11 rolling_update: Remove ceph aliases
ceph aliases have been introduced in stable-3.2 during the ceph
deployment. On master this has been removed but we don't handle
this removal in the upgrade from stable-3.2 to master via the
rolling_update playbook.
Also remove the task from purge-docker-cluster missing from
d9e7835

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-09 16:50:10 +02:00
Matthew Vernon 9dd913cf8a UCA: Uncomment UCA variables in defaults, fix consequent breakage
The Ubuntu Cloud Archive-related (UCA) defaults in
roles/ceph-defaults/defaults/main.yml were commented out, which means
if you set `ceph_repository` to "uca", you get undefined variable
errors, e.g.

```
The task includes an option with an undefined variable. The error was: 'ceph_stable_repo_uca' is undefined

The error appears to have been in '/nfs/users/nfs_m/mv3/software/ceph-ansible/roles/ceph-common/tasks/installs/debian_uca_repository.yml': line 6, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: add ubuntu cloud archive repository
  ^ here

```

Unfortunately, uncommenting these results in some other breakage,
because further roles were written that use the fact of
`ceph_stable_release_uca` being defined as a proxy for "we're using
UCA", so try and install packages from the bionic-updates/queens
release, for example, which doesn't work. So there are a few `apt` tasks
that need modifying to not use `ceph_stable_release_uca` unless
`ceph_origin` is `repository` and `ceph_repository` is `uca`.

Closes: #3475
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
2019-04-09 13:44:00 +02:00
Rishabh Dave 7f09316a87 add_mdss: change number of processes for testing to 8
Run tests with 8 processes/cores since other scenarios do the same.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2019-04-09 11:34:42 +02:00