The LTC backend has drifted from being able to pass tests on the stable
PyTorch version, so pinning to nightly on ARM.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
Creates a build_linux_arm64 job that builds the release on an arm64 self-hosted runner.
Drop Python 3.10 support
Pass TM_TORCH_VERSION to choose the Stable PyTorch version (since arm64 doesn't have nightly builds)
Borrows nightly / stable Pytorch switch from the WIP
https://github.com/llvm/torch-mlir/pull/2038
This patch adds a (default-true) input called `cache-enabled` to the
setup-build action, so that when the input is false, ccache is not setup
on the host machine. This patch also sets the input to be false for the
release builds.
Despite using sudo to delete the workspace directory, we still
occasionally run into checkout errors. This patch thus drops the
deletion of the workspace prior to checkout. It also restricts the
number of parallel jobs in the submodule fetch step to just one, to try
and resolve the checkout issue ("index.lock: File exists.").
We have recently started seeing errors like:
```
Synchronizing submodule url for 'externals/llvm-project'
Synchronizing submodule url for 'externals/mlir-hlo'
/usr/bin/git -c protocol.version=2 submodule update --init --force --depth=1
Error: fatal: Unable to create '/home/anush/actions-runner/_work/torch-mlir/torch-mlir/.git/modules/externals/llvm-project/index.lock': File exists.
```
As a workaround, this patch removes the workspace directory before the
checkout step.
- Use v3 of actions/checkout, since the version we use (v2) uses
Node.js 12, which is deprecated by GitHub.
- Source the PowerShell venv sctipt (instead of the bash sript) since
the calling script is a PowerShell script. Without this, the build
doesn't use venv at all.
- Make the build dependencies in whl-requirements.txt (used by
setup.py) match those in requirements.txt. To that end, this patch
creates a build-requirements.txt that is referenced by
requirements.txt and whl-requirements.txt.
This patch makes a few small, but key, changes to enable ccache on
Windows. First, it replaces the hendrikmuhs/ccache-action action with
command line invocations to the ccache binary, since the action has two
bugs, one of which causes CI to refer to different ccache artifacts
before versus after the build on Windows whereas the other bug can
sometimes cause the action to incorrectly infer that the cache is empty.
Second, this patch slightly alters the cache key, so that our old cache
artifacts, which have grown too big, are eventually discarded in favor
of the new, smaller cache artifacts. Along the way, this patch also
keeps the RollPyTorch's cache artifact separate from the regular build's
cache artifact so as to keep these artifacts small, and also because the
RollPyTorch action is off the critical path for most contributors.
Finally, this patch makes small changes to the CMake file so that on
Windows, the ccache binary is added as a prefix, as recommended on the
[ccache Wiki](https://github.com/ccache/ccache/wiki/MS-Visual-Studio).
* ci: update versions of external actions
Node.js 12 actions are deprecated and will eventually go away, so this
patch bumps the old actions to their latest versions that use Node.js
16.
* ci: replace deprecated action with bash commands
The llvm/actions/install-ninja action uses Node.js 12, which is
deprecated. Since that action is not updated to work with Node.js 16,
this patch replaces that action with equivalent bash commands to install
Ninja.
* ci: use smaller ccache artifacts to reduce evictions
Over time, our ccache sizes have grown quite large (some as large as
1.3 GB), which results in us routinely exceeding GitHub's limits, thus
triggering frequent cache evictions. As a result, cache downloads and
uploads take unnecessary long, in addition to fewer cache entries being
available.
Based on experiments on a clean cache state, it appears that we need
less than 300 MB of (compressed) ccache artifacts for each build type.
Anything larger than that will accrue changes from the past that aren't
needed.
To alleviate the cache burden, this patch sets the maximum ccache size
to be 300 MB. This change should not affect the success or failure of
our builds. I will monitor the build times to check whether this change
causes any performance degradation.
* ci: use consistent platform identifiers
Prior to this patch, some of our builds ran on `ubuntu-latest`, while
some others ran on `ubuntu-20.04` and others ran on `ubuntu-22.04`, with
similar situations for macOS and windows. This patch instead sets all
Linux builds to run on `ubuntu-latest`, all macOS builds to run on
`macos-latest`, and all Windows builds to run on `windows-latest`, to
make debugging future CI failures a little easier.
* Update buildRelease.yml
Update Releases right after a Release build.
* Move gh-page update after release builds
This removes the periodic update and updates after a release build.
This enables building Pytorch from source in the CI.
The build should mostly hit the ccache.
Release builds will follow once we have some runtime on the CI.