This patch makes a few small, but key, changes to enable ccache on
Windows. First, it replaces the hendrikmuhs/ccache-action action with
command line invocations to the ccache binary, since the action has two
bugs, one of which causes CI to refer to different ccache artifacts
before versus after the build on Windows whereas the other bug can
sometimes cause the action to incorrectly infer that the cache is empty.
Second, this patch slightly alters the cache key, so that our old cache
artifacts, which have grown too big, are eventually discarded in favor
of the new, smaller cache artifacts. Along the way, this patch also
keeps the RollPyTorch's cache artifact separate from the regular build's
cache artifact so as to keep these artifacts small, and also because the
RollPyTorch action is off the critical path for most contributors.
Finally, this patch makes small changes to the CMake file so that on
Windows, the ccache binary is added as a prefix, as recommended on the
[ccache Wiki](https://github.com/ccache/ccache/wiki/MS-Visual-Studio).
* ci: update versions of external actions
Node.js 12 actions are deprecated and will eventually go away, so this
patch bumps the old actions to their latest versions that use Node.js
16.
* ci: replace deprecated action with bash commands
The llvm/actions/install-ninja action uses Node.js 12, which is
deprecated. Since that action is not updated to work with Node.js 16,
this patch replaces that action with equivalent bash commands to install
Ninja.
* ci: use smaller ccache artifacts to reduce evictions
Over time, our ccache sizes have grown quite large (some as large as
1.3 GB), which results in us routinely exceeding GitHub's limits, thus
triggering frequent cache evictions. As a result, cache downloads and
uploads take unnecessary long, in addition to fewer cache entries being
available.
Based on experiments on a clean cache state, it appears that we need
less than 300 MB of (compressed) ccache artifacts for each build type.
Anything larger than that will accrue changes from the past that aren't
needed.
To alleviate the cache burden, this patch sets the maximum ccache size
to be 300 MB. This change should not affect the success or failure of
our builds. I will monitor the build times to check whether this change
causes any performance degradation.
* ci: use consistent platform identifiers
Prior to this patch, some of our builds ran on `ubuntu-latest`, while
some others ran on `ubuntu-20.04` and others ran on `ubuntu-22.04`, with
similar situations for macOS and windows. This patch instead sets all
Linux builds to run on `ubuntu-latest`, all macOS builds to run on
`macos-latest`, and all Windows builds to run on `windows-latest`, to
make debugging future CI failures a little easier.
Until recently, we had to either risk feature branches creating PyTorch
build caches (which were unusable by the main branch or other parallel
feature branches because of GitHub's rules around sharing caches among
branches) or we had to limit the PyTorch build caches to only the main
branch, causing CI runs on feature branches to be terribly slow because
they had to rebuild PyTorch each time.
This patch enables the best of both worlds, by using a fork
(github.com/ashay/cache) of the GitHub's cache action, where the fork
adds an option (called `save`) which, when set, uploads a new cache
entry. We thus set this `save` flag only when we're building PyTorch
from source in Torch-MLIR's main branch, whereas all other builds set
this `save` flag to `false`.
The ability to conditionally update the cache has been an oft-requested
feature on the original (github.com/actions/cache) repository and
multiple unmerged PRs exist to allow conditional cache updates, so it is
likely that using the fork is only a temporary solution.
This patch updates the build_linux_packages.sh script so that when
PyTorch needs to be built from source, it is built _before_ building
LLVM and before building Torch-MLIR. The rationale behind this change
is that previously, when the PyTorch build was triggered through the
Torch-MLIR build, the PyTorch compilation added more entries to the
ccache artifacts. However, since we cache the PyTorch _binary_ (i.e.
the WHL file), there is no need to add the PyTorch compilation to the
ccache artifacts. By removing the PyTorch compilation files, we keep
the ccache artifact size small, thus reducing the number of evictions
when we exceed GitHub's allowed limit.
lib/Dialect/Torch/Utils/Utils.cpp includes TorchOps.h, which, by way of
included header files, refers to both TorchOps.h.inc as well as
TorchTypes.h.inc. However, the build rules do not specify the
dependency of the `TorchMLIRTorchUtils` target on the TableGen generated
header files, causing spurious build errors.
This patch fixes the problem by adding `MLIRTorchOpsIncGen` and
`MLIRTorchTypesIncGen` to the list of dependencies of
`TorchMLIRTorchUtils`.
* build: update llvm tag to 74fb770d
This commit makes the following changes needed to update bump LLVM:
+ replace usages of `tensor::createPadScalarOp`, see https://reviews.llvm.org/D136493
+ Update file checks
This patch is part of a larger set of improvements to the CI/build
system. In the code, we refer to the version as the string that
contains the release identifier such as 1.14.0.dev20221028, so calling
the file that contains the commit hash as pytorch-version.txt creates
confusion. For the sake of simplicity, this patch renames that file to
be pytorch-hash.txt.
If PyTorch build caches are created on a branch other than the main
branch, then GitHub does not share those caches with the main branch,
making every CI run that runs for each PR slow. This patch resolves the
problem by letting only the main branch create and use PyTorch build
caches.
The parameter "supportFPInputOnly" of function createPoolingOp() is
supposed to be "supportNonFPInput", which was added to distinguish
between "MaxPool2d" and "AvgPool2d" op in #718
This commit removes almost all of the valsem ops, since the value
semantics version of the ops now exist in PyTorch. The only op missing
is `aten.bernoulli_.float`. In addition, this commit also simplifies
the implementation of `aten.fill.Scalar` by moving it to the pattern
that converts elementwise ops.
* Add LazyGraphExecutor registration
* Update PyTorch version to 1.14.0.dev20221024
Co-authored-by: Roll PyTorch Action <torch-mlir@users.noreply.github.com>
* Relax the need for only CPU versions of PyTorch
This allows installing corresponding PyTorch CUDA / ROCM versions and using torch-mlir.
* Remove obsolete comments
Whether or not the PyTorch build is cached should not affect the success
of the Torch-MLIR build, but based on the existing code, a build may
fail if the `TM_PYTORCH_INSTALL_WITHOUT_REBUILD` variable was set but
the build cache doesn't exist.
Although that variable is set by CI upon a cache hit, nuances of
Github's caching behavior can create situations where the coupling
between `TM_PYTORCH_INSTALL_WITHOUT_REBUILD` and the cache lookup fails.
Specifically, a branch other than our default branch (`main`) may create
the cache entry, but because Github doesn't share this cache entry with
builds running on the `main` branch, the `main` branch build tries to
create it's own cache entry. However, since cache identifiers are
unique and because caches are immutable, the caching step running in the
`main` branch appears to create an invalid cache entry (of 233 bytes,
instead of the expected ~60 MB).
Consequently, subsequent builds observe a cache "hit", since caches
created by the `main` branch are shared with all other branches, but
because this cache entry is invalid (since it doesn't actually contain
the ~60 MB PyTorch WHL file), the builds fail.
One workaround would be to let only the `main` branch create caches, but
in doing so, we would also prevent other branches from _reading_ the
cache, making the builds in those branches terribly slow.
So this patch uses a different workaround, which is to check whether the
PyTorch WHL file exists, even if the build observed a cache hit. If the
file doesn't exist, even if it was a purported cache hit, the code
builds PyTorch from source, which is probably intuitive.
A longer term fix will follow, after a discussion with the wider team.
Upstream PyTorch nightly page
[https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html]
somehow dropped the link for torch-1.14.0.dev20221018 for macOS but not
for Linux or Windows, whereas our RollPyTorch action assumes that if the
nightly version is available for Linux, it is also available for macOS.
This reverts the commit that changed the PyTorch version.
Without this patch, CI logs contained the line:
-- Linker detection: GNU ld
GNU ld is notoriously slow at linking large binaries, so this patch
swaps GNU ld with the LLVM linker.
Since the linker invocation is driven through the compiler, perhaps the
best way to use the LLVM linker is to tell the compiler which linker
binary to use. This patch adds the `-fuse-ld=lld` flag to all Linux
builds of Torch-MLIR in CI to make it use lld.
* ci: cache PyTorch source builds
This patch reduces the time spent in regular CI builds by caching
PyTorch source builds. Specifically, this patch:
1. Makes CI lookup the cache entry for the PyTorch commit hash in
pytorch-version.txt
2. If lookup was successful, CI fetches the previously-generated WHL
file into the build_tools/python/wheelhouse directory
3. CI sets the `TM_PYTORCH_INSTALL_WITHOUT_REBUILD` variable to `true`
4. The build_libtorch.sh script then uses the downloaded WHL file
instead of rebuilding PyTorch
* ci: warm up PyTorch source cache during daily RollPyTorch action
This patch makes the RollPyTorch action write the updated WHL file to
the cache, so that it can be later retrieved by CI that runs for each
PR. We deliberately add the caching step to the end of the action since
the RollPyTorch action never needs to read from the cache, although
executing this step earlier in the process should not cause problems
either.
This commit makes the following changes needed to update bump LLVM:
- Replace `linalg.init_tensor` with `tensor.empty` (see:
https://reviews.llvm.org/D135129)
- Replace `NoSideEffect` with `Pure` (see
https://reviews.llvm.org/D135505)
- Replace `body` region accessor for `ReduceOp` and `ReduceWindowOp`
with `getBody`
- Fix incorrect use of `tosa::ReduceSumOp` in `AtenNativeLayerNormOp`
conversion pattern. The result type of `tosa::ReduceSumOp` must have
the same rank as the input type. (see:
https://www.mlplatform.org/tosa/tosa_spec.html#_reduce_sum)
Co-authored-by: Ashay Rane <ashay@users.noreply.github.com>
Co-authored-by: Ashay Rane <ashay@users.noreply.github.com>
This commit replaces test inputs that were being linearly transformed
by multiplying and adding/subtracting to the input tensor with inputs
that use the `low` and `high` keyword arguments instead.