With the recent LLVM integrate and changes from
https://github.com/llvm/llvm-project/pull/78260, we hit this build error
in Stablehlo (which is quite old).
```
external/stablehlo/stablehlo/transforms/StablehloRefineShapes.cpp:1020:14: error: no member named 'startRootUpdate' in 'mlir::PatternRewriter'
rewriter.startRootUpdate(op);
~~~~~~~~ ^
external/stablehlo/stablehlo/transforms/StablehloRefineShapes.cpp:1026:16: error: no member named 'finalizeRootUpdate' in 'mlir::PatternRewriter'
rewriter.finalizeRootUpdate(op);
~~~~~~~~ ^
external/stablehlo/stablehlo/transforms/StablehloRefineShapes.cpp:1029:16: error: no member named 'cancelRootUpdate' in 'mlir::PatternRewriter'
rewriter.cancelRootUpdate(op);
~~~~~~~~ ^
external/stablehlo/stablehlo/transforms/StablehloRefineShapes.cpp:1108:14: error: no member named 'updateRootInPlace' in 'mlir::PatternRewriter'
rewriter.updateRootInPlace(op->getParentOp(), [&]() { return; });
~~~~~~~~ ^
4 errors generated.
Target @torch-mlir//:torch-mlir-opt failed to build
```
I'm still puzzled as to how this didn't fail with the CMake merge gating
CI (do we not test Stablehlo builds/tests?). In any case, bumping our
submodule to https://github.com/openxla/stablehlo/pull/1918 fixes it.
It exposes a new failing lit test in TorchToStablehlo though, that I
have looped stablehlo developers into
([here](https://discord.com/channels/999073994483433573/999074539138990131/1201235845391331419)).
```
bazel run @torch-mlir//test/Conversion:TorchToStablehlo/scatter.mlir.test
...external/torch-mlir/test/Conversion/TorchToStablehlo/scatter.mlir
within split at <stdin>:1 offset :33:8: error: unexpected error: Expects non-empty reduction block for type inference
%0 = torch.aten.scatter.src %arg0, %int0, %arg1, %arg2 : !torch.vtensor<[?,?],si64>, !torch.int, !torch.vtensor<[?,?],si64>, !torch.vtensor<[?,?],si64> -> !torch.vtensor<[?,?],si64>
^
LLVM ERROR: Failed to infer result type(s).
```
Bazel CI:
https://github.com/sjain-stanford/torch-mlir/actions/runs/7732673480/job/21083102228
This includes custom op matching for decomposed operations and fusing
dequantization into dense operations. As a validation we compare
to the dequant+mm torch implementation.
llvm-project: bbd2b08b95fe76bea138c1b03c1cd42ed3ee04df
stablehlo: ab709fe48de88c67717abfbd7ef17425eb95ddaf
These commits were chosen in order to account for an MLIR API break from
3dbac2c007
which required a patch to stablehlo. We integrate a bit beyond that
commit to deal with some revert/reapply cycles in the intervening range
which were discovered in another downstream.
Further, it requires adaptation to the stablehlo API breaks introduced
from https://github.com/openxla/stablehlo/pull/1872 which are along for
the ride.
Since some stablehlo builders were changed to directly take int64_t
array refs, also traced that up some call stacks to eliminate some
signed/unsigned mismatches that result.
Also adds a few TOSA tests to the passing set that seem to work now.
This is a first step towards the structure we discussed here:
https://gist.github.com/stellaraccident/931b068aaf7fa56f34069426740ebf20
There are two primary goals:
1. Separate the core project (C++ dialects and conversions) from the
hard PyTorch dependencies. We move all such things into projects/pt1 as
a starting point since they are presently entangled with PT1-era APIs.
Additional work can be done to disentangle components from that
(specifically LTC is identified as likely ultimately living in a
`projects/ltc`).
2. Create space for native PyTorch2 Dynamo-based infra to be upstreamed
without needing to co-exist with the original TorchScript path.
Very little changes in this path with respect to build layering or
options. These can be updated in a followup without commingling
directory structure changes.
This also takes steps toward a couple of other layering enhancements:
* Removes the llvm-external-projects/torch-mlir-dialects sub-project,
collapsing it into the main tree.
* Audits and fixes up the core C++ build to account for issues found
while moving things. This is just an opportunistic pass through but
roughly ~halves the number of build actions for the project from the
high 4000's to the low 2000's.
It deviates from the discussed plan by having a `projects/` tree instead
of `compat/`. As I was thinking about it, this will better accommodate
the follow-on code movement.
Once things are roughly in place and the CI passing, followups will
focus on more in-situ fixes and cleanups.
The last llvm bump in https://github.com/llvm/torch-mlir/pull/2511
pointed to
b44b3494f6,
however the bazel build upstream was not clean at this point:
```
ERROR: /root/.cache/bazel/_bazel_root/b89349c08f7224396763d14fe35cba11/external/llvm-project/mlir/BUILD.bazel:5837:18: TdGenerate
external/llvm-project/mlir/include/mlir/Dialect/LLVMIR/NVVMOpsInterface.h.inc failed: (Exit 1): mlir-tblgen failed: error executing command ...
external/llvm-project/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td:20:9: error: Could not find include file 'mlir/Dialect/LLVMIR/BasicPtxBuilderInterface.td'
include "mlir/Dialect/LLVMIR/BasicPtxBuilderInterface.td"
^
external/llvm-project/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td:20:9: error: Unexpected token at top level
include "mlir/Dialect/LLVMIR/BasicPtxBuilderInterface.td"
^
```
The bazel fixes followed in a subsequent commit at
28b27c1b10.
This PR bumps LLVM by a few more commits (to include the bazel fixes)
which helps restore Torch-MLIR's bazel build back to 🟢 .
GHA workflow to test bazel build:
https://github.com/sjain-stanford/torch-mlir/actions/runs/6555101471/job/17803082508
We just have to do this: I ran into an issue today where I needed to make a one line patch to stablehlo to work around a compiler issue, and it is completely unapparent how to do so given that the mlir-hlo repo is a read-only export and is at the tail end of a multi-week integration chain from the open-source stablehlo repo.
We've discussed this often enough and gotten +1 from everyone that they are ok with taking the e2e testing hit if it becomes necessary: It is necessary as the current situation is unmanageable.
Looking at it, I expect it wouldn't actually be very difficult to build a little runner binary out of the stablehlo interpreter and subprocess call that in order to get the testing coverage back. I leave that as an exercise to the users of this part of the stack and recommend following the breadcrumbs from the deleted python/torch_mlir_e2e_test/stablehlo_backends/linalg_on_tensors.py file and the main.py changes.
Note that I am pointing us at a stablehlo fork for the moment until it is apparent that we don't need to carry any local patches to it. We can update this in a few days if everything is clear.
Corresponding commits:
* mlir-hlo: 16886a108eff5197f816ca0f1950cc5ff1b078d9
* stablehlo: 77a59815a82b34f7b08ed2d42a711d9920682d0e
* llvm-project: 4acc3ffbb0af5631bc7916aeff3570f448899647
* Adapt to ByteCodeOpInterface changes.
* Adapt to RegionBranchPoint changes: https://reviews.llvm.org/D159116
* Adapt inferReturnTypes to get the value from properties.
* Adapt invalid.mlir to properties syntax
* [TOSA] Align with custom assembly format change.
* [TOSA] handle change of axis to int32 type
* [TOSA] Restore improper convert to i32
Landing with Windows broken (it cannot be fixed because of the way the mlir-hlo dep is inserted). Will followup with an untangling.
---------
Co-authored-by: TatWai Chong <tatwai.chong@arm.com>
Co-authored-by: Eric Kunze <eric.kunze@arm.com>
This commit updates the `llvm-project` and `mlir-hlo` submodules to
commits:
llvm-project: a3f2751f782f3cdc6ba4790488ec20163a40ac37
mlir-hlo: 97c7e4b4506c3a2441c923e592833f45da439009
Changes made:
- Rename `getSuccessorEntryOperands` with `getEntrySuccessorOperands`
and remove `operands` from
`getSuccessorRegions` (https://reviews.llvm.org/D157506)
- Make `TypeConverter` a `const` (https://reviews.llvm.org/D157601)
This commit updates the `llvm-project` and `mlir-hlo` submodules to
commits:
llvm-project: f580901d5d30e37755212f1c09e5b587587fbfeb
mlir-hlo: 503736d156c25022813c51cbdbe3b862d67a6916
- update llvm tag to 4592543a01609feb4b3c19e81a9d54743e15e329
- mhlo now points to f6615343fdab2c74bebd23c78366cf097f9a72df
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
This patch updates the submodules to:
- llvm: 3f8d8c1aac3086f603ad73f18fe2bd4fb91fa10a
- mhlo: 4384a47b03dc377d651523037867899a340b0e96
The only change made is calling `registerAllExtensions` during dialect
registration. See: https://reviews.llvm.org/D120368
This commit updates the `llvm-project` and `mlir-hlo` submodules to
commits:
- llvm-project: 6875424135312aeb26ab8e0358ba7f9e6e80e741
- mlir-hlo: 92fd33a4bacbeb93ab276a49f38bdebd5f9d7487
The calls to `mlir::MlirOptMain` are updated to no longer specify the
flag `preloadDialectInContext` that has been removed (see:
https://reviews.llvm.org/D149039).
-- This commit adds e2e support for atend.sort op.
-- 1. Adds aten.sort op in torch dialect.
-- 2. Adds tm_tensor.sort op in TMTensor dialect.
-- 3. Adds lowering of aten.sort -> tm_tensor.sort.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
- Update llvm tag to 5e111eb275eee3bec1123b4b85606328017e5ee5
- mhlo now points to a99159c45ee5c497f8dce01eff807a6d57629b61
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>