Link to related RFC:
https://discourse.llvm.org/t/rfc-rename-torch-mlir-compile-apis-and-introduce-fx-based-analogs/76646
This commit updates the documentation, tests, CMake files, and API for
the proposed changes in the RFC. There is a new torch_mlir/fx.py for
user level APIs related to importing modules and a corresponding test
for this path can be found at test/python/fx_importer/basic_test.py.
---------
Co-authored-by: MaheshRavishankar <mravisha@amd.com>
Adds an escape hatch so that single-value tensors are emitted as a
DenseElementsAttr instead of a DenseResourceElementsAttr.
For 0-d or single-element tensors, a splat DenseElementsAttr is the
better representation, so DenseResourceElementsAttr is not used for them.
If a tensor is initialized by a list with a single constant integer,
this folder turns it into a torch.vtensor.literal
---------
Co-authored-by: Dave Liddell <dliddell@xilinx.com>
Leaning on the QDQ functionality in torch we can support the QLinearConv
operation by piggybacking through `torch.Convolution`. This includes
some changes such as allowing the `onnx` rewriter to run recursively.
Doing so allows `QLinearConv` to decompose to `onnx.Convolution` which
is then lowered to `torch`.
The existing `flatten` lowering did not define what the intermediate
shape was. This could result in failures to lower further to linalg as
the intermediate shape was unknown. Added a shape refinement section.
Needed so that the CumSum op in OPT can get the constant it requires to be lowered to TMTensor.
---------
Co-authored-by: Rob Suderman <rob.suderman@gmail.com>
Co-authored-by: Xida Ren <xida.ren.dev@gmail.com>
`torch` requires that padding be symmetric for pooling operations. To
support non-symmetric padding we need to materialize the padding as a
separate operation.
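A rough sketch of the idea in plain Python, not the actual lowering: pad
explicitly with the identity value of the pooling op, then pool with no
implicit padding. The helper names and the 1-D layout are assumptions
made for illustration only.

```python
import math


def pad_1d(values, pad_begin, pad_end, fill):
    """Materialize the (possibly asymmetric) padding as its own step."""
    return [fill] * pad_begin + list(values) + [fill] * pad_end


def max_pool_1d(values, kernel, stride):
    """Max pool that applies no implicit padding of its own."""
    return [
        max(values[i:i + kernel])
        for i in range(0, len(values) - kernel + 1, stride)
    ]


data = [1.0, 5.0, 2.0, 4.0, 3.0]
# Asymmetric padding: 1 element before, 2 after. -inf is the identity for max,
# so padded positions never win a window.
padded = pad_1d(data, 1, 2, -math.inf)
pooled = max_pool_1d(padded, kernel=3, stride=1)
```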
---------
Co-authored-by: James Newling <james.newling@gmail.com>
Fix for https://github.com/llvm/torch-mlir/issues/2765
The onnx docs say that you can't do shape inference using the in-memory
API for models > 2 GB. This fix replaces that API with the file-based
API. Since the new API generates an intermediate file, also added a
--keep switch to keep that file, which I delete by default.
---------
Co-authored-by: Dave Liddell <dliddell@xilinx.com>
With the recent LLVM integrate and changes from
https://github.com/llvm/llvm-project/pull/78260, we hit this build error
in Stablehlo (which is quite old).
```
external/stablehlo/stablehlo/transforms/StablehloRefineShapes.cpp:1020:14: error: no member named 'startRootUpdate' in 'mlir::PatternRewriter'
rewriter.startRootUpdate(op);
~~~~~~~~ ^
external/stablehlo/stablehlo/transforms/StablehloRefineShapes.cpp:1026:16: error: no member named 'finalizeRootUpdate' in 'mlir::PatternRewriter'
rewriter.finalizeRootUpdate(op);
~~~~~~~~ ^
external/stablehlo/stablehlo/transforms/StablehloRefineShapes.cpp:1029:16: error: no member named 'cancelRootUpdate' in 'mlir::PatternRewriter'
rewriter.cancelRootUpdate(op);
~~~~~~~~ ^
external/stablehlo/stablehlo/transforms/StablehloRefineShapes.cpp:1108:14: error: no member named 'updateRootInPlace' in 'mlir::PatternRewriter'
rewriter.updateRootInPlace(op->getParentOp(), [&]() { return; });
~~~~~~~~ ^
4 errors generated.
Target @torch-mlir//:torch-mlir-opt failed to build
```
I'm still puzzled as to how this didn't fail with the CMake merge gating
CI (do we not test Stablehlo builds/tests?). In any case, bumping our
submodule to https://github.com/openxla/stablehlo/pull/1918 fixes it.
It exposes a new failing lit test in TorchToStablehlo though, that I
have looped stablehlo developers into
([here](https://discord.com/channels/999073994483433573/999074539138990131/1201235845391331419)).
```
bazel run @torch-mlir//test/Conversion:TorchToStablehlo/scatter.mlir.test
...external/torch-mlir/test/Conversion/TorchToStablehlo/scatter.mlir
within split at <stdin>:1 offset :33:8: error: unexpected error: Expects non-empty reduction block for type inference
%0 = torch.aten.scatter.src %arg0, %int0, %arg1, %arg2 : !torch.vtensor<[?,?],si64>, !torch.int, !torch.vtensor<[?,?],si64>, !torch.vtensor<[?,?],si64> -> !torch.vtensor<[?,?],si64>
^
LLVM ERROR: Failed to infer result type(s).
```
Bazel CI:
https://github.com/sjain-stanford/torch-mlir/actions/runs/7732673480/job/21083102228
`onnx` explicitly specifies that `raw_data` is stored in `little-endian`
layout. While converting to `torch` we need to convert from a known
endian format to an internal format of consistent layout. This means
endianness must be correct during the import of `onnx.Constant`.
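As an illustration only (this is not the importer code), decoding with an
explicit little-endian format gives the same values on any host. The
function name and the int32 element type are assumptions for the example.

```python
import struct


def decode_int32_raw_data(raw_data: bytes) -> list:
    """Decode little-endian int32 values regardless of host byte order."""
    count = len(raw_data) // 4
    # "<" forces little-endian; a format without it would use the host's
    # native byte order and give different values on a big-endian machine.
    return list(struct.unpack(f"<{count}i", raw_data))


raw = bytes([0x01, 0x00, 0x00, 0x00, 0xFE, 0xFF, 0xFF, 0xFF])
values = decode_int32_raw_data(raw)
```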
---------
Co-authored-by: Xida Ren (Cedar) <cedar.ren@gmail.com>
Note that we are waiting for actual FX traced graph support for sparse
tensors. For details see
https://github.com/pytorch/pytorch/issues/117188
Until then, however, we provide this clever importer that builds the FX
traced graph for the dense case and then puts a sparse annotation
back on the parameters.
With import test.
Linalg has quantization-specific operations. We can lower to these
operations when the zero point and scale are known. This allows the
`convolution` to occur at lower bitwidths, improving overall
performance.
We were seeing some assertion failures after some checks around folders
were tightened up in LLVM:
https://github.com/llvm/llvm-project/pull/75887 . This PR essentially
moves the logic that used to be applied at the LLVM level into the
folder, which seems to be the suggested fix.
I'm not sure if the IR that caused issues for us _should_ be valid?
```
%1 = torch.aten.detach %arg0 : !torch.tensor<[1],f32> -> !torch.tensor
```
A better fix might be to create a verifier ensuring the result of
`aten.detach` has the same type as its operand.
---------
Co-authored-by: aaron-stgeorge <aaron.stgeorge@getcruise.com>
Torch does not have an equivalent matmul operation for integers. Instead
it sidechannels the information via its quantized types. For this
lowering we set up these sidechannels then invoke `torch.mm`.
This preserves sparsity at the most obvious places of lowering TORCH
tensors to MLIR RankedTensorType tensors. Other places are marked for
audit. With some initial lowering tests.
This adds an encoding field to the torch type, using the interfaces for
printing, parsing, and verification. Note that although this change
prepares adding sparsity to the torch type (as illustrated by the round
trip and invalid tests), nothing in this change depends on the actual
contents of the encoding field!
This includes custom op matching for decomposed operations and fusing
dequantization into dense operations. As a validation we compare
to the dequant+mm torch implementation.
We can plumb the linear matmul into pytorch using its quantized types
with side channel information. To handle the final int8 operation we
dequantize and requantize.
This commit adds a mapping from the `onnx.pad` op to the `torch.pad` op.
Currently it does not support the `axes` parameter of the `onnx.pad` op.
Signed-off-by: Gaurav Shukla <gaurav.shukla@amd.com>
Currently transposed convolution is not handled correctly by
`TorchToTosa`. This PR allows transposed convolutions to pass through
the conversion so that they can be handled by other conversion passes
later in a pipeline.
An example input which produces a compilation error is:
```
func.func @forward(%input: !torch.vtensor<[1,64,1,100],f32>) -> !torch.vtensor<[1,64,2,200],f32> {
%true = torch.constant.bool true
%int1 = torch.constant.int 1
%int2 = torch.constant.int 2
%weight = torch.vtensor.literal(dense<0.0> : tensor<64x64x3x3xf32>) : !torch.vtensor<[64,64,3,3],f32>
%bias = torch.vtensor.literal(dense<0.0> : tensor<64xf32>) : !torch.vtensor<[64],f32>
%stride = torch.prim.ListConstruct %int2, %int2 : (!torch.int, !torch.int) -> !torch.list<int>
%int1x1 = torch.prim.ListConstruct %int1, %int1 : (!torch.int, !torch.int) -> !torch.list<int>
%output = torch.aten.convolution %input, %weight, %bias, %stride, %int1x1, %int1x1, %true, %int1x1, %int1 : !torch.vtensor<[1,64,1,100],f32>, !torch.vtensor<[64,64,3,3],f32>, !torch.vtensor<[64],f32>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,64,2,200],f32>
return %output : !torch.vtensor<[1,64,2,200],f32>
}
```
This MLIR produces an error about a cast operation with a size mismatch
when passed through `torch-to-tosa`:
```
error: 'tensor.cast' op operand type 'tensor<1x64x1x50xf32>' and result type 'tensor<1x64x2x200xf32>' are cast incompatible
```
---------
Co-authored-by: Srinath Avadhanula <srinath.avadhanula@getcruise.com>
We can map the per-tensor version of the operation to the dequantize
operation by marking the input with the make quantized tensor component.
This introduces the `qint*` and `quint*` tensor types that can be lowered
to the appropriate dequantization behavior during the torch-to-linalg
conversion.
We can map the per_tensor case to the `torch.aten.quantize_per_linear`
operation. In this case we extract the `scale` and `zeropoint` values
and directly invoke the quantization, then return the integer
representation value.
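For reference, the per-tensor arithmetic behind quantize and dequantize
can be sketched in plain Python. This only shows the math, not the
lowering; the function names and the int8 range defaults are assumptions
chosen for the example.

```python
def quantize_per_tensor(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integers: q = clamp(round(x / scale) + zero_point)."""
    out = []
    for v in values:
        # Python's round() rounds halves to even.
        q = round(v / scale) + zero_point
        out.append(max(qmin, min(qmax, q)))
    return out


def dequantize_per_tensor(qvalues, scale, zero_point):
    """Map integers back to floats: x = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in qvalues]


quantized = quantize_per_tensor([0.0, 0.5, -0.5, 100.0], scale=0.25, zero_point=3)
restored = dequantize_per_tensor(quantized[:3], scale=0.25, zero_point=3)
```

The last input saturates at `qmax`, so it cannot be recovered by
dequantizing.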
Implemented ONNX.Range. The spec says that start, limit, and delta are
0-D and can be double, float, int16, int32, or int64. All int types are
mapped to !torch.int and all float types are mapped to !torch.float.
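A small Python model of the Range semantics as described in the ONNX
spec (element count is `max(ceil((limit - start) / delta), 0)`). The
function name is hypothetical and this is not the lowering itself.

```python
import math


def onnx_range(start, limit, delta):
    """Values start, start + delta, ... up to but excluding limit."""
    count = max(math.ceil((limit - start) / delta), 0)
    return [start + i * delta for i in range(count)]


ascending = onnx_range(3, 9, 3)
descending = onnx_range(10, 4, -2)
empty = onnx_range(5, 1, 1)
```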
---------
Co-authored-by: Kumar Deepak <kumar@xilinx.com>
Handles the multiple cases of `onnx` constant values and converts them
to `torch` literal tensors. This can include splats with a single
integer or floating point value, a set of explicit integer values, or
an elements array attr of values.
This PR updates the torch-to-tosa conversion with following changes:
- Support torch.none as min/max input argument for tosa.clamp op
- Support negative value as start index for tosa.slice op
- Add tosa.logical_or lowering support
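The first two items can be pictured with a scalar Python sketch. It only
mirrors the intended semantics (a missing bound means no clamping on that
side, a negative start counts from the end of the dimension); the helper
names are hypothetical and this is not the conversion code.

```python
def clamp(x, min_val=None, max_val=None):
    """Clamp x, treating None as an absent bound."""
    if min_val is not None:
        x = max(x, min_val)
    if max_val is not None:
        x = min(x, max_val)
    return x


def normalize_start(start, dim_size):
    """Turn a negative start index into its non-negative equivalent."""
    return start + dim_size if start < 0 else start
```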
e2e test:
python -m e2e_testing.main --config=tosa
LIT tests:
cmake --build build --target tools/torch-mlir/all
---------
Co-authored-by: Ze Zhang <ze.zhang@getcruise.com>
Changes made during upstreaming:
* Removed comments attributing some copied code back to torch-mlir
(since it is now repatriated).
* Re-organized imports.
* Inlined RefMapping/RefTracker and TypeSubclassMap from an external
utility module.
* Added FxImporter class comments.
* Updated stack trace extraction to be fail safe.
* Added an entry-point for `import_frozen_exported_program` which uses
the shiny new upstream `torch.export.export()` API (versus the
lower-level/older API that Turbine is presently using). This
necessitated a small FX rewrite to line external state management up
with current conventions.
* Adapted one of Turbine's importer tests to go with this initial
submission. Turbine unfortunately has a lot of more-integration-ey
tests, and I would like to extract those as more of unit tests of the
importer features and upstream them that way vs trying to copy directly.
For now, one overall test with the initial submission gets us moving.
I acknowledge that there are some code quality things that could be
improved in this submission: this was authored over the course of many
months (and often via some trial and error). I would like to keep it
relatively converged with the downstream for the next few steps while
getting the test suite upstreamed. And then it will be easier to take a
hygienic pass through the code.
Including co-authors for contributors in the git log of the original
repository.
Co-authored-by: Ean Garvey <87458719+monorimet@users.noreply.github.com>
Co-authored-by: Avinash Sharma <aviator1994@gmail.com>
Co-authored-by: Arham Khan <arhammkhan@gmail.com>
Co-authored-by: brucekimrokcmu <kwangkyk@alumni.cmu.edu>
Co-authored-by: saienduri <77521230+saienduri@users.noreply.github.com>
The expression for HardSigmoid in Onnx
(https://onnx.ai/onnx/operators/onnx__HardSigmoid.html):
    max(0, min(1, alpha * x + beta))
is inherently different from HardSigmoid in Torch
(https://pytorch.org/docs/stable/generated/torch.nn.Hardsigmoid.html),
which is:
    if x < -3 -> 0
    elif x > 3 -> 1
    else x/6 + 1/2
Because of this, it is simpler to compute the entire expression when
translating the Onnx expression to Torch MLIR, which is what this PR
does. Some of the logic is shared with `DecomposeComplexOps`, so the
logic common to `DecomposeComplexOps` and `DefaultDomainGToP` was
refactored into a `Utils` file.
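To make the difference concrete, here is a scalar Python sketch of both
definitions. The function names are made up for the example; the `alpha`
and `beta` defaults are the ones listed on the ONNX operator page.

```python
def onnx_hard_sigmoid(x, alpha=0.2, beta=0.5):
    """ONNX form: max(0, min(1, alpha * x + beta))."""
    return max(0.0, min(1.0, alpha * x + beta))


def torch_hardsigmoid(x):
    """Torch form: fixed slope 1/6 and offset 1/2, saturating at +/-3."""
    if x < -3.0:
        return 0.0
    if x > 3.0:
        return 1.0
    return x / 6.0 + 0.5


onnx_value = onnx_hard_sigmoid(1.0)
torch_value = torch_hardsigmoid(1.0)
```

The two only agree when `alpha` is 1/6 and `beta` is 1/2, which is why
the general expression is computed rather than reusing Torch's op.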
This PR adds the `enable_ir_printing` option to `torch_mlir.compile`,
which can be used to print the IR for all intermediate passes.
When running the added test file via:
```shell
$ python test/python/compile.py 2> tiny.stderr
```
the file `tiny.stderr` is about 700 KB.
The three remaining compare operations
onnx.Greater
onnx.Less
onnx.GreaterOrEqual
are also added with this pull request.
This completes the set of basic tensor compare functions.
Lowerings for `transpose` from ONNX to `aten`. Implementation depends on
making multiple `aten.transpose` operations swapping pairs of dimensions.
As `onnx.transpose` can swap around any dimensions it may require
constructing multiple `aten.transpose`.
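One possible way to turn an arbitrary permutation into pairwise swaps is
sketched below in plain Python. It is an illustration of the general
technique and may differ from what the lowering actually emits; the
function names are hypothetical.

```python
def permutation_to_swaps(perm):
    """Return (i, j) axis swaps that realize `perm` when applied in order.

    `perm[i]` names the input axis that should end up at output position i.
    """
    current = list(range(len(perm)))  # which input axis sits at each position
    swaps = []
    for i, wanted in enumerate(perm):
        j = current.index(wanted)
        if j != i:
            current[i], current[j] = current[j], current[i]
            swaps.append((i, j))
    return swaps


def apply_swaps(shape, swaps):
    """Apply the swaps to a shape, the way successive transposes would."""
    shape = list(shape)
    for i, j in swaps:
        shape[i], shape[j] = shape[j], shape[i]
    return shape


swaps = permutation_to_swaps([2, 0, 1])
result_shape = apply_swaps([2, 3, 4], swaps)
```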
This replaces the lowering of aten.cat with tensor.concat, allowing more
efficient handling of concatenations in downstream flows. The refbackend
populates concat decomposition patterns that can be used to recover the
previous lowering.
This commit adds the OnnxToTorch support for Reciprocal, Round,
ScatterElements, Sigmoid, Sin, Tanh, Sqrt, Sub, Sum, Where, Xor,
Squeeze, Unsqueeze ops.
For reviewers, the ops that weren't trivial and probably require extra
review are Sum, Squeeze, and Unsqueeze.
Lowerings for `selu` from ONNX to the corresponding torch
implementations. Torch's `selu` implementation has fewer features so
we use a generalized `elu` with the input scale set to `1.0`.
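The equivalence can be checked with a scalar Python sketch. The `elu`
below models a generalized ELU with `alpha`, `scale`, and `input_scale`
parameters; the function names and the sample parameter values are
arbitrary choices for illustration, not taken from the lowering.

```python
import math


def elu(x, alpha=1.0, scale=1.0, input_scale=1.0):
    """Generalized ELU: scale * x for x > 0, else a scaled exponential."""
    if x > 0:
        return scale * x
    return scale * alpha * (math.exp(input_scale * x) - 1.0)


def onnx_selu(x, alpha, gamma):
    """ONNX Selu: gamma * x for x > 0, else gamma * (alpha * e^x - alpha)."""
    if x > 0:
        return gamma * x
    return gamma * (alpha * math.exp(x) - alpha)


# Arbitrary sample parameters; Selu maps onto elu with input_scale = 1.0.
alpha, gamma = 1.5, 2.0
pairs = [
    (onnx_selu(x, alpha, gamma), elu(x, alpha=alpha, scale=gamma, input_scale=1.0))
    for x in (-2.0, -0.5, 0.0, 1.5)
]
```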
Simple Python console script to import an ONNX protobuf to the torch
dialect for additional processing.
For installed wheels, this can be used with something like:
```
torch-mlir-import-onnx test/python/onnx_importer/LeakyReLU.onnx
```
Or from a dev setup:
```
python -m torch_mlir.tools.import_onnx ...
```
This is part 1 of 2, which will also include upstreaming the FX
importer. I started with ONNX because it forces some project layout
updates and is more self contained/easier as a first step.
Deviating somewhat from the RFCs on project layout, I made the following
decisions:
* Locating the `onnx_importer.py` into `torch_mlir.extras` as Maks
already has opened up that namespace and it seemed to fit. Better to
have fewer things at that level.
* Set up the build so that the root project only contains MLIR Python and
pure Python deps (like the importers), but this can be augmented with
the `projects/` adding more depending on which features are enabled.
* The default build continues to build everything whereas in
`TORCH_MLIR_ENABLE_ONLY_MLIR_PYTHON_BINDINGS=1` mode, it builds a
`torch-mlir-core` wheel with the pure contents only.
`onnx_importer.py` and `importer_smoke_test.py` are almost verbatim
copies from SHARK-Turbine. I made some minor local alterations to adapt
to paths and generalize the way they interact with the outer project. I
expect I can copy these back to Turbine verbatim from here. I also
updated the license boilerplate (they have the same license but slightly
different project norms for the headers) but retained the correct
copyright.
Other updates:
* Added the ONNX importer unit test (which also can generate test data)
in lit, conditioned on the availability of the Python `onnx` package. In
a followup once I know everything is stable, I'll add another env var
that the CI can set to always enable this so we know conclusively if
tests pass.
* Moved the ONNX conversion readme to `docs/`.
* Renamed CMake option `TORCH_MLIR_ENABLE_ONLY_MLIR_PYTHON_BINDINGS` ->
`TORCH_MLIR_ENABLE_PYTORCH_EXTENSIONS` and inverted the sense. Made the
JitIR importer and LTC options `cmake_dependent_options` for robustness.
This commit adds the OnnxToTorch support for BitwiseXor, BitwiseOr, Div, Equal, Cast,
Ceil, Floor, Cos, and Clip ops.
This commit also adds the TorchToLinalg support for aten.clamp.Tensor and aten.clamp_min.Tensor op.
Signed-Off By: vivekkhandelwal1424@gmail.com
Despite aten.mm requiring the input and output types match, we still opt
to maintain signedness semantics in case later passes try to do any sort
of integer type narrowing.
This commit adds the OnnxToTorch support for Atan, Bitshift, BitwiseAnd,
and BitwiseNot ops.
This commit also adds the TorchToLinalg support for AtenBitwiseLeftShiftTensorOp.
Signed-Off By: vivekkhandelwal@nod-labs.com
Adds a pipeline to convert custom ops and metadata represented as
`torch.operator` custom ops to corresponding `torch` ops where possible.
This is part of a multi-part approach for building ONNX import in as a
regular feature of torch-mlir. It is focused on the conversions vs the
infra. We will end up maintaining a [pure-python
importer](https://github.com/nod-ai/SHARK-Turbine/blob/main/python/shark_turbine/importers/onnx_importer.py)
to go with this in torch-mlir, and we will also maintain test case
generation utilities derived from it.
I have left substantial documentation in the README of the conversion
directory, including the recommended approach that we will take to keep
building this out.
(note that this organizes the code to coincide with the refactoring in
#2442 versus the current flat arrangement)
The logic for lowering the aten view op to linalg is fairly complex.
In this PR I have tried to follow all non-failing paths through the
lowering and add unit tests where they're missing.
There is 1 logical change to the lowering: redundant tensor.cast ops
(same source and destination type) are folded.
This lifts the core of the jit_ir_importer and ltc out of the pt1
project, making them peers to it. As a side-effect of this layering, now
the "MLIR bits" (dialects, etc) are not commingled with the various
parts of the pt1 project, allowing pt1 and ltc to overlay cleanly onto a
more fundamental "just MLIR" Python core. Prior to this, the Python
namespace was polluted to the point that this could not happen.
That "just MLIR" Python core will be introduced in a followup, which
will create the space to upstream the FX and ONNX pure Python importers.
The primary non-NFC change to the API is:
* `torch_mlir.dialects.torch.importer.jit_ir` ->
`torch_mlir.jit_ir_importer`.
The rest is source code layering so that we can make the pt1 project
optional without losing the other features.
Progress on #2546.
- adds support for an optional verifier to the generated torch op
tablegen (GeneratedTorchOps.td)
- uses the above to add a verifier for the torch permute op.
Motivation: I hit an unclear error from linalg while developing a
decomposition pass for pixel_shuffle. The error would have been clearer
if the problem had been detected earlier in the invalid aten.permute op.
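The kind of check meant here can be sketched in Python. This is a guess
at reasonable conditions (right number of entries, in range, no
duplicates), not a transcription of the verifier added in this change;
the function name is hypothetical and accepting negative dims is an
assumption.

```python
def verify_permutation(dims, rank):
    """Return an error string for an invalid permutation, or None if valid."""
    if len(dims) != rank:
        return f"expected {rank} dims, got {len(dims)}"
    seen = set()
    for d in dims:
        if not -rank <= d < rank:
            return f"dim {d} is out of range for rank {rank}"
        normalized = d % rank  # map negative dims onto 0..rank-1
        if normalized in seen:
            return f"dim {d} appears more than once"
        seen.add(normalized)
    return None
```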
Testing: new tests added. To run added tests, from the base directory
run
```
./build/bin/llvm-lit test/Dialect/Torch/invalid.mlir
```