This is the lowering of gridsampler from onnx to torch using our prior
implementation of AtenGridSamplerOp.
Here are several checks for cornercases implemented. We may decide to
have part of these checks in AtenGridSamplerOp instead of the onnx
lowering portion.
Finish supporting importing the vast majority of `onnx` operations. This
includes:
- region support
- region value inherentance
- `torch.string` support
- `torch.list` support
- `torch.optional` support
A bunch of small fixes are interlinked and trigger crashes if not
addressed as a group. This includes:
- aten view when expand from a rank-0 tensor
- slice folder with negative indices
- `aten._shape_as_tensor` folder on a rank-0 tensor
- `aten.cat` of a tensor with a length-0 tensor
Torch lowering only supported the most recent version. Refactored the
lowering so more easily handle default values and optional operands /
attributes.
We collapsed and broadcasted scatter indices to a single element
version. We should instead upport `tm_tensor.scatter`s support for
multiple indices and the implicitly broadcasted behavior. This avoids
the serialization and materializing a needlessly large indices tensor.
Also note that we are in the process of proposing SparseTensorMetadata
to PyTorch FX graph export (see
https://github.com/pytorch/pytorch/pull/117907). This will hopefully
eventually replace the current data structures in torch-mlir.
There is no reason to treat `ConstantOfShape` as a specialized import
any as there exists a onnx-to-torch equivalent. Dropping the import
coding and adding support for resource conversion substantially
increases test coverage for dynamically shaped tests.
According to the [official TOSA
spec](https://www.mlplatform.org/tosa/tosa_spec.html#_cast), `tosa.cast`
allows a cast from `fp32` to `fp16`. We were not previously accounting
for this in the `TorchToTosa` lowering.
Also did a tiny bit of cleanup in the code to make it easier to spot
which conversions are currently allowed.
---------
Co-authored-by: Srinath Avadhanula <srinath.avadhanula@getcruise.com>
This enables better re-use in downstreams which use different func
implementations and should have no impact on those that don't except in
opt pipelines if using the old form. With interfaces, explicit pipelines
via `--pass-pipeline=` must be used.
Simple folder for limited size aten tensor operations. This is primarily
useful for shape computation folding as they unfortunately can use
`aten` operators. Add, sub, mul are common examples of these folders.
Onnx slice lowering used arange needlessly instead of directly
constructing the constant dimension values. This makes lowerings to
linalg struggle as multiple folders are required to get what is a
constant index value.
As of https://github.com/pytorch/pytorch/pull/118969, `ExportedProgram`
has the long awaited fixes to correctly categorize various things
relating to parameters, buffers, mutated inputs and constants.
With this additional modeling, we are finally able to implement
(safely/soundly) the mutable semantics that were attempted on the
TorchScript path. The difference is that on that path, we had to
conservatively treat everything as mutable and run some dodgy heuristics
(which have been the cause of many bugs relating to
"MaximizeValueSemantics") to try to get back to an immutable state.
The new model supports mutability at the graph edges, allowing both user
inputs and buffers to be mutated (there is some more support than that,
but that is all I fully tracked through to implementation).
Therefore, when we receive programs like this, we now can selectively
enable mutation at the edges. This happens to be the mutability model
that IREE supports, which I expect to be a primary beneficiary. However,
there is nothing stopping anyone else from handling the `!torch.tensor`
types and the existing copy/overwrite ops that will be selectively
added.
Since this relies on API changes that will not release until 2.3, I'm
being a bit cautious about not refactoring existing facilities.
We can route the torch tests via `onnx` using the `torch.onnx.export`
tooling. We can then reimport, lower to torch, and compile to linalg to
validate the onnx path is working correctly.
The current implementation exposes some failures in the `onnx` path so
we cannot enable the onnx test suite yet due to segmentation faults.
This commit adds the OnnxToTorch lowering for cosh, acosh, asin, asinh,
and atanh op.
This commit also adds the TorchToLinalg lowering for acosh, asin, asinh,
and atanh op.
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
Some operations include a backend matcher for specialized operations. We
map these back to generics so they appropriately match to the high
performance versions. This is done for the attention operation.
Fixes https://github.com/llvm/torch-mlir/issues/2866
Some backends / downstream projects expect that a "fully converted"
program has no remaining ops or attributes from the original dialect(s).
This test exposes issues that need fixing
(1) propagate sparsity into the FX graph (over elt-wise) (2) batched
dimensions need a new "dense(batch)" format
This commit adds the OnnxToTorch support for Mean, IsInf, IsNaN, and
PRelu ops. All high priority ops were taken so went with these. The non
trivial ones are Mean and IsInf which might require extra review
---------
Co-authored-by: MaheshRavishankar <mravisha@amd.com>
Various improvements on sparsity metadata:
(1) define single data structure for all sparsity related metadata
(2) handle batched dense dimensions, as well as dense subtensor
dimensions
(3) refine sparsity propagation for deeper networks
This PR introduces a sparse_jit wrapper that can run simple models with
sparse tensor inputs end-to-end. The implementation shows all required
components on modifying sparse tensor types with a 1:N relation on the
call sites. Two tests shows that the JIT runs end-to-end while computing
the correct results.
More details to follow (generalizing to COO and different ranks, as well
as support for *output* sparse tensors), but the general concepts are
all here now.
**_Update: Thanks to Rob, bump to proper LLVM/MLIR hash is done!_**
_**NOTE that all parameter passing changes are nicely done "downstream"
in MLIR, so very little changes are required in torch-mlir code
proper**_
---------
Co-authored-by: Franz Haniel <77495327+frafranz@users.noreply.github.com>
Co-authored-by: Franz Haniel <franz.haniel@amd.com>
This PR contains three commits to update the validation checks in the
ONNX -> Torch conversion pass for the AveragePool, Pad, and Slice operators:
> onnx: fix preconditions for lowering AveragePool ops
>
> The `pads` attribute of the AveragePool operator specifies the value to
> pad at both the beginning as well as the end of the axis (see
> https://onnx.ai/onnx/operators/onnx__AveragePool.html#attributes), so
> the size of this attribute should be twice the rank of the input tensor.
> However, our TorchOnnxToTorch bails out early since it incorrectly
> compares the pads attribute with the rank (not twice the rank) of the
> input tensor.
>
> This patch fixes the code to match the spec and adds a lit test.
> onnx: allow optional constant value for Pad operator
>
> The `constant_value` input of the onnx.Pad operator is optional (see
> https://onnx.ai/onnx/operators/onnx__Pad.html#inputs), but the
existing
> logic for lowering the operator into the Torch dialect assumes that it
> is mandatory.
>
> This patch makes the attribute optional and constructs a default value
> (a list of zeros the size of the input tensor) if the attribute was not
> specified.
> onnx: fix checks for axes and steps inputs of Slice operator
>
> The ONNX Spec for the Slice operator allows the `starts` and `ends`
> inputs to have fewer indices that the dimensions of the `data` tensor
> (see https://onnx.ai/onnx/operators/onnx__Slice.html), but our code
> expects these inputs to be as many as the `data` tensor's dimensions.
>
> More precisely, the spec requires that the `starts` and `ends` inputs
> are only as long as the `axes` input, but since the `axes` input is
> optional, the default type for the `axes` input has to match the type
> for the `starts` and `ends` inputs. Moreover, the number of indices in
> the `steps` input also has to match those in the `axes` inputs (instad
> of matching the dimensions of the `data` input).
>
> This patch fixes the checks in the TorchOnnxToTorch conversion so that
> they match the ONNX spec.
This commit modifies the OnnxToTorch lowering of Onnx.Reshape op by
creating the result shape list for the aten.reshape using the result
shape values inferred from the op's result shape.
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
Folds aten::index_select ops under the following conditions:
1. If the input and output are the same shape, the indexing operation is
a NOP, so just return the input.
2. If the input has shape <1x1x...xNx...x1> (all 1's except for one
dim), and the output shape is <1x1x...x1> (all 1's), then there is a
single index, so extract the single element value and return a tensor
with that value.
---------
Co-authored-by: Dave Liddell <dliddell@xilinx.com>
Link to related RFC:
https://discourse.llvm.org/t/rfc-rename-torch-mlir-compile-apis-and-introduce-fx-based-analogs/76646
This commit updates the documentation, tests, CMake files, and API for
the proposed changes in the RFC. There is a new torch_mlir/fx.py for
user level APIs related to importing modules and a corresponding test
for this path can be found at test/python/fx_importer/basic_test.py.
---------
Co-authored-by: MaheshRavishankar <mravisha@amd.com>
Adds an escape hatch from creating a DenseResourceElementsAttr for
single value tensors into DenseElementsAttr.
For 0d or 1element, splats are better as DenseElementsAttr. Don't use
DenseResourceElementsAttr for it
If a tensor is initialized by a list with a single constant integer,
this folder turns it into a torch.vtensor.literal
---------
Co-authored-by: Dave Liddell <dliddell@xilinx.com>
Leaning on the QDQ functionality in torch we can support the QLinearConv
operation by piggybacking through `torch.Convolution`. This includes
some changes such as allowing the `onnx` rewriter to run recursively.
Doing so allows `QLinearConv` to decopmose to `onnx.Convolution` which
is then lowered to `torch`.
The existing `flatten` lowering did not define what the intermediate
shape was. This could result in failures to lower further to linalg as
the intermediate shape was unknown. Added a shape refinement section.
So that the CumSum Op in OPT can get the constant that it requires to be lowered to TMTensor
---------
Co-authored-by: Rob Suderman <rob.suderman@gmail.com>
Co-authored-by: Xida Ren <xida.ren.dev@gmail.com>
`torch` requires that padding be symmetric for pooling operations. To
support non-symmetric pad we need to separately materialize out the
padding operation.
---------
Co-authored-by: James Newling <james.newling@gmail.com>
Fix for https://github.com/llvm/torch-mlir/issues/2765
The onnx docs say that you can't do shape inference using the in-memory
API for models > 2 GB. This fix replaces that API with the file-based
API. Since the new API generates an intermediate file, also added a
--keep switch to keep that file, which I delete by default.
---------
Co-authored-by: Dave Liddell <dliddell@xilinx.com>
With the recent LLVM integrate and changes from
https://github.com/llvm/llvm-project/pull/78260, we hit this build error
in Stablehlo (which is quite old).
```
external/stablehlo/stablehlo/transforms/StablehloRefineShapes.cpp:1020:14: error: no member named 'startRootUpdate' in 'mlir::PatternRewriter'
rewriter.startRootUpdate(op);
~~~~~~~~ ^
external/stablehlo/stablehlo/transforms/StablehloRefineShapes.cpp:1026:16: error: no member named 'finalizeRootUpdate' in 'mlir::PatternRewriter'
rewriter.finalizeRootUpdate(op);
~~~~~~~~ ^
external/stablehlo/stablehlo/transforms/StablehloRefineShapes.cpp:1029:16: error: no member named 'cancelRootUpdate' in 'mlir::PatternRewriter'
rewriter.cancelRootUpdate(op);
~~~~~~~~ ^
external/stablehlo/stablehlo/transforms/StablehloRefineShapes.cpp:1108:14: error: no member named 'updateRootInPlace' in 'mlir::PatternRewriter'
rewriter.updateRootInPlace(op->getParentOp(), [&]() { return; });
~~~~~~~~ ^
4 errors generated.
Target @torch-mlir//:torch-mlir-opt failed to build
```
I'm still puzzled as to how this didn't fail with the CMake merge gating
CI (do we not test Stablehlo builds/tests?). In any case, bumping our
submodule to https://github.com/openxla/stablehlo/pull/1918 fixes it.
It exposes a new failing lit test in TorchToStablehlo though, that I
have looped stablehlo developers into
([here](https://discord.com/channels/999073994483433573/999074539138990131/1201235845391331419)).
```
bazel run @torch-mlir//test/Conversion:TorchToStablehlo/scatter.mlir.test
...external/torch-mlir/test/Conversion/TorchToStablehlo/scatter.mlir
within split at <stdin>:1 offset :33:8: error: unexpected error: Expects non-empty reduction block for type inference
%0 = torch.aten.scatter.src %arg0, %int0, %arg1, %arg2 : !torch.vtensor<[?,?],si64>, !torch.int, !torch.vtensor<[?,?],si64>, !torch.vtensor<[?,?],si64> -> !torch.vtensor<[?,?],si64>
^
LLVM ERROR: Failed to infer result type(s).
```
Bazel CI:
https://github.com/sjain-stanford/torch-mlir/actions/runs/7732673480/job/21083102228
`onnx` explicitly specifies that `raw_data` is stored in `little-endian`
layout. While converting
to `torch` we need to convert from a known endian format to an internal
format of consistent
layout. This means endianness must be correct during the import of
`onnx.Constant`.
---------
Co-authored-by: Xida Ren (Cedar) <cedar.ren@gmail.com>
Note that we are waiting for actual FX traced graph support for sparse
tensors. For details see
https://github.com/pytorch/pytorch/issues/117188
Until then, however, we provide this clever importer that builds the FX
traced graph for for the dense case and then puts a sparse annotation
back on the parameters.
With import test.
Linalg has quantized specific operations. We can lower to these
operations when there is a known zeropoint and scale operations. This
allows the `convolution` to occur with lower bitwidth's, improving the
overall performance.
We were seeing some assertion failures after some checks around folders
were tightened up in LLVM:
https://github.com/llvm/llvm-project/pull/75887 . This PR essentially
moves the logic that used to be applied at the LLVM level into the
folder, which seems to be the suggested fix.
I'm not sure if the IR that caused issues for us _should_ be valid?
```
%1 = torch.aten.detach %arg0 : !torch.tensor<[1],f32> -> !torch.tensor
```
A better fix might be to create a verifier ensuring the result of
`aten.detach` has the same type as its operand.
---------
Co-authored-by: aaron-stgeorge <aaron.stgeorge@getcruise.com>
Torch does not have an equivalent matmul operation for integers. Instead
it sidechannels the information via its quantized types. For this
lowering we setup these sidechannels then invoke `torch.mm`.
This preserves sparsity at the most obvious places of lowering TORCH
tensors to MLIR RankedTensorType tensors. Other places are marked for
audit. With some initial lowering tests.
This adds an encoding field to the torch type, using the interfaces for
printing, parsing, and verification. Note that although this change
prepares adding sparsity to the torch type (as illustrated by the round
trip and invalid tests), nothing in this change depends on the actual
contents of the encoding field!
This includes custom op matching for decomposed operations and fusing
dequantization into dense operations. As a validation we compare
to the dequant+mm torch implementation.
We can plumb the linear matmul into pytorch using its quantized types
with side channel information. To handle the final int8 operation we
dequantize and requantize.
This commit adds mapping from `onnx.pad` op to `torch.pad` op. Currently
it does not support `axes` parameter of `onnx.pad` op.
Signed-off-by: Gaurav Shukla <gaurav.shukla@amd.com>
Currently transposed convolution is not handled correctly by
`TorchToTosa`. This PR allows transposed convolutions to pass through
the conversion so that they can be handled by other conversion passes
later in a pipeline.
An example input which produces a compilation error is:
```
func.func @forward(%input: !torch.vtensor<[1,64,1,100],f32>) -> !torch.vtensor<[1,64,2,200],f32> {
%true = torch.constant.bool true
%int1 = torch.constant.int 1
%int2 = torch.constant.int 2
%weight = torch.vtensor.literal(dense<0.0> : tensor<64x64x3x3xf32>) : !torch.vtensor<[64,64,3,3],f32>
%bias = torch.vtensor.literal(dense<0.0> : tensor<64xf32>) : !torch.vtensor<[64],f32>
%stride = torch.prim.ListConstruct %int2, %int2 : (!torch.int, !torch.int) -> !torch.list<int>
%int1x1 = torch.prim.ListConstruct %int1, %int1 : (!torch.int, !torch.int) -> !torch.list<int>
%output = torch.aten.convolution %input, %weight, %bias, %stride, %int1x1, %int1x1, %true, %int1x1, %int1 : !torch.vtensor<[1,64,1,100],f32>, !torch.vtensor<[64,64,3,3],f32>, !torch.vtensor<[64],f32>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,64,2,200],f32>
return %output : !torch.vtensor<[1,64,2,200],f32>
}
```
This MLIR produces an error about a cast operation with a size mismatch
when passed through `torch-to-tosa`:
```
error: 'tensor.cast' op operand type 'tensor<1x64x1x50xf32>' and result type 'tensor<1x64x2x200xf32>' are cast incompatible
```
---------
Co-authored-by: Srinath Avadhanula <srinath.avadhanula@getcruise.com>
We can make the per-tensor version of the operation to the dequantize
operation via marking with the make quantized tensor component. This
introductions the `qint*` and `quint*` tensor type that can be lowered
to teh appropriate dequantization behavior during the torch-to-linalg
conversion.
We can map the per_tensor case to the `torch.aten.quantize_per_linear`
operation. In this case we extract the `scale` and `zeropoint` values
and directly invoke the quantization, then return the integer
representation value.
Implemented ONNX.Range. The spec says the data type for start, limit,
delta are 0-D can be double, float, int16, int32, int64, All int types
mapped to !torch.int and all float types mapped to !torch.float
---------
Co-authored-by: Kumar Deepak <kumar@xilinx.com>
Handles the multiple cases of `onnx` constant values and converts them
to `torch` literal tensors. This can include splats with a single
integer or floating point value, a set of explicit integer values, or
an elements array attr of values.
This PR updates the torch-to-tosa conversion with following changes:
- Support torch.none as min/max input argument for tosa.clamp op
- Support negative value as start index for tosa.slice op
- Add tosa.logical_or lowering support
e2e test:
python -m e2e_testing.main --config=tosa
LIT tests:
cmake --build build --target tools/torch-mlir/all
---------
Co-authored-by: Ze Zhang <ze.zhang@getcruise.com>
Changes made during upstreaming:
* Removed comments attributing some copied code back to torch-mlir
(since it is now repatriated).
* Re-organized imports.
* Inlined RefMapping/RefTracker and TypeSubclassMap from an external
utility module.
* Added FxImporter class comments.
* Updated stack trace extraction to be fail safe.
* Added an entry-point for `import_frozen_exported_program` which uses
the shiny new upstream `torch.export.export()` API (versus the
lower-level/older API that Turbine is presently using). This
necessitated a small FX rewrite to line external state management up
with current conventions.
* Adapted one of Turbine's importer tests to go with this initial
submission. Turbine unfortunately has a lot of more-integration-ey
tests, and I would like to extract those as more of unit tests of the
importer features and upstream them that way vs trying to copy directly.
For now, one overall test with the initial submission gets us moving.
I acknowledge that there are some code quality things that could be
improved in this submission: this was authored over the course of many
months (and often via some trial and error). I would like to keep it
relatively converged with the downstream for the next few steps while
getting the test suite upstreamed. And then it will be easier to take a
hygienic pass through the code.
Including co-authors for contributors in the git log of the original
repository.
Co-authored-by: Ean Garvey <87458719+monorimet@users.noreply.github.com>
Co-authored-by: Avinash Sharma <aviator1994@gmail.com>
Co-authored-by: Arham Khan <arhammkhan@gmail.com>
Co-authored-by: brucekimrokcmu <kwangkyk@alumni.cmu.edu>
Co-authored-by: saienduri <77521230+saienduri@users.noreply.github.com>
The expression for HardSigmoid in Onnx
(https://onnx.ai/onnx/operators/onnx__HardSigmoid.html): max(0, min(1,
alpha * x + beta))
is inherently different from HardSigmoid in Torch
(https://pytorch.org/docs/stable/generated/torch.nn.Hardsigmoid.html)
which is: if x < -3 -> 0
elif x > 3 -> 1
else x/6 + 1/2
That being said, it was just better to compute out the entire expression
when translating the Onnx expression to Torch mlir, which is done in
this PR. Some of the logic is shared from the files in
`DecomposeComplexOps`. Therefore, refactored some shared logic between
`DecomposeComplexOps` and `DefaultDomainGToP` and put it in a `Utils`
file.
This PR adds the `enable_ir_printing` option to `torch_mlir.compile`,
which can be used to print the IR for all intermediate passes.
When running the added test file via:
```shell
$ python test/python/compile.py 2> tiny.stderr
```
the file `tiny.stderr` is about 700 KB.
The three remaining compare operations
onnx.Greater
onnx.Less
onnx.GreaterOrEqual
Are also added with this push request.
This concludes a set of basic tensor compare functions.
Lowerings for `transpose` from ONNX to `aten`. Implementation depends on
making multiple `aten.transpose` operations swapping pairs of dimensions.
As `onnx.transpose` can swap around any dimensions it may require
constructing multiple `aten.transpose`.
This replaces the lowering of aten.cat with tensor.concat, allowing more
efficient handling of concatenations in downstream flows. The refbackend
populates concat decomposition patterns that can be used to recover the
previous lowering.
This commit adds the OnnxToTorch support for Reciprocal, Round,
ScatterElements, Sigmoid, Sin, Tanh, Sqrt, Sub, Sum, Where, Xor,
Squeeze, Unsqueeze ops.
For reviewers, the ops that weren't trivial and probably require extra
review are Sum, Squeeze, and Unsqueeze.
Lowerings for `selu` lowerings for ONNX to the corresponding torch
implementations. Torch's `selu` implementation has fewer features so
we use the a generalized `elu` with the input scale set to `1.0`.
Simple Python console script to import an ONNX protobuf to the torch
dialect for additional processing.
For installed wheels, this can be used with something like:
```
torch-mlir-import-onnx test/python/onnx_importer/LeakyReLU.onnx
```
Or from a dev setup:
```
python -m torch_mlir.tools.import_onnx ...
```
This is part 1 of 2, which will also include upstreaming the FX
importer. I started with ONNX because it forces some project layout
updates and is more self contained/easier as a first step.
Deviating somewhat from the RFCs on project layout, I made the following
decisions:
* Locating the `onnx_importer.py` into `torch_mlir.extras` as Maks
already has opened up that namespace and it seemed to fit. Better to
have fewer things at that level.
* Setup the build so that the root project only contains MLIR Python and
pure Python deps (like the importers), but this can be augmented with
the `projects/` adding more depending on which features are enabled.
* The default build continues to build everything whereas in
`TORCH_MLIR_ENABLE_ONLY_MLIR_PYTHON_BINDINGS=1` mode, it builds a
`torch-mlir-core` wheel with the pure contents only.
`onnx_importer.py` and `importer_smoke_test.py` are almost verbatim
copies from SHARK-Turbine. I made some minor local alterations to adapt
to paths and generalize the way they interact with the outer project. I
expect I can copy these back to Turbine verbatim from here. I also
updated the license boilerplate (they have the same license but slightly
different project norms for the headers) but retained the correct
copyright.
Other updates:
* Added the ONNX importer unit test (which also can generate test data)
in lit, conditioned on the availability of the Python `onnx` package. In
a followup once I know everything is stable, I'll add another env var
that the CI can set to always enable this so we know conclusively if
tests pass.
* Moved the ONNX conversion readme to `docs/`.
* Renamed CMake option `TORCH_MLIR_ENABLE_ONLY_MLIR_PYTHON_BINDINGS` ->
`TORCH_MLIR_ENABLE_PYTORCH_EXTENSIONS` and inverted the sense. Made the
JitIR importer and LTC options `cmake_dependent_options` for robustness.
This commit adds the OnnxToTorch support for BitwiseXor, BitwiseOr, Div, Equal, Cast,
Ceil, Floor, Cos, and Clip op.
This commit also adds the TorchToLinalg support for aten.clamp.Tensor and aten.clamp_min.Tensor op.
Signed-Off By: vivekkhandelwal1424@gmail.com
Despite aten.mm requiring the input and output types match, we still opt
to maintain signedness semantics in case later passes try to do any sort
of integer type narrowing.
This commit adds the OnnxToTorch support for Atan, Bitshift, BitwiseAnd,
and BitwiseNot op.
This commit also adds the TorchToLinalg support for AtenBitwiseLeftShiftTensorOp.
Signed-Off By: vivekkhandelwal@nod-labs.com
Adds a pipeline to convert custom ops and metadata represented as
`torch.operator` custom ops to corresponding `torch` ops where possible.
This is part of a multi-part approach for building ONNX import in as a
regular feature of torch-mlir. It is focused on the conversions vs the
infra. We will end up maintaining a [pure-python
importer](https://github.com/nod-ai/SHARK-Turbine/blob/main/python/shark_turbine/importers/onnx_importer.py)
to go with this in torch-mlir, and we will also maintain test case
generation utilities derived from it.
I have left substantial documentation in the README of the conversion
directory, including the recommended approach that we will take to keep
building this out.
(note that this organizes the code to coincide with the refactoring in
#2442 versus the current flat arrangement)
The logic for lowering the aten view op to linalg is fairly complex.
In this PR I have tried to follow all non-failing paths through the
lowering and add unit tests where they're missing.
There is 1 logical change to the lowering: redundant tensor.cast ops
(same source and destination type) are folded.
This lifts the core of the jit_ir_importer and ltc out of the pt1
project, making them peers to it. As a side-effect of this layering, now
the "MLIR bits" (dialects, etc) are not commingled with the various
parts of the pt1 project, allowing pt1 and ltc to overlay cleanly onto a
more fundamental "just MLIR" Python core. Prior to this, the Python
namespace was polluted to the point that this could not happen.
That "just MLIR" Python core will be introduced in a followup, which
will create the space to upstream the FX and ONNX pure Python importers.
This primary non-NFC change to the API is:
* `torch_mlir.dialects.torch.importer.jit_ir` ->
`torch_mlir.jit_ir_importer`.
The rest is source code layering so that we can make the pt1 project
optional without losing the other features.
Progress on #2546.
- adds support for an optional verifier to the generated torch op
tablegen (GeneratedTorchOps.td)
- uses the above to add a verifier for the torch permute op.
Motivation: I hit an unclear error from linalg while developing a
decomposition pass for pixel_shuffle. The error would have been clearer
if the problem had been detected earlier in the invalid aten.permute op.
Testing: new tests added. To run added tests, from the base directory
run
```
./build/bin/llvm-lit test/Dialect/Torch/invalid.mlir
```
This is a first step towards the structure we discussed here:
https://gist.github.com/stellaraccident/931b068aaf7fa56f34069426740ebf20
There are two primary goals:
1. Separate the core project (C++ dialects and conversions) from the
hard PyTorch dependencies. We move all such things into projects/pt1 as
a starting point since they are presently entangled with PT1-era APIs.
Additional work can be done to disentangle components from that
(specifically LTC is identified as likely ultimately living in a
`projects/ltc`).
2. Create space for native PyTorch2 Dynamo-based infra to be upstreamed
without needing to co-exist with the original TorchScript path.
Very little changes in this path with respect to build layering or
options. These can be updated in a followup without commingling
directory structure changes.
This also takes steps toward a couple of other layering enhancements:
* Removes the llvm-external-projects/torch-mlir-dialects sub-project,
collapsing it into the main tree.
* Audits and fixes up the core C++ build to account for issues found
while moving things. This is just an opportunistic pass through but
roughly ~halves the number of build actions for the project from the
high 4000's to the low 2000's.
It deviates from the discussed plan by having a `projects/` tree instead
of `compat/`. As I was thinking about it, this will better accommodate
the follow-on code movement.
Once things are roughly in place and the CI passing, followups will
focus on more in-situ fixes and cleanups.
NonValueSemantic Ops like Add_, div_, etc. expect result DType to be the
same as the first input. However, current implementation would result in
wrong result type for case like:
```python
a = torch.randn(3, 3).half() # float16
b = torch.randn(3, 3) # float32
a += b # i.e. torch.ops.aten.add_(a, b)
```
torch expects `a` to be float16, but dtype refinement would infer
float32 type, since it's replaced by `aten.add`.
Add aten.isclose op
Add its torch-to-tosa lowering
Update the TorchToTosa/basic.mlir tests
To test e2e tosa lowering:
`python -m e2e_testing.main -v -c=tosa`
---------
Co-authored-by: Ze Zhang <ze.zhang@getcruise.com>
Add aten.unflatten.int op
Add its torch-to-tosa lowering
Update the TorchToTosa/basic.mlir tests
To test e2e tosa lowering:
`python -m e2e_testing.main -v -c=tosa`
---------
Co-authored-by: Ze Zhang <ze.zhang@getcruise.com>
Strict symbolic shapes allow us to assume numpy-style dynamic broadcasts
never occur. This allows us to strengthen the folder for broadcasts to
cases where the rank is the same and all shapes match (including dynamic
sentinel values).
When importing dynamic shaped programs from Dynamo, via torch.compile or
torch.export, we can assume that strict symbolic shape checks have been
done prior to generating torch IR. Among other shape checking, this
eliminates the case where an unknown dimension can be dynamically '1' in
a way that signals a broadcast.
Adds a `isAssumingStrictSymbolicShapes` utility which consults a
`torch.assume_strict_symbolic_shapes` attribute on an enclosing scope
and returns true if present.
In the linalg pipeline, many runtime checks are elided when this returns
true.
Corresponding commits:
* mlir-hlo: 16886a108eff5197f816ca0f1950cc5ff1b078d9
* stablehlo: 77a59815a82b34f7b08ed2d42a711d9920682d0e
* llvm-project: 4acc3ffbb0af5631bc7916aeff3570f448899647
* Adapt to ByteCodeOpInterface changes.
* Adapt to RegionBranchPoint changes: https://reviews.llvm.org/D159116
* Adapt inferReturnTypes to get the value from properties.
* Adapt invalid.mlir to properties syntax
* [TOSA] Align with custom assembly format change.
* [TOSA] handle change of axis to int32 type
* [TOSA] Restore improper convert to i32
Landing with Windows broken (it cannot be fixed because of the way the mlir-hlo dep is inserted). Will followup with an untangling.
---------
Co-authored-by: TatWai Chong <tatwai.chong@arm.com>
Co-authored-by: Eric Kunze <eric.kunze@arm.com>
* view_as_real test case, allow dtype in testutils.randn
* abstract python upstream func implemented
* fixed upstream dtype func, implemented view_as_real backend op
* formatted AtenViewAsRealOp, removed change in e2etest/framework
* removed test suit from reshape_like.py, because it's moved to basic.py
* implemented C-API wrapper for mlirComplexF128 type
* fixed torch.complex dtype width in MLIR and Torch MLIR, deleted float16 dtype dict
* Changed IR input of aten fft_fft unit test
* code refactored
* code refactored and fixed ci test
* refactored: removed white spaces, and rolled back to having both input/output affine expr
* refactored: deleted output affine expr to reduce redundancy
* xfail ltc backend
* removed ComplexImag and ComplexReal from torchdynamo xfail set
* copied and pasted from main branch as there's no change to be made in this file
* refactored abstract_interp_lib_gen.py
* refactored: torchtypes.td, formatted, removed commented out code
* Support brevitas custom op (#2320)
* f16 change for brevitas
* Adapt the change of brevitas quant custom op name
* Add unit tests
* Make brevitas conversions isolated
* Address the comments
---------
Co-authored-by: dan <danimal197@gmail.com>
When using custom ops, sometimes PyTorch will insert namespaces to the
abstract interpretation function name in the format:
`__torch__.{namespace_1}.{namespace_2}...{op_name}`. The extra
namespaces are not part of the abstract interpretation function name,
so it needs to be removed before generating the library of MLIR
snippets of abstract interpretation functions. This commit adds
support for removing the namespace information.
* LTC->MLIR Debug Info support
* SW-95317 Propagate Lazy->Jit->MLIR scope name.
* Enhance location information based on op names
Currently, the location information attached to the ops just considers
the filename, line number and column number. Attaching operation name
would help identify the type of computation by just looking at the
profile of execution.
* Update locations logic; updated debug-info.py test
* Use {scope}/{op_name} format to track names by default
---------
Co-authored-by: Gleb Kazantaev <gleb.kazantaev@cerebras.net>
Co-authored-by: Mark Browning <mark@cerebras.net>
Co-authored-by: Vimal Patel <vimal@polymagelabs.com>
The implementation at this place was a remnent of the times the pipeline was
run only once.
Rely instead on the backend verification, after optimizations have had an
opportunity to resolve some uncertainties. (e.g. `!torch.optional`).
* RecomposeComplexOps: Remove dead slice op
* lib/Dialect/Torch/IR/TorchOps.cpp: Fold slice ops even when they are on non-value tensors
* lib/Conversion/TorchToTosa/TorchToTosa.cpp: Fix slice start/end out of range/none
* lib/Dialect/Torch/IR/TorchOps.cpp: AtenSliceTensorOp::fold: Fold slices that go from 0:int_max
* More tests for aten.split.Tensor
In PyTorch, the `NumberType` is equal to `Union[int, float,
complex]`. However, the abstract interpretation library was treating
the `NumberType` as `Union[int, float]`, resulting in type mismatches
when reifying certain dtype functions. This commit fixes the type
inconsistency by having the abstract interpretation functions take as
an input a `Union[int, float, complex]` for the ops that take
`!torch.number` inputs.
This commit adds the support for index.Tensor op when the index values
are negative. This commit wraps around the index values by checking
their values at run time.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
check the return type of the division to figure out whether to use
the floating point implementation of a division or to use the integer.
the issue rose from the fact that the inputs are all integer but the
result was casted to floating point. The conversion then chose to
use the integer implementation of division which is not legal in tosa
when all the inputs get casted to floating point.
fix(TorchToLinalg): AtenDivScalarOp
upcast self operand as well if applicable, the self operand must also
be casted to float as it can be an integer.
When `use_tracing=True` is used to import a model into Torch-MLIR,
several casts get inserted in the IR to bridge the untyped inputs and
outputs with the typed body of the computation. These casts create
extra aliases of tensors that cause the current analysis in
`maximize-value-semantics` to fail.
In particular, the `maximize-value-semantics` analysis assumes that the
only valid alias right after an overwrite is the overwritten
alias. So, if there is a use of a casted version of the overwritten
alias after the overwrite, the analysis fails.
This commit improves the analysis by identifying all cast-like aliases
of the overwritten alias and allowing such aliases to be used after an
overwrite.
Because this issue only arises when using tracing, it cannot be
currently tested e2e, so only lit test is added.
Lowering torch operations that allow different compatible data types
in its operands to tosa end up generating invalid tosa IR with mixed
data types. In tosa spec, certain operations (generally element-wise
operations) require all operands to have the same data type.
Add wrapper functions for those element-wise tosa ops to perform op
creation with type conversion if necessary.
This commit adds dtype functions for all the torch ops that did not
previously have one and removes the pass `RefineTypes`, since the
abstract interpretation library now takes care of all the dtype
propagation.
All dtype functions added are tested except for
- `aten.embedding`
- `aten._embedding_bag`
- `aten.embedding_bag`
These functions need a change to the testing framework to allow
specifying the actual data inside the tensor used for testing. I will
fix this in a follow up patch.
Co-authored-by: Jiahao Li <liplus17@163.com>
Bool tensors are represented in TorchScript as an array of
`int8_t`s. However, when importing them into Torch-MLIR, the importer
was assuming the array had `int32_t` elements, leading to the importer
reading into memory that was out of bounds. This commit fixes the
casting of the bool tensor.
This commits adds the support for cases for index_put_op:
1.) where index is a 2-d tensor.
2.) where indices is a list of tensors and none, with exactly
2 non none tensors along the consecutive dimensions.
This commit also adds a utility to compute the broadcast shape
given the two input tensors.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This commit adds the ability to specify extra abstract interpretation
functions in `torch_mlir.compile` to use during type refinement. This
allows users to easily add custom ops without having to interact with
MLIR or C++ directly.
To keep things simple in shape functions, `Scalar` inputs are
considered `float`s. This means that when inserting the shape
functions into the IR, we must cast any `!torch.number`s into `float`s
so that the operand type matches the expected type in the shape
function. This commit adds the cast from `Scalar` to `float`.
The original design for the dtype functions outlined in
https://github.com/llvm/torch-mlir/issues/1462 was unable to properly
handle ops that take optional tensors as an input when the optional
tensor has a value of None. By the time the op gets imported into
torch-mlir, if an optional value is None, all information about the
original type is lost from the op type signature, preventing
torch-mlir from knowing if a value of None was from an optional tensor
or not, which was crucial in the original design since each tensor
argument must be turned into two separate arguments for the dtype
function.
This commit changes the interface to dtype functions such that each
tensor turns into a tuple of two ints, the first representing the rank
of the tensor and the second the dtype of the tensor. Since now there
is a one-to-one correspondence between the operands of an op and the
operands of its dtype function, there is no ambiguity about which
operand of the op corresponds with which operand of the dtype
function.
To test the implementation, this commit defines dtype function for
convolution op, which takes one optional tensor as an argument.
* LowerToBackendContract: Explicitly error out on unimplemented operator
But only reject torch.operator when results are invalid.
Otherwise it might be a custom op that the backend supports.
Currently, the op `torch.tensor_static_info_cast` will not get
canonicalized away if the result type has any shape or dtype
information. This is because `isValidSubtype` only returns true when
the tensor types being compared are exactly the same or the supertype
has no shape and dtype information. Being unable to canonicalize away
the `torch.tensor_static_info_cast` gets in the way of further
optimizations, such as shape propagation.
This commit improves `isValidSubtype` by adding logic that compares
the shapes and dtypes of the two tensor types to determine of one type
is indeed a valid subtype of the other.
Fixes https://github.com/llvm/torch-mlir/issues/1926
The data-flow analysis does not always propagate information to the
entire graph. This results in some lattice elements being
uninitialized. Currently the lattice elements are not checked to see
if they are uninitialized before rewriting the graph, potentially
resulting in invalid IR (see
https://github.com/llvm/torch-mlir/issues/1896).
This commit adds handling for uninitialized lattice elements.
This patch replaces all MHLO operations with their StableHLO
counterparts and adds a validation pass to ensure that no MHLO operations
remain before translating all Stablehlo operations to the MHLO dialect
for further lowering to the Linalg dialect.
This patch also updates all lit tests so that they refer to the
`convert-torch-to-stablehlo` pass and so that they check for StableHLO
operations.
Credit to @vivekkhandelwal1 for finding the necessary changes.
Summary of changes:
- Switch Tosa_IntArrayAttr[N], Tosa_IntArrayAttrUpto[N] to DenseI64ArrayAttr.
- Replace kNoIterationLimit with kNoLimit. (https://reviews.llvm.org/D140525)
- Add dependency on MhloPasses when MHLO is enabled
- Specify result type when using mhlo::DotOp
This commit replaces the `tanh` dtype function, which was being used
to test the implementation of dtype functions in
a710237437, with a dtype function for
`expm1`. The dtype function for `expm1` is identical to the `tanh`
one, so the same level of testing is maintained.
Currently, there are ops getting dtype information from the
`RefineTypes` pass and ops getting dtype information from the
`TorchDtypeRefinementPipeline`. Since each pass can only propagete
dtype information for the ops it knows how to handle, some models with
many ops handled in both passes require the two dtype propagation
passes to execute many times, reaching the iteration limit set in the
`LowerToBackendContractPass`. To temporarily avoid this issue while
the migration to `TorchDtypeRefinementPipeline` is finished, this
commit switches `tanh` to `expm1`, since the latter is used a lot less
in large models.
This reverts commit eaab9be207, since it
is causing the post-merge CI tests to fail, causing subsequent PRs to be
blocked. Specifically, the tests
`ElementwiseAtenLogicalAndOpPromoteBroadcastModule_basic` and
`ElementwiseAtenLogicalXorOpPromoteBroadcastModule_basic` fail because
the oracle does not match the computed result. This patch reverts the
commit to make the post-merge builds green again.
In order to verify if a given IR satisfies the backend contract, the
verifier needs to know if decompositions took place, and if so, which
ops were decomposed and which were not.
This commit adds two arguments to `verifyBackendContractPass` to
specify if decompositions took place and which ops to consider backend
legal, similar to the arguments of `LowerToBackendContractPass`.
* [custom op] Generalize shape library logic to work with dtypes
This commit generalizes the shape library logic, so that dtype rules
for ops can also be expressed using the same mechanism. In other
words, each op can now have a shape function and a dtype function
specified in Python that is imported during lowering to calculate the
shapes and dtypes throught a program. For more information about how
to specify a dtype function, see the updated
`docs/adding_a_shape_and_dtype_function.md`.
For those not familiar with how the shape library works, the file
`docs/calculations_lib.md` provides an overview.
The current implementation of `DecomposeComplexOps` fails if an op
expected to be decomposed does not get decomposed in the first
iteration of the `createTorchSimplificationPipeline` in
`LowerToBackendContractPass`. However, some graphs require multiple
iterations of `createTorchSimplificationPipeline` to fully propagate
all statically knowable information, such as dtypes and shapes, to the
entire graph, sometimes resulting in the need to run
`DecomposeComplexOps` more than once.
This commit changes `DecomposeComplexOps` to use a greedy algorithm
for pattern application and moves the legalization check of ops to the
`LowerToBackendContractPass` to allow for the `DecomposeComplexOps` to
run more than once.
- Support for non-prefixed accessors has been removed. See:
https://reviews.llvm.org/D136727
- Rename `operands` to `methodOperands` in `prim.CallMethod` since the
name `operands` overlaps with a builtin method name. See:
https://reviews.llvm.org/D136727
- Add passes in refbackend to lower memref.subview. See:
https://reviews.llvm.org/D136377
- Replace `CopyToValueTensorOps` first in `RewriteViewLikeSubgraph` in
maximize-value-semantics.
The current implementation of the `RewriteViewLikeSubgraph` pass in
maximize-value-semantics creates temporarily invalid IR. In
particular, given a forward slice starting from a
`CopyToNonValueTensorOp` and ending in `CopyToValueTensorOp`s, the
pass first replaces all uses of the `CopyToNonValueTensorOp` with
its operand, which results in all the `CopyToValueTensorOp` users
having their operand have type `!torch.vtensor`, which is invalid.
The correct way to do things is to first replace all the
`CopyToValueTensorOp`s with their operand, and then replace all uses
of the `CopyToNonValueTensorOp` with its operand.
This only started failing now because the generated accessor
`getOperand` for the `CopyToValueTensorOp` now returns a
`TypedValue<NonValueTensorType>`, which has an assert checking that
the value returned is of the expected type.
This commit changes the `InsertRngGlobalsPass` to `TorchConversionToMLProgram`
pass. This commit also adds the `MLProgramBufferize` pass for the
bufferization of ml_program dialect ops to run on refbackend.
Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
Summary of changes:
- Replace call to `MemoryEffectOpInterface::hasNoEffect`
with `isMemoryEffectFree`.
- Make fix for the dynamic dims, since
`kDynamicSize` value changed to
`std::numeric_limits<int64_t>::min()` from `-1` in llvm
- `makeShapeLLVMCompatible` and `makeShapeTorchCompatible`
utilities convert shapes in order to remain consistent
with the Torch and MLIR semantics.
- Update tags
llvm: 147fe9de29dc13c14835127b35280c4d95c8e8ba
mhlo: 1944b5fa6062ec4c065d726c9c5d64f1487ee8c5
Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
The current implementation sets the `nextSeed` value to `temp & 127`,
which is wrong. The last step of the LCG algorithm for the multiplier
and increment chosen should be `temp % 2^{64} = temp & (1 <<
63)`. However, because we are dealing with i64 values, the modulus
operation happens automatically, so it is not needed.
See Donald Knuth's values for LCG here:
https://en.wikipedia.org/wiki/Linear_congruential_generator
This commit fixes the aten.mean and aten.mean.dim op decomposition
for supporting large-sized inputs.
This commit also fixes the formatting for the file stats.py
Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
Adds support to run `.mlir` LIT tests in bazel.
```
bazel test @torch-mlir//test/...
```
Follow-on PR will contain these updates:
- Add tests to GHA CI workflow
- Add `.py` LIT tests to bazel
* build: update llvm tag to 74fb770d
This commit makes the following changes needed to update bump LLVM:
+ replace usages of `tensor::createPadScalarOp`, see https://reviews.llvm.org/D136493
+ Update file checks
This commit removes almost all of the valsem ops, since the value
semantics version of the ops now exist in PyTorch. The only op missing
is `aten.bernoulli_.float`. In addition, this commit also simplifies
the implementation of `aten.fill.Scalar` by moving it to the pattern
that converts elementwise ops.
This commit makes the following changes needed to update bump LLVM:
- Replace `linalg.init_tensor` with `tensor.empty` (see:
https://reviews.llvm.org/D135129)
- Replace `NoSideEffect` with `Pure` (see
https://reviews.llvm.org/D135505)
- Replace `body` region accessor for `ReduceOp` and `ReduceWindowOp`
with `getBody`
- Fix incorrect use of `tosa::ReduceSumOp` in `AtenNativeLayerNormOp`
conversion pattern. The result type of `tosa::ReduceSumOp` must have
the same rank as the input type. (see:
https://www.mlplatform.org/tosa/tosa_spec.html#_reduce_sum)
Co-authored-by: Ashay Rane <ashay@users.noreply.github.com>
Co-authored-by: Ashay Rane <ashay@users.noreply.github.com>
This commit adds lowering of `aten.div.int` and `aten.bitwise_or.Tensor`
ops. Both these ops are required in order to support bloom_560m model.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
* Fix c10::prim::Constant conversion; Added CAPI for passes; Added passes to base lazy backend
* Update ivalue_importer to use ImportOptions; Added tests for non-value/value tensor types
* Added tests for scalar Constant import; Updated MB::importFunction to use ImportOptions
* Test updates
* Move back module variable name
* Remove RefineTypes from TorchMlirLoweringContext::Build()
* Rename pass; Remove passes from base lazy backend
* Rename pass to VerifyBackendContractPass
* Aligned cmd pass name; Fixed TorchConversion passes registration
* test: allow spaces in path to Python executable
On Windows, the path to the Python binary may contain spaces, so this
patch adds quotes around the path to the python executable.
Thanks to @sstamenova for suggesting the fix!
* python: remove header file that causes Windows build failures
Similar to https://reviews.llvm.org/D125284, we can safely remove this
header file without affecting the build on either Linux. It is
necessary to remove this header file on Windows builds since otherwise
it causes build errors.
* python: drop `TORCH_API` from function defined in Torch-MLIR
`TORCH_API` should apply to functions that are either exported by
libtorch.so or ones that are imported from libtorch.so by its downstream
consumers (like Torch-MLIR). Neither case applies to the
`importJitFunctionAsFuncOp()` function, since it is defined in
Torch-MLIR (and thus outside libtorch.so). This patch fixes the problem
by dropping `TORCH_API` from that function's declaration.
* python: make output of class anotations deterministic
The `class-annotator-repr.py` test checks for class annotations in a
specific order, but prior to this patch, the order was
non-deterministic, since the code iterated on an _unordered_ map.
This patch makes the iteration order deterministic through two changes:
1. using a sorted map
2. using the class qualified name instead of the address of the class in
memory
* test: use Python3_EXECUTABLE as interpreter path for consistency
This ensures that tests use the Python3 version that was detected using
CMake, instead of whichever python version that happens to be in the
PATH variable when invoking the test.
* test: fix RUN string
The parenthesis syntax does not run on Windows (the shell interprets the
`(` character as part of the path). Moreover, the ODR violation in the
comment no longer seems to apply.
* python: port parallel test framework to Windows
Since Windows does not support `fork` natively, Python's
`multiprocessing` module needs to use `spawn` on Windows. However, to
use `spawn`, the multiprocessing module serializes (or pickles) the
worker function and its arguments. Sadly, the multiprocessing module
(both the default one in Python and the one that is extended in PyTorch)
is unable to serialize lambda functions (see
https://stackoverflow.com/a/19985580) for detals.
Unfortunately, given how our tests are structured, we require that the
function under test is passed as an argument to another function, so we
cannot sidestep our use of lambda functions.
To resolve this problem, this patch makes use of the `multiprocess` and
`dill` Python modules, which together offers a multiprocessing mechanism
that can serialize lambda functions. The multiprocess module also
offers a process pool, which simplifies the code for our parallel
testing framework.
It seems as though an upstream change in PyTorch has caused the module
dump to include not just the module being tested, but also several
seemingly unrelated functions in the `torch._decom.decompositions`
namespace. The presence of these new functions caused lit to match
variables against incorrect statements (i.e. statements in the
unrelated functions instead of the module under test).
This patch inserts `CHECK-LABEL` statements in the failing tests so that
lit ignores these unrelated functions and only checks the statements at
or after the test module definition.
- Update MHLO commit to build with LLVM commit hash 00d648bd
- Update TorchToMhlo code to work with Stablehlo
- Re-enabled two failing TOSA tests, thus resolving Github Issue #1231
We were already hitting many cases where backends different in terms of
the legal ops that they wanted. This caused unnecessary coupling between
the backends. Examples:
- https://github.com/llvm/torch-mlir/pull/1161
- https://github.com/llvm/torch-mlir/pull/862
This PR centralizes all compilation to go through `torch_mlir.compile`
so that we can keep the logic centralized there. We should move these
lists closer to each backend. Especially cases like
https://github.com/llvm/torch-mlir/pull/862 where blocking a
decomposition is necessary to avoid a crash emphasize that the set of
decompositions is tightly coupled to the backend, and should be
"controlled by the backend" and not something arbitrarily tweakable.
Also:
- Fix a small bug in the way we passed through the backendLegalOps
option.
- Add better error messages in `torch_mlir.compile` for import errors.
One of the simplifications made by the pass `RefinePublicReturn`
currently only happens if the tensor in question only has one
user. However, the current method of checking this does not correctly
handle the case of a user having multiple uses of the same
tensor. This commit makes sure only unique users are considered.
This is a first step towards formalizing the set of ops in our backend
contract. The goal is to eventually formalize `torch` dialect ops into 3
categories:
1. Legal in backend contract
2. Illegal in backend contract
3. Conditionally legal in backend contract
The "conditionally legal" set are the ops that we can optionally
decompose for backends.
This patch adds relevant pass options for this throughout the compiler,
in preparation for a new set of traits which will formalize this
classification.
This introduces a new pass LowerToBackendContract (better name very
welcome) which performs the bulk of the simplifications that we do,
such as
- shape refinement
- dtype refinement
- maximizing value semantics
- inlining global slots
- decomposing complex ops
The key difference from before is that it iterates the set of
transformations, which can help to break a number of "catch-22" issues
where one simplification depends on another, the latest example being
here:
https://github.com/llvm/torch-mlir/issues/1131
This also exposed that RefineTypes was sometimes crashing/asserting for
certain inputs. This commit hardens it a bit.
Summary of changes:
- Switch to C++17 (similar to https://reviews.llvm.org/D131348)
- Update MHLO to build with LLVM commit hash 061e0189
- Replace deprecated `hasValue()` and `getValue()` with `has_value()`
and `value()` respectively (https://reviews.llvm.org/D131349)
- Use `TypedAttr` (https://reviews.llvm.org/D130092)
- Use updated assembly format of `mhlo.compare` op (commit
d03ef01e70fbf9afd0fa1976fbb7ed31838929b3 in MHLO repo)
Rather than a per-global-slot initializer region, we now have one for
the whole module. For example, it might look like this:
```
torch.global_slot "private" @tensor : !torch.tensor
torch.global_slot "private" @list : !torch.list<tensor>
torch.global_slot.module_initializer {
%0 = torch.tensor.literal(dense<0.0> : tensor<f32>) : !torch.tensor
%1 = torch.prim.ListConstruct %0 : (!torch.tensor) -> !torch.list<tensor>
torch.initialize.global_slots [
@tensor(%0 : !torch.tensor)
@list(%1 : !torch.list<tensor>)
]
}
```
This new structure allows GlobalizeObjectGraph to create the initializer in a
much simpler way, avoiding the need to reason about whether different slots
alias each other. Reasoning about whether slots alias each other now is the
responsibility of InlineGlobalSlots, which has to do a much more complicated
analysis, implemented using MLIR's dataflow analysis framework.
Recommended review order:
- Check out the new IR constructs in the .mlir files of various passes
- Op definitions (*.td)
- Changes to GlobalizeObjectGraph pass.
- InlineGlobalSlots pass (~total rewrite)
- Misc changes:
- Moving torchMlirAdjustStaticInformation for sharing with C++ code.
- EraseModuleInitializer pass
To make this a bit nicer, it would be good to have a `torch.module` op
with an initializer region attached. That would be more invasive though.
This change has highlighted certain aspects of our project layering
which are worth calling out. None of our backends can handle global
slots, so we enforce that there are no global slots before backend
lowering. At an earlier stage in the project, we had aspirations of
transparently handling mutable global state and such, but for reasons
described below, that is no longer a goal. So really global slots should
be seen as a progressive lowering step as part of inlining all the
IValue's in the original program (GlobalizeObjectGraph is also one such
step).
Over time, with insights from work like IREE-JAX, it has become clear
that there isn't a reliable programming model we can compile for users
where we just transparently handle mutable global state (and some other
things, like lists and dictionaries). There is a need for an "outer
program" that orchestrates more restricted subroutines of the kind we
can handle in our compile flow here. The benefit of that is that it
decouples considerations like shapes, dtypes, etc. from the program
constructs used in the outer program. As long as the outer program can
efficiently invoke (pipelining/async/etc.) high-performance
data-parallel numerical subroutines of the kind we compile in our flow
here, then there is a complete programming model. This is also
consistent with the direction of upstream PyTorch which is becoming more
tracing-based (which inherently loses a lot of program structure, which
then has to be applied back with an "outer program" orchestrating the
traced subroutines).
follow up #761:
This patch updates the `torch_mlir::convertTensorToMlirElementsAttr()`
method to enable the creation of tensors whose base type is Float16.
This patch also adds a test to validate the IR generation, and it
updates the test for importing tensors of various types.
* [MHLO] Support for dynamic shape in basic op conversion by introducing CHLO dialect
Co-authored-by: Bairen Yi <yibairen.byron@bytedance.com>
Co-authored-by: Jiawei Wu <xremold@gmail.com>
Co-authored-by: Tianyou Guo <tianyou.gty@alibaba-inc.com>
Co-authored-by: Xu Yan <yancey.yx@alibaba-inc.com>
Co-authored-by: Ziheng Jiang <ziheng.jiang@bytedance.com>
* [MHLO] Support I32 as shape tensor dtype
* [NFC] Add a 'TODO' annotation
- Includes a canonicalizer for `aten.add.t`needed for successfully lowering the shape function
- Only offers support for statically sized index tensors when there is more than one
- Dynamic shape support remains for single indexing tensors
This commit adds verifiers to the ops `ToBuiltinTensorOp` and
`FromBuiltinTensorOp` that make sure that the input and output have
the same shape and data type.
This enables building Pytorch from source in the CI.
The build should mostly hit the ccache.
Release builds will follow once we have some runtime on the CI.
In the interest of merging upstream LLVM quickly, a previous patch
(7f08169) updated the torch-mlir build to register all dialects and
passes through Python bindings. This patch limits the dialects and
passes to only those that are used in torch-mlir.
Key to this change are the removal of
`MLIRPythonExtension.RegisterEverything` and the introduction of a new
Python module (`_mlir_libs/_site_initialize_0.py`), where we register
the dialects and passes used by torch-mlir.
This commit adds the decomposition for `aten.var.dim` op.
This commit also make changes in the decomposition for `aten.var` op.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This patch adds a new pass `torch-verify-conversion-to-value-semantics`,
which looks for non-value semantics tensors to catch such tensors early
during compilation.
This pass requires `torch-refine-public-return` pass to ensure that
return operations are updated to use value tensors, followed by the
canonicalize pass to remove any dead ops that may use or produce
non-value tensors.
Prior to this patch, the canonicalizers for `AtenSizeOp` and
`AtenSizeIntOp` succeeded only if the tensor operand's type information
included the size of the requested dimension(s). We can extend the set
of optimizable cases by propagating types across operations whose result
type matches the input tensor type.
Specifically, this patch enables the canonicalizers for `AtenSizeOp` and
`AtenSizeIntOp` to see past `tensor_static_info_cast`,
`copy.to_vtensor`, and `copy.to_tensor` ops until it reaches the first
op whose result type contains size information for the requested
dimensions, with a maximum bound of 6 parent lookups to avoid indefinite
compilation times. All other encountered ops cause the canonicalizer to
give up.
Prior to this patch, the code in the `torch-simplify-shape-calculations`
pass iterated on the uses of an op's result while also modifying the
value. This caused the iterator to get invalidated, thus terminating
the loop early and producing incorrect IR. This patch makes use of
`llvm::make_early_inc_range()` to ensure that the iterator is not
invalidated while executing the loop body.
This commit does three things:
1. Reverts some of the shape lib changes merged in
https://github.com/llvm/torch-mlir/pull/844
2. Updates the signature of `aten.sum_dim_IntList` that was recently
updated in
23bdb570cf
3. Replaces `aten.zero.functional` with `aten.zero`, updated in 960758b0b7
`aten.select_scatter` op.
This commit adds:
1. Lowering of `aten.slice_scatter` op into `tensor.insert_slice`
op.
2. Decomposes the `aten.select_scatter` op into `aten.slice_scater`
op.
Signed-Off-By: Prateek Gupta <gprateek93@gmail.com>
The canonicalizer converts `torch.prim.dtype` ops into integer constants
for valid types, but the type may not be known until type refinement is
complete. However, type refinement cannot make progress until
`torch.prim.dtype` ops have been resolved to their corresponding integer
constants, thus creating a circular dependency.
This patch creates a tight coupling between type refinement and the
lowering of `torch.prim.dtype` ops by handling such ops as they are
encountered during type refinement. The unit test in this patch aims to
check whether the type refinement pass can now handle chains of
operations that alternate between type construction and type refinement.
A prior patch (63538de2) that added support for bfloat16 type did not
add the canonicalization pattern to fold `torch.prim.dtype` operations
on bfloat16 tensors into the integer constant 15. This patch fixes the
problem.
TorchScript nodes like `prim::Load` and `prim::Store` aren't supported
in torch-mlir because they can't be lowered to backends, but such nodes
can occur in the TorchScript IR.
This patch adds a rudimentary translation from such nodes to
corresponding ops in the Torch dialect. Since we expected such nodes to
go away during lowering because of the SymbolDCE pass, this patch does
not add code to lower these ops beyond the Torch dialect.
In the `pyhpc_turbulent_kinetic_energy` TorchBench benchmark, the shape
calculation occurs inside loops, but because `DropShapeCalculationsPass`
does not explicitly mark the Torch dialect as legal, the pass execution
fails.
This patch adds Torch to the list of legal dialects, and adds a test to
validate the translation.
Prior to this patch, the torch dialect included `AtenTriuOp` for
computing the upper triangular part of the input matrix, but there was
no code for lowering the op to the linalg dialect.
This patch adds code to generate a `linalg.generic` operation that
compares indices (computed using `linalg.index`) to choose between zero
or the original value (using `arith.select`). The lowering fails if the
number of dimensions are less than two. This patch also adds a few
end-to-end tests.
* [MLIR][TORCH] Add folder for torch_c.from_i64 & torch_c.to_i64
* add unit tests for each individual fold
* fix failure of NumelZeroRankModule & TestMultipleTensorAndPrimitiveTypesReturn
This commit decomposes `aten.baddbmm` op into `aten.bmm`,
`aten.mul.Scalar`, and `aten.add.Tensor` op.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>