We can map to `tensor.reshape` for handling multiple output dynamic
shapes. Later we can perform a more complex analysis for indentifying
expand/collapse cases from the tensor.reshape.
Initially we planned to handle this identification at the `torch` level
however it will be easier to handle once converted to core
mlir-dialects.
Decomposition RepeatInterleaveSelfInt with following ops:
```python
def my_repeat_interleave(input, repeats, dim=None):
if dim is None:
# Flatten the input and then repeat
return input.flatten().unsqueeze(-1).tile((1, repeats)).flatten()
else:
# Calculate the shape after repeat
expanded_shape = list(input.shape)
expanded_shape[dim] *= repeats
# Repeat the tensor along the specified dimension
repeat_shape = [1] * (input.dim() + 1)
repeat_shape[dim + 1] = repeats
input = input.unsqueeze(-1)
# Tile and then reshape
tiled = torch.tile(input, repeat_shape)
# Rearrange and reshape
repeated = tiled.reshape(*expanded_shape)
return repeated
```
I passed the tests of stablehlo and linalg. When testing onnx, strange
things happened.
In torch-mlir's CI **torch_nightly** and my own
environment(torch==2.4.0.dev20240318+cpu), it can **pass the pass**.
In torch-mlir's CI **torch_stable**, it **failed**.
The test case is `RepeatInterleaveSelfIntNoDimModule_basic`, the result
shape should be [120].
```python
class RepeatInterleaveSelfIntNoDimModule(torch.nn.Module):
def __init__(self):
super().__init__()
@export
@annotate_args([
None,
([3, 4, 5], torch.float32, True),
])
def forward(self, x):
return x.repeat_interleave(2)
@register_test_case(module_factory=lambda: RepeatInterleaveSelfIntNoDimModule())
def RepeatInterleaveSelfIntNoDimModule_basic(module, tu: TestUtils):
module.forward(tu.rand(3, 4, 5))
```
The error log is as follows:
```
Unexpected outcome summary: (onnx)
****** Failed tests - 1 tests
FAIL - "RepeatInterleaveSelfIntNoDimModule_basic"
@ trace item #0 - call to "forward"
@ output of call to "forward"
ERROR: shape (torch.Size([6, 4, 5])) is not equal to golden shape (torch.Size([120]))
```
@rsuderman
Would you please help me check what's wrong with my PR? Thanks a lot.
Align corner modes which select what the corners mean.
Either the center of the corner points or the edges of the edge points.
---------
Co-authored-by: Rob Suderman <rob.suderman@gmail.com>
The new cases added for quantized matmuls are:
1. vec-vec
2. vec-mat
3. mat-vec
each of which are now lowered to expand(s), quantized_matmul, and
collapse.
1. onnx.MatMulInteger now converts to aten.matmul instead of aten.mm
2. aten.matmul, for ranks >=2, now allows quantized inputs and will
lower to linalg::quantized_matmul or linalg::quantized_batch_matmul.
3. added AtenMatmulOp to the FuseQuantizeOps rewrite patters
QuantizeOperands, QuantizeTransposedOperands, and QuantizeAccumulator
4. added several tests, including some to test AtenMmOp with varying
quantization signed-ness.
5. a quantized matmul mat-vec test is added to verify the failure to
lower to linalg; cleaned of out-of-date code related to common
torch-mlir lowering xfails.
6. in debugging a real model with quantized matmuls, I found a bug on
the scalarize-shapes pass which resulted from the aten.full op folder
returning an incompatible result type. This is fixed by the small change
here to
[lib/Dialect/Torch/IR/TorchOps.cpp](https://github.com/llvm/torch-mlir/compare/main...zjgarvey:torch-mlir:MatMulIntegerFix?expand=1#diff-dc8ed165c207918e606490eee3984b1ad51d7034e6aac36fc046bf47f6f03f4f).
- Added linalg lowering for `AtenFloorDivideScalarOp`
- Needed `AtenDivScalarModeOp` for the decomp.
- Added linalg lowering for `AtenDivScalarModeOp`
- Moved linalg payload logic to `createDivModePayload()` since the logic
was nearly identical for both `AtenDivScalarModeOp` and
`AtenDivTensorModeOp`. Just a template function
- Added `AtenDivScalarModeOp` lowering for stablehlo
Pytorch's
[`torch.floor_divide()`](https://pytorch.org/docs/stable/generated/torch.floor_divide.html)
in a previous version (for a reason unknown to me) preformed a
truncation instead of "floor". The already implemented op
`AtenFloorDivideTensorOp` was done before this change. However, this
wasn't caught because our testcases only tested positive floor division.
I changed this to floor as well as adding a few test cases.
If there is only a single value scattered there can be an implicit batch
dimension. This includes a check for the implicit batch dimension when
reshaping the update tensor. It includes an e2e test to verify
correctness.
1. Changes the linalg lowering for dequantization ops to always sign
cast to float to prevent misrepresenting uint32 overflow on subtraction
with zero point.
2. Adds a basic quantized model test which only quantizes and
dequantizes and now passes with these changes in linalg and onnx
configs.
3. Changes the aten.mm lowering to allow mismatched quantized types.
4. If a quantized matmul arg is uint8, we shift by 128 to faithfully
represent the quantization as a signed i8 quantization. This worked fine
in the AtenMmOp lowering, but I'd be happy to move it to a rewrite in
FuseQuantizedOps.cpp instead if that seems more appropriate.
With the changes 3 and 4, the QuantizedMLP_basic and
QuantizedSingleLayer_basic e2e tests now passes with the onnx config.
…ute_reshape_shape
as that `aten.view` support at most one `-1` in dim list. The original
calculation of `numel` is wrong when there is a `-1` in dim list.
This PR only performs a lit test. In lieu of an e2e test, https://github.com/nod-ai/SHARK-TestSuite/pull/142 makede sure that the lowering works & the numbers check out.
Co-authored-by: Xida Ren <xida.ren.dev@gmail.com>
Squeezes can be ambiguous without the output shape information. For
instance (1, 1, 256) squeezed can be either (1, 256) or (256). We need
to check the resulting shape to know what the shape should look like.
Shapes can be processed as tensors to represent the set of dimensions.
As reshapes take a list of scalars this can result in a single dynamic
dimension blocking the adjacent static dimensions.
This pass attempts to de-couple tensor computations related to shapes
and propagate values to better support lowering scalar tensor
computations.
This commit also cleans up the OnnxToTorch lowering for the ReduceMean
op and adds the support for handling edge cases.
Signed-Off By: Vivek Khandelwal vivekkhandelwal1424@gmail.com
The `convertTensorToElementType` function expects it's argument to have
a valid tensor type that is not `Torch::NoneType`. This PR checks that
the bias tensor is not of type `Torch::NoneType` before calling
`convertTensorToElementType` on the bias tensor argument in the
`matchAndRewrite` member function of the `ConvertAtenConvolutionOp`
class.
See the related issues here:
[SHARK-Turbine#556](https://github.com/nod-ai/SHARK-Turbine/issues/556)
1. Adds uint8 casting to onnx.Cast op
2. Fixes an issue with onnx.DequantizeLinear when the scale comes with
shape [1].
3. Adds support for unsigned types in an AtenItemOp folder
4. Adds a simpler quantized model for easier debugging
5. Adds a fusion pass to convert [quant -> dequant -> transpose -> mm]
patterns to [transpose -> quant -> mm].
6. Moved some xfails that are still not passing, but for different
reasons than onnx.cast failures.
Now there no lowing for `aten.Int.bool` in `convert-torch-to-arith`
pass. this PR add this support.
Below is the UT.
```
func.func @torch.aten.Int.bool(%arg0: !torch.bool) -> !torch.int {
%0 = torch.aten.Int.bool %arg0 : !torch.bool -> !torch.int
return %0 : !torch.int
}
```
Two e2e tests (AdaptiveAveragePool1/2dUnitOutputSizeDynamic) were
failing due to numerics. This was as a result of passing -1 as the
kernel size in the lowering for the corresponding onnx op
GlobalAveragePool.
Reshaping tensors depend on directly matching individual dimensions to
their corresponding dim in the `torch.view` reshape dimensions. This
involves decoupling dynamic dimensions from their static counterparts
and support cleanup / canonicalization.
This commit adds the OnnxToTorch lowering for the Mish, Softplus,
HardSwish, Trilu, ThresholdedRelu op
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
The previous conversions for AtenAdaptiveAvgPool1dOp and
AtenAdaptiveMaxPool2dOp are refactored into a general templated
conversion that works for all of the AtenAdaptive...PoolNdOp's.
New support is added for the following ops:
1. AtenAdaptiveMaxPool1d
2. AtenAdaptiveMaxPool3d
3. AtenAdaptiveAvgPool3d
Support is also provided for passing inputs without batch dimensions.
For example, applying adaptive_avg_pool2d to an input tensor of rank 3.
After [pytorch #118162](https://github.com/pytorch/pytorch/pull/118162)
gets down to torch-mlir, I'll add a test for AdaptiveMaxPool1d with
return_indices (which will pass with that upstream fix).
---------
Co-authored-by: James Newling <james.newling@gmail.com>
This adds support for converting DynamicQuantizeLinear from torch-onnx
to torch.
I could not get an e2e test to pass, since there seems to be some issues
with uint8 casting somewhere lower in the pipeline. For example
compiling with IREE for llvm-cpu, I would get either the correct zero
point (if zp < 128) or the correct zero-point minus 256 (if zp >= 128).
The output tensor seems to always return a tensor of zeros, which also
occurs when running uint8 examples through QuantizeLinear.
Edit: the first problem can be resolved by casting the output back to
uint8 on output, the second problem is resolved with PR #3018
Added support for dynamic shapes in `flattenusingints` op in tosa
dialect. Due to this some Argmax tests pass
This PR fixes this issue https://github.com/llvm/torch-mlir/issues/3004
The following tests pass after this PR
```
1. "ArgmaxIntModule_basic"
2. "ArgmaxIntModule_multiple_maxs"
3. "ArgmaxModule_basic"
```
The only difference between version 7 and newer versions is support for
different data types. We should allow this pattern to match as early as
7. Earlier versions have a more manual broadcast specification through
attributes, so I did not include those versions.
See: [onnx.Div
docs](https://onnx.ai/onnx/operators/onnx__Div.html#l-onnx-doc-divl)
Reduce mean lowerings did not succesfully lower to `linalg` via torched.
There were two separate paths that could be consolidated to a single
simpler pass. This resulted in a significant improvement in test
coverage.
If the broadcast shape is length-1 at a dim while `?` in the input dim
then we need to broadcast to the dynamic dim. This is equivalent to
taking a max of two dimensions.
This folds small version of the tensor-scalar comparison operators as
they are commonly used for shape computations. This includes le, lt, ge,
gt, eq, and ne.
The current padding operation was not functional for dynamic shapes.
Updated and enabled tests so that onnx.pad tests pass.
Work TBD for reflection padding.
We can support `onnx.Size` by requesing the size of each dimensions and
taking the product of the results, then packing it into a tensor.
---------
Co-authored-by: Scott Todd <scott.todd0@gmail.com>
This mostly copy-pastes the reduce minimum implementation to reduce max
to improve test coverage. We also improve the aten lowering for min/max
dim for unsigned types.
The addition of an e2e test is actually provided in the Shark-Testsuite.
This adds 2 test cases for the gridsampler e2e test.
Also as intended there were some items found which needed correction, so
the Gridsampler op has also a change.
`getRawBuffer` expects a densely packed vector of `i1` values however
`onnx` does not densely pack the values. Include code to handle the
packing / unpacking.
Existing lowering via aten.view does not work as well for dynamic shapes
as the lowering to tensor.expand must re-infer dynamic shape matching.
Better to directly lower.
This is the lowering of gridsampler from onnx to torch using our prior
implementation of AtenGridSamplerOp.
Here are several checks for cornercases implemented. We may decide to
have part of these checks in AtenGridSamplerOp instead of the onnx
lowering portion.
Finish supporting importing the vast majority of `onnx` operations. This
includes:
- region support
- region value inherentance
- `torch.string` support
- `torch.list` support
- `torch.optional` support
A bunch of small fixes are interlinked and trigger crashes if not
addressed as a group. This includes:
- aten view when expand from a rank-0 tensor
- slice folder with negative indices
- `aten._shape_as_tensor` folder on a rank-0 tensor
- `aten.cat` of a tensor with a length-0 tensor
The corrective transpose at the end is computed incorrectly. Is it
actually computin the inverse transpose. Inverting the permutations
fixes the issue.
Torch lowering only supported the most recent version. Refactored the
lowering so more easily handle default values and optional operands /
attributes.
Added Support for float dtype in in torch.arange in TOSA Dialect
This resolves the following issue :-
https://github.com/llvm/torch-mlir/issues/2762
The following test cases are passing after this change
1. ArangeDtypeIntModule_basic
2. ArangeFloatModule_basic
3. ArangeNegativeStartFloatModule_basic
4. ArangeStartFloatModule_basic
5. ArangeStartNegativeStepFloatModule_basic
6. ArangeStartOutDtypeModule_basic
7. ArangeStartStepFloatModule_basic
---------
Co-authored-by: James Newling <james.newling@gmail.com>
We collapsed and broadcasted scatter indices to a single element
version. We should instead upport `tm_tensor.scatter`s support for
multiple indices and the implicitly broadcasted behavior. This avoids
the serialization and materializing a needlessly large indices tensor.
There is no reason to treat `ConstantOfShape` as a specialized import
any as there exists a onnx-to-torch equivalent. Dropping the import
coding and adding support for resource conversion substantially
increases test coverage for dynamically shaped tests.
Strided slicing can occur with a negative stride. In these cases we need
to bound end differently. This included removing a function that was
generating bad limits.
According to the [official TOSA
spec](https://www.mlplatform.org/tosa/tosa_spec.html#_cast), `tosa.cast`
allows a cast from `fp32` to `fp16`. We were not previously accounting
for this in the `TorchToTosa` lowering.
Also did a tiny bit of cleanup in the code to make it easier to spot
which conversions are currently allowed.
---------
Co-authored-by: Srinath Avadhanula <srinath.avadhanula@getcruise.com>
Onnx slice lowering used arange needlessly instead of directly
constructing the constant dimension values. This makes lowerings to
linalg struggle as multiple folders are required to get what is a
constant index value.
We can route the torch tests via `onnx` using the `torch.onnx.export`
tooling. We can then reimport, lower to torch, and compile to linalg to
validate the onnx path is working correctly.
The current implementation exposes some failures in the `onnx` path so
we cannot enable the onnx test suite yet due to segmentation faults.
This commit adds the OnnxToTorch lowering for cosh, acosh, asin, asinh,
and atanh op.
This commit also adds the TorchToLinalg lowering for acosh, asin, asinh,
and atanh op.
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
Some operations include a backend matcher for specialized operations. We
map these back to generics so they appropriately match to the high
performance versions. This is done for the attention operation.
This commit adds the OnnxToTorch support for Mean, IsInf, IsNaN, and
PRelu ops. All high priority ops were taken so went with these. The non
trivial ones are Mean and IsInf which might require extra review
---------
Co-authored-by: MaheshRavishankar <mravisha@amd.com>
By updating convertScalarToDtype invocation pass original source and
destination datatypes for the add op. Also fixes a potential problem
with the sub op.
---------
Co-authored-by: Xida Ren <xida.ren.dev@gmail.com>