The saga of aligning onnx and torch padding conventions continues.
```python
onnx_pads = [low_x, low_y, low_z, high_x, high_y, high_z]
torch_pads = [low_z, high_z, low_y, high_y, low_x, high_x]
```
So not only is the lexicographical ordering hierarchy swapped (low/high
x spatial-dim -> spatial-dim x low/high) but the ordering in the the
spatial-dim specification is also reversed.
This patch properly reverses the pad ordering (and actually uses the
`shuffledPadding` to pad).
`onnx.Shape` can select only a subset of indices using attributes. Add
support for these attributes.
---------
Co-authored-by: zjgarvey <47986913+zjgarvey@users.noreply.github.com>
Following up from the discussion in
<https://github.com/llvm/torch-mlir/pull/3550>, I've edited the lowering
to prevent OOB extracts in a more direct fashion (i.e., just clamping
directly).
I don't think this affects the lit tests at all, but I've tested the
changes in our external test suite at
<https://github.com/nod-ai/SHARK-TestSuite/tree/main/>. I found the
issue when I was unexpectedly getting `nan`'s along the output image
border for a resize test there.
Change linalg.matmul_unsigned to linalg.matmul with unsigned type_fn
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
Co-authored-by: Max Dawkins <max.dawkins@gmail.com>
There were two issues related to `ignore_index` being set
(1) the onnx-to-linalg pass as not reading the value correctly (2) the
mean pass was not considering the `ignore_index` value
For (2) when taking the mean we need to know how many of the values were
considered in the sum and therefore we cannot divide by the total number
of elements. Adding a summation across the total number should correct
this issue.
The static uneven divisible AdaptiveAvgPool2d means that although the
input size is not an integer multiple of ouput size, but the kernel and
stride size can also be fixed (not dynamic). The derivation logic of
kernel and stride size is consistent with
torch/_decomp/decomposations.py:adaptive_avg_pool2d as described in the
following:
1. Stride Size
Firstly , derive the start index in each reduce operation according to
the output size (`n`), `start_index = ([0, 1, ..., n - 1] * input_size)
// output_size`. For each index `k`, if `k * (input_size % output_size)
< output_size`, then the current and previous stride keeps the same as
`input_size // output_size`. So suppose `(n-1) * (input_size %
output_size) < output_size`, the stride in the whole AdaptiveAvgPool2d
process keeps static, as `input_size // output_size`.
2. Kernel Size
torch/_decomp/decomposations.py:adaptive_avg_pool2d calculates a static
kernel size when the input/output sizes satisfy either of the two
conditions, `input_size % output_size == 0` or `output_size %
(input_size % output_size) == 0`. Here if `input_size % output_size ==
0`, then the kernel size equals `input_size // output_size`, otherwise
`input_size // output_size + 1.`
Torch has all scalars represented as i64 and f64 types which results in
extraneous trunc-extf commands. We can rework this by elliding
widen-narrow cases away.
Current StableHlo lowering strategy works well when `src` tensor's rank
is no bigger than `dst` tensor's. The new patch make it succeed in other
cases. The following is an example.
```
%190 = torch.prim.ListConstruct %arg4 : (!torch.vtensor<[1,1024],si64>) -> !torch.list<vtensor>
%191 = torch.aten.index_put.hacked_twin %189, %190, %186, %true : !torch.vtensor<[1024,768],f32>, !torch.list<vtensor>, !torch.vtensor<[1,1024,768],f32>, !torch.bool -> !torch.vtensor<[1024,768],f32>
```
- Adds support for lowering depthwise + quantized convolution ops to
linalg::DepthwiseConv2DNhwcHwcQOp
- Changed the variable name for groupSize (which is really C/G) to the
more appropriate numGroups (G).
- Discovered in e2e testing that linalg does not accept (Cin = groups &&
Cout = K*groups for K>1) as a "depthwise" conv, so this also updates the
case-checking to reflect this issue.
Pytorch and ONNX apparently round to nearest, ties go to nearest even,
but we were using `math::round` for the torch-to-linalg conversion of
`quantize_per_tensor`, which rounds away from zero on ties.
This PR adds a conversion in the TorchOnnxToTorch pass for the ONNX
Multinomial operation. It also adds a TorchToLinalg lowering for the
`aten.Multinomial` op and does a light refactor of some repeated code
that generates random floating point numbers in
`TorchToLinalg/Random.cpp`.
This patch adds a few misc pad op related changes:
1. Addresses issue <https://github.com/llvm/torch-mlir/issues/3457>
2. Addresses issue <https://github.com/llvm/torch-mlir/issues/3442>
3. Fixes the padding order for asymmetrically padded onnx.Conv ops
4. Enables passing quantization through those onnx.Conv op pre-paddings
5. Modifies the torch-to-linalg lowering of AtenReplicationPad2d op to
enable support for input rank != 4
Unfortunately, even with all of these changes, the e2e tests for the
ReplicationPad2d still fail the onnx config, since the torch export
procedure for rearranging the pad order is complicated enough that the
padding ints end up not being able to fold back to constants.
The LpNormalization lowering was previously just computing the norm,
which is incorrect. This computes the norm then divides the input tensor
by it's norm.
I've tested this against some simple onnx models locally. I'll look into
adding a test case for this in an external test suite.
Register `aten.fake_quantize_per_channel_affine` and
`aten.fake_quantize_per_tensor_affine.tensor_qparams` ops
---------
Co-authored-by: Ze Zhang <ze.zhang@getcruise.com>
Fix the pad tensor rearrangement such that we change the representation
from [x1_begin, x2_begin, ..., x1_end, x2_end,...] to [xn_begin, xn_end,
...., x2_begin, x2_end, x1_begin, x1_end] where x1, x2 .. xn are the
dimensions of the pads tensor argument.
---------
Co-authored-by: zjgarvey <zjgarvey@gmail.com>
Co-authored-by: zjgarvey <47986913+zjgarvey@users.noreply.github.com>
* use lhs tensor's element type as compute type when rhs is scalar.
* previously `a != 1.0`(a is a fp32 tensor) will lowering to `%6 =
stablehlo.compare EQ, %4, %5, FLOAT : (tensor<2x5xf64>, tensor<2x5xf64>)
-> tensor<2x5xi1>`
* now it will lowering to `%6 = stablehlo.compare EQ, %4, %5, FLOAT :
(tensor<2x5xf32>, tensor<2x5xf32>) -> tensor<2x5xi1>`
Addresses an issue with onnx.Gather lowering to linalg:
<https://github.com/nod-ai/SHARK-Turbine/issues/242>
The builder for tensor.expand_shape, without an explicitly provided
output shape, fails to infer an output shape in the case of multiple
dynamic reassociation dims. I tried adding the output shape explicitly
for tensor.expand_shape, but ran into compilation issues later on (see
<https://github.com/iree-org/iree/issues/17760>).
This PR adds support by lowering this op to tensor.reshape when multiple
dynamic reassociation dims are provided.
Due to the custom operation parser, the print and parser were expecting
two different forms.
One having the dictionary before the value and the other after.
Following the format of the other constants ops, the constant.int will
follow the `value attr-dict` format. Updated the parser accordingly.
This bump triggered an upstream assert. Includes a WAR for #3506.
Also includes several things I needed to do to repro:
* When TORCH_MLIR_TEST_CONCURRENCY=1, test runs will be printed.
* Added TORCH_MLIR_TEST_VERBOSE=1 handling to enable verbose mode
(useful on CI).
---------
Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>
- Adds limited support for lowering onnx.Loop to primLoopOp
- lower in the pipeline`torch-to-scf` there is a check to see if loop is
for like. A primLoopOp is for like when the input condition is a
`trueBoolConstant`. To adapt the onnx to torch lowering to take
advantage of it, the implementation checks for specific op patterns in
the loodBody region and decides if loop is for like and uses the right
input condition op.
- to adapt the onnxLoopBody to torchLoopBody, we need to adapt the input
block arguments and set the correct output condition variable in the
loop body.
- scanOutput variables are currently not supported.
Before this PR, a statically shaped aten.convolution would generate
dynamically shaped linalg IR, and even `-canonicalize` would not be able
to fold it back into static shapes. This PR ensure that shape
calculations are folded on construction to directly generate statically
shaped linalg IR.
We achieve that by ensuring that `arith` ops involved in computing
shapes are created via `createOrFold`, so that later uses of
`getAsOpFoldResult` see constants instead of those ops.
For example
```
module {
func.func @forward(%arg0: !torch.vtensor<[32,336,112,112],f32>,
%arg1: !torch.vtensor<[336,168,3,3],f32>,
%arg2: !torch.vtensor<[336],f32>)
-> !torch.vtensor<[32,336,56,56],f32> {
%false = torch.constant.bool false
%int2 = torch.constant.int 2
%int1 = torch.constant.int 1
%0 = torch.prim.ListConstruct %int1, %int1 : (!torch.int, !torch.int) -> !torch.list<int>
%1 = torch.prim.ListConstruct %int2, %int2 : (!torch.int, !torch.int) -> !torch.list<int>
%2 = torch.prim.ListConstruct : () -> !torch.list<int>
%3 = torch.aten.convolution %arg0, %arg1, %arg2, %1, %0, %0, %false, %2, %int2
: !torch.vtensor<[32,336,112,112],f32>, !torch.vtensor<[336,168,3,3],f32>, !torch.vtensor<[336],f32>, !torch.list<int>,
!torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int
-> !torch.vtensor<[32,336,56,56],f32>
return %3 : !torch.vtensor<[32,336,56,56],f32>
}
}
```
would result in
```
[...]
%padded = tensor.pad %2 low[%14, %15, %16, %17] high[%14, %15, %16, %17] {
^bb0(%arg3: index, %arg4: index, %arg5: index, %arg6: index):
tensor.yield %cst : f32
} : tensor<32x336x112x112xf32> to tensor<?x?x?x?xf32>
[...]
%45 = linalg.conv_2d_ngchw_gfchw {dilations = dense<1> : vector<2xi64>, strides = dense<2> : vector<2xi64>}
ins(%expanded, %expanded_37 : tensor<?x2x?x?x?xf32>, tensor<2x168x168x3x3xf32>)
outs(%expanded_44 : tensor<32x2x168x?x?xf32>) -> tensor<32x2x168x?x?xf32>
[...]
```
and with this PR all shapes are static.
The `index_put` operation, `input[indices] = values`, allows for the
values to be any shape that is broadcastable to the slice
`input[indices]`. This commit adds broadcasting support to the Linalg
lowering of `IndexPutHackedTwinOp`.
Fixes: #3465
This adds support for a few ops:
- torch.linalg_det
- torch._linalg_det (if the LU and pivot returns are unused)
- onnx.Det
An scf loop is used, since the row reduction algorithm applied here has
some loop-carried dependencies.
The current support being added here is very basic, and only works if no
permutations are required during row reduction, and assumes the matrices
are non-singular.
This adds a torchvision op to torch-mlir and a path from onnx.DeformConv
to torchvision.deform_conv2d.
I'm not implementing the torch->linalg lowering for the torchvision op
yet, but posting this PR to get feedback on some of the choices being
made here and to flesh out the onnx frontend a bit.
This adds an onnx->torch conversion for onnx.RoiAlign into
torchvision.roi_align or torchvision.roi_pool, and adds those two
torchvision ops to torch-mlir.
Add a new op with shape/dtypes and decompose into
`fake_quantize_per_tensor_affine` when the second result is unused.
The xfail_set change is on ONNX because torch cannot export this op to
ONNX.
1. truncates zero-points to i32
2. modifies the default accumulator type for i8 from i64 to i32.
3. now uses the input dtype to infer accumulator dtype.
This implements the Onnx.NegativeLogLikelihoodLoss op using the
signature provided
[here](https://onnx.ai/onnx/operators/onnx__NegativeLogLikelihoodLoss.html)
by replacing it with a `NLLLossForward` op.
Additionally, I included a helper function `get_loss_reduction_enum` to
convert from a string `reduction` parameter to the corresponding
intended integer value since this is an operation that will be reused
for any loss function module. This differs from `get_reduction_enum` in
`TorchUpstream.cpp` which handles the `reduce` parameter from
`scatter_reduce` type operations.
Issues was found here https://github.com/nod-ai/SHARK-Turbine/issues/643
- [ONNX] Fix padding attributes for onnx.AveragePool
- [Linalg] Add countIncludePad false support for AtenAvgPool1/2dOp
- [Linalg] Add an avg_pool2d countIncludePad False e2e tests
- [Linalg] Fix conflict with AtenAvgPool3dOp
- [Linalg] Fix e2e crash with AtenAvgPool1dOp
- [Linalg] Add dynamic dim support for AtenAvgPool2dOp
- [Linalg] Fix AvgPool2dDivisorOverrideModule crash
There is currently no int16 quantization support in torch. This patch
adds a new mlir type to correspond to the missing "torch.qint16" type,
and enables lowering of quantization-related onnx ops using int16 types.
In follow-up patches, custom quantization logic for ops like
aten.matmul/aten.mm/aten.convolution may need to be revisited to allow
support for qint16. The passes in FuseQuantizedOps.cpp may also need
slight modifications.
This commit adds the lowering for SequenceAt, SequenceEmpty,
SequenceInsert, SequenceErase op
Signed-Off By: Vivek Khandelwal<vivekkhandelwal1424@gmail.com>
Supports asymmetric padding by performing a torch.nn.functional.pad on
the input before performing the convolution.
Signed-off-by: Suraj Sudhir <suraj.sudhir@arm.com>
This commit also adds the Torch declaration for aten.max_unpool2d and
aten.max_unpool3d op. The TorchToLinalg lowering for the same will be
added in a follow-up commit.
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
This patch adds two `memref` passes to `torch-mlir-opt`, which already
occur in the pass pipeline
`torch-backend-to-linalg-on-tensors-backend-pipeline`. Additionally,
necessary op interface external models are included to address issue
#3352.
This addresses 7 of the model failures I'm seeing in the test suite. See
[Shark-Turbine issue
#566](https://github.com/nod-ai/SHARK-Turbine/issues/566).
Need the op ```linalg.conv_2d_ngchw_gfchw_q``` to be added upstream
before merging this. See [llvm-project PR #92136
](https://github.com/llvm/llvm-project/pull/92136).
A small additional expansion to operand quantization is included in this
patch to address a model failure that occurs when unblocking the
quantized group convolutions in one of these onnx models.
Updates:
- some unsupported modes are now going to report a match failure for
unsupported coordinate transformation modes.
- fixes a bug that was introduced in the last patch for resize (my
bad...)
- uses actual x and y coordinates for computing weights in bilinear
interpolation (rather than eps modified values)
- slightly simplifies the bilinear interpolation payload for readability
and performance
- passes coordinate transformation mode information from an onnx.Resize
op to the mode string for the aten._interpolate op. This allows us to
perform custom logic in the torch->linalg lowering to support
onnx.Resize options without losing the default behaviors of the
interpolate op.
This PR fixes the bugs for `Torch::AtenOneHotOp` by:
1) Using `Torch::kUnknownSize` as the default value for `numClasses` in
the pattern matching stage in `DecomposeAtenOneHotOp`
2) Adding `AtenIntScalarOp` to the patterns in `TorchToArith`
3) Handling both `int` and `float` types for `off` and `on` values in
`TorchOnnxToTorch` conversion
It also includes:
1) A new test in `TorchToArith/basic.mlir`, for `torch.aten.Int.Scalar`,
and
2) A new test in `decompose-complex-ops.mlir`, for `torch.aten.one_hot`
**Dependencies**
This PR is dependent on #3334.
I am trying to eliminate 'getWithLeastStaticInformation' in
DecomposeAtenTriuOp. Could you provide me with some suggestions?
@qingyunqu @zjgarvey
See issue https://github.com/llvm/torch-mlir/issues/3312
This commit fixes the bugs for the `onnx.OneHot` operator by:
1) Converting negative indices to non-negative indices
2) Handling both `int` and `float` types for `off` and `on` values
3) Using the correct result type
It also includes a new unit test.
Discord Thread:
https://discord.com/channels/636084430946959380/1238330633328005243
## Context:
[This](https://github.com/llvm/torch-mlir/blob/main/python/torch_mlir/fx.py#L61)
was updated to support e2e tests for the TorchDynamo frontend in
Torch-MLIR, where we run FX decompositions and import the FX IR to
generate Torch dialect, followed by
`torch-function-to-torch-backend-pipeline`, skipping only the shape/type
refinement for now. However, we should be able to skip many of the torch
simplification passes, as depicted in the [frontend
roadmap](https://github.com/llvm/torch-mlir/blob/main/docs/images/roadmap_frontend.png).
Based on IREE's TorchDynamo
[pipeline](https://github.com/iree-org/iree/blob/main/compiler/plugins/input/Torch/InputConversion/Passes.cpp#L29),
the only two passes we seem to require are: `ReduceOpVariantsPass` and
`DecomposeComplexOpsPass`. This is inline with our findings as well
based on initial exploration.
This PR creates a dedicated frontend simplification pipeline for
TorchDynamo / FX Importer which calls only `ReduceOpVariantsPass` and
`DecomposeComplexOpsPass`. We rely on the e2e fx_importer tests to
ensure we're not regressing by removing many of the passes that were
historically needed for TorchScript.
One notable change here is that we do not call the
`LowerToBackendContractPass` anymore, which used to call
`TorchSimplificationPipeline` iteratively until VerifyBackendContract
was clean. Some of this was required for the shape/type refinement to
converge, which seems a non-issue for Dynamo frontend. Do we anticipate
this (the iterative invocation of TorchSimplificationPipeline followed
by VerifyBackendContract) to be worth retaining in the Dynamo frontend
pipeline? If so, I can make those changes, PLMK.
This commit fixes the onnx.MaxPool op lowering which was lacking the
indices result support.
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
The old lowering only had logic for 2d (i.e. images). this patch allows
interpolation for n spatial dims, which is required for some 3d vision
models such as
- onnx/models/pytorch-3dunet_vaiq_int8
which successfully compiles and runs with this patch.
* not to decompose `aten.amax` on `stablehlo` backend. Because it could
be lowering to `stablehlo.reduce` directly.
* lowering `aten.max.dim` to `stablehlo.reduce apply max` when
`AtenMaxDimOp.getIndices()` doesn't have users. It's more simple.
In constant folding progress, a new constant op will be created
according to the origin op's result type.
See the code in TorchDialect.cpp.
```cpp
Operation *TorchDialect::materializeConstant(OpBuilder &builder,
Attribute value, Type type,
Location loc) {
if (auto integerType = dyn_cast<Torch::IntType>(type))
return builder.create<Torch::ConstantIntOp>(loc, cast<IntegerAttr>(value));
if (auto floatType = dyn_cast<Torch::FloatType>(type))
return builder.create<Torch::ConstantFloatOp>(loc, cast<FloatAttr>(value));
if (auto numberType = dyn_cast<Torch::NumberType>(type)) {
if (auto floatValue = dyn_cast<mlir::FloatAttr>(value)) {
return builder.create<Torch::ConstantNumberOp>(loc, floatValue);
} else if (auto intValue = dyn_cast<mlir::IntegerAttr>(value)) {
return builder.create<Torch::ConstantNumberOp>(loc, intValue);
}
}
if (isa<Torch::BoolType>(type)) {
return builder.create<Torch::ConstantBoolOp>(loc, cast<IntegerAttr>(value));
}
if (isa<Torch::NoneType>(type))
return builder.create<ConstantNoneOp>(loc);
if (auto stringAttr = dyn_cast<StringAttr>(value))
return builder.create<ConstantStrOp>(loc, stringAttr);
if (auto elementsAttr = dyn_cast<ElementsAttr>(value)) {
// Only !torch.vtensor can be constant folded. !torch.tensor has
// non-trivial aliasing semantics which prevent deduplicating it.
assert(isa<ValueTensorType>(type) && "should be a vtensor type!");
return builder.create<ValueTensorLiteralOp>(loc, elementsAttr);
}
return nullptr;
}
```
So when the op has a tensor result type, it must be "ValueTensorType"
due to the **assert** statement. However, many fold methods in
TorchOps.cpp only have a judgment of "BaseTensorType".
As mentioned in issue #3290 , the difference between onnx.Transpose in
versions 1 and 13 is minimal, and therefore should be supported with the
same conversion pattern.
After running the model tests in SHARK-TestSuite, I noticed a few model
failures due to half-fusion.
Notably, RDN_pytorch_vaiq_int8 had a depth=5 convolution chain with
multiple AtenViewOp's.
This change enables more customization with operand quantization, and
generalizes the patterns QuantizeOperands and QuantizeTransposeOperands
to QuantizeOperandsPastCommutingOps.
This allows for passing quantization through operations which are
functionally unaffected by quantization, such as view-like ops. The
purpose of this change is to address a myriad of quantization issues
seen in quantized onnx models that have some reshape-like operations
sandwiched in between a dequant and something like a matmul (whose other
operand is immediately quantizable).
* Enables assume_strict_symbolic_shapes on fx_importer imported
programs, indicating strict shape semantics.
* Reworks the view->reshape lowering to take advantage of strict mode
and do one of:
* Collapse to 0D
* Flatten/Unflatten when there is an inferred dim.
* Fallback to tensor.reshape
* Splits some test cases up and adds an attribute to control the old
pattern (so new corners can be tested in strict mode in isolation).
* Dynamic inferred mode needs upstream work to generalize expand_shape
(so that case is suppressed here).
* Deletes the assert from the existing tensor.reshape lowering if strict
shape mode is enabled (since the condition it is dynamically asserting
cannot happen).
This commit adds the OnnxToTorch support for ReduceSumSquare ops.
---------
Co-authored-by: Ubuntu <archana@archana-cpu.judsoscro3wupi0qm4bjlj5m3b.bx.internal.cloudapp.net>
While playing with TorchDynamo on ResNet18. I notice following issues:
- `prims.convert_element_type` can’t be canonicalized even if the input
and the output share the same type
- `aten.max_pool2d_with_indices` is always used instead of
`aten.max_pool2d`, even if the second returned output (indices) has no
user
This PR fixes above issues by adding a folder to the
PrimsConvertElementTypeOp and a canonicalizer to the
AtenMaxPool2dWithIndicesOp
Lit test:
`cmake --build build --target check-torch-mlir-all`
---------
Co-authored-by: Ze Zhang <ze.zhang@getcruise.com>
This is probably a decent PR for learning about blocks and regions.
If you're here to learn about that, consider also looking at
lib/Conversion/TorchToSCF/TorchToSCF.cpp
While this doesn't include an e2e test, it is tested downstream in
https://github.com/nod-ai/SHARK-TestSuite/blob/main/e2eshark/onnx/operators/If/model.py
---------
Co-authored-by: Xida Ren <xida.ren.dev@gmail.com>
I spent a little while debugging numerics issues with some tests similar
to the ones in quantized_models.py, only to find that pytorch's
quantized conv transpose is catastrophically inaccurate. I'll upstream
the issue and only leave the tests here which are of the form quantize
-> dequantize -> op.