Register `aten.fake_quantize_per_channel_affine` and
`aten.fake_quantize_per_tensor_affine.tensor_qparams` ops
---------
Co-authored-by: Ze Zhang <ze.zhang@getcruise.com>
Fix the pad tensor rearrangement such that we change the representation
from [x1_begin, x2_begin, ..., x1_end, x2_end,...] to [xn_begin, xn_end,
...., x2_begin, x2_end, x1_begin, x1_end] where x1, x2 .. xn are the
dimensions of the pads tensor argument.
---------
Co-authored-by: zjgarvey <zjgarvey@gmail.com>
Co-authored-by: zjgarvey <47986913+zjgarvey@users.noreply.github.com>
This commit adds the support for new data types: uint4, and int4 and
uint8 tensor protos. Also, it moves some tests from failing to crashing.
Fixes https://github.com/llvm/torch-mlir/issues/3507
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
Addresses an issue with onnx.Gather lowering to linalg:
<https://github.com/nod-ai/SHARK-Turbine/issues/242>
The builder for tensor.expand_shape, without an explicitly provided
output shape, fails to infer an output shape in the case of multiple
dynamic reassociation dims. I tried adding the output shape explicitly
for tensor.expand_shape, but ran into compilation issues later on (see
<https://github.com/iree-org/iree/issues/17760>).
This PR adds support by lowering this op to tensor.reshape when multiple
dynamic reassociation dims are provided.
This bump triggered an upstream assert. Includes a WAR for #3506.
Also includes several things I needed to do to repro:
* When TORCH_MLIR_TEST_CONCURRENCY=1, test runs will be printed.
* Added TORCH_MLIR_TEST_VERBOSE=1 handling to enable verbose mode
(useful on CI).
---------
Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>
The `index_put` operation, `input[indices] = values`, allows for the
values to be any shape that is broadcastable to the slice
`input[indices]`. This commit adds broadcasting support to the Linalg
lowering of `IndexPutHackedTwinOp`.
Fixes: #3465
This adds support for a few ops:
- torch.linalg_det
- torch._linalg_det (if the LU and pivot returns are unused)
- onnx.Det
An scf loop is used, since the row reduction algorithm applied here has
some loop-carried dependencies.
The current support being added here is very basic, and only works if no
permutations are required during row reduction, and assumes the matrices
are non-singular.
This adds a torchvision op to torch-mlir and a path from onnx.DeformConv
to torchvision.deform_conv2d.
I'm not implementing the torch->linalg lowering for the torchvision op
yet, but posting this PR to get feedback on some of the choices being
made here and to flesh out the onnx frontend a bit.
This adds an onnx->torch conversion for onnx.RoiAlign into
torchvision.roi_align or torchvision.roi_pool, and adds those two
torchvision ops to torch-mlir.
Add a new op with shape/dtypes and decompose into
`fake_quantize_per_tensor_affine` when the second result is unused.
The xfail_set change is on ONNX because torch cannot export this op to
ONNX.
Resolves#3384.
Many ONNX operators are defined by functions and therefore could be
expanded into simpler ONNX operations during importing, avoiding the
need for tools downstream to support these operators directly.
This commit adds this capability to onnx_importer.py. When importing a
node, the schema for the node's operator is retrieved. If the schema
provides a function for the operator, a specialized version for the
node's types and attributes will be created and imported as an MLIR
function with private visibility. An MLIR function call will then be
emitted, instead of a normal operator node. Caching is used to avoid
generating redundant functions within the same module.
In order to avoid a disruptive change to the importer output for a
large number of operators that already have TorchOnnxToTorch support,
an allowlist strategy is used by default. With this commit, only one
operator is allowlisted for expansion, MeanVarianceNormalization.
However, many other operators can be correctly expanded by the current
code, so hopefully the allowlist can be gradually extended. It is
possible to disable the allowlist in the configuration, in which case
all functions are expanded (useful for testing).
Tools downstream of the importer may now need to do inlining when
consuming the output of the importer, e.g.:
cat imported.mlir | torch-mlir-opt --inline --convert-onnx-to-torch
Explanations for subtle code changes:
- Looking up the correct schema and function for an operator requires
knowing the opset version. NodeImporter retrieves this from the
opset imports on the ModelProto retained by the GraphInfo. Previously,
the model_proto field on GraphInfo was None when importing a subgraph
in import_regions, but this conflicts with the new need for opset
version info. Since the apparent purpose of setting it to None was to
control how GraphInfo generates its input map, a new flag is added to
GraphInfo (is_subgraph) to control this behavior, so that the actual
ModelProto can now be provided without breaking this. This also turned
out to be useful for getting the Config via ModelInfo via GraphInfo.
- Some operators' functions are context-dependent, which means the
function definition depends on the types of the inputs. Therefore node
importing now needs to look up the types of a node's inputs, not just
its outputs as was the case previously. Consequently the operand to
find_type_proto_for_name() may now be a graph input or initializer in
some cases, so it has to be updated.
Issues was found here https://github.com/nod-ai/SHARK-Turbine/issues/643
- [ONNX] Fix padding attributes for onnx.AveragePool
- [Linalg] Add countIncludePad false support for AtenAvgPool1/2dOp
- [Linalg] Add an avg_pool2d countIncludePad False e2e tests
- [Linalg] Fix conflict with AtenAvgPool3dOp
- [Linalg] Fix e2e crash with AtenAvgPool1dOp
- [Linalg] Add dynamic dim support for AtenAvgPool2dOp
- [Linalg] Fix AvgPool2dDivisorOverrideModule crash
This commit also adds the Torch declaration for aten.max_unpool2d and
aten.max_unpool3d op. The TorchToLinalg lowering for the same will be
added in a follow-up commit.
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
This addresses 7 of the model failures I'm seeing in the test suite. See
[Shark-Turbine issue
#566](https://github.com/nod-ai/SHARK-Turbine/issues/566).
Need the op ```linalg.conv_2d_ngchw_gfchw_q``` to be added upstream
before merging this. See [llvm-project PR #92136
](https://github.com/llvm/llvm-project/pull/92136).
A small additional expansion to operand quantization is included in this
patch to address a model failure that occurs when unblocking the
quantized group convolutions in one of these onnx models.
Updates:
- some unsupported modes are now going to report a match failure for
unsupported coordinate transformation modes.
- fixes a bug that was introduced in the last patch for resize (my
bad...)
- uses actual x and y coordinates for computing weights in bilinear
interpolation (rather than eps modified values)
- slightly simplifies the bilinear interpolation payload for readability
and performance
- passes coordinate transformation mode information from an onnx.Resize
op to the mode string for the aten._interpolate op. This allows us to
perform custom logic in the torch->linalg lowering to support
onnx.Resize options without losing the default behaviors of the
interpolate op.
This commit fixes the onnx.MaxPool op lowering which was lacking the
indices result support.
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
* not to decompose `aten.amax` on `stablehlo` backend. Because it could
be lowering to `stablehlo.reduce` directly.
* lowering `aten.max.dim` to `stablehlo.reduce apply max` when
`AtenMaxDimOp.getIndices()` doesn't have users. It's more simple.
…cation and sparse tensors.
**NOTE**: This PR _doges_ the issue in buffer-deallocation pass instead
of resolving it. In the future, we need to fix the bug in
buffer-deallocation pass when handling code generated by sparse
compiler.
While playing with TorchDynamo on ResNet18. I notice following issues:
- `prims.convert_element_type` can’t be canonicalized even if the input
and the output share the same type
- `aten.max_pool2d_with_indices` is always used instead of
`aten.max_pool2d`, even if the second returned output (indices) has no
user
This PR fixes above issues by adding a folder to the
PrimsConvertElementTypeOp and a canonicalizer to the
AtenMaxPool2dWithIndicesOp
Lit test:
`cmake --build build --target check-torch-mlir-all`
---------
Co-authored-by: Ze Zhang <ze.zhang@getcruise.com>
This is probably a decent PR for learning about blocks and regions.
If you're here to learn about that, consider also looking at
lib/Conversion/TorchToSCF/TorchToSCF.cpp
While this doesn't include an e2e test, it is tested downstream in
https://github.com/nod-ai/SHARK-TestSuite/blob/main/e2eshark/onnx/operators/If/model.py
---------
Co-authored-by: Xida Ren <xida.ren.dev@gmail.com>
I spent a little while debugging numerics issues with some tests similar
to the ones in quantized_models.py, only to find that pytorch's
quantized conv transpose is catastrophically inaccurate. I'll upstream
the issue and only leave the tests here which are of the form quantize
-> dequantize -> op.
For some sparse programs (and I am sure other not-seen corner cases for
dense), some passes were missing in the reference pipeline, eventually
resulting in e.g. a unresolved unrealized cast issue. This PR adds some
very obvious missing passes to avoid this situation.
This is a large change because prior to this point, Python files in the
project were not consistently formatted. This reformats them all with
black defaults.
Based on experience with prior projects, if you have a dev/long-term
branch with Python patches, you can minimize merge conflicts prior to
rebasing to include this commit by running `black` on your modified
Python files, squashing, and then rebasing/merging.
This is part 1 of ~3, formatting all miscellaneous text files and CPP files matched by a first run of pre-commit. These tend to be low change-traffic and are likely not disruptive.
Subsequent patches will format Python files and remaining CPP files.
A choice was made to quantize the return type of Relu with a scale and
zero point copied from the input's quantization scheme. With this
choice, the torch-to-linalg conversion of quantized Relu essentially
computes max(input, zeroPoint) in the elementwise payload.
* promote input to output element-type when lowering to stablehlo, so
that it could satisfy stablehlo's type constraints.
* split promote-to-fp unary ops from fp-only unary ops.
This commit also cleans up the OnnxToTorch lowering for the Squeeze and
Unsqueeze op and adds the support for handling edge cases.
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
Version number was set too high. Lowered to support more cases allows
more tests to pass.
Co-authored-by: Robert Suderman <rsuderman@Roberts-MacBook-Pro.local>
Previous implementation erroneously mixed up num_outputs with
slice_size. New version correctly computs the slice size and directly
performs slicing rather than leveraging `aten.split.tensor`. This is due
to `onnx` supporting a fixed number of splits making the size
computation more easily computeable when lowering to `aten` rather than
deferring to `aten.split.tensor`.
---------
Co-authored-by: Robert Suderman <rsuderman@Roberts-MacBook-Pro.local>
We can map to `tensor.reshape` for handling multiple output dynamic
shapes. Later we can perform a more complex analysis for indentifying
expand/collapse cases from the tensor.reshape.
Initially we planned to handle this identification at the `torch` level
however it will be easier to handle once converted to core
mlir-dialects.
Decomposition RepeatInterleaveSelfInt with following ops:
```python
def my_repeat_interleave(input, repeats, dim=None):
if dim is None:
# Flatten the input and then repeat
return input.flatten().unsqueeze(-1).tile((1, repeats)).flatten()
else:
# Calculate the shape after repeat
expanded_shape = list(input.shape)
expanded_shape[dim] *= repeats
# Repeat the tensor along the specified dimension
repeat_shape = [1] * (input.dim() + 1)
repeat_shape[dim + 1] = repeats
input = input.unsqueeze(-1)
# Tile and then reshape
tiled = torch.tile(input, repeat_shape)
# Rearrange and reshape
repeated = tiled.reshape(*expanded_shape)
return repeated
```
I passed the tests of stablehlo and linalg. When testing onnx, strange
things happened.
In torch-mlir's CI **torch_nightly** and my own
environment(torch==2.4.0.dev20240318+cpu), it can **pass the pass**.
In torch-mlir's CI **torch_stable**, it **failed**.
The test case is `RepeatInterleaveSelfIntNoDimModule_basic`, the result
shape should be [120].
```python
class RepeatInterleaveSelfIntNoDimModule(torch.nn.Module):
def __init__(self):
super().__init__()
@export
@annotate_args([
None,
([3, 4, 5], torch.float32, True),
])
def forward(self, x):
return x.repeat_interleave(2)
@register_test_case(module_factory=lambda: RepeatInterleaveSelfIntNoDimModule())
def RepeatInterleaveSelfIntNoDimModule_basic(module, tu: TestUtils):
module.forward(tu.rand(3, 4, 5))
```
The error log is as follows:
```
Unexpected outcome summary: (onnx)
****** Failed tests - 1 tests
FAIL - "RepeatInterleaveSelfIntNoDimModule_basic"
@ trace item #0 - call to "forward"
@ output of call to "forward"
ERROR: shape (torch.Size([6, 4, 5])) is not equal to golden shape (torch.Size([120]))
```
@rsuderman
Would you please help me check what's wrong with my PR? Thanks a lot.
Align corner modes which select what the corners mean.
Either the center of the corner points or the edges of the edge points.
---------
Co-authored-by: Rob Suderman <rob.suderman@gmail.com>
The new cases added for quantized matmuls are:
1. vec-vec
2. vec-mat
3. mat-vec
each of which are now lowered to expand(s), quantized_matmul, and
collapse.
1. onnx.MatMulInteger now converts to aten.matmul instead of aten.mm
2. aten.matmul, for ranks >=2, now allows quantized inputs and will
lower to linalg::quantized_matmul or linalg::quantized_batch_matmul.
3. added AtenMatmulOp to the FuseQuantizeOps rewrite patters
QuantizeOperands, QuantizeTransposedOperands, and QuantizeAccumulator
4. added several tests, including some to test AtenMmOp with varying
quantization signed-ness.
5. a quantized matmul mat-vec test is added to verify the failure to
lower to linalg; cleaned of out-of-date code related to common
torch-mlir lowering xfails.
6. in debugging a real model with quantized matmuls, I found a bug on
the scalarize-shapes pass which resulted from the aten.full op folder
returning an incompatible result type. This is fixed by the small change
here to
[lib/Dialect/Torch/IR/TorchOps.cpp](https://github.com/llvm/torch-mlir/compare/main...zjgarvey:torch-mlir:MatMulIntegerFix?expand=1#diff-dc8ed165c207918e606490eee3984b1ad51d7034e6aac36fc046bf47f6f03f4f).
- Added linalg lowering for `AtenFloorDivideScalarOp`
- Needed `AtenDivScalarModeOp` for the decomp.
- Added linalg lowering for `AtenDivScalarModeOp`
- Moved linalg payload logic to `createDivModePayload()` since the logic
was nearly identical for both `AtenDivScalarModeOp` and
`AtenDivTensorModeOp`. Just a template function
- Added `AtenDivScalarModeOp` lowering for stablehlo
Pytorch's
[`torch.floor_divide()`](https://pytorch.org/docs/stable/generated/torch.floor_divide.html)
in a previous version (for a reason unknown to me) preformed a
truncation instead of "floor". The already implemented op
`AtenFloorDivideTensorOp` was done before this change. However, this
wasn't caught because our testcases only tested positive floor division.
I changed this to floor as well as adding a few test cases.
If there is only a single value scattered there can be an implicit batch
dimension. This includes a check for the implicit batch dimension when
reshaping the update tensor. It includes an e2e test to verify
correctness.
Fix the case PrimListUnpackOp's result num is not equal to PrimList
length.
See the following example:
```python
def forward(self, x):
if len(x.shape) == 5:
b0, t, c0, h0, w0 = x.shape
b, c, h, w = torch.mul(b0, t), c0, h0, w0
else:
b1, c1, h1, w1 = x.shape
b, c, h, w = b1, c1, h1, w1
res = torch.reshape(x, [b, c, h, w])
return res
```
Without this fix, the following error message will occur:
```
/root/torch-mlir/externals/llvm-project/mlir/lib/IR/PatternMatch.cpp:118: virtual void mlir::RewriterBase::replaceOp(mlir::Operation *, mlir::ValueRange): Assertion `op->getNumResults() == newValues.size() && "incorrect # of replacement values"' failed.
```
Previously, it could only handle the situations where outputsize == (1,
1) or outputsize == (input_H, input_W). Now it supports all situations
where input_H % output_H== 0 && input_W % output_W == 0
1. Changes the linalg lowering for dequantization ops to always sign
cast to float to prevent misrepresenting uint32 overflow on subtraction
with zero point.
2. Adds a basic quantized model test which only quantizes and
dequantizes and now passes with these changes in linalg and onnx
configs.
3. Changes the aten.mm lowering to allow mismatched quantized types.
4. If a quantized matmul arg is uint8, we shift by 128 to faithfully
represent the quantization as a signed i8 quantization. This worked fine
in the AtenMmOp lowering, but I'd be happy to move it to a rewrite in
FuseQuantizedOps.cpp instead if that seems more appropriate.
With the changes 3 and 4, the QuantizedMLP_basic and
QuantizedSingleLayer_basic e2e tests now passes with the onnx config.
They replaced all `c10::optional` usages with `std::optional` in
torchgen'd code in
fb90b4d4b2
causing the LTC build to break.
Replacing all usages of `c10::optional` with `std::optional` in
`projects/ltc` has fixed the issue
Issue: #3120