Change linalg.matmul_unsigned to linalg.matmul with unsigned type_fn
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
Co-authored-by: Max Dawkins <max.dawkins@gmail.com>
The static uneven divisible AdaptiveAvgPool2d means that although the
input size is not an integer multiple of ouput size, but the kernel and
stride size can also be fixed (not dynamic). The derivation logic of
kernel and stride size is consistent with
torch/_decomp/decomposations.py:adaptive_avg_pool2d as described in the
following:
1. Stride Size
Firstly , derive the start index in each reduce operation according to
the output size (`n`), `start_index = ([0, 1, ..., n - 1] * input_size)
// output_size`. For each index `k`, if `k * (input_size % output_size)
< output_size`, then the current and previous stride keeps the same as
`input_size // output_size`. So suppose `(n-1) * (input_size %
output_size) < output_size`, the stride in the whole AdaptiveAvgPool2d
process keeps static, as `input_size // output_size`.
2. Kernel Size
torch/_decomp/decomposations.py:adaptive_avg_pool2d calculates a static
kernel size when the input/output sizes satisfy either of the two
conditions, `input_size % output_size == 0` or `output_size %
(input_size % output_size) == 0`. Here if `input_size % output_size ==
0`, then the kernel size equals `input_size // output_size`, otherwise
`input_size // output_size + 1.`
Torch has all scalars represented as i64 and f64 types which results in
extraneous trunc-extf commands. We can rework this by elliding
widen-narrow cases away.
This PR adds a conversion in the TorchOnnxToTorch pass for the ONNX
Multinomial operation. It also adds a TorchToLinalg lowering for the
`aten.Multinomial` op and does a light refactor of some repeated code
that generates random floating point numbers in
`TorchToLinalg/Random.cpp`.
This PR adds support to `fx_importer.py` for handling custom ops that
return an array of tensors. As long as the length of the array is
consistent across runs (determined statically), then this patch will
work. This does not require that the number of tensors returned is
determined by the op's definition.
CC @sjain-stanford
We do have support for translating unbacked symbolic_ints that arise
from data-dependent ops like `aten.nonzero`. This PR adds the python lit
test coverage for the same.
This patch adds a few misc pad op related changes:
1. Addresses issue <https://github.com/llvm/torch-mlir/issues/3457>
2. Addresses issue <https://github.com/llvm/torch-mlir/issues/3442>
3. Fixes the padding order for asymmetrically padded onnx.Conv ops
4. Enables passing quantization through those onnx.Conv op pre-paddings
5. Modifies the torch-to-linalg lowering of AtenReplicationPad2d op to
enable support for input rank != 4
Unfortunately, even with all of these changes, the e2e tests for the
ReplicationPad2d still fail the onnx config, since the torch export
procedure for rearranging the pad order is complicated enough that the
padding ints end up not being able to fold back to constants.
The LpNormalization lowering was previously just computing the norm,
which is incorrect. This computes the norm then divides the input tensor
by it's norm.
I've tested this against some simple onnx models locally. I'll look into
adding a test case for this in an external test suite.
Register `aten.fake_quantize_per_channel_affine` and
`aten.fake_quantize_per_tensor_affine.tensor_qparams` ops
---------
Co-authored-by: Ze Zhang <ze.zhang@getcruise.com>
Fix the pad tensor rearrangement such that we change the representation
from [x1_begin, x2_begin, ..., x1_end, x2_end,...] to [xn_begin, xn_end,
...., x2_begin, x2_end, x1_begin, x1_end] where x1, x2 .. xn are the
dimensions of the pads tensor argument.
---------
Co-authored-by: zjgarvey <zjgarvey@gmail.com>
Co-authored-by: zjgarvey <47986913+zjgarvey@users.noreply.github.com>
Addresses an issue with onnx.Gather lowering to linalg:
<https://github.com/nod-ai/SHARK-Turbine/issues/242>
The builder for tensor.expand_shape, without an explicitly provided
output shape, fails to infer an output shape in the case of multiple
dynamic reassociation dims. I tried adding the output shape explicitly
for tensor.expand_shape, but ran into compilation issues later on (see
<https://github.com/iree-org/iree/issues/17760>).
This PR adds support by lowering this op to tensor.reshape when multiple
dynamic reassociation dims are provided.
Due to the custom operation parser, the print and parser were expecting
two different forms.
One having the dictionary before the value and the other after.
Following the format of the other constants ops, the constant.int will
follow the `value attr-dict` format. Updated the parser accordingly.
- Adds limited support for lowering onnx.Loop to primLoopOp
- lower in the pipeline`torch-to-scf` there is a check to see if loop is
for like. A primLoopOp is for like when the input condition is a
`trueBoolConstant`. To adapt the onnx to torch lowering to take
advantage of it, the implementation checks for specific op patterns in
the loodBody region and decides if loop is for like and uses the right
input condition op.
- to adapt the onnxLoopBody to torchLoopBody, we need to adapt the input
block arguments and set the correct output condition variable in the
loop body.
- scanOutput variables are currently not supported.
Before this PR, a statically shaped aten.convolution would generate
dynamically shaped linalg IR, and even `-canonicalize` would not be able
to fold it back into static shapes. This PR ensure that shape
calculations are folded on construction to directly generate statically
shaped linalg IR.
We achieve that by ensuring that `arith` ops involved in computing
shapes are created via `createOrFold`, so that later uses of
`getAsOpFoldResult` see constants instead of those ops.
For example
```
module {
func.func @forward(%arg0: !torch.vtensor<[32,336,112,112],f32>,
%arg1: !torch.vtensor<[336,168,3,3],f32>,
%arg2: !torch.vtensor<[336],f32>)
-> !torch.vtensor<[32,336,56,56],f32> {
%false = torch.constant.bool false
%int2 = torch.constant.int 2
%int1 = torch.constant.int 1
%0 = torch.prim.ListConstruct %int1, %int1 : (!torch.int, !torch.int) -> !torch.list<int>
%1 = torch.prim.ListConstruct %int2, %int2 : (!torch.int, !torch.int) -> !torch.list<int>
%2 = torch.prim.ListConstruct : () -> !torch.list<int>
%3 = torch.aten.convolution %arg0, %arg1, %arg2, %1, %0, %0, %false, %2, %int2
: !torch.vtensor<[32,336,112,112],f32>, !torch.vtensor<[336,168,3,3],f32>, !torch.vtensor<[336],f32>, !torch.list<int>,
!torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int
-> !torch.vtensor<[32,336,56,56],f32>
return %3 : !torch.vtensor<[32,336,56,56],f32>
}
}
```
would result in
```
[...]
%padded = tensor.pad %2 low[%14, %15, %16, %17] high[%14, %15, %16, %17] {
^bb0(%arg3: index, %arg4: index, %arg5: index, %arg6: index):
tensor.yield %cst : f32
} : tensor<32x336x112x112xf32> to tensor<?x?x?x?xf32>
[...]
%45 = linalg.conv_2d_ngchw_gfchw {dilations = dense<1> : vector<2xi64>, strides = dense<2> : vector<2xi64>}
ins(%expanded, %expanded_37 : tensor<?x2x?x?x?xf32>, tensor<2x168x168x3x3xf32>)
outs(%expanded_44 : tensor<32x2x168x?x?xf32>) -> tensor<32x2x168x?x?xf32>
[...]
```
and with this PR all shapes are static.
This adds a torchvision op to torch-mlir and a path from onnx.DeformConv
to torchvision.deform_conv2d.
I'm not implementing the torch->linalg lowering for the torchvision op
yet, but posting this PR to get feedback on some of the choices being
made here and to flesh out the onnx frontend a bit.
This adds an onnx->torch conversion for onnx.RoiAlign into
torchvision.roi_align or torchvision.roi_pool, and adds those two
torchvision ops to torch-mlir.
In one of our downstreams, we encountered an internal assertion failure
in an intermediate pass from `AtenTensorOp::fold` invocation:
```
external/llvm-project/llvm/include/llvm/Support/Casting.h:650: decltype(auto) llvm::dyn_cast(const From &) [To = mlir::torch::Torch::NonValueTensorType, From = mlir::Type]: Assertion `detail::isPresent(Val) && "dyn_cast on a non-existent value"' failed.
```
for this snippet in the IR:
```
%arg1: !torch.tensor {torch.type_bound = !torch.vtensor<[1,1,15360],f32>}
...
%218 = torch.aten.size %arg1 : !torch.tensor -> !torch.list<int>
%219 = torch.aten.tensor %218, %none, %none, %false : !torch.list<int>, !torch.none, !torch.none, !torch.bool -> !torch.tensor
```
Turns out this was
[fixed](https://github.com/llvm/torch-mlir/pull/3189/files#diff-dc8ed165c207918e606490eee3984b1ad51d7034e6aac36fc046bf47f6f03f4fR3719)
eventually (and we were on an old hash of torch-mlir). This PR submits
just the lit test for test coverage on that specific change:
```c++
OpFoldResult AtenTensorOp::fold(FoldAdaptor adaptor) {
auto resultTy = dyn_cast<ValueTensorType>(getType());
// lit test this
if (!resultTy || !resultTy.hasSizes() || !resultTy.hasDtype())
return nullptr;
...
```
1. truncates zero-points to i32
2. modifies the default accumulator type for i8 from i64 to i32.
3. now uses the input dtype to infer accumulator dtype.
Resolves#3384.
Many ONNX operators are defined by functions and therefore could be
expanded into simpler ONNX operations during importing, avoiding the
need for tools downstream to support these operators directly.
This commit adds this capability to onnx_importer.py. When importing a
node, the schema for the node's operator is retrieved. If the schema
provides a function for the operator, a specialized version for the
node's types and attributes will be created and imported as an MLIR
function with private visibility. An MLIR function call will then be
emitted, instead of a normal operator node. Caching is used to avoid
generating redundant functions within the same module.
In order to avoid a disruptive change to the importer output for a
large number of operators that already have TorchOnnxToTorch support,
an allowlist strategy is used by default. With this commit, only one
operator is allowlisted for expansion, MeanVarianceNormalization.
However, many other operators can be correctly expanded by the current
code, so hopefully the allowlist can be gradually extended. It is
possible to disable the allowlist in the configuration, in which case
all functions are expanded (useful for testing).
Tools downstream of the importer may now need to do inlining when
consuming the output of the importer, e.g.:
cat imported.mlir | torch-mlir-opt --inline --convert-onnx-to-torch
Explanations for subtle code changes:
- Looking up the correct schema and function for an operator requires
knowing the opset version. NodeImporter retrieves this from the
opset imports on the ModelProto retained by the GraphInfo. Previously,
the model_proto field on GraphInfo was None when importing a subgraph
in import_regions, but this conflicts with the new need for opset
version info. Since the apparent purpose of setting it to None was to
control how GraphInfo generates its input map, a new flag is added to
GraphInfo (is_subgraph) to control this behavior, so that the actual
ModelProto can now be provided without breaking this. This also turned
out to be useful for getting the Config via ModelInfo via GraphInfo.
- Some operators' functions are context-dependent, which means the
function definition depends on the types of the inputs. Therefore node
importing now needs to look up the types of a node's inputs, not just
its outputs as was the case previously. Consequently the operand to
find_type_proto_for_name() may now be a graph input or initializer in
some cases, so it has to be updated.
This implements the Onnx.NegativeLogLikelihoodLoss op using the
signature provided
[here](https://onnx.ai/onnx/operators/onnx__NegativeLogLikelihoodLoss.html)
by replacing it with a `NLLLossForward` op.
Additionally, I included a helper function `get_loss_reduction_enum` to
convert from a string `reduction` parameter to the corresponding
intended integer value since this is an operation that will be reused
for any loss function module. This differs from `get_reduction_enum` in
`TorchUpstream.cpp` which handles the `reduce` parameter from
`scatter_reduce` type operations.