Enabled mask and is_causal parameters for torch.aten.scaled_dot_product
attention + relevant comments + tests.
The tests added highlight the new capabilities introduced in this PR,
including:
Attention with F16 mask
Attention with Boolean mask
Causal attention with same Q K V shapes
Causal attention without Q K V shapes
Made sure that one cannot input both mask and is_causal.
The lowering pattern for `aten.T` uses transposition implemented via
`linalg.generic`. For downstream passes it is advantageous to use named
ops wherever possible, so this patch changes the lowering to use
`linalg.transpose` instead.
This PR add `floordiv` to the `PY_BUILTIN_TO_TORCH_OP`. For
`aten.mul.int` and `aten.floordiv.int` ops, we add new Canonicalization
Patterns as follow:
```
%1 = torch.aten.mul.int %input, %const-5
%2 = torch.aten.mul.int %1, %const-6
```
Will be replaced by
`torch.aten.mul.int %input, %const-30`
And
```
%1 = torch.aten.mul.int %input, %const-5
%2 = torch.aten.floordiv.int %1, %const-5
```
Will directly return `%input`
This PR also relaxes the `float` type constraint in TorchToTosa for the
`AtenRsubScalarOp` conversion.
To test:
`cmake --build build --target check-torch-mlir-all`
Supports the result with dynamic shape and scalar indices like
```
func.func @test_gather_scalar(%arg0: !torch.vtensor<[3,4,5],f32>, %arg1: !torch.vtensor<[], si64>) -> !torch.vtensor<[?,?],f32> attributes {torch.onnx_meta.opset_version = 13 : si64} {
%0 = torch.operator "onnx.Gather"(%arg0, %arg1) {torch.onnx.axis = 0 : si64} : (!torch.vtensor<[3,4,5],f32>, !torch.vtensor<[], si64>) -> !torch.vtensor<[?,?],f32>
return %0 : !torch.vtensor<[?,?],f32>
}
```
`Torch::AtenSqueezeOp` is referring to the result shape, so it will
failed on lowering if the result shape is dynamic.
The current implementation uses a `linalg.generic` to broadcast the bias
tensor for the lowering of convolutions. This is suboptimal for later
pattern matching. This patch changes it to use the respective named op,
`linalg.broadcast`, instead.
The `axis` attribute is optionally available. Added support by computing
the pad based on the axis values.
---------
Signed-off-by: Rob Suderman <rob.suderman@gmail.com>
- This PR adds new (and equivalent) more tensorized impl of
MelWeightMatrix which lowers all the way to linalg.
- [Ref Pytorch
Impl](https://gist.github.com/PhaneeshB/4e6dfcded3007b1b686fbe28f07a67cd)
- Thanks to @rsuderman for pointing out the difficulties [earlier
impl](#3503) posed during lowering to linalg and also for providing a
better numpy impl 🙏
Set PyTorch and TorchVision version to nightly release 2024-08-18.
This commit also updates the `scaled_dot_product_attention` op.
A new attribute `enable_gqa` has been added. As of now, only the
default value for the same is supported.
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
This commit extends the OnnxToTorch lowering for BatchNormalization op
for supporting the case when training=True.
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
The `layout` attribute was not considered for the `onnx.RNN` operation.
Added support for the attribute to transpose the inputs / outputs of the
RNN when valid.
Closes#3575
The PyTorch remainder operator is meant to compute the Python modulus
operator entrywise:
https://pytorch.org/docs/stable/generated/torch.remainder.html#torch.remainder
In python the modulus operator is meant to always return a result with
the same sign as the divisor:
https://docs.python.org/3/reference/expressions.html#binary-arithmetic-operations
In other words, torch.aten.remainder should return a Python-style
modulus instead of a C-style modulus. However the remainder operator was
simply translated into arith.ModSI or arith.ModF, which both effectively
compute the C-style modulus. Now the lowering has been modified so that
the modulus operator works properly with negative numbers, both in the
dividend, and the divisor.
This patch adds basic support for lowering graphs with per-channel
quantization. Per-channel quantized ops have to be excluded from
`FuseQuantizedOps` for now but can be used in QDQ quantized form.
Using this patch, we're able to import and execute (on the linalg
backend) graphs with per-channel quantization applied using the "new"
PyTorch 2.0 Export Quantization.
The saga of aligning onnx and torch padding conventions continues.
```python
onnx_pads = [low_x, low_y, low_z, high_x, high_y, high_z]
torch_pads = [low_z, high_z, low_y, high_y, low_x, high_x]
```
So not only is the lexicographical ordering hierarchy swapped (low/high
x spatial-dim -> spatial-dim x low/high) but the ordering in the the
spatial-dim specification is also reversed.
This patch properly reverses the pad ordering (and actually uses the
`shuffledPadding` to pad).
`onnx.Shape` can select only a subset of indices using attributes. Add
support for these attributes.
---------
Co-authored-by: zjgarvey <47986913+zjgarvey@users.noreply.github.com>
Following up from the discussion in
<https://github.com/llvm/torch-mlir/pull/3550>, I've edited the lowering
to prevent OOB extracts in a more direct fashion (i.e., just clamping
directly).
I don't think this affects the lit tests at all, but I've tested the
changes in our external test suite at
<https://github.com/nod-ai/SHARK-TestSuite/tree/main/>. I found the
issue when I was unexpectedly getting `nan`'s along the output image
border for a resize test there.
Change linalg.matmul_unsigned to linalg.matmul with unsigned type_fn
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
Co-authored-by: Max Dawkins <max.dawkins@gmail.com>