Linalg has quantized specific operations. We can lower to these
operations when there is a known zeropoint and scale operations. This
allows the `convolution` to occur with lower bitwidth's, improving the
overall performance.
We were seeing some assertion failures after some checks around folders
were tightened up in LLVM:
https://github.com/llvm/llvm-project/pull/75887 . This PR essentially
moves the logic that used to be applied at the LLVM level into the
folder, which seems to be the suggested fix.
I'm not sure if the IR that caused issues for us _should_ be valid?
```
%1 = torch.aten.detach %arg0 : !torch.tensor<[1],f32> -> !torch.tensor
```
A better fix might be to create a verifier ensuring the result of
`aten.detach` has the same type as its operand.
---------
Co-authored-by: aaron-stgeorge <aaron.stgeorge@getcruise.com>
Torch does not have an equivalent matmul operation for integers. Instead
it sidechannels the information via its quantized types. For this
lowering we setup these sidechannels then invoke `torch.mm`.
This preserves sparsity at the most obvious places of lowering TORCH
tensors to MLIR RankedTensorType tensors. Other places are marked for
audit. With some initial lowering tests.
This adds an encoding field to the torch type, using the interfaces for
printing, parsing, and verification. Note that although this change
prepares adding sparsity to the torch type (as illustrated by the round
trip and invalid tests), nothing in this change depends on the actual
contents of the encoding field!
This includes custom op matching for decomposed operations and fusing
dequantization into dense operations. As a validation we compare
to the dequant+mm torch implementation.
We can plumb the linear matmul into pytorch using its quantized types
with side channel information. To handle the final int8 operation we
dequantize and requantize.
This commit adds mapping from `onnx.pad` op to `torch.pad` op. Currently
it does not support `axes` parameter of `onnx.pad` op.
Signed-off-by: Gaurav Shukla <gaurav.shukla@amd.com>
Currently transposed convolution is not handled correctly by
`TorchToTosa`. This PR allows transposed convolutions to pass through
the conversion so that they can be handled by other conversion passes
later in a pipeline.
An example input which produces a compilation error is:
```
func.func @forward(%input: !torch.vtensor<[1,64,1,100],f32>) -> !torch.vtensor<[1,64,2,200],f32> {
%true = torch.constant.bool true
%int1 = torch.constant.int 1
%int2 = torch.constant.int 2
%weight = torch.vtensor.literal(dense<0.0> : tensor<64x64x3x3xf32>) : !torch.vtensor<[64,64,3,3],f32>
%bias = torch.vtensor.literal(dense<0.0> : tensor<64xf32>) : !torch.vtensor<[64],f32>
%stride = torch.prim.ListConstruct %int2, %int2 : (!torch.int, !torch.int) -> !torch.list<int>
%int1x1 = torch.prim.ListConstruct %int1, %int1 : (!torch.int, !torch.int) -> !torch.list<int>
%output = torch.aten.convolution %input, %weight, %bias, %stride, %int1x1, %int1x1, %true, %int1x1, %int1 : !torch.vtensor<[1,64,1,100],f32>, !torch.vtensor<[64,64,3,3],f32>, !torch.vtensor<[64],f32>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,64,2,200],f32>
return %output : !torch.vtensor<[1,64,2,200],f32>
}
```
This MLIR produces an error about a cast operation with a size mismatch
when passed through `torch-to-tosa`:
```
error: 'tensor.cast' op operand type 'tensor<1x64x1x50xf32>' and result type 'tensor<1x64x2x200xf32>' are cast incompatible
```
---------
Co-authored-by: Srinath Avadhanula <srinath.avadhanula@getcruise.com>
We can make the per-tensor version of the operation to the dequantize
operation via marking with the make quantized tensor component. This
introductions the `qint*` and `quint*` tensor type that can be lowered
to teh appropriate dequantization behavior during the torch-to-linalg
conversion.
We can map the per_tensor case to the `torch.aten.quantize_per_linear`
operation. In this case we extract the `scale` and `zeropoint` values
and directly invoke the quantization, then return the integer
representation value.
Implemented ONNX.Range. The spec says the data type for start, limit,
delta are 0-D can be double, float, int16, int32, int64, All int types
mapped to !torch.int and all float types mapped to !torch.float
---------
Co-authored-by: Kumar Deepak <kumar@xilinx.com>
Handles the multiple cases of `onnx` constant values and converts them
to `torch` literal tensors. This can include splats with a single
integer or floating point value, a set of explicit integer values, or
an elements array attr of values.
This PR updates the torch-to-tosa conversion with following changes:
- Support torch.none as min/max input argument for tosa.clamp op
- Support negative value as start index for tosa.slice op
- Add tosa.logical_or lowering support
e2e test:
python -m e2e_testing.main --config=tosa
LIT tests:
cmake --build build --target tools/torch-mlir/all
---------
Co-authored-by: Ze Zhang <ze.zhang@getcruise.com>