Commit Graph

80 Commits (6aa481c20437cb3e985ad0a80327b6bff8a671c4)

Author SHA1 Message Date
penguin_wwy 7616d637fd
Add stateless fx graph import (#3036) 2024-03-21 14:44:54 -07:00
penguin_wwy f34c187ac4
Normalize type hints to be compatible with multiple Python versions (#3028)
Although we provide a wheel package for Python 3.8, it may actually
throw the following exception:
`TypeError: 'type' object is not subscriptable`
2024-03-15 08:29:48 -07:00
Yuanqiang Liu 4282eb9e76
[Torch Dialect] support aten.fake_quantize_per_tensor_affine (#3014) 2024-03-15 08:53:29 +08:00
Yuanqiang Liu 870e63bc3c
[Torch Dialect] support decomposition of aten.linspace (#3006) 2024-03-14 08:28:33 +08:00
Yuanqiang Liu 43c6996a31
[Torch Dialect] add folder for aten.ceil and unify patterns of ceil, … (#3010)
…floor, round
2024-03-14 07:41:58 +08:00
ptrifunovic98 524ff99216
Implement lowering of torch.aten.linalg_cross (#2986)
Closes
[nod-ai/SHARK-Turbine#497](https://github.com/nod-ai/SHARK-Turbine/issues/497)
2024-03-13 12:17:22 -07:00
Yuanqiang Liu 229ca3a9e1
[Torch Dialect] emit aten::mul and add folder (#3007) 2024-03-11 19:59:34 +08:00
Yuanqiang Liu a3fe130f73
[Torch Dialect] emit aten::warn (#3003)
* torch-mlir may not handle `aten.warn`. But it could be handled by
custom users' backend which involves torch-mlir.
2024-03-10 08:29:08 +08:00
Rob Suderman 0723584936
[torch] Add folder for torch.aten.*.Scalar comparisons (#3000)
This folds small version of the tensor-scalar comparison operators as
they are commonly used for shape computations. This includes le, lt, ge,
gt, eq, and ne.
2024-03-08 13:44:00 -08:00
Ze Zhang aa7c9a9653
e2e support aten.linalg_norm to aten.linalg_vector_norm (#2953)
Add e2d support for `aten.linalg_norm` by decompose it to
`aten.linalg_vector_norm`.

Lowering to `aten.linalg_matrix_norm` is still unsupported.

To Test: 

`python -m e2e_testing.main -v`

---------

Co-authored-by: Ze Zhang <ze.zhang@getcruise.com>
2024-03-05 16:31:01 -08:00
Rob Suderman bc0527676b
[torch] Add support for `torch.split_with_sizes` via decompose (#2979)
Convert to individiual slices and tuple together as a list.

---------

Co-authored-by: Scott Todd <scott.todd0@gmail.com>
2024-03-05 15:01:21 -08:00
Rob Suderman 19d4888278
[torch] Make torch.aten.unflatten lower directly to linalg (#2971)
Existing lowering via aten.view does not work as well for dynamic shapes
as the lowering to tensor.expand must re-infer dynamic shape matching.
Better to directly lower.
2024-03-04 10:17:42 -08:00
Rob Suderman 61f0a5facf
[torch] Add an `aten.cat` length-0 canonicalization (#2966)
If an input is length-0 along the dimension of canonicalization we can
remove the tensor from the list
2024-03-01 21:41:12 -08:00
mmakevic 76b81e0ccd
Implement lowering of torch.aten.fmod.Tensor (#2767)
Closing https://github.com/nod-ai/SHARK-Turbine/issues/351
2024-02-29 11:22:03 +05:30
Rob Suderman 6f3d62ab04
[torch] Fix folders and `cat` and `view` torch lowerings (#2963)
A bunch of small fixes are interlinked and trigger crashes if not
addressed as a group. This includes:

- aten view when expand from a rank-0 tensor
- slice folder with negative indices
- `aten._shape_as_tensor` folder on a rank-0 tensor
- `aten.cat` of a tensor with a length-0 tensor
2024-02-28 12:04:52 -08:00
Vivek Khandelwal d81747eadb
[MLIR][TORCH] Extend support for OnnxToLinalg lowering for Dropout and Div op (#2938)
Fixes https://github.com/nod-ai/SHARK-Turbine/issues/451,
https://github.com/nod-ai/SHARK-Turbine/issues/452
2024-02-27 11:02:05 +05:30
ptrifunovic98 c5a1da1910
Implement lowering of torch.aten.norm.Scalar (#2899)
Closes
[nod-ai/SHARK-Turbine#365](https://github.com/nod-ai/SHARK-Turbine/issues/365)
2024-02-26 08:46:56 -08:00
Andreas Falkenberg 55dc8deb92
[torch] GridSample TorchToLinalg lowering (#2883)
Lowers `torch.grid_sample` to the equilvalent `linalg` representation.
2024-02-23 09:14:38 -08:00
Aart Bik 534b266f2d
[torch-mlir][NFC] remove trailing whitespace (#2936) 2024-02-20 11:23:14 -08:00
Rob Suderman 135c81a416
[torch] Add folder for `prim.NumToTensor.Scalar` (#2921)
Useful for `slice` lowerings that depend on tensors made form scalars.
2024-02-19 11:55:54 -08:00
Rob Suderman e80054a3cc
[torch] Folders for `torch.aten.*.tensor` operators [add, sub, mul] (#2878)
Simple folder for limited size aten tensor operations. This is primarily
useful for shape computation folding as they unfortunately can use
`aten` operators. Add, sub, mul are common examples of these folders.
2024-02-19 10:28:23 -08:00
aldesilv d29157b33f
OnnxToTorch support for onnx.InstanceNormalization op (#2710)
https://github.com/nod-ai/SHARK-Turbine/issues/327
2024-02-19 19:53:48 +05:30
Vivek Khandelwal d6d1a173dc
[MLIR][Torch] Add OnnxToTorch and TorchToLinalg support for trig ops (#2903)
This commit adds the OnnxToTorch lowering for cosh, acosh, asin, asinh,
and atanh op.
This commit also adds the TorchToLinalg lowering for acosh, asin, asinh,
and atanh op.

Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
2024-02-14 11:58:09 +05:30
Rob Suderman c0f139be0f
[torch] Add `torch.aten.eq.Tensor` comparison folder (#2889)
Added a folded for a equals operator. This allows an equivalent
comparison folder, primarily for when shape computations occur small
size tensor.
2024-02-09 15:02:20 -08:00
Rob Suderman 7d33ba69ac
[torch] Folder for torch.aten.select.int for splat cases (#2890)
If the input or result is a splat value we can just constant fold the
result. This is common for shape computations and can help with shape
inference.
2024-02-09 14:02:54 -08:00
Franz Haniel 4cc62aeb24
Implement trace (#2790)
The lowering decomposes AtenTraceOp into an AtenDiagonalOp followed by
AtenSumOp.

The progress is tracked in
https://github.com/nod-ai/SHARK-Turbine/issues/333.

---------

Co-authored-by: Franz Haniel <franz.haniel@amd.com>
2024-02-09 08:00:24 -08:00
Rob Suderman a8aad2a5ab
[torch] Add `torch.aten.where.*` folders (#2886)
Where operation can be statically computed when involving splats of
known value. Added handling these cases with multiple tests.
2024-02-07 19:43:31 -05:00
Dave Liddell 23647ab2d1
[torhc] aten.index_select folder (#2871)
Folds aten::index_select ops under the following conditions:

1. If the input and output are the same shape, the indexing operation is
a NOP, so just return the input.
2. If the input has shape <1x1x...xNx...x1> (all 1's except for one
dim), and the output shape is <1x1x...x1> (all 1's), then there is a
single index, so extract the single element value and return a tensor
with that value.

---------

Co-authored-by: Dave Liddell <dliddell@xilinx.com>
2024-02-07 16:17:15 -08:00
mmakevic 32dbf99ce2
Implement lowering of torch.aten.all.dim (#2873)
Lowering of torch.aten.all.dim to linalg.

Per PyTorch documentation:

> This function matches the behaviour of NumPy in returning output of
dtype bool for all supported dtypes except uint8. For uint8 the dtype of
output is uint8 itself.

Since there is no support for ui8 in torch-mlir currently
(https://github.com/llvm/torch-mlir/pull/1384#issuecomment-1260011334)
implementation returns failure for that case.
2024-02-07 12:34:52 -08:00
saienduri bfcf93ea21
Rename torch_mlir.compile APIs and introduce FX based analogs (#2842)
Link to related RFC:
https://discourse.llvm.org/t/rfc-rename-torch-mlir-compile-apis-and-introduce-fx-based-analogs/76646
This commit updates the documentation, tests, CMake files, and API for
the proposed changes in the RFC. There is a new torch_mlir/fx.py for
user level APIs related to importing modules and a corresponding test
for this path can be found at test/python/fx_importer/basic_test.py.

---------

Co-authored-by: MaheshRavishankar <mravisha@amd.com>
2024-02-06 19:07:59 -08:00
Xida Ren (Cedar) cc06391630
AtenSortOp Folder (#2864)
A chunk off

https://github.com/llvm/torch-mlir/pull/2856
https://github.com/llvm/torch-mlir/pull/2860

---------

Co-authored-by: Xida Ren <xida.ren.dev@gmail.com>
Co-authored-by: Rob Suderman <rob.suderman@gmail.com>
2024-02-06 21:12:12 +00:00
Dave Liddell 1cb14f6879
Rob's atenTensor folder (#2867)
If a tensor is initialized by a list with a single constant integer,
this folder turns it into a torch.vtensor.literal

---------

Co-authored-by: Dave Liddell <dliddell@xilinx.com>
2024-02-05 17:10:42 -08:00
Xida Ren (Cedar) 24b8c8672a
[torch] Add folders for `torch.fill`, `torch.ones`, `torch.zeros` and `aten.getItem` (#2849)
So that the CumSum Op in OPT can get the constant that it requires to be lowered to TMTensor

---------

Co-authored-by: Rob Suderman <rob.suderman@gmail.com>
Co-authored-by: Xida Ren <xida.ren.dev@gmail.com>
2024-02-02 10:46:33 -08:00
Ilija Kalinić 54ef18c556
Implement lowering of torch.aten.lerp.Scalar (#2773)
Closes nod-ai/SHARK-Turbine#356
2024-01-31 09:39:38 -08:00
Yuanqiang Liu d778950f45
[Torch Dialect] add fold pattern for aten.clone (#2804) 2024-01-31 09:43:21 +08:00
Quinn Dawkins 494089d53d
Clang format refresh (#2812)
After noticing a number of commits with unrelated formatting changes,
I think something was changed with clang-format at one point and we're
seeing a number of unrelated changes. Doing a refresh can help avoid
this.

The changes made here came from
```
find lib -iname *.h -o -iname *.cpp  | xargs clang-format -i --style=llvm
find include -iname *.h -o -iname *.cpp  | xargs clang-format -i --style=llvm
find projects -iname *.h -o -iname *.cpp  | xargs clang-format -i --style=llvm
```
2024-01-29 12:59:33 -05:00
Rob Suderman 2ef228328f
[torch] `torch.dequantize` for per channel tensors to` linalg` (#2769)
Support a lowering for dequantization for per channel tensors from
`torch` dialect to a linalg decomposition. Tested via a numerical
`torch` test.
2024-01-25 16:40:21 -08:00
zjgarvey c531f5495b
AtenAdaptiveMaxPool2d Conversion to Linalg (#2779)
The logic here is very similar to the conversion for AdaptiveAvgPool1d
#2661 with a few modifications:

1. buffVal = -inf instead of 0
2. the main linalg generic op accumulates a max, instead of a sum, to
the first output tensor
3. avg pooling requires dividing the sum pool by the kernel width, which
we stored as an auxilliary tensor (kSizeTensor). Here, the auxiliary
tensor will be recording the indices. Strangely enough, the only
signature available for this function is to return indices, and it
appears that they must be computed whether the user desires them or not.
See
[pytorch/torch/nn/functional.py](https://github.com/pytorch/pytorch/blob/main/torch/nn/functional.py#L1174).

Before writing other adaptive pooling conversions, the logic of this
decomposition should be rolled into a helper function that will work for
both max and avg pooling ops. Even the auxiliary tensor should likely be
automated. This code was written in a slightly more tedious way than
strictly necessary (often using loops to fill SmallVectors up to rank-2,
which is only two in this case), in order to more easily facilitate the
transition to a helper function.
2024-01-24 09:09:56 -08:00
Xida Ren (Cedar) ccaac85788
implement aten.conv1d, aten.conv3d, and aten.conv_tbc (#2757)
convolution with [time,batch,channel] ordering, as opposed to the
default [batch, channel, time]. Currently implementing by transposing
the input and output, but may need to get its own implementation in the
future because this is supposed to be an op that gives a speedup. This
is used by fairseq
(https://github.com/facebookresearch/fairseq/issues/172).

(in case you were wondering like me, this is different from transposed
convolution. Transposed convolution has fractional strides).

---------

Co-authored-by: Xida Ren <xida.ren.dev@gmail.com>
Co-authored-by: Frederik Harwath <frederik.harwath@amd.com>
2024-01-23 21:30:03 -08:00
Franz Haniel b9806cfa38
[TorchToLinalg] Add lowering for torch.aten.diagonal (#2632) 2024-01-22 12:47:13 -05:00
John Wu 704cfdaf08
Add aten.pool_max3d support to torch-to-linalg (#2735)
Added verification logic to the abstract_interpreter_lib_gen.py

Also made some unit tests

Initially, I thought we can use `linalg::pooling_ndhwc_max` to help
implement this problem. However, on a 5-dimensional matrix it does the
pooling on dimensions (2, 3, 4) which is not what we want. We want
pooling on dimensions (3, 4, 5).

To achieve this, we would need to lower our code using the `linalg`
dialect.


Turns out the pooling code in `linalg` looks like this.

```
func @max_pooling_ncdhw(%I: memref<?x?x?x?x?xf32>, %K: memref<3xindex>, %O: memref<?x?x?x?x?xf32>,
                        %strides: memref<3xindex>, %dilations: memref<3xindex>) {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %N = memref.dim %I, %c0 : memref<?x?x?x?x?xf32>
    %C = memref.dim %I, %c1 : memref<?x?x?x?x?xf32>
    %D = memref.dim %I, 2 : memref<?x?x?x?x?xf32>
    %H = memref.dim %I, 3 : memref<?x?x?x?x?xf32>
    %W = memref.dim %I, 4 : memref<?x?x?x?x?xf32>

    %kernel_d = memref.load %K[%c0] : memref<3xindex>
    %kernel_h = memref.load %K[%c1] : memref<3xindex>
    %kernel_w = memref.load %K[2] : memref<3xindex>
    %stride_d = memref.load %strides[%c0] : memref<3xindex>
    %stride_h = memref.load %strides[%c1] : memref<3xindex>
    %stride_w = memref.load %strides[2] : memref<3xindex>
    %dilation_d = memref.load %dilations[%c0] : memref<3xindex>
    %dilation_h = memref.load %dilations[%c1] : memref<3xindex>
    %dilation_w = memref.load %dilations[2] : memref<3xindex>

    linalg.generic {
        indexing_maps = [
            affine_map<(n, c, d, h, w, kd, kh, kw) -> (n, c, d * %stride_d + kd * %dilation_d, h * %stride_h + kh * %dilation_h, w * %stride_w + kw * %dilation_w)>,  // Map for input tensor
            affine_map<(n, c, d, h, w, kd, kh, kw) -> (kd, kh, kw)>,                                              // Map for kernel tensor
            affine_map<(n, c, d, h, w, kd, kh, kw) -> (n, c, d, h, w)>                                            // Map for output tensor
        ],
        iterator_types = ["parallel", "parallel", "parallel", "parallel", "parallel", "reduction", "reduction", "reduction"],
        doc = "3D Max Pooling NCDHW with Strides, Dilations, and Kernel Size"
    } ins(%I, %K : memref<?x?x?x?x?xf32>, memref<3xindex>) outs(%O : memref<?x?x?x?x?xf32>) {
        ^bb0(%input_elem: f32, %kernel_elem: index, %output_elem: f32):
            %max_val = arith.maxf %input_elem, %output_elem : f32
            linalg.yield %max_val : f32
    }
    return
}

```

This was implemented based on it's source code with the adjustments
mentioned above:

4ca1b5e094/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml (L5647)

Issues related to this can be found here

https://github.com/nod-ai/SHARK-Turbine/issues/324
2024-01-19 21:09:46 +05:30
Ilija Kalinić faa4517e83
Implement lowering of torch.aten.remainder.Tensor (#2763)
Closes nod-ai/SHARK-Turbine#349
2024-01-19 18:09:08 +05:30
Sungsoon Cho a8538e1e3f
Decompose AtenNormalFunctionalOp into AtenRandn* and other arithmetic. (#2737) 2024-01-15 22:49:29 -08:00
lonely eagle f85e5c932b
[Torch Dialect] support aten.isneginf, aten.isposinf, aten.nan_to_num (#2743) 2024-01-16 14:29:34 +08:00
James Newling f78ec78ac8
Adjust bound check to be the same as PyTorch native (i.e. stricter) (#2755)
prims.expand expects the start and end dimensions to be strictly less
than the rank of the tensor.
2024-01-15 11:44:45 -08:00
lisaliu1 09421b1cf3
[TorchToLinalg] Add lowering for aten.replication_pad2d (#2715)
Co-authored-by: Lisa Liu <lingl@xilinx.com>
2024-01-15 14:02:27 -05:00
Rob Suderman dc37616d67
[torch][quant] Support quantize and dequantize for torch (#2731)
Handle both `torch.dequantize` and `torch.quantize_per_tensor` including
the op based quantization parameter tracking. This includes adding
`qint32` to torch types as it was missing during the initial type
inclusion.

For testing we only have `torch.int8` and `torch.float` types on
function boundaries as the `qint8` types require passing the scale
and zero point quantization information which is not supported yet.
2024-01-12 19:11:14 -08:00
Ilija Kalinić e1a86e480a
Implement lowering of torch.aten.logit (#2697)
Closes nod-ai/SHARK-Turbine#290
2024-01-11 20:25:42 +05:30
Frederik Harwath 0860c41ee2 Implement aten.reflection_pad2d lowering to linalg 2024-01-10 21:32:22 -10:00
zjgarvey 07d0645f64
[RFC] general support for Adaptive Pooling Ops (#2661)
Adaptive pooling ops can only be decomposed into their non-adaptive
counterparts in trivial cases.

For example, the current decomposition for AtenAdaptiveAvgPool1dOp in
DecomposeComplexOps.cpp supports outSize = inSize (i.e., do literally
nothing), and outSize = 1 (i.e., do a batched average).

The reason adaptive pooling ops are difficult to lower to linalg is that
they are not constantly strided. They are computed by taking an input
tensor of shape (N, C, Hin), and an output size Hout, and computing the
output tensor at position (n,c, h) in the following way:

1. compute st(h) = (h*Hin)//Hout
2. compute en(h) = 1 + ((h+1)*Hin -1)//Hout
3. apply a computation (max or avg) to the slice: INPUT[n, c,
st(h):en(h)]

The provided sample implementation (for ConvertAtenAdaptiveAvgPool1dOp)
uses tensor.extract to access the input tensor inside the payload of a
linalg generic op. This is likely an unattractive use of linalg generic
ops, which is why I am asking for some more targeted feedback on the
validity of this approach before attempting to support the many other
adaptive pooling ops.

Specifically:

- Is the performance of this implementation bad enough to warrant
targeting different dialects entirely? e.g. TMtensor/linalg ext/ etc.
- If the provided implementation is of acceptable performance to the
community, then is it permissable to remove the Adaptive pooling
decompositions from DecomposeComplexOps.cpp? Based on the current
structure of the -torch-decompose-complex-ops pass, it does not seem
possible to only decompose the adaptive ops in special cases (it seems
to get stuck in an infinite loop on a match failure). I would be happy
to instead incorporate the case logic into the conversion directly, and
remove the decompositions once they are rendered completely obsolete.

As long as this approach is acceptable, I can clean up the
implementation with some helper functions, and quickly add support for
each of the remaining Adaptive pooling ops.
2024-01-09 11:14:10 -08:00