Prior to this patch, the torch dialect included `AtenTriuOp` for
computing the upper triangular part of the input matrix, but there was
no code for lowering the op to the linalg dialect.
This patch adds code to generate a `linalg.generic` operation that
compares indices (computed using `linalg.index`) to choose between zero
and the original value (using `arith.select`). The lowering fails if the
input has fewer than two dimensions. This patch also adds a few
end-to-end tests.
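
For reference, the per-element rule can be modelled in plain Python. This
is only a sketch of the semantics, not the generated IR: the function name
`triu_reference` is hypothetical, and the `diagonal` offset follows the
documented behaviour of `torch.triu` rather than anything stated above.

```python
def triu_reference(matrix, diagonal=0):
    """Keep matrix[i][j] when j >= i + diagonal, otherwise emit zero."""
    if not matrix or not isinstance(matrix[0], list):
        # Mirrors the restriction that the input needs at least two dimensions.
        raise ValueError("triu needs an input with at least two dimensions")
    result = []
    for i, row in enumerate(matrix):
        # i and j play the role of the two linalg.index values; the
        # conditional expression plays the role of arith.select.
        result.append([x if j >= i + diagonal else 0 for j, x in enumerate(row)])
    return result


print(triu_reference([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```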
This commit adds a lowering for the `aten.div.Tensor_mode` op.
This commit also fixes formatting for the test file elementwise.py.
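
As a reminder of what the op computes, here is a minimal pure-Python model
of division with a rounding mode. The function name `div_tensor_mode` is
hypothetical, and the accepted modes (`None`, `"trunc"`, `"floor"`) are
taken from the documented behaviour of `torch.div`, not from this commit.

```python
import math


def div_tensor_mode(a, b, rounding_mode=None):
    """Elementwise a / b over two equal-length lists of floats."""
    if rounding_mode not in (None, "trunc", "floor"):
        raise ValueError(f"unsupported rounding_mode: {rounding_mode!r}")
    out = []
    for x, y in zip(a, b):
        q = x / y  # Unlike torch, plain Python raises on division by zero.
        if rounding_mode == "trunc":
            q = float(math.trunc(q))  # round toward zero
        elif rounding_mode == "floor":
            q = float(math.floor(q))  # round toward negative infinity
        out.append(q)
    return out


print(div_tensor_mode([-7.0, 7.0], [2.0, 2.0], rounding_mode="floor"))
```

The two modes only differ for negative quotients, which is why a test
with mixed signs is the useful one.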
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
When compiling without assertions (i.e. in `NDEBUG` mode), a handful of
statements turn into no-ops, which results in warnings such as missing
return statements or unused variables and functions. This patch replaces
such statements with `llvm_unreachable()`, which informs the compiler
about program termination regardless of the `NDEBUG` mode. This also
enables torch-mlir to be compiled using the flags `-Wall`, `-Wextra`,
`-Wpedantic`, and `-Werror`.
This patch adds support for the torch.linalg.vector_norm op to the torch
dialect, including the necessary shape function. It also extends the
conversion of reduction operators to support lowering of
AtenLinalgVectorNormOp, in addition to adding a handful of end-to-end
tests to validate the lowering.
There exist several opportunities to make this lowering optimal and
robust. For instance, in its current form, the translation does not
support ord = 0, +inf, or -inf. For L1 norms, we don't need to raise
each element to the power 1.0. Similarly, L2 norms could benefit from
strength reduction. Since the canonicalization pass is not able to
apply these optimizations, we should consider applying them during the
linalg lowering itself.
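
To make the optimization opportunities concrete, the sketch below contrasts
the generic formula with the two special cases in plain Python. The
function names are hypothetical, `order` stands in for `ord`, and the code
models only the arithmetic, not the lowering itself.

```python
import math


def vector_norm_generic(xs, order=2.0):
    """Generic form: (sum(|x| ** order)) ** (1 / order)."""
    if order == 0 or math.isinf(order):
        # Mirrors the cases the current translation does not support.
        raise NotImplementedError("ord = 0, +inf and -inf are not handled")
    return sum(abs(x) ** order for x in xs) ** (1.0 / order)


def vector_norm_l1(xs):
    """L1 special case: no pow needed, just a sum of absolute values."""
    return sum(abs(x) for x in xs)


def vector_norm_l2(xs):
    """L2 special case: pow replaced by a multiply and a square root."""
    return math.sqrt(sum(x * x for x in xs))


print(vector_norm_generic([3.0, -4.0], order=2.0), vector_norm_l2([3.0, -4.0]))
```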
This commit adds a lowering for the `aten.masked_fill.Scalar` op.
This commit also fixes the formatting of the file constant_alloc.py.
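
For readers unfamiliar with the op, a minimal pure-Python model of its
semantics follows. The function name `masked_fill_scalar` is hypothetical,
and the sketch covers only the flat, same-length case without
broadcasting.

```python
def masked_fill_scalar(values, mask, fill_value):
    """Replace values[i] with fill_value wherever mask[i] is true."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    return [fill_value if m else v for v, m in zip(values, mask)]


print(masked_fill_scalar([1.0, 2.0, 3.0], [True, False, True], -1.0))
```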
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
* shape: add shape transfer function for aten.neg
Prior to this patch, the list of shape transfer functions did not
include `aten.neg`, which resulted in errors like the one below.
```
error: unsupported by backend lowering: tensor with unknown rank or dtype
note: see current operation: %0 = "torch.aten.neg"(%arg0) :
(!torch.vtensor<[256,256],f32>) -> !torch.vtensor<*,f32>
note: this is likely due to a missing shape transfer function in shape_lib_gen.py
```
This patch fixes the problem by adding a shape transfer function to
reflect the point-wise nature of this operation.
* linalg: add translation of aten.neg operation
This patch adds a translation rule to lower `aten.neg` operations on
tensors to an `arith.negf` operation wrapped inside a `linalg.generic`
operation. This patch also adds a rudimentary test.
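
The two pieces can be pictured with the following plain-Python sketch.
Both function names are hypothetical and do not reflect the naming used in
shape_lib_gen.py; the point is only that a point-wise op returns its input
shape unchanged and negates each element.

```python
from typing import List


def neg_shape(input_shape: List[int]) -> List[int]:
    """Shape transfer for a point-wise unary op: output shape == input shape."""
    return list(input_shape)


def neg_reference(values: List[float]) -> List[float]:
    """Elementwise negation, the role arith.negf plays inside linalg.generic."""
    return [-v for v in values]


print(neg_shape([256, 256]), neg_reference([1.0, -2.0]))
```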
This commit adds the following support to the op `nll_loss_backward`:
- `input` tensor can be rank-1
- `weight` parameter
- `reduction` parameter
- `target`, `grad_output`, `total_weight` can be rank-0
- Checks that input tensors are of the expected type
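
The interaction of `weight`, `reduction`, and `total_weight` can be
sketched in plain Python as below. This is a simplified model under stated
assumptions, not the lowering: the function name is hypothetical,
`reduction` is spelled as a string for readability, `ignore_index` is
omitted, and a rank-1 input corresponds to a `target` list with a single
entry.

```python
def nll_loss_backward_reference(grad_output, target, num_classes,
                                weight=None, reduction="mean",
                                total_weight=1.0):
    """Return grad_input as one row of num_classes floats per sample.

    grad_output is a float for "mean"/"sum" and a list of floats for "none".
    """
    if reduction not in ("none", "mean", "sum"):
        raise ValueError(f"unsupported reduction: {reduction!r}")
    grad_input = [[0.0] * num_classes for _ in target]
    for i, t in enumerate(target):
        w = 1.0 if weight is None else weight[t]
        g = grad_output[i] if reduction == "none" else grad_output
        if reduction == "mean":
            # The forward pass divided by the summed weights of the targets.
            g = g / total_weight
        # Only the entry selected by the target class receives a gradient.
        grad_input[i][t] = -w * g
    return grad_input


print(nll_loss_backward_reference(1.0, [0, 2], 3, reduction="mean", total_weight=2.0))
```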
In order to make sure that the TorchToLinalg conversions leave the
graph in a valid state, the final result of the conversion has to be
cast to the result type of the op. This commit adds this cast to ops
that did not have it.
This commit fixes the 2nd and 3rd return types of the
`aten.native_layer_norm` op. Previously the mean and rSTD were returned
with the reduction dims removed. With this commit, the results keep the
reduction dims.
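
As an illustration of the shape change, the sketch below computes the
shape of the mean and rSTD results when the reduction dims are kept. The
function name `layer_norm_stat_shape` is hypothetical, and representing
each kept reduction dim as size 1 is an assumption based on the usual
keepdim convention.

```python
from typing import List


def layer_norm_stat_shape(input_shape: List[int],
                          normalized_shape: List[int]) -> List[int]:
    """Shape of mean/rSTD: leading dims unchanged, each reduced dim kept as 1."""
    n = len(normalized_shape)
    if n == 0 or list(input_shape[-n:]) != list(normalized_shape):
        raise ValueError("normalized_shape must match the trailing input dims")
    return list(input_shape[:-n]) + [1] * n


print(layer_norm_stat_shape([2, 5, 2, 2, 3], [2, 2, 3]))
```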
Signed-Off-By: Prateek Gupta <prateek@nord-labs.com>
The term "pseudo" is very vague and was getting confusing (I felt I had
to explain it in every comment referencing it). Rework the "pseudo" ops
to be named:
- MLIR Syntax: `torch.valsem.*`
- C++ / ODS: `ValsemVariant*Op`
This makes it clear what the concept is, and avoids confusion with other
things that might be called "pseudo", since these are very specific and
should be 100% consistently named w.r.t. the non-valsem-variant ops that
they correspond to.
This helps keep things organized and also exposes more parallelism to
the build system. It seems, though, that most of the compile time is
actually spent in the headers, so the wall time doesn't decrease as much
as I had hoped (and now that the headers are being included multiple
times, the CPU time actually increases a lot, sadly -- will try to dig
into this).