The issue was in the canonicalizer for `torch.aten.ge.int`: in cases
where the operands were swapped, it would miscompile. This issue is
fixed, and folding support is generalized to `torch.aten.size.int < 0` as
well.
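A minimal sketch of why operand order matters when folding comparisons against a size, which is always non-negative. The `Size` class and the `fold_ge` / `fold_lt` helpers are hypothetical names used for illustration only; they are not the canonicalizer's actual C++ code.

```python
from typing import Optional, Union


class Size:
    """Hypothetical stand-in for a value produced by `torch.aten.size.int`."""


Operand = Union[int, Size]


def fold_ge(lhs: Operand, rhs: Operand) -> Optional[bool]:
    """Fold `lhs >= rhs` to a constant, or return None if it cannot be folded."""
    if isinstance(lhs, int) and isinstance(rhs, int):
        return lhs >= rhs
    # A size is always >= 0, so `size >= c` holds for every constant c <= 0.
    if isinstance(lhs, Size) and isinstance(rhs, int) and rhs <= 0:
        return True
    # Swapped operands: `c >= size` is only known (false) when c < 0.
    # `0 >= size` must NOT be folded, since it is true only for size == 0.
    if isinstance(lhs, int) and isinstance(rhs, Size) and lhs < 0:
        return False
    return None


def fold_lt(lhs: Operand, rhs: Operand) -> Optional[bool]:
    """Fold `lhs < rhs` as the negation of `lhs >= rhs`."""
    folded = fold_ge(lhs, rhs)
    return None if folded is None else not folded
```

With these rules `size >= 0` folds to true and `size < 0` folds to false, while `0 >= size` is left unfolded.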
Fixes #716
This commit decomposes different variants of `aten.where.*` op into
`aten.where.Self` op. It covers `aten.where.Scalar`,
`aten.where.ScalarSelf` and `aten.where.ScalarOther` ops.
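A rough sketch of the idea behind the decomposition, using plain Python lists in place of tensors. The helper names are hypothetical, and materializing the scalar with `full_like` is an assumption about how the scalar operand is turned into a tensor.

```python
from typing import List


def where_self(cond: List[bool], a: List[float], b: List[float]) -> List[float]:
    """Elementwise select, standing in for `aten.where.self`."""
    return [x if c else y for c, x, y in zip(cond, a, b)]


def full_like(ref: List, value: float) -> List[float]:
    """Materialize a scalar as a list with the same length as `ref`."""
    return [value] * len(ref)


def where_scalar(cond: List[bool], a: float, b: float) -> List[float]:
    # Both operands are scalars: materialize both.
    return where_self(cond, full_like(cond, a), full_like(cond, b))


def where_scalar_other(cond: List[bool], a: List[float], b: float) -> List[float]:
    # Only `other` is a scalar.
    return where_self(cond, a, full_like(cond, b))


def where_scalar_self(cond: List[bool], a: float, b: List[float]) -> List[float]:
    # Only `self` is a scalar.
    return where_self(cond, full_like(cond, a), b)
```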
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
This commit decomposes the `aten.new_empty` op into the `aten.empty.memory_format` op.
It also fixes the dtype of ops that allocate constant tensors.
Previously the result dtype was inferred from the result type; now it is
computed as the original definition of the op specifies.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
A recent PyTorch commit made ConstantPad2d call a helper function
annotated with a `Union[int, float]` type. This commit adds minimal support for
representing and handling that type.
https://github.com/pytorch/pytorch/pull/73287
Changes:
- Adding support for `!torch.union<T1, T2, T3>`/`Torch::UnionType`,
along with the importer and CAPI code.
- Add support in isValidSubtype for union types.
- Adding a canonicalizer for `torch.derefine` to help simplify some code
that derefines to a UnionType (this also fixes #664).
There is still more work to do for really supporting UnionType well,
such as canonicalizing UnionType's so that they can be compared with
pointer equality.
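A small sketch of a subtype check that accepts union types, under the assumption that a type is a subtype of a union when it is one of the members (or a union of a subset of them). Types are modeled as strings and unions as `frozenset`s; `union` and `is_valid_subtype` are hypothetical Python stand-ins, not the C++ `isValidSubtype`.

```python
from typing import FrozenSet, Union

# A type is either a plain name such as "int" or a union of such names.
Type = Union[str, FrozenSet[str]]


def union(*members: str) -> FrozenSet[str]:
    """Build a union type. A frozenset ignores member order and duplicates."""
    return frozenset(members)


def is_valid_subtype(subtype: Type, ty: Type) -> bool:
    """Return True if a value of type `subtype` may be used where `ty` is expected."""
    if subtype == ty:
        return True
    if isinstance(ty, frozenset):
        if isinstance(subtype, frozenset):
            # A union is a subtype of another union that contains all its members.
            return subtype <= ty
        return subtype in ty
    return False
```

Storing members in an order-insensitive form also hints at the canonicalization mentioned above: `union("int", "float")` and `union("float", "int")` compare equal.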
- This commit adds decomposition of `aten.dropout` op. It also covers the
training mode of the same op.
- It also adds lowering of `aten.sub.float` op.
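For the dropout decomposition above, a minimal sketch of the usual inverted-dropout formula on a flat list. The function name and the `rng` parameter are hypothetical, and the actual decomposition may be expressed with different ops.

```python
import random
from typing import List, Optional


def dropout(x: List[float], p: float, train: bool,
            rng: Optional[random.Random] = None) -> List[float]:
    """Zero each element with probability p and rescale the survivors by 1/(1-p)."""
    if not train or p == 0.0:
        # Inference mode is the identity.
        return list(x)
    if p == 1.0:
        return [0.0] * len(x)
    rng = rng or random.Random()
    keep = 1.0 - p
    # mask[i] is 1.0 with probability `keep`, else 0.0.
    mask = [1.0 if rng.random() < keep else 0.0 for _ in x]
    return [v * m / keep for v, m in zip(x, mask)]
```

The rescaling keeps the expected value of each element unchanged in training mode.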
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
The `assemblyFormat` stuff (which generates unrolled, per-op C++ code)
was taking up a lot of compile time, and all the ops are essentially
printed with the same logic. So this PR makes them all call the same
helper function. This is done by using
`let hasCustomAssemblyFormat = 1` and then implementing `FooOp::parse`
and `FooOp::print`.
Additionally, the `Generated*Ops.td` files are all collapsed into just
`GeneratedTorchOps.td` (there is no reason to have the files separate,
since the files are very large anyway so one is always having to search
within them -- editors don't care that the file to search is now a bit
bigger :) ).
This reduces TorchOpsODSGenerated.cpp compile time (which is now
GeneratedTorchOps.cpp) from 39 to 31 seconds on my machine. This is
actually less than I expected, but this PR is an overall cleanup to the
code anyway. The next step will be to introduce (better) functionality
upstream for sharding the TorchOps.cpp.inc file, so that we can truly
parallelize the O(#ops) costs. This is also necessary, because after
this PR, TorchDialect.cpp is now the slowest file to compile, due to the
`addOperations<... all the ops ...>` call, which needs to be sharded
too.
This commit adds the op `ValsemVariantAtenCopyOp` that represents
`AtenCopy_Op` without the underscore. This is needed to make sure
that the `ReduceOpVariants` pass turns the in-place op into an op
that takes value tensors as inputs, otherwise the
`MaximizeValueSemantics` pass will not be able to add value
semantics correctly.
This commit also adds the lowering of `ValsemVariantAtenCopyOp`.
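To illustrate the difference between the in-place op and its value-semantic variant, here is a small sketch with Python lists standing in for tensors. Both function names are hypothetical.

```python
from typing import List


def copy_(dst: List[float], src: List[float]) -> List[float]:
    """In-place variant: overwrites `dst` and returns the same object."""
    dst[:] = src
    return dst


def valsem_copy(dst: List[float], src: List[float]) -> List[float]:
    """Value-semantic variant: leaves `dst` untouched and returns a new value."""
    return list(src)
```

Because `valsem_copy` never mutates its inputs, a later pass is free to treat its operands as immutable values.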
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This commit fixes the 2nd and 3rd return types of the `aten.native_layer_norm` op.
Previously the mean and rSTD were returned with the reduction dims removed.
Now the reduction dims are kept in both results.
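A small sketch of what keeping the reduction dims means for a 2-D input normalized over its last dimension. The function name is hypothetical and the default `eps` value is an assumption.

```python
import math
from typing import List, Tuple


def layer_norm_stats(x: List[List[float]], eps: float = 1e-5
                     ) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (mean, rstd) for an [N, D] input, each with shape [N, 1]."""
    mean: List[List[float]] = []
    rstd: List[List[float]] = []
    for row in x:
        m = sum(row) / len(row)
        var = sum((v - m) ** 2 for v in row) / len(row)
        # Wrap each statistic in a list so the reduced dim is kept with size 1.
        mean.append([m])
        rstd.append([1.0 / math.sqrt(var + eps)])
    return mean, rstd
```

For an input of shape `[N, D]`, both results have shape `[N, 1]` rather than `[N]`.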
Signed-Off-By: Prateek Gupta <prateek@nord-labs.com>
This commit adds the op `ValsemVariantAtenIndexPutImplOp` that represents
`Aten_IndexPutImpl_Op` without the underscore. This is needed to
make sure that the `ReduceOpVariants` pass turns the in-place op
into an op that takes value tensors as inputs, otherwise the
`MaximizeValueSemantics` pass will not be able to add value
semantics correctly.
This commit also adds the lowering of `ValsemVariantAtenIndexPutImplOp` op.
This commit also updates the `torch.bincount` op test cases.
The term "pseudo" is very vague and was getting confusing (I felt I had
to explain it in every comment referencing it). Rework the
"pseudo" ops so that they are named:
- MLIR Syntax: `torch.valsem.*`
- C++ / ODS: `ValsemVariant*Op`
This makes it clear what the concept is, and avoids confusion with other
things that might be called "pseudo", since these are very specific and
should be 100% consistently named w.r.t. the non-valsem-variant ops that
they correspond to.
See the documentation in `docs/shape_lib.md` and
`docs/adding_a_shape_function.md` for an overview of the system.
This completely overhauls how we represent shape functions. In
particular, RefineTypes does not infer shapes anymore (only dtypes).
Shape functions are now written in (TorchScript'able) Python.
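As an illustration of the style, a shape function written in TorchScript-compatible Python might look like the sketch below. The function name `broadcast` is hypothetical and this is not taken from the actual shape library; see the docs above for the real conventions.

```python
from typing import List


def broadcast(a: List[int], b: List[int]) -> List[int]:
    """Compute the broadcasted shape of two operand shapes."""
    ndim = max(len(a), len(b))
    result: List[int] = []
    for i in range(ndim):
        # Align the shapes from the right; missing leading dims count as 1.
        idx_a = len(a) - ndim + i
        idx_b = len(b) - ndim + i
        dim_a = a[idx_a] if idx_a >= 0 else 1
        dim_b = b[idx_b] if idx_b >= 0 else 1
        assert dim_a == dim_b or dim_a == 1 or dim_b == 1, \
            "shapes are not broadcastable"
        result.append(dim_b if dim_a == 1 else dim_a)
    return result
```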
Recommended review order:
1. Read `docs/shape_lib.md` and `docs/adding_a_shape_function.md`.
1. Code and tests for ReifyShapeCalculations, DropShapeCalculations.
1. Code and tests for SimplifyShapeCalculations.
1. shape_lib_gen.py
1. Code and tests for new RefineTypes pass.
1. Random folders/canonicalizers in TorchOps.cpp and associated test in
`canonicalize.mlir`.
1. New ReadOnly trait inferred from the registry.
1. Any miscellaneous remaining stuff.
Example `-print-ir-after-all` for ElementwiseUnaryModule:
[IR lowering dump](https://gist.github.com/silvasean/e4dc8cbc8d00aac7819602e3cbd8e212).
Example `-print-ir-after-all` for ElementwiseBinaryModule:
[IR lowering dump](https://gist.github.com/silvasean/daf6860ecced732af3568af6b1899113).
This pass is added to lower ops that cannot be lowered
via the TorchToLinalg pass, such as the `torch.bincount` op.
This pass also uses torch-mlir's TMTensor dialect to lower the
complex ops.
This commit also adds the `torch.bincount` op lowering with the help of the TMTensor dialect.
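For reference, the semantics of bincount can be sketched as a scatter-style accumulation into a zero-initialized result. This is plain Python written for illustration, not the TMTensor lowering itself, and the function name is hypothetical.

```python
from typing import List


def bincount(values: List[int], minlength: int = 0) -> List[int]:
    """Count how many times each non-negative integer occurs in `values`."""
    if any(v < 0 for v in values):
        raise ValueError("bincount only supports non-negative integers")
    size = max(max(values) + 1 if values else 0, minlength)
    counts = [0] * size
    for v in values:
        # Each input element is used as an index into the output,
        # which is why this does not map to a simple elementwise op.
        counts[v] += 1
    return counts
```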
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
- This commit adds E2E support for `aten.rand_like` and
`aten.bernoulli_.Tensor` ops.
- The `aten.bernoulli(x)` was implemented as:
`aten.bernoulli(x) = rand_like(x) < 0.5`, assuming 0.5 as default
probability, whereas according to the pytorch documentation:
https://pytorch.org/docs/stable/generated/torch.bernoulli.html#torch.bernoulli
the input x in `aten.bernoulli(x)` is itself a tensor containing
probabilities to be used for drawing the binary random number.
- So this commit fixes the `aten.bernoulli(x)` implementation as:
`aten.bernoulli(x) = rand_like(x) < x`.
- It also fixes the case where the input to `aten.bernoulli_.float` is
an integer tensor. In this case the input must be cast to a float type
before passing it as an operand to the `aten.rand_like` op.
`aten.bernoulli_.float(x, p) = rand_like(float(x)) < p`.
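The two formulas above can be sketched in plain Python as follows. Lists stand in for tensors, and the function names and the explicit `rng` argument are hypothetical.

```python
import random
from typing import List


def rand_like(x: List[float], rng: random.Random) -> List[float]:
    """Uniform samples in [0, 1), one per element of x."""
    return [rng.random() for _ in x]


def bernoulli(x: List[float], rng: random.Random) -> List[float]:
    """Each x[i] is itself the probability of drawing a 1 at position i."""
    return [1.0 if r < p else 0.0 for r, p in zip(rand_like(x, rng), x)]


def bernoulli_float(x: List[int], p: float, rng: random.Random) -> List[float]:
    """The input only provides the shape; it is cast to float before sampling."""
    x_float = [float(v) for v in x]
    return [1.0 if r < p else 0.0 for r in rand_like(x_float, rng)]
```

Since the uniform samples lie in `[0, 1)`, a probability of 0 never produces a 1 and a probability of 1 always does.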
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
This commit adds the op `PseudoAtenFillScalarOp` that represents
`AtenFill_ScalarOp` without the underscore. The approach is the same
as in commit dd998fa4d4.
Adding this op allows for a simpler and more consistent version of the
`empty` and `empty_like` op e2e tests.
- This commit adds lowering of `aten.le.Scalar` and `aten.ge.Scalar` ops
as a part of `convert-torch-to-linalg` pass.
- It also creates a new test script `elementwise_comparison.py` for all
element-wise comparison ops.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
- This commit adds lowering of `aten.Bool.Tensor` and
`aten.Float.Tensor` op as a part of `convert-torch-to-linalg` pass.
- It also adds support for returning bool types.
- It also fixes lowering of the `aten.Int.Tensor` op for non-zero rank
input tensors.
- If a scalar number is converted to a 0-d tensor and passed on to the
`aten.Float.Tensor` op, it folds to the scalar number.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
Prior to this commit, importing a `prim::Constant` node with list type would result in an error since it was not supported. `ivalue_importer::importIValue` was modified to return the MlirValue corresponding to the root so its parent operation could be extracted.
- This commit adds support for `aten.native_batch_norm` operation.
- The current implementation only supports inference mode of
`aten.native_batch_norm` op.
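A minimal sketch of the inference-mode computation for an `[N, C]` input, using the standard batch norm formula with running statistics. The function name and the list-based layout are assumptions made for illustration.

```python
import math
from typing import List


def batch_norm_inference(x: List[List[float]], weight: List[float],
                         bias: List[float], running_mean: List[float],
                         running_var: List[float],
                         eps: float = 1e-5) -> List[List[float]]:
    """Normalize each channel with the stored running statistics."""
    out: List[List[float]] = []
    for row in x:
        out.append([
            (v - running_mean[c]) / math.sqrt(running_var[c] + eps)
            * weight[c] + bias[c]
            for c, v in enumerate(row)
        ])
    return out
```

In inference mode the running mean and variance are read but not updated, which is what distinguishes it from training mode.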
Signed-Off-By: Gaurav Shukla <gaurav@nod-labs.com>
This commit adds the lowering of the `aten::nll_loss_backward` op from
the torch dialect to the linalg dialect. The changes have been made as
a part of the `convert-torch-to-linalg` pass.
Signed-off-by: Prashant Kumar prashant@nod-labs.com
This PR includes the following pieces:
- Add torch `Generator` type. `Generator` type is converted to i64 in
refbackend type converter.
- Add seed management support for the default global generator.
`torch_c.getNextSeed` op is used to get the seed. On refbackend, the
`torch_c.getNextSeed` is lowered to load/store from [0] of global
variable `default_generator` memref<i64> in `InsertRngGlobals` pass.
- Add `aten.uniform_` and testing as an example op for RNG ops. Add
`torch.pseudo.aten.uniform` op. It has the same operands and return as
the `aten.uniform_` from the op registry except for value semantics.
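The seed management described above (load the current seed from a global, advance it, store it back) can be sketched as follows. The class and function names are hypothetical, and the update rule is an illustrative 64-bit linear congruential step chosen for this sketch; the source does not state which update the pass actually emits.

```python
MASK_64 = (1 << 64) - 1

# Illustrative LCG constants (Knuth's MMIX); an assumption, not from the source.
MULTIPLIER = 6364136223846793005
INCREMENT = 1442695040888963407


class DefaultGenerator:
    """Stand-in for the global `default_generator` holding one i64 seed."""

    def __init__(self, seed: int = 0) -> None:
        self.state = [seed & MASK_64]


def get_next_seed(gen: DefaultGenerator) -> int:
    """Load the seed from slot [0], advance it, store it back, and return it."""
    current = gen.state[0]                                  # load
    next_seed = (current * MULTIPLIER + INCREMENT) & MASK_64
    gen.state[0] = next_seed                                # store
    return next_seed
```

Keeping the state in a single global slot means every RNG op that asks for a seed observes and advances the same sequence.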
The added e2e maxpool testcase from #545 was not getting a static shape
due to an unfolded prim.If when RefineTypes was called. This was because
of unfolded torch.aten.__is__ and torch.prim.unchecked_cast operators
with torch.derefine operands.