torch-mlir/frontends/pytorch
Sean Silva 2efda323ff Significantly restructure torch/aten import design.
This is a really major and invasive restructuring of the way we get
torch operators (`torch::jit::Operator` / `c10::OperatorHandle`) into
MLIR. Please forgive the challenging review, but due to the sheer
invasiveness, it wasn't really practical to do it in sane smaller
pieces.

This fully replaces everything that was already working on the
TorchScript path (actually, more -- we added tanh support to
TorchToLinalg in order to delete the older code paths). Additionally,
I've kept the lights on for the acap path too, including what little e2e
stuff was working before (for expediency I made a few tiny compromises
along the way that will be easy to undo when we give that path proper
attention).

Overview of the new design:
- The torch operator `somens::someunqualname.someoverloadname` is
  imported as `torch.somens.someunqualname.someoverloadname` (skip the
  last dotted part if the overload name is empty), OR, if we don't have
  such an op registered, it is imported as
  `torch.operator "somens.someunqualname.someoverloadname" (...) : ...`.
  - The addition of the "overload name" is a critical element here, as
    the `(ns,unqual,overload)` triple is unique, which solves a lot of
    problems we were having.
  - This involves having separate MLIR ops for the `trailing_` and
    `.out` variants and all the different overloads. This seemed
    necessary, because the set of overloads is so wild and varied and
    unstructured. The previous design was leaning into some underlying
    structure that just isn't there -- the default situation is
    the "random overload that we want to manage on the MLIR side",
    rather than that being an exception. E.g. `aten::ne` (not-equal)
    has 21 overloads, only 4 of which are c10 dispatcher ops (see this
    [gist](https://gist.github.com/silvasean/190ba918c550c956260e21254e1b8aa1)),
    and the "out" variant is really called `.Tensor_out` instead of
    `.out` as it frequently is for other ops.
  - Rationale for all being in `torch` namespace: the set of operators
    is so varied and unstructured that "dialect per namespace"
    doesn't result in anything resembling the typical MLIR dialect
    boundary expectations. We could maybe draw the boundary at
    dispatcher ops vs non-dispatcher ops, but that doesn't seem to
    really result in very much useful structure at this point in time.
  - Note: within the torch operator registry, we effectively have a
    mini-basicpy subdialect (already type-resolved), which is reasonably
    structured.
  - The existing Torch op interfaces are also removed -- now that we
    track the overload name, we can losslessly find the original
    operator.
- Instead of `ATenRecognizeKernelsPass`, we now have a
  `ReduceOpVariantsPass` that keys off certain traits (and perhaps
  eventually interfaces) to reduce variants of ops to a smaller set,
  ideally operating on immutable tensors and using surrounding ops to
  model the mutability/aliasing aspects.
  - Note: `torch.ns.unqual.overload` ops allow both immutable and
    mutable tensors (unlike the previous hard distinction in the common
    case). This foreshadows a future change that will introduce a
    bona fide `!torch.tensor` type that will clean up a bunch of stuff.
- `TorchToLinalg` / `TorchToStd` supersede the existing
  "ATen->TCF->TCP->Linalg" path.
- The new `torch_ods_gen.py` supersedes `torch_signature_ods_gen.py`.
  It should look somewhat familiar, but the benefit of hindsight has
  allowed a lot of simplifications.
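
The naming rule from the first bullet of the overview can be sketched in a few lines of Python. This is a minimal illustration, not the importer's actual code (which lives in the C++ builder): the helper names `parse_operator_name`, `registered_op_name`, and `import_form` are hypothetical, the `registered` set stands in for whatever op registry the importer consults, and dropping an empty overload from the quoted `torch.operator` name is an assumption, since the message only states that rule for the registered form.

```python
from typing import Set, Tuple


def parse_operator_name(qualified: str) -> Tuple[str, str, str]:
    """Split 'ns::unqual.overload' into the (ns, unqual, overload) triple."""
    ns, _, rest = qualified.partition("::")
    unqual, _, overload = rest.partition(".")
    return ns, unqual, overload


def registered_op_name(ns: str, unqual: str, overload: str) -> str:
    """Build 'torch.ns.unqual.overload', skipping an empty overload name."""
    parts = ["torch", ns, unqual]
    if overload:
        parts.append(overload)
    return ".".join(parts)


def import_form(qualified: str, registered: Set[str]) -> str:
    """Return the registered op name, or the generic `torch.operator` form."""
    ns, unqual, overload = parse_operator_name(qualified)
    name = registered_op_name(ns, unqual, overload)
    if name in registered:
        return name
    # Assumption: the fallback also omits an empty overload name.
    generic = ".".join(part for part in (ns, unqual, overload) if part)
    return f'torch.operator "{generic}"'
```

Because the triple is carried through unchanged, the trailing-underscore and `.out` variants naturally map to distinct names, e.g. `aten::add_.Tensor` becomes `torch.aten.add_.Tensor`.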

The overall trend seems to be to make the `torch` dialect a nice layer
independent of anything else. It feels like, as a natural result of
various future changes, we will be removing the reliance on basicpy+numpy
dialects and have a nice self-contained type system too that properly
models the TorchScript type system (including proper subtyping,
mutable/immutable tensors, optional dtype, etc.).

Recommended review order:
- Start at some of the new import IR, e.g. in
  `frontends/pytorch/test/node_import/prim.py`,
  `frontends/pytorch/test/acap_export/test_export_add3.py`, and other
  tests.
- `frontends/pytorch/python/torch_mlir_utils/codegen/torch_ods_gen.py`
  and associated generated files:
  - `include/npcomp/Dialect/Torch/IR/GeneratedAtenOps.td`
  - `include/npcomp/Dialect/Torch/IR/GeneratedPrimOps.td`
- Inspect `ReduceOpVariants.cpp` / `reduce-op-variants.mlir` and the new
  traits in `include/npcomp/Dialect/Torch/IR/TorchTraits.h`
- Various code changes in the import path in
  `frontends/pytorch/csrc/builder`. Probably most interesting is the new
  code in `torch_to_mlir_utils.cpp` that has the logic to create the
  `torch.operator` ops or `torch.ns.unqual.overload` ops.

This is the [new ResNet IR](https://gist.github.com/silvasean/5407aafb710d07612b7b5b92eabecebe),
just to be able to look at a substantial sample of IR in the new style.
2021-05-19 13:37:39 -07:00
csrc Significantly restructure torch/aten import design. 2021-05-19 13:37:39 -07:00
docs Add design sketch for aten fallback. 2020-11-24 18:13:35 -08:00
e2e_testing/torchscript Constant fold through basicpy.bool_cast. 2021-04-30 10:57:02 -07:00
examples Add npcomp-verify-backend-contract pass. 2021-04-20 12:00:35 -07:00
python Significantly restructure torch/aten import design. 2021-05-19 13:37:39 -07:00
test Significantly restructure torch/aten import design. 2021-05-19 13:37:39 -07:00
utils [cleanup] Put the root class type for exportPath first. 2021-04-01 18:40:03 -07:00
CMakeLists.txt Delete old PyTorch 1.3 type dispatch oriented code paths. 2020-11-12 22:27:05 -08:00
LICENSE Add pytorch interface to ATen Dialect (#30) 2020-08-21 11:22:47 -07:00
README.md Update README. 2021-03-30 11:33:33 -07:00

README.md

NPComp - PyTorch frontend integration

This directory contains optional components for interfacing PyTorch to NPComp. Integration is targeted at multiple levels:

  • Via program capture with an ATen pseudo-device.
  • Via IR-level integration with PyTorch (via tracing or scripting interfaces).
  • Interfaces to facilitate checking against reference implementations and verification.

In all situations, the target dialects are maintained in the outer project, along with their lowerings to common intermediate dialects and backends. This directory should be purely about interfacing with the PyTorch/LibTorch components for extracting and executing programs.

The code in this directory is intended to integrate tightly with PyTorch, and follows the code style for PyTorch. See the overall documentation for frontends for further details about code layout and integration philosophy. In particular, this directory exists to provide a working frontend to an MLIR-based PyTorch compilation flow and is not intended to be contributed to the LLVM monorepo. If the project is successful, it makes more sense to either break it out as an independent project that depends on LLVM/MLIR/npcomp or contribute it upstream to PyTorch. However, as it will be quite some time before the components are in a state to support such a dependency, it is being carried in-tree in the interim.

Program capture with an ATen dispatch capture.

Integration with a pseudo-device is typified by code like the following:

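import torch
import torch_mlir

# Example inputs; their shapes are compatible for a matrix multiply.
lhs = torch.rand(2, 3)
rhs = torch.rand(3, 4)

mb = torch_mlir.ModuleBuilder()
# Ops executed inside this context are captured into a function named "mm".
with mb.capture_function("mm", [lhs, rhs]) as f:
  result = torch.mm(lhs, rhs)
  # Declare which values the captured function returns.
  f.returns([result])

# Print the IR of the module held by the ModuleBuilder.
mb.module.operation.print()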

All operations that happen under the mb.capture_function context manager are intercepted via PyTorch's dispatcher, and an IR graph is constructed into the module held by the ModuleBuilder.

This technique has several advantages and disadvantages. For training use cases, this technique generates a backward path automatically using the same method that PyTorch natively uses. The resulting graph also tends to be simpler, since it will not reflect conditionals in the original Python code. Lastly, it is natural if MLIR is being used as a frontend target for an actual device of some sort. In this case, the MLIR could go through a device-specific lowering path and the resulting code run on a device. The implementation of this technique is largely modeled after pytorch/xla.
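
To see why a captured graph does not reflect Python conditionals, consider the toy recorder below. It is a conceptual sketch in plain Python only: `Recorder`, `TracedValue`, `model`, and `capture` are hypothetical names, and nothing here uses the real torch_mlir or PyTorch dispatcher APIs. The point is that capturing at the operation level logs only the ops that actually execute, so the `if` in `model` leaves no trace in the recording.

```python
# Toy illustration only: NOT the torch_mlir or PyTorch dispatcher API.
class Recorder:
    """Collects the names of operations as they execute."""

    def __init__(self):
        self.ops = []


class TracedValue:
    """A number that logs every arithmetic op it takes part in."""

    def __init__(self, value, recorder):
        self.value = value
        self.recorder = recorder

    def __add__(self, other):
        self.recorder.ops.append("add")
        return TracedValue(self.value + other.value, self.recorder)

    def __mul__(self, other):
        self.recorder.ops.append("mul")
        return TracedValue(self.value * other.value, self.recorder)


def model(x, use_add):
    # Ordinary Python control flow; the recorder never sees it.
    if use_add:
        return x + x
    return x * x


def capture(use_add):
    """Run `model` once and return (recorded op names, numeric result)."""
    recorder = Recorder()
    result = model(TracedValue(3, recorder), use_add)
    return recorder.ops, result.value
```

Each call to `capture` yields a straight-line list containing only the branch that ran, which is the simplification the paragraph above describes; the trade-off is that the other branch is absent from that recording.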