torch-mlir/frontends/pytorch
Sean Silva 453e29ea05 Add E2E support for tests with heavy dependencies (heavydep tests).
The tests use the same (pure-Python) test framework as the
normal torchscript_e2e_test.sh, but the tests are added in
`build_tools/torchscript_e2e_heavydep_tests` instead of
`frontends/pytorch/e2e_testing/torchscript`. Any needed dependencies can
easily be configured in generate_serialized_tests.sh.

We add an initial machine translation model with a complex set of
dependencies to seed the curriculum there. I verified that this model
gets to the point of MLIR import (it fails there with a segfault due to
not being able to import the "Any" type).

This required moving a few files from the `torch_mlir` Python module
into multiple modules to isolate the code that depends on our C++
extensions (which now live in `torch_mlir` and
`torch_mlir_torchscript_e2e_test_configs`) from the pure Python code
(which now lives in `torch_mlir_torchscript`). This is an entirely
mechanical change, and lots of imports needed to be updated.

The dependency graph is:
```
       torch_mlir_torchscript_e2e_test_configs
                  /              |
                 /               |
                /                |
               V                 V
torch_mlir_torchscript       torch_mlir
```

The `torch_mlir_torchscript_e2e_test_configs` are then dependency-injected
into the `torch_mlir_torchscript` modules to successfully assemble a
working test harness (the code was already structured this way, but this
new file organization allows the isolation from C++ code to actually
happen).  This isolation is critical to allowing the serialized programs
to be transported across PyTorch versions and for the test harness to be
used seamlessly to generate the heavydep tests.
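
As a rough illustration of the dependency-injection arrangement described above, here is a minimal sketch. The names `HarnessConfig`, `run_tests`, and `InProcessConfig` are hypothetical and are not taken from the actual modules; the point is only that the pure-Python harness sees an abstract interface, while the concrete config (which could depend on C++ extensions) is supplied from outside.

```python
import abc


# Pure-Python side (plays the role of `torch_mlir_torchscript`).
# It only knows the abstract interface, never a concrete backend.
class HarnessConfig(abc.ABC):  # hypothetical name
    @abc.abstractmethod
    def run(self, program, inputs):
        """Execute `program` on `inputs` and return its output."""


def run_tests(tests, config):
    """Run every test through the injected config; report pass/fail by name."""
    report = {}
    for name, (program, inputs, expected) in tests.items():
        report[name] = config.run(program, inputs) == expected
    return report


# Backend side (plays the role of `torch_mlir_torchscript_e2e_test_configs`).
# A real config could import C++ extensions here; this toy one simply calls
# the program in-process.
class InProcessConfig(HarnessConfig):  # hypothetical name
    def run(self, program, inputs):
        return program(*inputs)


tests = {
    "double": (lambda x: 2 * x, (21,), 42),
    "add": (lambda a, b: a + b, (1, 2), 3),
}
report = run_tests(tests, InProcessConfig())
print(report)
```

Because `run_tests` never imports a backend itself, the same harness code can run in an environment where the C++ extensions are unavailable, as long as some config object is passed in.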

Also:
- Extend `_Tracer` class to support nested property (submodule) accesses.
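
A toy sketch of what tracing through nested property accesses can look like is shown here. It is not the actual `_Tracer` code: the `Tracer`, `Encoder`, and `Model` classes are hypothetical, and the sketch assumes that every non-callable attribute is a submodule.

```python
class Tracer:
    """Wraps an object and records each method call with its dotted path."""

    def __init__(self, wrapped, path, trace):
        self._wrapped = wrapped
        self._path = path      # attribute names leading to `wrapped`
        self._trace = trace    # shared list of (symbol, args, result)

    def __getattr__(self, name):
        # Only invoked when normal lookup fails, so the private fields
        # set in __init__ never reach this method.
        attr = getattr(self._wrapped, name)
        full_path = self._path + [name]
        if callable(attr):
            def recorded(*args):
                result = attr(*args)
                self._trace.append((".".join(full_path), args, result))
                return result
            return recorded
        # Assumed to be a submodule: keep tracing one level deeper.
        return Tracer(attr, full_path, self._trace)


class Encoder:
    def encode(self, x):
        return x * 2


class Model:
    def __init__(self):
        self.encoder = Encoder()

    def forward(self, x):
        return x + 1


trace = []
tracer = Tracer(Model(), [], trace)
tracer.forward(1)
tracer.encoder.encode(5)
print(trace)
```

The nested call is recorded under the dotted symbol `encoder.encode`, so a replay step can locate the same method on another copy of the model.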

Recommended review order:
- "user-level" docs in README.md
- code in `build_tools/torchscript_e2e_heavydep_tests`.
- changes in `torch_mlir_torchscript/e2e_test/framework.py`
- misc mechanical changes.
2021-08-03 14:09:56 -07:00
cmake/modules Rework the python build to a static assembly of MLIR+NPCOMP (#251) 2021-07-27 16:10:10 -07:00
csrc Bump llvm-project to 5b2e7f50a6798fd9b9c79d9d62fdebcd9e78525b. (#260) 2021-07-29 12:26:54 -07:00
docs Add design sketch for aten fallback. 2020-11-24 18:13:35 -08:00
e2e_testing/torchscript Add E2E support for tests with heavy dependencies (heavydep tests). 2021-08-03 14:09:56 -07:00
examples Add an e2e test example for Resnet18 2021-07-30 11:44:44 -04:00
python Add E2E support for tests with heavy dependencies (heavydep tests). 2021-08-03 14:09:56 -07:00
test Add E2E support for tests with heavy dependencies (heavydep tests). 2021-08-03 14:09:56 -07:00
utils [cleanup] Put the root class type for exportPath first. 2021-04-01 18:40:03 -07:00
.gitignore Build packages for npcomp-torch. 2021-07-29 19:58:59 -07:00
CMakeLists.txt Rework the python build to a static assembly of MLIR+NPCOMP (#251) 2021-07-27 16:10:10 -07:00
LICENSE Add pytorch interface to ATen Dialect (#30) 2020-08-21 11:22:47 -07:00
README.md Update README. 2021-03-30 11:33:33 -07:00
setup.py Add E2E support for tests with heavy dependencies (heavydep tests). 2021-08-03 14:09:56 -07:00

README.md

NPComp - PyTorch frontend integration

This directory contains optional components for interfacing PyTorch to NPComp. Integration is targeted at multiple levels:

  • Via program capture with an ATen pseudo-device.
  • Via IR-level integration with PyTorch (via tracing or scripting interfaces).
  • Interfaces to facilitate checking against reference implementations and verification.

In all situations, the target dialects are maintained in the outer project, along with their lowerings to common intermediate dialects and backends. This directory should be purely about interfacing with the PyTorch/LibTorch components for extracting and executing programs.

The code in this directory is intended to integrate tightly with PyTorch and follows PyTorch's code style. See the overall documentation for frontends for further details about code layout and integration philosophy. In particular, this directory exists to provide a working frontend to an MLIR-based PyTorch compilation flow and is not intended to be contributed to the LLVM monorepo. If the project is successful, it makes more sense either to break it out as an independent project that depends on LLVM/MLIR/npcomp or to contribute it upstream to PyTorch. However, since it will be quite some time before the components are in a state to support such a dependency, it is being carried in-tree in the interim.

Program capture with ATen dispatch capture

Integration with a pseudo-device is typified by code like the following:

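import torch
import torch_mlir

# Example inputs for a 2x3 by 3x4 matrix multiply.
lhs = torch.rand(2, 3)
rhs = torch.rand(3, 4)

mb = torch_mlir.ModuleBuilder()
# Operations run inside this context are intercepted and recorded as IR.
with mb.capture_function("mm", [lhs, rhs]) as f:
  result = torch.mm(lhs, rhs)
  # Mark `result` as the return value of the captured function.
  f.returns([result])

# Print the module that now holds the captured function.
mb.module.operation.print()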

All operations that happen under the mb.capture_function context manager are intercepted via PyTorch's dispatcher, and an IR graph is constructed into the module held by the ModuleBuilder.
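
The capture pattern itself can be illustrated without PyTorch. The following is a toy sketch only: the `dispatch`, `add`, `mul`, and `capture` helpers are hypothetical stand-ins and do not reflect how `torch_mlir` or PyTorch's dispatcher is implemented. It shows the general idea that every operation passes through one choke point, and that a context manager decides whether calls are recorded.

```python
import contextlib

# Recorders that are currently listening (innermost last).
_active_recorders = []


def dispatch(op_name, fn, *args):
    """Single choke point: run the op, then notify any active recorder."""
    result = fn(*args)
    for recorder in _active_recorders:
        recorder.append((op_name, args, result))
    return result


def add(a, b):
    return dispatch("add", lambda x, y: x + y, a, b)


def mul(a, b):
    return dispatch("mul", lambda x, y: x * y, a, b)


@contextlib.contextmanager
def capture(recorded):
    """Record every dispatched op into `recorded` while the context is open."""
    _active_recorders.append(recorded)
    try:
        yield recorded
    finally:
        _active_recorders.pop()


ops = []
with capture(ops):
    t = add(2, 3)
    u = mul(t, 4)
outside = add(1, 1)  # runs normally, but is not recorded

op_names = [name for name, _, _ in ops]
print(op_names, u)
```

Only the two operations issued inside the `with` block end up in `ops`; the call made after the block executes as usual but leaves no trace.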

This technique has several advantages and disadvantages. For training use cases, this technique generates a backward path automatically using the same method that PyTorch natively uses. The resulting graph also tends to be simpler, since it will not reflect conditionals in the original Python code. Lastly, it is natural if MLIR is being used as a frontend target for an actual device of some sort: the MLIR could go through a device-specific lowering path, and the resulting code could run on a device. The implementation of this technique is largely modeled after pytorch/xla.
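
To make the point about conditionals concrete, here is a small self-contained sketch. The `Traced` class and the `model` and `trace` functions are hypothetical and use plain numbers rather than tensors. Because capture happens while the Python code runs, the `if` statement is evaluated by the interpreter and only the operations on the branch actually taken appear in the recorded graph.

```python
class Traced:
    """Toy traced value: holds a number and logs each op to a shared list."""

    def __init__(self, value, graph):
        self.value = value
        self.graph = graph

    def __add__(self, other):
        self.graph.append("add")
        return Traced(self.value + _unwrap(other), self.graph)

    def __mul__(self, other):
        self.graph.append("mul")
        return Traced(self.value * _unwrap(other), self.graph)


def _unwrap(x):
    return x.value if isinstance(x, Traced) else x


def model(x, use_scale):
    # Ordinary Python conditional: evaluated during capture, never recorded.
    if use_scale:
        x = x * 2
    return x + 1


def trace(use_scale):
    graph = []
    out = model(Traced(3, graph), use_scale)
    return graph, out.value


graph_scaled, out_scaled = trace(True)
graph_plain, out_plain = trace(False)
print(graph_scaled, out_scaled)
print(graph_plain, out_plain)
```

Each captured graph is a straight-line list of operations. The flip side is that a graph captured with one value of `use_scale` is only valid for inputs that take the same branch.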