torch-mlir/docs/long_term_roadmap.md

251 lines
13 KiB
Markdown

# Long-Term Roadmap for Torch-MLIR
## Overview
Latest update: 2022Q4
Torch-MLIR is about one year old now, and has successfully delivered a lot of
value to the community. In this document we outline the major architectural
changes that will make Torch-MLIR more robust, accessible, and useful to the
community on a 1-2 year timeline.
First, let's recap the goals of Torch-MLIR.
Technically, the goal of Torch-MLIR is to bridge the PyTorch and MLIR
ecosystems. That's vague, but it captures a very important property: Torch-MLIR
is not in the business of "innovating" either on the frontend or backend sides.
The project scope is to be an enabling connector between the two systems.
Non-technically, Torch-MLIR's goal is not to be an end-to-end product, but a
reliable piece of "off the shelf" infrastructure that system designers use as
part of their larger end-to-end flows. The main users are expected to be
"integrators", not end-users writing Python. This has the following facets:
- Community: Users of Torch-MLIR should feel empowered to participate in the
community to get their questions resolved, or propose (and even implement)
changes needed for their use cases.
- Ecosystem alignment: Users of Torch-MLIR should feel that the project is
aligned with all of the projects that it collaborates with, making it safe to
bet on for the long term.
- Ease of use: Users of Torch-MLIR should feel that it "Just Works", or that
when it fails, it fails in a way that is easy to understand, debug, and fix.
- Development: Torch-MLIR should be easy and convenient to develop.
Today, much of the design space and the main problems have been identified, but
larger-scale architectural and cross-project changes are needed to realize the
right long-term design. This will allow us to reach a steady-state that best
meets the goals above.
## The main architectural changes
As described in
[architecture.md](architecture.md),
Torch-MLIR can be split into two main parts: the "frontend" and the "backend".
The main sources of brittleness, maintenance cost, and duplicated work across
the ecosystem are:
- The frontend work required to lower TorchScript to the backend contract.
- The irregular support surface area of the large number of PyTorch ops across
the Linalg, TOSA, and MHLO backends.
Most of this document describes long-term ecosystem changes that will address
these, drastically improving Torch-MLIR's ability to meet its goals.
## Roadmap
### Refactoring the frontend
The primary way to make the frontend more reliable is to leverage new PyTorch
infrastructure that bridges from the PyTorch eager world into compiler-land.
PyTorch has two main projects that together cover almost all user use cases and
provide a technically sound, high quality-of-implementation path from user
programs into the compiler.
- [TorchDynamo](https://github.com/pytorch/torchdynamo) - TorchDynamo uses
tracing-JIT-like techniques and program slicing to extract traces of tensor
operations, which can then be passed to lower-level compilers. It works
seamlessly with unmodified user programs.
- [FuncTorch](https://github.com/pytorch/functorch) - FuncTorch is basically JAX
for PyTorch. It requires manual program tracing and slicing, but that is
actually important for users since it gives them direct control over various
important transformations, such as `grad` and `vmap`.
These are both being heavily-invested-in by PyTorch core developers, and are
generally seen as the next generation of compiler technology for the project,
blending PyTorch's famous usability with excellent compiler integration
opportunities. Torch-MLIR works with these technologies as they exist today, but
significant work remains to enable wholesale deleting the high-maintenance parts
of Torch-MLIR. In the future, we expect the block diagram of Torch-MLIR to be
greatly simplified, as shown in the diagram below. Note that in the "Future"
side, PyTorch directly gives us IR in a form satisfying the backend contract.
![Roadmap of the frontend](images/roadmap_frontend.png)
The primary functional requirement of Torch-MLIR which remains unaddressed by
today's incarnation of TorchDynamo and FuncTorch is the support for dynamic
shapes. PyTorch core devs are
[heavily investing](https://dev-discuss.pytorch.org/t/state-of-symbolic-shapes-branch/777)
in this area, and both TorchDynamo and FuncTorch are being upgraded as PyTorch
rolls out its new symbolic shape infrastructure.
Smaller blockers are related to general API stability and usability of the
various pieces of PyTorch infra.
These blockers are expected to be addressed by the PyTorch core devs over time.
Torch-MLIR's role here is to communicate our requirements to PyTorch core and
align their roadmap and ours. We do this by maintaining connections with the
PyTorch core developers and being "good-citizen power users" of their latest
technology (i.e. trying things out, surfacing bugs, providing feedback, etc.).
Note: Because both TorchDynamo and FuncTorch are TorchFX-based, we could write a
direct TorchFX -> MLIR importer, and delete the TorchScript importer. This would
remove the need for Torch-MLIR to build its own custom Python extension --
Torch-MLIR would be a pure-Python user of the standard MLIR Python bindings.
There is no immediate rush for this though, since TorchFX can be converted to
TorchScript (this may become lossy as the dynamic shape support in PyTorch gets
more advanced).
### Refactoring the backend
Today in Torch-MLIR, we support 3 backends out of the box: Linalg-on-Tensors,
TOSA, and MHLO. These backends take IR in the backend contract form (see
[architecture.md](architecture.md)) and lowers them to the respective dialects.
Today, each backend is implemented completely independently. This leads to
duplication and irregularity across the backends.
Moving forward, we would like for the backends to share more code and for their
op support to be more aligned with each other. Since the backend contract today
includes "all" of PyTorch's operators, it is very costly to duplicate the
lowering of so many ops across backends. Additionally, there are 3
forward-looking efforts that intersect with this effort:
- [StableHLO](https://github.com/openxla/stablehlo) - this is a dialect
initially forked from MHLO which intends to create a stable support surface
area for what today is our "at head" dependency on MHLO. MHLO is a fairly
complete op set, so it is very attractive to have "almost all" models
bottleneck through a stable interface like StableHLO. StableHLO is currently
under relatively early development, but already delivers on many of the goals
of stability.
- [TCP](https://github.com/llvm/torch-mlir/issues/1366) - this is a dialect
which could serve a role very similar to MHLO, while providing community
ownership. TCP is still in early planning phases, but there is strong
alignment with the StableHLO effort. One byproduct of TCP that is expected to
be very valuable is to incorporate the robust dynamic shape strategy from
Linalg into an MHLO-like dialect, and there is a strong desire from StableHLO
developers to adopt this once proven in TCP.
-
[PrimTorch](https://dev-discuss.pytorch.org/t/tracing-with-primitives-update-0/577) -
this is an effort on the PyTorch side to decompose PyTorch operators into a
smaller set of primitive ops. This effort could effectively reduce the op
surface area at the Torch-MLIR level a lot, which would make the duplication
across backends less of an issue. But it still leaves open a lot of
questions, such as how to control decompositions.
This is overall less important than the frontend refactor, because it is "just
more work" for us as Torch-MLIR developers to support things in the current
infrastructure, while the frontend refactor directly affects the user
experience.
As the above efforts progress, we will need to make decisions about how to adopt
the various technologies. The main goal is consolidating the bottleneck point
where the O(100s-1000s) of ops in PyTorch are reduced to a more tractable O(100)
ops. There are two main ways to accomplish this:
- Future A: We concentrate the bottleneck step in the "Backend contract ->
StableHLO/MHLO/TCP" lowering path. This gets us a stable output for most
things. The cascaded/transitive lowerings then let us do O(100) lowerings from
then on down. (exact details are not worked out yet, and depend on e.g. TCP
adoption, etc.)
- Future B: PrimTorch concentrates the bottleneck step on the PyTorch side.
These two efforts synergize, but the need for cascaded lowerings is much less if
PrimTorch solves the decomposition problem on the PyTorch side.
![Roadmap of the backend](images/roadmap_backend.png)
One of the main blockers for doing cascaded lowerings today is the irregular
support for dynamic shapes across TOSA and MHLO. MHLO is much more complete, but
the use of `tensor<Nxindex>` to model shapes results in brittleness of the
system. A dynamic shape model like that being adopted in TCP (and presumably
StableHLO in time) would simplify this. Hence TCP is strategically important for
proving out a design for a "dynamically shaped MHLO-like thing" that doesn't
have this drawback.
### Tools for advanced AoT deployments
PyTorch's future direction is towards TorchDynamo and FuncTorch, which are
tracing-based systems. This means that they inherently struggle to capture
control flow and non-tensor computations. Many deployments, especially
Ahead-of-Time compiled ones such as for edge, require non-tensor computations.
It is extremely costly for people deploying such models to manually stitch
together graphs of traced functions with custom per-model code with existing
tools, and it is also very error-prone. We are awaiting movement on this front
from the PyTorch core team. There is some inspiration from systems like
[IREE-JAX](https://github.com/iree-org/iree-jax) in the JAX space for how to do
this, but ultimately this will depend on what the PyTorch core team decides on
for edge deployments. It is our responsibility to stay connected with them and
make sure that what they are building suits our needs.
### Project Governance / Structure
Torch-MLIR is currently an
[LLVM Incubator](https://llvm.org/docs/DeveloperPolicy.html#incubating-new-projects).
This has had the advantage of being organizationally close to MLIR Core.
However, the long-term direction is likely for Torch-MLIR to live under the
PyTorch umbrella, for a few reasons:
- As discussed in the other parts of this document, the long-term direction is
for Torch-MLIR to be a quite thin component, with much of the code being
obsoleted by infra in PyTorch core.
- The move towards more stable backend output formats will generally reduce
variance on the MLIR side. This means that MLIR will be the "more frozen" of
the two major dependencies (PyTorch and MLIR).
- We would like Torch-MLIR to be hooked into the PyTorch CI systems, and
generally be more tightly integrated with the PyTorch development process
(this includes things like packaging as well).
### Co-design
Many users of MLIR are developing advanced hardware or software systems, and
often these require information from the frontend beyond what PyTorch
provides today. Torch-MLIR should always be a "follower" of the features
available in the frontends and backends it connects to. We want to enable
co-design, of course, but new features such as quantization, sparsity,
distribution, etc. should be viewed from the lens of "the frontend can give us
X information, the backend needs Y information -- how do we connect
them?".
To satisfy those needs, we want to focus on existing extensibility mechanisms in
the frontend rather than inventing new ones. We intend to explore using existing
frontend concepts, such as
[custom ops](https://github.com/llvm/torch-mlir/issues/1462), to enable this
co-design.
If it proves to be absolutely necessary to add new concepts to the frontend
(e.g. new data types), it should be considered very carefully since supporting
such features is a major scope increase to the Torch-MLIR project. It is likely
to be better done in a separate project, with a carefully thought-out
integration with Torch-MLIR that avoids putting the maintenance burden on the
side of Torch-MLIR for the exploratory new frontend concept.
### LazyTensorCore support in Torch-MLIR
Today, Torch-MLIR supports LazyTensorCore. But as mentioned
[here](https://dev-discuss.pytorch.org/t/skipping-dispatcher-with-lazytensor/634/2?u=_sean_silva),
on the 1-2yr time horizon LTC will be more an implementation detail under
TorchDynamo for users that already have compilers written using LTC. That is,
LTC is basically just a way to convert a TorchDynamo FX graph into LTC graphs,
for users that have toolchains written against LTC graphs. But that won't make
much technical sense for Torch-MLIR, because we convert to MLIR in the end no
matter what. That is, in the future going
`TorchDynamo FX graph -> LTC Graph -> MLIR` can just be replaced by the direct
`TorchDynamo FX graph -> MLIR path`. So in the 1-2yr time horizon, LTC will not
make technical sense in Torch-MLIR.
There will still be non-technical blockers, such as if end-users have
`device='lazy'` hardcoded into their code. That will require a migration plan
for current LTC-based toolchains onto TorchDynamo. This migration will improve
the end-user experience since TorchDynamo is more seamless, but it is a
end-user-impacting migration nonetheless and we will want to phase it
appropriately with the community.