mirror of https://github.com/llvm/torch-mlir
[docs] Add long-term roadmap
Add a roadmap covering expected project evolution over the next 1-2 years.
pull/1549/head snapshot-20221102.645
parent 031d127940
commit 2162253401
# Long-Term Roadmap for Torch-MLIR

## Overview

Latest update: 2022Q4

Torch-MLIR is about one year old now, and has successfully delivered a lot of
value to the community. In this document we outline the major architectural
changes that will make Torch-MLIR more robust, accessible, and useful to the
community on a 1-2 year timeline.

First, let's recap the goals of Torch-MLIR.

Technically, the goal of Torch-MLIR is to bridge the PyTorch and MLIR
ecosystems. That's vague, but it captures a very important property: Torch-MLIR
is not in the business of "innovating" either on the frontend or backend sides.
The project scope is to be an enabling connector between the two systems.

Non-technically, Torch-MLIR's goal is not to be an end-to-end product, but a
reliable piece of "off the shelf" infrastructure that system designers use as
part of their larger end-to-end flows. The main users are expected to be
"integrators", not end-users writing Python. This has the following facets:
- Community: Users of Torch-MLIR should feel empowered to participate in the
  community to get their questions resolved, or propose (and even implement)
  changes needed for their use cases.
- Ecosystem alignment: Users of Torch-MLIR should feel that the project is
  aligned with all of the projects that it collaborates with, making it safe to
  bet on for the long term.
- Ease of use: Users of Torch-MLIR should feel that it "Just Works", or that
  when it fails, it fails in a way that is easy to understand, debug, and fix.
- Development: Torch-MLIR should be easy and convenient to develop.

Today, much of the design space and the main problems have been identified, but
larger-scale architectural and cross-project changes are needed to realize the
right long-term design. This will allow us to reach a steady-state that best
meets the goals above.

## The main architectural changes

As described in
[architecture.md](architecture.md),
Torch-MLIR can be split into two main parts: the "frontend" and the "backend".

The main sources of brittleness, maintenance cost, and duplicated work across
the ecosystem are:

- The frontend work required to lower TorchScript to the backend contract.
- The irregular support surface area of the large number of PyTorch ops across
  the Linalg, TOSA, and MHLO backends.

Most of this document describes long-term ecosystem changes that will address
these, drastically improving Torch-MLIR's ability to meet its goals.

## Roadmap

### Refactoring the frontend

The primary way to make the frontend more reliable is to leverage new PyTorch
infrastructure that bridges from the PyTorch eager world into compiler-land.
PyTorch has two main projects that together cover almost all user use cases and
provide a technically sound, high quality-of-implementation path from user
programs into the compiler.

- [TorchDynamo](https://github.com/pytorch/torchdynamo) - TorchDynamo uses
  tracing-JIT-like techniques and program slicing to extract traces of tensor
  operations, which can then be passed to lower-level compilers. It works
  seamlessly with unmodified user programs.
- [FuncTorch](https://github.com/pytorch/functorch) - FuncTorch is basically JAX
  for PyTorch. It requires manual program tracing and slicing, but that is
  actually important for users since it gives them direct control over various
  important transformations, such as `grad` and `vmap`.

Both are receiving heavy investment from PyTorch core developers, and are
generally seen as the next generation of compiler technology for the project,
blending PyTorch's famous usability with excellent compiler integration
opportunities. Torch-MLIR works with these technologies as they exist today, but
significant work remains before the high-maintenance parts of Torch-MLIR can be
deleted wholesale. In the future, we expect the block diagram of Torch-MLIR to
be greatly simplified, as shown in the diagram below. Note that on the "Future"
side, PyTorch directly gives us IR in a form satisfying the backend contract.

![Roadmap of the frontend](images/roadmap_frontend.png)

The primary functional requirement of Torch-MLIR which remains unaddressed by
today's incarnation of TorchDynamo and FuncTorch is support for dynamic
shapes. PyTorch core devs are
[heavily investing](https://dev-discuss.pytorch.org/t/state-of-symbolic-shapes-branch/777)
in this area, and both TorchDynamo and FuncTorch are being upgraded as PyTorch
rolls out its new symbolic shape infrastructure.

Smaller blockers are related to general API stability and usability of the
various pieces of PyTorch infra.

These blockers are expected to be addressed by the PyTorch core devs over time.
Torch-MLIR's role here is to communicate our requirements to PyTorch core and
align their roadmap and ours. We do this by maintaining connections with the
PyTorch core developers and being "good-citizen power users" of their latest
technology (i.e. trying things out, surfacing bugs, providing feedback, etc.).

Note: Because both TorchDynamo and FuncTorch are TorchFX-based, we could write a
direct TorchFX -> MLIR importer, and delete the TorchScript importer. This would
remove the need for Torch-MLIR to build its own custom Python extension --
Torch-MLIR would be a pure-Python user of the standard MLIR Python bindings.
There is no immediate rush for this though, since TorchFX can be converted to
TorchScript (this may become lossy as the dynamic shape support in PyTorch gets
more advanced).
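
To make the shape of such an importer a bit more concrete, here is a minimal
sketch of the general idea: walk a traced graph node by node and emit one IR
operation per call node. It is only an illustration under loud assumptions. The
`Node` class is a hypothetical stand-in for real TorchFX nodes, the operator
names are made up for the example, and `emit_pseudo_ir` produces untyped,
MLIR-flavored text instead of building real operations through the MLIR Python
bindings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One step of a traced program (hypothetical stand-in for a TorchFX node)."""
    name: str          # name of the value this node produces
    kind: str          # "input", "call", or "output"
    target: str = ""   # operator name, used by "call" nodes only
    args: tuple = ()   # names of the values this node consumes


def emit_pseudo_ir(func_name, nodes):
    """Render a node list as MLIR-flavored text (untyped, so not valid MLIR)."""
    params = ", ".join("%" + n.name for n in nodes if n.kind == "input")
    lines = ["func @" + func_name + "(" + params + ") {"]
    for n in nodes:
        operands = ", ".join("%" + a for a in n.args)
        if n.kind == "call":
            lines.append('  %' + n.name + ' = "' + n.target + '"(' + operands + ")")
        elif n.kind == "output":
            lines.append("  return " + operands)
    lines.append("}")
    return "\n".join(lines)


# A toy trace of relu(x + y); the operator names are illustrative only.
trace = [
    Node("x", "input"),
    Node("y", "input"),
    Node("sum", "call", "aten.add", ("x", "y")),
    Node("act", "call", "aten.relu", ("sum",)),
    Node("out", "output", args=("act",)),
]

print(emit_pseudo_ir("forward", trace))
```

The point of the sketch is that the traversal itself is plain Python: nothing in
it requires a custom Python extension, which is the property the note above is
after.
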
### Refactoring the backend

Today in Torch-MLIR, we support 3 backends out of the box: Linalg-on-Tensors,
TOSA, and MHLO. These backends take IR in the backend contract form (see
[architecture.md](architecture.md)) and lower it to the respective dialects.
Today, each backend is implemented completely independently. This leads to
duplication and irregularity across the backends.

Moving forward, we would like the backends to share more code and their
op support to be more aligned with each other. Since the backend contract today
includes "all" of PyTorch's operators, it is very costly to duplicate the
lowering of so many ops across backends. Additionally, there are 3
forward-looking efforts that intersect with this effort:

- [StableHLO](https://github.com/openxla/stablehlo) - this is a dialect
  initially forked from MHLO which intends to create a stable support surface
  area for what today is our "at head" dependency on MHLO. MHLO is a fairly
  complete op set, so it is very attractive to have "almost all" models
  bottleneck through a stable interface like StableHLO. StableHLO is currently
  under relatively early development, but already delivers on many of the goals
  of stability.
- [TCP](https://github.com/llvm/torch-mlir/issues/1366) - this is a dialect
  which could serve a role very similar to MHLO, while providing community
  ownership. TCP is still in early planning phases, but there is strong
  alignment with the StableHLO effort. One byproduct of TCP that is expected to
  be very valuable is to incorporate the robust dynamic shape strategy from
  Linalg into an MHLO-like dialect, and there is a strong desire from StableHLO
  developers to adopt this once proven in TCP.
- [PrimTorch](https://dev-discuss.pytorch.org/t/tracing-with-primitives-update-0/577) -
  this is an effort on the PyTorch side to decompose PyTorch operators into a
  smaller set of primitive ops. This effort could effectively reduce the op
  surface area at the Torch-MLIR level a lot, which would make the duplication
  across backends less of an issue. But it still leaves open a lot of
  questions, such as how to control decompositions.

This is overall less important than the frontend refactor, because it is "just
more work" for us as Torch-MLIR developers to support things in the current
infrastructure, while the frontend refactor directly affects the user
experience.

As the above efforts progress, we will need to make decisions about how to adopt
the various technologies. The main goal is consolidating the bottleneck point
where the O(100s-1000s) of ops in PyTorch are reduced to a more tractable O(100)
ops. There are two main ways to accomplish this:

- Future A: We concentrate the bottleneck step in the "Backend contract ->
  StableHLO/MHLO/TCP" lowering path. This gets us a stable output for most
  things. The cascaded/transitive lowerings then let us do O(100) lowerings from
  there on down. (The exact details are not worked out yet, and depend on e.g.
  TCP adoption.)
- Future B: PrimTorch concentrates the bottleneck step on the PyTorch side.

These two efforts synergize, but the need for cascaded lowerings is much less if
PrimTorch solves the decomposition problem on the PyTorch side.
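
The cost argument behind the bottleneck can be sketched with a toy model. The
decomposition table, op names, and helper functions below are all hypothetical
and heavily simplified (they are not PrimTorch's actual primitives or any real
Torch-MLIR API); the sketch only shows why lowering a small primitive set per
backend is cheaper than lowering every op per backend, wherever the
decomposition step ends up living.

```python
# Hypothetical, simplified decompositions of composite ops into smaller ops.
DECOMPOSITIONS = {
    "mean": ["sum", "div"],
    "softmax": ["exp", "sum", "div"],
    "log_softmax": ["softmax", "log"],
    "sigmoid": ["neg", "exp", "add", "reciprocal"],
    "silu": ["sigmoid", "mul"],
}
# Ops that have no decomposition and must be lowered by every backend.
PRIMITIVES = {"exp", "sum", "div", "log", "mul", "neg", "add", "reciprocal"}
BACKENDS = ["linalg", "tosa", "mhlo"]
ALL_OPS = sorted(set(DECOMPOSITIONS) | PRIMITIVES)


def decompose(op):
    """Recursively expand an op into a flat list of primitive op names."""
    if op not in DECOMPOSITIONS:
        return [op]
    expanded = []
    for sub_op in DECOMPOSITIONS[op]:
        expanded.extend(decompose(sub_op))
    return expanded


def lowering_counts(ops, backends):
    """Return (per-backend lowerings without a bottleneck, with a bottleneck)."""
    # Without a bottleneck, every backend lowers every op independently.
    direct = len(ops) * len(backends)
    # With a bottleneck, ops are decomposed once and backends lower primitives.
    primitives_used = {prim for op in ops for prim in decompose(op)}
    bottlenecked = len(primitives_used) * len(backends)
    return direct, bottlenecked


direct, bottlenecked = lowering_counts(ALL_OPS, BACKENDS)
print("lowerings without bottleneck:", direct)
print("lowerings with bottleneck:", bottlenecked)
```
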
![Roadmap of the backend](images/roadmap_backend.png)

One of the main blockers for doing cascaded lowerings today is the irregular
support for dynamic shapes across TOSA and MHLO. MHLO is much more complete, but
the use of `tensor<Nxindex>` to model shapes results in brittleness of the
system. A dynamic shape model like that being adopted in TCP (and presumably
StableHLO in time) would simplify this. Hence TCP is strategically important for
proving out a design for a "dynamically shaped MHLO-like thing" that doesn't
have this drawback.

### Tools for advanced AoT deployments

PyTorch's future direction is towards TorchDynamo and FuncTorch, which are
tracing-based systems. This means that they inherently struggle to capture
control flow and non-tensor computations. Many deployments, especially
Ahead-of-Time compiled ones such as for edge, require non-tensor computations.
With existing tools, people deploying such models must manually stitch together
graphs of traced functions using custom per-model code, which is extremely
costly and very error-prone. We are awaiting movement on this front
from the PyTorch core team. There is some inspiration from systems like
[IREE-JAX](https://github.com/iree-org/iree-jax) in the JAX space for how to do
this, but ultimately this will depend on what the PyTorch core team decides on
for edge deployments. It is our responsibility to stay connected with them and
make sure that what they are building suits our needs.

### Project Governance / Structure

Torch-MLIR is currently an
[LLVM Incubator](https://llvm.org/docs/DeveloperPolicy.html#incubating-new-projects)
project. This has had the advantage of being organizationally close to MLIR Core.
However, the long-term direction is likely for Torch-MLIR to live under the
PyTorch umbrella, for a few reasons:

- As discussed in the other parts of this document, the long-term direction is
  for Torch-MLIR to be a quite thin component, with much of the code being
  obsoleted by infra in PyTorch core.
- The move towards more stable backend output formats will generally reduce
  variance on the MLIR side. This means that MLIR will be the "more frozen" of
  the two major dependencies (PyTorch and MLIR).
- We would like Torch-MLIR to be hooked into the PyTorch CI systems, and
  generally be more tightly integrated with the PyTorch development process
  (this includes things like packaging as well).

### Co-design

Many users of MLIR are developing advanced hardware or software systems, and
often these require information from the frontend beyond what PyTorch
provides today. Torch-MLIR should always be a "follower" of the features
available in the frontends and backends it connects to. We want to enable
co-design, of course, but new features such as quantization, sparsity,
distribution, etc. should be viewed through the lens of "the frontend can give
us X information, the backend needs Y information -- how do we connect
them?".

To satisfy those needs, we want to focus on existing extensibility mechanisms in
the frontend rather than inventing new ones. We intend to explore using existing
frontend concepts, such as
[custom ops](https://github.com/llvm/torch-mlir/issues/1462), to enable this
co-design.

If it proves to be absolutely necessary to add new concepts to the frontend
(e.g. new data types), it should be considered very carefully since supporting
such features is a major scope increase to the Torch-MLIR project. It is likely
to be better done in a separate project, with a carefully thought-out
integration with Torch-MLIR that avoids putting the maintenance burden on the
side of Torch-MLIR for the exploratory new frontend concept.

### LazyTensorCore support in Torch-MLIR

Today, Torch-MLIR supports LazyTensorCore (LTC). But as mentioned
[here](https://dev-discuss.pytorch.org/t/skipping-dispatcher-with-lazytensor/634/2?u=_sean_silva),
on the 1-2yr time horizon LTC will be more of an implementation detail under
TorchDynamo for users that already have compilers written using LTC. That is,
LTC is basically just a way to convert a TorchDynamo FX graph into LTC graphs,
for users that have toolchains written against LTC graphs. But that won't make
much technical sense for Torch-MLIR, because we convert to MLIR in the end no
matter what. That is, in the future going
`TorchDynamo FX graph -> LTC Graph -> MLIR` can just be replaced by the direct
`TorchDynamo FX graph -> MLIR` path. So on the 1-2yr time horizon, LTC will not
make technical sense in Torch-MLIR.

There will still be non-technical blockers, such as if end-users have
`device='lazy'` hardcoded into their code. That will require a migration plan
for current LTC-based toolchains onto TorchDynamo. This migration will improve
the end-user experience since TorchDynamo is more seamless, but it is an
end-user-impacting migration nonetheless and we will want to phase it
appropriately with the community.