Remove old outdated roadmaps. Add placeholder new one.

pull/346/head
Sean Silva 2021-10-01 17:22:35 +00:00
parent 25a2c8bd85
commit 9fc059e948
3 changed files with 14 additions and 384 deletions


@@ -1,250 +0,0 @@
# Roadmap as of beginning of 2021Q2
## Non-technical project status overview
The project has evolved past the "works on my machine" stage. It's hard to provide
meaningful numbers for an open-source project, but I'm seeing >5 people
regularly active on pull requests, bugs, etc., covering aspects ranging from
acap_dispatch, TorchScript, and RefBackend to build systems and even CI. This is very
promising and feels healthy to me.
## Roadmap overview
The project has grown a number of aspects, but effort has converged on 3
workstreams:
- acap_dispatch: The goal of this project is to develop a tracing-based frontend
for Torch interoperability that takes cues from existing working solutions to
enable a gateway from PyTorch to MLIR.
- Why this project is cool: For users that can tolerate the limitations of
tracing systems, this project enables an MLIR-based frontend for PyTorch on
a shorter time frame than TorchScript compilation, letting downstream
users focus on their value-add. Also, the tracing-based approach has a
distinct usability advantage for many pure-Python researcher workflows.
- TorchScript compilation: The goal of this project is to build the frontend of
a truly next-generation ahead-of-time machine learning compiler.
- Why this project is cool: This system is designed from day 1 to support
features such as dynamic shapes, control flow, mutable variables,
program-internal state, and non-Tensor types (scalars, lists, dicts) in a
principled fashion. These features are essential for empowering an
industry-level shift in the set of machine learning programs that are
feasible to deploy with minimal effort across many devices (when combined
with a backend using the advanced compilation techniques being developed
elsewhere in the MLIR ecosystem).
- Reference backend (RefBackend): The goal of this project is to develop a
reference end-to-end flow for the MLIR project, using the needs of our
frontends as seeds for new feature development and upstreaming.
- Why this project is cool: Due to its status as an LLVM incubator project,
npcomp is uniquely positioned to develop an end-to-end flow with a clear
path toward upstreaming components as their design converges (example:
bufferization), or rapidly rebasing on newly added upstream components to
replace homegrown pieces (example: linalg on tensors).
## acap_dispatch
acap_dispatch is the name of our implementation of a tracing-based PyTorch
program capture system, analogous to the one used by
[pytorch/xla](https://github.com/pytorch/xla). This system is sufficient to
capture very many programs of interest, and has the benefit of seamlessly
capturing gradients, shapes, and dtypes, while still bottoming out on the same
ATen dialect needed by the TorchScript path. It also trivializes all use of
Python data structures like lists by directly observing their values as
constants.
Looking a bit longer-term, this flow is a good complement to the TorchScript
flow and has distinct tradeoffs. These are captured nicely in the paper
[LazyTensor: combining eager execution with domain-specific
compilers](https://arxiv.org/abs/2102.13267). In their terminology, our
acap_dispatch path implements "Tracing" while our TorchScript path implements
"Direct compilation". Direct compilation tends to be required for deploying
complex models for inference, edge, or federated learning applications, while
Tracing is the building block for a totally seamless researcher experience when
iterating in Python.
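To make the tracing approach concrete, here is a minimal, purely illustrative
Python sketch; the real acap_dispatch intercepts ops at PyTorch's C++
dispatcher, and the `Tracer`/`TracedValue` names below are hypothetical.
Running the program once records a straight-line sequence of ops with concrete
values, and plain Python data is simply observed as constants:

```python
class TracedValue:
    """Wraps a runtime value and remembers which recorded op produced it."""
    def __init__(self, value, producer=None):
        self.value = value
        self.producer = producer  # index into Tracer.ops, or None for inputs


class Tracer:
    """Records every op dispatched through it while the program runs."""
    def __init__(self):
        self.ops = []  # linearized trace: (op_name, operands, result)

    def dispatch(self, op_name, fn, *args):
        # Unwrap traced operands; plain Python values (scalars, list
        # lengths, ...) are observed as constants in the trace.
        concrete = [a.value if isinstance(a, TracedValue) else a for a in args]
        result = TracedValue(fn(*concrete), producer=len(self.ops))
        self.ops.append((op_name, args, result))
        return result


# Running the program once yields a trace with concrete values but no
# control flow -- the characteristic tradeoff of tracing-based capture.
t = Tracer()
x = TracedValue([1.0, 2.0, 3.0])
y = t.dispatch("aten.mul", lambda a, b: [e * b for e in a], x, 2.0)
z = t.dispatch("aten.add", lambda a, b: [p + q for p, q in zip(a, b)], y, x)
print([name for name, _, _ in t.ops])  # ['aten.mul', 'aten.add']
```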
### 2021Q2
- Improve robustness of the flow's program capture, ideally to the level of
`pytorch/xla`.
- Get into a steady-state where adding operations is fairly mechanical.
- Support at least a few programs of community interest.
- Identify demand for a more holistic user experience, analogous to
`pytorch/xla`: for example, building out support for the more runtime-y
aspects (compiling on the fly, moving tensors in and out of the compiler
system's runtime, etc.) that make it an actual user experience rather than
just a way to get compiler IR.
## TorchScript compilation
The TorchScript compiler represents the bulk of core compiler effort in the
npcomp project.
[TorchScript](https://pytorch.org/docs/stable/jit_language_reference.html) is a
restricted (more static) subset of Python, but even TorchScript is quite dynamic
when compared to the needs of the lower levels of the compilation stack, especially
systems like Linalg. The overarching theme of this project is building out
compiler components that bridge that gap. As we do so, the recurring tradeoffs
are:
- user experience: we want a fairly unrestricted programming model -- that's
what users like about PyTorch, and what enables users to deploy without
significant modifications of their code.
- feasibility of the compiler: we want a smart compiler that is feasible to
implement (for our own sanity :) )
- excellent generated code quality: this is of course dependent on the backend
paired with the frontend we are building, but there are a number of
transformations that make sense before we reach the backend and that strongly
affect the quality of the code the backend generates.
To give a concrete example, consider the problem of inferring the shapes of
tensors at various points in the program. The more precision we have on the
shapes, the better code can be emitted by a backend. But in general, users need
to provide at least some information about their program to help the compiler
understand what shapes are at different points in the program. The smarter our
compiler algorithms are, the less information the user needs to provide. Thus,
all 3 facets are interlinked and there is no single right answer -- we need to
balance them for a workable system.
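To make the annotation idea concrete, here is a hedged, self-contained sketch;
the `annotate_shapes` decorator is hypothetical (compare the real
infrastructure referenced in the retrospective below,
`frontends/pytorch/python/torch_mlir/torchscript/annotations.py`), but it shows
the intended division of labor: the user pins down only the entry-point
shapes/dtypes, and inference propagates the rest.

```python
import torch

def annotate_shapes(shapes):
    """Hypothetical decorator: attach argument shape/dtype info for the
    compiler to seed shape inference (-1 marks a dynamic dimension)."""
    def wrap(fn):
        fn._arg_annotations = shapes
        return fn
    return wrap

class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(64, 10)

    # Only the entry point needs annotating; from this seed, the compiler
    # can infer that `self.fc(x)` has shape [-1, 10] with no further help.
    @annotate_shapes([([-1, 64], torch.float32)])
    def forward(self, x):
        return self.fc(x)

scripted = torch.jit.script(MLP())  # the TorchScript'd module fed to the compiler
print(MLP.forward._arg_annotations)  # [([-1, 64], torch.float32)]
```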
To accomplish this goal, we intend to be guided by a *model curriculum*, which
consists of programs of escalating complexity, from a simple elementwise
operation all the way to a full-blown end-to-end speech recognition program. Our
development process consists of setting incremental objectives to build out new
layers of the compiler to a satisfactory level on easier programs in the
curriculum, and backfilling complexity as needed to extend to the harder
programs. Ideally, this backfilling does not require deep conceptual changes to
components, but is simply an application of extension points anticipated in the
original design. The trick to making that happen is evaluating designs on enough
programs from the curriculum to ensure that a solution is likely to generalize
and satisfy our objectives, without getting bogged down in theoretical details.
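For a flavor of what a curriculum entry might look like, here is a hedged
sketch; the `run_test_case` harness and module below are illustrative
stand-ins (the real tests ended up under
`frontends/pytorch/e2e_testing/torchscript`, per the retrospective),
golden-testing a compiled module against eager PyTorch:

```python
import torch

class ElementwiseAddModule(torch.nn.Module):
    """One of the easiest rungs of the curriculum: a single elementwise op."""
    def forward(self, a, b):
        return a + b

def run_test_case(module, inputs, compile_and_run):
    """Stand-in harness: golden-test a backend against eager PyTorch."""
    golden = module(*inputs)
    actual = compile_and_run(torch.jit.script(module), inputs)
    assert torch.allclose(golden, actual), "end-to-end result mismatch"

# A real backend would lower the TorchScript'd module through the compiler
# pipeline; a trivial stand-in shows the shape of the harness.
fake_backend = lambda scripted, inputs: scripted(*inputs)
run_test_case(ElementwiseAddModule(), (torch.rand(3), torch.rand(3)), fake_backend)
```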
### 2021Q2
- Model curriculum
- Formalize / publish curriculum to ease collaboration
- Incorporate end-to-end ASR (speech recognition) model into curriculum, or
program of similar complexity.
- Incorporate representative quantized models into curriculum.
- End-to-end execution of at least the simplest models in the curriculum.
- User annotation infrastructure for users to provide the compiler
information, such as shapes to seed shape inference.
- Fill out ATen dialect and `aten-recognize-kernels` pass.
- ATen lowering to Linalg-on-tensors
- Implement a minimal amount of linalg-inspired abstractions in the "TCF"
dialect.
- Extend the linalg
[OpDSL tooling](https://llvm.discourse.group/t/rfc-linalg-opdsl/2966/6) to
enable us to programmatically emit shape validity checks (see the sketch
after this list).
- Shape/dtype inference
- As needed for other incremental objectives.
- Build a clear picture of the right place(s) in the longer-term compiler
pipeline for shape inference.
- Canonicalizations and general compiler optimizations
- As needed for other incremental objectives.
- Backend choice: RefBackend or IREE candidates.
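To illustrate the kind of shape validity checks meant in the ATen-to-Linalg
item above, here is a self-contained sketch (a hand-rolled stand-in, not the
actual OpDSL): a structured-op signature such as matmul's `(M, K) x (K, N)`
names symbolic dimensions, and any dimension name appearing more than once
implies an equality check that tooling can emit programmatically.

```python
def emit_shape_checks(op_name, sig, shapes):
    """Derive validity checks from a symbolic signature like
    [('M', 'K'), ('K', 'N')]: dimensions sharing a name must agree."""
    seen = {}
    for operand, (dims, shape) in enumerate(zip(sig, shapes)):
        for axis, (name, size) in enumerate(zip(dims, shape)):
            if name in seen and seen[name] != size:
                raise ValueError(
                    f"{op_name}: dim '{name}' mismatch: operand {operand} "
                    f"axis {axis} is {size}, expected {seen[name]}")
            seen.setdefault(name, size)
    return seen  # resolved symbol bindings, e.g. {'M': 4, 'K': 5, 'N': 3}

# matmul: (M, K) x (K, N); the shared 'K' dimension yields a runtime check.
print(emit_shape_checks("matmul", [("M", "K"), ("K", "N")], [(4, 5), (5, 3)]))
# Mismatched inner dimensions raise instead of silently miscompiling:
# emit_shape_checks("matmul", [("M", "K"), ("K", "N")], [(4, 5), (6, 3)])
```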
### 2021Q3
- Start to smell a little production-ey
- For the simplest models at least, get them running on IREE with performance
competitive with other frontends.
- Write initial "user manual" (and any supporting tools) for how to use the
new frontend (+ backend integration points) to deploy something.
- Extend model support:
- Vertically integrated spike building out generalized support for list, dict,
etc. for representative complex models (co-design with RefBackend or IREE).
- Implement coherent shape/dtype inference design based on Q2 insights.
- Scale up of Q2 compiler features to the curriculum
- Extend user annotation infrastructure as needed.
- ATen dialect and `aten-recognize-kernels` pass
- ATen --> Linalg lowerings.
- Canonicalizations and other compiler optimizations
### 2021Q2 Retrospective (added afterwards)
- Model curriculum
- [✅] Formalize / publish curriculum to ease collaboration
- Note: See `frontends/pytorch/e2e_testing/torchscript`.
- [~] Incorporate end-to-end ASR (speech recognition) model into curriculum,
or program of similar complexity.
- Note: TorchScript'able machine translation model identified, but not
formally added.
- [✅] Incorporate representative quantized models into curriculum.
- Note: See `frontends/pytorch/e2e_testing/torchscript/quantized_models.py`
- End-to-end execution of at least the simplest models in the curriculum.
- [✅] User annotation infrastructure for users to provide the compiler
information, such as shapes to seed shape inference.
- Note: See `frontends/pytorch/csrc/builder/class_annotator.cpp` and
`frontends/pytorch/python/torch_mlir/torchscript/annotations.py`.
- [✅] Fill out ATen dialect and `aten-recognize-kernels` pass.
- Note: Accomplished with significant design shift. See
[PR](https://github.com/llvm/mlir-npcomp/pull/214).
- [❌] ATen lowering to Linalg-on-tensors
- Implement a minimal amount of linalg-inspired abstractions in the "TCF"
dialect.
- Extend the linalg
[OpDSL tooling](https://llvm.discourse.group/t/rfc-linalg-opdsl/2966/6) to
enable us to programmatically emit shape validity checks.
- Note: Not enough programs / ops brought up yet to generalize.
- [✅] Shape/dtype inference
- As needed for other incremental objectives.
- Build a clear picture of the right place(s) in the longer-term compiler
pipeline for shape inference.
- Note: Need one major pass of this in the frontend at the torch level to
get ranks and dtypes, and then one later pass at the linalg level to
propagate specific sizes as much as possible.
- [✅] Canonicalizations and general compiler optimizations
- As needed for other incremental objectives.
- [✅] Backend choice: RefBackend or IREE candidates.
- Note: Used RefBackend this quarter due to simplicity of current programs +
some blocking IREE issues.
## RefBackend
The npcomp reference backend (or "RefBackend") is perhaps the most confusing
part of the project, since it really has nothing to do per se with compiling
numerical Python programs. The RefBackend's biggest impact is really a strategic
play on two time horizons:
- short-medium term: Avoid bad design decisions by avoiding single-sourcing on
IREE.
- Although some key contributors to npcomp are closely affiliated with IREE,
there is a distinct desire to honor the spirit of being an LLVM incubator
and not have the npcomp project evolve into an extension of IREE. We also
believe that this kind of design influence results in a better system in
general.
- medium-long term: Give upstream MLIR a more "batteries included" end-to-end
flow by incubating minimally-opinionated components and upstreaming them.
- Context: Due to history, all MLIR-based end-to-end flows of nontrivial
capability live in downstream repositories, such as TensorFlow, IREE, etc.
This leads to an awkward situation where code is sometimes added
upstream, but no load-bearing use case can be exercised with upstream
tools (such as quantifying performance, building auto-tuning infrastructure,
etc.). The result is significant drag on MLIR's overall trajectory.
The way we intend to advance these two strategic goals is to incorporate
end-to-end execution on the RefBackend as part of the end-to-end execution
milestones of the acap_dispatch and TorchScript frontends.
### 2021Q2
- Build out support for PyTorch kernel fallback.
- Help Nicolas build out and ideally land upstream his linalg-on-tensors
[e2e execution sandbox](https://github.com/google/iree/tree/main/experimental/runners),
with an eye towards rebasing aspects of the RefBackend flow on those
components.
- Build out better runtime calling convention interop.
- Start thinking about a plan to support list, dict, etc. in the runtime,
ideally using MLIR infra to make it magically generalize and be minimally
opinionated.
### 2021Q3
- Using the runtime abstractions built out for list, dict, etc., ditch the
`memref`-based lowering flow and use new primitives for the "top-level" of the
program (use of memref should be isolated from e.g. top-level control flow,
lifetime management, calling conventions, etc.).
- Use (or help build) upstream linalg-on-tensors abstractions analogous to
IREE's `flow.dispatch.workgroups` (parallel computation grid) that
linalg-on-tensors can directly fuse into, avoiding phase ordering issues with
fusion, bufferization, kernel generation.


@@ -1,134 +0,0 @@
# Roadmap as of beginning of 2021Q3
## Project status overview
- TorchScript compilation: Significant work has gone into the TorchScript
compilation workstream. Basic multi-layer perceptrons execute end-to-end, and
significant strides have been taken towards ResNet and quantized programs.
Additionally, a full TorchScript'able machine translation model (IDs to IDs;
including beam search) has been identified as representative of the kind of
challenging programs that the TorchScript ahead-of-time compilation flow will
enable.
- `acap_dispatch`: Discussions with stakeholders in the npcomp and PyTorch
community have shifted the `acap_dispatch` workstream to upstream discussions
(see [bug](https://github.com/pytorch/xla/issues/2854)). Work within npcomp on
`acap_dispatch` is temporarily on hold.
- RefBackend: The RefBackend workstream is temporarily on hold as well. The
needs of the TorchScript compilation path are too complex (lists, dicts, error
handling, runtime ABI) and the engineering resources too limited to
meaningfully bring up an alternative backend. The decision going forward is to
single-source on IREE as our needs become more complex. This is somewhat
unfortunate, as the goal of the RefBackend was to de-risk the backend
story and avoid single-sourcing on what at the time (~2020Q1) was perceived
as a large external dependency. Somewhat mitigating this situation though is
that in the intervening year, IREE has become significantly "leaner and
meaner", and while still nontrivial, it has found a much more tightly scoped
role that leans much more heavily on upstream infrastructure. In fact,
inclusion of IREE in the LLVM project in some form now seems possible, which
will make this dependency very natural.
## Non-technical project status overview
Community contributions have somewhat petered out due to the shifting focus of
the project. This was expected, as the early aspirations of the project met
the reality of available resourcing, ecosystem constraints, and a more
fine-grained understanding of stakeholder needs. We have, however, brought on
one new full-time engineer to work on the project.
## Roadmap overview
The project has converged on the TorchScript workstream as the primary effort:
- TorchScript compilation: The goal of this project is to build the frontend of
a truly next-generation ahead-of-time machine learning compiler.
- Why this project is cool: This system is designed from day 1 to support
features such as dynamic shapes, control flow, mutable variables,
program-internal state, and non-Tensor types (scalars, lists, dicts) in a
principled fashion. These features are essential for empowering an
industry-level shift in the set of machine learning programs that are
feasible to deploy with minimal effort across many devices (when combined
with a backend using the advanced compilation techniques being developed
elsewhere in the MLIR ecosystem).
## TorchScript compilation
The TorchScript compiler represents the bulk of core compiler effort in the
npcomp project.
[TorchScript](https://pytorch.org/docs/stable/jit_language_reference.html) is a
restricted (more static) subset of Python, but even TorchScript is quite dynamic
when compared to the needs of the lower levels of the compilation stack, especially
systems like Linalg. The overarching theme of this project is building out
compiler components that bridge that gap. As we do so, the recurring tradeoffs
are:
- user experience: we want a fairly unrestricted programming model -- that's
what users like about PyTorch, and what enables users to deploy without
significant modifications of their code.
- feasibility of the compiler: we want a smart compiler that is feasible to
implement (for our own sanity :) )
- excellent generated code quality: this is of course dependent on the backend
paired with the frontend we are building, but there are a number of
transformations that make sense before we reach the backend and that strongly
affect the quality of the code the backend generates.
To give a concrete example, consider the problem of inferring the shapes of
tensors at various points in the program. The more precision we have on the
shapes, the better code can be emitted by a backend. But in general, users need
to provide at least some information about their program to help the compiler
understand what shapes are at different points in the program. The smarter our
compiler algorithms are, the less information the user needs to provide. Thus,
all 3 facets are interlinked and there is no single right answer -- we need to
balance them for a workable system.
To accomplish this goal, we are guided by a *model curriculum*, which consists
of programs of escalating complexity, from a simple elementwise operation all
the way to a full-blown end-to-end speech recognition program. Our development
process consists of setting incremental objectives to build out new layers of
the compiler to a satisfactory level on easier programs in the curriculum, and
backfilling complexity as needed to extend to the harder programs. Ideally, this
backfilling does not require deep conceptual changes to components, but is
simply an application of extension points anticipated in the original design.
The trick to making that happen is evaluating designs on enough programs from
the curriculum to ensure that a solution is likely to generalize and satisfy our
objectives, without getting bogged down in theoretical details.
### 2021Q3
- Theme: Scale up the programs we can run end-to-end.
- End-to-end execution of ResNet.
- Significant strides towards end-to-end execution of the identified
end-to-end machine translation model.
- End-to-end execution of simple programs with lists.
- End-to-end execution of simple stateful programs (see the sketch after this
list).
- Significant strides towards end-to-end execution of two major "classes of
models". Tentatively: transformer, LSTM.
- Theme: Start feeling production-ey
- For the simplest programs at least, get them running on IREE with
performance competitive with other frontends
- Stretch: Extend this result to ResNet.
- Write initial "user manual" (and any supporting tools, packaging) for how to
use the new frontend (+ backend integration points) to deploy something on
IREE.
- Redesign frontend APIs as needed to make them palatable to document.
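For reference, a "simple stateful program" in the sense of the list above can
be as small as the following module, built only from standard PyTorch APIs: the
mutable buffer is program-internal state that the compiler must model across
invocations.

```python
import torch

class Counter(torch.nn.Module):
    """A simple stateful program: a buffer mutated on every invocation."""
    def __init__(self):
        super().__init__()
        self.register_buffer("count", torch.zeros([], dtype=torch.int64))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        self.count += 1  # program-internal state update
        return x * self.count

scripted = torch.jit.script(Counter())  # TorchScript captures the mutation
print(scripted(torch.ones(2)))  # tensor([1., 1.])
print(scripted(torch.ones(2)))  # tensor([2., 2.])
```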
### 2021Q4
- Theme: Compiler becomes generally functional for a large class of programs
- End-to-end execution of end-to-end MT (machine translation) program.
- End-to-end execution of the two major "classes of models" added to the
curriculum in Q3.
- End-to-end execution of quantized model.
- Identify/build TorchScript'able ASR (automatic speech recognition) program.
- Significant strides towards end-to-end execution of ASR.
- Bringing up new programs should be fairly quick and mechanical.
- Theme: Pathfind next phase after initial compiler bringup.
- Begin talks with potential users / applications to identify a useful
"real" capstone project.
- Goal: Demonstrate viability of the tools.
- Goal: Start rallying support / interest more broadly.
- Begin looking at training use cases.
- Begin looking at building "anti-framework" numerical Python compiler layered
on our TorchScript compiler.


@@ -0,0 +1,14 @@
# 2021Q4 Roadmap
**NOTE**: Under construction. Feedback from the
[Torch-MLIR RFC](https://discuss.pytorch.org/t/torch-mlir-bridging-pytorch-and-mlir-ecosystems/133151)
is expected to influence this!
1. Demonstrate viability of our solutions on industry-standard workloads:
- {inference, training} x {BERT-L, MaskRCNN, ResNet50}
1. One industry partner migrates (or begins to migrate) to Torch-MLIR for an e2e
flow on their critical path.
1. Torch-MLIR is seen/accepted as an "off-the-shelf" solution for Torch/MLIR
interop. This specifically covers "softer" / "quasi-technical" aspects of the
project, such as community outreach/recognition, build system integration,
testing, CI, packaging, documentation.