# NPComp - An aspirational MLIR based numpy compiler

NPComp aims to be an idiomatic subset of the Python language, suitable for
extracting isolated, statically typed programs from a running Python session.
It is inspired by many projects that have come before it, including:

* PyPy/RPython
* Numba
* Pythran
* TorchScript
* Autograph

As the name implies, NPComp also seeks to provide compiler-backed support
for Numpy APIs.

The project grew out of both [LLVM's MLIR project](https://mlir.llvm.org/)
and [The IREE Project](https://github.com/google/iree). It seeks to use the
MLIR and IREE tooling to enable progressive lowering of high-level,
compute-dominant sub-programs in a way that preserves the high-level semantic
information expected to be useful for exploiting parallelism, generating
high-performance code, and enabling portability and deployment to a range of
devices. Some of these goals overlap with existing projects. To a first
approximation, the experiment with NPComp is to determine whether rebasing
on the MLIR tooling and ML backends like IREE produces a lift.

Before getting too excited, keep in mind that this project *barely* exists: it
is very new and doesn't do anything useful yet :) We are using it as a testing
ground for some new ideas and infrastructure improvements and, depending on
how things turn out, may end up carrying it forward or breaking it up for
parts.

See the [features doc](docs/features.md) for a semi-curated status of what is
implemented.

## Architecture

The compiler is separated into:

* [Frontend importer](python/npcomp/compiler/frontend.py): Translates from
  various AST levels to corresponding MLIR dialects.
* Frontend compiler: MLIR passes and conversions, mostly operating on the
  [basicpy](include/Dialect/Basicpy/IR/BasicpyOps.td) and
  [numpy](include/Dialect/Numpy/IR/NumpyOps.td) dialects.
* Backend compiler and runtime: Some effort has been made to keep this
  pluggable, but right now only the
  [IREE Backend](python/npcomp/compiler/backend/iree.py) exists. There is
  in-tree work to also build a minimal reference backend directly targeting
  LLVM.

## Repository Layout

The project is roughly split into the following areas of code:

* [User-facing Python code](python/npcomp)
* [_npcomp native module](python_native)
* C++ [include](include) and [lib](lib) trees, following LLVM/MLIR conventions
* LIT testing trees:
  * [test](test): Lit/FileCheck tests covering core MLIR-based infra
  * [pytest/Compiler](pytest/Compiler): Lit test suite that drives the compiler
    infra from Python
  * [backend_test](backend_test): Lit test suites conditionally enabled for
    each backend
* [tools](tools): Scripts and binaries (npcomp-opt, npcomp-run-mlir, etc.)

## Quick start
```
# Run these commands from the root of the npcomp checkout.
LLVM_VERSION=10
export CC=clang-$LLVM_VERSION
export CXX=clang++-$LLVM_VERSION
export LDFLAGS=-fuse-ld=$(which ld.lld-$LLVM_VERSION)
export LLVM_SRC_DIR=/path/to/llvm-project

# Check out last known good commit.
LLVM_COMMIT="$(cat ./build_tools/llvm_version.txt)"
(cd "$LLVM_SRC_DIR" && git checkout "$LLVM_COMMIT")

./build_tools/install_mlir.sh
./build_tools/cmake_configure.sh

# Build and run tests.
# ./build_tools/test_all.sh runs all of these commands.
# The build runs in a subshell so the working directory stays at the repo
# root, which the relative PYTHONPATH entries below depend on.
(cd build && ninja && ninja check-npcomp)

# Set up PYTHONPATH for interactive use.
export PYTHONPATH="$(realpath build/python):$(realpath build/python_native):$(realpath build/iree/bindings/python)"
```
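
Before running the steps above, it can help to confirm that the tools they refer to are actually on `PATH`. The following is a minimal sketch, not part of the repository: the function name `missing_tools` is hypothetical, and the tool list is inferred from the quick start (the versioned clang and lld binaries, plus `cmake` and `ninja`, which the build scripts are assumed to invoke).

```python
import shutil


def missing_tools(llvm_version="10", which=shutil.which):
    """Return the names of expected build tools that are not on PATH.

    The tool list is an assumption derived from the quick start commands;
    `which` is injectable so the check can be exercised without the tools.
    """
    required = [
        f"clang-{llvm_version}",
        f"clang++-{llvm_version}",
        f"ld.lld-{llvm_version}",
        "cmake",
        "ninja",
    ]
    return [tool for tool in required if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All expected build tools were found.")
```

If a different LLVM version is installed, pass it explicitly, for example `missing_tools("11")`, and set `LLVM_VERSION` to match.
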
## Interactive Use

The cmake configuration populates symlinks in the `build/python` directory
mirroring the source layout. This allows edit-run without rebuilding (unless
files are added or removed).

Configuring the `PYTHONPATH` as above should be sufficient to run any
interactive tooling (`python3`, Jupyter/Colab, etc.).

Note that running the `cmake_configure.sh` script will also output a `.env`
file in the workspace folder with the correct `PYTHONPATH` set. This allows
tools like VSCode to work by default for debugging.

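
If an editor or notebook cannot find the modules, one quick check is whether the `PYTHONPATH` entries recorded in `.env` still exist on disk. The sketch below assumes the file uses plain `KEY=VALUE` lines with entries separated by the platform path separator; the exact format written by `cmake_configure.sh` is not documented here, and both helper names are hypothetical.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding double quotes from the value.
        values[key.strip()] = value.strip().strip('"')
    return values


def missing_pythonpath_entries(env_path=".env"):
    """Return PYTHONPATH entries from the .env file that are not directories."""
    pythonpath = read_env_file(env_path).get("PYTHONPATH", "")
    entries = [p for p in pythonpath.split(os.pathsep) if p]
    return [p for p in entries if not Path(p).is_dir()]
```

An empty list from `missing_pythonpath_entries()` suggests the configured paths are intact. A non-empty list usually means the project needs to be reconfigured or rebuilt.
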
Notes:

* Python sources are symlinked to the output directory at configure time.
  Adding sources will require a reconfigure. Editing should not.
* It is very common to have both Python 2.7 (aka "python") and Python 3.x
  (aka "python3") on a system at the same time (and we can only hope that one
  day this ends). Since the native library at development time binds to a
  specific version, if you try to run with a different Python, you will get
  an error about the "native" module not being found.

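
To diagnose the interpreter mismatch described in the notes above, it can help to print which Python is actually running and whether the native module is visible to it. This is a minimal sketch using only the standard library. The module name `_npcomp` is taken from the repository layout and is an assumption about the importable name, and the helper functions are hypothetical.

```python
import importlib.util
import sys


def describe_interpreter():
    """Return the running interpreter's path and its major.minor version."""
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return sys.executable, version


def module_is_importable(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    exe, version = describe_interpreter()
    print(f"Running {exe} (Python {version})")
    # "_npcomp" is assumed from the repository layout; adjust if it differs.
    if not module_is_importable("_npcomp"):
        print("_npcomp not found: check PYTHONPATH and which Python the build used")
```

Run it with the same command you use for interactive work (for example `python3`), since the result depends on the interpreter that executes it.
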
## Compiler development

For bash users, adding the following to your `.bashrc` defines some aliases
that are useful during compiler development, such as shortcuts for building
and running `npcomp-opt`.

```
source $WHERE_YOU_CHECKED_OUT_NPCOMP/tools/bash_helpers.sh
```