Update docker, instructions and some fixes for the pytorch 1.3 build. (#45)

* Includes pybind11 directly (for some reason using the pytorch helper header for this depends on a source file not in the image).
* Installs nnpack into the image.
* Installs a newer clang and LLD and configures the environment to use them (otherwise, link time is terrible).
* Fixes a gcc compile error (on the off chance you build with the default gcc compiler).
* Tests are failing based on some dialect registration stuff that must not have been factored correctly. Will follow up with a fix.
2020-09-16 21:57:46 -07:00 · 2020-09-16 21:57:46 -07:00 · 678989a321
parent 75f57b461e
commit 678989a321
7 changed files with 130 additions and 37 deletions
--- a/README.md
+++ b/README.md
@ -48,30 +48,6 @@ The project is roughly split into the following areas of code:
    each backend
 * [tools](tools): Scripts and binaries (npcomp-opt, npcomp-run-mlir, etc)

-## Quick start
-
-```
-git submodule init
-git submodule update
-
-LLVM_VERSION=10
-export CC=clang-$LLVM_VERSION
-export CXX=clang++-$LLVM_VERSION
-export LDFLAGS=-fuse-ld=$(which ld.lld-$LLVM_VERSION)
-
-./build_tools/install_mlir.sh
-./build_tools/cmake_configure.sh
-
-# Build and run tests
-# ./build_tools/test_all.sh runs all of these commands.
-cd build
-ninja
-ninja check-npcomp
-
-# Setup PYTHONPATH for interactive use
-export PYTHONPATH="$(realpath build/python):$(realpath build/iree/bindings/python)"
-```
-
 ## Interactive Use

 The cmake configuration populates symlinks in the `build/python` directory
@ -104,3 +80,90 @@ and running `npcomp-opt`.
 ```
 source $WHERE_YOU_CHECKED_OUT_NPCOMP/tools/bash_helpers.sh
 ```
+
+## Build Instructions
+
+### Common prep
+
+```shell
+# From checkout directory.
+git submodule init
+git submodule update
+
+# Use clang and lld to build (optional but recommended).
+LLVM_VERSION=10
+export CC=clang-$LLVM_VERSION
+export CXX=clang++-$LLVM_VERSION
+export LDFLAGS=-fuse-ld=$(which ld.lld-$LLVM_VERSION)
+
+# Build and install LLVM/MLIR into the ./install-mlir directory
+./build_tools/install_mlir.sh
+```
+
+### Vanilla - numpy-only, no pytorch
+
+```shell
+# Follow common prep above.
+./build_tools/cmake_configure.sh
+
+# Build and run tests
+# ./build_tools/test_all.sh runs all of these commands.
+cd build
+ninja
+ninja check-npcomp
+
+# Setup PYTHONPATH for interactive use
+export PYTHONPATH="$(realpath python):$(realpath build/python)"
+```
+
+### PyTorch 1.3 - ATen pseudo-device type dispatch
+
+The currently functional approach to PyTorch integration uses an ATen pseudo
+device for program capture. It is activated by including the PyTorch cmake
+path and setting `-DNPCOMP_ENABLE_TORCH_TYPE_DISPATCH=ON`. This approach has a
+very fragile dependency on specific PyTorch revisions in the ~1.3 era and
+currently must be built via the docker image in `docker/pytorch-1.3`.
+
+We are migrating to newer approaches that build with more recent PyTorch
+versions, but these are not yet functional (see below).
+
+Docker container setup:
+
+```shell
+# Build the docker image
+docker build docker/pytorch-1.3 --tag npcomp-pytorch-1.3:1.0
+
+# Docker workflow (or use your own preferences).
+# Create a volume for npcomp build artifacts.
+docker volume create npcomp-pytorch-1.3-build
+
+# Run the container, mounting /npcomp to the source directory and the volume
+# above to the /build directory. The source directory is mounted read-only to
+# avoid the container putting root owned files there.
+# Replace `$HOME/src/mlir-npcomp` with an appropriate path to where the project
+# is checked out.
+docker run \
+  --mount type=bind,source=$HOME/src/mlir-npcomp,target=/npcomp,readonly \
+  --mount source=npcomp-pytorch-1.3-build,target=/build \
+  --rm -it npcomp-pytorch-1.3:1.0 /bin/bash
+```
+
+```shell
+# From within the docker image.
+# Install MLIR and configure project.
+cd /npcomp
+BUILD_DIR=/build ./build_tools/install_mlir.sh
+BUILD_DIR=/build ./build_tools/cmake_configure.sh \
+  -DCMAKE_PREFIX_PATH=/opt/conda/lib/python3.6/site-packages/torch/share/cmake \
+  -DNPCOMP_ENABLE_TORCH_TYPE_DISPATCH=ON
+
+# Build.
+cd /build
+ninja
+ninja check-npcomp
+ninja check-frontends-pytorch
+```
+
+### PyTorch 1.7+ - Graph API <-> MLIR
+
+TODO
--- a/build_tools/cmake_configure.sh
+++ b/build_tools/cmake_configure.sh
@ -1,11 +1,19 @@
 #!/bin/bash
+# Configures the project with default options.
+# LLVM/MLIR should be installed into the build directory first by running
+# ./build_tools/install_mlir.sh.
+#
+# Usage (for in-tree build/ directory):
+#   ./build_tools/cmake_configure.sh [ARGS...]
+# Usage (for arbitrary build/ directory):
+#   BUILD_DIR=/build ./build_tools/cmake_configure.sh [ARGS...]
 set -e

 # Setup directories.
 td="$(realpath $(dirname $0)/..)"
-build_dir="$td/build"
-install_mlir="$td/install-mlir"
-build_mlir="$td/build-mlir"
+build_dir="$(realpath "${BUILD_DIR:-$td/build}")"
+install_mlir="$build_dir/install-mlir"
+build_mlir="$build_dir/build-mlir"
 declare -a extra_opts

 if ! [ -d "$install_mlir/include/mlir" ]; then
@ -72,9 +80,15 @@ fi
 echo "Using llvm-lit: $LLVM_LIT"

 # Write a .env file for python tooling.
-echo "Updating $td/.env file"
-echo "PYTHONPATH=\"$(realpath "$build_dir/python_native"):$(realpath "$build_dir/python"):$(realpath "$build_dir/iree/bindings/python")\"" > "$td/.env"
-echo "NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1" >> "$td/.env"
+function write_env_file() {
+  echo "Updating $build_dir/.env file"
+  echo "PYTHONPATH=\"$(realpath "$build_dir/python_native"):$(realpath "$build_dir/python")\"" > "$build_dir/.env"
+  echo "NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1" >> "$build_dir/.env"
+  if ! cp "$build_dir/.env" "$td/.env"; then
+    echo "WARNING: Failed to write $td/.env"
+  fi
+}
+write_env_file

 set -x
 cmake -GNinja \
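
Reviewer note: with this change, both build scripts derive every output path from `BUILD_DIR`, defaulting to the in-tree `build/` directory. A minimal sketch of that resolution follows. It assumes a hypothetical helper named `resolve_build_dir` (not part of the repository) and leaves out the `realpath` normalization that the real scripts apply.

```shell
#!/bin/bash
# Hypothetical helper, for illustration only. The scripts in this commit
# inline the same expansion and additionally pass the result through realpath.
resolve_build_dir() {
  # $1 is the source checkout root.
  # ${BUILD_DIR:-default} falls back when BUILD_DIR is unset or empty.
  printf '%s\n' "${BUILD_DIR:-$1/build}"
}

# /npcomp is the source mount point used in the README's docker workflow.
build_dir="$(resolve_build_dir /npcomp)"
echo "build dir:    $build_dir"
echo "MLIR build:   $build_dir/build-mlir"
echo "MLIR install: $build_dir/install-mlir"
```

Because the MLIR build and install trees now live under the build directory, a read-only source mount (as in the docker workflow) is enough for a full build.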
--- a/build_tools/install_mlir.sh
+++ b/build_tools/install_mlir.sh
@ -1,6 +1,11 @@
 #!/bin/bash
+# Usage (for in-tree build/ directory):
+#   ./build_tools/install_mlir.sh
+# Usage (for arbitrary build/ directory):
+#   BUILD_DIR=/build ./build_tools/install_mlir.sh
 set -e
 td="$(realpath $(dirname $0)/..)"
+build_dir="$(realpath "${BUILD_DIR:-$td/build}")"

 # Find LLVM source (assumes it is adjacent to this directory).
 LLVM_SRC_DIR="$(realpath "${LLVM_SRC_DIR:-$td/external/llvm-project}")"
@ -10,10 +15,10 @@ if ! [ -f "$LLVM_SRC_DIR/llvm/CMakeLists.txt" ]; then
  exit 1
 fi
 echo "Using LLVM source dir: $LLVM_SRC_DIR"
-
+echo "Build directory: $build_dir"
 # Setup directories.
-build_mlir="$td/build-mlir"
-install_mlir="$td/install-mlir"
+build_mlir="$build_dir/build-mlir"
+install_mlir="$build_dir/install-mlir"
 echo "Building MLIR in $build_mlir"
 echo "Install MLIR to $install_mlir"
 mkdir -p "$build_mlir"
--- a/docker/pytorch-1.3/Dockerfile
+++ b/docker/pytorch-1.3/Dockerfile
@ -19,7 +19,6 @@ RUN /opt/conda/bin/conda install matplotlib pybind11
 #torchvision

 ENV LD_LIBRARY_PATH "${LD_LIBRARY_PATH}:/opt/conda/lib"
-
 # Rebuild pytorch

 WORKDIR /opt/pytorch/pytorch
@ -34,3 +33,13 @@ RUN TORCH_CUDA_ARCH_LIST="5.2 6.0 6.1 7.0 7.5+PTX" \

 WORKDIR /workspace

+# Additional packages for building npcomp
+RUN apt-get install clang-10 lld-10 --assume-yes
+RUN conda install -c gaiar nnpack
+
+# Additional env for building npcomp and running tests.
+ENV LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/local/cuda-10.1/compat/lib.real"
+ENV CC=clang-10
+ENV CXX=clang++-10
+ENV CXXFLAGS "-I/opt/conda/include"
+ENV LDFLAGS "-fuse-ld=/usr/bin/ld.lld-10 -L/opt/conda/lib"
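
Reviewer note: the image now sets `CC`, `CXX`, and `LDFLAGS` so that builds pick up clang and lld. A small sanity check along these lines could be run inside the container before configuring. `check_tool` is a hypothetical helper, not something the image or the repository provides.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the named program is found on PATH.
check_tool() {
  command -v "$1" >/dev/null 2>&1
}

# CC and CXX come from the Dockerfile ENV lines; fall back to the names
# used there when they are not set in the current shell.
for tool in "${CC:-clang-10}" "${CXX:-clang++-10}" ld.lld-10; do
  if check_tool "$tool"; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done
```

The loop only reports what it finds and never aborts, so it is safe to run in any shell, including outside the container.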
--- a/frontends/pytorch/csrc/type_dispatch/python_bindings.cpp
+++ b/frontends/pytorch/csrc/type_dispatch/python_bindings.cpp
@ -18,6 +18,8 @@
 // In this case t2_cpu contains the result of the computation, and t2_mlir
 // contains the mlir description of the computation.

+#include <pybind11/pybind11.h>
+
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/raw_ostream.h"
@ -36,7 +38,7 @@
 #include "npcomp/Dialect/ATen/ATenPasses.h"
 #include "npcomp/Dialect/ATen/LivenessReport.h"

-#include "torch/csrc/jit/pybind.h"
+namespace py = pybind11;

 // Then ATen headers with workarounds
 #include "ATen/ArrayRef.h"
--- a/frontends/pytorch/lib/aten_ops.cpp
+++ b/frontends/pytorch/lib/aten_ops.cpp
@ -19,7 +19,7 @@
 #include <ATen/ATen.h>
 #include <torch/torch.h>

-#include "nnpack.h"
+#include <nnpack.h>
 #include <ATen/CPUType.h>

 namespace {
--- a/lib/JITRuntime/JITModule.cpp
+++ b/lib/JITRuntime/JITModule.cpp
@ -52,7 +52,7 @@ JITModule::fromCompiledModule(mlir::ModuleOp module,
    return expectedAddress.takeError();
  ret->descriptor =
      reinterpret_cast<npcomprt::ModuleDescriptor *>(*expectedAddress);
-  return ret;
+  return std::move(ret);
 }

 // Converter for bridging to npcomprt llvm-lookalike data structures.