name: Build and Test

on:
  pull_request:
    branches: [ main ]
  push:
    branches: [ main ]
  workflow_dispatch:

# Ensure that only a single job or workflow using the same
# concurrency group runs at a time. With cancel-in-progress
# enabled, a new run cancels any in-progress run in the same
# GitHub workflow and GitHub ref
# (e.g. refs/heads/main or refs/pull/<pr_number>/merge).
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
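# A sketch of how the group key above resolves. The pull request number 123
# is a made-up example; the workflow name comes from `name:` in this file:
#   push to main:      group: "Build and Test-refs/heads/main"
#   pull request 123:  group: "Build and Test-refs/pull/123/merge"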


# Provisioned Jobs:
# ubuntu/docker - x86_64 - llvm in-tree     - pytorch binary - build+test    # most used dev flow and fastest signal
# ubuntu/docker - x86_64 - llvm out-of-tree - pytorch source - build+test    # most elaborate build
# macos         - arm64  - llvm in-tree     - pytorch binary - build only    # cross compile, can't test arm64
jobs:
  build-test:
    strategy:
      fail-fast: true
      matrix:
        os-arch: [ubuntu-x86_64, macos-arm64, windows-x86_64]
        llvm-build: [in-tree, out-of-tree]
        torch-binary: [ON, OFF]
        torch-version: [nightly, stable]
        exclude:
          # Exclude llvm in-tree and pytorch source
          - llvm-build: in-tree
            torch-binary: OFF
          # Exclude llvm out-of-tree and pytorch source
          - llvm-build: out-of-tree
            torch-binary: OFF
          # Exclude macos-arm64 and llvm out-of-tree altogether
          - os-arch: macos-arm64
            llvm-build: out-of-tree
          - os-arch: macos-arm64
            torch-version: stable
          - os-arch: windows-x86_64
            llvm-build: out-of-tree
          - os-arch: windows-x86_64
            torch-version: stable
          # For PyTorch stable builds, we don't build PyTorch from source
          - torch-version: stable
            torch-binary: OFF
        include:
          # Specify OS versions
          - os-arch: ubuntu-x86_64
            os: a100
          - os-arch: macos-arm64
            os: macos-latest
          - os-arch: windows-x86_64
            os: windows-latest
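        # For reference, a sketch of the job combinations that remain once the
        # exclude rules above are applied. This list is derived by hand from the
        # matrix, not generated by GitHub; every remaining job has
        # torch-binary: ON.
        #   - {os-arch: ubuntu-x86_64,  llvm-build: in-tree,     torch-version: nightly}
        #   - {os-arch: ubuntu-x86_64,  llvm-build: in-tree,     torch-version: stable}
        #   - {os-arch: ubuntu-x86_64,  llvm-build: out-of-tree, torch-version: nightly}
        #   - {os-arch: ubuntu-x86_64,  llvm-build: out-of-tree, torch-version: stable}
        #   - {os-arch: macos-arm64,    llvm-build: in-tree,     torch-version: nightly}
        #   - {os-arch: windows-x86_64, llvm-build: in-tree,     torch-version: nightly}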
    runs-on: ${{ matrix.os }}

    steps:

    - name: Prepare workspace
      if: ${{ matrix.os-arch == 'ubuntu-x86_64' }}
      run: |
        # Clear the workspace directory so that we don't run into errors about
        # existing lock files.
        sudo rm -rf $GITHUB_WORKSPACE/*

    - name: Checkout torch-mlir
      uses: actions/checkout@v3
      with:
        submodules: 'true'
        fetch-depth: 0

    - name: Fetch PyTorch commit hash
      if: ${{ matrix.os-arch != 'windows-x86_64' }}
      run: |
        PT_HASH="$(cat ${GITHUB_WORKSPACE}/pytorch-hash.txt)"
        echo "PT_HASH=${PT_HASH}" >> ${GITHUB_ENV}

    - name: Setup ccache
      uses: ./.github/actions/setup-build
      with:
        cache-suffix: 'build-${{ matrix.llvm-build }}-${{ matrix.torch-version }}'
        torch-version: ${{ matrix.torch-version }}

    - name: Set up Visual Studio shell
      if: ${{ matrix.os-arch == 'windows-x86_64' }}
      uses: egor-tensin/vs-shell@v2
      with:
        arch: x64

    - name: Try to Restore PyTorch Build Cache
      if: ${{ matrix.torch-binary == 'OFF' }}
      id: cache-pytorch
      uses: actions/cache/restore@v3
      with:
        path: ${{ github.workspace }}/build_tools/python_deploy/wheelhouse
        key: ${{ runner.os }}-pytorch-${{ env.PT_HASH }}

    - name: Build and Test os-arch='ubuntu-x86_64' llvm-build='${{ matrix.llvm-build }}' torch-binary='${{ matrix.torch-binary }}'
      if: ${{ matrix.os-arch == 'ubuntu-x86_64' }}
      run: |
        cd $GITHUB_WORKSPACE
        TORCH_MLIR_SRC_PYTORCH_BRANCH="$(cat pytorch-hash.txt)" \
        TM_PACKAGES="${{ matrix.llvm-build }}" \
        TM_USE_PYTORCH_BINARY="${{ matrix.torch-binary }}" \
        TM_PYTORCH_INSTALL_WITHOUT_REBUILD="${{ steps.cache-pytorch.outputs.cache-hit }}" \
        TM_TORCH_VERSION="${{ matrix.torch-version }}" \
        ./build_tools/python_deploy/build_linux_packages.sh

    - name: Configure os-arch='macos-arm64' llvm-build='in-tree' torch-binary='${{ matrix.torch-binary }}'
      # cross compile, can't test arm64
      if: ${{ matrix.os-arch == 'macos-arm64' && matrix.llvm-build == 'in-tree' }}
      run: |
        # TODO: Reenable LTC after build on macOS-arm64 is fixed (https://github.com/llvm/torch-mlir/issues/1253)
        cmake -GNinja -Bbuild_arm64 \
          -DCMAKE_BUILD_TYPE=Release \
          -DCMAKE_C_COMPILER=clang \
          -DCMAKE_CXX_COMPILER=clang++ \
          -DCMAKE_C_COMPILER_LAUNCHER=ccache \
          -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
          -DCMAKE_LINKER=lld \
          -DCMAKE_OSX_ARCHITECTURES=arm64 \
          -DLLVM_ENABLE_ASSERTIONS=ON \
          -DLLVM_ENABLE_PROJECTS=mlir \
          -DLLVM_EXTERNAL_PROJECTS="torch-mlir" \
          -DLLVM_EXTERNAL_TORCH_MLIR_SOURCE_DIR="$GITHUB_WORKSPACE" \
          -DLLVM_TARGETS_TO_BUILD=AArch64 \
          -DLLVM_USE_HOST_TOOLS=ON \
          -DLLVM_ENABLE_ZSTD=OFF \
          -DMLIR_ENABLE_BINDINGS_PYTHON=ON \
          -DTORCH_MLIR_ENABLE_STABLEHLO=OFF \
          -DTORCH_MLIR_ENABLE_LTC=OFF \
          -DTORCH_MLIR_USE_INSTALLED_PYTORCH="${{ matrix.torch-binary }}" \
          -DMACOSX_DEPLOYMENT_TARGET=12.0 \
          -DPython3_EXECUTABLE="$(which python)" \
          $GITHUB_WORKSPACE/externals/llvm-project/llvm

    - name: Build torch-mlir (cross-compile)
      if: ${{ matrix.os-arch == 'macos-arm64' }}
      run: |
        cmake --build build_arm64

    - name: Build (Windows)
      if: ${{ matrix.os-arch == 'windows-x86_64' }}
      shell: bash
      run: ./build_tools/python_deploy/build_windows_ci.sh

    - name: Save PyTorch Build Cache
      if: ${{ github.ref_name == 'main' && matrix.torch-binary == 'OFF' }}
      uses: actions/cache/save@v3
      with:
        path: ${{ github.workspace }}/build_tools/python_deploy/wheelhouse
        key: ${{ runner.os }}-pytorch-${{ env.PT_HASH }}

    - name: Print ccache statistics
      shell: bash
      run: ccache --show-stats