# torch-mlir/.github/workflows/bazelBuildAndTest.yml

name: Bazel Build and Test

on:
  push:
    branches: [ main ]
  workflow_dispatch:

# Allow only one run per concurrency group at a time. The group is
# the workflow name plus the git ref (e.g. refs/heads/main or
# refs/pull/<pr_number>/merge). A newly triggered run cancels any
# in-progress run in the same group.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true


jobs:
  ubuntu-build:
    name: ubuntu-x86_64
    runs-on: ubuntu-22.04

    steps:
    - name: Checkout torch-mlir
      uses: actions/checkout@v3
      with:
        submodules: 'true'

    - name: Setup cache for bazel
      uses: actions/cache@v3
      with:
        path: ~/.cache/bazel
        key: ubuntu_x86_64_torch_mlir_bazel_build_cache

    # Give root ownership of the bazel cache directory so that it
    # can be written to from within the docker container.
    # On a cache miss this directory does not exist yet, so skip
    # chown in that case (it would fail otherwise).
    - name: Set bazel cache permissions
      run: |
        if [ -d "${HOME}/.cache/bazel" ]; then
          sudo chown -R root:root "${HOME}/.cache/bazel"
        fi

    - name: Build docker image
      run: |
        docker build -f utils/bazel/docker/Dockerfile \
                     -t torch-mlir:ci \
                     .

    - name: Bazel build torch-mlir
      run: |
        docker run --rm \
                   -v "$(pwd)":"/opt/src/torch-mlir" \
                   -v "${HOME}/.cache/bazel":"/root/.cache/bazel" \
                   torch-mlir:ci \
                   ./utils/bazel/docker/run_bazel_build.sh

    # Restore user ownership of the bazel cache directory so that
    # the GHA post-cache step can save the cache without
    # permission errors.
    - name: Switch bazel cache permissions
      run: |
        if [ -d "${HOME}/.cache/bazel" ]; then
          sudo chown -R "$USER":"$USER" "${HOME}/.cache/bazel"
        fi

    - name: Send mail
      if: failure()
      uses: dawidd6/action-send-mail@v3
      with:
        server_address: ${{ secrets.SMTP_SERVER }}
        server_port: ${{ secrets.SMTP_PORT }}
        username: ${{ secrets.SMTP_USERNAME }}
        password: ${{ secrets.SMTP_PASSWORD }}
        subject: GitHub Action Bazel Build and Test failed!
        body: Bazel Build job failed! See https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }} for more information.
        to: ${{ secrets.MAIL_RECEIVER }}
        from: Torch-MLIR Bazel Build GitHub Actions