# torch-mlir/.github/workflows/releaseSnapshotPackage.yml

# yamllint disable rule:line-length
name: Release snapshot package

on:
  # The scheduled trigger is disabled, so this workflow only runs when it is
  # dispatched manually.
  # schedule:
  #   - cron: '0 11 * * *'
  workflow_dispatch:

jobs:
  release_snapshot_package:
    name: "Tag snapshot release"
    runs-on: ubuntu-latest
    # Don't run this in everyone's forks.
    if: github.repository == 'llvm/torch-mlir'
    steps:

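      - name: Prepare workspace
        run: |
          # Clear the workspace directory so that we don't run into errors about
          # existing lock files. The `:?` guard aborts if GITHUB_WORKSPACE is
          # unset or empty, so the glob can never expand to `/*`.
          sudo rm -rf "${GITHUB_WORKSPACE:?}"/*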

      - name: Checking out repository
        uses: actions/checkout@v3
        with:
          token: ${{ secrets.WORKFLOW_INVOCATION_TOKEN }}

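      - name: Compute version
        run: |
          # Fetch the existing tags from origin (shallow).
          git fetch --depth=1 origin +refs/tags/*:refs/tags/*
          # Version is <current date as YYYYMMDD>.<workflow run number>; the
          # run number is substituted by GitHub Actions before bash runs.
          package_version="$(printf '%(%Y%m%d)T.${{ github.run_number }}')"
          tag_name="snapshot-${package_version}"
          # Export both values to later steps via the job environment file.
          echo "package_version=${package_version}" >> "$GITHUB_ENV"
          echo "tag_name=${tag_name}" >> "$GITHUB_ENV"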

      - name: Updating snapshot tag
        run: |
          git tag "${tag_name}"

      - name: Pushing changes
        uses: ad-m/github-push-action@v0.6.0
        with:
          github_token: ${{ secrets.WORKFLOW_INVOCATION_TOKEN }}
          branch: main
          tags: true

      - name: Create Release
        id: create_release
        uses: actions/create-release@v1
        env:
          GITHUB_TOKEN: ${{ secrets.WORKFLOW_INVOCATION_TOKEN }}
        with:
          tag_name: ${{ env.tag_name }}
          release_name: torch-mlir snapshot ${{ env.tag_name }}
          body: |
            Automatic snapshot release of torch-mlir.
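          draft: true
          # Not marked as a prerelease: the eregon/publish-release docs state
          # that the release must not be a prerelease for publishing to work.
          prerelease: false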

      - name: "Invoke workflow :: Build and Test"
        uses: benc-uk/workflow-dispatch@v1
        with:
          workflow: Build and Test
          token: ${{ secrets.WORKFLOW_INVOCATION_TOKEN }}
          ref: "${{ env.tag_name }}"

      - name: "Invoke workflow :: Release Build"
        uses: benc-uk/workflow-dispatch@v1
        with:
          workflow: Release Build
          token: ${{ secrets.WORKFLOW_INVOCATION_TOKEN }}
          ref: "${{ env.tag_name }}"
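          # python_package_version carries the snapshot identifier; this is
          # intended to make releases sort properly with pip's -f/--find-links.
          inputs: '{"release_id": "${{ steps.create_release.outputs.id }}", "python_package_version": "${{ env.package_version }}"}'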
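
# Sketch (an assumption, not taken from this repository): for the
# "Release Build" dispatch above to accept its `inputs`, the target workflow
# would need to declare matching `workflow_dispatch` inputs, roughly as below.
# Only the two input names come from this file; the descriptions and the
# `required` settings are hypothetical.
#
#   on:
#     workflow_dispatch:
#       inputs:
#         release_id:
#           description: "ID of the draft release to attach artifacts to"
#           required: true
#         python_package_version:
#           description: "Version string to stamp on the Python package"
#           required: true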