build: make PyTorch caching more robust (#1510)

Whether or not the PyTorch build is cached should not affect the success
of the Torch-MLIR build, but based on the existing code, a build may
fail if the `TM_PYTORCH_INSTALL_WITHOUT_REBUILD` variable was set but
the build cache doesn't exist.

Although that variable is set by CI upon a cache hit, nuances of
GitHub's caching behavior can create situations where the coupling
between `TM_PYTORCH_INSTALL_WITHOUT_REBUILD` and the cache lookup fails.

Specifically, a branch other than our default branch (`main`) may create
the cache entry, but because GitHub doesn't share this cache entry with
builds running on the `main` branch, the `main` branch build tries to
create its own cache entry.  However, since cache identifiers are
unique and because caches are immutable, the caching step running in the
`main` branch appears to create an invalid cache entry (of 233 bytes,
instead of the expected ~60 MB).

Consequently, subsequent builds observe a cache "hit", since caches
created by the `main` branch are shared with all other branches, but
because this cache entry is invalid (since it doesn't actually contain
the ~60 MB PyTorch WHL file), the builds fail.

One workaround would be to let only the `main` branch create caches, but
in doing so, we would also prevent other branches from _reading_ the
cache, making the builds in those branches terribly slow.

So this patch uses a different workaround, which is to check whether the
PyTorch WHL file exists, even if the build observed a cache hit.  If the
file doesn't exist, even if it was a purported cache hit, the code
builds PyTorch from source, which is probably intuitive.
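
As a minimal sketch of that check in isolation (the helper names
`has_wheel` and `should_build_from_source` are hypothetical and exist
only for illustration; the patch itself inlines the logic), the decision
could be expressed roughly as follows, assuming Bash:

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the patch): succeeds if at least one
# .whl file exists in the given directory.
has_wheel() {
  local dir="$1"
  # `compgen -G` expands the glob and returns non-zero when nothing matches.
  compgen -G "${dir}/*.whl" > /dev/null
}

# Hypothetical helper mirroring the patched condition: build from source
# unless a rebuild was explicitly skipped AND a wheel is actually present.
should_build_from_source() {
  local install_without_rebuild="$1"
  local wheelhouse="$2"
  if [[ "${install_without_rebuild}" != "true" ]] || ! has_wheel "${wheelhouse}"; then
    return 0
  fi
  return 1
}
```

With this shape, a purported cache hit that left the wheel directory
empty still falls through to the source build.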

A longer term fix will follow, after a discussion with the wider team.
Ashay Rane 2022-10-20 08:50:18 -05:00 committed by GitHub
parent 724d8d183a
commit 4a776be156
1 changed files with 3 additions and 1 deletions

@@ -152,7 +152,9 @@ unpack_pytorch() {
 #main
 echo "Building libtorch from source"
-if [[ $TM_PYTORCH_INSTALL_WITHOUT_REBUILD != "true" ]]; then
+wheel_exists=true
+compgen -G "$WHEELHOUSE/*.whl" > /dev/null || wheel_exists=false
+if [[ $TM_PYTORCH_INSTALL_WITHOUT_REBUILD != "true" || ${wheel_exists} == "false" ]]; then
 checkout_pytorch
 install_requirements
 build_pytorch