
[MNT] MPS backend test failures on MacOS #1596

Closed
fkiraly opened this issue Aug 22, 2024 · 8 comments · Fixed by #1599 or #1648
Labels
maintenance Continuous integration, unit testing & package distribution

Comments

@fkiraly
Collaborator

fkiraly commented Aug 22, 2024

The CI fails with MPS backend failures on a number of tests:

```
RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 256 bytes on shared pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
[W 2024-08-22 20:58:07,168] Trial 0 failed with value None.
```
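The error message itself points at a possible workaround. A minimal sketch, assuming the variable is set before torch makes its first MPS allocation (e.g. at the very top of the test session); note the error text itself warns this may cause system failure:

```python
import os

# Disable the MPS allocator's upper memory limit, as suggested by the
# error message. Must be set before the first MPS allocation happens.
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"
```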
@fkiraly fkiraly added the maintenance Continuous integration, unit testing & package distribution label Aug 22, 2024
@fkiraly fkiraly changed the title [MPS] MPS backend test failures [MNT] MPS backend test failures Aug 22, 2024
@fkiraly fkiraly changed the title [MNT] MPS backend test failures [MNT] MPS backend test failures on MacOS Aug 23, 2024
@fkiraly
Collaborator Author

fkiraly commented Aug 23, 2024

Update: these failures seem to happen only on macOS.

fkiraly pushed a commit that referenced this issue Aug 25, 2024
…ests`, MacOS MPS

Fixes #1594, fixes #1595, fixes #1596

Added or moved some dependencies to the core dependency set.

Fixed some `numpy2` and `optuna-integrations` problems.

`requests` replaced by `urllib.request.urlretrieve`.
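The `requests` replacement mentioned above can be a thin wrapper around the standard library. A hypothetical sketch (the `fetch` name is illustrative, not the actual helper in the PR):

```python
from urllib.request import urlretrieve


def fetch(url: str, dest: str) -> str:
    """Download `url` to `dest` using only the standard library,
    avoiding the third-party `requests` dependency."""
    path, _headers = urlretrieve(url, dest)
    return path
```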
@fkiraly
Collaborator Author

fkiraly commented Sep 3, 2024

The failures happen on macos-latest but not on macos-13. A temporary fix that @XinyuWuu discovered is to pin the Mac runners to macos-13.

@fkiraly
Collaborator Author

fkiraly commented Sep 3, 2024

The failure on macos-latest is here: #1633

@fnhirwa
Member

fnhirwa commented Sep 4, 2024

This is an issue related to GPU device handling on macOS. I will open a PR fixing this generally.

@benHeid
Collaborator

benHeid commented Sep 4, 2024

MPS errors can only happen on macOS, since MPS (Metal Performance Shaders) is the GPU backend of the Apple M-series chips, as @fnhirwa said.

This is probably caused by large neural networks running in parallel on the GPU. A fix could be to set the device to cpu for all tests (the bottleneck then becomes ordinary RAM), or alternatively to reduce the memory footprint of the test models.

A more complicated solution could be to check whether we can control the parallel execution of tests, so that no neural network runs in parallel with another one, only with simpler models.
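The "set the device to cpu for all tests" option could look roughly like this. A hypothetical helper (`pick_accelerator` is not an existing function in the code base) that the tests would pass to the Lightning `Trainer`:

```python
import os


def pick_accelerator() -> str:
    # Force CPU in CI so no test ever touches the MPS backend;
    # locally, let Lightning choose the best available device.
    if os.environ.get("CI"):
        return "cpu"
    return "auto"
```

Usage would be e.g. `Trainer(accelerator=pick_accelerator(), ...)` in the test fixtures.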

@fnhirwa
Member

fnhirwa commented Sep 4, 2024

Given that this is a resource issue, PyTorch provides an environment variable, PYTORCH_ENABLE_MPS_FALLBACK=1, that falls back to the CPU when the MPS device cannot handle an operation. We can use the monkeypatch fixture to set this variable in the tests.

I am adding the changes to #1648 to see if it works.
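The monkeypatch idea boils down to setting the variable before any MPS op is dispatched. A stdlib-only sketch of the same effect (in the actual test suite this would live in a pytest fixture using `monkeypatch.setenv`, as described above):

```python
import os


def enable_mps_fallback() -> None:
    # Ops without an MPS kernel then fall back to the CPU instead of
    # raising; must run before torch dispatches the first MPS op.
    os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


enable_mps_fallback()
```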

@XinyuWuu
Member

XinyuWuu commented Sep 5, 2024

> A more complicated solution could be to check whether we can control the parallel execution of tests, so that no neural network runs in parallel with another one, only with simpler models.

We can do it by using a filelock as a fixture. I have tried it in sktime/sktime#6774.
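A stdlib-only sketch of such a lock (the real sktime PR uses the `filelock` package; this hypothetical version spins on an atomically created lock file instead):

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def heavy_model_lock(path="heavy_model.lock", poll=0.1):
    # Acquire: O_CREAT | O_EXCL creates the file atomically, so only
    # one test process at a time can hold the lock.
    while True:
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            time.sleep(poll)
    try:
        yield
    finally:
        # Release: close and remove the lock file so waiters proceed.
        os.close(fd)
        os.remove(path)
```

Each large-network test would then wrap its body in `with heavy_model_lock(): ...`, typically via a fixture.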

@XinyuWuu
Member

XinyuWuu commented Sep 5, 2024

It's caused by the lack of nested-virtualization support on arm64 macOS runners:
https://docs.github.com/en/actions/using-github-hosted-runners/using-github-hosted-runners/about-github-hosted-runners#limitations-for-arm64-macos-runners
actions/runner-images#9254 (comment)
actions/runner-images#9918

My tests:
https://github.com/jdb78/pytorch-forecasting/actions/runs/10714861256/job/29709292560?pr=1654
https://github.com/jdb78/pytorch-forecasting/actions/runs/10714737635/job/29708945795?pr=1654

`torch.backends.mps.is_available()` returns `true` on macos-latest, but it should return `false`.

@fnhirwa I am afraid `PYTORCH_ENABLE_MPS_FALLBACK` won't help. It enables a CPU fallback for individual operators such as `aten::_slow_conv2d_forward`, but in our case MPS is totally unusable.

We need to find a way to make `torch.backends.mps.is_available()` return `false` on macos-latest.

Unfortunately, we do not have something like `CUDA_VISIBLE_DEVICES` for MPS.
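Making `torch.backends.mps.is_available()` report `False` could be done by patching the function in a `conftest.py` fixture. A torch-free sketch of the pattern, using a stand-in namespace for `torch.backends.mps` (the real fix would patch the actual module the same way, e.g. via pytest's `monkeypatch.setattr`):

```python
from types import SimpleNamespace

# Stand-in for torch.backends.mps; torch itself is deliberately not
# imported here so the sketch stays self-contained.
mps = SimpleNamespace(is_available=lambda: True)


def disable_mps(backend) -> None:
    # Overwrite is_available so any availability check sees no device.
    backend.is_available = lambda: False


disable_mps(mps)
```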

4 participants