Some systems are experiencing issues

Stickied Incidents

Wednesday 15th May 2024

High Performance Primary Storage Volumes in a degraded state

Volumes 19 through 35 are currently in a degraded state. This may impact read/write performance while systems return to normal.

To check which volume your group is in, users may run the command ls -l $HOME.
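
As a minimal sketch, the check above can be run alongside two standard commands that may help locate the volume. The helper name home_location is hypothetical and not an MSI tool. Whether the resolved path or the filesystem name actually contains the volume number is an assumption about how the storage is mounted, not something this notice states.

```shell
# Hypothetical helper (not an MSI tool): print where the home directory really lives.
home_location() {
    # Resolve any symbolic links in the home path to its physical location.
    readlink -f "$HOME"
}

# The command from the notice: long listing of the home directory.
ls -l "$HOME"

# Assumption: the resolved path or the backing filesystem may name the volume.
home_location
df -h "$HOME"
```

If none of the outputs show a volume number, the listing from ls -l $HOME remains the check this notice recommends.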

Past Incidents

Friday 3rd June 2022

V100 Nodes Temporarily Unavailable [Resolved]

Due to a recent failure in the cooling system within the Mangi cluster, V100 nodes have been taken down while we work to bring temperatures back to acceptable levels.

For the time being, we recommend using our other GPU partitions, such as a100-4 or a100-8 on Agate, which provide A100 GPUs. More information on the available partitions at MSI can be found here.

Tuesday 31st May 2022

Mangi Cooling System Failure

There was a hardware failure on one of Mangi's cooling systems over the weekend. This has interrupted service, and we advise users, where possible, to use one of the other clusters, such as Mesabi or Agate, while our staff work to bring the service back online.

The login.msi.umn.edu host is also unavailable because it was backed by the Mangi login nodes that were impacted.