2024-09-25
A workaround has been applied for the NVIDIA MPS problem that prevented other users on the same compute node from using the GPU, as described in the 2024-07-03 announcement below.
As of today, NVIDIA MPS can be used without problems on the resource types other than node_f (node_h, node_q, node_o, gpu_1, gpu_h).
Please note that some sites recommend setting the environment variable CUDA_MPS_PIPE_DIRECTORY when using MPS. Do not change this environment variable on TSUBAME4.0, because doing so may cause problems.
(Amendment on 2024-10-24) Please refer to the documentation for details. Running "module load cuda" is required to use MPS.
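For illustration only, the two notes above could be combined in a job script roughly as follows. This is a minimal sketch, not the official TSUBAME4.0 procedure: the helper names mps_env_ok and run_with_mps are hypothetical, and starting and stopping the daemon with nvidia-cuda-mps-control is the generic NVIDIA usage, assumed here to apply unchanged. Please follow the TSUBAME4.0 documentation where it differs.

```shell
# Hypothetical helper: fail if CUDA_MPS_PIPE_DIRECTORY has been overridden,
# since the notice asks users not to change it on TSUBAME4.0.
mps_env_ok() {
    if [ -n "${CUDA_MPS_PIPE_DIRECTORY:-}" ]; then
        echo "CUDA_MPS_PIPE_DIRECTORY is set; unset it before using MPS" >&2
        return 1
    fi
    return 0
}

# Hypothetical wrapper: run one command with MPS active (assumed generic flow).
run_with_mps() {
    mps_env_ok || return 1
    module load cuda                      # required for MPS per the notice
    nvidia-cuda-mps-control -d            # start the MPS control daemon
    "$@"                                  # run the GPU workload
    mps_status=$?
    echo quit | nvidia-cuda-mps-control   # stop the daemon when finished
    return "$mps_status"
}
```

A job script would then call, for example, run_with_mps ./my_app, where ./my_app is a placeholder for the user's own program.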
2024-07-03
It has been found that on some of TSUBAME's compute nodes, when NVIDIA's Multi-Process Service (MPS) is activated, GPUs become unavailable to other users who share the same compute node.
Until a solution is found, please do not use MPS on resources other than node_f (node_h, node_q, node_o, gpu_1, gpu_h). Please also note that jobs using MPS on these resources will be deleted by the administrators without notice.