Storage, Group Disk, File Management¶
Info
The command line examples on this page use the following notation:
[login]$ : login node
[rNnN]$ : compute node
[login/rNnN]$ : login node or compute node
[yourPC]$ : your own computer, from which you connect to the login node
Rules for Handling User Data on TSUBAME4 (Home Directory, Work Directory, Group Disks, etc.)¶
User data on TSUBAME4 (home directory, work directory, group disks, etc.) is managed based on the "Storage Service Terms of Use, Center for Information Infrastructure, Institute of Science Tokyo (Japanese only)" within the Regulations.
Frequently asked questions regarding user data handling are summarized below.
1) I want deleted or corrupted data restored.
TSUBAME4 does not back up user data. Therefore, we cannot accommodate requests for data restoration.
Furthermore, even if data is lost due to system issues, compensation (such as providing computational resources for data recovery) is generally not provided.
Please ensure you back up important data yourself.
2) I want to access or delete data on a group disk belonging to an account that has been suspended or left the group.
TSUBAME4 does not, as a rule, perform operations on user storage with administrator privileges.
If an account is suspended or leaves a group due to graduation, etc., please ensure migration tasks are completed within the group beforehand.
How many files can be placed in one directory?¶
As the number of files in a directory increases, metadata operations (file creation, deletion, and opening) on the files in that directory take longer, and the file system may return errors that make it impossible to create new files.
Even when using a group disk, we recommend arranging files hierarchically with a target of less than 10,000 per directory.
In past inquiries, we observed access delays caused by metadata operations with around 70,000 files in a single directory.
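To find directories that already exceed the recommended limit, a sketch like the following can help. It assumes GNU find (for `-printf`), counts only regular files placed directly in each directory, and `crowded_dirs` is a hypothetical helper name, not a TSUBAME command. Scanning a large tree itself generates metadata operations, so limit ROOT to the area you suspect.

```shell
#!/usr/bin/env bash
# crowded_dirs ROOT [LIMIT]
# Prints "<count><TAB><directory>" for each directory under ROOT that
# directly contains more than LIMIT (default 10000) regular files,
# largest count first. Assumes GNU find: -printf '%h' prints the
# directory part of each path.
crowded_dirs() {
  local root=$1 limit=${2:-10000}
  find "$root" -type f -printf '%h\n' \
    | sort | uniq -c \
    | awk -v lim="$limit" '{
        c = $1
        sub(/^ *[0-9]+ /, "")      # strip the count added by uniq -c
        if (c + 0 > lim) print c "\t" $0
      }' \
    | sort -nr
}
```

For example, `crowded_dirs /gs/bs/tgX-XXXXXX 10000` lists the directories of that group disk holding more than 10,000 files.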
Example:
- NG: 00000.dat ~ 99999.dat
- If 100,000 files are placed flat in one directory, the load during file access will increase, causing performance degradation and failure.
- OK: 000/00000.dat ~ 000/00999.dat, 001/01000.dat ~ 001/01999.dat, …
- The hierarchical arrangement minimizes the cost of metadata operations by limiting the number of files per directory to about 1000.
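The OK layout above can be produced from an existing flat directory with a small shell function such as the following. This is a sketch that assumes the file names are exactly five digits followed by `.dat`, as in the example; `shard_dat_files` is a hypothetical name, and it is safest to try it on a copy of your data first.

```shell
#!/usr/bin/env bash
# shard_dat_files DIR
# Moves DIR/NNNNN.dat (five-digit names) into DIR/XXX/, where XXX is the
# number divided by 1000, zero-padded to three digits:
#   00000.dat..00999.dat -> 000/, 01000.dat..01999.dat -> 001/, ...
# so each subdirectory ends up with at most 1000 files.
shard_dat_files() {
  local dir=$1 f base num sub
  for f in "$dir"/[0-9][0-9][0-9][0-9][0-9].dat; do
    [ -e "$f" ] || continue        # the glob matched nothing
    base=${f##*/}                  # e.g. 01000.dat
    num=${base%.dat}               # e.g. 01000
    sub="0${num%???}"              # first two digits = num / 1000 -> 001
    [ -d "$dir/$sub" ] || mkdir -p "$dir/$sub" || return 1
    mv -- "$f" "$dir/$sub/" || return 1
  done
  return 0
}
```

Taking the first two digits as text, rather than doing arithmetic, avoids leading zeros such as `00999` being read as octal numbers.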
About file transfer¶
File transfer by rsync, scp, and sftp is available on TSUBAME4.0. As with login, you must authenticate with the SSH private key that pairs with the SSH public key registered in the TSUBAME4.0 portal. Also check the settings of the application you are using carefully, as some applications may time out during transfers.
To install a file transfer application¶
If you are using MobaXterm or RLogin, it is easier to use the built-in file transfer function of these software.
If you are using other software such as PuTTY for connection, you need to install a file transfer application that supports sftp, such as FileZilla or WinSCP. In this case, as with login, you need to authenticate with the SSH private key that pairs with the SSH public key registered in the TSUBAME4.0 portal. FileZilla and WinSCP can use the .ppk key files that you usually use with PuTTY. For details on how to use each application, refer to its manual.
If the optional feature "OpenSSH Client" is enabled in Windows 11, you can use the scp and sftp commands from Command Prompt or PowerShell.
If you are using Linux/Mac/Cygwin (Windows) (rsync, scp, sftp commands)¶
In these environments, the rsync, scp, and sftp commands are available. The usage of each is described below.
rsync:
To transfer from the local to the remote host, execute the following command. If you set the standard path/file name as the key pair location, the -i option is not required.
$ rsync -av --progress -e "ssh -i <Private_Key_File> -l Login_Name" <Local_Directory> <Remote_Host:Remote_Directory>
[yourPC]$ rsync -av --progress -e "ssh -i ~/.ssh/ecdsa -l TSUBAMEUSER00" ./ login.t4.gsic.titech.ac.jp:/gs/bs/TSUBAMEUSER
For details such as how to specify the transfer source and transfer destination, run the following command.
$ man rsync
scp:
To transfer from local to remote host, execute the following command. If you set the standard path/file name as the key pair location, the -i option is not required.
$ scp -r -i <Private_Key_File> <Local_File_or_Directory> <Login_Name>@<Remote_Host>:<Remote_Directory>
Replace each item in < > with the value for your situation (the -r option copies directories recursively). For example, when the user with login name "TSUBAMEUSER00" copies the current directory to /gs/bs/TSUBAMEUSER on TSUBAME4.0 using the private key ~/.ssh/ecdsa, the command is as follows.
[yourPC]$ scp -r -i ~/.ssh/ecdsa . TSUBAMEUSER00@login.t4.gsic.titech.ac.jp:/gs/bs/TSUBAMEUSER
For details such as how to specify the transfer source and transfer destination, run the following command.
$ man scp
sftp:
To transfer interactively, execute the following command.
If you set the standard path/file name as the key pair location, the -i option is not required.
$ sftp -i <Private_Key_File> <Login_Name>@<Remote_Host>
[yourPC]$ sftp -i ~/.ssh/ecdsa TSUBAMEUSER00@login.t4.gsic.titech.ac.jp
For details such as how to specify the transfer source and transfer destination, run the following command.
$ man sftp
To use CIFS access¶
In addition, group disks can be accessed via CIFS, but only from on-campus terminals.
The CIFS address is \\gshs.t4.gsic.titech.ac.jp
Refer to "CIFS access from inside campus" in TSUBAME4.0 User's Guide.
If the connection fails, please also see Can not establish CIFS connection to the group disk, Unable to open TSUBAME group disk on Windows.
I want to copy a large amount of data from/to TSUBAME¶
Please consider the following topics to improve the performance of data transfer between TSUBAME and external computers.
Pack the files to appropriate size¶
A large number of small files reduces transfer speed. Pack such files with the tar command into archives of about 1GB each.
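One way to do this, sketched here under the assumption of bash, GNU tar, and GNU split (as found on typical Linux systems), is to stream a directory into a single tar archive and cut the stream into 1 GB pieces. Note that this yields pieces of one archive rather than independent 1 GB archives: all pieces are needed to extract any file. `pack_split` and `unpack_split` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: assumes bash, GNU tar and GNU split (typical on Linux).

# pack_split SRC_DIR OUT_PREFIX [PIECE_SIZE]
# Streams SRC_DIR into a single tar archive and cuts the stream into
# pieces of PIECE_SIZE (default 1G), named OUT_PREFIXaa, OUT_PREFIXab, ...
pack_split() {
  local src=$1 prefix=$2 size=${3:-1G} rc
  tar -cf - -C "$(dirname "$src")" "$(basename "$src")" | split -b "$size" - "$prefix"
  rc=("${PIPESTATUS[@]}")          # exit statuses of tar and split
  [ "${rc[0]}" -eq 0 ] && [ "${rc[1]}" -eq 0 ]
}

# unpack_split OUT_PREFIX DEST_DIR
# Joins the pieces in name order (the order split created them) and
# extracts the archive into DEST_DIR. Make sure no unrelated files
# share OUT_PREFIX, since every match is concatenated.
unpack_split() {
  local prefix=$1 dest=$2
  mkdir -p "$dest" || return 1
  cat "$prefix"* | tar -xf - -C "$dest"
}
```

For example, `pack_split ./results ./results.tar.part-` on the sending side, transfer the `results.tar.part-*` files, then `unpack_split ./results.tar.part- ./restored` on the receiving side (the paths are placeholders).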
Change transfer protocols¶
If you do not get enough speed with scp / sftp, consider using rsync or CIFS (Science Tokyo users only).
Transfer speeds may also be improved by using mscp, which transfers files over multiple SSH connections in parallel.
To use mscp, you only need to install it on the computer where you run the command. mscp is also installed on TSUBAME4.0, so please try it.
For more details on the CIFS connection, please refer to the "CIFS access from inside campus" section of the TSUBAME4.0 User's Guide.
If the connection fails, please also see Can not establish CIFS connection to the group disk, Unable to open TSUBAME group disk on Windows.
Remove the bottleneck on the network route¶
- If you have old LAN cables (CAT-3 or CAT-5 (not CAT-5e)), switching hubs, or routers whose link speed is lower than 1000 Mbps, replace them with newer ones.
- When using a router (WiFi router, NAT router, broadband router, etc.), connect your computer to the external network (in Science Tokyo, IP address starting with 131.112 or 172.16-31) directly.
For details of the network at Science Tokyo, please contact the network administrator of the laboratory. If you are not sure, please contact the branch manager for each building or organization.
(Science Tokyo Users Only) Use the iMac terminal of the Education Computer Systems¶
If it is difficult to change the network configuration, you can bring your HDD to the CII and connect it to the iMac terminal of the Education Computer Systems in the exercise room to transfer the data. Please check the opening hours.
Terminal room location and Opening Hours (in Japanese)
How to synchronize data between TSUBAME and PC¶
The advantage of the rsync command is that it transfers only the differences. If a transfer is interrupted for any reason, you can run the same command again to resume it, and if you run it again after some time, only the files whose content has changed are transferred. Files deleted from the source can also be deleted at the destination (rsync's --delete option) for complete synchronization.
An example command is shown below. Because the command may fail partway through, check the log or run it again until it completes without errors.
Synchronize TSUBAME with the data on your local PC.
[yourPC]$ rsync -auv (source directory) (your login name)@login.t4.gsic.titech.ac.jp:(full path of the destination directory)
Synchronize your local PC with the data on TSUBAME.
[yourPC]$ rsync -auv (your login name)@login.t4.gsic.titech.ac.jp:(full path of the source directory) (destination directory)
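Since a long transfer may fail partway, the re-run can be automated. The following sketch retries any command a fixed number of times; `retry_cmd` is a hypothetical helper, and the retry count and delay are arbitrary choices.

```shell
#!/usr/bin/env bash
# retry_cmd MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits with status 0, waiting DELAY_SECONDS between
# attempts and giving up (status 1) after MAX_TRIES failed attempts.
retry_cmd() {
  local max=$1 delay=$2 n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      echo "retry_cmd: giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "retry_cmd: attempt $n failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
  return 0
}
```

For example, `retry_cmd 5 60 rsync -auv --partial ./data TSUBAMEUSER00@login.t4.gsic.titech.ac.jp:/gs/bs/TSUBAMEUSER/data` (paths are placeholders). rsync's `--partial` keeps partially transferred files so the next attempt can reuse them. If you add `--delete` for complete synchronization, preview its effect with `--dry-run` first.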
How to solve "Disk quota exceeded" error¶
This message indicates that there is no space left in either your home directory or a group disk.
When you face it, you should delete unused files or purchase an additional group disk to keep enough free disk space.
The following command can be used to check disk usage for all directories, including hidden directories.
[login/rNnN]$ cd $HOME
[login/rNnN]$ du -h --max-depth=1 | sort -hr
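`du` shows which directory is large; to find the individual files responsible, a sketch like the following lists the largest regular files, hidden ones included. It assumes GNU find (`-printf`), and `largest_files` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# largest_files DIR [N]
# Prints the N (default 20) largest regular files under DIR, hidden files
# included, as "<size in bytes><TAB><path>", largest first.
# Assumes GNU find (-printf); -xdev keeps the search on DIR's file system.
largest_files() {
  local dir=$1 n=${2:-20}
  # 2>/dev/null hides "Permission denied" messages for unreadable entries
  find "$dir" -xdev -type f -printf '%s\t%p\n' 2>/dev/null | sort -nr | head -n "$n"
}
```

For example, `largest_files "$HOME" 20` lists the 20 largest files in your home directory.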
Please note that some applications create temporary files in the home directory, and an application may need more than 25 GB of disk space for them (25 GB is the capacity of a home directory).
I want to change the directory where cache files, user files, etc. used by the application are stored.
To avoid running out of disk space, we recommend using the local scratch area or the shared scratch area, rather than the home directory, for temporary files.
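As a sketch of this recommendation, the function below redirects two widely honoured locations to a directory you choose: `TMPDIR` (used by `mktemp`, Python's `tempfile`, gcc, and many other tools) and `XDG_CACHE_HOME` (the XDG Base Directory cache location, used by pip and many other tools on Linux). The scratch path is a placeholder: use the local or shared scratch area described in the TSUBAME4.0 User's Guide. Applications that ignore these variables need their own settings, and if your job environment already sets `TMPDIR` to a scratch area you can leave it as is. `use_scratch_for_cache` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# use_scratch_for_cache SCRATCH_DIR
# Creates SCRATCH_DIR/tmp and SCRATCH_DIR/cache and points TMPDIR and
# XDG_CACHE_HOME at them for the current shell and its child processes.
# SCRATCH_DIR is a placeholder for a scratch area from the User's Guide.
use_scratch_for_cache() {
  local base=$1
  mkdir -p "$base/tmp" "$base/cache" || return 1
  export TMPDIR="$base/tmp"            # mktemp, Python tempfile, gcc, ...
  export XDG_CACHE_HOME="$base/cache"  # XDG cache dir: pip and many other tools
}
```

Call it near the top of a job script, before starting the application, so that the application inherits both variables.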
Related FAQ: How to check TSUBAME points, group disk usage, home directory usage
FAQ about group disk¶
About group disk¶
The Group disk is the high-speed storage area (SSD) and large-capacity storage area (HDD) described in the "Usage Guide". This is a shared storage that allows each group to use the capacity set on the TSUBAME portal.
Usage period: in one-month increments, from the month of purchase until the end of the fiscal year (end of March).
Points and inodes per purchase unit
| Type | Purchase unit | Points | inodes |
|---|---|---|---|
| large-capacity storage area (HDD) | 1TB | 0.5 | 2,000,000 |
| high-speed storage area (SSD) | 100GB | 0.2 | 200,000 |
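For planning, the per-unit figures in the table can be multiplied out. The sketch below does only that arithmetic; the table does not state whether the point figure applies per month of the usage period, so check the TSUBAME Portal User's Guide before budgeting. `group_disk_estimate` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# group_disk_estimate TYPE UNITS
# TYPE is "hdd" (1TB units) or "ssd" (100GB units); UNITS is a whole number.
# Multiplies the per-unit point and inode figures from the table above.
group_disk_estimate() {
  local type=$1 units=$2 point inode
  case $type in
    hdd) point=0.5; inode=2000000 ;;
    ssd) point=0.2; inode=200000 ;;
    *) echo "unknown type: $type (use hdd or ssd)" >&2; return 1 ;;
  esac
  # awk handles the decimal point values that shell arithmetic cannot
  awk -v u="$units" -v p="$point" -v i="$inode" \
    'BEGIN { printf "points=%g inodes=%d\n", u * p, u * i }'
}
```

For example, `group_disk_estimate hdd 50` prints `points=25 inodes=100000000`; the inode figure is also the upper limit on the number of files and directories.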
How to set: configure it from the TSUBAME Portal. Reference: TSUBAME Portal User's Guide "10. Management of Group Disk"
What is the group disk grace period?¶
At the boundary of the fiscal year, all group disks enter a grace state in which data can only be read and deleted. This prevents data from becoming inaccessible because of delays in payment code approval and group disk purchase. The grace state lasts until mid-April, and group disk space that has not been purchased becomes inaccessible after the grace period ends. To continue using a group disk, you need to purchase it for the current fiscal year.
Reference: TSUBAME Portal User's Guide
If data from the previous fiscal year remains and you purchase the group disk after the grace period, the capacity you can purchase is as follows.
For example, suppose you purchased 50TB in the previous year and used 45TB of it.
1) When all 45TB is deleted during the grace period and the used capacity is 0.
You can purchase from 1TB, which is the minimum capacity.
2) When 25TB is deleted during the grace period and the used capacity is 20TB.
You can purchase 20TB or more.
3) If nothing is deleted during the grace period (the used capacity is 45TB).
You can purchase 45TB or more.
If you do not need the previous year's data, please delete it during the grace period.
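The three cases above follow one rule, sketched here under the assumption that the new purchase must be at least the capacity still in use, rounded up to whole 1TB units, and never below the 1TB minimum. `min_purchase_tb` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# min_purchase_tb USED_TB
# Smallest purchase in whole 1TB units that still covers USED_TB,
# with 1TB as the minimum (assumed rule, matching the examples above).
min_purchase_tb() {
  awk -v u="$1" 'BEGIN {
    n = int(u)
    if (n < u) n++       # round a fractional TB up
    if (n < 1) n = 1     # minimum purchase is 1TB
    print n
  }'
}
```

With the example above, `min_purchase_tb 0`, `min_purchase_tb 20`, and `min_purchase_tb 45` print `1`, `20`, and `45`.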
Related FAQ¶
- Checking the usage of group disks with command
- Can not establish CIFS connection to the group disk
- "Disk quota exceeded" error is output
The group disk suddenly became unusable.¶
Because group disk capacity is allocated month by month, the amount used may exceed the allocated size when a new month begins.
If the usage remains over the allocated size, all access to that group disk will be blocked at a specified time.
- If you wish to check the usage status of the group disk, please refer to Confirmation of group disk usage.
- For information on what to do when the amount of group disk usage is exceeded, please refer to What to do when the group disk usage exceeds the reserved size.
Please also refer to FAQ about group disk.
Can not establish CIFS connection to the group disk, Unable to open TSUBAME group disk on Windows¶
Access to the TSUBAME group disk using CIFS is only available within the university. It cannot be accessed from off-campus.
Even within the campus network, CIFS may be blocked by routers or other devices along the route, such as those in a laboratory, in which case it cannot be used.
Please check your router's settings, as many routers block communication on TCP/UDP port 445 by default.
Info
In the campus network, communication on TCP/UDP port 445 between branch networks is blocked. However, because an exception is made for the TSUBAME4 network, this communication is not blocked from the branch network (building switch) onward.
If port 445 is not blocked, the CIFS server may be unreachable for another reason, so please check network connectivity to the CIFS server.
For example, run ping from the Command Prompt:
C:\> ping gshs.t4.gsic.titech.ac.jp
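Ping only shows that the host answers; it does not show whether port 445 gets through. From a Linux or Mac terminal on campus, a TCP connection attempt can be sketched as below. It relies on bash's `/dev/tcp` pseudo-device and the `timeout` command from GNU coreutils (not installed by default on macOS); `port_open` is a hypothetical helper name. On Windows, PowerShell's `Test-NetConnection -ComputerName gshs.t4.gsic.titech.ac.jp -Port 445` performs a similar check.

```shell
#!/usr/bin/env bash
# port_open HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection to HOST:PORT succeeds within the timeout
# (default 5 seconds), non-zero if it is refused, blocked, or times out.
port_open() {
  local host=$1 port=$2 secs=${3:-5}
  # Opening /dev/tcp/HOST/PORT on fd 3 makes bash attempt a TCP connection.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

For example, `port_open gshs.t4.gsic.titech.ac.jp 445 && echo reachable || echo "blocked or unreachable"`.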
To access the group disk from Windows, you need to set a TSUBAME password. Please configure it from the TSUBAME portal. For the setting method, please see here. If the message "Password is incorrect" or "Password has expired" is displayed, please reset your TSUBAME password.
To allow other members to read and write on a group disk¶
Warning
This article is about group disks (/gs/bs, /gs/fs). Do not run the following sample commands in your home directory.
Users cannot change the owner of their files. Therefore, change the group permissions so that group members can read and write them. The key points are:
- Change permissions for all files and directories below the directory, not just the top-level directory.
- Add write (w) as well as read (r) permission to files. Without write (w), group members cannot modify or overwrite the files later.
- Directories need write (w) and execute (x) as well as read (r) permission. Without execute (x), a directory cannot be accessed.
Some example commands are shown below. Depending on the files' original permissions, some errors may occur; in that case, re-run the commands until the output no longer changes.
Find your own directories under /gs/bs/tgX-XXXXXX/ and make them readable and writable by group members.
[login]$ find /gs/bs/tgX-XXXXXX/ -type d -user $USER ! -perm -2770 -print0 | xargs -r0 chmod -v ug+rwx,g+s
Find your own files under /gs/bs/tgX-XXXXXX/ and make them readable and writable by group members.
[login]$ find /gs/bs/tgX-XXXXXX/ -type f -user $USER ! -perm -660 -print0 | xargs -r0 chmod -v ug+rw
Find your own files under /gs/bs/tgX-XXXXXX/ and match the ownership group to the TSUBAME group.
[login]$ find /gs/bs/tgX-XXXXXX/ -user $USER ! -group (TSUBAME group name) -print0 | xargs -r0 chgrp -v (TSUBAME group name)
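To see what the first two commands would still change, for example when deciding whether another re-run is needed, the same conditions can be used with `-print` only. This read-only sketch does not check the group ownership handled by the `chgrp` command; `remaining_private` is a hypothetical helper name, and `id -un` is used in place of `$USER` in case that variable is unset.

```shell
#!/usr/bin/env bash
# remaining_private DIR
# Lists your own directories under DIR that lack any of the 2770 bits
# (rwxrws---) and your own files that lack any of the 660 bits (rw-rw----),
# i.e. the entries the two chmod commands above would still change.
remaining_private() {
  local dir=$1 me
  me=$(id -un)
  find "$dir" -user "$me" \
    \( \( -type d ! -perm -2770 \) -o \( -type f ! -perm -660 \) \) -print
}
```

For example, `remaining_private /gs/bs/tgX-XXXXXX/` prints nothing once all of your files and directories are shared with the group.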
I want to use an external storage or cloud storage service¶
Mounting external disks using user privileges is not permitted. Cloud storage services are also not supported. Please use a group disk.
If you have an HPCI account, you can mount only HPCI Shared Storage with user privileges. For details, please refer to the HPCI User Manual or contact the HPCI Help Desk.
Work directory is not available.¶
When you create a new account, it may take some time before the Work directory becomes available. For details, please refer to Work Directory.