
47 posts tagged with "Maintenance"


· One min read

Publication date: August 8, 2023

A hardware failure has occurred in one of the redundant OST controllers of the Lustre7 high-speed storage system in the general analysis division. The controller will be replaced and the system will be switched back according to the following schedule:

Schedule

  • 13:00 - 15:00 Replacement work
  • 15:00 - 16:00 Switch-back work
* An I/O suspension will occur during the work.

System impact at this time

No system impact at this time.

System impact at the time of the work

  • The NIG supercomputer general analysis division
    • An I/O suspension of about 10 minutes will occur once on Lustre7 during the replacement work. I/O will resume automatically after the work is completed.
  • The NIG supercomputer personal genome analysis division
    • Not affected.
  • DDBJ Service
    • Not affected.

Please note that the work may take longer than scheduled depending on the I/O load at the time. Thank you for your understanding and cooperation.

· One min read

Due to switching work associated with the storage system update for the DDBJ database, the FTP service and Aspera connections will be temporarily unavailable during the following period.

Date

Monday, 27 July 2023, 9:00 - 15:00 (24h notation)

  • Communication interruptions of about 15 minutes will occur during this period.

Scope of impact

  • General Analysis division and Personal Genome Analysis division of the NIG supercomputer
    • Login and data transfer using scp and HCPtools will not be affected.
    • Running jobs will not be stopped.
    • Access to the DDBJ database from within the supercomputer (data under /usr/local/resources/) will not be affected.
  • DDBJ services
    • Downloading DDBJ databases via FTP, Aspera, and HTTPS will be unavailable.

We appreciate your understanding and cooperation.

· 2 min read

Until now, some of the CPU cores on the GPU compute nodes have been allocated to short.q, because CPU utilisation on those nodes was low. In recent years, however, more and more software uses GPUs and usage patterns have changed. The AGE queue configuration will therefore be changed as follows to increase the number of CPU cores available on the GPU nodes.

  • Before change

| Queue | Node type | Total number of nodes | Total number of CPU cores | Total memory |
|---|---|---|---|---|
| gpu.q | Thin nodes Type2b | 7 | 56 (8/node) | 1,344 GB (192 GB/node) |
| short.q | Thin nodes Type2b | 7 | 112 (16/node) | 1,344 GB (192 GB/node) |

  • After change

| Queue | Node type | Total number of nodes | Total number of CPU cores | Total memory |
|---|---|---|---|---|
| gpu.q | Thin nodes Type2b | 7 | 168 (24/node) | 2,688 GB (384 GB/node) |
| short.q | Thin nodes Type1a | 2 | 128 (64/node) | 1,024 GB (512 GB/node) |
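The totals in the queue tables follow directly from the per-node figures (number of nodes × per-node value). The small Python sketch below reproduces them using only the numbers given in the tables; the names `BEFORE`, `AFTER`, and `totals` are illustrative and not part of any NIG tool.

```python
# Per-node figures copied from the queue tables (illustrative data only):
# (queue, number_of_nodes, cpu_cores_per_node, memory_gb_per_node)
BEFORE = [
    ("gpu.q", 7, 8, 192),
    ("short.q", 7, 16, 192),
]
AFTER = [
    ("gpu.q", 7, 24, 384),
    ("short.q", 2, 64, 512),
]


def totals(config):
    """Return {queue: (total_cores, total_memory_gb)}, each computed as nodes x per-node value."""
    return {queue: (nodes * cores, nodes * mem) for queue, nodes, cores, mem in config}


if __name__ == "__main__":
    for label, config in (("Before", BEFORE), ("After", AFTER)):
        for queue, (cores, mem) in totals(config).items():
            print(f"{label:6} {queue:8} {cores:4} cores  {mem:,} GB")
```

The figures suggest that, before the change, each Type2b node's 24 cores and 384 GB were split between gpu.q (8 cores, 192 GB) and short.q (16 cores, 192 GB); after the change, gpu.q receives the whole node.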

For short.q, the CPU will change from Intel Xeon Gold 6130 to AMD EPYC 7501 because of the node type change. Please review your job scripts if necessary.

Date

Wednesday, July 26, 2023 11:00 - 11:30 (24h notation)

Scope of Impact

  • During the work,

  • The method of submitting jobs to each queue is the same before and after the work.

· One min read

Publication date: June 27, 2023

Due to network device maintenance, the network will be temporarily out of service during the following time period.

Date

Monday, July 3, 2023 11:00 - 12:00 (24h notation)

  • Communication interruptions of about 30 minutes will occur.

Scope of Impact

  • During the interruption, logging in to the supercomputer and transferring data will not be possible.
  • Running jobs will not be suspended.

We appreciate your understanding and cooperation.

· One min read

Publication date: June 2, 2023

Due to maintenance work on SINET6 equipment, the network will be temporarily out of service during the following period.

  • Date and time: 4:30 - 6:00, Monday, June 5, 2023
    • Communication will be interrupted up to two times during this period, for 15 minutes each.
  • Scope of impact
    • During the disconnection, you will not be able to log in to the supercomputer or transfer data.
    • There will be no suspension of active jobs.

Thank you for your understanding and cooperation.

· One min read

The storage system used to build the DDBJ database was replaced in April 2023, and its capacity was increased from approximately 15 PB to 40 PB.

Data on the Lustre6 high-speed storage used for DDBJ operations and on the GPFS1 and GPFS2 storage used for the previous database is currently being migrated to the new storage system. The migration is expected to be completed around July, after which full-scale operation will begin.

Once the new storage is fully operational, DRA data and other data will be mounted directly on the NIG supercomputer and can be accessed directly from it.

Lustre6 was mainly used for building the DDBJ database, but some user data from the previous supercomputer (NIG supercomputer 2012) still remains on it. We are notifying the affected users by email. Home directories in the general analysis division of the current NIG supercomputer are located on Lustre7, so if you have received this email, please transfer your data there or delete it.

For information on the current storage types, see below.

Hardware > Storage > High-speed storage: Lustre file systems

· One min read

The server software of the Lustre8 high-speed storage system for the Personal Genome Analysis division will be upgraded, and related maintenance will be performed.

Date

Wednesday, March 8, 2023 14:00 - 17:00 (24h notation)

Scope of impact

  • In the Personal Genome Analysis division, I/O to Lustre8 will be suspended multiple times (at least six times).
  • The general analysis division will not be affected.
  • DDBJ services and other services will not be affected.

· One min read

Publication date: December 22, 2022

Due to network device maintenance, the network will be temporarily out of service during the following time period.

  • Date: Tuesday, December 27, 2022 13:00 - 14:00 (24h notation)

    • Communication interruptions of about 30 minutes to one hour will occur.
  • Scope of Impact

    • During the interruption, logging in to the supercomputer and transferring data will not be possible.
    • Running jobs will not be suspended.

We appreciate your understanding and cooperation.

· One min read

Publication date: December 20, 2022

The Lustre7 high-speed storage system for the General Analysis division has experienced an equipment failure. As of 15:59 on Monday, December 19, there has been no impact on users. The affected cables will be replaced at the following date and time.

Date

Tuesday, December 20, 2022 12:00 - 17:00 (24h notation)

Scope of impact

  • A couple of I/O suspensions of about 4 minutes each are expected on Lustre7 in the general analysis division between 12:00 and 17:00 on Tuesday, 20 December 2022.
  • The personal genome analysis division will not be affected.
  • DDBJ services and other services will not be affected.

· One min read

Publication date: November 24, 2022

The Lustre6 high-speed storage system for the DDBJ service has experienced an equipment failure.

The equipment will be replaced at the following time and date.

Date

Thursday, November 24, 2022 09:30 - 12:00 (24h notation)

Scope of impact

  • In the DDBJ service division, I/O to Lustre6 is expected to be suspended for approximately 4 minutes each before and after the work.
  • The general analysis and personal genome analysis divisions will not be affected.