54 posts tagged with "Maintenance"

· One min read

Publication date: February 26, 2024

The Account Registration System currently has a problem: after you enter your updated information on the Account Information Change Request page, clicking 'I am not a robot' on the confirmation screen displays an error message and the application cannot be completed.

We apologise for any inconvenience this may cause.

A notice will be posted on the website as soon as the problem has been resolved.

Resolved at 17:17 on Monday, February 26, 2024. Please re-apply.

· One min read

Publication date: February 14, 2024

Due to network device maintenance, the network will be temporarily out of service during the following time period.

Date

Monday, February 19, 2024 12:00 - 13:00 (24h notation)

  • Communication interruptions of about 5 minutes will occur.

Scope of Impact

  • During the interruption, logging in to the supercomputer and data transfer operations will not be available.
  • Running jobs will not be suspended.

We appreciate your understanding and cooperation.

· One min read

Publication date: December 27, 2023

The account registration system will be suspended due to the year-end and New Year's holidays.

Date and Time

  • December 27, 2023 17:00 - January 9, 2024 12:00

Scope of impact

  • New account registration and changes to account information will not be available during the above year-end and New Year's period.
  • There will be no impact on network communication, login, job execution, etc. in the general analysis division and personal genome analysis division.
  • There will be no impact on DDBJ services.

We appreciate your understanding and cooperation.

· 3 min read

Publication date: October 2, 2023

Scheduled maintenance of the NIG supercomputer will take place at the following date and time, to coincide with the legal (statutory) power outage at NIG. The supercomputer will not be available during the scheduled maintenance.

Period

November 24, 2023, 17:00 - November 30, 2023, 17:00 (24h notation)

Work schedule

  • 11/24 (Fri.) 17:00~: Supercomputer outage
  • 11/25 (Sat.): Legal power outage
  • 11/26 (Sun.) ~ 11/29 (Wed.): Supercomputer scheduled maintenance work (UPS maintenance, Lustre maintenance, software updates, etc.)
  • 11/30 (Thu.): Spare day

Work Description

The scheduled maintenance work is as follows.

  1. Software version upgrade
  2. OS migration (from CentOS 7.9 to Ubuntu Linux 22.04LTS)
    • An FAQ page for the OS migration has been created.
    • The medium node and fat node hardware (that is, HPE ProLiant DL560 Gen10 and HPE Superdome Flex) does not support Ubuntu Linux, so these nodes cannot be migrated from CentOS 7.9 to Ubuntu Linux 22.04 this time.
  3. Grid Engine version upgrade
  4. yum update for the CentOS nodes that are not migrated to the new OS
  5. Firmware and device driver version upgrade for InfiniBand and Lustre
  6. LDAP configuration changes
  7. UPS inspection work

Software Version Upgrade Details

Table: Software upgrade plan for development/analysis

| # | Software | Before scheduled maintenance | After scheduled maintenance |
|---|---|---|---|
| (1) | Apptainer | 1.1 | 1.2.2-1 |
| (2) | SingularityCE | 3.10.2 | 3.11.4 |
| (3) | NVIDIA HPC SDK (previously PGI compiler) | 22.9 | 23.7 |
| (4)* | NVIDIA CUDA | 12.2 | 12.1 |
| (5) | Intel OneAPI | 2022.2.0 | 2023.2.0 |
| (6) | Altair Grid Engine | 8.6.19/8.6.4 | 8.8.1 |

*: CUDA is downgraded to 12.1 because 12.1 is the version supported on the Ubuntu Linux 22.04LTS GA kernel.
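The footnote above can be cross-checked mechanically. The following is a minimal Python sketch, not part of the official notice, that compares the before/after versions read from the table and reports any package whose version goes down. The `PLAN` dictionary and both helper names are illustrative assumptions; Altair Grid Engine is left out because its "before" column lists two versions.

```python
import re

# Before/after versions as read from the upgrade plan table.
# Altair Grid Engine is omitted: its "before" entry lists two versions.
PLAN = {
    "Apptainer": ("1.1", "1.2.2-1"),
    "SingularityCE": ("3.10.2", "3.11.4"),
    "NVIDIA HPC SDK": ("22.9", "23.7"),
    "NVIDIA CUDA": ("12.2", "12.1"),
    "Intel OneAPI": ("2022.2.0", "2023.2.0"),
}


def version_key(version: str) -> tuple:
    """Turn a string such as '1.2.2-1' into (1, 2, 2, 1) for numeric comparison."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def downgraded(plan: dict) -> list:
    """Return the names whose 'after' version is lower than the 'before' version."""
    return [
        name
        for name, (before, after) in plan.items()
        if version_key(after) < version_key(before)
    ]


print(downgraded(PLAN))
```

Comparing tuples of integers rather than raw strings avoids mistakes such as treating "3.10.2" as older than "3.9.0".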

OS migration (from CentOS 7.9 to Ubuntu Linux 22.04LTS)

As CentOS 7 will reach End-Of-Life on 30 June 2024, the migration from CentOS 7.9 to Ubuntu Linux 22.04LTS will be performed during scheduled maintenance.

  • All compute nodes in the general analysis division will be migrated from CentOS 7.9 to Ubuntu Linux 22.04LTS. As a result, your analysis environment may need to be re-installed. Please check your development environment and reinstall your analysis environment yourself if necessary.
  • Users of occupied compute nodes will be asked by email whether they would like their OS to be migrated during the scheduled maintenance. Please let us know when the OS migration would be convenient for you.

The Personal Genome Analysis Section

  • The GPU compute nodes under Slurm will be migrated from CentOS 7.9 to Ubuntu Linux 22.04LTS. As a result, your analysis environment may need to be re-installed. Please check your development environment and reinstall your analysis environment yourself if necessary.
  • Users of occupied compute nodes will be asked by email whether they would like their OS to be migrated during the scheduled maintenance. Please let us know when the OS migration would be convenient for you.
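Because some nodes stay on CentOS 7.9 while others move to Ubuntu Linux 22.04LTS, it can help to confirm which OS a given node reports before reinstalling an analysis environment. The sketch below is an illustration only: it parses text in the standard `os-release` format, and the two sample strings are abbreviated examples written for this sketch, not output captured from the NIG supercomputer. The function names are hypothetical.

```python
# Abbreviated, illustrative os-release contents (assumed, not taken from NIG nodes).
CENTOS_SAMPLE = 'NAME="CentOS Linux"\nVERSION="7 (Core)"\nID="centos"\nVERSION_ID="7"\n'
UBUNTU_SAMPLE = 'NAME="Ubuntu"\nVERSION_ID="22.04"\nID=ubuntu\n'


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines in os-release format into a dictionary."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def is_migrated(info: dict) -> bool:
    """Return True if the parsed data describes Ubuntu 22.04."""
    return info.get("ID") == "ubuntu" and info.get("VERSION_ID") == "22.04"


for label, sample in (("centos sample", CENTOS_SAMPLE), ("ubuntu sample", UBUNTU_SAMPLE)):
    print(label, is_migrated(parse_os_release(sample)))
```

On an actual node, the text passed to `parse_os_release` would typically be the contents of `/etc/os-release`.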

Notes

  • Running jobs will be deleted, so please resubmit jobs after the scheduled maintenance.

· One min read

Publication date: September 26, 2023

Due to maintenance work on SINET6 equipment, the network will be temporarily out of service during the following times.

  • Date and time: 0:00 - 5:00, Thursday, October 26, 2023

    • A communication interruption of about five minutes may occur during the above time period.
    • In principle, no interruption is planned. However, if it is deemed necessary to restart the equipment during the work, a restart will be carried out and communication will be interrupted for about five minutes.
  • Scope of impact

    • During the disconnection, you will not be able to log in to the supercomputer or transfer data.
    • There will be no suspension of active jobs.

Thank you for your understanding and cooperation.

· One min read

Publication date: October 2, 2023

Summary

On Saturday, September 30, 2023, at around 16:00 and 23:56, power outages of less than 5 minutes occurred in a wide area east of Shizuoka prefecture, affecting networks and other facilities.

https://teideninfo.tepco.co.jp/day/teiden/index-j.html

Restoration work is underway.

Recovered at 16:00 on Monday, October 2, 2023.

Scope of impact

  • External network, etc.
    • SINET connection was interrupted from 23:56 on 30 Sep 2023 to 00:02 on 01 Oct 2023. (Recovered)
  • General analysis division
    • Not affected
  • Personal genome analysis division
    • Unable to send SSL-VPN tokens.
  • DDBJ Service
    • Under investigation.

· One min read

Publication date: September 27, 2023

Summary

On Wednesday, September 27, 2023, at 13:48, one of the 66 OSTs comprising Lustre9 experienced an I/O outage.

Restoration work is currently underway.

On Thursday, September 28, 2023, at 12:41, the restoration work was completed.

Scope of impact

  • General analysis division
    • When accessing Lustre9 (under /usr/local/shared_data, /usr/local/resources), some files cannot be read.
    • The account application system is also affected.
    • Files stored in the part of the storage where the fault occurred cannot be accessed. (The access remains in a waiting state.)
    • Listing a directory that contains such files gets no response. (The command remains in a waiting state.)
  • Personal genome analysis division
    • When accessing Lustre9 (under /usr/local/shared_data, /usr/local/resources), some files cannot be read.
    • The account application system is also affected.
    • Files stored in the part of the storage where the fault occurred cannot be accessed. (The access remains in a waiting state.)
    • Listing a directory that contains such files gets no response. (The command remains in a waiting state.)
  • DDBJ services
    • Some services are affected.

· One min read

Publication date: August 8, 2023

A hardware failure has occurred in one of the redundant OST controllers of the Lustre7 high-speed storage system in the general analysis division. The controller will be replaced and the system will be switched back to it on the following schedule.

Schedule

  • 13:00 - 15:00 Replacement work
  • 15:00 - 16:00 Switch-back work

* An I/O suspension will occur.

System impact at this time

No system impact at this time.

System impact at the time of the work

  • The NIG supercomputer general analysis division
    • I/O suspension of about 10 minutes will occur once on Lustre7 during the replacement work. I/O will be automatically resumed after the work is completed.
  • The NIG supercomputer personal genome analysis division
    • Not affected.
  • DDBJ Service
    • Not affected.

Please note that the work may take longer depending on the I/O load at the time. Thank you for your understanding and cooperation.

· One min read

Due to switching work associated with the update of the storage system for the DDBJ database, the FTP service and connections with Aspera will be temporarily unavailable during the following time period.

Date

Monday, 27 July 2023, 9:00 - 15:00 (24h notation)

  • Communication interruptions of about 15 minutes will occur.

Scope of impact

  • General analysis division and personal genome analysis division of the NIG supercomputer
    • Login and data transfer operations using scp and HCPtools will not be affected.
    • The running jobs will not be stopped.
    • Access to the DDBJ database from within the supercomputer (access to under /usr/local/resources/) will not be affected.
  • DDBJ services
    • Downloading of DDBJ databases using FTP, Aspera and HTTPS will not be available.

We appreciate your understanding and cooperation.

· 2 min read

Until now, some of the CPU cores on the GPU compute nodes have been allocated to short.q because CPU utilisation on those nodes was low. In recent years, however, a variety of software that uses the GPU has been created and usage patterns have changed. The AGE queue configuration will therefore be changed as follows to increase the number of CPU cores available on the GPU nodes.

  • Before change

| Queue | Configuration nodes | Total number of nodes | Total number of CPU cores | Memory |
|---|---|---|---|---|
| gpu.q | Thin nodes Type2b | 7 | 56 (8/node) | 1,344GB (192GB/node) |
| short.q | Thin nodes Type2b | 7 | 112 (16/node) | 1,344GB (192GB/node) |

  • After change

| Queue | Configuration nodes | Total number of nodes | Total number of CPU cores | Memory |
|---|---|---|---|---|
| gpu.q | Thin nodes Type2b | 7 | 168 (24/node) | 2,688GB (384GB/node) |
| short.q | Thin nodes Type1a | 2 | 128 (64/node) | 1,024GB (512GB/node) |

For short.q, the CPU will change from AMD EPYC 7501 to Intel Xeon Gold 6130 because of the node type change. Please review your jobs if necessary.
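The totals in the before/after tables follow from the per-node figures. As a rough cross-check, the following Python sketch recomputes them; the `QueueConfig` class and the variable names are illustrative assumptions, while the numbers are the ones given in the tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueConfig:
    """Per-queue figures as listed in the tables (hypothetical helper type)."""
    nodes: int
    cores_per_node: int
    mem_gb_per_node: int

    @property
    def total_cores(self) -> int:
        return self.nodes * self.cores_per_node

    @property
    def total_mem_gb(self) -> int:
        return self.nodes * self.mem_gb_per_node


BEFORE = {"gpu.q": QueueConfig(7, 8, 192), "short.q": QueueConfig(7, 16, 192)}
AFTER = {"gpu.q": QueueConfig(7, 24, 384), "short.q": QueueConfig(2, 64, 512)}

for name in BEFORE:
    b, a = BEFORE[name], AFTER[name]
    print(
        f"{name}: cores {b.total_cores} -> {a.total_cores}, "
        f"memory {b.total_mem_gb}GB -> {a.total_mem_gb}GB"
    )
```

Under these figures, gpu.q gains CPU cores and memory on the same seven nodes, while short.q moves to fewer nodes with more cores per node, so jobs sized for 16 cores per node may be worth revisiting.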

Date

Wednesday, July 26, 2023 11:00 - 11:30 (24h notation)

Scope of Impact

  • During the work,

  • Before and after the work, there is no change in the method of submitting jobs for each queue.