46 posts tagged with "Maintenance"

· One min read

Publication date: March 8, 2024

Due to maintenance work on SINET6 equipment, the network will be temporarily out of service during the following time period.

  • Date and time: 00:00 - 01:00, Saturday, March 9, 2024

    • A communication interruption of up to 5 minutes will occur during the above time period.
  • Scope of impact

    • During the disconnection, you will not be able to log in to the supercomputer or transfer data.
    • There will be no suspension of active jobs.

Thank you for your understanding and cooperation.
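
For scripted data transfers that may overlap with the interruption, one option is to retry with a growing pause instead of failing on the first error. The following is only a minimal sketch under stated assumptions: `retry_through_outage` and its parameters are hypothetical names, the 10-minute wait budget is simply twice the announced interruption length, and `operation` stands for whatever transfer or login step you wrap yourself.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_through_outage(
    operation: Callable[[], T],
    max_total_wait: float = 600.0,  # assumed budget in seconds (about 2x the interruption)
    first_delay: float = 15.0,      # assumed initial pause between attempts
    max_delay: float = 120.0,       # assumed upper bound for a single pause
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or the total wait budget is used up."""
    waited = 0.0
    delay = first_delay
    while True:
        try:
            return operation()
        except OSError:
            # ConnectionError and TimeoutError are both subclasses of OSError.
            if waited >= max_total_wait:
                raise  # budget exhausted: report the last network error
            pause = min(delay, max_total_wait - waited)
            sleep(pause)
            waited += pause
            delay = min(delay * 2, max_delay)  # exponential backoff with a cap
```

The `sleep` parameter is injectable only so that the behaviour can be checked without actually waiting. Errors other than `OSError` are not retried, on the assumption that they are not caused by the network interruption.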

· One min read

Publication date: February 27, 2024

On Tuesday, 27 February 2024, an issue occurred in which the email containing the token code for the SSL-VPN connection was not sent.

Date of occurrence: 10:00 - 15:45, Tuesday, 27 February 2024

The issue has now been resolved, and the emails are being sent normally.

We apologise for any inconvenience caused to users of the Personal Genome Analysis division.

· One min read

Publication date: February 26, 2024

There is currently a problem with the Account Registration System. After you enter your updated information on the Account Information Change Request page and click 'I am not a robot' on the confirmation screen, an error message is displayed and the application cannot be completed.

We apologise for any inconvenience this may cause.

A notice will be posted on the website as soon as the problem has been resolved.

Resolved at 17:17 on Monday, February 26, 2024. Please submit your application again.

· One min read

Publication date: February 14, 2024

Due to network device maintenance, the network will be temporarily out of service during the following time period.

  • Date: Monday, February 19, 2024 12:00 - 13:00 (24h notation)

    • A communication interruption of about 5 minutes will occur.
  • Scope of Impact

    • During the communication breakdown, login to the supercomputer and data transfer operations will not be available.
    • No jobs in operation will be suspended.

We appreciate your understanding and cooperation.

· One min read

Publication Date: December 27, 2023

The account registration system will be suspended due to the year-end and New Year's holidays.

Date and Time

  • December 27, 2023 17:00 - January 9, 2024 12:00

Scope of impact

  • New account registration and changes to account information will not be available during the above year-end and New Year's period.
  • There will be no impact on network communication, login, job execution, etc. in the general analysis division and personal genome analysis division.
  • There will be no impact on DDBJ services.

We appreciate your understanding and cooperation.

· 3 min read

Publication date: October 2, 2023

Scheduled maintenance of the NIG supercomputer will be carried out on the following dates, in conjunction with the legally required power outage at NIG. The supercomputer will not be available during the scheduled maintenance.

Period

November 24, 2023, 17:00 - November 30, 2023, 17:00 (24h notation)

Work schedule

  • 11/24 (Fri.) 17:00~: Supercomputer outage
  • 11/25 (Sat.): Legal power outage
  • 11/26 (Sun.)~11/29 (Wed.): Supercomputer scheduled maintenance work (UPS maintenance, Lustre maintenance, software updates, etc.)
  • 11/30 (Thu.): Spare day

Work Description

The scheduled maintenance consists of the following work.

  1. Software version upgrade
  2. OS migration (from CentOS 7.9 to Ubuntu Linux 22.04LTS)
    • An FAQ page for the OS migration has been created.
    • The medium nodes and the fat node hardware (that is, HPE ProLiant DL560 Gen10 and HPE Superdome Flex) do not support Ubuntu Linux, so they cannot be migrated from CentOS 7.9 to Ubuntu Linux 22.04 this time.
  3. Grid Engine version upgrade
  4. yum update on the CentOS nodes whose OS is not migrated
  5. Firmware and device driver version upgrade for InfiniBand and Lustre
  6. LDAP configuration changes
  7. UPS inspection work

Software Version Upgrade Details

Table: Software upgrade plan for development/analysis

| # | Software | Before scheduled maintenance | After scheduled maintenance |
|---|---|---|---|
| (1) | Apptainer | 1.1 | 1.2.2-1 |
| (2) | SingularityCE | 3.10.2 | 3.11.4 |
| (3) | NVIDIA HPC SDK (Previous PGI compiler) | 22.9 | 23.7 |
| (4)* | NVIDIA CUDA | 12.2 | 12.1 |
| (5) | Intel OneAPI | 2022.2.0 | 2023.2.0 |
| (6) | Altair Grid Engine | 8.6.19/8.6.4 | 8.8.1 |

*: CUDA is downgraded to 12.1 because 12.1 is the version supported on the Ubuntu Linux 22.04LTS GA kernel.

OS migration (from CentOS 7.9 to Ubuntu Linux 22.04LTS)

As CentOS 7 will reach End-Of-Life on 30 June 2024, the migration from CentOS 7.9 to Ubuntu Linux 22.04LTS will be performed during scheduled maintenance.

  • All compute nodes in the general analysis division will be migrated from CentOS 7.9 to Ubuntu Linux 22.04LTS. As a result, your analysis environment may need to be reinstalled. Please check your development environment and reinstall your analysis environment yourself.
  • Users of occupied compute nodes will be asked by email whether they would like to migrate their OS during the scheduled maintenance. Please let us know when the OS migration is convenient for you.

The Personal Genome Analysis Section

  • The GPU compute nodes under Slurm will be migrated from CentOS 7.9 to Ubuntu Linux 22.04LTS. As a result, your analysis environment may need to be reinstalled. Please check your development environment and reinstall your analysis environment yourself.
  • Users of occupied compute nodes will be asked by email whether they would like to migrate their OS during the scheduled maintenance. Please let us know when the OS migration is convenient for you.
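
Because some nodes stay on CentOS 7.9 while others move to Ubuntu Linux 22.04LTS, a job script may need to know which OS it is running on before loading an environment. The sketch below is one possible way to do this by reading the standard `/etc/os-release` file. The function names `parse_os_release` and `node_os` are hypothetical and are not part of any NIG tooling.

```python
def parse_os_release(text: str) -> dict[str, str]:
    """Parse the KEY=value lines of an os-release file into a dictionary."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, _, value = line.partition("=")
        # Values may be quoted, e.g. ID="centos", or bare, e.g. ID=ubuntu.
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def node_os(text: str) -> tuple[str, str]:
    """Return (ID, VERSION_ID), using "unknown" for any missing field."""
    fields = parse_os_release(text)
    return fields.get("ID", "unknown"), fields.get("VERSION_ID", "unknown")
```

On a node, this could be called as `node_os(open("/etc/os-release").read())`. A CentOS 7 node is expected to report the ID `centos` and an Ubuntu 22.04 node the ID `ubuntu`, which a script can use to choose between an environment built before the migration and one rebuilt after it.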

Notes

  • Running jobs will be deleted, so please resubmit jobs after the scheduled maintenance.
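
Since jobs still running at the start of the outage are deleted, it can help to check in advance whether a job is expected to finish in time. The following is a small sketch, assuming the times in this notice are Japan Standard Time (UTC+9) and that you can estimate the job's runtime yourself. The names `finishes_before_outage` and `latest_safe_start` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the notice gives local times in Japan Standard Time (UTC+9).
JST = timezone(timedelta(hours=9))

# Maintenance period taken from the notice above.
OUTAGE_START = datetime(2023, 11, 24, 17, 0, tzinfo=JST)
OUTAGE_END = datetime(2023, 11, 30, 17, 0, tzinfo=JST)


def finishes_before_outage(start: datetime, runtime: timedelta) -> bool:
    """True if a job started at `start` ends no later than the outage start."""
    return start + runtime <= OUTAGE_START


def latest_safe_start(runtime: timedelta) -> datetime:
    """Latest start time at which a job of this runtime still ends in time."""
    return OUTAGE_START - runtime
```

The `start` argument must be timezone-aware, otherwise the comparison raises a `TypeError`. The runtime is only your own estimate, so leaving a margin before 17:00 on November 24 is the safer choice.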

· One min read

Publication date: September 26, 2023

Due to maintenance work on SINET6 equipment, the network will be temporarily out of service during the following times.

  • Date and time: 0:00 - 5:00, Thursday, October 26, 2023

    • A communication interruption of about 5 minutes may occur during the above time period.
    • No interruption is planned as part of the work itself. However, if the equipment needs to be restarted during the work, the restart will be carried out and communication will be interrupted for about five minutes.
  • Scope of impact

    • During the disconnection, you will not be able to log in to the supercomputer or transfer data.
    • There will be no suspension of active jobs.

Thank you for your understanding and cooperation.

· One min read

Publication date: October 2, 2023

Summary

On Saturday, September 30, 2023, at around 16:00 and 23:56, power outages of less than 5 minutes occurred in a wide area east of Shizuoka prefecture, affecting networks and other facilities.

https://teideninfo.tepco.co.jp/day/teiden/index-j.html

Restoration work is underway.

Recovered at 16:00 on Monday, October 2, 2023.

Scope of impact

  • External network, etc.
    • SINET connection was interrupted from 23:56 on 30 Sep 2023 to 00:02 on 01 Oct 2023. (Recovered)
  • General analysis division
    • Not affected
  • Personal genome analysis division
    • Unable to send SSL-VPN tokens.
  • DDBJ Service
    • Under investigation.

· One min read

Publication date: September 27, 2023

Summary

On Wednesday, September 27, 2023, at 13:48, one of the 66 OSTs comprising Lustre9 experienced an I/O outage.

Restoration work is currently underway.

On Thursday, September 28, 2023, at 12:41, the restoration work was completed.

Scope of impact

  • General analysis division
    • When accessing Lustre9 (under /usr/local/shared_data, /usr/local/resources), some files cannot be read.
    • The account application system is also affected.
    • Files in the part of the storage area where the fault occurred cannot be accessed. (Access attempts remain in a waiting state.)
    • Listing a directory that contains such files gets no response. (The listing remains in a waiting state.)
  • Personal genome analysis division
    • When accessing Lustre9 (under /usr/local/shared_data, /usr/local/resources), some files cannot be read.
    • The account application system is also affected.
    • Files in the part of the storage area where the fault occurred cannot be accessed. (Access attempts remain in a waiting state.)
    • Listing a directory that contains such files gets no response. (The listing remains in a waiting state.)
  • DDBJ services
    • Some services are affected.

· One min read

Publication date: August 8, 2023

A hardware failure has occurred in one of the redundant OST controllers of the Lustre7 high-speed storage system in the general analysis division. The controller will be replaced and the system will be switched back to normal operation on the following schedule.

Schedule

  • 13:00 - 15:00 Replacement work
  • 15:00 - 16:00 Switch-back work (*an I/O suspension will occur)

System Impact at this time

No system impact at this time.

System impact at the time of the work

  • The NIG supercomputer general analysis division
    • I/O suspension of about 10 minutes will occur once on Lustre7 during the replacement work. I/O will be automatically resumed after the work is completed.
  • The NIG supercomputer personal genome analysis division
    • Not affected.
  • DDBJ Service
    • Not affected.

Please note that the work may take longer than scheduled depending on the I/O status at the time of the work. Thank you for your understanding and cooperation.