69 posts tagged with "maintenance"

(Ended) The NIG Supercomputer Service Suspension (6th–10th November)

At approximately 13:40 on Friday, 5 September 2025, the National Institute of Genetics (NIG) was struck by a lightning bolt, causing significant damage to the facility’s electrical infrastructure and resulting in a power outage.

As a consequence, the NIG Supercomputer was temporarily unavailable. The login service was restored at 12:00 on Tuesday, 9 September 2025. (Dedicated virtual machines that had not been confirmed operational by 12:00 on 9 September were all successfully restarted by 15:00 on Wednesday, 10 September.)

Repairs to the damaged equipment were completed on Thursday, 6 November 2025. Because the repair work fell close to the weekend, the NIG Supercomputer was scheduled to be offline from 08:30 on Thursday, 6 November 2025, until 09:00 on Monday, 10 November 2025. The work proceeded smoothly, however, and the NIG Supercomputer was restored ahead of schedule, by 17:30 on Friday, 7 November 2025.

(Postponed) The NIG Supercomputer Service Suspension (24th–27th October 2025) Due to NIG Power Restoration Following 5th September Lightning Strike

We would like to inform you that, due to ongoing restoration work related to a lightning strike on Friday, 5th September 2025, which affected the power supply equipment at the National Institute of Genetics, there will be a scheduled power outage on Saturday, 25th October 2025.

As a result of this outage, the supercomputer will be temporarily unavailable during the weekend of 25th October.

Further details regarding the restoration timeline and any potential impacts will be communicated once confirmed.

(Restored) Power Outage at NIG on September 8, 2025

At around 13:40 on Friday, September 5, 2025, lightning associated with a typhoon struck NIG directly, causing a power outage. As a result, the supercomputer shut down automatically, and the entire NIG supercomputer system was halted. Because this outage was classified as an electrical accident caused by lightning, inspection procedures were required before power could be restored, which prolonged the outage.

Recovery work for the supercomputer began on Monday, September 8, and login services were resumed at 12:00 on Tuesday, September 9, except for some systems. (For dedicated VMs that had not been confirmed to be running as of 12:00 on September 9, all of them were successfully restarted by 15:00 on the same day.)

(Restored) [Outage] May 22, 2025: Slurm Outage in General Analysis Division on Thursday, May 22, 2025

At 02:54 on Thursday, May 22, 2025 (24-hour format; all times below are in 24-hour format), the Slurm management server for the general analysis division encountered a service outage.

The cause of the issue was insufficient memory on the compute node hosting the Slurm management server.

Recovery procedures were completed at 10:34 on the same day, and job submission has since resumed.

Scope of Impact

Power Outage on April 7, 2025

A brief power outage occurred in Mishima City around 13:55, lasting approximately one minute.
The supercomputer itself did not lose power, thanks to the UPS, but network connectivity was lost for about 5 minutes.
Other impacts are currently under investigation.