67 posts tagged with "maintenance"

Supercomputer Service Suspension (24th–27th October 2025) Due to NIG Power Restoration Following 5th September Lightning Strike

Restoration work is ongoing following the lightning strike on Friday, 5th September 2025, which affected the power supply equipment at the National Institute of Genetics (NIG). As part of this work, a scheduled power outage will take place on Saturday, 25th October 2025.

As a result of this outage, the supercomputer will be temporarily unavailable during the weekend of 25th October.

Further details regarding the restoration timeline and any potential impacts will be communicated once confirmed.

(Restored) Power Outage at NIG on September 8, 2025

At around 13:40 on Friday, September 5, 2025, a lightning strike associated with a typhoon directly hit NIG, causing a power outage. As a result, the supercomputer performed an automatic shutdown, and the entire NIG supercomputer system was halted. Because the outage was classified as an electrical accident caused by lightning, inspection procedures were required before power could be restored, which prolonged the outage.

Recovery work for the supercomputer began on Monday, September 8, and login services resumed at 12:00 on Tuesday, September 9, except for some systems. (Dedicated VMs that had not been confirmed to be running as of 12:00 on September 9 were all successfully restarted by 15:00 the same day.)

(Restored) [Outage] Slurm Outage in General Analysis Division on Thursday, May 22, 2025

At 02:54 on Thursday, May 22, 2025 (all times are in 24-hour format), the Slurm management server for the general analysis division experienced a service outage.

The cause of the issue was insufficient memory on the compute node hosting the Slurm management server.

Recovery procedures were completed at 10:34 on the same day, and job submission has since resumed.

Scope of Impact

Power Outage on April 7, 2025

A brief power outage occurred in Mishima City at around 13:55, lasting approximately one minute.
The UPS kept the supercomputer itself powered throughout, but network connectivity was lost for about 5 minutes.
Other impacts are currently under investigation.