Research Computing and Data

Upcoming Spring 2025 Maintenance Work

The RCD team has scheduled a maintenance window to perform work on the Palmetto Cluster, Indigo Data Lake, and other systems at the end of the Spring semester.

This work will begin on Saturday, May 31st, 2025, at 9:00 AM. While maintenance work is in progress, all RCD services will be unavailable.

During the maintenance window, we plan to complete the following:

  • Minor OS Upgrades
  • Networking Maintenance
  • System Testing and Benchmarking

There are no plans to purge scratch space during this maintenance, but users should be mindful that scratch space is never backed up and critical files should always be stored on home or project storage.
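As an illustration of moving important results off scratch, the following is a minimal sketch rather than an RCD-provided tool. The function name `copy_to_safe_storage` is hypothetical, and it assumes Python 3.8 or newer is available.

```python
import shutil
from pathlib import Path


def copy_to_safe_storage(source_dir, dest_dir):
    """Copy a directory tree (for example, from scratch) to another location
    (for example, home or project storage).

    Returns the sorted list of copied files, relative to source_dir.
    """
    source = Path(source_dir)
    dest = Path(dest_dir)
    # dirs_exist_ok=True allows the copy to be repeated into an existing
    # destination directory; files with the same name are overwritten.
    shutil.copytree(source, dest, dirs_exist_ok=True)
    return sorted(p.relative_to(source) for p in source.rglob("*") if p.is_file())
```

A call might look like `copy_to_safe_storage("/scratch/username/results", "/home/username/results")`, where both paths are placeholders and should be replaced with your own scratch and home or project directories.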

Users should expect that services will be restored no earlier than Friday, June 6th, 2025, at 5:00 PM and should monitor their email for updates from RCD.

Please feel free to reach out to RCD with any questions or concerns that you have about the maintenance work by submitting a support ticket – we would love to hear from you!

Partial Outage for Palmetto on February 24th

There will be a partial outage on February 24, 2025, at 9 AM. We expect the maintenance to take approximately one hour to complete. 

We have identified an unexpected issue with one of the network switches in Palmetto, requiring us to reboot the switch. This will only affect a subset of compute nodes on Palmetto. 

We are preventing new jobs from landing on the affected nodes to minimize disruptions. Jobs currently running on these nodes will be allowed to continue until the maintenance period begins. Users will still be able to log in, submit jobs, and use other RCD services, such as Open OnDemand. However, please keep in mind that you may experience extended wait times while the affected compute nodes are unavailable.

We have chosen to perform this emergency maintenance as soon as possible to avoid a larger, unplanned outage. We apologize for any inconvenience this may cause. 

If you have any questions, please reach out to us by submitting a support ticket.

Data Transfer Node Replacement Maintenance

We are replacing our old Data Transfer Nodes with new ones on Tuesday, February 11th, 2025, at 9:00 AM. We expect the replacement process to take about two hours.

The Data Transfer Node names and details will not change, so users will not need to update anything on their end.

Active SCP/SFTP transfers will be interrupted during this time. Globus transfers should be able to restart after the new nodes come online.

Please reach out to us if you have any questions or concerns by submitting a support ticket.

Winter 2024 Maintenance Reminders

Before the start of winter break, we want to remind you about our upcoming maintenance and let you know about a scheduled power outage that will affect Palmetto.

Summary of Outage Dates:

  • RCD Maintenance: Friday, December 20th
  • Duke Energy Maintenance Part 1: Monday, December 23rd
  • Duke Energy Maintenance Part 2: Friday, December 27th

We had previously announced our Winter 2024 maintenance plans, which will occur this Friday (December 20th) between 9:00 AM and 11:59 PM. See the Winter 2024 maintenance blog post for more details.

Additionally, Duke Energy has recently notified us that they will need to perform maintenance on power infrastructure at the data center during the holidays next week. This is necessary to resolve unmitigated impacts due to Hurricane Helene and ensure our energy supply remains stable. This maintenance will cause a partial power outage affecting some parts of Palmetto.

They will complete the power maintenance in two parts, with the first scheduled for Monday, December 23rd and the second planned for Friday, December 27th. We expect the cluster to be down starting at 7:00 AM and for power to be restored no later than 6:00 PM on both dates.

The notes on job cancellation, job queueing, and scratch space from our scheduled RCD maintenance also apply to these power outages (see the Winter 2024 maintenance blog post for details).

As always, if you have any questions about this maintenance work, please submit a support ticket.

Upcoming Winter 2024 Maintenance

The RCD team wants to remind everyone about our upcoming Winter 2024 maintenance window on December 20th between 9:00 AM and 11:59 PM.

While maintenance work is in progress, all RCD services will be unavailable, including Palmetto 2, Open OnDemand, RCD GitLab, RCD Mattermost, and the Indigo Data Lake. Any batch jobs submitted before maintenance that cannot be completed in time will be held in the queue, but all interactive jobs will be canceled. Data transfers will be interrupted, so please complete them ahead of time to avoid possible corruption.
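To judge whether a batch job is likely to finish before the window opens, the arithmetic can be sketched as below. This is only an illustration: the function name `finishes_before_maintenance` is hypothetical, the comparison against the requested walltime is an assumption about how to estimate this (the scheduler's actual hold logic is not described here), and time zones are ignored.

```python
from datetime import datetime, timedelta

# Start of the Winter 2024 maintenance window, taken from the announcement.
# Treated as local cluster time; time zone handling is omitted in this sketch.
MAINTENANCE_START = datetime(2024, 12, 20, 9, 0)


def finishes_before_maintenance(start_time, walltime,
                                maintenance_start=MAINTENANCE_START):
    """Return True if a job starting at start_time with the requested
    walltime would end at or before the maintenance window opens."""
    return start_time + walltime <= maintenance_start


noon_dec_19 = datetime(2024, 12, 19, 12, 0)

# A 24-hour job started at noon on December 19th would run past 9:00 AM
# on the 20th.
print(finishes_before_maintenance(noon_dec_19, timedelta(hours=24)))  # False

# A 12-hour job started at the same time would end at midnight, before
# the window opens.
print(finishes_before_maintenance(noon_dec_19, timedelta(hours=12)))  # True
```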

During this maintenance window, our engineers will work on the following:

  • Changes to the research network configuration to improve performance. Making these changes will disrupt connections to Indigo storage, so they must be completed during a maintenance window.
  • Installation of software updates for RCD GitLab.
  • There are no plans to purge scratch space. However, users are encouraged to ensure they do not have any valuable data stored in scratch space, as always.

If you have any questions about this maintenance work, please submit a support ticket.

Summer 2024 Maintenance is Complete!

We are excited to announce that the Summer 2024 maintenance work was completed successfully. All RCD services have been restored and are ready for users to access.

During the maintenance period, we made the following improvements:

  • Critical updates to network and storage infrastructure were completed.
    • These updates improve performance and stability for all users of the cluster.
  • GitLab was updated to a new major version, v17.2.2.
  • ColdFront was updated to v1.9.0.
    New features include:
    • Ability to request a GitLab group.
    • A GitLab group provides enhanced features for organizing projects and repositories under a common namespace, simplifying permission management and collaboration on larger, team-based initiatives.
    • PIs can create GitLab groups by requesting a GitLab Group Allocation on ColdFront.
    • To learn more, please see the GitLab Group Allocation documentation page.
  • Slurm was upgraded to v24.05.2.
    • Spack MPI modules built with the old version of Slurm will need to be rebuilt.
  • Some MPI-enabled modules were moved in the module hierarchy.
    • The following were affected:
      • fftw/3.3.10
      • hdf5/1.14.3
      • netcdf-c/4.9.2
      • netcdf-cxx4/4.3.1
      • netcdf-fortran/4.6.1
      • netlib-scalapack/2.2.0
      • osu-micro-benchmarks/7.3
      • parallel-netcdf/1.12.3
    • To load any moved modules, you must first load the openmpi/5.0.1 module.
  • Additionally, some AMD-optimized MPI modules were moved in the module hierarchy.
    • The following were affected:
      • amdfftw/4.1
      • amdscalapack/4.1
      • lammps/20231121
    • To load these packages, you must first load both the openmpi/5.0.1 and aocc/4.1.0 modules.
  • These changes were made to better reflect the dependencies of certain packages and distinguish between MPI and non-MPI packages of the same version:
    • HDF5 was recompiled with support for C++, Fortran, and the High-Level (hl) APIs.
      • Two versions of HDF5 are available, one with MPI and one without.
    • hdf5/1.14.2 was upgraded to hdf5/1.14.3 (non-MPI).
  • CUDA-Aware Open MPI was rebuilt with better support for multi-node GPU detection when using Kokkos.
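The load-order requirements for the moved modules listed above can be summarized in a small lookup. The following is a hedged sketch, not an RCD tool: the names `MOVED_MODULES` and `load_commands` are hypothetical, and the order in which openmpi and aocc are loaded simply follows the wording of this announcement and may need to be swapped on the actual system.

```python
# Prerequisite modules, as stated in the announcement.
MPI_PREREQS = ["openmpi/5.0.1"]
# Assumption: listed in the order the announcement names them.
AMD_MPI_PREREQS = ["openmpi/5.0.1", "aocc/4.1.0"]

# Moved modules mapped to the modules that must be loaded first.
MOVED_MODULES = {
    "fftw/3.3.10": MPI_PREREQS,
    "hdf5/1.14.3": MPI_PREREQS,  # applies to the MPI-enabled build of HDF5
    "netcdf-c/4.9.2": MPI_PREREQS,
    "netcdf-cxx4/4.3.1": MPI_PREREQS,
    "netcdf-fortran/4.6.1": MPI_PREREQS,
    "netlib-scalapack/2.2.0": MPI_PREREQS,
    "osu-micro-benchmarks/7.3": MPI_PREREQS,
    "parallel-netcdf/1.12.3": MPI_PREREQS,
    "amdfftw/4.1": AMD_MPI_PREREQS,
    "amdscalapack/4.1": AMD_MPI_PREREQS,
    "lammps/20231121": AMD_MPI_PREREQS,
}


def load_commands(module):
    """Return the `module load` commands needed for a module, prerequisites first.

    Modules not in MOVED_MODULES are assumed to have no prerequisites.
    """
    prereqs = MOVED_MODULES.get(module, [])
    return [f"module load {name}" for name in [*prereqs, module]]


for command in load_commands("lammps/20231121"):
    print(command)
# module load openmpi/5.0.1
# module load aocc/4.1.0
# module load lammps/20231121
```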

We appreciate your patience during the maintenance period and hope that these changes will improve the user experience.

If you have any questions or have encountered post-maintenance issues, please let us know by submitting a support ticket.

Upcoming Summer 2024 Maintenance

The RCD team has scheduled a brief maintenance window at the end of the Summer 2024 semester.

The maintenance work will begin on Monday, August 12th, 2024 at 9:00 am. All RCD services, including the Palmetto Cluster and the Indigo Data Lake, will be unavailable until the maintenance work is complete.

During the maintenance period, we plan to complete the following:

  • Our system administrators will perform critical updates to the network and storage systems for Palmetto 2.
    • These updates will improve performance and stability for all users on the cluster.
    • These changes are not directly user-facing and should not break any existing user workflows.
  • Slurm will be updated to a new version.
    • Packages built with MPI will be updated to link with the updated Slurm MPI libraries.

This page will be updated with more details about the work completed after maintenance is over.

Users should expect that RCD services will be restored no earlier than Wednesday, August 14th, 2024 at 9:00 am and should monitor email for updates from RCD.

Please feel free to reach out to us with any questions or concerns that you have about this notice by submitting a support ticket – our team would love to hear from you!

Upcoming Spring 2024 Maintenance Work

The RCD team has scheduled a maintenance window to complete major changes to the Palmetto Cluster and other systems at the end of the Spring semester.

This work will begin on Monday, May 6th, 2024, at 9:00 am. While maintenance work is in progress, all RCD services, including the Palmetto Cluster and the Indigo Data Lake, will be unavailable.

During this maintenance window, the RCD team will complete the following updates, which may have user impact:

  1. The Palmetto 2 (Slurm) cluster will move into general availability.
  2. Additional nodes will move into Palmetto 2:
    • All nodes in owner queues
    • All nodes from HDR phases
  3. Our new allocation management system, ColdFront, will become available.
    • Current Palmetto 1 (PBS) accounts do not grant access to Palmetto 2.
    • Current Palmetto 1 users must use ColdFront to request new allocations to make use of Palmetto 2.
    • No new accounts will be added to Palmetto 1 (PBS).
  4. /scratch1 and /fastscratch will be decommissioned and will no longer be available.
    • All data on /scratch1 and /fastscratch will be erased.
  5. /scratch will be re-initialized.
    • All data on /scratch will be erased.
  6. ZFS systems are being decommissioned.
    • ZFS storage owners have been contacted about transitioning to the new storage system.
    • All data stored on ZFS file systems will be migrated to the Indigo Data Lake, so no data will be lost.
    • If you are a ZFS storage owner and have not received an email from us, please reach out to us.
  7. A new software module system will be introduced for Palmetto 2 (Slurm). This system will provide a more user-friendly and efficient way to manage software installations and versions.
  8. A refreshed Open OnDemand interface will be available for Slurm. The OnDemand interface will be updated to provide a more modern and user-friendly experience for Slurm users.
  9. A new job monitoring and visualization tool, jobstats, will be deployed across the cluster. This new tool allows users to monitor their jobs more easily and efficiently and will replace many existing monitoring methods.

Users should expect that services will be restored no earlier than Friday, May 13th, 2024, at 5:00 pm and should monitor their email for updates from RCD.

We understand that these changes are significant and want to help users transition smoothly. RCD will make updated documentation, training/tutorial sessions, and additional support resources available after maintenance.

Please feel free to reach out to RCD with any questions or concerns that you have about the maintenance work by submitting a support ticket – we would love to hear from you!

Upcoming Maintenance for January 2024

The RCD team has planned a scheduled maintenance window for the Palmetto Cluster between Thursday, January 4th and Sunday, January 7th, 2024. Users should expect the cluster to be unavailable during this time, and any jobs running when maintenance begins will be canceled.