Research Computing and Data

Winter 2024 Maintenance Reminders

Before the start of winter break, we wanted to remind you about our upcoming maintenance and let you know about a scheduled power outage that will affect Palmetto.

Summary of Outage Dates:

  • RCD Maintenance: Friday, December 20th
  • Duke Energy Maintenance Part 1: Monday, December 23rd
  • Duke Energy Maintenance Part 2: Friday, December 27th

As previously announced, our Winter 2024 maintenance will take place this Friday (December 20th) between 9:00 AM and 11:59 PM. See the Winter 2024 maintenance blog post for more details.

Additionally, Duke Energy recently notified us that it needs to perform maintenance on the data center's power infrastructure during the holidays next week. This work is necessary to address lingering effects of Hurricane Helene and to keep our energy supply stable. The maintenance will cause a partial power outage affecting some parts of Palmetto.

Duke Energy will complete the work in two parts: the first on Monday, December 23rd, and the second on Friday, December 27th. On both dates, we expect the cluster to go down at 7:00 AM, with power restored no later than 6:00 PM.

The notes on job cancellation, job queueing, and scratch space that apply to our scheduled maintenance also apply to these outages (see the Winter 2024 maintenance blog post for details).

As always, if you have any questions about this maintenance work, please submit a support ticket.

Upcoming Winter 2024 Maintenance

The RCD team wants to remind everyone about our upcoming Winter 2024 maintenance window on December 20th between 9:00 AM and 11:59 PM.

While maintenance work is in progress, all RCD services will be unavailable, including Palmetto 2, Open OnDemand, RCD GitLab, RCD Mattermost, and the Indigo Data Lake. Any batch jobs submitted before maintenance that cannot be completed in time will be held in the queue, but all interactive jobs will be canceled. Data transfers will be interrupted, so please complete them ahead of time to avoid possible corruption.
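To see whether a batch job will run before the window or wait in the queue, compare its projected end time with the start of maintenance. The minimal Python sketch below assumes that the scheduler judges a job by its full requested wall time (not its actual run time) and that all times are local; the function names are hypothetical and not part of any RCD tool.

```python
from datetime import datetime, timedelta

# Start of the Winter 2024 maintenance window (local time, as announced).
MAINT_START = datetime(2024, 12, 20, 9, 0)


def fits_before_maintenance(start: datetime, walltime: timedelta) -> bool:
    """True if a job starting at `start` and running for its full
    requested `walltime` would end before maintenance begins."""
    return start + walltime <= MAINT_START


def latest_safe_start(walltime: timedelta) -> datetime:
    """Latest start time at which a job with this wall time can still
    finish before maintenance; jobs starting later would be held."""
    return MAINT_START - walltime


if __name__ == "__main__":
    job_start = datetime(2024, 12, 19, 12, 0)
    request = timedelta(hours=24)
    print(fits_before_maintenance(job_start, request))  # False: would end at 12:00 on Dec 20
    print(latest_safe_start(request))                   # 2024-12-19 09:00:00
```

Under these assumptions, a job requesting 24 hours must start by 9:00 AM on December 19th to finish in time, so requesting only the wall time a job actually needs makes it more likely to run before the window.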

During this maintenance window, our engineers will work on the following:

  • Changes to the research network configuration to improve performance. Making these changes will disrupt connections to Indigo storage, so they must be completed during a maintenance window.
  • Installation of software updates for RCD GitLab.
  • There are no plans to purge scratch space. As always, however, users should make sure they do not keep any valuable data in scratch space.
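One way to review what you keep in scratch space is to list its largest files and decide which ones need a safer home. The Python sketch below does that for any directory you pass to it; the announcement does not give the scratch path, so the script takes it as a command-line argument rather than assuming one, and the script name in the usage message is hypothetical.

```python
import os
import sys
from pathlib import Path


def largest_files(root: Path, limit: int = 20) -> list:
    """Walk `root` and return up to `limit` (size_in_bytes, path) pairs,
    largest first. Symlinks and unreadable entries are skipped."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.is_symlink():
                    continue
                found.append((path.stat().st_size, path))
            except OSError:
                continue  # file vanished or is not readable
    found.sort(key=lambda item: item[0], reverse=True)
    return found[:limit]


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python scratch_report.py /path/to/your/scratch")
    else:
        for size, path in largest_files(Path(sys.argv[1])):
            print(f"{size / 1e9:10.2f} GB  {path}")
```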

If you have any questions about this maintenance work, please submit a support ticket.

RCD Town Hall on October 22nd, 2024

The Research Computing and Data (RCD) team held a Town Hall event on October 22nd, 2024 at 3 PM to share some important updates with the community. 

Below is a summary of what we discussed:

  • People
    • New People in RCD
    • ReDCAT
    • Internship Openings
    • Supercomputing 2024 (SC24)
  • Palmetto
    • Palmetto 1 End of Life
    • New Onboarding Training
    • Globus Maintenance
    • Upcoming Improvements to Palmetto 2
    • Upgrades to Indigo Backup
    • Winter 2024 Maintenance
    • Updates to Compute Node Purchasing
    • Condominium Model
    • Wall Time
  • Other Updates
    • New Specialized Workshop Series
    • Anaconda Transition
    • ColdFront Improvements
    • Account Status Check Tool
    • Cloud Beta
  • Open Discussion and Q&A

If you are interested in learning more about these topics, you can watch the recording or review the slide deck.
Note: you must sign in with your Clemson account to view these resources, which will be available until November 23rd, 2024.

Feel free to reach out to us by submitting a support ticket, scheduling an office hours meeting, or sending a message on our chat server if you have questions or want to discuss anything from the Town Hall.

Palmetto 1 Going Offline on October 31, 2024

Palmetto 1 will be taken offline on October 31, 2024. As part of this transition, we will move Palmetto 1 hardware resources into Palmetto 2.

There will be no change to data in users’ home directories when Palmetto 1 is taken offline.

If you do not currently have access to Palmetto 2, you can obtain access by following the Account Setup procedure.

To help you get started with Palmetto 2, we offer a few training/onboarding options:

If you encounter any issues or have any questions, please do not hesitate to reach out to us by submitting a support ticket.

Thank you for your understanding and cooperation during this transition.

NEW Palmetto 2 Onboarding Training

We are extremely excited to announce the launch of our new Palmetto Onboarding on Tiger Training!

You can access the new training program through the Onboarding page on our documentation website.

If you have previously attended our Onboarding or Workshop sessions, we would greatly appreciate it if you could take the new training program and share your valuable feedback with us.

Summer 2024 Maintenance is Complete!

We are excited to announce that the Summer 2024 maintenance work was completed successfully. All RCD services have been restored and are ready for users to access.

During the maintenance period, we made the following improvements:

  • Critical updates to network and storage infrastructure were completed.
    • These updates have improved performance and stability for all users of the cluster.
  • GitLab was updated to a new major version, v17.2.2.
  • ColdFront was updated to v1.9.0.
    New features include:
    • Ability to request a GitLab group.
    • GitLab groups provide enhanced features for organizing projects and repositories under a common namespace, simplifying permission management and collaboration on larger, team-based initiatives.
    • PIs can create GitLab groups by requesting a GitLab Group Allocation on ColdFront.
    • To learn more, please see the GitLab Group Allocation documentation page.
  • Slurm was upgraded to v24.05.2.
    • Spack MPI modules built with the old version of Slurm will need to be rebuilt.
  • Some MPI-enabled modules were moved in the module hierarchy.
    • The following were affected:
      • fftw/3.3.10
      • hdf5/1.14.3
      • netcdf-c/4.9.2
      • netcdf-cxx4/4.3.1
      • netcdf-fortran/4.6.1
      • netlib-scalapack/2.2.0
      • osu-micro-benchmarks/7.3
      • parallel-netcdf/1.12.3
    • To load any moved modules, you must first load the openmpi/5.0.1 module.
  • Additionally, some AMD-optimized MPI modules were moved in the module hierarchy.
    • The following were affected:
      • amdfftw/4.1
      • amdscalapack/4.1
      • lammps/20231121
    • To load these packages, you must first load both the openmpi/5.0.1 and aocc/4.1.0 modules.
  • These changes were made to better reflect the dependencies of certain packages and distinguish between MPI and non-MPI packages of the same version:
    • HDF5 was recompiled with support for C++, Fortran, and the High-Level (hl) APIs.
      • Two versions of HDF5 are available: one with MPI and one without.
    • hdf5/1.14.2 was upgraded to hdf5/1.14.3 (non-MPI).
  • CUDA-Aware Open MPI was rebuilt with better support for multi-node GPU detection when using Kokkos.
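The module moves above amount to a small prerequisite table. The Python sketch below encodes it and builds the matching `module load` line. It covers only the modules listed in this post, lists prerequisites in the order the post names them (the post does not say whether their relative order matters), and maps `hdf5/1.14.3` to its MPI build, since the non-MPI build loads without prerequisites. The helper names are hypothetical.

```python
# Prerequisites for modules moved during Summer 2024 maintenance (from this post).
MPI = ["openmpi/5.0.1"]
AMD_MPI = ["openmpi/5.0.1", "aocc/4.1.0"]

PREREQS = {
    # MPI-enabled modules: load openmpi/5.0.1 first.
    "fftw/3.3.10": MPI,
    "hdf5/1.14.3": MPI,  # MPI build; the non-MPI build needs no prerequisite
    "netcdf-c/4.9.2": MPI,
    "netcdf-cxx4/4.3.1": MPI,
    "netcdf-fortran/4.6.1": MPI,
    "netlib-scalapack/2.2.0": MPI,
    "osu-micro-benchmarks/7.3": MPI,
    "parallel-netcdf/1.12.3": MPI,
    # AMD-optimized MPI modules: load openmpi/5.0.1 and aocc/4.1.0 first.
    "amdfftw/4.1": AMD_MPI,
    "amdscalapack/4.1": AMD_MPI,
    "lammps/20231121": AMD_MPI,
}


def load_sequence(target: str) -> list:
    """Modules to load for `target`, prerequisites first."""
    return [*PREREQS.get(target, []), target]


def load_command(target: str) -> str:
    """Render the sequence as a single `module load` command line."""
    return "module load " + " ".join(load_sequence(target))


if __name__ == "__main__":
    print(load_command("netcdf-c/4.9.2"))   # module load openmpi/5.0.1 netcdf-c/4.9.2
    print(load_command("lammps/20231121"))  # module load openmpi/5.0.1 aocc/4.1.0 lammps/20231121
```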

We appreciate your patience during the maintenance period and hope that these changes will improve the user experience.

If you have any questions or have encountered post-maintenance issues, please let us know by submitting a support ticket.

Upcoming Summer 2024 Maintenance

The RCD team has scheduled a brief maintenance window at the end of the Summer 2024 semester.

The maintenance work will begin on Monday, August 12th, 2024 at 9:00 am. All RCD services, including the Palmetto Cluster and the Indigo Data Lake, will be unavailable until the maintenance work is complete.

During the maintenance period, we plan to complete the following:

  • Our system administrators will perform critical updates to the network and storage systems for Palmetto 2.
    • These updates will improve performance and stability for all users on the cluster.
    • These changes are not directly user-facing and should not break any existing user workflows.
  • Slurm will be updated to a new version.
    • Packages built with MPI will be updated to link with the updated Slurm MPI libraries.

After maintenance is over, this page will be updated with more details about the work completed.

Users should expect that RCD services will be restored no earlier than Wednesday, August 14th, 2024 at 9:00 am and should monitor email for updates from RCD.

Please feel free to reach out to us with any questions or concerns about this notice by submitting a support ticket. Our team would love to hear from you!

New AI/ML Nodes Available

RCD is excited to announce that we have added five powerful new compute nodes to the Palmetto 2 cluster. The hardware in these nodes is optimized for large Artificial Intelligence (AI) and Machine Learning (ML) workflows.

Each node has the following hardware specifications:

  • Model: Dell PowerEdge XE9680
  • CPU: 2x Intel Xeon Platinum 8470
  • Memory: 1 TB
  • Networking:
    • 8x NDR (3.2Tbit aggregate) Internode Communication
    • 200Gbit NDR200 InfiniBand for Storage
    • 100Gb Ethernet
  • GPU: 8x NVIDIA H100 80GB
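The aggregate networking figure follows from the per-port rate. Assuming each NDR port runs at the standard 400 Gb/s InfiniBand NDR rate (the post gives only the total), the Python arithmetic below reproduces the 3.2 Tbit figure and also totals GPU memory per node and GPUs across the five new nodes.

```python
# Figures from the spec list above; 400 Gb/s per NDR port is the standard
# InfiniBand NDR rate and is an assumption, not stated in the post.
NODES = 5
NDR_PORTS_PER_NODE = 8
NDR_GBITS_PER_PORT = 400
GPUS_PER_NODE = 8
GPU_MEMORY_GB = 80

aggregate_gbits = NDR_PORTS_PER_NODE * NDR_GBITS_PER_PORT  # internode bandwidth per node
gpu_memory_per_node_gb = GPUS_PER_NODE * GPU_MEMORY_GB      # total GPU memory per node
total_new_gpus = NODES * GPUS_PER_NODE                      # GPUs across all new nodes

print(f"Internode bandwidth per node: {aggregate_gbits / 1000:.1f} Tbit/s")  # 3.2 Tbit/s
print(f"GPU memory per node: {gpu_memory_per_node_gb} GB")                  # 640 GB
print(f"GPUs across the new nodes: {total_new_gpus}")                       # 40
```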

These are available in the cluster for immediate use by Clemson users. We encourage you to make the most of these new resources!

Introducing the new Palmetto 2 cluster!

The RCD team is excited to announce that the new Palmetto 2 cluster is now online and ready for use. This marks the first major step in our transition from PBS to Slurm.


For those who need assistance with the transition, we have prepared a Slurm Migration Guide that explains the most important differences between the two clusters. We are also offering Palmetto 2 Onboarding sessions, which provide a live walkthrough of how to use the new cluster.
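The Slurm Migration Guide remains the reference for the transition. As a rough illustration of the kind of differences it covers, the Python sketch below maps a few common PBS commands, directives, and environment variables to their standard Slurm counterparts. Resource requests such as `-l select=...` are deliberately left untranslated because their Slurm form depends on cluster configuration; the helper functions are hypothetical.

```python
from typing import Optional

# Standard PBS -> Slurm equivalents for everyday commands.
COMMANDS = {"qsub": "sbatch", "qstat": "squeue", "qdel": "scancel"}

# Job environment variables commonly used in PBS scripts.
ENV_VARS = {"PBS_O_WORKDIR": "SLURM_SUBMIT_DIR", "PBS_JOBID": "SLURM_JOB_ID"}

# '#PBS' flags with a direct one-to-one '#SBATCH' counterpart.
FLAGS = {"-N": "--job-name", "-q": "--partition", "-o": "--output", "-e": "--error"}


def translate_directive(line: str) -> Optional[str]:
    """Translate a single-option '#PBS' line to '#SBATCH'.

    Returns None for anything this sketch does not handle, such as
    '-l select=...' resource chunks."""
    parts = line.split()
    if len(parts) != 3 or parts[0] != "#PBS":
        return None
    flag, value = parts[1], parts[2]
    if flag in FLAGS:
        return f"#SBATCH {FLAGS[flag]}={value}"
    if flag == "-l" and value.startswith("walltime="):
        return f"#SBATCH --time={value.split('=', 1)[1]}"
    return None


def translate_env(text: str) -> str:
    """Rewrite PBS environment variable names to their Slurm equivalents."""
    for pbs_name, slurm_name in ENV_VARS.items():
        text = text.replace(pbs_name, slurm_name)
    return text


if __name__ == "__main__":
    for line in ["#PBS -N myjob", "#PBS -l walltime=72:00:00", "#PBS -l select=1:ncpus=4"]:
        print(line, "->", translate_directive(line))
    print(translate_env("cd $PBS_O_WORKDIR"))  # cd $SLURM_SUBMIT_DIR
```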

Please note that access to Palmetto 2 is controlled by ColdFront, our new allocation management system. Faculty members can create projects on ColdFront and grant other users on their projects access to Palmetto 2. Students will need to ask their faculty advisor or course instructor for assistance with this process. To get started, please see the Palmetto 2 Accounts page on our documentation.
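The access flow described above can be pictured as a small permission check. The Python sketch below is a simplified, hypothetical model: the `Project` class, its `has_palmetto2` flag, and the rule that only the PI adds members are illustrative assumptions, not ColdFront's actual data model or API. A faculty member owns a project, adds students to it, and anyone on a project with Palmetto 2 access can use the cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical, simplified stand-in for a ColdFront project."""
    pi: str                         # faculty member who created the project
    title: str
    has_palmetto2: bool = False     # assumed flag standing in for cluster access
    members: set = field(default_factory=set)

    def add_member(self, requester: str, user: str) -> None:
        """Add `user` to the project; in this sketch only the PI may do so."""
        if requester != self.pi:
            raise PermissionError(f"only the PI ({self.pi}) can add members")
        self.members.add(user)


def can_use_palmetto2(user: str, projects: list) -> bool:
    """True if `user` is the PI or a member of any project with access."""
    return any(p.has_palmetto2 and (user == p.pi or user in p.members) for p in projects)


if __name__ == "__main__":
    lab = Project(pi="faculty1", title="Example Lab", has_palmetto2=True)
    print(can_use_palmetto2("student1", [lab]))  # False until the PI adds them
    lab.add_member("faculty1", "student1")
    print(can_use_palmetto2("student1", [lab]))  # True
```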

We have also launched a new Open OnDemand instance for Palmetto 2, located at ondemand.rcd.clemson.edu.

If you have any questions or need further assistance, please do not hesitate to reach out to RCD by submitting a support ticket.