CycleCloud version 7.9.0

This release adds IBM Spectrum LSF as a cluster type. It also adds a number of features that improve error reporting and self-supportability, and make it easier to build and debug MPI applications.

New Features:

  • The node display now includes detailed information on preparation and configuration issues
  • The cluster summary now displays issues encountered during node preparation and configuration
  • New nodes can be added to a placement group from the web interface
  • Nodes now have a "Keep-Alive" feature to prevent them from being accidentally terminated
  • Ephemeral OS disks for virtual machines and scale sets are now supported
  • Cluster owners with an SSH key in their profile can now ssh as the cyclecloud user
  • CycleCloud now includes IBM Spectrum LSF as a cluster type
  • Subnets, VNets, and VMs are now shown on the Accounts page
  • Information on InfiniBand support is included in the user interface and REST API
  • A node's placement group is now more prominently displayed in the UI
  • CycleCloud now shows problems with nodes connecting back to CycleCloud on startup
  • Nodes can get their node name and ID from the jetpack command (see the sketch after this list)
  • Jetpack version can now be determined on the VM via the Python API
  • Nodes must be terminated before they can be removed from a cluster
  • Active Directory authentication now supports 'user@domain.com' logins
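
As a rough illustration of the jetpack item above, the sketch below reads a node's identity from the Jetpack configuration using the Python API on the node itself. It is a minimal, hedged example: jetpack.config.get() is the Jetpack Python configuration call, but the specific keys used here (cyclecloud.node.name and cyclecloud.node.id) are assumptions and may differ between Jetpack versions; the same values can also be read with the jetpack CLI on the node.

      # Minimal sketch; run on a CycleCloud node where Jetpack is installed.
      # The configuration keys below are assumptions; check the Jetpack
      # documentation for the exact names available in your version.
      import jetpack.config

      node_name = jetpack.config.get('cyclecloud.node.name')  # assumed key
      node_id = jetpack.config.get('cyclecloud.node.id')      # assumed key

      print('node name: {0}'.format(node_name))
      print('node id:   {0}'.format(node_id))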

Resolved Issues:

  • Server commands such as cycle_server start/stop would print out a stack trace on some versions of Ubuntu
  • Hv1 "promo" machine types now use the same HPC CentOS image as the Hv1 machine types
  • The Nodearray CoreCount attribute is now the autoscaling factor
  • Changes to the low priority VM checkbox sometimes failed to save in the UI
  • MPI nodes with public IPs could cause "Scaleset attributes do not match" errors
  • SSH keys that contained newlines would cause login errors
  • Invalid Azure password errors are no longer cryptically reported as "No JSON object could be decoded"
  • Certain subscriptions which do not support querying for price information no longer cause errors
  • Requesting a new certificate from Let's Encrypt would fail due to a deprecated protocol
  • Adding and then immediately removing a node from a cluster would cause an error
  • PBS head nodes occasionally had transient software installation failures
  • There was a race condition between user management and scheduler start up
  • In some cases, the managed users for a node would not be configured before the node started running jobs
  • Jetpack converge cron used an incorrect output redirect
  • Nodes booted without Jetpack installed caused a NullPointerException
  • The cyclecloud initialize command did not work with the HTTP port on sites with HTTPS
  • The "new cluster" dialog box included a Next button even without a next page
  • GridEngine autoscale occasionally spawned errors related to trying to resize previously deleted ScaleSets
  • Nodes being reimaged could not be terminated until the reimage process completed
  • Execute nodes that were terminating could remain after the VM was deleted
  • Terminating nodes in a placement group would be removed before the VM finished deleting
  • Force password reset option was not working properly
  • Nodes became unselected after performing an action on them
  • Removed forced upgrade of glib2 in support of ganglia
  • Node IDs were regenerated if a cluster was reimported
  • The cyclecloud connect command threw an error when using an SSH bastion without a private key
  • Azure Portal hyperlinks to scaleset VMs were broken
  • VMs could not be deleted if they were started with data disks
  • Removing a previously added cluster-init via the UI did not work properly
  • Adding nodes to a scaleset after one failed would make the failed node appear successful in the UI
  • The jetpack shutdown command did not support deallocate
  • HB60rs_v2 VMs were not properly filtered as an "HPC" VM type
  • Adding/removing Slurm execute nodes manually is now disallowed since they would not be able to run jobs
  • 'Off' nodes were incorrectly counting against available quota
  • Region OutOfCapacity errors during node orchestration sometimes resulted in nodes showing a list of MachineTypes and blocked further autoscaling
  • BeeGFS storage nodes are now removed on termination
  • The cyclecloud connect command printed out a warning about modifying known_hosts when it did not modify this file
  • Users on a node could not be managed after that node was rebooted
  • Corrected "stack level too deep (SystemStackError)" crash on CentOS 6
  • The default heap size for the CycleCloud web server is now 4GB
  • Updated dependencies to address the following CVEs: CVE-2012-0881, CVE-2014-0107, CVE-2014-0114, CVE-2015-7501, CVE-2016-3092, CVE-2017-15708, CVE-2018-14720, CVE-2018-16492, CVE-2019-10744, CVE-2019-10746, CVE-2019-14379

Deprecated:

  • Removed Gluster-based cluster type