Monitoring and management system solutions

The monitoring management system of a high-performance computing center consists of the related hardware and the cluster monitoring software. The hardware includes the login and management nodes, the KVM system and the monitoring management system.

Login and management nodes

The management node mainly runs the cluster monitoring and management services, such as user information management, InfiniBand subnet management, job scheduling, system monitoring and time synchronization. The management node does not need high performance, but it must be highly reliable. To improve availability, two or more management nodes should be configured, and critical systems should be configured in redundant mode.
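The redundancy described above can be illustrated with a minimal sketch of active/standby selection between management nodes. It is not taken from any Sugon product: the node names, the heartbeat table and the 10-second timeout are all assumptions made for illustration.

```python
def elect_active(priority_order, heartbeat_age, timeout=10.0):
    """Pick the management node that should run the cluster services.

    priority_order: node names, most preferred first (hypothetical names).
    heartbeat_age:  dict mapping node name to seconds since its last
                    heartbeat; a node missing from the dict is treated as down.
    timeout:        assumed threshold after which a node counts as failed.
    Returns the first live node in priority order, or None if all are down.
    """
    for name in priority_order:
        age = heartbeat_age.get(name)
        if age is not None and age <= timeout:
            return name
    return None


# The primary has been silent for 42 s, so the standby takes over.
active = elect_active(["mgmt1", "mgmt2"], {"mgmt1": 42.0, "mgmt2": 1.5})
print(active)
```

With a single management node, one missed heartbeat window leaves no candidate and the function returns None, which is the situation that a second node is meant to prevent.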

The login node is used for interactive user tasks such as program compiling, algorithm preparation, file uploading/downloading and job submission control. The load on the login node varies considerably with the number of users and their operations. Because the login node may crash as a result of improper user operations, it should not be combined with the management node; keeping the two separate improves the reliability of the whole system. If user access traffic is high, multiple login nodes can be configured to share the traffic.
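As a rough illustration of sharing user traffic across several login nodes, the sketch below sends each new session to the healthy node with the fewest active sessions. The `LoginNode` type, its fields and the node names are hypothetical, and least-sessions selection is only one possible policy (round-robin DNS or a hardware load balancer are common alternatives).

```python
from dataclasses import dataclass


@dataclass
class LoginNode:
    name: str              # hypothetical host name
    active_sessions: int   # current number of user sessions
    healthy: bool = True   # False if the node has crashed or is drained


def pick_login_node(nodes):
    """Return the healthy login node with the fewest active sessions."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        raise RuntimeError("no healthy login node available")
    return min(candidates, key=lambda n: n.active_sessions)


nodes = [
    LoginNode("login1", 12),
    LoginNode("login2", 3),
    LoginNode("login3", 1, healthy=False),  # crashed node is skipped
]
chosen = pick_login_node(nodes)
print(chosen.name)
```

Skipping unhealthy nodes means a login node that has crashed stops receiving new users while the remaining nodes continue to serve them.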

Monitoring management software

An excellent high-performance computing platform must not only provide high performance and high reliability, but also be easy to operate and manage. The Sugon Gridview cluster monitoring management system gives users and administrators a simple, easy-to-use, centralized platform for cluster monitoring, management and operation. Its functions include cluster deployment, cluster monitoring, cluster management, alarm management, statistical reports and job scheduling, among others.


Figure: Core functions of Gridview

