Monitor hardware

Contents

1. Monitor compute nodes
2. Monitor management nodes
3. Monitor Flex System Enterprise Chassis
4. Monitor storage devices
5. Monitor network devices


Monitor the hardware

You can monitor various hardware devices included in the system.

Facilities for monitoring hardware are accessible from the system console.


Monitor compute nodes

Use the system console to monitor the details of the compute nodes in the system. You must be assigned the Hardware administration role with permission to View all hardware resources (Read-only) to perform these steps.


Procedure

  1. Access the console pane. Click System Console > Hardware > Compute Nodes.

  2. View the compute node details. Details include the status and associated warnings, architecture, firmware level, energy information, service processor, unified extensible firmware interface level, machine type, location in the system, cloud group status, physical cores, physical memory, virtual machines, and health statistics.

    The details listed in this window match the information that is displayed in the Infrastructure Map pane.

  3. Review your hardware capacity settings. You can view a variety of hardware information, such as details included in the following list:

    Events

    Events that are associated with the compute node. Click the View details link to view the list of events.

    Jobs

    Jobs that are associated with the compute node. Click the View details link to view the list of jobs.

    Type

    Type of hardware device.

    Power status

    Power status of the compute node: powered on, powered off, or quiesced.

    Energy information

    Energy range and average

    Location

    System, chassis and node in which the compute node is located.

    In cloud group

    Cloud group that is associated with the compute node. Click the link to view the list of cloud groups. The details about the compute node are displayed in the Cloud Group pane.

    Compute node information

    Specifies serial number, machine type, architecture, firmware level, service processor level, PVU value and IP address.

    State

    State of the compute node:

    • Available: The compute node is ready for use.
    • Quiesced: The compute node will not accept any new instances, but current instances on the compute node will continue to run.
    • Discovery: The compute node is currently being inventoried by the system. After the compute node is inventoried, it is automatically initialized.
    • Unlicensed: The compute node is not in a valid chassis slot for the model of the system. For example, small, medium, large and extra large system models contain valid slots.
    • Initializing: A hypervisor is currently being installed on the compute node.
    • Failed: The hypervisor failed to install correctly on the compute node.
    • Maintenance: The compute node has evacuated all instances and is ready for maintenance to be applied.

    Temperature

    Temperature of the compute node.

    Health statistics

    Health statistics of the compute node, including core temperature warning number, error LED number, hardware inventory warnings number, VMS inventory warnings number, successful deploy number and deploy number. Click Clear to clear the statistics.

    Physical cores

    Number of physical cores used by the compute node. Expand the Physical cores field to view the graph.

    Physical memory

    Physical memory used by the compute node. Expand the Physical memory field to view the memory allocation and memory utilization percentage graph.

    Virtual machines

    Specifies the virtual machines that are associated with the compute node. Click the link to view the list of virtual machines. Details are displayed in the pane.

    LEDs

    LED status of the compute node. You can also view LEDs from the Troubleshooting menu: click System > Troubleshooting, and expand LED status. The Troubleshooting menu shows LEDs for all components in the system, not just for one compute node.

    Physical IO Adapter

    Specifies the input and output statistics, RDMA packets, and RDMA bytes of the compute node.
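
If you track compute node states outside the console (for example, in a script that processes inventory data), the State values listed above can be modeled as a small enumeration. The following Python sketch is illustrative only: the names NodeState, accepts_new_instances, may_run_instances, and parse_state are hypothetical and are not part of any product interface. It encodes only what the State descriptions say: an Available node is ready for use, and a Quiesced node keeps its current instances but accepts no new ones.

```python
from enum import Enum


class NodeState(Enum):
    """Compute node states as described in the State field (hypothetical model)."""
    AVAILABLE = "Available"
    QUIESCED = "Quiesced"
    DISCOVERY = "Discovery"
    UNLICENSED = "Unlicensed"
    INITIALIZING = "Initializing"
    FAILED = "Failed"
    MAINTENANCE = "Maintenance"


def accepts_new_instances(state: NodeState) -> bool:
    """Only an Available node is described as ready for use."""
    return state is NodeState.AVAILABLE


def may_run_instances(state: NodeState) -> bool:
    """Available nodes run instances; Quiesced nodes keep their current ones."""
    return state in (NodeState.AVAILABLE, NodeState.QUIESCED)


def parse_state(label: str) -> NodeState:
    """Map a label such as ' quiesced ' to a NodeState, ignoring case and padding."""
    return NodeState(label.strip().capitalize())
```

For example, parse_state("Maintenance") yields a state for which both helper functions return False, which matches the description of a node that has evacuated all of its instances.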


Monitor management nodes

You can use the system console to monitor details of the management nodes. You must be assigned the Hardware administration role with permission to View all hardware resources (Read-only) to perform these steps.


Procedure

  1. Access the console pane. Click System Console > Hardware > Management Nodes.

  2. Select the management node to monitor.

  3. View the management node details. Details include:

    Events

    Provides the count of Error and Warning events. Click on the number to the right of the event type to view the details of each recorded event.

    Jobs

    Provides the count of Pending and Started jobs. Click on the number to the right of the job type to view the details of each queued or running job.

    Type

    Displays the management node type.

    Status

    Indicates the availability of the management node.

    Power status

    Indicates whether the management node is powered on or off.

    Energy information

    Displays the Input power range and Average input measurements.

    Location

    Displays information about the physical location of the management node. Click on a rack component to view the Infrastructure Map (Graphics View).

    IP address

    Displays the IP address of the management node.

    Virtualization System Management

    Displays the version of the Virtualization System Manager.

    Management Node Information

    Displays the software version of the management node. Expand this section to view the machine type, architecture, and firmware level.

    Temperature

    Indicates the Ambient temperature and Maximum ambient temperature of the management node.

    High availability Status

    Indicates the overall health state of the management node, and the date and time of the last health check update. Expand this section for a full list of high availability statuses.

    High availability statuses

      Each item can report the following states:

      Health state

        • Active: All of the high availability components for this management node appear to be healthy.
        • Error: One or more high availability components for this management node are not in a healthy state.
        • Unknown: One or more of the component states were not able to be determined within a reasonable amount of time.

      System leader state

        • Active: This PureSystems Manager is the leader.
        • Standby: This PureSystems Manager is the non-leader.
        • Unknown: This PureSystems Manager's leader state could not be determined.

      System data replication

        • Active: System Data Replication is healthy for this PureSystems Manager.
        • Disconnected: System Data Replication is disconnected (off the network) or the PureSystems Manager has been stopped.
        • Error: System Data Replication reports an Error state for one of the servers.
        • Pending: System Data Replication is not synchronizing properly.
        • Quiesced: System Data Replication is not running on this PureSystems Manager.
        • Unknown: The system cannot determine the state of System Data Replication.

      Workload state

        • Available: The workload console for this PureSystems Manager is available.
        • Pending: The workload console for this PureSystems Manager is not configured properly.
        • Quiesced: The workload console for this PureSystems Manager has not been configured.
        • Standby: The workload console for this PureSystems Manager is in standby.
        • Unknown: The workload console state for this PureSystems Manager is unattainable.

      Customer management floating IP state

        • Applied: The Customer Management Floating IP address is assigned to this PureSystems Manager.
        • Offline: The Customer Management Network has not been configured on this system.
        • Quiesced: The Customer Management Floating IP address is not assigned to this PureSystems Manager.
        • Unknown: It cannot be determined that this PureSystems Manager has been assigned the Customer Management Floating IP address.

      Cloud group floating IP state

        • Applied: All of the Cloud Groups' management VLAN floating IP addresses are assigned to this PureSystems Manager.
        • Quiesced: At least one of the Cloud Groups' management VLAN floating IP addresses is not assigned to this PureSystems Manager.
        • Unknown: It cannot be determined if this PureSystems Manager has the Cloud Groups' management VLAN floating IP addresses.

      Internal floating IP state

        • Applied: All of the system's floating IP addresses are assigned to this PureSystems Manager.
        • Quiesced: Not all of the system's floating IP addresses are assigned to this PureSystems Manager.
        • Unknown: It cannot be determined if the system's floating IP addresses are assigned to this PureSystems Manager.

      Placement state

        • Started: The internal placement application is started, which is expected on the Active PureSystems Manager.
        • Stopped: The internal placement application is stopped, which is expected on the Standby PureSystems Manager.
        • Unknown: The internal placement application status is not able to be determined.

      Leader state

        • Started: The internal leader application is started.

          Note: This does not mean this PureSystems Manager is the leader. Check the System Leader state to determine that.

        • Stopped: The internal leader application is not started.

          Note: This does not mean this PureSystems Manager is not the leader. Check the System Leader state to determine that.

        • Unknown: The internal leader application status is not able to be determined.

    Note: This information is valid only if the Updated on field displays a time within 10 minutes of the current time.

    Health statistics

    Indicates the overall Health status of the management node.

    Physical cores

    Displays the collective use percentage of the management node's physical cores. Expand this section to view the CPU Allocation and Utilization graph, and CPU details.

    Physical memory

    Displays the collective use percentage of the management node's physical memory. Expand this section to view the Memory Allocation and Utilization graph, and physical memory details.

    LEDs

    Displays the LED statuses of the management node. Expand this section to view the state of each node component.

    The details listed in this window match the information that is displayed in the Infrastructure Map pane.

  4. Review the management node events. From the Events field, click View details. The Events pane is displayed. Click the event to review. You can also view this pane by clicking System > Events.

  5. Review the management node jobs. From the Jobs field, click View details. The Job Queue pane is displayed. Click the job to review. You can also view this pane by clicking System > Job Queue.
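
The high availability statuses carry two rules that are easy to overlook: the information is valid only if the Updated on time is within 10 minutes of the current time, and leadership is determined by the System leader state, not by the Leader state. The following Python sketch shows one way to apply both rules to values that you have read from the console. The function names and the way the values reach the script are assumptions for illustration; nothing here is a product API.

```python
from datetime import datetime, timedelta
from typing import Optional

# The statuses are described as valid only within 10 minutes of the current time.
MAX_AGE = timedelta(minutes=10)


def ha_status_is_current(updated_on: datetime, now: datetime) -> bool:
    """Return True when the Updated on time is within 10 minutes of now."""
    return abs(now - updated_on) <= MAX_AGE


def is_system_leader(system_leader_state: str,
                     updated_on: datetime,
                     now: datetime) -> Optional[bool]:
    """Interpret the System leader state (not the Leader state).

    Returns True for Active, False for Standby, and None when the
    information is stale or the state is Unknown.
    """
    if not ha_status_is_current(updated_on, now):
        return None  # stale data: do not draw a conclusion
    if system_leader_state == "Active":
        return True
    if system_leader_state == "Standby":
        return False
    return None
```

Returning None rather than False for stale or Unknown data keeps "not the leader" separate from "cannot tell", which is the distinction that the notes on the Leader state draw.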


Monitor Flex System Enterprise Chassis

Use the system console to monitor the IBM Flex System Enterprise Chassis. You must be assigned the Hardware administration role with permission to View all hardware resources (Read-only) to perform these steps.

You can also view the Flex System Enterprise Chassis information from the tree view of the Infrastructure Map. This console page displays a list of all chassis in the left pane. The details for each chassis include events, jobs, type, status, machine type, energy information, location, temperature, and LEDs. Expandable tables list the hardware components in each chassis: compute nodes, Chassis Management Modules (CMMs), network switches, management nodes, power supplies, and fans.


Procedure

  1. Access the console pane. Click Hardware > IBM Flex Management Chassis.

  2. From the left pane, select a Flex System Enterprise Chassis to view. The following list provides a description of each section in the right pane:

    Events

    Click the underlined number next to the Error or Warning fields to view specific errors or warnings. Click View details in the Events field to view a combined list of the warnings and critical events. The Events pane includes a table that lists all of the events, which you can filter by event text, type, severity, and category.

    Jobs

    Click the underlined number next to the Pending Queues or Started Queues fields to view pending and started queues. Click View details in the Jobs field to view a combined list of the pending and started queues.

    Type

    Type of hardware component.

    Status

    Specifies if the Flex System Enterprise Chassis is available.

    Machine type

    Machine type associated with the Flex System Enterprise Chassis.

    Energy information

    Specifies the input power range and the average input power.

    Location

    Rack and unit where the Flex System Enterprise Chassis is located.

    Temperature

    Ambient and maximum ambient temperature of the Flex System Enterprise Chassis.

    LEDs

    Specifies the LEDs associated with the Flex System Enterprise Chassis.

    Compute nodes

    Total number of compute nodes and the number of available compute nodes in the chassis.

    Expand Compute nodes to display a table that lists all of the compute nodes in the Flex System Enterprise Chassis. The table includes the name of the compute node, status and firmware level. The compute nodes are named by serial number. Click a serial number to view the details about the specific compute node.

    Management nodes

    Total number of management nodes and the number of available management nodes in the chassis.

    Expand Management nodes to display a table that lists all of the management nodes in the chassis. The table includes the name of the management node, power status and firmware level. The management nodes are named by serial number. Click a serial number to view the details about the specific management node.

    Chassis Management Modules

    Total number of CMMs and the available number of CMMs in the chassis.

    Expand Chassis management modules to display a table that lists all of the CMMs in the chassis. The table includes the name of the CMM, status and role. The modules are named by serial number. Click a serial number and the Infrastructure Map pane is displayed. The CMMs are listed under the Flex System Enterprise Chassis in the tree view.

    Network devices

    Total number of network devices and the available number of network devices in the chassis.

    Expand Network devices to display a table that lists all of the network devices in the chassis.

    Chassis fans

    Total number of fans and available number of fans in the chassis.

    Expand Chassis fans to display a table that lists all of the fans in the chassis. The table includes the serial number of the fan, status, and speed.

    Power supplies

    Total number of power supplies and the available number of power supplies in the chassis.

    Expand Power supplies to display a table that lists all of the power supplies in the chassis. The table includes the serial number of the power supply, status, and description.


Monitor storage devices

Use the system console to monitor storage devices in your system. You must be assigned the Hardware administration role with permission to View all hardware resources (Read-only) to perform these steps. By monitoring your storage devices, you can view various details such as associated events and jobs, firmware levels, status, capacity, and usage and allocation of disks and volumes.


Procedure

  1. Click Hardware > Storage Devices.

  2. Select a storage device and monitor its details:
    • Events: Number of events that are associated with the device.
    • Jobs: Specifies the pending and started jobs that are associated with the device.
    • Type: Specifies the type of storage device.
    • Firmware: Specifies the firmware release level.
    • Status: Specifies the availability status of the device.
    • Capacity: Specifies the capacity of the device. This information is displayed only for the storage controller and includes the total space and the percentage of space used.
    • Location: Specifies the physical location of the device in the system.
    • Temperature: Specifies the temperature information for the device.
    • Physical cores: Displays a graph of the CPU utilization and CPU allocation percentages.
    • Disk Drives: Specifies a list of disk drives and their information, including location, state, capacity, and type.
    • Operating system volumes: Specifies the operating system volumes.
    • Storage volumes: Specifies a list of storage volumes that are associated with the device, including their name, size, and state.

      In a VMware environment, PureApplication System allocates Virtual Machine File System (VMFS) volumes on a cloud group basis as needed. VMFS is a high-performance file system that is optimized for storing virtual machines. The system first searches for a storage controller that has at least 1.8 TB of free capacity and an average latency less than 50 ms for the new VMFS storage volume. If no storage controller with such latency exists, the request to create a disk or volume fails.

      For each storage controller that has an acceptable average latency, the system searches for an existing storage volume on the controller that contains sufficient free capacity for the new disk or volume. If no such storage volume exists, a new storage volume is created. After the VMFS storage volume is identified, the disk or volume is placed in it.

      When a cloud group is deleted, all remaining VMFS storage volumes for the cloud group are released. As disks or volumes are released, the system attempts to free unused storage volumes while retaining some excess capacity, so that it does not create and delete storage volumes in unnecessary cycles. An example of an unnecessary cycle is deleting a 1.8 TB storage volume (when a disk or volume is deleted) only to create a new 1.8 TB storage volume.

    • LUNs: Specifies the total and pending number of logical unit numbers (LUNs).
    • Storage controller ports: Specifies information about the number of total ports, availability, the device state, identifier and speed, and the corresponding worldwide port name.
    • Storage node statistics: Specifies the input and output statistics for the storage node, including bytes, latency, and number of messages.
    • LEDs: Specifies information about the LEDs for the device.
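
The VMFS placement behavior described under Storage volumes can be read as a small selection procedure. The following Python sketch is one possible reading of that description, not the system's actual implementation. The class and function names are hypothetical, and the order of the steps (reuse an existing storage volume first, then create a new 1.8 TB storage volume) is an assumption. The sketch also assumes that a single disk or volume is no larger than 1.8 TB.

```python
from dataclasses import dataclass, field
from typing import List

NEW_VOLUME_TB = 1.8     # free capacity required for a new VMFS storage volume
MAX_LATENCY_MS = 50.0   # average latency must be less than this value


@dataclass
class Volume:
    name: str
    free_tb: float


@dataclass
class Controller:
    name: str
    free_tb: float
    avg_latency_ms: float
    volumes: List[Volume] = field(default_factory=list)


def place_disk(controllers: List[Controller], disk_tb: float) -> Volume:
    """Pick (or create) the VMFS storage volume that receives a new disk."""
    if disk_tb > NEW_VOLUME_TB:
        raise ValueError("sketch assumes a disk fits in one 1.8 TB volume")

    # Only controllers with an acceptable average latency are considered.
    acceptable = [c for c in controllers if c.avg_latency_ms < MAX_LATENCY_MS]
    if not acceptable:
        raise RuntimeError("no storage controller with acceptable latency")

    # Prefer an existing storage volume with sufficient free capacity.
    for controller in acceptable:
        for volume in controller.volumes:
            if volume.free_tb >= disk_tb:
                volume.free_tb -= disk_tb
                return volume

    # Otherwise create a new storage volume where 1.8 TB is still free.
    for controller in acceptable:
        if controller.free_tb >= NEW_VOLUME_TB:
            controller.free_tb -= NEW_VOLUME_TB
            name = f"{controller.name}-vmfs{len(controller.volumes) + 1}"
            volume = Volume(name, NEW_VOLUME_TB - disk_tb)
            controller.volumes.append(volume)
            return volume

    raise RuntimeError("no acceptable controller has 1.8 TB of free capacity")
```

In this reading, a request fails either when every controller exceeds the latency limit or when no acceptable controller can host another 1.8 TB storage volume. The description states only the first failure explicitly; the second is an inference.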


Monitor network devices

Monitor the status and attributes of the network devices on your system. Regular monitoring helps to ensure that you are up to date on any changes. You must be assigned the Hardware administration role with permission to View all hardware resources (Read-only) to perform these steps. You can use the system console, the command line interface, or the REST API to complete this task. For the command line and REST API information, see the Related information section.


Procedure

  1. Click Hardware > Network Devices.

  2. Select a network device to view. The following list provides a description of each section in the pane:
    • Events - Specifies the problem events that have occurred.
    • Jobs - Specifies the jobs that have occurred.
    • Type - Specifies whether the switch is an IBM Flex System EN4093 10Gb Ethernet Scalable Switch, IBM Flex System FC5022 16Gb SAN Scalable Switch, or IBM RackSwitch G8264 device.
    • Description - Specifies the make and model of the network device.
    • Power status - Specifies whether the network device is powered on.
    • Status - Specifies whether the network device is running.
    • Location - Specifies the physical location in the system.
    • Firmware Level - Specifies the firmware level of the network device.
    • Software Version - Specifies the software level of the network device.
    • Model - Specifies the model number of the network device.
    • Network Ports - Specifies the total number of ports, how many ports are being used, the port state, the port identifier and information about port communication.

  3. Click View details in the Events field to view the warnings and critical events. The Events pane includes a table that lists all of the events, which you can filter by event text, type, severity, category, and time interval.

    1. Filter the events. To filter by event text, type the text in the Enter text field. To filter by type, select All or a specific network device in the menu. To filter by time interval, select Time Interval. The Severity menu includes a list of severity levels to choose from, or you can select All. The Category menu lists the categories to which an event can belong: All, Alert, Resolution, Call support, and Customer serviceable.
    2. Comment on the event. Select the Comments icon in the Actions column. Enter the comment and click OK. You can also view all of the comments by clicking Expand all.
    3. Click the Show details icon in the Actions column to view a collective summary of the event.
    4. Export the events to your local system. Click the Export icon, and choose to export either a specified number of events or all filtered events. The exported XML file contains related event data, time stamps, comments, and so on.

  4. Expand Customer Ports to view the port status, speed, input and output.

  5. Expand Network Ports to view the port status, speed, input and output.
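
If you process events outside the console, the filters in the Events pane (event text, type, severity, and category, each with an All option) can be reproduced with a few lines of code. The following Python sketch works on a hypothetical in-memory Event record. The field names, the sample severity values, and the filter_events function are assumptions for illustration, and the sketch does not parse the exported XML file, whose layout is not described here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Event:
    """Hypothetical event record; field names are illustrative only."""
    text: str
    device: str      # the network device that the Type filter selects
    severity: str
    category: str


def _matches(selected: Optional[str], actual: str) -> bool:
    """None or 'All' means the filter is not applied."""
    return selected is None or selected == "All" or selected == actual


def filter_events(events: Iterable[Event],
                  text: Optional[str] = None,
                  device: Optional[str] = None,
                  severity: Optional[str] = None,
                  category: Optional[str] = None) -> List[Event]:
    """Keep events that satisfy every selected filter."""
    needle = text.lower() if text else None
    return [
        e for e in events
        if (needle is None or needle in e.text.lower())
        and _matches(device, e.device)
        and _matches(severity, e.severity)
        and _matches(category, e.category)
    ]
```

Treating All the same as an unset filter mirrors the menus in the pane, where selecting All removes that restriction instead of matching a literal value.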