Monitoring Broadcom Switches: A Comprehensive Guide


Monitoring the health of Broadcom (Brocade) switches helps you maintain performance and catch potential issues early. The Brocade Fabric OS Web Tools dashboard provides several widgets that show the physical health of the switch, including fan, power supply, and temperature status. A summary switch status report, provided by MAPS (Monitoring and Alerting Policy Suite), gives a high-level health status that you can investigate further when needed. The MAPS Switch Status Policy category lets you define the number and types of errors, such as power supply failures or temperature problems, that mark a switch as unhealthy. With this monitoring in place, administrators can address problems proactively and keep the network stable.

Characteristics

  • Switch health monitoring: Brocade® Fabric OS® Web Tools User Guide, 9.1.x
  • Switch health parameters: number of problematic fans, power supply failures, temperature thresholds, faulty core blades, percentage of ports with errors, expired switch certificates, faulty blades, hardware-related port faults, flash usage, sync status, missing SFP media, faulty WWN cards, mismatched fan airflow, SFP thresholds, marginal ports, faulty ports, error ports, missing SFP transceivers
  • Switch status report: connect to the switch, log on with an admin account, and run mapsdb --show with no other parameters to display the summary status
  • Switch configuration: erase the startup configuration, set the service port protocol to none, assign an IP address to the service port, configure the SSH server, generate keys, enable the SSH server, configure the domain and name server, configure the time zone and time synchronization, configure the switch name, and save the configuration


Monitoring the physical health of the switch

One of the critical components to monitor is the fans. The Fan widget in the dashboard shows the number of healthy, faulty, and absent fans in the chassis. Clicking the widget displays details about the fans, including the fan number or fan FRU number (a fan FRU can contain one or more fans). This information helps you identify airflow and cooling problems in the switch.

Power supply monitoring is another essential aspect of switch health. The Power widget in the dashboard shows the number of healthy, faulty, and absent power supplies. Clicking the widget displays more information, and clicking its red section shows details about any faulty power supplies, so you can identify and address power problems promptly.

Temperature management is also crucial for switch health. The Temperature widget displays the overall temperature of the chassis, which can be viewed in Fahrenheit or Celsius. For switches, each bar in the graph represents a single thermal sensor, and hovering over a bar shows the exact temperature. Clicking on a bar provides detailed information about the temperature status, helping administrators ensure the switch operates within optimal temperature ranges.
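If you export per-sensor readings from the switch, the widget's unit toggle and "is this sensor too hot?" check are easy to reproduce in a script. The Python sketch below converts Celsius to Fahrenheit and flags sensors above a limit. The sensor IDs, readings, and the 55 °C limit are hypothetical placeholders, not Brocade thresholds; use the limits documented for your hardware.

```python
from typing import Dict, List


def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a Celsius reading to Fahrenheit (the widget can show either unit)."""
    return celsius * 9 / 5 + 32


def sensors_above_limit(readings: Dict[int, float], limit_c: float) -> List[int]:
    """Return the IDs of sensors whose Celsius reading exceeds limit_c, sorted."""
    return sorted(sensor for sensor, temp in readings.items() if temp > limit_c)


if __name__ == "__main__":
    # Hypothetical readings: sensor ID -> degrees Celsius.
    readings = {1: 38.0, 2: 41.5, 3: 57.0}
    limit_c = 55.0  # hypothetical limit, not a Brocade default
    for sensor, temp in sorted(readings.items()):
        print(f"sensor {sensor}: {temp:.1f} C / {celsius_to_fahrenheit(temp):.1f} F")
    print("over limit:", sensors_above_limit(readings, limit_c))
```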

In addition to the widgets, MAPS provides a summary switch status report, described in the Brocade Fabric OS MAPS User Guide. This report gives a high-level health status covering port health, back-end (BE) port health, GE port health, FRU health, security violations, fabric state changes, switch resource utilisation, traffic performance, and extension health, so administrators can quickly see which areas need further investigation or attention.

By leveraging the tools and features provided by Brocade Fabric OS, administrators can effectively monitor the physical health of Broadcom switches, ensuring optimal performance, identifying potential issues early on, and minimising downtime due to hardware failures.


Viewing a summary switch status report

To view a summary switch status report, you need to connect to the switch and log in with an account that has admin permissions. Once you have done this, you can use the command mapsdb --show with no other parameters to display the summary status.
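If you want to collect this report from a script instead of an interactive session, one option is to run the command through the standard ssh client. The Python sketch below assumes that the switch accepts a single CLI command over SSH, that authentication (for example, keys) is already set up, and that "switch01.example.net" is a placeholder host name. It does not use any Brocade-specific library.

```python
import shlex
import subprocess
from typing import List


def build_mapsdb_command(host: str, user: str = "admin") -> List[str]:
    """Build an ssh argument list that runs `mapsdb --show` on the switch."""
    return ["ssh", f"{user}@{host}", "mapsdb --show"]


def fetch_mapsdb_summary(host: str, user: str = "admin", timeout: float = 30.0) -> str:
    """Run the command over SSH and return its standard output as text."""
    result = subprocess.run(
        build_mapsdb_command(host, user),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; call fetch_mapsdb_summary() to actually connect.
    print(shlex.join(build_mapsdb_command("switch01.example.net")))
```

Keeping command construction separate from execution makes it easy to check exactly what will be sent before pointing the script at a production switch.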

This will show you the general status of the switch, such as whether it is healthy, marginal, or critical. It will also list the overall status of the monitoring categories for the current day (since midnight) and for the last seven days. If any categories are shown as out of range, the last five rules that caused this status will be listed. If a monitoring rule is triggered, the corresponding RASLog message will appear under the "Rules Affecting Health" section of the dashboard.

For example, the following output from mapsdb --show all, which displays more of the dashboard than the summary alone, indicates that the switch health is marginal:

> switch:admin> mapsdb --show all
>
> 1 Dashboard Information:
> =======================
>
> DB start time: Thu Feb 20 15:28:01 2021
> Active policy: dflt_aggressive_policy
> Configured Notifications: RASLOG,SNMP,SW_CRITICAL,SW_MARGINAL,SFP_MARGINAL,SDDQ
> Fenced Ports : None
> Decommissioned Ports : None
> Fenced circuits : None
> Quarantined Ports : 3/20,3/45,3/46,4/0,4/19,4/20
> Top Zoned PIDs : 0x731400(21) 0x734c00(21) 0x734900(4) 0x735400(1) 0x735300(1)
>
> 2 Switch Health Report:
> =======================
>
> Current Switch Policy Status: MARGINAL
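Once the output is captured as text, a small parser can pull out the fields you want to alert on. This Python sketch assumes the line labels match the sample output shown here ("Current Switch Policy Status:" and "Quarantined Ports :"); label spacing may differ between Fabric OS versions, so treat the regular expressions as a starting point rather than a guaranteed format.

```python
import re
from typing import List, Optional

# Patterns for two lines from the sample dashboard output.
_STATUS_RE = re.compile(r"Current Switch Policy Status:\s*([A-Z_]+)")
_QUARANTINE_RE = re.compile(r"Quarantined Ports\s*:\s*(\S+)")


def parse_switch_status(output: str) -> Optional[str]:
    """Return the reported policy status (e.g. "MARGINAL"), or None if absent."""
    match = _STATUS_RE.search(output)
    return match.group(1) if match else None


def parse_quarantined_ports(output: str) -> List[str]:
    """Return slot/port entries from the "Quarantined Ports" line ([] for None)."""
    match = _QUARANTINE_RE.search(output)
    if match is None or match.group(1) == "None":
        return []
    return match.group(1).split(",")


if __name__ == "__main__":
    sample = (
        "Quarantined Ports : 3/20,3/45,3/46,4/0,4/19,4/20\n"
        "2 Switch Health Report:\n"
        "Current Switch Policy Status: MARGINAL\n"
    )
    print(parse_switch_status(sample))      # MARGINAL
    print(parse_quarantined_ports(sample))  # ['3/20', '3/45', '3/46', '4/0', '4/19', '4/20']
```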

You can also view historical data on a switch by entering mapsdb --show history. This will show you a summarized status history of the switch since midnight, including front-end, back-end, and GE ports (if present).


Switch status policy

The Switch Status Policy category in MAPS, described in the Brocade Fabric OS MAPS User Guide, lets you monitor switch health by defining the number and types of errors that move the overall switch state into an unhealthy state.

The following parameters are monitored and affect the health of the switch:

  • Number of problematic fans (missing or faulty)
  • Number of power supply threshold issues (absent, faulty, or in the wrong slot for redundancy)
  • Number of temperature threshold issues (faulty temperature sensors)
  • Number of faulty core blades (modular switches only)
  • Percentage of ports with errors (fenced, decommissioned, or segmented due to security violations)
  • Installed switch certificates that have expired
  • Number of faulty blades (modular switches only)
  • Percentage of ports with hardware-related faults (e.g., faulty SFPs or laser faults)
  • Percentage of flash usage by the system
  • System sync status
  • Percentage of physical ports, E_Ports, and F_Ports exceeding threshold settings (optical and copper)
  • System temperature
  • Percentage of ports missing SFP media
  • Number of faulty WWN cards (modular switches only)
  • Fans with mismatched airflow direction (FAN_AIRFLOW_MISMATCH)
  • Percentage of SFPs exceeding threshold settings

The marginal ports, faulty ports, error ports, and missing SFP transceivers are calculated as a percentage of the physical FC ports, excluding logical ports, FCoE_Ports, and VE_Ports.
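The denominator rule is easy to get wrong in home-grown monitoring, so here is a minimal Python sketch of it: only physical FC ports count, and logical ports, FCoE_Ports, and VE_Ports are dropped before the percentage is taken. The `Port` class and its type labels are hypothetical; map them from however your tooling reports ports.

```python
from dataclasses import dataclass
from typing import Callable, List

# Port types excluded from the calculation (hypothetical labels).
EXCLUDED_TYPES = {"logical", "FCoE", "VE"}


@dataclass
class Port:
    name: str
    port_type: str  # e.g. "E", "F", "FCoE", "VE", "logical" (hypothetical labels)
    marginal: bool = False
    faulty: bool = False
    sfp_missing: bool = False


def port_percentage(ports: List[Port], is_affected: Callable[[Port], bool]) -> float:
    """Percentage of physical FC ports for which is_affected(port) is True."""
    physical = [p for p in ports if p.port_type not in EXCLUDED_TYPES]
    if not physical:
        return 0.0
    affected = sum(1 for p in physical if is_affected(p))
    return 100.0 * affected / len(physical)


if __name__ == "__main__":
    ports = [
        Port("0", "F", marginal=True),
        Port("1", "F"),
        Port("2", "E", sfp_missing=True),
        Port("3", "F"),
        Port("ve16", "VE", marginal=True),  # ignored: VE_Port
    ]
    print(port_percentage(ports, lambda p: p.marginal))     # 25.0
    print(port_percentage(ports, lambda p: p.sfp_missing))  # 25.0
```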

You can customise the switch status policy by cloning an existing default policy and adding your own rules. When creating a custom rule, you must consider the constraints associated with each rule parameter, which are detailed in the Monitoring Systems Support Matrix.

One default rule, for example, moves the switch to a marginal state when temperature problems, a faulty blade, or a faulty port are detected. Each rule's details include its name, condition, actions, and the policies it belongs to. When a faulty blade or port is detected, the associated RASLog message is also generated.
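To make the idea of "the number and types of errors that transition the switch state" concrete, here is a simplified Python model: each rule compares one measured parameter against a threshold, and the switch takes the worst state any fired rule assigns. This is an illustration only; it does not reproduce MAPS rule syntax, and the parameter names and thresholds are hypothetical, not Brocade defaults.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class SwitchState(IntEnum):
    HEALTHY = 0
    MARGINAL = 1
    CRITICAL = 2


# Hypothetical rules: (parameter, threshold, resulting state). A rule fires when
# the measured value is at or above its threshold. Illustrative numbers only.
RULES: List[Tuple[str, float, SwitchState]] = [
    ("faulty_fans", 1, SwitchState.MARGINAL),
    ("faulty_fans", 2, SwitchState.CRITICAL),
    ("faulty_power_supplies", 1, SwitchState.MARGINAL),
    ("faulty_ports_pct", 10.0, SwitchState.MARGINAL),
    ("faulty_ports_pct", 25.0, SwitchState.CRITICAL),
]


def evaluate(measurements: Dict[str, float],
             rules: List[Tuple[str, float, SwitchState]] = RULES
             ) -> Tuple[SwitchState, List[str]]:
    """Return the worst state any rule triggers, plus a description of each fired rule."""
    worst = SwitchState.HEALTHY
    fired = []
    for parameter, threshold, state in rules:
        if measurements.get(parameter, 0) >= threshold:
            fired.append(f"{parameter} >= {threshold}: {state.name}")
            worst = max(worst, state)
    return worst, fired


if __name__ == "__main__":
    state, fired = evaluate({"faulty_fans": 1, "faulty_ports_pct": 12.5})
    print(state.name)  # MARGINAL
    for line in fired:
        print(" ", line)
```

Returning the list of fired rules alongside the state mirrors the way the dashboard lists the rules that caused a non-healthy status.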


Fan health

Understanding Fan Health

Fans keep the switch's internal components, including the processor, within safe operating temperatures. Monitoring fan health helps ensure the switch stays within its rated temperature range and prevents overheating.

Monitoring Fan Status

The Brocade Fabric OS Web Tools dashboard provides a dedicated fan widget that shows the number of healthy, faulty, and absent fans in the chassis. Click the widget to view further details, such as the fan number or fan FRU (Field-Replaceable Unit) number; a fan FRU can include one or more fans.
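Because a fan FRU can hold more than one fan, a script that mirrors the widget has to decide how to roll individual fans up into FRU-level buckets. The Python sketch below assumes a hypothetical inventory (FRU number to a list of per-fan statuses, or None for an empty slot) and treats a FRU as faulty if any fan in it is not "ok". It models the widget's three categories; it is not how Web Tools computes them internally.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical inventory: fan FRU number -> status of each fan in that FRU
# ("ok" or anything else), or None if the FRU slot is empty.
Inventory = Dict[int, Optional[List[str]]]


def fru_status(fans: Optional[List[str]]) -> str:
    """Classify one fan FRU as healthy, faulty, or absent."""
    if fans is None:
        return "absent"
    return "healthy" if all(s == "ok" for s in fans) else "faulty"


def fan_widget_counts(inventory: Inventory) -> Dict[str, int]:
    """Count FRUs in the same three buckets the widget shows."""
    counts = Counter(fru_status(fans) for fans in inventory.values())
    return {key: counts[key] for key in ("healthy", "faulty", "absent")}


if __name__ == "__main__":
    inventory = {1: ["ok", "ok"], 2: ["ok", "faulty"], 3: None}
    print(fan_widget_counts(inventory))  # {'healthy': 1, 'faulty': 1, 'absent': 1}
```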

Fan Status Indicators

Fan status terms vary by vendor and tool; common values include "OK," "Faulty," "Up," "Down," and "Warning." On Cisco switches, for example, "OK" indicates normal fan operation and "Faulty" indicates a fan problem. Similarly, "Up" means the fan is functioning and "Down" means it is not, while "Warning" suggests a potential issue that needs further investigation.

Troubleshooting Fan Issues

In some cases, the status reported by a monitoring tool may not match what you see when you physically inspect the fans. For instance, a PRTG sensor may raise an alarm for fan status even though the fans are running normally. This can be caused by a software bug or by specific hardware conditions, such as a missing redundant power supply fan. The manufacturer's documentation or community forums can help you troubleshoot such issues.

Various tools are available to monitor fan health in switches. Broadcom's Brocade Fabric OS Web Tools dashboard offers a comprehensive overview of switch health, including fan status. Additionally, SNMP (Simple Network Management Protocol) can be used to monitor the state of fans on Cisco devices, providing status indicators such as "unknown," "up," "down," or "warning."
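The "unknown," "up," "down," and "warning" values match the enumeration of the cefcFanTrayOperStatus object in CISCO-ENTITY-FRU-CONTROL-MIB, which SNMP returns as integers. The Python sketch below decodes those integers and lists fan trays that need attention. It assumes you have already collected the raw values with an SNMP tool of your choice; the fan tray labels and readings are hypothetical.

```python
from typing import Dict, List

# Enumeration of cefcFanTrayOperStatus in CISCO-ENTITY-FRU-CONTROL-MIB.
FAN_TRAY_STATUS = {1: "unknown", 2: "up", 3: "down", 4: "warning"}


def decode_fan_status(value: int) -> str:
    """Translate a raw SNMP integer into its status name."""
    return FAN_TRAY_STATUS.get(value, f"unrecognized({value})")


def fans_needing_attention(readings: Dict[str, int]) -> List[str]:
    """Return the fan trays (by label) whose status is anything other than 'up'."""
    return sorted(name for name, value in readings.items()
                  if decode_fan_status(value) != "up")


if __name__ == "__main__":
    # Hypothetical values, e.g. collected with an SNMP walk of the fan tray table.
    readings = {"fan-tray-1": 2, "fan-tray-2": 4, "fan-tray-3": 2}
    print(fans_needing_attention(readings))  # ['fan-tray-2']
```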



Power supply health

To monitor the health of the power supplies in Broadcom switches, you can use the Brocade Fabric OS Web Tools dashboard. Its widgets show the physical health of the switch, including power supply health.

The Power widget specifically allows you to see the number of healthy, faulty, and absent power supplies in the chassis. By clicking on the widget, you can access additional information about the power supplies. For instance, if you click on the red section, you will be able to view details about any faulty power supplies.
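Beyond counting supplies, you may also want to check whether a spare healthy supply remains, since the switch status policy treats power supply redundancy problems as a health issue. The Python sketch below reproduces the widget's healthy/faulty/absent counts from a hypothetical snapshot and applies a simple N+1 check. The `required` value is an assumption you must take from your hardware documentation, and the sketch does not model slot placement.

```python
from collections import Counter
from typing import Dict

# Hypothetical snapshot: power supply name -> "healthy", "faulty", or "absent".
Supplies = Dict[str, str]


def power_widget_counts(supplies: Supplies) -> Dict[str, int]:
    """Count supplies in the same three buckets the Power widget shows."""
    counts = Counter(supplies.values())
    return {key: counts[key] for key in ("healthy", "faulty", "absent")}


def has_spare_supply(supplies: Supplies, required: int) -> bool:
    """True if more healthy supplies are present than the `required` minimum (N+1)."""
    return power_widget_counts(supplies)["healthy"] > required


if __name__ == "__main__":
    snapshot = {"PS0": "healthy", "PS1": "faulty"}
    print(power_widget_counts(snapshot))              # {'healthy': 1, 'faulty': 1, 'absent': 0}
    print(has_spare_supply(snapshot, required=1))     # False
```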

To further investigate the health of your power supply, you can also view a Summary Switch Status Report. To do this, connect to the switch and log in using an account with admin permissions. Then, use the command "mapsdb --show" without any other parameters to display the summary status. This report will provide a high-level overview of the switch's health, including the general status and the monitoring categories for the current day and the last seven days.

Additionally, you can set up monitoring and alerts for switch power supply health using tools like SolarWinds Network Performance Monitor (NPM). This involves enabling hardware health monitoring and creating alerts to receive notifications, such as emails, when specific conditions are triggered.

For Cisco switches, you can also use SNMP (Simple Network Management Protocol) to monitor the power supply status. By walking the CISCO-ENVMON-MIB, you can access specific OIDs (Object Identifiers) related to the power supply, temperature, and fan status of the switches in the stack.
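The state objects in CISCO-ENVMON-MIB, such as ciscoEnvMonSupplyState and ciscoEnvMonFanState, report integers drawn from the CiscoEnvMonState convention. This Python sketch maps those integers to names and returns every table entry that is not "normal". It assumes you have already walked the table with an SNMP tool and stored the results as index-to-value pairs; the indices shown are hypothetical.

```python
from typing import Dict, List, Tuple

# CiscoEnvMonState values from CISCO-ENVMON-MIB.
ENVMON_STATE = {
    1: "normal",
    2: "warning",
    3: "critical",
    4: "shutdown",
    5: "notPresent",
    6: "notFunctioning",
}


def abnormal_entries(states: Dict[int, int]) -> List[Tuple[int, str]]:
    """Return (table index, state name) for every entry that is not 'normal'."""
    return [
        (index, ENVMON_STATE.get(value, f"unrecognized({value})"))
        for index, value in sorted(states.items())
        if value != 1
    ]


if __name__ == "__main__":
    # Hypothetical walk result: table index -> ciscoEnvMonSupplyState value.
    supply_states = {1001: 1, 1002: 3}
    print(abnormal_entries(supply_states))  # [(1002, 'critical')]
```

Whether "notPresent" should raise an alert depends on the hardware: an empty optional slot may be expected, so you may want to filter it out for your stack.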

Frequently asked questions

The Brocade Fabric OS Web Tools dashboard, described in the Brocade Fabric OS Web Tools User Guide, provides several widgets for monitoring the physical health of your switch: a fan widget, a power widget, and a temperature widget.

To view a summary switch status report, you need to connect to the switch and log on using an account with admin permissions. Then, you can use the command "mapsdb --show" with no other parameters to display the summary status.

The summary switch status report provides a high-level health status of the switch, including information such as the general status of the switch, the overall status of monitoring categories for the current day and the last seven days, and any monitoring rules that have been triggered.
