Monitoring Application Server Performance: Strategies For Success

Monitoring application server performance is crucial for keeping services running smoothly and preventing disruptions to your business. It involves tracking metrics such as CPU usage, memory consumption, disk activity, and network utilisation so that bottlenecks can be spotted quickly.

This process becomes more challenging as server infrastructure and surrounding networks become more complex and dispersed. A successful monitoring strategy requires identifying key performance indicators (KPIs) specific to the server's function, establishing baselines for these KPIs, and utilising tools to collect and analyse data.

Additionally, monitoring server performance helps with capacity planning, cybersecurity threat detection, and understanding system resource usage, enabling better management of server capacity.

Several tools are available, such as Datadog, Paessler PRTG Network Monitor, and ManageEngine Applications Manager, which offer features like anomaly detection, application dependency mapping, and automated device discovery. These tools provide valuable insights to optimise server performance and prevent issues.

Monitoring CPU usage

Understanding CPU Usage

CPU usage is the percentage of time the CPU spends executing work rather than sitting idle. A CPU can be in one of three states: idle, busy, or waiting for input/output (I/O). In a single-processor system, the CPU is rarely idle because it executes operating system tasks even when no applications are running; in a multi-processor system, you may find idle processors if only a few of them are in use. The busy state is when the CPU is executing a process, and the waiting state is when the CPU is waiting to receive data from, or send data to, another component.
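
To make these states concrete, the short sketch below samples how the CPU's time is split between busy work, idleness, and waiting on I/O. It is a minimal illustration that assumes the third-party psutil package is installed; the iowait figure is only reported on some platforms (notably Linux).

    import psutil

    # Sample the share of CPU time in each state over a one-second window.
    t = psutil.cpu_times_percent(interval=1)

    busy = t.user + t.system            # executing processes
    idle = t.idle                       # nothing to run
    iowait = getattr(t, "iowait", 0.0)  # waiting for I/O (reported on Linux)

    print(f"busy {busy:.1f}%  idle {idle:.1f}%  I/O wait {iowait:.1f}%")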

Tools for Monitoring CPU Usage

Most operating systems have built-in tools for monitoring CPU usage. For instance, Windows users can use the Task Manager or the Resource Monitor to view CPU consumption and identify processes or applications causing high CPU usage. Additionally, the Performance Monitor can be used to collect performance counter logs during high CPU usage. For more advanced capabilities, third-party monitoring tools like Scalyr are recommended. Scalyr provides live tailing, log parsers, real-time alerts, and support for various data sources.
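
As a scriptable, cross-platform counterpart to the Task Manager view, the hedged sketch below lists the processes currently consuming the most CPU. It again assumes the psutil package; the one-second sampling window and the top-five cut-off are arbitrary choices.

    import psutil

    # Prime the per-process CPU counters, wait one interval, then read them.
    procs = list(psutil.process_iter(["pid", "name"]))
    for p in procs:
        try:
            p.cpu_percent(None)
        except psutil.Error:
            pass
    psutil.cpu_percent(interval=1)  # block for one sampling interval

    usage = []
    for p in procs:
        try:
            usage.append((p.cpu_percent(None), p.info["pid"], p.info["name"]))
        except psutil.Error:
            pass  # process exited or access was denied

    for pct, pid, name in sorted(usage, reverse=True)[:5]:
        print(f"{pct:5.1f}%  pid={pid}  {name}")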

Best Practices for Monitoring CPU Usage

  • Monitor continuously: Checking CPU usage at a single point in time is not sufficient. Continuously monitor CPU usage to identify any spikes or sudden changes, which may indicate an issue.
  • Store and analyse data: Collect CPU usage data at regular intervals and visualise it using graphs. Look for trends and anomalies to identify potential problems.
  • Optimise code: If CPU usage is high even for short, simple programs, optimise the code to improve efficiency and reduce CPU usage.
  • Manage resources: Ensure that CPU usage is neither too low nor too high. Consistently low usage points to over-provisioned, wasted capacity, while very high usage can slow the whole system down.
  • Address high CPU usage: If CPU usage is consistently high (e.g., 80% or greater), identify and address the cause to improve system performance; a simple threshold check is sketched after this list. Common causes include processes with high CPU requirements, background processes, malware, and unoptimised code.
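
The sketch below ties these practices together: it polls CPU usage at a fixed interval, keeps a rolling history for trend analysis, and raises an alert when readings stay above a threshold. It assumes the psutil package, and the 80% threshold, five-second interval, and three-sample rule are illustrative rather than prescriptive.

    from collections import deque

    import psutil

    THRESHOLD = 80.0             # percent; tune to your own baseline
    INTERVAL = 5                 # seconds between samples
    history = deque(maxlen=60)   # rolling window of recent readings

    while True:
        usage = psutil.cpu_percent(interval=INTERVAL)  # blocks for INTERVAL seconds
        history.append(usage)

        # Alert only when the last three samples all exceed the threshold,
        # so a brief spike does not trigger a false alarm.
        if len(history) >= 3 and all(s >= THRESHOLD for s in list(history)[-3:]):
            print(f"ALERT: sustained CPU usage {usage:.1f}% >= {THRESHOLD}%")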

Tracking disk I/O metrics

Disk Busy Time

Disk busy time measures the percentage of time the disk is active. A high value here indicates that your requests to access the disk are piling up, which can impact the performance of the server. Monitoring this metric helps identify potential delays in processing requests.

Input/Output Operations (I/Ops)

I/Ops indicate the workload on the disk drive. By monitoring this metric, you can understand how much work your disk is performing. High I/Ops can suggest that your server is handling a heavy workload, which may impact its responsiveness.

Disk Read/Write

This metric measures the time taken to read or write blocks of data from the disk. Ideally, you want a lower value here, as it indicates that the disk can read and write data quickly. Slow read/write speeds can impact the overall performance of the server.

Disk Queue Length

The disk queue length measures the number of I/O requests waiting to be serviced. For optimal performance, you want to keep this metric as low as possible. A long queue suggests that the disk cannot keep up with incoming requests, leading to potential delays.
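
These four metrics can be approximated from raw disk counters. The sketch below takes two samples of the system-wide counters and derives I/Ops and read/write throughput; where the platform exposes it (Linux, for example), it also reports disk busy time. It assumes the third-party psutil package and a five-second sampling interval.

    import time

    import psutil

    INTERVAL = 5  # seconds between the two samples

    before = psutil.disk_io_counters()
    time.sleep(INTERVAL)
    after = psutil.disk_io_counters()

    iops = ((after.read_count - before.read_count)
            + (after.write_count - before.write_count)) / INTERVAL
    read_mb = (after.read_bytes - before.read_bytes) / INTERVAL / 1_000_000
    write_mb = (after.write_bytes - before.write_bytes) / INTERVAL / 1_000_000
    print(f"I/Ops: {iops:.0f}  read: {read_mb:.2f} MB/s  write: {write_mb:.2f} MB/s")

    # busy_time (milliseconds spent servicing requests) is platform-dependent;
    # when present it yields disk busy time as a percentage of the interval.
    if hasattr(after, "busy_time"):
        busy_pct = (after.busy_time - before.busy_time) / (INTERVAL * 1000) * 100
        print(f"disk busy time: {busy_pct:.1f}%")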

Tools for Tracking Disk I/O

There are several tools available to help you track disk I/O metrics:

  • Resource Monitor: The Disk tab in Resource Monitor (built into Windows Vista and later) provides detailed information about disk I/O.
  • Process Explorer: This tool allows you to add relevant I/O column counters to view disk I/O by application.
  • Process Monitor: Part of the Windows Sysinternals suite, Process Monitor can track process I/O activity to the file system, registry, or network.
  • DiskMon: Another tool from the Windows Sysinternals suite, DiskMon logs disk activity, including the request type, sector, and duration of each operation.
  • PERFMON (Performance Monitor): Windows' built-in counter-based tool can chart per-process data and I/O operations per second, complementing the event-level view from Process Monitor.

By utilising these tools and tracking the aforementioned metrics, you can effectively monitor disk I/O performance and identify any areas that may require optimisation or upgrades.
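
If you want a scriptable view of per-process I/O similar to what Process Explorer displays, the sketch below reads each process's cumulative I/O counters. It assumes psutil; note that io_counters() is not available on every platform (it is absent on macOS, for instance), and access to some system processes may be denied.

    import psutil

    rows = []
    for p in psutil.process_iter(["pid", "name"]):
        try:
            io = p.io_counters()  # cumulative reads and writes since process start
            rows.append((io.read_bytes + io.write_bytes, p.info["pid"], p.info["name"]))
        except (psutil.Error, AttributeError):
            pass  # inaccessible process or unsupported platform

    # Top five processes by total bytes read and written.
    for total, pid, name in sorted(rows, reverse=True)[:5]:
        print(f"{total / 1_000_000:10.1f} MB  pid={pid}  {name}")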

Measuring network latency

There are several methods and tools available to measure network latency:

  • Ping and Traceroute: These are simple latency tests that can be run from a PC to check internet and local network latency. Ping measures the round-trip time (RTT) between your computer and the target server, giving a view of total latency, while traceroute identifies the intermediate routers between source and destination, offering more insight into where latency originates (a scripted equivalent of the ping test is sketched after this list).
  • OWAMP (One-Way Active Measurement Protocol): OWAMP provides more precise latency measurements by testing latency in one direction using UDP packets. It requires OWAMP testing capabilities at both ends of the network connection and proper clock synchronization.
  • TWAMP (Two-Way Active Measurement Protocol): TWAMP tests latency in both directions simultaneously, first establishing a connection using TCP and then monitoring latency using UDP packets. Similar to OWAMP, it also requires testing capabilities at both ends.
  • IPerf: iPerf is used for network speed testing and primarily measures throughput and packet transmission rather than latency.
  • Network Analyzers: Tools like Wireshark, Microsoft Network Monitor, Colasoft Capsa, Nmap, and NetWitness provide detailed insights into network traffic and can help identify patterns and trends related to latency issues.
  • Network Performance Monitoring Tools: These tools offer continuous latency monitoring, real-time alerts, accurate diagnostics, and troubleshooting assistance. They provide a comprehensive view of network performance and help ensure SLA compliance.
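
As a scripted counterpart to the ping-style test mentioned above, the sketch below times a handful of TCP connections to a target host and reports the round-trip latency. The host and port are placeholders to replace with your own, and a TCP handshake is only a rough stand-in for an ICMP ping.

    import socket
    import time

    HOST, PORT = "example.com", 443  # placeholder target
    SAMPLES = 5

    latencies = []
    for _ in range(SAMPLES):
        start = time.perf_counter()
        with socket.create_connection((HOST, PORT), timeout=5):
            pass  # connection established; close it immediately
        latencies.append((time.perf_counter() - start) * 1000)

    print(f"min {min(latencies):.1f} ms  "
          f"avg {sum(latencies) / len(latencies):.1f} ms  "
          f"max {max(latencies):.1f} ms")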

It is important to choose the right tool or combination of tools that align with specific needs and requirements when measuring network latency.

Analysing server logs

Centralised Log Management:

Use a centralised log management tool to aggregate logs from multiple sources, including application servers, databases, and web servers. Tools like ManageEngine EventLog Analyzer offer near real-time log collection and parsing, making it easier to analyse large volumes of log data.
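
Aggregation also depends on the application servers actually shipping their logs to a central endpoint. As one minimal, hedged example, Python's standard logging module can forward records to a syslog-compatible collector; the address below is a placeholder for whatever endpoint your log management tool listens on.

    import logging
    import logging.handlers

    # Forward application logs to a central syslog collector (placeholder address).
    handler = logging.handlers.SysLogHandler(address=("logs.example.internal", 514))
    handler.setFormatter(logging.Formatter("app-server %(levelname)s %(message)s"))

    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)

    logger.info("order service started")
    logger.error("database connection pool exhausted")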

Log Analysis and Pattern Recognition:

Analyse log data to identify patterns and trends that indicate system performance issues. Look for slow response times, memory problems (such as out-of-memory errors or memory leaks), deadlocks, high resource usage, and slow database queries. Set up alerts and notifications for critical events so that prompt action can be taken.
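
A minimal version of this kind of pattern scan is sketched below: it walks an application log, counts out-of-memory errors and deadlocks, and flags requests whose logged response time exceeds a threshold. The log path, line format, and 2,000 ms threshold are all assumptions to adapt to your own environment.

    import re

    LOG_PATH = "app-server.log"  # hypothetical log file
    SLOW_MS = 2000               # assumed slow-response threshold

    # Assumed line format: "... GET /path completed in 1234 ms"
    time_re = re.compile(r"(\S+ \S+) completed in (\d+) ms")

    slow, oom, deadlocks = [], 0, 0
    with open(LOG_PATH, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if "OutOfMemoryError" in line:
                oom += 1
            if "deadlock" in line.lower():
                deadlocks += 1
            m = time_re.search(line)
            if m and int(m.group(2)) >= SLOW_MS:
                slow.append((int(m.group(2)), m.group(1)))

    print(f"out-of-memory errors: {oom}, deadlocks: {deadlocks}")
    for ms, request in sorted(slow, reverse=True)[:10]:
        print(f"{ms} ms  {request}")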

Security and Compliance:

Server logs can help detect security threats and ensure compliance with regulatory mandates. Monitor logs for suspicious activities, such as sudden increases in web traffic or SYN requests, which could indicate a denial-of-service attack. Comply with standards like PCI DSS, FISMA, HIPAA, and more using predefined reports and alerts.
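
As a hedged illustration of spotting a traffic spike, the sketch below counts requests per minute in a web server access log and flags minutes that far exceed the average. The log path, timestamp pattern (a common combined-log layout), and spike factor are assumptions.

    import re
    from collections import Counter

    LOG_PATH = "access.log"  # hypothetical access log
    # Assumed timestamp format, e.g. [12/Mar/2024:14:05:07 +0000]
    ts_re = re.compile(r"\[(\d+/\w+/\d+:\d+:\d+)")

    per_minute = Counter()
    with open(LOG_PATH, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            m = ts_re.search(line)
            if m:
                per_minute[m.group(1)] += 1

    if per_minute:
        average = sum(per_minute.values()) / len(per_minute)
        for minute, count in per_minute.items():
            if count > 5 * average:  # arbitrary spike factor
                print(f"possible traffic spike at {minute}: {count} requests")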

Application Performance Metrics:

Monitor application performance metrics such as requests per second, average and peak response times, hardware utilisation, thread count, disk usage, CPU usage, memory usage, and network bandwidth. These metrics help identify bottlenecks and optimise application performance.
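
Two of these figures, requests per second and average/peak response time, can be derived from the same kind of timed log lines used earlier. The sketch below assumes that format and a one-hour log window; both are placeholders.

    import re

    LOG_PATH = "app-server.log"  # hypothetical log file
    time_re = re.compile(r"completed in (\d+) ms")

    response_times = []
    with open(LOG_PATH, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            m = time_re.search(line)
            if m:
                response_times.append(int(m.group(1)))

    if response_times:
        window_seconds = 3600  # assume the log covers one hour
        print(f"requests/sec: {len(response_times) / window_seconds:.2f}")
        print(f"avg response: {sum(response_times) / len(response_times):.0f} ms")
        print(f"peak response: {max(response_times)} ms")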

User Experience Metrics:

In addition to application performance, consider user experience metrics such as server uptime and HTTP server error rate. Aim for high uptime percentages (e.g., 99.9%) and minimise internal server errors to ensure a seamless user experience.
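
Both figures are simple ratios, illustrated below with made-up numbers: uptime is observed up-time over the expected availability window, and the error rate is 5xx responses over total requests.

    # Illustrative numbers only.
    expected_seconds = 30 * 24 * 3600  # a 30-day availability window
    downtime_seconds = 25 * 60         # 25 minutes of outages in that window

    uptime_pct = (expected_seconds - downtime_seconds) / expected_seconds * 100
    print(f"uptime: {uptime_pct:.3f}%")  # about 99.942%, above a 99.9% target

    total_requests = 1_250_000
    server_errors = 1_375              # HTTP 5xx responses

    print(f"HTTP server error rate: {server_errors / total_requests * 100:.2f}%")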

Visualisation and Dashboards:

Use charts and dashboards to visualise log data and monitor application performance. Tools like SolarWinds Loggly offer pre-configured and custom dashboards to monitor specific applications or sections of your application stack. This enables collaboration and facilitates reporting to stakeholders.

By following these guidelines and utilising appropriate tools, you can effectively analyse server logs to optimise application server performance, troubleshoot issues, and enhance the overall user experience.

Monitoring hardware utilisation

  • CPU and Memory Monitoring: Keep a close eye on the CPU utilisation and memory resources. High CPU usage and low free memory space can lead to degraded application performance. Monitor metrics such as CPU process count, CPU thread count, CPU interrupt time, and available free memory. These metrics will help identify resource usage issues and fix them promptly.
  • Disk Activity Analysis: Analyse disk activity metrics such as disk busy time, input/output operations (I/O), disk read/write speeds, and disk queue length. High disk busy time and I/O operations indicate a heavy workload on the disk drive. Monitoring these metrics can help identify potential bottlenecks and ensure smooth data processing.
  • Server Uptime and Availability: Continuously monitor server uptime to ensure the system remains operational. Compare the expected server availability period with the server uptime figure to identify any system failures or disruptions. This helps maintain high availability and minimise downtime.
  • Network Interface Monitoring: Monitor the network interface for bandwidth usage, throughput, and potential bottlenecks. Keep track of input/output (I/O) activities on the network card to detect hardware failures or overloading issues. This information is crucial for optimising server performance and ensuring efficient data transmission.
  • Hardware Utilisation Tools: Utilise tools such as ManageEngine OpManager, which can monitor hardware utilisation on both Windows and Linux servers. OpManager provides insights into CPU, memory, and disk utilisation, helping identify performance bottlenecks. It also offers customisable dashboards and alerts to keep you informed about the server's health.
  • Proactive Monitoring and Notifications: Adopt a proactive approach to hardware utilisation monitoring. Set up automated alerts to notify you when specific thresholds are met or when resources are running low; for example, configure alerts for high CPU usage or low memory availability (a minimal check of this kind is sketched after this list). These alerts can be sent via email or SMS, allowing you to take timely action.
  • Virtual Infrastructure Tracking: In addition to physical hardware, monitor the activities of virtual infrastructure, especially if you're using virtual machines (VMs). Tools like OpManager can identify virtualisations and map their infrastructure, including dependent VMs. This ensures that you have visibility into all aspects of your server environment.
  • Contextual Analysis: When monitoring hardware utilisation, consider the context of your server environment. Evaluate the executed processes and regularly performed tasks to identify any patterns or correlations with performance issues. This can help in optimising resource allocation and troubleshooting.
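
As referenced in the proactive-monitoring point above, a minimal scripted check might look like the sketch below: it gathers CPU, memory, disk, and network-interface figures with the third-party psutil package and prints a warning when assumed thresholds are crossed. The thresholds, the monitored mount point, and the use of print in place of an email or SMS notification are all placeholders.

    import psutil

    CPU_LIMIT = 85.0   # illustrative thresholds, in percent
    MEM_LIMIT = 90.0
    DISK_LIMIT = 90.0

    cpu = psutil.cpu_percent(interval=1)
    mem = psutil.virtual_memory()
    disk = psutil.disk_usage("/")  # monitored mount point is an assumption
    net = psutil.net_io_counters()

    print(f"cpu {cpu:.1f}%  memory {mem.percent:.1f}%  disk {disk.percent:.1f}%")
    print(f"network: {net.bytes_sent} bytes sent, {net.bytes_recv} bytes received")

    # In a real deployment these warnings would feed an email/SMS alerting channel.
    if cpu >= CPU_LIMIT:
        print(f"WARNING: CPU usage {cpu:.1f}% exceeds {CPU_LIMIT}%")
    if mem.percent >= MEM_LIMIT:
        print(f"WARNING: memory usage {mem.percent:.1f}% exceeds {MEM_LIMIT}%")
    if disk.percent >= DISK_LIMIT:
        print(f"WARNING: disk usage {disk.percent:.1f}% exceeds {DISK_LIMIT}%")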

Frequently asked questions

What are the key components of a successful server monitoring strategy?

Identify the key metrics to target, baseline those metrics so that server performance can be properly interpreted for alerting, and reap additional value from the metrics through reporting.

Which KPIs should be tracked?

Examples of KPIs include CPU, memory, disk, network, and process utilisation; database instance activity; query behaviour; user activity; network bandwidth; busy and idle threads; and throughput.

How do you monitor the performance of an application server?

You can monitor the performance of an application server by tracking resource issues and transaction volume.
