Performance monitoring is essential for the secure and reliable operation of a cloud environment.
Data on the performance of the underlying components may provide early indications of hardware failure.
Traditionally, four key subsystems are recommended for monitoring in cloud environments:
- Network: Excessive dropped packets
- Disk: Full disk or slow reads and writes to the disks (input/output operations per second [IOPS])
- Memory: Excessive memory usage or full utilization of available memory allocation
- CPU: Excessive CPU utilization
Familiarize yourself with these four subsystems and learn about the vendor-specific monitoring recommendations, best practice guidelines, and thresholds for performance as required.
Each vendor defines its own thresholds and acceptable operating ranges for its products and platforms. As a general rule, for each of the four subsystems, a lower value measured over time indicates better performance, although this depends on the specific parameters of the monitored item.
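As a minimal sketch of how samples from the four subsystems might be compared against thresholds over a measurement window, consider the Python below. The metric names and threshold values are hypothetical and chosen only for illustration; real values should come from the vendor's guidance.

```python
# Hypothetical thresholds for illustration only; real limits come from vendor guidance.
THRESHOLDS = {
    "network_dropped_packets_pct": 1.0,
    "disk_used_pct": 90.0,
    "memory_used_pct": 85.0,
    "cpu_used_pct": 80.0,
}


def find_breaches(samples_by_metric, thresholds=THRESHOLDS):
    """Return {metric: average} for each metric whose average over the
    measurement window exceeds its threshold (lower is treated as better)."""
    breaches = {}
    for metric, samples in samples_by_metric.items():
        limit = thresholds.get(metric)
        if limit is None or not samples:
            continue  # unknown metric or empty window: nothing to evaluate
        average = sum(samples) / len(samples)
        if average > limit:
            breaches[metric] = average
    return breaches


window = {
    "cpu_used_pct": [70, 95, 99],
    "memory_used_pct": [50, 60],
}
print(find_breaches(window))
```

Averaging over a window rather than reacting to a single sample reflects the "measurement over time" idea in the text; other aggregations, such as percentiles, may suit some metrics better.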
Cloud Outsourcing Monitoring
Adequate staffing should be allocated for the 24/7 monitoring of the cloud environment.
One option is to outsource the monitoring function to a trusted third party.
Exercise due care and due diligence if you’re pursuing an outsourcing option.
The need to assess risk and manage a vendor relationship in such a critical area for the enterprise means that you must take your time vetting potential cloud monitoring partners.
Use common-sense approaches such as these:
- Having HR check references
- Examining the terms of any SLA or contract being used to govern service terms
- Executing some form of trial of the managed service in question before implementing it in production
Cloud Hardware Performance Monitoring
In cloud environments, regardless of how much virtualized infrastructure you deploy, there is always physical infrastructure underlying it that has to be managed, monitored, and maintained.
Extend your monitoring of the four key subsystems discussed in the previous section to include the physical hosts and infrastructure that the virtualization layer rides on top of.
The same monitoring concepts and thought processes apply, as have already been discussed.
The only difference is the need to add items that exist on the physical plane of these systems, such as CPU temperature, fan speed, and ambient temperature within the data center hosting the physical hosts.
Many of the monitoring systems to be deployed to observe virtualized infrastructure can be used to monitor the physical performance aspects of the hosts as well.
These systems can also be used to alert on thresholds established for performance based on several methods, whether activity or task-based, metric-based, or time-based.
Each vendor has its specific methodologies and tools to be deployed to monitor its infrastructure according to its requirements and recommendations.
Ensure that you are aware of the vendor recommendations and best practices pertinent to their environments, and that they are implemented and followed as required to ensure compliance.
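The difference between a metric-based and a time-based alert on a physical reading such as CPU temperature can be sketched as follows. This is one possible interpretation, not a vendor's method: the function names, the 85-degree limit, and the definition of "time-based" as a breach sustained for a minimum duration are all assumptions made for illustration.

```python
def metric_alert(value, limit):
    """Metric-based: fire as soon as a single reading exceeds the limit."""
    return value > limit


def sustained_alert(readings, limit, min_seconds):
    """Time-based (assumed meaning): fire only if readings stay above the
    limit continuously for at least min_seconds.

    readings is a list of (timestamp_seconds, value) pairs sorted by time.
    """
    breach_start = None
    for timestamp, value in readings:
        if value > limit:
            if breach_start is None:
                breach_start = timestamp  # start of a new breach period
            if timestamp - breach_start >= min_seconds:
                return True
        else:
            breach_start = None  # reading recovered; reset the timer
    return False


# Hypothetical CPU temperature readings in degrees Celsius, one per minute.
cpu_temp = [(0, 70), (60, 86), (120, 88), (180, 90)]
print(metric_alert(cpu_temp[1][1], limit=85))
print(sustained_alert(cpu_temp, limit=85, min_seconds=120))
```

A sustained-breach rule tends to reduce noise from brief spikes, at the cost of reacting later than a single-reading rule.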
Redundant System Architecture
The use of redundant system architecture is an acceptable and standard practice in cloud environments to accomplish the following:
- Allow for additional hardware items to be incorporated directly into the system, either as online real-time components that share the load of the running system or in a hot standby mode
- Allow for a controlled failover to minimize downtime
Work with the vendors that supply the data center infrastructure to fully understand the available options for designing and implementing system resiliency through redundancy.
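To make the hot standby and controlled failover ideas concrete, the following Python is a simplified sketch. The `Node` type, its role names, and the promotion rule are hypothetical; real platforms implement failover with their own health checks and coordination mechanisms.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool
    role: str  # "active", "standby", or "failed" (hypothetical role names)


def controlled_failover(nodes):
    """Keep the active node if it is healthy; otherwise promote the first
    healthy hot standby. Returns the name of the node serving traffic
    afterwards, or None if no healthy node is available."""
    active = next((n for n in nodes if n.role == "active"), None)
    if active is not None and active.healthy:
        return active.name
    standby = next((n for n in nodes if n.role == "standby" and n.healthy), None)
    if standby is None:
        return None
    if active is not None:
        active.role = "failed"  # take the unhealthy node out of service
    standby.role = "active"
    return standby.name


cluster = [
    Node("host-a", healthy=False, role="active"),
    Node("host-b", healthy=True, role="standby"),
]
print(controlled_failover(cluster))
```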
Cloud Performance Monitoring Functions
Many hardware systems offer built-in monitoring functions specific to the hardware itself, separate from any centralized monitoring that the enterprise may engage in.
Be aware of which vendor-specific hardware monitoring capabilities are already bundled or included in the platforms that you are asked to be responsible for.
The use of any vendor-supplied monitoring capabilities to their fullest extent is necessary to maximize system reliability and performance.
Hardware data should be collected along with the data from any external performance monitoring undertaken.
Monitoring hardware may provide early indications of hardware failure and should be treated as a requirement to ensure the stability and availability of all systems being managed.
Some virtualization platforms offer the capability to disable hardware and migrate live data from the failing hardware if certain thresholds are met.
You may need to work with other professionals in the organization on the networking and administration teams to fully understand and plan for the proper usage of these kinds of technology options.
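The threshold-triggered migration described above can be illustrated with a small planning sketch. Everything here is an assumption for illustration: the host dictionary layout, the temperature and memory-error thresholds, and the "most free capacity" placement rule are hypothetical and do not represent any specific virtualization platform's behavior.

```python
def plan_evacuation(hosts, max_temp_c=85.0, max_ecc_errors=100):
    """Plan moves of VMs away from hosts whose hardware readings exceed
    the (hypothetical) thresholds.

    hosts maps a host name to a dict with keys:
      "temp_c", "ecc_errors", "vms" (list of VM names), "free_slots".
    Returns a list of (vm, source_host, target_host) tuples.
    """
    failing = [
        name for name, stats in hosts.items()
        if stats["temp_c"] > max_temp_c or stats["ecc_errors"] > max_ecc_errors
    ]
    # Remaining capacity on hosts that are not flagged as failing.
    free = {n: s["free_slots"] for n, s in hosts.items() if n not in failing}
    moves = []
    for source in failing:
        for vm in hosts[source]["vms"]:
            target = max(free, key=free.get, default=None)
            if target is None or free[target] <= 0:
                break  # no capacity left; remaining VMs need manual attention
            free[target] -= 1
            moves.append((vm, source, target))
    return moves


hosts = {
    "h1": {"temp_c": 92.0, "ecc_errors": 3, "vms": ["vm-a", "vm-b"], "free_slots": 0},
    "h2": {"temp_c": 60.0, "ecc_errors": 0, "vms": ["vm-c"], "free_slots": 1},
    "h3": {"temp_c": 58.0, "ecc_errors": 2, "vms": [], "free_slots": 3},
}
for move in plan_evacuation(hosts):
    print(move)
```

A plan like this only decides where workloads should go; capacity, network, and storage constraints are the kind of detail that the networking and administration teams would need to confirm.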