What is cloud security operations? - Cloud Security and Computing

In cloud security operations management, there are many aspects and processes of operations that need to be managed, and they often relate to each other.

Cloud security operations management includes the following:

Information security management

Configuration management

Change management

Incident management

Problem management

Release and deployment management

Service-level management

Availability management

Capacity management

Business continuity management (BCM)

Continual service improvement management

The following sections explore each of these types of management and then look more closely at how they relate to each other.

Information Security

Organizations should have a documented and operational information security management plan that generally covers the following areas:

Security management

Security policy

Information security organization

Asset management

Human resources security

Physical and environmental security

Communications and operations management

Access control

Information systems acquisition, development, and maintenance

Provider and customer responsibilities

Configuration Management

Configuration management aims to maintain information about the configuration items (CIs) required to deliver an IT service, including their relationships.

As the “Release and Deployment Management” section shows, there are lateral ties between many of the management areas discussed here.

These lateral connections are extremely important because they form the mutually reinforcing web that supports the proper documentation and operation of the cloud infrastructure.

In the case of configuration management, the ties to change management and availability management are especially important. You should develop a configuration-management process for the cloud infrastructure.

The process should include policies and procedures for each of the following:

The development and implementation of new configurations that should apply to the hardware and software configurations of the cloud environment

Quality evaluation of configuration changes and compliance with established security baselines

Changing systems, including testing and deployment procedures, that should include adequate oversight of all configuration changes

The prevention of any unauthorized changes in system configurations
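To make the baseline-compliance and unauthorized-change items above more concrete, here is a minimal sketch of a drift check. It assumes a CI's settings can be exported as a flat dictionary; the setting names and values are hypothetical. A hash of the canonical configuration gives a cheap "has anything changed?" test, and a key-by-key comparison then reports exactly which settings drifted from the approved baseline.

```python
import hashlib
import json


def fingerprint(settings: dict) -> str:
    """Hash a configuration deterministically (keys sorted) so equal configs always match."""
    canonical = json.dumps(settings, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def find_drift(baseline: dict, running: dict) -> dict:
    """Return {setting: (expected, actual)} for every setting that differs from the baseline."""
    drift = {}
    for key in baseline.keys() | running.keys():
        expected, actual = baseline.get(key), running.get(key)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


# Hypothetical security baseline and the running configuration of one CI.
baseline = {"ssh_password_auth": False, "tls_min_version": "1.2", "audit_logging": True}
running = {"ssh_password_auth": True, "tls_min_version": "1.2", "audit_logging": True}

if fingerprint(running) != fingerprint(baseline):
    for setting, (expected, actual) in sorted(find_drift(baseline, running).items()):
        print(f"drift on {setting}: expected {expected!r}, found {actual!r}")
# drift on ssh_password_auth: expected False, found True
```

Any drift found this way would then be checked against change records: drift with no approved change behind it is exactly the unauthorized change the process is meant to prevent.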

Change Management

Change management is an approach that allows organizations to manage and control the impact of change through a structured process.

The primary goal of change management within a project-management context is to create and implement a series of processes that allow changes to the scope of a project to be formally introduced and approved.

Change Management Objectives

Change management has several objectives:

Respond to a customer’s changing business requirements while maximizing value and reducing incidents, disruption, and rework.

Respond to business and IT requests for change that align services with business needs.

Ensure that changes are recorded and evaluated.

Ensure that authorized changes are prioritized, planned, tested, implemented, documented, and reviewed in a controlled manner.

Ensure that all changes to CIs are recorded in the configuration management system.

Optimize overall business risk. It is often correct to minimize business risk, but sometimes it is appropriate to knowingly accept a risk because of the potential benefit.
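The objectives above imply that a change moves through a fixed sequence: recorded, evaluated, authorized, planned, tested, implemented, and reviewed. The following is a minimal sketch of that lifecycle as a state machine; the state names and the `ChangeRequest` class are hypothetical, not taken from any particular ITSM tool. The point it illustrates is that a controlled process refuses transitions that skip a step, and that the history doubles as the documentation trail.

```python
from enum import Enum


class ChangeState(Enum):
    RECORDED = "recorded"
    EVALUATED = "evaluated"
    AUTHORIZED = "authorized"
    PLANNED = "planned"
    TESTED = "tested"
    IMPLEMENTED = "implemented"
    REVIEWED = "reviewed"
    REJECTED = "rejected"


# Allowed transitions: a change cannot skip evaluation, authorization, planning, or testing.
ALLOWED = {
    ChangeState.RECORDED: {ChangeState.EVALUATED},
    ChangeState.EVALUATED: {ChangeState.AUTHORIZED, ChangeState.REJECTED},
    ChangeState.AUTHORIZED: {ChangeState.PLANNED},
    ChangeState.PLANNED: {ChangeState.TESTED},
    ChangeState.TESTED: {ChangeState.IMPLEMENTED},
    ChangeState.IMPLEMENTED: {ChangeState.REVIEWED},
    ChangeState.REVIEWED: set(),
    ChangeState.REJECTED: set(),
}


class ChangeRequest:
    def __init__(self, rfc_id: str, summary: str):
        self.rfc_id = rfc_id
        self.summary = summary
        self.state = ChangeState.RECORDED
        self.history = [ChangeState.RECORDED]  # the documented audit trail of the change

    def advance(self, new_state: ChangeState) -> None:
        """Move to new_state, refusing any transition the process does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.rfc_id}: cannot move from {self.state.value} to {new_state.value}"
            )
        self.state = new_state
        self.history.append(new_state)


rfc = ChangeRequest("RFC-001", "Enable TLS 1.3 on the load balancer")
rfc.advance(ChangeState.EVALUATED)
rfc.advance(ChangeState.AUTHORIZED)
try:
    rfc.advance(ChangeState.IMPLEMENTED)  # skipping planning and testing is refused
except ValueError as err:
    print(err)  # RFC-001: cannot move from authorized to implemented
```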

Change Management Process

You should develop or augment a change-management process for the cloud infrastructure to address any cloud-specific components or components that may not have been captured under historical processes.

You may not be a change-management expert, but you do still bear responsibility for change and its impact on the organization.

To make the best possible use of change management within the organization, try to partner with the enterprise's project management professionals (PMPs) to incorporate the cloud infrastructure and service offerings into an existing change-management program, if possible.

The existence of a project management office (PMO) is usually a strong indication of an organization’s commitment to a formal change-management process that is fully developed and broadly communicated and adopted.

A change-management process focused on the cloud should include policies and procedures for each of the following:

The development and acquisition of new infrastructure and software

Quality evaluation of new software and compliance with established security baselines

Changing systems, including testing and deployment procedures, which should include adequate oversight of all changes

Preventing the unauthorized installation of software and hardware

Preventing the Unauthorized Installation of Software: Critical Security Control Implementation Example

The cloud security operations management professionals should be focused on all the change-management activities outlined previously and how they will be implemented for the cloud within the framework of the enterprise architecture.

At this point, you may be asking yourself, “What exactly does that mean, and just how am I supposed to do that?” Well, the topic of preventing the unauthorized installation of the software will be used as an example of how to answer those questions.

Although there are many acceptable ways to effectively implement a system that prevents the unauthorized installation of software, the need to do so in a documented and auditable manner is important.

To that end, the use of the CIS/SANS Critical Security Controls provides a well-documented solution that allows the cloud security operations management professionals to actively manage (inventory, track, and correct) all software on the network so that only authorized software is installed and can execute and that unauthorized and unmanaged software is found and prevented from installation or execution.
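The "inventory, track, and correct" part of this control boils down to comparing what is installed on each system with what is authorized. Here is a minimal sketch of that comparison, assuming an inventory agent already reports installed package names per host; the host names, package names, and allowlist are hypothetical. It only finds unauthorized software; actually preventing execution is the job of an application-allowlisting tool.

```python
def unauthorized_software(inventory: dict[str, set[str]], allowlist: set[str]) -> dict[str, list[str]]:
    """Return, per host, the installed packages that are not on the allowlist."""
    findings = {}
    for host, installed in inventory.items():
        extra = sorted(installed - allowlist)  # set difference: installed but not authorized
        if extra:
            findings[host] = extra
    return findings


allowlist = {"openssh-server", "nginx", "python3"}
inventory = {
    "web-01": {"openssh-server", "nginx", "python3"},
    "web-02": {"openssh-server", "nginx", "netcat", "torrent-client"},
}
for host, packages in unauthorized_software(inventory, allowlist).items():
    print(host, packages)
# web-02 ['netcat', 'torrent-client']
```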

Cloud security operations management professionals need to evaluate the nine mechanisms listed in Table 5.6 and decide which, if any, are relevant for use in the organizations they manage.

Once the mechanisms have been selected, they must devise a plan to evaluate, acquire, implement, manage, monitor, and optimize the relevant technologies involved.

The plan must then be submitted to senior management for approval to ensure that there is support for the recommended course of action and for the allocated budget (if necessary), and that the project aligns with any relevant strategic objectives and business drivers.

Once senior management has approved the plan, the cloud security operations management professionals can engage in the various activities outlined, in the proper order, to ensure successful implementation of the plan according to the timeline specified and agreed to.

Once the plan has been successfully executed and the new systems are in place and operational, the cloud security operations management professionals must think about monitoring and validation to ensure that the system is compliant with any relevant security policies as well as regulatory requirements and that it is effective and operating as designed.

A critical element of this type of solution is the ability to automate many, if not all, of the monitoring processes, as well as the workflows that are triggered when an unauthorized software installation is detected and blocked.

These objectives can be achieved as described in the following sections.

CSC 2 Effectiveness Metrics

When testing the effectiveness of the automated implementation of this control, organizations should determine the following:

The amount of time it takes to detect new software installed on the organization’s systems

The amount of time it takes the scanning functions to alert the organization’s administrators when an unauthorized application has been discovered on a system

The amount of time it takes for an alert to be generated when a new application has been discovered on a system

Whether the scanning function identifies the department, location, and other critical details about the unauthorized software that has been detected
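The effectiveness questions above are mostly time differences between events that the tooling already records. A minimal sketch follows, assuming each unauthorized installation produces a record with install, detection, and alert timestamps plus the department and location found by the scanner; the `DetectionRecord` fields are hypothetical. It reports worst-case (maximum) times, since a control is only as fast as its slowest detection.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DetectionRecord:
    """One unauthorized installation and how the scanning function handled it (hypothetical fields)."""
    host: str
    installed_at: datetime
    detected_at: Optional[datetime] = None
    alerted_at: Optional[datetime] = None
    department: Optional[str] = None
    location: Optional[str] = None


def effectiveness(records: list[DetectionRecord]) -> dict:
    """Worst-case detection and alerting times, plus detections that lack ownership details."""
    detected = [r for r in records if r.detected_at is not None]
    alerted = [r for r in detected if r.alerted_at is not None]
    return {
        "undetected": len(records) - len(detected),
        "max_time_to_detect": max((r.detected_at - r.installed_at for r in detected), default=None),
        "max_time_to_alert": max((r.alerted_at - r.detected_at for r in alerted), default=None),
        "missing_details": [r.host for r in detected if not (r.department and r.location)],
    }


t0 = datetime(2024, 1, 1, 9, 0)
records = [
    DetectionRecord("web-02", t0, t0 + timedelta(minutes=20), t0 + timedelta(minutes=25), "Sales", "HQ"),
    DetectionRecord("db-01", t0, t0 + timedelta(hours=2)),  # detected, never alerted, no details
]
metrics = effectiveness(records)
print(metrics["max_time_to_detect"], metrics["max_time_to_alert"], metrics["missing_details"])
# 2:00:00 0:05:00 ['db-01']
```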

CSC 2 Automation Metrics

Organizations should gather the following information to automate the collection of relevant data from these systems:

The total number of unauthorized applications located on the organization’s business systems

The average amount of time it takes to remove unauthorized applications from the organization’s business systems

The total number of the organization’s business systems that are not running whitelisting software

The total number of applications that have been recently blocked from executing by the organization’s whitelisting software
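The four automation metrics can be computed from three data sources: the list of unauthorized findings, the host inventory, and the allowlisting tool's block events. The sketch below assumes those are available in the simple shapes shown; the `Finding` record, host names, and the 7-day meaning of "recently" are all assumptions for illustration. The average removal time only counts findings that have actually been removed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Finding:
    """An unauthorized application found on a business system (hypothetical record)."""
    host: str
    application: str
    found_at: datetime
    removed_at: Optional[datetime] = None  # None while the application is still present


def automation_metrics(findings: list[Finding], all_hosts: set[str], allowlisted_hosts: set[str],
                       blocked_at: list[datetime], now: datetime,
                       window: timedelta = timedelta(days=7)) -> dict:
    removal_times = [f.removed_at - f.found_at for f in findings if f.removed_at is not None]
    return {
        "unauthorized_apps": len(findings),
        "avg_time_to_remove": (sum(removal_times, timedelta()) / len(removal_times)
                               if removal_times else None),
        "hosts_without_allowlisting": len(all_hosts - allowlisted_hosts),
        # "Recently" is interpreted as the last `window` (7 days by default, an assumption).
        "recently_blocked": sum(1 for t in blocked_at if now - t <= window),
    }


metrics = automation_metrics(
    findings=[
        Finding("web-02", "netcat", datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 11)),
        Finding("web-02", "torrent-client", datetime(2024, 1, 2, 9), datetime(2024, 1, 2, 13)),
        Finding("db-01", "netcat", datetime(2024, 1, 7, 9)),  # not yet removed
    ],
    all_hosts={"web-01", "web-02", "db-01"},
    allowlisted_hosts={"web-01", "web-02"},
    blocked_at=[datetime(2024, 1, 7, 8), datetime(2023, 12, 20, 8)],
    now=datetime(2024, 1, 8, 12),
)
print(metrics)
```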

The cloud security operations management professionals also need to create some sort of ongoing, periodic sampling system that allows for the testing of the effectiveness of the system deployed in its entirety.

The specific approach to be used to achieve this is open to discussion, but the implemented solution should use a predetermined number of randomly sampled endpoints deployed in the production network and assess the responses generated by an unauthorized software deployment to them within a specified period.

As a follow-up, the automated messaging and logging generated by the unauthorized deployment need to be monitored and evaluated as well.

If failures are detected, these need to be logged and investigated.

A failure in this case is defined as the unauthorized software package being successfully deployed to the targeted endpoint without a notification being generated and sent and without the activity being logged.

If blocking is not allowed or is unavailable, the cloud security operations management professionals must verify that unauthorized software is detected and results in a notification to alert the security team.
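The periodic sampling test can be sketched as two small pieces: choose a random sample of production endpoints, then classify each test deployment's outcome against the failure definition. This is a minimal sketch under stated assumptions: the endpoint names and the `ProbeResult` record are hypothetical, and the outcomes would in practice come from deployment tooling and log or alert data. It applies a strict reading of the failure definition, flagging any run where the package installed and either the notification or the log entry is missing. A run where blocking is unavailable but the install was both notified and logged passes, matching the fallback requirement above.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    """Result of pushing the (hypothetical) test package to one sampled endpoint."""
    endpoint: str
    installed: bool  # the package ended up installed, i.e. it was not blocked
    notified: bool   # an alert was generated and sent to the security team
    logged: bool     # the attempt was written to the log


def pick_sample(endpoints: list[str], k: int, seed: Optional[int] = None) -> list[str]:
    """Choose k distinct endpoints at random for this test cycle."""
    return random.Random(seed).sample(endpoints, k)


def is_failure(result: ProbeResult) -> bool:
    # Strict reading: the package installed and the notification or the log entry
    # (or both) is missing.
    return result.installed and not (result.notified and result.logged)


endpoints = [f"prod-{n:03d}" for n in range(200)]
sample = pick_sample(endpoints, k=10, seed=2024)
# Simulated outcomes: every endpoint blocked the package except one that was only logged.
results = [ProbeResult(ep, installed=False, notified=True, logged=True) for ep in sample]
results[0] = ProbeResult(sample[0], installed=True, notified=False, logged=True)
failures = [r.endpoint for r in results if is_failure(r)]
print(f"{len(failures)} failure(s) to log and investigate: {failures}")
```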

Incident Management

Incident management describes the activities of an organization to identify, analyze, and correct hazards to prevent a future recurrence.

Within a structured organization, an incident response team (IRT) or an incident management team (IMT) typically addresses these types of incidents.

These teams are often designated beforehand or during the event and are placed in control of the organization while the incident is handled and normal functions are restored.

Events vs. Incidents

According to the ITIL framework, an event is defined as a change of state that has significance for the management of an IT service or other CI.

The term can also be used to mean an alert or notification created by an IT service, CI, or monitoring tool. Events often require IT operations staff to take action and lead to incidents being logged.

According to the ITIL framework, an incident is defined as an unplanned interruption to an IT service or a reduction in the quality of an IT service.

Purpose of Incident Management

Incident management has three purposes:

Restore normal service operation as quickly as possible

Minimize the adverse impact on business operations

Ensure service quality and availability are maintained

Objectives of Incident Management

Incident management has five objectives:

Ensure that standardized methods and procedures are used for the efficient and prompt response, analysis, documentation of ongoing management, and reporting of incidents

Increase visibility and communication of incidents to business and IT support staff

Enhance the business perception of IT by using a professional approach in quickly resolving and communicating incidents when they occur

Align incident management activities with those of the business

Maintain user satisfaction

Incident Management Plan

You should have a detailed incident management plan that includes the following:

Definitions of an incident by service type or offering

Customer and provider roles and responsibilities for an incident

Incident management process from detection to resolution

Response requirements

Media coordination

Legal and regulatory requirements such as data breach notification

You may also want to consider the use of an incident management tool.

The incident management plan should be routinely tested and updated based on lessons learned from real and practice events.

Incident Classification

Work with the organization and customers to ensure that the correct criteria are used for incident identification and classification and that these criteria are well documented and understood by all parties to the system.

Incident prioritization is made up of the following items:

Impact = Effect upon the business

Urgency = Extent to which the resolution can bear delay

Priority = Urgency × Impact

When these items are combined into a matrix, the result helps the business understand incidents and prioritize how they are managed.
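The prioritization formula can be sketched directly. This minimal example assumes a hypothetical 1 to 3 scale for both impact and urgency (3 being the most severe or most urgent) and hypothetical P1 to P4 bands; neither the scale nor the band cut-offs come from the text. It computes Priority = Urgency × Impact and prints the resulting matrix.

```python
# Hypothetical 1-3 scale for both dimensions; 3 is the most severe or most urgent.
LEVELS = {"low": 1, "medium": 2, "high": 3}


def priority(impact: str, urgency: str) -> tuple[int, str]:
    """Priority = Urgency x Impact, then mapped to a (hypothetical) priority band."""
    score = LEVELS[urgency] * LEVELS[impact]
    if score >= 9:
        band = "P1"
    elif score >= 6:
        band = "P2"
    elif score >= 3:
        band = "P3"
    else:
        band = "P4"
    return score, band


# Print the resulting priority matrix: rows are impact, columns are urgency.
print("impact/urgency", *LEVELS)
for impact in LEVELS:
    print(f"{impact:<14}", *(priority(impact, urgency)[1] for urgency in LEVELS))
```

Many organizations fill in the matrix cells directly rather than multiplying; the product form here simply follows the formula given above.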

Example of an Incident Management Process

Incident management should be focused on the identification, classification, investigation, and resolution of an incident, with the ultimate goal of returning the affected systems to normal as soon as possible.

To manage incidents effectively, a formal incident management process should be defined and used.

Problem Management

The objective of problem management is to minimize the impact of problems on the organization by identifying the root cause of the problem at hand.

Problem management plays an important role in detecting problems, providing solutions to them (workarounds and known errors), and preventing their recurrence.

A problem is the unknown cause of one or more incidents, often identified as a result of multiple similar incidents.

A known error is an identified root cause of a problem.

A workaround is a temporary way of overcoming technical difficulties (that is, incidents or problems).

It’s important to understand the linkage between incident and problem management. In addition, you need to ensure that a tracking system is established to track and monitor all system-related problems.

The system should gather metrics to identify possible trends.
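Because a problem is often identified from multiple similar incidents, the simplest trend metric is a count of recurring incident signatures. The sketch below assumes each incident already carries a normalized "signature" (for example, the affected CI plus the symptom); that field, the incident IDs, and the threshold of 3 are hypothetical. Signatures that reach the threshold become candidate problems for root-cause analysis.

```python
from collections import Counter


def candidate_problems(incidents: list[dict], threshold: int = 3) -> list[tuple[str, int]]:
    """Return (signature, count) for each signature seen at least `threshold` times, most frequent first."""
    counts = Counter(incident["signature"] for incident in incidents)
    return [(signature, n) for signature, n in counts.most_common() if n >= threshold]


incidents = [
    {"id": "INC-101", "signature": "db-01 connection pool exhausted"},
    {"id": "INC-102", "signature": "vpn gateway timeout"},
    {"id": "INC-107", "signature": "db-01 connection pool exhausted"},
    {"id": "INC-113", "signature": "db-01 connection pool exhausted"},
]
print(candidate_problems(incidents))
# [('db-01 connection pool exhausted', 3)]
```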

Problems can be classified as minor or major depending on several criteria.

Work with the organization and the customers to ensure that the correct criteria are used for problem identification and classification and that these criteria are well documented and understood by all parties to the system.

Release and Deployment Management

Release and deployment management aims to plan, schedule, and control the movement of releases to test and live environments.

The primary goal of release and deployment management is to ensure that the integrity of the live environment is protected and that the correct components are released.

Following are the objectives of release and deployment management:

Define and agree upon deployment plans

Create and test release packages

Ensure the integrity of released packages

Record and track all release packages in the Definitive Media Library (DML)

Manage stakeholders

Check the delivery of utility and warranty (utility + warranty = value in the mind of the customer):

Utility is the functionality offered by a product or service to meet a specific need; it’s what the service does.

Warranty is the assurance that a product or service will meet agreed-upon requirements (SLA); it’s how the service is delivered.

Manage risks

Ensure knowledge transfer
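Two of the objectives above, ensuring the integrity of released packages and recording them in the DML, fit together naturally: record a cryptographic hash when a package is built and tested, and verify it before deployment. This is a minimal sketch in which a plain dictionary stands in for the DML (a real DML is a secured, access-controlled store) and the package name is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large release packages are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_release(path: Path, dml: dict[str, str]) -> bool:
    """Return True only if the package is recorded in the DML and its hash still matches."""
    expected = dml.get(path.name)
    return expected is not None and sha256_of(path) == expected


with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "billing-service-2.4.1.tar.gz"
    pkg.write_bytes(b"release contents")
    dml = {pkg.name: sha256_of(pkg)}  # recorded when the package was built and tested
    print(verify_release(pkg, dml))   # True
    pkg.write_bytes(b"tampered contents")
    print(verify_release(pkg, dml))   # False
```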

New software releases should be made in accordance with the configuration management plan.

You should conduct security testing on all new releases before deployment. Release management is especially important for SaaS and PaaS providers.

You may not be directly responsible for release and deployment management and may be involved only tangentially in the process.

Regardless of who is in charge, the process must be tightly coupled to change management, incident and problem management, configuration and availability management, and the help desk.

Service Level Management

Service-level management aims to negotiate agreements with various parties and to design services in accordance with the agreed-upon service-level targets.

Typically negotiated agreements include the following:

SLAs are negotiated with the customers.

Operational-level agreements (OLAs) are SLAs negotiated between internal business units within the enterprise.

Underpinning contracts (UCs) are external contracts negotiated between the organization and vendors or suppliers.

Ensure that policies, procedures, and tools are put in place so the organization meets all service levels specified in its SLAs with its customers.

Failure to meet SLAs can have a significant financial impact on the provider. The legal department should be involved in developing the SLA and associated policies to ensure that they are drafted correctly.

Availability Management

Availability management aims to define, analyze, plan, measure, and improve all aspects of the availability of IT services.

Availability management is responsible for ensuring that all IT infrastructure, processes, tools, roles, and so on, are appropriate for the agreed-upon availability targets.

Systems should be designed to meet the availability requirements listed in all SLAs.

Most virtualization platforms allow for the management of system availability and can act in the event of a system outage, for example by failing over running guest OSs to a different host.
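Designing to an SLA availability target is easier when the target is translated into a downtime budget. The arithmetic is Availability = (period − downtime) / period × 100, so the permitted downtime is period × (1 − target/100). This minimal sketch assumes a 30-day measurement period; real SLAs define their own period and what counts as downtime.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(target_percent: float, period_days: int = 30) -> float:
    """Downtime budget: minutes of outage the period can absorb before missing the target."""
    return period_days * MINUTES_PER_DAY * (1 - target_percent / 100)


def measured_availability(downtime_minutes: float, period_days: int = 30) -> float:
    """Availability achieved in the period, as a percentage."""
    period = period_days * MINUTES_PER_DAY
    return 100 * (period - downtime_minutes) / period


print(f"{allowed_downtime_minutes(99.9):.1f}")   # 43.2 minutes per 30-day month
print(f"{allowed_downtime_minutes(99.99):.2f}")  # 4.32
print(measured_availability(60) >= 99.9)         # False: one hour down misses 99.9%
```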

Capacity Management

Capacity management is focused on ensuring that the business IT infrastructure is adequately provisioned to deliver the agreed service-level targets in a timely and cost-effective manner.

Capacity management considers all resources required to deliver IT services within the scope of the defined business requirements. Capacity management is a critical function.

The system capacity must be monitored and thresholds must be set to prevent systems from reaching an over-capacity situation.
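Threshold-based capacity monitoring can be sketched in a few lines. The 75% warning and 90% critical thresholds, the resource names, and the readings are hypothetical; in practice thresholds are tuned so that a warning leaves enough lead time to provision more capacity before the SLA is at risk.

```python
def capacity_status(used: float, capacity: float, warn: float = 0.75, critical: float = 0.90) -> str:
    """Classify utilization against warning and critical thresholds (fractions of capacity)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    utilization = used / capacity
    if utilization >= critical:
        return "critical"
    if utilization >= warn:
        return "warning"
    return "ok"


# Hypothetical readings: resource -> (used, provisioned)
readings = {"vcpu": (52, 64), "memory_gib": (240, 256), "storage_tib": (40, 100)}
for resource, (used, total) in readings.items():
    print(f"{resource}: {capacity_status(used, total)}")
# vcpu: warning
# memory_gib: critical
# storage_tib: ok
```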

Business Continuity Management

Business continuity management (BCM) is focused on the planning steps that businesses engage in to ensure that their mission-critical systems can be restored to service following a disaster or service interruption event.

To focus the BCM activities correctly, a prioritized ranking or listing of systems and services must be created and maintained. This is accomplished through the use of a business impact analysis (BIA) process.

The BIA is designed to identify and produce a prioritized listing of systems and services critical to the normal functioning of the business.

Once the BIA has been completed, the cloud security operations management professionals can devise plans and strategies that will enable the continuation of business operations and quick recovery from any type of disruption.
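The prioritized listing that a BIA produces can be illustrated with a simple sort. This sketch assumes the BIA records a business-impact rating and a recovery time objective (RTO) for each service; the 1 to 5 rating scale, the service names, and the values are hypothetical. Services are ranked by impact, with the shorter RTO breaking ties, since a service that can tolerate less downtime needs to be recovered first.

```python
from dataclasses import dataclass


@dataclass
class BiaEntry:
    """One system or service as assessed in the BIA (hypothetical fields)."""
    service: str
    impact: int       # 1 (minor) to 5 (severe) effect on the business if unavailable
    rto_hours: float  # recovery time objective: maximum tolerable outage


def prioritize(entries: list[BiaEntry]) -> list[BiaEntry]:
    # Highest impact first; among equal impact, the shortest RTO comes first.
    return sorted(entries, key=lambda e: (-e.impact, e.rto_hours))


entries = [
    BiaEntry("email", impact=3, rto_hours=24),
    BiaEntry("payments", impact=5, rto_hours=1),
    BiaEntry("reporting", impact=2, rto_hours=72),
    BiaEntry("customer-portal", impact=5, rto_hours=4),
]
for rank, entry in enumerate(prioritize(entries), start=1):
    print(rank, entry.service, f"RTO {entry.rto_hours}h")
```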

Comparing BC and BCM

It is important to understand the difference between BC and BCM:

BC is defined as the capability of the organization to continue the delivery of products or services at acceptable predefined levels following a disruptive incident.

BCM is defined as a holistic management process that identifies potential threats to an organization and the impacts to business operations those threats, if realized, might cause.

It provides a framework for building organizational resilience with the capability of an effective response that safeguards the interests of its key stakeholders, reputation, brand, and value-creating activities.

Continuity Management Plan

A detailed continuity management plan should include the following:

Required capability and capacity of backup systems

Trigger events to implement the plan

Clearly defined roles and responsibilities by name and title

Clearly defined continuity and recovery procedures

Notification requirements

The plan should be tested at regular intervals.

Continual Service Improvement Management

Metrics on all services and processes should be collected and analyzed to find areas of improvement using a formal process. You can use various tools and standards to monitor performance.

One example is the ITIL framework.

The organization should adopt and utilize one or more of these tools.

How Management Processes Relate to Each Other

It is inevitable in operations that management processes will have an impact on each other and interrelate.

The following sections explore some of the ways this happens.

Release and Deployment Management and Change Management

Release and deployment management needs to be tied to change management because change management must approve any activities that release and deployment management will engage in before the release.

In other words, change management must approve the request to carry out the release, and then release and deployment management can schedule and execute the release.

Release and Deployment Management and Incident and Problem Management

Release and deployment management is tied to incident and problem management because if anything were to go wrong with the release, incident and problem management would need to be involved to fix whatever went wrong.

This is typically done by executing whatever rollback or back-out plan may have been created along with the release for just such an eventuality.

Release and Deployment Management and Configuration Management

Release and deployment management is tied to configuration management because once the release is officially live in the production environment, the existing configurations for all systems and infrastructure affected by the release have to be updated to accurately reflect their new running configurations and status within the configuration management database (CMDB).
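The CMDB update that follows a release can be sketched as a small function. In this minimal sketch, plain dictionaries stand in for the CMDB and the release record, and the CI identifiers, version strings, and field names (`version`, `previous_version`, `last_release`) are hypothetical. The function refuses to proceed if the release touched a CI the CMDB does not know about, because that means the CMDB is already out of step with production.

```python
def apply_release_to_cmdb(cmdb: dict, release: dict) -> list[str]:
    """Record the new running versions of every CI the release touched; return the updated CI ids."""
    missing = [ci for ci in release["changes"] if ci not in cmdb]
    if missing:
        # A release touching unknown CIs means the CMDB is out of date: stop and investigate.
        raise KeyError(f"CIs not in CMDB: {missing}")
    for ci_id, new_version in release["changes"].items():
        record = cmdb[ci_id]
        record["previous_version"] = record["version"]  # kept to support a back-out
        record["version"] = new_version
        record["last_release"] = release["id"]
    return sorted(release["changes"])


cmdb = {
    "app-billing": {"version": "2.4.0"},
    "db-billing": {"version": "14.9"},
}
release = {"id": "REL-2024-031", "changes": {"app-billing": "2.4.1"}}
apply_release_to_cmdb(cmdb, release)
print(cmdb["app-billing"])
# {'version': '2.4.1', 'previous_version': '2.4.0', 'last_release': 'REL-2024-031'}
```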

Release and Deployment Management and Availability Management

Release and deployment management is tied to availability management because if the release were not to go as planned, any negative impacts on system availability would have to be identified, monitored, and remediated as per the existing SLAs for the services and systems affected.

In addition, once the release was officially “live” in the production environment, the impact against the existing systems and infrastructure affected by the release would have to be monitored to accurately reflect their new running status to ensure compliance with all SLAs.

Release and Deployment Management and the Help Desk

Release and deployment management is tied to the help desk because the communication around the release and the status updates need to be centrally coordinated and managed.

Configuration Management and Availability Management

Configuration management is tied to availability management. If an existing configuration were to have negative impacts on system availability, it would have to be identified, monitored, and remediated as per the existing SLAs for the services and systems affected.

In addition, any changes to existing system configurations would have to be monitored to accurately reflect their new running status to ensure compliance with all SLAs.

Configuration Management and Change Management

Configuration management must be tied to change management because change management has to approve modifications to all production systems before they take place.

In other words, there should never be a change that is allowed to take place to a CI in a production system unless change management has approved the change first.
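The rule that no production CI changes without prior approval can be enforced as a gate that runs before any modification. This is a minimal sketch: the change register is a plain dictionary, and the RFC identifiers, the `"approved"` status string, and the CI names are hypothetical. A change passes only if its RFC exists, is approved, and lists the CI being modified.

```python
class UnapprovedChangeError(Exception):
    """Raised when a CI modification has no approved change behind it."""


def authorize_ci_change(ci_id: str, rfc_id: str, change_register: dict) -> None:
    """Allow a production CI change only if an approved RFC exists and lists that CI."""
    rfc = change_register.get(rfc_id)
    if rfc is None:
        raise UnapprovedChangeError(f"{ci_id}: no change record {rfc_id}")
    if rfc["status"] != "approved":
        raise UnapprovedChangeError(f"{ci_id}: {rfc_id} is {rfc['status']}, not approved")
    if ci_id not in rfc["cis"]:
        raise UnapprovedChangeError(f"{ci_id}: not in scope of {rfc_id}")


change_register = {
    "RFC-204": {"status": "approved", "cis": {"fw-edge-01"}},
    "RFC-205": {"status": "pending", "cis": {"lb-web-01"}},
}
authorize_ci_change("fw-edge-01", "RFC-204", change_register)  # passes silently
for ci, rfc in [("lb-web-01", "RFC-205"), ("fw-edge-02", "RFC-204")]:
    try:
        authorize_ci_change(ci, rfc, change_register)
    except UnapprovedChangeError as err:
        print(f"blocked: {err}")
```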

Service-Level Management and Change Management

Service-level management has to be tied to change management because change management must approve changes to all SLAs as well as ensure that the legal function has a chance to review them and offer guidance and direction on the nature and language of the proposed changes before they take place.

In other words, there should never be a change that is allowed to take place to an SLA that governs a production system unless change management has approved the change first.

Incorporating Management Processes

There are traditional business cycles or rhythms that all businesses experience. Some are seasonal; some are cyclical based on a variety of variables.

Whatever the case, be aware of these business cycles and work with capacity management as well as change, availability, incident and problem, service-level, and release and deployment management to ensure that the appropriate infrastructure is always provisioned and available to meet customer demand.

An example of this is a seasonal or holiday-related spike in system capacity requirements for web-based retailers.

Another example is a spike in bandwidth and capacity requirements for streaming media outlets during high-profile news or sporting events, such as the World Cup, the Olympics, and the NBA playoffs.