Understanding the Collection and Preservation of Digital Evidence
Forensic science is generally defined as the application of science to the law. Digital forensics, also known as computer and network forensics, has many definitions.
Generally, it is considered the application of science to the identification, collection, examination, and analysis of data while preserving the integrity of the information and maintaining a strict chain of custody for the data.
Data refers to distinct pieces of digital information that have been formatted in a specific way. Organizations have an ever-increasing amount of data from many sources.
For example, data can be stored or transferred by standard computer systems, networking equipment, computing peripherals, smartphones, and various types of media, among other sources.
Because of the variety of data sources, digital forensic techniques can be used for many purposes, such as investigating crimes and internal policy violations, reconstructing computer security incidents, troubleshooting operational problems, and recovering from accidental system damage.
Practically every organization needs to have the capability to perform digital forensics.
Without such a capability, an organization will have difficulty determining what events have occurred within its systems and networks, such as exposures of protected, sensitive data.
Cloud Forensics Challenges
Working with the cloud has several forensics challenges:
- Control over data: In traditional computer forensics, investigators have full control over the evidence (such as router logs, process logs, and hard disks). In a cloud, control over data varies by service model: cloud users have the highest level of control in IaaS and the lowest in SaaS. This physical inaccessibility of the evidence and lack of control over the system make evidence acquisition a challenging task in the cloud.
- Multitenancy: Cloud computing platforms can be multitenant systems, whereas traditional computing is a single-owner system. In a cloud, multiple VMs can share the same physical infrastructure; that is, data for multiple customers can be colocated. A suspect may claim that the evidence contains information belonging to other users, not just their own. In this case, the investigator needs to prove to the court that the provided evidence belongs to the suspect. Conversely, in traditional computing systems, a suspect is solely responsible for all the digital evidence located in their computing system. Moreover, in the cloud, the forensics investigator may need to preserve the privacy of other tenants.
- Data volatility: Volatile data cannot be sustained without power. Data residing in a VM is volatile because once the VM is powered off, all the data is lost unless some form of image is used to capture the state data of the VM. To provide the on-demand computational and storage services required in the cloud, cloud service providers do not always supply persistent storage to VM instances.
- Chain of custody: The chain of custody should depict how the evidence was collected, analyzed, and preserved so that it can be presented as admissible evidence in court. In traditional forensic procedures, it is easy to maintain an accurate history of the time, location, and persons accessing a potential suspect’s computer, hard disk, and so on. In a cloud, on the other hand, it is not obvious where a VM is physically located. Investigators can acquire a VM image from any workstation connected to the Internet, and the investigator’s location and the VM’s physical location can be in different time zones. Hence, maintaining a proper chain of custody is much more challenging in the cloud.
- Evidence acquisition: Currently, investigators are completely dependent on CSPs for acquiring cloud evidence. However, the CSP employee who collects data on behalf of investigators is most likely not a licensed forensics investigator, so it is not possible to guarantee this person’s integrity in a court of law. A dishonest employee of a CSP can collude with a malicious user to hide important evidence or to inject invalid evidence into a system to prove the malicious user is innocent. On the other hand, a dishonest investigator can collude with an attacker. Even if CSPs provide valid evidence to investigators, a dishonest investigator can remove some crucial evidence before presenting it to the court or can provide some fake evidence to the court to frame an honest cloud user. In traditional storage systems, only the suspect and the investigator can collude. The potential for three-way collusion in the cloud certainly increases the attack surface and makes cloud forensics more challenging.
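To make the chain-of-custody concern above more concrete, the following Python sketch records each custody event with a UTC timestamp, so that entries made by investigators in different time zones stay comparable, and links each entry to the previous one with a SHA-256 hash so that later alteration can be detected. The field names, the `add_custody_entry` and `verify_custody_log` helpers, and the hash-chaining approach itself are illustrative assumptions rather than a prescribed procedure.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def add_custody_entry(log, person, action, evidence_id, when=None):
    """Append a custody record whose hash also covers the previous record."""
    when = when or datetime.now(timezone.utc)
    entry = {
        "evidence_id": evidence_id,
        "person": person,
        "action": action,
        # Store UTC so entries from different time zones sort consistently.
        "timestamp_utc": when.astimezone(timezone.utc).isoformat(),
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS_HASH,
    }
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["entry_hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)
    return entry


def verify_custody_log(log):
    """Return True if every record's hash and back-link are intact."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if body["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(payload).hexdigest() != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True


# Hypothetical usage: two people handle the same piece of evidence.
custody_log = []
add_custody_entry(custody_log, "Investigator A", "acquired VM snapshot", "EV-001")
add_custody_entry(custody_log, "Analyst B", "created working copy", "EV-001")
print(verify_custody_log(custody_log))
```

A chain like this only shows that recorded entries were not changed afterward. It cannot detect removal of the most recent entries or a log rebuilt from scratch, so in practice the latest hash would also need to be held by an independent party.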
Data Access Within Service Models
Access to data will be decided by the following:
- The service model
- The legal system in the country where data is legally stored
When using various service models, the CCSP can access different types of information, as shown in Table 5.9.
If the CCSP needs additional information from the service model that is being used, which is not specified in Table 5.9, she needs to have the CSP provide the required information.
In Table 5.9, the first column contains different layers that you might have access to when using cloud services.
Steps of digital forensics vary according to the service and deployment model of cloud computing that is being used.
Forensics Readiness
Many incidents can be handled more efficiently and effectively if forensic considerations have been incorporated into the information system lifecycle.
Examples of such considerations follow:
- Performing regular backups of systems and maintaining previous backups for a specific period
- Enabling auditing on workstations, servers, and network devices
- Forwarding audit records to secure centralized log servers
- Configuring mission-critical applications to perform auditing, including recording all authentication attempts
- Maintaining a database of file hashes for the files of common OS and application deployments and using file integrity–checking software on particularly important assets
- Maintaining records (such as baselines) of network and system configurations
- Establishing data-retention policies that support performing historical reviews of system and network activity, complying with requests or requirements to preserve data relating to ongoing litigation and investigations, and destroying data that is no longer needed
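The file-hash database mentioned in the list above can be sketched in a few lines of Python. This is a minimal illustration, assuming SHA-256 as the hash function and a plain dictionary as the "database"; the helper names `build_baseline` and `compare_to_baseline` are hypothetical, and a production deployment would more likely rely on dedicated file integrity-checking software.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_to_baseline(baseline, current):
    """Report files that were added, removed, or modified since the baseline."""
    baseline_names, current_names = set(baseline), set(current)
    return {
        "added": sorted(current_names - baseline_names),
        "removed": sorted(baseline_names - current_names),
        "modified": sorted(
            name
            for name in baseline_names & current_names
            if baseline[name] != current[name]
        ),
    }
```

A baseline built at deployment time can be stored alongside the configuration records, then compared against a fresh `build_baseline` run during an investigation to narrow down which files changed.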
Proper Methodologies for Forensic Collection of Data
The forensic process consists of the following basic phases:
- Collection: Identifying, labeling, recording, and acquiring data from the possible sources of relevant data, while following procedures that preserve the integrity of the data
- Examination: Forensically processing collected data using a combination of automated and manual methods, and assessing and extracting data of particular interest, while preserving the integrity of the data
- Analysis: Analyzing the results of the examination, using legally justifiable methods and techniques, to derive useful information that addresses the questions that were the impetus for performing the collection and examination
- Reporting: Reporting the results of the analysis, which may include describing the actions used, explaining how tools and procedures were selected, determining what other actions need to be performed (such as forensic examination of additional data sources, securing of identified vulnerabilities, improvement of existing security controls), and providing recommendations for improvement to policies, procedures, tools, and other aspects of the forensic process
The following sections examine these phases in more detail.
Data Acquisition and Collection
After identifying potential data sources, acquire the data from the sources.
Data acquisition should be performed using a three-step process:
- Develop a plan to acquire the data: Developing a plan is an important first step in most cases because there are multiple potential data sources.
- Create a plan that prioritizes the sources, establishing the order in which the data should be acquired. Important factors for prioritization include the following:
- Likely value: Based on your understanding of the situation and previous experience in similar situations, estimate the relative likely value of each potential data source.
- Volatility: Volatile data refers to data on a live system that is lost after a computer is powered down or due to the passage of time. Volatile data may also be lost as a result of other actions performed on the system.
- In many cases, volatile data should be given priority over nonvolatile data. However, nonvolatile data may also be somewhat dynamic (for example, log files that are overwritten as new events occur).
- Amount of effort required: The amount of effort required to acquire different data sources may vary widely.
- The effort involves not only the time spent by security professionals and others within the organization (including legal advisors) but also the cost of equipment and services (such as outside experts).
- For example, acquiring data from a network router probably requires much less effort than acquiring data from a cloud service provider.
- Acquire the data: If the data has not already been acquired by security tools, analysis tools, or other means, the general process for acquiring data involves using forensic tools to collect volatile data, duplicating nonvolatile data sources to collect their data, and securing the original nonvolatile data sources.
- Data acquisition can be performed either locally or over a network. Although it is generally preferable to acquire data locally because there is greater control over the system and data, local data collection is not always feasible (such as for a system in a locked room or a system in another location).
- When acquiring data over a network, decisions should be made regarding the type of data to be collected and the amount of effort to use.
- For instance, it might be necessary to acquire data from several systems through different network connections, or it might be sufficient to copy a logical volume from just one system.
- Verify the integrity of the data: After the data has been acquired, its integrity should be verified. It is particularly important to prove that the data has not been tampered with if it might be needed for legal reasons.
- Data integrity verification typically consists of using tools to compute the message digest of the original and copied data and then comparing the digests to make sure they are the same.
- Note that before you begin to collect data, a decision should be made based on the need to collect and preserve evidence in a way that supports its use in future legal or internal disciplinary proceedings.
- In such situations, a clearly defined chain of custody should be followed to avoid allegations of mishandling or tampering with evidence.
- This involves keeping a log of every person who had physical custody of the evidence, documenting the actions that they performed on the evidence and at what time, storing the evidence in a secure location when it is not being used, making a copy of the evidence and performing examination and analysis using only the copied evidence, and verifying the integrity of the original and copied evidence.
- If it is unclear whether evidence needs to be preserved, by default it generally should be.
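The three-step process above (plan, acquire, verify) can be sketched as follows. This is a simplified illustration under stated assumptions: the 1-to-5 rating scales and the scoring formula in `acquisition_order` are arbitrary choices made for the example, SHA-256 stands in for whatever message digest the organization has approved, and in-memory byte streams stand in for real evidence sources.

```python
import hashlib
import io
import shutil
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    likely_value: int  # 1 (low) to 5 (high); the analyst's estimate
    volatility: int    # 1 (stable) to 5 (lost at power-off)
    effort: int        # 1 (easy) to 5 (costly) to acquire


def acquisition_order(sources):
    """Step 1: rank sources. The scoring formula is an assumption."""
    return sorted(
        sources,
        key=lambda s: s.likely_value + s.volatility - s.effort,
        reverse=True,
    )


def stream_digest(stream, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    stream.seek(0)
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def acquire_and_verify(original, copy_target):
    """Steps 2 and 3: duplicate the source, then compare message digests."""
    original.seek(0)
    shutil.copyfileobj(original, copy_target)
    original_digest = stream_digest(original)
    copy_digest = stream_digest(copy_target)
    return {
        "original_sha256": original_digest,
        "copy_sha256": copy_digest,
        "match": original_digest == copy_digest,
    }


# Hypothetical usage with made-up ratings and stand-in evidence bytes.
candidates = [
    DataSource("CSP storage snapshot", likely_value=4, volatility=2, effort=5),
    DataSource("VM memory", likely_value=5, volatility=5, effort=3),
    DataSource("Router logs", likely_value=3, volatility=3, effort=1),
]
for source in acquisition_order(candidates):
    print(source.name)

result = acquire_and_verify(io.BytesIO(b"example evidence bytes"), io.BytesIO())
print(result["match"])
```

Recording both digests, rather than only the comparison result, lets a later reviewer recompute the hash of the working copy and confirm that it still matches the value noted at acquisition time.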
Challenges in Collecting Evidence
The CCSP faces several challenges in the collection of evidence due to the nature of the cloud environment. You have already read about many of these in the “Cloud Forensics Challenges” section earlier; however, they bear repeating here in the context of the collection phase to emphasize the issues and concerns that the CCSP must contend with.
Following are the main challenges with collecting data in the cloud:
- The seizure of servers containing files from many users creates privacy issues for the other tenants hosted on those servers.
- The trustworthiness of evidence depends on the CSP; the CCSP has no ability to validate or guarantee it independently.
- Investigators are dependent on CSPs to acquire evidence.
- Technicians collecting data may not be qualified for forensic acquisition.
- Unknown location of the physical data can hinder investigations.
One of the best ways for the CCSP to address these challenges is to turn to the area of network forensics for help and guidance.
Network forensics is defined as the capture, storage, and analysis of network events. The idea is to capture every packet of network traffic and make it available in a single searchable database so that the traffic can be examined and analyzed in detail.
Network forensics can uncover the low-level addresses of the systems communicating, which investigators can use to trace an action or conversation back to a physical device.
The entire contents of emails, IM conversations, web-surfing activities, and file transfers can be recovered and reconstructed to reveal the original transaction.
This is important because of the challenges with the cloud environment already noted, as well as some additional underlying issues.
Networks are continuing to become faster in terms of transmission speed. As a result, they are handling larger and larger volumes of data.
The increasing use of converged networks and the data streams that they make possible has led to data that is multifaceted and richer today than it has ever been.
(Think voice over IP [VoIP] and streaming HD video, as well as the metadata that comes with the content.)
Network forensics has various use cases:
- Uncovering proof of an attack
- Troubleshooting performance issues
- Monitoring activity for compliance with policies
- Sourcing data leaks
- Creating audit trails for business transactions
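The idea of placing captured traffic in a single searchable database, described earlier in this section, can be sketched with Python’s built-in `sqlite3` module. This is a minimal illustration that stores only per-packet metadata; the table layout, the helper names, and the sample rows (which use documentation-reserved addresses) are assumptions, and a real deployment would populate the table from a packet capture tool rather than from hand-written rows.

```python
import sqlite3


def open_capture_db(path=":memory:"):
    """Create a packet-metadata table plus an index for per-source searches."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE packets (
            captured_utc TEXT NOT NULL,
            src_mac      TEXT,
            src_ip       TEXT,
            dst_ip       TEXT,
            dst_port     INTEGER,
            protocol     TEXT,
            length       INTEGER
        )
        """
    )
    conn.execute("CREATE INDEX idx_packets_src ON packets (src_ip, captured_utc)")
    return conn


def record_packets(conn, rows):
    """Insert (captured_utc, src_mac, src_ip, dst_ip, dst_port, protocol, length) rows."""
    conn.executemany("INSERT INTO packets VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def traffic_from(conn, src_ip):
    """Return all recorded packets sent by one address, oldest first."""
    cursor = conn.execute(
        "SELECT captured_utc, src_mac, dst_ip, dst_port, protocol, length "
        "FROM packets WHERE src_ip = ? ORDER BY captured_utc",
        (src_ip,),
    )
    return cursor.fetchall()


# Hypothetical sample rows; none of these values come from a real capture.
conn = open_capture_db()
record_packets(conn, [
    ("2024-01-01T10:00:05Z", "00:00:5e:00:53:01", "192.0.2.10", "198.51.100.7", 443, "TCP", 1500),
    ("2024-01-01T10:00:01Z", "00:00:5e:00:53:01", "192.0.2.10", "198.51.100.7", 443, "TCP", 60),
    ("2024-01-01T10:00:03Z", "00:00:5e:00:53:02", "192.0.2.20", "198.51.100.9", 25, "TCP", 900),
])
for row in traffic_from(conn, "192.0.2.10"):
    print(row)
```

Keeping the low-level address (here `src_mac`) next to the IP address reflects the point made above: it is the detail investigators use to trace a conversation back to a physical device.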
Collecting Data from a Host OS
Physical access is required to collect forensic evidence from a host. Due to the nature of virtualization technology, a VM that was on one host may have been migrated to one or more hosts after the incident occurred.
Additionally, the dynamic nature of storage may affect the collection of digital evidence from a host OS.
Collecting Data from a Guest OS
For guest OSs, a snapshot may be the best method for collecting a forensic image. Some type of write blocker should be in place when collecting digital evidence to prevent the inadvertent writing of data to the host or guest OS.
You can use various tools for the collection of digital evidence.
Consider prestaging and testing forensics tools as part of the infrastructure design for the enterprise cloud architecture.
Collecting Metadata
Consider metadata needs carefully. Whether to allow metadata is no longer a decision point, because metadata exists and is created by end users at every level of the cloud architecture.
Be aware of the metadata that exists in the enterprise cloud, and have a plan and a policy for managing and acquiring it, if required.
This issue can become more complicated in multitenant clouds because the ability to isolate tenants from each other can influence the scope and reach of metadata.
If tenant isolation is not done properly, one tenant’s metadata may be exposed to others, allowing “data bleed” to occur.
Examining the Data
After data has been collected, the next phase is to examine the data, which involves assessing and extracting the relevant pieces of information from the collected data.
This phase may also involve the following:
- Bypassing or mitigating OS or application features that obscure data and code, such as data compression, encryption, and access control mechanisms
- Using text and pattern searches to identify pertinent data, such as finding documents that mention a particular subject or person or identifying email log entries for a particular email address
- Using a tool that can determine the type of contents of each data file, such as text, graphics, music, or a compressed file archive
- Using knowledge of data file types to identify files that merit further study, as well as to exclude files that are of no interest to the examination
- Using any databases containing information about known files to include or exclude files from further consideration
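Two of the examination steps above, determining a file’s content type and excluding known files, can be illustrated with a short Python sketch. The signature table covers only a handful of well-known file headers, the text check is a crude printable-ASCII heuristic, and the `triage` helper and its in-memory inputs are hypothetical; dedicated forensic tools use far larger signature and known-file databases.

```python
import hashlib

# A few well-known leading byte sequences ("magic numbers").
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png image"),
    (b"\xff\xd8\xff", "jpeg image"),
    (b"%PDF-", "pdf document"),
    (b"PK\x03\x04", "zip archive"),
    (b"\x1f\x8b", "gzip compressed data"),
]

# Printable ASCII plus tab, line feed, and carriage return.
TEXT_BYTES = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def identify_type(data):
    """Classify content by its leading bytes, ignoring the file name."""
    if not data:
        return "empty"
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    if all(byte in TEXT_BYTES for byte in data[:512]):
        return "text"
    return "unknown binary"


def triage(files, known_hashes):
    """Drop files whose SHA-256 is in the known-file set; classify the rest."""
    remaining = {}
    for name, data in files.items():
        if hashlib.sha256(data).hexdigest() in known_hashes:
            continue  # known file, excluded from further study
        remaining[name] = identify_type(data)
    return remaining


# Hypothetical usage: one file matches the known-file set and is skipped.
known = {hashlib.sha256(b"stock OS file").hexdigest()}
collected = {
    "notes.txt": b"meeting notes",
    "system.dll": b"stock OS file",
    "renamed.txt": b"PK\x03\x04" + b"\x00" * 8,
}
print(triage(collected, known))
```

Classifying by content rather than by extension matters here because a renamed file, such as the archive labeled `renamed.txt` in the sample, would otherwise be overlooked.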
Analyzing the Data
The analysis should include identifying people, places, items, and events and determining how these elements are related so that a conclusion can be reached. Often, this effort includes correlating data among multiple sources.
For instance, a NIDS log may link an event to a host, the host audit logs may link the event to a specific user account, and the host IDS log may indicate what actions that user performed.
Tools such as centralized logging and security event management software can facilitate this process by automatically gathering and correlating the data. Comparing system characteristics to known baselines can identify various types of changes made to the system.
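The kind of cross-source correlation described above can be sketched in Python as a join on host name within a time window. The record fields, the five-minute window, and the sample events are all assumptions made for illustration; centralized logging and security event management products perform this matching with much richer rules.

```python
from datetime import datetime, timedelta


def correlate(nids_alerts, audit_events, window=timedelta(minutes=5)):
    """Pair each NIDS alert with host audit events on the same host
    that occurred within the given time window of the alert."""
    findings = []
    for alert in nids_alerts:
        for event in audit_events:
            same_host = event["host"] == alert["host"]
            close_in_time = abs(event["time"] - alert["time"]) <= window
            if same_host and close_in_time:
                findings.append({
                    "signature": alert["signature"],
                    "host": alert["host"],
                    "user": event["user"],
                    "action": event["action"],
                })
    return findings


# Hypothetical log records; none of these values come from real systems.
alerts = [
    {"time": datetime(2024, 1, 1, 10, 0), "host": "web01",
     "signature": "suspicious outbound transfer"},
]
events = [
    {"time": datetime(2024, 1, 1, 10, 2), "host": "web01",
     "user": "jdoe", "action": "read /srv/export.csv"},
    {"time": datetime(2024, 1, 1, 10, 1), "host": "db02",
     "user": "svc_backup", "action": "started backup"},
    {"time": datetime(2024, 1, 1, 12, 0), "host": "web01",
     "user": "asmith", "action": "logged in"},
]
for finding in correlate(alerts, events):
    print(finding)
```

A match produced this way is a lead rather than a conclusion. It links the network event to a user account, but the analyst still has to rule out alternative explanations such as shared credentials or clock drift between the log sources.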
Reporting the Findings
The final phase is reporting, which is the process of preparing and presenting the information resulting from the analysis phase.
Many factors affect reporting, including the following:
- Alternative explanations: When the information regarding an event is incomplete, it may not be possible to arrive at a definitive explanation of what happened.
- When an event has two or more plausible explanations, each should be given due consideration in the reporting process. Use a methodical approach to attempt to prove or disprove each possible explanation that is proposed.
- Audience consideration: Knowing the audience to which the data or information will be shown is important. An incident requiring law enforcement involvement requires highly detailed reports of all information gathered and may also require copies of all evidentiary data obtained.
- A system administrator might want to see network traffic and related statistics in great detail. Senior management might simply want a high-level overview of what happened, such as a simplified visual representation of how the attack occurred, and what should be done to prevent similar incidents.
- Actionable information: Reporting also includes identifying actionable information gained from data that may allow you to collect new sources of information.
For example, a list of contacts may be developed from the data that can lead to additional information about an incident or crime.
Also, information might be obtained that can prevent future events, such as a backdoor on a system that can be used for future attacks, a crime that is being planned, a worm scheduled to start spreading at a certain time, or a vulnerability that can be exploited.