How Data is Stored
Reading and Writing Digital Data
Why is Deleted Not Always Deleted?
The Process for Recovering Electronic Evidence
Searching Digital Evidence
Advantages of Digital Evidence
Effective Strategies for Electronic Discovery
Steps to Preserve (or Destroy) Electronic Evidence
Summary
How Hard Drives Work
A hard drive (such as the "C" drive) contains hard round flat disks (known as platters) coated on both sides with a magnetic material designed to store information as binary numbers (magnetic patterns of 0’s and 1’s). The platters are mounted on a spindle that rotates at high speed, generally 5,000 to 10,000 rpm.
Electromagnetic read/write devices, known as ‘heads,’ mounted on sliders and connected to an actuator arm, are positioned over the surface of the disk. A logic board controls the motion of the heads, the process for reading and writing data, and the protocol for communicating with the rest of the computer.
Conceptually, the inside of a hard drive is similar to the inside of an old jukebox, with the record being the platter, the jukebox tone arm being the hard drive’s actuator arm, and the needle being the read/write heads.
How Data is Stored
The surface of each platter can hold tens of billions of individual ‘bits’ of data, each of which is either a 0 or a 1. Eight bits form a ‘byte,’ which can represent an alphabetical character or a small number; computers also work with larger groupings of sixteen or thirty-two bits. Most desktop and server hard drives are 3.5 inches in diameter, while notebook PCs use 2.5-inch and 1.8-inch drives. A large-capacity drive, measured in gigabytes (GB), can store hundreds of billions of individual bits of data and is commonly available for under $200.
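As a small illustration of how a character maps to a byte, the following sketch prints the eight bits behind each letter of a short word. It assumes plain ASCII text; the sample word and the function name are chosen only for this example.

```python
def to_bits(text: str) -> list[str]:
    """Return the 8-bit pattern of each ASCII character in text."""
    return [format(byte, "08b") for byte in text.encode("ascii")]

# Each letter occupies exactly one byte, i.e. eight 0/1 bits.
bits = to_bits("Hi")
print(bits)  # ['01001000', '01101001']
```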
Each platter has two surfaces capable of holding data (the top and bottom), and each surface has a read/write head. Thus on a hard drive with three platters, there are a total of six ‘surfaces’ with information being read by six heads.
The recording surface of each platter is divided into concentric tracks (circles), and the vertical area of similar tracks on multiple platters is referred to as a "cylinder." These tracks are further subdivided into sectors and clusters, which are groups of sectors. The logical organization of information on the platter is similar to slices of a pie. Data is stored in all sectors of each track, except parts of the outside track, which is generally reserved for the file allocation table (FAT) and the directory. Together, these record the file names and the locations of active files on the disk. The file allocation table tells the computer’s operating system which sectors (the "geographic location") contain data. A sector typically will hold 512 bytes of data (roughly a short paragraph of plain text), plus "address" information used by the drive controller circuitry. There are roughly 40 million sectors on a 20 GB hard drive. "Formatting" a hard drive is the process by which the disk surface is organized into tracks and sectors.
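The sector arithmetic above can be checked with a short sketch. It assumes 512-byte sectors, as stated in the text, and shows the count for both common readings of "20 GB." The second function uses the conventional cylinder/head/sector numbering (sectors counted from 1) to turn a physical position into a single running sector number; the geometry of 6 heads and 63 sectors per track is purely hypothetical.

```python
SECTOR_BYTES = 512  # typical sector payload, per the text

def sector_count(capacity_bytes: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Number of whole sectors that fit in the given capacity."""
    return capacity_bytes // sector_bytes

def chs_to_lba(cylinder: int, head: int, sector: int,
               heads: int, sectors_per_track: int) -> int:
    """Convert a cylinder/head/sector address to a running sector number."""
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)

decimal_20gb = sector_count(20 * 10**9)  # 20 GB in powers of ten
binary_20gb = sector_count(20 * 2**30)   # 20 GB in powers of two
print(decimal_20gb)  # 39062500
print(binary_20gb)   # 41943040

# Hypothetical geometry: 3 platters (6 heads), 63 sectors per track.
example_lba = chs_to_lba(1, 0, 1, heads=6, sectors_per_track=63)
print(example_lba)   # 378
```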
Sectors are also grouped sequentially into clusters. The number of sectors per cluster depends on the file system and the size of the drive; 32 sectors per cluster is a common figure. More often than not, data is stored sequentially in sectors within the clusters.
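To make the cluster figures concrete, here is a minimal sketch that assumes 512-byte sectors and 32 sectors per cluster. It computes how many clusters a file occupies and how much of the last cluster is left unused; that leftover space is the "file slack" discussed later in this primer. The 20,000-byte file size is an arbitrary example.

```python
SECTOR_BYTES = 512
SECTORS_PER_CLUSTER = 32
CLUSTER_BYTES = SECTOR_BYTES * SECTORS_PER_CLUSTER  # 16,384 bytes

def clusters_needed(file_bytes: int) -> int:
    """Whole clusters required to hold a file (ceiling division)."""
    return -(-file_bytes // CLUSTER_BYTES)

def slack_bytes(file_bytes: int) -> int:
    """Unused bytes between the end of the file and the end of its last cluster."""
    return clusters_needed(file_bytes) * CLUSTER_BYTES - file_bytes

print(CLUSTER_BYTES)           # 16384
print(clusters_needed(20000))  # 2
print(slack_bytes(20000))      # 12768
```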
Reading and Writing Digital Data
When a user clicks on a file to open it, the application being used passes the file name to the computer operating system, which consults the FAT to determine the address (platter, track and sector) where the first portion of the file is located. The operating system transmits this information to the disk controller, which positions the heads on the actuator arm over the correct physical location. The file allocation table entry for each cluster records the address of the next cluster in the file, telling the controller where to retrieve the remaining data. The controller retrieves the pieces of data, which are reassembled in the correct order before the ‘file’ is passed to the central processing unit (CPU) for display on the screen.
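The lookup described above can be modeled with a toy example. This is a simplified sketch, not a real file system driver: it assumes a FAT-style layout in which a directory gives the first cluster of a file and the table maps each cluster to the next one. The file name, cluster numbers, and end-of-chain marker are all invented for illustration.

```python
END_OF_CHAIN = -1  # assumed marker for "no further clusters"

directory = {"LETTER.TXT": 5}                # file name -> first cluster
fat = {5: 9, 9: 2, 2: END_OF_CHAIN}          # cluster -> next cluster in the file
clusters = {5: b"Dear ", 9: b"Ms. ", 2: b"Smith"}  # cluster -> stored bytes

def read_file(name: str) -> bytes:
    """Follow the cluster chain and reassemble the pieces in order."""
    cluster = directory[name]
    parts = []
    while cluster != END_OF_CHAIN:
        parts.append(clusters[cluster])
        cluster = fat[cluster]
    return b"".join(parts)

print(read_file("LETTER.TXT"))  # b'Dear Ms. Smith'
```

Note that the pieces sit in clusters 5, 9 and 2, in that order, which is why a file need not occupy neighboring locations on the disk.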
Disk systems, unlike tape, do not store records together physically. With tape, each time a change is made to a block of data, such as an insertion in a text file, the entire block of data is rewritten onto the tape with the new data incorporated. When a similar change is made to text stored on disk, the original file usually remains intact. The operating system checks the file allocation table for the location of an unallocated cluster (a group of sectors available to store data), and the new data is written there.
Thus the various parts of a file, such as this article, can be scattered randomly among hundreds of sectors and clusters on various tracks. [Hence the term, random access device, meaning a drive that can retrieve or store data in any order to any location on the disk. Sequential access devices, such as backup tapes, store data in sequential order, and are unable to retrieve data as quickly.]
Allocated clusters contain data that is "active" according to the file allocation table. Unallocated clusters may contain data, but in storage space that the computer is no longer using for active files (see Deleted Files below). Thus, although unallocated clusters frequently contain "residual" data, this space will be randomly used (overwritten) to store new active data.
Why is Deleted Not Always Deleted?
When a user deletes a file, the operating system does not erase its contents. It only replaces the first character of the file name in the directory with a special marker and flags the file’s clusters in the file allocation table as "empty," or available for the storage of new data. However, the old data remains unchanged and "intact" until new data is stored in the specific sector and cluster containing the "residual" data. It is during this process of ‘overwriting’ new data into the sectors containing the old data that the residual data is truly deleted. However, since data is randomly stored into the millions of potentially available sectors, it is unusual for all sectors containing a file to be overwritten with new data. This provides the opportunity for portions of deleted files to be recovered from "unallocated" clusters long after the user has deleted the file from the computer.
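A toy model makes the point concrete. The sketch below is a simplification under stated assumptions: the directory, table, and marker values are invented, and "?" merely stands in for whatever marker a real file system writes over the first character of a deleted name. It shows that deletion only updates the bookkeeping, and that residual data disappears cluster by cluster as new data is written.

```python
FREE, END = 0, -1   # toy table markers (assumed values)
DELETED = "?"       # stand-in for the deleted-entry marker

directory = {"MEMO.TXT": 2}                 # file name -> first cluster
fat = {2: 3, 3: END, 4: FREE}               # cluster -> next cluster, or FREE
data = {2: b"secret ", 3: b"memo", 4: b""}  # cluster -> raw contents

def delete(name: str) -> None:
    """Mark the name as deleted and free its clusters; contents are untouched."""
    first = directory.pop(name)
    directory[DELETED + name[1:]] = first
    cluster = first
    while cluster != END:
        next_cluster = fat[cluster]
        fat[cluster] = FREE
        cluster = next_cluster

def residual() -> bytes:
    """Bytes still sitting in unallocated clusters."""
    return b"".join(data[c] for c in sorted(fat) if fat[c] == FREE)

def write_new(cluster: int, payload: bytes) -> None:
    """Store new active data in a free cluster, overwriting what was there."""
    data[cluster] = payload
    fat[cluster] = END

delete("MEMO.TXT")
after_delete = residual()     # the whole memo is still recoverable
write_new(3, b"new!")
after_overwrite = residual()  # only the part in cluster 2 survives
print(after_delete, after_overwrite)  # b'secret memo' b'secret '
```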
The Process for Recovering Electronic Evidence
There are two primary steps for recovering electronic data. The first is the "acquisition" of the target media, and the second is a forensic byte-by-byte analysis of the data.
Using specialized computer forensic tools, the examiner acquires the target media through a non-invasive procedure, making a complete sector-by-sector bit-stream mirror image. During the imaging process, it is critical that the mirror image of the target drive be acquired in a DOS environment rather than from the drive’s own operating system. Turning on the computer and booting into the operating system (usually Windows) will subtly modify the file system and destroy some potentially recoverable evidence.
The resulting image becomes the "evidence file," which is mounted as a read-only or "virtual" file, on which the forensic examiner will perform an analysis of the data. The forensics software used by OnlineSecurity will create an evidence file that is continually verified by a Cyclic Redundancy Check ("CRC") computed for every block of 64 sectors and by a 128-bit MD5 hash computed over the entire image. Both steps verify the integrity of the evidence file, and confirm the image remains unaltered and forensically intact, and that critical date and time stamps remain unchanged. (Because the MD5 hash covers the whole image, changing one bit of one byte of data produces a different hash value, resulting in a notice stating that the evidence file data has been changed and that the evidence is no longer forensically intact.)
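The two integrity checks can be sketched as follows. This is an illustration under assumptions, not the vendor's implementation: it uses the CRC-32 from Python's standard `zlib` module (the CRC variant used by any particular forensic tool is not stated in the text), 512-byte sectors, and an all-zero stand-in for a disk image.

```python
import hashlib
import zlib

SECTOR_BYTES = 512
BLOCK_SECTORS = 64
BLOCK_BYTES = SECTOR_BYTES * BLOCK_SECTORS  # 32,768 bytes per verified block

def fingerprint(image: bytes) -> tuple[list[int], str]:
    """Return one CRC-32 per 64-sector block plus an MD5 hash of the whole image."""
    crcs = [zlib.crc32(image[i:i + BLOCK_BYTES])
            for i in range(0, len(image), BLOCK_BYTES)]
    return crcs, hashlib.md5(image).hexdigest()

def verify(image: bytes, crcs: list[int], md5_hex: str) -> tuple[bool, list[int]]:
    """Re-compute both checks; report whether the image is intact and which blocks differ."""
    new_crcs, new_md5 = fingerprint(image)
    changed = [i for i, (old, new) in enumerate(zip(crcs, new_crcs)) if old != new]
    return (new_md5 == md5_hex and not changed), changed

original = bytes(3 * BLOCK_BYTES)          # stand-in image: three blocks of zeros
crcs, md5_hex = fingerprint(original)

tampered = bytearray(original)
tampered[BLOCK_BYTES + 10] ^= 0x01         # flip a single bit in the second block

print(verify(original, crcs, md5_hex))         # (True, [])
print(verify(bytes(tampered), crcs, md5_hex))  # (False, [1])
```

The per-block CRC localizes where a change occurred, while the whole-image hash gives a single value that can be recorded and compared later.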
Searching Digital Evidence
Specialized forensic software provides several methodologies for searching the evidence file. Multiple pieces of evidence, for example two hard drives, a floppy and a multiple-session CD-ROM, can be searched, sorted, and analyzed simultaneously.
A Windows Explorer view displays the files and folders of the target hard drive in an easy-to-browse format. Each file is displayed in a spreadsheet format where the files can be sorted and filtered under numerous fields. The examiner can designate which files to include in this view, such as files from a single folder or a single volume. A preview pane displays any highlighted file, allowing the examiner to easily scroll through individual files. A hex/text viewer shows the contents of any file, with the file slack (the unused space between the end of a file and the end of its last cluster) shown in red. Search hits are highlighted automatically.
Forensic search utilities are used to search for keywords in order to locate relevant documents. These searches will locate any "bytes" of data matching the search term. The development of effective search terms is a critical component of recovering digital evidence and will be a major factor in the success of the forensic investigation. As an example, searching for the word "info" may locate tens of thousands of hits where the letters "info" were used in a file or line of code. Refining the search to "info@OnlineSecurity.com" will help narrow the number of responses. Reviewing the hits from keyword searches consumes a major portion of the time and resources necessary for searching digital evidence. Narrowing the search to terms or phrases that are unique to the specific case situation will enhance the results and reduce the cost.
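A byte-level keyword search of this kind can be sketched in a few lines. The example is a simplification: it assumes the image fits in memory, matches letters case-insensitively, and looks for the term in two common text encodings (ASCII and UTF-16). The sample "image" bytes are fabricated solely to demonstrate how a broad term produces more hits than a narrow one.

```python
def find_keyword(image: bytes, keyword: str) -> list[tuple[int, str]]:
    """Return (byte offset, encoding) for every occurrence of keyword in the raw image."""
    haystack = image.lower()  # lowercases ASCII letters only
    hits = []
    for encoding in ("ascii", "utf-16-le"):
        needle = keyword.lower().encode(encoding)
        start = haystack.find(needle)
        while start != -1:
            hits.append((start, encoding))
            start = haystack.find(needle, start + 1)
    return sorted(hits)

# Fabricated sample: filler bytes around three pieces of text.
image = (b"\x00" * 16 + b"Info desk" + b"\xff" * 7
         + b"info@OnlineSecurity.com" + "INFO".encode("utf-16-le"))

broad = find_keyword(image, "info")
narrow = find_keyword(image, "info@OnlineSecurity.com")
print(broad)   # [(16, 'ascii'), (32, 'ascii'), (55, 'utf-16-le')]
print(narrow)  # [(32, 'ascii')]
```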
Forensic software will also locate drafts of documents, back-up files (.bak, .wbk), temporary files (.tmp), cache files, autosaves, registry data and residual data. Advanced searches can be conducted for "general formats" such as telephone numbers, network IDs, logon records, or Internet protocol addresses (IP numbers), even when the specific number is not known.
Time and date stamps, access logs and recycle bin activity are often a critical focus of the investigation and can be recovered. Files (but not residual data) can be sorted by creation date, last accessed, or last saved. Swap files and file slack, which are locations on the disk where deleted residual data often resides, can be recovered. Print spooler files, with their original time stamps, can be recovered and reviewed. Files that were recently accessed can be determined and a list of all Internet sites (URLs) accessed, and the time and date of access, can be compiled. Also, a forensic picture gallery automatically identifies all graphic files and displays them as thumbnails that can easily be copied onto a CD-ROM.
Forensic investigators will also be able to identify any attempts to hide a file by merely changing its name. Each file’s extension (e.g., .jpg, .gif, .doc) is matched against the file’s actual "signature" to determine if an attempt has been made to "hide" the file. If a file was created in Word (.doc) and the extension was changed to .jpg, the forensic examiner is able to identify and flag that file.
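A "general format" search is typically a pattern match rather than a search for one known value. The sketch below uses regular expressions over raw bytes. Both patterns are assumptions made for this example: the first matches dotted IPv4 addresses with each part between 0 and 255, and the second matches two common North American telephone layouts. Real forensic tools ship their own, more extensive pattern sets, and the sample bytes are fabricated.

```python
import re

OCTET = rb"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4 = re.compile(rb"(?<![\d.])(?:" + OCTET + rb"\.){3}" + OCTET + rb"(?![\d.])")

# Assumed layouts: (415) 555-0134 and 212-555-0199
US_PHONE = re.compile(rb"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.]\d{4}(?!\d)")

sample = (b"login from 192.168.1.20 at 09:14; "
          b"call (415) 555-0134 or 212-555-0199; bad 999.1.1.1")

ip_hits = IPV4.findall(sample)
phone_hits = US_PHONE.findall(sample)
print(ip_hits)     # [b'192.168.1.20']
print(phone_hits)  # [b'(415) 555-0134', b'212-555-0199']
```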
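The signature check can be illustrated with a minimal sketch. It compares a file's extension against the well-known leading bytes of a few formats: JPEG files begin with FF D8 FF, GIF files with "GIF87a" or "GIF89a", and legacy Word, Excel and PowerPoint files share the OLE2 compound-document header. The table is deliberately tiny, and the file names and function name are hypothetical; a real tool consults a far larger signature list.

```python
import os

# Leading bytes -> extensions normally associated with them
SIGNATURES = {
    b"\xff\xd8\xff": {".jpg", ".jpeg"},
    b"GIF87a": {".gif"},
    b"GIF89a": {".gif"},
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": {".doc", ".xls", ".ppt"},
}

def check_extension(filename: str, header: bytes) -> str:
    """Return 'match', 'mismatch', or 'unknown' for a file name and its first bytes."""
    extension = os.path.splitext(filename)[1].lower()
    for magic, extensions in SIGNATURES.items():
        if header.startswith(magic):
            return "match" if extension in extensions else "mismatch"
    return "unknown"

word_header = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 8

print(check_extension("report.doc", word_header))    # match
print(check_extension("vacation.jpg", word_header))  # mismatch
print(check_extension("notes.txt", b"plain text"))   # unknown
```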
Advantages of Digital Evidence
In addition to the advantages of recovering deleted files, digital evidence contains a wealth of critical data and "embedded" information for both intact files and deleted files. For example, forensic software can view the contents of a PowerPoint file to reveal (depending on how it was configured by the user) information such as: the creation date and original author; dates the file was last accessed, modified and printed; when the file was last saved and by whom; the number of times the file was edited, for how long and by whom; the number of revisions; client name, ID and matter number; hidden key words and comments that identify who edited or collaborated on the file; and the original file location.
WordPerfect allows the user to open a saved file and "undo" the last 20 or so entries, charting the latest changes that have been made to the document. Word has a change-tracking feature, which can be secured with a password, that can invisibly record all changes made to a document, allowing a subsequent user with the password to review each edit and peruse every comment made to the document.
Searches also could reveal embedded information in email headers including routing details and a list of associated file attachments. Palms and other digital assistants leave a log of when they were last synched with the desktop, and what information was downloaded. WinFax keeps a log file of all electronic faxes sent, sometimes for years after the original document was lost or destroyed.
Effective Strategies for Electronic Discovery
Prior to conducting on-site electronic discovery, preliminary information pertaining to the target machine and operating systems must be determined. Each computer system (platform) is different, and poses different types of technological issues for the effective and non-invasive imaging of the drive. Determining in advance whether the computer is a desktop or notebook, the size and type of the hard drive, the manufacturer and year of manufacture, the operating system, and the type of browser and email package being used (Netscape mail, Outlook, AOL, etc.) is critical and will eliminate the potential for numerous technological glitches in the field. Each computer may require a different type of interface and adaptor. Additionally, determining the system architecture of the opponents’ premises will assist the forensic investigative team in verifying that all applicable systems and source media are identified and imaged.
In addition to the "traditional" locations of electronic evidence, such as computer hard drives, off-site servers, mirror sites, backup tapes, and removable media such as diskettes, etc., critical evidence may exist in a number of other locations. Some fax machines contain exact duplicates of the last several hundred pages of documents transmitted and received. Digital telephone systems may contain computer logs of all calls made and received, and often store voice mail messages in digital form on hard drives (.wav files). Network audit programs (if properly configured) can contain a history of all files accessed, downloaded or printed. Network firewalls monitor all web sites visited, external (outside of the Network) communication and information transmitted or received from the Internet.
Steps to Preserve (or Destroy) Electronic Evidence
Preservation of electronic evidence is critical. Depending on numerous factors, electronic evidence can be very perishable, or can last for years. The key to the success of electronic discovery and forensic investigations is to gain access to (or preserve the integrity of) the target media as quickly as possible. Relevant target media includes not only PC hard drives, but other types of storage media including tape backups and archives, floppy diskettes, PDAs (personal digital assistants such as Palms) and other removable electronic media.
Recently we have observed an increase in the types of actions that can impact the integrity and availability of electronic evidence, including: (1) the use of data compression, disk de-fragmentation and optimization programs; (2) the downloading or transfer of large files (such as .JPG pictures), which rapidly overwrite data in unused clusters; (3) the use of programs that overwrite sectors with a string of 0’s, such as Norton Utilities’ Wipe-Info; (4) the reuse of back-up tapes; (5) installing new software applications; (6) low-level formats, operating system formats, partitioning formats, etc.; (7) deleting of temporary Internet files, browser history and cookies; and (8) changing of the time clock on the computer. Any of the steps above could alter, delete or modify potentially recoverable evidence, and a number of them could wipe the drive clean.
Summary
During the last five years, there have been exponential advances in technology, and with the advent of the Internet, computers have become pervasive in everyday life. As a result, digital data in some form or another will be critical to most types of civil litigation and criminal proceedings.
The tools for conducting forensic investigations have also evolved rapidly, making it faster to secure evidentiary images, helping to guarantee the integrity of digital evidence, and reducing the time and resources necessary for conducting a comprehensive investigation of electronic media.
There is a rapidly emerging trend to use computer forensics for a broad range of civil litigation matters involving intellectual property rights, trademark infringement, misuse and theft of trade secrets, patent and copyright violations, as well as more traditional matters such as employment law litigation and criminal fraud.
This primer was written to help corporate and outside counsel gain a comprehensive understanding of how computers store and delete data, and the types of data that can be recovered.
This primer has been written by James Gordon.