Making eDiscovery simple: Let's work on your new project together!

Predictive Coding

In recent years, discovery costs have ballooned as people increasingly write, transmit, and store documents electronically. Because of this shift to “electronically stored information,” or “ESI,” clients now store vastly more documents than ever before. Predictive coding, also known as “computer-assisted review,” fights back against this expense by enlisting computer technology to help identify responsive or privileged documents. Predictive coding can also generate work product, such as lists of significant documents.

Definitions of “predictive coding” vary, but a common form of predictive coding includes the following steps. First, the data is uploaded onto a vendor’s servers. Next, representative samples of the electronic documents are identified. These “seed sets” can be created by counsel familiar with the issues, by the predictive coding software, or both. Counsel then review the seed sets and code each document for responsiveness or other attributes, such as privilege or confidentiality. The predictive coding system analyzes this input and creates a new “training set” reflecting the system’s own determinations of responsiveness. Counsel then “train” the computer by identifying where their coding decisions differ from the computer’s and making appropriate adjustments to how the computer will analyze future documents.
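The seed-set and training cycle described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular vendor’s system works: commercial platforms use their own (often proprietary) algorithms, whereas this sketch swaps in a simple naive Bayes text classifier. The names `ToyClassifier`, `training_round`, and `tokenize` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyClassifier:
    """Multinomial naive Bayes with add-one smoothing.

    Hypothetical stand-in for a vendor's model. Assumes train() is called
    with a non-empty seed set before predict() is used.
    """

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, coded_docs):
        # coded_docs: iterable of (text, is_responsive) pairs coded by counsel.
        for text, is_responsive in coded_docs:
            self.doc_counts[is_responsive] += 1
            self.word_counts[is_responsive].update(tokenize(text))

    def predict(self, text):
        """Return True if the document scores as responsive."""
        vocab_size = len(set(self.word_counts[True]) | set(self.word_counts[False]))
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in (True, False):
            # Smoothed log prior, so a class with no examples yet is not log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            denom = sum(self.word_counts[label].values()) + vocab_size
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            scores[label] = score
        return scores[True] > scores[False]


def training_round(model, batch, counsel_codes):
    """One review round: the model codes a batch, counsel checks it, and
    counsel's codes are fed back as new training examples.

    batch: {doc_id: text}; counsel_codes: {doc_id: is_responsive}.
    Returns the IDs where counsel disagreed with the model.
    """
    disagreements = [
        doc_id for doc_id, text in batch.items()
        if model.predict(text) != counsel_codes[doc_id]
    ]
    # Retrain on every document counsel reviewed in this round.
    model.train((text, counsel_codes[doc_id]) for doc_id, text in batch.items())
    return disagreements
```

In use, counsel’s coded seed set goes into `train`, and each later batch goes through `training_round`; the list of disagreements shrinking from round to round is one rough signal that the model is converging on counsel’s judgments, though the reliability test itself is statistical.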

This process is repeated until the system’s output is deemed reliable. Reliability is determined by statistical methods that measure recall (the percentage of responsive documents in the entire data set that the computer has located) and precision (the percentage of documents within the computer’s output set that are actually responsive). In other words, recall tests the extent to which the predictive coding system misses responsive documents, while precision tests the extent to which the system mixes irrelevant documents into the production set. The resulting output can either be produced as is or be further refined by subsequent human review.
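The two measures can be computed directly from sets of document IDs. This is a minimal sketch that assumes we already know which documents are truly responsive; in practice that full set is unknown, so recall is typically estimated from a human-reviewed random sample. The function name `recall_precision` is hypothetical.

```python
def recall_precision(flagged, responsive):
    """Return (recall, precision) for collections of document IDs.

    flagged: IDs the system placed in its output (production) set.
    responsive: IDs that are actually responsive.
    A measure is None when its denominator is empty (undefined).
    """
    flagged, responsive = set(flagged), set(responsive)
    found = len(flagged & responsive)  # responsive documents the system located
    recall = found / len(responsive) if responsive else None
    precision = found / len(flagged) if flagged else None
    return recall, precision


# Hypothetical example: documents 0-79 flagged; documents 20-119 responsive.
flagged = set(range(80))
responsive = set(range(20, 120))
print(recall_precision(flagged, responsive))  # → (0.6, 0.75)
```

Here the system flags 80 documents, 60 of which are responsive, out of 100 responsive documents in the whole set: recall is 60/100 = 0.6 (it missed 40 responsive documents), and precision is 60/80 = 0.75 (20 of its picks are irrelevant).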

In large productions, predictive coding can provide real cost savings. If humans need not look at a significant percentage of the collected documents, the savings across millions of documents are tremendous. Proponents of predictive coding, citing published studies, also assert that it is more accurate than having humans review every document.

Of course, like any tool, predictive coding has its disadvantages, and it is not the right tool for every case. Setting up a predictive coding system is expensive. While there can be benefits from predictive coding even in ordinary-sized cases, the cost savings are amplified as volume rises. Furthermore, the amount of “training” necessarily increases when the predictive coding system is asked to find documents responsive to multiple concepts. Thus, the system works best where the ratio of documents to document requests is high.
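The trade-off between setup cost, volume, and the number of concepts can be made concrete with simple arithmetic. The sketch below assumes a deliberately simplified cost model: a fixed setup cost, an extra training cost for each distinct concept the system must learn, a flat per-document human review cost, and a fixed share of documents that humans still review after predictive coding. Under those assumptions the break-even volume is (setup + training per concept × concepts) / (cost per document × (1 − share still reviewed)). All parameters, and the names `CostAssumptions`, `review_costs`, and `break_even_volume`, are hypothetical, and the example figures are illustrative rather than real market rates.

```python
from dataclasses import dataclass


@dataclass
class CostAssumptions:
    """Hypothetical inputs for a simplified cost model (not real market rates)."""
    base_setup: float            # fixed cost to set up predictive coding
    training_per_concept: float  # added training cost per distinct concept/request
    cost_per_doc: float          # cost of one human review of one document
    fraction_reviewed: float     # share of documents humans still review (0 to 1)


def review_costs(n_docs, concepts, a):
    """Return (linear_review_cost, predictive_coding_cost) for n_docs documents."""
    linear = n_docs * a.cost_per_doc
    predictive = (a.base_setup
                  + a.training_per_concept * concepts
                  + a.fraction_reviewed * n_docs * a.cost_per_doc)
    return linear, predictive


def break_even_volume(concepts, a):
    """Document count at which both approaches cost the same under this model."""
    savings_per_doc = a.cost_per_doc * (1 - a.fraction_reviewed)
    if savings_per_doc <= 0:
        raise ValueError("no per-document savings, so there is no break-even point")
    return (a.base_setup + a.training_per_concept * concepts) / savings_per_doc


# Illustrative figures only.
assumptions = CostAssumptions(base_setup=30_000, training_per_concept=10_000,
                              cost_per_doc=1.0, fraction_reviewed=0.25)
print(break_even_volume(3, assumptions))  # → 80000.0
print(break_even_volume(1, assumptions))
```

With three concepts the break-even point is (30,000 + 30,000) / 0.75 = 80,000 documents; with one concept it drops to about 53,333. Below the break-even volume, linear review is cheaper in this model; above it, each additional document widens the savings, which mirrors why the system works best when the ratio of documents to document requests is high.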

Where the savings are marginal, as when the number of electronic documents is substantial but not overwhelming, counsel should evaluate other ways to work efficiently. Where electronic documents are well organized, through folders or otherwise, it may be relatively easy to determine that entire folders are irrelevant. Thus, if a client has distinct lines of business or distinct projects that are irrelevant to the litigation, the corresponding folders might not warrant review. Of course, concepts important to the litigation often cut across the organizational structures used for business purposes, and the strength of predictive coding lies in handling those situations.
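Once counsel has decided which folders are irrelevant, applying that decision to a collection is mechanical. This is a minimal sketch that assumes documents are identified by forward-slash folder paths; the function name `paths_needing_review` and the folder names are hypothetical. Comparing whole path components, rather than string prefixes, avoids accidentally excluding a folder such as `Shared/RetailOps` when only `Shared/Retail` was meant.

```python
from pathlib import PurePosixPath


def paths_needing_review(paths, excluded_folders):
    """Return the document paths that do not sit under any excluded folder.

    excluded_folders: folder paths counsel has judged irrelevant
    (hypothetical example: a business line unrelated to the dispute).
    """
    excluded = [PurePosixPath(folder).parts for folder in excluded_folders]
    kept = []
    for path in paths:
        parts = PurePosixPath(path).parts
        # Compare whole path components, so "Shared/Retail" does not also
        # exclude "Shared/RetailOps".
        under_excluded = any(parts[:len(ex)] == ex for ex in excluded)
        if not under_excluded:
            kept.append(path)
    return kept


# Hypothetical folder layout.
collected = [
    "Shared/Retail/store_plan.docx",
    "Shared/RetailOps/staffing.xlsx",
    "Shared/Energy/supply_contract.pdf",
]
print(paths_needing_review(collected, ["Shared/Retail"]))
```

The mechanics are the easy part: the caveat in the text still applies, because a concept at the heart of the case may surface inside a folder that looks irrelevant on its face.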

Even where predictive coding is used, counsel must still evaluate, for each document request, the best approach to locating responsive documents. For example, a resolution of the board of directors might best be found in the minute book, and a request for an accounting report might best be fulfilled by asking the appropriate employee to generate one.

Keyword (Boolean) searches yield cruder results than predictive coding systems but may nevertheless be helpful, either in place of or in addition to other techniques. For example, a witness may be of such importance that it is worthwhile to look at all documents bearing his or her name or email address. A keyword search might also be warranted for a distinctly named project that lies at the heart of the litigation.
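At its core, a keyword search of this kind checks each document for specific terms combined with AND and OR. The sketch below is a simplified illustration that assumes plain-text documents; the function names, witness name, email address, and project name are hypothetical, and real review platforms provide their own Boolean syntax (often with proximity and wildcard operators) that this sketch does not reproduce. Terms are matched case-insensitively as whole words or phrases.

```python
import re


def contains_term(text, term):
    """True if term appears in text as a whole word or phrase, ignoring case."""
    # Lookarounds instead of \b so terms that begin or end with punctuation
    # (e.g. email addresses) still match cleanly.
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text, re.IGNORECASE) is not None


def keyword_search(docs, any_of=(), all_of=()):
    """Return IDs of documents matching (any_of[0] OR any_of[1] OR ...)
    AND all_of[0] AND all_of[1] AND ...; an empty any_of imposes no OR condition."""
    hits = []
    for doc_id, text in docs.items():
        or_ok = not any_of or any(contains_term(text, t) for t in any_of)
        and_ok = all(contains_term(text, t) for t in all_of)
        if or_ok and and_ok:
            hits.append(doc_id)
    return hits


# Hypothetical documents, witness, and project name.
docs = {
    "d1": "Call with Jane Doe about the budget",
    "d2": "Forwarded by JANE.DOE@example.com",
    "d3": "Project Falcon kickoff notes",
    "d4": "Lunch menu for Friday",
}
witness_docs = keyword_search(docs, any_of=("Jane Doe", "jane.doe@example.com"))
project_docs = keyword_search(docs, all_of=("Project Falcon",))
print(witness_docs, project_docs)
```

The witness search ORs the person’s name with his or her email address, which captures both d1 and d2; the project search needs only the distinctive project name. Whole-word matching keeps “Falcon” from hitting “Falconry,” one small example of why keyword results need careful term selection.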

Visit the ClayDesk e-Discovery Blog for the latest insights into e-Discovery matters.

Your Data, Our Value

We are the only company to offer a complete portfolio of services and technologies that span all phases of the Electronic Discovery Reference Model (EDRM). We integrate services and technologies to support you from the beginning of the discovery process to the end, and we are the only provider that offers a full suite of integrated on-site, onshore, and offshore services.

Our range

Our range of integrated services offers you flexible e-discovery options: we can help you choose just the services you need, or you can take advantage of our end-to-end solution, delivered with our flat “All-in” pricing. Our project management approach, which focuses on consultative planning, disciplined execution, and repeatable results, enables us to deliver a transparent, consistent, high-quality solution.

How we do e-Discovery

A holistic approach to strategy, services, and technology to ensure efficiency, cost control, and defensibility
Depth of domain expertise and knowledgeable employees
A simple yet integrated discovery process that is more defensible and transparent
The ability to scale for greater efficiency and lower costs

ClayDesk’s e-discovery services are trusted by top law firms and corporations, including nine of the top 10 global law firms, 32 of the top 50 Am Law firms, and numerous Fortune 100 companies. Our clients trust us because our services are:

e-Discovery is a complicated realm

For a comprehensive review of your e-discovery needs, get in touch!