Definitions of “predictive coding” vary, but a common form includes the following steps. First, the data is uploaded onto a vendor’s servers. Next, representative samples of the electronic documents are identified. These “seed sets” can be created by counsel familiar with the issues, by the predictive coding software, or both. Counsel then review the seed sets and code each document for responsiveness or other attributes, such as privilege or confidentiality. The predictive coding system analyzes this input and creates a new “training set” reflecting the system’s determinations of responsiveness. Counsel then “train” the computer by evaluating where their decisions differ from the computer’s and adjusting how the computer will analyze future documents.
This process is repeated until the system’s output is deemed reliable. Reliability is determined by statistical methods that measure recall—the percentage of responsive documents in the entire data set that the computer has located—and precision—the percentage of documents within the computer’s output set that are actually responsive. (That is, “recall” tests the extent to which the predictive coding system misses responsive documents, while “precision” tests the extent to which the system is mixing irrelevant documents in with the production set.) The resulting output can be either produced as is or further refined by subsequent human review.
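The two measures described above can be illustrated with a short calculation. The following is a minimal sketch only, assuming the truly responsive documents are already known (in practice they are estimated from a reviewed sample); the function name and document IDs are hypothetical and do not come from any particular predictive coding product.

```python
def recall_precision(responsive_ids, retrieved_ids):
    """Return (recall, precision) for one predictive coding output set.

    responsive_ids: IDs of documents that are actually responsive
                    (assumed known here for illustration).
    retrieved_ids:  IDs of documents the system flagged as responsive.
    """
    responsive = set(responsive_ids)
    retrieved = set(retrieved_ids)

    # Documents that are both responsive and flagged by the system.
    correctly_found = len(responsive & retrieved)

    # Recall: share of all responsive documents that the system located.
    recall = correctly_found / len(responsive) if responsive else 0.0
    # Precision: share of the system's output that is actually responsive.
    precision = correctly_found / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical example: four responsive documents, five flagged by the system.
recall, precision = recall_precision({1, 2, 3, 4}, {2, 3, 4, 5, 6})
print(f"recall={recall:.2f}, precision={precision:.2f}")
```

In this hypothetical, the system found three of the four responsive documents (recall of 0.75) but two of the five documents it flagged were not responsive (precision of 0.60).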
In large productions, predictive coding can provide real cost savings. If humans need not look at a significant percentage of the collected documents, the savings over millions of documents are tremendous. Proponents of predictive coding, citing published studies, also assert that it is more accurate than having humans review every document.
Of course, like any tool, predictive coding has its disadvantages, and it is not the right tool for every case. Setting up a predictive coding system is expensive. While there can be benefits from predictive coding even in ordinary-sized cases, the cost savings are amplified as volume rises. Furthermore, the amount of “training” necessarily increases when the predictive coding system is asked to find documents responsive to multiple concepts. Thus, the system works best where the ratio of documents to document requests is high.
Where the savings are marginal—where the number of electronic documents is substantial but not overwhelming—counsel should evaluate other means of being efficient. Where electronic documents are well organized, through folders or otherwise, it may be relatively easy to determine the irrelevancy of entire folders. Thus, if a client has distinct lines of business or distinct projects that are irrelevant to the litigation, the corresponding folders might not warrant review. Of course, concepts important to litigation often cut across the organizational structures used for business purposes, and the strength of predictive coding lies in dealing with those situations.
Even where predictive coding is used, counsel must still evaluate, for each document request, the best approach to locating responsive documents. For example, a resolution of the board of directors might be best found by looking in the minute book, and a request for an accounting report might be best fulfilled by asking the appropriate employee to generate it.
Keyword (Boolean) searches yield cruder results than predictive coding systems but may nevertheless be helpful, either in place of or in addition to other techniques. For example, a witness may be of such importance that it is worthwhile to look at all documents bearing his or her name or email address. A keyword search might also be warranted for a distinctly named project that lies at the heart of the litigation.
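To make the keyword approach concrete, here is a minimal sketch of a Boolean search over a small document collection. It assumes documents are available as plain text keyed by an ID and uses simple case-insensitive substring matching; the function name, witness name, email address, and project name are all hypothetical, and commercial review platforms offer far richer query syntax.

```python
def keyword_search(documents, any_terms=(), all_terms=()):
    """Return IDs of documents matching a simple Boolean query.

    documents: dict mapping document ID to its text.
    any_terms: the document must contain at least one of these (OR).
    all_terms: the document must contain every one of these (AND).
    Matching is case-insensitive substring matching, for illustration only.
    """
    hits = []
    for doc_id, text in documents.items():
        lowered = text.lower()
        # OR condition: skip documents that match none of the alternatives.
        if any_terms and not any(t.lower() in lowered for t in any_terms):
            continue
        # AND condition: skip documents missing any required term.
        if not all(t.lower() in lowered for t in all_terms):
            continue
        hits.append(doc_id)
    return hits


# Hypothetical collection and witness.
docs = {
    "D1": "Email from jane.roe@example.com about Project Falcon budget",
    "D2": "Lunch menu for Friday",
    "D3": "Jane Roe approved the invoice",
}
witness = ["Jane Roe", "jane.roe@example.com"]

print(keyword_search(docs, any_terms=witness))
print(keyword_search(docs, any_terms=witness, all_terms=["Project Falcon"]))
```

The first query pulls every document bearing the witness's name or email address; the second narrows those results to documents that also mention the named project. Because the match is on raw substrings, a sketch like this will both over-match and miss variants such as nicknames or misspellings, which is one reason keyword results are cruder than those of a trained system.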