UMD Study Supports Efficacy of Machine Learning in FOIA Review

A University of Maryland (UMD) study recently published by the Association for Computing Machinery supports the usefulness of machine learning tools in the Freedom of Information Act (FOIA) review process. The study, “Providing More Efficient Access To Government Records: A Use Case Involving Application of Machine Learning to Improve FOIA Review for the Deliberative Process Privilege,” examines how classifiers trained with machine learning can help FOIA reviewers identify material covered by the deliberative process privilege of Exemption 5.

According to the U.S. Department of Justice, the deliberative process privilege serves “to encourage honest and frank communication within the agency without fear of public disclosure.” To be exempt under the deliberative process privilege, a record must be both “predecisional,” as in “antecedent to the adoption of an agency policy,” and “deliberative,” i.e., “a direct part of the deliberative process in that it makes recommendations or expresses opinions on legal or policy matters.” Applying the privilege leaves room for judgment among reviewers, as the researchers acknowledge.

The purpose of this study was to examine ways in which machine learning could assist FOIA processing amid growing demand for information driven by COVID-19 and by government initiatives such as the “Capstone” policy, which will make millions of email records available for review, and the NARA transition to fully electronic archives by 2022. The study, “designed to model the workflow that agency staff follow in carrying out FOIA reviews,” employed two FOIA professionals and four approaches to text classification to flag material exempt under the deliberative process privilege within two datasets: about 500 documents related to former White House lawyers Elena Kagan and Cynthia Rice.

Files were then divided into batches and classified by the reviewers according to whether they were likely to contain content protected by the deliberative process privilege. Researchers then measured inconsistencies in annotations between the two reviewers, finding that most inconsistencies stemmed from “lack of settled precedent as to whether the type of document was ‘predecisional’ or ‘deliberative.’” Finally, the reviewers created an agreed-upon final set of annotations for the documents.
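
As an illustration of how inconsistencies between two reviewers can be quantified, the sketch below computes raw agreement and Cohen's kappa over a handful of made-up labels. This summary does not say which measure the researchers used, so the choice of kappa, the label names, and the sample data are all assumptions for illustration only.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both reviewers must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items where both reviewers chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each reviewer labeled at random with their own label rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[l] * counts_b[l] for l in set(labels_a) | set(labels_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical annotations for eight passages (not data from the study).
reviewer_a = ["exempt", "exempt", "release", "release", "exempt", "release", "release", "exempt"]
reviewer_b = ["exempt", "release", "release", "release", "exempt", "release", "exempt", "exempt"]

# Positions where the reviewers disagree, i.e. the items to reconcile.
disagreements = [i for i, (a, b) in enumerate(zip(reviewer_a, reviewer_b)) if a != b]
kappa = cohen_kappa(reviewer_a, reviewer_b)
print(disagreements, round(kappa, 2))
```

The list of disagreeing positions corresponds to the passages two reviewers would need to discuss when producing a final, agreed-upon set of annotations.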

Researchers then chose recall and precision as the evaluation measures for their machine learning classifiers. They applied four separate classifiers to the documents: “(1) Linear Regression (LR); (2) Support Vector Machine (SVM); (3) Begin-Inside-Outside (BIO) tagger using Conditional Random Fields; and (4) keyword search.” Each classifier searched the datasets and flagged material that could be protected by the deliberative process privilege. Results showed that some classifiers performed better than others and supported the efficacy of classifiers overall. For instance, machine learning classifiers far outperformed keyword search, as was “consistent with other studies,” according to the researchers.
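
For readers unfamiliar with the two measures, here is a minimal sketch of how recall and precision are computed once a classifier's flags are compared against the reviewers' final annotations. Precision is the share of flagged passages that are truly privileged; recall is the share of truly privileged passages that were flagged. The function name and the sample flags are hypothetical and are not taken from the study.

```python
def precision_recall(gold, predicted):
    """Return (precision, recall) for boolean flags, where True means
    'protected by the deliberative process privilege'."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same passages")
    true_pos = sum(g and p for g, p in zip(gold, predicted))
    false_pos = sum((not g) and p for g, p in zip(gold, predicted))
    false_neg = sum(g and (not p) for g, p in zip(gold, predicted))
    # Guard against division by zero when nothing is flagged or nothing is privileged.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall

# Hypothetical example: six passages, reviewers' final labels vs. a classifier's flags.
gold_flags = [True, True, False, False, True, False]
classifier_flags = [True, False, True, False, True, False]

precision, recall = precision_recall(gold_flags, classifier_flags)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In a review setting, low recall means privileged material slips through unflagged, while low precision means reviewers spend time on passages that turn out to be releasable.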

The study identified several advantages and limitations of using classifiers compared with manual review. For instance, the researchers noted that classifiers that review paragraphs in isolation are not able to identify documents “that human reviewers can easily recognize as categorically non-exempt.” On the other hand, the researchers noted that classifiers are able to identify vocabulary terms that human reviewers might miss, suggesting that it was “possible to augment human performance . . . by suggesting terms for human designers to consider.”

Researchers concluded that “classifiers trained using supervised machine learning can potentially be of benefit in highlighting portions of records that are within the scope of the deliberative process privilege under FOIA Exemption 5.” Researchers also identified three “efficiencies” of applying classifiers to the FOIA review process: (1) the benefit of machine learning to “large scale” review; (2) the ability to highlight passages that require “the most careful” review; and (3) the ability to detect inconsistencies between reviewers at the end of the process. Ultimately, the researchers recommend that machine learning be used to augment human review, as there is always room for judgment in FOIA review.

If your agency is seeking ways to reduce its FOIA backlog and support the review process, consider implementing the FOIAXpress AI Assistant, which employs machine learning to assist review within FOIAXpress.

To learn more about the FOIAXpress AI Assistant, please request a demo or email us at info@ains.com.