ADELPHI, Md. - An Army researcher partnered with university-based collaborators to develop an enhanced model to provide better clues for analysts who track new and emerging events around the world.
This research contributes to the growing range of natural language applications that support analysts overwhelmed with documents and other written materials. Such applications help detect new and emerging events around the world and connect them to other reported patterns of activity by location and participants, said Dr. Clare Voss, researcher at the U.S. Army Combat Capabilities Development Command, known as DEVCOM, Army Research Laboratory.
The researchers' peer-reviewed findings are available via the ACL Anthology. In the paper, the researchers said that recurring instances of activity patterns captured in the graph schemas will provide clues or indicators to analysts who track ongoing geopolitical influence campaigns.
Automating the extraction of information about events from text documents, such as news stories, scientific articles, after-action reports, closed-caption transcriptions of spoken broadcasts or, more generally, other types of natural language text, belongs to the application area called information extraction, Voss said.
Understanding events in these contexts requires knowledge in the form of higher-order structures called event schemas, which encode frequently recurring event sequences ordered by temporal or causal relations or for narrative impact.
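To make the idea of a schema as an ordered structure concrete, here is a minimal Python sketch. It assumes a schema can be reduced to a directed acyclic graph of event types joined by temporal "before" edges. Apart from ARREST-JAIL and INVESTIGATE-CRIME, the event types, the bombing_schema graph and the next_events helper are hypothetical illustrations and do not come from the researchers' paper. The ordering uses the Python standard library's graphlib.TopologicalSorter.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each event type maps to the set of event types that
# must precede it (temporal "before" edges). Not taken from the paper.
bombing_schema: dict[str, set[str]] = {
    "ATTACK": set(),
    "INJURE": {"ATTACK"},
    "ARREST-JAIL": {"ATTACK"},
    "INVESTIGATE-CRIME": {"ARREST-JAIL"},
    "TRIAL": {"INVESTIGATE-CRIME"},
}


def schema_order(schema: dict[str, set[str]]) -> list[str]:
    """Return one event sequence consistent with every temporal edge."""
    return list(TopologicalSorter(schema).static_order())


def next_events(schema: dict[str, set[str]], observed: set[str]) -> set[str]:
    """Event types not yet observed whose predecessors have all been observed."""
    return {
        event
        for event, preds in schema.items()
        if event not in observed and preds <= observed
    }


if __name__ == "__main__":
    print(schema_order(bombing_schema))
    print(next_events(bombing_schema, {"ATTACK"}))
```

Once an ATTACK has been observed, this toy schema suggests INJURE and ARREST-JAIL as candidate next events. A real schema would also carry argument roles, probabilities and non-temporal relations.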
“Once instances of events have been extracted from a collection of texts, the next phase is to discern how these instances may be related to each other, sometimes referred to as connecting the dots,” Voss said.
This approach to information extraction draws on two sources: the explicit knowledge and explainable linguistic representations developed over decades of established natural language processing research, and the most recent neural network-based methods for discovering implicit or tacit information that machine learning algorithms may be able to learn automatically.
The work also builds on research conducted under the Network Science Collaborative Technology Alliance, which concluded in 2019, and it continues under the Defense Advanced Research Projects Agency’s Active Interpretation of Disparate Alternatives and Knowledge-Directed Artificial Intelligence Reasoning over Schemas programs.
The University of Illinois Urbana-Champaign, which led this paper, New York University, the Information Sciences Institute at the University of Southern California and the United States Naval Academy partnered with the Army’s corporate research laboratory on this research.
“Event schemas can guide our understanding and ability to make predictions with respect to what might happen next,” Voss said.
Previous schema induction methods have mostly ignored uncertainty, recurring events and multiple hypotheses, and have paid limited attention to capturing complex relations among events other than temporal or causal relations, Voss said. Similarly, many existing approaches to automated event extraction have retained the overly simplistic assumption that individual events are atomic occurrences.
The paper makes three distinct contributions that advance this line of research, she said.
First, it describes a novel framework for inducing semantically based event graph schemas, each of which encodes rich event structures and event-to-event connections. The schemas are evaluated with two new metrics, one for coverage and one for coherence.
“The coverage metric captures the proportion of event-to-event connections in the source text document that are actually represented in the graph schema,” Voss said. “The coherence metric conveys the probability of any two given events co-occurring within a given document.”
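The two metrics, as Voss describes them, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's exact formulation: it assumes an event-to-event connection is a directed pair of event-type labels, and it estimates coherence as the fraction of documents in a corpus that contain both event types. The function names, the Connection alias and the example labels ATTACK and INJURE are hypothetical.

```python
# Hypothetical representation: an event-to-event connection is a directed
# pair of event-type labels, e.g. ("ARREST-JAIL", "INVESTIGATE-CRIME").
Connection = tuple[str, str]


def coverage(doc_connections: set[Connection], schema_connections: set[Connection]) -> float:
    """Share of the document's event-to-event connections that the schema also contains."""
    if not doc_connections:
        return 0.0
    return len(doc_connections & schema_connections) / len(doc_connections)


def coherence(event_a: str, event_b: str, documents: list[set[str]]) -> float:
    """Empirical probability that both event types occur in the same document."""
    if not documents:
        return 0.0
    both = sum(1 for events in documents if event_a in events and event_b in events)
    return both / len(documents)


if __name__ == "__main__":
    doc = {("ATTACK", "ARREST-JAIL"), ("ARREST-JAIL", "INVESTIGATE-CRIME"), ("ATTACK", "INJURE")}
    schema = {("ATTACK", "ARREST-JAIL"), ("ARREST-JAIL", "INVESTIGATE-CRIME")}
    print(coverage(doc, schema))  # 2 of 3 connections covered -> 0.666...
    corpus = [
        {"ATTACK", "ARREST-JAIL"},
        {"ATTACK", "INJURE"},
        {"ATTACK", "ARREST-JAIL", "INVESTIGATE-CRIME"},
        {"ATTACK"},
    ]
    print(coherence("ATTACK", "ARREST-JAIL", corpus))  # 2 of 4 documents -> 0.5
```

Coverage here is a per-document ratio, so averaging it over a test corpus would give a single score for a schema. The co-occurrence estimate is the simplest possible reading of "probability of co-occurring" and ignores event order and argument sharing.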
Second, the paper introduces a language model, the Path Language Model, that selects frequently recurring and coherent event-to-event paths. These paths are then used to construct a repository of event graph schemas that is probabilistic and semantically coherent.
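The Path Language Model described in the paper is a neural model. As a rough stand-in that only illustrates the selection step, the Python sketch below swaps in a much simpler technique: a first-order (bigram) transition model over event types. It keeps paths that recur at least min_count times and whose product of transition probabilities reaches min_score, then weights the surviving paths by relative frequency so the repository is probabilistic. The thresholds, function names and example paths are hypothetical assumptions.

```python
from collections import Counter

# A path is a sequence of event-type labels along event-to-event connections.
Path = tuple[str, ...]


def transition_probs(paths: list[Path]) -> dict[tuple[str, str], float]:
    """Estimate P(next event type | current event type) from all observed paths."""
    pair_counts: Counter = Counter()
    source_counts: Counter = Counter()
    for path in paths:
        for a, b in zip(path, path[1:]):
            pair_counts[(a, b)] += 1
            source_counts[a] += 1
    return {(a, b): c / source_counts[a] for (a, b), c in pair_counts.items()}


def path_score(path: Path, probs: dict[tuple[str, str], float]) -> float:
    """Product of transition probabilities along the path (a crude coherence proxy)."""
    score = 1.0
    for a, b in zip(path, path[1:]):
        score *= probs.get((a, b), 0.0)
    return score


def build_repository(paths: list[Path], min_count: int = 2, min_score: float = 0.3) -> dict[Path, float]:
    """Keep frequent, high-scoring paths and weight each by its relative frequency."""
    probs = transition_probs(paths)
    counts = Counter(paths)
    kept = {p: c for p, c in counts.items() if c >= min_count and path_score(p, probs) >= min_score}
    total = sum(kept.values())
    return {p: c / total for p, c in kept.items()} if total else {}


if __name__ == "__main__":
    observed = (
        [("ATTACK", "ARREST-JAIL", "INVESTIGATE-CRIME")] * 3
        + [("ATTACK", "INJURE")] * 2
        + [("ARREST-JAIL", "RELEASE")]
    )
    print(build_repository(observed))
```

With these toy paths, the three-step ATTACK path and the ATTACK to INJURE path are kept with weights 0.6 and 0.4, while the single ARREST-JAIL to RELEASE path is dropped as infrequent. Because the score is a product, longer paths are penalized; a learned language model avoids that kind of hand-tuned trade-off.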
Finally, the results show that features derived from the schemas are effective in enhancing an end-to-end information extraction system.
For example, Voss said, consider the following sentence:
“Following the trail of Mohammed A. Salameh, the first suspect arrested in the bombing, investigators discovered a jumble of chemicals, chemistry implements and detonating materials...”
The researchers found that a state-of-the-art information extraction system successfully extracts an ARREST-JAIL event triggered by the verb “arrested,” but fails to extract both the INVESTIGATE-CRIME event triggered by the sentence’s main verb, “discovered,” and that event’s argument, Mohammed A. Salameh, of type DEFENDANT.
“Event graph schemas can connect the dots to show that a person who is arrested is also usually the person who is investigated,” Voss said. “By augmenting the state-of-the-art [information extraction] system with features from an event graph schema that makes this connection, our approach can fix this missing error. In this paper, we report the results of conducting just such an extrinsic evaluation and show the effectiveness of the induced schema repository in enhancing downstream end-to-end IE tasks.”
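The kind of connection Voss describes can be illustrated with a simplified, rule-based Python sketch. It assumes that a schema edge records that an entity filling one role in one event type tends to fill another role in a related event type. The sketch turns such edges into hints that an extraction system could use as extra features. The role name DETAINEE, the 0.8 edge probability, the threshold and all class and function names are hypothetical. This is not the researchers' implementation, which the article does not describe in detail.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    event_type: str
    trigger: str
    args: dict[str, str] = field(default_factory=dict)  # role -> entity mention


@dataclass(frozen=True)
class SchemaEdge:
    # Hypothetical edge: an entity in src_role of src_type tends to appear
    # in tgt_role of tgt_type with probability prob.
    src_type: str
    src_role: str
    tgt_type: str
    tgt_role: str
    prob: float


def schema_hints(events: list[Event], schema: list[SchemaEdge], threshold: float = 0.5) -> list[tuple[str, str, str, float]]:
    """Suggest (event type, role, entity, prob) links that the extractor missed."""
    extracted = {(e.event_type, role, ent) for e in events for role, ent in e.args.items()}
    hints = []
    for e in events:
        for edge in schema:
            if e.event_type != edge.src_type or edge.prob < threshold:
                continue
            entity = e.args.get(edge.src_role)
            if entity is None:
                continue
            if (edge.tgt_type, edge.tgt_role, entity) not in extracted:
                hints.append((edge.tgt_type, edge.tgt_role, entity, edge.prob))
    return hints


if __name__ == "__main__":
    events = [Event("ARREST-JAIL", "arrested", {"DETAINEE": "Mohammed A. Salameh"})]
    schema = [SchemaEdge("ARREST-JAIL", "DETAINEE", "INVESTIGATE-CRIME", "DEFENDANT", 0.8)]
    print(schema_hints(events, schema))
```

For the example sentence, the extracted arrest event yields a hint that Mohammed A. Salameh is likely the DEFENDANT of an INVESTIGATE-CRIME event, which is exactly the link the baseline system missed. Once that event is extracted, the hint no longer fires.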
According to Voss, several challenges lie ahead for this research.
One will be developing an inventory of reliable clues to the presence of influence agendas in text. Once enough data has been consistently annotated for such indicators, the next step will be to develop systems that automatically detect them in event extraction applications.
DEVCOM Army Research Laboratory is an element of the U.S. Army Combat Capabilities Development Command. As the Army’s corporate research laboratory, ARL is operationalizing science to achieve transformational overmatch. Through collaboration across the command’s core technical competencies, DEVCOM leads in the discovery, development and delivery of the technology-based capabilities required to make Soldiers more successful at winning the nation’s wars and come home safely. DEVCOM is a major subordinate command of the Army Futures Command.