Army research advances state-of-the-art cybersecurity methods

By U.S. Army CCDC Army Research Laboratory Public Affairs
May 28, 2020

To enhance cybersecurity and Soldier protection, Army researchers develop a novel design framework to lower the memory and computational requirements of graph machine learning without sacrificing accuracy. (Photo Credit: Shutterstock)

ADELPHI, Md. -- Soldiers in combat face challenging scenarios on the battlefield as they work to complete their missions. To assist them in the decision-making process, Army researchers have discovered a novel cybersecurity technique that will enhance threat detection and prevention for Soldier protection.

Dr. Rajgopal Kannan, a researcher in the Context Aware Processing Branch at the U.S. Army Combat Capabilities Development Command’s Army Research Laboratory, along with collaborators at the University of Southern California, has discovered a novel technique to parallelize Graph Neural Network, or GNN, training that outperforms state-of-the-art methods in scalability, efficiency and accuracy.

“Graph Convolutional Networks, a type of GNN, are fundamental methods for deep learning on graphs that allow us to learn/understand both the graph structure, or how the graph looks connected, and graph features, or the properties of each node/element,” Kannan said. “They are analogous to convolutional neural networks that are ubiquitous in Army machine learning applications for vision and image processing such as in target detection, object recognition, etc.”
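As a rough illustration of how a graph convolutional layer can combine graph structure with node features, the sketch below averages each node's features with those of its neighbors, then applies a weight matrix and a ReLU. The function name `gcn_layer`, the toy three-node graph, the weights, and the choice of mean aggregation with self-loops are all illustrative assumptions, not details of the research described here.

```python
def gcn_layer(adj, features, weights):
    """One simplified graph convolution: mean over self + neighbors, then linear map + ReLU.

    adj: dict mapping node -> list of neighbor nodes
    features: dict mapping node -> list of floats (input features)
    weights: list of rows with shape (in_dim, out_dim)
    """
    out_dim = len(weights[0])
    result = {}
    for node, neighbors in adj.items():
        group = [node] + list(neighbors)  # self-loop plus neighbors
        in_dim = len(features[node])
        # Graph structure: average the features over the neighborhood.
        mean = [sum(features[n][i] for n in group) / len(group) for i in range(in_dim)]
        # Graph features: linear transform of the averaged features, then ReLU.
        result[node] = [
            max(0.0, sum(mean[i] * weights[i][j] for i in range(in_dim)))
            for j in range(out_dim)
        ]
    return result


# Toy path graph 0 - 1 - 2 with 2-dimensional node features (hypothetical data).
adj = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
weights = [[1.0], [0.5]]  # maps 2 input features to 1 output feature
hidden = gcn_layer(adj, features, weights)
```

Stacking several such layers lets information travel several hops across the graph, which is why deeper models see more of each node's surroundings.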

These networks can enable cybersecurity applications for threat detection and prevention and enhanced situational awareness, and can also be used for accurate human activity recognition and motion detection to enhance Soldier protection.

Graphs are a powerful tool used far and wide to represent complex data, from molecular structures to relationships in the realm of social science.

In today’s world of big data, graphs are more prevalent than ever, Kannan said. In particular, the U.S. Army has vast real-time information databases, such as the Global Database of Events, Language, and Tone, or GDELT, that can be represented as graphs.

“Since these graphs offer a compact, yet very powerful, way of representing dense relationships between entities as well as their high-level attributes, or features, it is critical to develop fast, efficient, accurate and scalable methods for learning these relationships – this is what our research is focused on,” Kannan said.

The most common training technique for graph machine learning is stochastic gradient descent, which is performed over small chunks of training data, so-called batches, he said.
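To make the idea of training over batches concrete, here is a minimal sketch of mini-batch stochastic gradient descent on a one-parameter model. It is not graph-specific. The function name `sgd_fit`, the synthetic data, and the hyperparameters are assumptions chosen for illustration.

```python
import random


def sgd_fit(xs, ys, batch_size=4, lr=0.1, epochs=200, seed=0):
    """Fit y ~ w * x by mini-batch stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the data in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error, computed on this batch only.
            grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad  # step against the gradient
    return w


# Synthetic, noise-free data generated from y = 3x (hypothetical example).
xs = [0.1 * i for i in range(1, 21)]
ys = [3.0 * x for x in xs]
w = sgd_fit(xs, ys)
```

Each update touches only `batch_size` examples, so the cost of a step does not depend on the size of the full dataset. On graphs the difficulty is that one "example" (a node) drags in its neighbors.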

(Photo Credit: U.S. Army)

Existing state-of-the-art techniques for Graph Convolutional Networks, or GCNs, require sampling the graph, i.e., selecting nodes and their neighbors layer by layer, to form each batch before running the classification training.

“This leads to a problem called neighbor explosion, in which the number of neighbors of nodes grows exponentially large as we peek further out into its connections,” Kannan said. “This makes training of even moderately sized graphs very time consuming, especially social network graphs, which have a power law connectivity relationship.”
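The growth Kannan describes can be seen by counting how many nodes fall within k hops of a starting node. The sketch below does this with a breadth-first search on a synthetic tree in which every node has three children. The helper names `build_tree` and `nodes_within_hops` and the branching factor are illustrative assumptions. Real social graphs are not trees, but their high-degree hubs make the same effect worse.

```python
def build_tree(branching, depth):
    """Build an undirected balanced tree as an adjacency dict, rooted at node 0."""
    adj = {0: []}
    frontier = [0]
    next_id = 1
    for _level in range(depth):
        new_frontier = []
        for parent in frontier:
            for _child in range(branching):
                adj[parent].append(next_id)
                adj[next_id] = [parent]
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    return adj


def nodes_within_hops(adj, start, hops):
    """Count nodes reachable from `start` in at most `hops` steps (breadth-first search)."""
    seen = {start}
    frontier = [start]
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return len(seen)


tree = build_tree(branching=3, depth=4)
counts = [nodes_within_hops(tree, 0, k) for k in range(5)]
# counts == [1, 4, 13, 40, 121]: roughly a threefold increase per extra layer
```

A model with k layers needs every node within k hops of each batch node, so the work per batch grows geometrically with model depth.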

Kannan and his fellow researchers propose GraphSAINT, a graph sampling based inductive learning method that improves training efficiency in a fundamentally different way.

“We developed a novel design framework called GraphSAINT for parallelizing and speeding up deep learning on graphs,” Kannan said. “GraphSAINT, which stands for Graph SAmpling Based INductive Learning Method, uses novel methods to lower the memory and computational requirements of graph machine learning without sacrificing accuracy, thereby enabling deeper models and faster learning on much larger graphs, even using lightweight edge devices.”
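One way to picture training on sampled subgraphs, rather than sampling neighbors layer by layer, is sketched below. Short random walks pick a set of nodes, and the subgraph induced on those nodes becomes one training batch. This is a simplified illustration and not the authors' implementation. The function name `sample_subgraph`, the ring-shaped toy graph, and the walk parameters are assumptions, and any correction for sampling bias that a real system would need is omitted.

```python
import random


def sample_subgraph(adj, num_roots, walk_length, rng):
    """Collect nodes visited by short random walks and return the induced subgraph."""
    nodes = sorted(adj)
    visited = set()
    for root in rng.sample(nodes, num_roots):
        current = root
        visited.add(current)
        for _ in range(walk_length):
            if not adj[current]:
                break  # isolated node: the walk cannot continue
            current = rng.choice(adj[current])
            visited.add(current)
    # Keep only edges whose endpoints were both sampled.
    return {n: [m for m in adj[n] if m in visited] for n in visited}


# Toy ring graph with 20 nodes (hypothetical data).
ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
sub = sample_subgraph(ring, num_roots=3, walk_length=2, rng=random.Random(0))
```

Because a full model of any depth is then trained on the small subgraph alone, the batch size is bounded by `num_roots * (walk_length + 1)` nodes no matter how many layers the model has.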

“This will allow for the exploitation and mining of the Army’s vast database of graph-based information and provide actionable intelligence for Soldiers,” he said.

More specifically, this research is on graph inductive learning – in which the machine learning model is trained on one graph, but can generalize to make inferences on completely unknown graphs.

“This has potentially very useful applications – imagine a machine learning network trained on a known graph of adversary communication patterns,” Kannan said. “Our graph machine learning model can then generalize this learning to a completely new and dynamically updated network of adversarial communications. We can then infer potential malicious activities by our adversaries and deny them, thereby achieving area superiority and dominance, as desired in Command, Control, Computers, Communications, Cyber, Intelligence, Surveillance and Reconnaissance.”

However, he said, for the team’s graph machine learning methods to be useful, both their graph training and inference should be extremely fast yet accurate and work on large graphs.

“This is what our research attempts to do, beating state-of-the-art Graph Neural Networks in both training time and accuracy on several large graph datasets, including Reddit, Amazon, Yelp and Flickr,” Kannan said. “For example, we trained a deep GNN on Reddit, a popular social connection graph, and significantly improved the accuracy, but with 100 times less computation time.”

By making inductive graph learning faster and scalable to extremely large graphs, this research will enable the researchers to leverage the Army’s existing real-time information database, Kannan said.

The researchers’ paper, GraphSAINT: Graph Sampling Based Inductive Learning Method, was published at the 8th International Conference on Learning Representations, or ICLR, which was conducted virtually from April 26 to May 1. The work supports Position, Navigation and Timing research for the Army Modernization Priority for Networks/C5ISR.

ICLR is the premier gathering of professionals dedicated to the advancement of the branch of artificial intelligence called representation learning, but generally referred to as deep learning.

There are two promising next steps for this research, according to Kannan.

The first is to leverage the branch’s expertise in accelerating graph algorithms and develop an optimized distributed graph training system that will help the researchers learn deeper on even larger graphs. The system will exploit massive task-level parallelism through independent training graph partitioning and sampling on each processor along with algorithms for data shuffling to drive convergence.

The second goal is to build a complete graph processing system consisting of a querying front-end and an extended GraphSAINT based back-end that includes distributed accelerated graph training and inference along with other graph analytics algorithms that can help Army modernization priority applications. The initial goal is to demonstrate a small prototype that uses available ARL graph data sets.

“I am very optimistic, especially once we start building our prototype graph querying and learning system that leverages the Army’s information database to learn on unknown graphs and provide actionable intelligence for C5ISR dominance,” Kannan said. “This research is very important to my mission as an Army civilian supporting the Army's modernization priorities.”

Machine learning research, with its intensive experimental focus requiring the accurate implementation of complex models trained on very large datasets and the need to run accurate and replicable experiments, is a fertile field for collaboration that was brought to life in this instance by the lab’s Open Campus business model, Kannan said.

The ideas explored in this research were developed in the laboratory’s Context Aware Processing Branch at its regional campus in Playa Vista, California, in collaboration with researchers at USC. The Department of Commerce recently selected this collaboration as one of the lab’s high-impact cooperative research and development agreements, or CRADAs.

“This research illustrates not just a new discovery, but high-impact [CCDC] ARL collaborations for putting ideas into practice benefiting warfighters,” Kannan said.
(Photo Credit: U.S. Army)

CCDC Army Research Laboratory is an element of the U.S. Army Combat Capabilities Development Command. As the Army's corporate research laboratory, ARL discovers, innovates and transitions science and technology to ensure dominant strategic land power. Through collaboration across the command’s core technical competencies, CCDC leads in the discovery, development and delivery of the technology-based capabilities required to make Soldiers more lethal to win the nation’s wars and come home safely. CCDC is a major subordinate command of the U.S. Army Futures Command.