ADELPHI, Md. -- Army researchers discovered a way to quickly get information to Soldiers in combat using new machine learning techniques. The algorithms will play a significant role in enhancing how the future military will operate.
Researchers from the U.S. Army’s Combat Capabilities Development Command’s Army Research Laboratory, Defence Science and Technology Laboratory, IBM Thomas J. Watson Research Center and Pennsylvania State University developed a way to train a number of classical machine learning algorithms to operate in constrained environments, particularly those involving coalitions. The technique can be implemented in various devices used by Soldiers.
“Tactical networks often suffer from intermittent and low-bandwidth connections due to their hostile operation environment,” said Dr. Ting He, associate professor at Pennsylvania State University. “In addition, although artificial intelligence techniques have the potential to greatly improve the situational awareness of Soldiers and commanders, to keep them updated about the fast-changing situations, the machine learning models need to be frequently retrained using updated data, which are often distributed across data sources with unreliable or poor connections.”
According to the researchers, this challenge calls for new generations of model training techniques that strike a desirable tradeoff between the quality of the obtained models and the amount of data transfer needed.
Their approach, called coreset, tackles this challenge with a lossy data compression technique designed for machine learning applications. The method filters out and discards redundant data to reduce the amount of data that must be transferred.
“Coreset looks like a smaller version of the original dataset that can be used to train machine learning models with guaranteed approximation to the models trained on the original dataset,” He said. “However, existing coreset construction algorithms are each tailor-made to a targeted machine learning model, and thus multiple coresets need to be generated from the same dataset and transferred to a central location to train multiple models, offsetting the benefit of using coresets for data reduction.”
To address this problem, the researchers studied the robustness of different coreset construction algorithms with respect to the machine learning models they are used to train, with the goal of developing a robust coreset construction algorithm whose output can simultaneously support the training of multiple machine learning models with guaranteed quality.
“Via a careful classification of more than 16 years of research on coresets, we identified three classes of coreset construction algorithms and evaluated the robustness of representative algorithms from each class on real datasets to obtain insights on what works better in a mixed-use setting and why,” He said. “Our study revealed that a clustering-based algorithm has outstanding robustness compared to the other evaluated algorithms in supporting both unsupervised and supervised learning.”
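As a rough illustration of what a clustering-based coreset can look like, the sketch below runs plain k-means and keeps only the cluster centers, each weighted by the number of points it represents. This is a minimal example under stated assumptions, not the researchers' published algorithm: the function names, the choice of Lloyd's k-means, and the synthetic data are all hypothetical and chosen for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_coreset(points, k, iterations=20, seed=0):
    """Summarize `points` as k weighted centers using Lloyd's k-means.

    Returns (centers, weights), where weights[i] is the number of
    points that were assigned to centers[i]. Illustrative only.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k data points chosen at random
    weights = [0] * k
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center and gets weight 0).
        centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        weights = [len(members) for members in clusters]
    return centers, weights


# Synthetic data (an assumption for this demo): three 2-D blobs of 200 points.
rng = random.Random(1)
data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(200)]
centers, weights = kmeans_coreset(data, k=3)
print(len(data), "points summarized by", len(centers), "weighted centers")
```

A downstream trainer that accepts sample weights could treat each center as standing in for `weights[i]` original points, so only the centers and their weights would need to be sent over the network.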
The researchers further established the theoretical condition under which the algorithm is guaranteed to provide a coreset, based on which near-optimal models can be obtained.
A distributed version of the algorithm was also developed with a very low communication overhead.
“Take the neural network as an example,” He said. “Compared to training the neural network on the raw data, training it on a coreset generated by our proposed algorithm can reduce the data transfer by more than 99% at only 8% loss of accuracy.”
According to Dr. Kevin Chan, an electronics engineer at the lab, this research will enhance the performance of machine learning algorithms, particularly in tactical environments where bandwidth is scarce.
“Given advanced techniques to increase the rate at which analytics can be updated, Soldiers will have access to updated and accurate analytics,” Chan said. “This research is crucial to Army Networking Priorities in support of machine learning that enable multi-domain operations, with direct applicability to the Army’s Network Modernization Priority.”
The developed algorithm is straightforward to implement and can be used with various data-capturing devices, especially high-volume, low-entropy devices such as surveillance cameras, to significantly reduce the amount of collected data while guaranteeing near-optimal performance for a broad set of machine learning applications, He said.
This means that Soldiers will be able to obtain faster updates and smoother transitions as the situation changes at a competitive accuracy.
“In addition to applications in the military domain, coresets and distributed machine learning in general are also widely applicable in the commercial setting, where multiple organizations would like to jointly learn a model but cannot share all their data,” said Dr. Shiqiang Wang, research staff member at IBM Research and collaborator on this work. “This can be very useful for a wide range of AI-driven applications, such as fraud detection in the banking industry, disease diagnosis leveraging patient data across multiple hospitals and even autonomous driving. These emerging use cases enabled by distributed AI will be essential in our future society.”
As for the next steps for this research, the team is exploring various ways of combining coreset construction with other data reduction techniques to achieve more aggressive data compression at a controllable loss of accuracy.
“For example, we are exploring how to optimally allocate bits between coreset construction (i.e., generating more samples) and quantization (i.e., having a more accurate representation per sample),” He said. “As another example, we are exploring how to optimally combine two approaches – reducing the number of data records using coreset and reducing the number of features per data record using dimensionality reduction techniques.”
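The tradeoff between the number of samples and the precision of each sample can be made concrete with a small sketch. Under a fixed bit budget, spending more bits on each value leaves room for fewer samples. The example below is an assumption-laden illustration, not the team's method: the uniform quantizer, the function names, the bit options, and the budget figures are all hypothetical.

```python
def quantize(value, bits, lo=0.0, hi=1.0):
    """Round `value` to the nearest of 2**bits evenly spaced levels in [lo, hi]."""
    levels = 2 ** bits - 1               # number of intervals between levels
    clipped = min(max(value, lo), hi)    # keep the value inside the range
    step = (hi - lo) / levels
    return lo + round((clipped - lo) / step) * step


def allocations(total_bits, num_features, bit_options=(4, 8, 16, 32)):
    """List (samples, bits_per_value) pairs that fit within `total_bits`.

    Each sample costs num_features * bits_per_value bits, so a finer
    representation per sample means fewer samples can be transmitted.
    """
    return [(total_bits // (num_features * b), b) for b in bit_options]


# Hypothetical budget: 64,000 bits for records with 10 features each.
for samples, bits in allocations(64000, 10):
    print(samples, "samples at", bits, "bits per value")
```

With a uniform quantizer like this one, the rounding error per value is at most half a step, so each extra bit roughly halves the worst-case error while halving the number of samples the same budget can carry.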
Throughout this research, collaboration has been a constant and vital mechanism.
“We have been working closely with our government and industrial partners on this project through weekly teleconferences, site visits, internships, and various face-to-face meetings at International Technology Alliance on Distributed Analytics and Information Sciences events and scientific conferences,” He said. “The collaboration has not only kept us informed about the needs of the government and the industry users, but also fostered new ideas and directions.”
The team published their paper, Robust Coreset Construction for Distributed Machine Learning, in the IEEE Journal on Selected Areas in Communications special issue on Advances in Artificial Intelligence and Machine Learning for Networking.
He said being featured in the journal is helpful to the continuation of this game-changing research.
“Artificial intelligence and machine learning are promising techniques to revolutionize how we operate our networked systems and satisfy users’ information needs,” He said. “Being featured in this special issue provides our work a timely opportunity to reach the right audience, including not only peer researchers, but also potential downstream developers and users that can facilitate follow-ups and adoptions.”
CCDC Army Research Laboratory is an element of the U.S. Army Combat Capabilities Development Command. As the Army's corporate research laboratory, ARL discovers, innovates and transitions science and technology to ensure dominant strategic land power. Through collaboration across the command's core technical competencies, CCDC leads in the discovery, development and delivery of the technology-based capabilities required to make Soldiers more lethal to win the Nation’s wars and come home safely. CCDC is a major subordinate command of the U.S. Army Futures Command.