Deep Green: The Army's Data Science Competition
Deep Green logo (Photo Credit: Office of Business Transformation)

Washington – On June 25, the U.S. Army announced the selection of a team from the U.S. Army National Guard as winner of its premier Data Science Competition, closely followed by a team from the U.S. Army Combat Capabilities Development Command, or DEVCOM, Data & Analysis Center, known as DAC.

The 2021 Deep Green challenge, sponsored by the Assistant Secretary of the Army (Acquisition, Logistics and Technology), or ASA(ALT), through the Office of Business Transformation, known as OBT, is designed to engage the data analytics community in developing data-driven approaches to complex Army problems. The competition kicked off in January of this year, when 15 teams of DOD data scientists, statisticians, engineers and analysts entered to develop innovative responses to a unique RAND Corporation-procured data set and predict the materiel readiness of the M2A3 Bradley fighting vehicle.

The competition consisted of an initial stage where participants dug into the data set, applied tools to develop an understanding and created models to fit the data. Based on the quality of the models, challenge organizers identified five teams to compete in a final round and present results.

The two teams from DAC who participated both made it into the final round. A cross-organizational team with representatives from multiple divisions, led by Paul Soper, was runner-up in the overall competition. Analysts included Ryan Barker, Jon Blood, Richard Haberstroh, Alexon Kirksey and Jonathan Zgorski. The second DAC team was led by John Wang.

Among the 28 competition-compliant and scorable models received, the two DAC teams submitted four of the top five. Each team was able to submit multiple entries while receiving rolling feedback and tracking its competition placement on a publicly ranked scoreboard.

“Deep Green is crowdsourcing solutions to help make the business of the Army better. This outcome is proof of DAC’s expertise and can-do attitude,” said Military Deputy Director Col. Gregory Smith. “Our people get in there and work hard to solve problems. We do this with the vision of making both the warfighter and our Army the most lethal fighting force in the world.”

After finalists were selected on May 20, the teams presented insights from their modeling efforts to the competition sponsor, emphasizing the factors they found to be indicative or predictive of equipment readiness for the M2A3 Bradley. The findings help provide recommendations for the ASA(ALT) Operational Sustainment Review process.

According to Bakari Dale, director of the OBT Enterprise Data Analytics Office, the motivating force behind Deep Green is to bring data science communities together to encourage their professional development while working to solve a real problem of immediate import to a sponsor’s mission.

“This competition is opening the door for DOD analytics experts to expand their data skillsets and help the Army improve business practices and processes,” Dale said. “By examining the materiel readiness of the M2A3 Bradley fighting vehicle, they’re helping us by providing valuable recommendations to the challenge sponsor.”

Inspired by the professional development opportunity to apply topics learned from educational ventures to real applications and an interesting new problem, Soper’s team got to work, not just building the models but also familiarizing themselves with new tools and approaches. The Deep Green challenge offered satellite and data science training virtually, as well as training on DataRobot, a software platform that enables automated machine learning.

Zgorski, who has been with DAC for a little over a year, saw Deep Green as a chance for technical development and an improved understanding of machine learning and DataRobot. “It’s a good way for me to provide DAC with a more diverse skill set, especially since DataRobot is a tool we’ll want to use going forward for multiple different solutions. This is a way to be ready for it in the future.”

For Barker, Kirksey and Soper, the challenge presented a prime venue for learning. “It was definitely a new experience,” said Soper. “There was training in DataRobot and machine learning principles, but also training in time series, which isn’t often covered in academic environments and is very relevant in our world.”

Soper’s team met virtually on a weekly basis to brainstorm ideas and work together. Aside from the challenges of working in a virtual environment during a global pandemic, the data sets themselves were challenging. For some members, it was the largest data set they had seen.

“A big challenge for us, too, was dealing with imputed values, which are values for missing data in the data system, and how to handle features known in advance. These issues kept the whole Deep Green community busy,” said Soper. “One thing we learned was that we couldn’t use DataRobot for everything. In order to be successful, we needed to be able to do a whole lot of processing before DataRobot got its hands on the data, and in some cases, after it came up with predictions. This is called feature engineering. You can’t just feed in data, turn the crank, and have something wonderful pop out.”
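The kind of pre-processing Soper describes can be illustrated with a minimal sketch. It is not the team's actual pipeline, and the competition data set is not public here, so the column names (`month`, `hours`) and both helper functions are hypothetical. The sketch shows two common feature engineering steps for time-ordered records: flagging which values were missing before filling them in, and adding a lagged copy of a column so a model can see the previous period's value.

```python
from statistics import median


def add_missing_flag_and_impute(rows, column):
    """Return copies of rows with a '<column>_was_missing' flag.

    Missing values (None) are filled with the median of the observed
    values, so a downstream model can still tell which values were
    filled in rather than measured. Median fill is an assumption made
    for illustration only.
    """
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed) if observed else 0.0
    result = []
    for r in rows:
        new = dict(r)
        new[f"{column}_was_missing"] = r[column] is None
        if r[column] is None:
            new[column] = fill
        result.append(new)
    return result


def add_lag_feature(rows, column, lag=1):
    """Return copies of rows with the value of `column` from `lag` rows earlier.

    Rows are assumed to be sorted by time already. The first `lag` rows
    have no earlier value, so their lag feature is None.
    """
    result = []
    for i, r in enumerate(rows):
        new = dict(r)
        new[f"{column}_lag{lag}"] = rows[i - lag][column] if i >= lag else None
        result.append(new)
    return result


# Hypothetical monthly usage records for one vehicle.
records = [
    {"month": 1, "hours": 10.0},
    {"month": 2, "hours": None},
    {"month": 3, "hours": 30.0},
]
prepared = add_lag_feature(add_missing_flag_and_impute(records, "hours"), "hours")
```

Keeping the flag alongside the filled value is one way to avoid treating imputed numbers as if they were real measurements, which is the concern raised in the quote above.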

As for advice for future Deep Green challenge participants? The team recommends deliberate collaboration and adopting a willingness to experiment.

“Directly collaborating with others when building the models seemed to produce the best batches for us,” said Zgorski. “Breaking out independently is fine in the initial stages of discovery, but after that point, you need tighter collaboration for better results.”

The team also relied on research and persistence. “We had a number of weeks where we were saying ‘try this,’ and it didn’t work very well, ‘try this,’ and that didn’t work very well either,” said Soper. “We ended up with about 30 different models, but toward the end, it all started coming together and we ended up with models that performed much better. The key was being willing to take that chance, experiment and put the information out there to share with the other team members.”

Leveraging the spirit of competition, the Deep Green challenge aimed to empower the participating community to advance the state of the art of artificial intelligence and machine learning modeling of equipment readiness for the Bradley platform. “All the finalists’ models outperformed the generic time series model created by the challenge administrators as a baseline. It’s clear that we’re working with not only talent, but drive,” Dale said. “We posted a problem, and there was a crowd of eager analysts ready to solve it.”

The impact of the winning model, as well as lessons learned, will be presented to the Army Analytics Board and shared with the rest of the Army. Findings are applicable to the M2A3 Bradley fighting vehicle, and potentially extensible to other vehicles and maintenance problems.

The DEVCOM Data & Analysis Center is an element of the U.S. Army Combat Capabilities Development Command. DEVCOM is a major subordinate command of the U.S. Army Futures Command. Visit the DEVCOM website at https://www.army.mil/devcom.