I recently graduated from the Naval Postgraduate School with an MS in Operations Research, a curriculum focused heavily on data analysis. Prior to NPS, I had no technical background and would have been skeptical at best of the idea that data analysis could be used to evaluate performance in training. After two years at school, I have changed my mind. My thesis is titled “Changing the Perspective of Infantry Training Evaluation: The Case for Quantitative Comparison of Unit Performance.” This article pitches to the community some of the thoughts that came out of that work. I owe a special thanks to the Marines around me who helped flesh out and critique the ideas presented here, and to my thesis advisor, who provided analytical guidance throughout the writing process.
1. Requirement for Change
Since the release of Force Design 2030, the Marine Corps has been in a race to modernize the force. With documents like Talent Management 2030 and Training and Education 2030, the Commandant has made it clear that the Marine Corps will challenge all existing paradigms in an effort to improve. These documents charge each of us to identify gaps in the status quo and propose solutions to close them. In my opinion, we are very good at planning and executing training but weak in assessment and evaluation. If you disagree, think back to the debriefs at IOC (or IULC, or ISULC) following field events and live-fire ranges: has your unit ever conducted a debrief that came close to that level of detail, specificity, or tactical competence? If it has, write about it and share how you made it work. From my perspective, learning points from training events in the fleet are not emphasized to the degree they are at the schoolhouse, and that is to our detriment.
2. Why Detailed Evaluation Matters Now
We have a peer threat in the People’s Republic of China. To counter that threat, the Marine Corps will operate in dispersed formations, placing increased responsibility on leaders at all levels. Marines will be more capable, equipment will be more advanced, and emerging concepts will need to be tested and critiqued. Changes are being made with a sense of urgency because it will be the adversary, not us, who determines when the next war begins. Commanders at all levels will lead formations that are trained, equipped, and organized differently than they were just a few years ago. That means, unlike in the past, the experience of leaders in the unit cannot be the only basis for evaluating performance. If we are going to employ new technologies effectively in accordance with new concepts, we need to share information laterally, especially at the company level and below. For that to work, we need leaders with the humility and maturity to accept that they are not experts in conducting missions like EABO or employing weapons like OPF or NMESIS. We need leaders who are willing to try new TTPs and share the results, whether positive or negative. To know whether something is working, we need to compare our performance to that of other units. Evaluation based on observation by unit leaders will remain a key component of our training, but that observation needs to be informed, and knowing how other units are operating is the best way to become informed and broaden your perspective.
3. The Issues with T&R Standards as the Basis for Evaluation
The current T&R standards can be binary in nature, meaning the standard establishes a pass/fail line. For example, 0311-M27-1005 (execute the IAR transition course of fire) has the standard “by achieving a minimum score of 70%.” It is easy to evaluate whether a Marine hit 70% or higher, but without knowing the distribution of scores, that information offers little context. If a Marine’s score on that task is 75% while the average score across the service is 90%, that Marine is performing below average despite meeting the T&R standard. Simple statistics, like distributions and percentiles, provide far more context than a pass/fail line does. On the other end of the spectrum, some standards are too vague to be applied objectively, such as 0341-WPNS-1005 (fire a mortar in handheld mode), whose standard is “to achieve effects on target.” The ordnance requirement allots three TP rounds and three HE rounds per Marine but gives no definition of “effects” for those six rounds. The implication of vague standards is that evaluators must have the experience and competence to apply sound judgment to the training they are observing. Where evaluators lack that experience or competence, the training value of the event suffers. If data is collected, analyzed, and visualized in a manner that is accessible and intuitive, evaluators can be given the proper context by which to judge performance.
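To make that concrete, here is a minimal sketch, using entirely hypothetical scores, of how a percentile puts an individual result in a context that the pass/fail line alone cannot:

```python
# Minimal sketch with hypothetical scores: a percentile gives context
# that the 70% pass/fail line alone cannot.
import numpy as np

service_scores = np.array([92, 88, 95, 70, 85, 90, 97, 78, 93, 89])  # notional service-wide scores (%)
marine_score = 75  # meets the T&R standard of 70%

met_standard = marine_score >= 70
percentile = (service_scores < marine_score).mean() * 100

print(f"Met T&R standard: {met_standard}")
print(f"Service average: {service_scores.mean():.1f}%")
print(f"Marine's score falls near the {percentile:.0f}th percentile of the sample")
```

With these notional numbers, the Marine passes the standard yet sits near the bottom of the distribution, which is exactly the context an evaluator is missing today.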
The examples of T&R tasks above are meant to illustrate why I am proposing a comparison tool; I trust that most units have a good handle on how well their Marines fire their M27s and 60mm mortars. In TE2030, General Berger twice mentions revising infantry T&R standards to ensure that the T&R manual is aligned to the missions we will be expected to accomplish, so it is reasonable to assume a major revision of T&R tasks will come with the next update. Assuming the updated standards will be worded similarly to the current ones, and building on the argument in Section 2, I propose a performance comparison tool, called Expectations of Infantry Marines (EIM), to help evaluate unit performance.
4. Performance Comparison Tool: Expectations of Infantry Marines (EIM)
To show how a tool like EIM could work, I will continue with the example of 0341-WPNS-1005. The data for my thesis came from the Ground Combat Element Integrated Task Force (GCEITF) study conducted in 2015. As part of that study, Marines fired 60mm mortars in handheld mode and impacts were observed via VECTOR-DAGR. MCWP 3-15.2, Tactical Employment of Mortars, states that “if a 60-mm mortar round lands within 35 meters of a target, there is a 50 percent chance it will be suppressed. Beyond 50 meters, little suppression takes place” and “if an 81-mm mortar round lands within 75 meters of a target, there is a 50 percent chance that the target will be suppressed.” For the GCEITF, 23% of rounds landed within 35m of the nearest target, and 60% of rounds impacted within 75m of the nearest target. These two points should be captured in an additional T&R component: evaluator notes. Whether officially documented in the T&R Manual or generated by EIM, evaluator notes can be simple statements or statistics that reinforce the intent of the standard and provide the context needed to evaluate more objectively. For this task, the evaluator notes might read: “60mm suppression is defined in MCWP 3-15.2 as rounds impacting within 35m of the target. It is expected that three of the six rounds will impact within 75m of the target.” These give any evaluator a rough benchmark against which to compare performance and provide immediate feedback. It is important to note that the GCEITF data should not be the authoritative source, but it was well collected and presents a good starting point for demonstrating the idea of quantitatively measuring infantry training.
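As a rough illustration, the sketch below checks a set of hypothetical miss distances against those benchmarks; the distances are invented for the example and are not GCEITF data:

```python
# Minimal sketch with hypothetical miss distances (meters from each impact
# to the nearest target), checked against the benchmarks cited above.
miss_distances_m = [22, 41, 68, 90, 30, 74]

within_35 = sum(d <= 35 for d in miss_distances_m)
within_75 = sum(d <= 75 for d in miss_distances_m)
n = len(miss_distances_m)

print(f"{within_35} of {n} rounds within 35 m (suppression criterion)")
print(f"{within_75} of {n} rounds within 75 m (evaluator-note expectation: 3 of 6)")
```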
I built a prototype visualization tool to show what EIM could look like, and I will use screenshots for illustrative purposes. First, leaders can visit the site and view summary information about the existing data. I am using GCEITF data, but the ideal system would have a data repository that is automatically updated as units upload data, and each MOS-specific page would have tables of summary statistics with accompanying graphs.
After an initial view of the information available by MOS, users can go to the home page and select a T&R task that they will complete in upcoming training. The download template is a CSV file (readable in Excel) whose column headers specify the required information. That is all that is required of a user: download the CSV before training, type in the results, and upload it back into the system. For this example, the system provides several outputs. First, it produces tables describing impact distances and how those impacts compare to the baseline (the GCEITF in this case); note that these tables have a scroll feature. The first table shows all unit-uploaded data and, for each impact, the distance to each target, the distances to the closest and farthest targets, and the average distance to all targets. The second table compares the proportion of rounds impacting within threshold distances to the GCEITF baseline.
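For readers curious about the mechanics, the following is a minimal sketch of how those two tables could be computed. The column names, file structure, and values are assumptions for illustration only, not the actual EIM template:

```python
# Minimal sketch of the two summary tables. The data below are hypothetical;
# in practice the DataFrame would come from the uploaded CSV template
# (e.g., pd.read_csv on the completed file).
import numpy as np
import pandas as pd

# One row per impact-target pair, coordinates in meters (assumed schema).
df = pd.DataFrame({
    "impact_id": [1, 1, 2, 2, 3, 3],
    "impact_x":  [830, 830, 905, 905, 870, 870],
    "impact_y":  [140, 140, -10, -10, 60, 60],
    "target_id": ["T1", "T2", "T1", "T2", "T1", "T2"],
    "target_x":  [850, 900, 850, 900, 850, 900],
    "target_y":  [120, -40, 120, -40, 120, -40],
})
df["dist_m"] = np.hypot(df.impact_x - df.target_x, df.impact_y - df.target_y)

# Table 1: closest, farthest, and average distance to the targets for each impact.
table1 = df.groupby("impact_id")["dist_m"].agg(closest="min", farthest="max", average="mean")

# Table 2: proportion of impacts within threshold distances vs. the GCEITF baseline.
table2 = pd.DataFrame({
    "unit":   [(table1.closest <= 35).mean(), (table1.closest <= 75).mean()],
    "GCEITF": [0.23, 0.60],  # baseline proportions cited above
}, index=["within 35 m", "within 75 m"])

print(table1)
print(table2)
```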
Last, a visualization of the mortar firing points (MFPs, depicted in blue), the target locations (depicted in red), and all impacts (depicted in black) is presented on a map with zoom capability, as shown in the next figure.
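A static version of that map takes only a few lines of plotting code; the coordinates below are hypothetical, and the actual tool adds interactive pan and zoom:

```python
# Minimal sketch of the impact map with hypothetical coordinates in meters.
import matplotlib.pyplot as plt

mfps    = [(0, 0)]                               # mortar firing points (blue)
targets = [(850, 120), (900, -40)]               # target locations (red)
impacts = [(830, 140), (905, -10), (870, 60)]    # observed impacts (black)

fig, ax = plt.subplots()
ax.scatter(*zip(*mfps), c="blue", marker="^", label="MFP")
ax.scatter(*zip(*targets), c="red", marker="x", label="Target")
ax.scatter(*zip(*impacts), c="black", marker=".", label="Impact")
ax.set_xlabel("Easting (m)")
ax.set_ylabel("Northing (m)")
ax.set_aspect("equal")
ax.legend()
plt.show()
```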
As mentioned earlier, I do not expect unit leaders to have trouble evaluating how proficient their 60s section is with handheld employment. However, this task provides a good example of using data to draw insights beyond those provided by the T&R standard, and of how easily additional context can be extracted through data analysis and visualization that is transparent to the user.
5. Relevant Applications
One challenge for a tool like EIM is that most infantry tasks are too situationally specific to allow a truly direct comparison. For almost all marksmanship tasks, however, the comparison is straightforward. When M320s were originally fielded, there was considerable debate about whether the weapon should be rifle-mounted or employed as a standalone system. By asking the Marines employing the weapon which method they prefer, we get the qualitative feedback; by firing the weapon in both configurations and tracking the number of hits, we get the quantitative feedback, as sketched below. Taking that one step further, units could upload this data to EIM and see how their M320 gunners perform against other units. For the Carl Gustaf, leaders can compare their gunners’ performance to ensure that such a capable weapon system is being employed by the right Marine. The same concepts apply to tasks outside of marksmanship, although as extra variables are introduced, the complexity of quantitative comparison increases. I will write more about those applications in a separate article.
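The quantitative half of that feedback reduces to comparing hit rates; the counts below are hypothetical, and the same structure works for unit-versus-unit comparison:

```python
# Minimal sketch with hypothetical hit counts: M320 hit rates by configuration.
results = {
    "rifle-mounted": {"hits": 14, "rounds": 20},
    "standalone":    {"hits": 17, "rounds": 20},
}
for config, r in results.items():
    print(f"{config}: {r['hits']}/{r['rounds']} = {r['hits'] / r['rounds']:.0%} hits")
```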
6. Shortfalls of this Methodology
The biggest hurdle for this idea is the actual implementation of such a tool. The design I created and illustrated in the previous section is fragile and not nearly organized or secure enough for service-wide use. However, the principle of finding ways to compare performance across the force remains important. Next, even if a stable implementation of a performance comparison tool existed, it would only be as good as the data uploaded into it. Data collection is surprisingly difficult, and if it is not prioritized, the results will be effectively worthless. For any honest comparison of performance, there cannot be a mechanism for commanders to be evaluated on it, MCTIMS-fashion. MCTIMS serves the purpose of tracking training completion for METL accomplishments; however, the data uploaded to it is too simplified to support a true comparison, and because reporting is mandatory, data pedigree issues arise. For the comparison tool to be useful, it must be seen solely as a tool: an option for unit leaders to widen their perspective.
7. Closing Thoughts
I strongly believe that lateral communication at all echelons is how we will adapt most quickly, and it is important that there be an appropriate forum to do so; the Connecting File is a perfect example of a useful forum for sharing ideas and lessons learned. Another creative information-sharing idea, raised in conversation by 1stLt Jon Manuel, envisions expanding the training resource tab in MCTIMS so that entries can be filtered by installation and training type. For example, if 2/6 conducted platoon attacks at G29 on CLNC and it went well, the CONOPs and a short write-up could be uploaded under “Camp Lejeune, Live Fire, G29” so that adjacent units planning to run G29 could leverage lessons already learned. Creating training packages from scratch is inefficient; a starting point of good ideas from previous unit experiences begins the planning process with a 70% solution. Giving units the option to upload a write-up after a training event to share with adjacent units would be far more useful than digging through the battalion SharePoint or Teams in the hope of finding an old CONOPs to serve as a starting point. That type of forum would also be a great place to upload unit SOPs and TTPs for newly fielded technologies and weapon systems so that each unit is not learning the same lessons the hard way. The force we know is rapidly changing, as it must, to be prepared for conflict with a peer adversary. We must be creative in finding new ways to prepare our units for combat as efficiently as possible. To do so, we need leaders who are willing to share their ideas and look to adjacent forces for inspiration, and we need the infrastructure in place to share the results. If we can do that effectively, we will accomplish our core mission of fighting and winning in combat.
Captain Kevin Benedict is currently serving as an Operations Research Analyst at Manpower & Reserve Affairs. He can be reached at kevin.benedict@usmc.mil.