📖 Readings
[1] Metacognitive Difficulties Faced by Novice Programmers in Automated Assessment Tools by Prather, Pettit, McMurry, Peters, Homer, & Cohen
[2] First Things First: Providing Metacognitive Scaffolding for Interpreting Problem Prompts by Prather, Pettit, Becker, Denny, Loksa, Peters, Albrecht, & Masci
[3] Executable Examples for Programming Problem Comprehension by Wrenn & Krishnamurthi
Metacognitive awareness
We’ve talked a lot this quarter about various aspects of teaching and learning programming: syntax, behaviour, cognition and memory, managing motivation and self-efficacy, and sense of belonging.
Researchers and educators are increasingly focusing on an additional aspect: metacognitive awareness, particularly as it relates to problem-solving in programming.
Having metacognitive awareness means knowing where one is in the problem-solving process, being aware of strategies that have been successful or unsuccessful in the past, and monitoring one’s self-efficacy and emotions. This is surprisingly difficult to do!
Novice programmers often lack this ability to think about and reflect on their problem-solving process, and lack the vocabulary to articulate it to a peer or instructor. This can cause all sorts of trouble: without knowledge of a systematic problem-solving process (and of one’s own progress through that process), a novice can flounder while solving a programming problem without knowing where their difficulties lie, and can even struggle to seek help in overcoming them.
To overcome this, instructors often explicitly teach problem-solving strategies. For example, consider the Design Recipe from How to Design Programs:
 Identify the types of data needed for a function to solve a problem
 Write down the function signature and statement of purpose
 Provide some functional examples to illustrate the function’s purpose
 Sketch out the function with a template
 Implement the function
 Test the function
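As a rough illustration, the six steps might look like this for a small Python function. How to Design Programs itself uses Racket-based teaching languages; the `Reading` type, the `average_temperature` function, and the sample values below are hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass


# Step 1: identify the data. A reading pairs a day label with a temperature.
# (Reading is a hypothetical type invented for this sketch.)
@dataclass
class Reading:
    day: str
    celsius: float


# Step 2: signature and purpose statement.
def average_temperature(readings: list[Reading]) -> float:
    """Return the mean temperature of a non-empty list of readings."""
    # Step 4: template. The input is a list, so traverse it and combine fields.
    # Step 5: implementation, filling in the template.
    total = sum(r.celsius for r in readings)
    return total / len(readings)


# Step 3: functional examples, written down before the body is filled in.
EXAMPLES = [
    ([Reading("Mon", 10.0)], 10.0),
    ([Reading("Mon", 10.0), Reading("Tue", 20.0)], 15.0),
]


# Step 6: the examples double as tests once an implementation exists.
def run_examples() -> bool:
    return all(average_temperature(inp) == want for inp, want in EXAMPLES)
```

In practice the examples in step 3 are written before the function body, even though Python requires the definition to appear before the examples can be run.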
Similarly, Loksa et al.^{1} propose the following six-step problem-solving process:
 Reinterpret the problem prompt
 Search for analogous problems
 Search for solutions
 Evaluate a potential solution
 Implement a solution
 Evaluate implemented solution
When these processes are taught, students are often asked to identify where they are in the process before receiving help from instructors. Additionally, they may only be given help up to and including their current step. For example, a student may not receive help with implementing a function (step 5 in both processes above) if they have skipped the steps to understand and concretise the function’s purpose (steps 1–4).
The highest-performing novices tend to display higher metacognitive awareness about their problem-solving and learning strategies.
Feedback in CS courses
CS students often receive feedback about their program submissions from automated assessment tools (AATs). Feedback can focus on
 Syntax (e.g., compiler error messages, which can often be inscrutable, frustrating, and demotivating to novices)
 Behaviour (e.g., instructor-written reference tests)
 Programming standards (using linting tools or manual checking)
 Thoroughness and validity of student-written tests (using test adequacy criteria like condition coverage or, rarely, mutation analysis)
In this way, most AATs only attend to steps 5 and 6 of the Design Recipe (and the equivalent steps in Loksa’s process). There is little focus on steps 1–4, i.e., limited feedback to help promote metacognitive awareness, or to nudge students to reflect on their problem-solving process.
Worse, other factors can come into play that may encourage bad habits. For example, many AATs support unlimited submissions so students can receive rapid feedback. While frequent feedback can be a good thing, this can encourage a trial-and-error approach to programming, where students come to rely on the AAT as the oracle to test their programs.
You’ve likely had plenty of experience with autograders. In what ways have they been helpful? Do you see opportunities for improvement?
How can automated assessment tools support metacognitive awareness?
In paper [1], Prather et al. study how AATs could be built to better promote metacognitive awareness in novices.
The researchers conducted a number of one-on-one meetings with students and asked them to work on a “moderately challenging” programming problem while thinking out loud, i.e., verbalising their thought process as they worked on the problem. They were given 35 minutes to write a function that, given n numbers, would determine whether the input contained more positive or more negative integers. The researcher took extensive notes as the student solved the programming problem.
Students who completed the problem
These students almost universally started with step 1 in Loksa’s process: reinterpret the problem prompt. They formed and verbalised a conceptual model of what the problem was asking for before starting to design a solution. These students also more or less followed the steps in the problem-solving process.
Some students took a long time to complete the problem. They started by interpreting the problem prompt, but got tripped up at some later stage of the problem-solving process.
Two students tried to solve the problem of counting the number of even and odd integers instead of counting the number of positive and negative integers. (It should be noted that the even/odd problem had been asked before and was familiar to them.) These students didn’t immediately realise that they were solving the wrong problem, and grew increasingly frustrated as they failed to pass the test cases.
Eventually, they would reread the prompt, realise their mistake, and very quickly solve the problem. The lack of metacognitive awareness almost kept otherwise capable students from succeeding, and the AAT’s feedback did not alert them to the issue at all.
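To see how a solution to the wrong problem can still look like progress, here is a minimal sketch in Python. The exact problem specification and the AAT’s test cases are not given in these notes, so the function names, the return convention (`True` when positives outnumber negatives), and the test inputs below are all assumptions made up for illustration.

```python
def more_positive_than_negative(nums: list[int]) -> bool:
    """Intended problem (assumed spec): True if positives outnumber negatives."""
    positives = sum(1 for n in nums if n > 0)
    negatives = sum(1 for n in nums if n < 0)
    return positives > negatives


def more_even_than_odd(nums: list[int]) -> bool:
    """The familiar but wrong problem: True if evens outnumber odds."""
    evens = sum(1 for n in nums if n % 2 == 0)
    odds = len(nums) - evens
    return evens > odds


# Hypothetical test inputs, not taken from the study.
TESTS = [
    [2, 4, -1],   # mostly positive and mostly even
    [-3, -5, 2],  # mostly negative and mostly odd
    [1, 3, 5],    # mostly positive but mostly odd
    [-2, -4, 7],  # mostly negative but mostly even
]


def passing_tests(candidate) -> int:
    """Count tests where the candidate agrees with the intended solution."""
    return sum(candidate(t) == more_positive_than_negative(t) for t in TESTS)
```

Under these assumptions, `passing_tests(more_even_than_odd)` returns 2 out of 4: the wrong solution passes half of the tests, and nothing in a pass/fail report says that the student has misread the prompt.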
Students who did not complete the problem
These students failed to successfully move through the problem-solving steps. The most frequent issue for these students was that they failed to build a conceptual model of the problem. They immediately moved to identifying analogous problems and searching for solutions, and the AAT was not able to alert them that they had a wrong conceptual model.
Types of metacognitive difficulties faced when working with an AAT
To sum up, the authors noticed the following metacognitive difficulties the students faced when dealing with feedback from an AAT:
 Forming a wrong conceptual model about the right problem
 Dislodging an incorrect conceptual model of the problem may not be solved by rereading the prompt
 Making assumptions; forming the correct conceptual model for the wrong problem
 Moving too quickly through the stages leads to a false sense of accomplishment
 Unwillingness to abandon a wrong solution due to a false sense of nearly being done^{2}
Metacognitive scaffolding for problem comprehension
Importantly, 3 of the difficulties above are related to problem comprehension, i.e., understanding what is being asked for. That is, they are related to
 Step 1 in Loksa’s problem-solving process (reinterpret the problem prompt), and
 Step 3 in the Design Recipe (design functional examples to illustrate the problem’s purpose).
Therefore, these same authors introduced a simple intervention to target problem comprehension [2]. The intervention is as follows:
 Students were given the problem prompt
 They were then given a test case to solve immediately after reading the problem prompt
 Then they were asked to solve the problem
That simple intervention (asking the student to demonstrate that they’ve understood the problem prompt by predicting a working program’s output for some given input) had significant impacts on students’ abilities to complete the problem, compared to students who did not receive the intervention.
Compared to a control group, the students who received the intervention:
 were more likely to complete the problem (76.19% vs. 52.94%)
 completed the problem slightly faster (22.62 minutes vs. 23.92 minutes on average)
 required fewer submissions to do so (4.48 submissions vs. 7.59 submissions on average)
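A tool could implement this kind of comprehension check roughly as follows. This is a sketch only: the tooling used in the study is not described in these notes, and `reference_solution`, `comprehension_check`, and the problem spec they encode are hypothetical.

```python
from typing import Callable


def reference_solution(nums: list[int]) -> bool:
    """Hypothetical instructor solution: True if positives outnumber negatives."""
    return sum(n > 0 for n in nums) > sum(n < 0 for n in nums)


def comprehension_check(
    reference: Callable[[list[int]], bool],
    test_input: list[int],
    student_prediction: bool,
) -> bool:
    """Return True if the student's predicted output matches the reference.

    The student sees only the prompt and test_input, and predicts the output
    before starting to write any code.
    """
    return reference(test_input) == student_prediction
```

A student who predicts `True` for the input `[3, -1, 2]` passes the check, while a student who has misread the prompt gets a mismatch before writing a single line of code.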
What do you suppose are the reasons for these results?
“Automated” scaffolding for problem comprehension
In separate but related work, Wrenn & Krishnamurthi [3] introduced a feature in their pedagogic IDE, giving students tooling to attend to the Examples step of the Design Recipe. The feature works as follows.
Programming assignments are distributed inside the pedagogic IDE code.pyret.org. The assignments come with
 Wheats: correct solutions implemented by the instructors; wheats would vary in terms of how they implemented underspecified requirements
 Chaffs: buggy solutions implemented by the instructors; the buggy solutions are curated so that they represent bugs that could plausibly appear in student solutions
For example, if a problem asked students to compute the median of a given list of integers,
 Wheats would include implementations that treat even-sized lists of numbers differently. That is, in a sorted list of 4 items, some median calculations would take the second item as the median, while others would take the average of the second and third items as the median.
 Chaffs would include implementations that compute the mean instead of the median, or implementations that do not correctly handle empty lists.
Before working on implementing solutions to problems, students were asked to create executable examples for the problem. The examples are simple input+output pairs: for some given input, what is the expected output from a correctly implemented solution? The examples were operationalised using unit-test-like assertions.
These examples are assessed as follows:
 Validity was assessed based on whether or not the examples were accepted by the wheats. If a wheat “fails” an example, it means the example is incorrect. This means the student has formed an incorrect understanding of the problem.^{3}
 Thoroughness was assessed based on the number of chaffs that were detected by the student’s set of examples. If a chaff does not “fail” against at least one example, it means the examples are incomplete in terms of the cases they cover. This could mean the student has an incomplete understanding of the problem.
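The two checks can be sketched in Python for the median problem as follows. The actual feature is built into the Pyret IDE and its implementation is not shown in these notes, so every name below is hypothetical. `chaff_no_sort` is an extra buggy variant invented for this sketch, standing in for the empty-list chaff mentioned earlier.

```python
from typing import Callable

Median = Callable[[list[int]], float]
Example = tuple[list[int], float]  # (input, expected output)


def wheat_avg(xs: list[int]) -> float:
    """Correct: for even-sized lists, average the two middle items."""
    s, mid = sorted(xs), len(xs) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


def wheat_lower(xs: list[int]) -> float:
    """Also correct: for even-sized lists, take the lower middle item."""
    s = sorted(xs)
    return s[(len(s) - 1) // 2]


def chaff_mean(xs: list[int]) -> float:
    """Buggy: computes the mean instead of the median."""
    return sum(xs) / len(xs)


def chaff_no_sort(xs: list[int]) -> float:
    """Buggy (hypothetical chaff): forgets to sort before picking the middle."""
    return xs[len(xs) // 2]


WHEATS: list[Median] = [wheat_avg, wheat_lower]
CHAFFS: list[Median] = [chaff_mean, chaff_no_sort]


def accepts(impl: Median, examples: list[Example]) -> bool:
    """An implementation accepts a set of examples if it satisfies all of them."""
    return all(impl(inp) == want for inp, want in examples)


def is_valid(examples: list[Example]) -> bool:
    """Valid: no wheat fails any example."""
    return all(accepts(w, examples) for w in WHEATS)


def thoroughness(examples: list[Example]) -> int:
    """Number of chaffs that fail at least one example."""
    return sum(not accepts(c, examples) for c in CHAFFS)
```

With these definitions, the single example `([9, 1, 2], 2)` is valid and catches both chaffs, whereas `([1, 2, 9], 2)` is valid but catches only `chaff_mean`. The example `([1, 2, 3, 4], 2.5)` is invalid: `wheat_lower` returns 2 for that input, so the example asserts behaviour that the problem leaves unspecified.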
It’s important to note that these examples are not test cases, even though they look like test cases. The examples are not being run against the student’s own solution (which may or may not exist yet); they are being run against the problem, which is embodied as a series of correct and incorrect implementations created by the instructors.
Results were promising:
 Students used executable examples even when it was not required; it was a helpful metacognitive scaffold.
 The examples students created turned into tests once they had solutions implemented; a win for step 6 of the Design Recipe!
 Test suites were more valid than before: they were less likely to overzealously reject correct implementations based on underspecified behaviour.
 The authors also suggest that the existence of this feature helped reduce load on course staff: if students were not sure of underspecified requirements, they could just “ask the IDE” by probing the problem with examples.
Discussion
In groups, discuss amongst yourselves ways in which this focus on metacognitive awareness can factor into your teaching. Consider the different teaching contexts you’ve been in this quarter while you discuss (tutoring, workshops, lab activities).

^{1} Programming, Problem Solving, and Self-Awareness: Effects of Explicit Guidance. Loksa, Ko, Jernigan, Oleson, Mendez, & Burnett.

^{2} I thought this was particularly insidious. Consider the students who were mistakenly solving the Even/Odd problem instead of Positive/Negative. The list
[4, 3, 5, 7, 1, 2]
has more negative numbers than positive numbers and more odd numbers than even numbers. If the function is simply returning false for “more negative” (or “more odd” for the mistaken students), there will be some overlap in the passing test cases for Even/Odd and Positive/Negative. So these students actually passed some test cases, even though they were solving the wrong problem! And those passing test cases led them to believe that they were on track to correctly solve the problem.
^{3} It can also mean the instructor messed up their solution and the student has found a bug, but this is less likely.