Papers and posters I have authored or co-authored. Click on items to see their abstracts. Many of the articles are accompanied by abridged summaries.
Community Action Computing: A Data-centric CS0 Course (to appear)
Abstract: A student's sense of belonging in computing can be positively impacted when coursework can authentically be connected to real community contexts. We describe the design, materials, and preliminary evaluation of an introductory programming (CS0) course infused with a focus on societal responsibility and relevance. We take a data-centric, constructionist approach to introductory computing. Data-centricity allows us to authentically connect coursework with students' communal and societal interests, and students' motivation was enhanced given that they were creating and sharing artifacts as part of their coursework. Students used TypeScript to manipulate and analyze real data-sets, and created shareable websites containing statistics, data visualizations, and reflections based on the data-set of their choosing. Students chose varied topics for their assignments---they worked with data about access to CS education, climate change, and data provided by local non-profit organizations. A preliminary evaluation indicated that students who took this CS0 course attained CS-specific learning objectives equally well in the two subsequent follow-on courses as students who took alternative CS0 courses at our University. We close with instructor perspectives and reflections on lessons learned.
Background and Context. Students' programming projects are often assessed on the basis of their tests as well as their implementations, most commonly using test adequacy criteria like branch coverage, or, in some cases, mutation analysis. As a result, students are implicitly encouraged to use these tools during their development process (i.e., so they have awareness of the strength of their own test suites).
Objectives. Little is known about how students choose test cases for their software while being guided by these feedback mechanisms. We aim to explore the interaction between students and commonly used testing feedback mechanisms (in this case, branch coverage and mutation-based feedback).
Method. We use grounded theory to explore this interaction. We conducted 12 think-aloud interviews with students as they were asked to complete a series of software testing tasks, each of which involved a different feedback mechanism. Interviews were recorded and transcripts were analyzed, and we present the overarching themes that emerged from our analysis.
Findings. Our findings are organized into a process model describing how students completed software testing tasks while being guided by a test adequacy criterion. Program comprehension strategies were commonly employed to reason about feedback and devise test cases. Mutation-based feedback tended to be cognitively overwhelming for students, and they resorted to weaker heuristics in order to address this feedback.
Implications. In the presence of testing feedback, students did not appear to consider problem coverage as a testing goal so much as program coverage. While test adequacy criteria can be useful for assessment of software tests, we must consider whether they represent good goals for testing, and if our current methods of practice and assessment are encouraging poor testing habits.
Exploring the Impact of Cognitive Awareness Scaffolding for Debugging in an Introductory Computer Science Class
Abstract: Debugging involves the simultaneous application of a number of programming skills—reading code, writing code, problem comprehension, etc. This makes it a challenging activity for novice programmers. Unfortunately, debugging is rarely taught explicitly in introductory programming courses, and is often learned as an implicit goal through programming assignments. In this experience report we explore the impact of a cognitive awareness scaffold to help students monitor their progress as they debug their code. We created a simple form that students used to document their debugging process when they ran into bugs. The form asks questions that students are likely to be asked by course staff during office hours, e.g., ``What have you tried so far?''. This act of verbalizing errors and enumerating successful and unsuccessful strategies to fix them is meant to help students monitor their own debugging progress. We examined the cognitive awareness demonstrated in form responses, finding that responses were more superficial on projects of higher difficulty. Additionally, we gave students an exit survey to measure the perceived impact of the debugging form on students' ability to regulate their debugging process and their confidence while debugging. Students indicated that the form helped them better verbalize errors in their programs, and helped them surmount problems with which they would otherwise have needed help.
Abstract: Knowing when and how to seek academic help is crucial to the success of undergraduate computing students. While individual help-seeking resources have been studied, little is understood about the factors influencing students to use or avoid certain resources. Understanding students' patterns of help-seeking can help identify factors contributing to utilization or avoidance of help resources by different groups, an important step toward improving the quality of resources. We present a mixed-methods study investigating the help-seeking behavior of undergraduate computing students. We collected survey data (n=138) about students' frequency of using several resources followed by one-on-one student interviews (n=15) to better understand why they use those resources. Several notable patterns were found. Women sought help in office hours more frequently then men did and computing majors sought help from their peers more often than non-computing majors. Additionally, interview data revealed a common progression in which students started from easily accessible but low utility resources (online sources and peers) before moving on to less easily accessible, high utility resources (like instructor office hours).
The Impact of Programming Project Milestones on Procrastination, Project Outcomes, and Course Outcomes
Abstract: When faced with a large and complex project for the first time, students face numerous self-regulatory challenges that they may be ill-equipped to overcome. These challenges can result in degraded project outcomes, as commonly observed in programming-intensive mid-level CS courses. We have previously found that success in these situations is associated with a disciplined personal software process. Procrastination is a prominent failure of self-regulation that can occur for a number of reasons, e.g., low expectancy of success, low perceived value of the task at hand, or decision-paralysis regarding how to begin when faced with a large task. It is pervasive, but may be addressed through targeted interventions. We draw on theory related to goal theory and problem-solving in engineering education to evaluate the value of explicit project milestones at curbing procrastination and its negative impacts on relatively long-running software projects. We conduct a quasi-experiment in which we study differences in project and course outcomes between students in a treatment (with milestones) and control group (without milestones). We found that students in the treatment group were more likely to finish their projects on time, produced projects with higher correctness, and finished the course with generally better outcomes. Within the treatment group, we found that students who completed more milestones saw better outcomes than those who completed fewer milestones. We found no differences in withdrawal or failure rates between the treatment and control groups. An end-of-term survey indicated that student perceptions of the milestones were overwhelmingly positive.
Fast and Accurate Incremental Feedback for Students' Software Tests Using Selective Mutation Analysis
Abstract: As incorporating software testing into programming assignments becomes routine, educators have begun to assess not only the correctness of students' software, but also the adequacy of their tests. In practice, educators rely on code coverage measures, though its shortcomings are widely known. Mutation analysis is a stronger measure of test adequacy, but it is too costly to be applied beyond the small programs developed in introductory programming courses. We demonstrate how to adapt mutation analysis to provide rapid automated feedback on software tests for complex projects in large programming courses. We study a dataset of 1389 student software projects ranging from trivial to complex. We begin by showing that although the state-of-the-art in mutation analysis is practical for providing rapid feedback on projects in introductory courses, it is prohibitively expensive for the more complex projects in subsequent courses. To reduce this cost, we use a statistical procedure to select a subset of mutation operators that maintains accuracy while minimizing cost. We show that with only 2 operators, costs can be reduced by a factor of 2–3 with negligible loss in accuracy. Finally, we evaluate our approach on open-source software and report that our findings may generalize beyond our educational context.
Abstract: We seek to understand bug investigation practices among intermediate student developers. To this end, we used a mixed-methods approach to study the testing and debugging practices of students in a junior-level, post-CS2 Data Structures course with relatively large projects (4-week lifecycle). First, we interviewed 12 students of varying project performances. From the interviews, we identified five techniques that students use for both testing and debugging: 1) writing diagnostic print statements, 2) unit testing, 3) using a source-level debugger, 4) submission to an online auto-grader, and 5) manual tracing. Using the Grounded Theory approach, we developed four hypotheses regarding students' use of multiple techniques and their possible impact on performance. We used clickstream data from the IDE (Eclipse) to analyze the level of use of the first four of these bug investigating techniques. We found that over 92%, 87%, and 73% of the students used JUnit testing, diagnostic print statements, and the source-level debugger, respectively. Most of the students (91%) used more than one technique to investigate bugs in their projects. We found a positive correlation between using multiple bug investigation techniques and a higher score in the projects. Finally, we identified some ineffective practices correlated with lower project scores.
Abstract: In this paper, we introduce ProgSnap2, a standardized format for logging programming process data. ProgSnap2 is a tool for computing education researchers, with the goal of enabling collaboration by helping them to collect and share data, analysis code, and data-driven tools to support students. We give an overview of the format, including how events, event attributes, metadata, code snapshots and external resources are represented. We also present a case study to evaluate how ProgSnap2 can facilitate collaborative research. We investigated three metrics designed to quantify students' difficulty with compiler errors - the Error Quotient, Repeated Error Density and Watwin score - and compared their distributions and ability to predict students' performance. We analyzed five different ProgSnap2 datasets, spanning a variety of contexts and programming languages. We found that each error metric is mildly predictive of students' performance. We reflect on how the common data format allowed us to more easily investigate our research questions.
Abstract: Graduating CS students face well-documented difficulties upon entering the workforce, with reports of a gap between what they learn and what is expected of them in industry. Project management, software testing, and debugging have been repeatedly listed as common "knowledge deficiencies" among newly hired CS graduates. Similar difficulties manifest themselves on a smaller scale in upper-level CS courses, like the Data Structures and Algorithms course at Virginia Tech: students are required to develop large and complex projects over a three to four week lifecycle, and it is common to see close to a quarter of the students drop or fail the course, largely due to the difficult and time-consuming nature of the projects. My research is driven by the hypothesis that regular feedback about the software development process, delivered during development, will help ameliorate these difficulties. Assessment of software currently tends to focus on qualities like correctness, code coverage from test suites, and code style. Little attention or tooling has been developed for the assessment of the software development process. I use empirical software engineering methods like IDE-log analysis, software repository mining, and semi-structured interviews with students to identify effective and ineffective software practices to formulate. Using the results of these analyses, I have worked on assessing students' development in terms of time management, test writing, test quality, and other "self-checking" behaviours like running the program locally or submitting to an oracle of instructor-written test cases. The goal is to use this information to formulate formative feedback about the software development process. In addition to educators, this research is relevant to software engineering researchers and practitioners, since the results from these experiments are based on the work of upper-level students who grapple with issues of design and work-flow that are not far removed from those faced by professionals in industry.
Abstract: The regular expression (regex) practices of software engineers affect the maintainability, correctness, and security of their software applications. Empirical research has described characteristics like the distribution of regex feature usage, the structural complexity of regexes, and worst-case regex match behaviors. But researchers have not critically examined the methodology they follow to extract regexes, and findings to date are typically generalized from regexes written in only 1-2 programming languages. This is an incomplete foundation. Generalizing existing research depends on validating two hypotheses: (1) Various regex extraction methodologies yield similar results, and (2) Regex characteristics are similar across programming languages. To test these hypotheses, we defined eight regex metrics to capture the dimensions of regex representation, string language diversity, and worst-case match complexity. We report that the two competing regex extraction methodologies yield comparable corpuses, suggesting that simpler regex extraction techniques will still yield sound corpuses. But in comparing regexes across programming languages, we found significant differences in some characteristics by programming language. Our findings have bearing on future empirical methodology, as the programming language should be considered, and generalizability will not be assured. Our measurements on a corpus of 537,806 regexes can guide data-driven designs of a new generation of regex tools and regex engines.
Abstract: Assessment of software tends to focus on postmortem evaluation of metrics like correctness, mergeability, and code coverage. This is evidenced in the current practices of continuous integration and deployment that focus on software's ability to pass unit tests before it can be merged into a deployment pipeline. However, little attention or tooling is given to the assessment of the software development process itself. Good process becomes both more challenging and more critical as software complexity increases. Real-time evaluation and feedback about a student's software development skills, such as incremental development, testing, and time management, could greatly increase productivity and improve the ability to write tested and correct code. In my research, I develop models to quantify a student's programming process in terms of these metrics. By measuring the programming process, I can empirically evaluate its adherence to known best practices in software engineering. With the ability to characterize this, I can build tools to provide them with intelligent and timely feedback when they are in danger of straying from those practices. In the long term, I hope to contribute to the standardization and adoption of continuous software assessment techniques that include not only the final product, but also the process undertaken to produce it.
Abstract: Learning to program can be challenging. Many instructors use drill-and-practice strategies to help students develop basic programming techniques and improve their confidence. Online systems that provide short programming exercises with immediate, automated feedback are seeing more frequent use in this regard. However, the relationship between practicing with short programming exercises and performance on larger programming assignments or exams are unclear. This paper describes an evaluation of short programming questions in the context of a CS1 course where they were used on both homework assignments, for practice and learning, and on exams, for assessing individual performance. The open-source drill-and-practice system used here provides for full feedback during practice exercises. During exams, it allows limiting feedback to compiler errors and to a very small number of example inputs shown in the question, instead of the more complete feedback received during practice. Using data collected from 200 students in a CS1 course, we examine the relationship between voluntary practice on short exercises and subsequent performance on exams, while using an early exam as a control for individual differences including ability level. Results indicate that, after controlling for ability, voluntary practice does contribute to improved performance on exams, but that motivation to improve may also be important.
Abstract: Debugging is an important part of the software development process, studied by both the CS education and software engineering communities. Most prior work has focused either on novice or professional programmers. Intermediate-to-advanced students (such as those enrolled in post-CS2 Data Structures courses) who are working on large and complex projects have largely been ignored. We present results from an empirical observational study that examined junior-level undergraduate students' debugging practices on relatively large (4-week lifecycle) projects, using IDE clickstream data collected by a custom Eclipse plugin. Specifically, we hypothesize that there are differing debugging behaviors exhibited, and that differing behaviors lead to differing project out-comes. For example, how often do students use the symbolic debugger available in modern IDEs, versus how often do they use diagnostic print statements, or both? What triggers a debugging session? What follows a debugging session? Does it matter when in the project lifecycle that debugging takes place? We have a number of interesting preliminary results. When using the debugger, there was a negative relationship between step-over and step-into actions versus final course grades, indicating that when students "spin their wheels" while debugging, they tend to perform more poorly. Students also tend to perform better on the project when debugging takes place earlier in the overall project life-cycle. We developed an algorithm to identify diagnostic print statements in the students' projects. We found that over 90% used at least one diagnostic print statement, and about 75% used the symbolic debugger, at least once in any given project.
Abstract: Software testing is an important aspect of the development process, one that has proven to be a challenge to formally introduce into the typical undergraduate CS curriculum. Unfortunately, existing assessment of testing in student software projects tends to focus on evaluation of metrics like code coverage over the finished software product, thus eliminating the possibility of giving students early feedback as they work on the project. Furthermore, assessing and teaching the process of writing and executing software tests is also important, as shown by the multiple variants proposed and disseminated by the software engineering community, e.g., test-driven development (TDD) or incremental test-last (ITL). We present a family of novel metrics for assessment of testing practices for increments of software development work, thus allowing early feedback before the software project is finished. Our metrics measure the balance and sequence of effort spent writing software tests in a work increment. We performed an empirical study using our metrics to evaluate the test-writing practices of 157 advanced undergraduate students, and their relationships with project outcomes over multiple projects for a whole semester. We found that projects where more testing effort was spent per work session tended to be more semantically correct and have higher code coverage. The percentage of method-specific testing effort spent before production code did not contribute to semantic correctness, and had a negative relationship with code coverage. These novel metrics will enable educators to give students early, incremental feedback about their testing practices as they work on their software projects.
Abstract: Assessment of software tends to focus on postmortem evaluation of metrics like correctness, mergeability, and code coverage. This is evidenced in the current practices of continuous integration and deployment that focus on software's ability to pass unit tests before it can be merged into a deployment pipeline. However, little attention or tooling is given to the assessment of the software development process itself. Good process becomes both more challenging and more critical as software complexity increases. Real-time evaluation and feedback about a software developer's skills, such as incremental development, testing, and time management, could greatly increase productivity and improve the ability to write tested andcorrect code. My work focuses on the collection and analysis of fine-grained programming process data to help quantitatively model the programming process in terms of these metrics. I report on my research problem, presenting past work involving the collection and analysis of IDE event data from junior level students working on large and complex projects. The goal is to quantify the programming process in terms of incremental development and procrastination. I also present a long-term vision for my research and present work planned in the short term as a step toward that vision.
Abstract: We present quantitative analyses performed on character-level program edit and execution data, collected in a junior-level data structures and algorithms course. The goal of this research is to determine whether proposed measures of student behaviors such as incremental development and procrastination during their program development process are significantly related to the correctness of final solutions, the time when work is completed, or the total time spent working on a solution. A dataset of 6.3 million fine-grained events collected from each student's local Eclipse environment is analyzed, including the edits made and events such as running the program or executing software tests. We examine four primary metrics proposed as part of previous work, and also examine variants and refinements that may be more effective. We quantify behaviors such as working early and often, frequency of program and test executions, and incremental writing of software tests. Projects where the author had an earlier mean time of edits were more likely to submit their projects earlier and to earn higher scores for correctness. Similarly earlier median time of edits to software tests was also associated with higher correctness scores. No significant relationships were found with incremental test writing or incremental checking of work using either interactive program launches or running of software tests, contrary to expectations. A preliminary prediction model with 69% accuracy suggests that the underlying metrics may support early prediction of student success on projects. Such metrics also can be used to give targeted feedback to help students improve their development practices.
Abstract: Good project management practices are hard to teach, and hard for novices to learn. Procrastination and bad project management practice occur frequently, and may interfere with successfully completing major programming projects in mid-level programming courses. Students often see these as abstract concepts that do not need to be actively applied in practice. Changing student behavior requires changing how this material is taught, and more importantly, changing how learning and practice are assessed. To provide proper assessment, we need to collect detailed data about how each student conducts their project development as they work on solutions. We present DevEventTracker, a system that continuously collects data from the Eclipse IDE as students program, giving us in-depth insight into students' programming habits. We report on data collected using DevEventTracker over the course of four programming projects involving 370 students in five sections of a Data Structures and Algorithms course over two semesters. These data support a new measure for how well students apply "incremental development" practices. We present a detailed description of the system, our methodology, and an initial evaluation of our ability to accurately assess incremental development on the part of the students. The goal is to help students improve their programming habits, with an emphasis on incremental development and time management.