Papers and posters I have authored or co-authored. Click on items to see their abstracts. Many of the articles are accompanied by abridged summaries.

Recommendations for Improving End-User Programming Education: A Case Study with Undergraduate Chemistry Students (to appear)
W. Fuchs, A. McDonald, A. Gautam, A. M. Kazerouni
ACS Journal of Chemical Education. (J Chem Ed). 2024.

Abstract: Programming is widespread in multiple domains and is being integrated into various discipline-specific university courses where, like students in a typical CS 1 course, students from other disciplines face challenges with learning to program. We offer a case study in which we study undergraduate students majoring in either Chemistry or Biochemistry as they learn programming in a Physical Chemistry course sequence. Using surveys and think-aloud sessions with students, we conducted a thematic content analysis to explain the challenges they face in this endeavor. We found that students struggled to transfer their programming knowledge to new representations and problems, and they did not have strategies in place for solving problems with programming. These facts combine to lower students' confidence in their programming ability, making it less likely that they will reach for computing to help solve domain-specific problems. We recommend that students in end-user programming contexts be explicitly taught the skills of abstraction, decomposition, and meta-cognitive awareness as they pertain to programming.

Socially Responsible Computing: Promoting Latinx Student Retention Via Community Engagement in Early Computer Science Courses (to appear)
D. M. Krum, Z. Wood, E. Kang, A. M. Kazerouni, J. L. Lehr, S. Hug, P. S. B. Inventado, F. Tang, I. Yoon, A. Kulkarni, Y. Sun, M. Beheshti, A. Gautam, A. Hubbard Cheuoua, S. Hooshmand, K. A. Wortman
Proceedings of the 2024 ASEE Annual Conference and Exposition. (ASEE). 2024.

Abstract: With the support of the NSF Broadening Participation in Computing program, the Socially Responsible Computing (SRC) alliance is committed to transforming early computing experiences to motivate and engage historically marginalized students to pursue computing. This alliance of six public universities collectively serves over two thousand computing students who identify as Hispanic/Latino (Latinx). Unfortunately, Latinx students face a higher attrition rate across these campuses compared to non-Latinx students (34.6% versus 21.5%), especially during the first two years of their computing journey. The primary goal of our alliance is to change this trend and broaden participation in computing. Specifically, we aim to create and deploy curriculum in early Computer Science courses that demonstrates the value of computing to help society, which will provide students with the opportunity to see the alignment of their communal goals with computing and opportunities to bring their own cultural assets into the computing classroom. This includes students’ community-based knowledge or skills, general communication skills, and their skills related to teamwork and community engagement. We believe this framework will foster students’ sense of belonging, motivation, and engagement in computing. To achieve our goal of improving retention of Latinx students, the alliance has set four specific objectives.

O1: Designing and bringing curricular and pedagogical changes in the two earliest computing courses that integrate considerations of social responsibility into computing assignments.
O2: Introducing a new intervention in computing courses that focuses on creating a different kind of student experience focused on community-driven computing projects.
O3: Building faculty learning communities to help train, orient and support instructors of this curriculum.
O4: Employing a cross-site collaboration structure using a collective impact model, allowing variance for each site while working towards a common goal.

Our alliance brings together six campuses, each with unique strengths and local challenges. We use a collective impact model, allowing each campus to contribute to the development, deployment, and continuous improvement of the curriculum. Our team is composed of computer science educators and social scientists with expertise in evaluating inclusive STEM education and training faculty at Hispanic-Serving Institutions (HSIs). Our evaluation plan examines both student and faculty outcomes, enabling us to reflect and refine our approach. Shared leadership and site teams are integral to sustaining our work, even amid potential academic personnel changes.

Our research is impactful in the learning sciences for several reasons. It utilizes faculty learning communities as a vehicle to bring change to the climate and curriculum of computing education. Furthermore, this project holds the potential to develop a broadly applicable introductory curriculum that is designed, deployed, and evaluated across a range of public education institutions serving the diverse state of [State-Name]. We aim for the success of this alliance to extend to all other sister campuses, potentially reaching tens of thousands of computing students. This curriculum will also be broadly deployed nationwide to help marginalized students pursue computing.

Despite being in the initial year of the project, we have achieved significant results in terms of instructor skill gains and attitudes. We are poised to make a meaningful impact on students as we have begun introducing new curricular and pedagogical changes. In this paper, we will share our current progress and core activities related to each objective, which include establishing a supportive alliance structure, developing new computing curriculum that includes a socially responsible component at each site, creating the structure and content for the first faculty learning community (FLC), and implementing the collective impact model. We will also share survey data, including feedback from both students and instructors, and lessons learned during the first-year implementation.

Community Action Computing: A Data-centric CS0 Course
A. M. Kazerouni, J. Lehr, Z. Wood
ACM Technical Symposium on Computer Science Education—Curricular Initiatives. (SIGCSE). 2024.
pdf slides

Abstract: A student's sense of belonging in computing can be positively impacted when coursework can authentically be connected to real community contexts. We describe the design, materials, and preliminary evaluation of an introductory programming (CS0) course infused with a focus on societal responsibility and relevance. We take a data-centric, constructionist approach to introductory computing. Data-centricity allows us to authentically connect coursework with students' communal and societal interests, and students' motivation was enhanced given that they were creating and sharing artifacts as part of their coursework. Students used TypeScript to manipulate and analyze real data-sets, and created shareable websites containing statistics, data visualizations, and reflections based on the data-set of their choosing. Students chose varied topics for their assignments: they worked with data about access to CS education, climate change, and data provided by local non-profit organizations. A preliminary evaluation indicated that students who took this CS0 course attained CS-specific learning objectives equally well in the two subsequent follow-on courses as students who took alternative CS0 courses at our University. We close with instructor perspectives and reflections on lessons learned.

A Model of How Students Engineer Test Cases With Feedback
A. M. Shin, A. M. Kazerouni
ACM Transactions on Computing Education. (TOCE). 2023.
pdf blog

Background and Context. Students' programming projects are often assessed on the basis of their tests as well as their implementations, most commonly using test adequacy criteria like branch coverage, or, in some cases, mutation analysis. As a result, students are implicitly encouraged to use these tools during their development process (i.e., so they have awareness of the strength of their own test suites).
Objectives. Little is known about how students choose test cases for their software while being guided by these feedback mechanisms. We aim to explore the interaction between students and commonly used testing feedback mechanisms (in this case, branch coverage and mutation-based feedback).
Method. We use grounded theory to explore this interaction. We conducted 12 think-aloud interviews with students as they were asked to complete a series of software testing tasks, each of which involved a different feedback mechanism. Interviews were recorded and transcripts were analyzed, and we present the overarching themes that emerged from our analysis.
Findings. Our findings are organized into a process model describing how students completed software testing tasks while being guided by a test adequacy criterion. Program comprehension strategies were commonly employed to reason about feedback and devise test cases. Mutation-based feedback tended to be cognitively overwhelming for students, and they resorted to weaker heuristics in order to address this feedback.
Implications. In the presence of testing feedback, students did not appear to consider problem coverage as a testing goal so much as program coverage. While test adequacy criteria can be useful for assessment of software tests, we must consider whether they represent good goals for testing, and if our current methods of practice and assessment are encouraging poor testing habits.

Exploring the Impact of Cognitive Awareness Scaffolding for Debugging in an Introductory Computer Science Class
J. Lee, A. M. Kazerouni, C. Siu, T. Migler
ACM Technical Symposium on Computer Science Education—Experience Reports. (SIGCSE). 2023.

Abstract: Debugging involves the simultaneous application of a number of programming skills: reading code, writing code, problem comprehension, etc. This makes it a challenging activity for novice programmers. Unfortunately, debugging is rarely taught explicitly in introductory programming courses, and is often learned as an implicit goal through programming assignments. In this experience report we explore the impact of a cognitive awareness scaffold to help students monitor their progress as they debug their code. We created a simple form that students used to document their debugging process when they ran into bugs. The form asks questions that students are likely to be asked by course staff during office hours, e.g., "What have you tried so far?". This act of verbalizing errors and enumerating successful and unsuccessful strategies to fix them is meant to help students monitor their own debugging progress. We examined the cognitive awareness demonstrated in form responses, finding that responses were more superficial on projects of higher difficulty. Additionally, we gave students an exit survey to measure the perceived impact of the debugging form on students' ability to regulate their debugging process and their confidence while debugging. Students indicated that the form helped them better verbalize errors in their programs, and helped them surmount problems with which they would otherwise have needed help.

Patterns of Academic Help-Seeking in Undergraduate Computing Students
A. Doebling, A. M. Kazerouni
Koli Calling International Conference on Computing Education Research. (Koli Calling). 2021.
pdf blog

Abstract: Knowing when and how to seek academic help is crucial to the success of undergraduate computing students. While individual help-seeking resources have been studied, little is understood about the factors influencing students to use or avoid certain resources. Understanding students' patterns of help-seeking can help identify factors contributing to utilization or avoidance of help resources by different groups, an important step toward improving the quality of resources. We present a mixed-methods study investigating the help-seeking behavior of undergraduate computing students. We collected survey data (n=138) about students' frequency of using several resources followed by one-on-one student interviews (n=15) to better understand why they use those resources. Several notable patterns were found. Women sought help in office hours more frequently than men did, and computing majors sought help from their peers more often than non-computing majors. Additionally, interview data revealed a common progression in which students started from easily accessible but low utility resources (online sources and peers) before moving on to less easily accessible, high utility resources (like instructor office hours).

The Impact of Programming Project Milestones on Procrastination, Project Outcomes, and Course Outcomes
C. A. Shaffer, A. M. Kazerouni
ACM Technical Symposium on Computer Science Education—Research Track. (SIGCSE). 2021.
pdf slides talk blog

Abstract: When faced with a large and complex project for the first time, students face numerous self-regulatory challenges that they may be ill-equipped to overcome. These challenges can result in degraded project outcomes, as commonly observed in programming-intensive mid-level CS courses. We have previously found that success in these situations is associated with a disciplined personal software process. Procrastination is a prominent failure of self-regulation that can occur for a number of reasons, e.g., low expectancy of success, low perceived value of the task at hand, or decision-paralysis regarding how to begin when faced with a large task. It is pervasive, but may be addressed through targeted interventions. We draw on theory related to goal theory and problem-solving in engineering education to evaluate the value of explicit project milestones at curbing procrastination and its negative impacts on relatively long-running software projects. We conduct a quasi-experiment in which we study differences in project and course outcomes between students in a treatment (with milestones) and control group (without milestones). We found that students in the treatment group were more likely to finish their projects on time, produced projects with higher correctness, and finished the course with generally better outcomes. Within the treatment group, we found that students who completed more milestones saw better outcomes than those who completed fewer milestones. We found no differences in withdrawal or failure rates between the treatment and control groups. An end-of-term survey indicated that student perceptions of the milestones were overwhelmingly positive.

Fast and Accurate Incremental Feedback for Students' Software Tests Using Selective Mutation Analysis
A. M. Kazerouni, J. C. Davis, A. Basak, C. A. Shaffer, F. Servant, S. H. Edwards
Journal of Systems and Software, Volume 175. (JSS). 2021.
pdf blog

Abstract: As incorporating software testing into programming assignments becomes routine, educators have begun to assess not only the correctness of students' software, but also the adequacy of their tests. In practice, educators rely on code coverage measures, though their shortcomings are widely known. Mutation analysis is a stronger measure of test adequacy, but it is too costly to be applied beyond the small programs developed in introductory programming courses. We demonstrate how to adapt mutation analysis to provide rapid automated feedback on software tests for complex projects in large programming courses. We study a dataset of 1389 student software projects ranging from trivial to complex. We begin by showing that although the state-of-the-art in mutation analysis is practical for providing rapid feedback on projects in introductory courses, it is prohibitively expensive for the more complex projects in subsequent courses. To reduce this cost, we use a statistical procedure to select a subset of mutation operators that maintains accuracy while minimizing cost. We show that with only 2 operators, costs can be reduced by a factor of 2–3 with negligible loss in accuracy. Finally, we evaluate our approach on open-source software and report that our findings may generalize beyond our educational context.

Exploring the Bug Investigation Techniques of Intermediate Student Programmers
R. S. Mansur, A. M. Kazerouni, S. H. Edwards, C. A. Shaffer
Koli Calling International Conference on Computing Education Research. (Koli Calling). 2020.

Abstract: We seek to understand bug investigation practices among intermediate student developers. To this end, we used a mixed-methods approach to study the testing and debugging practices of students in a junior-level, post-CS2 Data Structures course with relatively large projects (4-week lifecycle). First, we interviewed 12 students of varying project performances. From the interviews, we identified five techniques that students use for both testing and debugging: 1) writing diagnostic print statements, 2) unit testing, 3) using a source-level debugger, 4) submission to an online auto-grader, and 5) manual tracing. Using the Grounded Theory approach, we developed four hypotheses regarding students' use of multiple techniques and their possible impact on performance. We used clickstream data from the IDE (Eclipse) to analyze the level of use of the first four of these bug investigating techniques. We found that over 92%, 87%, and 73% of the students used JUnit testing, diagnostic print statements, and the source-level debugger, respectively. Most of the students (91%) used more than one technique to investigate bugs in their projects. We found a positive correlation between using multiple bug investigation techniques and a higher score in the projects. Finally, we identified some ineffective practices correlated with lower project scores.

ProgSnap2: A Flexible Format for Programming Process Data
T. Price, D. Hovemeyer, K. Rivers, A. C. Bart, G. Gao, A. M. Kazerouni, B. Becker, A. Petersen, L. Gusukuma, S. H. Edwards, D. Babcock
Conference on Innovation and Technology in Computer Science Education—Tools. (ITiCSE). 2020.

Abstract: In this paper, we introduce ProgSnap2, a standardized format for logging programming process data. ProgSnap2 is a tool for computing education researchers, with the goal of enabling collaboration by helping them to collect and share data, analysis code, and data-driven tools to support students. We give an overview of the format, including how events, event attributes, metadata, code snapshots and external resources are represented. We also present a case study to evaluate how ProgSnap2 can facilitate collaborative research. We investigated three metrics designed to quantify students' difficulty with compiler errors - the Error Quotient, Repeated Error Density and Watwin score - and compared their distributions and ability to predict students' performance. We analyzed five different ProgSnap2 datasets, spanning a variety of contexts and programming languages. We found that each error metric is mildly predictive of students' performance. We reflect on how the common data format allowed us to more easily investigate our research questions.

Measuring the Software Development Process to Enable Formative Feedback
Ayaan M. Kazerouni
PhD Dissertation, Virginia Tech. 2020.
pdf slides

Abstract: Graduating CS students face well-documented difficulties upon entering the workforce, with reports of a gap between what they learn and what is expected of them in industry. Project management, software testing, and debugging have been repeatedly listed as common "knowledge deficiencies" among newly hired CS graduates. Similar difficulties manifest themselves on a smaller scale in upper-level CS courses, like the Data Structures and Algorithms course at Virginia Tech: students are required to develop large and complex projects over a three to four week lifecycle, and it is common to see close to a quarter of the students drop or fail the course, largely due to the difficult and time-consuming nature of the projects. My research is driven by the hypothesis that regular feedback about the software development process, delivered during development, will help ameliorate these difficulties. Assessment of software currently tends to focus on qualities like correctness, code coverage from test suites, and code style. Little attention or tooling has been developed for the assessment of the software development process. I use empirical software engineering methods like IDE-log analysis, software repository mining, and semi-structured interviews with students to identify effective and ineffective software practices. Using the results of these analyses, I have worked on assessing students' development in terms of time management, test writing, test quality, and other "self-checking" behaviours like running the program locally or submitting to an oracle of instructor-written test cases. The goal is to use this information to formulate formative feedback about the software development process.
In addition to educators, this research is relevant to software engineering researchers and practitioners, since the results from these experiments are based on the work of upper-level students who grapple with issues of design and work-flow that are not far removed from those faced by professionals in industry.

Testing Regex Generalizability And Its Implications: A Large-Scale Many-Language Measurement Study
J. C. Davis, D. Moyer, A. M. Kazerouni, D. Lee
International Conference on Automated Software Engineering. (ASE). 2019.

Abstract: The regular expression (regex) practices of software engineers affect the maintainability, correctness, and security of their software applications. Empirical research has described characteristics like the distribution of regex feature usage, the structural complexity of regexes, and worst-case regex match behaviors. But researchers have not critically examined the methodology they follow to extract regexes, and findings to date are typically generalized from regexes written in only 1-2 programming languages. This is an incomplete foundation. Generalizing existing research depends on validating two hypotheses: (1) Various regex extraction methodologies yield similar results, and (2) Regex characteristics are similar across programming languages. To test these hypotheses, we defined eight regex metrics to capture the dimensions of regex representation, string language diversity, and worst-case match complexity. We report that the two competing regex extraction methodologies yield comparable corpuses, suggesting that simpler regex extraction techniques will still yield sound corpuses. But in comparing regexes across programming languages, we found significant differences in some characteristics by programming language. Our findings have bearing on future empirical methodology, as the programming language should be considered, and generalizability will not be assured. Our measurements on a corpus of 537,806 regexes can guide data-driven designs of a new generation of regex tools and regex engines.

Toward Continuous Assessment of the Programming Process — Doctoral Consortium
A. M. Kazerouni
ACM International Computing Education Research Conference—Doctoral Consortium. (ICER). 2019.

Abstract: Assessment of software tends to focus on postmortem evaluation of metrics like correctness, mergeability, and code coverage. This is evidenced in the current practices of continuous integration and deployment that focus on software's ability to pass unit tests before it can be merged into a deployment pipeline. However, little attention or tooling is given to the assessment of the software development process itself. Good process becomes both more challenging and more critical as software complexity increases. Real-time evaluation and feedback about a student's software development skills, such as incremental development, testing, and time management, could greatly increase productivity and improve the ability to write tested and correct code. In my research, I develop models to quantify a student's programming process in terms of these metrics. By measuring the programming process, I can empirically evaluate its adherence to known best practices in software engineering. With the ability to characterize this, I can build tools to provide students with intelligent and timely feedback when they are in danger of straying from those practices. In the long term, I hope to contribute to the standardization and adoption of continuous software assessment techniques that include not only the final product, but also the process undertaken to produce it.

The Relationship Between Practicing Short Programming Exercises and Exam Performance
S. H. Edwards, K. P. Murali, A. M. Kazerouni
ACM Global Computing Education Conference. (CompEd). 2019.

Abstract: Learning to program can be challenging. Many instructors use drill-and-practice strategies to help students develop basic programming techniques and improve their confidence. Online systems that provide short programming exercises with immediate, automated feedback are seeing more frequent use in this regard. However, the relationship between practicing with short programming exercises and performance on larger programming assignments or exams is unclear. This paper describes an evaluation of short programming questions in the context of a CS1 course where they were used on both homework assignments, for practice and learning, and on exams, for assessing individual performance. The open-source drill-and-practice system used here provides for full feedback during practice exercises. During exams, it allows limiting feedback to compiler errors and to a very small number of example inputs shown in the question, instead of the more complete feedback received during practice. Using data collected from 200 students in a CS1 course, we examine the relationship between voluntary practice on short exercises and subsequent performance on exams, while using an early exam as a control for individual differences including ability level. Results indicate that, after controlling for ability, voluntary practice does contribute to improved performance on exams, but that motivation to improve may also be important.

Assessing Incremental Testing Practices and Their Impact on Project Outcomes
A. M. Kazerouni, C. A. Shaffer, S. H. Edwards, F. Servant
ACM Technical Symposium on Computer Science Education—Research Track. (SIGCSE). 2019.
pdf slides code blog 2nd Best Research Paper

Abstract: Software testing is an important aspect of the development process, one that has proven to be a challenge to formally introduce into the typical undergraduate CS curriculum. Unfortunately, existing assessment of testing in student software projects tends to focus on evaluation of metrics like code coverage over the finished software product, thus eliminating the possibility of giving students early feedback as they work on the project. Furthermore, assessing and teaching the process of writing and executing software tests is also important, as shown by the multiple variants proposed and disseminated by the software engineering community, e.g., test-driven development (TDD) or incremental test-last (ITL). We present a family of novel metrics for assessment of testing practices for increments of software development work, thus allowing early feedback before the software project is finished. Our metrics measure the balance and sequence of effort spent writing software tests in a work increment. We performed an empirical study using our metrics to evaluate the test-writing practices of 157 advanced undergraduate students, and their relationships with project outcomes over multiple projects for a whole semester. We found that projects where more testing effort was spent per work session tended to be more semantically correct and have higher code coverage. The percentage of method-specific testing effort spent before production code did not contribute to semantic correctness, and had a negative relationship with code coverage. These novel metrics will enable educators to give students early, incremental feedback about their testing practices as they work on their software projects.

Student Debugging Practices and Their Relationships with Project Outcomes — Poster
A. M. Kazerouni, R. S. Mansur, S. H. Edwards, C. A. Shaffer
ACM Technical Symposium on Computer Science Education. (SIGCSE). 2019.

Abstract: Debugging is an important part of the software development process, studied by both the CS education and software engineering communities. Most prior work has focused either on novice or professional programmers. Intermediate-to-advanced students (such as those enrolled in post-CS2 Data Structures courses) who are working on large and complex projects have largely been ignored. We present results from an empirical observational study that examined junior-level undergraduate students' debugging practices on relatively large (4-week lifecycle) projects, using IDE clickstream data collected by a custom Eclipse plugin. Specifically, we hypothesize that there are differing debugging behaviors exhibited, and that differing behaviors lead to differing project outcomes. For example, how often do students use the symbolic debugger available in modern IDEs, versus how often do they use diagnostic print statements, or both? What triggers a debugging session? What follows a debugging session? Does it matter when in the project lifecycle that debugging takes place? We have a number of interesting preliminary results. When using the debugger, there was a negative relationship between step-over and step-into actions versus final course grades, indicating that when students "spin their wheels" while debugging, they tend to perform more poorly. Students also tend to perform better on the project when debugging takes place earlier in the overall project lifecycle. We developed an algorithm to identify diagnostic print statements in the students' projects. We found that over 90% used at least one diagnostic print statement, and about 75% used the symbolic debugger, at least once in any given project.

Toward Continuous Assessment of the Programming Process — Student Research Competition
A. M. Kazerouni
ACM Technical Symposium on Computer Science Education—Student Research Competition. (SIGCSE). 2018.
pdf slides 1st Place

Abstract: Assessment of software tends to focus on postmortem evaluation of metrics like correctness, mergeability, and code coverage. This is evidenced in the current practices of continuous integration and deployment that focus on software's ability to pass unit tests before it can be merged into a deployment pipeline. However, little attention or tooling is given to the assessment of the software development process itself. Good process becomes both more challenging and more critical as software complexity increases. Real-time evaluation and feedback about a software developer's skills, such as incremental development, testing, and time management, could greatly increase productivity and improve the ability to write tested and correct code. My work focuses on the collection and analysis of fine-grained programming process data to help quantitatively model the programming process in terms of these metrics. I report on my research problem, presenting past work involving the collection and analysis of IDE event data from junior level students working on large and complex projects. The goal is to quantify the programming process in terms of incremental development and procrastination. I also present a long-term vision for my research and present work planned in the short term as a step toward that vision.

Quantifying Incremental Development Practices and Their Relationship to Procrastination
A. M. Kazerouni, S. H. Edwards, C. A. Shaffer
ACM International Computing Education Research Conference. (ICER). 2017.
pdf code blog

Abstract: We present quantitative analyses performed on character-level program edit and execution data, collected in a junior-level data structures and algorithms course. The goal of this research is to determine whether proposed measures of student behaviors such as incremental development and procrastination during their program development process are significantly related to the correctness of final solutions, the time when work is completed, or the total time spent working on a solution. A dataset of 6.3 million fine-grained events collected from each student's local Eclipse environment is analyzed, including the edits made and events such as running the program or executing software tests. We examine four primary metrics proposed as part of previous work, and also examine variants and refinements that may be more effective. We quantify behaviors such as working early and often, frequency of program and test executions, and incremental writing of software tests. Projects where the author had an earlier mean time of edits were more likely to be submitted earlier and to earn higher scores for correctness. Similarly, an earlier median time of edits to software tests was also associated with higher correctness scores. No significant relationships were found with incremental test writing or incremental checking of work using either interactive program launches or running of software tests, contrary to expectations. A preliminary prediction model with 69% accuracy suggests that the underlying metrics may support early prediction of student success on projects. Such metrics also can be used to give targeted feedback to help students improve their development practices.

DevEventTracker: Tracking development events to assess incremental development and procrastination
A. M. Kazerouni, S. H. Edwards, T. S. Hall, C. A. Shaffer
ACM Conference on Innovation and Technology in Computer Science Education. (ITiCSE). 2017.
pdf code blog

Abstract: Good project management practices are hard to teach, and hard for novices to learn. Procrastination and bad project management practice occur frequently, and may interfere with successfully completing major programming projects in mid-level programming courses. Students often see these as abstract concepts that do not need to be actively applied in practice. Changing student behavior requires changing how this material is taught, and more importantly, changing how learning and practice are assessed. To provide proper assessment, we need to collect detailed data about how each student conducts their project development as they work on solutions. We present DevEventTracker, a system that continuously collects data from the Eclipse IDE as students program, giving us in-depth insight into students' programming habits. We report on data collected using DevEventTracker over the course of four programming projects involving 370 students in five sections of a Data Structures and Algorithms course over two semesters. These data support a new measure for how well students apply "incremental development" practices. We present a detailed description of the system, our methodology, and an initial evaluation of our ability to accurately assess incremental development on the part of the students. The goal is to help students improve their programming habits, with an emphasis on incremental development and time management.