This is an overview of my teaching philosophy and recent teaching topics. This page is intended for informational purposes only. Note that all Software Architecture Group members contribute to these efforts.

General fields:

Teaching Philosophy

General Approach

In contrast to constructivism, which strictly favors activity-based discovery over lecturing, I aim to balance activities and theoretical input to build a shared vocabulary in the course and convey basic methodology. A central goal of my activities is to let students experience the practical difficulties of a theoretical concept. I prefer a layered approach, in which major out-of-class activities (e.g., building a software system) are supported by minor activities that get students started with the concepts and methods they might need to complete the major activity (e.g., hands-on exercises with a tool they need).

Assessment

Collaboration

Constructive Alignment

High-level Learning Goals

AI for Programming

Seminar: AI for Programming

HPI Master’s seminar, summer term 2024.

Participants will develop their own small AI solution to a software engineering problem. The seminar will focus on an in-depth and hands-on understanding of large language models, their fine-tuning, and the user experience of the resulting tools.

Seminar: Future of Programming

HPI Master’s seminar, winter term 2023/24.

Supervised topics:

LLM code quality: Measure the code quality of large language models and the influence of prompts and parameters
Example recommendation: Use AI embeddings to locate useful code examples in other projects and recommend them depending on the programmers’ current task
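
The example recommendation topic can be illustrated with a minimal sketch. It assumes a toy bag-of-words vector as a stand-in for a learned embedding and a hypothetical three-entry corpus; the function names and data are illustrative and not taken from the seminar projects. It only shows the ranking step: embed the task description, embed each candidate example, and sort by cosine similarity.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a learned embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(task, examples, top_k=2):
    """Return the names of the top_k examples most similar to the task description."""
    task_vec = embed(task)
    ranked = sorted(
        examples,
        key=lambda name: cosine_similarity(task_vec, embed(examples[name])),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical corpus: snippet name -> short description of what the snippet does.
EXAMPLES = {
    "read_csv_rows": "read rows from a csv file",
    "http_get_json": "send http get request and parse json response",
    "write_csv_rows": "write rows to a csv file",
}

print(recommend("parse a csv file into rows", EXAMPLES))
```

In a realistic setting, `embed` would be replaced by a model that maps code and task descriptions into the same vector space; the ranking logic would stay the same.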

Mining Repositories

Seminar: Code Repository Mining 2020
Seminar: Machine Learning on Code Repositories
Seminar: Code Repository Mining 2017
Data
Tools and Techniques
High-level Goals

Seminar: Code Repository Mining 2020

HPI Master’s seminar, summer term 2020.

Supervised topics:

Language influence on code style: Measure how learning a new language influences the code style of programmers
Vector embeddings of code: Train and evaluate vector embeddings that capture the semantics of source code
Change prediction: Use historical data to predict future software changes and recommend items for incomplete changes
Test-based modularity analysis: Correlate test-breaking and test-fixing changes to characterize the modularity of a software system
Live test prioritization: Develop real-time test prioritization models trained on mutation testing data
Technical debt at scale: Find large-scale patterns in common technical debt metrics across GitHub
Issue complexity prediction: Train and evaluate a model that can distinguish easy from hard GitHub issues
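
The change prediction topic above can be sketched with simple co-change counting. This is one possible baseline under stated assumptions, not the approach the seminar projects used: each commit is reduced to the list of files it touched, and the file names and history below are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cochange_counts(commits):
    """Count how often each ordered pair of files changed in the same commit."""
    counts = defaultdict(Counter)
    for files in commits:
        for a, b in permutations(set(files), 2):
            counts[a][b] += 1
    return counts


def recommend_missing(counts, changed, top_k=3):
    """Suggest files that historically co-changed with the files already touched."""
    scores = Counter()
    for path in changed:
        for other, n in counts.get(path, {}).items():
            if other not in changed:
                scores[other] += n
    return [path for path, _ in scores.most_common(top_k)]


# Hypothetical history: one list of changed files per commit.
HISTORY = [
    ["parser.py", "test_parser.py"],
    ["parser.py", "test_parser.py", "README.md"],
    ["lexer.py", "parser.py", "test_parser.py"],
    ["lexer.py", "test_lexer.py"],
]

counts = build_cochange_counts(HISTORY)
print(recommend_missing(counts, {"parser.py"}))
```

For an incomplete change that touches only `parser.py`, the file that co-changed with it most often in the history is ranked first.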

Seminar: Machine Learning on Code Repositories

HPI Master’s seminar, summer term 2018.

Supervised topics:

Evolutionary modularity: Measure modularity and identify architectural problems by comparing the distance between code passages with their co-change frequency
Package recommender: Implement and evaluate a recommender system for Python package dependencies based on collaborative filtering and matrix factorization
Code completion: Design parser/generator abstractions for efficiently auto-completing repetitive structures in source code
Expertise tracking: Investigate how individual commits improve or deteriorate (with respect to software quality metrics) as programmers gain experience over several years of contributions
Paradigm transfer: Measure how programming style shifts when programmers pick up a different programming language
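
The evolutionary modularity idea can be sketched as follows. The sketch makes simplifying assumptions that are not from the seminar: it works on whole files rather than code passages, it uses the number of directory steps between two files as the distance, and the thresholds and file names are hypothetical. Pairs that often change together although they live far apart in the source tree are reported as candidates for a closer look.

```python
from collections import Counter
from itertools import combinations
from pathlib import PurePosixPath


def path_distance(a, b):
    """Number of directory steps between two files in the source tree."""
    dirs_a = PurePosixPath(a).parent.parts
    dirs_b = PurePosixPath(b).parent.parts
    shared = 0
    for x, y in zip(dirs_a, dirs_b):
        if x != y:
            break
        shared += 1
    return (len(dirs_a) - shared) + (len(dirs_b) - shared)


def distant_cochanges(commits, min_cochanges=2, min_distance=2):
    """Return file pairs that co-change often but are far apart in the tree."""
    counts = Counter()
    for files in commits:
        for pair in combinations(sorted(set(files)), 2):
            counts[pair] += 1
    return sorted(
        pair
        for pair, n in counts.items()
        if n >= min_cochanges and path_distance(*pair) >= min_distance
    )


# Hypothetical history: one list of changed files per commit.
HISTORY = [
    ["ui/forms/login.py", "db/models/user.py"],
    ["ui/forms/login.py", "db/models/user.py", "ui/forms/signup.py"],
    ["ui/forms/login.py", "ui/forms/signup.py"],
]

print(distant_cochanges(HISTORY))
```

Files in the same directory that change together are not reported, because frequent co-change within a module is expected.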

Seminar: Code Repository Mining 2017

HPI Master’s seminar, winter term 2017/18.

Supervised topics:

Language influence on code style: Measure which types of errors Python programmers tend to make when they switch from Java or C++ to Python and how teaching/training can be improved to better avoid them
Effects of high-profile incidents on code: Investigate how programmers react to severe vulnerabilities (CVEs), through which information channels knowledge about CVEs and CWEs propagates, and how programmer awareness of vulnerabilities caused by programming errors can be improved
Classifying repository language: Design a classification method for identifying a project’s primary language that is more robust than merely picking the most prevalent language
Package recommender: Implement and evaluate a recommender system for Python package dependencies based on graph search
Cross-language syntax trees: Design data structures that can represent programs of multiple languages using the same abstractions and that allow writing code analyses that run unmodified on any language
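
To illustrate why the most prevalent language can mislead in the repository language topic, here is a small sketch. It assumes two heuristics that are not from the seminar: files are weighted by size, and files below typical dependency directories are skipped. The lookup tables and file list are hypothetical and deliberately tiny.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical lookup tables for illustration only.
EXTENSION_TO_LANGUAGE = {".py": "Python", ".js": "JavaScript", ".c": "C", ".java": "Java"}
IGNORED_DIRS = {"vendor", "node_modules", "third_party", "dist"}


def primary_language(files):
    """Guess the primary language from (path, size_in_bytes) pairs.

    Files below ignored directories and files with unknown extensions are
    skipped. Returns None if no file remains.
    """
    totals = Counter()
    for path, size in files:
        p = PurePosixPath(path)
        if IGNORED_DIRS & set(p.parts[:-1]):
            continue
        language = EXTENSION_TO_LANGUAGE.get(p.suffix.lower())
        if language is not None:
            totals[language] += size
    if not totals:
        return None
    return totals.most_common(1)[0][0]


# Hypothetical repository listing: (path, size in bytes).
REPO_FILES = [
    ("app/main.py", 4_000),
    ("app/util.py", 2_500),
    ("node_modules/lib/index.js", 900_000),
    ("docs/intro.md", 12_000),
]

print(primary_language(REPO_FILES))
```

A plain byte count over all files would report JavaScript here because of the bundled dependency; skipping `node_modules` lets the project's own Python code decide.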

Data

The following data is provided to the students:

Enriched GHTorrent (now defunct) dataset containing meta-data on most GitHub projects, users, and commits
More than 10 billion file changes with meta-data and full patches
250,000+ cloned repositories, usually extended by students themselves

Tools and Techniques

Postgres + SQL
Jupyter Notebooks
Python data analysis (NumPy/SciPy, scikit-learn)
Source code analysis tools and techniques (parsers such as srcML, linters such as Pylint, metrics such as OOP metrics)
Lively4 and Squeak/Smalltalk for extending programming environments

High-level Goals

We focus on teaching and giving regular (weekly) feedback and advice regarding the following practices during the seminar:

Research
- Learn and practice reproducible research
- Start from practical goals (hypothetical user having a problem), evaluate results with respect to the original goal, and reflect on the limitations of the chosen methods and on new questions discovered along the way
- Acquire literature related to the task at hand
- Present research results and insights in talks
- Document and structure source code and data artifacts for possible future use in teaching, research, or open source contributions
Working with Data
- Experience and manage differences in scale, e.g., what it means to process megabytes, gigabytes, or terabytes of data.
- Understand limitations of underlying hardware, algorithm complexity, SQL queries, etc.
- Practice data cleansing, exploration, sampling, and visualization with real-world development data
Working with code and repositories
- Gather insights on how development on GitHub works, which artifacts it produces, and what they reveal about programmers, processes, and programs
- Programmatically work with version control, such as Git, and gather data from repositories
- Learn how to write code analyzers and measure code metrics
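
As a sketch of working with Git programmatically, the following reads a repository's history through the `git log` command line and parses it into commit records. The marker string, the function names, and the sample output (including the shortened hashes and author names) are hypothetical; `git -C`, `--name-only`, and the `%H` and `%an` format placeholders are standard Git options.

```python
import subprocess

COMMIT_MARKER = "@@commit@@"  # hypothetical marker to tell header lines from file names


def read_git_log(repo_path):
    """Run git log in repo_path and return its raw text output."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only",
         f"--pretty=format:{COMMIT_MARKER}%H|%an"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_git_log(text):
    """Turn the output of read_git_log into a list of commit dictionaries."""
    commits = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(COMMIT_MARKER):
            sha, _, author = line[len(COMMIT_MARKER):].partition("|")
            commits.append({"sha": sha, "author": author, "files": []})
        elif line and commits:
            commits[-1]["files"].append(line)
    return commits


# Hypothetical sample in the shape that read_git_log produces.
SAMPLE = """@@commit@@a1b2c3|Ada
src/parser.py
tests/test_parser.py

@@commit@@d4e5f6|Grace
README.md
"""

for commit in parse_git_log(SAMPLE):
    print(commit["sha"], commit["author"], len(commit["files"]))
```

Calling `parse_git_log(read_git_log("path/to/repo"))` on a local clone yields the same structure for real history. Separating reading from parsing keeps the parser testable without a repository.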

Software Testing

These topics are part of the annual undergraduate lecture Software Engineering I:

Test automation: xUnit frameworks and how to use them
Testing patterns: Mock objects/test doubles, regression testing, triangulation, learning tests, fixtures, exception testing, …
Test-driven development (TDD): General process (red, green, refactor), TDD patterns, examples, preconditions, limitations, and benefits
Behavior-driven development (BDD): Differences from TDD, ubiquitous language
Acceptance testing: Contrast with unit tests, FIT, Cucumber/Lettuce, Selenium
Property-based testing: QuickCheck, Hypothesis
Test quality: Coverage, test smells, test refactorings, test performance, mutation testing
Non-functional testing: Dependability (reliability, availability, resilience, stress testing, …), performance (throughput, response time, …), compliance (e.g. coding conventions), compatibility, …
Scopes of testing: Unit/integration/system tests, black box/white box testing, user/acceptance vs. unit testing, …
Testing and continuous integration
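
Several of the listed topics (xUnit frameworks, fixtures, test doubles, exception testing) can be seen together in one small sketch using Python's built-in `unittest` and `unittest.mock`. The `PriceService` class and its rate provider are hypothetical examples and not lecture material.

```python
import unittest
from unittest.mock import Mock


class PriceService:
    """Hypothetical class under test; depends on an external rate provider."""

    def __init__(self, rate_provider):
        self.rate_provider = rate_provider

    def convert(self, amount, currency):
        if amount < 0:
            raise ValueError("amount must be non-negative")
        return amount * self.rate_provider.rate_for(currency)


class PriceServiceTest(unittest.TestCase):
    def setUp(self):
        # Fixture: a fresh test double replaces the real rate provider in every test.
        self.provider = Mock()
        self.provider.rate_for.return_value = 2.0
        self.service = PriceService(self.provider)

    def test_convert_uses_rate_from_provider(self):
        self.assertEqual(self.service.convert(10, "USD"), 20.0)
        self.provider.rate_for.assert_called_once_with("USD")

    def test_negative_amount_is_rejected(self):
        # Exception testing: the error must be raised before the provider is asked.
        with self.assertRaises(ValueError):
            self.service.convert(-1, "USD")
        self.provider.rate_for.assert_not_called()


def run_tests():
    """Load and run the test case, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PriceServiceTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
print(result.testsRun, result.wasSuccessful())
```

The mock lets the tests check both the returned value and the interaction with the collaborator, without depending on a real rate source.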