Course syllabus

Foundations of Data Mining

Machine learning is the science of making computers act without being explicitly programmed. Instead, algorithms are used to find patterns in data. It is so pervasive today that you probably use it dozens of times a day without knowing it, for instance in web search, speech recognition, and (soon) self-driving cars. It is also a crucial component of data-driven industry (Big Data), scientific discovery, and modern healthcare. In this class, you will learn the foundations of how data mining and machine learning work internally, understand when and how to use key concepts and techniques, and gain hands-on experience in getting them to work for yourself. You will learn the theoretical underpinnings of data analysis and use them to tackle new problems quickly and effectively.

People

dr. ir. Joaquin Vanschoren (j.vanschoren@tue.nl) MF 7.104a - Responsible Lecturer
dr. Vlado Menkovski (v.menkovski@tue.nl) MF 7.097b - Lecturer
dr. Anne Driemel (a.driemel@tue.nl) MF 7.073 - Lecturer

Learning objectives

Upon completion of the course, students should be able to

write a program that builds a predictive model from training data
evaluate a predictive model using test/training splits
compare the performance of different types of predictive models
reason about the mathematical foundations of data mining techniques
recognize when a predictive model is overfitting
understand and exploit the fundamental bias-variance tradeoff
combine the above with dimension-reduction techniques
visualize and explore data sets using embeddings and clustering

Required prior knowledge: While there are no strict requirements, it is highly recommended to have a working knowledge of statistics, and to have programming experience. Programming is part of the assignments. The course will mostly feature examples using Python.

Course Structure

The course runs in Q3 and has the following weekly contact hours:

Mondays, 10:45 - 12:30: Plenary Lectures (Flux 1.02)
Thursdays, 13:45 - 15:30: Tutorials and Feedback (Flux 1.02)
Thursdays, 15:45 - 17:30: Plenary Lectures (Flux 1.02)

Materials, Assignments, Questions

We use Canvas for all course activities:

Check the 'Assignments' page for the assignments. You will have to submit your assignments here.
Pose (course related) questions under 'Discussions'. You are encouraged to answer each other's questions. The lecturers will answer open questions as soon as possible.
Check 'Files' and 'Pages' for other resources. The materials will also be available in a GitHub repo.

Please don't email the lecturers directly, except for personal questions. Even in those cases, please use Canvas to send a direct message.

It is your responsibility to keep up to date with postings and activities, although these will also be clearly announced in class. It is highly recommended that you adjust your email settings so that you are notified of all discussions happening on Canvas.

Schedule

This schedule is preliminary. The order may change and parts of lectures may be removed (or added).

Feb 5	Introduction - Course guidelines. Introduction to machine learning. k-Nearest Neighbors.	Vanschoren
Feb 8	Tutorial: Linear Algebra (A. Driemel) - Basics of linear algebra (matrix operations, projections, ...). Tutorial: Introduction to ML in Python - Installation and environment setup. NumPy, SciPy, scikit-learn, Jupyter Notebooks. Introduction Assignment 1. Linear models - Linear regression (least squares), ridge regression, lasso, logistic regression, linear SVMs.	Vanschoren
	Spring break
Feb 19	Evaluation and model selection - Evaluating predictive models. Avoiding overfitting. Cross-validation. ROC analysis, bias-variance analysis. Optimizing hyperparameters.	Vanschoren
Feb 22	Tutorial: Introduction to ML in Python (2) - Matplotlib, OpenML, feature engineering with scikit-learn. Data preprocessing and feature engineering - Basic data preprocessing techniques: scaling, feature encoding, missing value imputation, feature selection, ...	Vanschoren
Feb 26	Ensemble learning - Decision trees, bagging, random forests, boosting, gradient boosting, stacking.	Vanschoren
Mar 1	Q&A Assignment 1, Introduction Assignment 2. Tutorial: More Machine Learning Pipelines - More data processing techniques, practical considerations, OpenML, Q&A. Kernel methods - Support Vector Machines, maximal margin, kernel methods.	Vanschoren
Mar 5	Deadline Assignment 1. Bayesian Learning - Bayes' rule, Naive Bayes, Gaussian processes.	Vanschoren
Mar 8	Dimensionality Reduction I - PCA, multi-dimensional scaling, Isomap.	Driemel
Mar 12	Dimensionality Reduction II - Random projections, locality-sensitive hashing.	Driemel
Mar 15	Introduction Assignment 3. Feedback Assignment 1. Locality-sensitive hashing - Locality-sensitive hashing, Jaccard similarity, MinHashing.	Driemel
Mar 16	Deadline Assignment 2
Mar 19	Clustering - Lloyd's algorithm, k-means++, Gonzalez' algorithm.	Driemel
Mar 22	Feedback Assignment 2. Introduction Individual Assignment. Introduction to Learning Deep Representations - Artificial neuron, gradient descent, backpropagation.	Menkovski
Mar 26	Multilayer Perceptron - Deep neural networks, activation functions, output functions, loss functions for classification and regression.	Menkovski
Mar 29	Deadline Assignment 3. Introduction Assignment 4. Tutorial: Backpropagation, Keras MLP implementation - Simple Python API for backpropagation, Keras API for MLP. Convolutional Neural Networks - Neural network models for spatially correlated data.	Menkovski
Apr 2	Easter Monday
Apr 5	Recurrent Neural Networks - Neural network models for temporally correlated data (time series). Tutorial: Convolutional Neural Networks, Recurrent Neural Networks - Keras implementation for CNN and RNN.	Menkovski
Apr 9	Feedback Assignment 3
Apr 12	Deadline Assignment 4
Apr 19	Feedback Assignment 4
Apr 22	Deadline Individual Assignment

Evaluation

There is no exam. Students are evaluated using a series of 5 problem sets, containing both theoretical and practical assignments. Students work in teams of 2 people for the first 4 problem sets, and have to complete the final (larger) problem set individually.

To pass, you need to score at least 50% on the individual assignment and 60% overall.

Assignment deadlines and grade breakdown:

Assignment 1 (15pt): Monday March 5, 17:00
Assignment 2 (15pt): Thursday March 17, 23:55
Assignment 3 (15pt): Thursday March 29, 12:00 (noon)
Assignment 4 (15pt): Thursday April 12, 12:00 (noon)
Individual Assignment (30pt): Sunday April 22, 23:55
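
As an illustration of the pass rule, the two thresholds can be checked with a few lines of Python. This is only a sketch under stated assumptions: it assumes both percentages are taken relative to the maximum points listed above (90 points in total) and that a missing assignment counts as 0 points. The names MAX_POINTS and passes_course are hypothetical and not part of any course tooling.

```python
# Maximum points per assignment, copied from the grade breakdown above.
MAX_POINTS = {
    "Assignment 1": 15,
    "Assignment 2": 15,
    "Assignment 3": 15,
    "Assignment 4": 15,
    "Individual Assignment": 30,
}


def passes_course(scores):
    """Return True if the scores meet both pass thresholds.

    Assumptions (not stated in the syllabus): percentages are relative
    to the listed maximum points, and missing assignments score 0.
    """
    total_max = sum(MAX_POINTS.values())
    total = sum(scores.get(name, 0) for name in MAX_POINTS)
    individual = scores.get("Individual Assignment", 0)

    # At least 50% on the individual assignment.
    individual_ok = 100 * individual >= 50 * MAX_POINTS["Individual Assignment"]
    # At least 60% of all available points.
    overall_ok = 100 * total >= 60 * total_max
    return individual_ok and overall_ok


example = {
    "Assignment 1": 12,
    "Assignment 2": 12,
    "Assignment 3": 12,
    "Assignment 4": 12,
    "Individual Assignment": 15,
}
print(passes_course(example))
```

Under these assumptions, a high total is not sufficient on its own: full marks on all four team assignments with fewer than 15 points on the individual assignment would still not pass.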

Resit

Students who do not pass the class can do a resit in the next quarter. The resit consists of an individual assignment, much like the normal individual assignment, but it replaces the entire course grade. It will be released before the middle of Q4, and the deadline is July 1.

Course Policies

Participation. As this class endeavors to teach professional skills, we ask that students act professionally and treat all course participants with respect. We also encourage you to offer your ideas and thoughts to the class and to question the material presented.

Assignments. Assignments are due at the time and in the manner specified in the assignment description. Late work will lose 33% of its original point-value for each day late, and once solutions are posted or discussed, late submissions will not be accepted.
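
To make the late penalty concrete, here is a minimal Python sketch. It rests on assumptions the policy does not spell out: that the 33% is deducted from the points the work would otherwise have earned, that days late are counted in whole days, and that the score cannot drop below zero. The function name late_score and its parameters are hypothetical.

```python
def late_score(earned_points, days_late, solutions_posted=False):
    """Points remaining after the late penalty.

    Assumptions (not stated in the policy): the penalty applies to the
    points the work would otherwise have earned, days are whole days,
    and the result is never negative.
    """
    if days_late <= 0:
        return float(earned_points)  # submitted on time, no penalty
    if solutions_posted:
        return 0.0  # late submissions are no longer accepted
    remaining_fraction = max(0.0, 1.0 - 0.33 * days_late)
    return earned_points * remaining_fraction


# Work worth 12 points, handed in 0 to 4 days late.
for days in range(5):
    print(days, round(late_score(12, days), 2))
```

Under these assumptions, work handed in three days late keeps only about 1% of its points, and from the fourth day onward nothing remains.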

Plagiarism. Plagiarism and cheating will not be tolerated. University policy will be adhered to in all such cases. There is a difference between collaboration and plagiarism. Plagiarism is the act of using another’s work without giving them credit for it. Collaboration is the exchange of ideas, the debate of issues and the examination of readings among each other that enables you to arrive at your own independent thoughts and designs.
