Programme

MADS’ first year lays foundations in several disciplines. The second year offers a more open curriculum, with access to various courses and workshops.
The last semester is dedicated to an internship – either in industry or in research – chosen in line with professional goals.
MADS offers refresher courses at the beginning of the Master to help students fill gaps in linear algebra and analysis.
To graduate from the Master in Data Science, one must have acquired 120 ECTS by successfully completing each of the 9 modules in the programme. Each module consists of courses that award ECTS when their exams are passed. A module is considered validated if either all the courses within it are passed, or the grade for each exam is above 5 and the weighted average of these grades is at least 10.
The programme offers a wide variety of pedagogical approaches. Depending on the course, students may be assessed through written assignments, oral presentations, and individual or group projects.
The course format is 30, 45, 60 or 90 Teaching Units (TU), with 1 TU equivalent to a duration of 45 minutes.
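The module validation rule and the Teaching Unit conversion described above can be read as a small calculation. The sketch below is illustrative only: it assumes a pass mark of 10 for a single course and uses ECTS credits as the weights of the average, neither of which is stated explicitly here, and the names `module_validated` and `tu_to_hours` are hypothetical.

```python
from typing import Sequence, Tuple

PASS_MARK = 10.0    # assumption: pass mark for a single course (not stated in the text)
FLOOR_MARK = 5.0    # from the text: each exam grade must be above 5
MIN_AVERAGE = 10.0  # from the text: weighted average of at least 10
TU_MINUTES = 45     # from the text: 1 TU lasts 45 minutes


def module_validated(courses: Sequence[Tuple[float, float]]) -> bool:
    """courses: (grade, weight) pairs; the weight is assumed to be the ECTS count."""
    if not courses:
        return False
    # Route 1: every course in the module is passed.
    if all(grade >= PASS_MARK for grade, _ in courses):
        return True
    # Route 2: compensation, every grade above 5 and weighted average of at least 10.
    total_weight = sum(weight for _, weight in courses)
    average = sum(grade * weight for grade, weight in courses) / total_weight
    return all(grade > FLOOR_MARK for grade, _ in courses) and average >= MIN_AVERAGE


def tu_to_hours(teaching_units: int) -> float:
    """Convert Teaching Units to clock hours."""
    return teaching_units * TU_MINUTES / 60


print(module_validated([(12, 6), (8, 5), (11, 3)]))  # compensation applies
print(tu_to_hours(30))
```

Under these assumptions, a grade of 8 in one course can be compensated by the others, whereas a grade of 5 or below blocks validation regardless of the average.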
Academic Contents
Course offer for Semester 1 (2024-2025 Winter)
-
Details
- Course title: Probability Theory and Mathematical Statistics I
- Number of ECTS: 6
- Course code: MA_DS-31
- Module(s): Module 1 Mathematics for Data Science
- Language: EN
- Mandatory: Yes
-
Course learning outcomes
At the end of the course, a student should be familiar with the basic concepts of probability theory (events, random variables, distributions) and master the tools needed to make calculations (expectations, variances, distributions, etc.). The student should also master the law of large numbers as well as the central limit theorem. In particular, he or she must be able to calculate limits of random variables. Given an integral, the student should also be able to provide a probabilistic approximation of it and assess the probability of error.
-
Description
What is an event, the probability of an event, a random variable?
What are the main notations associated with probability theory?
What is a law or a distribution? The main probability distributions: Bernoulli, binomial, Poisson, uniform, exponential, Cauchy, Gaussian. What random phenomena do they model? What is a density with respect to the Lebesgue measure? What is a density with respect to the counting measure? What is the distribution of a random variable?
What is the expectation (or mean) of a random variable? How can we interpret it? What is the variance of a random variable?
How to calculate the probability that a random variable belongs to a set from its distribution? How to calculate the expectation of a function of a given random variable from its distribution? What is a distribution function? What are its properties? What is the link between densities and distribution functions? What are the quantiles of a real-valued random variable? What are the median and quartiles of a random variable?
The main inequalities: the Markov inequality, the Bienaymé-Tchebychev inequality, the Jensen inequality.
What does it mean that two random variables are independent? What is the density of a pair of independent random variables? How to generalize to n independent random variables? What is a random vector? What are the marginal distributions? How to calculate them?
How to calculate the distributions of random variables (from distribution functions, change of variables)?
What does it mean that a sequence of random variables converges almost surely? In probability? In distribution? What are the connections between these convergence modes?
What is the law of large numbers? What is the Monte Carlo method for calculating integrals?
What is the central limit theorem? How can it be used to evaluate the error in the Monte Carlo method?
-
Assessment
First session
Combined assessment (end-of-course assessment + continuous assessment)
Retake exam
Oral exam
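The last questions of the description above (Monte Carlo integration and its error via the central limit theorem) can be illustrated with a short sketch. It is not taken from the course material: the integrand x^2 on [0, 1], the sample size, the seed and the function name `monte_carlo_integral` are all assumptions chosen for illustration.

```python
import math
import random


def monte_carlo_integral(f, n, seed=0):
    """Estimate the integral of f over [0, 1] with a 95% CLT half-width."""
    rng = random.Random(seed)
    values = [f(rng.random()) for _ in range(n)]
    mean = sum(values) / n  # law of large numbers: converges to the integral
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    # Central limit theorem: the error is roughly N(0, var / n) for large n,
    # so mean +/- 1.96 * sqrt(var / n) is an approximate 95% confidence interval.
    half_width = 1.96 * math.sqrt(var / n)
    return mean, half_width


estimate, half_width = monte_carlo_integral(lambda x: x * x, 100_000)
print(f"estimate = {estimate:.4f} +/- {half_width:.4f} (exact value: 1/3)")
```

The half-width shrinks like 1/sqrt(n), which is why halving the error requires roughly four times as many samples.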
-
Details
- Course title: Optimization and Numerical Probabilities
- Number of ECTS: 5
- Course code: MA_DS-2
- Module(s): Module 1 Mathematics for Data Science
- Language: EN
- Mandatory: Yes
-
Objectives
Functional analysis in finite-dimensional normed vector spaces
A successful student should be able to:
Prove that a function defined on R^d is differentiable and compute its derivative.
Determine the gradient and the Hessian matrix of a twice-differentiable function on R^d.
Prove that a function is convex.
Apply the Hilbert projection theorem.
Optimization in R^d
A successful student should be able to:
Apply necessary and sufficient criteria to solve an unconstrained optimization problem.
In an optimization problem with constraints of the form h(x)=0 and g(x)≤0: define the associated Lagrangian function, conjugate function and dual problem.
State the necessary Karush-Kuhn-Tucker conditions, and the sufficient conditions for convex problems.
Solve basic linear programming, quadratic programming and convex problems.
Know some algorithms for numerically determining the minimizer of a convex function on a convex domain: gradient descent (GD), Newton’s method, projected GD.
Numerical probability
A successful student should be able to:
Know and apply some procedures to simulate a random variable (RV): inverse-transform methods, accept-reject method.
Design a Monte Carlo procedure to compute the expectation of a bounded function of an RV, either directly or by using importance sampling.
Carry out a basic study of simple Markov chains: determine transient/recurrent states, compute probabilities, determine (if applicable) invariant measure(s), apply the ergodic theorem.
Design an MCMC method (Metropolis-Hastings algorithm) to compute the expectation of a bounded function of an RV.
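As an illustration of the inverse-transform method named in the numerical probability objectives above, the sketch below simulates an exponential random variable from a uniform one. The choice of the exponential distribution, the rate, the seed and the name `sample_exponential` are assumptions made for this example, not part of the course description.

```python
import math
import random


def sample_exponential(rate, rng):
    """Inverse-transform sampling for the exponential distribution.

    The distribution function is F(x) = 1 - exp(-rate * x), so
    F^{-1}(u) = -ln(1 - u) / rate, and F^{-1}(U) has distribution F
    when U is uniform on [0, 1).
    """
    u = rng.random()  # in [0, 1), hence 1 - u is in (0, 1] and the log is defined
    return -math.log(1.0 - u) / rate


rng = random.Random(42)
samples = [sample_exponential(2.0, rng) for _ in range(50_000)]
sample_mean = sum(samples) / len(samples)
print(f"sample mean = {sample_mean:.3f} (theoretical mean: 1 / rate = 0.5)")
```

The same pattern works for any distribution whose distribution function can be inverted in closed form; otherwise the accept-reject method is the usual alternative.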
-
Description
1. Elements of functional analysis in finite-dimensional normed vector spaces
Basics of functional analysis
Differential calculus
Convex sets and convex functions
Hilbert projection theorem
2. Optimization in R^d
Unconstrained optimization
Optimization with constraints – convex optimization
Algorithms for optimization
3. Numerical probability
Simulation of random variables
Monte Carlo methods (principle, MCMC, Metropolis algorithm)
-
Assessment
Exam modalities for the first session
Written
Exam modalities for the retake exam
Written
Absence plan
A midterm exam is planned. If students are absent, a second midterm exam could be organized only for those absent students who were enrolled for the first midterm exam.
-
Note
Note / Literature / Bibliography
Functional Analysis, Calculus of Variations and Optimal Control, by Francis Clarke. Springer Science & Business Media, 2013.
Numerical Probability, by Gilles Pagès. Springer Cham, Universitext, 2018.
-
Details
- Course title: Signal processing
- Number of ECTS: 3
- Course code: MA_DS-3
- Module(s): Module 1 Mathematics for Data Science
- Language: EN
- Mandatory: Yes
-
Objectives
Being able to understand and manipulate modern signal processing tools;
Being able to choose between these techniques for efficient data representation in various domains.
-
Description
This course will introduce signal processing tools such as the Fourier transform, time-frequency analysis and wavelet transforms. These tools will then be used for efficient (e.g. sparse) data representation, data compression (e.g. the JPEG 2000 image compression standard), approximation, denoising, …
-
Assessment
Exam modalities for the first session: “combined” assessment
Final grade = (3*MAX(test1, test2) + HW)/4
where test1 and test2 are supervised written tests based on exercises and results seen in class during the semester,
and HW is a final homework based on the practicals.
Exam modalities for the retake exam
a supervised exam, without practical
Absence plan
Absence from one of the two tests: only the other test is taken into account, since the maximum of the two is used;
Justified absence from both supervised written tests: a retake exam, without practical, is required;
Unjustified absence: grade of 0.
-
Note
Note / Literature / Bibliography
* A Wavelet Tour of Signal Processing: The Sparse Way, Stéphane Mallat, Academic Press, 2009
* Fourier Analysis and Applications: Filtering, Numerical Computation, Wavelets, Claude Gasquet, Patrick Witomski, Springer, Texts in Applied Mathematics 30, 442 pp., 1999
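The combined-assessment formula of the Signal processing course above can be checked with a few lines of code. This is only a sketch: the function name `final_grade` and the use of `None` for a missed test are assumptions, and the grading scale is whatever scale the three marks share.

```python
from typing import Optional


def final_grade(test1: Optional[float], test2: Optional[float], hw: float) -> float:
    """Final grade = (3 * max(test1, test2) + HW) / 4.

    None marks a test that was missed; an unjustified absence would instead
    be entered as a grade of 0, as stated in the absence plan.
    """
    taken = [t for t in (test1, test2) if t is not None]
    if not taken:
        # Both supervised tests missed: the retake exam applies instead.
        raise ValueError("no test grade available, retake exam required")
    return (3 * max(taken) + hw) / 4


print(final_grade(12, 15, 14))    # best test is 15
print(final_grade(None, 10, 18))  # one test missed, the other is used
```

Because only the better of the two tests counts, a weak first test does not lower the final grade as long as the second one is stronger.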
-
Details
- Course title: Programming with R and PYTHON
- Number of ECTS: 5
- Course code: MA_DS-4
- Module(s): Module 2 Programming, Data Management and Visualization
- Language: EN
- Mandatory: Yes
-
Objectives
Perform data manipulations in R and Python
Understand the principles of reproducible analyses
Apply descriptive statistics to data
Knowledge of clustering algorithms and regression
Principles of machine learning methods and their implementation
-
Description
The course introduces data science with R and Python and prepares students for more specialized courses. The course introduces basic programming concepts such as data types, and how to load, store and manipulate data sets, first in R using the tidyverse concepts for data manipulation, then in Python. After an introduction to R and RStudio as a development environment, the course covers descriptive statistics, confidence intervals and hypothesis testing, in preparation for basic machine learning using standard libraries. Functional programming is used as the paradigm for data analysis. The last part of the course introduces standard machine learning techniques and algorithms to solve classification, clustering and regression problems. The course is delivered as a series of lectures and practical exercises, familiarizing students with version control systems.
-
Assessment
First session
Written exam (40%)
Project work (30%)
Exercises (30%)
Retake exam
Oral exam
Another project
-
Note
Note / Literature / Bibliography
R for data science (https://r4ds.had.co.nz)
Python for data science (https://wesmckinney.com/book/)
Modern Statistics for Modern Biology (https://web.stanford.edu/class/bios221/)
-
Details
- Course title: NoSQL Databases & Cloud Computing
- Number of ECTS: 5
- Course code: MA_DS-5
- Module(s): Module 2 Programming, Data Management and Visualization
- Language:
- Mandatory: Yes
-
Objectives
On successful completion of this course, students are able to:
explain both the theoretical foundations and the practical application of current NoSQL and cloud-computing architectures;
describe how different concepts concerning the modeling and management of large data collections are implemented on top of these architectures;
develop and evaluate various use-case applications based on the above platforms.
-
Description
The course provides an introduction to both the theoretical foundations and practical applications concerning the broad area of “NoSQL Databases & Cloud Computing”. We specifically focus on current tools and respective application-programming interfaces (APIs) in the context of the Apache Hadoop and Spark ecosystems. The course starts by reviewing the functionality of a classical SQL database system (PostgreSQL) and then moves forward to distributed file systems, including the Google (GFS) and Hadoop (HDFS) distributed file systems, which is followed by a detailed discussion of the MapReduce distributed computing principle with different extensions. We then move on to a number of recent NoSQL engines and key-value stores, including Apache Pig, HBase, Hive, Spark and MongoDB, which provide a variety of options for processing different data formats such as text, CSV, XML and JSON. All of the practical examples discussed during the course will be interactively deployed on top of the Amazon Web Services (AWS) platform and/or the University’s HPC infrastructure.
The course covers the following topics:
Usage of classical data-modeling languages such as E/R diagrams
Data management in SQL using the PostgreSQL open-source DBMS
Distributed file systems (GFS & HDFS), session semantics vs. transaction semantics, CAP theorem
Apache Hadoop: distributed computing principles (MapReduce), replication, fault tolerance, backup tasks, custom combiners and partitioners, local aggregation, linear scalability
Apache Pig: first dataflow language (Pig Latin), translation into MapReduce and optimizations
Apache HBase: distributed key-value store for very large tabular data, columns and column families, indexing and lookups
Apache Hive: SQL-like query language on top of Hadoop, translation into MapReduce
MongoDB: API overview, JSON processing, user-defined functions
Apache Spark: resilient distributed datasets (RDDs) and dataframes, basic overview of streaming and machine-learning extensions
-
Assessment
First session
Practical exercises (group solutions): 50%
Final written or oral exam (individual): 50%
-
Note
Background literature announced at the beginning of each course.
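To make the MapReduce principle named in the description above more concrete, here is a single-process word count written in plain Python. It only mimics the map, shuffle and reduce phases and does not use the Hadoop or Spark APIs; all function names and the sample input are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate the values of one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big clusters", "data pipelines"])
print(counts)
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network; the sketch keeps only the data flow.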
-
Details
- Course title: Data visualization
- Number of ECTS: 3
- Course code: MA_DS-6
- Module(s): Module 2 Programming, Data Management and Visualization
- Language: EN
- Mandatory: Yes
-
Objectives
Know the Tufte principles such as “data to ink ratio”, “show the data”
Explain the principles of good charts – labels, use of area, angles and lengths, etc.
Create publication-ready plots
Know all basic chart types and their use as well as some advanced plots
Format tables and figures for ease of reading
Know about data transformation for display purposes (normalization, log-transformation, standard error vs standard deviation)
Set up interactive data analysis (using shiny or plotly)
-
Description
The course introduces visualization using common software packages for exploratory data analysis and communication of data-driven findings. Data visualization principles such as the data-to-ink ratio and the strengths and weaknesses of different types of data display will be introduced. The course will teach best practices for figures, such as labeling of axes, titles, captions and awareness of red-green color blindness, as well as for tables. The concepts behind the ggplot2, seaborn, plotly and matplotlib packages will be discussed. Interactive visualizations will be introduced and discussed.
-
Assessment
Assignments with a final practical work on an individual data set.
-
Note
Note / Literature / Bibliography
Fundamentals of Data Visualization https://clauswilke.com/dataviz/
-
Details
- Course title: Introduction to Graph Theory
- Number of ECTS: 3
- Course code: MA_DS-8
- Module(s): Module 3 Transversal courses
- Language: EN
- Mandatory: Yes
-
Objectives
On successful completion of the course the students should be able to
understand the relevance of the topics covered in the course for their applications,
master the proofs of the main results of the course,
solve problems using the toolkit developed in the course,
be autonomous in learning in the field of Graph Theory.
-
Description
Through a presentation of selected topics, the course aims to be an introduction to graph theory, its applications and its algorithmic aspects. It is designed as a self-contained course and focused on problems pertaining to Data Science. Possible topics for the course include, but are not limited to
Graphs and digraphs, degree and the degree sequence algorithm
Connectedness, distance, shortest paths and connected components algorithms
Graph matching problems and algorithms
Elements of algebraic graph theory and PageRank algorithm
Graph traversal algorithms
Trees and applications
Minimum spanning tree algorithms
Network flow, min cut – max flow theorem and Ford–Fulkerson algorithm
Centrality and betweenness measures
Cluster analysis
Random Graphs
-
Assessment
First session
Written exam and homework, and possibly an algorithm implementation project during the semester.
Retake exam
Written exam
-
Note
Note / Literature / Bibliography
R. Diestel, Graph Theory, Springer, 2017
D. Jungnickel, Graphs, Networks and Algorithms, Springer, 2017
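Two of the topics listed in the description above, graph traversal and connected components, can be sketched with breadth-first search on an unweighted, undirected graph. The adjacency-list representation, the function names and the small example graph are assumptions made for illustration and are not taken from the course.

```python
from collections import deque
from typing import Dict, Hashable, List, Set


def bfs_distances(graph: Dict[Hashable, List[Hashable]], source: Hashable) -> Dict[Hashable, int]:
    """Breadth-first search: number of edges on a shortest path from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:  # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def connected_components(graph: Dict[Hashable, List[Hashable]]) -> List[Set[Hashable]]:
    """Group the vertices of an undirected graph into connected components."""
    seen: Set[Hashable] = set()
    components: List[Set[Hashable]] = []
    for node in graph:
        if node not in seen:
            component = set(bfs_distances(graph, node))
            seen |= component
            components.append(component)
    return components


graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"], "e": ["f"], "f": ["e"]}
print(bfs_distances(graph, "a"))
print(connected_components(graph))
```

Each vertex and edge is visited a bounded number of times, so both functions run in time linear in the size of the graph.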
-
Details
- Course title: Applied Philosophy of Science and Data Ethics
- Number of ECTS: 3
- Course code: MA_DS-7
- Module(s): Module 3 Transversal courses
- Language: EN
- Mandatory: Yes
-
Course learning outcomes
Get familiar with the scientific goals and methods.
Learn the most common data science and visualization misconduct problems.
Critically evaluate ethical issues and method choices.
Part 1: Scientific goals, methods and knowledge
Scientific Goals
Methodology
Scientific Knowledge
What is Philosophy of Science?
The scientific method
Methodology
Part 2: Scientific Inference
Scientific inferences.
Deduction and Induction
Hume’s problem of induction. Ref. https://stanford.library.sydney.edu.au/entries/induction-problem/
The Hypothetico-deductive Method.
Falsification.
Confirmation.
Scientific Explanation
Historical cases: Neptune and Vulcan; the Michelson & Morley experiment; Covid-19 ML issues: can X-ray images be used to diagnose Covid?
Part 3: Empirical practices and models
What is an experiment?
Observational studies
Field, laboratory and simulation experiments
Observability, Indicators and Evidence
Testing methods.
How to evaluate experiment success.
Repetition, Reproduction and Replication
What is a model? Models as analogies; as isolations; as mirrors
Differences between Models, Theories and Experiments.
The problems with Machine Learning. Ref. Chapter “A Blueprint of Reality” from The Book of Why.
Part 4: Experimental Control and Statistical Abuse
Experimental Control
Bias and Confounders. Ref. Chapter “Confounding and Deconfounding” from The Book of Why. Ref. Judgment under Uncertainty (Amos Tversky and Daniel Kahneman). Ref. John Snow and cholera. Ref. https://catalogofbias.org/
Imbalanced datasets; surrogates, proxies…
“Data alone is not enough”.
Randomised Control Trials
Origins of RCTs
Validity
Randomisation
Cross-validation in ML
“Visualizations can lie”. Ref. How Charts Lie by Alberto Cairo.
Part 5: Ethics and responsibility
Morality and ethics.
Data ethics
Ethical Frameworks: Consequentialism, deontology and virtue.
Informed consent and its limitations, how does it affect data scientists?
Data ownership; data destruction
Undesired consequences. – Case studies
Privacy, anonymity, de-identification and re-identification
GDPR
Data ethics, reproducibility and FAIR data. Ref. Chapter 3, “Data Ethics”, of Deep Learning for Coders with Fastai and PyTorch. Ref. https://www.fast.ai/2020/08/19/data-ethics/ and https://ethics.fast.ai/ from Jeremy Howard and Rachel Thomas. Ref. Ethics and Data Science, O’Reilly.
Dilution of responsibility.
How progress and technology raise new ethical questions.
-
Description
The aim of this course is to provide students with guidelines and methodologies to identify epistemic and ethical issues present in data science. We expect the students to develop a critical eye that helps them mitigate such problems in their daily work as data scientists.
During this course, students will learn by example different layers of the scientific method and how they relate to data science and data ethics. In particular, they will learn how the mechanisms behind the data affect the data analysis, and how the different types of scientific inference condition the application of data science solutions and conclusions to other contexts. In this sense, examples of statistical abuse, misconduct and bad visualization will be shown together with their, sometimes catastrophic, collateral consequences.
Plagiarism
Quizzes, assignments and any other pieces of work produced during the course by the students should be written in their own words. Any attempt to copy from internet sources or between students may be detected, and the corresponding disciplinary procedure will be opened. I encourage students to read the document on academic misconduct from UL for further details. Note that the potential sanctions for plagiarism may include:
The cancellation of all grades obtained in examinations for the module or the entire examination session of the respective semester;
A ban for up to five years on taking any examinations leading to the award of a degree, diploma or certificate by the University;
The retroactive withdrawal of the degree, diploma or certificate awarded by the University.
-
Assessment
First session
Students’ evaluations will be based on their performance in different types of individual and group exercises as well as test exams during the course weeks.
Written essays/reports regarding the course content.
Open discussion, test exercises and quiz tests;
Depending on time, there may be presentation exercises (slides exposition).
The final exam will be on paper, without the help of a computer or notes. It will consist of single/multiple-choice questions as well as essay questions. The student is thereby responsible for producing intelligible answers employing readable handwriting and correct English writing. Importantly, the final exam may be more challenging than the individual quizzes and assignments as it mixes chapters and does not allow any help from notes. Therefore, it is strongly recommended to study in-depth the content of the course (slides, questions, extra material, etc.) before the exam.
Only those students with a final grade less than 50% (from the quizzes and assignments of the course) must attend the final exam. This grade can be checked in the grade book in Moodle. If in doubt, do not hesitate to contact the teacher by e-mail.
Retake exam
Further retake exams will follow the same conditions as the first final exam. The questions of the exam are subject to change for each retake exam.
Course offer for Semester 2 (2024-2025 Summer)
-
Details
- Course title: Fundamentals of Statistical Learning
- Number of ECTS: 5
- Course code: MA_DS-9
- Module(s): Module 4 Mathematics for Statistical Learning
- Language: EN
- Mandatory: Yes
-
Objectives
-
Course learning outcomes
A successful student should be able to
Understand the purpose of statistical learning
Be familiar with the probabilistic tools that make it possible to assess the risk and the excess risk of a predictor.
Be able to evaluate the Rademacher complexity of some sets of predictors
Be able to calibrate hyperparameters and benchmark learning methods.
-
Description
Programme
Empirical risk minimization. Bounding the prediction error. Concentration inequalities.
Rademacher complexity. Vapnik-Chervonenkis classes. Gaussian mean-width.
Case of supervised binary classification.
Prediction in bounded regression.
Regularization approaches (cross-validation, unbiased risk estimation).
Nonparametric statistics and minimax rates.
Learning methods: plug-in, penalised ERM, kernel methods, Lasso, perceptron, gradient descent, boosting, deep learning.
Unsupervised learning: density estimation, principal component analysis, clustering.
-
Assessment
The final grade will be based on class participation, the presentation of a research paper and a written exam.
-
Note
Bibliography
Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, Second Edition, pdf available for download.
Boucheron, Bousquet, Lugosi, Theory of classification: a survey of some recent advances, ESAIM Probab. Stat. pdf available for download.
Tsybakov, Introduction to Nonparametric Estimation, Springer
Wainwright, High-Dimensional Statistics: A Non-Asymptotic Viewpoint, Cambridge University Press, 2019. https://people.eecs.berkeley.edu/~wainwrig/
-
Details
- Course title: Mathematical Statistics II
- Number of ECTS: 5
- Course code: MA_DS-35
- Module(s): Module 4 Mathematics for Statistical Learning
- Language: EN
- Mandatory: Yes
-
Course learning outcomes
In parametric models, a successful student should be able to establish the main properties (consistency, asymptotic normality, etc.) of the most classical estimators, and to provide (possibly asymptotic) confidence regions and tests between two hypotheses.
-
Description
The main problems of mathematical statistics: estimation, testing, the notion of risk. Construction of confidence intervals and tests. False discovery rate. The empirical measure and its applications: the moment method, estimation by empirical quantiles, the Kolmogorov-Smirnov test. The maximum likelihood estimator. Exponential families in one dimension. Introduction to the Bayes paradigm and elements of decision theory.
-
Assessment
First session
A partial written exam and a final written exam.
Retake exam
Oral exam
-
Note
Literature
For probability theory: Real Analysis and Probability, R.M. Dudley
For statistics: Mathematical Statistics, P. Bickel and K. Doksum
-
Details
- Course title: High Dimensional Statistics
- Number of ECTS: 5
- Course code: MA_DS-11
- Module(s): Module 4 Mathematics for Statistical Learning
- Language: EN
- Mandatory: Yes
-
Objectives
The goal of the course is a detailed introduction to modern techniques in high dimensional estimation of linear models and covariance matrices. After successful completion the student should be able to establish the main properties of Lasso and related estimators in the setting of linear regression models, asymptotic results for large random matrices and non-asymptotic properties for high dimensional estimators of covariance matrices.
-
Description
High dimensional regression model
Parameter estimation under constraints: Lasso and related estimation procedures
Variable selection
Introduction to principal component analysis
Basic elements of random matrix theory: semi-circle law, Marcenko-Pastur distribution, Tracy-Widom distribution
Estimation of high dimensional covariance and precision matrices; hard and soft thresholding
-
Assessment
There will be a written exam at the end of the course.
-
Note
Bibliography
P. Bühlmann and S. van de Geer, “Statistics for High-Dimensional Data”
C. Giraud, “Introduction to High-Dimensional Statistics”
G.W. Anderson, A. Guionnet and O. Zeitouni, “An Introduction to Random Matrices”
-
Details
- Course title: Big Data Analytics
- Number of ECTS: 5
- Course code: MICS2-41
- Module(s): Module 5 Big Data Analytics
- Language: EN
- Mandatory: Yes
-
Objectives
The lecture provides an entry point to large-scale data management and distributed computing principles in recent NoSQL architectures. We start with an overview of distributed file systems and MapReduce in Apache Hadoop and then move on to more advanced analytical tasks based on the machine-learning libraries in Apache Spark. The lecture serves as an ideal basis for further topics in this area (such as Master seminars, projects and theses).
-
Course learning outcomes
- Students become familiar with the usage of recent Big Data platforms such as Apache Hadoop and Spark
- Students obtain an overview of both the theoretical foundations and practical applications of various Big Data and Machine Learning algorithms
- Students learn how to approach and solve different data-analysis tasks through a number of programming exercises with real-world datasets
-
Description
The course consists of a combination of theory-oriented lectures and practical exercises, through which the students are guided by a series of real-world use cases and hands-on examples. Specifically, we focus on the following topics:
- Distributed File Systems (DFS) and MapReduce in Apache Hadoop
- Resilient Distributed Dataset (RDD) objects and DataFrames in Apache Spark
- Implementation of complex DataFlow programs in Spark using Scala
- Performing advanced analytical tasks in Spark’s MLlib:
  o Distributed clustering and classification of objects
  o Decision trees and random forests
  o Recommender systems via matrix factorization
  o Text analysis via latent semantic indexing
  o Geospatial data analysis
  o Social-network analysis
-
Assessment
Practical exercises: 50%
Final written exam: 50%
-
Details
- Course title: Introduction to Machine Learning Methods and Data Mining
- Number of ECTS: 5
- Course code: MA_DS-13
- Module(s): Module 6 Introduction to Machine Learning Methods and Data Mining
- Language: EN
- Mandatory: Yes
-
Objectives
After successfully finishing the course, a student will become familiar with the basics of supervised and unsupervised ML methods, understand their theoretical background, advantages, limitations, and get practical experience in solving real problems employing ML techniques. Particular attention will be focused on the interdisciplinary aspect of ML applications and advantages, which provides an understanding of the nature of the data. Within the course, the students will learn how to implement the basic elements of ML models by themselves, as well as how to use state-of-the-art ML software packages.
-
Description
The main chapters are
Preprocessing of collected data, understanding their structure, visualization. (1 hour)
Introduction to Scikit-Learn and TensorFlow. (7 hours)
Unsupervised methods: clustering, nearest neighbor task, association rules mining; rule- and tree-based classifications. (12 hours)
(Kernel) ridge regression. (4 hours)
Support vector machines. (4 hours)
Artificial neural networks. (12 hours)
Advanced topics: model evaluation and selection, anomaly detection, conformal learning (prediction with guarantees of accuracy), causal inference (identification of causal relationships). (4 hours)
Combining different machine learning methods for solving actual problems in natural sciences. (4 hours)
Presentation of personal projects. (8 hours)
The course will be split into a series of lectures, each followed by practical exercises. The ideal schedule will be one day per week in a computer class, where the lecture is directly followed by practical exercises. At the end of the course each student will have to present their individual project.
-
Assessment
The evaluation will be based on attendance at the lectures and practical exercises (25%), the individual project (50%), and the answers to the questions following the presentation (25%).
-
Note
Bibliography
Géron, Aurélien. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O’Reilly Media, 2019.
Andriy Burkov. The Hundred-Page Machine Learning Book, 2019
-
Details
- Course title: Introduction to Deep Learning
- Number of ECTS: 5
- Course code: MICS2-66
- Module(s): Module 7 Optional courses
- Language: EN
- Mandatory: No
-
Objectives
This course provides students with a high-quality and informed understanding of Deep Learning (DL) models for developing AI-based applications. The course promotes problem solving via design thinking philosophy. The course helps students to develop state-of-the-art competencies to solve many different real-world problems using DL. The course will provide students with a competitive advantage to solve challenging research problems that require dealing with complex search spaces, non-linear relationships within the data, and flexible models that can scale up to thousands or millions of observations. These competencies are also of key importance to many industrial and technological companies. Finally, the course also promotes development of soft skills such as written and verbal communication skills, via final project presentations.
-
Course learning outcomes
* Knowledge of foundational building blocks in DL.
* Understanding of DL models and their applications.
* Ability to formulate and solve problems using DL.
-
Description
1. Introduction
2. Discriminative modeling I: Classification
3. Discriminative modeling II: Regression
4. Unsupervised and self-supervised learning
5. Sequence learning
6. Generative modeling
7. Reinforcement learning
8. Deployment and best practices
9. Project presentations
-
Assessment
Continuous evaluation (written assignments) and project-based learning (students must deliver a final project and defend it in an oral presentation).
-
Note
Recommended books:
• F. Chollet. Deep Learning with Python, 2nd ed. 2021.
• I. Goodfellow, Y. Bengio, A. Courville. Deep Learning, 2016.
-
Details
- Course title: Statistical Modelling
- Number of ECTS: 5
- Course code: MA_DS-15
- Module(s): Module 7 Optional courses
- Language: EN
- Mandatory: No
-
Objectives
The main goal of the course is to raise awareness of model assumptions and critical thinking, and to provide a guide as to how to make reasonable modelling choices.
-
Description
This course will deal with two cultures of statistical modelling: top-down via a parametric model, and bottom-up via data-adaptive methods. More concretely, it will cover classical distributions and their limitations, flexible distributions that can model complex modern datasets, interpretable machine learning, and the difference between the two modelling cultures. -
Assessment
First session
Written exam on both theory and exercises for 75% of the mark, and midterm exam for 25% of the mark.
Retake exam
Written exam on both theory and exercises. The student can choose whether they wish to keep their midterm exam mark or not.
-
Note
Literature
Ley, C., Babic, S. and Craens, D. (2021) Flexible models for complex data with applications. Annual Review of Statistics and Its Application 8, 18.1-18.23.
Genuer, R. and Poggi, J.-M. (2020) Random Forests with R, Springer.
Kleinbaum, D.G. and Klein, M. (2012) Survival Analysis – A Self-learning Text, Springer.
-
Details
- Course title: Introduction to Biology for Data Scientists
- Number of ECTS: 5
- Course code: MA_DS-14
- Module(s): Module 7 Optional courses
- Language: EN
- Mandatory: No
-
Objectives
- structure and function of the cell
- basics in biochemistry
- basics of genetics
- basics of evolution
- introduction to plant biology
- introduction to animal biology
- introduction to ecology
-
Description
We will follow the textbook Campbell Biology. The course includes fundamental principles of biochemistry, genetics, molecular biology and cell biology. -
Assessment
Written exam
Course offer for Semestre 3 (2024-2025 Winter)
-
Details
- Course title: Workshop I – "Environmental Data Analytics"
- Number of ECTS: 5
- Course code: MA_DS-24
- Module(s): Module 9 Workshops
- Language: EN
- Mandatory: Yes
-
Objectives
i. Equip students with the skills to acquire, analyze, visualize, and interpret environmental and climate data effectively.
ii. Introduce a variety of data analytics techniques specific to environmental science, including time series analysis, geospatial data analytics, and multivariate statistics.
iii. Foster the application of machine learning and advanced computational models to address environmental issues.
iv. Develop students’ abilities to integrate and apply theoretical and practical knowledge to real-world environmental challenges.
-
Course learning outcomes
i. Master techniques for effective analysis of time series, geospatial, and aerial imagery data.
ii. Transform complex environmental data into practical insights for policy-making and sustainability efforts.
iii. Acquire hands-on experience through project work, applying learned techniques to actual environmental data scenarios.
-
Description
This interdisciplinary course offers a comprehensive exploration of how data science and machine learning techniques are applied to address environmental and climate challenges. Students will delve into both theoretical concepts and practical applications through engaging lectures and hands-on exercises. The course is designed to familiarize students with the types of data commonly used in environmental and climate science. This includes time series and geospatial data from environmental monitoring systems or simulations, as well as aerial imagery from drones and satellites. Students will learn the methodologies for acquiring, accessing, and analyzing these data types. They will also explore the insights typically sought from environmental data and understand the primary objectives of environmental data analytics. The curriculum provides an overview of essential concepts and algorithms in data analytics and machine learning that are commonly used in environmental data analytics. It covers a spectrum of topics from basic multivariate time series analysis to advanced techniques such as deep neural networks and physics-informed neural networks. Throughout the course, students will learn how these algorithms can be effectively implemented to answer prevalent questions in the field of environmental data analytics, preparing them to contribute to solutions for real-world environmental problems. -
Assessment
Exam modalities for the first session
The assessment for this course will follow a combined assessment approach, consisting of continuous assessment through project work during the term and one final exam at the end of the course.
Continuous assessment: This will be based on project work done throughout the term.
Final exam: The final exam will be held at the end of the course, covering all the course material.
Exam modalities for the retake exam
If a student misses the final exam, they must provide a valid justification in line with the university’s policies. If accepted, the student will be allowed to take a retake exam. The retake exam will be of the same nature as the first session’s final exam. However, the continuous assessment component (project work) will remain unchanged and will not be part of the retake.
Absence plan
As mentioned, if a student misses the final exam with a valid justification, they will be eligible for the retake exam. For project work, late submissions are allowed up to 5 days with a 5% deduction for each day of delay. Beyond 5 days, no points will be awarded for the project. There will be no retake or extensions for missed project deadlines. -
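The late-submission rule above can be sketched as a small calculation. This is only an illustration under stated assumptions: the function name `late_adjusted_score` is hypothetical, and the sketch assumes that the 5% deduction per day is taken from the score the project would otherwise have received. The course text does not say whether the deduction applies to the awarded score or to the maximum score, so students should confirm the exact rule with the instructor.

```python
def late_adjusted_score(raw_score: float, days_late: int) -> float:
    """Apply the late-submission rule to a project score.

    Assumption (not stated in the course text): the 5% per day is
    deducted from the score the project would otherwise receive.
    """
    if days_late < 0:
        raise ValueError("days_late must be zero or positive")
    if days_late > 5:
        # Beyond 5 days, no points are awarded for the project.
        return 0.0
    # 5% deduction for each day of delay, up to and including day 5.
    return raw_score * (100 - 5 * days_late) / 100


# Example: a project worth 80 points, handed in 2 days late.
print(late_adjusted_score(80, 2))
```

Under this reading, a submission that is 5 days late keeps 75% of its score, and anything later receives nothing.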
Note
Note / Literature / Bibliography
This course utilizes a comprehensive set of detailed presentations that are curated from a wide range of sources to form the primary material. These presentations are designed to offer an in-depth look at key concepts and techniques in environmental data analytics. As the main instructional content, students are expected to thoroughly review these presentations as part of their regular study and preparation for class discussions and assessments.
For additional reading and a broader perspective on the topics covered, the course also recommends the following textbooks. These texts provide supplemental knowledge and insights that can further enrich the learning experience:
Introduction to Environmental Data Science by Author(s) (2023). This text from Cambridge University Press provides a contemporary overview of environmental data science, offering foundational knowledge and new developments in the field.
Environmental Data Analysis with MATLAB or Python: Principles, Applications, and Prospects by Menke, W. (2022). Published by Academic Press, this book delves into practical applications of environmental data analysis, focusing on how MATLAB and Python can be utilized to solve real-world environmental problems.
-
Details
- Course title: Workshop II "Practical Data Science for the Public Sector: Reproducible Pipelines and Time Series Forecasting"
- Number of ECTS: 5
- Course code: MA_DS-25
- Module(s): Module 9 Workshops
- Language: EN
- Mandatory: Yes
-
Description
Part 1/2 – 20 hours, Vasja Sivec
1. Introduction to Forecasting Time Series (1 hour = 1 lecture)
   a. Introduction
2. Traditional Models for Forecasting Time Series (10 hours = 5 lectures + 5 exercises)
   a. Basic time series concepts – stationarity and integrated processes
   b. Forecasting univariate time series – ARIMA
   c. Forecasting multivariate time series – VAR
   d. Forecasting data with seasonal patterns – SARIMA and ETS
      i. Simple example with code
      ii. Theory
      iii. Exercise(s)
3. Neural Networks for Forecasting Time Series (6 hours = 3 lectures + 3 exercises)
   a. Introduction
   b. NN representation
   c. Estimation: feedforward and backpropagation
   d. Forecasting
      i. Simple example with code
      ii. Theory
      iii. Exercise(s)
4. Latest Findings and Advanced Models (4 hours = 4 lectures)
   a. Latest findings (traditional vs. machine learning approach)
   b. Advanced models (ARCH, Markov switching, MIDAS, time-varying parameter models, etc.; if time permits)
Part 2/2 – 20 hours, Bruno Rodrigues
5. Introduction to R and functional programming (4 hours)
   a. Pure functions
   b. Higher-order functions
6. Git and Github (3 hours)
   a. Intro to Git and Github
   b. Cloning repos
   c. Collaborating
   d. Branching
   e. Pull requests
7. Package development and unit testing (3 hours)
   a. Adding functions
   b. Documenting functions using roxygen
   c. Unit testing package
   d. Test coverage
8. Build automation and data products (6 hours)
   a. Build automation essentials
   b. Recording package versions
   c. Building a pipeline
   d. Literate programming with Quarto
   e. Interactive web apps using Shiny
9. Self-contained pipelines with Docker and Github Actions (4 hours)
   a. Docker essentials
   b. Building a fully reproducible pipeline
   c. Short introduction to Github Actions
-
Assessment
Exam modalities for the first session
Continuous assessment
Exam modalities for the retake exam
/
Absence plan
Not needed due to continuous assessment -
Note
Main textbooks (students are not required to purchase these textbooks)
Part 1
Ghysels, E. and Marcellino, M., 2018. Applied economic forecasting using time series methods. Oxford University Press.
Lütkepohl, H., 2013. Introduction to multiple time series analysis. Springer Science & Business Media.
Part 2
Textbook for Part 2 is available here: https://rap4mads.eu/
Programming environment for exercises: Python or R, depending on what the students already know best.
-
Details
- Course title: Computational Methods
- Number of ECTS: 4
- Course code: MCMP-21
- Module(s): Module 8 Advanced courses
- Language: EN
- Mandatory: No
-
Objectives
The main idea of the course is to provide knowledge and practical experience of the numerical techniques that constitute the basis of Computational Physics and Chemistry. Some emphasis will be put on the analysis of the outcome of the variation of physical parameters and numerical hyperparameters for the given problem.
-
Course learning outcomes
• For a given physical problem, students will be capable of writing a program to find its numerical solution and analyze the resulting data.
• Students will have an overview of standard numerical algorithms adopted in computational Physics and Chemistry as well as their limitations.
-
Description
The main idea of the course is to provide knowledge and practical experience of the numerical techniques that constitute the basis of Computational Physics and Chemistry. Some emphasis will be put on the analysis of the outcome of the variation of physical parameters and numerical hyperparameters for the given problem.
The first part of the course will consolidate the basics of Python 3 programming and cover the basic algorithms necessary to solve simple equations. The second part will introduce more advanced methods with applications to physical problems. Specifically, we will treat:
• Introduction to Python and relevant packages
• Numerical differentiation and integration
• Linear algebra solvers
• Root finding and minimization
• Ordinary differential equations
• Partial differential equations
• Monte Carlo methods
• Molecular dynamics
• Basics of machine learning
Each lecture will consist of an introduction to the theory behind a given technique, followed by a practical session centered on its implementation and application to well-known problems.
-
Assessment
Task 1: Weekly homework assignments.
Task 2: Final project.
Task 3: Written exam.
Assessment rules: Students hand in homework individually. No equipment is required or allowed for written exams.
Assessment criteria: Weights for the final grade: Task 1: 20%, Task 2: 50%, Written exam: 30%. Each task is graded out of 20.
The final mark is calculated as a weighted average according to the above-mentioned weights.
-
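As an illustration of the weighted average described in the assessment criteria, the following minimal sketch computes a final mark out of 20 from the three task marks. The names `WEIGHTS` and `final_mark`, as well as the example marks, are hypothetical and are not part of the official course description; any rounding rules applied by the course are not stated and are therefore not modelled here.

```python
# Weights taken from the assessment criteria; the dictionary keys are
# illustrative labels, not official task names.
WEIGHTS = {"homework": 0.20, "project": 0.50, "exam": 0.30}


def final_mark(marks: dict[str, float]) -> float:
    """Return the weighted average of the task marks (each out of 20)."""
    for task, mark in marks.items():
        if task not in WEIGHTS:
            raise KeyError(f"unknown task: {task}")
        if not 0 <= mark <= 20:
            raise ValueError(f"mark for {task} must be between 0 and 20")
    # Every task must be present, otherwise the weights would not sum to 1.
    return sum(WEIGHTS[task] * marks[task] for task in WEIGHTS)


# Example with made-up marks: 15 for homework, 12 for the project, 10 for the exam.
print(final_mark({"homework": 15, "project": 12, "exam": 10}))
```

Because the project carries half of the weight, a change of one point in the project mark moves the final mark by half a point, compared with 0.2 points for homework and 0.3 points for the written exam.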
Note
The most recommended textbooks
• Numerical Methods in Physics with Python, Alex Gezerlis (Cambridge University Press, 2020)
• Numerical Analysis 9th ed., Richard L. Burden and J. Douglas Faires (Brooks/Cole, 2011)
• Computational Methods for Physicists, Simon Širca and Martin Horvat (Springer, 2012)
Other textbooks of note
• Numerical Methods for Scientists and Engineers, Richard W. Hamming (Dover Publications)
• Numerical Recipes series, William H. Press, Saul A. Teukolsky, William T. Vetterling and Brian P. Flannery (Cambridge University Press)
• Numerical Methods, E. A. Volkov (Hemisphere Publishing Corporation)
General resources on Python
• https://www.codecademy.com/learn/learn-python-3
• https://www.learnpython.org/en/
• https://lectures.scientific-python.org
-
Details
- Course title: Programming Machine Learning Algorithms for HPC
- Number of ECTS: 5
- Course code: F1_MA_HPC_1-9
- Module(s): Module 8 Advanced courses
- Language: EN
- Mandatory: No
-
Objectives
The main objective of the course is to equip participants with an understanding of machine learning (ML) computational challenges and opportunities. The focus is on practical implementation tailored for high-performance computing environments. The course primarily utilizes the Python programming language, offering participants a versatile and hands-on learning experience.
-
Course learning outcomes
- Gain comprehensive theoretical knowledge of machine learning challenges and opportunities.
- Develop practical coding skills by implementing machine learning algorithms from scratch in Python.
- Learn how to adapt machine learning algorithms for efficient execution in HPC environments, focusing on parallelization, distribution, and leveraging the computational capabilities of modern hardware (including GPUs).
- Employ advanced optimization techniques to enhance computational efficiency.
- Acquire skills in gaining insight into code performance: profiling and optimizing machine learning code, identifying and addressing bottlenecks.
- Be able to discuss the pros and cons of different computational methods for deploying ML models on large datasets.
-
Description
The course is designed for individuals seeking a comprehensive understanding of machine learning (ML) computational challenges and opportunities. We place specific emphasis on their implementation and optimization for high-performance computing environments. The course is conducted mainly in the Python programming language, providing a practical and versatile platform for participants. -
Assessment
The evaluation is based 50% on exercises and 50% on an exam on the lectures. There is no final exam. -
Note
Student’s PC required
-
Details
- Course title: Functional Analysis
- Number of ECTS: 6
- Course code: F1_MA_MAT_MMCS2-2
- Module(s): Module 8 Advanced courses
- Language: EN
- Mandatory: No
-
Objectives
The aim of the course is to provide students with efficient tools to study linear operators between infinite-dimensional vector spaces. In particular, this class will deal with normed vector spaces, Banach and Hilbert spaces, with bounded linear operators on normed vector spaces, fundamental principles such as Fourier analysis, Lebesgue integral, Hahn-Banach Theorem, Uniform Boundedness Principle or Closed Graph Theorem, and with spectral theory of compact (self-adjoint) linear operators.
-
Course learning outcomes
Students will acquire a solid understanding of functional analysis, its fundamental results and basic techniques. In particular, students will understand applications to measure theory, Fourier theory, and the spectral theorem for (unbounded) operators. Students will know the relevance of a theorem, its underlying motivation and a precise idea of its proof. Hopefully, students will demonstrate capacity for mathematical reasoning through analyzing, proving and explaining concepts from functional analysis. -
Description
Functional analysis aims to study infinite-dimensional spaces of functions and features of linear operators on these spaces. Though, from the point of view of Mathematics, functional analysis has its own interest, it plays a crucial role in many related areas, in particular in Physics, Engineering or Finance. Roughly speaking, most “real-life problems” involve non-linear partial differential equations with infinite-dimensional solution spaces of functions (or distributions). This course provides Master students with the basic tools and the fundamental results to develop skills to solve such problems. -
Assessment
Exam modality for the first session
Written test on week of October 28, duration 1 hour, 30% of the final grade,
Final written exam (scheduled by the administration), 60% of the final grade,
Written homework (problems to be determined), 10% of the final grade.
Exam modality for the retake exam
Retake exam (scheduled by the administration) will be in written form.
Absence
A make-up exam will be possible only for an unavoidable reason supported by proof. -
Note
Note / Literature / Bibliography
Introductory Functional Analysis with Applications, by Erwin Kreyszig.
Functional Analysis, Spectral Theory and Applications, by Manfred Einsiedler and Thomas Ward.
An Introduction to Fourier Analysis, by Russell L. Herman.
-
Details
- Course title: Introduction to Imaging AI with Applications in Medical Imaging
- Number of ECTS: 5
- Course code: MA_DS-19
- Module(s): Module 8 Advanced courses
- Language: EN
- Mandatory: No
-
Course learning outcomes
Knowing important image formats and their specific constraints
Knowing and being able to apply basic image processing operations
Being aware of the problems that can arise from different spatial alignments and resolution issues
Having an overview of state-of-the-art deep learning network types and their principal application domains
Hands-on experience in employing feed-forward networks for image classification and segmentation
Hands-on experience in training deep neural networks utilizing a high-performance computing environment -
Description
Exam modalities for the first session
Combined assessment of two parts:
Part a) Practical homework project in teams of 2, with oral presentation of the solution.
Final grade: 50% Part a) + 50% Part b); both parts have to be graded with at least “passed”. That is, we will combine a written final exam on the theory with a practical hands-on project, for which I will give an assignment that the students can work on at home before presenting their solution.
Exam modalities for the retake exam
Students who passed one part (a or b) can choose to retake only the failed/missed part or both parts.
Absence plan
Both parts have to be graded with at least “passed”; missing one part requires a retake of that part.
-
Assessment
Final grade: 50% Part a) + 50% Part b); both parts have to be graded with at least “passed”. That is, we will combine a written final exam on the theory with a practical hands-on project, for which I will give an assignment that the students can work on at home before presenting their solution. -
Note
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
Sonka, M., & Fitzpatrick, J. M. (2000). Handbook of medical imaging. Volume 2, Medical image processing and analysis. University of Iowa.
Coursera – Course “AI for Medical Diagnosis”
-
Details
- Course title: Bioinformatics and Network Analysis in Life Sciences
- Number of ECTS: 5
- Course code: MA_DS-20
- Module(s): Module 8 Advanced courses
- Language: EN
- Mandatory: No
-
Objectives
Understanding data structure of biological Omics-Data
DNA analysis
Single-cell RNA sequencing analysis
Heterogeneous data integration -
Course learning outcomes
Blast approaches for sequence alignments.
Single-cell RNA sequencing workflow
Biological network analyses -
Description
The course will give an overview of current challenges in biomedical data management and analysis and introduce bioinformatic approaches and tools to analyze biomedical data. A particular focus will be on sequence alignment for DNA and RNA analyses and on single-cell sequencing approaches to biomedical data including the subsequent representation in networks. -
Assessment
First session
tba
Retake exam
tba
-
Note
Literature
E. Klipp: Systems Biology
U. Alon: Introduction to Systems Biology
A. Lesk: Introduction to Bioinformatics
S. Strogatz: Non-linear Dynamics and Chaos
-
Details
- Course title: Analysis of Complex Networks
- Number of ECTS: 5
- Course code: MA_DS-21
- Module(s): Module 8 Advanced courses
- Language: EN
- Mandatory: No
-
Objectives
The course aims to provide the students an overview of (1) important concepts, theories and computational methods in network science, and (2) recent developments in machine learning for graph data.
-
Course learning outcomes
After the course, the student should have developed an intuition for how to model systems as networks and how to perform reasoning and analysis on real-world network data. The student should achieve a good understanding of complex networks (i.e., network metrics, structural properties, types of networks, and network models). The student should have a good overview of different community detection algorithms. The student should be able to describe different types of epidemic spreading over complex networks and establish a formal background for information diffusion and influence maximization in online social networks. Ideally, by the end of the course the student also has a basic understanding of machine learning on graphs. -
Description
Networks are a fundamental concept for modelling complex physical, technological, social, and biological systems. The course will cover the fundamental aspects of networks: network models, methods for describing network structure and measuring networks, community detection, and information diffusion in complex networks. More advanced topics, such as network embedding and graph neural networks (GNNs) and their applications, will also be introduced and discussed. In this course, students will learn how to use computational algorithms and machine learning techniques to gain insights into real-world networks. -
Assessment
Exam modalities for the first session
Continuous assessment (course projects)
Exam modalities for the retake exam
Oral exam
Absence plan
Based on the percentage of the project completed -
Note
Book “Networks: An Introduction”, by Mark Newman, 2010
Book “Network Science”, by Albert-László Barabási, 2016.
Book “Graph Representation Learning”, by William L. Hamilton, 2020.
-
Details
- Course title: Parallel and Grid Computing
- Number of ECTS: 4
- Course code: MICS-COMMSYST-024
- Module(s): Module 8 Advanced courses
- Language: EN
- Mandatory: No
-
Objectives
The frequency of a core in modern processors has been stagnating for more than a decade. Hence, it is unavoidable to use multiple cores and computers in parallel to accelerate a computation. In this course, we give an overview of three main models of parallelism (shared memory, message passing and partitioned global address space) and their implementations (std::thread/OpenMP, MPI, Chapel). We start with the C++ standard library for multithreading (std::thread) to study shared memory parallelism, the most flexible paradigm but also the most error-prone. We move to a safer model based on data parallelism using OpenMP for computations that are embarrassingly parallel. We then deepen our study of shared memory parallelism with the concept of memory consistency and work-stealing algorithms for task parallelism. Afterwards, we focus on grid computing with the message passing model and the popular library MPI. We conclude the course with an introduction to the partitioned global address space and the Chapel programming language.
-
Course learning outcomes
A comprehensive understanding of the three main models of parallelism: shared memory, message passing and partitioned global address space (PGAS). Ability to program with the C++ libraries std::thread, OpenMP and MPI, and with the emerging Chapel programming language. -
Description
1. C++ programming language (C++11 and above)
2. Shared memory concurrency (std::thread)
3. Data parallelism (OpenMP, Map/Reduce)
4. Task parallelism (work-stealing algorithm)
5. Memory consistency
6. Message passing concurrency (MPI)
7. Partitioned global address space (Chapel)
-
Assessment
Project 20% and final exam 80%.
-
Details
- Course title: Natural Language Processing in Data Science
- Number of ECTS: 5
- Course code: MA_DS-29
- Module(s): Module 8 Advanced courses
- Language: EN
- Mandatory: No
-
Objectives
Introduce students to the fundamentals of Natural Language Processing.
Learn the techniques used in natural language processing.
Become familiar with natural language generation.
Be exposed to text mining.
Understand major applications of NLP. -
Course learning outcomes
On completion of the course, students will have the ability to:
Understand and implement word-level and syntactic-level analysis.
Extract relations from text.
Use Python and the NLTK library.
Implement semantic analysis.
Implement real-time analysis. -
Description
Chapter 1: Basics in NLP (6 hours)
Overview: origins and challenges of NLP; need for NLP; Python and NLTK for NLP. Text wrangling and cleansing: text cleansing, sentence splitter, tokenization, stemming, lemmatization, stop word removal, rare word removal, spell correction.
Chapter 2: Text Preprocessing and Morphology (12 hours)
Character encoding, word segmentation, sentence segmentation, introduction to corpora, corpora analysis. Inflectional and derivational morphology, morphological analysis and generation using finite state automata and finite state transducers.
Chapter 3: Language Modelling (12 hours)
Words: collocations (frequency, mean and variance); hypothesis testing (the t test, hypothesis testing of differences, Pearson’s chi-square test, likelihood ratios). Statistical inference: n-gram models over sparse data (bins: forming equivalence classes, n-gram model, statistical estimators, combining estimators).
Chapter 4: POS Tagging and Text Classification (12 hours)
Part-of-speech tagging: tagging in NLP, sequential tagger, n-gram tagger, regex tagger, Brill tagger, machine learning taggers (MEC, HMM, CRF), NER tagger, types of learning techniques. Text classification: sampling, Naïve Bayes, decision trees, stochastic gradient descent, support vector machine, text clustering.
Chapter 5: Syntax and Semantics (12 hours)
Shallow parsing and chunking, shallow parsing with Conditional Random Fields (CRF), lexical semantics, WordNet, thematic roles, semantic role labelling with CRFs. Statistical alignment and machine translation, text alignment, word alignment, information extraction, text mining, information retrieval, NL interfaces, sentiment analysis, question answering systems, social network analysis.
Chapter 6: Recent Trends and Applications of NLP (6 hours)
Recent trends in NLP. Applications of NLP: transforming text, sentiment analysis, information retrieval, text summarization, question answering, automatic summarization.
-
Assessment
Continuous evaluation -
Note
Text Books
Nitin Hardeniya, Jacob Perkins, Deepti Chopra, Nisheeth Joshi, Iti Mathur, “Natural Language Processing: Python and NLTK”, Packt publisher, 2016.
Christopher D. Manning and Hinrich Schütze, “Foundations of Natural Language Processing”, 6th Edition, The MIT Press, Cambridge, Massachusetts / London, England, 2003.
-
Details
- Course title: Advanced topics in applied Machine Learning
- Number of ECTS: 5
- Course code: MA_DS-23
- Module(s): Module 8 Advanced courses
- Language: EN
- Mandatory: No
-
Objectives
After successfully finishing the course, a student will be familiar with applied machine learning and the robustification of ML-based systems, and will understand the key methods and learning algorithms, along with their advantages and limitations. Students will also get practical experience in setting up robust and secure ML systems.
Within the course, the students will learn the fundamentals of metaheuristic search algorithms, which will be used to improve the robustness and security of ML systems. Additionally, students will learn to implement state-of-the-art techniques capable of attacking and defending ML systems.
-
Description
The course aims to introduce clear-eyed and principled algorithms for engineering robust and secure ML systems. Meta-heuristics will be presented, as they are key algorithms for optimizing the search for a solution and apply to the robustification of ML systems (data augmentation, model generalization). Adversarial testing and learning in realistic settings will also be studied. A variety of ML algorithms (CART, random forest, NN, DNN), approaches (active learning, multi-task learning) and applications (fintech, industry 4.0, energy optimization) will be considered. The students should be able to understand, synthesize and present a research paper related to the studied topics.
- Introduction (1 lecture)
- Engineering for Machine Learning Systems (1 lecture)
- Meta-heuristics search (2 lectures)
- Genetic Programming (1 lecture)
- Adversarial attacks and robustification (2 lectures)
- Project presentations (1 lecture)
- Selected topics on applied machine learning (3 lectures)
- Student presentations (2 lectures)
-
Assessment
The evaluation will be based on practical individual projects (30%), presentations of two scientific articles (40%) and written critical appraisal of a scientific article (30%).
-
Note
https://cs.gmu.edu/~sean/book/metaheuristics/
http://www.genetic-programming.org/
https://en.wikipedia.org/wiki/Evolutionary_computation
Jie M. Zhang, Mark Harman, Lei Ma, Yang Liu: Machine Learning Testing: Survey, Landscapes and Horizons. IEEE Trans. Software Eng. 48(2): 1-36 (2022)
-
Details
- Course title: Nonparametric Statistics
- Number of ECTS: 5
- Course code: MA_DS-18
- Module(s): Module 8 Advanced courses
- Language: EN
- Mandatory: No
-
Objectives
Upon completion, a successful student should be able to:
- Construct classical estimators of unknown probability distributions
- Derive asymptotic properties of empirical distribution functions
- Perform tests for unknown distribution functions
- Construct non-parametric estimators of the density
- Estimate components of a non-linear regression model
- Understand the concept of minimax estimation
-
Course learning outcomes
- Understanding the concepts of empirical distribution functions, kernel estimation and minimax theory
- Performing derivation of standard estimation and testing methods in nonparametric statistics
-
Description
- Estimation of probability measures
- Weak limit theorems for empirical measures
- Kolmogorov-Smirnov and Cramér-von Mises tests
- Estimation of the density
- Nadaraya-Watson estimator and general kernel estimators
- Non-linear regression models
- Minimax theory
-
Assessment
First session
The overall grade is based on bi-weekly exercises (40%) and a final exam (60%).
Retake exam
Written exam.
-
Note
Literature
A.B. Tsybakov (2009): “Introduction to Nonparametric Estimation”, Springer.
A. Van der Vaart and J. Wellner (1996): “Weak Convergence and Empirical Processes”, Springer.
-
Details
- Course title: Bayesian Statistics
- Number of ECTS: 5
- Course code: MA_DS-22
- Module(s): Module 8 Advanced courses
- Language: EN
- Mandatory: No
-
Objectives
Provide students with the necessary knowledge of Bayesian statistics, both from a theoretical as well as practical viewpoint.
-
Course learning outcomes
Command of the concepts of Bayesian statistics, Bayesian inference and Bayesian modelling; practical command of Bayesian techniques in Python. -
Description
This course will provide a thorough introduction to and overview of the most important concepts of Bayesian inference, starting with the Bayesian philosophy in contrast to frequentist statistics. We will discuss various choices of prior distributions, Bayesian inference, Bayesian modelling, model checking and comparison (in particular introducing the concept of the Bayes factor), and advanced computation. The latter aspect will mainly be dealt with in the practicals, where Markov Chain Monte Carlo methods will be treated. -
Assessment
First session
Written exam with open questions on both theory and exercises; in particular, there will be computational exercises.
Retake exam
Written exam with open questions on both theory and exercises; in particular, there will be computational exercises.
-
Note
Note / Literature / Bibliography
Box, GEP and Tiao, GC (1992) Bayesian Inference in Statistical Analysis. John Wiley and Sons
Gelman, A, Carlin, JB, Stern, HS, Dunson, DB, Vehtari, A and Rubin, DB (2013) Bayesian Data Analysis 3rd edition. Chapman & Hall/CRC.
Brooks, S, Gelman, A, Jones, GL, Meng, X-L (2011) Handbook of Markov Chain Monte Carlo. Chapman & Hall/CRC
Course offer for Semestre 4 (2024-2025 Summer)
-
Details
- Course title: Internship or Master Thesis
- Number of ECTS: 30
- Course code: MA_DS-30
- Module(s): Module 10 – Internship or Master Thesis
- Language: EN
- Mandatory: Yes