RWTHx: Basics of Data Science

Course
English
72 h
Source
  • From https://www.edx.org
More info
  • 9 Sequences
  • Intermediate Level
  • Starts on June 23, 2024
  • Ends on December 10, 2024

Course details

Syllabus

Week 1: Introduction, Data Exploration & Visualization

In the first half of the week, we provide an overview of the course and illustrate the advantages and challenges of applying data science techniques. Students are introduced to the data science pipeline, data sources and data types, data analysis techniques and the challenges related to their application.

The second half of the week focuses on basic data exploration, visualization and transformation techniques.

Week 2: Supervised Learning Techniques

In the first half of this week, students will delve into data analysis using decision trees. We introduce the basic ID3 Algorithm and its extension to different notions of information gain, as well as pruning techniques, random forests and the applicability of decision trees to continuous data.
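As a rough illustration of the information gain criterion that ID3 uses to choose a split, the following minimal sketch computes Shannon entropy and the gain of each attribute. The toy rows and the attribute names (`outlook`, `windy`, `play`) are invented for this example and are not taken from the course material.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting `rows` (dicts) on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy data, invented for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]

# ID3 splits on the attribute with the highest information gain.
best = max(["outlook", "windy"], key=lambda a: information_gain(rows, a, "play"))
```

In this toy data, `outlook` separates the classes perfectly while `windy` carries no information, so `outlook` would be chosen as the root split.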

The second half of the week is dedicated to a brief overview of other supervised learning techniques (students interested in details are referred to the "Basics of Machine Learning" course which is also part of the BridgingAI course series). These techniques include Linear Regression, Logistic Regression, Support Vector Machines (SVMs), Neural Networks and Naive Bayesian Classification.

Week 3: Evaluation of Supervised Learning, Data Quality & Preprocessing

The first half of this week is dedicated to the evaluation of supervised learning techniques and the models they produce. We introduce the confusion matrix, the ROC curve, the R² coefficient and cross-validation, including their extensions and adaptations to specific goals or contexts. Furthermore, challenges and pitfalls regarding the evaluation and interpretation of supervised learning techniques are highlighted.
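To make the confusion matrix concrete, here is a small sketch for a binary classifier that counts the four cells and derives accuracy, precision and recall from them. The labels and predictions are made-up values, and the function name `confusion_counts` is an assumption of this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


# Hypothetical ground truth and predictions, invented for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that are found
```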

In the second half of the week, students will learn about data quality issues, their causes and avoidance strategies as well as possible approaches to dealing with outliers or missing values. Furthermore, an overview of data transformation, data reduction and normalization techniques is given.

Week 4: Clustering, Frequent Itemsets

In the first half of this week, clustering is introduced as the first unsupervised learning technique. In particular, we present various similarity measures, the k-means and k-medoids algorithms and density-based clustering (DBSCAN), and give an overview of agglomerative clustering techniques and self-organizing maps (SOM).
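The k-means idea mentioned above can be sketched in a few lines: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, and repeat. This is a simplified version under stated assumptions: squared Euclidean distance, caller-supplied initial centroids, a fixed number of iterations, and invented sample points.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on tuples of numbers with given initial centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical 2-D points forming two visually separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
```

Practical implementations usually add a convergence check and a smarter initialization; both are omitted here to keep the two alternating steps visible.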

The second half of the week focuses on the introduction of frequent itemsets. Two algorithms to compute such itemsets are explained: the straightforward Apriori approach as well as the more efficient FP-Growth algorithm.
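The level-wise idea behind Apriori can be sketched as follows: count candidate itemsets of size k, keep the frequent ones, and build size k+1 candidates only from itemsets whose subsets are all frequent. This is a compact, unoptimized sketch; the shopping-basket transactions and the absolute support threshold are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for all itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical transactions, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(transactions, min_support=2)
```

Here the three-item candidate is pruned without being counted, because one of its two-item subsets (milk and butter) is not frequent.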

Week 5: Association Rule Mining, Sequence Mining

In this week, we build upon the concepts of frequent itemsets to generate and evaluate association rules. Furthermore, we use association rules to illustrate Simpson's paradox.
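As an illustration of how an association rule can be evaluated, the sketch below computes support, confidence and lift for a single rule. These three measures are widely used, but the source does not state which ones the course covers, so treat the selection as an assumption; the transactions are invented.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(a <= t for t in transactions)
    count_c = sum(c <= t for t in transactions)
    count_both = sum((a | c) <= t for t in transactions)
    support = count_both / n               # how often the full rule occurs
    confidence = count_both / count_a      # P(consequent | antecedent)
    lift = confidence / (count_c / n)      # confidence relative to the baseline
    return support, confidence, lift


# Hypothetical transactions, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence, lift = rule_metrics(transactions, {"butter"}, {"bread"})
```

A lift above 1 suggests that the antecedent and consequent occur together more often than they would if they were independent.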

The second half of the week revolves around sequence mining, in particular the AprioriAll algorithm. The relationships between frequent itemsets, association rules, sequence mining and process mining (introduced in Week 6) are clarified.

Week 6: Process Mining

The whole week is dedicated to various aspects of process mining. We start out with an extensive introduction to the topic, including various types of models, tools and applications. Next, various approaches to process discovery are presented as the most prominent example of unsupervised learning in the context of process mining. Finally, supervised problems in process mining are discussed with the main focus on conformance checking techniques.

Week 7: Text Mining

This week we explore the topic of text mining. Various approaches to text preprocessing are discussed, including corpus annotation, tokenization, stop word removal, token normalization, stemming and lemmatization, followed by an overview of modelling techniques, i.e., bag-of-words (BoW), the document-term matrix and TF-IDF scoring. We briefly discuss the inclusion of semantics using public databases (Linked Open Data) before proceeding with a detailed introduction to N-grams and their application to word prediction and text generation. These concepts are then extended in the discussion of word embeddings, particularly autoencoders, Word2vec, CBoW and Doc2vec.
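To give a feel for TF-IDF scoring, the following sketch weights each term by its relative frequency in a document times the logarithm of the inverse document frequency. Several TF-IDF variants exist; the formula used here (no smoothing, natural logarithm), the whitespace tokenizer and the three sample documents are assumptions of this example, not necessarily the variant taught in the course.

```python
from collections import Counter
from math import log


def tf_idf(docs):
    """Return one {term: score} dict per document."""
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append(
            {t: (c / len(tokens)) * log(n_docs / df[t]) for t, c in counts.items()}
        )
    return scores


# Hypothetical mini corpus, invented for illustration.
docs = ["the cat sat", "the dog sat", "the cat ran"]
scores = tf_idf(docs)
```

A term that appears in every document, such as "the" here, receives a score of zero, while a term unique to one document receives the highest weight.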

Week 8: Responsible Data Science

This week we discuss challenges and solution approaches to confidentiality and fairness in data science. The first half of the week is dedicated to confidentiality. We give a brief overview of data encryption before introducing various techniques to anonymize data while maintaining its usefulness for analysis and to objectively evaluate the level of anonymization.

The second half of the week, focusing on fairness, introduces various metrics to objectively measure fairness and explores approaches to reduce discrimination by data science models and techniques. We conclude with a discussion of the potential trade-offs between model performance and model fairness.
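As one possible example of a fairness metric, the sketch below computes the demographic parity difference, i.e. the gap in positive prediction rates between two groups. The source does not say which metrics the course uses, so this choice is an assumption, and the predictions and group labels are invented.

```python
def positive_rate(predictions, groups, group):
    """Share of positive (1) predictions among members of `group`."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)


def demographic_parity_difference(predictions, groups, group_a, group_b):
    """Difference in positive prediction rates between two groups."""
    return positive_rate(predictions, groups, group_a) - positive_rate(
        predictions, groups, group_b
    )


# Hypothetical model outputs and group memberships, invented for illustration.
predictions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_difference(predictions, groups, "a", "b")
```

A gap of zero would mean both groups receive positive predictions at the same rate; this says nothing about whether the predictions are accurate, which is one source of the trade-offs mentioned above.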

Week 9: The Bigger Picture

In the final week, we briefly recap the contents of the course and discuss connections, trade-offs, conflicts and interactions between the various topics as well as their context and impact within the bigger picture of data science. An outlook on further perspectives and topics omitted in this introductory course is given.

Prerequisite

Anyone from any discipline with an interest in data science can start this course. We expect this course to be useful for everyone. Prior knowledge of math is an advantage (i.e., mathematical notation, linear algebra, stochastics, and statistics), but not mandatory.

Instructors

Prof. Dr. Wil van der Aalst
Head of Chair for Process and Data Science • RWTH Aachen University

Lisa Luise Mannel
Doctoral student at the Process and Data Science (PADS) group • RWTH Aachen University

Editor

RWTH Aachen University is among the leading technical universities in Europe, measured by its academic output, the quality of its graduates, and the quantity of its external funding. Its unique location in the center of Europe, bordering Belgium and the Netherlands, fosters an international and technology-driven climate for Aachen as a university city as well as for the whole region. Founded in 1870, RWTH comprises 260 institutes in nine faculties and has more than 42,000 students, about 7,000 of whom are international.

Teaching at RWTH Aachen is first and foremost application-oriented. Its graduates are therefore sought after as junior executives and leaders in business and industry. National rankings and international assessments attest to RWTH graduates' marked ability to handle complex tasks, to solve problems constructively in teamwork and to take on leadership roles. It is therefore not surprising that many board members of German corporate groups studied at RWTH Aachen.

RWTH Aachen University has a major interest in sharing its qualified lectures and research experiences and has a history in implementing numerous blended learning patterns in higher education.

Platform

Harvard University, the Massachusetts Institute of Technology, and the University of California, Berkeley, are just some of the schools that you have at your fingertips with edX. Through massive open online courses (MOOCs) from the world's best universities, you can develop your knowledge in literature, math, history, food and nutrition, and more. These online classes are taught by highly regarded experts in the field. If you take a class on computer science through Harvard, you may be taught by David J. Malan, a senior lecturer on computer science at Harvard University for the School of Engineering and Applied Sciences. But there's not just one professor: you have access to the entire teaching staff, allowing you to receive feedback on assignments straight from the experts. Pursue a Verified Certificate to document your achievements and use your coursework for job and school applications, promotions, and more. edX also works with top universities to conduct research, allowing them to learn more about learning. Using their findings, edX is able to provide students with the best and most effective courses, constantly enhancing the student experience.
