GitHunt
JA

Jakob-Bach/AGD-Exercises-2020

The programming exercises for the lecture "Big Data Analytics" at the Karlsruhe Institute of Technology (KIT), winter term 2020/2021.

Exercises for Big Data Analytics, winter semester 2020/2021

This repo contains the exercises for the lecture "Big Data Analytics" ("Analysetechniken für große Datenbestände") at KIT in winter semester 2020/2021.
I re-structured the exercises as part of the "Baden-Württemberg-Zertifikat für Hochschuldidaktik".
Uebungskonzept.pdf introduces the exercise concept.
While the previous exercise concept focused on programming tasks, the new one uses a combination of:

  • programming tasks
  • (theory-oriented) online tests (for the learning management system ILIAS; I provide exports here)
  • (interactive, Q&A-based) exercise sessions (I summarized the discussions as presentations)

Further, there were two surveys (again on ILIAS) to assess the satisfaction of the students.

Apart from the solutions to the programming exercises, materials are in German.
For English materials, see the following year's course (note that the actual tasks differ more or less).

Programming Exercise Structure

The goal of the exercises is to apply basic data-mining techniques to small toy datasets.
The solutions use the programming language R, version 4.0.3, and the recent (end of 2020 / beginning of 2021) versions of all required packages.
Solutions are available as notebooks, in both .Rmd as well as .html format.

  • Exercise sheet 1:
    • R basics (including descriptive statistics and plots)
    • Statistical tests
  • Exercise sheet 2:
    • Entropy
    • PCA
  • Exercise sheet 3:
    • Decision trees
    • Evaluation of classifiers
  • Exercise sheet 4:
    • Association rules
  • Exercise sheet 5:
    • Clustering
    • Outlier detection
  • Exercise sheet 6:
    • Lab qualification task 2021

Lab Qualification

Since the beginning of 2019, aspiring participants of the practical course in the following summer semester have to solve a qualification task first.

  • Qualification task 2020: Data Mining Cup 2019
    • Download the data from the DMC website, un-zip it and place it in a folder called data/.
    • Create a simple valid submission with CreateDemoSubmission-xgboost.R.
    • Score a folder of ILIAS submissions with ScoreStudents.R (i.e., folder containing sub-folders named like <<Name>>_<<uxxxx>>_<<some number>>).
  • Qualification task 2021: Predict grid stability with the DSGC dataset
    • Prepare train-test split of dataset with PrepareData.R (also downloads data from UCI repo).
    • Create a simple valid submission with CreateDemoSubmission-rpart.R or CreateDemoSubmission-xgboost.R.
    • Score a folder of ILIAS submissions with ScoreStudents.R (i.e., folder containing sub-folders named like <<Name>>_<<uxxxx>>_<<some number>>).

Languages

HTML99.4%TeX0.5%R0.1%

Contributors

MIT License
Created March 22, 2021
Updated October 5, 2022
Jakob-Bach/AGD-Exercises-2020 | GitHunt