WayneJz/COMP9321-19T1
COMP9321 Data Services Engineering 2019T1
ALL CODE SHOULD BE APPROPRIATELY REFERENCED; COPYING MAY RESULT IN PLAGIARISM.
Lecturer in charge: Lina Yao
Assignments
- Data cleaning and visualization, Mark: 13/13 (Bonus 3 marks).
- RESTful Flask API and Swagger, Mark: 10/10.
- Heart disease analysis (including machine learning, backend and frontend), Mark: 16/20.
Main content
- Data formation and access: Fetching and collecting different types of data (PDF, XML etc.), database access with SQL and ORM.
- Data quality and cleaning: Standardization, normalization (min-max, z-score, log ...), NLP transformation (tokenization, lemmatization, stemming), pairwise matching score.
- Data visualization: Benefits of data visualization, different visual methods (histograms, charts etc.), high-dimension visualization, dimensionality reduction (PCA and SVD algorithms), drawbacks of PCA.
- RESTful API and client: HTTP request methods, XML-based and RESTful APIs, uniform interface, statelessness, caching, Swagger, SOAP vs REST, API design, API security (authentication, token-based methods etc.), OAuth.
- Data analytics: Bayes' theorem, overview of data mining, correlation, similarity measures, unsupervised learning, clustering (K-Means, K-Means++), association rule mining (Apriori algorithm).
- Supervised learning: Linear regression, least squares error (R-squared values and p-values), logistic regression, instance-based methods (KNN), decision trees, building decision trees (entropy, ID3 algorithm), overfitting, cross-validation, bagged decision trees, random forests.
- Neural networks: Gradient ascent/descent, forward pass, backpropagation, activation and loss functions, learning rates, avoiding overfitting (early stopping and dropout), TensorFlow and Keras, CNN, CNN convolution, RNN, RNN backpropagation, long short-term memory (LSTM).
- Recommender systems: Collaborative filtering, Pearson correlation, user-based vs item-based vs content-based, latent factor based models, SVD-based models, TF-IDF method, cosine similarity, knowledge-based approaches, hybrid recommender systems, accuracy (MAE, RMSE).
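As a small illustration of the normalization topics listed under data quality and cleaning, here is a minimal sketch of min-max and z-score scaling using only the Python standard library. It is not taken from the course materials or the assignments; the function names `min_max` and `z_score` and the sample `ages` list are made up for this example.

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to rescale, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Centre values on the mean and divide by the population std deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: every point sits exactly on the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical sample column (not real course data).
ages = [29, 41, 54, 63, 77]
scaled = min_max(ages)
standardized = z_score(ages)
```

With min-max scaling the smallest value maps to 0 and the largest to 1, while z-score output has mean 0 and unit standard deviation. In practice a library such as scikit-learn would likely be used instead of hand-written helpers.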
Copyright and Credits
All course slides and materials come from the lecturers.
Do not share them or use them commercially without the lecturers' agreement.
I take no responsibility for any such misuse.