Data Analysis (Spring 2020)
Topics:
- Statistical learning: (Supervised learning - Unsupervised learning)
- Linear Regression
- Classification: (Logistic Regression, Bayes Classifier)
- Linear Model Selection and Regularization: (Ridge Regression, Lasso Regression)
- Decision Trees: (Bagging, Random Forests, Boosting)
- Clustering: (K-means, Hierarchical, Model-based: Mixture models)
- Neural Networks (A brief introduction)
Textbook:
Most of the topics are based on:
- An Introduction to Statistical Learning: with Applications in R (2013) (Springer Series in Statistics) by G. James, D. Witten, T. Hastie and R. Tibshirani
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer Series in Statistics) (2001 & 2009) by T. Hastie, R. Tibshirani, J. H. Friedman.
Final exam: 11-12 points
Group assignments: 4-6 points
Group project: 2-4 points
Extra points: up to 1 point for remarkable questions/answers about your classmates' projects
Extra points: up to 1 point for an optional quiz on Python
--------------------------------------------------------------------------------------------------------------------------------------------------------
Assignments must be done in R. The project can be done with any software, including R or Python.
Students who score less than 40% on the final exam will receive half the group score.
---------------------------------------------------------------------------------------------------------------------------------------------------------
About projects:
The project must be about a real data set from Iran with many variables/observations. There are several deadlines for reporting project progress and presenting your outputs; send these by email to a.mofidian@math.iut.ac.ir
Reports and outputs must be prepared in Word (Times New Roman, size 11 or Arial, size 10)
The first deadline: March 04 (14 Esfand) (Extended)
- Send the data set along with a one-page explanation that includes the description of the variables, how the data were collected, and the reference.
- Also, one page about the possible data analysis tasks on your data and the difficulties that may arise.
- One or two pages visualizing your data (not necessarily with R).
---------------------------------------------------------------------------------------------------------------------------------------------------------
- Amir Abbas Mofidian (PhD student of Statistics): teaching R, evaluating homeworks, consulting on projects
- Ghasemi nejad: teaching Python
Class time: Sundays and Tuesdays, 9:30-11
Also, there are two classes to teach R (Tuesdays at 8:00) and Python (Mondays at 16:30) in the Statistics Lab.
Attendance in the R class is mandatory. (Cancelled)
------------------------------------------------------------------------------------------------------------------------------------------------------