Course Catalogue

Module Code and Title:       CSC308          Data Mining and Data Warehousing

Programme:                          BCA

Credit Value:                         12

Module Tutor:                       Phub Namgay

General Objective: This module introduces the concepts, techniques, design and applications of data warehousing and data mining, together with some of the systems used for them. It will enable students to understand and implement classical algorithms in data mining and data warehousing, and to analyse data, identify problems, and choose relevant algorithms to apply. The module also covers the principles and commercial applications of these technologies. It prepares students for common scenarios in professional software development, which increasingly call for the management of large volumes of data.

Learning Outcomes – On completion of the module, learners will be able to:

  1. Describe the functionality of the various data mining and data warehousing components.
  2. Analyse data, identify problems, and choose relevant models to apply.
  3. Define and implement classical algorithms in data warehousing and data mining.
  4. Explain the fundamental theories and concepts of data mining.
  5. Design data mining and data warehousing systems and solutions to meet user requirements.
  6. Evaluate and select appropriate technologies and tools for data mining and data warehousing.
  7. Explain the different operations and techniques involved in data mining.
  8. Utilize a range of techniques for designing data warehousing and data mining systems for real-world applications.

Learning and Teaching Approach:

Approach                 Hours per week    Total credit hours
Lecture & discussions    4                 60
Lab Practical            1                 15
Independent study        3                 45
Total                                      120

Assessment Approach:

A. Individual Assignment: Portion of Final Mark: 10%

Students should submit two assignments of 1000-1500 words related to Data Mining and Data Warehousing to obtain this 10%. The first is due before the midterm and carries half of the 10% allocated; the second is due after the midterm. 40% will be awarded for solving the problem, 40% for analysing the problem and 20% for the overall report.

B. Lab Practical Exam: Portion of Final Mark: 10%

This component assesses the student’s practical knowledge. Students will be assessed on their program design skills, correct use of syntax, use of functions, and testing and debugging of code. Each student will be assigned 2-3 programs to solve within a predefined examination duration. 35% will be awarded for sub-tasks completed, 35% for the techniques used for each sub-task, 10% for timing and 30% for output.

C. Class Test: Portion of Final Mark: 10%

This is a written test conducted in class for a duration of 30-40 minutes, covering 2-3 weeks of material. There will be two such tests: one before the midterm, covering topics from the beginning to the quarter point of the subject matter, and one after the midterm, covering topics from the midterm to the quarter point following it.

D. Group Presentation: Portion of Final Mark: 10%

Aside from the conventional style of learning, students present topics related to the module to their classmates. The objective is for students to learn the art of presentation and improve their skills in communicating their knowledge to their peers. The presentation will last approximately 20-30 minutes and include PowerPoint slides. 30% will be awarded for the content of the presentation, 30% for preparedness, 10% for timing, 15% for handling of the Q&A session, 5% for group coordination and 10% for presentation skills.

E. Midterm Examination: Portion of Final Mark: 20%

This is a college-wide examination conducted halfway through the semester. It lasts 1 hour and 30 minutes and covers all topics up to the halfway point of the subject matter.

 

Areas of assignments                Quantity    Weighting
A. Individual Assignment            2           10%
B. Lab Practical Exam               1           10%
C. Class Test                       2           10%
D. Group Presentation               1           10%
E. Midterm Exam                     1           20%
Total Continuous Assessment (CA)                60%
Semester-end Examination (SE)                   40%

Prerequisites: CAP102, CSC205 

Subject Matter:

  1. Introduction to Data Mining
    • Fundamentals of data mining
    • Data Mining Functionalities
    • Major Issues in Data Mining
    • Applications and Trends in Data Mining
      • Data Mining Applications
      • Data Mining and Society
      • Data Mining Trends
  2. Getting to Know Your Data
    • Data Objects and Attribute types
    • Basic Statistical Descriptions of Data
    • Data Visualization
    • Measuring Data Similarity and Dissimilarity
      • Cosine similarity
  3. Data Pre-processing
    • Data Pre-processing: An Overview
    • Major Tasks in Data Pre-processing
      • Data Cleaning
      • Data Integration
      • Data Reduction
      • Data Transformation and Data Discretization
  4. Data Warehouse and OLAP Technology for Data Mining
    • Data Warehouse: Basic Concepts
      • What is a Data Warehouse?
      • Differences between Operational Database Systems and Data Warehouses
      • Data Warehousing: A Multitiered Architecture
      • Data Warehousing Models
    • Data Warehouse Modelling: Data Cube and OLAP
      • Data Cube: A Multidimensional Data Model
      • Schemas for Multidimensional Data Models
    • Data Warehouse Design and Usage
  5. Mining Frequent Patterns and Analysis
    • Basic Concepts
    • Frequent Itemset Mining Methods
      • Apriori Algorithm: Finding Frequent Itemsets by Confined Candidate Generation
      • Generating Association Rules from Frequent Itemsets
      • Improving the Efficiency of Apriori
    • Which Patterns Are Interesting? Pattern Evaluation Methods
    • Numerical exercises
  6. Classification: Basic Concepts
    • Basic Concepts
      • What is Classification?
      • General Approach to Classification
    • Decision Tree Induction
      • Decision Tree Induction
      • Attribute Selection Measures
      • Tree Pruning
    • Rule-Based Classification
      • Using IF-THEN Rules for Classification
      • Rule Extraction from a Decision Tree
  7. Cluster Analysis: Basic Concepts and Methods
    • Cluster Analysis
      • What is Cluster Analysis?
      • Requirements of Cluster Analysis
      • Overview of Basic Clustering Methods
    • Partitioning Methods
      • K-Means: A Centroid-Based Technique
      • K-Medoids: A Representative Object-Based Technique
      • Numerical exercises
    • Hierarchical Methods
      • Agglomerative versus Divisive Hierarchical Clustering
      • Distance Measures in Algorithmic Methods
      • Numerical calculations on Hierarchical clustering
      • Numerical exercises
  8. Outlier Detection
    • Outliers and Outlier Analysis
      • What Are Outliers?
      • Types of Outliers
      • Challenges in Outlier Detection
    • Outlier Detection Methods
      • Supervised, Semi-Supervised, and Unsupervised Methods
  9. Practical Components
    • R Programming
      • R Basics
      • R data types
      • Code editor for R
      • Control structures
      • Functions
      • Useful utilities
    • Data mining with R
      • Cluster analysis: K-Means clustering, Hierarchical clustering
      • Classification: Bayesian classification with R
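
Illustrative numerical exercise: the topic "Cosine similarity" under "Measuring Data Similarity and Dissimilarity" can be sketched as follows. This is a minimal sketch rather than prescribed course material. It is written in Python as an assumption for illustration (the lab sessions use R), and the function name and the two term-frequency vectors are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Return the cosine of the angle between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))      # x . y
    norm_x = math.sqrt(sum(a * a for a in x))   # ||x||
    norm_y = math.sqrt(sum(b * b for b in y))   # ||y||
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Hypothetical term-frequency vectors for two documents
doc1 = [5, 0, 3, 0, 2, 0, 0, 2, 0, 0]
doc2 = [3, 0, 2, 0, 1, 1, 0, 1, 0, 1]
print(round(cosine_similarity(doc1, doc2), 2))  # prints 0.94
```

A value close to 1 indicates that the two vectors point in nearly the same direction, while a value of 0 indicates that they share no non-zero components.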

Reading List:

  1. Essential Reading
    • Han, J., Kamber, M., & Pei, J. (2011). Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann Publishers, Elsevier.
    • Data Mining and Warehousing (1st ed.). (2014). CL India.
    • Berson, A. (2004). Data Warehousing, Data Mining, & OLAP. McGraw Hill Education (India) Private Limited.
  2. Additional Reading
    • Berson, A., & Smith, J. S. (2008). Data Warehousing, Data Mining, & OLAP. New Delhi: Tata McGraw-Hill Publishing Company Limited.
    • Inmon, W. H. (2005). Building the Data Warehouse (4th ed.). John Wiley & Sons, Inc., New York, NY, USA.
    • Tan, P. N., Steinbach, M., & Kumar, V. (2013). Introduction to Data Mining (2nd ed.). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.
    • Linoff, G. (2012). Data Mining Techniques: For Marketing, Sales and Customer Relationship Management (3rd ed.). Wiley.

Date: May 30, 2015