Chapter 32 Final project overview

For the final project, you will choose an application domain and develop a data mining pipeline for a specific problem in that domain.

32.1 Problem statement

One of the goals for this course is for you to be able to design, implement, and evaluate a data mining / knowledge discovery pipeline from start to finish. You will propose and implement a data mining project in an application domain of your choice. You will be guided on your project through a series of check-in assignments (listed below), culminating in a written final report.

32.2 Learning objectives

  1. Design and implement a full data mining pipeline
  2. Develop skills for planning a data mining project and conducting background research
  3. Gain proficiency with data mining algorithms and techniques
  4. Effectively communicate your methodology and results

32.3 Groups

You may work in a group of three or by yourself. I will assist in forming groups by sending out a survey asking which topics interest you, who you would or would not like to work with (if anyone), and your expectations for your final project.

32.4 Grading

40% of your grade in this course relates to your final project:

Item              Contribution to grade
Survey response   2.5%
Project proposal  10%
Project update    2.5%
Draft             5%
Final report      20%

Note that you cannot use your homework extensions on project components.

All components of your final project are subject to the academic honesty policies of GVSU and the School of Computing. Violations of these policies may result in failure of the course.

32.5 Timeline

Week     Project component
Week 5   Survey out (Monday, 09/26)
Week 6   Survey due (Monday, 10/03)
Week 7   Groups assigned (by Monday, 10/10)
Week 8
Week 9   Project proposal due (Wednesday, 10/26)
Week 10  Receive proposal decision and feedback (by Monday, 10/31)
Week 11
Week 12  Progress report due (Wednesday, 11/16)
Week 13
Week 14  Project report draft due (Wednesday, 11/30)
Week 15  Receive feedback on draft
Week 16  Final project due (Wednesday, 12/14)

32.6 Possible project ideas

I understand that we have not covered many data mining techniques at this point in the class. However, we have surveyed the broad categories of data mining tasks. My advice is to start by choosing a domain that interests you and a type of data mining task that you think is applicable (and that you would like to learn more about). You will then need to do your own research on the specifics.

Some example types of projects (your project does not need to fall under any of these categories):

  • Solve a problem
    • E.g., design and apply a data mining pipeline to a particular problem you are interested in, and then evaluate your approach by comparing it to another method.
  • Replicate (and/or extend) a published study
    • E.g., identify a published study that you are interested in and independently replicate some portion of it (you must create your own pipeline to do so, not just reuse the original authors’).
  • Answer a research question
    • E.g., formulate a hypothesis and design (and apply) a data mining approach to test your hypothesis.
  • Analysis
    • E.g., design a data mining pipeline for some task (or use an existing pipeline) and analyze how/why it works (or doesn’t work). You might try swapping out individual components of the pipeline for alternative techniques, adding noise to the dataset, etc.
    • E.g., design an experiment to analyze the properties of different similarity metrics and their effect on a data mining algorithm.
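
To make the "Analysis" idea a little more concrete, here is a minimal Python sketch of what such an experiment could look like. It is only an illustration, not a required or recommended design: the toy two-class dataset, the choice of a 1-nearest-neighbor classifier, the two distance measures (Euclidean and cosine), the noise levels, and all function names are assumptions made up for this example. It uses only the Python standard library.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def nearest_neighbor_label(query, train, metric):
    """Label of the training point closest to `query` under `metric` (1-NN)."""
    return min(train, key=lambda item: metric(query, item[0]))[1]


def accuracy(train, test, metric):
    """Fraction of test points whose 1-NN prediction matches the true label."""
    hits = sum(nearest_neighbor_label(x, train, metric) == y for x, y in test)
    return hits / len(test)


def make_data(n, noise, rng):
    """Hypothetical toy data: two 2-D classes with Gaussian noise around fixed centers."""
    centers = {"a": (2.0, 0.5), "b": (0.5, 2.0)}
    data = []
    for _ in range(n):
        label = rng.choice(sorted(centers))
        cx, cy = centers[label]
        point = (cx + rng.gauss(0, noise), cy + rng.gauss(0, noise))
        data.append((point, label))
    return data


def run_experiment(noise_levels=(0.1, 0.5, 1.0), seed=0):
    """Compare both distance measures at each noise level; returns nested dict of accuracies."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    results = {}
    for noise in noise_levels:
        train = make_data(100, noise, rng)
        test = make_data(100, noise, rng)
        results[noise] = {
            "euclidean": accuracy(train, test, euclidean),
            "cosine": accuracy(train, test, cosine_distance),
        }
    return results


if __name__ == "__main__":
    for noise, scores in run_experiment().items():
        print(noise, scores)
```

A real project would replace the toy generator with an actual dataset, repeat each setting over several random seeds, and report the variation in the results as well as the averages.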

You are welcome to drop by office hours to chat about project ideas!