Chapter 12 Homework 3
12.1 Overview
12.1.1 Objectives
- Identify nominal versus ordinal attributes
- Practice loading a dataset and using ggplot to graph data
- Consider general data mining tasks
12.2 Setup
Create a new R Markdown file with the title “Homework 3” and with you as the author (hint: this information should be in your frontmatter at the top of the file). In your R Markdown file, create a section heading for each of the following parts of your homework (see this documentation for how to make a heading):
- Part A
- Part B
- Part C
When you knit your markdown document, these headings should be actual headings (like they are in this document). For each of your responses, clearly indicate which question the responses belongs to.
12.3 Part A. Attribute descriptions
You are given a data file (see assignment attachments on blackboard): hw03data.csv
.
This is a table of ratings for teaching assistants (TA) at some university (this is modified data that originally came from https://archive.ics.uci.edu/ml/datasets.php).
There are 6 attributes:
id
: unique number to identify each ratinginstructor
: id number that identifies the instructorclassNbr
: id number that identifies the coursesemester
: 1=summer, 2=fall, 3=winterclassSize
: number of students in the classrating
: 1=low, 2=medium, 3=high
For each attribute, describe whether it is nominal or ordinal.
id
instructor
classNbr
semester
classSize
rating
12.4 Part B. R practice
Load the tidyverse collection of packages and load hw03data.csv
in R (as a dataframe or tibble).
Using ggplot2, graph the relationship between class size and rating.
The type of graph is up to you (but must be appropriate).
Explicitly label your graph axes (e.g., using the labs
function).
12.5 Part C. Thinking about common data mining applications
- This question asks you to think about clustering. Considering an average kitchen, identify 4 clusters of objects. Name them according to their utility. For example, if I were asked to identify 4 clusters in a sporting goods store, I might choose apparel (e.g., shoes, shirts), containers (e.g., backpacks, coolers), sports equipment (e.g., basketball, tennis racquets), and fishing (e.g., rods, lures).
- Pick one of the following forms of predictive modeling: regression, classification, or ranking.
- Describe an application domain or a dataset where you think it would be useful to apply the chosen predictive modeling approach.
- What attribute(s) would you be predicting?
- What attributes would you use to make the prediction?