Chapter 6 Reading and Writing Data
You can download the R Markdown file used to generate this document (or webpage) here. I encourage you to download that Rmd file, open it on your own computer in RStudio, and run the code chunks yourself (and modify things as you wish!).
In this tutorial:
- I'll walk you through how to write data stored in a dataframe to a .csv file.
- I'll show you how to load data saved in a .csv file into a dataframe.
6.1 Writing data
R comes preloaded with many datasets (for a complete list, run library(help="datasets")).
For example, Orange is a small dataset that gives growth data for 5 orange trees: each tree's trunk circumference (in mm) measured at several ages (in days).
head(Orange)
## Tree age circumference
## 1 1 118 30
## 2 1 484 58
## 3 1 664 87
## 4 1 1004 115
## 5 1 1231 120
## 6 1 1372 142
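Before writing a dataset out, it can help to check its size and column types, since those are what will end up in the file. A minimal sketch using base R functions (dim, names, str) on Orange; note that str will also list some extra attributes that Orange carries.

```r
# Check the size, column names, and column types of Orange before writing it out
dim(Orange)    # 35 rows, 3 columns
names(Orange)  # "Tree" "age" "circumference"
str(Orange)    # Tree is an ordered factor; age and circumference are numeric
```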
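If we wanted to write this dataset out to a csv file, we could use the write.csv function.
Setting row.names=FALSE stops write.csv from adding the dataframe's row names (here just 1, 2, 3, ...) as an extra first column.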
write.csv(x=Orange, file="orange_trees.csv", row.names=FALSE)
If you run the above R code, where does orange_trees.csv get saved? (Hint: in your current working directory.)
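One way to check the answer yourself: a file name with no folder in it is resolved against the working directory, so combining getwd() with the file name gives the full location. A short sketch using base R's getwd, file.path, and file.exists.

```r
# Write the file using a relative name (no folder), exactly as above
write.csv(x = Orange, file = "orange_trees.csv", row.names = FALSE)

# A relative file name is resolved against the working directory,
# so this is where the file ended up
getwd()
file.path(getwd(), "orange_trees.csv")

# Confirm the file is there
file.exists("orange_trees.csv")   # TRUE
```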
write.csv has many options. Run ?write.csv in your R console to read the documentation.
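To see what one of those options does in practice, here is a sketch comparing the default row.names = TRUE with row.names = FALSE. It writes to temporary files from tempfile() (an assumption made only to avoid cluttering your working directory) and reads each one back.

```r
# Write Orange twice: once with the default row names, once without
with_names    <- tempfile(fileext = ".csv")
without_names <- tempfile(fileext = ".csv")

write.csv(Orange, file = with_names)                        # default: row.names = TRUE
write.csv(Orange, file = without_names, row.names = FALSE)

# Compare the header and first data line of each file
readLines(with_names, n = 2)
readLines(without_names, n = 2)

# Reading back: the written row names become an extra column named "X"
ncol(read.csv(with_names))      # 4
ncol(read.csv(without_names))   # 3
```

This is why the example above passes row.names=FALSE: otherwise every later read.csv would pick up an extra "X" column that you would have to drop.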
write.csv is especially useful when you clean up or transform a dataset that you'll be analyzing.
If your dataset is large and/or the preprocessing operations you perform are expensive, you don't want to redo those operations every time you load your dataset.
Instead, write a script that processes the raw dataset once and saves the processed dataset with write.csv; your later analyses can then load the processed file directly.
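A minimal sketch of that workflow. The preprocessing step here (the maximum circumference recorded for each tree, computed with aggregate) and the output file name orange_tree_summary.csv are hypothetical choices made for illustration; in practice this part would live in its own script that you run once.

```r
# --- Preprocessing script (run once) ---
# Hypothetical transformation: maximum circumference observed for each tree
tree_summary <- aggregate(circumference ~ Tree, data = Orange, FUN = max)
names(tree_summary)[2] <- "max_circumference"

# Save the processed data (hypothetical file name) in the working directory
write.csv(tree_summary, file = "orange_tree_summary.csv", row.names = FALSE)

# --- Later analyses ---
# Load the already-processed file instead of recomputing the summary
tree_summary <- read.csv("orange_tree_summary.csv")
print(tree_summary)
```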
6.2 Loading data
Loading data from a csv file is straightforward: you'll use the read.csv function.
pokemon_df <- read.csv(file="lecture-material/week-02/data/pokemon.csv")
head(pokemon_df)
## abilities against_bug against_dark against_dragon
## 1 ['Overgrow', 'Chlorophyll'] 1.00 1 1
## 2 ['Overgrow', 'Chlorophyll'] 1.00 1 1
## 3 ['Overgrow', 'Chlorophyll'] 1.00 1 1
## 4 ['Blaze', 'Solar Power'] 0.50 1 1
## 5 ['Blaze', 'Solar Power'] 0.50 1 1
## 6 ['Blaze', 'Solar Power'] 0.25 1 1
## against_electric against_fairy against_fight against_fire against_flying
## 1 0.5 0.5 0.5 2.0 2
## 2 0.5 0.5 0.5 2.0 2
## 3 0.5 0.5 0.5 2.0 2
## 4 1.0 0.5 1.0 0.5 1
## 5 1.0 0.5 1.0 0.5 1
## 6 2.0 0.5 0.5 0.5 1
## against_ghost against_grass against_ground against_ice against_normal
## 1 1 0.25 1 2.0 1
## 2 1 0.25 1 2.0 1
## 3 1 0.25 1 2.0 1
## 4 1 0.50 2 0.5 1
## 5 1 0.50 2 0.5 1
## 6 1 0.25 0 1.0 1
## against_poison against_psychic against_rock against_steel against_water
## 1 1 2 1 1.0 0.5
## 2 1 2 1 1.0 0.5
## 3 1 2 1 1.0 0.5
## 4 1 1 2 0.5 2.0
## 5 1 1 2 0.5 2.0
## 6 1 1 4 0.5 2.0
## attack base_egg_steps base_happiness base_total capture_rate classfication
## 1 49 5120 70 318 45 Seed Pokémon
## 2 62 5120 70 405 45 Seed Pokémon
## 3 100 5120 70 625 45 Seed Pokémon
## 4 52 5120 70 309 45 Lizard Pokémon
## 5 64 5120 70 405 45 Flame Pokémon
## 6 104 5120 70 634 45 Flame Pokémon
## defense experience_growth height_m hp japanese_name name
## 1 49 1059860 0.7 45 Fushigidaneフシギダネ Bulbasaur
## 2 63 1059860 1.0 60 Fushigisouフシギソウ Ivysaur
## 3 123 1059860 2.0 80 Fushigibanaフシギバナ Venusaur
## 4 43 1059860 0.6 39 Hitokageヒトカゲ Charmander
## 5 58 1059860 1.1 58 Lizardoリザード Charmeleon
## 6 78 1059860 1.7 78 Lizardonリザードン Charizard
## percentage_male pokedex_number sp_attack sp_defense speed type1 type2
## 1 88.1 1 65 65 45 grass poison
## 2 88.1 2 80 80 60 grass poison
## 3 88.1 3 122 120 80 grass poison
## 4 88.1 4 60 50 65 fire
## 5 88.1 5 80 65 80 fire
## 6 88.1 6 159 115 100 fire flying
## weight_kg generation is_legendary
## 1 6.9 1 0
## 2 13.0 1 0
## 3 100.0 1 0
## 4 8.5 1 0
## 5 19.0 1 0
## 6 90.5 1 0
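After loading a file, it is worth checking what read.csv actually produced (for example with dim(pokemon_df) or str(pokemon_df)). A CSV file stores only text, so read.csv has to guess each column's type. The sketch below round-trips Orange through a temporary file (used instead of pokemon.csv, which may not be on your machine) to show that a factor column does not survive the trip.

```r
# Write Orange to a temporary CSV file and read it back in
csv_path <- tempfile(fileext = ".csv")
write.csv(Orange, file = csv_path, row.names = FALSE)
orange_loaded <- read.csv(csv_path)

# Same shape as the original dataframe
dim(Orange)
dim(orange_loaded)

# CSV stores no type information, so read.csv re-guesses column types
class(Orange$Tree)          # "ordered" "factor"
class(orange_loaded$Tree)   # "integer": the tree labels were re-read as numbers
str(orange_loaded)
```

If you need a column to have a particular type after loading, convert it explicitly (for example with factor()) rather than relying on read.csv's guess.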
One slightly tricky thing to keep in mind about loading data is where R will look for the file you're loading.
This affects how you specify the file argument when you call read.csv.
You can always specify the complete file path, for example (on my computer):
pokemon_df <- read.csv(file="/Users/lalejina/class_ws/CIS635-f22/gvsu-cis635-2022f/lecture-material/week-02/data/pokemon.csv")
head(pokemon_df)
## abilities against_bug against_dark against_dragon
## 1 ['Overgrow', 'Chlorophyll'] 1.00 1 1
## 2 ['Overgrow', 'Chlorophyll'] 1.00 1 1
## 3 ['Overgrow', 'Chlorophyll'] 1.00 1 1
## 4 ['Blaze', 'Solar Power'] 0.50 1 1
## 5 ['Blaze', 'Solar Power'] 0.50 1 1
## 6 ['Blaze', 'Solar Power'] 0.25 1 1
## against_electric against_fairy against_fight against_fire against_flying
## 1 0.5 0.5 0.5 2.0 2
## 2 0.5 0.5 0.5 2.0 2
## 3 0.5 0.5 0.5 2.0 2
## 4 1.0 0.5 1.0 0.5 1
## 5 1.0 0.5 1.0 0.5 1
## 6 2.0 0.5 0.5 0.5 1
## against_ghost against_grass against_ground against_ice against_normal
## 1 1 0.25 1 2.0 1
## 2 1 0.25 1 2.0 1
## 3 1 0.25 1 2.0 1
## 4 1 0.50 2 0.5 1
## 5 1 0.50 2 0.5 1
## 6 1 0.25 0 1.0 1
## against_poison against_psychic against_rock against_steel against_water
## 1 1 2 1 1.0 0.5
## 2 1 2 1 1.0 0.5
## 3 1 2 1 1.0 0.5
## 4 1 1 2 0.5 2.0
## 5 1 1 2 0.5 2.0
## 6 1 1 4 0.5 2.0
## attack base_egg_steps base_happiness base_total capture_rate classfication
## 1 49 5120 70 318 45 Seed Pokémon
## 2 62 5120 70 405 45 Seed Pokémon
## 3 100 5120 70 625 45 Seed Pokémon
## 4 52 5120 70 309 45 Lizard Pokémon
## 5 64 5120 70 405 45 Flame Pokémon
## 6 104 5120 70 634 45 Flame Pokémon
## defense experience_growth height_m hp japanese_name name
## 1 49 1059860 0.7 45 Fushigidaneフシギダネ Bulbasaur
## 2 63 1059860 1.0 60 Fushigisouフシギソウ Ivysaur
## 3 123 1059860 2.0 80 Fushigibanaフシギバナ Venusaur
## 4 43 1059860 0.6 39 Hitokageヒトカゲ Charmander
## 5 58 1059860 1.1 58 Lizardoリザード Charmeleon
## 6 78 1059860 1.7 78 Lizardonリザードン Charizard
## percentage_male pokedex_number sp_attack sp_defense speed type1 type2
## 1 88.1 1 65 65 45 grass poison
## 2 88.1 2 80 80 60 grass poison
## 3 88.1 3 122 120 80 grass poison
## 4 88.1 4 60 50 65 fire
## 5 88.1 5 80 65 80 fire
## 6 88.1 6 159 115 100 fire flying
## weight_kg generation is_legendary
## 1 6.9 1 0
## 2 13.0 1 0
## 3 100.0 1 0
## 4 8.5 1 0
## 5 19.0 1 0
## 6 90.5 1 0
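When you do use a full path, building it with file.path() (which joins the pieces with "/") and checking file.exists() before reading gives a clearer error than a failed read.csv call. In this sketch, tempdir() stands in for a real data folder such as the one above; that substitution is an assumption so the example runs on any machine.

```r
# Build a complete path from its parts; tempdir() stands in for a real data folder
data_dir <- tempdir()
csv_path <- file.path(data_dir, "orange_trees.csv")
write.csv(Orange, file = csv_path, row.names = FALSE)

# Check that the file is where you think it is before reading it
if (!file.exists(csv_path)) {
  stop("Could not find ", csv_path, "; check the path or your working directory")
}
orange_df <- read.csv(file = csv_path)
head(orange_df)
```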
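Often, it'll be easier to specify a relative path.
When you do not specify a complete path, R (in RStudio or elsewhere) interprets the path relative to your current working directory.
For example, the relative path lecture-material/week-02/data/pokemon.csv used earlier only works when the working directory is the folder that contains lecture-material.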
Run getwd() to see what your current working directory is set to.
Use setwd() to change your working directory.
Read more about project management and R working directories here: https://r4ds.had.co.nz/workflow-projects.html
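A small sketch of how getwd() and setwd() interact with relative paths. It relies on the fact that setwd() returns the previous working directory, so you can switch back afterwards; the temporary directory from tempdir() is used only so the example does not touch your own folders. For day-to-day work, RStudio Projects (see the link above) are usually a more robust way to manage the working directory than calling setwd() in scripts.

```r
# setwd() returns the previous working directory, so save it to switch back later
old_wd <- setwd(tempdir())
getwd()   # now the temporary directory

# Relative file names are now resolved inside the temporary directory
write.csv(Orange, file = "orange_trees.csv", row.names = FALSE)
file.exists(file.path(tempdir(), "orange_trees.csv"))   # TRUE

# Restore the original working directory
setwd(old_wd)
getwd()
```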