Chapter 6 Reading and Writing Data

You can download the R Markdown file used to generate this document (or webpage) here. I encourage you to download that Rmd file, open it on your own computer in RStudio, and run the code chunks yourself (and modify things as you wish!).

In this tutorial:

  • I’ll walk you through how to write data stored in a dataframe to a .csv file.
  • I’ll show you how to load data saved in a .csv file into a dataframe.

6.1 Writing data

R comes preloaded with many datasets (for a complete list, run library(help="datasets")). For example, Orange is a small dataset that gives growth data for 5 orange trees.

head(Orange)
##   Tree  age circumference
## 1    1  118            30
## 2    1  484            58
## 3    1  664            87
## 4    1 1004           115
## 5    1 1231           120
## 6    1 1372           142

If we wanted to write this dataset out to a csv file, we could use the write.csv function.

write.csv(x=Orange, file="orange_trees.csv", row.names=FALSE)

If you run the above R code, where did orange_trees.csv get saved? (hint: in your current working directory)
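One way to answer that question yourself is to ask R directly. The following is a minimal sketch that repeats the write.csv call from above and then checks where the file landed; the variable name saved_path is only for illustration.

```r
# Write the file using a relative path, exactly as in the example above.
write.csv(x = Orange, file = "orange_trees.csv", row.names = FALSE)

# The directory that relative paths are resolved against.
getwd()

# Confirm the file exists and get its full path.
file.exists("orange_trees.csv")
saved_path <- normalizePath("orange_trees.csv")
saved_path
```

The full path printed at the end should sit inside the directory reported by getwd().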

write.csv has many options. Run ?write.csv in your R console to read the documentation.
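As a small illustration of a few of those options, the sketch below writes a copy of the first rows of Orange with one value deliberately set to missing. The na and quote arguments are documented write.csv arguments; the object name orange_with_gap and the use of a temporary file are assumptions made for this example.

```r
# Take a few rows as a plain data frame and introduce a missing value.
orange_with_gap <- as.data.frame(head(Orange))
orange_with_gap$circumference[2] <- NA

# Write to a temporary file so the working directory is not cluttered.
out_file <- tempfile(fileext = ".csv")
write.csv(
  x = orange_with_gap,
  file = out_file,
  row.names = FALSE,  # do not write row numbers as a first column
  na = "",            # write missing values as empty fields instead of NA
  quote = FALSE       # do not wrap text values or column names in quotes
)

# Look at the raw text that was written.
csv_lines <- readLines(out_file)
csv_lines
```

Looking at the raw lines with readLines is a quick way to see what a given option actually changes in the file.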

write.csv will be very useful when you want to clean up or transform a dataset before analyzing it. If your dataset is large and/or the preprocessing operations you perform are expensive, you don’t want to repeat those operations every time you load your dataset. Instead, write a script to process the dataset once and save the processed result.
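A minimal sketch of that workflow is shown here, split into two hypothetical scripts. The script names, the age_years column, and the output location (a temporary directory) are all assumptions made for illustration; in a real project you would choose your own file names and processing steps.

```r
# --- preprocess.R (hypothetical script; run once) ---
processed <- as.data.frame(Orange)

# Stand-in for an expensive processing step: Orange records age in days,
# so add a column with the approximate age in years.
processed$age_years <- processed$age / 365.25

processed_file <- file.path(tempdir(), "orange_processed.csv")
write.csv(x = processed, file = processed_file, row.names = FALSE)

# --- analysis.R (hypothetical script; run as often as needed) ---
# Load the already-processed data instead of redoing the work above.
orange_processed <- read.csv(file = processed_file)
head(orange_processed)
```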

6.2 Loading data

Loading data from a csv file is straightforward. You’ll use the read.csv function.

pokemon_df <- read.csv(file="lecture-material/week-02/data/pokemon.csv")
head(pokemon_df)
##                     abilities against_bug against_dark against_dragon
## 1 ['Overgrow', 'Chlorophyll']        1.00            1              1
## 2 ['Overgrow', 'Chlorophyll']        1.00            1              1
## 3 ['Overgrow', 'Chlorophyll']        1.00            1              1
## 4    ['Blaze', 'Solar Power']        0.50            1              1
## 5    ['Blaze', 'Solar Power']        0.50            1              1
## 6    ['Blaze', 'Solar Power']        0.25            1              1
##   against_electric against_fairy against_fight against_fire against_flying
## 1              0.5           0.5           0.5          2.0              2
## 2              0.5           0.5           0.5          2.0              2
## 3              0.5           0.5           0.5          2.0              2
## 4              1.0           0.5           1.0          0.5              1
## 5              1.0           0.5           1.0          0.5              1
## 6              2.0           0.5           0.5          0.5              1
##   against_ghost against_grass against_ground against_ice against_normal
## 1             1          0.25              1         2.0              1
## 2             1          0.25              1         2.0              1
## 3             1          0.25              1         2.0              1
## 4             1          0.50              2         0.5              1
## 5             1          0.50              2         0.5              1
## 6             1          0.25              0         1.0              1
##   against_poison against_psychic against_rock against_steel against_water
## 1              1               2            1           1.0           0.5
## 2              1               2            1           1.0           0.5
## 3              1               2            1           1.0           0.5
## 4              1               1            2           0.5           2.0
## 5              1               1            2           0.5           2.0
## 6              1               1            4           0.5           2.0
##   attack base_egg_steps base_happiness base_total capture_rate  classfication
## 1     49           5120             70        318           45   Seed Pokémon
## 2     62           5120             70        405           45   Seed Pokémon
## 3    100           5120             70        625           45   Seed Pokémon
## 4     52           5120             70        309           45 Lizard Pokémon
## 5     64           5120             70        405           45  Flame Pokémon
## 6    104           5120             70        634           45  Flame Pokémon
##   defense experience_growth height_m hp         japanese_name       name
## 1      49           1059860      0.7 45 Fushigidaneフシギダネ  Bulbasaur
## 2      63           1059860      1.0 60  Fushigisouフシギソウ    Ivysaur
## 3     123           1059860      2.0 80 Fushigibanaフシギバナ   Venusaur
## 4      43           1059860      0.6 39      Hitokageヒトカゲ Charmander
## 5      58           1059860      1.1 58       Lizardoリザード Charmeleon
## 6      78           1059860      1.7 78    Lizardonリザードン  Charizard
##   percentage_male pokedex_number sp_attack sp_defense speed type1  type2
## 1            88.1              1        65         65    45 grass poison
## 2            88.1              2        80         80    60 grass poison
## 3            88.1              3       122        120    80 grass poison
## 4            88.1              4        60         50    65  fire       
## 5            88.1              5        80         65    80  fire       
## 6            88.1              6       159        115   100  fire flying
##   weight_kg generation is_legendary
## 1       6.9          1            0
## 2      13.0          1            0
## 3     100.0          1            0
## 4       8.5          1            0
## 5      19.0          1            0
## 6      90.5          1            0
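After loading a file, it is often worth checking that the dataframe has the shape and columns you expect. Since pokemon.csv may not be on your computer, the sketch below uses the built-in Orange dataset and a temporary file as a stand-in; the names tmp_file and orange_df are assumptions made for this example.

```r
# Write a built-in dataset to a temporary csv file, then read it back.
tmp_file <- tempfile(fileext = ".csv")
write.csv(x = Orange, file = tmp_file, row.names = FALSE)
orange_df <- read.csv(file = tmp_file)

# Basic checks on what was loaded.
dim(orange_df)    # number of rows and columns
names(orange_df)  # column names
str(orange_df)    # column types as interpreted by read.csv
```

Note that a csv file stores only text, so column types can differ from the original object. For example, Tree is an ordered factor in Orange, but read.csv does not restore it as one.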

One slightly tricky thing to keep in mind about loading data is where R will look for the data you’re loading. This affects how you specify the file argument when you call read.csv. You can always specify the complete file path, for example (on my computer):

pokemon_df <- read.csv(file="/Users/lalejina/class_ws/CIS635-f22/gvsu-cis635-2022f/lecture-material/week-02/data/pokemon.csv")
head(pokemon_df)

Often, it’ll be easier to specify a relative path. In RStudio, when you do not specify a complete path, the path is interpreted relative to your current working directory. Run getwd() to see what your current working directory is set to. Use setwd() to change your working directory. Read more about project management and R working directories here: https://r4ds.had.co.nz/workflow-projects.html
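To make the relationship between the working directory and relative paths concrete, here is a small sketch. It builds a throwaway folder inside R's temporary directory to stand in for a project; the folder name wd_demo and its data subfolder are hypothetical.

```r
# Remember the current working directory so it can be restored later.
old_wd <- getwd()

# Create a stand-in project folder with a data subfolder and a csv file.
demo_dir <- file.path(tempdir(), "wd_demo")
dir.create(file.path(demo_dir, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(x = Orange,
          file = file.path(demo_dir, "data", "orange_trees.csv"),
          row.names = FALSE)

# After setwd, relative paths are resolved starting from demo_dir.
setwd(demo_dir)
orange_df <- read.csv(file = "data/orange_trees.csv")

# Restore the original working directory.
setwd(old_wd)
```

Building paths with file.path, as done here, avoids typing path separators by hand.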