To make this chunk run, you will need to point the file path in the right direction.
What do we see here? Things look okay, but we usually don’t think of days of the week and months as numbers. We want to name them!
head(births_combined)
## # A tibble: 6 x 5
## year month date_of_month day_of_week births
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1994 1 1 6 8096
## 2 1994 1 2 7 7772
## 3 1994 1 3 1 10142
## 4 1994 1 4 2 11248
## 5 1994 1 5 3 11053
## 6 1994 1 6 4 11406
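There are three things we are doing here:
1. Making new versions of month and day_of_week that are ordered factors. “Factors” just means they are categorical variables, and “ordered” means that one comes before another (we wouldn’t want them in alphabetical order, for example).
2. Making date_of_month_categorical, a factor version of date_of_month.
3. Adding a weekend column that says whether each day falls on a weekend.
# The c() function lets us combine values into a vector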
month_names <- c("January", "February", "March", "April", "May", "June", "July",
"August", "September", "October", "November", "December")
day_names <- c("Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday", "Sunday")
births <- births_combined %>%
# Make month an ordered factor, using the month_names vector as labels
mutate(month = factor(month, labels = month_names, ordered = TRUE)) %>%
mutate(day_of_week = factor(day_of_week, labels = day_names, ordered = TRUE),
date_of_month_categorical = factor(date_of_month)) %>%
# Add a column indicating if the day is on a weekend
mutate(weekend = ifelse(day_of_week %in% c("Saturday", "Sunday"), TRUE, FALSE))
head(births)
## # A tibble: 6 x 7
## year month date_of_month day_of_week births date_of_month_categori… weekend
## <dbl> <ord> <dbl> <ord> <dbl> <fct> <lgl>
## 1 1994 January 1 Saturday 8096 1 TRUE
## 2 1994 January 2 Sunday 7772 2 TRUE
## 3 1994 January 3 Monday 10142 3 FALSE
## 4 1994 January 4 Tuesday 11248 4 FALSE
## 5 1994 January 5 Wednesday 11053 5 FALSE
## 6 1994 January 6 Thursday 11406 6 FALSE
If you look at the data now, you can see that the columns have changed and have different types. year and date_of_month are still numbers, but month and day_of_week are ordered factors (ord) and date_of_month_categorical is a regular factor (fct). Technically it is also ordered, but because factor() sorts numeric values before turning them into levels (2 comes after 1, and 10 comes after 9), we don’t need to force it to be in the right order.
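As a small illustration of how these factors behave, here is a sketch using made-up vectors (toy_codes, toy_days and toy_dates are hypothetical names, not part of the births data). It assumes every code from 1 to 7 appears at least once, because labels are assigned to the sorted unique values.

```r
# Toy day-of-week codes (1 = Monday ... 7 = Sunday), not the real data
day_names <- c("Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday", "Sunday")
toy_codes <- c(6, 7, 1, 2, 3, 4, 5)

# Labels are matched to the sorted unique codes, so 1 becomes "Monday"
toy_days <- factor(toy_codes, labels = day_names, ordered = TRUE)
levels(toy_days)           # Monday first, Sunday last
toy_days[1] > toy_days[3]  # TRUE: Saturday comes after Monday

# A plain factor built from numbers keeps numeric order for its levels
toy_dates <- factor(c(10, 2, 1))
levels(toy_dates)          # "1" "2" "10"
```

Because the levels already come out in numeric order, date_of_month_categorical does not need ordered = TRUE to plot in the right sequence.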
Our births data is now clean and ready to go!
See what happens when you delete and replace little bits of code:
- What happens if you delete guides(fill = FALSE)? (Make sure you remove the + before it as well.)
- What happens if you delete fill = day_of_week? (Make sure you don’t delete any parentheses.)
- What happens if you swap the x and y variables? (Note that x can come before or after y inside aes(); it doesn’t matter which is listed first.)
total_births_weekday <- births %>%
group_by(day_of_week) %>%
summarize(total = sum(births))
ggplot(data = total_births_weekday,
mapping = aes(x = day_of_week, y = total, fill = day_of_week)) +
geom_col() +
# Turn off the fill legend because it's redundant
guides(fill = FALSE)
total_births_weekday <- births %>%
group_by(day_of_week) %>%
summarize(total = sum(births)) %>%
mutate(weekend = ifelse(day_of_week %in% c("Saturday", "Sunday"), TRUE, FALSE))
ggplot(data = total_births_weekday,
mapping = aes(x = day_of_week, y = total, fill = weekend)) +
geom_col()
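A side note on the weekend column: %in% already returns TRUE or FALSE, so wrapping it in ifelse() is optional. A minimal sketch with a made-up vector (toy_days is a hypothetical name):

```r
toy_days <- c("Friday", "Saturday", "Sunday", "Monday")

# The form used in this document
with_ifelse <- ifelse(toy_days %in% c("Saturday", "Sunday"), TRUE, FALSE)
# The shorter form: %in% already gives a logical vector
without_ifelse <- toy_days %in% c("Saturday", "Sunday")

with_ifelse                             # FALSE TRUE TRUE FALSE
identical(with_ifelse, without_ifelse)  # TRUE
```

Either form works inside mutate(); the longer one just spells out what each outcome means.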
This graph makes three modifications:
- scale_fill_manual() adds specific colors
- scale_y_continuous(labels = comma) gets away from scientific notation
- labs() adds some labels
Currently they are commented out. Add them back in. Note that you need to connect them with a + in order to make it work, and the + has to be at the end of the line.
ggplot(data = total_births_weekday,
mapping = aes(x = day_of_week, y = total, fill = weekend)) +
geom_col() +
# Use grey and orange
#scale_fill_manual(values = c("grey70", "#f2ad22")) +
# Use commas instead of scientific notation
#scale_y_continuous(labels = comma) +
# Turn off the legend since the title shows what the orange is
guides(fill = FALSE)
#labs(title = "Weekends are unpopular times for giving birth",
# x = NULL, y = "Total births")
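The comma passed to scale_y_continuous(labels = comma) is a function. The sketch below assumes it comes from the scales package, which needs to be loaded with library(scales) if an earlier chunk has not already done so. The numbers are made up.

```r
library(scales)

# comma() turns numbers into strings with thousands separators
comma(c(1500, 1234567))  # "1,500" "1,234,567"

# Without it, large values may be printed in scientific notation
format(1e8)              # "1e+08"
```

When given as labels = comma, ggplot2 calls this function on the axis breaks to build the tick labels.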
Here, we’ve brought in a new geom: geom_pointrange. You can make this look even better if you add options to increase the size of the dots and the width of the lines. Add the options fatten = 5, size = 1.5 after the comma: geom_pointrange(aes(ymin = 0, ymax = total), PUT HERE). Feel free to play with the exact sizes.
ggplot(data = total_births_weekday,
mapping = aes(x = day_of_week, y = total, color = weekend)) +
geom_pointrange(aes(ymin = 0, ymax = total), )+
# Make the lines a little thicker and the dots a little bigger
# fatten = 5, size = 1.5) +
# Use grey and orange
scale_color_manual(values = c("grey70", "#f2ad22")) +
# Use commas instead of scientific notation
scale_y_continuous(labels = comma) +
# Turn off the legend since the title shows what the orange is
guides(color = FALSE) +
labs(title = "Weekends are unpopular times for giving birth",
x = NULL, y = "Total births")
Now, let’s make a cool strip plot. See what the jitter is doing by removing position = position_jitter(height = 0).
ggplot(data = births,
mapping = aes(x = day_of_week, y = births, color = weekend)) +
scale_color_manual(values = c("grey70", "#f2ad22")) +
geom_point(size = 0.5, position = position_jitter(height = 0)) +
guides(color = FALSE)
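To see what height = 0 does without running the full plot, here is a sketch on made-up data (toy and drawn are hypothetical names). It uses layer_data() to inspect the positions ggplot2 would draw, and the seed argument only makes the random nudges repeatable.

```r
library(ggplot2)

# Made-up data: every point shares the same x category
toy <- data.frame(day = "Monday", births = c(100, 100, 105, 110, 110))

p <- ggplot(toy, aes(x = day, y = births)) +
  # height = 0 leaves y alone; points are only nudged left and right
  geom_point(position = position_jitter(height = 0, seed = 1))

# layer_data() returns the positions that will actually be drawn
drawn <- layer_data(p)
drawn$x  # values scattered around 1 (the position of "Monday")
drawn$y  # still 100 100 105 110 110
```

The birth counts stay exact, and only the horizontal position is randomized so that overlapping points become visible.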
To make a pretty version of a violin plot (“bee swarm”), you will need to install the package ggbeeswarm (see the commented-out line in the code). Once you have installed it, comment that line back out.
#install.packages("ggbeeswarm")
library(ggbeeswarm)
ggplot(data = births,
mapping = aes(x = day_of_week, y = births, color = weekend)) +
scale_color_manual(values = c("grey70", "#f2ad22")) +
# Make these points suuuper tiny
geom_quasirandom(size = 0.0001) +
guides(color = FALSE)
Now, let’s make a heat map!
Look closely at what we’re doing:
1. Generating a data frame that summarizes average births per day over the entire period
2. Plotting average births as the fill using geom_tile
avg_births_month_day <- births %>%
group_by(month, date_of_month_categorical) %>%
summarize(avg_births = mean(births))
## `summarise()` has grouped output by 'month'. You can override using the `.groups` argument.
ggplot(data = avg_births_month_day,
# By default, the y-axis will have December at the top, so use fct_rev() to reverse it
mapping = aes(x = date_of_month_categorical, y = fct_rev(month), fill = avg_births)) +
geom_tile() +
# Add nice labels
labs(x = "Day of the month", y = NULL,
title = "Average births per day",
subtitle = "1994-2014",
fill = "Average births") +
# Force all the tiles to have equal widths and heights
coord_equal() +
# Use a cleaner theme
theme_minimal()
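Two details of the heat map code can be tried on a tiny made-up table (toy and toy_avg are hypothetical names, not the real births data). The sketch below assumes dplyr 1.0 or later, where summarize() accepts a .groups argument. Setting .groups = "drop" returns an ungrouped result and silences the message shown above, and fct_rev() from forcats reverses the level order.

```r
library(dplyr)
library(forcats)

# Made-up daily counts for two months
toy <- tibble(
  month  = factor(c("January", "January", "February", "February"),
                  levels = c("January", "February"), ordered = TRUE),
  day    = factor(c(1, 1, 1, 2)),
  births = c(100, 200, 300, 400)
)

toy_avg <- toy %>%
  group_by(month, day) %>%
  # .groups = "drop" removes all grouping from the result
  summarize(avg_births = mean(births), .groups = "drop")

toy_avg$avg_births              # 150 300 400
levels(toy_avg$month)           # "January" "February"
levels(fct_rev(toy_avg$month))  # "February" "January"
```

ggplot2 places the first level at the bottom of a discrete y-axis, which is why reversing the levels puts January at the top.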
Remember the labs command we used in the heatmap? Add a set of labels to the unlabelled graphs that include the following:
Nice that we don’t have to recreate the entire graph again, right? We can just edit within what we already have.
# You'll want to use this form
## births %>% then filter(STUFF) %>% then start your ggplot commands. You don't need to do any additional data cleaning
## For your ggplot, you'll want to use geom_histogram()
What happens if we add a “fill” to our histogram? Create a new graph that includes fill = day_of_week in your aes mapping. What is being plotted now?
(If you have time) What if we want to make a bar chart of the total number of births per year? You can do this in a way similar to the barplot chunk above, but you’ll want to group by year.
births_total <- births %>%
group_by(year) %>%
summarize(total = sum(births))
# Now, add a bar chart
Now try replacing geom_col with geom_line! Wow, that is a lot clearer!