Chapters 5.9 and 5.10, Economy, Society, and Public Policy
Visit the Data Access Research Tool (DART) from the LIS. Note how you can visualize inequality data based on income or wealth across time for multiple countries!
On the main DART page, select “Income Data” and “Trends.” You’ll see a line graph with lots of options! Make sure that you are looking at “Equivalised Disposable Household Income,” “Gini coefficient,” and “Trends.”
Adjust the country selection to include the United States plus 4-5 additional countries of your choosing.
Do you see the export button at the bottom right-hand corner of the page? Let’s export the data! Export it as a table, which saves as an .xlsx file. Save that file (or move it) to your working directory for this document.
Now, it’s time to import!
library(tidyverse)  # %>%, pivot_longer(), type_convert(), ggplot()
library(readxl)     # read_excel()
# Replace with your file name
gini <- read_excel("dart-table_gini_1617151774650.xlsx")
# Take a look at what this does. We are calling the `clean_names()` function from the janitor package. Notice how it adds an "x" to the variable names for our years. That's because R doesn't allow ordinary variable names that start with a number, and names like that make things hard later on.
gini <- janitor::clean_names(gini)
gini
## # A tibble: 5 x 42
## countries x1978 x1979 x1980 x1981 x1982 x1983 x1984 x1985 x1986 x1987
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Australia NA 0.282 0.282 0.282 0.282 0.293 0.293 0.293 0.293 0.304
## 2 Brazil NA NA NA NA NA NA NA NA NA NA
## 3 Chile NA NA NA NA NA NA NA NA NA NA
## 4 Peru NA NA NA NA NA NA NA NA NA NA
## 5 United S… 0.31 0.31 0.31 0.31 NA NA 0.34 0.34 0.34 0.34
## # … with 31 more variables: x1988 <dbl>, x1989 <dbl>, x1990 <dbl>, x1991 <dbl>,
## # x1992 <dbl>, x1993 <dbl>, x1994 <dbl>, x1995 <dbl>, x1996 <dbl>,
## # x1997 <dbl>, x1998 <dbl>, x1999 <dbl>, x2000 <dbl>, x2001 <dbl>,
## # x2002 <dbl>, x2003 <dbl>, x2004 <dbl>, x2005 <dbl>, x2006 <dbl>,
## # x2007 <dbl>, x2008 <dbl>, x2009 <dbl>, x2010 <dbl>, x2011 <dbl>,
## # x2012 <dbl>, x2013 <dbl>, x2014 <dbl>, x2015 <dbl>, x2016 <dbl>,
## # x2017 <dbl>, x2018 <dbl>
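If you're curious why the names change, here is a minimal sketch on a made-up two-row data frame (the values are placeholders, not LIS data). Base R's `make.names()` applies a similar rule to names that start with a digit: it prepends a letter. Base R uses an uppercase "X", while `clean_names()` gives the lowercase "x" you see above.

```r
# Toy data frame mimicking the raw DART export, where the year columns
# are named with bare numbers. The values are made up for illustration.
raw <- data.frame(Countries = c("A", "B"), `1978` = c(0.30, 0.40),
                  check.names = FALSE)
names(raw)  # "Countries" "1978"

# A name that starts with a digit is not syntactic, so R needs backticks
raw$`1978`

# make.names() makes each name syntactic; here it prepends an "X"
fixed <- make.names(names(raw))
fixed  # "Countries" "X1978"
```

After cleaning, you can write `gini$x1978` instead of wrapping every year in backticks.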
Wait, how do we plot this? Will this work? (Don’t fix it; just note that this isn’t good.)
gini %>%
ggplot(mapping = aes(x = countries, y = x1978)) +
geom_line()
## Warning: Removed 4 row(s) containing missing values (geom_path).
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
We could call this data “wide.” That’s because each year is spread across columns rather than rows. It looks wide! Wide data has its uses. For example, it’s handy for computing the average Gini coefficient by country:
gini <- gini %>% rowwise() %>%
mutate(average_gini = mean(c_across(x1978:x2018), na.rm = TRUE))
gini$average_gini
## [1] 0.3124444 0.4813333 0.4898387 0.4905882 0.3620256
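`rowwise()` works, but it leaves the data grouped row by row. Here is a minimal sketch on hypothetical toy data (made-up values) showing that `ungroup()` removes that grouping, and that base R's `rowMeans()` gives the same averages without `rowwise()`.

```r
library(dplyr)

# Hypothetical toy data in the same wide shape as `gini` (values are made up)
toy <- tibble(countries = c("A", "B"),
              x1978 = c(0.30, NA),
              x1979 = c(0.32, 0.40),
              x1980 = c(NA, 0.44))

# The rowwise() + c_across() approach used above; ungroup() drops the
# row-wise grouping so later verbs work on whole columns again
toy_rw <- toy %>%
  rowwise() %>%
  mutate(average_gini = mean(c_across(x1978:x1980), na.rm = TRUE)) %>%
  ungroup()

# A vectorised alternative: rowMeans() over the selected year columns
toy_rm <- toy
toy_rm$average_gini <- rowMeans(select(toy, x1978:x1980), na.rm = TRUE)

toy_rw$average_gini
toy_rm$average_gini
```

Both versions skip the NA years (`na.rm = TRUE`), so each country's average uses only the years that have data.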
What we want is one observation per country-year pair. That data would be considered “long” because it looks, well, long. There are a lot more rows this way! We can do this using the `pivot_longer()` function from the tidyr package.
# Reshape from wide to long: one row per country-year
gini_long <- gini %>% pivot_longer(x1978:x2018, names_to = "year", names_prefix = "x", values_to = "gini")
# Year comes in as a string variable, so let's make it a number
gini_long <- type_convert(gini_long)
gini_long
## # A tibble: 205 x 4
## countries average_gini year gini
## <chr> <dbl> <dbl> <dbl>
## 1 Australia 0.312 1978 NA
## 2 Australia 0.312 1979 0.282
## 3 Australia 0.312 1980 0.282
## 4 Australia 0.312 1981 0.282
## 5 Australia 0.312 1982 0.282
## 6 Australia 0.312 1983 0.293
## 7 Australia 0.312 1984 0.293
## 8 Australia 0.312 1985 0.293
## 9 Australia 0.312 1986 0.293
## 10 Australia 0.312 1987 0.304
## # … with 195 more rows
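`type_convert()` guesses a type for every column. If you'd rather convert only the year, `pivot_longer()` also accepts a `names_transform` argument. A minimal sketch on hypothetical toy data (made-up values):

```r
library(dplyr)
library(tidyr)

# Hypothetical wide toy data (made-up values), shaped like `gini`
toy <- tibble(countries = c("A", "B"),
              x1978 = c(0.30, NA),
              x1979 = c(0.32, 0.40),
              x1980 = c(NA, 0.44))

# Reshape wide -> long: one row per country-year pair.
# names_prefix strips the "x"; names_transform turns year into an integer
toy_long <- toy %>%
  pivot_longer(x1978:x1980,
               names_to = "year",
               names_prefix = "x",
               names_transform = list(year = as.integer),
               values_to = "gini")

toy_long
```

Two countries times three years gives six rows, which is the same arithmetic behind the 205 rows above (5 countries times 41 years).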
Now, let’s make a line chart:
ggplot(data = gini_long, mapping = aes(x = year, y = gini, color = countries)) +
geom_line()
Why are there gaps? Because of the NA values. When a value is missing (labeled NA), R has nothing to plot for that year, so the line breaks there.
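To see where the gaps come from, you can count the missing values for each country. A minimal sketch on hypothetical long-format toy data (made-up values, shaped like `gini_long`):

```r
library(dplyr)

# Hypothetical long toy data (made-up values) shaped like `gini_long`;
# country A is missing 1979, so its line would have a gap there
toy_long <- tibble(countries = rep(c("A", "B"), each = 3),
                   year = rep(1978:1980, times = 2),
                   gini = c(0.30, NA, 0.34, 0.40, 0.42, 0.44))

# Count the country-years with no Gini value; each NA is a year
# that geom_line() has nothing to draw for
missing_by_country <- toy_long %>%
  group_by(countries) %>%
  summarise(n_missing = sum(is.na(gini)), n_years = n())

missing_by_country
```

Running the same pipeline on `gini_long` shows which countries in your chart will have the most gaps.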
In the next chart, I tell R to ignore the gaps. That is, for our data set, use the gini_long frame, but only include rows where the gini column is not equal to NA.
You can read it this way:
- `gini_long`: the data frame, just as in `ggplot(data = gini_long, ...)`
- `!is.na`: “not” equal to NA (`!` means “not”)
- `gini`: the column we are checking, which we indicate with `gini_long$gini`
Uniting these three concepts gives us this: `ggplot(data = gini_long[!is.na(gini_long$gini),], ...)`. The comma inside the square brackets separates rows from columns; leaving the column slot empty keeps every column.
ggplot(data = gini_long[!is.na(gini_long$gini),], mapping = aes(x = year, y = gini, color=countries)) +
geom_line()
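The square-bracket version is base R. As a sketch on hypothetical toy data (made-up values), here are three ways to keep only the non-missing rows that give the same result. `filter()` is the form used in the next chart, and `drop_na()` comes from tidyr.

```r
library(dplyr)
library(tidyr)

# Hypothetical long toy data (made-up values) with one missing year
toy_long <- tibble(countries = rep(c("A", "B"), each = 3),
                   year = rep(1978:1980, times = 2),
                   gini = c(0.30, NA, 0.34, 0.40, 0.42, 0.44))

# 1. Base R indexing [rows, columns]; the empty column slot keeps all columns
kept_base <- toy_long[!is.na(toy_long$gini), ]

# 2. dplyr::filter(): inside filter(), refer to the column by its bare name
kept_filter <- filter(toy_long, !is.na(gini))

# 3. tidyr::drop_na(): drop rows with NA in the listed column(s)
kept_drop <- drop_na(toy_long, gini)

kept_filter
```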
This isn’t best practice, because the reader cannot tell which parts of the line connect real observations and which parts bridge missing years. What’s a better strategy? Look below: I’ve drawn two different lines. The first is the one with the gaps, and the second excludes the NA values. I’ve changed the line type of the second one to dotted, so we can tell them apart. Neat!
ggplot(data = gini_long, mapping = aes(x = year, y = gini, color = countries)) +
geom_line() +  # solid line, broken at missing years
geom_line(data = filter(gini_long, !is.na(gini)), linetype = "dotted")  # dotted line bridging the gaps
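One optional extension, offered as a sketch rather than a requirement: add a point for every year that actually has data, so readers can see exactly where the observations are. The toy data below is hypothetical (made-up values).

```r
library(dplyr)
library(ggplot2)

# Hypothetical long toy data (made-up values); country A is missing 1979
toy_long <- tibble(countries = rep(c("A", "B"), each = 3),
                   year = rep(1978:1980, times = 2),
                   gini = c(0.30, NA, 0.34, 0.40, 0.42, 0.44))
observed <- filter(toy_long, !is.na(gini))

p <- ggplot(data = toy_long, mapping = aes(x = year, y = gini, color = countries)) +
  geom_line() +                                      # solid line, gaps at NAs
  geom_line(data = observed, linetype = "dotted") +  # dotted line bridging gaps
  geom_point(data = observed)                        # one dot per real observation
p
```

With points added, a dotted segment with no dot in the middle signals a year that was filled in visually rather than measured.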
Ok, so that was neat. Now, what is your job? Actually, it’s not too bad!
gini_plot
What is the Gini coefficient for the most recent year of data in the United States? In words, what does it mean?
Based on this chart, how does income inequality in the United States compare with the other countries you included, both in the most recent year and over time? Are these patterns similar or different when you consider the 90/10 percentile ratio instead? (You can check using DART; no need to download.)
For this exercise, you plotted the Gini coefficient based on disposable income. Why is that important, and based on the readings, what would you expect to be different if you used total income instead?
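For the first question, you need the latest year that actually has a value, which may not be the last year column. A minimal sketch on hypothetical toy data (made-up values); check the exact spelling of the country label in your own `countries` column before filtering.

```r
library(dplyr)

# Hypothetical toy data (made-up values); the U.S. value for 2018 is missing
toy_long <- tibble(countries = rep(c("United States", "Other"), each = 3),
                   year = rep(2016:2018, times = 2),
                   gini = c(0.38, 0.39, NA, 0.30, 0.31, 0.32))

# Keep the U.S. rows that have data, then take the latest year
latest_us <- toy_long %>%
  filter(countries == "United States", !is.na(gini)) %>%
  slice_max(year, n = 1)

latest_us
```

The code only finds the number; interpreting what it means is still up to you.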
Knit and upload!
# You can also do this directly with annotations, but why not use a package!
# install.packages("directlabels")
library(directlabels)
ggplot(data = gini_long, mapping = aes(x = year, y = gini, color=countries)) +
geom_line() +
geom_line(data = filter(gini_long, !is.na(gini)), linetype = "dotted") +
geom_dl(aes(label = countries), method = list("last.bumpup", cex = 0.5)) +
scale_x_continuous(limits = c(1975, 2020)) +
theme_minimal() +
theme(legend.position ="none")
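For comparison, here is a rough sketch of the "directly with annotations" route, on hypothetical toy data (made-up values): find each country's last non-missing point, then label it with `geom_text()`. Unlike `"last.bumpup"`, this simple version does not move labels apart when they overlap.

```r
library(dplyr)
library(ggplot2)

# Hypothetical long toy data (made-up values); A's last year is missing
toy_long <- tibble(countries = rep(c("A", "B"), each = 3),
                   year = rep(1978:1980, times = 2),
                   gini = c(0.30, 0.32, NA, 0.40, 0.42, 0.44))

# One label per country, placed at its last observed (non-NA) year
end_labels <- toy_long %>%
  filter(!is.na(gini)) %>%
  group_by(countries) %>%
  slice_max(year, n = 1) %>%
  ungroup()

p <- ggplot(toy_long, aes(x = year, y = gini, color = countries)) +
  geom_line() +
  geom_text(data = end_labels, aes(label = countries),
            hjust = 0, nudge_x = 0.1, size = 3) +  # left-aligned, just right of the point
  theme_minimal() +
  theme(legend.position = "none")
p
```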