Set-up

This block will install the gapminder package and load it. If this is your first time, you will need to "uncomment" the install.packages line in order to install it. You will want to comment it back out or delete it once it's installed, or it can create problems when knitting

Now that we've loaded the library, we have access to a data frame. Take a look at the data set by running this code. You can also type View(gapminder) into the console to see all the data, which will show up in a new window.

Note that you can use head() and tail() to show the first and last six rows of the data as well. What happens when you run head(gapminder,20)?

Selecting with select()

We can select specific columns or a range of columns using select()

What does this do?

Write two different ways to select country, continent, year, and gdpPercap

Filtering with filter()

Recall that we can use filter() restrict our data according to criteria. We can display it as is, or we can assign it to an object if we want to use that restricted data later.

You can also use multiple conditions, and these will extract rows that meet every test. By default, if you separate the tests with a comma, R will consider this an “and” test and find rows that are both Denmark and greater than 2000.

What does this do?

Use filter() to do the following:

Arrange

We sort our data using arrange()

The first line sorts by year, then by country. What does the second line do? What does the third line do?

Use arrange() to do the followin:

Piping

You can nest commands inside other commands to combine them, but your life will be 100% better with... piping! (The keyboard shortcut is Shift-Command-M on a Mac, or Ctrl-Shift-M on a PC)

Use arrange() and filter() to do the following:

In general, you can also use help files, Google, and other resources to find new commands. I wanted to list countries that ever had life expectancies below 35.

This is okay, but there are lots of duplicates! I also don't want to pick just one year, because life expectancies aren't always increasing. What's one way to solve this if you don't know where to start?

  1. Googled what I wanted to do "list distinct observations" (it may take a few tries)
  2. Looked for a command that worked with frames.
  3. Ended up in the dplyr documentation (surprise!) with the distinct command.
  4. Reviewed briefly, but then jumped down to examples to make sure I knew how to apply it.

Basic plotting

Next week we'll do more plotting. For now, use what we know about filters to create a second plot that shows the distribution of per-capita GDP for 2007.

All done!

Knit me and upload me to BB. If you knit to an html file, you will need to compress it first.