Make 154 charts in 10 lines of code with R

March 10, 2022

Data visualization automation isn’t just a dream. This short tutorial shows you how to create and save 154 charts as high-resolution PNG image files with just 10 lines of R code.

Years ago, creating multiple charts for publication was painful. Wrangling the data into the right format in Excel or painstakingly adding manual configurations in InDesign was neither fast nor fun. And updating the graphics after any change in the dataset was equally, if not more, frustrating.

Thankfully, today's data professionals have many tools at their disposal to generate data visualization content at scale.

Step 1. Load required packages (2 lines of code)

We start by loading the tidyverse library, which gives us the data wrangling (via dplyr) and data visualization (via ggplot2) functions to get things started. We also bring in the lesser-known hrbrthemes library to give us some opinionated design options.

library(tidyverse)
library(hrbrthemes)

Step 2. Read in our input data (1 line)

We'll use a CSV file that you can grab from GitHub and assign it to countries.

countries <- read_csv("input_data/countries.csv")

A quick look at the data reveals three columns that represent country name, year, and population in millions. So the dataset tells us how the total population (in millions) has changed around the world between 1991 and 2018. Here are the raw records for Albania.

country  year  population
Albania  1991  3.26679
Albania  1992  3.247039
Albania  1993  3.227287
Albania  1994  3.207536
Albania  1995  3.187784
Albania  1996  3.168033
Albania  1997  3.148281
Albania  1998  3.12853
Albania  1999  3.108778
Albania  2000  3.089027
Albania  2001  3.060173
Albania  2002  3.05101
Albania  2003  3.039616
Albania  2004  3.026939
Albania  2005  3.011487
Albania  2006  2.992547
Albania  2007  2.970017
Albania  2008  2.947314
Albania  2009  2.927519
Albania  2010  2.913021
Albania  2011  2.905195
Albania  2012  2.900401
Albania  2013  2.895092
Albania  2014  2.889104
Albania  2015  2.880703
Albania  2016  2.876101
Albania  2017  2.873457
Albania  2018  2.866376
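
To take that quick look yourself, something like the following sketch may help. It is only an illustration: toy_countries is a hypothetical stand-in built from the first three Albania records above, so the block runs without the CSV file. With the real data you would run the same calls on countries.

```r
library(dplyr)

# Stand-in for the real data: the first three Albania records shown above.
toy_countries <- tibble(
  country    = c("Albania", "Albania", "Albania"),
  year       = c(1991, 1992, 1993),
  population = c(3.26679, 3.247039, 3.227287)
)

glimpse(toy_countries)                             # column names and types
n_countries <- n_distinct(toy_countries$country)   # one chart per distinct country
year_span   <- range(toy_countries$year)           # first and last year covered
```

On the full dataset, n_distinct() gives the number of charts the loop in the next step will produce, which should be 154 if your file matches the one used here.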

Step 3. Write the for loop (7 lines)

We want to make one line chart for each country in the dataset. We start by putting unique(countries$country) in the loop definition so that the loop body, which calls two functions one after the other, runs once for each country.

for (target_country in unique(countries$country)) {
  # Build the line chart for the current country
  ggplot(countries %>% filter(country == target_country),
         aes(x = year, y = population)) +
    geom_line() + geom_point(color = 'blue') +
    labs(title = target_country, subtitle = 'Population in millions from 1991 to 2018',
         y = 'Population in millions', x = 'Year') + theme_ipsum()
  # Save the chart just built as <country>.png in the output_charts folder
  ggsave(filename = str_c(target_country, '.png'), path = 'output_charts') }

The first function, ggplot(), creates a simple line chart with a few labels and design options along the way. Adding + theme_ipsum() at the end cleans up the appearance nicely courtesy of the hrbrthemes library that we loaded above.

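After a given plot is created with ggplot(), we then use ggsave() to export each country's chart. Because no plot is passed to it explicitly, ggsave() saves the most recently created plot. Here we use the unique country name as the file name, and the .png extension tells ggsave() which output format to write.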
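
If you prefer not to rely on ggsave() picking up the last plot, one possible variation is to store each chart in a variable and pass it in explicitly. The sketch below is an illustration under stated assumptions, not the code used for this post: the toy data frame holds made-up values, the output folder lives in a temporary directory, theme_minimal() is swapped in for theme_ipsum() so that only ggplot2 is needed, and the width, height, and dpi values are arbitrary choices.

```r
library(ggplot2)

# Toy stand-in data; the population values are made up for illustration.
toy <- data.frame(
  country    = rep(c("Country A", "Country B"), each = 3),
  year       = rep(2016:2018, times = 2),
  population = c(1.0, 1.1, 1.2, 5.0, 4.9, 4.8)
)

# Hypothetical output folder inside the session's temporary directory.
out_dir <- file.path(tempdir(), "output_charts")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

for (target_country in unique(toy$country)) {
  # Keep the plot in a variable instead of relying on the last plot created.
  p <- ggplot(toy[toy$country == target_country, ],
              aes(x = year, y = population)) +
    geom_line() +
    geom_point(color = "blue") +
    labs(title = target_country, y = "Population in millions", x = "Year") +
    theme_minimal()  # swapped in for theme_ipsum() to avoid the extra package

  # Pass the plot explicitly and pin the size and resolution (assumed values).
  ggsave(filename = paste0(target_country, ".png"), plot = p, path = out_dir,
         width = 8, height = 5, dpi = 300)
}

saved <- list.files(out_dir, pattern = "\\.png$")
```

Creating the folder up front with dir.create() also avoids a failed save when the target directory does not exist yet.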

The chart for Albania, our example country, reveals a steep decline in population over the period.

Don’t like that the y-axis doesn’t start at 0 for every country? Simply add + ylim(0, NA) to the end of the ggplot() chain and it will fix things right up.
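
As a minimal sketch of that tweak, here it is on a single chart. The albania data frame is a stand-in built from the first three Albania records shown earlier, and the labels and theme are left out to keep the focus on the axis.

```r
library(ggplot2)

# Stand-in data: the first three Albania records from the raw data above.
albania <- data.frame(
  year       = c(1991, 1992, 1993),
  population = c(3.26679, 3.247039, 3.227287)
)

p <- ggplot(albania, aes(x = year, y = population)) +
  geom_line() +
  ylim(0, NA)  # lower limit fixed at 0; NA lets ggplot2 choose the upper limit
```

With the axis anchored at 0, the same decline looks much flatter, so pick whichever framing suits the story you are telling.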

Step 4. Bask in glory and use the beautiful visualizations

Now the fun part. Send your beautiful charts to your marketing, publications, or comms teams for them to share with the world. Just don’t tell them it took you only 10 lines of code to get it done!

An alternative approach

Finally, for the R purists out there who find traditional loops offensive, here is a functional approach using purrr. In terms of total processing time for this example, however, purrr was only one second faster on my desktop at creating all the saved images (19 seconds vs. 20 seconds). The code below is extended from the indispensable R for Data Science by Hadley Wickham and Garrett Grolemund.

plots <- countries %>% 
  split(.$country) %>% 
  # .$country holds one value per row, so take the first as the chart title
  map(~ggplot(., aes(x = year, y = population)) +
        geom_line() + geom_point(color='blue') +
        labs(title = .$country[1], subtitle = 'Population in millions from 1991 to 2018',
             y = 'Population in millions', x = 'Year') +
        theme_ipsum()
  )

paths <- str_c(names(plots), ".png")
pwalk(list(paths, plots), ggsave, path = "save_charts/output_charts")
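
Another way to write the functional version, sketched here as an assumption rather than taken from the book, is purrr's iwalk(), which hands each piece of the split data to a function together with its name. That makes the country name directly available for both the title and the file name. The toy data, the temporary output folder, and the image size below are all illustrative.

```r
library(ggplot2)
library(purrr)

# Toy stand-in data; the population values are made up for illustration.
toy <- data.frame(
  country    = rep(c("Country A", "Country B"), each = 3),
  year       = rep(2016:2018, times = 2),
  population = c(1.0, 1.1, 1.2, 5.0, 4.9, 4.8)
)

# Hypothetical output folder inside the session's temporary directory.
out_dir <- file.path(tempdir(), "purrr_charts")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# split() returns a list named by country; iwalk() passes each element and its name.
by_country <- split(toy, toy$country)
iwalk(by_country, function(df, name) {
  p <- ggplot(df, aes(x = year, y = population)) +
    geom_line() +
    geom_point(color = "blue") +
    labs(title = name, y = "Population in millions", x = "Year")
  ggsave(filename = paste0(name, ".png"), plot = p, path = out_dir,
         width = 8, height = 5)
})

saved_purrr <- list.files(out_dir, pattern = "\\.png$")
```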

