My first post about R packages


An R package, sometimes loosely called a library, contains a set of user-defined functions that are bundled together to add functionality to base R, i.e. the collection of R packages that comes with the installation of the R program. I built my very first R package, infoDecompuTE, because I had a collection of R functions that I constantly reused and that I also wanted to share with the public via the Comprehensive R Archive Network, or CRAN. But more importantly, it was for my PhD study.

I want to use my first post to share what I know about loading R packages, and about the complexity of loading many R packages at the same time in an R script or R Markdown document.

Note that the R program is not the same as RStudio; RStudio is an integrated development environment (IDE) for the R program. Very simply, RStudio is like Notepad++ (which is what I used before RStudio): it lets you write R code more efficiently and also enables you to do much more, e.g. building R packages, developing R Shiny applications and generating R Markdown documents, to name a few.

This post is not about how to build an R package, but you can find a very comprehensive book on that at this link, https://r-pkgs.org/.

There are two functions to load an R package, library() and require(). Suppose you want to load the shiny R package; you can type either of the following into your R console,

library(shiny)

or

require(shiny)

By default, library() returns (invisibly) the list of attached packages; invisibly means it will not print anything to the console. In contrast, require() returns (invisibly) a logical, i.e. TRUE or FALSE, indicating whether the required package is available. I find the latter better for programming because we can write code that automatically installs missing R packages, like below,

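if (!require(shiny)) {
  install.packages("shiny")  # install only when the package is missing
  library(shiny)             # then attach the freshly installed package
}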

Thus, the shiny R package will only be installed if you have not installed it previously. One important thing to note from the example above: the package name must be in quotation marks when installing, whereas it does not need to be quoted when loading. Documentation is always your best friend, e.g. ?install.packages.
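To see the two return values for yourself, here is a minimal sketch. It uses the tools package, which ships with R, so nothing needs installing, and a made-up package name, notARealPackage123, which I am assuming does not exist on your machine. It also shows character.only = TRUE, the documented argument that lets you pass the package name as a quoted string.

```r
# library() invisibly returns the names of the attached packages.
attached <- library(tools)
print(attached)

# require() invisibly returns TRUE when the package could be loaded ...
ok <- require(tools)

# ... and FALSE (with a warning, silenced here) when it could not.
# "notARealPackage123" is a hypothetical name used only for illustration.
missing_ok <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)

# library() on the same name stops with an error instead of returning FALSE.
lib_result <- tryCatch(
  library("notARealPackage123", character.only = TRUE),
  error = function(e) "error"
)

print(c(ok = ok, missing_ok = missing_ok))
print(lib_result)
```

This is why require() fits inside an if() condition, while library() is the safer choice when you want the script to stop as soon as a package is missing.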

Also, you only need to install a package once, but you need to load it every time you start a new R session.

If there is a collection of R packages that you need to load, and you want to ensure they have all been installed first, the example below should do the job for you,

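pkg <- c("shiny", "data.table", "ggplot2")
# Keep only the packages that are not installed yet
new_pkg <- pkg[!pkg %in% rownames(installed.packages())]
if (length(new_pkg) > 0) {
  install.packages(new_pkg)
}
# Load every package in the list
invisible(lapply(pkg, library, character.only = TRUE))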
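The same check can be wrapped in a small helper, which makes it easier to try out safely. This is only a sketch: missing_packages() is a name I made up, and its second argument exists purely so the function can be run against a fixed list instead of your real library.

```r
# Hypothetical helper: return the packages in `pkg` that are not installed.
# `installed` defaults to the package names found by installed.packages().
missing_packages <- function(pkg, installed = rownames(installed.packages())) {
  setdiff(pkg, installed)
}

# Try it against a made-up list of installed packages.
pkg <- c("shiny", "data.table", "ggplot2")
new_pkg <- missing_packages(pkg, installed = c("ggplot2", "stats"))
print(new_pkg)  # "shiny" "data.table"

# Against the real library, you would then install what is missing:
# install.packages(missing_packages(pkg))
```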

If you need to ensure that a specific version of an R package is installed, for example, that the version of the shiny R package is greater than or equal to 1.5.0, then,

# Reinstall shiny when the installed version is older than 1.5.0
if (packageVersion("shiny") < "1.5.0") {
  install.packages("shiny")
}
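One detail worth knowing: packageVersion() returns a version object, not a plain string, which is why the comparison with "1.5.0" is done number by number. The sketch below contrasts that with a plain string comparison, and wraps the check in a hypothetical helper, needs_install(), that also copes with a package that is not installed at all (packageVersion() throws an error in that case). The version numbers and the package name notARealPackage123 are made up for illustration.

```r
# As plain strings, "1.10.0" sorts before "1.5.0", which is wrong for versions.
string_cmp <- "1.10.0" < "1.5.0"

# As version objects, 1.10.0 is correctly treated as newer than 1.5.0.
version_cmp <- package_version("1.10.0") < package_version("1.5.0")

print(c(string_cmp = string_cmp, version_cmp = version_cmp))

# Hypothetical helper: TRUE when `pkg` is missing or older than `min_version`.
needs_install <- function(pkg, min_version) {
  installed_version <- tryCatch(packageVersion(pkg), error = function(e) NULL)
  is.null(installed_version) || installed_version < min_version
}

# stats ships with R, so it is always installed and newer than 0.0.1.
stats_needs_install <- needs_install("stats", "0.0.1")

# A package that is assumed not to exist is reported as needing installation.
fake_needs_install <- needs_install("notARealPackage123", "1.0.0")

print(c(stats = stats_needs_install, fake = fake_needs_install))
```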

Just a few parting notes for now on conflicts. If you load the dplyr R package, you will likely get the message below (if you have not loaded any other R packages),

> library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Or, if you load the tidyverse R package, you will get a slightly more nicely formatted message, which essentially means the same thing, below,

> library(tidyverse)
-- Attaching packages --------------------------- tidyverse 1.3.0 --
v ggplot2 3.3.2 v purrr 0.3.4
v tibble 3.0.4 v stringr 1.4.0
v tidyr 1.1.2 v forcats 0.5.0
v readr 1.4.0
-- Conflicts ------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()

These messages mean that if you use the filter() or lag() R functions, you will now get the versions from the dplyr R package, not those from the stats R package. You can still access the two functions from the stats R package using the double colon, i.e. ::, like the following,

stats::filter()
stats::lag()
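Masking is not special to dplyr; it happens whenever two objects that R can see share a name. Here is a minimal sketch that needs no extra packages: it defines its own filter() in the workspace (a made-up stand-in for dplyr::filter()) and checks that stats::filter() is still reachable.

```r
# A stand-in for a package function that masks stats::filter().
filter <- function(x, ...) "my filter"

# A plain call now finds our definition first.
masked <- filter(1:5)

# The double colon goes straight to the stats version,
# used here to compute a 3-point moving average.
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))

print(masked)
print(as.numeric(moving_avg))

# Remove the stand-in so that filter() means stats::filter() again.
rm(filter)
```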

Finally, three crucial points to finish this off,

  • Always examine these conflict messages to make sure you have not masked some important functions that you might need in your analysis. The best practice is not to keep piling up library() calls for every R package that you think you might need, like below,
library(tidyverse) 
library(dplyr)
library(tidyr)
library(shiny)
library(shinydashboard)
...

It is worth spending some time to see if you really need all these R packages. A bonus tip: loading just the tidyverse R package also loads the ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr and forcats R packages. You can find more information at this link, https://tidyverse.tidyverse.org/.

  • Always load the most important R package last, to avoid conflicts on your more essential R functions.
  • If you only need one or two R functions from an R package in one or two places of your R script or R Markdown document, then do not load the whole R package. You can use the double colon, ::, instead. For example, if you want to use the spread() R function from the tidyr R package, you can use tidyr::spread(), which explicitly calls that function (spread) from that package (tidyr).
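As a small illustration of the last point, here is a sketch that uses the tools package, which ships with R, so it runs without installing anything. requireNamespace() is the base R way to check that a package is available without attaching it; the file name is made up.

```r
# Check that the package is available without attaching it.
if (requireNamespace("tools", quietly = TRUE)) {
  # Call a single function explicitly with the double colon.
  ext <- tools::file_ext("analysis_report.Rmd")
  print(ext)  # "Rmd"
}
```

The same pattern works for tidyr::spread() or any other function from a package you have installed.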

That's all for now. Hope it helps and comments are welcome!!
