Definition

R programming language

What is the R programming language?

The R programming language is an open source scripting language for predictive analytics and data visualization. Written primarily in C, C++, Fortran and R, it's used for data analysis in numerous fields, including healthcare, manufacturing, bioinformatics and finance. Companies such as Accenture, Amazon, Ford Motor Company, Google and Novartis use R for statistics, research and application development.

The initial version of R was released in 1995 to allow academic statisticians and others with sophisticated programming skills to perform complex data statistical analysis and display the results in any of a multitude of visual graphics. The "R" name is derived from the first letter of the names of its two developers, Ross Ihaka and Robert Gentleman, who at the time were associated with the University of Auckland in Auckland, New Zealand.

The R programming language includes functions that support linear modeling, nonlinear modeling, classical statistics, classification and clustering. It remains popular in academic settings due to its features and the fact that it's free to download in source code form under the terms of the Free Software Foundation's GNU general public license. It compiles and runs on Unix platforms and other systems including Linux, Windows and macOS.

The appeal of the R language has gradually migrated from academia into business settings, as it offers a wide range of functionality and supports numerous statistical techniques.

The R software environment

The R language programming environment is built around a standard command-line interface. Users employ this to read data and load it into the workspace, specify commands and receive results. Commands can be anything from simple mathematical operators, including +, -, * and /, to more complicated functions that perform linear regressions and other advanced calculations.

Users can also write their own functions. The environment lets them combine individual operations, such as joining separate data files into a single document, pulling out a single variable and running a regression on the resulting data set, into a single function that can be used over and over.

Looping functions are also popular in the R programming environment. These functions let users repeatedly perform an action -- such as pulling out samples from a larger data set -- as many times as the user specifies.

R language pros and cons

The R programming language offers the following benefits:

  • Open source. The R programming language is free to download and has an active community of online users who can provide support.
  • Applicability. R is ideal for machine learning tasks including regression and classification.
  • Platform independent. The R programming language is universal and can run on Windows, macOS, Unix and Linux operating systems.
  • Maturity. Because R has been around for many years and has been popular throughout its existence, the language is mature.
  • Availability of add-on packages. Users can download add-on packages that enhance the basic functionality of the language. These packages let users visualize data, connect to external databases, map data geographically and perform advanced statistical functions. A popular integrated development environment called RStudio simplifies coding in the R language.

Although the R programming language offers users many benefits, it also has some downsides:

  • Speed. The R language has been criticized for slow delivery of analyses when applied to large data sets. This is because the language uses single-threaded processing, which means the basic open source version can only use one central processing unit (CPU) at a time. By comparison, modern big data analytics thrives on parallel data processing, simultaneously using dozens of CPUs across a cluster of servers to process large data volumes quickly.
  • Memory management. Because the R programming environment is an in-memory application, all data objects are stored in a machine's RAM during a given session. This can limit the amount of data R can work with at one time.
  • Security. As an open source language, R lacks basic security features, creating restrictions when working with sensitive data or embedding web applications.
  • Limited user interfaces. Even though R provides RStudio, users might need more comprehensive graphical user interfaces.

R vs. Python

While R is primarily a scripting language that's easy for non-IT users to learn, it isn't as powerful, flexible or efficient as Python, the favored language of data analysts and data scientists.

However, the R programming language is much easier to use than Python, and applications can in general be created, tested and deployed more quickly using R, depending on their complexity and the size of the data sets involved in the application.

Use cases for R

Because of its ease of use and facility in scripting, R has rapidly grown in popularity and application as statistical software. It rivals Python as a common language used in data science and data analysis, and in machine learning applications. It's matrices-friendly, making it versatile in applications using graphs in Excel fashion. The following are some common applications:

  • Statistical analysis and data visualization.
  • Exploratory data analysis.
  • Machine learning and predictive and prescriptive applications.
  • Market research and consumer behavior analysis.
  • Diagnostic applications and biostatistics in healthcare.
  • Risk management and fraud detection in finance.

R and big data

These limitations have mitigated the applicability of the R language in big data applications. Instead of putting R to work in production, many enterprise users adopt R as an exploratory and investigative tool.

Data scientists use R to run complicated analyses on sample data and then, after identifying a meaningful correlation or cluster in the data, turn their findings into actionable insight.

Several software vendors have added support for the R programming language to their offerings, allowing R to gain a stronger footing in the modern big data realm. Vendors including IBM, Microsoft, Oracle, SAS Institute, Tableau and Tibco provide some level of integration between their analytics software and the R language. There are also R packages for popular open source big data platforms, including Hadoop and Apache Spark.

Careers with R

R has become increasingly useful and prevalent in the toolkits of many professionals and there are many opportunities for IT professionals skilled in R. The following are some in-demand professional roles offering competitive pay and benefits for R expertise:

  • Data scientist.
  • Data analyst.
  • Data architect.
  • Business and science researcher.
  • Business and financial analyst.
  • Business intelligence specialist.
  • Statistician and quantitative analyst.
  • Machine learning professional.
  • R programmer.
  • Database administrator.
  • Geo statistician.
  • Financial analyst.
  • Machine learning scientist.
  • Quantitative analyst.
  • Statistician.

Learn about the top programming languages for cybersecurity and why you need them. Examine how to get started learning them.

This was last updated in May 2024

Continue Reading About R programming language

Dig Deeper on Data science and analytics