# What is your preferred style for naming variables in R? [closed]

medriscoll 12/07/2018. 8 answers, 0 views

Which conventions for naming variables and functions do you favor in R code?

As far as I can tell, there are several different conventions, all of which coexist in cacophonous harmony:

1. Use of period separator, e.g.

    stock.prices <- c(12.01, 10.12)
    col.names    <- c('symbol','price')

Pros: Has historical precedent in the R community, is prevalent throughout the R core, and is recommended by Google's R Style Guide.

Cons: Rife with object-oriented connotations, and confusing to R newbies.

2. Use of underscores

    stock_prices <- c(12.01, 10.12)
    col_names    <- c('symbol','price')

Pros: A common convention in many programming languages; favored by Hadley Wickham's Style Guide, and used in the ggplot2 and plyr packages.

Cons: Not historically used by R programmers; annoyingly mapped to the '<-' operator in Emacs-Speaks-Statistics (alterable with 'ess-toggle-underscore').

3. Use of mixed capitalization (camelCase)

    stockPrices <- c(12.01, 10.12)
    colNames    <- c('symbol','price')

Pros: Appears to have wide adoption in several language communities.

Cons: Has recent precedent, but not historically used (in either R base or its documentation).

Finally, as if it weren't confusing enough, I ought to point out that the Google Style Guide argues for dot notation for variables, but mixed capitalization for functions.

The lack of consistent style across R packages is problematic on several levels. From a developer standpoint, it makes maintaining and extending others' code difficult (esp. where its style is inconsistent with your own). From an R user standpoint, the inconsistent syntax steepens R's learning curve by multiplying the ways a concept might be expressed (e.g. is that date casting function asDate(), as.date(), or as_date()? No, it's as.Date()).
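To make the as.Date() point concrete, here is a minimal sketch that checks which of those four spellings actually exists in base R. The names `candidates` and `found` are only for illustration, and the lookup is deliberately restricted to the base environment so that attached packages don't change the answer.

```r
# Candidate spellings from the paragraph above (illustrative vector name)
candidates <- c("asDate", "as.date", "as_date", "as.Date")

# Look only in the base environment; inherits = FALSE keeps attached
# packages (which may define similarly named functions) out of the search
found <- vapply(candidates, exists, logical(1),
                envir = baseenv(), inherits = FALSE)

print(found)
```

In a stock R session only as.Date should come back TRUE; the other three spellings are plausible guesses that base R simply doesn't define.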

Dirk Eddelbuettel 12/22/2009.

• underscores are really annoying for ESS users; given that ESS is pretty widely used you won't see many underscores in code authored by ESS users (and that set includes a bunch of R Core as well as CRAN authors, exceptions like Hadley notwithstanding);

• dots are evil too because they can get mixed up in simple method dispatch; I believe I once read comments to this effect on one of the R lists: dots are a historical artifact and no longer encouraged;

• so we have a clear winner still standing in the last round: camelCase. I am also not sure if I really agree with the assertion of 'lacking precedent in the R community'.

And yes: pragmatism and consistency trump dogma. So whatever works and is used by colleagues and co-authors. After all, we still have white-space and braces to argue about :)

Rasmus Bååth 02/27/2017.

I did a survey of which naming conventions are actually used on CRAN, and it got accepted to the R Journal :) Here is a graph summarizing the results:

Turns out (no surprise, perhaps) that lowerCamelCase was most often used for function names and period.separated names were most often used for parameters. Using UpperCamelCase, as advocated by Google's R style guide, is really rare, however, and it is a bit strange that they advocate that naming convention.

The full paper is here:

http://journal.r-project.org/archive/2012-2/RJournal_2012-2_Baaaath.pdf

Underscores all the way! Contrary to popular opinion, there are a number of functions in base R that use underscores. Run grep("^[^\\.]*$", apropos("_"), value = TRUE) to see them all.
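Spelled out a bit, that one-liner might look like the sketch below. The variable name `underscore_names` is just for illustration, and the exact list you get depends on which packages are attached in your session, since apropos() searches the whole search path.

```r
# apropos("_") returns every visible object name that contains an underscore
with_underscore <- apropos("_")

# Keep only the names that contain no dot at all, i.e. pure underscore style
underscore_names <- grep("^[^\\.]*$", with_underscore, value = TRUE)

# Show the first few matches
print(head(underscore_names))
```

Familiar base functions such as seq_len and seq_along should show up in the result.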

I use the official Hadley style of coding ;)

Robert 01/08/2010.

I like camelCase when the camel actually provides something meaningful -- like the datatype.

dfProfitLoss, where df = dataframe

or

vdfMergedFiles(), where the function takes in a vector and spits out a dataframe

While I think _ really adds to the readability, there just seem to be too many issues with using .-_ or other characters in names. Especially if you work across several languages.

Shane 12/22/2009.

This comes down to personal preference, but I follow the Google style guide because it's consistent with the style of the core team. I have yet to see an underscore in a variable in base R.

As I point out here:

How does the verbosity of identifiers affect the performance of a programmer?

it's worth bearing in mind how understandable your variable names are to your co-workers/users if they are non-native speakers...

For that reason I'd say underscores and periods are better than capitalisation, but as you point out consistency is essential within your script.

geoffjentry 12/23/2009.

As others have mentioned, underscores will screw up a lot of folks. No, it's not verboten but it isn't particularly common either.

Using dots as a separator gets a little hairy with S3 classes and the like.
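Here is a small sketch of how that can bite, assuming a made-up class called "prices" and two hypothetical helpers. S3 looks up methods by the name pattern generic.class, so a dotted helper name can get picked up as a method without anyone intending it.

```r
# A helper the author thinks of as "summary of prices" (hypothetical name).
# Its name happens to match the S3 pattern <generic>.<class>.
summary.prices <- function(object, ...) "dispatched to summary.prices"

# The same helper spelled with an underscore is just an ordinary function
summary_prices <- function(object, ...) "called summary_prices directly"

# An object whose class attribute is "prices" (illustrative class name)
stock <- structure(c(12.01, 10.12), class = "prices")

# summary() is an S3 generic, so it picks up the dotted helper as a method
dotted <- summary(stock)

# The underscore version is only reached when it is called by name
underscored <- summary_prices(stock)

print(dotted)
print(underscored)
```

So the dotted name silently changes what summary() does for that class, while the underscore name stays out of the dispatch mechanism entirely.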

In my experience, it seems like a lot of the high muckity mucks of R prefer the use of camelCase, with some dot usage and a smattering of underscores.

Jesse 12/22/2009.

I have a preference for mixedCapitals.

But I often use periods to indicate what the variable type is:

mixedCapitals.mat is a matrix. mixedCapitals.lm is a linear model. mixedCapitals.lst is a list object.

and so on.