Skip to contents

This guide covers the full syntax of SGL (Structured Graphics Language) as implemented in rsgl. For the formal language specification, see Chapman (2025).

Setup

All examples use dbGetPlot(), which takes a DuckDB connection and a SGL statement. We’ll load some datasets into an in-memory DuckDB database to work with.

library(rsgl)
library(duckdb)
#> Loading required package: DBI

con <- dbConnect(duckdb())
dbWriteTable(con, "cars", mtcars)
dbWriteTable(con, "trees", as.data.frame(Orange))

diamonds <- ggplot2::diamonds
diamonds$cut <- as.character(diamonds$cut)
diamonds$color <- as.character(diamonds$color)
diamonds$clarity <- as.character(diamonds$clarity)
dbWriteTable(con, "diamonds", diamonds)

Statement structure

A SGL statement is built from clauses. A minimal statement has three: visualize, from, and using. Additional clauses control grouping, scaling, faceting, and titles. The statement ends with a semicolon (optional when passed as a string to dbGetPlot()).

visualize
  <aesthetic mappings>
from <data source>
[group by <grouping expressions>]
[collect by <collection expressions>]
using <geom expression>
[scale by <scale expressions>]
[facet by <facet expressions>]
[title <title expressions>]

The from clause

The from clause specifies the data source. This is typically a table name:

dbGetPlot(con, "
  visualize
    hp as x,
    mpg as y
  from cars
  using points
")

A SQL subquery can be used for filtering or transforming data before visualization. The subquery must be enclosed in parentheses:

dbGetPlot(con, "
  visualize
    hp as x,
    mpg as y
  from (
    select hp, mpg
    from cars
    where hp < 200
  )
  using points
")

The visualize clause

The visualize clause maps data source columns to aesthetics. Each mapping has the form <column> as <aesthetic>.

Positional aesthetics

Cartesian coordinates use x and y:

dbGetPlot(con, "
  visualize
    hp as x,
    mpg as y
  from cars
  using points
")

Polar coordinates use theta (angle) and r (radius) — see the coordinate systems section for examples.

A single positional aesthetic is also valid:

dbGetPlot(con, "
  visualize
    mpg as x
  from cars
  using points
")

Non-positional aesthetics

color maps a column to color:

dbGetPlot(con, "
  visualize
    hp as x,
    mpg as y,
    cyl as color
  from cars
  using points
")

size maps a column to point size:

dbGetPlot(con, "
  visualize
    hp as x,
    mpg as y,
    cyl as size
  from cars
  using points
")

Multiple aesthetics can be combined:

dbGetPlot(con, "
  visualize
    hp as x,
    mpg as y,
    cyl as color,
    wt as size
  from cars
  using points
")

Column expressions

Aesthetics can be mapped to expressions that include transformations or aggregations, not just bare column names. The supported column-level transformations and aggregations are covered in their own section below.

The using clause

The using clause specifies which geometric object (geom) represents the data. Both singular and plural forms are accepted as keywords.

Points

points (or point) draws individual points — one per row:

dbGetPlot(con, "
  visualize
    hp as x,
    mpg as y
  from cars
  using points
")

Bars

bars (or bar) draws bar segments. Bars are stacked by default when a non-positional grouping is present:

dbGetPlot(con, "
  visualize
    cut as x,
    count(*) as y,
    color as color
  from diamonds
  group by
    cut, color
  using bars
")

Lines

lines (or line) draws connected lines. Lines are collective geoms — they represent multiple rows with a single geometric object:

dbGetPlot(con, "
  visualize
    age as x,
    circumference as y
  from trees
  collect by
    Tree
  using lines
")

Boxes

boxes (or box) draws box plots. Like lines, boxes are collective geoms:

dbGetPlot(con, "
  visualize
    cut as x,
    price as y
  from diamonds
  using boxes
")

Column-level transformations and aggregations

SGL supports transformations and aggregations as part of column expressions in the visualize clause.

bin()

The bin() transformation groups a continuous column into discrete bins:

dbGetPlot(con, "
  visualize
    bin(mpg) as x,
    count(*) as y
  from cars
  group by
    bin(mpg)
  using bars
")

count()

The count(*) aggregation counts rows per group. Any non-aggregated column in the visualize clause must also appear in the group by clause — the same rule as SQL:

dbGetPlot(con, "
  visualize
    cut as x,
    count(*) as y
  from diamonds
  group by
    cut
  using bars
")

Combining transformations and aggregations

bin() and count(*) can be combined to produce histograms. Non-positional groupings create stacked histograms:

dbGetPlot(con, "
  visualize
    bin(mpg) as x,
    count(*) as y,
    cyl_cat as color
  from (
    select
      *,
      cast(cyl as varchar) as cyl_cat
    from cars
  )
  group by
    bin(mpg),
    cyl_cat
  using bars
")

The group by clause

The group by clause specifies grouping expressions for aggregations. It follows the same semantics as SQL’s GROUP BY: any non-aggregated expression in the visualize clause must appear in the group by clause.

dbGetPlot(con, "
  visualize
    cut as x,
    count(*) as y
  from diamonds
  group by
    cut
  using bars
")

The collect by clause

Collective geoms (lines and boxes) represent multiple rows with a single geometric object. By default, the collection of rows is determined implicitly. The collect by clause overrides this behavior, explicitly specifying which column determines how rows are grouped into separate geometric objects.

Without collect by, all rows feed into a single line:

dbGetPlot(con, "
  visualize
    age as x,
    circumference as y
  from trees
  using line
")

With collect by, each unique value of the specified column produces a separate line:

dbGetPlot(con, "
  visualize
    age as x,
    circumference as y
  from trees
  collect by
    Tree
  using lines
")

Geom qualifiers

Geom qualifiers are keywords placed before the geom name in the using clause. They modify how the geom represents data.

jittered (points)

Adds random positional variation to points, useful when data has overlapping values:

set.seed(42)
dbGetPlot(con, "
  visualize
    cyl_cat as x,
    mpg as y
  from (
    select mpg, cast(cyl as varchar) as cyl_cat
    from cars
  )
  using jittered points
")

regression (lines)

Fits a linear regression line instead of connecting data points:

dbGetPlot(con, "
  visualize
    hp as x,
    mpg as y
  from cars
  using regression line
")

unstacked (bars)

Positions bars side-by-side instead of stacking them:

dbGetPlot(con, "
  visualize
    cut as x,
    count(*) as y,
    color as color
  from diamonds
  group by
    cut, color
  using unstacked bars
")

The layer operator

Multiple layers of geometric objects can be combined into a single graphic using the layer keyword. Each layer is a complete sub-statement with its own visualize, from, and using clauses:

dbGetPlot(con, "
  visualize
    hp as x,
    mpg as y
  from cars
  using points

  layer

  visualize
    hp as x,
    mpg as y
  from cars
  using regression line
")

Layered geom expressions

When layers share the same data source and aesthetic mapping, a shorthand syntax avoids repetition. Geom expressions are listed inside parentheses in the using clause, separated by layer:

dbGetPlot(con, "
  visualize
    hp as x,
    mpg as y
  from cars
  using (
    points
    layer
    regression line
  )
")

The scale by clause

Each mapped aesthetic has a scale that determines how data values are mapped to visual properties. Scales default to linear but can be overridden with the scale by clause.

Log scale

log() applies a base-10 logarithmic scale to an aesthetic:

dbGetPlot(con, "
  visualize
    hp as x,
    mpg as y
  from cars
  using (
    points
    layer
    regression line
  )
  scale by
    log(x),
    log(y)
")

Note that scaling is applied to the aesthetic, not to the data. This means the regression in the example above is computed on the log-scaled values. To transform the data itself, use SQL in the from clause:

dbGetPlot(con, "
  visualize
    log_hp as x,
    log_mpg as y
  from (
    select
      log(hp) as log_hp,
      log(mpg) as log_mpg
    from cars
  )
  using (
    points
    layer
    regression line
  )
")

Coordinate systems

The coordinate system is inferred from the positional aesthetics in the visualize clause. x and y imply Cartesian coordinates. theta and r imply polar coordinates.

A pie chart is a stacked bar chart rendered in polar coordinates:

dbGetPlot(con, "
  visualize
    count(*) as theta,
    cut as color
  from diamonds
  group by
    cut
  using bars
")

The facet by clause

Faceting generates small multiples — separate panels for each unique value of a column. A single facet expression produces horizontal panels by default:

dbGetPlot(con, "
  visualize
    hp as x,
    mpg as y
  from cars
  using points
  facet by
    cyl
")

The orientation can be changed with the vertically keyword:

dbGetPlot(con, "
  visualize
    hp as x,
    mpg as y
  from cars
  using points
  facet by
    cyl vertically
")

Two facet expressions produce a grid — one varies horizontally, the other vertically:

dbGetPlot(con, "
  visualize
    hp as x,
    mpg as y
  from cars
  using points
  facet by
    cyl,
    am
")

The title clause

Axis and legend titles are automatically derived from aesthetic mappings. The title clause overrides these defaults:

dbGetPlot(con, "
  visualize
    hp as x,
    mpg as y,
    cyl as color
  from cars
  using points
  title
    x as 'Horsepower',
    y as 'Miles Per Gallon',
    color as 'Cylinders'
")