Raincloud Plots with Deneb


Raincloud plots are a relatively recent and effective addition to the data visualisation toolkit. They comprise three parts:

1) the distribution as a density (half-violin plot);

2) summary statistics (box plot); and

3) raw data points (jittered scatter).

Like box plots, they support statistical inference at a glance, but they are less likely to obscure multimodal distributions, patterns, or outliers in the data.

Raincloud plots are not available natively in Power BI nor, as far as I can tell, are they available via AppSource. I’m not set up to use R or Python, so I took to Deneb to try my hand at them.

I’d been casually tapping away at the plots over the course of a few weeks, trying out various forms that used concat to join the separate graphs and the DAX RAND() function to create the jittered scatter.
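For readers who want to picture that earlier arrangement, here is a rough sketch rather than the exact specification used at the time. It assumes the Power BI dataset already contains a column named Jitter, filled with a DAX expression such as RAND(); that column name is an assumption made for illustration. The density and the points are separate views joined with vconcat, and both share an explicit x domain so they line up.

```json
{
  "description": "Sketch only: assumes a dataset column named Jitter that was populated in Power BI, e.g. with DAX RAND().",
  "data": {"name": "dataset"},
  "vconcat": [
    {
      "height": 65,
      "width": 400,
      "mark": {"type": "area", "opacity": 0.7},
      "transform": [
        {"density": "flipper_length_mm", "groupby": ["species"]}
      ],
      "encoding": {
        "x": {
          "field": "value",
          "type": "quantitative",
          "scale": {"domain": [170, 230]}
        },
        "y": {"field": "density", "type": "quantitative"},
        "color": {"field": "species", "type": "nominal"}
      }
    },
    {
      "height": 65,
      "width": 400,
      "mark": {"type": "point"},
      "encoding": {
        "x": {
          "field": "flipper_length_mm",
          "type": "quantitative",
          "scale": {"domain": [170, 230]}
        },
        "y": {"field": "Jitter", "type": "quantitative", "axis": null},
        "color": {"field": "species", "type": "nominal"}
      }
    }
  ]
}
```

The final specification further down avoids the extra dataset column by computing the jitter inside Vega-Lite with a calculate transform.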

I hadn’t reached anything I was satisfied with until I saw Daniel March-Patrick’s own creation that absolutely blew my socks off!

He had kindly posted a template on GitHub which I borrowed and eagerly played around with.

Given I only had three populations to compare, I wanted to overlay the density plots rather than stack them as I had been doing. So I leveraged the template to create the following:

Specification:

{
  "data": {"name": "dataset"},
  "bounds": "flush",
  "spacing": 15,
  "vconcat": [
    {
      "height": 65,
      "width": 400,
      "mark": {
        "type": "area",
        "opacity": 0.7
      },
      "transform": [
        {
          "density": "flipper_length_mm",
          "groupby": ["species"]
        }
      ],
      "encoding": {
        "x": {
          "field": "value",
          "type": "quantitative",
          "scale": {
            "domain": [170, 230]
          },
          "axis": null,
          "title": ""
        },
        "y": {
          "field": "density",
          "type": "quantitative"
        },
        "color": {
          "field": "species",
          "type": "nominal"
        }
      }
    },
    {
      "facet": {
        "row": {"field": "species"}
      },
      "transform": [
        {
          "calculate": "random()",
          "as": "Jitter"
        }
      ],
      "spec": {
        "resolve": {
          "scale": {"y": "independent"}
        },
        "height": 10,
        "width": 400,
        "layer": [
          {"mark": {"type": "boxplot"}},
          {
            "mark": {
              "type": "point",
              "tooltip": true
            },
            "encoding": {
              "y": {
                "field": "Jitter",
                "type": "quantitative",
                "scale": {
                  "range": [35, 15]
                }
              }
            }
          }
        ],
        "encoding": {
          "x": {
            "field": "flipper_length_mm",
            "type": "quantitative",
            "axis": {"title": ""},
            "scale": {
              "domain": [170, 230]
            }
          },
          "color": {
            "field": "species",
            "type": "nominal"
          }
        }
      }
    }
  ]
}

Config:

{
  "padding": 0,
  "view": {"stroke": "transparent"},
  "facet": {"spacing": 2},
  "header": {
    "title": null,
    "labelColor": "white"
  },
  "font": "Segoe UI",
  "area": {
    "color": "#eaeaea",
    "interpolate": "cardinal",
    "stroke": "white"
  },
  "point": {
    "size": 10,
    "opacity": 0.5,
    "color": "#eaeaea",
    "stroke": "white",
    "strokeWidth": 0.25,
    "filled": true
  },
  "axis": {
    "domain": false,
    "grid": false,
    "labelFontSize": 12,
    "ticks": false,
    "tickCount": 5,
    "titleFontSize": 12,
    "titleFontWeight": 400,
    "titleColor": "#605E5C",
    "offset": 10
  },
  "boxplot": {
    "size": 10,
    "outliers": false,
    "box": {
      "color": "#eaeaea",
      "stroke": "white",
      "strokeWidth": 1
    },
    "rule": {"stroke": "black"},
    "median": {"color": "white"}
  },
  "axisY": {"disable": true},
  "legend": {"title": null}
}

I then added population labels using a technique similar to the one here:
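The linked technique is not reproduced here. As one possible approach (an assumption, not necessarily the method used above), a text mark can be layered on top of each facet row and positioned at the smallest value for that population. The sketch below is self-contained so it can be pasted into the Vega editor; the data values are made up for illustration, and the species names are assumed from the field names used in this post.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: made-up values; labels drawn with a text layer in each facet row.",
  "data": {
    "values": [
      {"species": "Adelie", "flipper_length_mm": 181},
      {"species": "Adelie", "flipper_length_mm": 188},
      {"species": "Adelie", "flipper_length_mm": 195},
      {"species": "Chinstrap", "flipper_length_mm": 190},
      {"species": "Chinstrap", "flipper_length_mm": 197},
      {"species": "Chinstrap", "flipper_length_mm": 205},
      {"species": "Gentoo", "flipper_length_mm": 210},
      {"species": "Gentoo", "flipper_length_mm": 218},
      {"species": "Gentoo", "flipper_length_mm": 225}
    ]
  },
  "facet": {
    "row": {
      "field": "species",
      "header": {"title": null, "labels": false}
    }
  },
  "spec": {
    "height": 20,
    "width": 400,
    "layer": [
      {"mark": {"type": "boxplot"}},
      {
        "mark": {
          "type": "text",
          "align": "right",
          "baseline": "middle",
          "dx": -8
        },
        "encoding": {
          "x": {
            "aggregate": "min",
            "field": "flipper_length_mm",
            "type": "quantitative",
            "title": null
          },
          "text": {"field": "species", "type": "nominal"}
        }
      }
    ],
    "encoding": {
      "x": {
        "field": "flipper_length_mm",
        "type": "quantitative",
        "title": null,
        "scale": {"domain": [170, 230]}
      },
      "color": {"field": "species", "type": "nominal", "legend": null}
    }
  }
}
```

In Deneb, the inline values would be replaced with {"name": "dataset"}. Because the text layer inherits the colour encoding, each label takes the colour of its population, which makes a separate legend optional.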

Beauty!

I am one happy lassie 🙂

Tweaking the plots

According to the Vega-Lite documentation, the bandwidth (standard deviation) of the Gaussian kernel is estimated automatically from the data when it is not specified. If the distribution appears oversmoothed, the bandwidth can be set manually to a smaller value, as demonstrated below:

The extent sets the [min, max] range over which the density is sampled. If it is not specified, the curve is clamped at the observed minimum and maximum values, which cuts off its tails. Here I set the bandwidth to 3 and the extent to [170, 240]:

{
  "density": "flipper_length_mm",
  "groupby": ["species"],
  "bandwidth": 3,
  "extent": [170, 240]
}
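To experiment with these two settings outside Power BI, a self-contained sketch such as the one below can be pasted into the Vega editor. The data values are made up for illustration and are not the dataset used in this post; the species names are likewise assumed.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: made-up values, used to try out bandwidth and extent.",
  "data": {
    "values": [
      {"species": "Adelie", "flipper_length_mm": 181},
      {"species": "Adelie", "flipper_length_mm": 186},
      {"species": "Adelie", "flipper_length_mm": 190},
      {"species": "Adelie", "flipper_length_mm": 195},
      {"species": "Chinstrap", "flipper_length_mm": 190},
      {"species": "Chinstrap", "flipper_length_mm": 196},
      {"species": "Chinstrap", "flipper_length_mm": 201},
      {"species": "Chinstrap", "flipper_length_mm": 205},
      {"species": "Gentoo", "flipper_length_mm": 210},
      {"species": "Gentoo", "flipper_length_mm": 215},
      {"species": "Gentoo", "flipper_length_mm": 220},
      {"species": "Gentoo", "flipper_length_mm": 225}
    ]
  },
  "width": 400,
  "height": 65,
  "mark": {"type": "area", "opacity": 0.7},
  "transform": [
    {
      "density": "flipper_length_mm",
      "groupby": ["species"],
      "bandwidth": 3,
      "extent": [170, 240]
    }
  ],
  "encoding": {
    "x": {
      "field": "value",
      "type": "quantitative",
      "title": "flipper_length_mm",
      "scale": {"domain": [170, 240]}
    },
    "y": {"field": "density", "type": "quantitative"},
    "color": {"field": "species", "type": "nominal"}
  }
}
```

Raising or lowering the bandwidth value shows how quickly the curves move between spiky and oversmoothed. The x scale domain here is widened to match the extent; with a narrower domain such as [170, 230], the part of the curve beyond 230 would fall past the end of the plotted axis range.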

5 comments

  1. Hi,
    This is a general question for you about Deneb. From what you know of it, can it be used to create a 3d visual much like a 3d scatterplot that can be done in R. I have visualized some x,y,z data in a custom R/HTML visual already (as shown here: https://www.youtube.com/watch?v=Ax-jgwnolNI) but I want something more.
    And I am wondering if Deneb would be the tool to use. You are on the bleeding edge of use from what I can tell and am curious about your opinion.

    thx,
    wes
