Skip to content

Wilkinson beeswarm

julia
export WilkinsonBeeswarm

This is a beeswarm implementation as described in Leland Wilkinson's original paper on dot plots.

This is probably one of the simplest beeswarm algorithms to implement, so this file is heavily documented to serve as a reference for other implementations.

Boilerplate

In order to define a beeswarm algorithm, we need to define a struct that is a subtype of BeeswarmAlgorithm.

This may optionally contain fields which control the algorithm, but in this case, we don't need any.

julia
"""
    WilkinsonBeeswarm()

A beeswarm algorithm that implements Leland Wilkinson's original dot-hist algorithm.

This is essentially a histogram with dots, where all dots are binned in the `y` (non-categorical)
direction, and then dodged in the `x` (categorical) direction.

Original y-coordinates are not preserved, and if you want that try `SimpleBeeswarm` instead.
"""
struct WilkinsonBeeswarm <: BeeswarmAlgorithm
end

The calculate! function

The calculate! function is the main function that is called by the beeswarm! recipe to calculate the positions of the beeswarm.

It accepts points in pixelspace and marker size through positions, and sets the elements of buffer to the correct positions in pixelspace.

This decreases the amount of memory allocation, as the buffer is preallocated.

It is vital that the order of points in buffer is the same as in positions, since all of the other attributes (like color, size, etc.) are indexed by the order of the points in positions.

This is why we've essentially reimplemented the histogram here, as opposed to using it from StatsBase.

julia
function calculate!(buffer::AbstractVector{<: Point2}, alg::WilkinsonBeeswarm, positions::AbstractVector{<: Point2}, markersize, side::Symbol)
    @debug "Calculating..."

Here, we need to find each unique x-value, which indicates a different group or category.

julia
    xs = first.(positions)
    # Once we've found this, then we can calculate
    # the beeswarm for each group.
    for x_val in unique(xs)
        group = findall(==(x_val), xs)
        ms = if markersize isa Number
            markersize
        else
            view_ms = view(markersize, group)
            maximum(view_ms)
        end
        wilkinson_kernel!(view(buffer, group), view(positions, group), ms, side)
    end

end

The kernel

The wilkinson_kernel! function is the main function that calculates the beeswarm for a single group.

It accepts points in pixelspace and marker size through positions and markersize, and sets the elements of buffer to the correct positions in pixelspace.

julia
function wilkinson_kernel!(buffer, positions, markersize, side::Symbol)
    #=

    Initial sketch of code credited to Frederic Freyer and Julius Krumbiegel.
    Reference here: https://github.com/MakieOrg/Makie.jl/issues/1950

    =#

    # This code calculates some parameters
    # like the minimum and maximum y-values,
    # and the number of bins.
    ys = last.(positions)
    ymin, ymax = extrema(ys)
    Δy = ymax - ymin
    nbins = round(Int, Δy / markersize)

    # This will be used for us to push to, and is probably one of
    # two allocation sources in the code!
    bin_idxs = [Int[] for _ in 1:nbins]

Binning y-values

We now need to bin the y-values into nbins bins.

This is done by looping through each y-value, calculating the index of the bin it should be in, and then pushing the index to the relevant vector in bin_idxs.

Since the bins are all uniform, we don't actually need to materialize anything!

julia
    for (i, y) in enumerate(ys)
        current_index = round(Int, (y - ymin)/markersize)
        if current_index == 0
            current_index = 1
        elseif current_index > nbins
            current_index = nbins
        end

        push!(bin_idxs[current_index], i)
    end

Calculating positions

Now that we have the indices of the points in each bin, we can calculate the positions. We force the points to dodge each other by markersize.

julia
    for (i, idxs) in enumerate(bin_idxs)
        isempty(idxs) && continue
        # This is the center of the bin cell
        current_y = ymin + (i - 1) * markersize + markersize/2
        idxs_by_position = @view idxs[sortperm(last.(@view positions[idxs]))]

        if side == :both
            # Split the bin in two parts, evenly split.
            # One will go left and one will go right.
            a = idxs_by_position[begin:2:end]
            b = idxs_by_position[(begin+1):2:end]
            # Update the buffer array.
            buffer[a] .= Point2f.(
                ((1:length(a))) .* markersize .- markersize/2 .+ first.(view(positions, a)),
                current_y
                )
            buffer[b] .= Point2f.(
                ((1:length(b))) .* (-markersize) .+ markersize/2 .+ first.(view(positions, b)),
                current_y
                )
        elseif side == :left
            # Update the buffer array.
            buffer[idxs_by_position] .= Point2f.(((1:length(idxs_by_position))) .* markersize .- markersize/2 .+ first.(view(positions, idxs_by_position)), current_y)
        elseif side == :right
            # Update the buffer array.
            buffer[idxs_by_position] .= Point2f.(((1:length(idxs_by_position))) .* (-markersize) .+ markersize/2 .+ first.(view(positions, idxs_by_position)), current_y)
        end
    end
end

This page was generated using Literate.jl.