Plotinum redesign
This page may be out of date. It describes design changes considered when moving Plotinum to gonum/plot. Some of these changes have already been made, some are in progress, some are planned for the future, and some may never happen at all. Enjoy!
Plotinum is a plotting and charting API for Go. Currently, Plotinum is maintained mostly by a single developer, and progress is rather slow. The goal of this project is to move Plotinum over to the gonum community. This has at least three benefits: 1) the community has more programmers, and can help develop it more quickly; 2) design decisions can be made by more than a single, rather stubborn, dictator; and 3) the move gives a rare chance to make backward-incompatible API changes. The purpose of this page is two-fold. First, it gives some background on how Plotinum is currently designed, so that those who are not familiar with the internals have some context. Second, it serves to document the design decisions for the new gonum/plot package. Discussion about this document should take place on the gonum-dev Google group.
Currently Plotinum is divided into four packages: vg, plot, plotter, and plotutil. In the following sections, we describe the current state of, problems with, and things that should change (or should not change) in each package. Then we discuss some possible tools that could be created using gonum/plot, and other issues that are not specific to any package.
Vg is a vector graphics drawing package. It allows one to create vector graphics images that can be saved in a variety of different formats (currently: raster image formats supported by Go's image package, SVG via svgo, PDF via gopdf, and EPS). With the exception of some small cleanups, vg will remain the same between Plotinum and gonum/plot. Some possible cleanups are:
- Change vg.Length to use constants for unit conversion as in Go's standard time package, e.g., 5*vg.Inch instead of vg.Inches(5).
- Better handling of negative Length values (see issue 116).
- Vgpdf (the PDF driver) doesn't automatically connect discontinuous paths; other drivers do (see issue 128).
- There are no tests! Dan has some ideas for this. See the discussion here and example code there.
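The unit-constant cleanup from the first item might look roughly like the sketch below. The Length type and constant names here are stand-ins written for illustration, assuming that lengths are stored in points; they are not the actual vg declarations.

```go
package main

import "fmt"

// Length is a hypothetical stand-in for vg.Length, assumed to be
// measured in points (1/72 of an inch).
type Length float64

// Unit constants in the style of time.Second and time.Minute.
const (
	Point      Length = 1
	Inch              = 72 * Point
	Centimeter        = Inch / 2.54
	Millimeter        = Centimeter / 10
)

// Points reports l as a number of points.
func (l Length) Points() float64 { return float64(l) }

// Inches reports l as a number of inches.
func (l Length) Inches() float64 { return float64(l / Inch) }

func main() {
	width := 5 * Inch // instead of vg.Inches(5)
	fmt.Println(width.Points(), width.Inches())
}
```

As with time.Duration, untyped numeric constants multiply directly with the unit, so no conversion function is needed at the call site.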
In addition, it would be very nice to have an OpenGL driver for vg. See issue 110 and the Tools section below.
We may also want to add support for embedding images into canvases. Currently, all output formats support raster images, so it would simply be a matter of adding this functionality to the drivers. We should not add it to the vg.Canvas interface, however, as it may require a significant amount of effort for some driver types, and we want it to be very easy to implement vg.Canvas. Also, if the functionality is not added to vg.Canvas, then the change would be backward-compatible, so we can do it at a later time.
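One way to add image support without touching vg.Canvas is an optional interface that drivers may or may not implement, discovered with a type assertion. The following is only a sketch of that pattern: Canvas is a cut-down placeholder, and ImageDrawer, DrawImage, and ErrNoImages are hypothetical names, not part of vg.

```go
package main

import (
	"errors"
	"fmt"
	"image"
)

// Canvas is a cut-down, hypothetical stand-in for vg.Canvas.
type Canvas interface {
	Stroke(path string)
}

// ImageDrawer is a hypothetical optional interface. Drivers that can
// embed raster images implement it; other drivers simply do not.
type ImageDrawer interface {
	DrawImage(img image.Image, x, y float64)
}

// ErrNoImages is returned when a canvas cannot embed images.
var ErrNoImages = errors.New("canvas does not support embedded images")

// DrawImage draws img on c if the underlying driver supports it.
func DrawImage(c Canvas, img image.Image, x, y float64) error {
	d, ok := c.(ImageDrawer)
	if !ok {
		return ErrNoImages
	}
	d.DrawImage(img, x, y)
	return nil
}

// basicCanvas implements only the required Canvas methods.
type basicCanvas struct{}

func (basicCanvas) Stroke(string) {}

// imageCanvas additionally implements ImageDrawer.
type imageCanvas struct {
	basicCanvas
	drawn int
}

func (c *imageCanvas) DrawImage(image.Image, float64, float64) { c.drawn++ }

func main() {
	img := image.NewRGBA(image.Rect(0, 0, 2, 2))
	fmt.Println(DrawImage(basicCanvas{}, img, 0, 0))
	ic := &imageCanvas{}
	fmt.Println(DrawImage(ic, img, 0, 0), ic.drawn)
}
```

Existing Canvas implementations keep compiling unchanged, which is what makes adding the feature later a compatible change.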
The biggest issue with vg is how it deals with fonts. Ideally, a user could specify any .ttf file to be used as the current font. Unfortunately, to handle general fonts, both PDF and EPS require the font data to be embedded in their output. Additionally, neither of these formats supports truetype fonts; instead they support type-1 fonts. I don't have the expertise in either format to implement the font embedding. Instead, vg uses trickery! Postscript and PDF require all implementations to have a small set of fonts, which do not need to be embedded and can simply be used by name: Courier, Helvetica, and Times-Roman. Instead of implementing font embedding, vg only supports these three fonts. How annoying!
Additionally, to position text vg needs font metrics (width, height, ascent, and descent values) for all fonts it supports. Freetype-go can compute font metrics for truetype fonts, and, luckily, Ghostscript includes some free (GPLed) TTF fonts that are metric-equivalent to the fonts required by PDF and Postscript. To compute font metrics, vg uses freetype-go to load these metric-equivalent fonts from the vg source directory. This is problematic for anyone who wants to distribute a binary file that uses vg; the font data must be installed separately. And, since the Ghostscript fonts are GPLed, we cannot directly embed their data into a []byte to fix this problem unless we license Plotinum under the GPL---not going to happen.
In the long run, it would be best for someone to implement truetype to type-1 font conversion and embedding for the PDF and EPS formats. Then any TTF file could be used as a font (which is what users typically expect), and we could embed some free, non-GPLed default fonts directly into vg as byte arrays that do not require any extra data to be installed. Until then, we should at least modify vg so that it can be used without the font data if no text is drawn. While avoiding text sounds limiting, this behavior has actually been requested by a user of Plotinum.
The plot package is the main plotting engine of Plotinum. To some degree, it is composed of four pieces: DrawArea, Axis, Legend, and Plot.
The DrawArea type implements a bounded vg.Canvas (vg.Canvas provides an unbounded drawing area--one with infinite size). Additionally, it includes a variety of higher-level drawing utilities for more convenient drawing of lines, glyphs, text, etc. It also supports clipping.
Most of the functionality of DrawArea is useful for any 2D vector graphic drawing. It may be better to move DrawArea either into its own package (vgdraw) or right into the vg package itself. This way, others can use the drawing functionality for non-plotting work, and the package can focus solely on a good bounded-canvas drawing implementation, as opposed to the current approach where DrawArea is simply smashed into the plot engine. The advantage of using the vgdraw package is that it can provide a higher-level API (i.e., DrawArea.StrokeLines is a lot more convenient than creating a vg.Path then stroking it). The advantage of putting it in vg is that we don't have two different packages for drawing vector graphic images (one lower level and one higher level).
The DrawArea component also includes types for line, glyph, and text styles. The LineStyle and GlyphStyle types seem fine as-is. For TextStyle, we may want to include text rotation and text alignment. Currently, text can be rotated by applying translation and rotation transforms to the DrawArea. These transforms are quite ugly (see the discussion here). It would be nice to simply include the rotation in TextStyle and let the DrawArea handle the ugly transformations internally. This would probably require changes to the TextStyle.Height, TextStyle.Width, and TextStyle.Rect methods to account for the bounding box of the rotated text. In fact, we could probably ditch TextStyle.Height and TextStyle.Width since they are generalized by TextStyle.Rect.
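To give an idea of what a rotation-aware TextStyle.Rect would have to compute, here is a small sketch of the bounding box of a rotated rectangle. RotatedBounds is a hypothetical helper, and the sketch assumes the unrotated text box is simply a width-by-height rectangle.

```go
package main

import (
	"fmt"
	"math"
)

// RotatedBounds returns the width and height of the axis-aligned box
// that encloses a w-by-h rectangle after it is rotated by angle radians.
func RotatedBounds(w, h, angle float64) (bw, bh float64) {
	sin := math.Abs(math.Sin(angle))
	cos := math.Abs(math.Cos(angle))
	return w*cos + h*sin, w*sin + h*cos
}

func main() {
	// A 40x10 label, unrotated and rotated by 45 degrees.
	fmt.Println(RotatedBounds(40, 10, 0))
	fmt.Println(RotatedBounds(40, 10, math.Pi/4))
}
```

Since this one computation yields both dimensions, it also illustrates why separate Height and Width methods become redundant once Rect accounts for rotation.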
Text alignment is currently done in DrawArea.FillText with the xalign and yalign parameters. These are floating point values that offset the final location of the text by the given factor of the text width and height respectively. With xalign=0 and yalign=0, text is drawn above and to the right of the location specified for FillText. So, for example, to center text at a location one would specify xalign=-0.5 and yalign=-0.5 to shift the draw position left by half of the width and down by half of the height. This allows for a variety of alignments without requiring a ton of constants such as Centered, CenteredAbove, CenteredBelow, BeforeCentered, AfterCentered, etc. Ideally, however, these parameters would be included in the TextStyle. This was attempted once, but there were difficulties with getting the tick mark labels correct, though I don't recall why. We should try this again.
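A rough sketch of alignment factors carried by the style itself is given below. The TextStyle type, its XAlign and YAlign fields, and the Origin method are hypothetical; the only assumption taken from the description above is that the offset is the factor multiplied by the text width or height.

```go
package main

import "fmt"

// TextStyle is a hypothetical, trimmed-down text style that carries
// the alignment factors itself instead of taking them as parameters.
type TextStyle struct {
	XAlign, YAlign float64
}

// Origin returns the position at which text of the given width and
// height is actually drawn when the caller asks for (x, y). The
// offset is the alignment factor times the text width or height.
func (s TextStyle) Origin(x, y, width, height float64) (float64, float64) {
	return x + s.XAlign*width, y + s.YAlign*height
}

func main() {
	centered := TextStyle{XAlign: -0.5, YAlign: -0.5}
	fmt.Println(centered.Origin(100, 50, 40, 10))
}
```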
DrawArea's API for clipping is also suboptimal. On the surface, it would seem like everything could be automatically clipped to the bounds of the draw area. Unfortunately, this is not the case because a DrawArea may be a subarea of a larger DrawArea, and one may want their lines or glyphs to extend beyond the inner area and to draw into the outer area. So, the user needs the ability to draw without clipping. This is used by many plotters. But, I dislike the need for ClipLinesX, ClipLinesY, ClipLinesXY, ClipPolygonX, ClipPolygonY, and ClipPolygonXY; ideally we could have two functions: ClipLines and ClipPolygon.
A side note on clipping: Whatever API we settle on should make it more clear that data points should be clipped in data coordinates, not drawing coordinates. If data points are clipped in drawing coordinates then subtle rounding errors can clip glyphs that would, in data coordinates, fall within the range of the axes. Clipping in drawing coordinates should only be done for decorations.
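As an illustration of clipping in data coordinates, the sketch below filters points against the axis ranges before any transform to drawing coordinates is applied. XY, Range, and ClipData are hypothetical names used only for this example.

```go
package main

import "fmt"

// XY is a single data point in data coordinates.
type XY struct{ X, Y float64 }

// Range is a hypothetical axis range in data coordinates.
type Range struct{ Min, Max float64 }

// Contains reports whether v lies within the range, end points included.
func (r Range) Contains(v float64) bool { return r.Min <= v && v <= r.Max }

// ClipData keeps only the points that fall within both axis ranges.
// The comparison is done on the raw data values, so no rounding from
// a coordinate transform can push a boundary point out of range.
func ClipData(pts []XY, xr, yr Range) []XY {
	var kept []XY
	for _, p := range pts {
		if xr.Contains(p.X) && yr.Contains(p.Y) {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	pts := []XY{{0, 0}, {1, 1}, {1.5, 0.5}, {0.5, 2}}
	fmt.Println(ClipData(pts, Range{0, 1}, Range{0, 1}))
}
```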
Finally, DrawArea is a really terrible name. If we go with vgdraw then this should be renamed to vgdraw.Canvas. If we move this functionality into vg then it should be called vg.BoundedCanvas.
The Axis type draws either horizontal or vertical axes on the left side or bottom of the plot respectively. An axis has a line showing the axis, tick marks, tick labels, and an axis label. Axes are also responsible for normalizing data coordinates to the range [0, 1], where 0 is the minimum value on the axis and 1 is the maximum value on the axis. This is the first step in transforming data coordinates to drawing coordinates.
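The normalization step can be sketched as follows. This Axis type is a minimal stand-in with only a range, and ToCanvas is a hypothetical helper showing the second step, from the normalized value to a drawing coordinate.

```go
package main

import "fmt"

// Axis is a minimal stand-in for an axis with a data range.
type Axis struct{ Min, Max float64 }

// Norm maps a data value to [0, 1], where 0 is the axis minimum
// and 1 is the axis maximum.
func (a Axis) Norm(v float64) float64 {
	return (v - a.Min) / (a.Max - a.Min)
}

// ToCanvas maps a data value to a drawing coordinate between lo and
// hi, the two ends of the drawing area along this axis.
func ToCanvas(a Axis, v, lo, hi float64) float64 {
	return lo + a.Norm(v)*(hi-lo)
}

func main() {
	x := Axis{Min: 10, Max: 20}
	fmt.Println(x.Norm(15), ToCanvas(x, 15, 100, 300))
}
```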
To a large degree, I am happy with the current Axis implementation. The one concern that has been raised is that it is too inflexible, because it doesn't allow for axes to be drawn anywhere except for the left and bottom sides of the plot. [Insert Dan's Axis interface idea here.] Additionally, there is no way to support more than a single vertical and single horizontal axis. I argue that this is a deficiency in the plot component, discussed below.
The Legend type simply draws the legend. The biggest problem with the legend is that it can only be drawn in the data area of the plot, and it often gets in the way of the data. Ideally, we would be able to draw the legend outside of the data area, either beneath, or to the side of the plot. This may indicate the need for more general "Panel" support as hinted at by Dan (some very rough ideas below).
Plot ties all of these items together. It computes the size of the data drawing area based on the axes, title, and padding needed for glyphs that extend outside the data area. It also draws the axes, plotters, and legend. This portion will need to change appropriately depending on the final design decisions for the previously mentioned components.
If we choose to support multiple vertical and multiple horizontal axes, then it would likely be handled in the plot component. I don't have a very clear concept of how to do this, however. The thing that we need to figure out is how to select which axes are used by each plotter. One possibility is: users can add whatever axes they would like to the plot, and when they create each plotter they specify which axes it uses for its transforms. However, this makes for a lot of extra work in the common case (just one x and one y axis). Perhaps, plotters can have two New functions, one that defaults to using the first x and first y axis, and one that accepts the axes that it should use as parameters.
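The two-constructor idea might look something like this sketch. Plot, Axis, and Line are simplified placeholders, and the names NewLine and NewLineOn are made up for illustration; nothing here is an agreed-upon API.

```go
package main

import "fmt"

// Axis is a simplified placeholder for a plot axis.
type Axis struct {
	Label    string
	Min, Max float64
}

// Plot holds any number of horizontal and vertical axes.
type Plot struct {
	X, Y []*Axis
}

// Line is a plotter that remembers which axes it is drawn against.
type Line struct {
	XAxis, YAxis *Axis
	Ys           []float64
}

// NewLine covers the common case: it uses the first x and the first
// y axis of the plot.
func NewLine(p *Plot, ys []float64) *Line {
	return NewLineOn(p.X[0], p.Y[0], ys)
}

// NewLineOn lets the caller pick the axes explicitly.
func NewLineOn(x, y *Axis, ys []float64) *Line {
	return &Line{XAxis: x, YAxis: y, Ys: append([]float64(nil), ys...)}
}

func main() {
	p := &Plot{
		X: []*Axis{{Label: "time"}},
		Y: []*Axis{{Label: "temperature"}, {Label: "pressure"}},
	}
	a := NewLine(p, []float64{1, 2, 3})
	b := NewLineOn(p.X[0], p.Y[1], []float64{4, 5, 6})
	fmt.Println(a.YAxis.Label, b.YAxis.Label)
}
```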
How do we handle Z axes? For now, I propose that we don't.
Plot.Save is the most common way to save a plot to a file. Currently it accepts the plot size in inches. This was poorly thought out; the size should be specified with vg.Length. Additionally, we may want to move the Plot.Save function to plotutil, as it does a form of run-time checking in that it looks at the file extension to determine which vg.Canvas should be used to save the plot. I consider this a high-level utility, and the plot package should just contain the more-run-time-safe Plot.Draw function. If the plotutil package is expanded and becomes more useful, then the extra import required to get the Save function shouldn't be a problem; users will want to import plotutil anyway.
An alternative to moving Plot.Save into plotutil would be to change it so that it doesn't rely on file extensions.
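To make the run-time check concrete, here is a sketch of choosing an output format from a file name. Format and FormatFromName are hypothetical, and the set of extensions is only an example; the point is that an unsupported extension can only be reported as an error while the program runs.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Format is a hypothetical identifier for an output driver.
type Format string

const (
	EPS    Format = "eps"
	PDF    Format = "pdf"
	SVG    Format = "svg"
	Raster Format = "raster"
)

// FormatFromName picks an output format from the file extension.
// A bad extension cannot be caught by the compiler; it surfaces as
// an error at run time.
func FormatFromName(name string) (Format, error) {
	ext := strings.ToLower(filepath.Ext(name))
	switch ext {
	case ".eps":
		return EPS, nil
	case ".pdf":
		return PDF, nil
	case ".svg":
		return SVG, nil
	case ".png", ".jpg", ".jpeg":
		return Raster, nil
	}
	return "", fmt.Errorf("unsupported file extension %q", ext)
}

func main() {
	fmt.Println(FormatFromName("plot.pdf"))
	fmt.Println(FormatFromName("plot.xyz"))
}
```

A Save variant that takes the format as an explicit argument, rather than inferring it from the name, would not need this lookup.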
Plot.New returns an error value if the default font fails to load. But, there's nothing a user can do to handle this error. Instead, it should just panic.
It would be nice if plots could be encoded with encoding/gob. Currently, they can't because Plot.Axis.Tick.Marker is a function. If we make this an interface instead then I think that plots would be gob encode-able.
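A small experiment along these lines is sketched below: the tick marker is an interface whose concrete type is registered with gob, and the axis is encoded and decoded again. Ticker, ConstantTicks, and this cut-down Axis are hypothetical types for illustration, not the Plotinum ones.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Ticker is a hypothetical interface that replaces the Marker function.
type Ticker interface {
	Ticks(min, max float64) []float64
}

// ConstantTicks places ticks at fixed values. It has an exported
// field, so gob has something to transmit.
type ConstantTicks struct{ Values []float64 }

// Ticks returns the fixed values that lie within [min, max].
func (c ConstantTicks) Ticks(min, max float64) []float64 {
	var out []float64
	for _, v := range c.Values {
		if v >= min && v <= max {
			out = append(out, v)
		}
	}
	return out
}

// Axis is a cut-down axis whose marker is an interface value.
type Axis struct {
	Min, Max float64
	Marker   Ticker
}

func init() {
	// Concrete types stored in interface fields must be registered.
	gob.Register(ConstantTicks{})
}

// RoundTrip encodes a with gob and decodes it into a new Axis.
func RoundTrip(a Axis) (Axis, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(a); err != nil {
		return Axis{}, err
	}
	var out Axis
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	in := Axis{Min: 0, Max: 10, Marker: ConstantTicks{Values: []float64{0, 5, 10, 20}}}
	out, err := RoundTrip(in)
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println(out.Marker.Ticks(out.Min, out.Max))
}
```

The cost of this approach is that every marker implementation, including third-party ones, has to be registered before a plot that uses it can be decoded.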
Panels are not currently a component of the plot package, but perhaps they should be. Panels can be used to generalize some of the desired functionality of Axes and Legends. Adding a panel to a plot reserves a rectangular section of the plot, and when the plot is drawn the panel's Draw method is called to draw its contents. A legend, for example, could simply be a panel that is added to the plot. A legend panel could be added inside the data area or to the side of the plot outside of the data area. Likewise, axes could be added as panels. So could the plot title. Clearly this needs more thought.
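To make the idea a little more tangible, here is a very rough sketch in which panels reserve columns on the right-hand side of the plot. Everything in it (Rect, Panel, Layout, and the legend type) is hypothetical, and it deliberately ignores panels on other sides or inside the data area.

```go
package main

import "fmt"

// Rect is a rectangle in drawing coordinates.
type Rect struct{ X, Y, W, H float64 }

// Panel is a hypothetical interface for anything that reserves a
// rectangular section of the plot and draws itself into it.
type Panel interface {
	Width() float64
	Draw(area Rect)
}

// Layout carves one column off the right edge of total for each
// panel. It returns the remaining data area and one rectangle per
// panel, in the same order as panels.
func Layout(total Rect, panels []Panel) (data Rect, areas []Rect) {
	data = total
	for _, p := range panels {
		w := p.Width()
		data.W -= w
		areas = append(areas, Rect{X: data.X + data.W, Y: total.Y, W: w, H: total.H})
	}
	return data, areas
}

// legend is a toy panel with a fixed width.
type legend struct{ width float64 }

func (l legend) Width() float64 { return l.width }
func (l legend) Draw(area Rect) { fmt.Printf("legend drawn in %+v\n", area) }

func main() {
	panels := []Panel{legend{width: 80}}
	data, areas := Layout(Rect{0, 0, 400, 300}, panels)
	for i, p := range panels {
		p.Draw(areas[i])
	}
	fmt.Printf("data area: %+v\n", data)
}
```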
Plotinum was designed to allow users to (hopefully easily) create their own plotters. The idea is: no one will ever agree on style, so people should be able to easily make their own style if they don't like the defaults that are provided. In addition, by making plotter construction easy, only a very small set of default plotters can be provided; anything beyond that can be created by others and hosted and supported elsewhere. While it may be more convenient for users if we include everything and the kitchen sink, I propose that we avoid that approach. We should continue to provide only a small set of default plotters. Others can easily host additional, go-gettable, plotters elsewhere. This really alleviates a lot of maintenance effort, and the alternative sets a precedent that requires us to add every new plotter that anyone can dream up. I propose these default plotters:
- Lines
- Scatter plot (can be combined with lines for line-and-points plots)
- Box plots
- Bar charts
- Histograms
- Grid (puts a grid behind the data, showing a line at each tick mark)
- Function
- Labels
- Error bars
- Glyph Boxes--a plotter used for testing plotter construction
- Heat maps
While I realize that other types of plots may be fairly heavily used by some gonum devs (scatter plot with colored glyphs to show a 3rd data dimension), I think that these should be hosted separately. We can make a wiki page pointing to common 3rd party plotter packages.
Currently, Plotinum's plotter types hold a copy of the data used to create the plotter. This was a deliberate choice. If references were used instead, the user could change the data after creating a plotter and this would put the plotter in an inconsistent state, which may be very difficult to debug. Instead, we elected the safer choice, so that a user can change their XYs or Values after making a plotter and the plotter would be unaffected. The copy shouldn't cause performance issues, because if the user wants to plot so much data that copying it would be problematic then they should probably summarize the data instead. Even so, we may want to reconsider the data copying, as it has been suggested that using a reference instead of copying could provide more flexibility.
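The effect of copying at construction time can be seen in a sketch like the one below. XY, Line, and NewLine are simplified stand-ins for the real plotter types.

```go
package main

import "fmt"

// XY is a single data point.
type XY struct{ X, Y float64 }

// Line is a simplified plotter that owns a private copy of its data.
type Line struct{ xys []XY }

// NewLine copies data, so later changes to the caller's slice do not
// affect the plotter.
func NewLine(data []XY) *Line {
	cpy := make([]XY, len(data))
	copy(cpy, data)
	return &Line{xys: cpy}
}

// At returns the i-th point held by the plotter.
func (l *Line) At(i int) XY { return l.xys[i] }

func main() {
	data := []XY{{1, 2}, {3, 4}}
	l := NewLine(data)
	data[0].Y = 99 // the caller changes their data afterwards
	fmt.Println(l.At(0), data[0])
}
```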
An argument against copying is that the data interfaces can also contain styling information. However, this would break the decoupling of data and style: one would need to pre-compute style information for their data before creating a plotter, instead of letting the plotter compute the style.
Plotters that use a single value accept data via the Valuer interface. That is a horrible name, so we may want to rethink it. Additionally, there was some talk about a gonum/floats package; maybe we can make the interface used here compatible with the floats package if it is still in the works.
Labels are a very important feature of plots, and they are woefully underused. This is likely because it is difficult for machines to determine good label placement. Older, hand-drawn plots often make wonderful use of labels (consider the plots by Tukey in his book Exploratory Data Analysis). Ideally, gonum/plot would have very good support for labels, making it so easy to add nice labels to plots that everyone is compelled to do so. Supposedly one can use simulated annealing to find good label placements (close to the point that is being labelled, but without overlapping other aspects of the plot). I suspect that the GlyphBoxer interface can be helpful here, as it can be used to determine if labels are colliding with other plot features. This needs more thought.
Plotutil is a package that makes creating some simple and very common types of plots easier. It is quite high-level, and allows for run-time type errors in order to provide more convenience. All other packages are (should be) compile-time type safe.
I am not sure if anyone actually uses this package, and I haven't gotten any feedback on it, so I am not sure if or how it may be changed.
Once gonum/plot is stabilized, there are some tools that we should create.
Many people have requested an interactive front-end for Plotinum. Such a tool would allow one to view their plot(s) in an interactive window. There is a (very rough) prototype of this idea here (go get code.google.com/p/eaburns/showplot). The problem with this prototype is that it is rather slow. Profiling it showed that it spends a lot of time rendering the plot in software with draw2d. A faster/better method would be to create an OpenGL driver for vg so that rendering could be done in hardware.
The tool will likely be in the form of a library with a function that can visualize a plot: showplot.Show(p). The visualizer would then modify the plot's axis ranges as the user zooms and translates around. Additionally there could be support for changing stylistic aspects of the plot (though this could probably be deferred until a later version of the tool).
Such a tool may also support:
- Ability to scroll left, right, zoom in and zoom out of the window.
- Ability to set up several chart windows on the same page and then link their X axes, so as you scroll in one window the other windows scroll as well.
- A cross hair.
- Ability to add a translucent selector object to a window. This allows you to do Google Finance-like interactive charts. You can have a bottom window which shows the entire time series data and you can move the selector around and the other windows adjust accordingly.
- Add a selector to other windows that can launch other charting windows or scripts. This allows you to select a region of the chart and launch a completely new chart window.
- The ability to draw objects on the chart. Circles, arrows, etc.
- Exogenous changes to the plot, so that the user can see their plot update in real-time as, e.g., new samples arrive from a sensor or as new iterations of a learning algorithm are performed.
Sizes in Plotinum are all absolute. This makes the implementation much easier, but occasionally a user would prefer to specify sizes of things in relative terms. For example, see this discussion about bar chart widths, and this discussion about line widths and font sizes.
Ideally, we would support both absolute and relative sizes. However, relative sizes are difficult since the plot size isn't known until Plot.Draw is called with a DrawArea of a specific size. Additionally, vg has no concept of size (it provides a canvas that extends to infinity in both directions), so adding support for relative sizes in vg.Length doesn't make sense. Perhaps, relative sizes could be added to some future vgdraw package.
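One possible shape for this, purely as a sketch, is a size with an absolute part and a relative part that is only resolved once the extent of the drawing area is known. The Size type and its Resolve method are hypothetical, and Length is a stand-in for vg.Length.

```go
package main

import "fmt"

// Length is a stand-in for vg.Length.
type Length float64

// Size is a hypothetical size with an absolute part and a part that
// is relative to the extent of the area being drawn into.
type Size struct {
	Abs Length  // fixed amount
	Rel float64 // fraction of the extent, e.g. 0.25 for 25%
}

// Resolve turns the size into an absolute length. It can only be
// called once the extent is known, i.e. at draw time.
func (s Size) Resolve(extent Length) Length {
	return s.Abs + Length(s.Rel)*extent
}

func main() {
	barWidth := Size{Abs: 2, Rel: 0.25}
	fmt.Println(barWidth.Resolve(400))
}
```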