Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature request: allow computation #133

Open
dmwit opened this issue Dec 16, 2020 · 0 comments
Open

feature request: allow computation #133

dmwit opened this issue Dec 16, 2020 · 0 comments

Comments

@dmwit
Copy link

dmwit commented Dec 16, 2020

MarkupM is a Monad, but you can't really use that fact, because all the useful combinators have types like:

body :: MarkupM () -> MarkupM ()

A friendlier type would allow the internal action to produce some value, and pass it along to the caller, like:

body :: MarkupM a -> MarkupM a

From a quick look at the source of most of the combinators, it seems like this is already their type, and they have been given the more restrictive Html -> Html annotation manually.

But why would you want this? Well, I want it so that I can thread a unique-ID generator through the computation; and I want my child tags to be able to report back how many IDs they consumed. A natural try starts like this:

type IDMarkupM = StateT Int MarkupM
type IDHtml = IDMarkupM ()

takeID :: IDMarkupM Int
takeID = modify (+1) *> get

liftTag :: (Html -> Html) -> IDHtml -> IDHtml
liftTag f act = do
    n <- get
    -- Option 1: Doesn't type-check. execStateT act n is a MarkupM Int, not a MarkupM ()
    lift (f (execStateT act n))
    -- Option 2: I didn't try it, but I would assume this gives the wrong behavior, emitting the body of act twice.
    n' <- lift (execStateT act n)
    put n'
    lift (f (evalStateT act n))

On the other hand, if the combinators were polymorphic, we would be able to pull this trick, like this:

liftTag :: (MarkupM (a, Int) -> MarkupM (a, Int)) -> IDMarkupM a -> IDMarkupM a
liftTag f act = do
    n <- get
    (a, n') <- lift (f (runStateT act n))
    put n'
    pure a

One could then plausibly use liftTag body in place of body inside IDMarkupM blocks.

That's the entire feature request. Now for some speculation on why the current implementation is as it is, and what can be done about it. Here's a representative sample of MarkupM constructors:

data MarkupM a
    = Append (MarkupM b) (MarkupM a)
    | Parent StaticString StaticString StaticString (MarkupM a)
    | Content ChoiceString a

The important thing here is that, to reach the computed a value, you have to follow a longish chain of pointers. Presumably that makes actually using (>>=) (instead of (>>)) fairly expensive. Left-associated (>>=) calls especially become expensive -- quadratic in the number of calls, because each call must re-traverse the whole MarkupM tree of the previous call to find the value at its end. An alternative design might look like this:

data Html
    = Append Html Html
    | Parent StaticString StaticString StaticString
    | Content ChoiceString

data MarkupM a = MarkupM Html a

This makes reaching the a incredibly cheap. But this might not be a complete performance win, because now (>>) has to do an additional pointer dereference. (Previously: x >> y = Append x y; now: MarkupM x _ >> MarkupM y a = MarkupM (Append x y) a, which requires additionally deconstructing both arguments and calling one additional constructor compared to the old way.) Since (>>) is the common case, and (>>=) the uncommon case, perhaps the thought was that this is a bad trade.

But I think we can get the best of both worlds by simply duplicating the computed value at every level of the tree. So:

data MarkupM a
    = Append (MarkupM b) (MarkupM a) a
    | Parent StaticString StaticString StaticString (MarkupM a) a
    | Content ChoiceString a

markupValue (Append _ _ a) = a
markupValue (Parent _ _ _ _ a)  = a
markupValue (Content _ a) = a

-- smart constructors
append x y = Append x y (markupValue y)
parent sTag sOpen sOpenClose x = Parent sTag sOpen sOpenClose x (markupValue x)
-- content not needed, it's the same as Content

This incurs a memory penalty of one extra constructor per node in the tree, but now both (>>) and left-associated (>>=) can be efficient. For (>>) there is no additional unpacking needed compared to the current implementation (because that unpacking is delayed in a thunk until somebody actually wants to use the answer). For left-associated (>>=)s, the traversals deep into the tree are shared among the calls, giving linear rather than quadratic asymptotic runtime.

The previous design makes the asymptotic analysis dead simple, but I think we can go even one step better in the design and significantly reduce the memory penalty while keeping the asymptotic win. Since (>>=) is the only (?) combinator that actually uses the computed value, we need only duplicate the value at Append nodes.

data MarkupM a
    = Append (MarkupM b) (MarkupM a) a
    | Parent StaticString StaticString StaticString (MarkupM a)
    | Content ChoiceString a

markupValue (Append _ _ a) = a
markupValue (Parent _ _ _ m) = markupValue m
markupValue (Content _ a) = a

-- only one smart constructor needed
append x y = Append x y (markupValue y)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant