Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split records of SymbolT and DefinitionS #384

Open
wants to merge 95 commits into
base: master
Choose a base branch
from

Conversation

matil019
Copy link
Member

Part of #382

This PR refactors the internal of the compiler so that the two types, SymbolT and DefinitionS no longer are simultaneously sum types and record types.

A lot of the code was modified to adapt the changes but most of them are just to please the compiler. The following are the important changes.

Main changes

Building blocks

  • A tiny part of the Haskell package lens was ported as frege.compiler.common.Lens.
  • instance Alt Maybe was added to port the lens.

SymbolT

  • SymbolT was split into independent data types: SymT, SymL, SymD, SymC, SymI, SymV and SymA.
  • Two additional types were added to express a part of the set of the symbols:
    • SymVal a sum of SymV and SymD
    • SymMeth a sum of SymV and SymL
  • SymC.env and SymI.env was renamed to meth and its type changed to TreeMap String (SymMeth Global)
  • Getters and lenses are added to SymbolT, SymVal and SymMeth to help accessing fields. Lenses are prefixed with an underscore. For example:
    • SymbolT._name can update the name field without pattern matching.
      set SymbolT._name name sym ===
      case sym of { SymbolT.V symv -> symv.{name}; SymbolT.T symt -> symt.{name}; ...; }
    • SymbolT.name is a simple getter, and is much faster than view SymbolT._name.

DefinitionS

  • DefinitionS was split into independent data types.
  • Two additional types were added to express a part of the set of the definitions:
    • ClassMemberS a sum of AnnDcl, NatDcl and FunDcl
    • LetMemberS a sum of AnnDcl and FunDcl
  • Frege.y was modified to adapt the changes. The syntax remains the same.

New errors

Most of the type-unsafe uses of the sum-records were proved safe by the process of refactoring, but a few remained.

A genuine warning

  • frege.compiler.passes.Fix.unDoc.apply: Added a new warning. Emitted when a documentation comment is found next to a ModDcl i.e. a native module declaration. The comment is ignored after emitting the warning.

Internal errors

The following are internal errors. I added them anyway to please the compiler but I doubt any of them actually occurs unless something goes very wrong.

  • frege.compiler.common.SymbolTable.insUpdSymByName (formerly enterByName and updateSym): Emits an error when a Symbol which will belong to a SymC or SymI is neither SymV nor SymL.
  • frege.compiler.common.SymbolTable.toSymVBecauseLocal :: Symbol -> SymV Global is a new partial function. Used on an assumption that a Symbol is a SymV if its name is Local.
  • frege.compiler.gen.java.InstanceCode.lowerKindSpecialClasses: May emit an error if specialClassNames contains a non-class name.
  • frege.compiler.gen.java.InstanceCode.lowerKindSpecialClasses: May emit an error if any of the class resolved by specialClassNames has SymL as its member.
  • frege.compiler.gen.java.InstanceCode.symMethAsSymV :: SymMeth Global -> SymV Global is a new partial function. Used on an assumption that all SymLs in SymC.meth and SymI.meth are resolved to SymVs at the time instanceCode is running.
  • frege.compiler.types.Global.instTSym: Emits an error if TName pPreludeBase "->" resolves to a non-SymT.
  • frege.compiler.types.Global.instTauSym: Emits an error if TCon.name resolves to a non-SymT.

I hope some of the above can be eliminated by applying the same refactoring to QName some other time.

Performance concerns

fregec

The performance of the compiler was tested by compiling the source at master with different versions of fregec.jar.

The command make clean compiler1 was run 5 times each.

branch commit id average secs %
master a3c4f77 78.59 100
no-sum-symbolt 11bdb3f 81.25 103.3

The performance impact appears to be minimal. (Previously I reported the time increase was 0.9% but I was wrong.)

fregedoc

On the other hand, fregedoc suffers from a significant performance penalty.

The command java -jar fregec-<version>.jar frege.tools.Doc -d out fregec-<master>.jar was run 5 times each.

branch commit id average secs %
master a3c4f77 51.86 100
no-sum-symbolt 11bdb3f 65.33 126

frege IDE

Not tested because I don't have an IDE but the same performance drop as fregedoc is expected.

Notes

This is not an exhaustive effort. Only parts of the code which were complained by the compiler were modified, so there may be redundant uses of SymbolT left around.

This is needed by the lenses to be added later.
See frege/compiler/types/Symbols.fr

As a first step to decomposing the sum-record SymbolT, SymT was isolated
to an independent data type.

Some parts of code were deliberately rewritten into refutable case
analyses so that compiler warnings are triggered.
The field accessors of SymbolT were replaced with lenses.

Partial functions were deliberately introduced to trigger compilation
warnings where certain constructor is implicitly assumed (for example,
an unsafe conversion `SymbolT g -> SymD g` is inserted to a piece of
code which accesses `flds`, which exists only on `SymD`.)
Lots of redundant unsafeToSymVs and its friends were removed.
by tightening the types of parameters
- removed redundant pattern from `sref`
- changed the type of `overSig` to take `SymV Global`
By changing the type of the local function `rbSymT` to return `SymT`
by tightening the types of parameters
allvars was changed from monadic value to a plain function because it
doesn't change nay state.

Some functions in other modules had their parameters tightened.
Instead of crashing, erroneous cases are simply treated as un-matched
cases.
Instead of crashing, erroneous cases are simply treated as un-matched
cases.
All SymMeths in SymC.meth should have been resolved to SymV in this
phase (at least that's how I understand the code).

An error here, if any, is an internal error, and is reported as a pure
error to help find a bug in the compiler.
Some functions in Global assumes certain propeties of Symbols. Those
assumptions should be correct unless something goes very wrong.

An error here, if any, is an internal error, and is reported as a pure
error to help find a bug in the compiler.
Since only SymMeth (i.e. SymV or SymL) can be a member of SymC.meth or
SymI.meth, if 'sym' is not SymMeth, it is an error.

A new error message was added to report it instead of just crashing.
A part of MethodCall.nativeCall uses unsafePartialView a lot. The reason
is that it was written by myself as a prototype and is not polished yet.
Instead of crashing, erroneous cases are simply treated as un-matched
cases.
'allfuns' now contain SymVs only. If non SymVs are in an env of a
Symbol, they are just now ignored.

Some of the other functions had their types changed in order to reflect
that change.
Instead of crashing, erroneous cases are simply treated as un-matched
cases.
being "private" crashes fregedoc (but not fregec).
Particularly unsafePartialView (defined in f.c.common.Lens) is no longer
used anywhere.
Added simple getters to SymbolT to be used instead of 'view'ing a
'Lens'. In Haskell, 'view' can have significant overhead than simple
pattern matching.

The existing lenses were given a leading underscore, like the prisms.
We don't use 'review', so Traversal is enough.
Well, we do use 'is', but 'has' can be used for the purpose.

Prism depends on Profunctor. In Haskell, the "profunctor" is a huge
package. If, in some day, we port it to Frege, the compiler depending on
it may be a problem.
The same refactoring as done to SymbolT.

Now all uses of 'view's were replaced by simple getters.

This contributes to a big reduction of performance penalty.
A redundant pattern match was removed.
Unused partial function
The new type SymVal and its member fromSymbol serves the same purpose.
On invalid data constructor, 'updVis' behaves as 'id' instead of an
error, just like 'over' in lens does.
An error case in symInfo was eliminated. Related functions were altered
to reflect this change. Most notably Global.GenSt.symi8 now have (SymVal
Global) instead of Symbol as keys.

A new error message was added to lowerKindSpecialClasses. It didn't make
sense to pass SymL to lowerKindAbstractFun because it would eventually
put into symi8 but never referenced because symInfo were never called
with SymL.
Removed unwanted diffs introduced by a series of modifications
@Ingo60
Copy link
Member

Ingo60 commented Nov 1, 2019

Great work, indeed! I think this justifies/requires a bump in the major version.

I shall figure out what I have to change in the eclipse plugin to accommodate this.
However, I prefer to wait until Christmas, where I have a couple of leisure days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants