From 344a08dd71b40f846bf2b3264d428847cdeed729 Mon Sep 17 00:00:00 2001
From: avsm
Date: Wed, 23 Nov 2022 15:00:44 +0000
Subject: [PATCH] deploy: 0f8ea69ad66f745bd52ecae7189b72f8b6d50da8
---
 command-line-parsing.html             |  6 ++---
 compiler-backend.html                 | 24 ++++++++---------
 compiler-frontend.html                | 12 ++++-----
 concurrent-programming.html           | 24 ++++++++---------
 data-serialization.html               | 18 ++++++-------
 error-handling.html                   | 12 ++++-----
 files-modules-and-programs.html       | 12 ++++-----
 first-class-modules.html              |  6 ++---
 foreign-function-interface.html       | 30 ++++++++++-----------
 garbage-collector.html                | 24 ++++++++---------
 guided-tour.html                      | 38 +++++++++++++--------------
 imperative-programming.html           | 18 ++++++-------
 index.html                            |  2 +-
 json.html                             | 18 ++++++-------
 lists-and-patterns.html               | 12 ++++-----
 maps-and-hashtables.html              | 24 ++++++++---------
 objects.html                          | 18 ++++++-------
 parsing-with-ocamllex-and-menhir.html | 18 ++++++-------
 platform.html                         | 24 ++++++++---------
 records.html                          |  6 ++---
 runtime-memory-layout.html            | 18 ++++++-------
 testing.html                          | 18 ++++++-------
 toc.html                              |  2 +-
 variables-and-functions.html          | 18 ++++++-------
 variants.html                         | 12 ++++-----
 25 files changed, 207 insertions(+), 207 deletions(-)

diff --git a/command-line-parsing.html b/command-line-parsing.html
index 627228b87..5aa2d1887 100644
--- a/command-line-parsing.html
+++ b/command-line-parsing.html
@@ -685,11 +685,11 @@

Installing the Completion Fragment

add diff help version

Command completion support works for flags and grouped commands and is very useful when building larger command-line interfaces. Don’t forget to install the shell fragment into your global bash_completion.d directory if you want it to be loaded in all of your login shells.  

Installing a Generic Completion Handler

Sadly, bash doesn’t support installing a generic handler for all Command-based applications. This means you have to install the completion script for every application, but you should be able to automate this in the build and packaging system for your application.

It will help to check out how other applications install tab-completion scripts and follow their lead, as the details are very OS-specific.

diff --git a/compiler-backend.html b/compiler-backend.html
index 283a40a0e..b557dfa02 100644
--- a/compiler-backend.html
+++ b/compiler-backend.html
@@ -225,12 +225,12 @@

Generating Portable Bytecode

The preceding bytecode has been simplified from the lambda form into a set of simple instructions that are executed serially by the interpreter.

There are around 140 instructions in total, but most are just minor variants of commonly encountered operations (e.g., function application at a specific arity). You can find full details online.  

Where Did the Bytecode Instruction Set Come From?

The bytecode interpreter is much slower than compiled native code, but is still remarkably performant for an interpreter without a JIT compiler. Its efficiency can be traced back to Xavier Leroy’s ground-breaking work in 1990, “The ZINC experiment: An Economical Implementation of the ML Language”.

This paper laid the theoretical basis for the implementation of an instruction set for a strictly evaluated functional language such as OCaml. The bytecode interpreter in modern OCaml is still based on the ZINC model. The native code compiler uses a different model since it uses CPU registers for function calls instead of always passing arguments on the stack, as the bytecode interpreter does.

Understanding the reasoning behind the different implementations of the bytecode interpreter and the native compiler is a very useful exercise for any budding language hacker.


Compiling and Linking Bytecode

The ocamlc command compiles individual ml files into bytecode files that have a cmo extension. The compiled bytecode files are matched with the associated cmi interface, which contains the type signature exported to other compilation units.  

@@ -439,10 +439,10 @@

Benchmarking Polymorphic Comparison

We see that the polymorphic comparison is close to 10 times slower! These results shouldn’t be taken too seriously, as this is a very narrow test that, like all such microbenchmarks, isn’t representative of more complex codebases. However, if you’re building numerical code that runs many iterations in a tight inner loop, it’s worth manually peering at the produced assembly code to see if you can hand-optimize it.

Accessing Stdlib Modules from Within Core

In the benchmark above comparing polymorphic and monomorphic comparison, you may have noticed that we prepended the comparison functions with Stdlib. This is because the Core module explicitly redefines the > and < and = operators to be specialized for operating over int types, as explained in Chapter 14, Maps and Hashtables. You can always recover any of the OCaml standard library functions by accessing them through the Stdlib module, as we did in our benchmark.
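
As a quick illustration (not part of the chapter’s benchmark), the Stdlib operators remain polymorphic, while the defaults exposed by Core only accept ints; other types go through their own modules or through Stdlib:

Stdlib.( < ) "bar" "foo";;
>- : bool = true
String.( < ) "bar" "foo";;
>- : bool = true
3 < 4;;
>- : bool = true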

@@ -607,8 +607,9 @@

Perf

This trace broadly reflects the results of the benchmark itself. The mutable benchmark consists of the combination of the call to test_mutable and the caml_modify write barrier function in the runtime. This adds up to slightly over half the execution time of the application.

Perf has a growing collection of other commands that let you archive these runs and compare them against each other. You can read more on the home page.  

Using the Frame Pointer to Get More Accurate Traces

Although Perf doesn’t require adding in explicit probes to the binary, it does need to understand how to unwind function calls so that the kernel can accurately record the function backtrace for every event. Since Linux 3.9 the kernel has had support for using DWARF debug information to parse the program stack, which is emitted when the -g flag is passed to the OCaml compiler. For even more accurate stack parsing, we need the compiler to fall back to using the same conventions as C for function calls. On 64-bit Intel systems, this means that a special register known as the frame pointer is used to record function call history. Using the frame pointer in this fashion means a slowdown (typically around 3-5%) since it’s no longer available for general-purpose use.

OCaml thus makes the frame pointer an optional feature that can be used to improve the resolution of Perf traces. opam provides a compiler switch that compiles OCaml with the frame pointer activated:

@@ -616,7 +617,6 @@


Using the frame pointer changes the OCaml calling convention, but opam takes care of recompiling all your libraries with the new interface.

@@ -635,11 +635,11 @@

Embedding Native Code in C

After calling OCaml

The embed_native.o is a standalone object file that has no further references to OCaml code beyond the runtime library, just as with the bytecode runtime. Do remember that the link order of the libraries is significant in modern GNU toolchains (especially as used in Ubuntu 11.10 and later) that resolve symbols from left to right in a single pass. 

Activating the Debug Runtime

Despite your best efforts, it is easy to introduce a bug into some components, such as C bindings, that causes heap invariants to be violated. OCaml includes a libasmrund.a variant of the runtime library which is compiled with extra debugging checks that perform extra memory integrity checks during every garbage collection cycle. Running these extra checks will abort the program nearer the point of corruption and help isolate the bug in the C code.

To use the debug library, just link your program with the -runtime-variant d flag:

ocamlopt -runtime-variant d -verbose -o hello.native hello.ml
 >+ as  -o 'hello.o' '/tmp/build_cd0b96_dune/camlasmd3c336.s'
diff --git a/compiler-frontend.html b/compiler-frontend.html
index 49262071d..93cd22b3b 100644
--- a/compiler-frontend.html
+++ b/compiler-frontend.html
@@ -297,13 +297,13 @@ 

Displaying Inferred Types from the Compiler

> Actual declaration [2]
Which Comes First: The ml or the mli?

There are two schools of thought on which order OCaml code should be written in. It’s very easy to begin writing code by starting with an ml file and using the type inference to guide you as you build up your functions. The mli file can then be generated as described, and the exported functions documented.     

If you’re writing code that spans multiple files, it’s sometimes easier to start by writing all the mli signatures and checking that they type-check against one another. Once the signatures are in place, you can write the implementations with the confidence that they’ll all glue together correctly, with no cyclic dependencies among the modules.

As with any such stylistic debate, you should experiment with which system works best for you. Everyone agrees on one thing though: no matter in what order you write them, production code should always explicitly define an mli file for every ml file in the project. It’s also perfectly fine to have an mli file without a corresponding ml file if you’re only declaring signatures (such as module types).

Signature files provide a place to write succinct documentation and to abstract internal details that shouldn’t be exported. Maintaining separate signature files also speeds up incremental compilation in larger code bases, since recompiling a mli signature is much faster than a full compilation of the implementation to native code.
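
For instance, a minimal mli might expose just one documented function while hiding everything else (the module and function names here are hypothetical):

(* freq.mli *)
(** [top_frequencies lines ~n] returns the [n] most frequent lines,
    paired with their counts, most frequent first. *)
val top_frequencies : string list -> n:int -> (string * int) list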


Type Inference

@@ -517,8 +517,9 @@

Defining a Module Search Path

The type checker resolves such module references into concrete structures and signatures in order to unify types across module boundaries. It does this by searching a list of directories for a compiled interface file matching that module’s name. For example, it will look for alice.cmi and bob.cmi on the search path and use the first ones it encounters as the interfaces for Alice and Bob.

The module search path is set by adding -I flags to the compiler command line with the directory containing the cmi files as the argument. Manually specifying these flags gets complex when you have lots of libraries, and is the reason why tools like dune and ocamlfind exist. They both automate the process of turning third-party package names and build descriptions into command-line flags that are passed to the compiler command line.

By default, only the current directory and the OCaml standard library will be searched for cmi files. The Stdlib module from the standard library will also be opened by default in every compilation unit. The standard library location is obtained by running ocamlc -where and can be overridden by setting the CAMLLIB environment variable. Needless to say, don’t override the default path unless you have a good reason to (such as setting up a cross-compilation environment).    

Inspecting Compilation Units with ocamlobjinfo

For separate compilation to be sound, we need to ensure that all the cmi files used to type-check a module are the same across compilation runs. If they vary, this raises the possibility of two modules checking different type signatures for a common module with the same name. This in turn lets the program completely violate the static type system and can lead to memory corruption and crashes.

OCaml guards against this by recording a MD5 checksum in every cmi. Let’s examine our earlier typedef.ml more closely:

@@ -542,7 +543,6 @@


This hash check is very conservative, but ensures that separate compilation remains type-safe all the way up to the final link phase. Your build system should ensure that you never see the preceding error messages, but if you do run into it, just clean out your intermediate files and recompile from scratch.

diff --git a/concurrent-programming.html b/concurrent-programming.html
index eec82c4c7..65f3c44d2 100644
--- a/concurrent-programming.html
+++ b/concurrent-programming.html
@@ -221,8 +221,8 @@

Ivars and Upon

This code isn’t particularly long, but it is subtle. In particular, note how the queue of thunks is used to ensure that the enqueued actions are run in the order they were scheduled, even if the thunks scheduled by upon are run out of order. This kind of subtlety is typical of code that involves ivars and upon, and because of this, you should stick to the simpler map/bind/return style of working with deferreds when you can.

Understanding bind in Terms of Ivars and upon

Here’s roughly what happens when you write let d' = Deferred.bind d ~f.

  • A new ivar i is created to hold the final result of the computation. The corresponding deferred is returned

  • upon is called on d, registering a function to be run when d becomes determined; when that function runs, it calls f on the value determined for d.

  • A second call to upon registers a function on the deferred returned by f; when that deferred becomes determined, its value is used to fill i, at which point d' becomes determined.
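
Putting these steps together, a simplified version of bind can be sketched directly in terms of ivars and upon (an illustrative model, not Async’s actual optimized implementation):

let my_bind d ~f =
  let i = Ivar.create () in
  upon d (fun x -> upon (f x) (fun y -> Ivar.fill i y));
  Ivar.read i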

Async’s real implementation has more optimizations and is therefore more complicated. But the above implementation is still a useful first-order mental model for how bind works under the covers. And it’s another good example of how upon and ivars can be useful for building concurrency primitives.

@@ -271,12 +271,12 @@

Example: An Echo Server

If we hit an end-of-file condition, the loop is ended. The deferred returned by a call to copy_blocks becomes determined only once the end-of-file condition is hit.  

One important aspect of how copy_blocks is written is that it provides pushback, which is to say that if the process can’t make progress writing, it will stop reading. If you don’t implement pushback in your servers, then anything that prevents you from writing (e.g., a client that is unable to keep up) will cause your program to allocate unbounded amounts of memory, as it keeps track of all the data it intends to write but hasn’t been able to yet.

Tail-Calls and Chains of Deferreds

There’s another memory problem you might be concerned about, which is the allocation of deferreds. If you think about the execution of copy_blocks, you’ll see it’s creating a chain of deferreds, two per time through the loop. The length of this chain is unbounded, and so, naively, you’d think this would take up an unbounded amount of memory as the echo process continues.

Happily, this is a case that Async knows how to optimize. In particular, the whole chain of deferreds should become determined precisely when the final deferred in the chain is determined, in this case, when the Eof condition is hit. Because of this, we could safely replace all of these deferreds with a single deferred. Async does just this, and so there’s no memory leak after all.

This is essentially a form of tail-call optimization, lifted to the Deferred monad. Indeed, you can tell that the bind in question doesn’t lead to a memory leak in more or less the same way you can tell that the tail recursion optimization should apply, which is that the bind that creates the deferred is in tail-position. In other words, nothing is done to that deferred once it’s created; it’s simply returned as is.  


copy_blocks provides the logic for handling a client connection, but we still need to set up a server to receive such connections and dispatch to copy_blocks. For this, we’ll use Async’s Tcp module, which has a collection of utilities for creating TCP clients and servers:  

(** Starts a TCP server, which listens on the specified port, invoking
@@ -315,8 +315,8 @@ 

Functions that Never Return

The call to never_returns around the call to Scheduler.go is a little bit surprising, but it has a purpose: to make it clear to whoever invokes Scheduler.go that the function never returns.    

By default, a function that doesn’t return will have an inferred return type of 'a:
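
For instance (an illustrative definition, not necessarily the chapter’s own):

let rec loop_forever () = loop_forever ();;
>val loop_forever : unit -> 'a = <fun>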

@@ -368,7 +368,7 @@

Functions that Never Return

>val do_stuff : int -> int = <fun>

Improving the Echo Server

Let’s try to go a little bit farther with our echo server by walking through a few improvements. In particular, we will:

@@ -948,12 +948,12 @@

Working with System Threads

Given that threads don’t provide physical parallelism, why are they useful at all?

The most common reason for using system threads is that there are some operating system calls that have no nonblocking alternative, which means that you can’t run them directly in a system like Async without blocking your entire program. For this reason, Async maintains a thread pool for running such calls. Most of the time, as a user of Async you don’t need to think about this, but it is happening under the covers.  

Another reason to have multiple threads is to deal with non-OCaml libraries that have their own event loop or for another reason need their own threads. In that case, it’s sometimes useful to run some OCaml code on the foreign thread as part of the communication to your main program. OCaml’s foreign function interface is discussed in more detail in Chapter 22, Foreign Function Interface.

Multicore OCaml

OCaml doesn’t support truly parallel threads today, but it will soon. The current development branch of OCaml, which is expected to be released in 2022 as OCaml 5.0, has a long-awaited multicore-capable garbage collector, which is the result of years of research and hard implementation work.

We won’t discuss the multicore GC here, in part because it’s not yet released, and in part because there are a lot of open questions about how OCaml programs should take advantage of multicore in a way that’s safe, convenient, and performant. Given all that, we just don’t know enough to write a chapter about multicore today.

In any case, while multicore OCaml isn’t here yet, it’s an exciting part of OCaml’s near-term future.


Another occasional use for system threads is to better interoperate with compute-intensive OCaml code. In Async, if you have a long-running computation that never calls bind or map, then that computation will block out the Async runtime until it completes.

One way of dealing with this is to explicitly break up the calculation into smaller pieces that are separated by binds. But sometimes this explicit yielding is impractical, since it may involve intrusive changes to an existing codebase. Another solution is to run the code in question in a separate thread. Async’s In_thread module provides multiple facilities for doing just this, In_thread.run being the simplest. We can simply write:  
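
Something along these lines (a sketch; it assumes Core and Async are open and uses an arbitrary CPU-bound computation):

In_thread.run (fun () ->
  List.fold (List.range 0 1_000_000) ~init:0 ~f:( + ));;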

diff --git a/data-serialization.html b/data-serialization.html
index a3caf0d31..3fbc43140 100644
--- a/data-serialization.html
+++ b/data-serialization.html
@@ -43,8 +43,8 @@

Basic Usage

>- : Sexp.t = (1 2 (3 4))
Base, Core, and Parsexp

In these examples, we’re using Core rather than Base because Core has integrated support for parsing s-expressions, courtesy of the Parsexp library. If you just use Base, you’ll find that you don’t have Sexp.of_string at your disposal.

open Base;;
@@ -63,7 +63,7 @@ 

Base, Core, and Parsexp

>- : Sexp.t = (1 2 3)

In addition to providing the Sexp module, most of the base types in Base and Core support conversion to and from s-expressions. For example, we can use the conversion functions defined in the respective modules for integers, strings, and exceptions:

Int.sexp_of_t 3;;
@@ -99,8 +99,8 @@ 

Base, Core, and Parsexp

>(Of_sexp_error "int_of_sexp: (Failure int_of_string)" (invalid_sexp three))
More on Top-Level Printing

The values of the s-expressions that we created were printed properly as s-expressions in the toplevel, instead of as the tree of Atom and List variants that they’re actually made of.  

This is due to OCaml’s facility for installing custom top-level printers that can rewrite some values into more top-level-friendly equivalents. They are generally installed as ocamlfind packages ending in top:

@@ -115,7 +115,7 @@

More on Top-Level Printing

The core.top package (which you should have loaded by default in your .ocamlinit file) loads in printers for the Core extensions already, so you don’t need to do anything special to use the s-expression printer.


S-Expression Converters for New Types

But what if you want a function to convert a brand new type to an s-expression? You can of course write it yourself manually. Here’s an example.  
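
A hand-written converter for a small record type might look something like this (the type and field names are illustrative):

type t = { foo: int; bar: float }

let sexp_of_t t =
  Sexp.List
    [ Sexp.List [ Sexp.Atom "foo"; Int.sexp_of_t t.foo ]
    ; Sexp.List [ Sexp.Atom "bar"; Float.sexp_of_t t.bar ]
    ]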

@@ -170,8 +170,8 @@

S-Expression Converters for New Types

The syntax extensions bundled with Base and Core almost all have the same basic structure: they auto-generate code based on type definitions, implementing functionality that you could in theory have implemented by hand, but with far less programmer effort.

Syntax Extensions and PPX

OCaml doesn’t directly support deriving s-expression converters from type definitions. Instead, it provides a mechanism called PPX which allows you to add to the compilation pipeline code for transforming OCaml programs at the syntactic level, via the -ppx compiler flag.

PPXs operate on OCaml’s abstract syntax tree, or AST, which is a data type that represents the syntax of a well-formed OCaml program. Annotations like [%sexp_of: int] or [@@deriving sexp] are part of special extensions to the syntax, called extension points, which were added to the language to give a place to put information that would be consumed by syntax extensions like ppx_sexp_conv.   

ppx_sexp_conv is part of a family of syntax extensions, including ppx_compare, described in Chapter 14, Maps And Hash Tables, and ppx_fields, described in Chapter 5, Records, that generate code based on type declarations.        
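
For comparison, the same converters can be derived automatically from the type definition (a small illustrative sketch assuming Core is open; the dune stanza below shows the preprocess declaration that enables ppx_sexp_conv):

type t = { foo: int; bar: float } [@@deriving sexp]

let () = print_s (sexp_of_t { foo = 3; bar = 3.5 })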

@@ -182,7 +182,7 @@

Syntax Extensions and PPX

(preprocess (pps ppx_sexp_conv)) )
diff --git a/error-handling.html b/error-handling.html
index 5a63b067d..70d1c7366 100644
--- a/error-handling.html
+++ b/error-handling.html
@@ -169,8 +169,8 @@

bind and Other Error Handling Idioms

This use of bind isn’t really materially better than the one we started with, and indeed, for small examples like this, direct matching of options is generally better than using bind. But for large, complex examples with many stages of error handling, the bind idiom becomes clearer and easier to manage.

Monads and Let_syntax

We can make this look a little bit more ordinary by using a syntax extension that’s designed specifically for monadic binds, called Let_syntax. Here’s what the above example looks like using this extension.

#require "ppx_let";;
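(* A sketch of compute_bounds rewritten with let%bind, assuming the
   Option-based compute_bounds shown earlier in the chapter: *)
let compute_bounds ~compare list =
  let open Option.Let_syntax in
  let sorted = List.sort ~compare list in
  let%bind first = List.hd sorted in
  let%bind last = List.last sorted in
  Some (first, last);;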
@@ -187,7 +187,7 @@ 

Monads and Let_syntax

Note that we needed a #require statement to enable the extension.

To understand what’s going on here, you need to know that let%bind x = some_expr in some_other_expr is rewritten into some_expr >>= fun x -> some_other_expr.

The advantage of Let_syntax is that it makes monadic bind look more like a regular let-binding. This works nicely because you can think of the monadic bind in this case as a special form of let binding that has some built-in error handling semantics.


There are other useful idioms encoded in the functions in Option. One example is Option.both, which takes two optional values and produces a new optional pair that is None if either of its arguments are None. Using Option.both, we can make compute_bounds even shorter:

let compute_bounds ~compare list =
@@ -272,8 +272,8 @@ 

Exceptions

forever doesn’t return a value for a different reason: it’s an infinite loop.

This all matters because it means that the return type of raise can be whatever it needs to be to fit into the context it is called in. Thus, the type system will let us throw an exception anywhere in a program.

Declaring Exceptions Using [@@deriving sexp]

OCaml can’t always generate a useful textual representation of an exception. For example:  

type 'a bounds = { lower: 'a; upper: 'a };;
@@ -298,7 +298,7 @@ 

Declaring Exceptions Using

The period in front of Crossed_bounds is there because the representation generated by [@@deriving sexp] includes the full module path of the module where the exception in question is defined. In this case, the string //toplevel// is used to indicate that this was declared at the utop prompt, rather than in a module.

This is all part of the support for s-expressions provided by the Sexplib library and syntax extension, which is described in more detail in Chapter 20, Data Serialization With S-Expressions.


Helper Functions for Throwing Exceptions

Base provides a number of helper functions to simplify the task of throwing exceptions. The simplest one is failwith, which could be defined as follows:   
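
Its definition amounts to a single line (a sketch consistent with the behavior described here):

let failwith msg = raise (Failure msg)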

diff --git a/files-modules-and-programs.html b/files-modules-and-programs.html
index cbbddcd3a..a9cb67096 100644
--- a/files-modules-and-programs.html
+++ b/files-modules-and-programs.html
@@ -42,11 +42,11 @@

Single-File Programs

The function build_counts reads in lines from stdin, constructing from those lines an association list with the frequencies of each line. It does this by invoking In_channel.fold_lines (similar to the function List.fold described in Chapter 3, Lists And Patterns), which reads through the lines one by one, calling the provided fold function for each line to update the accumulator. That accumulator is initialized to the empty list.

With build_counts defined, we then call the function to build the association list, sort that list by frequency in descending order, grab the first 10 elements off the list, and then iterate over those 10 elements and print them to the screen. These operations are tied together using the |> operator described in Chapter 2, Variables And Functions.  

Where Is main?

Unlike programs in C, Java or C#, programs in OCaml don’t have a unique main function. When an OCaml program is evaluated, all the statements in the implementation files are evaluated in the order in which they were linked together. These implementation files can contain arbitrary expressions, not just function definitions. In this example, the declaration starting with let () = plays the role of the main function, kicking off the processing. But really the entire file is evaluated at startup, and so in some sense the full codebase is one big main function.

The idiom of writing let () = may seem a bit odd, but it has a purpose. The let binding here is a pattern-match to a value of type unit, which is there to ensure that the expression on the right-hand side returns unit, as is common for functions that operate primarily by side effect.


If we weren’t using Base or any other external libraries, we could build the executable like this:

ocamlopt freq.ml -o freq
@@ -109,12 +109,12 @@ 

Where Is main?

We’ve really just scratched the surface of what can be done with dune. We’ll discuss dune in more detail in Chapter 21, The OCaml Platform.

Bytecode Versus Native Code

OCaml ships with two compilers: the ocamlopt native code compiler and the ocamlc bytecode compiler. Programs compiled with ocamlc are interpreted by a virtual machine, while programs compiled with ocamlopt are compiled to machine code to be run on a specific operating system and processor architecture. With dune, targets ending with .bc are built as bytecode executables, and those ending with .exe are built as native code.

Aside from performance, executables generated by the two compilers have nearly identical behavior. There are a few things to be aware of. First, the bytecode compiler can be used on more architectures, and has some tools that are not available for native code. For example, the OCaml debugger only works with bytecode (although gdb, the GNU Debugger, works with some limitations on OCaml native-code applications). The bytecode compiler is also quicker than the native-code compiler. In addition, in order to run a bytecode executable, you typically need to have OCaml installed on the system in question. That’s not strictly required, though, since you can build a bytecode executable with an embedded runtime, using the -custom compiler flag.

As a general matter, production executables should usually be built using the native-code compiler, but it sometimes makes sense to use bytecode for development builds. And, of course, bytecode makes sense when targeting a platform not supported by the native-code compiler. We’ll cover both compilers in more detail in Chapter 26, The Compiler Backend: Byte Code And Native Code.


Multifile Programs and Modules

diff --git a/first-class-modules.html b/first-class-modules.html
index 1499d9e8e..126e4aeb5 100644
--- a/first-class-modules.html
+++ b/first-class-modules.html
@@ -179,8 +179,8 @@

Exposing types

Polymorphic first-class modules are important because they allow you to connect the types associated with a first-class module to the types of other values you’re working with.

More on Locally Abstract Types

One of the key properties of locally abstract types is that they’re dealt with as abstract types in the function they’re defined within, but are polymorphic from the outside. Consider the following example:  

let wrap_in_list (type a) (x:a) = [x];;
@@ -216,7 +216,7 @@ 

More on Locally Abstract Types

This technique is useful beyond first-class modules. For example, we can use the same approach to construct a local module to be fed to a functor.
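
For example, a locally abstract type lets us build a module on the fly and apply a functor to it (a sketch using Stdlib's Set.Make purely for illustration):

let dedup_and_sort (type a) (compare : a -> a -> int) (items : a list) =
  let module S = Stdlib.Set.Make (struct
    type t = a
    let compare = compare
  end) in
  S.elements (S.of_list items)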

diff --git a/foreign-function-interface.html b/foreign-function-interface.html
index c694a25bc..fa876fc3e 100644
--- a/foreign-function-interface.html
+++ b/foreign-function-interface.html
@@ -12,8 +12,8 @@

Foreign Function Interface

The simplest foreign function interface in OCaml doesn’t even require you to write any C code at all! The Ctypes library lets you define the C interface in pure OCaml, and the library then takes care of loading the C symbols and invoking the foreign function call.   

Let’s dive straight into a realistic example to show you how the library looks. We’ll create a binding to the Ncurses terminal toolkit, as it’s widely available on most systems and doesn’t have any complex dependencies.

Installing the Ctypes Library

If you want to use Ctypes interactively, you’ll also need to install the libffi library as a prerequisite to using Ctypes. It’s a fairly popular library and should be available in your OS package manager. If you’re using opam 2.1 or higher, it will prompt you to install it automatically when you install ctypes-foreign.

$ opam install ctypes ctypes-foreign
@@ -21,7 +21,7 @@ 

Installing the Ctypes Library

# require "ctypes-foreign" ;;

You’ll also need the Ncurses library for the first example. This comes preinstalled on many operating systems such as macOS, and Debian Linux provides it as the libncurses5-dev package.


Example: A Terminal Interface

Ncurses is a library to help build terminal-independent text interfaces in a reasonably efficient way. It’s used in console mail clients like Mutt and Pine, and console web browsers such as Lynx. 

@@ -155,14 +155,14 @@

Example: A Terminal Interface

Running hello.exe should now display a Hello World in your terminal!

Ctypes wouldn’t be very useful if it were limited to only defining simple C types, of course. It provides full support for C pointer arithmetic, pointer conversions, and reading and writing through pointers, using OCaml functions as function pointers to C code, as well as struct and union definitions.

We’ll go over some of these features in more detail for the remainder of the chapter by using some POSIX date functions as running examples.

Linking Modes: libffi and Stub Generation

The core of ctypes is a set of OCaml combinators for describing the structure of C types (numeric types, arrays, pointers, structs, unions and functions). You can then use these combinators to describe the types of the C functions that you want to call. There are two entirely distinct ways to actually link to the system libraries that contain the function definitions: dynamic linking and stub generation.

The ctypes-foreign package used in this chapter uses the low-level libffi library to dynamically open C libraries, search for the relevant symbols for the function call being invoked, and marshal the function parameters according to the operating system’s application binary interface (ABI). While much of this happens behind-the-scenes and permits convenient interactive programming while developing bindings, it is not always the solution you want to use in production.

The ctypes-cstubs package provides an alternative mechanism to shift much of the linking work to be done once at build time, instead of doing it on every invocation of the function. It does this by taking the same OCaml binding descriptions, and generating intermediate C source files that contain the corresponding C/OCaml glue code. When these are compiled with a normal dune build, the generated C code is treated just as any handwritten code might be, and compiled against the system header files. This allows certain C values to be used that cannot be dynamically probed (e.g. preprocessor macro definitions), and can also catch definition errors if there is a C header mismatch at compile time.

C rarely makes life easier though. There are some definitions that cannot be entirely expressed as static C code (e.g. dynamic function pointers), and those require the use of ctypes-foreign (and libffi). Using ctypes does make it possible to share the majority of definitions across both linking modes, all while avoiding writing C code directly.

While we do not cover the details of C stub generation further in this chapter, you can read more about how to use this mode in the “Dealing with foreign libraries” chapter in the dune manual.


Basic Scalar C Types

@@ -344,12 +344,12 @@

Using Views to Map Complex Values

val string    : string typ
OCaml Strings Versus C Character Buffers

Although OCaml strings may look like C character buffers from an interface perspective, they’re very different in terms of their memory representations.

OCaml strings are stored in the OCaml heap with a header that explicitly defines their length. C buffers are also fixed-length, but by convention, a C string is terminated by a null (a \0 byte) character. The C string functions calculate their length by scanning the buffer until the first null character is encountered.

This means that you need to be careful that OCaml strings that you pass to C functions don’t contain any null values, since the first occurrence of a null character will be treated as the end of the C string. Ctypes also defaults to a copying interface for strings, which means that you shouldn’t use them when you want the library to mutate the buffer in-place. In that situation, use the Ctypes Bigarray support to pass memory by reference instead.

@@ -512,8 +512,9 @@

Recap: a Time-Printing Command

>Mon Oct 11 15:57:38 2021
Why Do We Need to Use returning?

The alert reader may be curious about why all these function definitions have to be terminated by returning:

(* correct types *)
@@ -558,7 +559,6 @@ 

Why Do We Need to Use returning?

The OCaml type of uncurried_C when bound by Ctypes is int -> int -> int: a two-argument function. The OCaml type of curried_C when bound by ctypes is int -> (int -> int): a one-argument function that returns a one-argument function.

In OCaml, of course, these types are absolutely equivalent. Since the OCaml types are the same but the C semantics are quite different, we need some kind of marker to distinguish the cases. This is the purpose of returning in function definitions.

@@ -703,14 +703,14 @@

Example: A Command-Line Quicksort

The qsort' wrapper function has a much more canonical OCaml interface than the raw binding. It accepts a comparator function and a Ctypes array, and returns unit.

Using qsort' to sort arrays is straightforward. Our example code reads the standard input as a list, converts it to a C array, passes it through qsort, and outputs the result to the standard output. Again, remember to not confuse the Ctypes.Array module with the Core.Array module: the former is in scope since we opened Ctypes at the start of the file.    

Lifetime of Allocated Ctypes

Values allocated via Ctypes (i.e., using allocate, Array.make, and so on) will not be garbage-collected as long as they are reachable from OCaml values. The system memory they occupy is freed when they do become unreachable, via a finalizer function registered with the garbage collector (GC).

The definition of reachability for Ctypes values is a little different from conventional OCaml values, though. The allocation functions return an OCaml-managed pointer to the value, and as long as some derivative pointer is still reachable by the GC, the value won’t be collected.

“Derivative” means a pointer that’s computed from the original pointer via arithmetic, so a reachable reference to an array element or a structure field protects the whole object from collection.

A corollary of the preceding rule is that pointers written into the C heap don’t have any effect on reachability. For example, if you have a C-managed array of pointers to structs, then you’ll need some additional way of keeping the structs themselves around to protect them from collection. You could achieve this via a global array of values on the OCaml side that would keep them live until they’re no longer needed.

Functions passed to C have similar considerations regarding lifetime. On the OCaml side, functions created at runtime may be collected when they become unreachable. As we’ve seen, OCaml functions passed to C are converted to function pointers, and function pointers written into the C heap have no effect on the reachability of the OCaml functions they reference. With qsort things are straightforward, since the comparison function is only used during the call to qsort itself. However, other C libraries may store function pointers in global variables or elsewhere, in which case you’ll need to take care that the OCaml functions you pass to them aren’t prematurely garbage-collected.

diff --git a/garbage-collector.html b/garbage-collector.html
index b3668795a..b10f298e6 100644
--- a/garbage-collector.html
+++ b/garbage-collector.html
@@ -21,11 +21,11 @@

Generational Garbage Collection

A typical functional programming style means that young blocks tend to die young and old blocks tend to stay around for longer than young ones. This is often referred to as the generational hypothesis.  

OCaml uses different memory layouts and garbage-collection algorithms for the major and minor heaps to account for this generational difference. We’ll explain how they differ in more detail next.

The Gc Module and OCAMLRUNPARAM

OCaml provides several mechanisms to query and alter the behavior of the runtime system. The Gc module provides this functionality from within OCaml code, and we’ll frequently refer to it in the rest of the chapter. As with several other standard library modules, Core alters the Gc interface from the standard OCaml library. We’ll assume that you’ve opened Core in our explanations.   

You can also control the behavior of OCaml programs by setting the OCAMLRUNPARAM environment variable before launching your application. This lets you set GC parameters without recompiling, for example to benchmark the effects of different settings. The format of OCAMLRUNPARAM is documented in the OCaml manual.


The Fast Minor Heap

@@ -46,8 +46,9 @@

Understanding Allocation

It is possible to write loops or recursive functions that can run for a long time without performing an allocation, if they allocate at all. To ensure that UNIX signals and other internal bookkeeping that require interrupting the running OCaml program still happen, the compiler introduces poll points into generated native code.

These poll points check ptr against limit; developers should expect them to be placed at the start of every function and on the back edge of loops. The compiler includes a dataflow pass that removes all but the minimal set of poll points necessary to ensure these checks happen within a bounded amount of time.

 

Setting the Size of the Minor Heap

The default minor heap size in OCaml is normally 2 MB on 64-bit platforms, but this is increased to 8 MB if you use Core (which generally prefers default settings that improve performance, but at the cost of a bigger memory profile). This setting can be overridden via the s=<words> argument to OCAMLRUNPARAM. You can change it after the program has started by calling the Gc.set function:

open Core;;
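(* Continuation sketch, not necessarily the book's original example: Core's
   Gc.tune takes the new minor heap size, in words, as an optional labeled
   argument. *)
Gc.tune ~minor_heap_size:(262_144 * 2) ();;
>- : unit = ()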
@@ -63,7 +64,6 @@ 

Setting the Size of the Minor Heap

Changing the GC size dynamically will trigger an immediate minor heap collection. Note that Core increases the default minor heap size from the standard OCaml installation quite significantly, and you’ll want to reduce this if running in very memory-constrained environments.

@@ -159,11 +159,11 @@

Controlling Major Heap Collections

Heap Compaction

After a certain number of major GC cycles have completed, the heap may begin to be fragmented due to values being deallocated out of order from how they were allocated. This makes it harder for the GC to find a contiguous block of memory for fresh allocations, which in turn would require the heap to be grown unnecessarily.    

The heap compaction cycle avoids this by relocating all the values in the major heap into a fresh heap that places them all contiguously in memory again. A naive implementation of the algorithm would require extra memory to store the new heap, but OCaml performs the compaction in place within the existing heap.

Controlling Frequency of Compactions

The max_overhead setting in the Gc module defines the connection between free memory and allocated memory after which compaction is activated.

A value of 0 triggers a compaction after every major garbage collection cycle, whereas the maximum value of 1000000 disables heap compaction completely. The default settings should be fine unless you have unusual allocation patterns that are causing a higher-than-usual rate of compactions:

Gc.tune ~max_overhead:0 ();;
 >- : unit = ()
@@ -258,11 +258,11 @@ 

The Mutable Write Barrier

Attaching Finalizer Functions to Values

OCaml’s automatic memory management guarantees that a value will eventually be freed when it’s no longer in use, either via the GC sweeping it or the program terminating. It’s sometimes useful to run extra code just before a value is freed by the GC, for example, to check that a file descriptor has been closed, or that a log message is recorded.    

What Values Can Be Finalized?

Various values cannot have finalizers attached since they aren’t heap-allocated. Some examples of values that are not heap-allocated are integers, constant constructors, Booleans, the empty array, the empty list, and the unit value. The exact list of what is heap-allocated or not is implementation-dependent, which is why Core provides the Heap_block module to explicitly check before attaching the finalizer.

Some constant values can be heap-allocated but never deallocated during the lifetime of the program, for example, a list of integer constants. Heap_block explicitly checks to see if the value is in the major or minor heap, and rejects most constant values. Compiler optimizations may also duplicate some immutable values such as floating-point values in arrays. These may be finalized while another duplicate copy is being used by the program.


Core provides a Heap_block module that dynamically checks if a given value is suitable for finalizing. Core keeps the functions for registering finalizers in the Core.Gc.Expert module. Finalizers can run at any time in any thread, so they can be pretty hard to reason about in multi-threaded contexts.   Async, which we discussed in Chapter 16, Concurrent Programming with Async, shadows the Gc module with its own module that contains a function, Gc.add_finalizer, which is concurrency-safe. In particular, finalizers are scheduled in their own Async job, and care is taken by Async to capture exceptions and raise them to the appropriate monitor for error-handling.  

Let’s explore this with a small example that finalizes values of different types, all of which are heap-allocated.
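
A minimal sketch of how a finalizer might be attached (assuming Async is open, so its concurrency-safe Gc.add_finalizer is in scope, and using Heap_block.create to check that the value is heap-allocated; this is illustrative, not the chapter’s full example):

let report_when_collected name v =
  match Heap_block.create v with
  | None -> printf "%s is not heap-allocated\n" name
  | Some block ->
    Gc.add_finalizer block (fun _ -> printf "%s has been finalized\n" name)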

diff --git a/guided-tour.html b/guided-tour.html
index ccbbe8695..2c521d5d6 100644
--- a/guided-tour.html
+++ b/guided-tour.html
@@ -3,8 +3,8 @@

A Guided Tour

This chapter gives an overview of OCaml by walking through a series of small examples that cover most of the major features of the language. This should provide a sense of what OCaml can do, without getting too deep into any one topic.

Throughout the book we’re going to use Base, a more full-featured and capable replacement for OCaml’s standard library. We’ll also use utop, a shell that lets you type in expressions and evaluate them interactively. utop is an easier-to-use version of OCaml’s standard toplevel (which you can start by typing ocaml at the command line). These instructions will assume you’re using utop, but the ordinary toplevel should mostly work fine.

Before going any further, make sure you’ve followed the steps in the installation page.

Base and Core

Base comes along with another, yet more extensive standard library replacement, called Core. We’re going to mostly stick to Base, but it’s worth understanding the differences between these libraries.    

  • Base is designed to be lightweight, portable, and stable, while providing all of the fundamentals you need from a standard library. It comes with a minimum of external dependencies, so Base just takes seconds to build and install.

As of the version of Base and Core used in this book (version v0.14), Core is less portable than Base, running only on UNIX-like systems. For that reason, there is another package, Core_kernel, which is the portable subset of Core. That said, in the latest stable release, v0.15 (which was released too late to be adopted for this edition of the book) Core is portable, and Core_kernel has been deprecated. Given that, we don’t use Core_kernel in this text.


Before getting started, make sure you have a working OCaml installation so you can try out the examples as you read through the chapter.

OCaml as a Calculator

@@ -200,8 +200,8 @@

Inferring Generic Types

In this example, big_number requires that 'a be instantiated as int, whereas "short" and "loooooong" require that 'a be instantiated as string, and they can’t both be right at the same time.

Type Errors Versus Exceptions

There’s a big difference in OCaml between errors that are caught at compile time and those that are caught at runtime. It’s better to catch errors as early as possible in the development process, and compilation time is best of all.    

Working in the toplevel somewhat obscures the difference between runtime and compile-time errors, but that difference is still there. Generally, type errors like this one:

@@ -225,7 +225,7 @@

Type Errors Versus Exceptions

The distinction here is that type errors will stop you whether or not the offending code is ever actually executed. Merely defining add_potato is an error, whereas is_a_multiple only fails when it’s called, and then, only when it’s called with an input that triggers the exception.

@@ -264,12 +264,12 @@

Tuples

The **. operator used above is for raising a floating-point number to a power.

This is just a first taste of pattern matching. Pattern matching is a pervasive tool in OCaml, and as you’ll see, it has surprising power.

Operators in Base and the Stdlib

OCaml’s standard library and Base mostly use the same operators for the same things, but there are some differences. For example, in Base, **. is float exponentiation, and ** is integer exponentiation, whereas in the standard library, ** is float exponentiation, and integer exponentiation isn’t exposed as an operator.

Base does what it does to be consistent with other numerical operators like *. and *, where the period at the end is used to mark the floating-point versions.
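
For instance, in a toplevel with Base open (illustrative):

2 ** 3;;
>- : int = 8
2. **. 3.;;
>- : float = 8.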

In general, Base is not shy about presenting different APIs than OCaml’s standard library when it’s done in the service of consistency and clarity.


Lists

@@ -324,8 +324,9 @@

Constructing Lists with ::

>- : string list = ["OCaml"; "Perl"; "C"]
Semicolons Versus Commas

Unlike many other languages, OCaml uses semicolons to separate list elements in lists rather than commas. Commas, instead, are used for separating elements in a tuple. If you try to use commas in a list, you’ll see that your code compiles but doesn’t do quite what you might expect:  

["OCaml", "Perl", "C"];;
@@ -340,7 +341,7 @@ 

Semicolons Versus Commas

to allocate a tuple of integers. This is generally considered poor style and should be avoided.


The bracket notation for lists is really just syntactic sugar for ::. Thus, the following declarations are all equivalent. Note that [] is used to represent the empty list and that :: is right-associative:

[1; 2; 3];;
@@ -358,7 +359,6 @@ 

Semicolons Versus Commas

It’s important to remember that, unlike ::, this is not a constant-time operation. Concatenating two lists takes time proportional to the length of the first list.


List Patterns Using Match

The elements of a list can be accessed through pattern matching. List patterns are based on the two list constructors, [] and ::. Here’s a simple example:  
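
A sketch of such a match (not necessarily the example the chapter uses):

let first_two_or_zero l =
  match l with
  | x :: y :: _ -> x + y
  | _ -> 0;;
>val first_two_or_zero : int list -> int = <fun>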

@@ -566,10 +566,10 @@

Records and Variants

You might at this point notice that the use of match here is reminiscent of how we used match with option and list. This is no accident: option and list are just examples of variant types that are important enough to be defined in the standard library (and in the case of lists, to have some special syntax).

We also made our first use of an anonymous function in the call to List.exists. Anonymous functions are declared using the fun keyword, and don’t need to be explicitly named. Such functions are common in OCaml, particularly when using iteration functions like List.exists.  

The purpose of List.exists is to check if there are any elements of the list in question for which the provided function evaluates to true. In this case, we’re using List.exists to check if there is a scene element within which our point resides.

Base and Polymorphic Comparison

One other thing to notice was the fact that we opened Float.O in the definition of is_inside_scene_element. That allowed us to use the simple, un-dotted infix operators, but more importantly it brought the float comparison operators into scope. When using Base, the default comparison operators work only on integers, and you need to explicitly choose other comparison operators when you want them. OCaml also offers a special set of polymorphic comparison operators that can work on almost any type, but those are considered to be problematic, and so are hidden by default by Base. We’ll learn more about polymorphic compare in Chapter 3, Terser and Faster Patterns.


Imperative Programming

@@ -684,8 +684,8 @@

Refs

This isn’t the most idiomatic way to sum up a list, but it shows how you can use a ref in place of a mutable variable.

Nesting lets with let and in

The definition of sum in the above examples was our first use of let to define a new variable within the body of a function. A let paired with an in can be used to introduce a new binding within any local scope, including a function body. The in marks the beginning of the scope within which the new variable can be used. Thus, we could write: 

let z = 7 in
@@ -709,7 +709,7 @@ 

Nesting lets with let and

This kind of nested let binding is a common way of building up a complex expression, with each let naming some component, before combining them in one final expression.


For and While Loops

diff --git a/imperative-programming.html b/imperative-programming.html
index af185d062..85cac42f4 100644
--- a/imperative-programming.html
+++ b/imperative-programming.html
@@ -354,8 +354,8 @@

Example: Doubly Linked Lists

let prev elt = elt.prev

These all follow relatively straightforwardly from our type definitions.

Cyclic Data Structures

Doubly linked lists are a cyclic data structure, meaning that it is possible to follow a nontrivial sequence of pointers that closes in on itself. In general, building cyclic data structures requires the use of side effects. This is done by constructing the data elements first, and then adding cycles using assignment afterward.    

There is an exception to this, though: you can construct fixed-size cyclic data structures using let rec:
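
For example (an illustrative value, not necessarily the chapter’s own):

(* a fixed-size cyclic list, built without mutation *)
let rec endless = 1 :: 2 :: 3 :: endless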

@@ -364,7 +364,7 @@

Cyclic Data Structures

This approach is quite limited, however. General-purpose cyclic data structures require mutation.


Modifying the List

Now, we’ll start considering operations that mutate the list, starting with insert_first, which inserts an element at the front of the list:  

@@ -697,8 +697,8 @@

Memoization and Dynamic Programming

>- : int = 2
Limitations of let rec

You might wonder why we didn’t tie the recursive knot in memo_rec using let rec, as we did for make_rec earlier. Here’s code that tries to do just that:  

let memo_rec m f_norec =
@@ -739,7 +739,7 @@ 

Limitations of let rec

Laziness is more constrained than explicit mutation, and so in some cases can lead to code whose behavior is easier to think about.  

@@ -811,8 +811,8 @@

Formatted Output with printf

> int
Understanding Format Strings

The format strings used by printf turn out to be quite different from ordinary strings. This difference ties to the fact that OCaml’s printf facility, unlike the equivalent in C, is type-safe. In particular, the compiler checks that the types referred to by the format string match the types of the rest of the arguments passed to printf.

To check this, OCaml needs to analyze the contents of the format string at compile time, which means the format string needs to be available as a string literal at compile time. Indeed, if you try to pass an ordinary string to printf, the compiler will complain:
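
For instance, binding the format string to an ordinary variable first hides the literal from the compiler, and the call is rejected with a type error saying that a string was found where a format type was expected (illustrative):

let fmt = "%d\n" in printf fmt 3;;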

@@ -845,7 +845,7 @@

Understanding Format Strings

If this looks different from everything else you’ve seen so far, that’s because it is. This is really a special case in the type system. Most of the time, you don’t need to know about this special handling of format strings—you can just use printf and not worry about the details. But it’s useful to keep the broad outlines of the story in the back of your head.


Now let’s see how we can rewrite our time conversion program to be a little more concise using printf:

open Core
diff --git a/index.html b/index.html
index 5ed14d78a..bd2200f01 100644
--- a/index.html
+++ b/index.html
@@ -56,7 +56,7 @@ 

Anil Madhavapeddy

College. He has worked in industry (NetApp, Citrix, Intel), academia (Cambridge, Imperial, UCLA) and startups (XenSource, Unikernel Systems, Docker) over the past two decades. At Cambridge, he is a member of the Energy and Environment Group which delves into the intersection of technology and conservation. He is a long-time maintainer on open-source projects ranging from OCaml, OpenBSD, Xen and
diff --git a/json.html b/json.html
index 6011a9709..8a4c52b2e 100644
--- a/json.html
+++ b/json.html
@@ -27,8 +27,8 @@

JSON Basics

The outermost JSON value is usually a record (delimited by the curly braces) and contains an unordered set of key/value pairs. The keys must be strings, but values can be any JSON type. In the preceding example, tags is a string list, while the authors field contains a list of records. Unlike OCaml lists, JSON lists can contain multiple different JSON types within a single list.

This free-form nature of JSON types is both a blessing and a curse. It’s very easy to generate JSON values, but code that parses them also has to handle subtle variations in how the values are represented. For example, what if the preceding pages value is actually represented as a string value of “450” instead of an integer?  

Our first task is to parse the JSON into a more structured OCaml type so that we can use static typing more effectively. When manipulating JSON in Python or Ruby, you might write unit tests to check that you have handled unusual inputs. The OCaml model prefers compile-time static checking as well as unit tests. For example, using pattern matching can warn you if you’ve not checked that a value can be Null as well as contain an actual value.        

-
-

Installing the Yojson Library

+
+

Installing the Yojson Library

There are several JSON libraries available for OCaml. For this chapter, we’ve picked the popular Yojson library, which you can install by running opam install yojson. Once installed, you can open it in utop as follows:

open Core;;
@@ -36,7 +36,7 @@ 

Installing the Yojson Library

open Yojson;;
-
+

Parsing JSON with Yojson

@@ -157,8 +157,8 @@

Selecting Values from JSON Structures

This code introduces the Yojson.Basic.Util module, which contains combinator functions that let you easily map a JSON object into a more strongly typed OCaml value.   
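For instance, here is a small sketch (the JSON value and field name are assumed purely for illustration) of chaining two of these combinators to pull a string field out of a parsed document:

open Yojson.Basic.Util

(* Extract the "title" field of a JSON object as an OCaml string; raises
   Type_error if the field is missing or isn't a string. *)
let title_of (json : Yojson.Basic.t) : string =
  json |> member "title" |> to_string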

-
-

Functional Combinators

+
+

Functional Combinators

Combinators are a design pattern that crops up quite often in functional programming. John Hughes defines them as “a function which builds program fragments from program fragments.” In a functional language, this generally means higher-order functions that combine other functions to apply useful transformations over values.

You’ve already run across several of these in the List module:

@@ -170,7 +170,7 @@

Functional Combinators

val iter : 'a list -> f:('a -> unit) -> unit

iter is a more specialized combinator that is only useful when writing imperative code. The input function is applied to every value, but no result is supplied. The function must instead apply some side effect such as changing a mutable record field or printing to the standard output.
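A quick sketch of such a use, relying only on the side effect of each call:

(* Print each element; every call returns unit, so only the printing matters. *)
List.iter [1; 2; 3] ~f:(fun i -> print_endline (Int.to_string i));;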

-
+

Yojson provides several combinators in the Yojson.Basic.Util module to manipulate values:   

-
-

Unused Lexing Values

+
+

Unused Lexing Values

In our parser, we have not used all the token regexps that we defined in the lexer. For instance, id is unused since we do not parse unquoted strings for object identifiers (something that is allowed by JavaScript, but not the subset of it that is JSON). If we included a token pattern match for this in the lexer, then we would have to adjust the parser accordingly to add a %token <string> ID. This would in turn trigger an “unused” warning since the parser never constructs a value with type ID:

File "parser.mly", line 4, characters 16-18:
 Warning: the token ID is unused.

It’s completely fine to define unused regexps as we’ve done, and to hook them into parsers as required. For example, we might use ID if we add an extension to our parser for supporting unquoted string identifiers as a non-standard JSON extension.

-
+

Recursive Rules

@@ -330,8 +330,8 @@

Recursive Rules

This rule takes a buf : Buffer.t as an argument. If we reach the terminating double quote ", then we return the contents of the buffer as a STRING.

The other cases are for handling the string contents. The action [^ '"' '\\']+ { ... } matches normal input that does not contain a double quote or backslash. The actions beginning with a backslash \ define what to do for escape sequences. In each of these cases, the final step includes a recursive call to the lexer.
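As a rough, standalone sketch of that shape in ocamllex syntax (the token type and the particular escapes shown are assumptions for illustration; the real lexer handles the full set of JSON escapes):

(* string_sketch.mll *)
{ type token = STRING of string }

rule read_string buf = parse
  | '"'            { STRING (Buffer.contents buf) }  (* closing quote: emit the token *)
  | '\\' 'n'       { Buffer.add_char buf '\n'; read_string buf lexbuf }
  | '\\' '"'       { Buffer.add_char buf '"'; read_string buf lexbuf }
  | [^ '"' '\\']+  { Buffer.add_string buf (Lexing.lexeme lexbuf);
                     read_string buf lexbuf }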

That covers the lexer. Next, we need to combine the lexer with the parser to bring it all together.       

-
-

Handling Unicode

+
+

Handling Unicode

We’ve glossed over an important detail here: parsing Unicode characters to handle the full spectrum of the world’s writing systems. OCaml has several third-party solutions to handling Unicode, with varying degrees of flexibility and complexity:

  • Uutf is a nonblocking streaming Unicode codec for OCaml, available as a standalone library. It is accompanied by the Uunf text normalization and Uucd Unicode character database libraries. There is also a robust parser for JSON available that illustrates the use of Uutf in your own libraries.

  • @@ -339,7 +339,7 @@

    Handling Unicode

  • sedlex is a lexer generator for Unicode that can serve as a Unicode-aware replacement for ocamllex.

All of these libraries are available via opam under their respective names.

-
+
diff --git a/platform.html b/platform.html index 5cb7dc359..39f738b76 100644 --- a/platform.html +++ b/platform.html @@ -2,8 +2,8 @@

The OCaml Platform

So far in Part II, we’ve gone through a number of libraries and techniques you can use to build larger scale OCaml programs. We’ll now wrap up this part by examining the tools you can use for editing, compiling, testing, documenting and publishing your own projects.

The OCaml community has developed a suite of modern tools to interface it with IDEs such as Visual Studio Code, and to generate API documentation and implement modern software engineering practices such as continuous integration (CI) and unit or fuzz testing. All you need to do is to specify your project metadata (for example, library dependencies and compiler versions), and the OCaml Platform tools that we’ll describe next will do much of the heavy lifting.

-
-

Using the Opam Source-Based Package Manager

+
+

Using the Opam Source-Based Package Manager

opam is the official package manager and metadata packaging format that is used in the OCaml community. We’ve been using it in earlier chapters to install OCaml libraries, and we’re going to take a closer look at how to use opam within a full project next. You’ve almost certainly done this already at this point in the book, but in case you’ve skipped straight to this chapter, make sure you first initialize opam’s global state.

$ opam init
@@ -16,7 +16,7 @@

Using the Opam Source-Based > default ocaml.4.13.1 default

-
+

A Hello World OCaml Project

Let’s start by creating a sample OCaml project and navigating around it. Dune has a basic built-in command to initialize a project template that is suitable to get us started. 

@@ -218,11 +218,11 @@

Using Visual Studio Code

opam install ocaml-lsp-server

Once installed, the VSCode OCaml plugin will ask you which opam switch to use. Just the default one should be sufficient to get you going with building and browsing your interfaces.

-
-

What Is The Language Server Protocol?

+
+

What Is The Language Server Protocol?

The Language Server Protocol defines a communications standard between an editor or IDE and a language-specific server that provides features such as auto-completion, definition search, reference indexing and other facilities that require specialized support from language tooling. This allows a programming language toolchain to implement all this functionality just once, and then integrate cleanly into the multiplicity of IDE environments available these days – and even go beyond conventional desktop environments to web-based notebooks such as Jupyter.

Since OCaml has a complete and mature LSP server, you’ll find that an increasing number of IDEs will just support it out of the box once you install the ocaml-lsp-server. It integrates automatically with the various tools we’ve used in this book, such as detecting opam switches, invoking dune rules, and so on.

-
+

Browsing Interface Documentation

@@ -282,8 +282,8 @@

Publishing Your Code Online

Defining Opam Packages

The only metadata file that is really required to participate in the open-source OCaml ecosystem is an opam file in your source tree. Each opam file defines a package – a collection of OCaml libraries and executable binaries or application data. Each opam package can define dependencies on other opam packages, and includes build and testing directions for your project. This is what’s installed when you eventually publish the package and someone else types in opam install hello.
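For orientation, here is a small sketch of what such an opam file might contain for the hypothetical hello package (the field values, URLs, and version bounds are illustrative, not prescriptive):

# hello.opam (sketch)
opam-version: "2.0"
synopsis: "A hello world project"
maintainer: ["you@example.com"]
authors: ["Your Name"]
license: "ISC"
homepage: "https://github.com/yourname/hello"
depends: [
  "ocaml" {>= "4.13"}
  "dune" {>= "3.0"}
]
build: [
  ["dune" "build" "-p" name "-j" jobs]
]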

A collection of opam files can be stored in an opam repository to create a package database, with a central one for the OCaml ecosystem available at https://github.com/ocaml/opam-repository. The official (but not exclusive) tool used for manipulating opam files is the eponymous opam package manager that we’ve been using throughout this book.

-
-

How Do We Name OCaml Modules, Libraries and Packages?

+
+

How Do We Name OCaml Modules, Libraries and Packages?

Much of the time, the module, library, and package names are all the same. But there are reasons for these names to be distinct as well:

  • Some libraries are exposed as multiple top-level modules, which means you need a different name for that collection of modules.
  • @@ -291,7 +291,7 @@

    How Do We Name OCam
  • Package names might differ from library names if a package combines multiple libraries and/or binaries together.

It’s important to understand the difference between modules, libraries and packages as you work on bigger projects. These can easily have thousands of modules, hundreds of libraries and dozens of opam packages in a single codebase.

-
+

Generating Project Metadata from Dune

@@ -406,8 +406,8 @@

Releasing Your Code into the Opam Repository

This will begin an interactive session where you will need to enter some GitHub authentication details (by creating a personal access token). Once that is completed, the tool will run all local tests, generate documentation and upload it to your GitHub pages branch for that project, and finally offer to open a pull request to the central opam-repository. Recall that the central opam package set is all just a normal git repository, and so your opam file will be added to that and your GitHub account will create a PR.

At this point, you can sit back and relax while the central opam repository test system runs your package through a battery of installations (including on exotic architectures you might not have access to, such as S390X mainframes or 32-bit ARMv7). If a problem is detected, some friendly maintainers from the OCaml community will comment on the pull request and guide you through how to address it. You can simply delete the git tag and re-run the release process until the package is merged. Once it is merged, you can navigate to the ocaml.org site and view it online in an hour or so. It will also be available in the central repository for other users to install.

-
-

Creating Lock Files for Your Projects

+
+

Creating Lock Files for Your Projects

Before you publish a project, you might also want to create an opam lock file to include with the archive. A lock file records the exact versions of all the transitive opam dependencies at the time you generate it. All you need to do is to run:

opam lock
@@ -420,7 +420,7 @@ 

Creating Lock Files for Your Proj

Lock files are an optional but useful step to take when releasing your project to the Internet.

-
+
diff --git a/records.html b/records.html index 3ec4feb90..506fa015a 100644 --- a/records.html +++ b/records.html @@ -136,8 +136,8 @@

Patterns and Exhaustiveness

It’s a good idea to enable the warning for incomplete record matches and to explicitly disable it with an _ where necessary.

-
-

Compiler Warnings

+
+

Compiler Warnings

The OCaml compiler is packed full of useful warnings that can be enabled and disabled separately. These are documented in the compiler itself, so we could have found out about warning 9 as follows:

ocaml -warn-help | egrep '\b9\b'
@@ -149,7 +149,7 @@ 

Compiler Warnings

The warnings used for building the examples in this book are specified with the following flag: -w @A-4-33-40-41-42-43-34-44.

The syntax of -w can be found by running ocaml -help, but this particular invocation turns on all warnings as errors, disabling only the numbers listed explicitly after the A.

Treating warnings as errors (i.e., making OCaml fail to compile any code that triggers a warning) is good practice, since without it, warnings are too often ignored during development. When preparing a package for distribution, however, this is a bad idea, since the list of warnings may grow from one release of the compiler to another, and so this may lead your package to fail to compile on newer compiler releases.

-
+

Field Punning

diff --git a/runtime-memory-layout.html b/runtime-memory-layout.html index 0a92bc1cb..e014bbc60 100644 --- a/runtime-memory-layout.html +++ b/runtime-memory-layout.html @@ -3,12 +3,12 @@

Memory Representation of Values

The FFI interface we described in Chapter 22, Foreign Function Interface, hides the precise details of how values are exchanged between C libraries and the OCaml runtime. There is a simple reason for this: using this interface directly is a delicate operation that requires understanding a few different moving parts before you can get it right. You first need to know the mapping between OCaml types and their runtime memory representation. You also need to ensure that your code is interfacing correctly with the OCaml runtime’s memory management.

However, knowledge of the OCaml internals is useful beyond just writing foreign function interfaces. As you build and maintain more complex OCaml applications, you’ll need to interface with various external system tools that operate on compiled OCaml binaries. For example, profiling tools report output based on the runtime memory layout, and debuggers execute binaries without any knowledge of the static OCaml types. To use these tools effectively, you’ll need to do some translation between the OCaml and C worlds.  

Luckily, the OCaml toolchain is very predictable. The compiler minimizes the amount of optimization magic that it performs, and relies instead on its straightforward execution model for good performance. With some experience, you can know rather precisely where a block of performance-critical OCaml code is spending its time.

-
-

Why Do OCaml Types Disappear at Runtime?

+
+

Why Do OCaml Types Disappear at Runtime?

The OCaml compiler runs through several phases during the compilation process. The first phase is syntax checking, during which source files are parsed into abstract syntax trees (ASTs). The next stage is a type checking pass over the AST. In a validly typed program, a function cannot be applied to an argument of an unexpected type. For example, the print_endline function must receive a single string argument, and passing it an int will result in a type error.
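A one-line sketch of that situation (the exact error wording varies by compiler version):

print_endline 42;;
(* rejected: the literal 42 has type int, but print_endline expects a string *)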

Since OCaml verifies these properties at compile time, it doesn’t need to keep track of as much information at runtime. Thus, later stages of the compiler can discard and simplify the type declarations to a much more minimal subset that’s actually required to distinguish polymorphic values at runtime. This is a major performance win versus something like a Java or .NET method call, where the runtime must look up the concrete instance of the object and dispatch the method call. Those languages amortize some of the cost via “Just-in-Time” dynamic patching, but OCaml prefers runtime simplicity instead.    

We’ll explain this compilation pipeline in more detail in Chapter 25, The Compiler Frontend Parsing And Type Checking and Chapter 26, The Compiler Backend Byte Code And Native Code.

-
+

This chapter covers the precise mapping from OCaml types to runtime values and walks you through them via the toplevel. We’ll cover how these values are managed by the runtime later on in Chapter 24, Understanding The Garbage Collector.  

OCaml Blocks and Values

@@ -30,12 +30,12 @@

Distinguishing Integers and Pointers at Runtime

A value is treated as a memory pointer if its lowest bit is zero. A pointer value can still be stored unmodified despite this, since pointers are guaranteed to be word-aligned (with the bottom bits always being zero).

The only problem that remains with this memory representation is distinguishing between pointers to OCaml values (which should be followed by the GC) and pointers into the system heap to C values (which shouldn’t be followed).

The mechanism for this is simple, since the runtime system keeps track of the heap blocks it has allocated for OCaml values. If the pointer is inside a heap chunk that is marked as being managed by the OCaml runtime, it is assumed to point to an OCaml value. If it points outside the OCaml runtime area, it is treated as an opaque C pointer to some other system resource.   

-
-

Some History About OCaml’s Word-Aligned Pointers

+
+

Some History About OCaml’s Word-Aligned Pointers

The alert reader may be wondering how OCaml can guarantee that all of its pointers are word-aligned. In the old days, when RISC chips such as Sparc, MIPS, and Alpha were commonplace, unaligned memory accesses were forbidden by the instruction set architecture and would result in a CPU exception that terminated the program. Thus, all pointers were historically rounded off to the architecture word size (usually 32 or 64 bits).  

Modern CISC processors such as the Intel x86 do support unaligned memory accesses, but the chip still runs faster if accesses are word-aligned. OCaml therefore simply mandates that all pointers be word-aligned, which guarantees that the bottom few bits of any valid pointer will be zero. Setting the bottom bit to a nonzero value is a simple way to mark an integer, at the cost of losing that single bit of precision.

An even more alert reader will be wondering what the performance implications are for integer arithmetic using this tagged representation. Since the bottom bit is set, any operation on the integer has to shift the bottom bit right to recover the “native” value. The native code OCaml compiler generates efficient x86 assembly code in this case, taking advantage of modern processor instructions to hide the extra shifts where possible. Addition is a single LEA x86 instruction, subtraction can be two instructions, and multiplication is only a few more.

-
+
@@ -154,11 +154,11 @@

Variants and Lists

In the preceding example, the Apple and Kiwi values are still stored as normal OCaml integers with values 0 and 1, respectively. The Orange and Pear values both have parameters and are stored as blocks whose tags ascend from 0 (and so Pear has a tag of 1, as the use of Obj.tag verifies). Finally, the parameters are fields that contain OCaml values within the block, and Obj.field can be used to retrieve them.

Lists are stored with a representation that is exactly the same as if the list was written as a variant type with Nil and Cons. The empty list [] is an integer 0, and subsequent blocks have tag 0 and two parameters: a block with the current value, and a pointer to the rest of the list.    
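As a rough sketch of how you might poke at this from the toplevel (using the Obj module discussed in the warning that follows, and only for exploration):

let l = [1; 2; 3];;
let repr = Obj.repr l;;
(* A cons cell is a block with tag 0 and two fields: the head and the tail. *)
Obj.is_block repr;;   (* - : bool = true *)
Obj.tag repr;;        (* - : int = 0 *)
Obj.size repr;;       (* - : int = 2 *)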

-
-
Obj Module Considered Harmful
+
+
Obj Module Considered Harmful

Obj is an undocumented module that exposes the internals of the OCaml compiler and runtime. It is very useful for examining and understanding how your code will behave at runtime but should never be used for production code unless you understand the implications. The module bypasses the OCaml type system, making memory corruption and segmentation faults possible.

Some theorem provers such as Coq do output code that uses Obj internally, but the external module signatures never expose it. Unless you too have a machine proof of correctness to accompany your use of Obj, stay away from it except for debugging!

-
+

Due to this encoding, there is a limit of around 240 variants with parameters that applies to each type definition, but the only limit on the number of variants without parameters is the size of the native integer (either 31 or 63 bits). This limit arises because of the size of the tag byte and the fact that some of the high-numbered tags are reserved.

diff --git a/testing.html b/testing.html index 0a486efcc..b586d751f 100644 --- a/testing.html +++ b/testing.html @@ -101,12 +101,12 @@

Where Should Tests Go?

  • Testing mindset. Writing tests on the inside of your libraries lets you write tests against any part of your implementation, rather than just the exposed API. This freedom is useful, but can also put you in the wrong testing mindset. Testing that’s phrased in terms of the public API often does a better job of testing what’s fundamental about your code, and will better survive refactoring of the implementation. Also, the discipline of keeping tests outside of your library requires you to write code that can be tested that way, which pushes you towards better designs.

  • For all of these reasons, our recommendation is to put the bulk of your tests in test-only libraries created for that purpose. There are some legitimate reasons to want to put some tests directly in your production library, e.g., when the test needs access to some functionality that’s important but really awkward to expose. But such cases are very much the exception.

    -
    -

    Why Can’t Inline Tests Go in Executables?

    +
    +

    Why Can’t Inline Tests Go in Executables?

    We’ve only talked about putting tests into libraries. What about executables? It turns out you can’t do this directly, because Dune doesn’t support the inline_tests declaration in source files that are directly part of an executable.

    There’s a good reason for this: the ppx_inline_test test runner needs to instantiate the modules that contain the tests. If those modules have toplevel side-effects, that’s a recipe for disaster, since you don’t want those top-level effects to be triggered by the test framework.

    So, how do we test code that’s part of an executable? The solution is to break up your program into two pieces: a directory containing a library that contains the logic of your program, but no top-level effects; and a directory for the executable that links in the library, and is responsible for launching the code.
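Sketched in dune terms (the stanza names are real, but the project layout and names here are assumptions): the library directory declares inline_tests, while the executable directory only links the library in.

    ; lib/dune (sketch)
    (library
     (name hello_logic)
     (inline_tests)
     (preprocess (pps ppx_inline_test)))

    ; bin/dune (sketch)
    (executable
     (name hello)
     (libraries hello_logic))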

    -
    +
    @@ -122,12 +122,12 @@

    Basic Mechanics

    let%expect_test "trivial" = print_endline "Hello World!"
    -
    -

    open and open!

    +
    +

    open and open!

    In this example, we use open! instead of open because we happen not to be using any values from Base, and so the compiler will warn us about an unused open.

    But because Base is effectively our standard library, we want to keep it open anyway, since we want any new code we write to find Base’s modules rather than those from the ordinary standard library. The exclamation point at the end of open suppresses that warning.

    A sensible idiom is to always use open! when opening a library like Base, so that you don’t have to choose when to use the !, and when not to.

    -
    +

    If we run the test, we’ll be presented with a diff between what we wrote, and a corrected version of the source file that now has an [%expect] clause containing the output. Note that Dune will use the patdiff tool if it’s available, which generates easier-to-read diffs. You can install patdiff with opam.

    dune runtest
    @@ -224,8 +224,8 @@ 

    Exploratory Programming

    let hrefs = get_href_hosts soup in print_s [%sexp (hrefs : Set.M(String).t)]
    -
    -

    Quoted Strings

    +
    +

    Quoted Strings

    The example above used a new syntax for string literals, called quoted strings. Here’s an example.

    {|This is a quoted string|};;
    @@ -246,7 +246,7 @@ 

    Quoted Strings

    >- : string = "This is how you quote a {|quoted string|}"
    -
    +

    If we run the test, we’ll see that the output isn’t exactly what was intended.

    dune runtest
    diff --git a/toc.html b/toc.html
    index b2fea1040..0f8e78ee8 100644
    --- a/toc.html
    +++ b/toc.html
    @@ -1 +1 @@
    -Real World OCaml

    Real World OCaml

    2nd Edition (Oct 2022)
    \ No newline at end of file +Real World OCaml

    Real World OCaml

    2nd Edition (Oct 2022)
    \ No newline at end of file diff --git a/variables-and-functions.html b/variables-and-functions.html index b4ba55823..23d27b38d 100644 --- a/variables-and-functions.html +++ b/variables-and-functions.html @@ -83,12 +83,12 @@

    Variables

    Here, we redefined pi to be zero after the definition of area_of_circle. You might think that this would mean that the result of the computation would now be zero, but in fact, the behavior of the function is unchanged. That’s because the original definition of pi wasn’t changed; it was just shadowed, which means that any subsequent reference to pi would see the new definition of pi as 0, but earlier references would still see the old one. But there is no later use of pi, so the binding of pi to 0. made no difference at all. This explains the warning produced by the toplevel telling us that there is an unused variable.

    In OCaml, let bindings are immutable. There are many kinds of mutable values in OCaml, which we’ll discuss in Chapter 8, Imperative Programming, but there are no mutable variables.

    -
    -

    Why Don’t Variables Vary?

    +
    +

    Why Don’t Variables Vary?

    One source of confusion for people new to OCaml is the fact that variables are immutable. This seems pretty surprising even on linguistic terms. Isn’t the whole point of a variable that it can vary? 

    The answer to this is that variables in OCaml (and generally in functional languages) are really more like variables in an equation than a variable in an imperative language. If you think about the mathematical identity x(y + z) = xy + xz, there’s no notion of mutating the variables x, y, and z. They vary in the sense that you can instantiate this equation with different numbers for those variables, and it still holds.

    The same is true in a functional language. A function can be applied to different inputs, and thus its variables will take on different values, even without mutation.

    -
    +

    Pattern Matching and Let

    Another useful feature of let bindings is that they support the use of patterns on the left-hand side. Consider the following code, which uses List.unzip, a function for converting a list of pairs into a pair of lists.   
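For instance, a small sketch of the kind of pattern this enables (the values are chosen purely for illustration):

    let (ints, strings) = List.unzip [ (1, "one"); (2, "two"); (3, "three") ];;
    (* val ints : int list = [1; 2; 3]
       val strings : string list = ["one"; "two"; "three"] *)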

    @@ -178,8 +178,8 @@

    Anonymous Functions

    This is the most common and convenient way to declare a function, but syntactic niceties aside, the two styles of function definition are equivalent.

    -
    -

    let and fun

    +
    +

    let and fun

    Functions and let bindings have a lot to do with each other. In some sense, you can think of the parameter of a function as a variable being bound to the value passed by the caller. Indeed, the following two expressions are nearly equivalent.  

    (fun x -> x + 1) 7;;
    @@ -189,7 +189,7 @@ 

    let and fun

    This connection is important, and will come up more when programming in a monadic style, as we’ll see in Chapter 16, Concurrent Programming With Async.

    -
    +

    Multiargument Functions

    @@ -415,8 +415,8 @@

    Prefix and Infix Operators

    The type error is a little bewildering at first glance. What’s going on is that, because ^> is right associative, the operator is trying to feed the value List.dedup_and_sort ~compare:String.compare to the function List.iter ~f:print_endline. But List.iter ~f:print_endline expects a list of strings as its input, not a function.

    The type error aside, this example highlights the importance of choosing the operator you use with care, particularly with respect to associativity.

    -
    -

    The Application Operator

    +
    +

    The Application Operator

    |> is known as the reverse application operator. You might be unsurprised to learn that there’s also an application operator:  

    (@@);;
    @@ -424,7 +424,7 @@ 

    The Application Operator

    This one is useful for cases where you want to avoid many layers of parentheses when applying functions to complex expressions. In particular, you can replace f (g (h x)) with f @@ g @@ h x. Note that, just as we needed |> to be left associative, we need @@ to be right associative.

    -
    +

    Declaring Functions with function

    diff --git a/variants.html b/variants.html index 8431be54c..61060c3d5 100644 --- a/variants.html +++ b/variants.html @@ -104,8 +104,8 @@

    Variants

    >- : unit = () -
    -

    Variants, Tuples and Parens

    +
    +

    Variants, Tuples and Parens

    Variants with multiple arguments look an awful lot like tuples. Consider the following example of a value of the type color we defined earlier.

    RGB (200,0,200);;
    @@ -159,7 +159,7 @@ 

    Variants, Tuples and Parens

    The differences between a multi-argument variant and a variant containing a tuple are mostly about performance. A multi-argument variant is a single allocated block in memory, while a variant containing a tuple requires an extra heap-allocated block for the tuple. You can learn more about OCaml’s memory representation in Chapter 23, Memory Representation of Values.
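A tiny sketch of the two shapes being contrasted (type and constructor names are illustrative):

    type multi = Point of int * int       (* one block: a tag plus two fields *)
    type boxed = Point_t of (int * int)   (* one block whose single field points
                                             at a separately allocated tuple *)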

    -
    +

    Catch-All Cases and Refactoring

    OCaml’s type system can act as a refactoring tool, warning you of places where your code needs to be updated to match an interface change. This is particularly valuable in the context of variants.      

    @@ -608,8 +608,8 @@

    Polymorphic Variants

    Here, the inferred type states that the tags can be no more than `Float, `Int, and `Not_a_number, and must contain at least `Float and `Int. As you can already start to see, polymorphic variants can lead to fairly complex inferred types.

    -
    -

    Polymorphic Variants and Catch-All Cases

    +
    +

    Polymorphic Variants and Catch-All Cases

    As we saw with the definition of is_positive, a match expression can lead to the inference of an upper bound on a variant type, limiting the possible tags to those that can be handled by the match. If we add a catch-all case to our match expression, we end up with a type with a lower bound.      

    let is_positive_permissive = function
    @@ -631,7 +631,7 @@ 

    Polymorphic Variants and Catch

    With ordinary variants, such a typo would have been caught as an unknown tag. As a general matter, one should be wary about mixing catch-all cases and polymorphic variants.

    -
    +

    Example: Terminal Colors Redux

    To see how to use polymorphic variants in practice, we’ll return to terminal colors. Imagine that we have a new terminal type that adds yet more colors, say, by adding an alpha channel so you can specify translucent colors. We could model this extended set of colors as follows, using an ordinary variant: