From 699771ee06228c2168ebcaad54cba663063b7a19 Mon Sep 17 00:00:00 2001 From: Sabine Schmaltz Date: Tue, 21 Nov 2023 14:36:51 +0100 Subject: [PATCH] editing --- data/tutorials/language/0lg_11_imperative.md | 442 +++++++++++-------- 1 file changed, 255 insertions(+), 187 deletions(-) diff --git a/data/tutorials/language/0lg_11_imperative.md b/data/tutorials/language/0lg_11_imperative.md index 899144f1ce..18dac6b077 100644 --- a/data/tutorials/language/0lg_11_imperative.md +++ b/data/tutorials/language/0lg_11_imperative.md @@ -1,9 +1,9 @@ --- -id: imperative -title: Imperative and Mutability +id: mutability-loops-and-imperative +title: Mutability, Loops, and Imperative Programming description: > Writing stateful programs in OCaml, mixing imperative and functional style -category: "Tutorials" +category: "Introduction" --- - - -# Imperative and Mutability - -## Introduction This document has two main teaching goals: 1. Writing imperative code in OCaml 1. Combining and balancing imperative and functional code +--> -In OCaml, you can write code in imperative style without compromising on type and memory safety. In the first part of this tutorial, imperative programming in OCaml is introduced. -Imperative and functional programming both have unique merits; OCaml allows combining them efficiently. See the second part of this tutorial for examples of recommended, handle-with-care or inadvisable patterns. +# Mutability, Loops, and Imperative Programming -**Prerequisites**: This is an intermediate-level tutorial. You should have completed the [Basic Data Types](/docs/basic-data-types), [Values and Functions](/docs/values-and-functions), [Lists](/docs/lists) and [Modules](/docs/modules) tutorials +Imperative and functional programming both have unique merits; OCaml allows combining them efficiently. In the first part of this tutorial, mutable state and imperative control flow in OCaml are introduced. See the second part of this tutorial for examples of recommended, situational or discouraged use of these features. -## Mutable Data +**Prerequisites**: [Basic Data Types](/docs/basic-data-types), [Values and Functions](/docs/values-and-functions), [Lists](/docs/lists) and [Modules](/docs/modules). -A name-value binding created using the `let … = …` construct is [immutable](https://en.wikipedia.org/wiki/Immutable_object), once added to the environment, it is impossible to replace the value or remove the name. +## Immutable vs Mutable Data -### References +When you use `let … = …` to bind a value to a name, this name-value binding is [immutable](https://en.wikipedia.org/wiki/Immutable_object): It is impossible to _mutate_ (which is a fancy term for "change", "update", or "modify") the value assigned to the name. -However, there is a kind of value that can be updated. That is called a _reference_ in OCaml. +In the following sections, we introduce OCaml's language features for dealing with _mutable_ state. + +## References + +There is a special kind of value, called _reference_, whose contents can be updated: ```ocaml # let a = ref 0;; val a : int ref = {contents = 0} @@ -53,55 +51,45 @@ val a : int ref = {contents = 0} - : int = 1 ``` -Here is what happens above: -1. The value `{ contents = 0 }` is bound to the name `a`. This is a normal definition, like any other definition, it is immutable. However, the value `0` inside `contents` can be updated. -3. The _assign_ operator `:=` is used to update the value inside `a` from 0 to 1. -4. The _dereference_ operator `!` is used to read the content inside the value `a`. +Here is what happens in this example: +1. The value `{ contents = 0 }` is bound to the name `a`. This is a normal definition. Like any other definition it is immutable. However, the value `0` in the `contents` field of `a` is _mutable_, i.e. it can be updated. +3. The _assignment_ operator `:=` is used to update the mutable value inside `a` from `0` to `1`. +4. The _dereference_ operator `!` reads the contents of the mutable value inside `a`. -The `ref` identifier denotes two different things: -* The type of mutable references: `'a ref`. +The `ref` identifier above refers to two different things: * The function `ref : 'a -> 'a ref` that creates a reference. +* The type of mutable references: `'a ref`. -The assign operator is just a function that takes: -1. The reference to be updated -1. The value that replaces the previous contents. +**Assignment Operator** -The update takes place as a [side effect](https://en.wikipedia.org/wiki/Side_effect_(computer_science)). ```ocaml # ( := );; - : 'a ref -> 'a -> unit = ``` -The dereference operator is also a function. It takes a reference and returns its contents. -```ocaml -# ( ! );; -- : 'a ref -> 'a = -``` - -Refer to the [Operators](/docs/operators) tutorial for more information on how unary and binary operators work in OCaml. - -The way OCaml handles mutable data has the following characteristics: -* It's impossible to create uninitialised references. -* No confusion between mutable content and the reference is possible: They have different syntax and type. +The assignment operator `:=` is just a function. It takes: +1. The reference to be updated +1. The value that replaces the previous contents. -### Mutable Fields +The update takes place as a [side effect](https://en.wikipedia.org/wiki/Side_effect_(computer_science)). -#### References Are Single Fields Records +**Dereference Operator** -The value `{ contents = 0 }` of type `int ref` frome the previous section not only looks like a record: It is a record. Having a look at the way the `ref` type is defined is enlightening: ```ocaml -# #show ref;; -external ref : 'a -> 'a ref = "%makemutable" -type 'a ref = { mutable contents : 'a; } +# ( ! );; +- : 'a ref -> 'a = ``` +The dereference operator is a function that takes a reference and returns its contents. -Starting from the bottom, the `'a ref` type is a record with a single field `contents` which is marked with the `mutable` keyword. This means that the field can be updated. +Refer to the [Operators](/docs/operators) documentation for more information on how unary and binary operators work in OCaml. -The `external ref : 'a -> 'a ref = "%makemutable"` means the function `ref` is not written in OCaml, but that is an implementation detail we do not care about in this tutorial. If interested, check the [Calling C Libraries](/docs/calling-c-libraries) tutorial to learn how to use the foreign function interface. +Working with mutable data in OCaml, +* it's impossible to create uninitialised references, and +* the mutable content and the reference have different syntax and type: no confusion betweem them is possible. -#### Any Record Can Have Mutable Fields +## Mutable Record Fields -Any field in a record can be tagged using the `mutable` keyword. +Any field in a record can be tagged using the `mutable` keyword, to declare that this field can be updated. ```ocaml # type book = { series : string; @@ -119,9 +107,9 @@ type book = { } ``` -For instance here is how a bookshop could track its book inventory: -* Fields `title`, `author`, `volume`, `series` are constants -* Field `stock` is mutable because this value changes with each sale or when restocking +For instance, here is how a bookshop could track its book inventory: +* Fields `title`, `author`, `volume`, `series` are constants. +* Field `stock` is mutable because this value changes with each sale or when restocking. Such a database should have an entry like this: ```ocaml @@ -137,7 +125,7 @@ val vol_7 : book = author = "Martha Wells"; stock = 0} ``` -When the bookshop receives the delivery of 10 of these books, here is how the data update can be done: +When the bookshop receives a delivery of 10 of these books, we update the mutable `stock` field: ```ocaml # vol_7.stock <- vol_7.stock + 10;; - : unit = () @@ -148,11 +136,26 @@ When the bookshop receives the delivery of 10 of these books, here is how the da author = "Martha Wells"; stock = 10 } ``` -Mutable record field update is a side effect performed using the left arrow symbol `<-`. In the expression `vol_7.stock <- vol_7.stock + 7` the meaning of `vol_7.stock` depends on its context: -* In the left-hand side of `<-` it refers to the mutable field to be updated -* In the right-hand side of `<-`, it denotes the contents of the field +Mutable record fields are updated using the left arrow symbol `<-`. In the expression `vol_7.stock <- vol_7.stock + 7` the meaning of `vol_7.stock` depends on its context: +* In the left-hand side of `<-`, it refers to the mutable field to be updated +* In the right-hand side of `<-`, it denotes the contents of the mutable field -Looking at references again, we can define functions `assign` and `deref`: +In contrast to references, there is no special syntax to dereference a mutable record field. + +### References Are Single Field Records + +In OCaml, references are records with a single mutable field: +```ocaml +# #show ref;; +external ref : 'a -> 'a ref = "%makemutable" +type 'a ref = { mutable contents : 'a; } +``` + +The type `'a ref` is a record with a single field `contents` which is marked with the `mutable` keyword. + +The line `external ref : 'a -> 'a ref = "%makemutable"` means the function `ref` is not written in OCaml, but that is an implementation detail we do not care about in this tutorial. If interested, check the [Calling C Libraries](/docs/calling-c-libraries) tutorial to learn how to use the foreign function interface. + +Since references are single field records, we can define functions `assign` and `deref` using the mutable record field update syntax: ```ocaml # let assign a x = a.contents <- x;; val assign : 'a ref -> 'a -> unit = @@ -169,22 +172,76 @@ val deref : 'a ref -> 'a = The function `assign` does the same as the operator `( := )`, while the function `deref` does the same as the `( ! )` operator. -#### Field Update _vs_ Record Copy -In this section, we compare two ways to implement a `get_char` function. It waits until a key is pressed and returns the character corresponding without echoing it. This function will also be used later on in this tutorial. +## Arrays + +In OCaml, an array is a mutable, fixed-size data structure that can store a sequence of elements of the same type. Arrays are indexed by integers and provide constant-time access and update of elements. + +```ocaml +# let a = [| 2; 3; 4; 5; 6; 7; 8 |];; +val a : int array = [|2; 3; 4; 5; 6; 7; 8|] + +# a.(0);; +- : int = 2 + +# a.(0) <- 9;; +- : unit = () + +# a.(0);; +- : int = 9 +``` + +The arrow symbol `<-` is used to update an array element at a given index. The array index access syntax `a.(i)`, where `a` is a value of type `array` and `i` is an integer, stands for either + +* the array location to update (when on the left hand side of `<-`), or +* the cell's content (when on the right hand side of `<-`). + +For more a more detailed discussion of Arrays, see the [Arrays](/docs/arrays) tutorial. + +## Byte Sequences + +The `bytes` type in OCaml represents a mutable sequence of bytes. Each element in a `bytes` sequence is a character, but since characters in OCaml are represented as 8-bit bytes, a `bytes` value can effectively manage any sequence of bytes. + +```ocaml +# let b = Bytes.of_string "abcdefghijklmnopqrstuvwxyz";; +val b : bytes = Bytes.of_string "abcdefghijklmnopqrstuvwxyz" + +# Bytes.get b 10;; +- : char = 'k' + +# Bytes.set b 10 '_';; +- : unit = () + +# b;; +- : bytes = Bytes.of_string "abcdefghij_lmnopqrstuvwxyz" +``` + +Byte sequences can be created from `string` values using the function `Bytes.of_string`. Individual elements in the sequence can be updated or read by their index using `Bytes.set`, and, respectively `Bytes.get`. + +You can think of byte sequences as +* updatable strings that can't be printed, or +* `char array`s without syntactic sugar for indexed read and update. -This is using two functions from the `Unix` module. Both are used to access attributes of the terminal associated with standard input: + + +## Example: `get_char` Function + + + +In this section, we compare two ways to implement a `get_char` function. The function waits until a key is pressed and returns the corresponding character without echoing it. This function will also be used later on in this tutorial. + +We use two functions from the `Unix` module to read/update attributes of the terminal associated with standard input: * `tcgetattr stdin TCSAFLUSH` read and return them as a record (this is similar to `deref`) * `tcsetattr stdin TCSAFLUSH` update them (this is similar to `assign`) -These attributes need to be tweaked in order to do the reading the way we want. The logic is the same in both implementations: +These attributes need to be set correctly (i.e. turn of echoing and disable canonical mode) in order to do the reading the way we want. The logic is the same in both implementations: 1. Read the terminal attributes -1. Tweak the terminal attributes +1. Set the terminal attributes 1. Wait until a key is pressed, read it as a character 1. Restore the initial terminal attributes 1. Return the read character -The actual read is done using the `input_char` function from the standard library. +We read characters from standard input using the `input_char` function from the OCaml standard library. Here is the first implementation: ```ocaml @@ -202,9 +259,9 @@ Here is the first implementation: c;; val get_char : unit -> char = ``` -In this implementation, the update of the `termio` fields takes place twice. -* Before `input_char`, both are set to `false` -* After `input_char`, initial values are restored +In this implementation, we update the fields of `termio` +* before `input_char` (setting both `c_icanon` and `c_echo` to `false`), and +* after `input_char` (restoring the initial values). Here is the second implementation: ```ocaml @@ -219,117 +276,108 @@ Here is the second implementation: val get_char : unit -> char = ``` -In this implementation, the record returned by the call to `tcgetattr` isn't updated. A copy is made using `{ termio with c_icanon = false; c_echo = false }`. That copy only differs from the read `termio` value on fields `c_icanon` and `c_echo`, that's the meaning of `termio with …` +In this implementation, the record returned by the call to `tcgetattr` is not modified. A copy is made using `{ termio with c_icanon = false; c_echo = false }`. This copy only differs from the `termio` value on fields `c_icanon` and `c_echo`, that's the meaning of `termio with …` -That allows the second call to `tcsetattr` to restore terminal attributes to their initial state without explicitly reading them. +In the second call to `tcsetattr`, we restore the terminal attributes to their initial state without explicitly reading them. -### Arrays and Bytes Sequences +## Imperative Control Flow -#### Arrays +OCaml provides a sequence operator `;` that allows to chain expressions, as well as `for` loops and `while` loops which execute a block of code repeatedly. -```ocaml -# let a = [| 2; 3; 4; 5; 6; 7; 8 |];; +### Sequence Operator -# a.(0);; -- : int = 2 +The `;` operator is known as the _sequence operator_. It allows you to evaluate multiple expressions in order, with the value of the last expression being the value of the entire sequence. -# a.(0) <- 9;; -- : unit = () +The values of any previous expressions are discarded. Thus, it makes sense to use expressions with side effects, except for the last expression of the sequence which could be free of side effects. -# a.(0);; -- : int = 9 +```ocaml +# let _ = + print_endline "Hello,"; + print_endline "world!"; + 42;; +Hello, +world! +- : int = 42 ``` +In this example, the first two expressions are `print_endline` function calls, which produce side effects (printing to the console), and the last expression is simply the integer `42`, which becomes the value of the entire sequence. The `;` operator is used to separate these expressions. -The update symbol `<-` used for fields is also used to update an array's cell content. The semantics of `a.(i)` work as field update: -* When on the left of `<-`, it denotes which cell to update. -* When on the right of `<-`, it denotes a cell's content. - -Arrays are covered in detail in a [dedicated](/docs/arrays) tutorial. - -#### Byte Sequences +### For Loop +A `for` loop with syntax ```ocaml -# let b = Bytes.of_string "abcdefghijklmnopqrstuvwxyz";; - -# Bytes.get b 10;; -- : char = 'k' - -# Bytes.set b 10 '_';; -- : unit = () - -# Bytes.get b 10;; -- : int = '_' +for i = start_value to end_value do body done ``` +has a loop variable `i` that starts at `start_value` and is incremented until it reaches `end_value`, evaluating the `body` expression (which may contain `i`) on every iteration. Here, `for`, `to`, `do`, and `done` are keywords used to declare the loop. -Byte sequences can be seen in two equivalent ways -* Updatable strings that can't be printed -* `char array` without syntactic sugar for indexed read and update - -## Imperative Iteration +The type of a `for` loop expression is `unit`. -### For Loop - -In OCaml, a for loop is an expression of type `unit`. ```ocaml -# for i = 0 to Array.length a - 1 do Printf.printf "%i\n" a.(i) done;; -9 +# for i = 0 to 5 do Printf.printf "%i\n" i done;; +0 +1 +2 3 4 5 -6 -7 -8 - : unit = () ``` -For loops are convenient to iterate over arrays. +Here, `0` is the start value and `5` is the end value that the loop variable `i`, which is incremented after every iteration, will take on. + +The body of a `for` loop must be an expression of type `unit`: ```ocaml +# for i = Array.length a - 1 downto 0 do 0 done;; +Line 1, characters 39-40: +Warning 10 [non-unit-statement]: this expression should have type unit. +- : unit = () +``` + +When you use the `downto` keyword (instead of the `to` keyword), the counter decreases on every iteration of the loop. + +For example, `for` loops are convenient to iterate over and modify arrays: +```ocaml +# let a = [| 2; 3; 4; 5; 6; 7; 8 |];; +val a : int array = [|2; 3; 4; 5; 6; 7; 8|] + # let sum = ref 0 in for i = 0 to Array.length a - 1 do sum := !sum + a.(i) done; !sum;; - : int = 42 ``` -**Note: Here is how to do the same thing using an iterator function: +**Note:** Here is how to do the same thing using an iterator function: ```ocaml # let sum = ref 0 in Array.iter (fun i -> sum := !sum + i) a; !sum;; - : int = 42 ``` -The body of a for loop must be an expression of type `unit`. +### While Loop + +A `while` loop has syntax ```ocaml -# for i = Array.length a - 1 downto 0 do 0 done;; -Line 1, characters 39-40: -Warning 10 [non-unit-statement]: this expression should have type unit. -- : unit = () +while condition do body done ``` +and continues to execute the `body` expression as long as `condition` remains true. Here, `while`, `do`, and `done` are keywords used to declare the while loop. -The `downto` keyword allows the counter to decrease during the loop. - -### While Loop +The type of a `while` loop expression is `unit`. -While loops are expressions too. ```ocaml -# let u = [9; 8; 7; 6; 5; 4; 3; 2; 1];; - -# let sum, u = ref 0, ref u in - while !u <> [] do - sum := !sum + List.hd !u; - u := List.tl !u - done; - !sum;; -- : int = 45 +# let i = ref 0 in + while !i < 5 do + print_int !i; + i := !i + 1; (* This will print numbers from 0 to 4 *) + done;; +01234- : unit = () ``` -There are no repeat loops in OCaml. +In this example, the `while` loop continues to execute as long as the value held by the reference `i` is less than `5`. -### Breaking Out From a Loop +### Breaking Loops Using Exceptions -There is no break instruction in OCaml. Throwing the `Exit` exception is the recommended way to exit immediately from a loop. +Throwing the `Exit` exception is a recommended way to exit immediately from a loop. -This requires using a `get_char` function as defined in [Field Update _vs_ Record Copy](#field-update-vs-record-copy) section. +The following example uses the `get_char` function we defined earlier (in the section [Example: `get_char` Function](#example-getchar-function)). -The following loop echoes characters typed on the keyboard, as long as they are different from `Escape`. ```ocaml # try print_endline "Press Escape to exit"; @@ -341,14 +389,15 @@ The following loop echoes characters typed on the keyboard, as long as they are done with Exit -> ();; ``` +This `while` loop echoes characters typed on the keyboard. When the ASCII `Escape` character is read, the `Exit` exception is thrown. -## Recommendations on Mixing Functional and Imperative Programming +## Recommendations for Mutable State and Side Effects -Functional and imperative programming are often mixed. However, not all mix creates good results. Some patterns and anti-patterns are listed in this section. +Functional and imperative programming styles are often used together. However, not all ways of combining them give good results. We show some patterns and anti-patterns relating to mutable state and side effects in this section. ### Good: Function-Encapsulated Mutability -Here is a possible way to compute the sum of an array of integers. +Here is a function that computes the sum of an array of integers. ```ocaml # let sum a = let result = ref 0 in @@ -359,25 +408,25 @@ Here is a possible way to compute the sum of an array of integers. val sum : int array -> int = ``` -Function `sum` is written in the imperative style, using mutable data structures. That's an encapsulated implementation choice. No mutability is exposed, That is just fine. +The function `sum` is written in imperative style, using mutable data structures and a `for` loop. However, no mutability is exposed: it is a fully encapsulated implementation choice. This function is safe to use, no problems are to be expected. -### Good: Application Wide State +### Good: Application-Wide State -Some applications maintain a state while they are running. Here is a couple of examples: -- A REPL, the state is the environment. In OCaml it is append-only, but some languages allow reset or removals. -- A server for a stateful protocol. Each session has a state, the global state is the conjunction of all the session states. -- A text editor. The state includes the commands (to allow undo), the settings, what is displayed and the file. -- Any cache. +Some applications maintain some state while they are running. Here are a couple of examples: +- A Read-Eval-Print-Loop (REPL). The state is the environment where values are bound to names. In OCaml, the environment is append-only, but some languages allow replacing or removing name-value bindings. +- A server for a stateful protocol. Each session has a state, the global state consists of all the session states. +- A text editor. The state includes the most recent commands (to allow undo), state of any open files, the settings, and the state of the UI. +- A cache. -The following is a toy line editor, using the `get_char` function [defined earlier](#field-update-vs-record-copy). It waits for characters on standard input and exits on end-of-file, carriage return or newline. Otherwise, if the character is printable, it prints it and records it in a mutable list used as a stack. If the character is the delete code, the stack is popped and the last printed character is erased. +The following is a toy line editor, using the `get_char` function [defined earlier](#example-getchar-function). It waits for characters on standard input and exits on end-of-file, carriage return or newline. Otherwise, if the character is printable, it prints it and records it in a mutable list used as a stack. If the character is the delete code, the stack is popped and the last printed character is erased. ```ocaml # let record_char state c = (String.make 1 c, c :: state);; -val record_char : char list -> char -> char list = +val record_char : char list -> char -> string * char list = # let remove_char state = ("\b \b", if state = [] then [] else List.tl state);; -val remove_char : 'a list -> 'a list = +val remove_char : 'a list -> string * 'a list = # let state_to_string state = List.(state |> rev |> to_seq |> String.of_seq);; @@ -399,16 +448,16 @@ val loop : char list ref -> 'a = # let state = ref [] in try loop state with Exit -> state_to_string !state;; ``` -This is not a production-grade code. However, it illustrates the following: -- State handling functions `record_char` and `remove_char` don't update the state or produce side effects, they only produce data -- I/O and state update side effects are confined to the `loop`` function +This example illustrates the following: +- The functions `record_char` and `remove_char` neither update the state nor produce side effects. Instead, they each return a pair of values consisting of a string to print, and the next state `new_state`. +- I/O and state update side effects happen inside the `loop` function - The state is passed as a parameter to the `loop` function -This is the idea of a possible way to handle an application-wide state. As in the [Function-Encapsulated Mutability](#good-function-encapsulated-mutability) state aware code is contained in a narrow scope, the rest of the code is functional. +This is one possible way to handle application-wide state. As in the [Function-Encapsulated Mutability](#good-function-encapsulated-mutability) example, state-aware code is contained in a narrow scope, the rest of the code is purely functional. -**Note**: Here, the state is copied, which is not memory efficient. In a memory-aware implementation, state update functions would produce “diffs” (data describing the difference between the old and updated versions of the state). +**Note**: Here, the state is copied, which is not memory efficient. In a memory-aware implementation, state update functions would produce a “diff” (data describing the difference between the old and updated version of the state). -### Good: Memoization +### Good: Precomputing Values Let's imagine you store angles as fractions of the circle in 8-bit unsigned integers, storing them as `char` values. In this system, 64 is 90 degrees, 128 is 180 degrees, 192 is 270 degrees, 256 is full circle and so on. If you need to compute cosine on those values, an implementation might look like this: ```ocaml @@ -429,26 +478,34 @@ val char_cos_tab : float array = val char_cos : char -> float = ``` -The [memoization](https://en.wikipedia.org/wiki/Memoization) technique relies on the same idea: get the results from a table recording previously computed values. +### Good: Memoization + +The [memoization](https://en.wikipedia.org/wiki/Memoization) technique relies on the same idea as the example from the previous section: look up results from a table of previously computed values. + +However, instead of precomputing everything, memoization uses a cache that is populated when calling the function. Either, the provied parameters +* are found in the cache (it is a hit) and the stored result is returned, or they +* are not found in the cache (it's a miss) and the result is computed, stored in the cache, and returned. + +You can find a concrete example of memoization and a more in-depth explanation in the chapter on [Memoization](https://cs3110.github.io/textbook/chapters/ds/memoization.html) of "OCaml Programming: Correct + Efficient + Beautiful". + + + +### Good: Functional by Default -However, instead of precomputing everything as in the example, memoization uses a cache. When calling the function, the input data is checked against it: -* If it's a hit, the stored result is returned -* If it's a miss, the result is computed, stored and returned +By default, OCaml programs should be written in a mostly functional style. This constitutes trying to avoid side-effects where possible, and relying on immutable state instead of mutable state. -Management of the cache is yet another application of [function-encapsulated mutability](#good-function-encapsulated-mutability). +It is possible to use an imperative programming style without losing the benefits of type and memory safety. However, it doesn't usually make sense to only program in an imperative style. Not using functional programming idioms at all would result in non-idiomatic OCaml code. -Refer to CS3110 or Real World OCaml for complete examples using caching: -1. https://cs3110.github.io/textbook/chapters/ds/memoization.html -1. https://dev.realworldocaml.org/imperative-programming.html +Most existing modules provide an interface meant to be used in functional way. Some would require the development and maintenance of [wrapper libraries](https://en.wikipedia.org/wiki/Wrapper_library) to be used in an imperative setting and such use would in many cases be inefficient. -### Acceptable: Module Bound State +### It Depends: Module State A module may expose or encapsulate a state in several different ways: 1. Good: Expose a type representing a state, with state creation or reset functions -1. Acceptable: Only expose state initialisation, this implies there only is a single state -1. Bad: No explicit initialisation function or no name referring state +1. It depends: Only expose state initialisation, this implies there only is a single state +1. Bad: Mutable state with no explicit initialisation function or no name referring to the mutable state -The [`Hashtbl`](/api/Hashtbl.html) module provides an interface of the first kind. It has the type `Hashtbl.t` representing mutable data, it also exposes `create`, `clear` and `reset` functions. Its functorial interface makes functor instantiation and hash table creation distinct operations. +For example, the [`Hashtbl`](/api/Hashtbl.html) module provides an interface of the first kind. It has the type `Hashtbl.t` representing mutable data, it also exposes `create`, `clear` and `reset` functions. That the `clear` and `reset` functions return `unit` is a strong signal to the reader that they perform the side-effect of updating the mutable data. ```ocaml #show Hashtbl.t;; @@ -464,13 +521,11 @@ type ('a, 'b) t = ('a, 'b) Hashtbl.t - : ('a, 'b) Hashtbl.t -> unit = ``` -TODO: optional arguments - On the other hand, a module may define mutable data internally impacting its behaviour without exposing it in its interface. This is inadvisable. -### Bad: Mutable in Disguise +### Bad: Undocumented Mutation -Consider this code: +Here's an example of bad code: ```ocaml # let partition p a = let b = Array.copy a in @@ -488,41 +543,49 @@ Consider this code: (Array.truncate a_len a, Array.truncate b_len b);; ``` -As an anti-pattern, this is purposely broken. Let's assume the function `Array.truncate` has type `int -> 'a array -> 'a array` and behaves such that `Array.truncate 3 [5; 6; 7; 8; 9]` returns `[5; 6; 7]` and that array physically corresponds to the 3 first cells of the input array. +**Note:** This example will not run in the REPL, since the function `Array.truncate` is not defined. -The type of `truncate` should be `('a -> bool) -> 'a array -> 'a array * 'a array` and it could be documented as: +To understand why this is bad code, assume that the function `Array.truncate` has type `int -> 'a array -> 'a array`. It behaves such that `Array.truncate 3 [5; 6; 7; 8; 9]` returns `[5; 6; 7]` and the returned array physically corresponds to the 3 first cells of the input array. + +The type of `partition` would be `('a -> bool) -> 'a array -> 'a array * 'a array` and it could be documented as: > `partition p a` returns a pair of arrays `(b, c)` where `b` is an array containing all the elements of `a` that satisfy the predicate `p`, and `c` is an array containing the elements of `a` that do not satisfy `p`. The order of the elements from the input array is preserved. -**Don't do that**. It looks like an application of the [Function-Encapsulated mutability](#good-function-encapsulated-mutability). But it is not, the input array is modified. This function has a side effect that is either not intended or not documented, in the latter case, the function should be named differently. +On first glance, this looks like an application of [Function-Encapsulated mutability](#good-function-encapsulated-mutability). However, it is not: the input array is modified. This function has a side effect that is either +* not intended, or +* not documented. + +In the latter case, the function should be named differently (e.g. `partition_in_place` or `partition_mut`) and the effect on the input array should be documented. + + +### Bad: Undocumented Side Effect -### Bad: Hidden Side Effects Consider this code: ```ocaml # module Array = struct include Stdlib.Array let copy a = - if Array.length a > 1 lsl 20 then Analytics.collect "Array.copy" a; + if Array.length a > 1000000 then Analytics.collect "Array.copy" a; copy a end;; -Error: Unbound value analytics_record +Error: Unbound module Analytics ``` -As an anti-pattern, this is purposely broken. A module called `Array` is defined, it shadows and includes the [`Stdlib.Array`](/api/Array.html) module. See the [Module Inclusion](docs/modules#module-inclusion) part of the [Modules](docs/modules) tutorial for details about this pattern. - -However, the newly defined `Array` module contains a `copy` function which has a side effect. If the array to copy has a million cells or above, it will create a network connection and transmit data. +**Note:** This code will not run because there is no module called `Analytics`. -**Don't do that**. Side effects and mutability are hard enough alone. Don't add complexity to the intrinsic complexity of the matter by hiding relevant information from the developers or maintainers. +A module called `Array` is defined, it shadows and includes the [`Stdlib.Array`](/api/Array.html) module. See the [Module Inclusion](docs/modules#module-inclusion) part of the [Modules](docs/modules) tutorial for details about this pattern. -### Bad: Imperative by Default +To understand why this code is bad, assume that `Analytics.collect` is a function that makes a network connection to transmit data to another server. -By default, OCaml programs should be written in a mostly functional style, the imperative style shouldn't be the default. It is possible to use the imperative style without losing the benefits of type and memory safety, it doesn't make sense to only use it. Not using functional programming idioms at all would result in a contrived and obfuscated style. +Now, the newly defined `Array` module contains a `copy` function which has a potentially unexpected side effect, but only if the array to copy has a million cells or above. -Additionally, most modules provide an interface meant to be used in functional way. Some would require the development and maintenance of [wrapper libraries](https://en.wikipedia.org/wiki/Wrapper_library) to be used in an imperative setting. That would be wasteful and brittle. +If you're writing functions with non-obvious side effects, don't shadow existing definitions: give the function a descriptive name and document the fact that there's a side-effect that the caller may not be aware of. ### Bad: Side Effects in Arguments @@ -535,13 +598,15 @@ val id_print : string -> string = wednesday tuesday monday val s : string = "monday tuesday wednesday " ``` -Functionally `id_print` is an identity function on `string`, it returns its input unchanged. However, it has a side effect: it prints each string it receives. Wrapping the parameters passed to `Printf.sprintf` into calls to `id_print` makes the side effects happen. +The function `id_print` returns its input unchanged. However, it has a side effect: it first prints the string it receives as an argument. -The order in which the `id_ print` side effects take place is unreliable. Parameters are evaluated from right to left, but this is not part of the definition of the OCaml language, this way the compiler is implemented, but it could change. +In the second line, we apply `id_print` to the arguments `"monday"`, `"tuesday"`, `"wednesday"`, respectively and apply `Printf.sprintf "%s %s %s "` to them. -There are several means to make sure computation takes place in a specified order. +Since the order of evaluation for function arguments in OCaml is not explicitly defined, the order in which the `id_ print` side effects take place is unreliable. In this example, the arguments are evaluated from right to left, but this could change on future compiler releases. -Use the use the semicolon operator `;` +There are several means to ensure that computation takes place in a specific order. + +You can use the use the sequence operator `;` to execute expressions in a particular order: ```ocaml # print_endline "ha"; print_endline "ho";; ha @@ -549,7 +614,7 @@ hu - : unit = () ``` -Use a `let` construction: +`let` expressions are executed in the order they appear in, so you can nest them to achieve a particular order of evaluation: ```ocaml # let () = print_endline "ha" in print_endline "hu";; ha @@ -559,12 +624,15 @@ hu ## Conclusion -Handling mutable state isn't good or bad. In the cases where it is needed, OCaml provides fine tools to handle them. Many courses and books on programming and algorithmic are written in imperative style without stronger reasons than being the dominant style. Many techniques can be translated into functional style without loss in speed or increased memory consumption. Careful inspection of many efficient programming techniques or good practices shows in essence, they are functional, made working the imperative setting by hook or by crook. In OCaml, it is possible to express things in their true nature and it is preferable to do so. +Mutable state as such is neither good or bad. For the cases where mutable state enables a significantly simpler implementation, OCaml provides fine tools to deal with mutable state. We looked at references, mutable record fields, arrays, byte sequences, as well as imperative control flow expressions like `for` and `while` loops. Finally, we discussed several examples of recommended and discouraged use of side effects and mutable state. -## References + \ No newline at end of file