Chapter 1 The core language
This part of the manual is a tutorial introduction to the OCaml language. A good familiarity with programming in a conventional language (say, C or Java) is assumed, but no prior exposure to functional languages is required. The present chapter introduces the core language. Chapter 2 deals with the module system, chapter 3 with the object-oriented features, chapter 4 with extensions to the core language (labeled arguments and polymorphic variants), and chapter 6 gives some advanced examples.
1.1 Basics
For this overview of OCaml, we use the interactive system, which is started by running ocaml from the Unix shell, or by launching the OCamlwin.exe application under Windows. This tutorial is presented as the transcript of a session with the interactive system: lines starting with # represent user input; the system responses are printed below, without a leading #.
Under the interactive system, the user types OCaml phrases terminated by ;; in response to the # prompt, and the system compiles them on the fly, executes them, and prints the outcome of evaluation. Phrases are either simple expressions, or let definitions of identifiers (either values or functions).
# 1+2*3;;
- : int = 7
# let pi = 4.0 *. atan 1.0;;
val pi : float = 3.14159265358979312
# let square x = x *. x;;
val square : float -> float = <fun>
# square (sin pi) +. square (cos pi);;
- : float = 1.
The OCaml system computes both the value and the type for each phrase. Even function parameters need no explicit type declaration: the system infers their types from their usage in the function. Notice also that integers and floating-point numbers are distinct types, with distinct operators: + and * operate on integers, but +. and *. operate on floats.
# 1.0 * 2;;
Error: This expression has type float but an expression was expected of type int
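Mixing the two kinds of numbers therefore requires an explicit conversion. The following is a minimal sketch using the standard conversion functions float_of_int and int_of_float; the names two, product and truncated are only illustrative.

```ocaml
(* Convert the integer explicitly before applying the float operator *. *)
let two = 2
let product = 1.0 *. float_of_int two

(* Converting in the other direction truncates toward zero. *)
let truncated = int_of_float 3.7
```

Here product is the float 2. and truncated is the integer 3.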
Recursive functions are defined with the let rec binding:
# let rec fib n =
    if n < 2 then n else fib (n-1) + fib (n-2);;
val fib : int -> int = <fun>
# fib 10;;
- : int = 55
1.2 Data types
In addition to integers and floating-point numbers, OCaml offers the usual basic data types: booleans, characters, and immutable character strings.
# (1 < 2) = false;;
- : bool = false
# 'a';;
- : char = 'a'
# "Hello world";;
- : string = "Hello world"
Predefined data structures include tuples, arrays, and lists. There are also general mechanisms for defining your own data structures, such as records and variants, which will be covered in more detail later; for now, we concentrate on lists. Lists are either given in extension as a bracketed list of semicolon-separated elements, or built from the empty list [] (pronounced “nil”) by adding elements in front using the :: (“cons”) operator.
# let l = ["is"; "a"; "tale"; "told"; "etc."];;
val l : string list = ["is"; "a"; "tale"; "told"; "etc."]
# "Life" :: l;;
- : string list = ["Life"; "is"; "a"; "tale"; "told"; "etc."]
As with all other OCaml data structures, lists do not need to be explicitly allocated and deallocated from memory: all memory management is entirely automatic in OCaml. Similarly, there is no explicit handling of pointers: the OCaml compiler silently introduces pointers where necessary.
As with most OCaml data structures, inspecting and destructuring lists is performed by pattern-matching. List patterns have exactly the same form as list expressions, with identifiers representing unspecified parts of the list. As an example, here is insertion sort on a list:
# let rec sort lst =
    match lst with
      [] -> []
    | head :: tail -> insert head (sort tail)
  and insert elt lst =
    match lst with
      [] -> [elt]
    | head :: tail -> if elt <= head then elt :: lst else head :: insert elt tail
  ;;
val sort : 'a list -> 'a list = <fun>
val insert : 'a -> 'a list -> 'a list = <fun>
# sort l;;
- : string list = ["a"; "etc."; "is"; "tale"; "told"]
The type inferred for sort, 'a list -> 'a list, means that sort can actually apply to lists of any type, and returns a list of the same type. The type 'a is a type variable, and stands for any given type. The reason why sort can apply to lists of any type is that the comparisons (=, <=, etc.) are polymorphic in OCaml: they operate between any two values of the same type. This makes sort itself polymorphic over all list types.
# sort [6;2;5;3];;
- : int list = [2; 3; 5; 6]
# sort [3.14; 2.718];;
- : float list = [2.718; 3.14]
The sort function above does not modify its input list: it builds and returns a new list containing the same elements as the input list, in ascending order. There is actually no way in OCaml to modify a list in-place once it is built: we say that lists are immutable data structures. Most OCaml data structures are immutable, but a few (most notably arrays) are mutable, meaning that they can be modified in-place at any time.
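The difference between immutable lists and mutable arrays can be sketched as follows. For brevity this example calls the standard library function List.sort with the polymorphic compare rather than the sort function defined above; the names original, sorted and arr are illustrative.

```ocaml
(* Lists are immutable: sorting builds a new list and leaves the input alone. *)
let original = [6; 2; 5; 3]
let sorted = List.sort compare original
(* original is still [6; 2; 5; 3]; sorted is [2; 3; 5; 6] *)

(* Arrays are mutable: an element can be overwritten in place. *)
let arr = [| 6; 2; 5; 3 |]
let () = arr.(0) <- 0
(* arr is now [|0; 2; 5; 3|] *)
```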
The OCaml notation for the type of a function with multiple arguments is arg1_type -> arg2_type -> ... -> return_type. For example, the type inferred for insert, 'a -> 'a list -> 'a list, means that insert takes two arguments, an element of any type 'a and a list with elements of the same type 'a, and returns a list of the same type.
1.3 Functions as values
OCaml is a functional language: functions in the full mathematical sense are supported and can be passed around freely just as any other piece of data. For instance, here is a deriv function that takes any float function as argument and returns an approximation of its derivative function:
# let deriv f dx = function x -> (f (x +. dx) -. f x) /. dx;;
val deriv : (float -> float) -> float -> float -> float = <fun>
# let sin' = deriv sin 1e-6;;
val sin' : float -> float = <fun>
# sin' pi;;
- : float = -1.00000000013961143
Even function composition is definable:
# let compose f g = function x -> f (g x);;
val compose : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b = <fun>
# let cos2 = compose square cos;;
val cos2 : float -> float = <fun>
Functions that take other functions as arguments are called “functionals”, or “higher-order functions”. Functionals are especially useful to provide iterators or similar generic operations over a data structure. For instance, the standard OCaml library provides a List.map functional that applies a given function to each element of a list, and returns the list of the results:
# List.map (function n -> n * 2 + 1) [0;1;2;3;4];;
- : int list = [1; 3; 5; 7; 9]
This functional, along with a number of other list and array functionals, is predefined because it is often useful, but there is nothing magic about it: it can easily be defined as follows.
# let rec map f l =
    match l with
      [] -> []
    | hd :: tl -> f hd :: map f tl;;
val map : ('a -> 'b) -> 'a list -> 'b list = <fun>
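Other list functionals can be written along the same lines. As a sketch, here is a hand-written filter that keeps the elements satisfying a predicate; the standard library already provides List.filter, so this definition is for illustration only, and the name evens is hypothetical.

```ocaml
(* Same recursive structure as map: one case for the empty list,
   one case for a head element followed by a tail. *)
let rec filter p l =
  match l with
    [] -> []
  | hd :: tl -> if p hd then hd :: filter p tl else filter p tl

(* Keep only the even numbers. *)
let evens = filter (function n -> n mod 2 = 0) [0; 1; 2; 3; 4]
```

With these definitions, evens is the list [0; 2; 4].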
1.4 Records and variants
User-defined data structures include records and variants. Both are defined with the type declaration. Here, we declare a record type to represent rational numbers.
# type ratio = {num: int; denom: int};;
type ratio = { num : int; denom : int; }
# let add_ratio r1 r2 =
    {num = r1.num * r2.denom + r2.num * r1.denom;
     denom = r1.denom * r2.denom};;
val add_ratio : ratio -> ratio -> ratio = <fun>
# add_ratio {num=1; denom=3} {num=2; denom=5};;
- : ratio = {num = 11; denom = 15}
Record fields can also be accessed through pattern-matching:
# let integer_part r =
    match r with
      {num=num; denom=denom} -> num / denom;;
val integer_part : ratio -> int = <fun>
Since there is only one case in this pattern matching, it is safe to expand the argument r directly into a record pattern:
# let integer_part {num=num; denom=denom} = num / denom;;
val integer_part : ratio -> int = <fun>
Unneeded fields can be omitted:
# let get_denom {denom=denom} = denom;;
val get_denom : ratio -> int = <fun>
Optionally, missing fields can be made explicit by ending the list of fields with a trailing wildcard _:
# let get_num {num=num; _ } = num;;
val get_num : ratio -> int = <fun>
When both sides of the = sign are the same, it is possible to avoid repeating the field name by eliding the =field part:
# let integer_part {num; denom} = num / denom;;
val integer_part : ratio -> int = <fun>
This short notation for fields also works when constructing records:
# let ratio num denom = {num; denom};;
val ratio : int -> int -> ratio = <fun>
Finally, it is possible to update a few fields of a record at once:
# let integer_product integer ratio = { ratio with num = integer * ratio.num };;
val integer_product : int -> ratio -> ratio = <fun>
With this functional update notation, the record on the left-hand side of with is copied except for the fields on the right-hand side which are updated.
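To make the copying behavior concrete, the following sketch repeats the ratio type and integer_product from above and applies the update to a sample record; the names third and two_thirds are illustrative.

```ocaml
type ratio = {num: int; denom: int}

let integer_product integer ratio = { ratio with num = integer * ratio.num }

let third = {num = 1; denom = 3}
let two_thirds = integer_product 2 third
(* third is unchanged: {num = 1; denom = 3}
   two_thirds is a fresh record: {num = 2; denom = 3} *)
```

The original record is left untouched, which is why this notation is called a functional update.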
The declaration of a variant type lists all possible forms for values of that type. Each case is identified by a name, called a constructor, which serves both for constructing values of the variant type and inspecting them by pattern-matching. Constructor names are capitalized to distinguish them from variable names (which must start with a lowercase letter). For instance, here is a variant type for doing mixed arithmetic (integers and floats):
# type number = Int of int | Float of float | Error;;
type number = Int of int | Float of float | Error
This declaration expresses that a value of type number is either an integer, a floating-point number, or the constant Error representing the result of an invalid operation (e.g. a division by zero).
Enumerated types are a special case of variant types, where all alternatives are constants:
# type sign = Positive | Negative;;
type sign = Positive | Negative
# let sign_int n = if n >= 0 then Positive else Negative;;
val sign_int : int -> sign = <fun>
To define arithmetic operations for the number type, we use pattern-matching on the two numbers involved:
# let add_num n1 n2 =
    match (n1, n2) with
      (Int i1, Int i2) ->
        (* Check for overflow of integer addition *)
        if sign_int i1 = sign_int i2 && sign_int (i1 + i2) <> sign_int i1
        then Float(float i1 +. float i2)
        else Int(i1 + i2)
    | (Int i1, Float f2) -> Float(float i1 +. f2)
    | (Float f1, Int i2) -> Float(f1 +. float i2)
    | (Float f1, Float f2) -> Float(f1 +. f2)
    | (Error, _) -> Error
    | (_, Error) -> Error;;
val add_num : number -> number -> number = <fun>
# add_num (Int 123) (Float 3.14159);;
- : number = Float 126.14159
Another interesting example of variant type is the built-in 'a option type which represents either a value of type 'a or an absence of value:
# type 'a option = Some of 'a | None;;
type 'a option = Some of 'a | None
This type is particularly useful when defining functions that can fail in common situations, for instance:
# let safe_square_root x = if x >= 0. then Some(sqrt x) else None;;
val safe_square_root : float -> float option = <fun>
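A caller of such a function handles both outcomes by pattern-matching on the result. The sketch below uses a hypothetical helper safe_div (not part of the standard library) to show the idea; describe_quotient is likewise an illustrative name.

```ocaml
(* Hypothetical helper: integer division that reports failure with None
   instead of raising Division_by_zero. *)
let safe_div a b = if b = 0 then None else Some (a / b)

(* The caller must handle both cases explicitly. *)
let describe_quotient a b =
  match safe_div a b with
    Some q -> "quotient: " ^ string_of_int q
  | None -> "division by zero"
```

For instance, describe_quotient 7 2 returns "quotient: 3", while describe_quotient 1 0 returns "division by zero".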
The most common usage of variant types is to describe recursive data structures. Consider for example the type of binary trees:
# type 'a btree = Empty | Node of 'a * 'a btree * 'a btree;;
type 'a btree = Empty | Node of 'a * 'a btree * 'a btree
This definition reads as follows: a binary tree containing values of type 'a (an arbitrary type) is either empty, or is a node containing one value of type 'a and two subtrees also containing values of type 'a, that is, two 'a btree.
Operations on binary trees are naturally expressed as recursive functions following the same structure as the type definition itself. For instance, here are functions performing lookup and insertion in ordered binary trees (elements increase from left to right):
# let rec member x btree =
    match btree with
      Empty -> false
    | Node(y, left, right) ->
        if x = y then true else
        if x < y then member x left else member x right;;
val member : 'a -> 'a btree -> bool = <fun>
# let rec insert x btree =
    match btree with
      Empty -> Node(x, Empty, Empty)
    | Node(y, left, right) ->
        if x <= y then Node(y, insert x left, right)
                  else Node(y, left, insert x right);;
val insert : 'a -> 'a btree -> 'a btree = <fun>
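As a usage sketch, the functions above can be combined to build a tree from a list and read its elements back in order. The helpers tree_of_list and to_list are illustrative names introduced here, not library functions; the type and the two functions are repeated so that the example stands alone.

```ocaml
type 'a btree = Empty | Node of 'a * 'a btree * 'a btree

let rec member x btree =
  match btree with
    Empty -> false
  | Node(y, left, right) ->
      if x = y then true else
      if x < y then member x left else member x right

let rec insert x btree =
  match btree with
    Empty -> Node(x, Empty, Empty)
  | Node(y, left, right) ->
      if x <= y then Node(y, insert x left, right)
                else Node(y, left, insert x right)

(* Build a tree by inserting the elements of a list one by one. *)
let tree_of_list l = List.fold_left (fun t x -> insert x t) Empty l

(* In-order traversal: yields the elements in ascending order. *)
let rec to_list btree =
  match btree with
    Empty -> []
  | Node(y, left, right) -> to_list left @ (y :: to_list right)

let t = tree_of_list [5; 2; 8; 1]
```

With this tree, member 8 t is true, member 3 t is false, and to_list t is [1; 2; 5; 8].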
1.4.1 Record and variant disambiguation
(This subsection can be skipped on the first reading.)
Astute readers may have wondered what happens when two or more record fields or constructors share the same name:
# type first_record  = { x:int; y:int; z:int }
  type middle_record = { x:int; z:int }
  type last_record   = { x:int };;
# type first_variant = A | B | C
  type last_variant  = A;;
The answer is that when confronted with multiple options, OCaml tries to use locally available information to disambiguate between the various fields and constructors. First, if the type of the record or variant is known, OCaml can pick unambiguously the corresponding field or constructor. For instance:
# let look_at_x_then_z (r:first_record) =
    let x = r.x in
    x + r.z;;
val look_at_x_then_z : first_record -> int = <fun>
# let permute (x:first_variant) = match x with
    | A -> (B:first_variant)
    | B -> A
    | C -> C;;
val permute : first_variant -> first_variant = <fun>
# type wrapped = First of first_record
  let f (First r) = r, r.x;;
type wrapped = First of first_record
val f : wrapped -> first_record * int = <fun>
In the first example, (r:first_record) is an explicit annotation telling OCaml that the type of r is first_record. With this annotation, OCaml knows that r.x refers to the x field of the first record type. Similarly, the type annotation in the second example makes it clear to OCaml that the constructors A, B and C come from the first variant type. In contrast, in the last example, OCaml has inferred by itself that the type of r can only be first_record, so no explicit type annotation is needed.
Those explicit type annotations can in fact be used anywhere. Most of the time they are unnecessary, but they are useful to guide disambiguation, to debug unexpected type errors, or combined with some of the more advanced features of OCaml described in later chapters.
Secondly, for records, OCaml can also deduce the right record type by looking at the whole set of fields used in an expression or pattern:
# let project_and_rotate {x;y; _ } = { x= - y; y = x ; z = 0} ;;
val project_and_rotate : first_record -> first_record = <fun>
Since the fields x and y can only appear simultaneously in the first record type, OCaml infers that the type of project_and_rotate is first_record -> first_record.
As a last resort, if there is not enough information to disambiguate between different fields or constructors, OCaml picks the last defined type amongst all locally valid choices:
# let look_at_xz {x;z} = x;;
val look_at_xz : middle_record -> int = <fun>
Here, OCaml has inferred that the possible choices for the type of {x;z} are first_record and middle_record, since the type last_record has no field z. OCaml then picks the type middle_record as the last defined type between the two possibilities.
Beware that this last-resort disambiguation is local: once OCaml has chosen a disambiguation, it sticks to this choice, even if it leads to a type error later on:
# let look_at_x_then_y r =
    let x = r.x in (* OCaml deduces [r: last_record] *)
    x + r.y;;
Error: This expression has type last_record
       The field y does not belong to type last_record
# let is_a_or_b x = match x with
    | A -> true (* OCaml infers [x: last_variant] *)
    | B -> true;;
Error: This variant pattern is expected to have type last_variant
       The constructor B does not belong to type last_variant
Moreover, being the last defined type is a quite unstable position that may change surreptitiously after adding or moving around a type definition, or after opening a module (see chapter 2). Consequently, adding explicit type annotations to guide disambiguation is more robust than relying on the last defined type disambiguation.
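As a sketch of the more robust approach, the two failing definitions above can be made to typecheck by annotating the argument. The type definitions are repeated so that the example stands alone, and the extra case for C is added here only to keep the match exhaustive.

```ocaml
type first_record  = { x:int; y:int; z:int }
type middle_record = { x:int; z:int }
type last_record   = { x:int }

type first_variant = A | B | C
type last_variant  = A

(* Annotating r selects first_record, so both r.x and r.y are accepted. *)
let look_at_x_then_y (r:first_record) =
  let x = r.x in
  x + r.y

(* Annotating x selects first_variant, so A and B are both valid patterns. *)
let is_a_or_b (x:first_variant) =
  match x with
  | A -> true
  | B -> true
  | C -> false
```

These definitions no longer depend on the order in which the types were declared.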
1.5 Imperative features
Though all examples so far were written in purely applicative style, OCaml is also equipped with full imperative features. This includes the usual while and for loops, as well as mutable data structures such as arrays. Arrays are either created by listing semicolon-separated element values between [| and |] brackets, or allocated and initialized with the Array.make function, then filled up later by assignments. For instance, the function below sums two vectors (represented as float arrays) componentwise.
# let add_vect v1 v2 =
    let len = min (Array.length v1) (Array.length v2) in
    let res = Array.make len 0.0 in
    for i = 0 to len - 1 do
      res.(i) <- v1.(i) +. v2.(i)
    done;
    res;;
val add_vect : float array -> float array -> float array = <fun>
# add_vect [| 1.0; 2.0 |] [| 3.0; 4.0 |];;
- : float array = [|4.; 6.|]
Record fields can also be modified by assignment, provided they are declared mutable in the definition of the record type:
# type mutable_point = { mutable x: float; mutable y: float };;
type mutable_point = { mutable x : float; mutable y : float; }
# let translate p dx dy =
    p.x <- p.x +. dx; p.y <- p.y +. dy;;
val translate : mutable_point -> float -> float -> unit = <fun>
# let mypoint = { x = 0.0; y = 0.0 };;
val mypoint : mutable_point = {x = 0.; y = 0.}
# translate mypoint 1.0 2.0;;
- : unit = ()
# mypoint;;
- : mutable_point = {x = 1.; y = 2.}
OCaml has no built-in notion of variable – identifiers whose current value can be changed by assignment. (The let binding is not an assignment, it introduces a new identifier with a new scope.) However, the standard library provides references, which are mutable indirection cells, with operators ! to fetch the current contents of the reference and := to assign the contents. Variables can then be emulated by let-binding a reference. For instance, here is an in-place insertion sort over arrays:
# let insertion_sort a =
    for i = 1 to Array.length a - 1 do
      let val_i = a.(i) in
      let j = ref i in
      while !j > 0 && val_i < a.(!j - 1) do
        a.(!j) <- a.(!j - 1);
        j := !j - 1
      done;
      a.(!j) <- val_i
    done;;
val insertion_sort : 'a array -> unit = <fun>
References are also useful to write functions that maintain a current state between two calls to the function. For instance, the following pseudo-random number generator keeps the last returned number in a reference:
# let current_rand = ref 0;;
val current_rand : int ref = {contents = 0}
# let random () =
    current_rand := !current_rand * 25713 + 1345;
    !current_rand;;
val random : unit -> int = <fun>
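The same idea works with a reference that is local to a function rather than global. The following is a minimal sketch; make_counter, next, first and second are illustrative names. Each call to make_counter allocates a fresh reference that only the returned function can reach.

```ocaml
(* The reference count is created once per call to make_counter and is
   then updated every time the returned function is invoked. *)
let make_counter () =
  let count = ref 0 in
  fun () ->
    count := !count + 1;
    !count

let next = make_counter ()
let first = next ()
let second = next ()
```

Here first is 1 and second is 2: the state survives between the two calls to next.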
Again, there is nothing magical about references: they are implemented as a single-field mutable record, as follows.
# type 'a ref = { mutable contents: 'a };;
type 'a ref = { mutable contents : 'a; }
# let ( ! ) r = r.contents;;
val ( ! ) : 'a ref -> 'a = <fun>
# let ( := ) r newval = r.contents <- newval;;
val ( := ) : 'a ref -> 'a -> unit = <fun>
In some special cases, you may need to store a polymorphic function in a data structure, keeping its polymorphism. Doing this requires user-provided type annotations, since polymorphism is only introduced automatically for global definitions. However, you can explicitly give polymorphic types to record fields.
# type idref = { mutable id: 'a. 'a -> 'a };;
type idref = { mutable id : 'a. 'a -> 'a; }
# let r = {id = fun x -> x};;
val r : idref = {id = <fun>}
# let g s = (s.id 1, s.id true);;
val g : idref -> int * bool = <fun>
# r.id <- (fun x -> print_string "called id\n"; x);;
- : unit = ()
# g r;;
called id
called id
- : int * bool = (1, true)
1.6 Exceptions
OCaml provides exceptions for signalling and handling exceptional conditions. Exceptions can also be used as a general-purpose non-local control structure, although this should not be overused since it can make the code harder to understand. Exceptions are declared with the exception construct, and signalled with the raise operator. For instance, the function below for taking the head of a list uses an exception to signal the case where an empty list is given.
# exception Empty_list;;
exception Empty_list
# let head l =
    match l with
      [] -> raise Empty_list
    | hd :: tl -> hd;;
val head : 'a list -> 'a = <fun>
# head [1;2];;
- : int = 1
# head [];;
Exception: Empty_list.
Exceptions are used throughout the standard library to signal cases where the library functions cannot complete normally. For instance, the List.assoc function, which returns the data associated with a given key in a list of (key, data) pairs, raises the predefined exception Not_found when the key does not appear in the list:
# List.assoc 1 [(0, "zero"); (1, "one")];;
- : string = "one"
# List.assoc 2 [(0, "zero"); (1, "one")];;
Exception: Not_found.
Exceptions can be trapped with the try…with construct:
# let name_of_binary_digit digit =
    try
      List.assoc digit [0, "zero"; 1, "one"]
    with Not_found ->
      "not a binary digit";;
val name_of_binary_digit : int -> string = <fun>
# name_of_binary_digit 0;;
- : string = "zero"
# name_of_binary_digit (-1);;
- : string = "not a binary digit"
The with part does pattern matching on the exception value with the same syntax and behavior as match. Thus, several exceptions can be caught by one try…with construct. Also, finalization can be performed by trapping all exceptions, performing the finalization, then re-raising the exception:
# let temporarily_set_reference ref newval funct =
    let oldval = !ref in
    try
      ref := newval;
      let res = funct () in
      ref := oldval;
      res
    with x ->
      ref := oldval;
      raise x;;
val temporarily_set_reference : 'a ref -> 'a -> (unit -> 'b) -> 'b = <fun>
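To illustrate catching several exceptions with one try…with, here is a small sketch. The helper classify is hypothetical: it runs an integer computation and maps three predefined exceptions to descriptive strings.

```ocaml
(* The handler is an ordinary pattern matching on the exception value,
   so each case can name a different exception. *)
let classify f =
  try
    string_of_int (f ())
  with
    Not_found -> "key not found"
  | Division_by_zero -> "division by zero"
  | Failure msg -> "failure: " ^ msg

let a = classify (fun () -> List.assoc 1 [(0, 10); (1, 20)])
let b = classify (fun () -> List.assoc 2 [(0, 10); (1, 20)])
let c = classify (fun () -> 1 / 0)
let d = classify (fun () -> int_of_string "abc")
```

Here a is "20", b is "key not found", c is "division by zero", and d is "failure: int_of_string". Any exception not listed in the handler propagates to the caller unchanged.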
1.7 Symbolic processing of expressions
We finish this introduction with a more complete example representative of the use of OCaml for symbolic processing: formal manipulations of arithmetic expressions containing variables. The following variant type describes the expressions we shall manipulate:
# type expression =
      Const of float
    | Var of string
    | Sum of expression * expression    (* e1 + e2 *)
    | Diff of expression * expression   (* e1 - e2 *)
    | Prod of expression * expression   (* e1 * e2 *)
    | Quot of expression * expression   (* e1 / e2 *)
  ;;
type expression =
    Const of float
  | Var of string
  | Sum of expression * expression
  | Diff of expression * expression
  | Prod of expression * expression
  | Quot of expression * expression
We first define a function to evaluate an expression given an environment that maps variable names to their values. For simplicity, the environment is represented as an association list.
# exception Unbound_variable of string;;
exception Unbound_variable of string
# let rec eval env exp =
    match exp with
      Const c -> c
    | Var v ->
        (try List.assoc v env with Not_found -> raise (Unbound_variable v))
    | Sum(f, g) -> eval env f +. eval env g
    | Diff(f, g) -> eval env f -. eval env g
    | Prod(f, g) -> eval env f *. eval env g
    | Quot(f, g) -> eval env f /. eval env g;;
val eval : (string * float) list -> expression -> float = <fun>
# eval [("x", 1.0); ("y", 3.14)] (Prod(Sum(Var "x", Const 2.0), Var "y"));;
- : float = 9.42
Now for some real symbolic processing, we define the derivative of an expression with respect to a variable dv:
# let rec deriv exp dv =
    match exp with
      Const c -> Const 0.0
    | Var v -> if v = dv then Const 1.0 else Const 0.0
    | Sum(f, g) -> Sum(deriv f dv, deriv g dv)
    | Diff(f, g) -> Diff(deriv f dv, deriv g dv)
    | Prod(f, g) -> Sum(Prod(f, deriv g dv), Prod(deriv f dv, g))
    | Quot(f, g) -> Quot(Diff(Prod(deriv f dv, g), Prod(f, deriv g dv)),
                         Prod(g, g))
  ;;
val deriv : expression -> string -> expression = <fun>
# deriv (Quot(Const 1.0, Var "x")) "x";;
- : expression =
Quot (Diff (Prod (Const 0., Var "x"), Prod (Const 1., Const 1.)),
 Prod (Var "x", Var "x"))
1.8 Pretty-printing
As shown in the examples above, the internal representation (also called abstract syntax) of expressions quickly becomes hard to read and write as the expressions get larger. We need a printer and a parser to go back and forth between the abstract syntax and the concrete syntax, which in the case of expressions is the familiar algebraic notation (e.g. 2*x+1).
For the printing function, we take into account the usual precedence rules (i.e. * binds tighter than +) to avoid printing unnecessary parentheses. To this end, we maintain the current operator precedence and print parentheses around an operator only if its precedence is less than the current precedence.
# let print_expr exp =
    (* Local function definitions *)
    let open_paren prec op_prec =
      if prec > op_prec then print_string "(" in
    let close_paren prec op_prec =
      if prec > op_prec then print_string ")" in
    let rec print prec exp =     (* prec is the current precedence *)
      match exp with
        Const c -> print_float c
      | Var v -> print_string v
      | Sum(f, g) ->
          open_paren prec 0;
          print 0 f; print_string " + "; print 0 g;
          close_paren prec 0
      | Diff(f, g) ->
          open_paren prec 0;
          print 0 f; print_string " - "; print 1 g;
          close_paren prec 0
      | Prod(f, g) ->
          open_paren prec 2;
          print 2 f; print_string " * "; print 2 g;
          close_paren prec 2
      | Quot(f, g) ->
          open_paren prec 2;
          print 2 f; print_string " / "; print 3 g;
          close_paren prec 2
    in print 0 exp;;
val print_expr : expression -> unit = <fun>
# let e = Sum(Prod(Const 2.0, Var "x"), Const 1.0);;
val e : expression = Sum (Prod (Const 2., Var "x"), Const 1.)
# print_expr e; print_newline ();;
2. * x + 1.
- : unit = ()
# print_expr (deriv e "x"); print_newline ();;
2. * 1. + 0. * x + 0.
- : unit = ()
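The derivative printed above is correct but cluttered with terms such as 0. * x and + 0. A simplification pass could tidy it up. The function simplify below is a hypothetical sketch, not part of the original example: it applies only a few obvious identities, bottom-up, and assumes that 0 * e may be replaced by 0, which ignores special float values such as infinity or nan.

```ocaml
type expression =
    Const of float
  | Var of string
  | Sum of expression * expression
  | Diff of expression * expression
  | Prod of expression * expression
  | Quot of expression * expression

(* Simplify the subexpressions first, then look at the results. *)
let rec simplify exp =
  match exp with
    Const _ | Var _ -> exp
  | Sum(f, g) ->
      (match simplify f, simplify g with
         Const 0.0, e | e, Const 0.0 -> e          (* 0 + e = e + 0 = e *)
       | f', g' -> Sum(f', g'))
  | Diff(f, g) ->
      (match simplify f, simplify g with
         e, Const 0.0 -> e                         (* e - 0 = e *)
       | f', g' -> Diff(f', g'))
  | Prod(f, g) ->
      (match simplify f, simplify g with
         Const 0.0, _ | _, Const 0.0 -> Const 0.0  (* 0 * e = e * 0 = 0 *)
       | Const 1.0, e | e, Const 1.0 -> e          (* 1 * e = e * 1 = e *)
       | f', g' -> Prod(f', g'))
  | Quot(f, g) ->
      (match simplify f, simplify g with
         e, Const 1.0 -> e                         (* e / 1 = e *)
       | f', g' -> Quot(f', g'))
```

Applied to the abstract syntax of 2. * 1. + 0. * x + 0., this sketch returns Const 2.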
1.9 Standalone OCaml programs
All examples given so far were executed under the interactive system. OCaml code can also be compiled separately and executed non-interactively using the batch compilers ocamlc and ocamlopt. The source code must be put in a file with extension .ml. It consists of a sequence of phrases, which will be evaluated at runtime in their order of appearance in the source file. Unlike in interactive mode, types and values are not printed automatically; the program must call printing functions explicitly to produce some output. The ;; used in the interactive examples is not required in source files created for use with OCaml compilers, but can be helpful to mark the end of a top-level expression unambiguously even when there are syntax errors. Here is a sample standalone program to print Fibonacci numbers:
(* File fib.ml *)
let rec fib n =
  if n < 2 then 1 else fib (n-1) + fib (n-2);;
let main () =
  let arg = int_of_string Sys.argv.(1) in
  print_int (fib arg);
  print_newline ();
  exit 0;;
main ();;
Sys.argv is an array of strings containing the command-line parameters. Sys.argv.(1) is thus the first command-line parameter. The program above is compiled and executed with the following shell commands:
$ ocamlc -o fib fib.ml
$ ./fib 10
89
$ ./fib 20
10946
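The program above fails with an uncaught exception if the argument is missing or is not an integer. One possible way to guard against this is sketched below; parse_arg and run are illustrative names, and the usage message is an assumption rather than part of the original program.

```ocaml
(* Return None when the argument is missing or is not a valid integer,
   instead of letting an exception escape. *)
let parse_arg argv =
  if Array.length argv < 2 then None
  else
    try Some (int_of_string argv.(1))
    with Failure _ -> None

let rec fib n =
  if n < 2 then 1 else fib (n-1) + fib (n-2)

let run argv =
  match parse_arg argv with
    Some n -> print_int (fib n); print_newline ()
  | None -> prerr_endline "usage: fib <integer>"

(* A standalone program would end with:  let () = run Sys.argv  *)
```

Taking the argument array as a parameter, rather than reading Sys.argv directly, keeps parse_arg easy to try out in the interactive system.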
More complex standalone OCaml programs are typically composed of multiple source files, and can link with precompiled libraries. Chapters 9 and 12 explain how to use the batch compilers ocamlc and ocamlopt. Recompilation of multi-file OCaml projects can be automated using third-party build systems, such as the ocamlbuild compilation manager.