Linearity, Uniqueness, and Haskell

Posted on January 8, 2017

There is a group of people consisting of Tweag I/O, ghc head-quarters and JP Bernardy working on a proposal for adding linear types to Haskell. In this blog post I will attempt to put this proposal in some context. I will explain what linearity is and how it relates to its close cousin uniqueness typing, and I will compare the proposal to some other ways of doing things. My hope is that this will make the work more accessible.

Note on syntax: I will attempt to use a consistent syntax throughout this article. Although I will be citing examples from various programming languages (Haskell, Clean, Idris, the Haskell linearity proposal), my syntax will not necessarily match the concrete syntax used in any of those languages. My goal is not to provide a tutorial on using any of those specific languages, but rather to provide the necessary background to understand all of them and the differences between them.

Substructural type systems

Consider the following two Haskell functions:

wasteful :: a -> b -> a
wasteful x y = x

frugal :: a -> (a, a)
frugal x = (x, x)

Although I might have given them unusual names, they have perfectly normal Haskell types. However, they illustrate two properties of the Haskell type system:

From a formal type system perspective, this corresponds to the structural rules weakening and contraction, and we call type systems in which these rules do not apply substructural type systems.

If weakening does not apply, wasteful would not be type correct: functions must use all their arguments. If contraction does not apply, frugal would not be type correct: functions cannot use their arguments more than once. If neither weakening nor contraction applies, functions must use their arguments exactly once.

Examples

To give you an intuition why substructurality is useful, we will consider two examples in this section. Note that although these two examples may look quite similar, we will see later that they are actually of quite a different nature.

In-place updates

An update to a mutable array in Haskell is a monadic operation:

write :: IOVector Double -> Int -> Double -> IO ()

But why exactly? Why can’t we just have

write :: Int -> Double -> Vector Double -> Vector Double

with the understanding that write updates the vector in-place (i.e., really modifying it rather than returning a new array)? Well, if we had such a function, we could then define

haveAndEat :: Vector Double -> (Vector Double, Vector Double)
haveAndEat arr = (write 0 2.3 arr, arr)

We can’t have our cake and eat it: the definition of haveAndEat suggests that it returns two arrays, one of which is updated and one of which is not; but since write updates the array in-place, both of those two arrays are the same array: before the call to write is evaluated, both would be the original array and after the call to write is fully evaluated both would be modified. The value of that “second” array arr now depends on when another expression (write 0 2.3 arr) is evaluated, and hence it has become a stateful value: we have lost purity and all its benefits.

Here’s the thing: trouble started because haveAndEat used its argument arr twice (it’s like frugal): if we had a type system in which contraction does not apply, we would not be able to define haveAndEat and all would be well.

Protocols

The basic way to open a file in Haskell is

openFile :: FilePath -> IOMode -> IO Handle

There are two problems with this function, as illustrated by the following two programs:

careless :: FilePath -> IO ()
careless fp = do
    handle <- openFile fp ReadMode
    return ()

anxious :: FilePath -> IO ()
anxious fo = do
    handle <- openFile fp ReadMode
    hClose handle
    hClose handle

In careless the handle returned by openFile is never closed, and in anxious it is closed twice. We can deal with the first problem by having the garbage collector close handles (through finalizers), and we can deal with the second problem by maintaining some per-handle state even after its closed (alternatively, for applications where lexical scoping is not too restrictive, we can use a function such as withFile which itself takes care of closing the Handle).

But wouldn’t it be nicer if we could solve these problems at the type level? As you might have realized, careless is a bit like wasteful: it throws away the handle. Similarly, anxious is a bit like frugal: it uses the handle twice. In a type system in which weakening and contraction do not apply, both careless and anxious would be type incorrect. Thus, a substructural type system allows us to express protocols: the handle must eventually be closed, it cannot be used anymore after it has been closed, etc.; we will see another example later.

Limiting the applicability of the structural rules ..

Typically we want to restrict weakening and contraction for some arguments, but not for others. For example, consider

writeTwoValues :: Int -> Int -> Double -> Vector Double -> Vector Double
writeTwoValues i j x arr = write i x (write j x arr)

Although writeTwoValues uses argument x twice, this should be unproblemantic: the important thing is that it doesn’t use argument arr twice. In other words, the structural rules (contraction and weakening) should be applicable to x, but not to arr.

Let’s agree to call values to which the structural rules do not apply substructural values; this nomenclature is not standard but I think it will help.

.. through types

To a Haskell programmer, the most obvious way to separate structural values from substructural values is through types. This is the approach taken in the functional programming language Clean.

As an example, we might give write the following type:

write :: ω:Int -> ω:Double -> 1:(Vector ω:Double) -> 1:(Vector ω:Double)

A value of type ω:a is an “ordinary” value which we can reuse or throw away at will (i.e., the structural rules apply as normal); by contrast, a value of type 1:a is a substructural value which we can not reuse or throw away (the structural rules do not apply). (Technically speaking we are missing some annotations here; here and elsewhere when that happens you can assume a ω: annotation in such cases, unless otherwise indicated. Note that I’m not using Clean syntax to keep the syntax uniform throughout this article.)

This means that we cannot define

haveAndEat :: 1:(Vector ω:Double) -> (1:(Vector ω:Double), 1:(Vector ω:Double))
haveAndEat arr = (write 0 2.3 arr, arr)

because write insists that the array is substructural, and hence the structural rules do not apply.

Similarly, we can define

frugal :: ω:a ->:a, ω:a)
frugal x = (x, x)

but not

frugal :: 1:a -> (1:a, 1:a) -- incorrect
frugal x = (x, x)

.. through kinds

Another closely related approach, used in the functional programming language Idris, is to separate structural from substructural values through kinds.

Specifically, we split the universe of types into two: the types of structural values (types like Int or Double), and the types of substructural values (types like Vector). In this case, the type of write is just

write :: Int -> Double -> Vector Double -> Vector Double

but with the implicit understanding that the kind of Int and Double is different from the kind of Vector Double. The type of frugal becomes

frugal :: forall (a :: Type). a -> (a, a)
frugal x = (x, x)

where Type is the Idris name for the kind of “normal” types (structural rules do apply). If we then tried to apply frugal to an array, we would get a kind error rather than a type error.

.. through variable bindings

There is alternative possibility: we can instead mark variable bindings with whether or not the substructural rules should apply to those variables; this is the approach that the Haskell linearity proposal takes.

For example, write would turn into something like

write (i ::ω Int) (x ::ω Double) (arr ::1 Vector Double) = ...

We can introduce a convenient shorthand as follows:

write :: Int -> Double -> Vector DoubleVector Double
write i x arr = ...

This operator is pronounced “lollipop”; a function of type (a ⊸ b) is really a function (\(x ::1 a) -> ...); a “normal” function of type (a -> b) would then be shorthand for (\(x ::ω a) -> ...).

This means we cannot define

haveAndEat :: Vector Double ⊸ (Vector Double, Vector Double) -- incorrect
haveAndEat arr = (write 0 2.3 arr, arr)

and the type of wasteful becomes

wasteful :: a ⊸ b -> a
wasteful x y = x

Let bindings

Of course, if we mark only variables this raises the question what happens to larger expressions. One rule is that when we let-bind a variable

let x = expr in ...

and in expr we use some variable (arr ::1 ...), the the binding of x must also become (x ::1 ...). For example the following function would also be rejected:

writeThenDup :: Vector Double ⊸ (Vector Double, Vector Double) -- incorrect
writeThenDup arr = let x = write 0 2.3 arr in (x, x)

Note that this applies no matter what the result type is; for example, the following function is just as incorrect:

dupLength :: Vector Double ⊸ (Int, Int) -- incorrect
dupLength arr = let len = length arr in (len, len)

Function results

If we define

dup :: a -> (a, a)
dup x = (x, x)

then clearly we should also reject

writeThenDup :: Vector Double ⊸ (Vector Double, Vector Double) -- incorrect
writeThenDup arr = dup (write 0 2.3 arr)

as it is semantically equivalent to the version in the previous section. The rule is here that we can apply a function of type (a -> b) only to arguments constructed without using substructural values. One intuition here is that the result of applying a function to a substructural value is itself a substructural value.

We will see more rules later.

Uniqueness versus Linearity

So far I have avoided the terms unique or linear, but we are now at a point where we should start to distinguish.

Uniqueness

A unique value is one which has not been shared: uniqueness is a guarantee. We relied on uniqueness in the mutable array example: we can only safely apply write to an array if that array is unique. It should be safe to forget guarantees, and in uniqueness typing this is formalized through a subtyping relationship; this means that the following function is type correct:

forgetUnique :: 1:(Vector ω:Double) -> ω:(Vector ω:Double)
forgetUnique arr = arr

Of course, once we forget that an array is unique we can no longer modify it.

Linearity

Conversely, a linear value is one which may not be shared: linearity is a restriction. We relied on linearity in the Handle example: if openFile returns a linear Handle, then functions which close that Handle twice are not type correct.

In many ways linearity is dual to uniqueness typing, and we can see this quite clearly in the subtyping relation. Although we cannot forget restrictions, we can impose them. For example:

restrict :: ω:(Chan ω:Int) -> 1:(Chan ω:Int)
restrict chan = chan

In restrict we take a “normal” channel (with no restrictions) and return a linear channel. As an example use case, consider

master :: IO Int
master = do
    chan <- newChan  
    forkIO $ slave1 (restrict chan)
    forkIO $ slave2 (restrict chan)
    x <- readChan chan
    y <- readChan chan
    return $ x + y

Here we have some master process which creates a new (unrestricted) channel. It then uses restrict to pass this channel as a linear channel to two slave processes slave1 and slave2, and is therefore guaranteed that the two slave processes will each send precisely one value on this channel (we might need to newtype the channel so that sending a value is the only applicable operation).

Pairs

I hope that by now you have a reasonable intuition of the usefulness of limiting the applicability of the structural rules contraction and weakening, and of the difference between uniqueness and linearity.

However, the examples we considered so far were carefully constructed to avoid the complications that arise from propagation. This is where things get a bit more subtle.

Uniqueness

Consider updating a pair of arrays. In standard Haskell we’d write something like

writeTwoArrays ::    (Vector Double, Vector Double)
               -> IO (Vector Double, Vector Double)
writeTwoArrays (arr1, arr2) = do
    arr1' <- write 0 2.3 arr1
    arr2' <- write 1 4.5 arr2
    return (arr1', arr2')

The use of the IO monad forced us to decide whether we want to write arr1 or arr2 first. The compiler is not free to rearrange these two calls, much less execute them in parallel. This is unfortunate: after all, updates to two arrays are independent.

In a language with uniqueness typing we can write something like this (we shall see shortly that this type is not entirely correct):

writeTwoArrays :: (1:(Vector ω:Double), 1:(Vector ω:Double)) -- not quite correct
               -> (1:(Vector ω:Double), 1:(Vector ω:Double))
writeTwoArrays (arr1, arr2) = (write 0 2.3 arr1, write 1 4.5 arr2)

Apart from the additional information in the type, this is just ordinary functional programming. However, now consider

haveAndEatTwo arrs = (writeTwoArrays arrs, arrs)

Just like haveAndEat, we should reject haveAndEatTwo for much the same reasons: we cannot the updated pair of arrays and the non-updated pair of arrays, because the arrays are updated in-place.

But how can we see this from the types? After all, write wants a unique array, but writeTwoArray gets a pair of two unique arrays and so it seems type correct; haveAndEatTwo duplicates the pair, but not the arrays and hence seems type correct as well.

This is where uniqueness propagation comes in: if we want to extract a unique element from a pair (like writeTwoArrays does), then that pair itself must be unique. After all, if two people have a reference to the pair, then even if the only references to the elements of that pair are from that pair, there are still multiple references to those elements:

This means that the type of writeTwoArrays is actually

writeTwoArrays :: 1:(1:(Vector ω:Double), 1:(Vector ω:Double)) -- correct
               -> 1:(1:(Vector ω:Double), 1:(Vector ω:Double))
writeTwoArrays (arr1, arr2) = (write 0 2.3 arr1, write 1 4.5 arr2)

recording that the pair itself must be unique, too. This means that haveAndEatTwo is no longer type correct, because it would need to duplicate a unique array, which is not allowed.

Note that it’s perfectly fine to construct a non-unique array with unique elements:

mkPair :: 1:a -> 1:b -> ω:(1:a, 1:b) -- correct but useless
mkPair x y = (x, y)

However, such a pair is just not very useful as we cannot extract those unique elements.

Linearity (in types)

In linearity the situation is dual: there isn’t necessarily a problem with destructing a pair with linear elements (as long as those elements are then used linearly, of course). This means the following function is fine:

swap :: ω:(1:a, 1:b) -> 1:(1:b, 1:a) -- correct but useless
swap (x, y) = (y, x)

However, we cannot construct such pairs; the following is incorrect:

mkPair :: 1:a -> 1:b -> ω:(1:a, 1:b) -- incorrect
mkPair x y = (x, y)

We are not allowed to construct a non-linear pair from linear arguments.

Linearity (in variable bindings)

In the linearity-in-types world, the elements of the pair have their own linearity annotations; this means that a function such as

first :: 1:(1:a, 1:b) -> 1:a -- incorrect
first (x, y) = x

is type incorrect, because the type of y is 1:b and hence weakening does not apply: we cannot throw y away.

When we track linearity only through variable bindings, we have to be more explicit out this propagation: the rule for pattern matching is that in a case statement such as

case expr of (x, y) -> ...

if we use any linear variables inside expr (i.e., variables with binding ::1) then x and y must also be linear: an obligation to use a pair linearly translates to two linear obligations to use the elements of the pair linearly. Note that this rule talks about the variables inside expr, not the type of expr; there is no other choice, because types do not track linearity.

This means that

first :: (a, b) ⊸ a -- incorrect

is type incorrect, though

first :: (a, b) -> a -- correct

is fine. As a more concrete example, consider the Chan use case again: if we pass a pair of channels to a slave process, the slave must send exactly one value on both channels.

(This means that these pairs are multiplicative, not additive. The Haskell linearity does not consider additive pairs at all, so I will not discuss them in this article either.)

Higher order functions

In some ways the situation for higher order functions is very similar to the situation for pairs; however, there are some additional complications.

Uniqueness Typing

Consider the following attempt to outsmart the type system by partially applying const:

const :: 1:a -> ω:b -> 1:a -- not quite correct
const x y = x

sneakyDup :: 1:a -> 1:(1:a, 1:a)
sneakyDup x = let f = const x in (f (), f ())

Obviously we should rule out sneakyDup as it claims it can duplicate unique values; but how? Here’s the rule: just like a pair must be unique if we want to extract a unique value from it, a function with unique elements in its closure must be unique if we want to execute it.

However, since we don’t typically record the types of the elements in a function closure, we cannot know whether or not a function has unique elements in its closure when we apply it. Therefore, we have to approximate and instead say that when we construct a function with unique elements in its closure that function itself must be unique. Hence, the type of const becomes

const :: 1:a -> 1::b -> 1:a) -- correct
const x y = x

This allows us to rule out sneakyDup because it uses f twice.

You might object at this point that sneakyDup could use subtyping to forget the uniqueness guarantee of f, and you would be right. For this reason subtyping does not apply to functions. This non-uniformity of the subtyping relation causes all kinds of trouble, not least of which loss of principal types. A large part of my PhD thesis is dedicated to solving this problem, but further discussion of this topic is beyond the scope of this article.

Linearity (in types)

By analogy to the rule for pairs, you might expect that in linearity there isn’t necessarily a problem with applying non-linear functions with linear values in their closure, but instead we cannot construct such functions. Indeed, this is the case. This means that in a linear type system, we get

const :: 1:a -> 1::b -> 1:a) -- correct
const x y = x

and sneakyDup gets rejected for the same reason as in the uniqueness typing example.

This might not look dual to the case for uniqueness typing, but that’s only because in uniqueness typing we had to make some unpleasant conservative approximations, making the uniqueness case identical rather than dual to the linear case. Again, see my thesis for details.

Linearity (in variable bindings)

Let’s consider the sneakyDup example one more time. If we track linearity through bindings only, we get

const :: a ⊸ b -> a
const x y = x

sneakyDup :: a ⊸ (a, a) -- incorrect
sneakyDup x = let f = const x in (f (), f ())

In this case we are rescued by the rule for let-bindings which we discussed above: any let-bound variable must be used linearly if we use any linear variables inside the body. No need to take special precautions with partial application in this case.

But there is a subtlety in a different case. Consider this alternative attempt to outsmart the type system:

(.) :: (b -> c) ⊸ (a -> b) ⊸ a -> c -- incorrect
(f . g) x = f (g x)

dup :: a -> (a, a)
dup x = (x, x)

sneakyDup :: a ⊸ (a, a)
sneakyDup x = (dup . const x) ()

Something clearly went wrong here, but it’s somewhat subtle. It went wrong in the definition of (.); as we saw before, the result of applying a linear function (a function of type (a ⊸ b)) must itself be treated linearly. However, the same goes for the result of a non-linear function if we have a linear assumption g ::1 (a -> b). This is somewhat comparable to the rule in Clean that functions with unique elements in their closure must themselves be unique, except that here it extends to the results of those functions too. In the typing rules this is simply a consequence of the rule we saw already: we cannot apply a non-linear function (in this example, f) to an argument which is constructed using linear assumptions (in this case, g).

Polymorphism

One of the main obstacles to adding support for uniqueness or linearity to a language like Haskell is that the most general (principal) types of many functions get very complicated. For example, in Clean we have

const :: u:a -> w:(v:b -> u:a), [w <= u]
const x = \y -> x

After you stare at it for a while, this type makes sense: const receives two arguments of type a and b, and returns something of type a; moreover, the first argument may or may not be unique (u), and similarly for the second argument (v); but the result will be unique only if the first argument is; and moreover, if that first argument is unique, then if we partially apply const to just one argument then the resulting function must itself be unique (w <= u), due to the rule about functions with unique elements in their closure, discussed above.

Sensible or not, it certainly is a lot more complicated than the type of const in a regular functional programming language

const :: a -> b -> a
const x y = x

As another example, the type of fst is

fst :: w:(u:a, v:b) -> u:a, [w <= u]
fst (x, y) = x

which similarly reflects that the pair itself must be unique if we want to extract a unique element from it.

In Idris these two functions look like

const : {a : AnyType} -> {b : AnyType} -> a -> b -> a
const x y = x

fst : {a : AnyType} -> {b : AnyType} -> UPair a b -> a
fst (MkUPair x y) = x

where AnyType is the Idris kind which covers both unique and non-unique types. These functions look much simpler than the functions in Clean, but that’s just because the kinding rules are not spelled out (function space has kind UniqueType if argument has; pair has kind UniqueType if either of the elements has).

One of the benefits of tracking linearity in bindings instead, like in the Haskell linearity proposal, is that these types become much simpler:

const :: a ⊸ b -> a
const x y = x

fst :: (a, b) -> a
fst (x, y) = x

However, we don’t get away completely scot-free. For example, consider composition one last time:

(f . g) x = f (g x)

Does (f . g) use x linearly or not? Well, this clearly depends on f and g: (f . g) uses x linearly only if both f and g do. We can express this as follows:

(.) :: (b ->:u c)  ⊸  (a ->:v b)  ->:u  a  ->:uv  c

where (a ->:v b) is a generalization of both (a -> b) (letting v = ω) and (a ⊸ b) (letting v = 1), and uv is type-level multiplication of these variables such that uv = ω if either u or v is (I don’t know what concrete syntax is proposed here; the above syntax is my own).

Conclusions

If we want to retrofit uniqueness typing or linearity to Haskell, we need to find a way to do this without changing everything about the language. Both the uniqueness-through-kinds approach used in Idris and the linearity-through-bindings approach used in the Haskell linearity proposal are reasonable ways to do this.

Nonetheless, if we want to give functions their most general type we’d still end up with different, more complicated, types than the types we had before the introduction of uniqueness/linearity. Syntactic conventions can go some way to hide this complexity. Moreover, we don’t of course have to give functions their most general type. We can still use

const :: a -> b -> a
(.)   :: (b -> c) -> (a -> b) -> a -> c

and so on; this would limit the applicability of these functions in code that does really on linearity, but I think that’s a reasonable tradeoff. It does raise some questions about type inference, and principal types, but those are not insurmountable problems.

As I have argued elsewhere (for example, see my paper Modelling Unique and Affine Typing using Polymorphism), I do prefer a substructural type system with no subtyping relation. As we have seen, without subtyping linearity and uniqueness become pretty much the same thing, and this gives library authors the freedom to use the type system to express what they want. That said, a linear type system can encode the absence of sharing. For example, we can do something like

withMutableArray :: (1:(Vector Double) -> a) -> a

This function creates a new mutable array, and then passes that as a linear object to the callback. In this setup, the definition of withMutableArray (though not its type) guarantees that we start with a single (unique!) reference to a mutable array, and once we pass that array to the callback linearity guarantees that the array will never be duplicated. In a way we are using this negative occurrence of a linear type to model uniqueness typing.

To me a system based on types, rather than variable bindings, is easier to understand; but I’ll readily admit I am biased, having worked on systems in that style for many years. That said, Haskell programmers are used to thinking in terms of types, so a type or kind based approach would seem to fit better with the language. Moreover, we can abstract over types, but we cannot abstract over the syntactic shape of arguments, so if we track linearity in bindings only we will lose some expressiveness.

It’s hard to evaluate a type system without an implementation however. We will need more practical experience to be able to really compare the approaches. Either way, I hope this article has managed to place the Haskell linearity in some context and make it a bit easier to understand. No matter what the final system looks like, I look forward to having support for substructural types in Haskell!

Acknowledgements

Thanks to Arnaud Spiwack of Tweag I/O, one of the primary authors of the Haskell linearity proposal, for valuable feedback on an early draft of this article.