Lockstep-style testing with quickcheck-dynamic

Posted on September 8, 2022

Recently IOG and QuviQ released a new library for testing stateful systems called quickcheck-dynamic. In this blog post we will take a look at this library, and how it relates to quickcheck-state-machine. We will focus on the state machine testing aspect; quickcheck-dynamic also has support for dynamic logic, but we will not discuss that here.

Specifically, we will consider how we might do lockstep-style testing with quickcheck-dynamic. This is a particular approach to testing that we described in great detail in an earlier blog post, An in-depth look at quickcheck-state-machine. We will recap the general philosophy in this new blog post, but we will focus here on the hows, not necessarily the whys; it might be helpful to be familiar with the previous blog post to understand the larger context of what we’re trying to achieve.

We have developed a library called quickcheck-lockstep which builds on top of quickcheck-dynamic to provide an abstraction called InLockstep which provides support for lockstep-style testing. In this blog post we will describe this library in two parts:

In the first half we will show a test author’s perspective of how to use the abstraction. In the second half we show how we can implement the abstraction on top of quickcheck-dynamic.

Read more at well-typed.com

Verifying initial conditions in Plutus

Posted on August 23, 2022

On a UTxO-style blockchain such as Cardano, transaction outputs are normally to the (hash) of a public key; in order to spend such an output, the spending transaction must be signed with the corresponding private key as evidence that the party creating the transaction has the right to spend the output.

The extended UTxO model introduces a second kind of output: an output to a smart contract: a piece of code f. Along with the contract itself, the output also contains an additional argument d, known as the datum. When a transaction spends such an output, f(d) is executed; when f indicates that all is fine, the transaction is approved; otherwise, it is rejected.

An immediate consequence of this model is that outputs are only verified when they are spent, not when they are created. This can lead to some tricky-to-verify initial conditions. We will explore this problem in this blog post, and suggest some ways in which we can solve it. Along the way, we will recap some often misunderstood Plutus concepts.

Read more at well-typed.com

The Plutus Compilation Pipeline: Understanding Plutus Core(s)

Posted on August 3, 2022

Plutus is a strict, pure functional language. It is developed by IOHK for use on the Cardano blockchain, but in this blog post we will not be concerned with specific applications of the language, but instead look at its compilation pipeline.

Technically speaking, Plutus is not one language, but three, and most people who write “Plutus” are not really writing Plutus at all, but Haskell. These Haskell programs are translated to the Plutus Intermediate Representation (PIR). After that, data types are replaced by their Scott encodings, and recursion is replaced by explicit fixpoints to get to typed Plutus Core (PLC). Finally, types are erased to get to the Untyped Plutus Core (UPLC).

In this blog post, we will explain this entire process, with a particular focus on

Read more at well-typed.com

(Blog posts/Videos) Avoiding quadratic core code size

Posted on August 20, 2021 (last updated 2021-12-17)

I’ve thought, written and spoken quite a lot about this topic:

Blog post: Avoiding quadratic core code size with large records

A module that contains nothing but the definition of a single record with 100 fields and some type class instances, using ghc’s standard representation for records, along with the code generated by the RecordDotPreprocessor plugin and the Generic instance generated by generics-sop, results in a core representation of a whopping 450,000 terms/types/coercions, takes 3 seconds to compile, and requires 500M of RAM to compile. With the large-records library, we get a module with essentially the same functionality, but with a core size of a mere 14,000 terms/types/coercions, which compiles within 1 second and requires roughly 100M of RAM. In this blog post we describe why this simple module generates so much code, and how the large-records library manages to reduce this by more than an order of magnitude.

Read more at well-typed.com

Blog post: Induction without core-size blow-up (a.k.a. Large records: anonymous edition)

An important factor affecting compilation speed and memory requirements is the size of the core code generated by ghc from Haskell source modules. Thus, if compilation time is an issue, this is something we should be conscious of and optimize for. In [part 1][part1] of this blog post we took an in-depth look at why certain Haskell constructs lead to quadratic blow-up in the generated ghc core code, and how the [large-records][lr-hackage] library avoids these problems. Indeed, the large-records library provides support for records, including support for generic programming, with a guarantee that the generated core is never more than O(n) in the number of record fields.

The approach described there does however not directly apply to the case of anonymous records. This is the topic we will tackle in this part 2. Unfortunately, it seems that for anonymous records the best we can hope for is O(n log n), and even to achieve that we need to explore some dark corners of ghc. We have not attempted to write a new anynomous records library, but instead consider the problems in isolation; on the other hand, the solutions we propose should be applicable in other contexts as well. Apart from section “Putting it all together”, the rest of this blog post can be understood without having read part 1.

Read more at well-typed.com

Blog post: Type-level sharing in Haskell, now

The lack of type-level sharing in Haskell is a long-standing problem. It means, for example, that this simple Haskell code

apBaseline :: Applicative f => (A -> B -> C -> r) -> f r
apBaseline f =
        pure f
    <*> pure A
    <*> pure B
    <*> pure C

results in core code that is quadratic in size, due to type arguments; in pseudo-Haskell, it will look something like

apBaseline :: Applicative f => (A -> B -> C -> r) -> f r
apBaseline f =
                          (pure f)
    <*> @A @(B -> C -> r) (pure A)
    <*> @B @(     C -> r) (pure B)
    <*> @C @(          r) (pure C)

Each application of (<*>) records both the type of the argument that we are applying, as well as the types of all remaining arguments. Since the latter is linear in the number of arguments, and we have a linear number of applications of <*>, the core size of this function becomes quadratic in the number of arguments.

We recently discovered a way to solve this problem, in ghc as it is today (tested with 8.8, 8.10, 9.0 and 9.2). In this blog post we will describe the approach, as well as introduce a new typechecker plugin, which makes the process more convenient.

Read more at well-typed.com

Blog post: New large-records release: now with 100% fewer quote

The large-records library provides support for large records in Haskell with much better compilation time performance than vanilla ghc does. Well-Typed and MonadFix are happy to announce a new release of this library, which avoids all Template Haskell or quasi-quote brackets. Example:

{-# ANN type User largeRecordLazy #-}
data User = MkUser {
      name   :: String
    , active :: Bool
    }
  deriving stock (Show, Eq)

instance ToJSON User where
  toJSON = gtoJSON

john :: User
john = MkUser { name = "john", active = True }

setInactive :: User -> User
setInactive u = u{active = False}

This makes for a nicer user experience and provides better integration with tooling (for example, better syntax highlighting, auto-formatting, and auto-completion). Importantly, avoiding Template Haskell also means we avoid the unnecessary recompilations that this incurs1, a significant benefit for a library aimed at improving compilation time.

Read more at well-typed.com

Blog post: large-anon: Practical scalable anonymous records for Haskell

The large-anon library provides support for anonymous records; that is, records that do not have to be declared up-front. For example, used as a plugin along with the record-dot-preprocessor plugin, it makes it possible to write code such as this:

magenta :: Record [ "red" := Double, "green" := Double, "blue" := Double ]
magenta = ANON { red = 1, green = 0, blue = 1 }

reduceRed :: RowHasField "red" r Double => Record r -> Record r
reduceRed c = c{red = c.red * 0.9}

The type signatures are not necessary; type inference works as aspected for these records. If you prefer to use lenses, that is also possible:

reduceBlue :: RowHasField "blue" r Double => Record r -> Record r
reduceBlue = over #blue (* 0.9)

The library offers a small but very expressive API, and it scales to large records (with 100 fields and beyond), with excellent compilation time performance and good runtime performance. In this blog post we will first present the library from a user’s perspective, then give an overview of the internals with an aim to better to understand the library’s runtime characteristics, and finally show some benchmarks. The library is available from Hackage and is currently compatible with ghc 8.8, 8.10 and 9.0 (extending this to 9.2 should not be too hard).

Read more at well-typed.com

Presentations (videos)

The fundamental problem solved by blockchain

Posted on January 9, 2021

There are many blog posts and descriptions of blockchain technology on the web, but I feel that very few of them get to the heart of the matter: what is the fundamental problem that blockchains solve? In this blog post I want to take a look at this question; this blog post is intended for a general audience and I will assume no technical knowledge. That does not mean however that we will need to dumb things down; we will see what the fundamental problem solved by blockchain is, and we will even see two alternative ways in which this problem is solved, known as proof-of-work (used by for example Bitcoin) and proof-of-stake (used by for example the Cardano blockchain).

Read more

Reconstructing the Ableton velocity compand curve in Max for Live

Posted on January 3, 2021

The Velocity effect in Ableton Live allows us to apply some transformations to the velocity of incoming MIDI notes.

In this brief blog post we will look at how we might recreate this velocity curve; specifically, the Comp part of the curve. The goal won’t necessarily be to recreate the curve exactly, but rather to have something that functions similarly.

Read more

Designing a custom Push2 instrument in Max for Live/JavaScript: Part 2

Posted on December 27, 2020

This is part 2 of the two-part series on designing custom Push instruments in Max for Live (using JavaScript). In Part 1 we explored the LiveAPI object object, and used it to construct an abstraction that allows us to monitor when the track that our instrument lives on is selected or deselected. In this part 2 we will get to the fun stuff, and actually develop our Push instrument.

Read more

Designing a custom Push2 instrument in Max for Live/JavaScript: Part 1

Posted on December 26, 2020

As any Ableton Push user knows, the layout of the Push is different when you are using a standard instrument and when you are using a drum rack. In this two-part tutorial we are going to explore how we can design our own instruments that use a completely custom Push layout, but yet integrate well with the normal Ableton work flow.

I have a short video on YouTube demonstrating what the instrument does and how it is used.

Read more

Programming in Max (a Haskeller's perspective)

Posted on November 22, 2020

Introduction

The programming language Max has been around since the mid 80s, as long as Haskell has. As a programming language it is quite interesting. Where Haskell takes purity to its logical extreme, Max does the opposite and takes stateful programming to its logical extreme. Where Haskell is entirely pull based (laziness drives everything, nothing is evaluated unless and until the result demands it), Max is entirely push based: computation is exclusively driven by incoming data.

Read more

(Video) Being lazy without being bloated

Posted on September 11, 2020

This presentation was part of the Munihac 2020.

Laziness is one of Haskell’s most distinctive features. It is one of the two features of functional programming that “Why Functional Programming Matters” identifies as key to modularity, but it is also one of the most frequently cited features of Haskell that programmers would perhaps like to change. One reason for this ambivalence is that laziness can give rise to space leaks, which can sometimes be fiendishly difficult to debug. In this talk we will present a new library called nothunks which can be used to test for the absence of unexpected thunks in long-lived data; when an unexpected thunk is found, a “stack trace” is returned identifying precisely where the thunk is (“the second coordinate of a pair in a map in a list in type T”). In combination with QuickCheck, this can be used to test that an API does not create any thunks when it shouldn’t and thunks that are created are easily identified and fixed. Whilst it doesn’t of course fix all space leaks, it can help avoid a significant proportion of space leaks due to excessive laziness.

Watch the presentation on YouTube

(Video) There is no fork: supporting an ever evolving blockchain

Posted on July 29, 2020

This presentation was part of the Cardano Virtual Summit 2020. It presents an overview of the consensus layer, emphasizes again its abstract nature (see also my previous blog post on that topic), and then focusses on the hard fork combinator: a way to combine different ledgers into a single blockchain without ever executing a traditional change-the-world hard fork.

Watch the presentation on YouTube
Watch the Q&A session on YouTube

The abstract nature of the Cardano consensus layer

Posted on May 28, 2020

The Cardano consensus layer has two important responsibilities:

It runs the blockchain consensus protocol. In the context of a blockchain, consensus—that is, ‘majority of opinion’—means everyone involved in running the blockchain agrees on what the one true chain is. This means that the consensus layer is in charge of adopting blocks, choosing between competing chains if there are any, and deciding when to produce blocks of its own. It is responsible for maintaining all the state required to make these decisions. To decide whether or not to adopt a block, the protocol needs to validate that block with respect to the state of the ledger. If it decides to switch to a different chain (a different tine of a fork in the chain), it must keep enough history to be able to reconstruct the ledger state on that chain. To be able to produce blocks, it must maintain a mempool of transactions to be inserted into those blocks.

The consensus layer mediates between the network layer below it, which deals with concerns such as communication protocols and peer selection, and the ledger layer above it, which specifies what the state of the ledger looks like and how it must be updated with each new block. The ledger layer is entirely stateless, and consists exclusively of pure functions. In turn, the consensus layer does not need to know the exact nature of the ledger state, or indeed the contents of the blocks (apart from some header fields required to run the consensus protocol).

Read more at iohk.io

Integrated versus Manual Shrinking

Posted on May 13, 2019

Property-based testing is an approach to software testing where instead of writing tests we generate tests, based on properties that the software should have. To make this work, we need to be able to generate test data and, when we find a counter example, we need to shrink that test data to attempt to construct a minimal test case.

In Haskell, the library QuickCheck has long been the library of choice for property based testing, but recently another library called Hedgehog has been gaining popularity. One of the key differences between these two libraries is that in QuickCheck one writes explicit generation and shrinking functions, whereas in Hedgehog shrinking is integrated in generation. In this blog post we will explain what that means by developing a mini-QuickCheck and mini-Hedgehog and compare the two. We will consider some examples where integrated shrinking gives us benefits over manual shrinking, but we we will also see that the belief that integrated shrinking basically means that we can forget about shrinking altogether is not justified. There is no such thing as a free shrinker.

Read more at well-typed.com

An in-depth look at quickcheck-state-machine

Posted on January 23, 2019

Stateful APIs are everywhere: file systems, databases, widget libraries, the list goes on. Automated testing of such APIs requires generating sequences of API calls, and when we find a failing test, ideally shrinking such a sequence to a minimal test case. Neither the generation nor the shrinking of such sequences is trivial. After all, it is the very nature of stateful systems that later calls may depend on earlier calls: we can only add rows to a database table after we create it, we can only write to a file after we open it, etc. Such dependencies need to be tracked carefully. Moreover, in order to verify the responses we get back from the system, the test needs to maintain some kind of internal representation of what it thinks the internal state of the system is: when we read from a file, we need to know what was in the file in order to be able to verify if the response was correct or not.

In this blog post we will take an in-depth look at quickcheck-state-machine, a library for testing stateful code. Our running example will be the development of a simple mock file system that should behave identically to a real file system. Although simple, the example will be large enough to give us an opportunity to discuss how we can verify that our generator is producing all test cases we need, and how we can inspect whether the shrinker is doing a good job; in both cases, test case labelling will turn out to be essential. Throughout we will also discuss design patterns for quickcheck-state-machine tests which improve separation of concerns and reduce duplication. It should probably be pointed out that this is an opinionated piece: there are other ways to set things up than we present here.

Read more at well-typed.com

Compositional zooming for StateT and ReaderT using lens

Posted on September 4, 2018

Consider writing updates in a state monad where the state contains deeply nested structures. As our running example we will consider a state containing multiple “wallets”, where each wallet has multiple “accounts”, and each account has multiple “addresses”. Suppose we want to write an update that changes one of the fields in a particular address. If the address cannot be found, we want a precise error message that distinguishes between the address itself not being found, or one of its parents (the account, or the wallet) not being found. In this blog post we will show how we can develop some composable abstractions that will help with writing code like this in a concise manner.

Read more at well-typed.com

(Video) The Cardano Wallet

Posted on July 18, 2018

This is a recording of a presentation I gave for the Cardano foundation at the first Cardano meetup in the Netherlands. The presentation was aimed at a general audience, and assumed no prior knowledge. The abstract was:

A cryptocurrency wallet is the primary means by which end users from private individuals to large exchange nodes interact with a blockchain. Its primary function is to keep track of the user’s unspent outputs and enable them to submit new transactions into the network. As part of the redesign of the Cardano wallet we have written a detailed mathematical specification, which we use for development and testing. In addition, we have done extensive research into “coin selection”, the process of choosing unspent outputs from the user’s wallet to fulfill a payment request. In this presentation we will give a high-level description of both of these strands of work.

Watch the presentation on YouTube

Self Organisation in Coin Selection

Posted on July 3, 2018

The term “self organisation” refers to the emergence of complex behaviour (typically in biological systems) from simple rules and random fluctuations. In this blog post we will see how we can take advantage of self organisation to design a simple yet effective coin selection algorithm.

Coin selection is the process of selecting unspent outputs in a user’s wallet to satisfy a particular payment request (for a recap of UTxO style accounting, see section “Background: UTxO-style Accounting” of my previous blog post). As Jameson Lopp points out in his blog post The Challenges of Optimizing Unspent Output Selection, coin selection is thorny problem. Moreover, there is a surprising lack of academic publications on the topic; indeed, the only scientific study of coin selection appears to be Mark Erhardt’s masters thesis An Evaluation of Coin Selection Strategies.

Read more at iohk.io

Semi-Formal Development: The Cardano Wallet

Posted on May 31, 2018

As part of our consulting work for IOHK, Well-Typed have been working with IOHK on the design and implementation of the new version of the Cardano cryptocurrency wallet. As a crucial component of this process, we have written a semi-formal specification of the wallet: a mathematical model of the wallet along with invariants and lemmas about how it behaves.

We refer to this specification as “semi-formal” because while it states many of the wallet’s properties, and proves some of them, it by no means proves all of them. As we will see, however, we can use QuickCheck to test such properties, producing counter-examples where they fail to hold. Not only is this an invaluable tool during the development of the specification itself, it also gives us a very principled way of testing the real implementation, even if later we do prove all the remaining properties as well.

Read more at well-typed.com

Object Oriented Programming in Haskell

Posted on March 8, 2018

In object oriented programming an object is a value with a well-defined interface. The internal state of the object is closed to the outside world (encapsulation), but the behaviour of an object can be modified by redefining one or more of the functions in the object’s interface, typically through subclasses (inheritance). The internal state of the object is open to, and can be extended by, such subclasses. This is known as the open/closed principle.

In this blog post we will explain what open recursion is, provide some runnable examples that you can evaluate one step at a time to develop a better intuition for how it works, and show how it can be used to solve the problem above. These results are not new, but we hope that this blog post can serve to make them a bit more accessible to a larger audience. We then discuss how we can extend this approach using lenses to allow “inheritance” to also extend the “state” of the object.

Read more at well-typed.com

Writing a High-Assurance Blockchain Implementation

Posted on November 2, 2017

In his blog post Cryptocurrencies need a safeguard to prevent another DAO disaster, Duncan Coutts discussed the need for high assurance development of cryptocurrencies and their underlying blockchain protocols. He also sketched one way in which one might go about this, but did not give much detail. In this follow-up blog post I delve into computer science theory a bit more, and in particular will take a closer look at process algebras. I will have no choice but omit a lot of detail still, but hopefully we can develop some intuition and provide a starting point for reading more.

Read more at iohk.io

(Video) Static Pointers, Closures and Polymorphism

Posted on October 12, 2017

A static pointer can be thought of as the address of symbol in a program. Their primary use lies in the fact that they can always be serialized and hence sent across a network; thus, while we cannot send a Haskell function over a network, we /can/ send a static pointer to that function. This makes them very useful in distributed frameworks such as Cloud Haskell (an library for Erlang-like distributed programs) and Sparkle (Haskell bindings to the Apache Spark cluster computing framework). In this talk, you will learn how static pointers work, what kind of infrastructure we can define on top to make working with static pointers more practical, and outline some use cases.

Watch the presentation at skillsmatter.com

Visualizing lazy evaluation

Posted on September 18, 2017

Haskell and other call-by-need languages use lazy evaluation as their default evaluation strategy. For beginners and advanced programmers alike this can sometimes be confusing. We therefore developed a tool called visualize-cbn. It is a simple interpreter for a mini Haskell-like language which outputs the state of the program at every step in a human readable format. It can also generate a HTML/JavaScript version with “Previous” and “Next” buttons to step through a program. We released the tool as open source to github and Hackage, in the hope that it will be useful to others. In this blog post we will illustrate how one might take advantage of this tool. We will revisit the infamous triple of functions foldr, foldl, foldl', and show how they behave. As a slightly more advanced example, we will also study the memory behaviour of mapM in the Maybe monad. Hopefully, this show-rather-than-tell blog post might help some people understand these functions better.

Read more at well-typed.com

Binary instances for GADTs (or: RTTI in Haskell)

Posted on June 15, 2017

In this blog post we consider the problem of defining Binary instances for GADTs such as

data Val :: * -> * where
  VI :: Int    -> Val Int
  VD :: Double -> Val Double

Read more at well-typed.com

Linearity, Uniqueness, and Haskell

Posted on January 8, 2017

There is a group of people consisting of Tweag I/O, ghc head-quarters and JP Bernardy working on a proposal for adding linear types to Haskell. In this blog post I will attempt to put this proposal in some context. I will explain what linearity is and how it relates to its close cousin uniqueness typing, and I will compare the proposal to some other ways of doing things. My hope is that this will make the work more accessible.

Read more

Sharing, Memory Leaks, and Conduit and friends

Posted on September 29, 2016

TL;DR: Sharing conduit values leads to memory leaks. Make sure to disable the full laziness optimization in the module with your top-level calls to runConduit or ($$) (skip to the end of the conclusion for some details on how to do this). Similar considerations apply to other streaming libraries and indeed any Haskell code that uses lazy data structures to drive computation.

Read more at well-typed.com

Efficient Amortised and Real-Time Queues in Haskell

Posted on January 15, 2016

A queue is a datastructure that provides efficient—O(1)—operations to remove an element from the front of the queue and to insert an element at the rear of the queue. In this blog post we will discuss how we can take advantage of laziness to implement such queues in Haskell, both with amortised and with worst-case O(1) bounds.

The results in this blog post are not new, and can be found in Chris Okasaki’s book “Purely Functional Data Structures”. However, the implementation and presentation here is different from Okasaki’s. In particular, the technique we use for real-time datastructures is more explicit and should scale to datastructures other than queues more easily than Okasaki’s.

Read more at well-typed.com

Dependently typed servers in Servant

Posted on December 2, 2015

Suppose we have a webserver that can perform different kinds of operations on different kinds of values. Perhaps it can reverse or capitalize strings, and increment or negate integers. Moreover, it can echo back any value. However, it does not make sense to try to reverse an integer or increment a string.

This is an example of a dependently typed server: the value that we are given as input determines the type of the rest of the server. If we get a string as input, we expect a string operation as the second argument and the response is also a string; if we get an integer as input, we expect an integer operation as the second argument and the response is also an integer.

Read more at well-typed.com

Lightweight Checked Exceptions in Haskell

Posted on July 31, 2015

Consider this function from the http-conduit library:

-- | Download the specified URL (..)
--
-- This function will 'throwIO' an 'HttpException' for (..)
simpleHttp :: MonadIO m => String -> m ByteString

Notice that part of the semantics of this function—that it may throw an HttpException—is encoded in a comment, which the compiler cannot check. Wouldn’t it be much nicer if we could simply express in the type that simpleHttp may throw a HttpException? In this blog post I will propose a very lightweight scheme to do precisely that.

Read more at well-typed.com

Parametricity Tutorial

Posted on May 23, 2015

A powerful feature of Haskell’s type system is that we can deduce properties of functions by looking only at their type. For example, a function of type

f :: ∀a. a -> a can only be the identity function: since it must return something of type a, for any type a, the only thing it can do is return the argument of type a that it was given (or crash).

To reason about the properties of functions based only on their types we make use of the theory of parametricity, which tells how to derive a so-called “free theorem” from a type. This blog post is a tutorial on how to do this; it won’t explain why the theory works, only how it works. A Haskell practitioner’s guide to parametricity, if you will.

Part 1: Constant types, functions and polymorphism
Part 2: Type constructors and type classes

Qualified Goals in the Cabal Solver

Posted on March 27, 2015

When you ask cabal-install to install one or more packages, it needs to solve a constraint satisfaction problem: select a version for each of the packages you want to install, plus all their dependencies, such that all version constraints are satisfied. Those version constraints come both from the user (“please install lens version 4”) and from the packages themselves (“this package relies on mtl between version 2.1 and 2.2”). Internally cabal-install uses a modular constraint solver, written by Andres Löh. It was first presented at the Haskell Implementor’s Workshop in 2011, and is based on the paper Modular lazy search for Constraint Satisfaction Problems.

For the Industrial Haskell Group, Well-Typed has recently extended the constraint solver to deal with qualified goals. In this blog post we will explain how this constraint solver works by means of a running example, and look at how the new qualified goals change things.

Read more at well-typed.com

Comprehensive Haskell Sandboxes, Revisited

Posted on March 9, 2015 (last updated April 2, 2015)

For the last few years I’ve been using a system of symlinks as Haskell sandboxes. This worked well, but has one important drawback: symlinks are basically user-global state, and hence it is not possible to have multiple sandboxes active at the same time. However, as Haskell tools are getting better and better we can actually simplify the entire setup dramatically. Here’s a summary of my new approach:

Read more

Switching from Vim to Atom
(A Haskeller's Perspective)

Posted on March 7, 2015 (last updated Aug 23, 2016)

I have been a Vim user for many many years, but have long wanted to switch to an editor that was more easily hackable. I tried Emacs and Sublime, but both had various hurdles that were just too annoying for me and both times I switched back to vim very quickly. However, I started using Atom a while back, and I have to say, I liked it from the start. I very quickly felt almost as productive as I was in vim, and in some ways I already much prefer Atom. (And since it’s JavaScript, we can write plugins in Haskell!)

Below are some tips that helped me transition from Vim to Atom.

Read more

Writing Atom plugins in Haskell using ghcjs

Posted on February 14, 2015

Atom is an editor developed by the people behind GitHub. Written in JavaScript, it is expressly designed to be hackable, and hackable it is; I find writing Atom plugins a rather pleasant experience, even though this means writing CoffeeScript. Nonetheless, one of the things that initially attracted me to Atom is the existence of ghcjs, and hence the possibility of writing Atom plugins in Haskell. Why bother? Well, Haskell versus JavaScript, need I say more? Moreover, it means we can use almost any existing Haskell library in our Atom plugins; for instance, my Cabal extension calls into the Cabal library to parse .cabal files.

Read more

Monads: From Web 2.0 to Hardware Drivers

Posted on February 3, 2015

Monads are often considered to be a stumbling block for learning Haskell. Somehow they are thought of as scary and hard to understand. They are however nothing more than a design pattern, and a rather simple one at that. Monads give us a standardized way to glue a particular kind of computations together, passing the result of each computation to the next. As such they are useful well beyond functional programming.

Read more at well-typed.com

Core JavaScript Semantics

Posted on January 17, 2015

This post is a summary of the core JavaScript semantics, based on the papers The Essence of JavaScript and A Tested Semantics for Getters, Setters, and Eval in JavaScript from the semantics of JavaScript project at Brown University. Note that throughout this post “reference” has a very precise meaning as a pointer to a memory location; think ML-style references (or, if you must, C++-style references).

Read more

Simple SMT solver for use in an optimizing compiler

Posted on December 31, 2014

This is a second blog post in a series about engineering optimizing compilers; the previous was Quasi-quoting DSLs for free. In this blog we will see how we might write a very simple SMT solver that will allow us to write an optimization pass that can turn something like

if a == 0 then
  if !(a == 0) && b == 1 then
    write 1
  else
    write 2
else
  write 3

into

if a == 0 then
  write 2
else
  write 3

without much effort at all. Read more at well-typed.com

Quasi-quoting DSLs for free

Posted on October 19, 2014

Suppose you are writing a compiler for some programming language or DSL. If you are doing source to source transformations in your compiler, perhaps as part of an optimization pass, you will need to construct and deconstruct bits of abstract syntax. It would be very convenient if we could write that abstract syntax using the syntax of your language. In this blog post we show how you can reuse your existing compiler infrastructure to make this possible by writing a quasi-quoter with support for metavariables. As we will see, a key insight is that we can reuse object variables as meta variables.

Read more at well-typed.com

Haskell CPP macros

Posted on September 13, 2014

Listing these here mostly because whenever I need to use them I seem to be unable to find them and have to dig for them in previous code I wrote.

Checking for minimal package versions

This relies on macros generated by Cabal. Example:

#if MIN_VERSION_process(1,2,0)
    -- something
#else
    -- something else
#endif

Read more

Dealing with Asynchronous Exceptions during Resource Acquisition

Posted on August 28, 2014

Consider the following code: we open a socket, compute with it, and finally close the socket again. The computation happens inside an exception handler (try), so even when an exception happens we still close the socket:

example1 :: (Socket -> IO a) -> IO a 
example1 compute = do -- WRONG
  s <- openSocket 
  r <- try $ compute s
  closeSocket s
  case r of
    Left ex -> throwIO (ex :: SomeException)
    Right a -> return a

Although this code correctly deals with synchronous exceptions–exceptions that are the direct result of the execution of the program–it does not deal correctly with asynchronous exceptions–exceptions that are raised as the result of an external event, such as a signal from another thread.

Read more at well-typed.com

Debugging Haskell at assembly level
by scripting lldb in Python

Posted on August 1, 2014

Haskell programmers tend to spend far less time with debuggers than programmers in other languages. Partly this is because for pure code debuggers are of limited value anyway, and Haskell’s type system is so strong that a lot of bugs are caught at compile time rather than at runtime. Moreover, Haskell is a managed language – like Java, say – and errors are turned into exceptions. Unlike in unmanaged languages such as C “true” runtime errors such as segmentation faults almost never happen.

Read more at well-typed.com

Understanding the RealWorld

Posted on June 12, 2014

In my previous blog post Understanding the Stack I mentioned that “RealWorld tokens disappear” in the generated code. This is not entirely accurate. Simon Marlow gives a very brief explanation of this on the old glasgow-haskell-users mailing list. In this blog post we will explore this in more depth and hopefully come to a better understanding of the RealWorld, though perhaps not of the real world.

Read more at well-typed.com

Understanding the Stack

Posted on May 21, 2014

One of our clients sent us a bug report, which consisted of a 700 line Haskell program with a complaint that “it deadlocks”. After studying the code I concluded that, for the sake of the bug report, it could be summarized as

print a bunch of stuff, then crash with a stack overflow

So I wanted to replace the original code with code that did just that: print a bunch of stuff, then crash with a stack overflow. No big deal:

Read more at well-typed.com

Haskell (including GTK) on Mavericks

Posted on April 27, 2014

This post is out of date. See Comprehensive Haskell Sandboxes, Revisited instead.

Installing GTK

Although the GTK libraries can be installed through fink, macports, or brew, I’ve never had much luck with any of them. I recently tried installing them through brew and although they appeared to install ok, as soon as I tried to run a Haskell GTK application it segfaulted with a null pointer exception somewhere in libcairo. I therefore prefer to install from source. This is a fairly painless process; some hints are in my Comprehensive Haskell Sandboxes post. You can follow those instructions or just follow the instructions on the GTK+ website. The only important thing is that you need set set LDFLAGS so that it can find -lpython:

export LDFLAGS=-L/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/config

Read more

Pointwise Lenses

Posted on April 24, 2014

Lenses are a current hot topic in the Haskell community, with a bunch of packages providing implementations (data-accessor, fclabels, lens, amongst others). Although we will recall definitions, this post is not meant as an introduction to lenses. If you have not worked with lenses before, the talk from Simon Peyton Jones or the blog post by Sebastiaan Visser about fclabels are good starting points.

In this blog post we will propose a generalization to the lens representation used in fclabels and in many other packages (with various minor variations); we will consider the relation to the representation used in lens in a separate section.

Read more at well-typed.com

Performance profiling with ghc-events-analyze

Posted on February 12, 2014

ghc-events-analyze is a new simple Haskell profiling tool that uses GHC’s eventlog system. It helps with some profiling use cases that are not covered by the existing GHC profiling modes or tools. It has two major features:

Read more at well-typed.com

sendmail (postfix) on OSX

Posted on October 9, 2013

Scenario: you want to be able to run sendmail from the command line and have it deliver mail to mail servers other than localhost. It’s not particularly difficult once you know what to do but it was a bit of a pain to figure out. So, mostly for my own benefit:

  1. Fix configuration and permissions
  2. Set up SASL authentication for your favourite SMTP server
  3. Tell postfix which users are local (none)

(Video) Lazy I/O and Alternatives in Haskell

Posted on July 10, 2013

Haskell is one of the few programming languages that use lazy evaluation: computations get performed only when their result is demanded. It is however not so straightforward to combine lazy evaluation with side effects, such as reading from a file.

Lazy I/O is the standard solution to this in the current base libraries, but has its problems. We explain how lazy I/O works, what the problems are, and explore some of the solutions that are being proposed at the moment: iteratees, pipes, stream I/O, etc.

Watch the presentation at skillsmatter.com

The darker corners of throwTo

Posted on June 11, 2013

We will assume

import System.IO.Unsafe (unsafePerformIO, unsafeInterleaveIO)
import System.IO.Error (catchIOError)
import Control.Concurrent (throwTo, myThreadId, threadDelay)
import Control.Concurrent.Async (async, cancel, wait)
import qualified Control.Exception as Ex

in the remainder of this post.

Whatever work the target thread was doing when the exception was raised is not lost: the computation is suspended until required by another thread.

We can observe this directly when using unsafePerformIO :

Read more

Brief Intro to Quasi-Quotation (Show me the types already)

Posted on May 9, 2013

This posts serves as a brief introduction to quasi-quotation in Haskell, because I couldn’t make much sense of the existing documentation, which gets bogged down in unnecessary details about writing parsers, the use of SYB, etc. Also, although quasi-quotation borrows syntax from Template Haskell, the TH expression [|e|] and the QQ [qq|e|] are false friends at best.

Read more

Comprehensive Haskell Sandboxes

Posted on February 10, 2013

For an alternative approach, see Comprehensive Haskell Sandboxes, Revisited instead.

Serious Haskell developers will want to test their packages against different versions of ghc, different versions of libraries, etc. It is therefore useful to have multiple Hackage development environments, or sandboxes. A comprensive sandbox contains:

Moreover, good sandboxes are

  1. isolated from each other: only the active sandbox can be modified
  2. relocatable: it should be possible to copy the sandbox ~/env/ghc762 to ~/env/client1 to use as the basis for a new sandbox, or to copy the entire set of sandboxes (~/env) from one machine to another (suitably similar) machine (this means that paths in config files or baked into executables should never mention the name of the sandbox)
  3. self-contained: there should be no need for a system wide install of ghc
  4. transparent: we should be able to use the standard tools (ghc, cabal) as they are

There are existing tools for Haskell sandboxes (such as cabal-dev or hsenv), but none satisfy all of the above requirements (to the best of my knowledge). However, it is easy enough to set up our own.

Read more

Performance problems with -threaded?

Posted on February 6, 2013

Consider the following two implementations of a very simple ping/pong server and client, one written in C and one written in Haskell:

Neither of them has been particularly optimized, that’s not the point of this post. The point of this post will become evident when we measure the latency for the following three programs:

  1. The C program compiled with -O2
  2. The Haskell program compiled with -O2
  3. The Haskell program compiled with -O2 and -threaded

Read more

ThreadScope 0.2.2 binary for Mac OS X 10.8.2

Posted on January 24, 2013

This is an application bundle which should hopefully Just Work without any additional libraries (in particular, you should not have to install any GTK libraries).

threadscope-0.2.2.dmg

Communication Patterns in Cloud Haskell (Part 4)

Posted on October 15, 2012

K-Means

In Part 3 of this series we showed how to write a simple distributed implementation of Map-Reduce using Cloud Haskell. In this final part of the series we will explain the K-Means algorithm and show how it can be implemented in terms of Map-Reduce.

K-Means is an algorithm to partition a set of points into n clusters. The algorithm iterates the following two steps for a fixed number of times (or until convergence):

Read more at well-typed.com

Communication Patterns in Cloud Haskell (Part 3)

Posted on October 12, 2012

Map-Reduce

In Part 1 and Part 2 of this series we described a number of ways in which we might compute the number of prime factors of a set of numbers in a distributed manner. Abstractly, these examples can be described as

a problem (computing the number of prime factors of 100 natural numbers) is split into subproblems (factorizing each number), those subproblems are solved by slave nodes, and the partial results are then combined (summing the number of factors)

Read more at well-typed.com

Communication Patterns in Cloud Haskell (Part 2)

Posted on October 8, 2012

Performance

In Part 1 of this series we introduced a number of basic Cloud Haskell communication patterns. All these patterns will work fine for modest numbers of work items; in this blog post we’re going to look at the performance and memory use for much larger numbers of work items, so we can see how each pattern scales up. We will see that scalability improves as we move from master-slave to work-pushing to work-stealing, and that many further improvements can still be made. More importantly, we hope to help you to predict runtime behaviour from your code, so that you can write more scalable and high-performance code.

Read more at well-typed.com

Communication Patterns in Cloud Haskell (Part 1)

Posted on October 5, 2012

Master-Slave, Work-Stealing and Work-Pushing

In this series (2, 3, 4) of blog posts we will describe a number of basic communication patterns in Cloud Haskell. We don’t assume much familiarity with Cloud Haskell, but it will probably be useful to be familiar with the basics; Towards Haskell in the Cloud (Epstein, Black and Peyton Jones, Haskell Symposium 2011) is a good starting point. We will start simple but we will finish with some more advanced techniques.

Read more at well-typed.com

Memory Layout for Multiple and Virtual Inheritance

Posted on January 1, 2006

In this article we explain the object layout implemented by gcc for multiple and virtual inheritance. Although in an ideal world C++ programmers should not need to know these details of the compiler internals, unfortunately the way multiple (and especially virtual) inheritance is implemented has various non-obvious consequences for writing C++ code (in particular, for downcasting pointers, using pointers to pointers, and the invocation order of constructors for virtual bases). If you understand how multiple inheritance is implemented, you will be able anticipate these consequences and deal with them in your code. Also, it is useful to understand the cost of using virtual inheritance if you care about efficiency. Finally, it is interesting :-)

Read more at phpcompiler.org

DLX/MIPS tutorial

Posted on May 19, 2005

This is a tutorial about the MIPS microprocessor, that I wrote for Prof Jeremy Jones at Trinity College, Dublin. The original tutorial can be found on his website, and if you are running Windows I recommend you go there instead because all the examples are in fact interactive. Unfortunately this does not work on Linux or OSX, however, so I’ve mirrored the tutorial here and created YouTube videos for all the examples.

Read more