Summary of the Haskell Conduit package
As part of my PhD I'm developing a distributed graph database in Haskell. Conduit is used heavily for IO (network and file) and wherever else it's appropriate. So here's a quick summary of what it is and its main concepts and primitives/operations. This is largely a re-hash of Michael Snoyman's post, but only the most important details are included. I'll do more posts later with more concrete code examples; this one is mostly about the concepts.
What is it?
Conduit is a library for streaming data. It uses a pipeline-like mechanism to allow the production, transformation and consumption of streams in constant memory. There are modules for file IO, network IO and parsing structured data with attoparsec.
3 Main components
Source: produces a stream of data values and sends them downstream.
Sink: consumes a stream of data values from upstream and produces a return value.
Conduit: consumes a stream from upstream and produces a new stream (possibly of a different type), which it sends downstream. In other words, a conduit is a way of modifying a stream on demand, before it is consumed.
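To make the three roles concrete, here is a minimal sketch with one of each. It assumes the classic Source/Conduit/Sink type synonyms from Data.Conduit (newer conduit releases still export them but mark them deprecated) and the helpers in Data.Conduit.List. The names numbers, doubled, collect and pipelineResult are made up for illustration. The last definition wires the pieces together with the operators described in the sections that follow.

```haskell
import Data.Conduit (Conduit, Sink, Source, ($$), ($=))
import qualified Data.Conduit.List as CL
import Data.Functor.Identity (runIdentity)

-- Source: produces the numbers 1..5 and sends them downstream.
numbers :: Monad m => Source m Int
numbers = CL.sourceList [1 .. 5]

-- Conduit: consumes values from upstream and sends each one on, doubled.
doubled :: Monad m => Conduit Int m Int
doubled = CL.map (* 2)

-- Sink: consumes every value from upstream and returns them as a list.
collect :: Monad m => Sink a m [a]
collect = CL.consume

-- Source -> Conduit -> Sink, run purely in the Identity monad.
pipelineResult :: [Int]
pipelineResult = runIdentity ((numbers $= doubled) $$ collect)
```

Because all three pieces are polymorphic in the monad, the same definitions would also run in IO without changes.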
1 Connect operator
$$ is the connect operator, which connects a Source to a Sink. It feeds the values from the Source into the Sink to produce a final result.
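As a rough illustration of the connect operator, the sketch below connects a list-backed source to a summing sink. It assumes the $$ operator exported by Data.Conduit (deprecated in newer conduit releases in favour of runConduit and .|) and uses the Identity monad so the result stays pure. The name total is only for this example.

```haskell
import Data.Conduit (($$))
import qualified Data.Conduit.List as CL
import Data.Functor.Identity (runIdentity)

-- Connect a Source of the numbers 1..10 to a Sink that sums them.
-- ($$) runs the pipeline and returns the Sink's result in the base monad,
-- which is Identity here, so the whole computation is pure.
total :: Int
total = runIdentity (CL.sourceList [1 .. 10] $$ CL.fold (+) 0)
```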
3 Fusion operators
Fusion is the combination of two components to form a new one.
=$ takes two components and generates a new one. For example, it can fuse a Conduit and a Sink into a new Sink, which consumes the same values as the original Conduit and produces the same result as the original Sink.
$= combines a Source and a Conduit into a new Source.
=$= combines two Conduits into a new Conduit.
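The three fusion operators can be sketched side by side as follows. This assumes the classic operators and type synonyms exported by Data.Conduit (newer releases deprecate them in favour of the single .| operator). All the names below (source, evens, squares, sumSink and the three results) are hypothetical and exist only for the example.

```haskell
import Data.Conduit (Conduit, Sink, Source, ($$), ($=), (=$), (=$=))
import qualified Data.Conduit.List as CL
import Data.Functor.Identity (runIdentity)

source :: Monad m => Source m Int
source = CL.sourceList [1 .. 10]

evens :: Monad m => Conduit Int m Int
evens = CL.filter even

squares :: Monad m => Conduit Int m Int
squares = CL.map (\x -> x * x)

sumSink :: Monad m => Sink Int m Int
sumSink = CL.fold (+) 0

-- (=$): fuse a Conduit and a Sink into a new Sink.
viaSink :: Int
viaSink = runIdentity (source $$ (evens =$ sumSink))

-- ($=): fuse a Source and a Conduit into a new Source.
viaSource :: Int
viaSource = runIdentity ((source $= evens) $$ sumSink)

-- (=$=): fuse two Conduits into a new Conduit, then fuse that into the Sink.
viaConduit :: Int
viaConduit = runIdentity (source $$ ((evens =$= squares) =$ sumSink))
```

viaSink and viaSource describe the same pipeline (the sum of the even numbers), only grouped differently. viaConduit additionally squares each even number before summing.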
await takes a single value from upstream, if available.
yield sends a single value downstream.
leftover puts a single value back in the upstream queue, to be read by the next call to await.
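Here is a small sketch of the three primitives in use, assuming await, yield and leftover as exported by Data.Conduit. The conduit doubler and the sink peekThenConsume are invented for this example: the first loops over await and yield, the second awaits one value, puts it back with leftover, and then lets CL.consume read everything, including the value that was put back.

```haskell
import Data.Conduit (Conduit, Sink, await, leftover, yield, ($$), (=$))
import qualified Data.Conduit.List as CL
import Data.Functor.Identity (runIdentity)

-- A Conduit written with the primitives: await a value, yield it doubled,
-- and loop until upstream has nothing left (await returns Nothing).
doubler :: Monad m => Conduit Int m Int
doubler = do
  mx <- await
  case mx of
    Nothing -> return ()
    Just x -> yield (x * 2) >> doubler

-- A Sink that peeks at the first value: it awaits it, then uses leftover
-- to put it back so that the next consumer (CL.consume) still sees it.
peekThenConsume :: Monad m => Sink Int m (Maybe Int, [Int])
peekThenConsume = do
  mx <- await
  case mx of
    Nothing -> return (Nothing, [])
    Just x -> do
      leftover x
      rest <- CL.consume
      return (Just x, rest)

peeked :: (Maybe Int, [Int])
peeked = runIdentity (CL.sourceList [1, 2, 3] $$ (doubler =$ peekThenConsume))
```

The peeked value appears again at the head of the consumed list, which is the point of leftover.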
Picture a conduit pipeline as data starting at a Source, possibly passing through one or more Conduits, and finally being consumed by a Sink.
The pipeline is driven by the Sink. When the Sink requests data with await, it pauses until input is available from upstream. The Source is then woken up and asked to produce more output for downstream; after producing a value it effectively goes back to sleep until the Sink requests another one. When the Sink completes, the entire pipeline terminates, causing resources to be freed.
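One way to observe this demand-driven behaviour is to let an endless source count how many values it has actually produced. The sketch below is only an illustration under that assumption: counting and demandDriven are hypothetical names, and the IORef is just a counter bolted on for the experiment.

```haskell
import Control.Monad.IO.Class (liftIO)
import Data.Conduit (Source, yield, ($$))
import qualified Data.Conduit.List as CL
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- An endless Source that bumps a counter each time it produces a value.
counting :: IORef Int -> Source IO Int
counting ref = go 1
  where
    go n = do
      liftIO (modifyIORef' ref (+ 1))
      yield n
      go (n + 1)

-- Run the endless Source against a Sink that only wants three values.
-- Returns the values the Sink received and how many the Source produced.
demandDriven :: IO ([Int], Int)
demandDriven = do
  ref <- newIORef 0
  xs <- counting ref $$ CL.take 3
  produced <- readIORef ref
  return (xs, produced)
```

Running demandDriven should give ([1,2,3],3): although the source never ends on its own, it is only asked for three values, and the pipeline stops as soon as the sink is done.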
Misc / TODO
Conduit comes with a way to provide exception safety; this will be covered in another post. Similarly, conduit has the notion of a resumable source: a source that has been run partially but can be continued by reconnecting it to another sink.
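Until that post exists, here is a very small sketch of the resumable source idea, assuming the $$+ and $$+- operators from Data.Conduit ($$+ connects a source to a sink and hands back the rest of the source, $$+- connects that rest to a final sink). The name resumed is made up for the example.

```haskell
import Data.Conduit (($$+), ($$+-))
import qualified Data.Conduit.List as CL
import Data.Functor.Identity (runIdentity)

-- Run a source partially with ($$+), then reconnect what is left of it
-- to a second sink with ($$+-).
resumed :: ([Int], Int)
resumed = runIdentity $ do
  -- The first sink takes three values; ($$+) also returns the remainder
  -- of the source in resumable form.
  (rest, firstThree) <- CL.sourceList [1 .. 10] $$+ CL.take 3
  -- The second sink sums whatever the first one left behind (4..10).
  remainder <- rest $$+- CL.fold (+) 0
  return (firstThree, remainder)
```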