Summary of the Haskell Conduit package
Conduits
As part of my PhD I'm developing a distributed graph database in Haskell. Conduit is used heavily for IO (network and file) and wherever else it's appropriate. So here's a quick summary of what it is, along with its main concepts and primitives/operations. This is largely a rehash of Michael Snoyman's post, but only the most important details are included. I'll do more posts later with concrete code examples; this one is all about the concepts.
What is it?
Conduit is a library that provides a solution for streaming data. It uses a pipeline-like mechanism to allow the production, transformation and consumption of streams in constant memory. There are modules for file IO, network IO and parsing structured data with attoparsec.
3 Main components
Source
Produces a stream of data values and sends them downstream.
Sink
Consumes a stream of data values from upstream and produces a return value.
Conduit
Consumes a stream from upstream, produces a new stream (whose element type may differ from the input's) and sends it downstream. In other words, a conduit is a way of modifying a stream on demand, before it is consumed.
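As a rough illustration of the three component types, here is a minimal sketch. It assumes a version of the conduit package that still exports the Source, Conduit and Sink type synonyms and the classic operators (newer releases mark them as deprecated), plus the Data.Conduit.List helper module. The names numbers, doubled and collect are made up for this example, and the operators used in main are the ones described in the sections that follow.

```haskell
import Data.Conduit
import qualified Data.Conduit.List as CL

-- Source (hypothetical name): produces the numbers 1 to 5 and sends them downstream.
numbers :: Source IO Int
numbers = CL.sourceList [1 .. 5]

-- Conduit (hypothetical name): consumes Ints from upstream and sends each one,
-- doubled, downstream.
doubled :: Conduit Int IO Int
doubled = CL.map (* 2)

-- Sink (hypothetical name): consumes Ints from upstream and returns them as a list.
collect :: Sink Int IO [Int]
collect = CL.consume

main :: IO ()
main = do
  result <- numbers $= doubled $$ collect
  print result  -- [2,4,6,8,10]
```

Nothing is produced until main connects the pieces; each definition on its own is just a description of one stage of the pipeline.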
1 Connect operator
- $$ is the connect operator, which connects a Source to a Sink. It feeds the values from the Source into the Sink to produce a final result.
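A small sketch of the connect operator in use, assuming the Data.Conduit.List helpers sourceList and fold: a Source of the numbers 1 to 10 is connected to a Sink that sums them.

```haskell
import Data.Conduit
import qualified Data.Conduit.List as CL

main :: IO ()
main = do
  -- Source on the left, Sink on the right; the result of $$ is the
  -- Sink's return value, delivered in the underlying monad (IO here).
  total <- CL.sourceList [1 .. 10 :: Int] $$ CL.fold (+) 0
  print total  -- 55
```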
3 Fusion operators
Fusion is the combination of two components to form a new one.
- =$ takes two components and generates a new one. For example, it can fuse a Conduit and a Sink into a new Sink, which consumes the same values as the original Conduit and produces the same result as the original Sink.
- $= combines a Source and a Conduit into a new Source.
- =$= combines two Conduits into a new Conduit.
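To make the three fusion operators concrete, here is a hedged sketch that runs small pipelines fused in each of the three ways. The component names (source, double, increment, sink) are invented for the example, and it assumes a conduit version that still exports these operators and the Source, Conduit and Sink synonyms.

```haskell
import Data.Conduit
import qualified Data.Conduit.List as CL

source :: Source IO Int
source = CL.sourceList [1 .. 5]

double :: Conduit Int IO Int
double = CL.map (* 2)

increment :: Conduit Int IO Int
increment = CL.map (+ 1)

sink :: Sink Int IO [Int]
sink = CL.consume

main :: IO ()
main = do
  -- =$ fuses a Conduit and a Sink into a new Sink.
  a <- source $$ (double =$ sink)
  -- $= fuses a Source and a Conduit into a new Source.
  b <- (source $= double) $$ sink
  -- =$= fuses two Conduits into a new Conduit (double first, then increment).
  c <- source $$ (double =$= increment) =$ sink
  print a  -- [2,4,6,8,10]
  print b  -- [2,4,6,8,10]
  print c  -- [3,5,7,9,11]
```

The first two pipelines give the same result; which side the Conduit is fused onto only changes which reusable component you end up with (a new Sink or a new Source).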
3 Operations
- await takes a single value from upstream, if available.
- yield sends a single value downstream.
- leftover puts a single value back in the upstream queue, to be read by the next call to await.
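The three primitives can be sketched by writing each kind of component by hand. The names countdown, doubler, peek and peekThenCollect are hypothetical, chosen only for this illustration, and the code assumes the classic Source, Conduit and Sink synonyms are available.

```haskell
import Data.Conduit
import qualified Data.Conduit.List as CL

-- A Source built from yield: sends 3, 2, 1 downstream.
countdown :: Source IO Int
countdown = mapM_ yield [3, 2, 1]

-- A Conduit built from await and yield: doubles every value until
-- upstream has nothing more to give (await returns Nothing).
doubler :: Conduit Int IO Int
doubler = do
  mx <- await
  case mx of
    Nothing -> return ()
    Just x  -> yield (x * 2) >> doubler

-- A Sink that looks at the next value without consuming it:
-- await takes it, leftover puts it back for the next await.
peek :: Sink Int IO (Maybe Int)
peek = do
  mx <- await
  case mx of
    Nothing -> return Nothing
    Just x  -> leftover x >> return (Just x)

-- Peek at the first value, then collect the whole stream.
peekThenCollect :: Sink Int IO (Maybe Int, [Int])
peekThenCollect = do
  first <- peek
  rest  <- CL.consume
  return (first, rest)

main :: IO ()
main = do
  (first, rest) <- countdown $$ doubler =$ peekThenCollect
  print first  -- Just 6
  print rest   -- [6,4,2]
```

Because peek pushes the value back with leftover, the 6 shows up again as the first element collected by CL.consume.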
Life cycle
Imagine conduits as a pipeline in which data starts at the source, possibly passes through one or more conduits, and is consumed by a sink, as in Source => Conduit => Sink.
The pipeline is driven by the Sink. When the Sink requests data with await, it pauses until input is available from upstream. The Source is then woken up and asked to produce more output for downstream; after producing a value it effectively goes back to sleep until the Sink requests another one. When the Sink completes, the entire pipeline terminates, causing resources to be freed.
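One way to observe that the Sink drives the pipeline is a Source that announces every value it produces, connected to a Sink that only wants two values. This is a sketch under the same assumptions as before (classic conduit operators and Data.Conduit.List); noisySource is a made-up name.

```haskell
import Control.Monad.IO.Class (liftIO)
import Data.Conduit
import qualified Data.Conduit.List as CL

-- A Source (hypothetical name) that logs each value just before yielding it.
noisySource :: Source IO Int
noisySource = mapM_ produce [1 .. 5]
  where
    produce n = do
      liftIO (putStrLn ("source: producing " ++ show n))
      yield n

main :: IO ()
main = do
  -- The Sink asks for only two values, then completes.
  taken <- noisySource $$ CL.take 2
  print taken  -- [1,2]
```

Although the Source is able to produce five values, only the first two "producing" lines should be printed: once the Sink has what it asked for, the pipeline terminates and the Source is never woken up again.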
Misc / TODO
Conduit comes with a way to provide exception safety; this will be covered in another post. Similarly, conduit has the notion of a resumable source: a source that has been run partially but can be continued by reconnecting it to another sink.
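As a tentative sketch of the resumable source idea, the following assumes the $$+ and $$+- operators exported by Data.Conduit: $$+ connects a Source to a Sink and hands back the partially consumed source alongside the result, and $$+- connects that leftover source to a final Sink.

```haskell
import Data.Conduit
import qualified Data.Conduit.List as CL

main :: IO ()
main = do
  -- Run the source just far enough for the first sink to take three values.
  (resumable, firstThree) <- CL.sourceList [1 .. 10 :: Int] $$+ CL.take 3
  -- Reconnect the partially run source to a second sink.
  remaining <- resumable $$+- CL.consume
  print firstThree  -- [1,2,3]
  print remaining   -- [4,5,6,7,8,9,10]
```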