You may have noticed that Play2 provides an intriguing feature called
Iteratee (and its counterparts Enumerator and Enumeratee).
The main aim of this article is (to try) to make the
Iteratee concept understandable for most of us with reasonably simple arguments and without functional/math theory.
This article is not meant to explain everything about
Iteratee, Enumerator and Enumeratee, but just the ideas behind them.
I’ll try to write another article to show practical samples of coding with Iteratee/Enumerator/Enumeratee.
In the Play2 documentation, Iteratees are presented as a great tool to handle data streams reactively in a non-blocking, generic and composable way for modern web programming in distributed environments.
Sounds great, doesn’t it?
But what is an Iteratee exactly?
What is the difference between the Iteratee and the classic Iterator you certainly know?
Why use it? In which cases?
A bit obscure and complex, isn’t it?
If you are lazy and want to know just a few things
- Iteratee is an abstraction of iteration over chunks of data in a non-blocking and asynchronous way
- Iteratee is able to consume chunks of data of a given type from a producer, called Enumerator, generating data chunks of the same type
- Iteratee can compute a progressive result from data chunks over the steps of iteration (an incremented total, for example)
- Iteratee is a thread-safe and immutable object that can be re-used over several `Enumerators`
1st advice: DO NOT SEARCH ABOUT ITERATEES ON GOOGLE
When you search on Google for Iteratee, you find very obscure explanations based on a purely functional approach or even on mathematical theories. Even the Play Framework documentation explains Iteratee with a fairly low-level approach which might be hard for beginners…
As a beginner in Play2, it might seem a bit tough to handle the Iteratee concept when it is presented as a really abstract way of manipulating data chunks.
It might seem so complicated that you will avoid it and won’t use it.
It would be a shame because Iteratees are so powerful and provide a really interesting and new way to manipulate your data flows in a web app.
So, let’s try to explain things in a simple way. I don’t pretend to be a theoretical expert on those functional concepts, and I may even say wrong things, but I want to write an article that reflects what Iteratee means for me. Hope this could be useful to somebody…
This article uses Scala for code samples but the code should be understandable by anyone with a few notions of coding, and I promise not to use any weird operator (except in the last paragraph) ><> ><> ><> ><>
The code samples are based on the upcoming Play2.1 master code, which greatly simplifies and rationalizes the Iteratee code. So don’t be surprised if the API sometimes doesn’t look like Play2.0.x.
Before diving into the deep Iteratee sea, I want to clarify what I call iteration and to try to go progressively from the concept of Iterator to the concept of Iteratee.
You may know the Iterator concept you can find in Java. An Iterator lets you run over a collection of elements and do something at each step of iteration. Let’s begin with a very simple iteration, written the classic Java way, that sums all the integers of a list:
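Here is a minimal sketch of that classic loop, transliterated to Scala so that all the samples stay in one language. The object name `IteratorSum` and the list values are only illustrative assumptions.

```scala
object IteratorSum {
  // Illustrative data, not taken from any real application
  val numbers: List[Int] = List(1, 234, 455, 987)

  def sum(elements: List[Int]): Int = {
    var total = 0                 // the context, mutated at each step
    val it = elements.iterator    // get an iterator from the collection
    while (it.hasNext) {          // the state: are there more elements?
      total += it.next()          // the action: add the element to the total
    }
    total
  }
}
```

Calling `IteratorSum.sum(IteratorSum.numbers)` walks the list once and returns the sum of its elements.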
Without any surprise, iterating over a collection means:
- Get an iterator from the collection,
- Get an element from the iterator (if there are any),
- Do something: here add the element value to the total,
- If there are other elements, go to the next element,
- Do it again,
- Etc… until there are no more elements to consume in the iterator
So an iteration involves:
- A state of iteration (is the iteration finished? This is naturally linked to whether there are more elements in the iterator)
- A context updated from one step to the next (the total)
- An action updating the context
In Scala, it’s a bit better because you don’t have to use the iterator explicitly.
Here we introduce the List.foreach function, which accepts an anonymous function (Int => Unit) as a parameter and iterates over the list: for each element in the list, it calls the function, which can update the context (here the total).
The anonymous function contains the action executed at each loop while iterating over the collection.
The anonymous function could be stored in a variable so that it can be reused in different places.
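A minimal sketch of that idea, with the action stored in a value and reused. The names `ForeachSum`, `addToTotal` and `sumWithForeach` are illustrative assumptions.

```scala
object ForeachSum {
  var total = 0                                       // shared mutable context
  val addToTotal: Int => Unit = elt => total += elt   // side-effecting action

  def sumWithForeach(elements: List[Int]): Int = {
    total = 0                      // has to be reset before every run
    elements.foreach(addToTotal)   // foreach calls the action on each element
    total
  }
}
```

The function is reusable, but only because `total` is reset by hand each time it is used.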
You might then say to me: “This is ugly design: your function has side-effects and uses a mutable variable, which is not nice at all, and you even have to reset total to 0 before the second call!”
That’s completely true:
- Side-effecting functions are quite dangerous because they change the state of something that is external to the function. This state is not exclusive to the function and can be changed by other entities, potentially in other threads. Functions with side-effects are not recommended for clean and robust designs, and functional languages such as Scala tend to reduce side-effecting functions to the strict minimum (IO operations for example).
- Mutable variables are also risky because if your code runs over several threads and 2 threads try to change the value of the variable, who wins? In this case, you need synchronization, which means blocking threads while writing the variable, which means breaking one of the reasons for Play2’s existence (non-blocking web apps)…
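Here is a sketch of the same sum written with immutable data and recursion only. `step` is the name used in the discussion; `RecursiveSum` and the parameter names are illustrative assumptions.

```scala
import scala.annotation.tailrec

object RecursiveSum {
  def sum(elements: List[Int]): Int = {
    @tailrec
    def step(remaining: List[Int], total: Int): Int = remaining match {
      case Nil          => total                     // empty list: return the current total
      case elt :: Nil   => total + elt               // one element left: return total + elt
      case head :: tail => step(tail, total + head)  // more elements: continue with the tail
    }
    step(elements, 0)
  }
}
```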
A bit more code, isn’t it?
But please notice at least:
- The step function is the action executed at each step of iteration, but it does something more than before: step also manages the state of the iteration. It executes as follows:
  - If the list is empty, return the current total
  - If the list has 1 element, return total + elt
  - If the list has more than 1 element, call step with the tail elements and the new total total + head
So at each step of iteration, depending on the result of previous iteration,
step can choose between 2 states:
- Continue iteration because it has more elements
- Stop iteration because it reached end of list or no element at all
Notice also that:
- step is a tail-recursive function (it doesn’t unfold the full call stack at the end of recursion and returns immediately), which prevents stack overflows and makes it behave almost like the previous loop-based code
- step transmits the remaining elements of the list and the new total to the next step
- step returns the total without any side-effects at all
So, yes, this code consumes a bit more memory because it re-copies some parts of the list at each step (only the references to the elements), but it has no side-effects and uses only immutable data structures. This makes it very robust and distributable without any problem.
Notice you can write the code in a much shorter way using the wonderful functions provided by Scala collections:
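For instance, a short sketch using `foldLeft` from the standard collections (the object name `FoldSum` is an illustrative assumption):

```scala
object FoldSum {
  // foldLeft threads the total through the list, like step does by hand
  def sum(elements: List[Int]): Int =
    elements.foldLeft(0) { (total, elt) => total + elt }
}
```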
In this article, I consider iteration based on immutable structures propagated over steps. From this point of view, iterating involves:
- receiving information from the previous step: context & state
- getting current/remaining element(s)
- computing a new state & context from remaining elements
- propagating the new state & context to next step
Now that we are clear about iteration, let’s go back to our Iteratee!
Imagine you want to generalize the previous iteration mechanism.
Yes I know, with Scala collection APIs, you can do many things :)
Imagine you want to compose a first iteration with another one.
Imagine you want to apply this iteration on something else than a collection:
- a stream of data produced progressively by a file, a network connection, a database connection,
- a data flow generated by an algorithm,
- a data flow from an asynchronous data producer such as a scheduler or an actor.
Iteratees are exactly meant for this…
Just to tease, here is how you would write the previous sum iteration with an Iteratee:
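What follows is a hedged sketch rather than the original sample. It assumes the Play iteratee library (`play.api.libs.iteratee`) is on the classpath and that results come back as a Scala `Future`, which plays the role this article calls a Promise. The execution context import is only needed on versions where the API asks for one. The object name `IterateeSum` is illustrative.

```scala
import play.api.libs.iteratee.{Enumerator, Iteratee}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object IterateeSum {
  // The producer of Int chunks
  val enumerator: Enumerator[Int] = Enumerator(1, 234, 455, 987)

  // The consumer: folds every chunk into a running total
  val sumIteratee: Iteratee[Int, Int] =
    Iteratee.fold[Int, Int](0) { (total, elt) => total + elt }

  // Running the iteratee over the enumerator yields the total asynchronously
  def total: Future[Int] = enumerator.run(sumIteratee)
}
```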
Ok, it looks like the previous code and doesn’t seem to do much more…
That’s not wrong, but trust me, it can do much more.
At least it doesn’t seem so complicated, does it?
But, as you can see, Iteratee is used with Enumerator, and both concepts are tightly related.
Now let’s dive into those concepts step by step.
Enumerator is a more generic concept than collections or arrays
Till now, we have used collections in our iterations. But as explained before, we could iterate over something more generic, simply being able to produce simple chunks of data available immediately or asynchronously in the future.
Enumerator is designed for this purpose.
A few examples of simple Enumerators:
Enumerator is a PRODUCER of statically typed chunks of data.
Enumerator[E] produces chunks of data of type E, and a chunk can be of the 3 following kinds:
- Input[E] is a chunk of data of type E: for example, Input[Pizza] is a chunk of Pizza.
- Input.Empty means the enumerator is empty: for example, an Enumerator streaming an empty file.
- Input.EOF means the enumerator has reached its end: for example, an Enumerator streaming a file and reaching the end of the file.
You can draw a parallel between the kinds of chunks and the states presented above (has more/no/no more elements).
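To make the three kinds concrete, here is a simplified stand-in written with plain Scala. It is a model for illustration, not Play’s own `Input` type, and the names `InputModel`, `El` and `describe` are assumptions.

```scala
object InputModel {
  sealed trait Input[+E]
  final case class El[E](e: E) extends Input[E]  // a chunk carrying one piece of data
  case object Empty extends Input[Nothing]       // nothing to consume right now
  case object EOF extends Input[Nothing]         // the producer has reached its end

  // A consumer decides what to do by looking at the kind of chunk
  def describe[E](in: Input[E]): String = in match {
    case El(e) => s"chunk: $e"
    case Empty => "empty"
    case EOF   => "end of stream"
  }
}
```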
An Enumerator[E] works with Input[E], so you can put an Input[E] in it.
Enumerator is a non-blocking producer
The idea behind Play2 is, as you may know, to be fully non-blocking and asynchronous. Thus,
Iteratee reflects this philosophy.
Enumerator produces chunks in a completely asynchronous and non-blocking way. This means the concept of Enumerator is not by default related to an active process or a background task generating chunks of data.
Remember the code snippet above with dateGenerator, which reflects exactly the asynchronous and non-blocking nature of Enumerator.
What’s a Promise?
It would require a whole article but let’s say the name corresponds exactly to what it does.
Promise[String] means: “It will provide a String in the future (or an error)”, that’s all. Meanwhile, it doesn’t block the current thread and just releases it.
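As a rough illustration, here is a sketch using `scala.concurrent.Future` from the standard library, which plays the same role as the Promise described here. The names `PromiseSketch`, `greeting`, `shout` and `demo` are assumptions.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object PromiseSketch {
  // Starts the computation on another thread and returns immediately
  def greeting: Future[String] = Future { "hello" }

  // Transforming the future value does not block either
  def shout: Future[String] = greeting.map(_.toUpperCase)

  // Blocking on the result is only acceptable in a demo or a test
  def demo(): String = Await.result(shout, 2.seconds)
}
```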
Enumerator requires a consumer to produce
Due to its non-blocking nature, if nobody consumes those chunks, the
Enumerator doesn’t block anything and doesn’t consume any hidden runtime resources.
Enumerator MAY produce chunks of data only if there is someone to consume them.
You have deduced it yourself: the Iteratee is a generic “stuff” that can iterate over an Enumerator.
Let’s be pompous for one sentence:
Iteratee is the generic translation of the concept of iteration in pure functional programming.
Whereas an Iterator is built from the collection over which it will iterate, an Iteratee is a generic entity that waits for an Enumerator to be iterated over.
Do you see the difference between Iterator and Iteratee? No? Not a problem… Just remember that:
- An Iteratee is a generic entity that can iterate over the chunks of data produced by an Enumerator (or something else)
- An Iteratee is created independently of the Enumerator over which it will iterate, and the Enumerator is provided to it
- An Iteratee is immutable, stateless and fully reusable for different enumerators
That’s why we say that an Iteratee is applied on an Enumerator or run over an Enumerator.
Do you remember the example above computing the total of all elements of an Enumerator[Int]?
The same code shows that an Iteratee can be created once and reused several times on different Enumerators.
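A hedged sketch of that reuse, under the same assumptions as before: the Play iteratee library is on the classpath and results are Scala Futures. The names `ReusedIteratee`, `enumerator1` and `enumerator2` are illustrative.

```scala
import play.api.libs.iteratee.{Enumerator, Iteratee}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object ReusedIteratee {
  // Created once, independently of any enumerator
  val sumIteratee: Iteratee[Int, Int] =
    Iteratee.fold[Int, Int](0) { (total, elt) => total + elt }

  val enumerator1: Enumerator[Int] = Enumerator(1, 234, 455, 987)
  val enumerator2: Enumerator[Int] = Enumerator(345, 5, 666)

  // The same immutable iteratee is run over two different producers
  def total1: Future[Int] = enumerator1.run(sumIteratee)
  def total2: Future[Int] = enumerator2.run(sumIteratee)
}
```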
Enumerator.apply and Enumerator.run are slightly different functions and we will explain that later.
Iteratee is an active consumer of chunks of data
By default, the
Iteratee awaits a first chunk of data and immediately after, it launches the iteration mechanism.
Iteratee goes on consuming data until it considers it has finished its computation.
Once initiated, the
Iteratee is fully responsible for the full iteration process and decides when it stops.
As explained above, the Enumerator is a producer of chunks of data and it expects a consumer to consume those chunks of data.
To be consumed/iterated, the Enumerator has to be injected/plugged into an Iteratee, or more precisely the first chunk of data has to be injected/pushed into the Iteratee.
The Iteratee is dependent on the production speed of the Enumerator: if it’s slow, the Iteratee is also slow.
Notice that the Iteratee/Enumerator relation can be seen as a form of inversion of control and dependency injection.
Iteratee is a ”1-chunk-loop” function
Iteratee consumes chunks one by one until it considers it has ended iteration.
Actually, the real scope of an
Iteratee is limited to the treatment of one chunk.
That’s why it can be defined as a function being able to consume one chunk of data.
Iteratee accepts statically typed chunks and computes a statically typed result
Whereas an Iterator iterates over chunks of data coming from the collection that created it, an Iteratee is a bit more ambitious: it can compute something while it consumes chunks of data.
That’s why Iteratee has two type parameters, as in Iteratee[E, A]: E is the type of the chunks it consumes and A is the type of the result it computes.
Let’s go back to our first sample: compute the total of all integers produced by an Enumerator[Int].
Notice the usage of run: the result is not the total itself but a Promise[Int] of the total, because we are in an asynchronous world.
To retrieve the real total, you could use the Scala concurrent blocking Await._ functions. But this is NOT good because it’s a blocking API. As Play2 is fully async/non-blocking, the best practice is to propagate the promise using its map/flatMap functions.
But a result is not mandatory. For example, an Iteratee can just println all the chunks it consumes.
The result is not necessarily a primitive type either: it can just be the concatenation of all chunks into a List, for example.
Iteratee can propagate the immutable context & state over iterations
To be able to compute final total, the
Iteratee needs to propagate the partial totals along iteration steps.
This means the
Iteratee is able to receive a context (the previous total, for example) from the previous step, then compute the new context with the current chunk of data
(new total = previous total + current element), and can finally propagate this context to the next step (if there needs to be a next step).
Iteratee is simply a state machine
Ok this is cool, but how does the Iteratee know it has to stop iterating?
What happens if there is an error or EOF, or if it has reached the end of the Enumerator?
Therefore, in addition to the context, the Iteratee should also receive the previous state, decide what to do and potentially compute the new state to be sent to the next step.
Now, remember the classic iteration states described above. For Iteratee, there are almost the same 2 possible states of iteration:
- Cont: the iteration can continue with the next chunk and potentially compute a new context
- Done: it signals it has reached the end of its process and can return the resulting context value
and a 3rd one which seems quite logical:
- Error: it signals there was an error during the current step and stops iterating
From this point of view, we can consider that the Iteratee is just a state machine in charge of looping over state Cont until it detects conditions to switch to the terminal states Done or Error.
Done/Error/Cont are also Iteratees
Iteratee is defined as a 1-chunk-loop function and its main purpose is to change from one state to another one.
Let’s consider that those states are also Iteratees.
We have 3 ”State” Iteratees:
Done[E, A](a: A, remaining: Input[E])
- a: A is the context received from the previous step
- remaining: Input[E] represents the next chunk
Error[E](msg: String, input: Input[E])
Very simple to understand also: an error message and the input on which it failed.
Cont[E, A](k: Input[E] => Iteratee[E, A])
This is the most complicated State as it’s built from a function taking an Input[E] and returning another Iteratee[E, A].
Without going too deep into the theory, you can easily understand that Input[E] => Iteratee[E, A] is simply a good way to consume one input and return a new state/iteratee,
which can consume another input and return another state/iteratee, etc… until reaching state Done or Error.
This construction keeps feeding the iteration mechanism (in a typical functional way).
Ok, lots of information, isn’t it?
You certainly wonder why I explain all of that.
This is just because if you understand that, you will understand how to create a custom Iteratee.
Let’s write an Iteratee computing the total of the 2 first elements in an Enumerator[Int] to show an example.
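To keep the sample runnable without any library, here is a sketch built on a simplified stand-in for the three states. It is a plain Scala model, not Play’s own classes: `Step` plays the role of Iteratee, `feed` plays the role of the Enumerator, and all the names are assumptions.

```scala
import scala.annotation.tailrec

object IterateeModel {
  sealed trait Input[+E]
  final case class El[E](e: E) extends Input[E]
  case object Empty extends Input[Nothing]
  case object EOF extends Input[Nothing]

  // The three states of the machine (the model's Error carries an extra type parameter)
  sealed trait Step[E, A]
  final case class Done[E, A](a: A, remaining: Input[E]) extends Step[E, A]
  final case class Cont[E, A](k: Input[E] => Step[E, A]) extends Step[E, A]
  final case class Error[E, A](msg: String, input: Input[E]) extends Step[E, A]

  // Sums the first 2 elements it receives, then stops
  def total2Chunks: Step[Int, Int] = {
    def step(count: Int, total: Int)(in: Input[Int]): Step[Int, Int] = in match {
      case EOF   => Done[Int, Int](total, EOF)                       // producer ended early
      case Empty => Cont[Int, Int](next => step(count, total)(next)) // nothing to add, keep waiting
      case El(e) =>
        if (count + 1 < 2) Cont[Int, Int](next => step(count + 1, total + e)(next))
        else Done[Int, Int](total + e, Empty)                        // second element consumed
    }
    Cont[Int, Int](in => step(0, 0)(in))
  }

  // Tiny driver: pushes list elements one by one while the state is Cont
  @tailrec
  def feed[E, A](elements: List[E], state: Step[E, A]): Step[E, A] =
    (elements, state) match {
      case (head :: tail, Cont(k)) => feed(tail, k(El(head)))
      case (Nil, Cont(k))          => k(EOF)
      case (_, finished)           => finished
    }
}
```

Feeding `List(1, 234, 455, 987)` into `total2Chunks` stops after the second element, and the remaining elements are never consumed.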
With this example, you can understand that writing an Iteratee is not much different from choosing what to do at each step depending on the type of chunk you received and returning the new state/Iteratee.
Enumerator is just a helper to deal with Iteratee
As you could see, in the Iteratee API, there is no mention anywhere of Enumerator.
This is just because Enumerator is just a helper to interact with Iteratee: it can plug itself into an Iteratee and inject the first chunk of data into it.
But you don’t need Enumerator to use Iteratee, even if this is really easier and well integrated everywhere in Play2.
Let’s go back to the difference between Enumerator.apply and Enumerator.run mentioned earlier.
Have a look at the signatures of the main APIs in Enumerator.
apply returns last Iteratee/State
The apply function injects the Enumerator into the Iteratee, which consumes the chunks, does its job and returns a Promise of Iteratee.
From the previous explanation, you may deduce by yourself that the returned Iteratee might simply be the last state after it has finished consuming the chunks it required from the Enumerator.
run returns a Promise[Result]
run has 3 steps:
- Call the previous apply function
- Inject Input.EOF into the Iteratee to be sure it has ended
- Get the last context from the Iteratee as a promise.
Here is an example:
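As a hedged sketch of the difference (not the original sample), assuming the Play iteratee library is on the classpath and that results are Scala Futures; `ApplyVersusRun` and its member names are illustrative:

```scala
import play.api.libs.iteratee.{Enumerator, Iteratee}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object ApplyVersusRun {
  val sumIteratee: Iteratee[Int, Int] =
    Iteratee.fold[Int, Int](0) { (total, elt) => total + elt }
  val enumerator: Enumerator[Int] = Enumerator(1, 234, 455, 987)

  // apply: feeds the chunks and hands back the iteratee in its last state
  def lastState: Future[Iteratee[Int, Int]] = enumerator(sumIteratee)

  // The last state still has to be run to extract its result
  def totalViaApply: Future[Int] = lastState.flatMap(state => state.run)

  // run: applies the iteratee and extracts the result in one call
  def totalViaRun: Future[Int] = enumerator.run(sumIteratee)
}
```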
When you need the result of the Iteratee, you shall use run.
When you need to apply an Iteratee over an Enumerator without retrieving the result, you shall use apply.
Iteratee is a Promise[Iteratee] (IMPORTANT TO KNOW)
One more thing to know about an Iteratee is that Iteratee is a Promise[Iteratee] by definition.
This means that you can build your code around Iteratee in a very lazy way: with Iteratee, you can switch to Promise and back as you want.
And now you come across this…
What is that new Enumeratee stuff?
2nd advice : DON’T PANIC NOW…
The Enumeratee concept is really simple to understand
Enumeratee is just a pipe adapter between Enumerator and Iteratee
Imagine you have an Enumerator[Int] and an Iteratee that consumes String chunks.
You can transform an Int into a String, can’t you?
So you should be able to transform the chunks of Int into chunks of String and then inject them into the Iteratee.
Enumeratee is there to save you.
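Here is a hedged sketch of such a pipe, assuming the Play iteratee library is on the classpath and that results are Scala Futures. The names `EnumerateeSketch`, `toStr` and `collect` are illustrative.

```scala
import play.api.libs.iteratee.{Enumeratee, Enumerator, Iteratee}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object EnumerateeSketch {
  val enumerator: Enumerator[Int] = Enumerator(123, 345, 456)

  // The adapter: turns each Int chunk into a String chunk
  val toStr: Enumeratee[Int, String] = Enumeratee.map[Int] { i => i.toString }

  // A consumer of String chunks that gathers them into a List
  val collect: Iteratee[String, List[String]] = Iteratee.getChunks[String]

  // Pipe the enumerator through the adapter, then run the iteratee over it
  def strings: Future[List[String]] = enumerator.through(toStr).run(collect)
}
```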
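What happened there?
You just piped an Enumerator[Int] through an Enumeratee[Int, String] into an Iteratee consuming String chunks.
In 2 steps: first the Enumeratee turns the Enumerator[Int] into an Enumerator[String], then the Iteratee is run over it.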
So, you may understand that Enumeratee is a very useful tool to convert your custom Enumerator so that it can be used with the generic Iteratees provided by the Play2 API.
You’ll see that this is certainly the tool you will use the most while coding with Enumerator and Iteratee.
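Enumeratee can be applied to an Enumerator
This is a very useful feature of Enumeratee.
You can transform an Enumerator[From] into an Enumerator[To] with an Enumeratee[From, To].
The signature of Enumeratee is quite explicit: Enumeratee[From, To].
So you can pipe an Enumerator[From] through it and get an Enumerator[To].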
Enumeratee can transform an Iteratee
This is a slightly stranger feature, because you can transform an Iteratee[To, A] into an Iteratee[From, A] with an Enumeratee[From, To].
Enumeratee can be composed with an Enumeratee
Yes, this is the final very useful feature of Enumeratee.
So once again, it is very easy to see that you can create your generic Enumeratees and then compose them into the custom Enumeratee you need for your custom Enumerator or Iteratee.
Now I hope you have a bit more information and are not lost anymore.
The next step is to use Iteratee, Enumerator and Enumeratee all together.
I’ll write other articles presenting more specific and practical ideas and concepts and samples…
There are a lot of interesting features that are worth precise explanations.
Understanding clearly what’s an
Iteratee is important because it helps writing new
Iteratees but you can also stay superficial and use the many helpers provided by Play2 Iteratee API.
Ok, the documentation is not yet as complete as it should be, but we are working on this!!!
Anyway, why should I use Iteratee/Enumerator/Enumeratee?
I want to tell you that Iteratee/Enumerator/Enumeratee is not just a fun toy for people fond of functional constructions.
They are useful in many domains, and once you understand how they work, I can promise you that you will begin to use them more and more.
Modern web applications are not only dynamically generated pages anymore. Now you manipulate flows of data coming from different sources, in different formats, with different availability timing. You may have to serve huge amounts of data to huge numbers of clients and to work in distributed environments.
Iteratees are made for those cases because they are safe, immutable and very good at dealing with data flows in realtime.
Let’s say the buzzword you see more and more: “Realtime WebApp”, and Iteratee is associated with that ;)
Note on weird operators
You will certainly see lots of those operators in code based on Iteratees, such as |>>> and the famous fish operator ><>. Don’t focus on those operators right now: they are just aliases of real explicit words such as compose. I’ll try to write an article about those operators to demystify them. With practice, some people will find the code with operators clearer and more compact, while others will prefer words.