Thanks to @loic_d, we have a French translation of this article there
You may have noticed that Play2 provides an intriguing feature called Iteratee (and its counterparts Enumerator and Enumeratee).
The main aim of this article is (to try) to make the Iteratee concept understandable for most of us with reasonably simple arguments and without functional/math theory.
This article is not meant to explain everything about Iteratee/Enumerator/Enumeratee but just the ideas behind them.
I’ll try to write another article to show practical samples of coding with Iteratee/Enumerator/Enumeratee.
Introduction
In the Play2 doc, Iteratees are presented as a great tool to handle data streams reactively in a non-blocking, generic & composable way for modern web programming in distributed environments.
Sounds great, doesn’t it?
But what is an Iteratee exactly?
What is the difference between the Iteratee and the classic Iterator you certainly know?
Why use it? In which cases?
A bit obscure and complex, isn’t it?
If you are lazy and want to know just a few things
- Iteratee is an abstraction of iteration over chunks of data in a non-blocking and asynchronous way
- Iteratee is able to consume chunks of data of a given type from a producer, called Enumerator, generating data chunks of the same type
- Iteratee can compute a progressive result from data chunks over steps of iteration (an incremented total for ex)
- Iteratee is a thread-safe and immutable object that can be re-used over several `Enumerators`
1st advice: DO NOT SEARCH ABOUT ITERATEES ON GOOGLE
When you search on Google for Iteratee, you find very obscure explanations based on a pure functional approach or even mathematical theories. Even the documentation on Play Framework (there) explains Iteratee with a fairly low-level approach which might be hard for beginners…
As a beginner in Play2, it might seem a bit tough to handle the Iteratee concept when it is presented as a really abstract way of manipulating data chunks.
It might seem so complicated that you will set it aside and never use it.
It would be a shame because Iteratees are so powerful and provide a really interesting and new way to manipulate your data flows in a web app.
So, let’s try to explain things in a simple way. I don’t claim to be a theoretical expert on those functional concepts and I may even say wrong things, but I want to write an article that reflects what Iteratee means for me. I hope this will be useful to somebody…
This article uses Scala for code samples but the code should be understandable by anyone with a few notions of coding, and I promise not to use any weird operators (except in the last paragraph) ><> ><> ><> ><>
The code samples are based on incoming Play2.1 master code which greatly simplifies and rationalizes the code of Iteratee. So don’t be surprised if the API sometimes doesn’t look like Play2.0.x.
Reminders about iteration
Before diving into the deep Iteratee sea, I want to clarify what I call iteration and try to go progressively from the concept of Iterator to Iteratee.
You may know the Iterator concept you can find in Java. An Iterator lets you run over a collection of elements and do something at each step of iteration. Let’s begin with a very simple iteration, written the classic Java way, that sums all the integers of a List[Int].
The first very naive implementation with Java-like Iterator
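A minimal sketch of that naive version, written in Scala but in the Java style. The name sumWithIterator is mine, chosen for illustration.

```scala
// Sums a list the "Java way": explicit iterator plus a mutable total
def sumWithIterator(l: List[Int]): Int = {
  val it = l.iterator    // get an iterator from the collection
  var total = 0          // context updated at each step
  while (it.hasNext) {   // state: are there more elements?
    total += it.next()   // action: add the element value to the total
  }
  total
}

val naiveTotal = sumWithIterator(List(1, 234, 455, 987)) // 1677
```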
Unsurprisingly, iterating over a collection means:
- Get an iterator from the collection,
- Get an element from the iterator (if there are any),
- Do something: here add the element value to the total,
- If there are other elements, go to the next element,
- Do it again,
- Etc… until there are no more elements to consume in the iterator
An iteration therefore involves:
- A state of iteration (is the iteration finished? This is naturally linked to whether there are more elements in the iterator)
- A context updated from one step to the next (the total)
- An action updating the context
Rewrite that using Scala for-comprehension
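As a sketch, here is the same sum with a for loop over the list (sumWithFor is an illustrative name). The explicit iterator is gone but the mutable total remains.

```scala
def sumWithFor(l: List[Int]): Int = {
  var total = 0
  for (elt <- l) { total += elt } // the loop hides the iterator
  total
}
```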
It’s a bit better because you don’t have to use the iterator.
Rewrite it in a more functional way
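A possible version with List.foreach (sumWithForeach is an illustrative name): the action is now an anonymous function passed to the collection.

```scala
def sumWithForeach(l: List[Int]): Int = {
  var total = 0
  l.foreach { elt => total += elt } // anonymous function of type Int => Unit
  total
}
```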
Here we introduce the List.foreach function, which accepts an anonymous function (Int => Unit) as a parameter and iterates over the list: for each element in the list, it calls the function, which can update the context (here the total).
The anonymous function contains the action executed at each loop while iterating over the collection.
Rewrite in a more generic way
The anonymous function could be stored in a variable so that it can be reused in different places.
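A sketch of this "generic" variant, with illustrative names (addToTotal, sumAll). It deliberately keeps the flaws of the approach: a shared mutable variable and a reset before each run.

```scala
var total = 0

// the action is now a value that can be reused in different places
val addToTotal: Int => Unit = elt => total += elt

def sumAll(l: List[Int]): Int = {
  total = 0               // must be reset before each run: the ugly part
  l.foreach(addToTotal)
  total
}
```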
You might then say to me: “This is ugly design: your function has side-effects and uses a mutable variable, which is not nice design at all, and you even have to reset total to 0 before the second call!”
That’s completely true:
- Side-effect functions are quite dangerous because they change the state of something that is external to the function. This state is not exclusive to the function and can be changed by other entities, potentially in other threads. Functions with side-effects are not recommended if you want clean and robust designs, and functional languages such as Scala tend to restrict side-effecting functions to what is strictly necessary (IO operations for ex).
- Mutable variables are also risky because if your code runs over several threads and 2 threads try to change the value of the variable, who wins? In this case, you need synchronization, which means blocking threads while writing the variable, which means breaking one of the reasons Play2 exists (non-blocking web apps)…
Rewrite the code in an immutable way without side-effects
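One way to write it, as a sketch (sumImmutable is an illustrative name; step is the name used in the text): no var, no side-effect, and the total travels from one step to the next as a parameter.

```scala
import scala.annotation.tailrec

def sumImmutable(l: List[Int]): Int = {
  @tailrec
  def step(remaining: List[Int], total: Int): Int = remaining match {
    case Nil          => total                    // empty list: return current total
    case elt :: Nil   => total + elt              // 1 element: return total + elt
    case head :: tail => step(tail, total + head) // more: go on with tail and new total
  }
  step(l, 0)
}
```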
A bit more code, isn’t it?
But please notice at least:
- var total disappeared.
- the step function is the action executed at each step of iteration but it does something more than before: step also manages the state of the iteration. It executes as follows:
  - If the list is empty, return the current total
  - If the list has 1 element, return total + elt
  - If the list has more than 1 element, call step with the tail elements and the new total total + head
So at each step of iteration, depending on the result of the previous iteration, step can choose between 2 states:
- Continue iteration because it has more elements
- Stop iteration because it reached end of list or no element at all
Notice also that:
- step is a tail-recursive function (doesn’t unfold the full call stack at the end of recursion and returns immediately), preventing stack overflow and behaving almost like the previous code with Iterator
- step transmits the remaining elements of the list & the new total to the next step
- step returns the total without any side-effects at all
So, yes, this code consumes a bit more memory because it re-copies some parts of the list at each step (only the references to the elements) but it has no side-effect and uses only immutable data structures. This makes it very robust and distributable without any problem.
Notice you can write the code in a much shorter way using the wonderful functions provided by Scala collections:
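For instance, with foldLeft from the standard collections (sumWithFold is an illustrative name), the initial context and the action are all you have to provide:

```scala
def sumWithFold(l: List[Int]): Int =
  l.foldLeft(0) { (total, elt) => total + elt }
```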
Milestone
In this article, I consider iteration based on immutable structures propagated over steps. From this point of view, iterating involves:
- receiving information from the previous step: context & state
- getting current/remaining element(s)
- computing a new state & context from remaining elements
- propagating the new state & context to next step
Step by Step to Iterator & Iteratees
Now that we are clear about iteration, let’s go back to our Iteratee!!!
Imagine you want to generalize the previous iteration mechanism and be able to write something like:
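A hypothetical sketch of such a generalization using plain collections: iterate is a helper I made up for illustration, parameterized by the initial context and the action applied at each step.

```scala
// Hypothetical helper: one iteration mechanism, many computations
def iterate[E, A](elements: List[E], initial: A)(action: (A, E) => A): A =
  elements.foldLeft(initial)(action)

val numbers = List(1, 234, 455, 987)

val sum    = iterate(numbers, 0)(_ + _)                          // total of the elements
val count  = iterate(numbers, 0)((n, _) => n + 1)                // number of elements
val joined = iterate(numbers, "")((acc, elt) => acc + elt + " ") // concatenation
```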
Yes I know, with Scala collection APIs, you can do many things :)
Imagine you want to compose a first iteration with another one:
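One possible reading of such a composition, sketched with plain collections (my own example, with illustrative names): a first iteration groups the elements two by two, a second one sums each group.

```scala
val input = List(1, 234, 455, 987)

// first iteration: group the elements two by two
val groups: List[List[Int]] = input.grouped(2).toList

// second iteration, applied to the result of the first: sum each group
val groupSums: List[Int] = groups.map(_.sum)
```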
Imagine you want to apply this iteration on something else than a collection:
- a stream of data produced progressively by a file, a network connection, a database connection,
- a data flow generated by an algorithm,
- a data flow from an asynchronous data producer such as a scheduler or an actor.
Iteratees are exactly meant for this…
Just to tease, here is how you would write the previous sum iteration with an Iteratee.
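A sketch of what this can look like, assuming the Play iteratee library (package play.api.libs.iteratee) is on the classpath. Iteratee.fold, Enumerator(...) and run are helpers of that library; the value names are mine. In released Play versions run returns a scala.concurrent.Future, which plays the role of the Promise mentioned in this article, and the implicit ExecutionContext may or may not be required depending on the version.

```scala
import play.api.libs.iteratee.{Enumerator, Iteratee}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// the iteratee: starts from 0 and adds each chunk to the running total
val sumIteratee: Iteratee[Int, Int] =
  Iteratee.fold[Int, Int](0) { (total, elt) => total + elt }

// the enumerator: produces the chunks 1, 234, 455, 987
val numbers: Enumerator[Int] = Enumerator(1, 234, 455, 987)

// running the iteratee over the enumerator gives the total asynchronously
def futureTotal: Future[Int] = numbers.run(sumIteratee)
```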
Ok, it looks like the previous code and doesn’t seem to do much more…
Not false but trust me, it can do much more.
At least, it doesn’t seem so complicated?
But, as you can see, Iteratee is used with Enumerator and both concepts are tightly related.
Now let’s dive into those concepts with a step-by-step approach.
><> About Enumerator ><>
Enumerator is a more generic concept than collections or arrays
Till now, we have used collections in our iterations. But as explained before, we could iterate over something more generic, simply able to produce chunks of data available immediately or asynchronously in the future.
Enumerator is designed for this purpose.
A few examples of simple Enumerators:
Enumerator is a PRODUCER of statically typed chunks of data.
Enumerator[E] produces chunks of data of type E, and a chunk can be of the 3 following kinds:
- Input[E] is a chunk of data of type E: for ex, Input[Pizza] is a chunk of Pizza.
- Input.Empty means the enumerator is empty: for ex, an Enumerator streaming an empty file.
- Input.EOF means the enumerator has reached its end: for ex, an Enumerator streaming a file and reaching the end of file.
You can draw a parallel between the kinds of chunks and the states presented above (has more/no/no more elements).
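To make the 3 kinds concrete, here is a toy model of them. It is a simplified stand-in written for this explanation, not Play's actual source, and describe is a hypothetical helper.

```scala
// Toy model: a chunk is either an element, empty, or the end of the stream
sealed trait Input[+E]
object Input {
  final case class El[+E](e: E) extends Input[E]
  case object Empty extends Input[Nothing]
  case object EOF extends Input[Nothing]
}

// Hypothetical helper showing how a consumer can react to each kind
def describe[E](in: Input[E]): String = in match {
  case Input.El(e) => "has an element: " + e
  case Input.Empty => "has nothing for now"
  case Input.EOF   => "has no more elements"
}
```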
Actually, Enumerator[E] contains Input[E] so you can put an Input[E] in it.
Enumerator is a non-blocking producer
The idea behind Play2 is, as you may know, to be fully non-blocking and asynchronous. Thus, Enumerator/Iteratee reflects this philosophy.
The Enumerator produces chunks in a completely asynchronous and non-blocking way. This means the concept of Enumerator is not by default related to an active process or a background task generating chunks of data.
Remember the code snippet above with dateGenerator which reflects exactly the asynchronous and non-blocking nature of Enumerator/Iteratee?
What’s a Promise?
It would require a whole article but let’s say the name corresponds exactly to what it does.
A Promise[String] means: “It will provide a String in the future (or an error)”, that’s all. Meanwhile, it doesn’t block the current thread and just releases it.
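A small sketch of the idea using the standard library's scala.concurrent.Future, which I assume here to play the role of the Promise described in this article. The function names are illustrative.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// "It will provide a String in the future (or an error)"
def greetingLater(name: String): Future[String] = Future { "Hello " + name }

// map propagates the future value without blocking the current thread
def lengthLater(name: String): Future[Int] = greetingLater(name).map(_.length)

// blocking here is for demonstration only; real code keeps chaining map/flatMap
def demo(): Int = Await.result(lengthLater("Play"), 5.seconds)
```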
Enumerator requires a consumer to produce
Due to its non-blocking nature, if nobody consumes those chunks, the Enumerator doesn’t block anything and doesn’t consume any hidden runtime resources.
So, Enumerator MAY produce chunks of data only if there is someone to consume them.
So what consumes the chunks produced by an Enumerator? You have deduced it yourself: the Iteratee
><> About Iteratee ><>
Iteratee is a generic “stuff” that can iterate over an Enumerator
Let’s be windy for one sentence:
Iteratee is the generic translation of the concept of iteration in pure functional programming.
While Iterator is built from the collection over which it will iterate, Iteratee is a generic entity that waits for an Enumerator to be iterated over.
Do you see the difference between Iterator and Iteratee? No? Not a problem… Just remember that:
- an Iteratee is a generic entity that can iterate over the chunks of data produced by an Enumerator (or something else)
- an Iteratee is created independently of the Enumerator over which it will iterate and the Enumerator is provided to it
- an Iteratee is immutable, stateless and fully reusable for different enumerators
That’s why we say:
An Iteratee is applied on an Enumerator or run over an Enumerator.
Do you remember the example above computing the total of all elements of an Enumerator[Int]?
Here is the same code showing that an Iteratee can be created once and reused several times on different Enumerators.
Enumerator.apply and Enumerator.run are slightly different functions and we will explain that later.
Iteratee is an active consumer of chunks of data
By default, the Iteratee awaits a first chunk of data and, immediately after, launches the iteration mechanism.
The Iteratee goes on consuming data until it considers it has finished its computation.
Once initiated, the Iteratee is fully responsible for the full iteration process and decides when it stops.
As explained above, the Enumerator is a producer of chunks of data and it expects a consumer to consume those chunks of data.
To be consumed/iterated, the Enumerator has to be injected/plugged into an Iteratee or, more precisely, the first chunk of data has to be injected/pushed into the Iteratee.
Naturally the Iteratee is dependent on the production speed of the Enumerator: if it’s slow, the Iteratee is also slow.
Notice that the Iteratee/Enumerator relation can be seen as a form of inversion of control and dependency injection.
Iteratee is a “1-chunk-loop” function
The Iteratee consumes chunks one by one until it considers it has ended iteration.
Actually, the real scope of an Iteratee is limited to the treatment of one chunk.
That’s why it can be defined as a function being able to consume one chunk of data.
Iteratee accepts static typed chunks and computes a static typed result
Whereas an Iterator iterates over chunks of data coming from the collection that created it, an Iteratee is a bit more ambitious: it can compute something while it consumes chunks of data.
That’s why an Iteratee is typed by both the chunks it accepts and the result it computes: in Iteratee[E, A], E is the type of the chunks and A the type of the result.
Let’s go back to our first sample: compute the total of all integers produced by an Enumerator[Int].
Notice the usage of run: you can see that the result is not the total itself but a Promise[Int] of the total because we are in an asynchronous world.
To retrieve the real total, you could use the Scala concurrent blocking Await._ functions. But this is NOT good because it’s a blocking API. As Play2 is fully async/non-blocking, the best practice is to propagate the promise using Promise.map/flatMap.
But a result is not mandatory. For ex, an Iteratee can just println all consumed chunks.
The result is not necessarily a primitive type, it can just be the concatenation of all chunks into a List for ex.
Iteratee can propagate the immutable context & state over iterations
To be able to compute the final total, the Iteratee needs to propagate the partial totals along iteration steps.
This means the Iteratee is able to receive a context (the previous total for ex) from the previous step, then compute the new context with the current chunk of data (new total = previous total + current element) and can finally propagate this context to the next step (if there needs to be a next step).
Iteratee is simply a state machine
Ok this is cool but how does the Iteratee know it has to stop iterating?
What happens if there is an error/EOF or it has reached the end of the Enumerator?
Therefore, in addition to the context, the Iteratee should also receive the previous state, decide what to do and potentially compute the new state to be sent to the next step.
Now, remember the classic iteration states described above. For Iteratee, there are almost the same 2 possible states of iteration:
- State Cont: the iteration can continue with the next chunk and potentially compute a new context
- State Done: it signals it has reached the end of its process and can return the resulting context value
and a 3rd one which seems quite logical:
- State Error: it signals there was an error during the current step and stops iterating
From this point of view, we can consider the Iteratee is just a state machine in charge of looping over state Cont until it detects conditions to switch to terminal states Done or Error.
Iteratee states Done/Error/Cont are also Iteratee
Remember, the Iteratee is defined as a 1-chunk-loop function and its main purpose is to change from one state to another one.
Let’s consider those states are also Iteratee.
We have 3 “State” Iteratees:
Done[E, A](a: A, remaining: Input[E])
- a: A the context received from the previous step
- remaining: Input[E] representing the next chunk
Error[E](msg: String, input: Input[E])
Very simple to understand also: an error message and the input on which it failed.
Cont[E, A](k: Input[E] => Iteratee[E, A])
This is the most complicated State as it’s built from a function taking an Input[E] and returning another Iteratee[E, A].
Without going too deep in the theory, you can easily understand that Input[E] => Iteratee[E, A] is simply a good way to consume one input and return a new state/iteratee which can consume another input and return another state/iteratee etc… till reaching state Done or Error.
This construction ensures feeding the iteration mechanism (in a typical functional way).
Ok lots of information, isn’t it?
You certainly wonder why I explain all of that?
This is just because if you understand that, you will understand how to create a custom Iteratee.
Let’s write an Iteratee computing the total of the 2 first elements in an Enumerator[Int] to show an example.
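Here is a sketch of such an Iteratee built on a toy model of Input and of the 3 states. The types below are simplified stand-ins written for this explanation, not Play's actual classes, and feed is a hypothetical synchronous driver that plays the part of the Enumerator.

```scala
// Toy model of the three kinds of input (simplified, not Play's source)
sealed trait Input[+E]
object Input {
  final case class El[+E](e: E) extends Input[E]
  case object Empty extends Input[Nothing]
  case object EOF extends Input[Nothing]
}

// Toy model of the three "state" iteratees described above
sealed trait Iteratee[E, +A]
final case class Done[E, A](a: A, remaining: Input[E]) extends Iteratee[E, A]
final case class Cont[E, A](k: Input[E] => Iteratee[E, A]) extends Iteratee[E, A]
final case class Error[E](msg: String, input: Input[E]) extends Iteratee[E, Nothing]

// Total of the first 2 elements: each step looks at one chunk and returns the next state
def sumFirstTwo: Iteratee[Int, Int] = {
  def step(consumed: Int, total: Int)(in: Input[Int]): Iteratee[Int, Int] = in match {
    // second element received: the computation is finished
    case Input.El(e) if consumed == 1 => Done[Int, Int](total + e, Input.Empty)
    // first element received: continue with the new context
    case Input.El(e) => Cont[Int, Int](next => step(consumed + 1, total + e)(next))
    // nothing to add: keep waiting with the same context
    case Input.Empty => Cont[Int, Int](next => step(consumed, total)(next))
    // the stream ended too early
    case Input.EOF => Error[Int]("fewer than 2 elements", Input.EOF)
  }
  Cont[Int, Int](in => step(0, 0)(in))
}

// Hypothetical driver: pushes inputs one by one while the state is Cont
def feed[E, A](it: Iteratee[E, A], inputs: List[Input[E]]): Iteratee[E, A] =
  inputs match {
    case in :: rest =>
      it match {
        case Cont(k) => feed[E, A](k(in), rest)
        case other   => other // Done or Error: stop consuming
      }
    case Nil => it
  }
```

Feeding it the elements 1, 234 and 455 ends in state Done with the total 235 and never looks at 455, while feeding it 1 and then EOF ends in state Error.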
With this example, you can understand that writing an Iteratee is not much different than choosing what to do at each step depending on the type of chunk you received and returning the new State/Iteratee.
A few candies for those who have not drowned yet
Enumerator is just a helper to deal with Iteratee
As you could see, in the Iteratee API, there is no mention of Enumerator anywhere.
This is because Enumerator is just a helper to interact with Iteratee: it can plug itself into an Iteratee and inject the first chunk of data into it.
But you don’t need Enumerator to use Iteratee even if this is really easier and well integrated everywhere in Play2.
Difference between Enumerator.apply(Iteratee) and Enumerator.run(Iteratee)
Let’s go back to this point mentioned earlier.
Have a look at the main APIs in Enumerator:
apply returns last Iteratee/State
The apply function injects the Enumerator into the Iteratee which consumes the chunks, does its job and returns a Promise of Iteratee.
From the previous explanation, you may deduce by yourself that the returned Iteratee might simply be the last state after it has finished consuming the chunks it required from the Enumerator.
run returns a Promise[Result]
run has 3 steps:
- Call the previous apply function
- Inject Input.EOF into the Iteratee to be sure it has ended
- Get the last context from the Iteratee as a promise.
Here is an example:
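A sketch of the difference, assuming the Play iteratee library (play.api.libs.iteratee) is on the classpath and that, as in released Play versions, the promises are scala.concurrent.Future values. The value names are mine, and the implicit ExecutionContext may not be needed on every version.

```scala
import play.api.libs.iteratee.{Enumerator, Iteratee}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val sumIteratee: Iteratee[Int, Int] = Iteratee.fold[Int, Int](0)(_ + _)
val numbers: Enumerator[Int] = Enumerator(1, 234, 455, 987)

// apply: the result is the iteratee/state left after consuming the chunks
def lastState: Future[Iteratee[Int, Int]] = numbers.apply(sumIteratee)

// run: apply, then push EOF and extract the result
def total: Future[Int] = numbers.run(sumIteratee)

// the same result obtained by hand from the last state
def totalByHand: Future[Int] = lastState.flatMap(it => it.run)
```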
To Remember
When you need the result of an Iteratee, you shall use run.
When you need to apply an Iteratee over an Enumerator without retrieving the result, you shall use apply.
Iteratee is a Promise[Iteratee] (IMPORTANT TO KNOW)
One more thing to know about an Iteratee is that Iteratee is a Promise[Iteratee] by definition.
Iteratee <=> Promise[Iteratee]
This means that you can build your code around Iteratee in a very lazy way: with Iteratee, you can switch to Promise and back as you want.
Final words about Enumeratee
You discovered Iteratee, then Enumerator…
And now you come across this… Enumeratee???
What is that new stuff in XXXtee?????
2nd advice: DON’T PANIC NOW… the Enumeratee concept is really simple to understand
Enumeratee is just a pipe adapter between Enumerator and Iteratee
Imagine you have an Enumerator[Int] and an Iteratee[String, List[String]].
You can transform an Int into a String, can’t you?
So you should be able to transform the chunks of Int into chunks of String and then inject them into the Iteratee.
Enumeratee is there to save you.
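A sketch of that piping, assuming the Play iteratee library (play.api.libs.iteratee) is on the classpath. Enumeratee.map, through, Iteratee.getChunks and run are helpers of that library; the value names are mine, and the result type is assumed to be scala.concurrent.Future as in released Play versions.

```scala
import play.api.libs.iteratee.{Enumeratee, Enumerator, Iteratee}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val ints: Enumerator[Int] = Enumerator(1, 234, 455, 987)

// the adapter: turns each chunk of Int into a chunk of String
val toStr: Enumeratee[Int, String] = Enumeratee.map[Int](i => i.toString)

// a generic iteratee collecting all the chunks into a List
val collect: Iteratee[String, List[String]] = Iteratee.getChunks[String]

// pipe the enumerator through the enumeratee, then run the iteratee on the result
def strings: Future[List[String]] = ints.through(toStr).run(collect)
```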
What happened there?
You just piped an Enumerator[Int] through an Enumeratee[Int, String] into an Iteratee[String, List[String]]
In 2 steps: the Enumerator[Int] first goes through the Enumeratee[Int, String], which gives an Enumerator[String], and this Enumerator[String] is then injected into the Iteratee.
So, you may understand that Enumeratee is a very useful tool to convert your custom Enumerator to be used with the generic Iteratees provided by the Play2 API.
You’ll see that this is certainly the tool you will use the most while coding with Enumerator / Iteratee.
Enumeratee can be applied to an Enumerator without Iteratee
This is a very useful feature of Enumeratee.
You can transform an Enumerator[From] into an Enumerator[To] with an Enumeratee[From, To]
The type Enumeratee[From, To] is quite explicit about it: chunks of type From come in, chunks of type To come out.
Enumeratee can transform an Iteratee
This is a slightly stranger feature because you can transform an Iteratee[To, A] into an Iteratee[From, A] with an Enumeratee[From, To]
Enumeratee can be composed with an Enumeratee
Yes, this is the final very useful feature of Enumeratee.
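A sketch of such a composition, again assuming the Play iteratee library is on the classpath. compose is the explicit word for the fish operator ><>; the value names and the two small transformations are mine.

```scala
import play.api.libs.iteratee.{Enumeratee, Enumerator, Iteratee}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// two small generic adapters
val toStr: Enumeratee[Int, String] = Enumeratee.map[Int](i => i.toString)
val shout: Enumeratee[String, String] = Enumeratee.map[String](s => s + "!")

// composed into the custom adapter we need: Int => String => String
val toShoutedStr: Enumeratee[Int, String] = toStr.compose(shout)

def shouted: Future[List[String]] =
  Enumerator(1, 2, 3).through(toShoutedStr).run(Iteratee.getChunks[String])
```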
So once again, it is very easy to see that you can create your generic Enumeratees and then compose them into the custom Enumeratee you need for your custom Enumerator / Iteratee.
Conclusion
Now I hope you have a bit more information and are not lost anymore.
Next step is to use Iteratee / Enumerator / Enumeratee all together.
I’ll write other articles presenting more specific and practical ideas, concepts and samples…
There are a lot of interesting features that are worth precise explanations.
Understanding clearly what an Iteratee is matters because it helps when writing new Iteratees, but you can also stay superficial and use the many helpers provided by the Play2 Iteratee API.
Ok, documentation is not yet as complete as it should be but we are working on this!!!
Anyway, why should I use Iteratee / Enumerator / Enumeratee?
I want to tell you that Iteratee / Enumerator / Enumeratee is not just a fun toy for people fond of functional constructions.
They are useful in many domains and once you understand how they work, I can promise you that you will begin to use them more and more.
Modern web applications are not only dynamically generated pages anymore. Now you manipulate flows of data coming from different sources, in different formats, with different availability timing. You may have to serve huge amounts of data to huge numbers of clients and to work in distributed environments.
Iteratees are made for those cases because they are safe, immutable and very good at dealing with data flows in realtime.
To use the buzzword you see more & more: “Realtime WebApp”, and Iteratee is associated with that ;)
Note on weird operators
You will certainly see lots of those operators in code based on Iteratee/Enumerator/Enumeratee such as &>, |>>, |>>> and the famous fish operator ><>. Don’t focus on those operators right now, they are just aliases of real explicit words such as through, apply, applyOn or compose. I’ll try to write an article about those operators to demystify them. With practice, some people will find the code with operators clearer and more compact, some people will prefer words.
Have fun