The code is on GitHub in the shapotomic project.

Datomisca is a Scala API for the Datomic database.

If you want to know more about Datomisca/Datomic schemas, see my recent article. What's interesting about Datomisca schemas is that they are statically typed, allowing compiler validations and type inference.

Shapeless HLists are heterogeneous polymorphic lists.

HLists can hold values of different types while keeping track of each of those types.
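
To make this concrete, here is a minimal sketch of an HList (the imports follow the ones used in the samples further down this page):

import shapeless._
import HList._

// an HList mixing a String, a Long and a Set[String];
// its type records each element's type, in order
val koala: String :: Long :: Set[String] :: HNil =
  "kaylee" :: 3L :: Set("manna_gum", "tallowwood") :: HNil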


This project is an experiment that tries to:

  • convert HList to/from Datomic Entities
  • check HList types against schema at compile-time

It uses:

  • Datomisca type-safe schema
  • Shapeless HList
  • Shapeless polymorphic functions

Please note that we don't provide any Iso[From, To] since there is no isomorphism here. Actually, there are 2 monomorphisms (injective conversions):

  • HList => AddEntity to provision an entity
  • DEntity => HList when retrieving entity

We would certainly need to implement something like a Mono[From, To] for our case…
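
For illustration only, here is a hedged sketch of what such a Mono[From, To] typeclass might look like (this shape is an assumption, not something the project provides today):

// hypothetical sketch: a one-way (injective) conversion typeclass,
// one instance per direction instead of a single Iso[From, To]
trait Mono[From, To] {
  def convert(from: From): To
}
// e.g. one instance for HList => AddEntity, another for DEntity => HList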

Code sample

Create schema based on HList

// Koala Schema
object Koala {
  object ns {
    val koala = Namespace("koala")
  }

  // schema attributes
  val name        = Attribute(ns.koala / "name", SchemaType.string, Cardinality.one).withDoc("Koala's name")
  val age         = Attribute(ns.koala / "age", SchemaType.long, Cardinality.one).withDoc("Koala's age")
  val trees       = Attribute(ns.koala / "trees", SchemaType.string, Cardinality.many).withDoc("Koala's trees")

  // the schema in HList form
  val schema = name :: age :: trees :: HNil

  // the datomic facts corresponding to the schema
  // (an upper type must be specified for the shapeless conversion to a List)
  val txData = schema.toList[Operation]
}

// Provision schema
Datomic.transact(Koala.txData) map { tx => ... }

Validate HList against Schema

// creates a Temporary ID & keeps it for resolving entity after insertion
val id = DId(Partition.USER)
// creates an HList entity 
val hListEntity =
  id :: "kaylee" :: 3L ::
  Set( "manna_gum", "tallowwood" ) ::
  HNil

// validates and converts at compile-time this HList against schema
hListEntity.toAddEntity(Koala.schema)

// If you remove a field from HList and try again, the compiler fails
val badHListEntity =
  id :: "kaylee" ::
  Set( "manna_gum", "tallowwood" ) ::
  HNil

scala> badHListEntity.toAddEntity(Koala.schema)
<console>:23: error: could not find implicit value for parameter pull:
  shapotomic.SchemaCheckerFromHList.Pullback2[shapeless.::[datomisca.TempId,shapeless.::[String,shapeless.::[scala.collection.immutable.Set[String],shapeless.HNil]]],
  shapeless.::[datomisca.RawAttribute[datomisca.DString,datomisca.CardinalityOne.type],
  shapeless.::[datomisca.RawAttribute[datomisca.DLong,datomisca.CardinalityOne.type],
  shapeless.::[datomisca.RawAttribute[datomisca.DString,datomisca.CardinalityMany.type],shapeless.HNil]]],datomisca.AddEntity]

The compiler error looks a bit weird at first, but if you take a few seconds to read it, you'll see there is nothing hard about it; it just says:

scala> I can't convert
(TempId ::) String             :: Set[String]      :: HNil =>
            Attr[DString, one] :: Attr[DLong, one] :: Attr[DString, many] :: HNil

Convert DEntity to static-typed HList based on schema

val e = Datomic.resolveEntity(tx, id)

// rebuilds HList entity from DEntity statically typed by schema
val postHListEntity = e.toHList(Koala.schema)

// Explicitly typing the value to show that the compiler builds the right typed HList from schema
val validateHListEntityType: Long :: String :: Long :: Set[String] :: HNil = postHListEntity

Conclusion

Using HList with compile-time schema validation is quite interesting because it provides a very basic and versatile data structure to manipulate Datomic entities in a type-safe style.

Moreover, as Datomic pushes atomic data manipulation (simple facts instead of full entities), it's really cool to use an HList instead of a rigid static structure such as a case class.

For example:

val simplerOp = (id :: "kaylee" :: 5L).toAddEntity(Koala.name :: Koala.age :: HNil)

Have TypedFun

One more step in our progressive unveiling of Datomisca, our open-source Scala API (sponsored by Pellucid & Zenexity) aiming to enhance the Datomic experience for Scala developers…

After covering queries compiled by Scala macros in a previous article, and then the reactive transaction & fact operation API, let's explain how Datomisca manages Datomic schema attributes.


Datomic Schema Reminders

As explained in previous articles, Datomic stores lots of atomic facts called datoms, each made of an entity id, an attribute, a value and a transaction id.

An attribute is just a namespaced keyword :<namespace>.<nested-namespace>/<name>, such as :person.address/street:

  • person.address is just a hierarchical namespace person -> address
  • street is the name of the attribute

It's cool to provision all those atomic pieces of information, but what if we provision a non-existing attribute, or one with a bad format or type? Is there a way to control the format of data in Datomic?

In a less strict way than SQL, Datomic provides a schema facility to constrain the accepted attributes and the types of their values.

Schema attribute definition

Datomic schema just defines the accepted attributes and some constraints on those attributes. Each schema attribute can be defined by the following fields:

value type

  • basic types: string, long, float, bigint, bigdec, boolean, instant, uuid, uri, bytes (yes, NO int).
  • reference: in Datomic you can reference other entities (these are lazy relations, not as strict as the ones in an RDBMS)

cardinality

  • one : one-to-one relation if you want an analogy with RDBMS
  • many : one-to-many relation

Please note that in Datomic, all relations are bidirectional even for one-to-many.

optional constraints: uniqueness, doc, index, fulltext, isComponent, noHistory (detailed below)

Schema attributes are entities

Schema validation is applied at fact insertion and prevents inserting unknown attributes or badly typed values. But how are schema attributes defined?

Actually, schema attributes are themselves entities.

Remember, in a previous article, I introduced entities as just a loose aggregation of datoms identified by the same entity ID (the first element of a datom).

So a schema attribute is just an entity stored in a special partition :db.part/db and defined by a few specific fields corresponding to the ones in the previous paragraph. Technically speaking, here are the fields used to define a Datomic schema attribute:

mandatory fields

  • :db/ident : specifies unique name of the attribute
  • :db/valueType : specifies one of the previous types. Please note that even these types are not hard-coded in Datomic; adding new types could become a feature in the future.
  • :db/cardinality : specifies the cardinality, one or many, of the attribute. A many attribute is just a set of values; the Set type matters because Datomic only manages sets of unique values and won't return the same value multiple times when querying.

optional fields

  • :db/unique
  • :db/doc (useful to document your schema)
  • :db/index
  • :db/fulltext
  • :db/isComponent
  • :db/noHistory

Here is an example of schema attribute declaration written in Clojure:

{:db/id #db/id[:db.part/db]
 :db/ident :person/name
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db/doc "A person's name"}

As you can see, creating schema attributes just means creating new entities in the right partition. So, to add new attributes to Datomic, you just have to add new facts.


Schema sample

Let's create a schema defining a koala living in eucalyptus trees.

Yes I’m a super-Koala fan! Don’t ask me why, this is a long story not linked at all to Australia :D… But saving Koalas is important to me so I put this little banner for them…

Let's define a koala by the following attributes:

  • a name String
  • an age Long
  • a sex which can be male or female
  • a few eucalyptus trees to feed in, each defined by:

    • a species being a reference to one of the possible species of eucalyptus trees
    • a row Long (let’s imagine those trees are planted in rows/columns)
    • a column Long

Here is the Datomic schema for this:

[
{:db/id #db/id[:db.part/db]
 :db/ident :koala/name
 :db/valueType :db.type/string
 :db/unique :db.unique/value
 :db/cardinality :db.cardinality/one
 :db/doc "A koala's name"}

{:db/id #db/id[:db.part/db]
 :db/ident :koala/age
 :db/valueType :db.type/long
 :db/cardinality :db.cardinality/one
 :db/doc "A koala's age"}

{:db/id #db/id[:db.part/db]
 :db/ident :koala/sex
 :db/valueType :db.type/ref
 :db/cardinality :db.cardinality/one
 :db/doc "A koala's sex"}

{:db/id #db/id[:db.part/db]
 :db/ident :koala/eucalyptus
 :db/valueType :db.type/ref
 :db/cardinality :db.cardinality/many
 :db/doc "A koala's eucalyptus trees"}

{:db/id #db/id[:db.part/db]
 :db/ident :eucalyptus/species
 :db/valueType :db.type/ref
 :db/cardinality :db.cardinality/one
 :db/doc "A eucalyptus species"}

{:db/id #db/id[:db.part/db]
 :db/ident :eucalyptus/row
 :db/valueType :db.type/long
 :db/cardinality :db.cardinality/one
 :db/doc "A eucalyptus row"}

{:db/id #db/id[:db.part/db]
 :db/ident :eucalyptus/col
 :db/valueType :db.type/long
 :db/cardinality :db.cardinality/one
 :db/doc "A eucalyptus column"}

;; koala sexes as keywords
[:db/add #db/id[:db.part/user] :db/ident :sex/male]
[:db/add #db/id[:db.part/user] :db/ident :sex/female]

;; eucalyptus species
[:db/add #db/id[:db.part/user] :db/ident :eucalyptus.species/manna_gum]
[:db/add #db/id[:db.part/user] :db/ident :eucalyptus.species/tasmanian_blue_gum]
[:db/add #db/id[:db.part/user] :db/ident :eucalyptus.species/swamp_gum]
[:db/add #db/id[:db.part/user] :db/ident :eucalyptus.species/grey_gum]
[:db/add #db/id[:db.part/user] :db/ident :eucalyptus.species/river_red_gum]
[:db/add #db/id[:db.part/user] :db/ident :eucalyptus.species/tallowwood]

]

In this sample, you can see that we have defined 4 namespaces:

  • koala used to logically regroup koala entity fields
  • eucalyptus used to logically regroup eucalyptus entity fields
  • sex used to identify koala sex male or female as unique keywords
  • eucalyptus.species to identify eucalyptus species as unique keywords

Remark also:

  • :koala/name field is uniquely valued, meaning no two koalas can have the same name
  • :koala/eucalyptus field is a one-to-many reference to eucalyptus entities

Datomisca way of declaring schema

First of all, initialize your Datomic DB

import scala.concurrent.ExecutionContext.Implicits.global

import datomisca._
import Datomic._

val uri = "datomic:mem://koala-db"

Datomic.createDatabase(uri)
implicit val conn = Datomic.connect(uri)

The NOT-preferred way

As you may know by now, Datomisca intensively uses Scala 2.10 macros to provide compile-time parsing and validation of Datomic queries and operations written in Clojure.

The previous schema attribute definitions are just a set of classic operations, so you can ask Datomisca to parse them at compile time as follows:

val ops = Datomic.ops("""[
{:db/id #db/id[:db.part/db]
 :db/ident :koala/name
 :db/valueType :db.type/string
 :db/unique :db.unique/value
 :db/cardinality :db.cardinality/one
 :db/doc "A koala's name"}
...
]""")

Then you can provision the schema into Datomic using:

Datomic.transact(ops) map { tx =>
  ...
  // do something
  //
}

The preferred way

OK, the previous way is cool since you can validate and provision a Clojure schema using Datomisca. But Datomisca also provides a programmatic way of writing a schema in Scala. This brings:

  • a Scala-idiomatic way of manipulating schemas
  • type safety for Datomic schema attributes

Let’s see the code directly:

// Sex Schema
object SexSchema {
  // First create your namespace
  object ns {
    val sex = Namespace("sex")
  }

  // enumerated values
  val FEMALE  = AddIdent(ns.sex / "female") // :sex/female
  val MALE    = AddIdent(ns.sex / "male")   // :sex/male

  // facts representing the schema to be provisioned
  val txData = Seq(FEMALE, MALE)
}

// Eucalyptus Schema
object EucalyptusSchema {
  object ns {
    val eucalyptus  = new Namespace("eucalyptus") { // new is just here to allow structural construction
      val species   = Namespace("species")
    }
  }

  // different species
  val MANNA_GUM           = AddIdent(ns.eucalyptus.species / "manna_gum")
  val TASMANIAN_BLUE_GUM  = AddIdent(ns.eucalyptus.species / "tasmanian_blue_gum")
  val SWAMP_GUM           = AddIdent(ns.eucalyptus.species / "swamp_gum")
  val GREY_GUM            = AddIdent(ns.eucalyptus.species / "grey_gum")
  val RIVER_RED_GUM       = AddIdent(ns.eucalyptus.species / "river_red_gum")
  val TALLOWWOOD          = AddIdent(ns.eucalyptus.species / "tallowwood")

  // schema attributes
  val species  = Attribute(ns.eucalyptus / "species", SchemaType.ref, Cardinality.one).withDoc("Eucalyptus's species")
  val row      = Attribute(ns.eucalyptus / "row", SchemaType.long, Cardinality.one).withDoc("Eucalyptus's row")
  val col      = Attribute(ns.eucalyptus / "col", SchemaType.long, Cardinality.one).withDoc("Eucalyptus's column")

  // facts representing the schema to be provisioned
  val txData = Seq(
    species, row, col,
    MANNA_GUM, TASMANIAN_BLUE_GUM, SWAMP_GUM,
    GREY_GUM, RIVER_RED_GUM, TALLOWWOOD
  )
}

// Koala Schema
object KoalaSchema {
  object ns {
    val koala = Namespace("koala")
  }

  // schema attributes
  val name         = Attribute(ns.koala / "name", SchemaType.string, Cardinality.one).withDoc("Koala's name").withUnique(Unique.value)
  val age          = Attribute(ns.koala / "age", SchemaType.long, Cardinality.one).withDoc("Koala's age")
  val sex          = Attribute(ns.koala / "sex", SchemaType.ref, Cardinality.one).withDoc("Koala's sex")
  val eucalyptus   = Attribute(ns.koala / "eucalyptus", SchemaType.ref, Cardinality.many).withDoc("Koala's trees")

  // facts representing the schema to be provisioned
  val txData = Seq(name, age, sex, eucalyptus)
}


// Provision Schema by just accumulating all txData
Datomic.transact(
  SexSchema.txData ++
  EucalyptusSchema.txData ++
  KoalaSchema.txData
) map { tx =>
  ...
}

Nothing complicated, is it?

Exactly the same as writing a Clojure schema, but in Scala…


Datomisca type-safe schema

Datomisca takes advantage of Scala's type safety to enhance Datomic schema attributes and make them statically typed. Have a look at the Datomisca Attribute definition:

sealed trait Attribute[DD <: DatomicData, Card <: Cardinality]

So an Attribute is typed by 2 parameters:

  • a DatomicData type
  • a Cardinality type

So when you define a schema attribute using Datomisca API, the compiler also infers those types.

Take this example:

val name  = Attribute(ns / "name", SchemaType.string, Cardinality.one).withDoc("Koala's name").withUnique(Unique.value)

  • SchemaType.string implies this is an Attribute[DString, _]
  • Cardinality.one implies this is an Attribute[_, Cardinality.one]

So name is an Attribute[DString, Cardinality.one].

In the same way:

  • age is Attribute[DLong, Cardinality.one]
  • sex is Attribute[DRef, Cardinality.one]
  • eucalyptus is Attribute[DRef, Cardinality.many]

As you can imagine, using these type-safe schema attributes, Datomisca can ensure consistency between the Datomic schema and the types manipulated in Scala.


Taking advantage of type-safe schema

Checking types when creating facts

Based on typed attributes, the compiler can help us a lot to validate that we give the right type of value to the right attribute.

Schema facilities are extensions of core Datomisca, so you must import the following to use them:

import DatomicMapping._

Here is a code sample:

//////////////////////////////////////////////////////////////////////
// correct tree with right types
scala> val tree58 = SchemaEntity.add(DId(Partition.USER))(Props() +
  (EucalyptusSchema.species -> EucalyptusSchema.SWAMP_GUM.ref) +
  (EucalyptusSchema.row     -> 5L) +
  (EucalyptusSchema.col     -> 8L)
)
tree58: datomisca.AddEntity =
{
  :eucalyptus/species :species/swamp_gum
  :eucalyptus/row 5
  :eucalyptus/col 8
  :db/id #db/id[:db.part/user -1000000]
}

//////////////////////////////////////////////////////////////////////
// incorrect tree with a string instead of a long for row
scala> val tree58 = SchemaEntity.add(DId(Partition.USER))(Props() +
  (EucalyptusSchema.species -> EucalyptusSchema.SWAMP_GUM.ref) +
  (EucalyptusSchema.row     -> "toto") +
  (EucalyptusSchema.col     -> 8L)
)
<console>:18: error: could not find implicit value for parameter attrC:
  datomisca.Attribute2PartialAddEntityWriter[datomisca.DLong,datomisca.CardinalityOne.type,String]
         (EucalyptusSchema.species -> EucalyptusSchema.SWAMP_GUM.ref) +

In the second case, compilation fails because there is no mapping between DLong and String.

In the first case, it works because the mapping between DLong and Long is valid.


Checking types when getting fields from Datomic entities

First of all, let's create our first little koala, named Rose, who loves feeding from 2 eucalyptus trees.

scala> val tree58 = SchemaEntity.add(DId(Partition.USER))(Props() +
  (EucalyptusSchema.species -> EucalyptusSchema.SWAMP_GUM.ref) +
  (EucalyptusSchema.row     -> 5L) +
  (EucalyptusSchema.col     -> 8L)
)
tree58: datomisca.AddEntity =
{
  :eucalyptus/species :species/swamp_gum
  :eucalyptus/row 5
  :eucalyptus/col 8
  :db/id #db/id[:db.part/user -1000002]
}

scala> val tree74 = SchemaEntity.add(DId(Partition.USER))(Props() +
  (EucalyptusSchema.species -> EucalyptusSchema.RIVER_RED_GUM.ref) +
  (EucalyptusSchema.row     -> 7L) +
  (EucalyptusSchema.col     -> 4L)
)
tree74: datomisca.AddEntity =
{
  :eucalyptus/species :species/river_red_gum
  :eucalyptus/row 7
  :eucalyptus/col 4
  :db/id #db/id[:db.part/user -1000004]
}

scala> val rose = SchemaEntity.add(DId(Partition.USER))(Props() +
  (KoalaSchema.name        -> "rose" ) +
  (KoalaSchema.age         -> 3L ) +
  (KoalaSchema.sex         -> SexSchema.FEMALE.ref ) +
  (KoalaSchema.eucalyptus  -> Set(DRef(tree58.id), DRef(tree74.id)) )
)
rose: datomisca.AddEntity =
{
  :koala/eucalyptus [#db/id[:db.part/user -1000001], #db/id[:db.part/user -1000002]]
  :koala/name "rose"
  :db/id #db/id[:db.part/user -1000003]
  :koala/sex :sex/female
  :koala/age 3
}

Now let's provision those trees & koala into Datomic and retrieve the real entity corresponding to our little Rose.

Datomic.transact(tree58, tree74, rose) map { tx =>
  val realRose = Datomic.resolveEntity(tx, rose.id)
  ...
}

Finally, let's take advantage of the typed schema attributes to safely access fields of the entity:

scala> val maybeRose = Datomic.transact(tree58, tree74, rose) map { tx =>
  val realRose = Datomic.resolveEntity(tx, rose.id)

  val name = realRose(KoalaSchema.name)
  val age = realRose(KoalaSchema.age)
  val sex = realRose(KoalaSchema.sex)
  val eucalyptus = realRose(KoalaSchema.eucalyptus)

  (name, age, sex, eucalyptus)
}
maybeRose: scala.concurrent.Future[(String, Long, Long, Set[Long])] = scala.concurrent.impl.Promise$DefaultPromise@49f454d6

What's important here is that you get a (String, Long, Long, Set[Long]), which means the compiler was able to infer the right types from the schema attributes…

Greattt!!!

Ok that’s all for today!

The next article will be about an extension Datomisca provides for convenience: mapping Datomic entities to Scala structures such as case classes or tuples. We don't believe this is really the philosophy of Datomic, in which atomic operations are much more interesting, but sometimes it's convenient when you want a data abstraction layer…

Have KoalaFun!

Do you like Shapeless, the great API developed by Miles Sabin exploring generic/polytypic programming in Scala?

Do you like Play-json, the Json API developed for the Play 2.1 framework, now usable as a stand-alone module providing functional & typesafe Json validation and Scala conversion?


Here is Shapelaysson, an API interleaving Play-Json with Shapeless so you can manipulate Json from/to Shapeless HLists.

HLists are heterogeneous polymorphic lists able to contain different types of data while keeping track of those types.


Shapelaysson is a GitHub project with tests/samples.


Shapelaysson is part of my reflections on manipulating pure data structures from/to JSON.

A few pure Json from/to HList samples

import play.api.libs.json._
import shapeless._
import HList._
import Tuples._
import shapelaysson._

// validates + converts a JsArray into HList
scala> Json.arr("foo", 123L).validate[ String :: Long :: HNil ]
res1: play.api.libs.json.JsResult[shapeless.::[String,shapeless.::[Long,shapeless.HNil]]] =
JsSuccess(foo :: 123 :: HNil,)

// validates + converts a JsObject into HList
scala> Json.obj("foo" -> "toto", "bar" -> 123L).validate[ String :: Long :: HNil ]
res3: play.api.libs.json.JsResult[shapeless.::[String,shapeless.::[Long,shapeless.HNil]]] =
JsSuccess(toto :: 123 :: HNil,)

// validates + converts imbricated JsObject into HList
scala> Json.obj(
     |   "foo" -> "toto",
     |   "foofoo" -> Json.obj("barbar1" -> 123.45, "barbar2" -> "tutu"),
     |      "bar" -> 123L,
     |      "barbar" -> Json.arr(123, true, "blabla")
     |   ).validate[ String :: (Float :: String :: HNil) :: Long :: (Int :: Boolean :: String :: HNil) :: HNil ]
res4: play.api.libs.json.JsResult[shapeless.::[String,shapeless.::[shapeless.::[Float,shapeless.::[String,shapeless.HNil]],shapeless.::[Long,shapeless.::[shapeless.::[Int,shapeless.::[Boolean,shapeless.::[String,shapeless.HNil]]],shapeless.HNil]]]]] =
JsSuccess(toto :: 123.45 :: tutu :: HNil :: 123 :: 123 :: true :: blabla :: HNil :: HNil,)

// validates with ERROR JsArray into HList
scala> Json.arr("foo", 123L).validate[ Long :: Long :: HNil ] must beEqualTo( JsError("validate.error.expected.jsnumber") )
<console>:23: error: value must is not a member of play.api.libs.json.JsResult[shapeless.::[Long,shapeless.::[Long,shapeless.HNil]]]
                    Json.arr("foo", 123L).validate[ Long :: Long :: HNil ] must beEqualTo( JsError("validate.error.expected.jsnumber") )

// converts HList to JsValue
scala> Json.toJson(123.45F :: "tutu" :: HNil)
res6: play.api.libs.json.JsValue = [123.44999694824219,"tutu"]

A few Json Reads/Writes[HList] samples

import play.api.libs.functional.syntax._

// creates a Reads[ String :: Long :: (String :: Boolean :: HNil) :: HNil]
scala> val HListReads2 = (
     |    (__ \ "foo").read[String] and
     |    (__ \ "bar").read[Long] and
     |    (__ \ "toto").read(
     |      (
     |        (__ \ "alpha").read[String] and
     |        (__ \ "beta").read[Boolean]
     |      ).tupled.hlisted
     |    )
     | ).tupled.hlisted
HListReads2: play.api.libs.json.Reads[shapeless.::[String,shapeless.::[Long,shapeless.::[shapeless.::[String,shapeless.::[Boolean,shapeless.HNil]],shapeless.HNil]]]] = play.api.libs.json.Reads$$anon$8@7e4a09ee

// validates/converts JsObject to HList
scala> Json.obj(
     |   "foo" -> "toto",
     |   "bar" -> 123L,
     |   "toto" -> Json.obj(
     |      "alpha" -> "chboing",
     |      "beta" -> true
     |   )
     | ).validate(HListReads2)
res7: play.api.libs.json.JsResult[shapeless.::[String,shapeless.::[Long,shapeless.::[shapeless.::[String,shapeless.::[Boolean,shapeless.HNil]],shapeless.HNil]]]] =
JsSuccess(toto :: 123 :: chboing :: true :: HNil :: HNil,)

// Create a Writes[String :: Long :: HNil]
scala> implicit val HListWrites: Writes[ String :: Long :: HNil ] = (
     |         (__ \ "foo").write[String] and
     |         (__ \ "bar").write[Long]
     |       ).tupled.hlisted
HListWrites: play.api.libs.json.Writes[shapeless.::[String,shapeless.::[Long,shapeless.HNil]]] = play.api.libs.json.Writes$$anon$5@7c9d07e2

// writes a HList to JsValue
scala> Json.toJson("toto" :: 123L :: HNil)
res8: play.api.libs.json.JsValue = {"foo":"toto","bar":123}

Adding shapelaysson in your dependencies

In your Build.scala, add:

import sbt._
import Keys._

object ApplicationBuild extends Build {

  val mandubianRepo = Seq(
    "Mandubian repository snapshots" at "https://github.com/mandubian/mandubian-mvn/raw/master/snapshots/",
    "Mandubian repository releases" at "https://github.com/mandubian/mandubian-mvn/raw/master/releases/"
  )

  val sonatypeRepo = Seq(
    "Sonatype OSS Releases" at "http://oss.sonatype.org/content/repositories/releases/",
    "Sonatype OSS Snapshots" at "http://oss.sonatype.org/content/repositories/snapshots/"
  )

  lazy val playJsonAlone = Project(
    BuildSettings.buildName, file("."),
    settings = BuildSettings.buildSettings ++ Seq(
      resolvers ++= mandubianRepo ++ sonatypeRepo,
      libraryDependencies ++= Seq(
        "org.mandubian"  %% "shapelaysson"  % "0.1-SNAPSHOT",
        "org.specs2"     %% "specs2"        % "1.13" % "test",
        "junit"           % "junit"         % "4.8" % "test"
      )
    )
  )
}

More to come maybe in this draft project… Suggestions are welcome too

Have Fun :: HNil!

A short article about an interesting issue concerning Scala 2.10.0 Future.

Summary


When a Fatal exception is thrown in your Future callback, it’s not caught by the Future and is thrown to the provided ExecutionContext.

But the current default Scala global ExecutionContext doesn't register an UncaughtExceptionHandler for these fatal exceptions, and your Future just hangs forever without notifying anyone.

This issue is well known and a fix has already been merged into the 2.10.x branch. But it is present in Scala 2.10.0, so it's worth keeping in mind IMHO. Let's explain it clearly.

Exceptions can be contained by Future

Let’s write some stupid code with Futures.

scala> import scala.concurrent._
scala> import scala.concurrent.duration._

// Take default Scala global ExecutionContext which is a ForkJoin Thread Pool
scala> val ec = scala.concurrent.ExecutionContext.global
ec: scala.concurrent.ExecutionContextExecutor = scala.concurrent.impl.ExecutionContextImpl@15f445b7

// Create an immediately redeemed Future with a simple RuntimeException
scala> val f = future( throw new RuntimeException("foo") )(ec)
f: scala.concurrent.Future[Nothing] = scala.concurrent.impl.Promise$DefaultPromise@27380357

// Access brutally the value to show that the Future contains my RuntimeException
scala> f.value
res22: Option[scala.util.Try[Nothing]] = Some(Failure(java.lang.RuntimeException: foo))

// Use blocking await to get Future result
scala> Await.result(f, 2 seconds)
warning: there were 1 feature warnings; re-run with -feature for details
java.lang.RuntimeException: foo
  at $anonfun$1.apply(<console>:14)
  at $anonfun$1.apply(<console>:14)
  ...

You can see that a Future can contain an Exception (or more generally Throwable).


Fatal Exceptions can’t be contained by Future

If you look in Scala 2.10.0 Future.scala, in the scaladoc, you can find:

* The following throwable objects are not contained in the future:
* - `Error` - errors are not contained within futures
* - `InterruptedException` - not contained within futures
* - all `scala.util.control.ControlThrowable` except `NonLocalReturnControl` - not contained within futures

and in the code, in several places, in map or flatMap for example, you can read:

try {
...
} catch {
  case NonFatal(t) => p failure t
}

This means that any fatal Throwable can't be contained in a Future's Failure.


What’s a Fatal Throwable?

To define what's fatal, let's see what is declared as non-fatal in the NonFatal ScalaDoc.

* Extractor of non-fatal Throwables.  
* Will not match fatal errors like VirtualMachineError  
* (for example, OutOfMemoryError, a subclass of VirtualMachineError),  
* ThreadDeath, LinkageError, InterruptedException, ControlThrowable, or NotImplementedError. 
*
* Note that [[scala.util.control.ControlThrowable]], an internal Throwable, is not matched by
* `NonFatal` (and would therefore be thrown).

Let's consider that fatal exceptions are just critical errors that generally can't be recovered from.

So what's the problem?

It seems right not to catch fatal errors in the Future, doesn't it?

But, look at following code:

// Let's throw a simple Fatal exception
scala> val f = future( throw new NotImplementedError() )(ec)
f: scala.concurrent.Future[Nothing] = scala.concurrent.impl.Promise$DefaultPromise@59747b17

scala> f.value
res0: Option[scala.util.Try[Nothing]] = None

Ok, the Future doesn’t contain the Fatal Exception as expected.

But where is my fatal exception if it's not caught??? No crash, no notification, nothing?

There should at least be an UncaughtExceptionHandler notifying it!


The problem is in the default Scala ExecutionContext.

As explained in this issue, the exception is lost due to the implementation of the default global ExecutionContext provided in Scala.

This is a simple ForkJoin thread pool, but it has no UncaughtExceptionHandler. Have a look at the code in Scala 2.10.0 ExecutionContextImpl.scala:

try {
  new ForkJoinPool(
    desiredParallelism,
    threadFactory,
    null, //FIXME we should have an UncaughtExceptionHandler, see what Akka does
    true) // Async all the way baby
} catch {
  case NonFatal(t) =>
    ...
}

Here it's quite clear: there is no registered UncaughtExceptionHandler.

What’s the consequence?

Your Future hangs forever

scala> Await.result(f, 30 seconds)
warning: there were 1 feature warnings; re-run with -feature for details
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
  at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:96)

As you can see, you can wait as long as you want: the Future is never redeemed, it just hangs forever, and you don't even know that a fatal exception has been thrown.

As explained in the issue, note that if you use a custom ExecutionContext based on a SingleThreadExecutor, this problem doesn't appear!

scala> val es = java.util.concurrent.Executors.newSingleThreadExecutor
es: java.util.concurrent.ExecutorService = java.util.concurrent.Executors$FinalizableDelegatedExecutorService@1e336f59

scala> val ec = ExecutionContext.fromExecutorService(es)
ec: scala.concurrent.ExecutionContextExecutorService = scala.concurrent.impl.ExecutionContextImpl$$anon$1@34f43dac

scala>  val f = Future[Unit](throw new NotImplementedError())(ec)
Exception in thread "pool-1-thread-1" f: scala.concurrent.Future[Unit] = scala.concurrent.impl.Promise$DefaultPromise@7d01f935
scala.NotImplementedError: an implementation is missing
  at $line41.$read$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(<console>:15)
  at $line41.$read$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(<console>:15)

Conclusion

In Scala 2.10.0, if a fatal exception occurs in a Future callback, your Future just swallows it and hangs forever without notifying anything.

Hopefully, thanks to the already merged PR, this problem will be corrected in a future Scala 2.10.x delivery.

To finish, in the same good old issue, Viktor Klang also raised the question of what should be considered fatal or not:

there’s a bigger topic at hand here, the one whether NotImplementedError, InterruptedException and ControlThrowable are to be considered fatal or not.

Meanwhile, be aware and take care ;)

Have Promise[NonFatal]!

play-json is now officially a stand-alone module in Play 2.2. So you no longer need the dependency given in this article; use the Typesafe one instead, e.g. com.typesafe.play:play-json:2.2.


In a very recent pull request, play-json has been made a stand-alone module in the Play2.2-SNAPSHOT master, just like play-iteratees.

It means:

  • You can take the Play2 Scala Json API as a stand-alone library and keep using the Json philosophy promoted by Play Framework anywhere.
  • The play-json module is stand-alone in terms of dependencies but is part and parcel of Play 2.2, so it will evolve and follow Play 2.x releases (and later versions), always ensuring full compatibility with the Play ecosystem.
  • The play-json module has 3 ultra-lightweight dependencies:
    • play-functional
    • play-datacommons
    • play-iteratees

These are pure, generic Scala pieces of code from the Play framework, so there is no Netty or other heavy dependency in it.
You can then import play-json into your project without any fear of bringing in unwanted deps.

play-json will certainly be released with Play 2.2, so meanwhile I provide a 2.2-SNAPSHOT build (see the dependencies below).


Even if the version is 2.2-SNAPSHOT, be aware that this is the same code as the one released in Play 2.1.0. This API has reached a good stability level. Enhancements and bug corrections will be brought to it but it’s production-ready right now.

Adding play-json 2.2-SNAPSHOT in your dependencies

In your Build.scala, add:

import sbt._
import Keys._

object ApplicationBuild extends Build {

  val mandubianRepo = Seq(
    "Mandubian repository snapshots" at "https://github.com/mandubian/mandubian-mvn/raw/master/snapshots/",
    "Mandubian repository releases" at "https://github.com/mandubian/mandubian-mvn/raw/master/releases/"
  )

  lazy val playJsonAlone = Project(
    BuildSettings.buildName, file("."),
    settings = BuildSettings.buildSettings ++ Seq(
      resolvers ++= mandubianRepo,
      libraryDependencies ++= Seq(
        "play"        %% "play-json" % "2.2-SNAPSHOT",
        "org.specs2"  %% "specs2" % "1.13" % "test",
        "junit"        % "junit" % "4.8" % "test"
      )
    )
  )
}

Using play-json 2.2-SNAPSHOT in your code:

Just import the following and get everything from Play2.1 Json API:

import play.api.libs.json._
import play.api.libs.functional._

case class EucalyptusTree(col:Int, row: Int)

object EucalyptusTree{
  implicit val fmt = Json.format[EucalyptusTree]
}

case class Koala(name: String, home: EucalyptusTree)

object Koala{
  implicit val fmt = Json.format[Koala]
}

val kaylee = Koala("kaylee", EucalyptusTree(10, 23))

println(Json.prettyPrint(Json.toJson(kaylee)))

Json.fromJson[Koala](
  Json.obj(
    "name" -> "kaylee",
    "home" -> Json.obj(
      "col" -> 10,
      "row" -> 23
    )
  )
)

Using play-json, you get a bit of Play Framework's pure Web philosophy.
Naturally, to unleash its full power, don't hesitate to dive into Play Framework and discover its 100% Web reactive stack ;)

Thanks a lot to the Play Framework team for promoting play-json as a stand-alone module!
Lots of interesting features incoming soon ;)

Have fun!

Let’s go on unveiling Datomisca a bit more.

Remember, Datomisca is an open-source Scala API (sponsored by Pellucid and Zenexity) aiming to enhance the Datomic experience for Scala developers.

After covering queries compiled by Scala macros in the previous article, I'm going to describe how Datomisca lets you create Datomic fact operations programmatically and send them to the Datomic transactor using an asynchronous/non-blocking API based on Scala 2.10 Future/ExecutionContext.


Facts about Datomic

First, let’s remind a few facts about Datomic:

Datomic is an immutable, fact-oriented, distributed, schema-constrained database

It means:

Datomic stores very small units of data called facts

Yes no tables, documents or even columns in Datomic. Everything stored in it is a very small fact.


A fact is the atomic unit of data

Facts are represented by the following tuple, called a datom:

datom = [entity attribute value tx]

  • entity is an ID, and several facts can share the same ID, making them facts of the same entity. You can see that an entity is a very loose concept in Datomic.
  • attribute is just a namespaced keyword such as :person/name, generally constrained by a typed schema attribute. The namespace can be used to logically identify an entity like "person" by grouping several attributes under the same namespace.
  • value is the value of this attribute for this entity at this instant
  • tx uniquely identifies the transaction in which this fact was inserted. Naturally a transaction is associated with a time.

Facts are immutable & temporal

It means that:

  • You can’t change the past
    Facts are immutable, i.e. you can't mutate a fact as other databases generally do: Datomic always creates a new version of the fact with a new value.
  • Datomic always grows
    If you add more facts, nothing is deleted so the DB grows. Naturally you can truncate a DB, export it and rebuild a new smaller one.
  • You can foresee a possible future
    From your present, you can temporarily add facts to Datomic without committing them on central storage thus simulating a possible future.

Reads/writes are distributed across different components

  • One Storage service storing physically the data (Dynamo DB/Infinispan/Postgres/Riak/…)
  • Multiple peers (generally local to your app instances) behaving like a high-speed synchronized cache, hiding all the local data storage and synchronization mechanism and providing the Datalog queries.
  • One (or several) transactor(s) centralizing the write mechanism allowing ACID transactions and notifying peers about those evolutions.

For more info about architecture, go to this page


Immutability means known DB state is always consistent

You might not be up to date with the central data storage since Datomic is distributed, and you can even lose the connection to it, but the data you know is always consistent because nothing can be mutated.

This immutability concept is one of the most important to understand in Datomic.


Schema constrains entity attributes

Datomic allows you to define that a given attribute must:

  • be of a given type: String or Long or Instant, etc.
  • have a cardinality (one or many)
  • be unique or not
  • be full-text searchable or not
  • be documented

It means that if you try to insert a fact with an attribute and a value of the wrong type, Datomic will refuse it.

A Datomic entity can also reference other entities, providing relations (even though Datomic is not an RDBMS). One interesting thing to know is that all relations in Datomic are bidirectional.

I hope you immediately see the link between these typed schema attributes and potential Scala type-safe features…


Author's note: Datomic is more about evolution than mutation.
I'll let you meditate on this sentence, linked to the theory of evolution ;)


Datomic operations

When you want to create a new fact in Datomic, you send a write operation request to the Transactor.

Basic operations

There are 2 basic operations:

Add a Fact

[:db/add entity-id attribute value]

Adding a fact for the same entity will NOT update the existing fact but create a new fact with the same entity-id and a new tx.

Retract a Fact

[:db/retract entity-id attribute value]

Retracting a fact doesn't erase anything; it just states: "for this entity-id, from now on, this attribute no longer holds this value".

You might wonder why you must provide the value when you want to retract a fact. This is because an attribute can have MANY cardinality, in which case you want to remove just one value from the set of values, as sketched below.
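
As an illustration in Datomisca style (mirroring the Fact.retract call shown later in this article; the attribute and value are purely illustrative):

val person = Namespace("person")

// :person/characters is assumed to have cardinality many;
// the value must be given so that only this one value is removed
// from the entity's set, leaving the other values in place
// (in practice you would target the real, resolved entity id)
val removeOneValue =
  Fact.retract(DId(Partition.USER))(person / "characters" -> "violent")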

Entity operations

In Datomic, you often manipulate groups of facts identifying an entity. An entity has no physical existence in Datomic but is just a group of facts having the same entity-id. Generally, the attributes constituting an entity are logically grouped under the same namespace (:person/name, :person/age…) but this is not mandatory at all.

Datomic provides 2 operations to manipulate entities directly:

Add Entity

{:db/id #db/id[:db.part/user -1]
  :person/name "Bob"
  :person/spouse #db/id[:db.part/user -2]}

Actually this is equivalent to 2 Add-Fact operations:

(def id #db/id[:db.part/user -1])
[:db/add id :person/name "Bob"]
[:db/add id :person/spouse #db/id[:db.part/user -2]]

Retract Entity

[:db.fn/retractEntity entity-id]

Special case of identified values

In Datomic, there are special entities built using the special attribute :db/ident of type Keyword which are said to be identified by the given keyword.

They are created as follows:

[:db/add #db/id[:db.part/user] :db/ident :person.characters/clever]
[:db/add #db/id[:db.part/user] :db/ident :person.characters/dumb]

If you use :person.characters/clever or :person.characters/dumb, it directly references one of those 2 entities without using their IDs.

You can also see those identified entities as enumerated values.

Now that you know how it works in Datomic, let’s go to Datomisca!


Datomisca programmatic operations

Datomisca's preferred way to build fact/entity operations is programmatic because it provides more flexibility to Scala developers. Here is the translation of the previous operations into Scala:

import datomisca._
import Datomic._

// creates a Namespace
val person = Namespace("person")

// creates a add-fact operation 
// It creates the datom (id keyword value _) from
//   - a temporary id (or a final long ID)
//   - the couple `(keyword, value)`
val addFact = Fact.add(DId(Partition.USER))(person / "name" -> "Bob")

// creates a retract-fact operation
val retractFact = Fact.retract(DId(Partition.USER))(person / "age" -> 123L)

// creates identified values
val violent = AddIdent(person.character / "violent")
val dumb = AddIdent(person.character / "dumb")

// creates a add-entity operation
val addEntity = Entity.add(DId(Partition.USER))(
  person / "name" -> "Bob",
  person / "age" -> 30L,
  person / "characters" -> Set(violent.ref, dumb.ref)
)

// creates a retract-entity operation from real Long ID of the entity
val retractEntity = Entity.retract(3L)

val ops = Seq(addFact, retractFact, addEntity, retractEntity)

Note that:

  • person / "name" creates the keyword :person/name from namespace person
  • DId(Partition.USER) generates a temporary Datomic Id in Partition USER. Please note that you can create your own partition too.
  • violent.ref is used to access the keyword reference of the identified entity.
  • ops = Seq(…) represents a collection of operations to be sent to transactor.

Datomisca Macro operations

Remember how Datomisca deals with queries by parsing/validating Datalog/Clojure queries at compile time using Scala macros?

You can do the same in Datomisca with operations:

val id = DId(Partition.USER)
val weak = AddIdent(person.character / "weak")
val dumb = AddIdent(person.character / "dumb")

val ops = Datomic.ops("""[
   [:db/add #db/id[:db.part/user] :db/ident :region/n]
   [:db/add ${DId(Partition.USER)} :db/ident :region/n]
   [:db/retract #db/id[:db.part/user] :db/ident :region/n]
   [:db/retractEntity 1234]
   {
      :db/id ${id}
      :person/name "toto"
      :person/age 30
      :person/characters [ $weak $dumb ]
   }
]""")

It compiles what's between the triple quotes at compile time, tells you if there are errors, and then builds the corresponding Scala operations.

OK, it's cool, but if you look closer, you'll see there is some sugar in this Clojure code:

  • ${DId(Partition.USER)}
  • $weak
  • $dumb

You can use Scala variables and inject them into Clojure operations at compile time, as you do with Scala string interpolation.

For Datomic queries, the compiled way feels really natural, but for operations we tend to prefer the programmatic way, because after experiencing both methods it feels much more Scala-like.

Datomisca runtime parsing

There is one last way to create operations: parsing a String at runtime and throwing an exception if the syntax is not valid.

val ops = Datomic.parseOps(""" … """)

It’s very useful if you have existing Datomic Clojure files (containing schema or bootstrap data) that you want to load into Datomic.


Datomisca reactive transactions

Last but not least, let's send those operations to the Datomic transactor.

In its Java API, the Datomic Connection provides an asynchronous transact API based on a ListenableFuture. This API can be enhanced in Scala because Scala 2.10 provides much more evolved asynchronous/non-blocking facilities than Java, based on Future/ExecutionContext.

Future lets you implement your asynchronous call in continuation style, based on the classic Scala map/flatMap methods. ExecutionContext is a great tool to specify in which pool of threads your asynchronous call will be executed, making it non-blocking with respect to your current execution context (or thread).

This feature is really important when you work with reactive APIs such as Datomisca or Play, so don't hesitate to study it further.

Let’s look at code directly to show how it works in Datomisca:

import datomisca._
import Datomic._

// don't forget to bring an ExecutionContext in your scope… 
// Here is default Scala ExecutionContext which is a simple pool of threads with one thread per core by default
import scala.concurrent.ExecutionContext.Implicits.global

// creates an URI
val uri = "datomic:mem://mydatomicdn"
// creates implicit connection
implicit val conn = Datomic.connect(uri)

// a few operations
val ops = Seq(addFact, retractFact, addEntity, retractEntity)

val res: Future[R] = Datomic.transact(ops) map { tx : TxReport =>
   // do something
   

   // return a value of type R (anything you want)
   val res: R = 

   res
}

// Another example by building ops directly in the transact call and using flatMap
Datomic.transact(
  Entity.add(id)(
    person / "name"      -> "toto",
    person / "age"       -> 30L,
    person / "character" -> Set(weak.ref, dumb.ref)
  ),
  Entity.add(DId(Partition.USER))(
    person / "name"      -> "tutu",
    person / "age"       -> 54L,
    person / "character" -> Set(violent.ref, clever.ref)
  ),
  Entity.add(DId(Partition.USER))(
    person / "name"      -> "tata",
    person / "age"       -> 23L,
    person / "character" -> Set(weak.ref, clever.ref)
  )
) flatMap { tx =>
  // do something
  
  val res: Future[R] = 

  res
}

Please note tx: TxReport, which is a structure returned by the Datomic transactor containing information about the last transaction.

Datomisca resolving Real ID

In all the samples, we create operations based on temporary IDs built by Datomic in a given partition.

DId(Partition.USER)

But once you have inserted a fact or an entity into Datomic, you need to resolve the real, final ID to use it further, because the temporary ID is no longer meaningful.

The final ID is resolved from the TxReport sent back by the Datomic transactor. The TxReport contains a mapping between temporary IDs and final IDs. Here is how you can use it in Datomisca:

val id1 = DId(Partition.USER)
val id2 = DId(Partition.USER)

Datomic.transact(
  Entity.add(id1)(
    person / "name"      -> "toto",
    person / "age"       -> 30L,
    person / "character" -> Set(weak.ref, dumb.ref)
  ),
  Entity.add(id2)(
    person / "name"      -> "tutu",
    person / "age"       -> 54L,
    person / "character" -> Set(violent.ref, clever.ref)
  )
) map { tx =>
  val finalId1: Long = tx.resolve(id1)
  val finalId2: Long = tx.resolve(id2)
  // or
  val List(finalId1, finalId2) = List(id1, id2) map { tx.resolve(_) }
}

That's all for now… The next article will be about writing programmatic Datomic schemas with Datomisca.

Have Promise[Fun]!

Last week, we launched Datomisca, our open-source Scala API aiming to enhance the Datomic experience for Scala developers.

Datomic is great in Clojure because it was made for it. Yet we believe Scala can provide a very good platform for Datomic too, because the functional concepts found in Clojure are also in Scala, except that Scala is a compiled, statically typed language whereas Clojure is dynamic. Scala could also bring a few features on top of Clojure, such as static typing, typeclasses and macros…

This article is the first of a series aiming to describe, as briefly as possible, specific features provided by Datomisca. Today, let's present how Datomisca enhances Datomic queries.


Query in Datomic?

Let's take the same old example of a Person having:

  • a name String
  • an age Long
  • a birth Date

So how do you write a query in Datomic searching for a person by name? Like this…

(def q [ :find ?e
  :in $ ?name
  :where [ ?e :person/name ?name ]
])

As you can see, this is Clojure using Datalog rules.

In summary, this query:

  • accepts 2 inputs parameters:
    • a datasource $
    • a name parameter ?name
  • searches facts respecting datalog rule [ ?e :person/name ?name ]: a fact having attribute :person/name with value ?name
  • returns the ID of the found facts ?e

Reminders about Datomic queries

Query is a static data structure

An important aspect of Datomic queries to understand is that a query is purely a static data structure, not something functional. We could compare it to a prepared statement in SQL: build it once and reuse it as much as you need.

A query has input/output parameters

In the previous example:

  • :in enumerates input parameters
  • :find enumerates output parameters

When executing this query, you must provide the right number of input parameters and you will retrieve the given number of output parameters.


Query in Datomisca?

So now, how do you write the same query in Datomisca?

val q  = Query("""
[ :find ?e
  :in $ ?name
  :where [ ?e :person/name ?name ] 
]""")

I see you’re a bit disappointed: a query as a string whereas in Clojure, it’s a real data structure…

This is actually the way the Java API sends queries for now. Moreover, using strings like this invites bad practices, such as building queries by concatenating strings, which is a classic source of code-injection risks in SQL for example…

But in Scala we can do a bit better using a new Scala 2.10 feature: macros.

So with Datomisca, when you write this code, the query string is actually parsed by a Scala macro:

  • If there is any error, compilation breaks, showing where the error was detected.
  • If the query seems valid (with respect to our parser), the String is replaced by an AST representing this query as a data structure.
  • The input/output parameters are inferred to determine their number.

Please note that the compiled query is a simple immutable AST which can be manipulated like a Clojure query and reused as many times as you want, as sketched below.
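
For instance, here is a quick sketch of that reuse (Datomic.q and the database value are detailed in the "Execute the query" section below):

// build the compiled query once…
val findByName = Query("""
[ :find ?e
  :in $ ?name
  :where [ ?e :person/name ?name ]
]""")

// …and reuse the same immutable AST with different inputs
val johns = Datomic.q(findByName, database, DString("John"))
val janes = Datomic.q(findByName, database, DString("Jane"))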

Example OK with single output

scala> import datomisca._
import datomisca._

scala> val q  = Query("""
     [ :find ?e
       :in $ ?name
       :where [ ?e :person/name ?name ] 
     ]""")
q: datomisca.TypedQueryAuto2[datomisca.DatomicData,datomisca.DatomicData,datomisca.DatomicData] = [ :find ?e :in $ ?name :where [?e :person/name ?name] ]

Without going into deep detail, you can see here that the compiled version of q isn't a Query[String] but a TypedQueryAuto2[DatomicData, DatomicData, DatomicData], an AST representing the query.

TypedQueryAuto2[DatomicData, DatomicData, DatomicData] means you have:

  • 2 input parameters, $ and ?name, of type DatomicData
  • the last type parameter represents the output parameter ?e of type DatomicData

Note : DatomicData is explained in next paragraph.


Example OK with several outputs

scala> import datomisca._
import datomisca._

scala> val q  = Query("""
     [ :find ?e ?age
       :in $ ?name
       :where [ ?e :person/name ?name ] 
              [ ?e :person/age ?age ]  
     ]""")
q: datomisca.TypedQueryAuto2[datomisca.DatomicData,datomisca.DatomicData,(datomisca.DatomicData, datomisca.DatomicData)] = [ :find ?e ?age :in $ ?name :where [?e :person/name ?name] [?e :person/age ?age] ]

TypedQueryAuto2[DatomicData,DatomicData,(DatomicData, DatomicData)] means you have:

  • 2 input parameters, $ and ?name, of type DatomicData
  • the last, tupled type parameter represents the 2 output parameters ?e and ?age, of type DatomicData

Examples with syntax-error

scala> import datomisca._
import datomisca._

scala> val q  = Query("""
     [ :find ?e
       :in $ ?name
       :where [ ?e :person/name ?name 
     ]""")
<console>:14: error: `]' expected but end of source found
     ]""")
      ^

scala> val q  = Query("""
     [ :find ?e
       :in $ ?name
       :where [ ?e person/name ?name ]
     ]""")
<console>:13: error: `]' expected but `p' found
       :where [ ?e person/name ?name ]
                   ^

Here you can see that the compiler tells you where it detects syntax errors.

The query compiler is not yet complete, so don't hesitate to report any issues you discover.


What’s DatomicData ?

Datomisca completely wraps the Datomic API and types. Datomisca doesn't let any Datomic/Clojure types leak into its domain; it wraps them all in the so-called DatomicData, the abstract parent trait of all Datomic types as seen from Datomisca. For each Datomic type, you have the corresponding specific DatomicData:

  • DString for String
  • DLong for Long
  • DatomicFloat for Float
  • DSet for Set
  • DInstant for Instant

Why not use pure Scala types directly?

Firstly, because the type correspondence between Datomic types and Scala is not exact. The best example is Instant: is it a java.util.Date or a Joda-Time DateTime?

Secondly, we wanted to keep the possibility of converting Datomic types into different Scala types depending on our needs so we have abstracted those types.

This abstraction also isolates us and we can decide exactly how we want to map Datomic types to Scala. The trade-off is naturally that, if new types appear in Datomic, we must wrap them.


Keep in mind that Datomisca queries accept and return DatomicData

All query data used as input and output parameters must be DatomicData. When getting results, you can convert the generic DatomicData into one of the previous specific types (DString, DLong, …).

From DatomicData, you can also convert to pure Scala types based on implicit typeclasses:

DatomicData.as[T](implicit rd: DDReader[DatomicData, T])

scala> DString("toto").as[String]
res0: String = toto

scala> DString("toto").as[Long]
java.lang.ClassCastException: datomisca.DString cannot be cast to datomisca.DLong
...

Note 1: the current Scala query compiler is restricted to the specific domain of Datomic queries and doesn't support all Clojure syntax, which might create a few limitations when calling Clojure functions in queries. Anyway, a full Clojure-syntax Scala compiler is on the TODO list, so these limitations will disappear once it's implemented…


Note 2: the current macro just infers the number of input/output parameters, but using the typed schema attributes that we will present in a future article, we will provide deeper features such as parameter type inference.


Execute the query

You can create queries independently of any connection to Datomic, but you need an implicit DatomicConnection in scope to execute them.

import datomisca._
import Datomic._

// Creates an implicit connection
val uri = "…"
implicit lazy val conn = Datomic.connect(uri)

// Creates the query
val queryFindByName = Query("""
[ :find ?e ?birth
  :in $ ?name
  :where [ ?e :person/name ?name ]
         [ ?e :person/birth ?birth ]        
]""")

// Executes the query     
val results: List[(DatomicData, DatomicData)] = Datomic.q(queryFindByName, database, DString("John"))
// Results type is precised for the example but not required

Please note we made the database input parameter mandatory, even though it is available implicitly when importing Datomic._, because in Clojure it's also required and we wanted to stick to that.

Compile-error if wrong number of inputs

If you don’t provide exactly 2 input parameters, you will get a compile error because the query expects 2 of them.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
// Following would not compile because query expects 2 input parameters
val results: List[(DatomicData, DatomicData)] = Datomic.q(queryFindByName, DString("John"))

[info] Compiling 1 Scala source to /Users/pvo/zenexity/workspaces/workspace_pellucid/datomisca/samples/getting-started/target/scala-2.10/classes...
[error] /Users/pvo/zenexity/workspaces/workspace_pellucid/datomisca/samples/getting-started/src/main/scala/GettingStarted.scala:87: overloaded method value q with alternatives:
[error]   [A, R(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)](query: datomisca.TypedQueryAuto1[A,R(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)], a: A)(implicit db: datomisca.DDatabase, implicit ddwa: datomisca.DD2Writer[A], implicit outConv: datomisca.DatomicDataToArgs[R(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)])List[R(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)] <and>
[error]   [R(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)](query: datomisca.TypedQueryAuto0[R(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)], db: datomisca.DDatabase)(implicit outConv: datomisca.DatomicDataToArgs[R(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)])List[R(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)(in method q)] <and>
[error]   [OutArgs <: datomisca.Args, T](q: datomisca.TypedQueryInOut[datomisca.Args1,OutArgs], d1: datomisca.DatomicData)(implicit db: datomisca.DDatabase, implicit outConv: datomisca.DatomicDataToArgs[OutArgs], implicit ott: datomisca.ArgsToTuple[OutArgs,T])List[T] <and>
[error]   [InArgs <: datomisca.Args](query: datomisca.PureQuery, in: InArgs)(implicit db: datomisca.DDatabase)List[List[datomisca.DatomicData]]
[error]  cannot be applied to (datomisca.TypedQueryAuto2[datomisca.DatomicData,datomisca.DatomicData,(datomisca.DatomicData, datomisca.DatomicData)], datomisca.DString)
[error]         val results = Datomic.q(queryFindByName, DString("John"))
[error]                               ^
[error] one error found
[error] (compile:compile) Compilation failed

The compile error seems a bit long as the compiler tries a few different versions of Datomic.q, but just remember that when you see cannot be applied to (datomisca.TypedQueryAuto2[…, it means you provided the wrong number of input parameters.


Use query results

Query results are List[DatomicData…] values whose shape depends on the output parameters inferred by the Scala macro.

In our case, we have 2 output parameters so we expect a List[(DatomicData, DatomicData)]. Using List.map (or headOption to get the first one only), you can then use pattern matching to specialize your (DatomicData, DatomicData) to (DLong, DInstant) as you expect.

1
2
3
4
5
6
results map {
  case (e: DLong, birth: DInstant) =>
    // converts into Scala types
    val eAsLong = e.as[Long]
    val birthAsDate = birth.as[java.util.Date]
}
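
And if you only care about the first result, here is a minimal sketch reusing the results value from above:

// takes only the first result, if any, and converts it to pure Scala types
val firstBirth = results.headOption map {
  case (e: DLong, birth: DInstant) => (e.as[Long], birth.as[java.util.Date])
}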

Note 1: when you want to convert your DatomicData, you can use our converters based on implicit typeclasses as shown above.

Note 2: based on the query alone, the Scala macro has no way to infer the real types of output parameters, but there is a TODO in the roadmap: using the typed schema attributes presented in a future article, we will certainly be able to do better… Be patient ;)


More complex queries

As Datomisca parses the queries, you may wonder how complete the query parser is for now.

Here are a few examples showing what can be executed already:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
////////////////////////////////////////////////////
// using variable number of inputs
val q = Query("""[
 :find ?e
 :in $ [?names ...] 
 :where [?e :person/name ?names]
]""")

Datomic.q(q, database, DSet(DString("toto"), DString("tata")))

////////////////////////////////////////////////////
// using tuple inputs
val q = Query("""[
  :find ?e ?name ?age
  :in $ [[?name ?age]]
  :where [?e :person/name ?name]
         [?e :person/age ?age]
]""")

Datomic.q(q,
  database,
  DSet(
    DSet(DString("toto"), DLong(30L)),
    DSet(DString("tutu"), DLong(54L))
  )
)

////////////////////////////////////////////////////
// using function such as fulltext search
val q = Query("""[
  :find ?e ?n
  :where [(fulltext $ :person/name "toto") [[ ?e ?n ]]]
]""")

////////////////////////////////////////////////////
// using rules
val totoRule = Query.rules("""
[ [ [toto ?e]
    [?e :person/name "toto"]
] ]
""")

val q = Query("""[
 :find ?e ?age
 :in $ %
 :where [?e :person/age ?age]
        (toto ?e)
]
""")

////////////////////////////////////////////////////
// using a query specifying just the attribute of the fact to be matched
val q = Query("""[:find ?e :where [?e :person/name]]""")

Note that Datomisca currently reserializes queries to strings when executing them because the Java API requires it, but once the Datomic Java API accepts a List[List[Object]] instead of a string for a query, the interaction will be more direct…

The next articles will be about Datomic operations to insert/retract facts or entities in Datomic using Datomisca.

Have datomiscafun!

You can find the code on my Github project Maquereau

Maquereau [Scomber Scombrus]

[/makʀo/] in French phonetics

Since I discovered Scala Macros with Scala 2.10, I’ve been really impressed by their power. But great power means great responsibility as you know. Nevertheless, I don’t care about responsibility as I’m just experimenting. As if mad scientists couldn’t experiment freely!

Besides being a very tasty pelagic fish from scombroid family, Maquereau is my new sandbox project to experiment eccentric ideas with Scala Macros.

Here is my first experiment which aims at studying the concepts of pataphysics applied to Scala Macros.



Pataphysics applied to Macros

Programming is math

I’ve heard people saying that programming is not math.
This is really wrong, programming is math.

And let’s be serious, how would you seek attention at urbane cocktail parties without those cute little words such as functors, monoids, contravariance, monads?

She/He> What do you do?
You> I’m coding a list fold.
She/He> Ah ok, bye.
You> Hey wait…


She/He> What do you do?
You> I’m deconstructing my list with a catamorphism based on a F-algebra as underlying functor.
She/He> Whahhh this is so exciting! Actually you’re really sexy!!!
You> Yes I know, insignificant creature!


Programming is also a bit of Physics

Code is static, whereas your program is launched in a runtime environment which is dynamic, and you must take these dynamic aspects into account in your code too (memory, synchronization, blocking calls, resource consumption…). For the purpose of the demo, let’s accept that programming also implies some concepts of physics when dealing with the dynamic aspects of a program.


Compiling is Programming Metaphysics

Between code and runtime, there is a weird realm, the compiler runtime which is in charge of converting static code to dynamic program:

  • The compiler knows things you can’t imagine.
  • The compiler is aware of the fundamental nature of math & physics of programming.
  • The compiler is beyond these rules of math & physics, it’s metaphysics.

Macro is Pataphysics

Now we have Scala Macros which are able:

  • to intercept the compiling process for a given piece of code
  • to analyze the compiler AST code and do some computation on it
  • to generate another AST and inject it back into the compile-chain

When you write a Macro in your own code, you write code which runs in the compiler runtime. Moreover a macro can go even further by asking for compilation from within the compiler: c.universe.reify{ some code }… Isn’t it great to imagine those recursive compilers?

So a Scala macro knows the fundamental rules of the compiler. Given that the compiler is metaphysics, Scala macros lie beyond metaphysics, and the science studying this topic is called pataphysics.

This science provides very interesting concepts and this article is about applying them to the domain of Scala Macros.

I’ll let you discover pataphysics by yourself on wikipedia


Let’s explore the realm of pataphysics applied to Scala macro development by implementing the great concept of patamorphism, well-known among pataphysicians.



Defining Patamorphism

In 1976, the great pataphysician, Ernst Von Schmurtz defined patamorphism as following:

A patamorphism is a patatoid in the category of endopatafunctors…

Explaining the theory would be too long with lots of weird formulas. Let’s just skip that and go directly to the conclusions.

First of all, we can consider the realm of pataphysics is the context of Scala Macros.


Now, let’s take point by point and see if Scala Macro can implement a patamorphism.

A patamorphism should be callable from outside the realm of pataphysics

A Scala macro is called from your code which is out of the realm of Scala macro.

A patamorphism can’t have effect outside the realm of pataphysics after execution

This means we must implement a Scala Macro that :

  • has effect only at compile-time
  • has NO effect at run-time

From outside the compiler, a patamorphism is an identity morphism that could be translated in Scala as:

1
def pataMorph[T](t: T): T

A patamorphism can change the nature of things while being computed

Even if it has no effect once applied, meanwhile it is computed, it can :

  • have side-effects on anything
  • be blocking
  • be mutable

Nothing prevents a Scala Macro from respecting these points.

A patamorphism is a patatoid

You may know it, but patatoid principles require that the morphism be customisable by a custom abstract seed. In Scala samples, patatoids are generally described as follows:

1
2
3
4
5
6
trait Patatoid {
  // Seed is specific to a Patatoid and is used to configure the sprout mechanism  
  type Seed
  // sprout is the classic name
  def sprout[T](t: T)(seed: Seed): T
}

So a patamorphism could be implemented as :

1
trait PataMorphism extends Patatoid

A custom patamorphism implemented as a Scala Macro would be written as :

1
2
3
4
5
6
7
object MyPataMorphism extends PataMorphism {
  type Seed = MyCustomSeed
  def sprout[T](t: T)(seed: Seed): T = macro sproutImpl

  // here is the corresponding macro implementation
  def sproutImpl[T: c1.WeakTypeTag](c1: Context)(t: c1.Expr[T])(seed: c1.Expr[Seed]): c1.Expr[T] = {  }
}

But with the current Scala Macro API, it is not possible for a Scala Macro to implement or override an abstract function, so we can’t write it like that and we need a little trick. Here is a simple way to do it:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
trait Patatoid{
  // Seed is specific to a Patatoid and is used to configure the sprout mechanism  
  type Seed
  // we put the signature of the macro implementation in the abstract trait
  def sproutMacro[T: c1.WeakTypeTag](c1: Context)(t: c1.Expr[T])(seed: c1.Expr[Seed]): c1.Expr[T]
}

/**
  * PataMorphism 
  */
trait PataMorphism extends Patatoid

// Custom patamorphism
object MyPataMorphism extends PataMorphism {
  type Seed = MyCustomSeed
  // the real sprout function expected for patatoid
  def sprout[T](t: T)(implicit seed: Seed): T = macro sproutMacro[T]

  // the real implementation of the macro and of the patatoid abstract operation
  def sproutMacro[T: c1.WeakTypeTag](c1: Context)(t: c1.Expr[T])(seed: c1.Expr[Seed]): c1.Expr[T] = {
    
    // Your implementation here
    
  }
}

Conclusion

We have shown that we could implement a patamorphism using a Scala Macro.

But the most important is the implementation of the macro which shall:

  • have effect only at compile-time (with potential side-effect, sync, blocking)
  • have NO effect at runtime

Please note that pataphysics is the science of exceptions, so all previous rules are true as long as there are no exceptions to them.

Let’s implement a 1st sample of patamorphism called VerySeriousCompiler.



Sample #1: VerySeriousCompiler

What is it?

VerySeriousCompiler is a pure patamorphism which lets you change the compiler behavior by:

  • Choosing how long you want the compilation to last
  • Displaying great messages at a given speed while compiling
  • Returning the exact same code tree given in input

VerySeriousCompiler is an identity morphism returning your exact code without leaving any trace in AST after macro execution.

VerySeriousCompiler is implemented exactly using previous patamorphic pattern and the compiling phase can be configured using custom Seed:

1
2
3
4
5
6
/** Seed builder 
  * @param duration the duration of compiling in ms
  * @param speed the speed between each message display in ms
  * @param messages the messages to display
  */
def seed(duration: Long, speed: Long, messages: Seq[String])

When to use it?

VerySeriousCompiler is a useful tool when you want to have a coffee or discuss quietly at work with colleagues and fool your boss making him/her believe you’re waiting for the end of a very long compiling process.

To use it, you just have to modify your code using :

1
2
3
4
5
6
7
8
9
VerySeriousCompiler.sprout{
  some code
}

//or even 

val xxx = VerySeriousCompiler.sprout{
  some code returning something
}

Then you launch the compilation for the duration you want, displaying meaningful messages in case your boss looks at your screen. And if your boss is not happy about your long pause, you have an excuse, just tell him/her: “Look, it’s compiling”.

Remember that this PataMorphism doesn’t pollute your code at runtime at all, it has only effects at compile-time and doesn’t inject any other code in the AST.



Usage

With default seed (5s compiling with msgs each 400ms)

1
2
3
4
5
6
7
import VerySeriousCompiler._

// Create a class for ex
case class Toto(name: String)

// using default seed
sprout(Toto("toto")) must beEqualTo(Toto("toto"))

If you compile:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
[info] Compiling 1 Scala source to /workspace_mandubian/maquereau/target/scala-2.11/classes...
[info] Compiling 1 Scala source to /workspace_mandubian/maquereau/target/scala-2.11/test-classes...
Finding ring kernel that rules them all...................
computing fast fourier transform code optimization....................
asking why Obiwan Kenobi...................
resolving implicit typeclass from scope....................
constructing costate comonad....................
Do you like gladiator movies?....................
generating language systemic metafunction....................
verifying isomorphic behavior....................
inflating into applicative functor...................
verifying isomorphic behavior...................
invoking Nyarlathotep to prevent crawling chaos....................
Hear me carefully, your eyelids are very heavy, you're a koalaaaaa....................
resolving implicit typeclass from scope...................
[info] PataMorphismSpec
[info] 
[info] VerySeriousCompiler should
[info] + sprout with default seed
[info] Total for specification PataMorphismSpec
[info] Finished in xx ms
[info] 1 example, 0 failure, 0 error
[info] 
[info] Passed: : Total 1, Failed 0, Errors 0, Passed 1, Skipped 0
[success] Total time: xx s, completed 3 févr. 2013 01:25:42

With custom seed (1s compiling with msgs each 200ms)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
// using custom seed
sprout{
  val a = "this is"
  val b = "some code"
  val c = 123L
  s"msg $a $b $c"
}(
  VerySeriousCompiler.seed(
    1000L,     // duration of compiling in ms
    200L,       // speed between each message display in ms
    Seq(        // the message to display randomly 
      "very interesting message",
      "cool message"
    )
  )
) must beEqualTo( "msg this is some code 123" )

If you compile:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
[info] Compiling 1 Scala source to /workspace_mandubian/maquereau/target/scala-2.11/classes...
[info] Compiling 1 Scala source to /workspace_mandubian/maquereau/target/scala-2.11/test-classes...
toto..........
coucou..........
toto..........
coucou..........
[info] PataMorphismSpec
[info] 
[info] VerySeriousCompiler should
[info] + sprout with custom seed
[info] Total for specification PataMorphismSpec
[info] Finished in xx ms
[info] 1 example, 0 failure, 0 error
[info] 
[info] Passed: : Total 1, Failed 0, Errors 0, Passed 1, Skipped 0
[success] Total time: xx s, completed 3 févr. 2013 01:25:42


Macro implementation details

The code can be found on Github Maquereau.

Here are the interesting points of the macro implementation.

Modular macro building

We use the method described on scalamacros.org/writing bigger macros

1
2
3
4
5
6
7
8
9
abstract class VerySeriousCompilerHelper {
  val c: Context
  
}

def sproutMacro[T: c1.WeakTypeTag](c1: Context)(t: c1.Expr[T])(seed: c1.Expr[Seed]): c1.Expr[T] = {
  val helper = new { val c: c1.type = c1 } with VerySeriousCompilerHelper
  helper.sprout(t, seed)
}

Input code evaluation in the macro

The Seed passed to the macro doesn’t belong to the realm of Scala Macro but to your code. In the macro, we don’t get the Seed type but the expression Expr[Seed]. So in order to use the seed value in the macro, we must evaluate the expression passed to the macro:

1
val speed = c.eval(c.Expr[Long](c.resetAllAttrs(speedTree.duplicate)))

Please note that this code is a work-around because in Scala 2.10, you can’t evaluate any code as you want due to some compiler limitations when evaluating an already typechecked tree in a macro. This is explained in this Scala issue


Input code re-compiling before returning from the macro

We don’t return the input tree directly from the macro even though it would be valid with respect to the patamorphism contract. But to test macros a bit further, I decided to “re-compile” the input code from within the macro. You can do that using the following code:

1
reify(a.splice)

Using macro paradise

The maquereau project is based on Macro Paradise which is the experimental branch of Scala Macros. This implementation of patamorphism doesn’t use any experimental feature from Macro Paradise but future samples will certainly.



Conclusion

This article shows that applying the concepts of pataphysics to Scala Macros is possible and can help create really useful tools.

The sample is still a bit basic but it shows that a patamorphism can be implemented relatively simply.

I’ll create other samples and hope you’ll like them!

Have patafun!

Today, let’s talk a bit more about the JSON coast-to-coast design I had introduced as a buzz word in a previous article about Play2.1 Json combinators.

Sorry this is a very long article with tens of >140-chars strings: I wanted to put a few ideas on paper to present my approach before writing any line of code…

Original idea

1) Manipulate pure data structure (JSON) from client to DB without any static model
2) Focus on the idea of pure data manipulation and data-flow management

The underlying idea is more global than just questioning the fact that we generally serialize JSON data to static OO models. I really want to discuss how we manipulate data between frontend/backend(s). I’d also like to reconsider the whole data flow from a pure functional point of view and not through the lens of technical constraints such as “OO languages implies static OO models”.

In the code sample, we’ll use ReactiveMongo and Play2.1-RC2 and explain why those frameworks fit our demonstration.



Philosophical Considerations

Data-centric approach

Recent evolutions in backend architecture tend to push (again) UI to the frontend (the client side). As a consequence, backends concentrate more and more on data serving, manipulation, transformation, aggregation, distribution and naturally some business logic. I must admit that I’ve always considered backends to be very good data providers and not so good UI providers. So I won’t say it’s a bad thing.

From this point of view, the backend focuses on:

  • getting/serving data from/to clients
  • retrieving/distributing data from/to other backends
  • storing data in DB/cache/files/whatever locally/remotely
  • doing some business logic (outside data manipulation)… sometimes but not so often.

I consider data management to be the raison d’être of a backend and also its end goal.

The word “data” is everywhere and that’s why I use the term “data-centric approach” (even if I would prefer that we speak about “information” more than data but this is another discussion…)



Backend system is a chain-link in global data-flow

Data-centric doesn’t mean centralized data

  • With Internet and global mobility, data tend to be gathered, re-distributed, scattered and re-shared logically and geographically
  • When you develop your server, you can receive data from many different sources and you can exchange data with many actors.

In this context, the backend is often just a chain link participating in a whole data flow. So you must consider the relations with the other actors of this system.

Data flow

Besides the simple “what does my backend receive, transmit or answer?”, it has become more important to consider the relative role of the backend in the whole data flow and not only locally. Basically, knowing your exact role in the data flow really impacts your technical design:

  • if your server must aggregate data from one backend which can respond immediately and another one which will respond tomorrow, do you use a runtime stateful framework without persistence for that?
  • if your server is just there to validate data before sending it to a DB for storage, do you need to build a full generic & static model for that?
  • if your server is responsible for streaming data in realtime to hundreds of clients, do you want to use a blocking framework ?
  • if your server takes part to a high-speed realtime transactional system, is it reasonable to choose ultra heavyweight frameworks ?


Rise of temporal & polymorphic data flows

In the past, we often used to model our data to be used by a single backend or a restricted system. Data models didn’t evolve much for years. So you could choose a very strict data model, using normalization in an RDBMS for example.

But for a few years now, the nature of data and their usage has changed a lot:

  • same data are shared with lots of different solutions,
  • same data are used in very different application domains,
  • unstructured data storage has increased
  • formats of data evolve much faster than before
  • global quantity of data has increased exponentially

The temporal nature of data has changed drastically also with:

  • realtime data streaming
  • on the fly distributed data updates
  • very long-time persistence
  • immutable data keeping all updates without losing anything

Nothing tremendous so far, is it?

This is exactly what you already know or do every day…

I wanted to point out that a backend system is often only an element of a more global system and a chain link in a more global data flow.


Now let’s try to consider the data flow through a backend system taking all those modern aspects into account!


In the rest of this article, I’ll focus on a specific domain: the very classic case of backend-to-DB interaction.


For the last 10 years, in all widespread enterprise platforms based on OO languages, we have all used those well-known ORM frameworks to manage our data from/to RDBMS. We have discovered their advantages and also their drawbacks. Let’s consider a bit how those ORMs have changed our way of manipulating data.



The ORM prism of distortion

After coding with a few different backend platforms, I tend to think we have been kind-of mesmerized into thinking it’s non-sense to talk to a DB from an OO language without going through a static language structure such as a class. ORM frameworks are the best witnesses of this tendency.

ORM frameworks lead us to:

  • Get some data (from client for ex) in a pure data format such as JSON (this format will be used in all samples because it’s very representative)
  • Convert JSON to OO structure such as class instance
  • Transmit class instance to ORM framework that translates/transmits it to the DB mystery limbo.

All-Model Approach

Pros

Classes are OO natural structure

Classes are the native structures in OO languages so it seems quite natural to use them

Classes imply structural validations

Conversion into classes also implies type constraint validations (and more specific constraints with annotations or config) in order to verify data do not corrupt the DB

Boundaries isolation

By performing the conversion in OO, the client format is completely decoupled from the DB format, making them separate layers. Moreover, by manipulating OO structures in code, the DB can be abstracted almost completely and one can imagine changing the DB later. This seems like good practice in theory.

Business logic compliant

Once converted, class instances can be manipulated in business logic without taking care about the DB.


Cons

Requirement for Business Logic is not the most frequent case

In many cases (CRUD being the 1st of all), once you get the class instance, business logic is simply non-existent. You just pass the class to the ORM, that’s all. So you serialize to a class instance only to be able to speak to the ORM, which is a pity.

ORM forces to speak OO because they can’t speak anything else

In many cases, the only needed thing is data validation with precise (and potentially complex) constraints. A class is just a type validator but it doesn’t validate anything else. If the String should be an email, your class sees it as a String. So, frameworks have provided constraint validators based on annotations or external configurations.

Classes are not made to manipulate/validate dynamic forms of data

Classes are static structures which can’t evolve so easily, because the whole code depends on those classes and modifying a model class can imply lots of code refactoring. If the data format is not clear and can evolve, classes are not a very good fit. If data are polymorphic and can be seen through different views, it generally ends up multiplying the number of classes, making your code hard to maintain.

Hiding DB more than abstracting it

The ORM approach is just a pure OO view of relational data. It states that outside OO, nothing exists and no data can be modelled with anything other than an OO structure.
So ORMs haven’t tried bringing DB structures to the OO language but have kind-of pushed OO to the DB, abstracting the DB data structure almost completely. So you don’t manipulate DB data anymore but OO structures “mimicking” the DB structures more or less.

Was OO really such a good choice over the relational approach???

It seemed a good idea to map DB data to OO structures. But we all know the problems brought by ORM to our solutions:

  • How DB structures are mapped to OO?
  • How relations are managed by ORM?
  • How updates are managed in time? (the famous cache issues)
  • How transactions are delimited in a OO world?
  • How full compatibility between all RDBMS can be ensured?
  • etc…

I think ORM just moved the problems:

Before ORM, you had problems of SQL
After ORM, you had problems of ORM

Now the difference is that issues appear in the abstraction layer (i.e. the ORM), which you don’t control at all, and no longer in the lower-level DB layer. SQL is a bit painful sometimes but it is the DB’s native language, so when you have an error, it’s generally far easier to find why and how to work around it.

My aim here is not to say ORMs are bad (and they aren’t so bad in a few cases).
I just want to point out the OO deviation introduced by ORMs in our way of modelling our data.
I’ll let you discover the subject by yourself and make up your own mind, and you don’t have to agree with me. As a starting point, you can go to wikipedia there.



The All-Model world

What interests me more is the fact that ORMs brought a very systematic way of manipulating the data flowing through backend systems. ORM dictates that whatever data you manipulate, you must convert it to an OO structure before doing anything else, for more or less good reasons. OO structures are very useful when you absolutely want to manipulate statically typed structures. But in other cases, isn’t it better to use a List, a Map or a more versatile pure data structure such as a Json tree?

Let’s call it the All-Model Approach : no data can be manipulated without a static model and underlying abstraction/serialization mechanism.

The first move into the All-Model direction was quite logical in reaction to the difficulty of SQL integration with OO languages. Hibernate idea, for ex, was to abstract completely the relational DB models with OO structures so that people don’t care anymore with SQL.

As we all know, in the software industry, when an idea becomes a de facto standard, as ORMs did, balance is rarely reached between the 2 positions. As a consequence, lots of people decided to completely trash SQL in favor of ORM. That’s why we have seen this litany of ORM frameworks around Hibernate, JPA, Toplink, and almost nobody could escape from this global move.



Living in a changing world

After a few years of suffering more or less with ORMs, some people began to reconsider this position, seeing the difficulty they had using them. The real change of mind was linked to the evolution of the whole data ecosystem: the internet, distribution of data and mobility of clients too.

NoSQL emergence

First, the move concerned the underlying layer: the DB.
RDBMS are really good at modelling very consistent data and providing robust transactions, but not so good at managing high scalability, data distribution, massive updates, data streaming and very huge amounts of data. That’s why we have seen these NoSQL new kids on the block, initiated mainly by Internet companies.

Once again, the balance was upset: after the “SQL is evil” movement, there was a funny “RDBMS is evil” movement. Extremist positions are not worth much globally; what’s interesting is the result of the NoSQL initiative. It allowed us to reconsider the way we modelled our data: a 100% normalized schema with full ACID transactions was no longer the only possibility. NoSQL broke the rules: why not model your data as key/values, documents or graphs, using redundancy, without full consistency, if it fits your needs better?

I really think NoSQL breaking the holy RDBMS rule brought the important subject to the front stage: we care about the data and the way we manipulate them. We don’t need a single DB ruling them all but DBs that answer our requirements and not the other way around… Data-Centric as I said before…

Why ORM again for NoSQL?

NoSQL DBs bring their own native API in general providing data representation fitting their own DB model. For ex, MongoDB is document oriented and a record is stored as a binary treemap.

But once again, we have seen ORM-like APIs appear on top of those low-level APIs, as if we couldn’t think in terms of pure data anymore, only in terms of OO structures.

But the holy rule had been broken and the next change would necessarily target ORMs. So people rediscovered that SQL could be used from modern languages (even OO ones) using simple mapping mechanisms, simpler data structures (tuples, lists, maps, trees) and query facilities. Microsoft LINQ was a really interesting initiative… Modern languages such as Scala also bring interesting APIs based on the functional power of the language (cf Slick, Squeryl, Anorm etc…).

I know some people will say that replacing class models with HashMaps makes the code harder to maintain and that the lack of contract imposed by statically typed classes results in a code mess. I could answer that I’ve seen exactly the same in projects having tens of classes to model all data views, and it was also a mess impossible to maintain.

The question is not about forgetting static models but about using them only when required, and keeping simple, dynamic structures as much as possible.
ORMs are still widely used but we can at least openly question their usefulness. Dictatorship is over and diversity is always better!

Layered genericity as a talisman

I want to question another fact in the way we model data:

  • we write generic OO model to have very strong static model and be independent of the DB.
  • we interact with those models using DAO providing all sorts of access functions.
  • we encapsulate all of that in abstract DB service layers to completely isolate from the DB.

Why?
“Maybe I’ll change the DB and I’ll be able to do it…”
“Maybe I’ll need to re-use those DAO somewhere else…”
“Maybe Maybe Maybe…”

It works almost like superstition, as if making everything generic with strictly isolated layers were the only way to protect us against failure and would make it work and be reused forever…

Do you change DB very often without re-considering the whole model to fit DB specific features?
Do you reuse code so often without re-considering the whole design?

Once again, I don’t say layered design and boundary isolation are bad. I just say they have a cost and consequences that we don’t really consider anymore.

By trying to hide the DB completely, we don’t use the real power that DB can provide to us and we forget their specific capacities. There are so many DB types (sql, nosql, key/value, transactional, graph, column etc…) on the market and choosing the right one according to your data requirements is really important…

DB diversity gives us more control over our data, so why hide them behind overly generic layers?



The Data-Centric or No-Model approach

Let’s go back to our data-centric approach and try to manipulate data flow going through our backend system to the DB without OO (de)serialization in the majority of cases.

What I really need when I manipulate the data flow is:

  • being able to manipulate data directly
  • validating the data structure and format according to different constraints
  • transforming/aggregating data coming from different sources

I call it the Data-centric or No-Model approach. It doesn’t mean my data aren’t structured but that I manipulate the data as directly as possible without going through an OO model when I don’t need it.

No-Model Approach



Should I trash the all-model approach?

Answer : NO… You must find the right balance.

As explained before, using the same design for everything seems a good idea because homogeneity and standardization is a good principle in general.

But “in general” is not “always”, and we often confuse homogeneity with uniformity in its bad sense, i.e. loss of diversity.

That’s why I prefer speaking about a “Data-Centric approach” rather than a “No-Model” one: the important thing is to weigh your requirements with respect to your data flow and to choose the right tool:

  • If you need to perform business logic with your data, it’s often better to work with static OO structures so using a model might be better
  • If you just need to validate and transform your data, then why go through a model which is completely artificial?
  • If you just need to manipulate a real-time data flow, then manipulate the flow directly and forget models.

Now let’s stop philosophizing and get practical with a very basic sample to begin with: let’s manipulate a flow of JSON data in the very simple case of CRUD.
Hey, this is the famous “Json coast-to-coast” approach ;)



Json Coast-to-Coast sample

To illustrate this data-centric approach manipulating a pure data flow without OO serialization, let’s focus on a pure CRUD sample based on JSON. I won’t speak about the client side to make it shorter but don’t forget the JSON data flow doesn’t begin or end at backend boundaries.

I also don’t focus on real-time flows here because this is worth another discussion. Play2.1 provides us with one of the best platforms for real-time web applications. First get accustomed to data-centric design and then consider real-time data management…

The CRUD case is a very good one for our subject:

  • CRUD implies no backend business logic at all
    The backend receives data corresponding to an entity, validates its format and directly transmits the data to the DB to be stored.

  • CRUD targets pure data resources and JSON is good to represent pure data in Web world.

  • CRUD is compliant to REST approach
    REST is very interesting because it implies that every resource is reachable through a single URL and a HTTP verb from the Web. This is also another discussion about how we access or link data…



Thinking data flow in terms of Input/Output

The CRUD sample is not really the right example to consider the impact of relative place in data flow on design. In CRUD, there are no temporal or dynamic requirements. But let’s stay simple for a beginning.

As there is no business logic in the CRUD case, we can focus on the backend boundaries:

  • Backend/Client
  • Backend/DB

We can just consider the data received at previous boundaries:

  • What Input data is received from client or DB?
  • What Output data should be sent to DB or client?

In summary, we can really consider the data flow just in terms of inputs/outputs:

  • Backend/Client input/output
  • Backend/DB input/output

No-Model Approach



Why ReactiveMongo enables Json flow manipulation?

MongoDB provides a very versatile document-oriented data storage platform. MongoDB is not meant to model relational data but data structured as trees. So when you retrieve a document from Mongo, you also get all related data at once. In Mongo, a normalized model is not really the main target and redundancy is not necessarily bad as long as you know what you do.
Mongo documents are stored using the BSON (Binary JSON) format, which is simply inspired by JSON and optimized for binary storage.

Imagine you could send JSON (after validation) directly to Mongo or get it back from Mongo without going through any superficial structure, wouldn’t it be really practical?
Please note that serializing a JSON tree to a case class and then from the case class to a BSON document is just useless if you don’t have any business logic to fulfill with the case class.

Play2.1 provides a very good JSON transformation API from/to any data structure. Converting JSON to BSON is not really an issue. But now remember that we also want to be able to manage realtime data flows, to stream data from or to the DB (using Mongo capped collections for example). Play2.x has been designed to be fully asynchronous/non-blocking, but unfortunately the default Java Mongo driver and its Scala counterpart (Casbah), despite their qualities, provide synchronous and blocking APIs.

But we are lucky since Stephane Godbillon decided to develop ReactiveMongo, a fully async/non-blocking Scala driver based on Akka and Play Iteratees (but independent of the Play framework), and we worked together on a Play2.1/ReactiveMongo module providing all the tooling we need for JSON/BSON conversions in the context of Play2.1/Scala.

With Play/ReactiveMongo, you can simply send/receive JSON to/from Mongo and it’s transparently translated into BSON and vice versa.



Sample Data format

Let’s try to manipulate a flow of data containing the following representation of a Person in JSON (or BSON)…

1
2
3
4
5
6
7
8
9
10
11
12
{
  _id: PERSON_ID,
  name: "Mike Dirolf",
  pw: "Some Hashed Password",
  addresses: ["mike@corp.fiesta.cc", "mike@dirolf.com", ...],
  memberships: [{
    address: "mike@corp.fiesta.cc",
    group_name: "family",
    group: GROUP_ID
  }, ...],
  created: 123456789
}

A person consists of:

  • a unique technical ID
  • a name
  • a hashed password which shouldn’t be transmitted outside anyway
  • zero or more email addresses
  • zero or more group memberships
  • a creation date

A group membership consists of:

  • the group email
  • the group name
  • the group ID


Data flow description

We will consider the following 4 CRUD actions:

  • Create
  • Get
  • Delete
  • Full/Restricted Update (Full document at once or a part of it)

CREATE

1
PUT http://xxx/person/ID

Input

The backend receives the whole Person minus _id and created, which are not yet known until insertion into the DB.

1
2
3
4
5
6
7
8
9
10
{
  name: "Mike Dirolf",
  pw: "password",
  addresses: ["mike@corp.fiesta.cc", "mike@dirolf.com", ...],
  memberships: [{
    address: "mike@corp.fiesta.cc",
    group_name: "family",
    group: GROUP_ID
  }, ...]
}

Output

Backend sends the generated ID in a JSON object for ex.

1
2
3
{
  _id: "123456789123456789"
}


GET

1
GET http://xxx/person/ID

Input

Backend just receives the ID in the URL:

1
http://xxx/person/ID

Output

The whole person plus ID is sent back in JSON but for the demo, let’s remove a few fields we don’t want to send back:

  • _id which is not needed as the client knows it
  • pw because this is a password even if hashed and we want it to stay in the server
1
2
3
4
5
6
7
8
9
10
{
  name: "Mike Dirolf",
  addresses: ["mike@corp.fiesta.cc", "mike@dirolf.com", ...],
  memberships: [{
    address: "mike@corp.fiesta.cc",
    group_name: "family",
    group: GROUP_ID
  }, …],
  created: 123456789
}


DELETE

1
DELETE http://xxx/person/ID

Input

Backend just receives the ID in the URL:

1
http://xxx/ID

Output

Nothing very interesting. Use a “200 OK” to stay simple



Full UPDATE

Macro update is meant to update a whole person document.

1
POST http://xxx/person/ID

Input

Backend just receives the ID in the URL:

1
http://xxx/ID

Updated person document is in the Post body:

1
2
3
4
5
6
7
8
9
10
{
  name: "Mike Dirolf",
  pw: "new_password",
  addresses: ["mike@corp.fiesta.cc", "mike@dirolf.com", ...],
  memberships: [{
    address: "mike@corp.fiesta.cc",
    group_name: "family",
    group: GROUP_ID
  }, ...]
}

Output

Nothing very interesting. Use a “200 OK” to stay simple



Restricted UPDATE

Restricted update is meant to update just a part of a person document.

1
POST http://xxx/person/ID/spec

Input

Backend just receives the ID in the URL:

1
http://xxx/ID/spec

Updated person document is in the Post body:

1
2
3
4
5
6
7
8
{
  addresses: ["mike@corp.fiesta.cc", "mike@dirolf.com", ...],
  memberships: [{
    address: "mike@corp.fiesta.cc",
    group_name: "family",
    group: GROUP_ID
  }, ...]
}

or

1
2
3
4
5
6
7
{
  memberships: [{
    address: "mike@corp.fiesta.cc",
    group_name: "family",
    group: GROUP_ID
  }, ...]
}

or

1
2
3
{
  addresses: ["mike@corp.fiesta.cc", "mike@dirolf.com", ...],
}

Output

Nothing very interesting. Use a “200 OK” to stay simple



Backend/Client boundary

Now that we know input/output data on our boundaries, we can describe how to validate these data and transform them within our backend system.

Input data from client (CREATE/ UPDATE)

Full person validation

When receiving JSON from client for Create and Update actions, we must be able to validate the Person structure without ID which will be generated at insertion:

1
2
3
4
5
6
7
8
9
10
{
  name: "Mike Dirolf",
  pw: "password",
  addresses: ["mike@corp.fiesta.cc", "mike@dirolf.com", ...],
  memberships: [{
    address: "mike@corp.fiesta.cc",
    group_name: "family",
    group: GROUP_ID
  }, ...]
}

Using Play2.1 JSON transformers (see my other article about them), you would validate this structure as follows:

1
2
3
4
5
6
7
/** Full Person validator */
val validatePerson: Reads[JsObject] = (
  (__ \ 'name).json.pickBranch and
  (__ \ 'pw).json.pickBranch and
  (__ \ 'addresses).json.copyFrom(addressesOrEmptyArray) and
  (__ \ 'memberships).json.copyFrom(membershipsOrEmptyArray)
).reduce
addressesOrEmptyArray

It’s a transformer validating an array of email strings and if not found it returns an empty array.
Here is how you can write this:

1
2
3
4
5
/** Addresses validators */
// if array is not empty, it validates each element as an email string
val validateAddresses = Reads.verifyingIf( (arr: JsArray) => !arr.value.isEmpty )( Reads.list[String](Reads.email) )
// extracts "addresses" field or returns an empty array and then validates all addresses
val addressesOrEmptyArray = ((__ \ 'addresses).json.pick[JsArray] orElse Reads.pure(Json.arr())) andThen validateAddresses
membershipsOrEmptyArray

It is a transformer validating an array of memberships and if not found it returns an empty array.
First, let’s write a Membership validator searching for address which must be an email, group_name and a group_id.

1
2
3
4
5
val membership = (
  (__ \ 'address).json.pickBranch( Reads.of[JsString] keepAnd Reads.email ) and
  (__ \ 'group_name).json.pickBranch and
  (__ \ 'group).json.pickBranch
).reduce  // reduce merges all branches in a single JsObject

Now, use it to validate the membership list.

1
2
3
4
// if array is not empty, it validates each element as a membership
val validateMemberships = Reads.verifyingIf( (arr: JsArray) => !arr.value.isEmpty )( Reads.list(membership) )
// extracts "memberships" field or returns an empty array and then validates all memberships
val membershipsOrEmptyArray = ((__ \ 'memberships).json.pick[JsArray] orElse Reads.pure(Json.arr())) andThen validateMemberships
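
As a quick sanity check, here is a minimal sketch (with made-up values) of what running the full validator looks like:

// a sample person JSON (values are made up)
val json = Json.obj(
  "name" -> "Mike Dirolf",
  "pw"   -> "password",
  "addresses"   -> Json.arr("mike@corp.fiesta.cc"),
  "memberships" -> Json.arr(
    Json.obj(
      "address"    -> "mike@corp.fiesta.cc",
      "group_name" -> "family",
      "group"      -> "GROUP_ID"
    )
  )
)

// returns a JsResult[JsObject]: here, a JsSuccess wrapping the validated JsObject
json.transform(validatePerson)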

Restricted person validation

For restricted update, the client sends just the part that should be updated in the document and not all the document. Yet the validator must accept only authorized fields.

Here is how you can write it:

1
2
3
4
5
6
7
8
9
10
11
/** Person validator for restricted update */
// creates an empty JsObject whatever Json is provided
val emptyObj = __.json.put(Json.obj())

// for each field, if not found, it simply writes an empty JsObject
val validatePerson4RestrictedUpdate: Reads[JsObject] = (
  ((__ \ 'name).json.pickBranch or emptyObj) and
  ((__ \ 'pw).json.pickBranch or emptyObj) and
  ((__ \ 'addresses).json.copyFrom(addresses) or emptyObj) and
  ((__ \ 'memberships).json.copyFrom(memberships) or emptyObj)
).reduce // merges all results
addresses

This is the same as addressesOrEmptyArray but it doesn’t return an empty array if addresses are not found.

1
val addresses = (__ \ 'addresses).json.pick[JsArray] andThen validateAddresses
memberships

This is the same as membershipsOrEmptyArray but it doesn’t return an empty array if memberships are not found.

1
val memberships = (__ \ 'memberships).json.pick[JsArray] andThen validateMemberships

Output data to Client (GET/DELETE)

When a person document is retrieved from the DB, it is the whole document and you may need to transform it before sending it to the output. In our case, let’s modify it as follows:

  • prune the password (even if hashed)
  • prune the _id (because client already knows it if it requested it)

This can be done with the following JSON transformer:

1
2
3
4
5
6
/** prunes _id 
  * and then prunes pw
  */
val outputPerson =
  (__ \ '_id).json.prune andThen
  (__ \ 'pw).json.prune

Please note we don’t write it as follows:

1
2
3
4
val outputPerson = (
  (__ \ '_id).json.prune and
  (__ \ 'pw).json.prune
).reduce

Why? Because reduce merges results of both Reads[JsObject] so:

  • (__ \ '_id).json.prune removes _id field but keeps pw
  • (__ \ 'pw).json.prune removes pw field but keeps _id

When reduce merges both results, it returns a Json with both _id and pw, which is not exactly what we expect.
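
To make the difference concrete, here is a minimal sketch with a made-up document (the results shown in comments are indicative):

val person = Json.obj("_id" -> "123", "name" -> "Mike", "pw" -> "secret")

// andThen chains the prunes: _id is removed, then pw is removed from the result
person.transform((__ \ '_id).json.prune andThen (__ \ 'pw).json.prune)
// ~> JsSuccess of {"name":"Mike"}

// reduce merges two results that each still contain the field the other removed
person.transform(((__ \ '_id).json.prune and (__ \ 'pw).json.prune).reduce)
// ~> JsSuccess of a JsObject still containing both _id and pw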



Backend/MongoDB boundary

Output to MongoDB

Now we can validate a received JSON as a Person structure.
But we need to write it to Mongo and Mongo has a few specificities.

ID in Mongo is a BsonObjectID

Instead of waiting for Mongo to generate the ID, you can generate it using ReactiveMongo API BSONObjectID.generate before inserting it into Mongo.
So before sending JSON to Mongo, let’s add field "_id" : "GENERATED_ID" to validated JSON.
Here is the JSON transformer generating an ID:

1
val generateId = (__ \ '_id).json.put( BSONObjectID.generate.stringify ) // this generates a new ID and adds it to your JSON

BsonObjectID using JSON extended notation

In JSON, BsonObjectID is represented as a String but to inform Mongo that it’s an ID and not a simple String, we use the following extended JSON notation:

1
2
3
4
5
6
7
8
9
{
  "_id" : "123456789123456789"
}
// becomes
{
  "_id" : {
    "$oid" : "123456789123456789"
  }
}

Here is the JSON transformer to generate an ID using extended JSON notation:

1
val generateId = (__ \ '_id \ '$oid).json.put( BSONObjectID.generate.stringify )

Date Extended JSON

created field is a Date represented as a JsNumber (a long) in JSON. When passing it to Mongo, we use the following extended JSON notation:

1
2
3
4
5
6
7
8
9
{
  "creation" : 123456789123456789
}
// becomes
{
  "creation" : {
     "$date" : 123456789123456789
  }
}

Here is the final JSON transformer to generate a date using extended JSON notation:

1
val generateCreated = (__ \ 'created \ '$date).json.put( new java.util.Date )

Input from Mongo

As explained, using Play/ReactiveMongo, you don’t have to care about BSON because it deals with BSON/JSON conversion behind the curtain.
We could transform data received from Mongo in case we don’t really trust them.
But in my case, I trust Mongo as all inserted data are mine, so there is no need to transform the input data from Mongo.

We just need to remove all JSON extended notation for _id or created when sending to the output.
The _id is pruned so no need to convert it. So we just have to convert Json extended notation for created field. Here is the transformer:

1
2
// update duplicates full JSON and replaces "created" field by removing "$date" level
val fromCreated = __.json.update((__ \ 'created).json.copyFrom( (__ \ 'created \ '$date).json.pick ))


Play2.1 controller as pipe plug

Now, we can:

  • validate input JSON received from client,
  • transform into Mongo format
  • transform from Mongo format to output

Let’s plug the pipes all together:

  • client inputs to Mongo outputs
  • Mongo inputs to client outputs

Play controller is the place to do that and we can write one action per REST action.

Please note that the whole code is in a single Controller in the sample to make it compact. But good practice would be to put the transformers outside the controller so they can be shared between controllers.

In the following samples, please notice the way we compose all Json transformers described previously as if we were piping them.

Insert Person

When a Person document is created, there are 2 steps:

  • validate the JSON using validatePerson
  • transform JSON to fit Mongo format by:
    • adding a generated BSONObjectID field using generateId
    • adding a generated created date field using generateCreated

Here is the JSON transformer to transform into Mongo format:

1
2
/** Updates Json by adding both ID and date */
val addMongoIdAndDate: Reads[JsObject] = __.json.update( (generateId and generateCreated).reduce )

Finally the insert action could be coded as:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
def insertPerson = Action(parse.json){ request =>
  request.body.transform(validatePerson andThen addMongoIdAndDate).map{ jsobj =>
    Async{
      persons.insert(jsobj).map{ p =>
        // removes extended JSON to output the generated _id
        Ok( resOK(jsobj.transform(fromObjectId).get) )
      }.recover{ case e =>
        InternalServerError( resKO(JsString("exception %s".format(e.getMessage))) )
      }
    }
  }.recoverTotal{ err =>
    BadRequest( resKO(JsError.toFlatJson(err)) )
  }
}

resOK and resKO are just functions building a JSON result with a response status. Have a look at the code for more info.
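
They are not defined in this article; as an illustration only, here is a hypothetical sketch of what they could look like (check the Github code for the real ones):

// hypothetical helpers: wrap a JSON payload with a "res" status field
def resOK(data: JsValue)  = Json.obj("res" -> "OK", "data" -> data)
def resKO(error: JsValue) = Json.obj("res" -> "KO", "error" -> error)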



Get Person

The action receives the ID of the person as a String and we only need to generate the right Mongo JSON format to retrieve the document. Here is the Json Writes[String] that creates the extended JSON notation from ID:

1
val toObjectId = OWrites[String]{ s => Json.obj("_id" -> Json.obj("$oid" -> s)) }
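
For example, with a made-up ID string:

toObjectId.writes("5058a9a15e7ab49e0c4ef0aa")
// ~> {"_id": {"$oid": "5058a9a15e7ab49e0c4ef0aa"}}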

Now the getPerson action code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
def getPerson(id: String) = Action{
  // builds a query from ID
  val q = QueryBuilder().query(toObjectId.writes(id))
  Async {
    persons.find[JsValue](q).headOption.map{
      case None => NotFound(Json.obj("res" -> "KO", "error" -> s"person with ID $id not found"))
      case Some(p) =>
        p.transform(outputPerson).map{ jsonp =>
          Ok( resOK(Json.obj("person" -> jsonp)) )
        }.recoverTotal{ e =>
          BadRequest( resKO(JsError.toFlatJson(e)) )
        }
    }
  }
}


Delete Person

Delete is exactly the same as Get in terms of input and it doesn’t require any output except to inform about success or failure.
So let’s go directly to the deletePerson code:

1
2
3
4
5
6
7
8
9
10
def deletePerson(id: String) = Action{
  Async {
    persons.remove[JsValue](toObjectId.writes(id)).map{ lastError =>
      if(lastError.ok)
        Ok( resOK(Json.obj("msg" -> s"person $id deleted")) )
      else
        InternalServerError( resKO(JsString("error %s".format(lastError.stringify))) )
    }
  }
}


Update Full Person

When updating a full person:

  • we receive the ID in the URL
  • we receive the new Json representing the person in the body
  • we need to update the corresponding document in DB.

So we must do the following:

  • validate input JSON using validatePerson
  • transform the ID into MongoID JSON extended notation using toObjectId described previously
  • transform json into Update Json extended notation:
1
2
3
4
5
6
7
8
9
10
11
12
{
  "$set" : {
    name: "Mike Dirolf",
    pw: "password",
    addresses: ["mike@corp.fiesta.cc", "mike@dirolf.com", ...],
    memberships: [{
      address: "mike@corp.fiesta.cc",
      group_name: "family",
      group: GROUP_ID
    }, ...]
  }
}

Here is the JSON transformer for update notation:

1
2
/** Converts JSON into Mongo update selector by just copying whole object in $set field */
  val toMongoUpdate = (__ \ '$set).json.copyFrom( __.json.pick )

Finally here is the corresponding updatePerson action code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
def updatePerson(id: String) = Action(parse.json){ request =>
  request.body.transform(validatePerson).flatMap{ jsobj =>
    jsobj.transform(toMongoUpdate).map{ updateSelector =>
      Async{
        persons.update(
          toObjectId.writes(id),
          updateSelector
        ).map{ lastError =>
          if(lastError.ok)
            Ok( resOK(Json.obj("msg" -> s"person $id updated")) )
          else
            InternalServerError( resKO(JsString("error %s".format(lastError.stringify))) )
        }
      }
    }
  }.recoverTotal{ e =>
    BadRequest( resKO(JsError.toFlatJson(e)) )
  }
}

Update Restricted Person

Restricted update is exactly the same as Full update, except that it validates the input JSON using validatePerson4RestrictedUpdate instead of validatePerson.

So here is the updatePersonRestricted action code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
def updatePersonRestricted(id: String) = Action(parse.json){ request =>
  request.body.transform(validatePerson4RestrictedUpdate).flatMap{ jsobj =>
    jsobj.transform(toMongoUpdate).map{ updateSelector =>
      Async{
        persons.update(
          toObjectId.writes(id),
         updateSelector
        ).map{ lastError =>
          if(lastError.ok)
            Ok( resOK(Json.obj("msg" -> s"person $id updated")) )
          else
            InternalServerError( resKO(JsString("error %s".format(lastError.stringify))) )
        }
      }
    }
  }.recoverTotal{ e =>
    BadRequest( resKO(JsError.toFlatJson(e)) )
  }
}


Full Code

The whole sample can be found on Github: json-coast-to-coast sample. To test it, use a REST client such as Postman or whatever.



Conclusion

Many things in this article… Maybe too many…
Anyway, the subject is huge and deserves it.

This sample demonstrates that it’s possible to transmit a JSON data flow from client to DB without going through any static model. That’s why I speak about JSON coast-to-coast, and I find it’s a very good pattern in many cases in our every-day-as-backend-designer life.

Just remember 3 things maybe:

  • data flow direct manipulation is possible, practical and useful.
  • pure data manipulation doesn’t lessen type-safety or data structuring as you control everything at the boundaries of your backend system.
  • static model is useful sometimes but not always so before writing generic classes, DAO everywhere, think about your real needs.

In the code sample, I don’t take into account the temporal behavior of data and the dynamic requirements of interactions with other elements of the data flow. But don’t forget this aspect in your backend design.

Finally, as you could see, ReactiveMongo mixed with the Play2.1 JSON API provides us with a really good toolbox for a data-centric approach. It also allows us to deal with realtime data flows to design so-called reactive applications (which is also the raison d’être of the Play2 framework).

Have flowfun!

A relatively short article, this time, to present an experimental feature developed by Sadek Drobi (@sadache) and me (@mandubian), which we've decided to integrate into Play 2.1 because we think it's really interesting and useful.



WTF is JSON Inception???

Writing a default case class Reads/Writes/Format is so boring!!!

Remember how you write a Reads[T] for a case class:

import play.api.libs.json._
import play.api.libs.functional.syntax._

case class Person(name: String, age: Int, lovesChocolate: Boolean)

implicit val personReads = (
  (__ \ 'name).read[String] and
  (__ \ 'age).read[Int] and
  (__ \ 'lovesChocolate).read[Boolean]
)(Person)

So you write 4 lines for this case class.
You know what? We have had a few complaints from people who think it's not cool to have to write a Reads[TheirClass], because Java JSON frameworks like Jackson or Gson usually do it behind the curtain without you writing anything.
We argued that Play2.1 JSON serializers/deserializers are:

  • completely typesafe,
  • fully compiled,
  • free of any runtime introspection/reflection.

But for some, this didn’t justify the extra lines of code for case classes.

We believe this is a really good approach so we persisted and proposed:

  • JSON simplified syntax
  • JSON combinators
  • JSON transformers

These added power, but nothing changed about those additional 4 lines.

Let’s be minimalist

As we are perfectionists, we now propose a new way of writing the same code:

import play.api.libs.json._
import play.api.libs.functional.syntax._

case class Person(name: String, age: Int, lovesChocolate: Boolean)

implicit val personReads = Json.reads[Person]

1 line only.
Questions you may ask immediately:

Does it use runtime bytecode enhancement? -> NO

Does it use runtime introspection? -> NO

Does it break type-safety? -> NO

So what???

As I’m currently in a mood of creating new expressions, after creating JSON coast-to-coast design, let’s call it JSON INCEPTION (cool word, quite puzzling isn’t it? ;)



JSON Inception

Code Equivalence

As explained just before:

implicit val personReads = Json.reads[Person]

// IS STRICTLY EQUIVALENT TO writing

implicit val personReads = (
  (__ \ 'name).read[String] and
  (__ \ 'age).read[Int] and
  (__ \ 'lovesChocolate).read[Boolean]
)(Person)
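
As a quick usage sketch (sample values made up), the incepted Reads behaves exactly like the hand-written one:

val js = Json.obj("name" -> "Mike", "age" -> 30, "lovesChocolate" -> true)

js.validate[Person]
// => JsSuccess(Person("Mike", 30, true), ...)

Json.obj("name" -> "Mike").validate[Person]
// => JsError reporting missing paths at /age and /lovesChocolate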

Inception equation

Here is the equation describing the winding Inception concept:

(Case Class INSPECTION) + (Code INJECTION) + (COMPILE Time) = INCEPTION

Case Class Inspection

As you may deduce by yourself, in order to ensure the preceding code equivalence, we need (see the small hand-written sketch after this list):

  • to inspect Person case class,
  • to extract the 3 fields name, age, lovesChocolate and their types,
  • to resolve typeclasses implicits,
  • to find Person.apply.
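
Concretely, here is a hand-written sketch (illustrative only, not the macro's actual output, and assuming the play.api.libs.json._ import from the snippets above) of the pieces that inspection has to gather:

// 1. the fields and their types: name: String, age: Int, lovesChocolate: Boolean
// 2. a Reads for each field type, resolved from the implicit scope
val nameReads  = implicitly[Reads[String]]
val ageReads   = implicitly[Reads[Int]]
val chocoReads = implicitly[Reads[Boolean]]
// 3. the companion's apply function, to rebuild an instance
val build: (String, Int, Boolean) => Person = Person.apply _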

INJECTION??? Injjjjjectiiiiion….??? injectionnnnnnnn????

No, let me stop you immediately…

Code injection is not dependency injection…
No Spring behind inception… No IOC, No DI… No No No ;)

I used this term on purpose because I know that injection is now linked immediately to IOC and Spring. But I’d like to re-establish this word with its real meaning.
Here code injection just means that we inject code at compile-time into the compiled scala AST (Abstract Syntax Tree).

So Json.reads[Person] is compiled and replaced in the compiled AST by:

(
  (__ \ 'name).read[String] and
  (__ \ 'age).read[Int] and
  (__ \ 'lovesChocolate).read[Boolean]
)(Person)

Nothing less, nothing more…


COMPILE-TIME

Yes everything is performed at compile-time.
No runtime bytecode enhancement.
No runtime introspection.

As everything is resolved at compile-time, you will get a compile error if you did not import the required implicits for all the field types.
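
For instance, here is a minimal sketch with hypothetical Pet/Owner case classes: Json.reads[Owner] only compiles once a Reads[Pet] is available in the implicit scope.

import play.api.libs.json._

case class Pet(nick: String, age: Int)
case class Owner(name: String, pet: Pet)

// Without this implicit Reads[Pet] in scope,
// Json.reads[Owner] does not compile.
implicit val petReads: Reads[Pet] = Json.reads[Pet]

implicit val ownerReads: Reads[Owner] = Json.reads[Owner]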


Json inception is Scala 2.10 Macros

We needed a Scala feature enabling:

  • compile-time code enhancement
  • compile-time class/implicits inspection
  • compile-time code injection

This is enabled by a new experimental feature introduced in Scala 2.10: Scala Macros.

Scala macros are a new (still experimental) feature with huge potential (a tiny standalone sketch follows the list below). You can:

  • introspect code at compile-time, based on the Scala reflection API,
  • access all imports and implicits in the current compile context,
  • create new code expressions, generate compilation errors, and inject them into the compile chain.
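
To give a feel for the machinery, here is a deliberately tiny, unrelated sketch of a Scala 2.10 def macro (names like HelloMacro are made up; the macro must live in a module compiled before the code that calls it):

import scala.language.experimental.macros
import scala.reflect.macros.Context

object HelloMacro {
  // the macro definition: every call to `hello` is expanded at compile-time
  def hello: Unit = macro helloImpl

  // the macro implementation: runs inside the compiler and returns
  // the expression tree injected at the call site
  def helloImpl(c: Context): c.Expr[Unit] = {
    import c.universe._
    reify { println("generated at compile time") }
  }
}

// elsewhere, in a later compilation run:
// HelloMacro.hello   // expands to println("generated at compile time")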

Please note that:

  • We use Scala Macros because it corresponds exactly to our requirements.
  • We use Scala macros as an enabler, not as an end in itself.
  • The macro is a helper that generates the code you could write by yourself.
  • It doesn’t add or hide unexpected code behind the curtain.
  • We follow the no-surprise principle.

As you may discover, writing a macro is not a trivial process since your macro code executes in the compiler runtime (or universe).

So you write macro code 
  that is compiled and executed 
  in a runtime that manipulates your code 
     to be compiled and executed 
     in a future runtime…           

That’s also certainly why I called it Inception ;)

So it requires some mental exercises to follow exactly what you do. The API is also quite complex and not fully documented yet. Therefore, you must persevere when you begin using macros.

I’ll certainly write other articles about Scala macros because there are lots of things to say.
This article is also meant to begin the reflection about the right way to use Scala Macros.
Great power means greater responsibility, so it’s better to discuss it all together and establish a few good manners…


Writes[T] & Format[T]

Please note that JSON inception only works for case classes having unapply/apply functions.

Naturally, you can also incept Writes[T] and Format[T].

Writes[T]

import play.api.libs.json._
import play.api.libs.functional.syntax._

implicit val personWrites = Json.writes[Person]

Format[T]

import play.api.libs.json._
import play.api.libs.functional.syntax._

implicit val personFormat = Json.format[Person]
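
As a quick usage sketch (sample values made up, and assuming the Person case class from above), the incepted Format works in both directions:

// serialization
val js = Json.toJson(Person("Mike", 30, lovesChocolate = true))

// deserialization
js.validate[Person]
// => JsSuccess(Person("Mike", 30, true), ...)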

Conclusion

With the so-called JSON inception, we have added a helper providing a trivial way to define your default typesafe Reads[T]/Writes[T]/Format[T] for case classes.

If anyone tells me there is still 1 line to write, I think I might become impolite ;)

Have Macrofun!