Functional Validation in Scala

One can tell a story about validation in Scala from different angles. One is to build up a motivation by describing theoretical foundations like Applicatives and Monads. Another is to start practical, focus on a domain and see where things go from there. The former angle has been the topic of various blog posts in the past, so I will try the other way and start off with a little domain, not that unrelated to the ones you may find in your Scala programs:

import java.time.{LocalDate, Period}

trait Classification

case object StringInstrument extends Classification

case object Keyboard extends Classification

abstract class Instrument(val classification: Classification)

case object BassGuitar extends Instrument(StringInstrument)

case object Guitar extends Instrument(StringInstrument)

case object Piano extends Instrument(Keyboard)

case class Band(name: String)

case class MemberOfBand(from: LocalDate, membership: Period)

case class Musician(
    name: String,
    born: LocalDate,
    instruments: Seq[Instrument],
    currentBand: Option[Band] = None,
    formerBands: Seq[MemberOfBand] = Nil)

This is the domain of tradeyourmusican.com, where bands can search for musicians matching their profile. The musician is at the core of this domain model, which makes its quality a business concern. Validation is the process we use to ensure data quality and to give users feedback about their interactions with our application. What would be the simplest way to give feedback about a wrong interaction? Let's take the birthdate as an example. We want to make sure that the musicians on our site are at least 16 years old.

A simple way to do that would be to use the Option type in Scala. A value of this type is either Some(value) or None. Given the restrictions of Option, the only way to model our age validation looks like this:

def validateAge(birthDate: LocalDate): Option[LocalDate] = {
    if (birthDate.isAfter(LocalDate.now().minusYears(16))) None
    else Some(birthDate)
}

Combined with other rules a complete validation with Option might look like this:

def validate(musician: Musician): Option[Musician] = {
  for {
    band <- validCurrentBand(musician.currentBand)
    name <- validName(musician.name)
    born <- validateAge(musician.born)
    instruments <- validInstrument(musician.instruments)
  } yield musician
}

If the musician does not match our expectations we get a None. While this can work for very simple cases, it has the disadvantage that None tells you nothing about the context of the error, which in turn limits the value of the error message you can present to your users.
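To make the limitation concrete, here is a minimal sketch with two simplified rules. The object name OptionFeedback and the tuple-based validate signature are assumptions made for illustration and are not part of the domain above. Two different mistakes produce exactly the same result:

```scala
import java.time.LocalDate

object OptionFeedback {
  // Hypothetical, simplified stand-ins for the rules used in the article.
  def validName(name: String): Option[String] =
    if (name.trim.isEmpty) None else Some(name)

  def validateAge(born: LocalDate): Option[LocalDate] =
    if (born.isAfter(LocalDate.now().minusYears(16))) None else Some(born)

  // Both rules must pass; the first None short-circuits the rest.
  def validate(name: String, born: LocalDate): Option[(String, LocalDate)] =
    for {
      n <- validName(name)
      b <- validateAge(born)
    } yield (n, b)

  // Two different problems, one indistinguishable result.
  val tooYoung = validate("Mikael", LocalDate.now().minusYears(2))
  val noName   = validate("", LocalDate.parse("1974-04-17"))
}
```

Both tooYoung and noName are None, so the caller cannot tell which rule was violated.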

The classical Java way of doing this would be to use exceptions to give feedback about what happened exactly. In Scala syntax this approach would look like this:

// Assumes Boolean-returning variants of the validators (true means valid).
def validate(musician: Musician): Unit = {
  if (!validName(musician.name))
    throw new IllegalArgumentException("name")
  else if (!validateAge(musician.born))
    throw new IllegalArgumentException("too young")
  else if (!validInstrument(musician.instruments))
    throw new IllegalArgumentException("at least one instrument")
  else if (!validCurrentBand(musician.currentBand))
    throw new IllegalArgumentException("band must have a name")
}

This approach gives you a way to communicate the cause of a validation error, but the return type Unit leads to bleeding in the eyes of a functional programmer ;). What are the practical implications of this design? In order to interact with the results of the validation you have to work with try/catch, or your application will fail. Unlike Java's checked exceptions, exceptions in Scala are not part of a function's signature, which makes it easy to forget about them.
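A small sketch of what this means for the caller. The names ExceptionFeedback and feedback are hypothetical, and only the age rule is shown:

```scala
import java.time.LocalDate

object ExceptionFeedback {
  // Hypothetical single-rule validator in the exception style.
  def validateAge(born: LocalDate): Unit =
    if (born.isAfter(LocalDate.now().minusYears(16)))
      throw new IllegalArgumentException("too young")

  // The caller has to remember to catch; the signature gives no hint.
  def feedback(born: LocalDate): String =
    try {
      validateAge(born)
      "ok"
    } catch {
      case e: IllegalArgumentException => s"rejected: ${e.getMessage}"
    }
}
```

Nothing in the type of validateAge forces the try/catch. If it is left out, the code still compiles and the exception simply propagates.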

A way to get the best of both worlds is to use scala.util.Try, which is similar to Option in that it has two states. Success, like Some, holds a value if all steps were successful. Failure, on the other hand, is a very different beast compared to None. Failure takes an exception as its value, which lets Try use it as an error communication channel. This is similar to throwing exceptions, with the difference that the validation result is a value which can be composed, passed around and processed much like an Option. Given those similarities, it is no surprise that the validation looks very similar to the one using Option:

import scala.util.{Failure, Success, Try}

def validate(musician: Musician): Try[Musician] = {
  // Explicit result types are needed so the for comprehension below type-checks.
  def validCurrentBand(band: Option[Band]): Try[Option[Band]] = ???

  def validName(name: String): Try[String] = ???

  def validateAge(born: LocalDate): Try[LocalDate] = {
    if (born.isAfter(LocalDate.now().minusYears(16)))
      Failure(new IllegalArgumentException("too young"))
    else Success(born)
  }

  def validInstrument(instruments: Seq[Instrument]): Try[Seq[Instrument]] = ???

  for {
    band <- validCurrentBand(musician.currentBand)
    name <- validName(musician.name)
    born <- validateAge(musician.born)
    instruments <- validInstrument(musician.instruments)
  } yield musician
}
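Because the result is an ordinary value, the caller can inspect it with pattern matching instead of try/catch. A minimal sketch, where the object name TryFeedback and the helper describe are made up for illustration:

```scala
import scala.util.{Failure, Success, Try}

object TryFeedback {
  // Hypothetical helper that turns a validation result into a user-facing message.
  def describe[A](result: Try[A]): String = result match {
    case Success(_) => "ok"
    case Failure(e) => s"rejected: ${e.getMessage}"
  }

  val ok       = describe(Success(42))
  val rejected = describe(Failure(new IllegalArgumentException("too young")))

  // Try(...) also captures non-fatal exceptions thrown by existing code.
  val parsed: Try[Int] = Try("abc".toInt)
}
```

Here ok is "ok", rejected is "rejected: too young", and parsed is a Failure wrapping a NumberFormatException.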

Closely related to Try is the Either data type, which is also part of the standard library. It has two states as well, with the difference that you can use a type of your choice to represent an error. Since Either is not bound to validation scenarios, it does not prescribe which side holds the error, although by convention errors go on the Left. Since Scala 2.12 Either is also right-biased, so map and flatMap operate on the Right.
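As a sketch of what a custom error type could look like: the ValidationError hierarchy below is an assumption for illustration, instruments are simplified to strings, and the for comprehension requires Scala 2.12 or later.

```scala
import java.time.LocalDate

object EitherValidation {
  // Hypothetical error type; any type can sit on the Left side.
  sealed trait ValidationError
  case object TooYoung extends ValidationError
  case object NoInstrument extends ValidationError

  def validateAge(born: LocalDate): Either[ValidationError, LocalDate] =
    if (born.isAfter(LocalDate.now().minusYears(16))) Left(TooYoung)
    else Right(born)

  def validInstrument(instruments: Seq[String]): Either[ValidationError, Seq[String]] =
    if (instruments.isEmpty) Left(NoInstrument) else Right(instruments)

  // Works as a for comprehension because Either is right-biased since 2.12.
  def validate(
      born: LocalDate,
      instruments: Seq[String]): Either[ValidationError, (LocalDate, Seq[String])] =
    for {
      b <- validateAge(born)
      i <- validInstrument(instruments)
    } yield (b, i)
}
```

Compared to Try, the error no longer has to be a Throwable, so callers can match on a closed set of error cases.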

Now we have seen four ways to do validation in Scala, each with its own merits. But if you take a closer look at what you can do with those abstractions, you will soon find out that none of them is suited for a wide range of validation scenarios. Why is that the case? Let's take a look at a sample validation:

val opeth = Band("Opeth")
val mikael = Musician(
    name = "Mikael Åkerfeldt",
    born = LocalDate.parse("1974-04-17"),
    instruments = List(Guitar, BassGuitar),
    currentBand = Option(opeth))

val badMikael = mikael.copy(born = LocalDate.now.minusYears(2))
                      .copy(instruments = Nil)

validate(mikael)
// -> Success(Musician(Mikael Åkerfeldt,1974-04-17,List(Guitar, BassGuitar),Some(Band(Opeth)),List()))

validate(badMikael)
// -> Failure(java.lang.IllegalArgumentException: too young)

Have you noticed that we got only one error, even though we would expect another one for the missing instruments? The data type driven abstractions we have seen so far implement a so-called fail-fast behavior. In other words, we compose Option, Try or Either under the assumption that the validation was successful, but as soon as we hit the first error the validation chain stops and the overall result is fixed to that error.

The reason why that's not the preferred approach in many scenarios is modern user interfaces, including APIs. For those you want to give as much feedback as possible for a given user interaction. That's why we need additional data types with a collecting validation behavior. The most prominent data type of this kind in the Scala ecosystem is scalaz.Validation. As with Either, you are free to choose which type you want to use for representing your errors. For the sake of simplicity we start off with String as the error type. Because we do not want to repeat this decision for every validation, we define a type alias like this:

type StringValidation[T] = Validation[String, T]

If we integrate this type into our validation scenario we get the following:

def validate(musician: Musician): StringValidation[Musician] = {
    import scalaz._
    import scalaz.Scalaz._

    def validCurrentBand(band: Option[Band]): StringValidation[Option[Band]] = ???

    def validName(name: String): StringValidation[String] = ???

    def validateAge(born: LocalDate): StringValidation[LocalDate] =
      if (born.isAfter(LocalDate.now().minusYears(16))) "too young".failure
      else born.success

    def validInstrument(instruments: Seq[Instrument]): StringValidation[Seq[Instrument]] = ???

    // The final function takes one argument per combined validation.
    (validName(musician.name)
      |@| validateAge(musician.born)
      |@| validInstrument(musician.instruments)
      |@| validCurrentBand(musician.currentBand))((_, _, _, _) => musician)
}

While the syntax of the single validations is not that surprising, you may wonder about the syntax of the validation composition. Since scalaz.Validation collects errors, it cannot rely on the same composition functions used by data types like Option. The function at the end of the composition with |@| gets called when all validations were successful. It receives the results of the individual validations and can be used to transform them into a new data type. In this example I just ignore the parameters and return the musician we got as input parameter.
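To see why a different kind of composition is needed, here is a rough standard-library sketch of the idea. It is not how scalaz implements Validation. The names Validated and map2 are assumptions, errors are kept in a List for simplicity, and the age rule takes a plain Int to keep the example short:

```scala
object Accumulating {
  // Hypothetical stand-in for scalaz.Validation, built on the standard library.
  type Validated[A] = Either[List[String], A]

  // Unlike flatMap, both inputs are already evaluated, so errors from both can be kept.
  def map2[A, B, C](va: Validated[A], vb: Validated[B])(f: (A, B) => C): Validated[C] =
    (va, vb) match {
      case (Right(a), Right(b)) => Right(f(a, b))
      case (Left(e1), Left(e2)) => Left(e1 ++ e2)
      case (Left(e), _)         => Left(e)
      case (_, Left(e))         => Left(e)
    }

  def validateAge(age: Int): Validated[Int] =
    if (age < 16) Left(List("too young")) else Right(age)

  def validInstrument(instruments: Seq[String]): Validated[Seq[String]] =
    if (instruments.isEmpty) Left(List("at least one instrument")) else Right(instruments)

  // Both rules fail here, and both messages survive.
  val bad = map2(validateAge(2), validInstrument(Nil))((age, _) => age)
}
```

With flatMap the second validation would only run if the first one succeeded. map2 receives two independent results and can therefore merge their errors, which is roughly what the |@| builder does for a larger number of validations.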

When we evaluate the validations based on scalaz.Validation we get this:

validate(mikael)
// -> Success(Musician(Mikael Åkerfeldt,1974-04-17,List(Guitar, BassGuitar),Some(Band(Opeth)),List()))

validate(badMikael)
// -> Failure(too youngat least one instrument)

While the success case is pretty normal, the failure case gives us one concatenated string resulting from two individual errors. This shows that collecting errors works, but it is not really practical to have them in this format. What we want instead is a list of errors, which we can use to display them individually. Fortunately scalaz already defines a type alias for that use case:

type ValidationNel[+E, +X] = Validation[NonEmptyList[E], X]

Instead of a plain scala.collection.immutable.List, the scalaz.NonEmptyList type was chosen, since the Failure case of a validation is only selected if at least one error is present. That's why scalaz.NonEmptyList is the natural error type. Let's see what the result looks like:

Failure(NonEmptyList(too young, at least one instrument))

Nice! I guess that's it for the introduction. Next time I will show you an extension to scalaz.Validation which aims to make data type based validation as practical as annotation driven Java tools like Hibernate Validator.
