I think we are trapped in a loop.
We have an input, we validate the input, we call whatever other service we need to call, and then we save that input, maybe with some extras, in the database. Repeat.
Do we need to search for something in the database? We get the criteria from the input, we validate, and we search the database. Repeat.
For simple operations, like CRUD (Create, Retrieve, Update, and Delete), let's agree that it is just enough, right? That's how we learned it, why should we change it?
But when things start to get complex, this simple validation starts getting tricky, starting with ambiguity.
What does the word valid mean across many contexts? Is it valid for being saved in the database? Is it valid for the business rules too? Does it mean that the database record is valid to be used somewhere?
In a complex scenario, where "validation" can mean anything, it is not enough. It doesn't guarantee the data integrity you need to perform the level of operations you need.
If you validate too much, you add unnecessary complexity. If you don't validate at all, you risk harming the most precious asset your application holds: its data.
In this article, I explore the alternatives to achieve data consistency in a comprehensive and scalable way.
It is a matter of Encapsulation
Encapsulation without cohesion should not be considered good encapsulation. Cohesion says that "things that solve the same problem belong together".
Projecting that thought on the validation scenario, if you have a class that should contain data and behavior coupled together, and you create another class to validate the data of the first class, you are not encapsulating it well.
Here's a not-so-good example:
<?php
class User {
    private $name;
    private $email;

    public function __construct($name, $email) {
        $this->name = $name;
        $this->email = $email;
    }

    public function getName()
    {
        return $this->name;
    }

    public function getEmail()
    {
        return $this->email;
    }

    public function save() {
        // Save the user to the database here
    }
}

class Validator {
    public function validate(User $user) {
        if (empty($user->getName())) {
            return false;
        }
        if (!filter_var($user->getEmail(), FILTER_VALIDATE_EMAIL)) {
            return false;
        }
        return true;
    }
}

// The caller has to remember to validate before saving.
$user = new User("John Doe", "johndoe@example.com");
$validator = new Validator();
if ($validator->validate($user)) {
    $user->save();
}
There is a fundamental idea behind encapsulation in object-oriented programming: "Every conceptual unit is responsible for itself". It means that an object shouldn't depend on its callers to check whether one of its operations is allowed.
The User above should check for itself whether it can save its data. If you are worried about the single responsibility principle, you can opt for composition:
<?php
class User {
    private $name;
    private $email;
    private $validator;

    public function __construct($name, $email, UserValidator $validator) {
        $this->name = $name;
        $this->email = $email;
        $this->validator = $validator;
    }

    public function getName()
    {
        return $this->name;
    }

    public function getEmail()
    {
        return $this->email;
    }

    public function save() {
        // The user checks itself before saving.
        if (!$this->validator->validate($this)) {
            throw new Exception("Invalid user data.");
        }
        // Save the user to the database here
    }
}

class UserValidator {
    public function validate(User $user) {
        if (empty($user->getName())) {
            return false;
        }
        if (!filter_var($user->getEmail(), FILTER_VALIDATE_EMAIL)) {
            return false;
        }
        return true;
    }
}

$validator = new UserValidator();
$user = new User("John Doe", "johndoe@example.com", $validator);
$user->save();
We still have two classes, but they are now bound together: User is composed with a UserValidator and calls it on its own before saving.
But this is still far from appropriate for complex scenarios, for the simple reason that you still rely on the very generic word "validate". There is room for improvement.
Avoiding so many validations
I see two types of data validation: constraint validations and business validations.
The only one we should really worry about is business validation, because this is the type that fails silently, creates logic bugs, and is hard to specify and catch before it bites the user. We'll come back to it later.
For constraint validations, thanks to modern language features like type declarations, abstractions, or even libraries, we shouldn't be doing any heavy lifting.
Strong types
This covers most of what constraint validation is: checking whether the input has the correct type, or whether it matches very generic criteria.
Statically typed languages like Java or C enforce this for programmers at compile time, but in more flexible languages like PHP, type declarations are optional (and JavaScript needs a layer such as TypeScript to get them).
That said, the moment you adopt type declarations, you don't need to write these checks by hand anymore. Your classes do the work for you, especially if you make good use of interfaces.
Look how much cleaner your code gets, with almost no effort:
// Without strong types:
class User {
    private $id;
    private $email;

    public function __construct($id, $email)
    {
        // Reject anything that is NOT an integer.
        if (!is_int($id)) {
            throw new InvalidArgumentException('Id parameter must be integer');
        }
        if (!is_string($email)) {
            throw new InvalidArgumentException('Email parameter must be a text');
        }
        $this->id = $id;
        $this->email = $email;
    }
    // ...
}

// With strong types:
class User {
    private int $id;
    private string $email;

    public function __construct(int $id, string $email)
    {
        $this->id = $id;
        $this->email = $email;
    }
    // ...
}
Keep in mind that typed properties are only available in PHP 7.4 and later.
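One caveat is worth a small sketch: by default PHP coerces scalar arguments where it can (the string "42" is accepted for an int parameter), so files that want the stricter behavior usually opt in with declare(strict_types=1). The class name TypedUser is hypothetical, chosen only to avoid clashing with the User examples above.

```php
<?php
declare(strict_types=1);

// Hypothetical class for illustration; it mirrors the typed User above.
final class TypedUser
{
    private int $id;
    private string $email;

    public function __construct(int $id, string $email)
    {
        $this->id = $id;
        $this->email = $email;
    }

    public function getId(): int
    {
        return $this->id;
    }
}

// The engine rejects the wrong type for us; no is_int() check is needed.
try {
    new TypedUser('not-a-number', 'johndoe@example.com');
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;
}
```

Here the TypeError comes from the language itself, so the constructor body stays free of type checks.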
Your object shouldn't reach invalid states
It is a common strategy to create these little self-checking behaviors at the moment an object is being created. For example, if you try to create an object with an invalid DateTime property, its constructor should fail right away, complaining about the bad input. Why would you even allow an invalid object to be sitting around?
There are multiple ways of achieving that. The top ones that come to my mind are:
Checking your rules during the initialization of your class, AKA in the constructor.
Adding constraint rules in the setters, in case it is OK for your class to be mutable.
Using design patterns like mappers, factories, or builders. For this last one, to stay loyal to what we've seen above, we should make sure these classes are bound together inside the same context, not leaving cohesion behind.
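The setter option could look roughly like the sketch below. The Profile class and its changeEmail method are hypothetical names, not part of the examples in this article; the idea is only that the rule lives in one place and runs on every change.

```php
<?php
declare(strict_types=1);

// Hypothetical mutable class whose setter guards its own state.
final class Profile
{
    private string $email;

    public function __construct(string $email)
    {
        // Reuse the setter so the rule lives in exactly one place.
        $this->changeEmail($email);
    }

    public function changeEmail(string $email): void
    {
        if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
            throw new InvalidArgumentException('Invalid email');
        }
        $this->email = $email;
    }

    public function getEmail(): string
    {
        return $this->email;
    }
}
```

A failed changeEmail() call throws before the assignment, so the object keeps its previous, valid email.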
Let's go back to our User example:
class User
{
    private int $id;
    private string $email;
    private string $name;

    public function __construct(int $id, string $email, string $name)
    {
        $this->id = $id;
        $this->name = $name;
        $this->email = $email;
    }

    private function validate(): bool
    {
        if (!filter_var($this->email, FILTER_VALIDATE_EMAIL)) {
            return false;
        }
        return true;
    }

    public function save(): void
    {
        if (!$this->validate()) {
            throw new Exception('Invalid data');
        }
        // Save the user to the database here
    }
    // ...
}
The class above will allow you to hold an invalid email inside it until you need to save it in the database.
It can be avoided by doing this:
class User
{
    private int $id;
    private string $email;
    private string $name;

    public function __construct(int $id, string $email, string $name)
    {
        $this->id = $id;
        $this->name = $name;
        // Refuse to exist with a bad email.
        if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
            throw new InvalidArgumentException('Invalid email');
        }
        $this->email = $email;
    }

    public function save(): void
    {
        // Save the user to the database here
    }
    // ...
}
Now basically what this class does is to say "I'm sorry, but if you are not giving me a good email, I won't even exist for you". The risk of making a mistake should decrease, given you are not keeping a class that contains an invalid email around.
For fans of the "return early principle", it is a delicious treat.
If you are worried about your class growing too much or doing too much, keep in mind that you should be promoting separation of concerns for the business aspect as well. Break your business class down into smaller aggregates, and the classes stay small even though each one carries its own little creation criteria.
We will talk about Value Objects further ahead.
Primitive obsession
Before we jump into Value Objects, let's think about the problems of staying too close to low-level, so-called primitive data like int, float, string, bool, array, etc.
Primitives contain very little behavior, leaving them mostly meaningless. Data without context has no meaning.
A float might satisfy the conditions for it to exist inside a database field, but it tells very little about other characteristics it can assume. Can it be positive? How many decimals? Is it a percentage? A price?
When it comes to validation, it is rather challenging to keep all these primitives clear in a complex scenario. You end up carrying around a heavy load of assertions every time you need to work with them, let alone the risk of code duplication.
There is more to say about the cons of working directly with primitives, and Primitive Obsession is a code smell that speaks for itself. Feel free to dig deeper into that subject.
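To make the float question concrete, here is a minimal sketch of the same number with and without context. Percentage, its 0 to 100 range, and both discount functions are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

// With a bare float, every caller has to repeat the range check (or forget it).
function applyDiscountToFloat(float $price, float $percent): float
{
    if ($percent < 0.0 || $percent > 100.0) {
        throw new InvalidArgumentException('Percent must be between 0 and 100');
    }
    return $price * (1 - $percent / 100);
}

// Hypothetical wrapper: the rule is checked once, at creation.
final class Percentage
{
    private float $value;

    public function __construct(float $value)
    {
        if ($value < 0.0 || $value > 100.0) {
            throw new InvalidArgumentException('Percent must be between 0 and 100');
        }
        $this->value = $value;
    }

    // Returns this percentage of the given amount.
    public function of(float $amount): float
    {
        return $amount * $this->value / 100;
    }
}

// No assertion needed here: a Percentage cannot exist out of range.
function applyDiscount(float $price, Percentage $discount): float
{
    return $price - $discount->of($price);
}
```

The second function's signature already answers "is it a percentage?", which the bare float could not.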
Value Objects
By definition, "A value object is a small object that represents a simple entity whose equality is not based on identity".
Whereas your larger classes are a more significant representation of your business scenario, the value objects are meant to break these bigger classes into smaller chunks, facilitating the maintenance and comprehension of the business idea that your code is dealing with.
Think of complex organisms. They are not made of only one type of cell. Instead, they are composed of systems, which are composed of organs, which are in turn composed of specific cells. Although these cells have a simpler behavior, they are still part of a bigger tissue that performs a more complex behavior.
Examples of Value Objects:
Address. This is the most famous example.
Money. Can be composed also of smaller pieces, like Currency.
Date Range. A very interesting example of a pair of primitives that will usually walk together, like start date and end date.
Instead of having to manage the complexity of multiple properties like street, number, city, zip code, and country, you can reduce them to only one property called Address.
Now all the peculiarities of a zip code are sitting together with its value and will be there every time you need it, but without you having to validate it over and over, every time you need to perform an operation that involves it.
A few more conditions to characterize a Value Object:
Must not carry too much business value.
Must be comparable by its properties, not its ID.
Should be immutable.
Can be part of a bigger entity.
Can be composed of other Value Objects.
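These conditions can be sketched with the Money example. Keeping amounts in minor units (cents) is an assumption made here to avoid float rounding, and Currency, Money, and their method names are hypothetical.

```php
<?php
declare(strict_types=1);

final class Currency
{
    private string $code;

    public function __construct(string $code)
    {
        // Assumption: a three-letter uppercase code such as "EUR".
        if (preg_match('/^[A-Z]{3}$/', $code) !== 1) {
            throw new InvalidArgumentException('Invalid currency code');
        }
        $this->code = $code;
    }

    public function isEqualTo(Currency $other): bool
    {
        return $this->code === $other->code;
    }
}

// Composed of another Value Object, has no ID, and exposes no setters.
final class Money
{
    private int $amountInCents;
    private Currency $currency;

    public function __construct(int $amountInCents, Currency $currency)
    {
        $this->amountInCents = $amountInCents;
        $this->currency = $currency;
    }

    public function getAmountInCents(): int
    {
        return $this->amountInCents;
    }

    // Compared by its properties, not by identity.
    public function isEqualTo(Money $other): bool
    {
        return $this->amountInCents === $other->amountInCents
            && $this->currency->isEqualTo($other->currency);
    }

    // Immutable: adding returns a new instance instead of changing this one.
    public function add(Money $other): Money
    {
        if (!$this->currency->isEqualTo($other->currency)) {
            throw new InvalidArgumentException('Cannot add different currencies');
        }
        return new Money($this->amountInCents + $other->amountInCents, $this->currency);
    }
}
```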
Here is an example of a Date Range:
// DateTimeImmutable keeps the Value Object immutable, as required above.
// DatetimeRangeInterface (not shown) is assumed to declare the same methods.
class DatetimeRange implements DatetimeRangeInterface
{
    private DateTimeImmutable $start;
    private DateTimeImmutable $end;

    public function __construct(DateTimeImmutable $start, DateTimeImmutable $end)
    {
        if ($start > $end) {
            throw new InvalidArgumentException('Start date must not be after End date');
        }
        $this->start = $start;
        $this->end = $end;
    }

    public function getStart(): DateTimeImmutable
    {
        return $this->start;
    }

    public function getEnd(): DateTimeImmutable
    {
        return $this->end;
    }

    public function isEqualTo(DatetimeRangeInterface $other): bool
    {
        return $this->getStart() == $other->getStart() && $this->getEnd() == $other->getEnd();
    }

    // True when the two ranges share at least one instant (they overlap).
    public function interpolatesWith(DatetimeRangeInterface $other): bool
    {
        return $this->getStart() <= $other->getEnd() && $this->getEnd() >= $other->getStart();
    }

    public function containsRange(DatetimeRangeInterface $other): bool
    {
        return $this->getStart() <= $other->getStart() && $this->getEnd() >= $other->getEnd();
    }

    public function isContained(DatetimeRangeInterface $other): bool
    {
        return $other->containsRange($this);
    }
}
And its usage:
class Post
{
    private string $content;
    private DatetimeRangeInterface $datetimeRange;

    public function __construct(string $content, DatetimeRangeInterface $datetimeRange)
    {
        $this->content = $content;
        $this->datetimeRange = $datetimeRange;
    }
    //...
}
Date ranges can scale in complexity, as we can see in the class above. After encapsulating that complexity in a Value Object, we can cover it with unit tests and have a clear conscience that it will behave the same way everywhere it is used. Besides, think of how much room you now have to ask questions about your requirements. Are posts not allowed to go live during the same period? Now they can check that by themselves, without giving up responsibility for themselves.
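As a sketch of that last question, a post could answer it by delegating to its range. The compact LivePeriod class stands in for the date range so the example runs on its own; ScheduledPost, conflictsWith, and the sample dates are hypothetical.

```php
<?php
declare(strict_types=1);

// Stand-in for the date range Value Object, reduced to what this example needs.
final class LivePeriod
{
    private DateTimeImmutable $start;
    private DateTimeImmutable $end;

    public function __construct(DateTimeImmutable $start, DateTimeImmutable $end)
    {
        if ($start > $end) {
            throw new InvalidArgumentException('Start date must not be after end date');
        }
        $this->start = $start;
        $this->end = $end;
    }

    public function overlaps(LivePeriod $other): bool
    {
        return $this->start <= $other->end && $this->end >= $other->start;
    }
}

final class ScheduledPost
{
    private string $content;
    private LivePeriod $period;

    public function __construct(string $content, LivePeriod $period)
    {
        $this->content = $content;
        $this->period = $period;
    }

    // The post answers the business question itself.
    public function conflictsWith(ScheduledPost $other): bool
    {
        return $this->period->overlaps($other->period);
    }
}

$january = new ScheduledPost('New year', new LivePeriod(
    new DateTimeImmutable('2024-01-01'),
    new DateTimeImmutable('2024-01-31')
));
$midJanuary = new ScheduledPost('Winter sale', new LivePeriod(
    new DateTimeImmutable('2024-01-15'),
    new DateTimeImmutable('2024-02-15')
));
$march = new ScheduledPost('Spring', new LivePeriod(
    new DateTimeImmutable('2024-03-01'),
    new DateTimeImmutable('2024-03-31')
));
```

With these sample dates, $january conflicts with $midJanuary but not with $march.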
Relaxing validation in trusted environments
Before performing a specific validation, we should consider whether it has already been performed earlier, and then whether we need it at all.
For example, your module receives a request from another part of the same system, and your module needs to check whether a relationship exists before populating foreign keys. However, for that same request to be made, the previous client already needed to rely on that relationship being valid and existing. In scenarios like these, verifying again whether that dependency is valid or exists becomes pretty much redundant.
Unless your API is fully public, you should consider reducing the edge cases you check on the input of your service, depending on the business case you have. The business case tells you how strictly a content field needs to be checked when, for the request to be triggered at all, the user already had to pass multistep authentication and hold a very specific role.
A system that already received and checked the validity of an IBAN lets you skip that very same validation within that scope. We can call it "trusted data".
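One way to make "trusted data" explicit is to let the type carry the proof: whoever checked the IBAN hands over an object that can only exist if the check passed, so downstream code accepts that type and skips the check. This is only a sketch. VerifiedIban and PaymentOrder are hypothetical names, and the format check is deliberately simplified; a real one would also verify the IBAN checksum.

```php
<?php
declare(strict_types=1);

final class VerifiedIban
{
    private string $value;

    // Private constructor: the only way in is through fromString().
    private function __construct(string $value)
    {
        $this->value = $value;
    }

    public static function fromString(string $raw): self
    {
        $normalized = strtoupper(str_replace(' ', '', $raw));
        // Simplified shape check only: country code, 2 check digits,
        // then 11 to 30 alphanumeric characters.
        if (preg_match('/^[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}$/', $normalized) !== 1) {
            throw new InvalidArgumentException('Invalid IBAN format');
        }
        return new self($normalized);
    }

    public function toString(): string
    {
        return $this->value;
    }
}

final class PaymentOrder
{
    private VerifiedIban $iban;
    private int $amountInCents;

    // No IBAN validation here: the parameter type guarantees it already happened.
    public function __construct(VerifiedIban $iban, int $amountInCents)
    {
        $this->iban = $iban;
        $this->amountInCents = $amountInCents;
    }

    public function getIban(): VerifiedIban
    {
        return $this->iban;
    }
}
```

The trust is limited to the scope where VerifiedIban instances are created, which matches the "just be careful" spirit: data crossing a process or network boundary would have to pass through fromString() again.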
Tell the intention instead of saying 'validation'
This might be the most opinionated or controversial topic in this subject. That said, I will leave it here as a mere suggestion for you to consider.
Once again, object-oriented programming is about keeping data and behavior together, so its elements can represent real-life problems and scenarios.
When dealing with more complex situations, instead of over-simplifying things only by exposing one public operation like "save", we could be exposing the business intention, almost like wrapping our entities into a context layer where we declare what is being attempted.
Starting from the premise that your business entity should be aware of its meaning, its validations could adopt words and jargon related to what is about to happen instead of simply limiting itself to the word "validate".
After your entities are sure that they only exist if their states are reliable, the only thing left to validate is the business. So instead of "validate", here are some possible suggestions for methods:
User.canCreateUser();
Comment.readyToBeRemovedFromSession();
Item.willAddToProduct();
Search.isReady();
The possibilities are endless, and they can all explain what these validations are going to verify. Given that this strategy is usually adopted in complex business scenarios, there will probably be some abstraction for the infrastructure or database layer, which in turn can have simple methods like "save()" or "delete()".
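A sketch of what this could look like in PHP. Comment, canBeRemovedBy, and the 15-minute window are invented for illustration; the point is only that the method name states the business question being asked.

```php
<?php
declare(strict_types=1);

final class Comment
{
    private int $authorId;
    private DateTimeImmutable $postedAt;

    public function __construct(int $authorId, DateTimeImmutable $postedAt)
    {
        $this->authorId = $authorId;
        $this->postedAt = $postedAt;
    }

    // Hypothetical business rule: authors may remove their own comment
    // within 15 minutes of posting it.
    public function canBeRemovedBy(int $userId, DateTimeImmutable $now): bool
    {
        $deadline = $this->postedAt->modify('+15 minutes');

        return $userId === $this->authorId && $now <= $deadline;
    }
}
```

Compared with a generic validate(), a call such as $comment->canBeRemovedBy($userId, $now) reads like the requirement it enforces, and the current time is passed in so the rule stays easy to test.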
Key takeaways
I had the idea of writing this article after thinking about how little the word "validation" tells us, and how hard it is to scale it to more specific scenarios and edge cases.
Finding this thread on Stack Overflow helped me give shape to the concepts I wanted to share with you.
Long story short, here are the key takeaways to carry from this subject:
Don't detach the validation from the class that is the subject of the action. You can break the solution into multiple classes, but keep them coupled via composition or aggregation.
Strong Typing will save you a lot of validation.
Encapsulate your primitives and group their behaviors by context. See Value Objects or Complex Types.
Sometimes another system has already validated things for you, so you can work with trusted data. Just be careful, please.
You can give better names to your validation methods, names that better explain what the code is trying to achieve.
If you followed the whole idea, you noticed that it is not about literally stopping validating data. Validations are mandatory, of course. But there are better ways of achieving data integrity without having to use a method called "validate" for everything.