When software collapses, it’s usually because it is no longer maintainable
due to complexity. Brian Kernighan, co-creator of the AWK programming
language, puts it quite explicitly:

Controlling complexity is the essence of computer programming.

This post is about the Data Transfer Object pattern, which:

  • is just a way of organizing some pieces of code
  • is simple
  • clearly specifies shapes of objects when communicating with backend/external
    services
  • makes code more structured and self-descriptive
  • provides a way to plug-in validation or parsing without too much effort, if
    such need arises
  • does not require any extra libraries
  • is totally technology agnostic, so it plays nice with plain POJOs, ES classes,
    TypeScript interfaces or anything else you’d normally use


In the code samples I will use TypeScript, which looks like ECMAScript with
added type annotations. If you’d rather use plain JS, just disregard all type
annotations (so treat function f(a: number): string as function f(a)). Static
typing is, however, extremely beneficial! Refer to this post on static typing
for more details.
Example implementation is available in this plunk.

The Problem: Data Model Shape

For the sake of this post, let’s assume we’re writing a web app that integrates
with GitHub. Somewhere in our code we have a function for getting all
organization repositories:


function getOrganizationRepos(org: string): Promise<{}[]> {
  return fetch(`https://api.github.com/orgs/${org}/repos`)
    .then(response => response.json())
}

Then you can, for example, list all git urls:

getOrganizationRepos('npm')
  .then(repos => repos.forEach(repo => console.log(repo.clone_url)))

It happens to work. There is however one point that makes me queasy: the
repo.clone_url expression depends on the shape of the response from GitHub. It
might not sound like a big deal, but let’s imagine our app is more complex and
it passes data around. So, let’s say that we first fetch data from GitHub, then
we cache it somewhere, then we pass it along to business logic, then depending
on the routing we use some components or views, and we finally render a list of
repos with some kind of template, which might look like this:

<ul class="repos">
  <!-- some kind of `for repo in repos` loop -->
    <li class="repo">
      <h4>{{ repo.name }}</h4>
      <span class="git-url">{{ repo.git_url }}</span>
    </li>
  <!-- end forloop -->
</ul>

It doesn’t work – the template should get repo.clone_url instead of
repo.git_url. To track this bug down, you will need to go all the way to the
API layer of your app. Not good. Moreover, if the GitHub API changes (and your
own API will probably change more often than GitHub’s), we will need to update
the templates – and that’s just madness. Everything boils down to a single
conclusion:

Avoid things you cannot control.

We cannot control the shape of backend responses.

Conquering the Data Shape by Abstracting

We’ll take over the data shape by providing an extra intermediary. Instead of
passing around the original parsed JSON, we’ll use our own custom data type.
This has a few advantages:

  1. The data shape will be explicitly described in our app, so we control it.
  2. The intermediary will be able to perform some conversions, if we need them.
  3. We will decouple the communication layer from the logic layer.

I will describe two flavors of DTOs I use on a daily basis: POJO-like DTOs and
class-based DTOs. As no two apps are identical, you might have to adapt one of
these or devise your own, but the principles stay the same:

Prepare a type that closely resembles communication protocol objects.

POJO-style DTO

So, we still want to use POJOs, but we want to be explicit about their shape.
Let’s extend our communication layer:


function parseRepo({ clone_url, commits_url, forks, full_name,
                     html_url, issues_url, language, name, owner }) {
  return {
    commits_url,
    forks,
    full_name,
    git_url: clone_url,
    html_url,
    issues_url,
    language,
    name,
    owner
  }
}

function getOrganizationRepos(org) {
  return fetch(`https://api.github.com/orgs/${org}/repos`)
    .then(response => response.json())
    .then(repos => repos.map(parseRepo))
}

This way we freeze the data shape to be used in the rest of the application.
We also did a small conversion: our objects will have a git_url attribute
instead of clone_url.

For small apps, we can then use the DTO as our model. For more complex
situations, though, we’d rather design the data model so that it does not depend
on the communication layer. Therefore we need two different types: a DTO for
communication, and a regular model for internal use. We also need a conversion
from one to the other:


function repoToModel({ clone_url, commits_url, forks, full_name,
                       html_url, issues_url, language, name, owner }) {
  return {
    urls: {
      git: clone_url,
      commits: commits_url,
      html: html_url,
      issues: issues_url
    },
    forks,
    full_name,
    language,
    name,
    owner
  }
}

The last step is to describe the types using TypeScript interfaces: the
communication object (RepoDTO) and the model (Repo), which in turn depends
on RepoUrls:

interface RepoDTO {
  clone_url: string
  commits_url: string
  forks: number
  full_name: string
  html_url: string
  issues_url: string
  language: string
  name: string
  owner: {}
}

interface RepoUrls {
  git: string
  commits: string
  html: string
  issues: string
}

interface Repo {
  urls: RepoUrls
  forks: number
  full_name: string
  language: string
  name: string
  owner: {}
}

With these interfaces in place, the final fetching and listing might look like this:

function repoToModel(raw: RepoDTO): Repo {
  return { ... } 
}

function getOrganizationRepos(org: string): Promise<Repo[]>  {
  return fetch(`https://api.github.com/orgs/${org}/repos`)
    .then(response => response.json())
    .then(repos => repos.map(repoToModel))
}
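To see the conversion in isolation (no network involved), we can feed
repoToModel a hand-written record. A self-contained sketch: the interfaces from
above are repeated, the elided body of repoToModel is filled in to match the
untyped version shown earlier, and the sample values are made up:

```typescript
// Interfaces from the post, repeated so this sketch is self-contained:
interface RepoUrls { git: string; commits: string; html: string; issues: string }
interface Repo { urls: RepoUrls; forks: number; full_name: string; language: string; name: string; owner: {} }
interface RepoDTO {
  clone_url: string; commits_url: string; forks: number; full_name: string
  html_url: string; issues_url: string; language: string; name: string; owner: {}
}

// Full body of the converter sketched earlier in the post:
function repoToModel(raw: RepoDTO): Repo {
  return {
    urls: {
      git: raw.clone_url,
      commits: raw.commits_url,
      html: raw.html_url,
      issues: raw.issues_url
    },
    forks: raw.forks,
    full_name: raw.full_name,
    language: raw.language,
    name: raw.name,
    owner: raw.owner
  }
}

// A hand-written record in the shape GitHub would return (values made up):
const sample: RepoDTO = {
  clone_url: 'https://github.com/npm/cli.git',
  commits_url: 'https://api.github.com/repos/npm/cli/commits',
  forks: 3000,
  full_name: 'npm/cli',
  html_url: 'https://github.com/npm/cli',
  issues_url: 'https://api.github.com/repos/npm/cli/issues',
  language: 'JavaScript',
  name: 'cli',
  owner: {}
}

console.log(repoToModel(sample).urls.git)  // → 'https://github.com/npm/cli.git'
```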

Pros

  • The data shape is explicitly described with the Repo interface
  • If the API changes, we only need to change the repoToModel function and the
    RepoDTO interface
  • The rest of the app does not depend on the backend API
  • Issuing PUT/POST/PATCH requests with a specific payload is now a breeze

Cons

  • If the backend responds with a malformed record (e.g. one without a
    clone_url field), we probably won’t catch it and will pass undefineds around

This solution is very often good enough – you’re free to assume the backend
will play nice.
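One of the pros above – easy PUT/POST/PATCH payloads – comes from the fact that
the conversion also works in reverse. A hypothetical modelToRepoDTO (the name
is mine, not from the post’s API) turns an internal Repo back into the wire
shape; the interfaces are repeated so the sketch is self-contained:

```typescript
// Interfaces from the post, repeated so this sketch is self-contained:
interface RepoUrls { git: string; commits: string; html: string; issues: string }
interface Repo { urls: RepoUrls; forks: number; full_name: string; language: string; name: string; owner: {} }
interface RepoDTO {
  clone_url: string; commits_url: string; forks: number; full_name: string
  html_url: string; issues_url: string; language: string; name: string; owner: {}
}

// Hypothetical reverse converter: internal model -> wire shape,
// ready to be serialized into a PUT/PATCH request body.
function modelToRepoDTO(repo: Repo): RepoDTO {
  return {
    clone_url: repo.urls.git,
    commits_url: repo.urls.commits,
    html_url: repo.urls.html,
    issues_url: repo.urls.issues,
    forks: repo.forks,
    full_name: repo.full_name,
    language: repo.language,
    name: repo.name,
    owner: repo.owner
  }
}
```

With this in place, an update request is just `JSON.stringify(modelToRepoDTO(repo))`
in the request body – the rest of the app never sees the wire field names.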

More Guarantees: Adding Validation

Sometimes you’d rather take the fail-fast approach when the response is in the
wrong shape. This can be achieved by modifying the converter function so that
it validates the input more carefully:

function nonEmptyText(maybeText: any): string {
  if (typeof maybeText === 'string' && maybeText) {
    return maybeText
  }
  throw new TypeError(`Expected non-empty string, got ${maybeText}`)
}

function repoToModel(raw: RepoDTO): Repo {
  return {
    urls: {
      git: nonEmptyText(raw.clone_url),
      issues: nonEmptyText(raw.issues_url),
      ...
    },
    ...
  }
}

This way you can check various constraints on the data, but I’d rather you spend
your valuable time doing something else. Why? All this validation will not stop
your app from crashing: If backend decides to cheat, you lose anyway. The only
thing you gain is failing fast (and maybe a nicer error message for your user,
if you organize your exception handling properly), but it seems like a lot of
unnecessary effort. Sometimes it’s worth it, but rarely.
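If it is worth it for your app, each primitive type gets a small validator in
the same style as nonEmptyText above. A sketch for numbers – the name
nonNegativeNumber is mine:

```typescript
// Companion to nonEmptyText: fail fast on anything that is not a non-negative number.
function nonNegativeNumber(maybeNumber: any): number {
  if (typeof maybeNumber === 'number' && maybeNumber >= 0) {
    return maybeNumber
  }
  throw new TypeError(`Expected non-negative number, got ${maybeNumber}`)
}
```

Inside repoToModel you would then write `forks: nonNegativeNumber(raw.forks)`
next to `git: nonEmptyText(raw.clone_url)`.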

Pros

  • Data is validated as soon as it enters your app, so you know that internal
    objects are well-formed

Cons

  • Requires substantial effort

More Convenience: Classy DTOs

There is this never-ending discussion on rich interfaces vs thin interfaces.
For now, our DTOs just contain data, but are dumb: they have no methods, no
nothing. If you’re a fan of rich interfaces, you might want to add extra
behaviour to the models. This is easily achievable by using classes instead of
POJOs:

interface AbstractRepoDTO {
  ...  // as RepoDTO previously
}

class RepoDTO {

  public static parse(raw: AbstractRepoDTO): RepoDTO {
    /* If you need validation, do it here: */
    const cleanedData = validateRepo(raw)
    return new RepoDTO(
      cleanedData.html_url,
      cleanedData.clone_url,
      ...
    )
  }

  public isHot: boolean

  constructor(
    public html_url: string,
    public clone_url: string,
    public forks: number,
    public full_name: string,
    ...
  ) {
    this.isHot = forks > 500
  }

  public toString() {
    return `${this.full_name}, written in ${this.language}`
  }

  public toJson(): string { ... }
  public toXml(): string { ... }

  public toModel(): Repo { ... }
}

function getOrganizationRepos(org: string): Promise<Repo[]> {
  return fetch(`https://api.github.com/orgs/${org}/repos`)
    .then(response => response.json())
    .then(repos => repos.map(item => RepoDTO.parse(item).toModel()))
}

Pros

  • You can add any extra behaviors to your models
  • A class has an explicit list of attributes, so you’re type-safe even in plain
    ECMAScript

Cons

  • REST APIs are based on POJOs, so you need to be careful during serialization
    and parsing
  • You might be tempted to put lots of custom business logic in the model,
    which in my opinion violates the Single Responsibility Principle – use DTOs
    for modelling data; complex logic belongs in models or services
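The serialization caveat deserves a concrete illustration. JSON.stringify keeps
a class instance’s own data properties, but JSON.parse always gives back a
plain POJO, so methods disappear until you re-wrap the object. A toy sketch –
the PointDTO class is made up for illustration:

```typescript
// Toy class standing in for a real DTO class:
class PointDTO {
  constructor(public x: number, public y: number) {}
  public toString() {
    return `(${this.x}, ${this.y})`
  }
}

const wire = JSON.stringify(new PointDTO(1, 2))   // '{"x":1,"y":2}' – methods are not serialized
const parsed = JSON.parse(wire)                   // plain POJO: no custom toString here
const revived = new PointDTO(parsed.x, parsed.y)  // re-wrap to get the class behaviour back
console.log(revived.toString())                   // → '(1, 2)'
```

This is exactly why the static parse method above exists: it is the single
place where wire-shaped POJOs get turned back into class instances.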

Bonus: Abstracting even further

You’re a good software engineer, so you probably already figured this out on
your own, but just for the sake of completeness here’s a function that fetches
and parses any array of DTOs:

function fetchAndParseArray<A>(url: string, parser: (raw: {}) => A): Promise<A[]> {
  return fetch(url)
    .then(response => response.json())
    .then((items: {}[]) => items.map(parser))
}

fetchAndParseArray(
    'https://api.github.com/orgs/npm/repos',
    item => RepoDTO.parse(item).toModel()
  ).then((repos: Repo[]) => repos.forEach(r => console.log(r.toString())))

Wrapping Up

The main advantage of using a DTO is isolating your application from backend
changes. When choosing the exact shape of your DTO, try to keep it simple.
Wrapping API entities in TypeScript interfaces is usually sufficient to make
development and maintenance easier.