Android Actual Clean Architecture (AACA)

by Adrian Tache, 01.05.2024

The Data Layer

What is the Data Layer?

TL;DR

The Data Layer contains all of the app's logic for fetching, storing and retrieving data, as well as handling caching strategies and DTO manipulation.

The Data Layer represents interactions with all the data that the Use Case works with, whether that is cache, persistent storage or access to external storage via APIs and so on. It contains all sources of data and the abstracted logic of accessing that data.

It represents a tangible implementation of the data requirements defined in the Use Case via the Repository Interface. Also, where applicable, it receives the payloads from the Use Case and returns data objects, both as defined inside the Domain Layer.
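To make that concrete, here is a minimal Kotlin sketch (all names are hypothetical, not taken from a real project) of a repository interface defined in the Domain Layer and its implementation in the Data Layer:

```kotlin
// Domain Layer: the Use Case only depends on these definitions.
data class PaymentPayload(val accountId: String, val amountMinorUnits: Long)
data class Payment(val id: String, val amountMinorUnits: Long, val settled: Boolean)

interface PaymentRepository {
    suspend fun submitPayment(payload: PaymentPayload): Payment
}

// Data Layer: a DTO shaped like the backend response, plus the concrete repository.
data class PaymentResponseDto(val id: String, val amount: Long, val status: String)

class PaymentRepositoryImpl(
    // In a real app this would be a Retrofit/Ktor service; a function keeps the sketch self-contained.
    private val submitRemotely: suspend (PaymentPayload) -> PaymentResponseDto,
) : PaymentRepository {
    override suspend fun submitPayment(payload: PaymentPayload): Payment {
        val dto = submitRemotely(payload)
        // Map the backend's representation into the object the Domain Layer expects.
        return Payment(
            id = dto.id,
            amountMinorUnits = dto.amount,
            settled = dto.status == "SETTLED",
        )
    }
}
```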

Here is a simple diagram showing the structure of the data layer and its interactions with the domain layer:

AACA Data Layer Diagram

Why use this layer?

TL;DR

It's always a very good idea to separate the rest of your app from API calls / database entities etc. since these are likely to change over time, leading to complex refactoring otherwise.

While some people dislike Clean Architecture as a way of working, most agree that it is a good idea to separate the core business logic from messy, frequently changing components such as the connection to the API, and to decouple the way the application performs tasks from the abstractions used on the backend. This is not an encouragement to become detached from backend teams, but rather to expect that backend and frontend can evolve in different ways, due to different needs, and that we should protect the frontend from the kinds of situations that can arise and threaten its stability.

Also, while the use case performs a lot of the orchestration of components required to support the entities in performing their tasks, some things are details to it. In general, the important part is to receive data; how that data is stored, composed or converted is a detail, and whenever that changes, we should be protected from it.

The repository can also make decisions regarding the data it stores or provides, unless there are strict business rules governing that. By this I mean that one of the responsibilities of the repository is to decide when to fetch data from remote or storage, when the local cache needs to be updated, and so on. This means that the repository can bypass certain network calls, serve data from cache when the user is offline without even trying the API, decide to always cache data when fetching it remotely, and so on. These are all details to the use case, but they can be very useful behaviours for the app itself.
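As a hedged sketch of what such a decision might look like inside a repository (the names and the staleness rule are invented for the example):

```kotlin
data class Profile(val id: String, val name: String)

interface ProfileCache {
    fun read(): Profile?
    fun write(profile: Profile)
    fun isStale(): Boolean
}

class ProfileRepositoryImpl(
    private val fetchRemote: suspend () -> Profile,  // e.g. an API call, already mapped
    private val cache: ProfileCache,                 // e.g. in-memory or Room-backed
    private val isOnline: () -> Boolean,             // e.g. a connectivity check
) {
    suspend fun getProfile(forceRefresh: Boolean = false): Profile {
        val cached = cache.read()

        // Serve the cache when offline, or when it is still fresh and no refresh was requested.
        if (cached != null && (!isOnline() || (!forceRefresh && !cache.isStale()))) {
            return cached
        }

        // Otherwise fetch remotely and always update the cache before returning.
        return fetchRemote().also { cache.write(it) }
    }
}
```

None of this logic leaks into the use case; it simply asks for a profile and gets one.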

Practical example

I once worked on a project that required checking dates to see if a payment could be performed on a certain date, as well as for date filters and so on. The backend required the use of exact dates, so of course we used milliseconds from Epoch to represent them in order to have exact timestamps for these operations. However, milliseconds don't offer the best way to work with dates, so the use cases used a mapper to convert them to ZonedDateTime (adjusted to the expected timezone for the headquarters of the company) and performed filtering operations and so on (and finally, converted them to a LocalDateTime in the UI to display them relative to the current timezone of the user). In this kind of case, it's very useful to have an extra layer of separation so that if, for example, the backend starts using a completely new date format (maybe ISO dates as a string), we can still easily convert it to what the domain layer expects, without impacting that layer in any way.
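A rough sketch of that conversion chain (the timezone and function names are illustrative, not the project's actual code):

```kotlin
import java.time.Instant
import java.time.LocalDateTime
import java.time.ZoneId
import java.time.ZonedDateTime

// The headquarters' timezone is an assumption made up for this example.
private val HQ_ZONE: ZoneId = ZoneId.of("Europe/Berlin")

// Data Layer -> Domain Layer: the backend sends milliseconds from Epoch.
fun Long.toHqZonedDateTime(): ZonedDateTime =
    Instant.ofEpochMilli(this).atZone(HQ_ZONE)

// Domain Layer -> UI: display the same instant relative to the user's current timezone.
fun ZonedDateTime.toLocalForDisplay(): LocalDateTime =
    withZoneSameInstant(ZoneId.systemDefault()).toLocalDateTime()
```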

Repository

TL;DR

The Repository is the component which coordinates all the data sources and compiles the data into the format the Domain Layer requires.

The Repository is a component which has two main purposes: communicating with various data sources and composing that data into the formats that the use case is expecting. It is usually created based on an interface defined inside the domain layer, and (where applicable) it consumes payload objects and returns data objects which are also defined inside the domain layer.

The repository can communicate with data sources for a number of purposes, whether that is retrieving information from remote sources, local storage or cache, or sending information to those same destinations.

The repository then combines and transforms the data that it receives from the data sources to compile the data objects that the use cases require of it. This sometimes means converting some data formats to others (for example, strings to BigDecimal, or dates, or enums), or combining multiple data sources (and potentially API calls) in order to create a single object that is easier to work with. This is very useful over time as well, as certain calls can be separated (or combined) based on how things evolve, without impacting anything upstream from the Repository.
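As a hedged illustration of that composition work (the DTO shapes and names are made up), a repository might combine two endpoints and convert string amounts along these lines:

```kotlin
import java.math.BigDecimal

// DTOs roughly as two separate endpoints might return them.
data class TransactionDto(val id: String, val amount: String, val currencyCode: String)
data class CurrencyDto(val code: String, val symbol: String)

// The single, composed object the Use Case asks for.
data class Transaction(val id: String, val amount: BigDecimal, val currencySymbol: String)

class TransactionRepositoryImpl(
    private val fetchTransactions: suspend () -> List<TransactionDto>,
    private val fetchCurrency: suspend (code: String) -> CurrencyDto,
) {
    // Combine the two calls and convert the string amount into a BigDecimal.
    suspend fun getTransactions(): List<Transaction> =
        fetchTransactions().map { dto ->
            val currency = fetchCurrency(dto.currencyCode)
            Transaction(
                id = dto.id,
                amount = BigDecimal(dto.amount),
                currencySymbol = currency.symbol,
            )
        }
}
```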

Of course, in very simple cases where information is simply downloaded from an API with no need for caching or processing, the repository might not be necessary, and we can simply use the mapper directly to transform the objects from the data source to the data objects the use case expects.

In some implementations, this component also interacts with platform components that are unrelated to storage, but in AACA I have chosen to have a completely different layer to handle this functionality, called the Platform Layer. In others, it can also contain the business logic when a domain layer is not present, but this is not something I recommend, since it complicates that logic with things that are generally irrelevant to it.

Practical example

I once worked on a project that was fetching transaction data from one endpoint, and then fetching currency info for each currency from a different endpoint. After a while, we managed to convince the backend devs that doing all these fetches can be problematic, as any mobile device can have fluctuations in network quality, which can lead to long loading periods or even timeouts. Their first implementation was a relatively simple one, which was merely to collate this data for us when responding to our first request. By having a repository in place, and different objects for the data layer and other layers (we were still using MVVM at that point), this change only impacted the mapping from the data source and simplified the logic inside the repository. Later still, they realised the inefficiency of sending the same data many times, and they changed the structure to simply include a dictionary inside the response. This, again, required a change to the mapper, but nothing else. Had we had a naive implementation, with these backend objects and calls coupled to the UI, these changes would have required changing a number of UI components, which would have increased the likelihood of bugs.

Data Sources

TL;DR

Data Sources are quite literally sources (or destinations) of data. They usually represent either interaction with the web or local storage, whether that's a database, file storage or an in-memory cache, to name but a few.

The Data Sources are components that store or provide data. They can literally be anything, and they usually abstract away whatever technology they use from the Repository, instead simply accepting the objects from the Repository and returning the ones it expects (via Mappers). This way, we can even have multiple technologies for the same data type, enabling easier migrations of data and experimentation.

This pattern is very powerful, because it hides away the complexity of databases, interacting with REST or GraphQL APIs, or in-memory or local storage, from the functioning of the Repository, leaving that with cleaner, simpler code. Of course, depending on the technology, Data Sources can have their own structure, or call helper classes, or even have an entire package to handle more complex implementations, for example manual dis/assembly of complex data structures in order to store them.
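A minimal sketch of that abstraction, with one interchangeable implementation behind a single interface (all names are illustrative):

```kotlin
data class Article(val id: String, val title: String, val body: String)

// The Repository only sees this contract, never the storage technology behind it.
interface ArticleLocalDataSource {
    suspend fun get(id: String): Article?
    suspend fun save(article: Article)
}

// An in-memory implementation, e.g. as a cache or for tests.
class InMemoryArticleDataSource : ArticleLocalDataSource {
    private val store = mutableMapOf<String, Article>()
    override suspend fun get(id: String): Article? = store[id]
    override suspend fun save(article: Article) {
        store[article.id] = article
    }
}

// A Room-backed (or file-backed, or remote) implementation would satisfy the same
// interface, mapping its entities to Article, so the Repository never changes.
```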

Practical example

I once worked on a project where we were displaying some articles that were fetched from multiple data sources (they were initially retrieved as a list of titles, and then various details were fetched as the users interacted with them). These articles were also cached, since they represented quite a bit of data, and they almost never changed (we agreed with the backend that for important corrections, the articles would simply be removed and then re-added under a different ID). Initially, they were stored in a strange configuration, where only one article was ever cached, so we decided to correct this and just store all articles that were fetched, for a reasonable amount of time. However, due to the complex structure of the articles, Room had a hard time composing some of the objects, even when given the correct entity relationships. So, out of frustration and wanting to get a solution working, I just built some components which would manually get each object from its respective table, and then assemble them into the final result. This resulted in a somewhat complex data source, but it also meant that none of this work was visible to the repository, and the code was easy to work with and maintain.

Mappers

TL;DR

Similarly to the Domain Layer Mappers, these are quite simple objects, usually extension functions, which convert one object into another, usually DTOs coming from the Data Sources to either local data objects or the ones requested by the Domain Layer.

In order to convert the objects which are provided by the data source to the simple, composed objects which the repository expects, we can either build them manually in the data sources or, in order to yet again reduce complexity and keep logic simpler, use some mapper objects (or just extension functions). This isn't necessary in all cases, but it helps a lot once things begin to become complex.
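For instance, a mapper is often nothing more than an extension function like the following (the DTO and field names are invented for the sketch):

```kotlin
import java.math.BigDecimal

// What the data source returns, mirroring the backend payload.
data class PriceDto(val value: String, val currency: String)

// What the Repository (and ultimately the Use Case) wants to work with.
data class Price(val amount: BigDecimal, val currencyCode: String)

// The mapper: one small, easily testable conversion living in one obvious place.
fun PriceDto.toPrice(): Price = Price(
    amount = BigDecimal(value),
    currencyCode = currency.uppercase(),
)
```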

Why map to new objects?

TL;DR

While it's not mandatory to have mappers or a repository, it's a good idea to have separation between messy external components and the ones that we control, in order to avoid a number of issues.

In most cases, you won't need intermediary objects, since you'll map the data source objects fairly directly to the data objects the domain demands, perhaps with some extra parsing or conversions. In that case, you'll probably also notice that your repository implementation isn't doing anything more than the data source is! For these cases, you'd be right to just skip a couple of components to cut down on code and time spent on them. But, as I mentioned in the initial Rules, you should only ever do this if you know what you're removing, and exactly what it would take to add it back.

So why bother, if we're not doing anything special in our repository? Each of these components exists for a reason. Sure, a repository can easily be added later, but it will take time, and when you're in a hurry, this can lead to mistakes. Furthermore, these simplifications can lead to issues when the objects become more complex. And worse still, they leave questions without answers, which can mean that your architecture standards end up decided by someone's quick and dirty bug fix, rather than by careful consideration when you have the time for it.

Practical example

I once worked on a project where, in order to keep things simple, we decided to merge the mappers and the DTOs (which were essentially the objects sent by our backend). This worked well initially, when our mapping between DTOs and resulting objects was one to one and very straightforward, but things devolved quite quickly as complexity grew. Some people were moving the mapping (or parts of it) to the data source. Others, to the repository. Common tasks, such as converting dates, became duplicated, and often had different implementations, as people didn't find the time to talk to each other and share knowledge, or to look at other people's code, and of course this practice became worse over time as people started copy-pasting each of the different implementations.

This all got a lot better when we decided to build dedicated mappers for each data source, at which point the mapping code had a very clear place, and whatever converters we needed would just be injected into them, also making those converters easy to find.