
Microservice architecture

There are many different tech stacks, each made up of components that communicate with one another to form a complete application. Broadly, these components are organised in one of two architectural styles: a monolithic architecture or a microservice architecture. A monolithic application is built and deployed as a single unified unit, whereas a microservice architecture is a collection of smaller, independently deployable services.

Frontend

The core feature of any application is the frontend: the user interface that users interact with the most. While it may not be the most advanced or important part in terms of functionality, it is usually where users form their first impression of the entire application stack.

ACUBETotal's frontend uses Next.js, a React framework, and is written in TypeScript. Sandbox, on the other hand, is written in Python and uses Django, a web framework that renders pages through its built-in template language.

Rendering techniques

Next.js supports both client-side rendering (CSR) and server-side rendering (SSR). Django templates, on the other hand, are rendered entirely on the server, so Sandbox's pages are server-side rendered.

There are many other rendering techniques, but their benefits and drawbacks will not be discussed further here. Here is a video if you want to dive deeper into the various rendering patterns and how web technologies got to where they are today.

Backend

Most non-trivial applications need a backend to process the application's data. For ACUBETotal, this is the pipeline, written in Node.js. There are many ways to structure a backend application, and design patterns such as the factory pattern are commonly used. You can take a look at a few of the most commonly used software design patterns to understand the pros and cons of each.
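As a minimal sketch of the factory pattern, the hypothetical `make_analyzer` function below returns the right analyzer object for a given file type, so callers depend only on the shared `Analyzer` interface and never construct concrete classes themselves. It is written in Python for consistency with the other examples here, even though the pipeline itself is Node.js; all class names and file types are illustrative assumptions, not ACUBETotal code.

```python
from abc import ABC, abstractmethod


class Analyzer(ABC):
    """Common interface that every (hypothetical) analyzer implements."""

    @abstractmethod
    def analyze(self, data: bytes) -> dict: ...


class PEAnalyzer(Analyzer):
    def analyze(self, data: bytes) -> dict:
        # Windows PE files start with the "MZ" magic bytes
        return {"type": "pe", "valid_header": data[:2] == b"MZ"}


class ELFAnalyzer(Analyzer):
    def analyze(self, data: bytes) -> dict:
        # ELF files start with 0x7F followed by "ELF"
        return {"type": "elf", "valid_header": data[:4] == b"\x7fELF"}


# Registry mapping each file type to the class that handles it
_ANALYZERS: dict[str, type[Analyzer]] = {
    "pe": PEAnalyzer,
    "elf": ELFAnalyzer,
}


def make_analyzer(file_type: str) -> Analyzer:
    """Factory: build an analyzer for file_type without exposing concrete classes."""
    try:
        return _ANALYZERS[file_type]()
    except KeyError:
        raise ValueError(f"no analyzer registered for {file_type!r}") from None
```

Supporting a new file type then only means registering one more class; code that calls `make_analyzer` stays unchanged.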

Microservices

With so many microservices, it is important that they communicate with one another reliably. There are two main ways to do so: via HTTP API calls and via message queue systems.

HTTP calls

The easiest way to pass data around is via HTTP API calls.

REST API

REST stands for Representational State Transfer. RESTful APIs use HTTP methods that correspond to the create, read, update and delete (CRUD) operations: POST creates, GET reads, PUT (or PATCH) updates, and DELETE deletes. Each HTTP method has defined semantics, and you can reference them here. For example, PUT and DELETE are meant to be idempotent: making the same request once or several times in succession leaves the server in the same state. GET goes further and is safe, meaning it should have no side effects at all.
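To make idempotency concrete, here is a minimal in-memory sketch; `ResourceStore` is a hypothetical stand-in for a REST resource collection, with one method per HTTP verb. Repeating a POST creates a second resource, while repeating a PUT or DELETE leaves the stored state unchanged.

```python
import itertools


class ResourceStore:
    """Hypothetical in-memory stand-in for a REST resource collection."""

    def __init__(self):
        self.items: dict[int, dict] = {}
        self._ids = itertools.count(1)

    def post(self, body: dict) -> int:
        # POST creates a new resource on every call: not idempotent
        new_id = next(self._ids)
        self.items[new_id] = dict(body)
        return new_id

    def get(self, item_id: int):
        # GET only reads: it is safe and never changes server state
        return self.items.get(item_id)

    def put(self, item_id: int, body: dict) -> None:
        # PUT replaces the resource at a known ID: idempotent
        self.items[item_id] = dict(body)

    def delete(self, item_id: int) -> None:
        # DELETE removes the resource; repeating it leaves the same state
        self.items.pop(item_id, None)


store = ResourceStore()
first = store.post({"name": "sample"})
second = store.post({"name": "sample"})  # a second, distinct resource
store.put(first, {"name": "renamed"})
store.put(first, {"name": "renamed"})    # repeating PUT changes nothing further
store.delete(second)
store.delete(second)                     # repeating DELETE changes nothing further
```

Note that a real server may answer a repeated DELETE with 404 instead of 200; idempotency is about the resulting server state, not identical responses.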

The Python package FastAPI makes building RESTful APIs easy. It uses Pydantic to validate the arguments passed in an API request and convert them into the types the application expects. It can also accept file uploads and run asynchronous handlers using asyncio.
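To illustrate what "validate and convert" means without depending on FastAPI or Pydantic, the stdlib-only sketch below coerces raw string query parameters into the types declared on a dataclass and rejects values that cannot be converted. This is a simplified illustration of the idea, not how Pydantic is implemented; `ScanRequest`, `coerce` and `parse_bool` are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass
class ScanRequest:
    # Hypothetical request model: query parameters arrive as strings
    sha256: str
    timeout: int
    deep: bool


def parse_bool(value: str) -> bool:
    # bool("false") is True in Python, so booleans need explicit parsing
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise ValueError(f"invalid boolean: {value!r}")


def coerce(cls, raw: dict[str, str]):
    """Convert raw string parameters into the field types declared on cls."""
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        if f.name not in raw:
            raise ValueError(f"missing field: {f.name}")
        target = hints[f.name]
        converter = parse_bool if target is bool else target
        try:
            kwargs[f.name] = converter(raw[f.name])
        except ValueError as exc:
            raise ValueError(f"invalid value for {f.name}: {exc}") from None
    return cls(**kwargs)
```

For example, `coerce(ScanRequest, {"sha256": "ab12", "timeout": "30", "deep": "false"})` yields `ScanRequest(sha256="ab12", timeout=30, deep=False)`, while a `timeout` of `"soon"` raises `ValueError`. FastAPI would turn that kind of failure into an HTTP 422 response.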

However, synchronous HTTP request/response communication, including via FastAPI, has its limitations. For instance, tasks that take a long time can cause timeouts while the client waits for the server's response. HTTP polling, where the client repeatedly asks the server whether the task has finished, can work around this, but it generates numerous back-and-forth requests and responses, which can become very costly at scale.
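The polling pattern can be sketched with plain Python objects standing in for HTTP endpoints: `submit` returns a job ID immediately instead of blocking, and the client calls `status` repeatedly until the job is done. `JobServer` and `poll` are hypothetical, and the sleep stands in for a slow analysis task.

```python
import threading
import time
import uuid


class JobServer:
    """Hypothetical server that runs slow jobs in the background."""

    def __init__(self):
        self.jobs: dict[str, dict] = {}
        self._lock = threading.Lock()

    def submit(self, duration: float) -> str:
        # Respond right away with a job ID instead of holding the request open
        job_id = str(uuid.uuid4())
        with self._lock:
            self.jobs[job_id] = {"status": "running", "result": None}
        threading.Thread(target=self._run, args=(job_id, duration), daemon=True).start()
        return job_id

    def _run(self, job_id: str, duration: float) -> None:
        time.sleep(duration)  # stand-in for a long-running task
        with self._lock:
            self.jobs[job_id] = {"status": "done", "result": "report"}

    def status(self, job_id: str) -> dict:
        with self._lock:
            return dict(self.jobs[job_id])


def poll(server: JobServer, job_id: str, interval: float) -> tuple[dict, int]:
    """Client side: keep asking for the status until the job finishes."""
    requests = 0
    while True:
        requests += 1
        state = server.status(job_id)
        if state["status"] == "done":
            return state, requests
        time.sleep(interval)


server = JobServer()
job_id = server.submit(duration=0.2)
final_state, request_count = poll(server, job_id, interval=0.02)
```

Even this single 0.2-second job costs around ten status requests; multiplied across many clients and jobs, that overhead is what makes polling expensive at scale.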

GraphQL

GraphQL was originally developed by Facebook to simplify endpoint management for REST-based APIs. Instead of maintaining multiple endpoints that each return a small, disjointed slice of data, GraphQL exposes a single endpoint that accepts complex queries and returns only the information each query asks for. GraphQL queries can access not just the properties of one resource but also follow references between resources. As a result, data that a typical REST API would serve across multiple endpoints can be fetched with a single GraphQL request.
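The two ideas above, returning only the requested fields and following references in one request, can be shown with a toy resolver. This is not a GraphQL implementation (real servers parse the GraphQL query language and check it against a typed schema); the nested dict stands in for a query, and the data, `RELATIONS` and `resolve` are hypothetical.

```python
# Hypothetical data: each sample points at the report produced for it
REPORTS = {
    "r1": {"id": "r1", "tool": "capa", "verdict": "malicious",
           "raw_output": "large tool output the client did not ask for"},
}
SAMPLES = {
    "s1": {"id": "s1", "sha256": "ab12", "size": 4096, "report_id": "r1"},
}

# How to follow a reference from one object type to another
RELATIONS = {
    "report": lambda sample: REPORTS[sample["report_id"]],
}


def resolve(obj: dict, selection: dict) -> dict:
    """Return only the selected fields, following references for nested selections."""
    result = {}
    for field, sub_selection in selection.items():
        if sub_selection is None:
            result[field] = obj[field]  # scalar field: copy it as-is
        else:
            related = RELATIONS[field](obj)  # follow the reference
            result[field] = resolve(related, sub_selection)
    return result


# Roughly what { sample(id: "s1") { sha256 report { verdict } } } asks for
query = {"sha256": None, "report": {"verdict": None}}
response = resolve(SAMPLES["s1"], query)
```

`response` contains only `sha256` and the report's `verdict`; the sample's size and the report's raw output are never sent, and no second request to a separate reports endpoint is needed.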

However, the drawback is the added complexity GraphQL brings to your application. Caching results is also much harder than with REST APIs: GraphQL queries usually go to a single endpoint, often as POST requests, so standard HTTP caching keyed on the URL cannot tell different queries apart.

Currently, only OpenCTI uses GraphQL; the other services avoid it because of its steep learning curve and added complexity.

Message queue systems

To address the problem of long-running tasks in HTTP APIs, message queue systems are often used instead. A producer publishes a task to a queue and returns immediately, and a consumer later picks up the message and processes it in a callback. The message queue system of choice is RabbitMQ.
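The callback-style flow can be sketched with Python's in-process `queue.Queue` standing in for a RabbitMQ queue. This only mimics the shape of the pattern; with RabbitMQ itself you would use a client library, for example pika's `basic_consume` with an `on_message_callback`. The handler and message bodies are hypothetical.

```python
import queue
import threading

tasks: queue.Queue = queue.Queue()  # stand-in for a RabbitMQ queue
results: list[str] = []


def on_message(body: str) -> None:
    # Callback invoked for each message (hypothetical long-running analysis)
    results.append(f"processed {body}")


def consumer() -> None:
    while True:
        body = tasks.get()
        if body is None:  # sentinel: stop consuming
            tasks.task_done()
            break
        on_message(body)
        tasks.task_done()  # loosely analogous to acknowledging the message


worker = threading.Thread(target=consumer)
worker.start()

# The producer publishes and moves on; it never waits for the work to finish
for sample in ("sample-1", "sample-2", "sample-3"):
    tasks.put(sample)
tasks.put(None)
worker.join()
```

Because the producer never holds a connection open while the work runs, a task taking minutes cannot trigger an HTTP timeout.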

When to use which?

While there is no hard and fast rule, in general, functions that complete within about one second (such as Detect-It-Easy and lief) should be deployed using FastAPI, while those that take longer (such as FLOSS and CAPA) should be run through RabbitMQ instead.

Databases

Another core component of any application is the database, the storage mechanism for its data. In ACUBETotal, the two main databases are PostgreSQL and MongoDB.

SQL vs NoSQL

PostgreSQL is a relational database, whereas MongoDB is a NoSQL document database. This means that PostgreSQL enforces a defined schema for each table, while MongoDB can store arbitrary documents in a collection (though it also supports schema validation on collections). There are numerous other differences between PostgreSQL and MongoDB, but they will not be covered here. Within ACUBETotal, the Sandbox team works primarily with MongoDB, while most other microservices use PostgreSQL.
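The schema difference can be shown with the standard library alone, assuming SQLite as a stand-in for PostgreSQL and plain Python dicts as a stand-in for MongoDB documents. SQLite rejects columns the table does not declare, although it is looser than PostgreSQL about column types. The table, fields and values are hypothetical.

```python
import sqlite3

# Relational (SQLite standing in for PostgreSQL): columns are declared up front
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (sha256 TEXT PRIMARY KEY, size INTEGER NOT NULL)")
conn.execute("INSERT INTO samples (sha256, size) VALUES (?, ?)", ("ab12", 4096))

rejected = False
try:
    # Writing a column that the schema does not define fails
    conn.execute(
        "INSERT INTO samples (sha256, size, family) VALUES (?, ?, ?)",
        ("cd34", 512, "example-family"),
    )
except sqlite3.OperationalError:
    rejected = True

# Document-style (dicts standing in for MongoDB documents): each document
# carries its own structure, so documents in one collection can differ in shape
collection: list[dict] = [
    {"sha256": "ab12", "size": 4096},
    {"sha256": "cd34", "size": 512,
     "behaviour": {"contacted_ips": ["192.0.2.1"], "files_dropped": 3}},
]
```

Nested, variable-shaped data like the `behaviour` field is where a document store is convenient; in a relational schema it would typically be split into extra tables.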

Searching

Another important microservice is Elasticsearch, a full-text search engine. Users can search data indexed in the Elasticsearch cluster and get features such as autocomplete while typing. Data must be explicitly indexed, with specific fields, before users can search across it.
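Full-text search engines like Elasticsearch (built on Lucene) rely on an inverted index, which maps each token to the documents containing it. The toy below sketches that idea; it also shows why fields must be indexed explicitly, since an unindexed field cannot be searched. `TinyIndex` is hypothetical, and its autocomplete is a naive prefix scan rather than how Elasticsearch implements suggestions.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each token to the IDs of documents containing it."""

    def __init__(self, fields: list[str]):
        self.fields = fields  # only these fields are indexed and searchable
        self.postings: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def tokenize(text: str) -> list[str]:
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id: str, doc: dict) -> None:
        for field in self.fields:
            for token in self.tokenize(str(doc.get(field, ""))):
                self.postings[token].add(doc_id)

    def search(self, query: str) -> set[str]:
        # AND semantics: a document must contain every query token
        tokens = self.tokenize(query)
        if not tokens:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in tokens))

    def autocomplete(self, prefix: str) -> list[str]:
        # Naive prefix scan over every indexed token
        prefix = prefix.lower()
        return sorted(t for t in self.postings if t.startswith(prefix))


index = TinyIndex(fields=["name", "family"])
index.add("1", {"name": "invoice.exe", "family": "trojan dropper", "notes": "packed"})
index.add("2", {"name": "installer.exe", "family": "adware"})
```

Searching for `"trojan exe"` matches only document `"1"`, while searching for `"packed"` finds nothing because `notes` was never indexed.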

In-memory storage

Note: This is not currently used in ACUBETotal, but it is worth knowing about.

Another common component of a microservice architecture is an in-memory store, most commonly Redis. In-memory stores and databases serve the same purpose, providing a centralised location to easily store and retrieve data, but they differ in functionality. For instance, the main purpose of Redis is to provide a very fast key-value store. It achieves this by keeping its entire dataset in memory, so reads and writes operate on RAM rather than on disk. The trade-off is that data held only in memory is volatile: unless persistence is enabled, it is lost when the process exits and does not survive a reboot.
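The core idea of an in-memory key-value store can be sketched with a Python dict plus an optional expiry time per key, similar in spirit to Redis's `SET key value EX seconds`. `MemoryStore` is hypothetical; nothing is ever written to disk, so all data disappears when the process exits. The clock is injectable so expiry can be demonstrated without sleeping.

```python
import time
from typing import Callable, Optional


class MemoryStore:
    """Toy in-memory key-value store with optional expiry (TTL)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: dict[str, tuple[str, Optional[float]]] = {}
        self._clock = clock

    def set(self, key: str, value: str, ttl: Optional[float] = None) -> None:
        # Store the value with an absolute expiry time, or None for "never"
        expires_at = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict the expired key on access
            return None
        return value


# Usage with a fake clock so expiry can be shown without waiting
fake_now = [0.0]
cache = MemoryStore(clock=lambda: fake_now[0])
cache.set("session:42", "alice", ttl=30)
before = cache.get("session:42")
fake_now[0] = 31.0
after = cache.get("session:42")
```

`before` is `"alice"` and `after` is `None`, because the key expired once the clock passed 30 seconds.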

Redis is so much more!

That being said, Redis is also capable of being a database on its own. It can persist data across reboots, much like regular databases such as PostgreSQL and MongoDB, using point-in-time snapshots (RDB), an append-only file (AOF), or both. Beyond plain string values, it also provides data structures such as lists, sets, hashes and sorted sets.
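The append-only file approach logs every write so the dataset can be rebuilt by replaying the log on startup. The toy below illustrates that idea with a JSON-lines file; it is not Redis's file format or implementation, and `PersistentStore` is a hypothetical name.

```python
import json
import os
import tempfile


class PersistentStore:
    """Toy key-value store that appends every write to a log file and
    replays the log on startup, loosely modelled on the AOF idea."""

    def __init__(self, path: str):
        self.path = path
        self.data: dict[str, str] = {}
        if os.path.exists(path):
            # Rebuild in-memory state by replaying every logged operation
            with open(path, encoding="utf-8") as f:
                for line in f:
                    op = json.loads(line)
                    if op["cmd"] == "set":
                        self.data[op["key"]] = op["value"]
                    elif op["cmd"] == "del":
                        self.data.pop(op["key"], None)

    def _append(self, op: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(op) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the write to disk before returning

    def set(self, key: str, value: str) -> None:
        self._append({"cmd": "set", "key": key, "value": value})
        self.data[key] = value

    def delete(self, key: str) -> None:
        self._append({"cmd": "del", "key": key})
        self.data.pop(key, None)

    def get(self, key: str):
        return self.data.get(key)


# Simulate a restart: a fresh instance rebuilds its state from the log
log_path = os.path.join(tempfile.mkdtemp(), "store.aof")
first = PersistentStore(log_path)
first.set("a", "1")
first.set("b", "2")
first.delete("a")
restarted = PersistentStore(log_path)
```

Calling `fsync` on every write is the most durable but slowest choice; Redis lets you trade durability for speed through its `appendfsync` setting.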