
Data intensive applications - introduction

  • Writer: Sriman Eshwarappa
  • Dec 20, 2021
  • 2 min read

An application is data-intensive if its primary challenge is dealing with:

1. The volume of data

2. The complexity of data, and

3. The speed at which the data changes.


Background: The growth of the internet and the way people started using it created demand for new kinds of applications that had to handle extreme volumes of data and traffic. This called for new ways of building applications: 1. in a distributed fashion, because there is a limit to what a single CPU can process, but many machines can work in parallel to increase performance; and 2. in a scalable way, meaning the applications can cater to extreme traffic. Many new kinds of databases, new ways of building applications, and new ways of commissioning infrastructure (IaaS) came about.


Why should we read about Designing Data-Intensive Applications? - No matter which tools are current, the founding principles are timeless. Knowing these principles helps us choose tools better when we build new applications.


What do we need data systems for?

1. Store data to retrieve later – Databases

2. Store the result of an expensive operation temporarily so that it can be retrieved faster – Caches

3. Search data by keyword or filter – Search indexes

4. Send messages asynchronously to another process – Message queues/Stream processing

5. Periodically process a large amount of stored data – Batch processing



Thinking about data systems.

How do you ensure that the data remains correct and complete?

How do you provide consistently good performance?

How do you scale to handle an increase in the load?

What does a good API for a service look like?

Where do you keep certain functions: in the application layer (caches, search indexes) or in the database layer?


These are some of the many aspects we concern ourselves with when designing data systems.


The three main aspects that we are going to focus on in this series of articles are:

1. Reliability

2. Scalability

3. Maintainability



Reliability – The system keeps working correctly in the face of hardware faults, software faults, and human error.

Scalability – The system's ability to cope with increased load.

Maintainability – Keeping the system operable, simple, and evolvable.


We will look at each of these aspects in a little more detail in the next articles.

