Lambda vs. Delta Architecture: An Overview

The Delta Architecture is gaining popularity and advocates across the Big Data world because it offers more simplicity, quality, and reliability, backed by ACID transactions, than alternatives such as the Lambda or Kappa Architectures.

As pointed out by Denny Lee, Developer Advocate at Databricks, a data engineer’s dream is to “process data continuously and incrementally as new data arrive in a cost-efficient way without having to choose between batch or streaming”. The Delta Architecture promises to move engineers one step closer to that dream.

Previously, we discussed the differences between the Lambda and Kappa Architectures. Now it's time to look more closely at how the Delta Architecture can be an evolution in data management.

The Lambda Architecture, an Old Friend

In the early 2010s, processing data in real time, especially huge datasets, was still a problem. Latency, complexity, and the lack of a single tool for building a Big Data system were some of the issues Nathan Marz pointed out at the time. In this context, Marz proposed the Lambda Architecture, which tried to solve the problem with a hybrid approach, "by decomposing the problem into three layers: the batch layer, the serving layer, and the speed layer."

In this architecture, the Batch Layer can take its time to process large volumes of data that require heavy computation (cold path), while the Speed Layer computes in real time and applies incremental updates to the batch results (hot path). Finally, the Serving Layer takes the outputs of both and uses them to answer queries. Additionally, "it features an append-only immutable data source that serves as a system of record. Timestamped events are appended to existing events and nothing ever gets overwritten", as this blog post puts it.
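To make the layering concrete, here is a minimal, purely illustrative Python sketch of how a Lambda-style serving layer might merge a recomputed batch view with an incremental real-time view. The event shape and function names are hypothetical, not taken from any specific system.

```python
from collections import Counter

# Hypothetical events: {"user": "alice", "page": "/home", "ts": 1700000000}

def batch_view(master_dataset):
    # Batch layer (cold path): periodically recompute the full view
    # from the immutable, append-only master dataset.
    return Counter(event["user"] for event in master_dataset)

def speed_view(recent_events):
    # Speed layer (hot path): incrementally count only the events
    # that arrived after the last batch recomputation.
    return Counter(event["user"] for event in recent_events)

def serve_page_views(user, batch, realtime):
    # Serving layer: answer a query by merging both views.
    return batch.get(user, 0) + realtime.get(user, 0)
```

The point of the sketch is the split itself: two code paths compute the same kind of result, and the serving layer has to reconcile them at query time.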

However, complexity has always been a downside. “While a Lambda architecture can handle large volumes of batch and streaming data, it increases complexity by requiring different code bases for batch and streaming, along with its tendency to cause data loss and corruption. In response to these data reliability issues, the traditional data pipeline architecture adds even more complexity by adding steps like validation, reprocessing for job failures, and manual update and merge”, says Hector Leno in this article.

The Kappa Architecture, an Improvement

Later, the Kappa Architecture appeared as an alternative. It's event-based and drops the separate batch layer: the Kappa Architecture only has the Streaming Layer and the Serving Layer, so every kind of data that needs to be processed is handled by a single technology stack.
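As an illustration, in a Kappa-style system both live processing and historical reprocessing typically run through the same streaming code, with reprocessing done by replaying the event log from the beginning. The sketch below is minimal and hypothetical; the log reader mentioned in the comments and the event shape are assumptions, not any product's API.

```python
def process(event, state):
    # The single code path: used for live traffic and for replays alike.
    state[event["user"]] = state.get(event["user"], 0) + 1
    return state

def run(events):
    # Live processing consumes the tail of the event log; full reprocessing
    # simply replays the log from offset 0 into a fresh state.
    state = {}
    for event in events:
        state = process(event, state)
    return state

# run(read_log(offset=latest))  -> keep the real-time view up to date
# run(read_log(offset=0))       -> rebuild the view from scratch, no batch layer needed
```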

The Kappa proposal represented an evolution in data processing and analysis. Even so, it is still complex to implement, demands extensive compute resources, and is hard to scale.

Delta Architecture, a New Approach

Currently, the Delta Architecture looks like the next step for data management. But first, it helps to be familiar with Delta Lake, since the Delta Architecture relies on it. Delta Lake, as we explained previously, is an open-source storage framework that brings ACID transaction support and schema enforcement to Apache Spark-driven data lakes. It allows users to build a "Data Lakehouse" architecture that works with structured, semi-structured, and unstructured data.

Delta Lake "extends Parquet data files with a file-based transaction log for ACID transactions and scalable metadata handling." It's also compatible with Apache Spark APIs and integrated with Structured Streaming. Additionally, the separation between layers in the Delta Architecture is minimal compared to the Lambda Architecture, so there's no need to treat data differently based on its source.
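As a rough sketch of what this looks like in practice, the following PySpark snippet writes a Delta table and then reads the same table both as a batch source and as a streaming source. It assumes a local Spark session configured with the open-source delta-spark package, and the paths and column names are placeholders.

```python
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

# Spark session with the Delta Lake extensions enabled (delta-spark quickstart style).
builder = (
    SparkSession.builder.appName("delta-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

events_path = "/tmp/delta/events"  # placeholder path

# Batch write: Parquet data files plus a file-based transaction log, committed atomically.
spark.range(0, 100).toDF("event_id").write.format("delta").mode("append").save(events_path)

# The same table can be read as a batch DataFrame...
batch_df = spark.read.format("delta").load(events_path)

# ...or as a Structured Streaming source, with no separate copy of the data.
stream_df = spark.readStream.format("delta").load(events_path)
```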

Considering this context, Databricks presents it as “a completely different approach to ingesting, processing, storing, and managing data focused on simplicity. All the processing and enrichment of data from Bronze (raw data) to Silver (filtered) to Gold (fully ready to be used by analytics, reporting, and data science) happens within Delta Lake, requiring fewer data hops”.
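The medallion flow can be sketched as a pair of streaming hops between Delta tables. The example below reuses the Spark session from the previous snippet and is only illustrative; the table paths, column names, filter logic, and checkpoint locations are assumptions.

```python
bronze_path = "/tmp/delta/bronze/events"      # raw ingested data
silver_path = "/tmp/delta/silver/events"      # filtered and deduplicated
gold_path   = "/tmp/delta/gold/daily_counts"  # aggregated, analytics-ready

# Bronze -> Silver: clean and deduplicate continuously as data arrives.
(spark.readStream.format("delta").load(bronze_path)
    .filter("event_id IS NOT NULL")
    .dropDuplicates(["event_id"])
    .writeStream.format("delta")
    .option("checkpointLocation", "/tmp/delta/_checkpoints/silver")
    .outputMode("append")
    .start(silver_path))

# Silver -> Gold: aggregate into a table ready for analytics and reporting.
(spark.readStream.format("delta").load(silver_path)
    .groupBy("event_date")
    .count()
    .writeStream.format("delta")
    .option("checkpointLocation", "/tmp/delta/_checkpoints/gold")
    .outputMode("complete")
    .start(gold_path))
```

Each hop reads from and writes to Delta tables in the same store, which is what "fewer data hops" refers to: there is no separate batch pipeline copying data into another system.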

Delta Architecture’s Promises

  • Lower your costs: its simplicity helps reduce costs significantly by cutting the amount of data that needs to be sent and received, the time needed to process data, and the number of times jobs have to be rerun because of failures.
  • Delta = Less code: as we already said, Lambda Architectures need a different code base for each part of the architecture. With Delta, because transactions are ACID compliant, the code becomes simpler: much of the logic that previously had to be written by hand (to guarantee data consistency, for example) is no longer needed. An upsert, for instance, becomes a single MERGE operation, as shown in the first sketch after this list.
  • Improved indexing: when you use Delta Lake as the storage layer for your architecture, you can also use Bloom Filter Indexes, which can improve query execution performance by over 50%, according to MSSQLTips.com.
  • One source of data: with other architectures, data is often copied from a data lake into smaller data warehouses to simplify downstream processing. This creates consistency and versioning issues that the Delta Architecture avoids, since everything reads from the same Delta tables.
  • Adding more data sources? No problem: usually, once a data architecture has been designed and deployed for a specific use case, adding new data sources is hard. With Delta Lake as your storage engine, this is no longer a major challenge, because schema evolution makes adding new data sources (or changing the format of existing ones) a simpler task; see the second sketch after this list.
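A hypothetical example of the "less code" point: with Delta Lake's Python API (the delta-spark package), an upsert that would otherwise require manual read-modify-write logic is expressed as a single ACID MERGE. The table path, column names, and the updates_df DataFrame are assumed for illustration.

```python
from delta.tables import DeltaTable

# updates_df: a DataFrame of new and changed user records (assumed to exist).
users = DeltaTable.forPath(spark, "/tmp/delta/silver/users")

(users.alias("t")
    .merge(updates_df.alias("u"), "t.user_id = u.user_id")
    .whenMatchedUpdateAll()       # update rows that already exist
    .whenNotMatchedInsertAll()    # insert rows that are new
    .execute())                   # committed atomically as one transaction
```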
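And a sketch of schema evolution: appending data whose schema has gained a column can be allowed explicitly, rather than requiring a manual migration. The option shown is Delta Lake's documented mergeSchema write option; the DataFrame and path are placeholders.

```python
# new_source_df has an extra column (e.g. "country") that the existing table lacks.
(new_source_df.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")   # evolve the table schema to include new columns
    .save("/tmp/delta/bronze/events"))
```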

Summary

In a world seeking to be data-driven, developing a robust solution that can scale and handle any amount or type of data has been one of the biggest challenges of the last few years. Proposals such as the Lambda and Kappa Architectures emerged in response to this need, but they're still far from ideal.

“There have been attempts to unify batch and streaming into a single system in the past. Organizations have not been that successful though in those attempts. But, with the advent of Delta Lake, we are seeing a lot of our customers adopting a simple continuous data flow model to process data as it arrives. We call this architecture, the Delta Architecture”, explains Databricks, the company behind it.

"Using this approach, we can improve our data through a connected pipeline that allows us to combine streaming and batch workflows through a shared file store with ACID-compliant transactions and provides the best of both worlds", this analysis adds.
