Designing Data-Intensive Applications: Book Review

Published: 2023/12/15

Realising how little you know about something can potentially be a demoralising experience. This book, however, manages to make it invigorating and fascinating. In Designing Data-Intensive Applications Martin Kleppmann starts by explaining the basics of how simple databases work and works up to how multiple systems interacting in distributed environments work. Along the way he takes many concepts I thought I understood and shows the depth and complexity I never knew was there in order to provide a much more thorough understanding.

For example, take ACID (Atomicity, Consistency, Isolation, Durability) transactions in databases, covered in chapter 7. I’ve seen that acronym a million times and thought I understood it well. I thought databases either offered it or didn’t, but did you know that it’s actually surprisingly vague? Kleppmann goes as far as to call it mainly a marketing term. Did you know Consistency doesn’t really belong with the others? That Isolation can actually be implemented with many different levels of strictness? Kleppmann explains all this in a clear and engaging way. I felt like I was getting the inside scoop on how databases really operate.
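
To make just the Atomicity letter concrete, here is a minimal sketch (not taken from the book) using Python's built-in sqlite3 module. The accounts table and the transfer, make_db and balances helpers are hypothetical names invented for illustration. It shows a transfer that either applies both updates or neither; it says nothing about isolation levels, which vary between databases.

```python
import sqlite3

def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
    )
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        # As a context manager, the connection commits if the block succeeds
        # and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort halfway: the debit above must not survive.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

def balances(conn):
    """Return all balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "alice", "bob", 500) on a fresh database should return False and leave both balances untouched, because the debit is rolled back along with the rest of the transaction.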

Book Content

The book has 12 chapters split into 3 parts.

Foundations of Data Systems: how databases actually store data, how indexes work and are updated, and the encoding formats used.

Distributed Data: replication of data across multiple nodes, partitioning, how transactions actually get executed.

Derived Data: batch (including MapReduce) and stream processing.

The chapters do build on each other, but you could dip into the ones you’re specifically interested in. When he refers back to an earlier topic he includes the page number to go to for more info, so you’re unlikely to feel lost if you do this.

Each chapter starts with fantastically nerdy maps like this one. Image by author.

There are no exercises in this book and only a few code examples. It describes problems and how solutions can be implemented, but you won’t get any hands-on experience tackling those problems. Expect only to raise your awareness and increase your understanding.

He uses copious citations from throughout the history of databases and computing (over 100 in some chapters), from the 1970s all the way up to 2016. They come from books and papers but also blog posts and even a Hacker News discussion (reference 61 in chapter 11). This variety, and the fact that they aren’t all academic references, helps give confidence that you’re getting the full picture.

There are interesting tidbits along the way. For example, the fact that every subtask in a MapReduce job writes to disk between each step might make it seem like the designers at Google, who originally developed it, were excessively worried about hardware failure. But it makes more sense once you know that it was originally intended to run jobs in the background when there are spare resources. Apparently an hour-long MapReduce task had roughly a 5% chance of getting terminated so the computing resources could be used by a higher-priority job.
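
The write-to-disk idea can be sketched in a few lines of Python. This is a toy word count under my own assumptions, not Google's implementation or code from the book: map_task, reduce_task and word_count are hypothetical names, and a temporary directory stands in for the storage between steps. The point is only that reducers read map output from files, so a task that gets killed can be rerun without repeating the work before it.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

def map_task(task_id, text, out_dir, num_reducers):
    """Count words in one input split and write one file per reducer partition."""
    partitions = [Counter() for _ in range(num_reducers)]
    for word in text.lower().split():
        # Deterministic partitioning, so every mapper routes a given word
        # to the same reducer.
        partitions[sum(word.encode()) % num_reducers][word] += 1
    for r, counts in enumerate(partitions):
        path = Path(out_dir) / f"map-{task_id}-part-{r}.json"
        path.write_text(json.dumps(counts))

def reduce_task(r, out_dir):
    """Merge every mapper's file for partition r, reading only from disk."""
    total = Counter()
    for path in sorted(Path(out_dir).glob(f"map-*-part-{r}.json")):
        total.update(json.loads(path.read_text()))
    return dict(total)

def word_count(splits, num_reducers=2):
    """Run all map tasks, then all reduce tasks, via files in a temp directory."""
    with tempfile.TemporaryDirectory() as out_dir:
        for i, text in enumerate(splits):
            map_task(i, text, out_dir, num_reducers)
        result = {}
        for r in range(num_reducers):
            # If this reducer were terminated, it could simply be rerun:
            # the map output it needs is still on disk.
            result.update(reduce_task(r, out_dir))
        return result
```

For instance, word_count(["a b a", "b c"]) counts words across both splits. A real framework distributes these tasks across machines and a distributed filesystem, which this sketch leaves out entirely.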

Bad Points

The only bad thing I have to say about this book is that it was originally published in early 2017 and its age is starting to show. VoltDB and Riak are two of the most frequently referenced databases, but I’d never heard of them. Looking at Google Trends, they’ve been declining in (already niche) popularity since 2015.

The book has a lot of interesting citations about distributed databases and batch and stream processing from the 2010s, but these stop in 2016. I’d love to know what he would have included from the past 4 years, as it’s an area that’s evolving fast.

Who should read it?

I’d recommend having some experience with databases or software development in order to see the relevance of the material. It’s a fairly thick book (550 pages or so) and took me 45 hours to read, so you do need to be willing to invest some time to get through it all. That said, I do believe it was absolutely worth that time investment.

Conclusions

Will you benefit from reading? I did very quickly. This Medium post about Google’s Spanner distributed database came up in my feed recently. It mentions things like failover, transactions, consistency, preventing stale reads, replication across regions. Before reading the book I would have either not known what the concepts are or not understood the complexity of them. So now I’m benefiting from being able to better understand and digest posts like that.

Rating: 🦅🦅🦅🦅🦅

5 eagles is the only appropriate rating for this book. Its ability to show you a familiar world in a whole new light could only be matched by 5 eagles plucking you from your chair and taking you on a soaring journey over your city.

Translated from: https://towardsdatascience.com/designing-data-intensive-applications-book-review-cc34ba1f90a7