Learning Guide: Designing Data Solution Architecture - Resources
There are many aspects to a successful Data Solution Architecture, and this guide will review a set of curated resources that will get you started. The resources are split into several sections that roughly follow the order of your design decisions. With the wide variety of possible architectures and platforms, this post concentrates on a solution’s data reporting and analytics side.
Start with a High-Level Data Solution Logical Architecture
When designing a data architecture, you must start with the requirements of your solution. All solutions can be broken down into a specific set of elements, as pictured below:
You have the information you want to collect
Either in real-time streaming or in batches
This information will be stored in a data lake or database
Data will be made available for other people who will use it to create analytics
The resulting data will be used to develop reports and make decisions.
Most of the data solutions you create will start like this.
Your next decisions will be what tools you can use to manage your requirements successfully.
Logical Data Architecture for an Analytical Solution
Most, if not all, solutions start with the following design layers:
Data Sources – Various types of data in different locations comprise information you need to bring together.
Ingestion – Processes to automate and transform the source data into various storage areas. The Orchestration layer represents this.
Store – Depending on the type of data you are gathering, you will bring the data into various storage areas. This allows data to be made available for further analysis down the workflow, or dataflow.
Model & Serve – Once you have your raw data, you create data models and serve that data to downstream applications.
Analytic Data Store – These are generally read-only systems that store data that supports business intelligence and analytic-style queries. (See Delta Lake for another take)
Analyze and Report – This represents the applications and products that are served up to end users who use and make decisions on the data. The datasets at this level are ready to use.
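To make the layers a little more concrete, here is a minimal sketch in Python that models each layer as a plain function. Everything in it is an assumption for illustration: the function names, the sample records, and the dictionary standing in for a data lake are hypothetical and do not correspond to any Azure service or API.

```python
from collections import defaultdict

# Hypothetical source records; in practice these arrive from files, APIs, or streams.
SOURCE_ROWS = [
    {"region": "East", "amount": "120.50"},
    {"region": "West", "amount": "80.00"},
    {"region": "East", "amount": "19.50"},
]


def ingest(rows):
    """Ingestion: bring source rows in and apply light typing."""
    return [{"region": r["region"], "amount": float(r["amount"])} for r in rows]


def store(lake, name, rows):
    """Store: persist the raw data (a dict stands in for a data lake here)."""
    lake[name] = list(rows)
    return lake


def model(lake, name):
    """Model & Serve: shape raw rows into an aggregate ready for analysis."""
    totals = defaultdict(float)
    for row in lake[name]:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def report(totals):
    """Analyze and Report: render the modeled data for end users."""
    return [f"{region}: {total:.2f}" for region, total in sorted(totals.items())]


lake = store({}, "sales_raw", ingest(SOURCE_ROWS))
sales_by_region = model(lake, "sales_raw")
report_lines = report(sales_by_region)
print(report_lines)
```

In a real solution each function would be replaced by a service or tool in the matching layer, but the hand-off from one layer to the next follows the same shape.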
These functional areas translate to various products and services in Azure. Below, you can see how these products translate into various physical solution designs. Some tools can be used in multiple layers.
General Azure Physical Solution Architecture (Technology)
First Resources – Start Here
The following resources are the must-have links to start with. The Azure Architecture Center provides a landing page and a guide to its content. The site also provides a searchable set of Azure architectures and use cases that serve as both learning material and inspiration. I have also listed various popular data solution architectures with examples of real-world deployments.
Azure Architecture Center – Microsoft Docs – Main landing page for Azure Architecture. Guidance for architecting solutions on Azure using established patterns and practices.
Browse Azure Architecture – Azure Architecture Center | Microsoft Docs – Find reference architectures, technology descriptions, real-world examples, and solution ideas for common workloads on Azure.
Solution Architecture Example: Analytics end-to-end with Azure Synapse – Microsoft Docs – This example scenario demonstrates how to use the extensive family of Azure Data Services to build a modern data platform capable of handling the most common data challenges in an organization. The solution described in this article combines a range of Azure services that will ingest, store, process, enrich, and serve data and insights from different sources (structured, semi-structured, unstructured, and streaming).
Solution Architecture Example: Analytics architecture design – Azure Architecture Center | Microsoft Docs – The workflow starts with learning about common approaches, aligning processes and roles around a cloud mindset.
Solution Architecture Example: Enterprise business intelligence – Azure Reference Architectures | Microsoft Docs – This reference architecture implements an extract, load, and transform (ELT) pipeline that moves data from an on-premises SQL Server database into Azure Synapse and transforms the data for analysis.
Solution Architecture Example: SQL Server on Azure Virtual Machines with Azure NetApp Files – Microsoft Docs – The most demanding SQL Server database workloads require very high I/O capacity. They also need low-latency access to storage. This document describes a high-bandwidth, low-latency solution for SQL Server workloads.
Solution Architecture Example: Enterprise Data Warehouse Architecture | Microsoft Docs – An enterprise data warehouse lets you bring together all your data at any scale easily, and to get insights through analytical dashboards, operational reports, or advanced analytics for all your users.
Solution Architecture Example: Real Time Analytics on Big Data Architecture – Azure Solution Ideas – Get insights from live streaming data with ease. Capture data continuously from any IoT device, or logs from website clickstreams, and process it in near-real time.
Solution Architecture Example: Demand forecasting – Azure Solution Ideas | Microsoft Docs – Almost every business needs to predict the future to make better decisions and allocate resources more effectively. This article focuses on presenting useful links to the forecasting best practices and an example of detailed architecture for an end-to-end implementation in Azure.
Overall Azure Data Architecture Design
As you start to look at the design of your solution, the Microsoft Application Architecture Guide walks you through a series of steps, summarized below. With each decision, you take your solution requirements and run them past the features of the various architectures and technologies. Then, you make tradeoffs to determine what is best for your solution today and what will grow and scale to match any planned growth.
This creates a tech stack that combines technologies in scope for your solution. In addition, there is guidance around the benefits and challenges of each technology you choose.
Architecture Style
The type of architecture you are constructing is the most basic detail and usually the first decision. It could be a big data solution, an old-school analytic solution, or part of a more traditional N-tier application. There are numerous architectural styles, and selecting one requires weighing the advantages and challenges of each.
For our example here, we have been looking at big data architectures and will concentrate on those.
Learn more:
Azure Application Architecture Guide – Azure Architecture Center – Overall application architecture, a set of architecture styles that are commonly found in cloud applications.
Big data architecture style – Azure Application Architecture Guide – A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional database systems. This article provides an overview.
Technology Choices
Application architectures start with a technology choice, which is best made by answering a specific set of questions that define your workload. You can also use Power BI as an embedded solution in your web application.
It is essential to choose the right data store for your needs. Azure offers many database implementations. You select your data stores by their structure and the operations they support. Stores fall into broad types, such as SQL and NoSQL, and each supports different types of operations.
Let’s use the documented example. You have an application in one of the following use cases:
Inventory management
CRM
Sales and Order Management
Event Organization
Reporting database
Accounting and Payroll
Employee Performance Data
You might define your workload this way:
CRUD heavy – records are frequently created, read, updated, and deleted.
Support multiple operations and changes that must be completed in a single transaction, with ACID (atomicity, consistency, isolation, and durability) guarantees.
Data and subjects have relationships that are enforced using database constraints.
Indexes are used to optimize query performance
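The workload traits above can be seen in a small, self-contained sketch. It uses Python's built-in sqlite3 module purely as a stand-in for a relational store; the table names, columns, and the place_order helper are hypothetical and chosen only for illustration. It shows a constraint, an index, and two changes that either both succeed or both roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE product (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        stock INTEGER NOT NULL CHECK (stock >= 0)
    );
    CREATE TABLE order_line (
        id INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES product(id),
        quantity INTEGER NOT NULL CHECK (quantity > 0)
    );
    CREATE INDEX idx_order_line_product ON order_line(product_id);
    INSERT INTO product (id, name, stock) VALUES (1, 'Widget', 5);
    """
)


def place_order(conn, product_id, quantity):
    """Record the order line and reduce stock as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO order_line (product_id, quantity) VALUES (?, ?)",
                (product_id, quantity),
            )
            conn.execute(
                "UPDATE product SET stock = stock - ? WHERE id = ?",
                (quantity, product_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = place_order(conn, 1, 3)   # enough stock, both statements commit
second = place_order(conn, 1, 4)  # stock would go negative, so both roll back
stock = conn.execute("SELECT stock FROM product WHERE id = 1").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]
print(first, second, stock, order_count)
```

The second order fails the stock check, so its order line is rolled back along with the stock update. That all-or-nothing behavior is what the ACID requirement is asking for.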
Your solution might classify the data you wish to store with these specifications:
Data is highly normalized.
Database schemas are required and enforced.
Many-to-many relationships between data entities in the database.
Constraints are defined in the schema and imposed on any data in the database.
Data requires high integrity. Indexes and relationships need to be maintained accurately.
Data requires strong consistency. Transactions operate to ensure that all data are 100% consistent for all users and processes.
The size of individual data entries is small to medium-sized.
Defining your solution requirements and specifications this way would lead you to use a Relational Data Store in this example. You would look at the following relational data services that match those workload requirements:
Azure SQL Database
Azure Synapse Analytics
Azure SQL Managed Instance
Azure VM running SQL Server
Azure Database for MySQL
Azure Database for PostgreSQL
Azure Database for MariaDB
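As a rough illustration of how the workload definition and data specifications lead to a store category, here is a toy sketch. The Workload fields and the rules in suggest_store_category are my own simplifications for this post, not Microsoft's decision tree; a real selection weighs many more criteria, such as cost, scale, and ease of management.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical summary of the workload questions discussed above."""
    needs_acid_transactions: bool
    schema_enforced: bool
    highly_normalized: bool


def suggest_store_category(w: Workload) -> str:
    """Toy rules for illustration only; not an official decision tree."""
    if w.needs_acid_transactions and w.schema_enforced and w.highly_normalized:
        return "relational"
    if not w.schema_enforced and not w.needs_acid_transactions:
        return "non-relational"
    return "needs further analysis"


# The inventory/CRM style workload from the example above.
crm = Workload(needs_acid_transactions=True, schema_enforced=True, highly_normalized=True)
# A hypothetical workload with flexible structure and no transactional needs.
flexible = Workload(needs_acid_transactions=False, schema_enforced=False, highly_normalized=False)

crm_choice = suggest_store_category(crm)
flexible_choice = suggest_store_category(flexible)
print(crm_choice, flexible_choice)
```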
Once you decide on architecture, there are many other things to consider. Decisions around costing and scale, for example, have a big impact on deployment and implementation. See Azure Pricing calculator to Estimate Your Azure Solution Costs for more information on determining Azure costing.
Learn More: This is just one data store; the reference articles below review other data architectures, advantages, and limitations.
Understand data store models – Generally, you should start by considering which storage model is best suited for your requirements. Then, consider a particular data store within that category based on factors such as feature set, cost, and ease of management. This article covers a great process for doing this.
Criteria for choosing a data store – This article describes the comparison criteria you should use when evaluating a data store. The goal is to help you determine which data storage types can meet your solution’s requirements.
Must Read!! – Data store decision tree – Azure Application Architecture Guide | Microsoft Docs – Azure offers a number of managed data storage solutions, each providing different features and capabilities. This article will help you to choose a managed data store for your application.
Review your storage options – Cloud Adoption Framework – Storage capabilities are critical for supporting workloads and services that are hosted in the cloud. Review this information to plan for your storage needs as you prepare for your cloud adoption.
Review your data options – Cloud Adoption Framework – When you prepare your landing zone environment for your cloud adoption, you need to determine the data requirements for hosting your workloads.
Designing the Application Architecture
Once you have decided on the architectural style and technology components, your application's specific design will come together. Splitting the application architecture into the following areas will help organize the tasks and resources.
Reference Architectures
Rather than reinventing the wheel or even starting from scratch, several Reference Architectures may be a good place to start. Each architecture considers security, resilience, availability, and other design aspects. Some of these reference architectures also include a deployable solution.
Reference: Browse Azure Architecture – Azure Architecture Center | Microsoft Docs
Design Principles
Ten high-level Azure Data Solution Architecture design principles help make your solution more scalable, resilient, and manageable. These are general principles that you can use with any architectural style. These principles include:
Design for self healing. Design your application so it can survive and deal with failures.
Make all things redundant. Build redundancy into your application to avoid having single points of failure.
Minimize coordination. Minimize coordination between application services to achieve scalability.
Design to scale out. Your application should be designed to scale horizontally, adding or removing new instances as demand requires.
Partition around limits. Use partitioning to work around database, network, and compute limits.
Design for operations. Design your application so that the operations team has the tools they need.
Use managed services. When possible, use platform as a service (PaaS) rather than infrastructure as a service (IaaS). Let someone else manage your platform so you can concentrate on the solution.
Use the best data store for the job. Pick the storage technology that is the best fit for your data. Watch out for edge cases and future requirements growth. What happens if your application goes viral?
Design for evolution. If data solutions have one constant, it is that requirements change over time. Don’t design yourself into a corner.
Build for the needs of business. Watch out for scope creep. A business requirement must justify every design decision.
Source: Design principles for Azure applications – Azure Architecture Center | Microsoft Docs
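One common way to apply the "design for self healing" principle is to retry operations that fail for temporary reasons, waiting longer between each attempt. The sketch below is a minimal version of that idea. The TransientError class, the call_with_retry helper, and the flaky_query stand-in are all hypothetical names invented for this example, and production code would usually add jitter and logging.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retry(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Retry a failing operation with exponential backoff between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))


# Usage: a stand-in operation that fails twice, then succeeds.
calls = {"count": 0}


def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary outage")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = call_with_retry(flaky_query, sleep=delays.append)
print(result, calls["count"], delays)
```

Passing the sleep function in as a parameter keeps the example fast to run and makes the backoff schedule easy to inspect.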
Design Patterns
Several Cloud design patterns address specific challenges in distributed systems. They include availability, high availability, operational excellence, resiliency, performance, and security. A specific section in the resource below covers Data management patterns.
Resource: Cloud design patterns – Azure Architecture Center | Microsoft Docs
Best Practices
I tend not to use the term Best Practice, as it assumes the suggestions are best for everyone. I prefer recommended practices, as you should always make sure that you follow what is best for your specific solution and situation. One person’s best practice could grind your solution to a halt!!
Resource: Best practices in cloud applications – Azure Architecture Center | Microsoft Docs. See specifically Data partitioning and Monitoring and diagnostics.
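To give a feel for what data partitioning means in practice, here is a minimal sketch of hash-based partitioning, one of several strategies. The function names, the customer_id field, and the choice of four partitions are assumptions made for illustration; the right partition key and count depend entirely on your data and access patterns.

```python
import hashlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition using a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def partition_rows(rows, key_field, partition_count):
    """Group rows into partitions by hashing the chosen key field."""
    partitions = {i: [] for i in range(partition_count)}
    for row in rows:
        index = partition_for(str(row[key_field]), partition_count)
        partitions[index].append(row)
    return partitions


# Hypothetical rows keyed by customer.
rows = [{"customer_id": f"C{i:03d}", "amount": i} for i in range(100)]
partitions = partition_rows(rows, "customer_id", 4)
print({index: len(members) for index, members in partitions.items()})
```

Because the hash is stable, the same key always lands in the same partition, which is what lets reads and writes for one customer go to a single place.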
Security best practices
Business data storage and processing require high confidentiality, integrity, and availability. The resources below cover some security and governance topics important for Azure Data Solution Architecture and design.
Resources:
Application security in Azure | Microsoft Docs – Applications and their associated data ultimately act as the primary store of business value on a cloud platform. This article covers a high-level review of application platform security topics.
Data Governance: Why You Need a Data Governance Process Now!! 5MinuteBI – Data Governance Initiatives must determine how to secure data usage, manage activity, and gain visibility and control of one of your most important assets immediately. In many of the analytic projects I have been involved in, whether big or small, guiding those using the data increases the adoption and long-term value of the solution.
Protection of customer data in Azure | Microsoft Docs – Protection of your data in Azure
Azure SQL Database security features | Microsoft Docs – To protect customer data and provide strong security features that customers expect from a relational database service, SQL Database has its own set of security capabilities. These capabilities build upon the controls that are inherited from Azure.
What is the Microsoft Azure Well-Architected Framework?
Successful Azure Data Solutions start with a well-defined and architected platform. As more and more solutions have been migrated to Azure, Microsoft has produced a series of recommended practices and guidance called the Azure Well-Architected Framework.
Remember that the framework’s goal is to help you design a cloud architecture that is high quality, stable under load, cost-effective, scalable, and efficient.
The Azure Well-Architected Framework is divided into various principles or tenets called the five pillars of architectural excellence. This section reviews these principles with an eye to an Azure Data Solution architecture. In addition, links in the table provide more detailed information on the topic.
Pillar (Link to Topic) | Description |
---|---|
Reliability | The ability of a system to recover from failures and continue to function. |
Security | Protecting applications and data from threats. |
Cost Optimization | Managing costs to maximize the value delivered. |
Operational Excellence | Operations processes that keep a system running in production. |
Performance Efficiency | The ability of a system to adapt to changes in load. |
Source: Microsoft Azure Well-Architected Framework
Utilizing Data Lakes & Delta Lakes in a Data Architecture
Azure Blob Storage
Azure Blob Storage is a service that stores large amounts of unstructured data, such as text or binary data, that can be accessed from anywhere in the world via HTTP or HTTPS. This is a low-cost way to store large amounts of data that can then be used in many different Azure services. Blob storage can expose data publicly to the world or store application data privately.
Azure Data Lake Store & Analytics
These services allow unstructured, semi-structured, and structured data to be stored in an enterprise-class service with no limits on data size. In addition, Azure Data Lake Store is secure, scalable, and built to the open HDFS standard, so you can run massively parallel analytics on the data it holds.
Resources:
Azure Data Lake Storage Gen2 Introduction – Data Lake Storage Gen2 converges the capabilities of Azure Data Lake Storage Gen1 with Azure Blob Storage. For example, Data Lake Storage Gen2 provides file system semantics, file-level security, and scale. Because these capabilities are built on Blob storage, you’ll also get low-cost, tiered storage, with high availability/disaster recovery capabilities.
Tutorial: Azure Data Lake Storage Gen2, Azure Databricks & Spark – This tutorial shows you how to connect your Azure Databricks cluster to data stored in an Azure storage account that has Azure Data Lake Storage Gen2 enabled. This connection enables you to natively run queries and analytics from your cluster on your data.
Azure Data Lake Storage Gen2 Hierarchical Namespace – A key mechanism that allows Azure Data Lake Storage Gen2 to provide file system performance at object storage scale and prices is the addition of a hierarchical namespace. This allows the collection of objects/files within an account to be organized into a hierarchy of directories and nested subdirectories in the same way that the file system on your computer is organized.
Azure Delta Lakes (Databricks)
Delta Lake is an interesting option for the Lakehouse architecture pattern put forward by Databricks. It addresses many of the challenges of traditional data architectures. As a result, this is becoming a prevalent option for data solutions. Learn More with this great introductory article, Simplify Your Lakehouse Architecture with Azure Databricks, Delta Lake, and Azure Data Lake Storage.
Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake offers ACID transactions, scalable metadata handling, and unified streaming and batch data processing. Delta Lake runs on top of your existing data lake and is compatible with Apache Spark APIs.
Azure Databricks also includes Delta Engine, which provides optimized layouts and indexes for fast, interactive queries.
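Delta Lake gets its ACID behavior by recording every change to a table in an ordered transaction log kept alongside the data files. The sketch below is a toy model of that general idea only. The ToyTableLog class, the _log folder, and the file names are hypothetical, and the code does not use or reproduce Delta Lake's actual format or API.

```python
import json
import os
import tempfile


class ToyTableLog:
    """Toy append-only commit log; a conceptual stand-in, not Delta Lake's API."""

    def __init__(self, root):
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _path(self, version):
        return os.path.join(self.log_dir, f"{version:020d}.json")

    def _versions(self):
        return sorted(int(name.split(".")[0]) for name in os.listdir(self.log_dir))

    def commit(self, added_files, removed_files=()):
        """Write the next numbered commit; fail if another writer took that number."""
        versions = self._versions()
        version = versions[-1] + 1 if versions else 0
        # Mode "x" creates the file exclusively, so two writers cannot both
        # claim the same version number.
        with open(self._path(version), "x", encoding="utf-8") as handle:
            json.dump({"add": list(added_files), "remove": list(removed_files)}, handle)
        return version

    def snapshot(self, as_of=None):
        """Replay commits in order to find the data files visible at a version."""
        files = set()
        for version in self._versions():
            if as_of is not None and version > as_of:
                break
            with open(self._path(version), encoding="utf-8") as handle:
                entry = json.load(handle)
            files |= set(entry["add"])
            files -= set(entry["remove"])
        return files


with tempfile.TemporaryDirectory() as root:
    log = ToyTableLog(root)
    log.commit(["part-0.parquet"])
    log.commit(["part-1.parquet"])
    log.commit(["part-2.parquet"], removed_files=["part-0.parquet"])
    latest = log.snapshot()
    first_version = log.snapshot(as_of=0)

print(sorted(latest), sorted(first_version))
```

Readers only ever see the files named by completed commits, so a half-finished write is invisible, and replaying the log up to an earlier version gives a view of the table as it was at that point.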
Resources:
Delta Lake on Azure – Microsoft Tech Community – Shows how Delta integrates with other Azure Services.
What is Delta Lake – Azure Synapse Analytics – Azure Synapse Analytics is compatible with Linux Foundation Delta Lake. Delta Lake is an open-source storage layer that brings ACID (atomicity, consistency, isolation, and durability) transactions to Apache Spark and big data workloads. The current version of Delta Lake included with Azure Synapse has language support for Scala, PySpark, and .NET.
Delta Lake and Delta Engine guide – Azure Databricks – Azure Databricks also includes Delta Engine, which provides optimized layouts and indexes for fast interactive queries. This guide covers Delta Lake on Azure Databricks and Delta Engine.
Tutorial: Delta Lake QuickStart – Azure Databricks – Step-by-step tutorial to get started.
Resources
The emerging big data architectural pattern | Azure blog and updates – The Lambda architecture is a popular pattern that allows you to handle massive quantities of data by taking advantage of both a batch and stream-processing layer. This article reviews the reasons that have led to the popularity and success of the lambda architecture, particularly in big data processing pipelines.
Azure Icons – Azure Architecture Center | Microsoft Docs – For your diagrams, these are SVG graphics that can be used in your documentation.
Modern analytics architecture with Azure Databricks – Azure Solution Ideas | Microsoft Docs – Azure Databricks forms the core of the solution. This platform works seamlessly with other services such as Azure Data Lake Storage, Azure Data Factory, Azure Synapse Analytics, and Power BI.
Case Study: How to reduce infrastructure costs by up to 80% with Azure Databricks and Delta Lake – Microsoft Tech Community – The implementation of the modern data architecture allowed Relogix to scale back costs on wasted compute resources by 80% while further empowering their data team.
Microsoft Cloud Adoption Framework for Azure – Microsoft Docs – The Cloud Adoption Framework is a collection of documentation, implementation guidance, best practices, and tools that are proven guidance from Microsoft designed to accelerate your cloud adoption journey.
Microsoft Azure Well-Architected Framework – Azure Architecture Center | Microsoft Docs – The Azure Well-Architected Framework is a set of guiding tenets that can be used to improve the quality of a workload.