Learning Guide: Designing Data Solution Architecture - Resources

There are many aspects to a successful Data Solution Architecture, and this guide reviews a set of curated resources to get you started. The resources are split into several sections that roughly follow the order of your design decisions. Given the wide variety of possible architectures and platforms, this post concentrates on the data reporting and analytics side of a solution.

Start with a High-Level Data Solution Logical Architecture

When designing a data architecture, you must start with the requirements of your solution. All solutions can be broken down into a specific set of elements, as pictured below:

  1. You have the information you want to collect

  2. It arrives either as a real-time stream or in batches

  3. This information will be stored in a data lake or database

  4. Data will be made available for other people who will use it to create analytics

  5. The resulting data will be used to develop reports and make decisions.  

Most of the data solutions you create will start like this.
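The five elements above can be sketched as a tiny pipeline. This is only an illustration in plain Python: the function names, the `DataStore` class, and the sample sales records are all hypothetical stand-ins, not part of any Azure service.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class DataStore:
    """Stand-in for the data lake or database (element 3)."""
    rows: list[dict] = field(default_factory=list)


def collect() -> list[dict]:
    """Element 1: the information you want to collect (made-up sample data)."""
    return [
        {"region": "east", "amount": 120.0},
        {"region": "west", "amount": 80.0},
        {"region": "east", "amount": 40.0},
    ]


def ingest(records: list[dict], batch_size: int) -> Iterator[list[dict]]:
    """Element 2: deliver records in batches; batch_size=1 approximates a stream."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def store(batches: Iterator[list[dict]], target: DataStore) -> None:
    """Element 3: land every batch in the store."""
    for batch in batches:
        target.rows.extend(batch)


def serve(target: DataStore) -> dict[str, float]:
    """Element 4: make an analytics-ready view available (totals by region)."""
    totals: dict[str, float] = {}
    for row in target.rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


def report(totals: dict[str, float]) -> list[str]:
    """Element 5: turn the served data into report lines for decision makers."""
    return [f"{region}: {total:.2f}" for region, total in sorted(totals.items())]


lake = DataStore()
store(ingest(collect(), batch_size=2), lake)
for line in report(serve(lake)):
    print(line)
```

In a real solution each function would be replaced by a managed service, but the hand-offs between the stages stay the same.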

Your next decision is which tools can meet those requirements.

Logical Data Architecture for an Analytical Solution

Most, if not all, solutions start with the following design layers:

  • Data Sources – Various types of data in different locations comprise information you need to bring together.  

  • Ingestion – Processes that automate moving and transforming the source data into various storage areas. The Orchestration layer represents this.

  • Store – Depending on the type of data you are gathering, you will bring the data into various storage areas. This allows data to be made available for further analysis down the workflow, or dataflow.

  • Model & Serve – Once you have your raw data, create data models and serve that data to downstream applications.

  • Analytic Data Store – These are generally read-only systems that store data that supports business intelligence and analytic-style queries. (See Delta Lake for another take)

  • Analyze and Report – This represents the applications and products that are served up to end users who use and make decisions on the data. The datasets at this level are ready to use.

These functional areas translate to various products and services in Azure. Below, you can see how these products translate into various physical solution designs. Some tools can be used in multiple layers.
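One way to keep track of this translation is a simple lookup from logical layer to candidate services. The mapping below is my own illustrative assumption of commonly used Azure services per layer, not a list taken from the architecture diagrams, and the helper name is hypothetical. It also shows how to find the tools that appear in more than one layer.

```python
from collections import defaultdict

# Illustrative assumption: example Azure services for each logical layer.
LAYER_SERVICES = {
    "Ingestion": ["Azure Data Factory", "Azure Event Hubs"],
    "Store": ["Azure Data Lake Storage Gen2", "Azure Blob Storage"],
    "Model & Serve": ["Azure Databricks", "Azure Synapse Analytics"],
    "Analytic Data Store": ["Azure Synapse Analytics", "Azure SQL Database"],
    "Analyze and Report": ["Power BI"],
}


def services_in_multiple_layers(layer_services):
    """Return {service: [layers]} for services listed under more than one layer."""
    layers_by_service = defaultdict(list)
    for layer, services in layer_services.items():
        for service in services:
            layers_by_service[service].append(layer)
    return {
        service: layers
        for service, layers in layers_by_service.items()
        if len(layers) > 1
    }


for service, layers in services_in_multiple_layers(LAYER_SERVICES).items():
    print(f"{service} appears in: {', '.join(layers)}")
```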

General Azure Physical Solution Architecture (Technology)

First Resources – Start Here

The following resources are the must-have links to start with. The Azure Architecture Center provides a landing page and a guide to its content. The site also provides a searchable set of Azure architectures and use cases for both learning and inspiration. I have also listed various popular data solution architectures with examples of real-world deployments.

Overall Azure Data Architecture Design

As you start to look at the design of your solution, the Microsoft Application Architecture Guide walks you through a series of steps, summarized below. With each decision, you take your solution requirements and check them against the features of the various architectures and technologies. Then you make tradeoffs to determine what is best for your solution today and what will scale to match any planned growth.

This creates a tech stack that combines technologies in scope for your solution. In addition, there is guidance around the benefits and challenges of each technology you choose.

Architecture Style

The type of architecture you are constructing is the most basic detail and usually the first decision. It could be a big data solution, an old-school analytic solution, or part of a more traditional N-tier application. There are numerous architectural styles, and choosing among them means weighing the advantages and challenges of each.

For our example here, we have been looking at big data architectures and will concentrate on those.

Learn more

Technology Choices

An application architecture starts with a technology choice, which is best made by answering a specific set of questions that define your workload. You can also use Power BI as an embedded solution in your web application.

It is essential to choose the right data store for your needs. Azure offers many database implementations. You select a data store by the structure of your data and the operations you need to perform on it, since relational (SQL) and NoSQL stores each support different types of operations.

Let’s use the documented example. You have an application in one of the following use cases:

  • Inventory management

  • CRM

  • Sales and Order Management

  • Event Organization

  • Reporting database

  • Accounting and Payroll

  • Employee Performance Data

You might define your workload this way:

  • The workload is heavy on create, read, update, and delete (CRUD) operations, with records frequently created and updated.

  • Multiple operations and changes must be completed in a single transaction with ACID guarantees (atomicity, consistency, isolation, and durability).

  • Relationships between data entities are enforced using database constraints.

  • Indexes are used to optimize query performance
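To make these workload traits concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for a relational store. The `customers` and `orders` tables and the `place_orders` helper are hypothetical. The sketch shows a foreign-key constraint, an index, and a multi-row insert that either commits completely or rolls back completely.

```python
import sqlite3


def create_database() -> sqlite3.Connection:
    """Build an in-memory database with constraints and an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(
        """
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount      REAL NOT NULL CHECK (amount > 0)
        );
        CREATE INDEX idx_orders_customer ON orders(customer_id);
        INSERT INTO customers (id, name) VALUES (1, 'Contoso');
        """
    )
    return conn


def place_orders(conn: sqlite3.Connection, orders: list) -> bool:
    """Insert all orders in one transaction; if any row fails, none are kept."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO orders (customer_id, amount) VALUES (?, ?)", orders
            )
        return True
    except sqlite3.IntegrityError:
        return False


def order_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


db = create_database()
print(place_orders(db, [(1, 25.0), (1, 10.0)]), order_count(db))
# The second batch references customer 99, which does not exist, so the
# whole batch is rolled back, including its first (valid) row.
print(place_orders(db, [(1, 5.0), (99, 7.5)]), order_count(db))
```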

Your solution might classify the data you wish to store with these specifications:

  • Data is highly normalized.

  • Database schemas are required and enforced.

  • Many-to-many relationships between data entities in the database.

  • Constraints are defined in the schema and imposed on any data in the database.

  • Data requires high integrity. Indexes and relationships need to be maintained accurately.

  • Data requires strong consistency. Transactions operate to ensure that all data are 100% consistent for all users and processes.

  • Individual data entries are small to medium-sized.

These solution requirements and specifications would lead you to a Relational Data Store in this example. You would look at the following relational data services that match those workload requirements:

  • Azure SQL Database

  • Azure Synapse Analytics

  • Azure SQL Managed Instance

  • Azure VM running SQL Server

  • Azure Database for MySQL

  • Azure Database for PostgreSQL

  • Azure Database for MariaDB
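The reasoning above, from workload traits to a data store category to a shortlist of services, can be written down as a small checklist. This is a sketch of the thought process only, not an official decision procedure: the `Workload` fields and function names are hypothetical, and real selection also weighs cost, scale, and team skills.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical summary of the requirements and data specifications."""
    crud_heavy: bool
    needs_acid_transactions: bool
    enforced_schema: bool
    highly_normalized: bool
    strong_consistency: bool


# Shortlist taken from the list above.
RELATIONAL_CANDIDATES = [
    "Azure SQL Database",
    "Azure Synapse Analytics",
    "Azure SQL Managed Instance",
    "Azure VM running SQL Server",
    "Azure Database for MySQL",
    "Azure Database for PostgreSQL",
    "Azure Database for MariaDB",
]


def suggests_relational_store(workload: Workload) -> bool:
    """True when every trait from the example workload is present."""
    return all(
        (
            workload.crud_heavy,
            workload.needs_acid_transactions,
            workload.enforced_schema,
            workload.highly_normalized,
            workload.strong_consistency,
        )
    )


def candidate_services(workload: Workload) -> list:
    """Return the relational shortlist, or an empty list if the traits do not match."""
    return list(RELATIONAL_CANDIDATES) if suggests_relational_store(workload) else []


inventory_app = Workload(True, True, True, True, True)
print(candidate_services(inventory_app))
```

An empty result does not mean no store fits; it means this particular checklist does not point to a relational store and the other data store categories deserve a look.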

Once you decide on an architecture, there are many other things to consider. Decisions around costing and scale, for example, have a big impact on deployment and implementation. See Azure Pricing Calculator to Estimate Your Azure Solution Costs for more information on determining Azure costs.

Learn More: This is just one data store; the reference articles below review other data architectures, advantages, and limitations.

Designing the Application Architecture

Once you have decided on the architectural style and technology components, your application's specific design will come together. Splitting the application architecture into the following areas will help organize the tasks and resources.

Reference Architectures

Rather than reinventing the wheel or starting from scratch, consider starting from one of the several Reference Architectures. Each architecture considers security, resilience, availability, and other design aspects. Some of these reference architectures also include a deployable solution.

Reference: Browse Azure Architecture – Azure Architecture Center | Microsoft Docs

Design Principles

Ten high-level Azure Data Solution Architecture design principles allow your solution to be more scalable, resilient, and manageable. These are general principles that you can use with any architectural style. These principles include:

  • Design for self-healing. Design your application so it can survive and deal with failures.

  • Make all things redundant. Build redundancy into your application to avoid single points of failure.

  • Minimize coordination. Minimize coordination between application services to achieve scalability.

  • Design to scale out. Your application should be designed to scale horizontally, adding or removing instances as demand requires.

  • Partition around limits. Use partitioning to work around database, network, and compute limits.

  • Design for operations. Design your application so that the operations team has the tools they need.

  • Use managed services. When possible, use platform as a service (PaaS) rather than infrastructure as a service (IaaS). Let someone else manage your platform so you can concentrate on the solution.

  • Use the best data store for the job. Pick the storage technology that is the best fit for your data. Watch out for edge cases and future requirements growth. What happens if your application goes viral?

  • Design for evolution. If data solutions have one constant, it is that requirements change over time. Don’t design yourself into a corner.

  • Build for the needs of business. Watch out for scope creep. A business requirement must justify every design decision.

Source: Design principles for Azure applications – Azure Architecture Center | Microsoft Docs
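As one concrete illustration of the self-healing principle, a common technique is to retry operations that fail for transient reasons, waiting longer between each attempt. The sketch below is a generic example under my own assumptions: `TransientError` and `call_with_retries` are hypothetical names, and production code would normally rely on the retry support built into the client library it uses.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retries(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on TransientError, back off exponentially and retry.

    The delay doubles on each attempt (0.5s, 1s, 2s, ...) with a little random
    jitter so many clients do not retry in lockstep. The last failure is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 10))


# Usage: an operation that fails twice before succeeding.
attempts = {"count": 0}


def flaky_query():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("temporary failure")
    return "ok"


print(call_with_retries(flaky_query, base_delay=0.01))
```

The `sleep` parameter exists only so the waiting can be replaced in tests.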

Design Patterns

Several cloud design patterns address specific challenges in distributed systems. The challenges include availability, high availability, operational excellence, resiliency, performance, and security. A specific section in the resource below covers Data management patterns.

Resource: Cloud design patterns – Azure Architecture Center | Microsoft Docs

Best Practices

I tend not to use the term best practice, since it assumes the suggestions are best for everyone. I prefer recommended practices, as you should always make sure that you follow what is best for your specific solution and situation. One person’s best practice could grind your solution to a halt!

Resource: Best practices in cloud applications – Azure Architecture Center | Microsoft Docs. See specifically Data partitioning and Monitoring and diagnostics.
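For a feel of what data partitioning means in practice, here is a minimal sketch of one common strategy, hash partitioning, where a stable hash of a key decides which partition holds a record. The function name and the customer keys are hypothetical, and the linked guidance covers other strategies and the tradeoffs between them.

```python
import hashlib
from collections import Counter


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition number in [0, partition_count).

    A cryptographic hash is used only because it is stable across runs and
    machines, unlike Python's built-in hash() for strings.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# Spread some made-up customer keys across four partitions.
keys = [f"customer-{n}" for n in range(1000)]
distribution = Counter(partition_for(key, 4) for key in keys)
print(sorted(distribution.items()))
```

The same key always lands in the same partition, which is what lets reads find the data again. Changing `partition_count` moves most keys, so it is worth choosing with growth in mind.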

Security best practices

Business data storage and processing require high confidentiality, integrity, and availability. The resources below cover some security and governance topics important for Azure Data Solution Architecture and design.

Resources:

  • Application security in Azure | Microsoft Docs – Applications and their associated data ultimately act as the primary store of business value on a cloud platform. This article covers a high-level review of application platform security topics.

  • Data Governance: Why You Need a Data Governance Process Now!! 5MinuteBI – Data Governance Initiatives must determine how to secure data usage, manage activity, and gain visibility and control of one of your most important assets immediately. In many of the analytic projects I have been involved in, whether big or small, guiding those using the data increases the adoption and long-term value of the solution.

  • Protection of customer data in Azure | Microsoft Docs – Protection of your data in Azure

  • Azure SQL Database security features | Microsoft Docs – To protect customer data and provide strong security features that customers expect from a relational database service, SQL Database has its own set of security capabilities. These capabilities build upon the controls that are inherited from Azure.

What is the Microsoft Azure Well-Architected Framework?

Successful Azure Data Solutions start with a well-defined and architected platform. As more and more solutions have been migrated to Azure, Microsoft has produced a series of recommended practices and guidance called the Azure Well-Architected Framework.

Remember that the framework’s goal is to help you design a solution that is high quality, stable under load, and cost-effective, with a scalable and efficient cloud architecture.

The Azure Well-Architected Framework is divided into various principles or tenets called the five pillars of architectural excellence. These sections review these principles with an eye to an Azure Data Solution architecture. In addition, links in the table provide more detailed information on the topic. 

Pillar (Link to Topic) | Description
Reliability | The ability of a system to recover from failures and continue to function.
Security | Protecting applications and data from threats.
Cost Optimization | Managing costs to maximize the value delivered.
Operational Excellence | Operations processes that keep a system running in production.
Performance Efficiency | The ability of a system to adapt to changes in load.

Source: Microsoft Azure Well-Architected Framework

Utilizing Data Lakes & Delta Lakes in a Data Architecture

Azure Blob Storage

Azure Blob Storage is a service that stores large amounts of unstructured data, such as text or binary data, that can be accessed from anywhere in the world via HTTP or HTTPS. This is a low-cost way to store large amounts of data that can then be used in many different Azure services. Blob storage can expose data publicly to the world or store application data privately.
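Because blobs are addressed over HTTP or HTTPS, every blob has a URL built from the storage account, container, and blob name. The sketch below builds that URL for the Azure public cloud endpoint format. The account, container, and file names are hypothetical, and a private blob would additionally need authorization (for example a shared access signature) before the URL could be read.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the HTTPS URL of a blob in the Azure public cloud.

    Slashes in the blob name are kept, since they act as virtual folder
    separators; other special characters are percent-encoded.
    """
    encoded_name = quote(blob_name, safe="/")
    return f"https://{account}.blob.core.windows.net/{container}/{encoded_name}"


# Hypothetical account, container, and blob names.
print(blob_url("examplestore", "reports", "2024/sales summary.csv"))
```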

Azure Data Lake Store & Analytics

These services allow unstructured, semi-structured, and structured data to be stored in an enterprise-class service with no limits on data size. In addition, Azure Data Lake Store is secured, scalable, and built to the open HDFS standard, so you can run massively parallel analytics against the data it holds.

Resources:

  • Azure Data Lake Storage Gen2 Introduction – Data Lake Storage Gen2 converges the capabilities of Azure Data Lake Storage Gen1 with Azure Blob Storage. For example, Data Lake Storage Gen2 provides file system semantics, file-level security, and scale. Because these capabilities are built on Blob storage, you’ll also get low-cost, tiered storage, with high availability/disaster recovery capabilities.

  • Tutorial: Azure Data Lake Storage Gen2, Azure Databricks & Spark – This tutorial shows you how to connect your Azure Databricks cluster to data stored in an Azure storage account that has Azure Data Lake Storage Gen2 enabled. This connection enables you to natively run queries and analytics from your cluster on your data.

  • Azure Data Lake Storage Gen2 Hierarchical Namespace – A key mechanism that allows Azure Data Lake Storage Gen2 to provide file system performance at object storage scale and prices is the addition of a hierarchical namespace. This allows the collection of objects/files within an account to be organized into a hierarchy of directories and nested subdirectories in the same way that the file system on your computer is organized.
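To see why a hierarchical namespace matters, compare renaming a "directory" in a flat object store, where a folder is only a shared name prefix, with renaming it in a real directory tree. The toy model below uses plain Python dictionaries and hypothetical file names. It illustrates the idea only and says nothing about how the service is implemented internally.

```python
def rename_prefix_flat(objects: dict, old: str, new: str) -> int:
    """Flat namespace: every object under the prefix must be rewritten.

    Returns the number of objects touched.
    """
    touched = 0
    for key in [k for k in objects if k.startswith(old + "/")]:
        objects[new + key[len(old):]] = objects.pop(key)
        touched += 1
    return touched


def rename_directory(root: dict, parent_path: list, old: str, new: str) -> int:
    """Hierarchical namespace: re-link one directory entry under its parent.

    Returns the number of entries touched, which is always 1.
    """
    parent = root
    for part in parent_path:
        parent = parent[part]
    parent[new] = parent.pop(old)
    return 1


flat = {"raw/2024/a.csv": b"...", "raw/2024/b.csv": b"...", "curated/c.csv": b"..."}
tree = {"raw": {"2024": {"a.csv": b"...", "b.csv": b"..."}}, "curated": {"c.csv": b"..."}}

print(rename_prefix_flat(flat, "raw/2024", "raw/archive"))    # grows with file count
print(rename_directory(tree, ["raw"], "2024", "archive"))     # constant
```

With millions of files under a folder, the difference between touching every object and touching one entry is what makes directory operations practical for analytics jobs.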

Azure Delta Lakes (Databricks)

Delta Lake is an interesting option for the Lakehouse architecture pattern put forward by Databricks. It addresses many of the challenges of traditional data architectures. As a result, this is becoming a prevalent option for data solutions. Learn More with this great introductory article, Simplify Your Lakehouse Architecture with Azure Databricks, Delta Lake, and Azure Data Lake Storage.

Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake offers ACID transactions, scalable metadata handling, and unified streaming and batch data processing. Delta Lake runs on top of your existing data lake and is compatible with Apache Spark APIs.
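The way a storage layer can offer transactions on top of plain files is easier to picture with a toy model: an ordered log of commits, where each commit atomically adds and removes data files and readers rebuild the table by replaying the log up to a version. The class below is a deliberately simplified sketch of that idea in plain Python. It is not the Delta Lake API or its actual protocol, and all names in it are hypothetical.

```python
class CommitConflict(Exception):
    """Raised when a writer's view of the table is out of date."""


class ToyTableLog:
    """A minimal, in-memory stand-in for a versioned transaction log."""

    def __init__(self):
        self._commits = []  # each entry: (files_added, files_removed)

    @property
    def version(self) -> int:
        """Latest committed version, or -1 for an empty table."""
        return len(self._commits) - 1

    def commit(self, read_version: int, add=(), remove=()) -> int:
        """Append one commit, but only if nobody else committed since read_version."""
        if read_version != self.version:
            raise CommitConflict(
                f"table is at version {self.version}, writer read {read_version}"
            )
        self._commits.append((tuple(add), tuple(remove)))
        return self.version

    def snapshot(self, version=None) -> set:
        """Replay the log to get the set of live data files at a version."""
        if version is None:
            version = self.version
        files = set()
        for added, removed in self._commits[: version + 1]:
            files -= set(removed)
            files |= set(added)
        return files


log = ToyTableLog()
v0 = log.commit(-1, add=["part-0.parquet"])
v1 = log.commit(v0, add=["part-1.parquet"], remove=["part-0.parquet"])
print(log.snapshot())    # current table
print(log.snapshot(v0))  # the table as it looked at the first version
```

Because a commit is all-or-nothing and a stale writer is rejected, readers never see a half-written table, which is the property the ACID guarantee describes.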

Azure Databricks also includes Delta Engine, which provides optimized layouts and indexes for fast, interactive queries.

Resources:


Steve Young

With over 34 years in the tech industry, including 17 years at Microsoft, I’ve honed my Data Engineering, Power BI, and Enablement skills. My focus? Empowering Technical Education Professionals to excel with adding AI to their content creation workflow.

https://steveyoungcreative.com