Building an Effective Data Architecture:
Part 4 – Enabling Self-Service and Power Users
Kevin Michaud
Icon Analytics Co-Founder
Exploratory conversations with clients often lean towards self-service and end user enablement. Self-service is the area of data architecture that allows an organization’s employees to query and analyze its data sets. Unfortunately, we cannot give users free rein to build and query as they please; the consequences include downstream data silos, uncontrollable business logic, and deviations from coding standards. This article defines how to build upon the data architecture discussed in earlier installments of this series by adding self-service environments on top of the governed data established in Part 2 and the Centralized Data Warehouse discussed in Part 3.
Self-service environments are segments of an organization’s data where business users can begin to see the impacts of an efficient data architecture. This is where reporting is performed and where company leadership can access and view the results of data transformations completed by upstream teams. There is no limit to the number of self-service environments a company can have. We generally recommend one environment per Business Unit or Subject Area, depending on how your organization structures and defines its data. For most of this article I will focus on building a single self-service environment; the process is repeatable, so standing up additional environments is simple.
A self-service environment must be separated from the Centralized Data Warehouse we established in Part 3 of this series. Generally, the border between the Centralized Repository and a Self-Service Environment is a Business Intelligence (BI) or reporting tool such as Tableau, Power BI, or Looker. However, we do recognize the need to give end users the ability to build additional logical transformations on top of those already established in the Centralized Data Warehouse. For this use case, we use a smaller localized repository specific to the self-service environment, called a Datamart. Since the Datamart is specific to the Self-Service environment, it must also be logically separated from the Centralized Data Warehouse. This can be done in a number of ways depending on an organization’s established architecture; the simplest is carving out a private schema on the same database that houses the Centralized Data Warehouse.
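The schema-separation idea above can be sketched with SQLite, whose ATTACH statement treats each attached database as a named schema. In a production warehouse the equivalent would be a CREATE SCHEMA statement plus role-based grants; the `sales_mart` name and the table definitions here are illustrative assumptions, not part of any real deployment.

```python
import sqlite3

# The main connection plays the role of the Centralized Data Warehouse.
conn = sqlite3.connect(":memory:")

# Attach a second in-memory database as a named schema to stand in for
# the Self-Service environment's private Datamart (name is illustrative).
conn.execute("ATTACH DATABASE ':memory:' AS sales_mart")

# Enterprise-approved data lives in the warehouse schema...
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")

# ...while Business Unit specific transformations live only in the Datamart.
conn.execute(
    "CREATE TABLE sales_mart.regional_totals AS "
    "SELECT id, amount AS total FROM orders"
)

# PRAGMA database_list shows both logical schemas on one connection.
schemas = [row[1] for row in conn.execute("PRAGMA database_list")]
print(schemas)  # ['main', 'sales_mart']
```

The same pattern scales to one attached schema per Self-Service environment, keeping each Business Unit's objects out of the warehouse namespace.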
The Datamart is used to store Business Unit specific data transformations that may be too complex for the reporting tool to handle on its own. Transformations stored in the Datamart are not approved for enterprise-wide use, and therefore cannot reside within the Centralized Data Warehouse. A Datamart is like a miniature data warehouse used exclusively by the Business Unit that owns the Self-Service environment in which it is held. For example, if multiple reports require common calculations, performing these calculations ahead of time improves performance when those calculation values are requested simultaneously. This also helps enforce data consistency by ensuring the entire organization calculates fields the same way.
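The precompute-once pattern described above might look like the following sketch, again using SQLite for illustration. The `orders` and `margin_by_region` tables and the margin calculation are invented examples, not a prescribed model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source table in the Centralized Data Warehouse.
conn.execute("CREATE TABLE orders (region TEXT, revenue REAL, cost REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("east", 100.0, 60.0), ("east", 50.0, 20.0), ("west", 80.0, 50.0)],
)

# Compute margin per region once, in the Datamart, rather than
# re-deriving it inside each report's query.
conn.execute(
    "CREATE TABLE margin_by_region AS "
    "SELECT region, SUM(revenue - cost) AS margin "
    "FROM orders GROUP BY region"
)

# Every report now selects the pre-calculated value, so 'margin'
# means exactly the same thing in all of them.
rows = dict(
    conn.execute("SELECT region, margin FROM margin_by_region ORDER BY region")
)
print(rows)  # {'east': 70.0, 'west': 30.0}
```

Because every downstream report reads `margin_by_region` instead of redoing the arithmetic, a change to the margin definition happens in one place.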
After setting up the structure and objects of a Self-Service environment, the next step is to establish the governance around it. Each Self-Service environment will have the following roles and service accounts:

Roles:
- User Access Role – This role allows all users within the Self-Service environment to query the enterprise-approved data sets stored within the Centralized Data Warehouse. It also grants users read access to their localized Datamart.
- Team Based Admin Role – This administration role has write and execute privileges within the Self-Service environment only, not the Centralized Data Warehouse. It is generally granted to the Business Unit’s Power Users, the people who directly develop code against the data elements owned by that Business Unit. Power Users can perform their development only in the reporting tool and the Datamart. This role is needed only if the Self-Service environment requires a Datamart.

Service Accounts:
- reporting_tool – Needed for automated report builds; granted the User Access Role.
- etl_tool – Needed only if a Datamart is present in the Self-Service environment; granted the Team Based Admin Role.
For a deeper dive into roles and service accounts please check out Part 2 of this series: The Importance of Data Governance.
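The division of privileges between the two roles above could be sketched as Snowflake-style GRANT statements generated in Python. The role, schema, and account names (`SALES_USER`, `SALES_ADMIN`, `WAREHOUSE`, `SALES_MART`) are illustrative assumptions; the exact grant syntax will vary by warehouse platform.

```python
def user_access_grants(role, warehouse_schema, datamart_schema):
    """Read-only access: enterprise data plus the local Datamart."""
    return [
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {warehouse_schema} TO ROLE {role}",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {datamart_schema} TO ROLE {role}",
    ]

def team_admin_grants(role, datamart_schema):
    """Write/execute inside the Self-Service environment only."""
    return [
        # Deliberately no grant on the Centralized Data Warehouse schema:
        # the admin role is scoped to the Datamart alone.
        f"GRANT ALL ON SCHEMA {datamart_schema} TO ROLE {role}",
    ]

# Hypothetical environment: a Sales Self-Service environment with a Datamart.
grants = (
    user_access_grants("SALES_USER", "WAREHOUSE", "SALES_MART")
    + team_admin_grants("SALES_ADMIN", "SALES_MART")
)
for g in grants:
    print(g)
```

Generating the grants from a function per role makes standing up each additional Self-Service environment a matter of passing in new schema and role names.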
With this setup, we have given front-end Business Users, Analysts, and Data Visualization specialists the ability to access and manipulate data as they see fit, all under the same Data Governance umbrella as the Centralized Data Warehouse. It also ensures that these users cannot change or manipulate Self-Service environments outside of their own, nor the organization’s Centralized Data Warehouse. This architecture now has a point where an organization can wholeheartedly trust its data – the Centralized Data Warehouse.
With well-established Self-Service environments, an organization can refocus its resource structures as well. When Business Units hire their own Power Users, generalized Business Analysts are no longer needed. A Power User is someone technical enough to build reports and simple ETL flows who also understands the Business Unit’s specific data objects and organizational goals. This person should be a bridge between the Data Engineers and the Business Users.
If you feel that your organization could do a better job with reporting or trusting the output of those reports, you may want to explore or improve on Self-Service environments. Additionally, if you find that each new report requires a brand-new pipeline to be established by a Data Engineer, a Self-Service environment may be what you’re missing. Get in touch with us at www.iconanalytics.io/get-started to learn more and to see how we can optimize your data architecture. Stay tuned for the final article in this series where I discuss the Icon Layer, where we explore enterprise-wide reporting and how to set up trustworthy AI and Machine Learning!



