Building an Effective Data Architecture:
Part 1 – The Difficulties of Modernizing Your Analytics Platform
Kevin Michaud
Icon Analytics Co-Founder
A data analytics platform is now an essential element of a successful business. Recognizing this, and wanting to better understand their data, many companies try to modernize by developing one. After building and improving many of these platforms, my partners and I discussed why so many companies were unable to accomplish a profitable build on their own. The common theme in these conversations was front-end data architecture and the ability of an organization to access, trust, and understand its data. When front-end users run into these issues, adoption of the platform can be low. My partners and I agreed that most of these pain points stem from upstream systems and processes:
Data Silos - As more end users or business units begin to access and analyze your organization’s data, the data itself can become siloed. The aggregations one user applies for a report may not match the aggregations or logic transformations another user applies for theirs. Even worse, business units will source, transform, and report on data separately from the rest of the organization. This creates data distrust among leadership, who cannot tell which set of reports is true and accurate. That distrust can leave an organization feeling overwhelmed by its data and unable to use what is available to drive business growth. This can be solved with a dedicated data governance team working in tandem with a centralized data source and enterprise reporting.
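To make the silo problem concrete, here is a minimal sketch in Python. The order records, field names, and the two teams' rules are all hypothetical, chosen only to show how two reasonable-looking aggregations of the same data can disagree.

```python
from collections import defaultdict

# Hypothetical order records; every field name here is an assumption.
ORDERS = [
    {"region": "East", "amount": 100.0, "status": "complete"},
    {"region": "East", "amount": 50.0, "status": "refunded"},
    {"region": "West", "amount": 80.0, "status": "complete"},
    {"region": "West", "amount": 20.0, "status": "pending"},
]


def sales_revenue(orders):
    """One team's silo: sums every order, whatever its status."""
    totals = defaultdict(float)
    for order in orders:
        totals[order["region"]] += order["amount"]
    return dict(totals)


def finance_revenue(orders):
    """Another team's silo: sums only completed orders."""
    totals = defaultdict(float)
    for order in orders:
        if order["status"] == "complete":
            totals[order["region"]] += order["amount"]
    return dict(totals)


sales = sales_revenue(ORDERS)
finance = finance_revenue(ORDERS)

print(sales)    # {'East': 150.0, 'West': 100.0}
print(finance)  # {'East': 100.0, 'West': 80.0}

# Regions where the two "revenue" reports disagree.
mismatched = [region for region in sales if sales[region] != finance[region]]
print(mismatched)  # ['East', 'West']
```

Neither function is wrong on its own terms. The disagreement comes from two definitions of "revenue" living in two places, which is the gap a governed, centralized definition is meant to close.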
No Coding Standards - A lack of standardization when building data ingestion pipelines leads to support complexity and confusion if left unchecked. We like to call this Spaghetti Code because it becomes impossible to understand where one ingestion pipeline starts and another ends. You know you have Spaghetti Code when every new report requires new ingestion pipelines pointing to different silos of data. This produces long lead times because everything required for a new report must be built from scratch, starting at different data sources. This issue is solved by a centralized data source and a dedicated team of data engineers.
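As one possible illustration of what a coding standard for ingestion might look like, the sketch below has every pipeline declare itself through the same small spec and run through the same function into one central store. The names `PipelineSpec`, `run_pipeline`, and `central_store`, along with the sample CRM data, are assumptions made for this example and do not describe any particular tool.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class PipelineSpec:
    """Hypothetical standard every ingestion pipeline must follow."""
    name: str                             # table name in the central store
    read: Callable[[], Iterable[dict]]    # how to fetch rows from the source
    required_fields: tuple                # columns the pipeline must deliver


def run_pipeline(spec, store):
    """Validate rows against the spec, then load them into the shared store."""
    rows = []
    for row in spec.read():
        missing = [f for f in spec.required_fields if f not in row]
        if missing:
            raise ValueError(f"{spec.name}: missing fields {missing}")
        # Keep only the declared columns so every table has a known shape.
        rows.append({f: row[f] for f in spec.required_fields})
    store[spec.name] = rows
    return len(rows)


# Hypothetical source system, represented here as an in-memory CSV export.
CRM_CSV = "customer_id,region\n1,East\n2,West\n"


def read_crm():
    return csv.DictReader(io.StringIO(CRM_CSV))


central_store = {}  # stands in for the centralized data source
spec = PipelineSpec("crm_customers", read_crm, ("customer_id", "region"))
loaded = run_pipeline(spec, central_store)
print(loaded)  # 2
```

With a shared shape like this, a new report can start from tables already in the central store, and adding a source means writing one reader function rather than a new pipeline from scratch.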
Technology Redundancy - This usually happens when an organization, in good faith, purchases a new tool or shifts the platform to a new coding language to improve its architecture. However, we commonly see that the effort to move users off legacy systems was poorly executed, or that there was a fear of shutting down the old system and potentially losing important data or logic transformations. In such cases, the organization must support two systems, which stretches data engineers thin and creates additional data silos. These issues can overlap with one another and compound quickly. Fortunately, one solution can help solve more than one problem: as with the lack of coding standards, technology redundancy can be mitigated by a centralized data source and a dedicated team of data engineers.
Lack of Self-Service - An organization’s IT staff or data engineers are often not well versed in the organization’s business rules or definitions, yet they are needed to build and execute the data transformations required for reporting. While data engineers are a necessity to keep a platform operational and efficient, asking them to also understand your data’s business definitions stretches that team very thin. Data engineers can end up acting as business analysts in these situations: they work with end users to gather business requirements for new data sets, then promptly shift roles to implement the technical requirements that satisfy those requests. In other instances, organizations hire full-time business analysts, which can cut into the bottom line when they may not be needed in the first place. Providing your business units with a self-service layer that pulls data from a centralized data source allows your engineers and business unit SMEs to do the work they are best suited for.
Uncontrollable Business Logic - As additional business units gain access to the analytics platform, some organizations find business logic being added on top of their centralized data faster than they can maintain it. This often creates another source of data distrust, especially when data is allowed to be daisy-chained from one business unit to another without any checks or balances. As one business unit’s logic gets added on top of another’s, the issues of Spaghetti Code and Data Silos reappear, but further downstream than in the cases described above. Again, this problem overlaps with the previous ones and can be remedied with a combination of the same solutions. A data governance team and enterprise reporting keep your organization’s business logic in check and help maintain confidence in your data analytics.
I will be releasing a series of articles that explore the solutions to each of these common data architecture pain points in depth. Some of these solutions are common practice in the data analytics industry today, and you may have heard of or explored a few of them already. However, the partners at Icon Analytics have been able to pair these more common solutions with a new practice that we are excited to bring to our clients. This new practice does require work up front to establish the framework upon which it operates, but it allows your organization to scale and adapt to changes easily without ever losing trust in your data. In this series of articles, we will explore the benefits of, and the approach needed to implement, these foundational solutions and our new practice:
- A strong Data Governance layer to help manage the accuracy and reliability of your data.
- A Centralized Data Source to establish a point in the data architecture, near the front-end, where your accurate, reliable, and governed data can safely reside.
- Enablement of self-service reporting to give your most knowledgeable users the power to build and experiment on their own, without external blockers.
- Implementation of "The Icon Layer" where accurate and reliable enterprise-wide reports can safely reside.
Look for links to these articles on the platform where you found this one, or check back here regularly, where we will post them. You can also find contact details here to get in touch with us and continue the conversation: www.iconanalytics.io/get-started


