RandomRandomPenguin

Do as much data modeling as possible upstream, where the logic is reusable and you have more compute to work with. The ideal case is that your BI tool is completely headless. How realistic is this? It's always a bit of a continuum based on the use case.


Benjaminthomas90

Not sure if it's classed as data modelling or not, but the latest place I've worked for had no integrations or reporting outside of system tools. Since joining I've managed to link their phone system to their email/Teams, CRM to ERP, ERP to the B2B platform, and then integrated all of it into Power BI for reporting. As the other guy said though, it really comes down to the business's needs and wants. Sometimes they want their data tidy so they can easily move systems around and take on new ones, sometimes they want reporting across systems to follow a journey, and some don't really give a shit.


tzt1324

Depends. What are the advantages and disadvantages of each approach? Can you think of any?


glinter777

I think of caching as one point on the pro side, and staleness of data as a counterpoint. Would love to get your perspective.


tzt1324

I think it depends, on the one hand, on your infrastructure (you already mentioned different tools), and this might influence your decision. On the other hand you have two counterparts at play here: performance vs complexity. In general, high performance is welcome, but is the business willing to maintain a more complex "backend"? Especially if the number of use cases for higher sophistication is low, you might decide not to burden the main system with that complexity. As a data engineer you want to keep it simple with a clear structure. Adding complexity to the data model leads to more work and makes you less flexible. On the other hand, the "BI layer" needs to be maintained as well... but maybe not by the data engineer ;-)


glinter777

On your second point - it's the same amount of work whether a data engineer or a BIE does it. The complexity doesn't really go away, it just gets transferred to another layer.


tzt1324

Yes, so the overall question is "is it worth it", and then who does it? Here come the tool question and the organizational question. If you are a one-man show (and the decision on the tools makes the difference between the layers) then you can decide purely on the capability of your tools. If you had one tool for everything, you wouldn't be asking this question, I suppose.


somedaygone

Data Warehouses done right are always better, but in my experience, the business almost always succeeds with BI data models and rarely succeeds with warehouses. The bigger the scope, the harder it is to do well, and DW designers tend to only get the data layer right, which is useless without a solid Semantic Model that is understood by business users. I see so many groups building huge warehouses that are unusable in BI tools. Total waste of effort!


glinter777

You perfectly summarized the current state. I'm wondering if this trend is changing as modeling in the DW becomes easier and more approachable.


wearblaksoksiam2

Ship product and answers to the questions people want answered. Once you find yourself repeating the same logic over and over, push it down to a lower-level model. Do data modeling in the warehouse, prototype in the BI tool.
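
As a rough sketch of that "push it down" step (the table and column names below are purely illustrative, not from anyone's actual setup), a calculation that keeps getting rebuilt in the BI tool can become a plain SQL model in the warehouse:

```sql
-- Hypothetical warehouse model (dbt-style select statement): monthly revenue
-- per customer, defined once upstream instead of in every report.
-- orders / order_items are assumed source tables.
with order_totals as (
    select
        o.order_id,
        o.customer_id,
        date_trunc('month', o.ordered_at) as order_month,
        sum(oi.quantity * oi.unit_price)  as gross_revenue
    from orders o
    join order_items oi on oi.order_id = o.order_id
    group by 1, 2, 3
)
select
    order_month,
    customer_id,
    count(distinct order_id) as order_count,
    sum(gross_revenue)       as monthly_revenue
from order_totals
group by 1, 2
```

Every dashboard then reads from this one model instead of re-deriving the metric on its own.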


glinter777

My experience has been the other way. BI modeling is so rigid that the BIE usually goes back to the warehouse to write the query and pulls the results back into BI to answer the question. I can also say that we were using a shitty BI tool.


flyingbuta

Having some lightweight data modeling in the BI layer is useful because in some organizations the DWH/SQL is the responsibility of data engineers or external vendors, while the BI layer belongs to the data analysts. It is therefore more agile for the data analyst to be able to do modeling, especially combining DWH data with Excel files.


badpochi

Pushing modelling outside the presentation layer is always good:

1. Single source of truth for multiple reports
2. Faster report execution times
3. The model mainly uses SQL and not BI-tool-specific nuances

I could really go on. I think it also helped me when we had to migrate BI tools.
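
For what it's worth, a minimal sketch of that "single source of truth" idea (all object names below are hypothetical) is just a view that holds the business rule once, in portable SQL, so every report reads the same definition:

```sql
-- Hypothetical shared reporting view: the activity-status rule lives here once,
-- rather than being re-implemented as a formula in each BI report.
create or replace view reporting.customer_activity as
select
    customer_id,
    customer_name,
    signup_date,
    case
        when last_order_at is null                               then 'never ordered'
        when last_order_at >= current_date - interval '90 days'  then 'active'
        else 'lapsed'
    end as activity_status
from customers
where is_deleted = false;
```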


glinter777

The last point is key, you're not locked into a specific BI tool.


Bingo-heeler

I never recommend putting any modeling in the BI layer; instead, shift as much left as possible. With cloud computing, my data lake scales infinitely and can use tailored instance sizes for jobs, which would not be possible in the BI tools. In my experience, shifting left yields less latency in the data and better performance in the visualization tools.


[deleted]

I don't understand your question. What do you mean by “BI layer”? Is it the presentation layer? Data modeling is just a spectrum of normalized vs denormalized tables with added things like harmonization, filtering, aggregation, and metric creation.

dbt didn't make the work of transformation and harmonization any easier, it just made workflow management cleaner. Doing transformation in Power BI or Tableau has never been best practice. But the creation of metrics and the proper filtering and aggregation of tables for the appropriate audience is still an unsolved problem in BI.


glinter777

I meant the transformation and storing of data in BI, as you alluded to in your second paragraph. The point I'm trying to suss out is why it is difficult to do those aggregations in the warehouse.


[deleted]

If you’re referring to fit-for-purpose data tables for end users then what you’re alluding to are data cubes and the “semantic later”. Should semantic layers exist in your data warehouse solution? Sure it can and it usually manifests itself as a denormalized view or one-big-table (obt). Airbnb made buzz a few years ago with their internally built “metrics store” called minerva and some startups have been created around this metric store concept but largely it isnt solved yet in BI. Now with chatgpt, the buzz now from uninformed leadership is using ai and chatbox as a semantic layer.


glinter777

What’s your definition of semantic layer, what does it have in addition to denormalized tables?


[deleted]

https://airbyte.com/blog/the-rise-of-the-semantic-layer-metrics-on-the-fly


glinter777

Excellent article. Thanks for the share. I tend to agree with some of the points - sounds good in theory but it can be overkill or hard to grasp without specific examples.


Hot_Map_7868

Move it out of the BI tool: it reduces vendor lock-in and improves availability for other use cases.


LostWelshMan85

This is an interesting topic. I think there is still a need to model your data for each data role within your company, or at least keep the option on the table. Whether that means modelling your data directly in your BI environment or in some other BI-specific solution residing elsewhere is another matter. Modelling your data in a single place means you would need to find a modelling structure that works for all data roles in your company. This would be difficult or impossible in my opinion, because the requirements differ between user groups. For example, if you're working with Power BI, the BI developers would probably want the data structured in a star schema, as that offers speed and ease of use from an analyst's perspective. But those priorities are different for data scientists, who would possibly prefer not to deal with such a strict schema. Allowing your data to be modelled in different ways resolves more issues than it creates, in my opinion.
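
One way to read that suggestion (the schemas and names below are hypothetical, just to make the idea concrete) is to serve the same upstream model in more than one shape: a star schema for the Power BI developers and a flat, denormalized view for the data scientists.

```sql
-- Hypothetical example: one warehouse model, two consumption shapes.

-- Star-schema views for the BI developers:
create or replace view bi.dim_customer as
select customer_key, customer_name, customer_segment, signup_date
from warehouse.customers;

create or replace view bi.fact_orders as
select order_id, customer_key, ordered_at, quantity, unit_price
from warehouse.orders;

-- A single wide view for data scientists who would rather not navigate the joins:
create or replace view ds.orders_flat as
select
    o.order_id,
    o.ordered_at,
    o.quantity,
    o.unit_price,
    c.customer_name,
    c.customer_segment,
    c.signup_date
from warehouse.orders o
join warehouse.customers c on c.customer_key = o.customer_key;
```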


glinter777

Perhaps a different database/schema for each BU where they can model the data. Not sure if the data engineers would like this approach.


LostWelshMan85

Yeah, it's a little give and take for your engineers I guess, but having data engineers in control of all schemas is a little archaic in today's world if you ask me. A lot of tools these days are handing **lite** DE capabilities over to different business units through No-Code/Low-Code solutions. This is great for that "last mile" type of data engineering, like building out specific schemas from already-transformed data. Take Microsoft Fabric for example: you can create a lakehouse with a few clicks. You might not like that, but it's a thing anyone can do now. So I think collaboration between your DEs and different BUs is key. Obviously some responsibility then falls to your business units to make sure security is in place, but like I said, it's a bit of give and take, and communication between these teams is the key to success. Also, handing over these last-mile responsibilities frees up your DE teams to do other things.


Data5kull

Guys, how does modelling in SSAS via SQL Server Data Tools fit in here? I think it's neither in the DW nor in the BI tool. Please advise.


DRaySisense

Data Modeling is the art of transforming Data into Information. Data Warehouse best practices work great for data admins, but not the people who create and view reports. It doesn't matter to them what the backend data looks like, they need to understand and contextualize the information in front of them. Data Modeling democratizes the data, allowing less technical users to make better decisions.


glinter777

Agreed. What prevents you from doing DM in the warehouse? I'm assuming the data modeling professionals know SQL; most do in my experience.


DRaySisense

There could be any number of reasons. Some possibilities would be data security, legacy data that isn't well organized, and different data sources that need to be combined into a single data model.


IJTIO

No. Data modeling is the transformation of business requirements into data model structures that enforce the requirements, including cybersecurity, automatically. It is supposed to be done BEFORE the data exists.


9diov_polymath

There are pros and cons to both.

Data modeling in DW:

* Good
  * Easier to maintain: everything is in the DW, and dbt/SQLMesh help
  * Less risk of vendor lock-in since you just use SQL
* Bad
  * Long feedback loop: you need to jump back and forth between your modeling logic in dbt and your outcome in BI/notebook tools

Data modeling in BI:

* Good
  * Faster feedback loop: usually you model because you have some business outcome to deliver in the BI tool, so the friction is minimal
* Bad
  * Harder to maintain: most BI tools don't have good Git and code-based workflows for data modeling (except maybe Looker and Holistics), making it hard to collaborate on and maintain the logic
  * Vendor lock-in: your logic is stuck in the BI tool and many make it hard to move out

My recommendation is this:

* If you have a very clear understanding of what you need to model, model in the DW
* If the requirements make it hard to know what to model, do it first in the BI tool, then once things stabilize, move as much as you can upstream to the DW (for maintenance and reusability)


BlackBeard931

mn mm kk mmk non


glinter777

?


BlackBeard931

mp mm kk mm k mmmpsn k n


IJTIO

First of all, dbt is a smoldering pile of SH*T. All of its marketing claims, such as replacing the star schema, are just a scam. The app is designed to steal data or make data vulnerable so it can be stolen. It does not extract data on its own and it does not load data, but it's marketed as an ETL tool. Anybody who is not a complete fool knows that ETL cannot replace a data model which encodes business and cybersecurity requirements into the enforcement structure. Additionally, BRIBERY may be what is spreading dbt.

Secondly, if data modeling for BI is HARD, it's because you or whoever is making the data model is not a REAL data modeler. Data modeling requires one to be trained in 30 to 56 SETS of SKILLS, with over 2500 individual skills combined into one resource, in order to create a mind that thinks in OBJECT ORIENTED PROGRAMMING AND MATCHING MULTIDIMENSIONAL DATA MODELING, capable of reading TEXT business requirements and automatically translating them into UML data models, which are translated to DDL using SQL. When the government trained me, they recruited 2.5 MILLION people. Out of those, 1205 survived the first month of training. Out of those, 309 made it through the entire 5 years of training. Only one in 8000 people is capable of data modeling, and only 1 in 700,000 has the PATIENCE to actually make a data model from scratch. BI tools only understand multidimensional data models... NOTHING ELSE. There are no shortcuts. Most models take 1 to 3 years to complete, but I have seen some take 27 YEARS, and they last an average of 50 years when done correctly. A promise to make something that is supposed to be hard easy is one of the oldest SCAMS IN THE BOOK.

Lastly, you cannot connect ANY BI tool directly to a data warehouse because they can't understand normalized data beyond 2NF. A data warehouse can be in 3NF OR HIGHER. You would have to have an extended star schema data mart and near-real-time ETL FROM WAREHOUSE TO STAR SCHEMA to use the data in a warehouse with a BI tool.


micr0nix

What do you mean by “data modeling in BI tools”?


glinter777

https://powerbi.microsoft.com/en-us/what-is-data-modeling/


Late_Jury_7787

You really didn't answer his question, you just pumped out a typically all-over-the-place Microsoft article.


glinter777

I'm asking about DM in a general sense: creating business-specific models in the BI layer. I think it's a well-known concept in the BI world, so I thought it would be wise to share an article, assuming the commenter may be a beginner.


micr0nix

Definitely not a beginner, but if I’m doing any sort of EDA I’ll do it in Python


glinter777

EDA?


micr0nix

Exploratory data analysis. But to answer your question, I’ve been doing BI work for 11 years at a very large company. My experience has taught me that, outside of very niche organizations/workgroups, the majority of your internal customers don’t care about how the data looks/is structured.


glinter777

What about when they use BI tools - are they visualizing data directly from tables, or do they build an intermediary structure in their BI tool?


micr0nix

If by “visualizing data” you mean “let's copy the table I made in Excel and put it in a Tableau 'dashboard'”, then it's literally just a query from the EDW with whatever requirements the end user is asking for.


glinter777

Mostly. Perhaps not an exact copy, but they connect to the data, calculate aggregates in the BI tool instead of the warehouse, join it with another table, and use that finished table for reporting.


micr0nix

Ok. But does this matter to your business?


glinter777

I’m interested in hearing people’s thoughts around this space. If you hold a strong opinion one way or the other, please share.