Data warehouses, Big data and Analytics. Do you need them?

Having worked in data warehousing and business analytics for a few years, I want to record a few lessons learnt. Data warehouse and analytics projects consume a lot of time, money and resources, and they are inflexible and difficult to change once built. So it makes good sense to spend some time analyzing the business case for such costly (and potentially career-limiting) projects. Below are a few scenarios one comes across when debating whether to build a data warehouse (DWH).

Scenario 1: Are you building a data warehouse to fix problems in operational databases? Operational databases are often built for limited functionality and grow into inflexible behemoths over time. Reporting from these behemoths gets tedious and frustrating. The easy way out is to build an atomic layer over the operational databases and connect analytical tools to it. But this will not fix the fundamental problem of bad database design, and at some point it will all crumble. There is no substitute for sound operational databases.

Scenario 2: Clients often want to be early adopters and force a business case in order to ride the hype cycle. One of our clients wanted to build a trade data warehouse from the general ledger for reasons best known to them. If existing systems and applications already provide good reporting, traceability and decision support, fancy toys like data warehouses and analytics do not add much value.

Scenario 3: Uniform metadata is important. Creating and enforcing a single definition of something as simple as an account number is very difficult if you have systems that have been running for decades. The solution to this problem is not to create a parallel data universe with uniform data definitions and standards. The solution is to create a data strategy and instil data standards discipline across the organisation. It is a slow, long-drawn-out process of fixing one system after another, and it will take years to consolidate. But it will ensure dependable data quality and a sound governance structure. With the alternative, you will end up straddling two different worlds: one where you have control and the data is organised and clean, and another where all the clumsiness and irregularity of the legacy data persist and you remain vulnerable to cyber attacks. In Europe, GDPR (General Data Protection Regulation) becomes enforceable in May 2018. Any slippage in maintaining confidentiality will invite regulatory scrutiny and probably wrath. It makes sense to start preparing now and to keep the house in order.

Scenario 4: Know what data points and decision support you expect from the analytics platform. Too many clients rush to create star schemas and snowflake schemas before actually identifying the metrics that support the business. For example, if you are a clearing house and you want to create a security price history to feed the risk engine, a separate price mart per time grain and product type will make sense. But if you are building a star schema around exposure data because that is what you have, getting prices out of it might be difficult.
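
To make the price mart idea concrete, here is a minimal sketch in Python rather than a real warehouse design. It assumes a daily time grain and one mart per product type; the sample records and the names RAW_PRICES, build_price_marts and price_history are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw observations: (security_id, product_type, timestamp, price)
RAW_PRICES = [
    ("BOND_A", "bond", datetime(2017, 3, 1, 10, 0), 101.2),
    ("BOND_A", "bond", datetime(2017, 3, 1, 16, 30), 101.5),
    ("BOND_A", "bond", datetime(2017, 3, 2, 16, 30), 101.1),
    ("FUT_X", "future", datetime(2017, 3, 1, 16, 30), 2450.0),
]


def build_price_marts(rows):
    """Return {product_type: {(security_id, date): last price of that day}}."""
    marts = defaultdict(dict)
    latest_seen = {}  # (security_id, date) -> timestamp of the price we kept
    for security_id, product_type, ts, price in rows:
        key = (security_id, ts.date())
        # Daily grain: keep only the latest observation per security per day.
        if key not in latest_seen or ts >= latest_seen[key]:
            latest_seen[key] = ts
            marts[product_type][key] = price
    return dict(marts)


def price_history(marts, product_type, security_id):
    """Date-ordered price series for one security, as a risk engine might request."""
    mart = marts.get(product_type, {})
    return sorted(
        (day, price) for (sec, day), price in mart.items() if sec == security_id
    )


marts = build_price_marts(RAW_PRICES)
history = price_history(marts, "bond", "BOND_A")
```

Because the mart is keyed by the metric the risk engine actually asks for (a price per security per day), the history lookup is a simple filter and sort. A schema organised around exposure facts would first have to recover prices from data that was never modelled to hold them.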

Let me know what you think, or if I am missing something. I would be very happy to gain some wisdom!
