The return of data modeling -- this time it's strategic [Q&A]
Over the past decade data modeling -- setting up data structures aligned to business requirements -- has tended to take something of a back seat as businesses have rushed to bring products to market.
But we're producing more data than ever and need ways to process it effectively. That's why Satish Jayanthi, CTO and co-founder at Coalesce, believes it's time for data modeling to make a comeback in enterprise strategy. We spoke to him to find out more.
BN: Why has data modeling taken a back seat in recent years, and what is the cause of its resurgence?
SJ: Data modeling has always been one of the most critical steps in analytics projects, setting the groundwork to create databases, populate warehouses, manage data, and grant access to information in meaningful and governed ways. In recent years, the practice fell out of favor because it was 'too hard' and 'too slow'.
Not surprisingly, the demand for speed often came at the expense of data quality. The general sentiment was that we needed to get data to our users, so we pushed the product out, but all we did was push the problem downstream to users. Strategic attention has returned to data modeling as businesses aim to extract measurable business value from their data. Data modeling is also helping companies cut compute costs, because modeling is about publishing reusable, consumable artifacts that data engineers and analysts can share instead of rebuilding the same pipelines.
In recent months, we've seen more emphasis on data modeling as it becomes even more critical with the increase in AI and machine learning use. Having trustworthy, governed data will result in accurate AI learnings and models, while data not in a suitable state or governed will skew AI and ML effectiveness.
BN: What are some of the key challenges associated with modeling complex, unstructured data, and how do you overcome them?
SJ: As organizations consider rearchitecting their data warehouse, flexibility must remain a priority. Decommissioning and rearchitecting a data warehouse can be compared to the challenge of replumbing an entire neighborhood without causing any disruption to water service and supply -- it will require a flexible and adaptable model.
Unstructured data comes in many forms, such as text, images, audio, and video. The challenges and solutions vary depending on the type of unstructured data. For instance, it's not uncommon for unstructured data to contain inconsistent, duplicate information. As a result, deciphering the context and extracting meaningful insights can become complicated and require additional preprocessing to clean up and organize the data. Another challenge is that processing unstructured data can be resource-intensive, and scalability can become a big issue. Using distributed processing frameworks and leveraging cloud platforms is one way to address this particular challenge.
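As an illustration of the preprocessing step described above, here is a minimal Python sketch that removes duplicate and empty text records after normalizing case, whitespace, and Unicode form. The function names and the sample records are hypothetical and are not taken from the interview; real pipelines would typically need fuzzier matching than this.

```python
import re
import unicodedata


def normalize_text(raw: str) -> str:
    """Return a canonical form used only for comparing records."""
    text = unicodedata.normalize("NFKC", raw)  # unify equivalent Unicode forms
    text = text.casefold()                     # make the comparison case-insensitive
    text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace
    return text.strip()


def deduplicate(records: list[str]) -> list[str]:
    """Keep the first occurrence of each record, comparing normalized forms."""
    seen: set[str] = set()
    unique: list[str] = []
    for record in records:
        key = normalize_text(record)
        if key and key not in seen:  # skip empty records and repeats
            seen.add(key)
            unique.append(record.strip())
    return unique


# Hypothetical sample data for illustration only.
feedback = [
    "Great product,  fast delivery",
    "great product, fast delivery ",
    "",
    "Arrived late",
]
cleaned = deduplicate(feedback)
```

The second record differs from the first only in case and spacing, so it is treated as a duplicate; the empty record is dropped.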
Overall, exploratory data analysis, experimentation, and iterative modeling refinement are crucial for success.
With a sound data modeling strategy, data can live up to its promises and deliver on the possibilities that come with proper practices, such as simplified, logical databases, reduced redundancies, minimal storage requirements, more efficient retrieval, and advanced analytics.
BN: What role do you see machine learning and artificial intelligence playing in the future of data modeling?
SJ: Data modeling is a key piece of the AI puzzle. As we look at generative AI uses today, the output is only as good as the data it can access.
The impact of data modeling on AI and machine learning is the more pronounced of the two, but AI and machine learning will in turn change how data modeling and other data processes are carried out.
The most significant role that AI/ML will play in the future of data modeling is speed and efficiency. The pattern recognition capabilities of AI/ML can automatically identify entities and the hidden relationships that exist between them based on historical data and, therefore, significantly speed up the data modeling process.
AI/ML can potentially make data modeling accessible to people, particularly on the business side, who are unfamiliar with data modeling practices but have deeper knowledge of the data itself. As a result, business users can build resilient and scalable data products.
BN: How do you ensure that data models are aligned with business objectives and meet the needs of end-users?
SJ: Without a sustainable data architecture to deliver high-quality data, businesses will not reach the promise of predictive analytics, machine learning and AI or even be able to make truly data-driven decisions. And in today’s world of generative AI tools like ChatGPT, a business’s ability to successfully integrate and take advantage of these AI technologies to boost productivity and output and decrease spend and labor will distinguish leaders from laggards.
One of the most common reasons for the failure of data projects is that the output produced doesn't meet the stated business needs. Since the data model is the foundation of a data analytics project, it's crucial to ensure that the built model aligns with the business objectives. There are several ways to achieve this, including the following:
- Collaborate with key stakeholders and domain experts to understand the business objectives thoroughly.
- Establish quantifiable metrics such as accuracy rates, customer satisfaction, revenue impact, or efficiency gains. By defining these metrics upfront, you can evaluate and measure the success of the data models against the desired outcomes.
- Take an iterative approach. Deploy the model's first version, gather end-user feedback, and incorporate their input into subsequent iterations. Ensure that the models are continuously adjusted to meet the evolving business needs.
- Rigorously test and validate the data models before deployment. Conduct thorough validation against historical data, simulate real-world scenarios, and compare model performance against predefined success metrics. This validation process helps ensure that the models are accurate, reliable, and capable of addressing specific business objectives.
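The validation step in the list above can be sketched in Python as a comparison of observed results on historical data against thresholds agreed upfront. This is only an assumption about how such a check might look: the `MetricCheck` class, the `validate` function, the single accuracy metric, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricCheck:
    """One success metric compared against its agreed threshold."""
    name: str
    observed: float
    threshold: float

    @property
    def passed(self) -> bool:
        return self.observed >= self.threshold


def accuracy(predicted: list[int], actual: list[int]) -> float:
    """Fraction of predictions that match the historical outcomes."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def validate(predicted: list[int], actual: list[int],
             thresholds: dict[str, float]) -> list[MetricCheck]:
    """Evaluate each predefined metric and report whether it met its threshold."""
    observed = {"accuracy": accuracy(predicted, actual)}
    return [MetricCheck(name, observed[name], limit)
            for name, limit in thresholds.items()]


# Hypothetical historical outcomes and model output.
historical_actual = [1, 0, 0, 1]
model_predicted = [1, 0, 1, 1]
checks = validate(model_predicted, historical_actual, {"accuracy": 0.7})
ready_to_deploy = all(check.passed for check in checks)
```

Defining the thresholds as data, separate from the checking logic, keeps the agreed success criteria visible to stakeholders and easy to revise between iterations.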
BN: How do you think data modeling will continue to evolve in the coming years?
SJ: Data modeling will evolve by integrating artificial intelligence and machine learning techniques for improved predictive capabilities. Automation and AutoML will streamline the modeling process, for instance, by identifying entities and detecting hidden relationships. As a result, self-service data modeling may become a reality, where a businessperson who has a solid understanding of the data but lacks data modeling experience can actively participate in the process.
BN: What are some of the key factors to consider when optimizing data models for performance and scalability?
SJ: In general, there is no universally optimal data model. Instead, a data model is optimized for the specific workload it is designed to support. For example, a highly normalized data model is well suited to an operational database that prioritizes transactional integrity and write performance. A denormalized model such as a star schema, on the other hand, is more suitable for an analytics database where read performance takes precedence. The optimization of a data model depends on the intended use case and the specific requirements of the workload it will handle.
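To make the star schema idea concrete, here is a small sketch using Python's built-in `sqlite3` module: one fact table holds the measurements, and dimension tables hold the descriptive attributes that analytical queries group by. The table names, columns, and rows are hypothetical, and a production warehouse would use a different engine and far more data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension tables describe "who" and "what"; the fact table records measurements.
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    region       TEXT NOT NULL
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    category    TEXT NOT NULL
);
CREATE TABLE fact_sales (
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    amount       REAL NOT NULL
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(1, "EMEA"), (2, "APAC")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(10, "Hardware"), (20, "Software")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 10, 100.0), (1, 20, 50.0), (2, 10, 25.0)])


def sales_by_region(connection: sqlite3.Connection) -> list[tuple[str, float]]:
    """A typical read-heavy query: one join from the fact table to a dimension."""
    return connection.execute("""
        SELECT c.region, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        GROUP BY c.region
        ORDER BY c.region
    """).fetchall()


totals = sales_by_region(conn)
```

Each analytical question needs at most one join per dimension, which is the read-side trade-off the answer above describes; a normalized operational model would instead split these attributes across more tables to protect write integrity.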
Choosing appropriate data storage technologies, such as relational databases, NoSQL databases, or data lakes, depends on data volume, data structure, query patterns, and scalability requirements. Efficient indexing, partitioning, and caching strategies can optimize data access and retrieval.
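As a small illustration of the indexing point, the sketch below uses Python's built-in `sqlite3` module to compare the query plan for the same lookup before and after an index is created. The `orders` table, its columns, the index name, and the generated rows are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3


def query_plan(connection: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's description of how it would execute the statement."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # fourth column holds the plan detail


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        amount      REAL NOT NULL
    )
""")
# Hypothetical data: 1,000 orders spread across 100 customers.
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

lookup = "SELECT amount FROM orders WHERE customer_id = ?"

# Without an index, the filter requires reading every row in the table.
plan_before = query_plan(conn, lookup, (42,))

# With an index on the filtered column, SQLite can seek directly to matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, lookup, (42,))
```

Printing `plan_before` and `plan_after` shows the change from a full table scan to an index search. Indexes speed up reads at the cost of extra storage and slower writes, so which columns to index depends on the query patterns mentioned above.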
BN: How can organizations stay up to date with the latest trends and best practices in data modeling?
SJ: Staying current with the latest trends and best practices in data modeling is crucial for organizations to ensure they are leveraging the most effective techniques. Here are some strategies that organizations can adopt to stay current:
- Follow thought leaders and industry experts: Identify influential thought leaders, data modeling experts, and researchers in the field and follow them through their blogs, social media accounts, and publications. Their insights and thought-provoking content can help organizations stay informed about emerging trends and innovative approaches.
- Engage in continuous learning and training: Encourage employees to participate in ongoing learning and training programs related to data modeling. This could involve attending workshops, webinars, or online courses from reputable institutions or industry-leading organizations. Keeping skills and knowledge up to date will ensure organizations are well-equipped to adopt the latest practices.
- Collaborate with consultants and external experts: Partnering with external consultants or experts specializing in data modeling can provide organizations with valuable guidance and access to cutting-edge practices.