Maximizing Big Data Benefits Needs Strong Data Governance

Dr. Hadj Batatia, Director of Research, Mathematical and Computer Sciences, Heriot-Watt University Dubai

1. How have big data analytics evolved in the last few years, and what are the most significant changes you’ve observed?

Big data analytics scrutinises vast datasets to reveal concealed patterns, correlations, and other insights. The concept has existed for years: before the term big data was coined, companies used simple analytics tools such as spreadsheets to identify trends and correlations. Most organisations know that capturing data from their various business operations is key to applying analytics and deriving significant value, and they have expanded their data sources to cover social media, emails, texts, and sensor data.

On the technology side, big data platforms evolved from spreadsheets, databases, and conventional data mining towards data lakes and data warehouses capable of managing large volumes of structured and unstructured data. Analytics requires significant effort for data preparation and complex analytical processing; cloud computing, and today hybrid and multi-cloud deployments, have contributed scalable and cost-effective solutions by reducing physical and financial constraints. In-memory and real-time analytics now make it possible to analyse data from main memory instead of secondary storage, extracting immediate insight and enabling quick reaction. Businesses are increasingly adopting machine learning to enable more sophisticated analysis and automated decision-making through predictive modelling, with automated analytics embedded in business workflows removing the need for specialist technical skills. Edge analytics is another technology that developed with the proliferation of IoT devices, bringing analytics closer to where the data originates.
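As a minimal illustration of the predictive modelling mentioned above, the sketch below fits a simple linear trend to a series of monthly sales figures and forecasts the next period. The sales numbers are invented for illustration; real deployments would use richer models and many more features.

```python
# Minimal sketch of predictive analytics: fit a linear trend to
# monthly sales and forecast the next period. Figures are illustrative.

def fit_linear_trend(values):
    """Ordinary least-squares fit of y = a + b*t for t = 0, 1, 2, ..."""
    n = len(values)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(values) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, values))
    var = sum((t - t_mean) ** 2 for t in ts)
    b = cov / var
    a = y_mean - b * t_mean
    return a, b

def forecast(values, periods_ahead=1):
    """Extrapolate the fitted trend forward by periods_ahead."""
    a, b = fit_linear_trend(values)
    return a + b * (len(values) - 1 + periods_ahead)

monthly_sales = [120, 135, 150, 160, 178]
print(round(forecast(monthly_sales), 1))  # next-month estimate: 190.9
```

The same least-squares idea underlies far more elaborate forecasting models; here it only demonstrates the shape of the task.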

2. What industries are currently benefitting the most from big data analytics?

The big data analytics market reached 307 billion US dollars in 2023. It is expected to reach 348 billion dollars in 2024 and 924 billion by 2032. Many industries are benefiting from this revolution. Retail is the sector that benefits most, with use cases in inventory management, customer retention and segmentation, pricing, and direct marketing. Media and entertainment companies make heavy use of big data for content recommendation, advertising, content production, and distribution strategies. Transportation and logistics use big data analytics for planning, fleet management, and tracking. In manufacturing, big data analytics is used for predictive maintenance, supply chain optimisation, and quality control, enabling the discovery of new cost-saving and revenue opportunities. Finance, public services, and healthcare are also transforming their practices and improving services using big data analytics.

3. How do big data solutions handle the increasing volume, variety, and velocity of data?

State-of-the-art analytics solutions are designed to handle the inherent volume, variety, and velocity of big data. Distributed storage is the key approach to handling increasing volumes: NoSQL technologies provide high availability with eventual consistency, while NewSQL systems add scalable transactional guarantees. Distributed computing frameworks complement storage by enabling parallel processing of large datasets across computer clusters, ensuring scalability and efficiency. Modern data streaming and real-time processing platforms, such as Apache Kafka and Apache Storm, deal with velocity. Data lakes and hybrid persistence architectures address the variety of data formats: data lakes handle raw unstructured data, NoSQL databases serve semi-structured data, and relational databases store structured data.
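The parallel-processing model behind these frameworks can be sketched in miniature: each partition of a large dataset is summarised independently (map), then the partial results are merged (reduce). In the toy sketch below the "cluster" is simulated in a single process and the log records are invented; a real framework would ship each partition to a different node.

```python
# Map-reduce in miniature: count event types per data partition, then
# merge the partial counts. The partitions stand in for data spread
# across cluster nodes; records here are illustrative only.
from collections import Counter
from functools import reduce

def map_partition(records):
    """Count event types within one partition of the data."""
    return Counter(r["event"] for r in records)

def reduce_counts(a, b):
    """Merge partial counts from two partitions."""
    return a + b

partitions = [
    [{"event": "click"}, {"event": "view"}, {"event": "click"}],
    [{"event": "view"}, {"event": "purchase"}],
]

# On a cluster, the map step below would run in parallel on each node.
partials = [map_partition(p) for p in partitions]
totals = reduce(reduce_counts, partials)
print(totals["click"], totals["view"], totals["purchase"])  # 2 2 1
```

Because each map step touches only its own partition, the computation scales simply by adding nodes, which is the property that lets such frameworks absorb growing volumes.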

4. How are artificial intelligence and machine learning integrated into big data solutions?

Artificial intelligence, and more precisely machine learning, is integral to state-of-the-art big data analytics. All prominent market offerings, such as Azure Databricks, SAP Analytics Cloud, SAP HANA Cloud, IBM Watson Studio, Microsoft Azure Synapse, H2O.ai, RapidMiner, and Background Data Solutions, include advanced AI and machine learning algorithms. AI and ML capabilities improve predictive analytics, anomaly detection, and personalised customer experiences, which in turn fuels their widespread adoption across sectors. The synergy of AI and ML in big data analytics brings many advantages, including enhanced data analytics, improved data quality, higher data security, and better visualisation and interpretability.

Natural Language Processing (NLP) techniques are used to analyse unstructured data such as social media posts, emails, review comments, and other documents. Image, signal, and video analysis are increasingly adopted to analyse complex data for anomaly detection, security, and natural communication, among other use cases.
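The simplest form of such NLP analysis can be sketched with a toy lexicon-based sentiment scorer applied to review comments. Production systems use trained models rather than word lists; the reviews and word lists below are purely illustrative.

```python
# Toy lexicon-based sentiment scoring of unstructured review text.
# The word lists are illustrative stand-ins for a trained model.
import re

POSITIVE = {"great", "excellent", "love", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "poor", "terrible", "refund"}

def sentiment(text):
    """Score = count of positive words minus count of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

reviews = [
    "Excellent service, fast delivery, love it",
    "Terrible experience, item arrived broken",
]
print([sentiment(r) for r in reviews])  # [3, -2]
```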

In addition, as mentioned earlier, the proliferation of IoT devices has led to the development of edge computing, which provides real-time analytics by processing data closer to its source. Moreover, explainable AI is essential in critical applications: it provides justifications for how AI models derive their decisions, giving humans the transparency needed to interpret results. More recently, the emergence of large language models (LLMs) has created a transformative approach to big data analytics. This emerging technology allows automated data cleaning and preparation, analysis, and reporting. Commercial technologies such as OpenAI GPT, Hugging Face, EleutherAI, NLP Cloud, and AI21 Labs leverage LLMs to automate different aspects of big data analytics. This established trend is revolutionising data analytics by reducing reliance on conventional code-based approaches that demand significant technical skills. LLM-based tools lower technical barriers, provide natural language interfaces, allow faster insights, promote automation of analytics processes, and enhance collaboration among cross-functional teams.
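The natural-language-interface idea can be caricatured in a few lines: a user question is matched to a predefined analytic routine. In real LLM-based tools the model translates the question into a query or code; the keyword matching and sales data below are only a stand-in for that step.

```python
# Toy natural-language interface to analytics. Keyword matching stands
# in for the LLM's question-to-query translation; data is illustrative.
sales = [120, 135, 150, 160, 178]

ROUTINES = {
    "total":   lambda xs: sum(xs),
    "average": lambda xs: sum(xs) / len(xs),
    "highest": lambda xs: max(xs),
}

def answer(question, data):
    """Dispatch a plain-English question to a matching analytic routine."""
    for keyword, fn in ROUTINES.items():
        if keyword in question.lower():
            return fn(data)
    return None

print(answer("What is the average monthly sales figure?", sales))
```

The point of the sketch is the interface shape, not the matching logic: the user states an intent in plain language and receives an insight without writing code.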

5. What benefits do AI-enhanced analytics bring to clients?

AI-enhanced analytics offers numerous benefits to companies in all industries. It allows them to make informed, data-driven decisions for optimising processes, enhancing innovation, and allocating resources. More precisely, these technologies are useful for direct marketing and customer management: personalised recommendations and analysis of customer preferences, comments, behaviour, and sentiment create personalised customer experiences and improve satisfaction and retention. Automation of repetitive tasks is another benefit, leading to higher efficiency and lower costs. The predictive power of AI-enhanced analytics allows clients to forecast market changes, analyse risks, and anticipate future trends, enabling proactive planning to address future challenges. Fraud and anomaly detection are also made more reliable with AI, improving security and reducing business losses. Using AI algorithms, companies can smooth supply chains, adapt production, and eliminate waste, leading to cost reduction and higher performance. More advanced AI algorithms and models are used to design innovative products and services that respond to emerging markets and customer needs.
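One common statistical building block of the fraud and anomaly detection mentioned above is the modified z-score: flag values that deviate strongly from the median, measured in units of the median absolute deviation (MAD), which is robust to the very outliers being hunted. The transaction amounts and the 3.5 threshold below are illustrative; production systems combine many features with learned models.

```python
# Robust outlier flagging via the modified z-score (median/MAD).
# Transaction amounts and the 3.5 cutoff are illustrative choices.
from statistics import median

def flag_anomalies(amounts, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    return [a for a in amounts
            if mad and 0.6745 * abs(a - med) / mad > threshold]

transactions = [21.5, 19.9, 22.3, 20.7, 18.4, 23.1, 20.0, 950.0]
print(flag_anomalies(transactions))  # → [950.0]
```

Using the median rather than the mean matters here: a single huge transaction inflates the ordinary mean and standard deviation enough to hide itself, while the MAD-based score still exposes it.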

6. How do analytics tools help clients derive actionable insights from their data?

No single technology entirely encapsulates big data analytics. Several technologies need to be integrated and coordinated to help users derive value from their data, and their use follows the data analytics pipeline. Data integration is the first stage in any data project: big data tools make it possible to integrate data from various internal and external sources, including business operation databases, data warehouses and lakes, social media, documents, emails, and IoT devices. The next set of tools is dedicated to data preparation through cleaning, transformation, and normalisation. Integrated and prepared data feed exploratory analysis tools that let users explore the data through statistical analysis and appropriate visualisation, uncovering insights in the form of patterns, trends, relationships, and deviations.

Descriptive analytics tools explain past performance and results through the computation of key performance indicators and metrics. Predictive analytics tools, by contrast, use machine learning algorithms to forecast future trends and outcomes based on past data, allowing analysts to detect potential risks and opportunities and enabling proactive decision-making. The more sophisticated prescriptive analytics tools rely on advanced artificial intelligence algorithms to establish a course of action or strategic plan to achieve business objectives; this requires simulating competing scenarios, evaluating trade-offs, and assessing risks. Real-time analytics tools monitor operational metrics, detect anomalies in systems or processes, and react immediately through corrective actions. Most AI-driven data analytics tools also support sophisticated visualisation and report generation, and include collaboration and sharing features that allow different roles to interact and cooperate on analytics tasks.
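Two early stages of that pipeline, data preparation and a descriptive KPI computation, can be sketched end to end. The field names and records below are invented for illustration; real preparation stages handle far more data-quality rules.

```python
# Sketch of two pipeline stages: cleaning/normalisation, then a
# descriptive KPI (total revenue per region). Records are illustrative.

def clean(records):
    """Drop incomplete rows; normalise region text and revenue type."""
    cleaned = []
    for r in records:
        if r.get("region") and r.get("revenue") not in (None, ""):
            cleaned.append({"region": r["region"].strip().lower(),
                            "revenue": float(r["revenue"])})
    return cleaned

def revenue_by_region(records):
    """Descriptive KPI: total revenue aggregated per region."""
    totals = {}
    for r in records:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["revenue"]
    return totals

raw = [
    {"region": " North ", "revenue": "1200.50"},
    {"region": "south",   "revenue": "800"},
    {"region": None,      "revenue": "42"},     # incomplete: dropped
    {"region": "north",   "revenue": "99.50"},  # merged with " North "
]
print(revenue_by_region(clean(raw)))  # {'north': 1300.0, 'south': 800.0}
```

Note how normalisation (trimming and lower-casing the region) is what lets the later aggregation merge records that refer to the same entity.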

7. What are best practices for establishing data governance frameworks with data analytics platforms?

Maximising the benefit of big data analytics requires businesses to establish a data governance strategy. Data governance deals with organising, securing, and managing data, and with the methods and technologies that ensure its correctness, consistency, and availability. Modern data-driven companies place data governance and management at their centre.

Data governance involves specific roles and responsibilities, where IT's share of the responsibility is often less than 20%. The chief data officer leads data management, governance, and analytics. The data steering committee includes representatives from different departments and oversees the implementation of the data strategy. Data owners control the creation, access, and management of data in specific business operations. Together, these roles set up the governance policies that organise, secure, and manage data, covering data quality, classification, ownership, tracking, privacy, security, and the life cycle from creation to disposal.

Data governance frameworks are conceptual approaches to creating this governance and organisation; two significant examples are the DAMA framework and the Stanford data governance maturity model. Companies should choose a data governance framework or adapt an existing one, then define roles and responsibilities and establish the various data policies. This includes selecting a data analytics platform and setting up analytics processes. For data governance to be effective, its maturity must be assessed: continuous monitoring and improvement mechanisms must be implemented to evaluate effectiveness, select areas for improvement, and iterate on governance policies.
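A governance policy of the kind described, classification plus role-based access, can be encoded very directly. The roles, classification labels, and rules below are illustrative assumptions, not a prescribed standard from any particular framework.

```python
# Sketch of an encoded governance policy: each dataset carries a
# classification, and reads are checked against role-based rules.
# Roles, labels, and the rule table are illustrative assumptions.

POLICY = {  # classification level -> roles permitted to read it
    "public":       {"analyst", "data_steward", "data_owner"},
    "internal":     {"analyst", "data_steward", "data_owner"},
    "confidential": {"data_steward", "data_owner"},
    "restricted":   {"data_owner"},
}

def can_access(role, classification):
    """Return True if the role may read data at this classification."""
    return role in POLICY.get(classification, set())

print(can_access("analyst", "internal"))       # True: analysts read internal data
print(can_access("analyst", "confidential"))   # False: stewards/owners only
```

Keeping the policy as data rather than scattered conditionals is what makes it auditable and easy to iterate on, which matches the continuous-improvement loop described above.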
