Fred Lherault, Field CTO Emerging at Pure Storage, explores how organisations can turn the overwhelming surge of data into lasting value through smarter storage, resilience, and sustainability.
Everyone is aware that an enormous amount of data is being created every minute of every day. AI and GenAI-generated content and images have only added to the mountain. In contrast to today's digital data, one of the most ancient written records is a customer complaint about sub-standard copper, engraved into clay. At around 3,500 years old, it's fascinating to see a) what the day-to-day lives of people were like millennia ago, and b) how well preserved the clay is. Will future generations look back at the data we're creating and feel the same?
Back to today's data deluge: more information than ever needs to be stored. Costly, difficult-to-manage, underperforming and energy-inefficient storage options aren't going to be able to support modern organisations. The question is how to deal with the volumes created today while keeping an eye on future needs.
Differing value of human and machine generated data
An increasing percentage of the data created every day is machine generated rather than human generated. Consider the amount of data that a single high-resolution security camera can generate. Add to this all of the data created by IT systems for security, resilience or regulatory reasons, and dealing with this mass of data can feel like an insurmountable problem for companies. Add on top AI-generated data, which many organisations are only now starting to deal with, and it's clear that a scalable and flexible strategy is needed for this data deluge. While it's doubtful future generations will look back at this information with fascination, it is generally accepted that organisations want to keep as much data as possible: it only takes one issue to make them wish they hadn't deleted a certain data set. Interestingly, there is a growing realisation that human-created data is essential to train new AI models and avoid so-called "model collapse", so the value of human-created data is arguably higher than ever.
Growth and gravity
It is estimated that 90% of the data available worldwide was generated in the last two years. In practice this means that the data you generate in the next few years will quickly outpace the total sum of data you already hold. This exponential growth makes it critical to put the right policies in place in terms of location, protection and retention, as the related issues will only get bigger over time. Data gravity means that not only does a data set become harder to move the larger it gets, it also tends to attract smaller data sets to the same location, especially when they are linked by the same applications. This compounds the issue: data sets grouped together grow larger still, and the applications that rely on them become ever harder to move.
Dealing with the data deluge
Here are some strategies that organisations should look at in order to better handle these challenges:
Hybrid placement and mobility:
Most large organisations take a hybrid approach to the cloud, with some data sets on-premises and others in the public cloud. When it comes to data placement, it is key to understand the implications in terms of cost, security and resilience, and to deal with change as soon as an issue is identified, because as data grows it only becomes harder to move. One should always ask "are my decisions still valid if this data set grows 10x?". While 10x might seem a lot, an annual growth rate of 40% equates to 10x growth in just seven years. It's also important that, should data need to be moved, it can be done as easily as possible without requiring a rewrite of what sits further up the software and infrastructure stack.
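As a rough illustration of how quickly compound growth reaches that point (the 40% figure is an assumed annual rate used purely for this sketch, not a benchmark), a few lines of Python make the arithmetic concrete:

    # Illustrative only: years of 40% annual growth needed for a data set to reach 10x its size
    annual_growth = 0.40      # assumed annual growth rate
    size = 1.0                # current size, normalised to 1
    years = 0
    while size < 10.0:
        size *= 1 + annual_growth
        years += 1
    print(f"~{years} years to reach {size:.1f}x the original size")   # prints ~7 years, 10.5x

The same loop run with a 25% growth rate takes roughly eleven years, which is why placement decisions are worth revisiting regularly rather than once.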
On-demand consumption for new requirements:
On-demand consumption of on-premises storage systems helps deal not just with unplanned requirements; it also helps organisations understand the current needs and growth profile of new applications and their associated data sets, while allowing an easy transition to technology platforms owned by the organisation once that growth profile is understood.
Security and the growing need for data resilience:
The rise of ransomware attacks, as well as increasingly broad regulatory requirements, has resulted in a stronger focus on data resilience. This can itself drive data growth, as more resilience generally means more copies of data and more systems to track and manage them. Look for solutions that provide cyber data resilience with a low overhead in terms of storage requirements, but also with fast recovery, as resiliency requirements call for ever lower Recovery Time Objectives (RTOs), generally faster than traditional backup solutions can provide.
Sustainability:
Sustainability costs also need to be taken into account. That includes looking at the power efficiency of data storage systems as well as their "carbon cost". Power efficiency should be evaluated as a function of capacity per watt and performance per watt, while carbon cost should cover the entire lifecycle, including manufacturing, transport and decommissioning.
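As a simple sketch of how those two efficiency metrics can be compared side by side (all figures below are hypothetical placeholders, not measurements of any particular product):

    # Hypothetical comparison of two storage systems on capacity per watt and performance per watt
    systems = {
        "System A": {"capacity_tb": 500, "throughput_gbs": 20, "power_w": 1500},
        "System B": {"capacity_tb": 800, "throughput_gbs": 25, "power_w": 1200},
    }
    for name, s in systems.items():
        cap_per_watt = s["capacity_tb"] / s["power_w"]        # TB per watt
        perf_per_watt = s["throughput_gbs"] / s["power_w"]    # GB/s per watt
        print(f"{name}: {cap_per_watt:.2f} TB/W, {perf_per_watt * 1000:.1f} MB/s per W")

Comparing both ratios matters because a system can look efficient on capacity per watt while being a poor fit for performance-hungry workloads, and vice versa.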
Long term archiving:
Keep your eyes wide open when evaluating archiving and long-term data retention solutions (whether on-premises or in the cloud), as their cost profile will be very different if data is accessed, even rarely, versus completely cold. Take into account the total cost based on a realistic ratio of data being accessed. That includes retrieval costs for cloud object storage, but also offsite storage fees and the physical retrieval, transport and time to restore for media such as tape.
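As a rough sketch of how a realistic access ratio changes that total (the prices and percentages below are purely assumptions for illustration, not quotes from any provider):

    # Hypothetical archive cost model: storage fees plus retrieval fees driven by how much data is accessed
    capacity_tb = 500                  # archived capacity in TB
    storage_cost_tb_month = 4.0        # assumed $/TB/month for cold storage
    retrieval_cost_tb = 20.0           # assumed $/TB retrieved (retrieval + egress fees)
    accessed_ratio_per_year = 0.05     # assumed 5% of the archive retrieved each year

    annual_storage = capacity_tb * storage_cost_tb_month * 12
    annual_retrieval = capacity_tb * accessed_ratio_per_year * retrieval_cost_tb
    print(f"Storage ${annual_storage:,.0f}/yr + retrieval ${annual_retrieval:,.0f}/yr "
          f"= ${annual_storage + annual_retrieval:,.0f}/yr")

Re-running the same sums with an access ratio of zero shows why quotes based on data that is never touched can understate the real bill.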
The special case of video data:
Video makes up an estimated 50% of the world's data, so it makes a lot of sense to employ specific strategies to deal with it. Advances in data compression, sampling and optimisation in general have helped keep storage requirements as low as possible; however, data access requirements have also changed. Traditionally, much video data sat in archives, but advances in AI vision mean organisations now want to analyse and understand it. Much of this data therefore requires storage designed not just to hold it but to support an increasing amount of simultaneous access. High-capacity flash storage can meet cost-effective capacity requirements while delivering the performance levels this access demands.
Moving data mountains: ideas, intent and innovation:
There are always going to be different requirements and options when it comes to data storage. Business needs change, new projects come up, capacity is reached, regulations are introduced. There is also a constant flow of technological innovation on both the data generation and data storage fronts. Some of that innovation is now focusing on ceramic storage, as ceramic has certainly proven its ability to protect data, even the most mundane customer complaint, over long periods of time. Look for a data platform that enables flexibility (in terms of placement options, consistency across clouds and data mobility), can be consumed on-demand or owned, and is able to tackle new data access, security and resiliency requirements in the most efficient manner possible.
Data-driven partnership
VAD Technologies is a leading value-added distributor in the Middle East, specialising in next-generation technology solutions across cybersecurity, cloud computing, artificial intelligence, data management, and IT infrastructure.
VAD Technologies is also a proud distributor of Pure Storage, one of the world's leading data storage and management innovators. Through this partnership, VAD Technologies enables enterprises across the region to harness Pure Storage's cutting-edge, all-flash solutions, enhancing performance, reliability and sustainability in modern data environments. With a shared vision for digital transformation, both companies continue to empower businesses to accelerate innovation and achieve operational excellence.



