Designing Next-Gen Data Centres in the Era of Big Data

Glen Ogden, Regional Sales Director, Middle East at A10 Networks

Perhaps the biggest challenge introduced by big data is the need to re-evaluate the storage-compute model. This fundamentally changes how we view storage and raises a number of questions about what to do with existing SAN and NAS, how to archive, and how legacy applications are to access this data.

Big data has revolutionized the way we think about data centre design and orchestration. However, it’s important not to get carried away with the hype, and to understand that big data cannot be looked at in isolation when considering future data centre planning.

Internet of Things (IoT)

We can’t mention big data without also mentioning the Internet of Things (IoT). Various industry estimates put the number of Internet-connected devices at between 50 and 75 billion by 2020. This is going to radically change how humans interact with technology, the visibility we have on the state of these ‘things’, and the insights gained from analytics on those ‘things’.

In practice, this will result in the generation of much higher volumes of unstructured data (through instrumentation, external feeds, etc.). All this data will need to be stored in enterprise data centres and analyzed using big data solutions – something that needs to be factored into future IT planning.

IPv6

Given the number of devices introduced by IoT and mobile technology, we need to think about addressing. While there are solid techniques for conserving IPv4 addresses (such as DHCP, NAT, and Carrier Grade NAT), there is no question that IPv6 will be needed to accommodate these new IoT entities. From an enterprise data centre perspective that means, at the very least, having tools at the edge to translate between IPv4 and IPv6.
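To make the translation point concrete, here is a minimal sketch of one building block that edge translators commonly rely on: embedding an IPv4 address in an IPv6 address under the RFC 6052 well-known prefix 64:ff9b::/96. The function names are hypothetical and the sketch handles only /96 prefixes; it is an illustration of the address mapping, not a description of any particular product.

```python
import ipaddress

# RFC 6052 well-known prefix for IPv4-embedded IPv6 addresses.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def embed_ipv4(ipv4_text, prefix=WELL_KNOWN_PREFIX):
    """Return the IPv6 address that carries the IPv4 address in its low 32 bits."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4_text)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


def extract_ipv4(ipv6_text, prefix=WELL_KNOWN_PREFIX):
    """Recover the embedded IPv4 address from an address inside the prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v6 = ipaddress.IPv6Address(ipv6_text)
    if v6 not in prefix:
        raise ValueError("address is not inside the translation prefix")
    return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)


if __name__ == "__main__":
    mapped = embed_ipv4("192.0.2.33")
    print(mapped)                     # 64:ff9b::c000:221
    print(extract_ipv4(str(mapped)))  # 192.0.2.33
```

An IPv6-only device can then reach an IPv4-only service by sending traffic to the mapped address, leaving the edge translator to rewrite the packets.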

High Availability

Big data deals with scale and availability by design. Hadoop can effectively scale out to tens of thousands of nodes – transparent to the application. High availability is built directly into the clustering model, negating the need for expensive RAID arrays. This completely changes how we think about storage, and right now organisations are making their own rules on the type of hardware to deploy, driven by cost and processing needs.
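As a rough illustration of the trade-off, the sketch below estimates what replication-based availability costs in raw capacity and what it buys in durability. It assumes a replication factor of 3 (the HDFS default) and, as a deliberate simplification, that nodes fail independently. The function names and the example figures are hypothetical.

```python
import math


def raw_capacity_tb(usable_tb, replication_factor=3):
    """Raw disk needed when every block is stored replication_factor times."""
    return usable_tb * replication_factor


def nodes_needed(usable_tb, disk_tb_per_node, replication_factor=3):
    """Smallest number of nodes whose combined disk holds all replicas."""
    raw = raw_capacity_tb(usable_tb, replication_factor)
    return math.ceil(raw / disk_tb_per_node)


def all_replicas_lost_probability(node_failure_prob, replication_factor=3):
    """Chance that every replica of a block is lost at once.

    Assumes independent node failures, which real clusters only approximate.
    """
    return node_failure_prob ** replication_factor


if __name__ == "__main__":
    print(raw_capacity_tb(100))   # raw TB needed for 100 TB of usable data
    print(nodes_needed(100, 48))  # nodes needed with 48 TB of disk each
    print(all_replicas_lost_probability(0.01))
```

The arithmetic shows why commodity nodes can stand in for RAID arrays: the cluster pays for availability in extra raw capacity rather than in specialised hardware.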

Security Implications

Big data is relatively new; it has only been a decade since Google published the seminal MapReduce white paper, and as with any new technology the primary concern is functionality. This introduces a number of security challenges, not only in the secure handling and storage of the data, but in understanding the nature of the data itself, and how it can be manipulated to create insight (and potentially breach confidentiality policy).

At the most basic level, big data components may include only rudimentary access control and integration with systems such as Kerberos, and depending on the components you choose, may introduce additional vulnerabilities when mapped against a mature security framework. It’s also important to determine how long to keep this data and how to ensure that data integrity is maintained (over potentially many years). With big data there may simply be a lot more data, but the scope of it may also be much broader, and it is likely to be more granular as the drive to instrument everything continues.
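One common way to check that data has remained intact over a long retention period is to record a cryptographic digest when the data is written and compare it on later audits. The following is a minimal sketch using SHA-256 from the Python standard library; the record names and the in-memory dictionary standing in for a data store are assumptions made for illustration.

```python
import hashlib


def sha256_digest(data):
    """Hex SHA-256 digest of a bytes payload."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(records):
    """Map each record name to the digest of its payload at write time."""
    return {name: sha256_digest(payload) for name, payload in records.items()}


def find_corrupted(records, manifest):
    """Names in the manifest whose payload is missing or no longer matches."""
    return sorted(
        name
        for name, digest in manifest.items()
        if name not in records or sha256_digest(records[name]) != digest
    )


if __name__ == "__main__":
    store = {"sensor-2014-01": b"21.5,21.7", "sensor-2014-02": b"20.9,21.1"}
    manifest = build_manifest(store)
    store["sensor-2014-02"] = b"20.9,99.9"  # simulate silent corruption
    print(find_corrupted(store, manifest))  # ['sensor-2014-02']
```

The manifest needs to be kept separately from the data it describes, otherwise whatever corrupts or tampers with one can affect the other.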

Also, as big data lakes become more valuable, they are likely to become more attractive targets for hackers. Denial of Service (DoS) attacks, for example, may put an organization at risk and should be factored into future business continuity planning.

Migration of Legacy Data

Big data should not be used as a panacea for all data management functions. It is most beneficial to organizations that have a lot of legacy data. Using big data in those cases could require moving historical data, integrating with that data, or unarchiving that data from long-term storage. Again, this has implications for traffic management, security, data handling, and storage.

Distribution and Archival

One of the key advantages of big data is the ability to localize processing and massively scale low-cost storage nodes within clusters, with availability built in. This introduces complexity in terms of archiving and how existing applications access that data.

Where data may have been mostly structured, centralised and automatically backed up on a SAN, we now have unstructured data that is highly distributed. Existing applications may require bridging middleware to access this data from the cluster, or their functionality may need to be rewritten to access big data natively.

Dealing with Uncertainty

One of the things that should be obvious is that we are very much at the ‘end of the beginning’ for big data, and this introduces genuine uncertainty. While on the face of it you cannot plan for uncertainty, those organizations that invest time in software-programmable and heavily virtualized data centres are likely to be better prepared for big data.

One of the most exciting prospects of big data is the potential to integrate with the live running of data centre infrastructure, through soft programmable components such as SDN controller nodes as well as API-enabled hardware and virtualized appliances. Data centre planners need to start thinking about the full potential for automation, and the ability to have a closed feedback loop with big data analytics based on fine-grained instrumentation. Next generation data centres will undoubtedly have more in common with the Formula 1 racing car than a juggernaut.
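The closed feedback loop described above can be sketched as a simple decision step: read a utilisation figure produced by analytics, compute how many nodes would bring utilisation back to a target, and clamp the result to safe limits. Everything here is hypothetical, including the function name, the 60% target and the node limits; the call that would hand the result to an SDN controller or appliance API is deliberately left out.

```python
def desired_nodes(current_nodes, utilisation_pct, target_pct=60,
                  min_nodes=2, max_nodes=64):
    """Node count that would move average utilisation towards target_pct.

    Uses integer arithmetic (ceiling division) so the result is exact.
    """
    if target_pct <= 0:
        raise ValueError("target_pct must be positive")
    wanted = (current_nodes * utilisation_pct + target_pct - 1) // target_pct
    # Clamp so a bad reading can neither drain nor flood the cluster.
    return max(min_nodes, min(max_nodes, wanted))


if __name__ == "__main__":
    # One pass of the loop per reading: measure, decide, then (not shown) act.
    nodes = 10
    for reading in (90, 75, 40, 5):
        nodes = desired_nodes(nodes, reading)
        print(reading, nodes)
```

The clamping is the important design choice: an automated loop driven by instrumentation needs guard rails so that a faulty sensor or a traffic spike cannot push the infrastructure outside limits a planner has approved.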
