While artificial intelligence has been around for the last 50 years, its use cases have only recently been looked at seriously. The reason is that artificial intelligence requires large volumes of data, and such volumes have only recently become available in the form of unstructured data, which now exceeds the volume of data held in relational databases.
“90% of the world’s data has been created in the last two-three years and 90% of the data are not in relational databases. They are all in big data, social media feeds. Every day tons of data are coming in through IoT and 5G will bring in even more data,” says Han Chung Heng, Senior Vice President, Systems EMEA JAPAC, Oracle.
While social media, social collaboration and smartphones are adding to the volumes of unstructured data, the actual amount of data being mined for intelligence and insights is only 2% to 5% of the total volume, points out Chung Heng. There is a huge amount of data that can be used for modelling, but people also need to spend a huge amount of time preparing it. This preparation effort is why so little of the available data is converted into insights.
“What is important in artificial intelligence is that you need to analyse high volumes of data. When you have high volumes of data, you need to be able to bring everything together. With Oracle’s Exadata and Engineered Systems you can actually reduce the bottleneck from reading the data source,” points out Chung Heng.
Exadata design
Oracle’s Exadata has the unique strength of combining machine learning and artificial intelligence solutions within the same environment of computing, networking, and storage. “Our design point for Exadata is very simple. We bring storage and compute to the database so that data does not have to travel. This makes it very fast and this is our design point for Exadata. We also bring algorithms to the database,” explains Chung Heng.
The traditional way of generating insights and intelligence from data has been to make data travel to where applications and algorithms are hosted. The question was always how to bring data to the algorithms for modelling, and that meant the data had to do a lot of travelling.
This puts pressure on the infrastructure to move data quickly. Exadata has been designed to do the opposite: it brings algorithms to the database, and builds compute, storage and networking around the database.
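The difference between moving data to the algorithm and moving the algorithm to the data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the database, the readings table and both function names are hypothetical, and nothing here reflects how Exadata is implemented.

```python
import sqlite3

# In-memory SQLite is a stand-in for the database; it is not Exadata.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(i % 10, float(i)) for i in range(10_000)],
)


def average_by_moving_data(conn, sensor_id):
    """Pull every row to the application, then filter and average there."""
    rows = conn.execute("SELECT sensor_id, value FROM readings").fetchall()
    values = [value for sensor, value in rows if sensor == sensor_id]
    return sum(values) / len(values), len(rows)


def average_in_database(conn, sensor_id):
    """Let the database compute the average; only the result travels."""
    rows = conn.execute(
        "SELECT AVG(value) FROM readings WHERE sensor_id = ?", (sensor_id,)
    ).fetchall()
    return rows[0][0], len(rows)


moved_avg, moved_rows = average_by_moving_data(conn, 3)
pushed_avg, pushed_rows = average_in_database(conn, 3)
print("rows transferred when data moves:", moved_rows)
print("rows transferred when the query runs in the database:", pushed_rows)
```

Both functions return the same average, but the first transfers the whole table to the application while the second transfers a single row.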
“The combination of our database and our Exadata allows us to do analytics very fast,” says Chung Heng.
Auto indexing
By bringing computing, storage and networking closer to the database, Oracle is helping to boost the efficiency of applications that are built on machine learning and artificial intelligence algorithms.
Generating insights from a database also implies that the data is well indexed. The larger and more complex the data, the longer it takes to index the database. However, once indexing has been completed, data retrieval is much faster.
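The effect of an index on retrieval can be seen even in a small database. The sketch below is an assumption-laden illustration: it uses SQLite rather than Oracle Database, the customers table and index name are hypothetical, and the index is created by hand rather than by any automatic indexing feature. It asks the database how it plans to run the same query before and after the index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, city TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(i, f"city{i % 100}", f"name{i}") for i in range(5_000)],
)

QUERY = "SELECT name FROM customers WHERE city = ?"


def query_plan(conn):
    """Return SQLite's description of how it would execute QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("city7",)).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")
plan_after = query_plan(conn)   # the planner can now search via the index

matches = conn.execute(QUERY, ("city7",)).fetchall()
print("before:", plan_before)
print("after: ", plan_after)
print("matching rows:", len(matches))
```

Building the index costs time up front, which grows with the size of the table, but afterwards the query no longer has to read every row.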
“Indexing is required to help you get the information you want fast. Now for many years when you do indexing it takes a long time. It took us about 15 years of experience to index about close to 9,000 indexes. With the new database and new Exadata we are able to reduce the 15 years to around 10 hours. Obviously, they were 6,000 indexes,” adds Chung Heng.
Auto indexing is now a built-in feature of the Exadata platform and works in conjunction with machine learning and artificial intelligence algorithms.
Unifying data
While large amounts of unstructured data are being generated, they also need to be consolidated into data lakes and then integrated with relational databases before algorithms can be applied to complete the contextual analysis.
As an example, closed-circuit television cameras capture millions of images across the globe at any given time. How do you link face recognition with biometric thumbprint and identity card information?
“People are trying to pull big data into data lakes but they have not unified them. We are unifying big data and structured data into one pool. And we are using machine learning to make sense and combine it together to analyse both big data and relational data to make sense of what you have,” explains Chung Heng.
Engineered Systems from Oracle help to unify the data from structured and unstructured sources. They include Oracle’s Data Fusion platform, which uses NoSQL and data integration tools.
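What unifying the two kinds of data means in practice can be sketched as follows. This is a minimal illustration, not Oracle’s platform: loosely structured JSON events stand in for data-lake content, an in-memory SQLite table stands in for the relational side, and the devices schema, the event fields and the unify function are all hypothetical.

```python
import json
import sqlite3

# Structured side: a relational table of registered devices (hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE devices (device_id TEXT PRIMARY KEY, site TEXT)")
conn.executemany(
    "INSERT INTO devices VALUES (?, ?)",
    [("d1", "Dubai"), ("d2", "Singapore")],
)

# Loosely structured side: JSON events whose fields vary from line to line.
raw_events = [
    '{"device": "d1", "temp_c": 41.5, "note": "fan noisy"}',
    '{"device": "d2", "temp_c": 29.0}',
    '{"device": "d9", "temp_c": 33.3}',
]


def unify(conn, raw_events):
    """Attach the relational site to every event from a registered device."""
    sites = dict(conn.execute("SELECT device_id, site FROM devices"))
    unified = []
    for line in raw_events:
        event = json.loads(line)
        site = sites.get(event["device"])
        if site is not None:  # skip events with no relational counterpart
            unified.append({**event, "site": site})
    return unified


unified = unify(conn, raw_events)
for record in unified:
    print(record)
```

Each unified record carries both the free-form event fields and the relational attribute, so a single analysis can draw on both.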
A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. NoSQL databases are increasingly used in big data and real-time web applications.
Training the data
Once the multiple databases have been unified, the data needs to be trained or modelled. Training is a compute-intensive operation and demands a large amount of energy. It is meant to recognise certain trends, is typically very specific to an industry, and requires algorithms.
Chung Heng advocates training and modelling of the data in the cloud and not on-premises. Training or modelling of the data needs to be done only once, and is therefore better suited to the cloud. Once the data has been modelled, testing of the data and production can be done on-premises.
By moving the training of data into the cloud, end users can use the complete computing, networking, and storage power of the Exadata platform to run the use cases of artificial intelligence.
The overall process to develop an artificial intelligence use case is to carry out the training and modelling of data in the cloud, then move the testing of data on-premises and, as the model develops, move it into production, again on-site. Algorithms need to be embedded in the process at every stage.
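The split between training in one place and running the model in another can be sketched as a hand-off of the trained parameters. Everything in this example is an assumption made for illustration: a simple least-squares line stands in for a real model, JSON stands in for the export format, and the function names are hypothetical.

```python
import json


def train(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def export_model(model):
    """Serialise the trained parameters so they can leave the training site."""
    return json.dumps(model)


def load_model(payload):
    """Rebuild the model from the exported parameters."""
    return json.loads(payload)


def predict(model, x):
    return model["slope"] * x + model["intercept"]


# Step 1, done once where compute is plentiful (the cloud in the article).
payload = export_model(train([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))

# Step 2, repeated wherever the model is tested or used (on-premises).
model = load_model(payload)
prediction = predict(model, 10.0)
print(prediction)
```

Only the small payload of parameters crosses from the training environment to the test and production environments; the expensive fitting step is not repeated there.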
If training of the data is moved to the cloud, the full power of Exadata’s computing, networking and storage can be used for test, development and production. Chung Heng reinforces the point: “This is very good since it provides for green technology innovation. Sustainability is a very important part of the process.”
Exadata
The Oracle Exadata Database Machine is engineered to deliver better performance, cost effectiveness, and availability for Oracle databases. Exadata features a modern cloud-enabled architecture with scale-out high-performance database servers, scale-out intelligent storage servers with state-of-the-art PCI Flash, and an ultra-fast InfiniBand internal fabric that connects all servers and storage. Algorithms and protocols in Exadata implement database intelligence in storage, compute, and InfiniBand networking to deliver higher performance and capacity at lower costs than other platforms.
Exadata runs all types of database workloads including Online Transaction Processing, Data Warehousing, In-Memory Analytics as well as consolidation of mixed workloads. Simple and fast to implement, the Exadata Database Machine powers and protects your most important databases. Exadata can be purchased and deployed on premises as the ideal foundation for a private database cloud, or it can be acquired using a subscription model and deployed in the Oracle Public Cloud or Cloud at Customer with all infrastructure management performed by Oracle.
Big data and NoSQL
A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. NoSQL databases are increasingly used in big data and real-time web applications. NoSQL systems are also sometimes called Not only SQL to emphasise that they may support SQL-like query languages, or sit alongside SQL databases.
Motivations for this approach include simplicity of design, simpler horizontal scaling to clusters of machines (which is a problem for relational databases), finer control over availability, and limiting the object-relational impedance mismatch.
The data structures used by NoSQL databases (key-value, wide column, graph, or document) are different from those used by default in relational databases, making some operations faster in NoSQL. The particular suitability of a given NoSQL database depends on the problem it must solve. Sometimes the data structures used by NoSQL databases are also viewed as more flexible than relational database tables.
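The four shapes named above can be compared by storing the same facts in each. The sketch below uses plain Python dictionaries as stand-ins; the customer record, the key formats and the lookup helpers are hypothetical and do not reproduce any particular product’s data model.

```python
import json

# Key-value: an opaque value fetched by a single key.
key_value = {"customer:42": '{"name": "Asha", "city": "Dubai"}'}

# Document: a nested, self-describing record.
document = {
    "_id": 42,
    "name": "Asha",
    "address": {"city": "Dubai"},
    "orders": [{"sku": "A1", "qty": 2}],
}

# Wide column: row key -> column family -> columns; rows may differ in columns.
wide_column = {
    "42": {
        "profile": {"name": "Asha", "city": "Dubai"},
        "orders": {"A1": 2},
    }
}

# Graph: nodes plus labelled edges between them.
graph = {
    "nodes": {"customer:42": {"name": "Asha"}, "city:Dubai": {}},
    "edges": [("customer:42", "LIVES_IN", "city:Dubai")],
}


def city_from_key_value():
    # The store cannot look inside the value; the application must parse it.
    return json.loads(key_value["customer:42"])["city"]


def city_from_graph():
    # Follow the LIVES_IN edge from the customer node to the city node.
    for source, label, target in graph["edges"]:
        if source == "customer:42" and label == "LIVES_IN":
            return target.split(":", 1)[1]
    return None


print(city_from_key_value(), document["address"]["city"],
      wide_column["42"]["profile"]["city"], city_from_graph())
```

The same question, which city the customer lives in, is answered by a different access path in each shape, which is why the suitability of a NoSQL database depends on the problem it must solve.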
Insights
- 90% of the world’s data has been created in the last two-three years.
- 90% of the data are not in relational databases.
- What is important in artificial intelligence is that you need to analyse high volumes of data.
- When you have high volumes of data, you need to be able to bring everything together.
- The larger and more complex the data, the longer it takes to index the database.
- People are trying to pull big data into data lakes but they have not unified them.
- Engineered Systems from Oracle help to unify the data from structured and unstructured sources.
- Chung Heng advocates training and modelling of the data in the cloud and not on-premises.
By Arun Shankar