By 2020, there will be millions of automobiles on the road with IP connectivity to the Internet, aka ‘Connected Cars’. These cars will generate enormous amounts of sensor data destined for the Cloud. Some of it, such as data related to safe driving and vehicle operation, will be real-time; other data, such as vehicle telematics, will be stored for later analysis. How much data? 25 GB per hour per car, according to some projections.
Are you kidding me? Estimates of future Internet traffic volumes have been accuracy-challenged before (think WWW, video). Let’s just agree that it will be a very, very large number: a Big-data number.
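To put the projection in perspective, here is the back-of-the-envelope arithmetic in Python, taking the 25 GB per hour per car figure at face value. The fleet size of one million cars is purely illustrative (the post only says ‘millions’), decimal units are assumed (1 GB = 10^9 bytes), and every car is assumed to be generating data for the whole hour.

```python
# Back-of-the-envelope scaling of the "25 GB per hour per car" projection.
# Assumptions (not from the original post): decimal units (1 GB = 10**9 bytes),
# an illustrative fleet of 1,000,000 cars, all generating data for the full hour.

GB = 10**9
BYTES_PER_CAR_PER_HOUR = 25 * GB
SECONDS_PER_HOUR = 3600


def per_car_rate_mbps(bytes_per_hour: int) -> float:
    """Sustained per-car upload rate in megabits per second."""
    return bytes_per_hour * 8 / SECONDS_PER_HOUR / 10**6


def fleet_bytes_per_hour(cars: int, bytes_per_car_per_hour: int) -> int:
    """Total bytes per hour for a fleet of identical cars."""
    return cars * bytes_per_car_per_hour


rate = per_car_rate_mbps(BYTES_PER_CAR_PER_HOUR)
fleet = fleet_bytes_per_hour(1_000_000, BYTES_PER_CAR_PER_HOUR)
print(f"per car: {rate:.1f} Mbit/s")            # per car: 55.6 Mbit/s
print(f"fleet:   {fleet / 10**15:.0f} PB/hour")  # fleet:   25 PB/hour
```

Under these assumptions, a single car would need a sustained uplink of roughly 55 Mbit/s, and a million such cars would produce about 25 PB every hour.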
How do we obtain something useful from all of this information?
We start with a Big-data platform because this is a Big-data challenge. Intelligent software (e.g., machine learning) partnering with the Big-data platform sifts through the information to locate the tiny fragments of valuable small data. Yet more intelligent software will use that small data to initiate actions that benefit everyone: drivers, automakers, road planners, insurance companies, maybe even Internet traffic forecasters. Naturally, the platform must scale. Oh, and one more thing: the Big-data platform should be open to any and all who wish to search for the elusive small data.
What would such a system look like? To answer that question, a group of us in the Cisco Chief Technology Architecture Office (CTAO) got together and built a Cloud system for Connected Cars. We call it ‘Smart Transport’.
Let’s take a look at the pieces that make up Smart Transport and at what happens when reams of Connected Car data arrive.
First, the pieces and their functions, from the bottom up:
- Connected Cars transmit data to the Smart Transport Cloud, which houses the Big-data analytics platform and the intelligent software. Large Connected Car datasets are hard to come by, so we developed a car data generator to simulate the data that would be sent to the Cloud.
- PNDA (pnda.io) is the Big-data analytics platform. PNDA ingests the data and places it on a Kafka message bus, making it available to other consumers (applications).
- One consumer is the PNDA data store, used for storing both the incoming car data and the output of the intelligent software. It relies on the usual open source suspects integrated in PNDA, including HDFS, Hadoop, Impala and Apache Spark.
- Another consumer is the machine learning (ML) software. It scans the data coming in from the Connected Cars in real time, looking for patterns that might indicate something unusual is afoot (aka an anomaly).
- A third consumer is the Machine Learning Anomaly Visualization (MLAV) application. It is a customized application with a back end for receiving Connected Car data and a front end for visualizing the information, both real-time and historical.
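The ingest path in the list above can be sketched in a few lines of Python. This is a hypothetical illustration, not PNDA code: the `CarRecord` fields, the `generate_car_data` simulator and the `InMemoryBus` class are all invented for the example, and the bus is a toy stand-in for a Kafka topic where every subscriber sees every message (roughly what separate Kafka consumer groups give you).

```python
# Hypothetical sketch of the ingest path: simulated cars -> bus -> consumers.
# CarRecord, generate_car_data and InMemoryBus are illustrative inventions;
# InMemoryBus is a toy stand-in for a Kafka topic, NOT the PNDA or Kafka API.
import json
import random
from dataclasses import dataclass, asdict
from typing import Callable, Dict, Iterator, List


@dataclass
class CarRecord:
    car_id: str
    timestamp: int        # seconds since the start of the simulation
    speed_kph: float
    engine_temp_c: float


def generate_car_data(car_ids: List[str], seconds: int, seed: int = 0) -> Iterator[CarRecord]:
    """Simulated telemetry: one record per car per second."""
    rng = random.Random(seed)
    for t in range(seconds):
        for car_id in car_ids:
            yield CarRecord(
                car_id=car_id,
                timestamp=t,
                speed_kph=round(rng.uniform(0, 120), 1),
                engine_temp_c=round(rng.gauss(90, 2), 1),
            )


class InMemoryBus:
    """Toy topic-based publish/subscribe bus; every subscriber gets every message."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = {}

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, message: str) -> None:
        # Fan out the message to all handlers registered on this topic.
        for handler in self._subscribers.get(topic, []):
            handler(message)


# Three consumers, mirroring the data store, ML and MLAV roles.
stored: List[dict] = []
ml_seen: List[dict] = []
mlav_seen: List[dict] = []

bus = InMemoryBus()
bus.subscribe("car-data", lambda m: stored.append(json.loads(m)))
bus.subscribe("car-data", lambda m: ml_seen.append(json.loads(m)))
bus.subscribe("car-data", lambda m: mlav_seen.append(json.loads(m)))

# "Ingest": serialize each simulated record as JSON and place it on the bus.
for record in generate_car_data(["car-1", "car-2"], seconds=5):
    bus.publish("car-data", json.dumps(asdict(record)))

print(len(stored), len(ml_seen), len(mlav_seen))  # 10 10 10
```

The design point is the fan-out: the data store, ML and MLAV consume the same stream independently, so adding another analytics application means adding another subscriber rather than changing the producers.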
Here is what happens:
- ML has detected a ‘candidate anomaly’ and needs to verify this by examining more data. How can ML gather more data?
- ML sends a request to the Smart Transport controller, which converts the request into a series of policy commands and sends them to the car(s) generating the suspect data. What are the policy commands? You guessed it: ‘send more car data’!
- ML scans new data and is able to confirm that indeed there is an anomaly present. I knew it!
- ML places the anomaly data, root cause information and suggested actions on the Kafka message bus.
- MLAV and the database consume the anomaly/root cause/actions data.
- MLAV displays this data in real time and can query the database for small tidbits of value-added data. In fact, any analytics application could retrieve the Big-data from the database, hoping to hit small-data pay dirt.
You might be thinking that this blog is about Smart Transport using machine learning to look for anomalies. Actually, the unsung hero in all of this is PNDA:
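The detect, request, confirm and publish steps above form a small closed loop. The Python sketch below walks one simulated car with a transient temperature spike and one with a sustained overheat through it. All of it is hypothetical: the post does not say what anomaly test Smart Transport’s ML used, so a simple z-score against a baseline stands in, and the thresholds, the policy command format (`set_samples_per_report`) and the anomaly record fields are invented for illustration. A plain list stands in for the Kafka topic.

```python
# Hypothetical sketch of the candidate -> more data -> confirm -> publish loop.
# The z-score test, thresholds, policy command format and anomaly record fields
# are illustrative assumptions, not Smart Transport's actual ML or controller.
import statistics
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SimulatedCar:
    car_id: str
    readings: List[float]         # engine temperatures (deg C) to replay
    samples_per_report: int = 1   # changed only by policy commands
    position: int = 0

    def apply_policy(self, command: Dict[str, object]) -> None:
        if command.get("action") == "set_samples_per_report":
            self.samples_per_report = int(command["value"])

    def report(self) -> List[float]:
        batch = self.readings[self.position:self.position + self.samples_per_report]
        self.position += len(batch)
        return batch


class Controller:
    """Converts an ML request into policy commands for the suspect car(s)."""

    def __init__(self, cars: Dict[str, SimulatedCar]) -> None:
        self.cars = cars

    def send_policy(self, car_ids: List[str], samples: int) -> None:
        for car_id in car_ids:
            self.cars[car_id].apply_policy(
                {"action": "set_samples_per_report", "value": samples})


def z_score(x: float, baseline: List[float]) -> float:
    return (x - statistics.mean(baseline)) / statistics.stdev(baseline)


def monitor(car: SimulatedCar, controller: Controller, baseline: List[float],
            bus: List[dict], threshold: float = 3.0, extra: int = 5) -> None:
    """Replay a car's reports through a two-stage check until data runs out."""
    while batch := car.report():
        if car.samples_per_report == 1:
            # Stage 1: one high reading only makes this a candidate anomaly,
            # so ask the controller for more data ("send more car data").
            if z_score(batch[0], baseline) > threshold:
                controller.send_policy([car.car_id], extra)
        else:
            # Stage 2: confirm only if most of the extra samples are high too.
            high = sum(z_score(x, baseline) > threshold for x in batch)
            if high >= 0.8 * len(batch):
                bus.append({"car_id": car.car_id,
                            "anomaly": "engine_temp_high",
                            "root_cause": "possible cooling fault",
                            "suggested_action": "schedule inspection"})
            controller.send_policy([car.car_id], 1)  # back to normal reporting


baseline = [89.0, 90.0, 91.0, 90.0, 89.5, 90.5]
cars = {
    "spike": SimulatedCar("spike", [90.2, 95.0, 90.1, 90.3, 89.9, 90.0, 90.4, 90.2]),
    "overheat": SimulatedCar("overheat", [90.1, 96.0, 96.5, 97.0, 96.8, 97.2, 96.9]),
}
controller = Controller(cars)
anomaly_topic: List[dict] = []   # stand-in for the Kafka topic
for car in cars.values():
    monitor(car, controller, baseline, anomaly_topic)

print([a["car_id"] for a in anomaly_topic])  # ['overheat']
```

The spike car is flagged as a candidate, but the extra samples it sends after the policy change look normal, so nothing is published; only the overheating car’s anomaly reaches the topic. Requiring confirmation on more data is what keeps one-off sensor glitches off the bus.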
- Big Data platform. Enough said.
- Open source. We like open source because there is familiarity and community support behind all of the components, such as Kafka, Impala and Hadoop. Not sure how something works? Google it.
- Handles real-time and historical data at scale. That’s the operative word: ‘Scale’.
- Application Friendly. It is very easy to add value-added functions on top of PNDA. Smart Transport incorporated a few: ML, MLAV and the Smart Transport controller.
- Rapid Development Environment. The entire Smart Transport effort operated inside an AWS cloud. We started hacking code on day 1.
If you squint, you may also notice what we call the “Virtuous Circle”. It is an automated closed loop: car data is generated, collected by PNDA and processed by one or more applications, and the applications tell the Smart Transport controller to send policy data back to the cars.
What next? We want to scale up Smart Transport to handle orders of magnitude more Connected Cars and sensor data. More Big-data means more opportunities to find small data, so we will apply new analytics tools.
PNDA allows us to build Big-data applications. PNDA allows us to run Big-data applications. PNDA enables the Virtuous Circle.
Want to learn more about what we did? Take a look at the video below.