Let’s play a game – it’s called ‘find the needle in the haystack’.
You do know how to play this game, don’t you?
In this case, the ‘hay’ is made up of Virtual Machines and compute hosts running in your OpenStack environment. The ‘needle’ is the misbehaving VM.
The clock is ticking… find that needle!
This is the ‘game’ that OpenStack cluster operators play every day. In the largest OpenStack deployments, a state of flux is ‘business as usual’, with VMs being spun up and shut down every few seconds. In such a situation, how do you identify issues such as a problem with OpenStack networking components or an errant compute host?
OpenStack’s various APIs enable you to extract information from the infrastructure, but this does not go down to the level of the compute host’s own function. Operators can therefore get insights into their compute platform at the software layer, but building up a full picture requires additional details that can’t be readily obtained from OpenStack today.
Calipso enables the active discovery, visualisation and monitoring of VMs and Docker instances along with the related networking and hosting infrastructure. For OpenStack, Calipso makes use of OpenStack APIs along with additional reporting agents to gather real-time details of the operational environment. One of the key functions of Calipso is to bring fault conditions to the operator’s attention. Given that errors can occur anywhere in the stack, from host through to virtual machine, rapid identification and notification are vital in a fast-moving environment.
With large-scale OpenStack environments, the volume and velocity of data soon get into the territory of big data. This is where the PNDA platform comes in.
Data collected by Calipso from the OpenStack environment is passed over the Kafka bus into the PNDA platform and then stored in the time-series database. Analysing this information reveals how platform operation and performance change over time. At a fine-grained level, statistics gathered from running VMs and network infrastructure can be used to detect packet loss. Where the real power of big data comes into play, though, is in pattern analysis. By using pattern analysis to understand sequences of events, one can also detect warning indicators: precursor events that, if left unchecked, will result in errors or failures that impact service.
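To make the packet-loss idea a little more concrete, here is a minimal sketch in Python. It is not Calipso or PNDA code: the `InterfaceSample` record, its field names and the 1% threshold are all assumptions made for illustration, and it further assumes that the counters are cumulative and that `rx_packets` counts only packets that were actually delivered.

```python
from dataclasses import dataclass


@dataclass
class InterfaceSample:
    """One counter reading for a VM's virtual interface (hypothetical schema)."""
    vm_id: str
    timestamp: float   # seconds since epoch
    rx_packets: int    # cumulative packets delivered (assumed to exclude drops)
    rx_dropped: int    # cumulative packets dropped on receive


def loss_ratio(prev, curr):
    """Fraction of packets dropped between two cumulative readings.

    Returns None when a counter went backwards (e.g. the VM was restarted),
    because the interval cannot be trusted.
    """
    received = curr.rx_packets - prev.rx_packets
    dropped = curr.rx_dropped - prev.rx_dropped
    if received < 0 or dropped < 0:
        return None
    total = received + dropped
    if total == 0:
        return 0.0
    return dropped / total


def flag_lossy_vms(samples, threshold=0.01):
    """Return the sorted ids of VMs whose loss ratio exceeded the threshold
    in at least one interval between consecutive readings."""
    last_seen = {}
    flagged = set()
    for sample in sorted(samples, key=lambda s: s.timestamp):
        prev = last_seen.get(sample.vm_id)
        if prev is not None:
            ratio = loss_ratio(prev, sample)
            if ratio is not None and ratio > threshold:
                flagged.add(sample.vm_id)
        last_seen[sample.vm_id] = sample
    return sorted(flagged)


if __name__ == "__main__":
    readings = [
        InterfaceSample("vm-a", 0.0, 1000, 0),
        InterfaceSample("vm-b", 0.0, 500, 0),
        InterfaceSample("vm-a", 10.0, 2000, 50),   # 50 drops in this interval
        InterfaceSample("vm-b", 10.0, 1500, 0),    # clean interval
    ]
    print(flag_lossy_vms(readings))
```

In a real deployment the readings would come out of the time-series store rather than a hard-coded list, and the threshold would be tuned to the environment.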
In this way, operators will be able to find the needles before they prick their fingers.
Calipso is a project currently under development by Cisco.
For more details, please take a look at the video below.