We are often asked why anyone should use PNDA rather than tailor a solution from one of the upstream distributions.

The answer is that PNDA was created in the first place to solve a real problem. In developing big data analytics applications from the upstream distributions, our experience was:

  • to create the platform and get these technologies up and running properly on your target platforms can take weeks if you’re lucky and months if you’re not
  • when you do, you have a unique installation unlike anything anyone else has
  • and you don’t know if it’s really working
  • then, getting applications working on it feels like dealing with 10 different technologies rather than one platform
  • when you do, you have unique applications unlike anything anyone else has, because there are so many points of variability
  • they’re a nightmare to debug as your logs are sprayed all over the cluster
  • and then you run out of disk space, or something falls over because nothing is looking after the cluster, cleaning up, etc.
  • and so you have to create a new one, all over again …

In working with other teams, we realized we were not alone, and PNDA was developed to address these issues. PNDA doesn’t replace upstream Hadoop distributions; it augments them.

Here’s how PNDA compares with your favoured upstream Hadoop distro, concern by concern:

Ease of creation on multiple platforms
  • PNDA: Fully automated creation across AWS, OpenStack and bare metal, from the infrastructure layer up to the services layer.
  • Upstream distro: Managed installation of the services layer only.

Repeatability of creation on multiple platforms
  • PNDA: Templated creation based on the notion of flavours, which represent reasonable configurations at different scales and work out of the box.
  • Upstream distro: A blueprint/deployment descriptor concept for the services layer only. You get to write these yourself, assuming you know enough about Hadoop to do that.

Verifiability of what you’ve created
  • PNDA: Automated test agents aggregate metrics across all technologies. Metrics are available via the de facto standard Graphite API, so they can integrate with pretty much any ops tool out there.
  • Upstream distro: Automated test agents aggregate metrics across the distro’s technologies. Metrics are available via distro-specific/Hadoop APIs that integrate with very little; they want you to use their tools.

Ease of getting bundles of functionality working across a variety of technologies in a repeatable fashion
  • PNDA: Packages and applications concept, managed deployment, automatable APIs and a console.
  • Upstream distro: Deal with each technology individually. No application concept at the platform layer. No help with repeatability.

Ease of managing/debugging those bundles
  • PNDA: Logs collected in one place, indexed and made searchable. Metrics from applications shown in the console.
  • Upstream distro: No application concept at the platform layer; the Hadoop manager provides ways of looking at logs across the cluster.

Help with day-to-day housekeeping operations on the cluster
  • PNDA: Clean-up of application artefacts and datasets according to a configurable policy.
  • Upstream distro: Nothing, until you move to Enterprise features.

Pulling together the full set of technologies needed to solve real domain problems
  • PNDA: Not just Hadoop and Kafka, but also OpenTSDB, Grafana, Jupyter, Gobblin, Kafka Manager, etc.
  • Upstream distro: Only Hadoop and Hadoop adjacencies.

Generally applicable application-layer functionality done for you
  • PNDA: A ready-to-go paradigm of wrapping data in Avro on ingest to ‘tag’ it, with PNDA then automatically creating partitioned datasets that are usable in queries and processes, plus multiple example applications illustrating how to put the building blocks together.
  • Upstream distro: Nothing, just Hadoop and Hadoop APIs. Good luck!

Domain-specific assistance
  • PNDA: Plugins and codecs for well-known open source technologies such as Logstash and OpenDaylight.
  • Upstream distro: Nothing, just Hadoop and Hadoop APIs.

Openness/cost/legal
  • PNDA: Open source.
  • Upstream distro: Open source or license-encumbered, depending on the distribution.
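The verifiability comparison above notes that PNDA’s metrics are available via the Graphite API. As a rough illustration of why that matters for ops tooling, here is a minimal Python sketch (standard library only) of a client for the Graphite render API: it builds a query URL and parses the render API’s JSON response format. The base URL and metric name are hypothetical placeholders, not actual PNDA endpoints or metric names, and the sketch makes no network calls itself.

```python
import json
from urllib.parse import urlencode


def graphite_render_url(base_url: str, target: str, from_: str = "-1h") -> str:
    """Build a Graphite render API URL asking for one target as JSON.

    base_url is a hypothetical Graphite-compatible endpoint; no real
    PNDA host or port is assumed here.
    """
    query = urlencode({"target": target, "from": from_, "format": "json"})
    return f"{base_url.rstrip('/')}/render?{query}"


def latest_values(payload: str) -> dict:
    """Map each series name to its most recent non-null value.

    The render API's JSON format is a list of series, each with a
    "target" name and "datapoints" as [value, timestamp] pairs, where
    value is null for gaps in the data.
    """
    result = {}
    for series in json.loads(payload):
        points = [value for value, _ts in series["datapoints"] if value is not None]
        result[series["target"]] = points[-1] if points else None
    return result
```

For example, `graphite_render_url("http://graphite.example.com", "hypothetical.metric.name")` returns `http://graphite.example.com/render?target=hypothetical.metric.name&from=-1h&format=json`. Fetching that URL with any HTTP client and passing the response body to `latest_values` gives the newest value per series, which is the kind of lightweight integration a standard API makes possible.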

Rather than suffer our pain, come and join us.
