OpenTelemetry is great but why is it so bloody complicated?

Observability, OpenTracing, OpenTelemetry, and Kubernetes. These are just a fraction of the technology buzzwords you’ll find as you Google your way around the internet. In fact, most (if not all) of those terms show up on nearly any technology website you visit. They have become so prevalent that anybody who doesn’t know the full scope of the topic is afraid to ask. If you happen to fall into this category, fear not, this post has got you covered! More specifically, we’re going to take a look at OpenTelemetry and clear up any confusion about what exactly it is and the benefits it can provide.

What is OpenTelemetry?

To fully understand the purpose of OpenTelemetry, we must first take a brief look at “Observability”. Loosely defined, Observability boils down to inferring a system’s internal health and state by looking at the external data it produces, which most commonly are logs, metrics, and traces. Here at Dynatrace, we take Observability to the next level, by displaying all information in context and adding in code-level details, end-user experience, and entity relationships all while feeding this data into our AI engine, Davis®, to produce actionable insights. If you’re interested in a deeper dive on Advanced Observability, check out our Observability page for more information.

How does OpenTelemetry benefit me?

Now that we have a basic understanding of what OpenTelemetry is, let’s dive into the benefits it provides. As mentioned in the section above, collecting application telemetry is nothing new. However, the collection mechanism and format are almost never consistent from one application to another. And as you can imagine, this inconsistency is a nightmare for developers and operations personnel who are just trying to understand the health of an application.

OpenTelemetry provides a de facto standard for adding observability to cloud-native applications. This means that companies don’t need to spend valuable time developing their own mechanism for collecting critical application telemetry, and can spend more time delivering new features. It is akin to how Kubernetes became the de facto standard for container orchestration. That broad adoption made it easier for organizations to move to container deployments, since they didn’t need to first build an enterprise-grade orchestration platform.

What happened to OpenTracing and OpenCensus?

OpenTracing became a CNCF project back in 2016, with a goal of providing a vendor-agnostic specification for distributed tracing, offering developers the ability to trace a request from start to finish by instrumenting their code. The OpenCensus project was made open source by Google back in 2018, based on Google’s Census library that was used internally for gathering traces and metrics from their distributed systems. Like the OpenTracing project, the goal of OpenCensus was to give developers a vendor-agnostic library for collecting traces and metrics.

As you can see, we now had two competing tracing frameworks, a rivalry informally called “The Tracing Wars”. Usually, competition is a good thing for end-users since it breeds innovation. However, in the open-source specification world, competition can lead to poor adoption, contribution, and support. Going back to the Kubernetes example from earlier, imagine how much more disjointed and slow-moving container adoption would be if everybody was using a different orchestration solution. To avoid this, it was announced at KubeCon 2019 in Barcelona that the OpenTracing and OpenCensus projects would converge into one project called OpenTelemetry and join the CNCF. This brings us to today, where OpenTelemetry released its first beta version in March 2020.
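To make "tracing a request from start to finish" a bit more concrete: a trace survives a hop between services because the caller passes a small piece of context along with the request. OpenTelemetry's default way of doing this is the W3C Trace Context `traceparent` header. Below is a minimal sketch of formatting and parsing that header, assuming version `00` only. The names `SpanContext`, `formatTraceparent`, and `parseTraceparent` are made up for illustration and are not part of any OpenTelemetry package.

```typescript
// Sketch of W3C Trace Context propagation via the `traceparent` header.
// All names here are hypothetical helpers, not a real OpenTelemetry API.

interface SpanContext {
  traceId: string; // 32 lowercase hex chars, shared by every span in the trace
  spanId: string; // 16 lowercase hex chars, identifies the calling span
  sampled: boolean; // whether the caller recorded this trace
}

// Serialize the context for an outgoing request header (version 00).
function formatTraceparent(ctx: SpanContext): string {
  const flags = ctx.sampled ? "01" : "00";
  return `00-${ctx.traceId}-${ctx.spanId}-${flags}`;
}

// Parse an incoming header; returns null when it is malformed.
// Assumption: only version 00 is handled in this sketch.
function parseTraceparent(header: string): SpanContext | null {
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(
    header.trim()
  );
  if (match === null) return null;
  const [, traceId, spanId, flags] = match;
  // All-zero IDs are invalid according to the spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Service A attaches the header; service B parses it and keeps the traceId.
const outgoing = formatTraceparent({
  traceId: "4bf92f3577b34da6a3ce929d0e0e4736",
  spanId: "00f067aa0ba902b7",
  sampled: true,
});
const incoming = parseTraceparent(outgoing);
console.log(outgoing); // 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```

The receiving service would start its own span with a fresh span ID but reuse the same `traceId`, which is what lets a backend stitch the hops into one trace.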

OpenTelemetry Review

I’ve been reading the documentation over the last couple of days and using the libraries to instrument a Node.js GraphQL API.

I love the idea of open standards and interoperability. For example, I’d even argue that, by and large, Kubernetes achieved those goals to a significant degree, in the sense that you can pretty easily move workloads from one cloud to another. And I can see a similar thing happening with OpenTelemetry.

But OpenTelemetry feels like it takes a steep climb to reach a productive threshold. In other words, it’s the opposite of the pit of success for such an important and useful technology, especially for developers building data-intensive applications.

Here are my current gripes:

Difficult naming. Propagator, exporter, manager, provider, OTel, OTLP. It seems that different terms are used interchangeably for similar things.
The number of dependencies needed to instrument an app is insane, let alone figuring out the correct versions.
Documentation is playing catch-up with the implementation.
Resources are either very specific or very general. I haven’t been able to find any good resources that pick the right balance of breadth and depth.
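On the naming gripe, here is the mental model I've pieced together so far, written as a toy sketch. A provider is the configured entry point that hands out tracers, a tracer creates spans, and an exporter ships finished spans to a backend. Not shown: a propagator carries trace context across process boundaries, "manager" presumably refers to the context manager that tracks the active span, OTel is just shorthand for OpenTelemetry, and OTLP is the OpenTelemetry Protocol used to send telemetry over the wire. Everything in the code below is a simplified stand-in I made up to show how the roles relate. It is not the real `@opentelemetry/*` API, which has more moving parts.

```typescript
// Toy model of the roles behind OpenTelemetry's tracing vocabulary.
// Every class here is a hypothetical stand-in, NOT the real SDK.

interface FinishedSpan {
  name: string;
  tracerName: string;
  durationMs: number;
}

// Exporter: receives finished spans and sends them somewhere.
interface SpanExporter {
  export(span: FinishedSpan): void;
}

// An exporter that just keeps spans in memory (handy for local checks).
class InMemoryExporter implements SpanExporter {
  readonly spans: FinishedSpan[] = [];
  export(span: FinishedSpan): void {
    this.spans.push(span);
  }
}

// Span: one timed operation; it is handed to the exporter when it ends.
class Span {
  private readonly startMs: number = Date.now();
  private ended = false;
  private readonly name: string;
  private readonly tracerName: string;
  private readonly exporter: SpanExporter;

  constructor(name: string, tracerName: string, exporter: SpanExporter) {
    this.name = name;
    this.tracerName = tracerName;
    this.exporter = exporter;
  }

  end(): void {
    if (this.ended) return; // ending twice must not export twice
    this.ended = true;
    this.exporter.export({
      name: this.name,
      tracerName: this.tracerName,
      durationMs: Date.now() - this.startMs,
    });
  }
}

// Tracer: creates spans on behalf of one library or module.
class Tracer {
  private readonly name: string;
  private readonly exporter: SpanExporter;

  constructor(name: string, exporter: SpanExporter) {
    this.name = name;
    this.exporter = exporter;
  }

  startSpan(spanName: string): Span {
    return new Span(spanName, this.name, this.exporter);
  }
}

// Provider: holds the configuration (here, only the exporter).
class TracerProvider {
  private readonly exporter: SpanExporter;

  constructor(exporter: SpanExporter) {
    this.exporter = exporter;
  }

  getTracer(name: string): Tracer {
    return new Tracer(name, this.exporter);
  }
}

// Wire it up once at startup, then create spans where the work happens.
const exporter = new InMemoryExporter();
const provider = new TracerProvider(exporter);
const span = provider.getTracer("graphql-api").startSpan("resolve user");
span.end();
console.log(exporter.spans.length); // 1
```

Swapping `InMemoryExporter` for something that sends spans to Jaeger, Zipkin, or a vendor is the whole point of the exporter split: the instrumentation code that calls `startSpan` doesn't change.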

Here are my current commendations:

It’s amazing to be able to view traces in a local environment with Jaeger or Zipkin, in addition to sending them to a platform like Lightstep or New Relic.
Auto-instrumentation is great in theory, but I’m still not experienced enough to say how well it works.
It feels like the right moment with the Tracing API having recently stabilized.