Why Observability loves the Elastic Common Schema
sorangutan
For the Elastic Stack 7.0 release, ingestion (namely Beats) went through a big change by adopting the Elastic Common Schema (ECS). One of the reasons for this is to make the correlation of events from logs, metrics and traces easier. This helps to make systems more “observable”. If you want to learn all about observability, check out the blog post “Observability with the Elastic Stack”.
In this blog post I want to cover how the Elastic Common Schema changed the data we ingest, how it makes it possible to better correlate data, how it affects our Infrastructure, Logs, APM and Uptime UIs, and how it will help you to make your system more observable.
Have a problem? Let’s start with the host...
When investigating an issue, one of the first questions often asked is: Which host / machine is affected? As soon as the host on which an issue happens is known, it should be simple to get all the data about this host. This data could be:
- Logs from the system, like the auth and system logs
- Metrics like CPU, memory, and disk
- Logs from the services running on the host
- Metrics from the services running on the host
- APM data from the services running on the host
- Heartbeat data pinging the host
Before ECS, it was not possible to simply query a single field to get all events related to a host. Depending on which shipper was sending the data, the host name could end up in different fields, such as beat.hostname or hostname. The meaning of these fields could also differ, sometimes containing what the hostname command returns and sometimes the fully qualified hostname, depending on the use case. In addition to that, there are also the fields defined by our users, which might contain host data based on their own definitions. There was no common schema that we could share with our users and recommend they use.
Hostname wasn’t the only field that had this kind of problem, as many other fields like IP addresses or information about services did too. To solve this problem we decided to introduce ECS to not only have a standard that is used internally, but can also be adopted by our users.
As an example, with ECS we now have standardized fields such as host.name and source.ip. Every data point ingested into these fields now has the same meaning. With the release of Elastic Stack 7.0, all shippers (all Beats and all APM agents) have moved to ECS. This means that all events, whether they are logs, metrics, or APM traces shipped from a host, or Heartbeat data about the uptime of a host or service, will contain these fields populated with the same meaning. This allows you to easily jump from logs to metrics to APM traces and compare the data.
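As a minimal sketch of what this enables, the query body below matches every event from one host, regardless of whether the underlying index holds logs, metrics, or traces, because they all populate the same ECS host.name field. The host name "web-01" is a made-up example; the body is standard Elasticsearch query DSL.

```python
def events_for_host(host_name):
    """Build a query body matching every event from the given host,
    relying on the shared ECS field host.name."""
    return {
        "query": {"term": {"host.name": host_name}},
        "sort": [{"@timestamp": "desc"}],
    }

body = events_for_host("web-01")
```

The same body can be sent against any combination of logs, metrics, and APM indices; no per-index field mapping is needed anymore.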
ECS also makes it possible to answer other questions. For example, all information about MySQL services can be retrieved by querying on service.type: mysql. Combine this with a query on the host fields, and all information about a MySQL instance on a single host is available.
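The combination described above can be expressed as one bool filter. This is a sketch, not a definitive recipe: "db-01" is a hypothetical host name, while service.type and host.name are the real ECS field names.

```python
def mysql_on_host(host_name):
    """Query body for all events of the MySQL service on a single host,
    combining the ECS service and host fields in one bool filter."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"service.type": "mysql"}},
                    {"term": {"host.name": host_name}},
                ]
            }
        }
    }

body = mysql_on_host("db-01")
```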
Field changes on the ingest side
ECS not only defines the host.* fields to store information about a host, it also contains many other fields that are important to filter and group on. One example is the service fields, like service.type, which stores which service the event data was pulled from. Or, in the case of web server logs like Apache or Nginx, source.ip can be used to identify the source of an event. This had implications for a large portion of the fields shipped by Beats.
Not only was the generic meta information that each Beat ships adapted to ECS, but also all the Filebeat, Metricbeat, and Auditbeat modules. This has the effect, especially on the logging side, that the number of different fields was heavily reduced. Access logs shipped by an Apache or Nginx server are now ingested in almost the same format. If you use both services in your infrastructure, you can now correlate the data and build a single machine learning job on top of it.
As many fields changed, if you built your own dashboards on top of the old field structure in 6.x, check out this breaking changes list to see all the fields that were changed. Also check the Upgrading Beats from 6.x to 7.x guide. If you are only using our Infra or Logs UI, make sure that all the data you want to see in the 7.0 UI was created with version 6.6 or 6.7 of Filebeat and Metricbeat. In 6.6, some ECS alias fields were introduced for compatibility reasons.
Many more changes happened on the Filebeat module side than in the Metricbeat modules. The reason for this is that metrics are specific and unique. A metric about the size of a table in MySQL cannot be correlated with the index size of an Elasticsearch index. Most adjustments to the metrics fields were around host.name and the IP fields, which are meta information about the metrics themselves.
Elastic Common Schema in the UI
The Logging and Infrastructure UIs in 7.0 heavily depend on ECS fields. This means that if shippers other than Beats follow ECS, the basic functionality will keep working. Examples here are the container and cloud fields. If an event contains metadata about which cloud instance it is coming from, the Infra UI can automatically group by it.
The Logs UI depends on ECS for the log fields. If the message field exists in an event, the Logs UI can automatically visualise the data correctly. Our UIs will rely more and more on ECS, which means that if you ship data based on ECS, our UIs will be able to make sense of it too.
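To illustrate, here is a minimal event shaped along ECS lines. All values are made up; the point is that because message and the log.*/host.* names follow ECS, a UI that understands ECS can render the event without custom field configuration.

```python
# A minimal log event using ECS field names (all values are examples).
ecs_log_event = {
    "@timestamp": "2019-05-06T12:00:00.000Z",
    "message": "user admin logged in",
    "log": {"level": "info"},
    "host": {"name": "web-01"},
    "event": {"dataset": "system.auth"},
}
```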
How to use the Elastic Common Schema
Now that most of the data in the Elastic Stack follows ECS, there are a variety of ways to use it. If you have your logs, metrics, and traces all stored in the same Elasticsearch cluster, you can use a single search request across multiple indices. Most Elasticsearch APIs allow you to specify multiple indices, or a pattern that matches multiple indices, by default.
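A sketch of such a multi-index search: the index names below follow the default Beats and APM index patterns, and the final search call is commented out because it assumes a configured Elasticsearch Python client, which is not part of this example.

```python
# One search across logs, metrics, and APM data in the same cluster.
indices = ["filebeat-*", "metricbeat-*", "apm-*"]
index_pattern = ",".join(indices)

# Thanks to ECS, the same host.name filter applies to every index above.
query = {"query": {"term": {"host.name": "web-01"}}}
# es.search(index=index_pattern, body=query)  # assumes a configured client
```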
In case you have one cluster for logs, one for metrics, and one for APM, Elasticsearch supports this use case with cross-cluster search (CCS). CCS allows you to connect multiple clusters and then, similar to the multi-index search above, search across all of them.
Cross-cluster search also works for use cases where you have local logging and metrics clusters in different parts of the world. To get all the data for a specific host.ip across all your clusters, it is just one search, and you will have all the information together in one place.
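A sketch of the cross-cluster index syntax, which prefixes each index pattern with a remote cluster alias. The aliases "logs-eu" and "logs-us" are hypothetical names you would have configured as remote clusters.

```python
# Cross-cluster search addresses indices as "cluster:index".
remote_clusters = ["logs-eu", "logs-us"]
ccs_pattern = ",".join("%s:filebeat-*" % cluster for cluster in remote_clusters)

# One ECS-based filter finds the host's events in both regions at once.
query = {"query": {"term": {"host.ip": "10.0.0.7"}}}
# es.search(index=ccs_pattern, body=query)  # assumes a configured client
```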
What does ECS mean for the future of Observability?
ECS has just gotten started and will evolve further. The more standardised fields there are, the better the UIs become, and the easier it is for users to make sense of their data and understand what each value means. ECS can be used to correlate different data types based on shared fields. Elasticsearch's capability to make all the data available in one or multiple clusters, combined with ECS to correlate that data, makes the combination a great tool for your Observability needs. The more Elastic Stack data that is mapped to ECS, the more powerful the combination becomes.
Not all data is shipped by Beats, so you will often also have your own data. We encourage you to map your own data to ECS. This will not only allow you to correlate it with all the other data, but also directly gives some context to each of your fields through the ECS definitions. Others looking at your data will understand what each field means. It also guarantees that, as long as the same major version of ECS is used, no breaking changes happen. And if your fields are mapped to ECS, they will not conflict with other fields shipped by Beats.
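One simple way to start is to rename your existing field names to their ECS equivalents before indexing. This is a minimal sketch: the legacy names on the left are hypothetical examples of custom data, while the target names on the right are real ECS fields.

```python
# Hypothetical legacy field names mapped to their ECS equivalents.
LEGACY_TO_ECS = {
    "hostname": "host.name",
    "src_ip": "source.ip",
    "msg": "message",
}

def to_ecs(event):
    """Rename known legacy keys to ECS names; pass unknown keys through."""
    return {LEGACY_TO_ECS.get(key, key): value for key, value in event.items()}

ecs_event = to_ecs({"hostname": "web-01", "src_ip": "10.1.2.3", "msg": "login ok"})
```

In practice you would typically do this in your ingest pipeline (for example with a Logstash filter or an Elasticsearch ingest processor) rather than in application code, but the mapping itself is the same.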
To get started with mapping your events to ECS, the best place to start is the ECS docs. And if you want to see your ECS-normalized data within the curated APM, Infrastructure, Logs, and Uptime UIs, spin up a 14-day free trial of the Elasticsearch Service. As always, if you have any questions, reach out on Discuss.
© Lightnetics 2019