Introducing the Elastic Common Schema



    The Elastic Common Schema (ECS) is a new specification that provides a consistent and customizable way to structure your data in Elasticsearch, facilitating the analysis of data from diverse sources. With ECS, analytics content such as dashboards and machine learning jobs can be applied more broadly, searches can be crafted more narrowly, and field names are easier to remember.

    Why a Common Schema?

    Whether you’re performing interactive analysis (e.g., search, drill-down and pivoting, visualization) or automated analysis (e.g., alerting, machine learning-driven anomaly detection), you need to be able to uniformly examine your data. But unless your data originates from only one source, you face formatting inconsistencies resulting from:

    • Disparate data types (e.g., logs, metrics, APM, flows, contextual data)
    • Heterogeneous environments with diverse vendor standards
    • Similar-but-different data sources (e.g., multiple sources of endpoint data, like Auditbeat, Cylance, and Tanium)

    Imagine searching for a specific user within data originating from multiple sources. Just to search for this one field, you would likely need to account for multiple field names, such as user, username, nginx.access.user_name, and login. Drilling into and pivoting around that data would present an even greater challenge. Now imagine developing analytics content, such as a visualization, alert, or machine learning job — each new data source would add either complexity or duplication.

    What is the Elastic Common Schema?

    ECS is an open source specification that defines a common set of document fields for data ingested into Elasticsearch. ECS is designed to support uniform data modeling, enabling you to centrally analyze data from diverse sources with both interactive and automated techniques.

    ECS offers both the predictability of a purpose-built taxonomy and the versatility of an inclusive spec that adapts to custom use cases. ECS's taxonomy distributes data elements across fields that are organized into the following three levels:

    ECS Core Fields
        Description: A fully defined set of field names that exists under a defined set of ECS top-level objects.
        Recommendation: These fields are common across most use cases, so work should begin here.

    ECS Extended Fields
        Description: A partially defined set of field names that exists under the same set of ECS top-level objects.
        Recommendation: Extended fields may apply to narrower use cases or be more open to interpretation, depending on the use case.

    Custom Fields
        Description: An undefined and unnamed set of fields that exists under a user-supplied set of non-ECS top-level objects, which must not conflict with ECS fields or objects.
        Recommendation: This is where you can add fields for which ECS has no corresponding field; you can also keep a copy of original event fields here, such as when transitioning your data to ECS.
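    To make the levels concrete, here is a minimal sketch of what a hypothetical ECS document could look like. @timestamp, ecs.version, and source.ip are core fields; http.request.method is an extended field; and mycompany is an invented custom top-level object holding a field (rack_id) that ECS does not define:

        {
          "@timestamp": "2018-12-07T11:05:07.000Z",
          "ecs": { "version": "1.0.0" },
          "source": { "ip": "10.42.42.42" },
          "http": { "request": { "method": "get" } },
          "mycompany": { "rack_id": "r42" }
        }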

    Elastic Common Schema in Action

    Example 1: Parsing

    Let’s put ECS to work on the following Apache log:

    10.42.42.42 - - [07/Dec/2018:11:05:07 +0100] "GET /blog HTTP/1.1" 200 2571 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36"
    

    Mapping this message to ECS organizes the log’s fields in the following manner:

    Field Name                 | Value                         | Notes
    @timestamp                 | 2018-12-07T11:05:07.000Z      |
    ecs.version                | 1.0.0                         |
    event.dataset              | apache.access                 |
    event.original             | 10.42.42.42 - - [07/Dec ...   | Full, unmodified log for auditing
    http.request.method        | get                           |
    http.response.body.bytes   | 2571                          |
    http.response.status_code  | 200                           |
    http.version               | 1.1                           |
    host.hostname              | webserver-blog-prod           |
    message                    | "GET /blog HTTP/1.1" 200 2571 | Text representation of the significant information from the event, for succinct display in a log viewer
    service.name               | Company blog                  | Your custom name for this service
    service.type               | apache                        |
    source.geo.*               |                               | Fields for geolocation
    source.ip                  | 10.42.42.42                   |
    url.original               | /blog                         |
    user.name                  | -                             |
    user_agent.*               |                               | Fields describing the user agent

    As shown above, the raw log is preserved in ECS's event.original field to support auditing use cases. Note that for simplicity's sake, this example omits details about the monitoring agent (under agent.*), some details about the host (under host.*), and a few other fields; the original post links to a more complete example event in JSON.
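    If you are mapping a source like this to ECS yourself, an Elasticsearch ingest pipeline is one way to do it. The sketch below is illustrative rather than the Beats module's actual implementation; the pipeline name, the grok pattern, and the temporary apache.time and apache.ident fields are all assumptions:

        PUT _ingest/pipeline/apache-access-ecs-sketch
        {
          "description": "Sketch: map an Apache access log to ECS field names",
          "processors": [
            { "set": { "field": "event.original", "value": "{{message}}" } },
            {
              "grok": {
                "field": "message",
                "patterns": [
                  "%{IPORHOST:source.ip} %{USER:apache.ident} %{USER:user.name} \\[%{HTTPDATE:apache.time}\\] \"%{WORD:http.request.method} %{NOTSPACE:url.original} HTTP/%{NUMBER:http.version}\" %{NUMBER:http.response.status_code:int} %{NUMBER:http.response.body.bytes:int} \"%{DATA:http.request.referrer}\" \"%{DATA:user_agent.original}\""
                ]
              }
            },
            { "lowercase": { "field": "http.request.method" } },
            { "date": { "field": "apache.time", "formats": ["dd/MMM/yyyy:HH:mm:ss Z"] } },
            { "remove": { "field": ["apache.time", "apache.ident"] } }
          ]
        }

    From there, the built-in user_agent and geoip ingest processors could populate the user_agent.* and source.geo.* fields shown in the table above.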

    Example 2: Search

    Consider an investigation into the activity of a specific IP across a complete web stack: Palo Alto Networks Firewall, HAProxy (as processed by Logstash), Apache (using the Beats module), Elastic APM, and for good measure, the Suricata IDS (custom, using its EVE JSON format).

    Prior to ECS, your search for this IP may have looked something like this:

    src:10.42.42.42 OR client_ip:10.42.42.42 OR apache2.access.remote_ip:10.42.42.42 OR context.user.ip:10.42.42.42 OR src_ip:10.42.42.42
    

    But if you’ve mapped all your sources to ECS, your query is much simpler:

    source.ip:10.42.42.42
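
    Expressed in the Elasticsearch query DSL, this becomes a single term query; the logs-* index pattern here is an assumption, so substitute whatever pattern covers your sources:

        GET logs-*/_search
        {
          "query": {
            "term": { "source.ip": "10.42.42.42" }
          }
        }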
    

    Example 3: Visualization

    The power of ECS is more readily revealed when it is applied uniformly to data from several different sources. Perhaps you're monitoring your web stack for threats with several sources of network data: a Palo Alto Networks Next-Gen Firewall on the perimeter, and the Suricata IDS generating events and alerts. How do you extract the source.ip and network.direction fields from each message in a way that enables centralized visualization in Kibana and vendor-agnostic drill-down and pivoting? With ECS, of course: because both sources map to the same field names, centralized monitoring becomes far easier than it previously was.
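
    Under the hood, a visualization like this reduces to an aggregation over those shared fields. As a rough sketch (again assuming a logs-* index pattern), a breakdown of the top source IPs per traffic direction across both vendors' data might look like:

        GET logs-*/_search
        {
          "size": 0,
          "aggs": {
            "by_direction": {
              "terms": { "field": "network.direction" },
              "aggs": {
                "top_source_ips": {
                  "terms": { "field": "source.ip", "size": 10 }
                }
              }
            }
          }
        }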



    https://www.elastic.co/blog/introducing-the-elastic-common-schema

