This Week in Elasticsearch and Apache Lucene - 2019-05-05

sorangutan

### Elasticsearch

#### Nightly Benchmarks

We've created a new benchmarking environment serving default distribution benchmarks and migrated [https://elasticsearch-benchmarks.elastic.co](https://elasticsearch-benchmarks.elastic.co/). You'll notice that for each of the datasets, you can select either `OSS` or `Default` to follow benchmarking results for each distribution of Elasticsearch.

#### Documentation Improvements

After helping with a couple of support tickets related to configuring synonyms, we've [added documentation](https://github.com/elastic/elasticsearch/pull/41645) around how to best use synonyms with token filters such as `word_delimiter` that produce stacked tokens. We are also [reworking the docs for the discovery-ec2 plugin](https://github.com/elastic/elasticsearch/issues/41630).

#### Geo_line Aggregation

We recently opened the [geo_line PR](https://github.com/elastic/elasticsearch/pull/41612). This is an aggregation which consumes a series of points and a sort value (e.g. time) and sorts those points into a linestring. An example use-case is GPS coordinates logged periodically by taxis, container ships, etc. These individual points are much more useful when arranged chronologically in a line. Here's what that looks like:

![Geo Lines][1]

#### Community PRs

We love it when our community submits pull requests to Elasticsearch! Thank you to all of our contributors, past and present. Here are a few recent community pull requests:

We [reviewed a community PR](https://github.com/elastic/elasticsearch/pull/41404) to reject port ranges in `discovery.seed_hosts`, which were previously accepted but silently ignored by Elasticsearch.

We're [reviewing a community PR](https://github.com/elastic/elasticsearch/pull/41489) that is adding the index name to cluster block exceptions.
These exceptions are triggered, for example, when trying to write to an index with a read-only cluster block, which is automatically added by the system when nodes holding this index are running low on disk space.

We're [reviewing a community PR](https://github.com/elastic/elasticsearch/pull/41050) that allows running `_cluster/reroute` commands even if the maximum number of retries limit for failed shard allocations has been reached.

#### Data Replication Resiliency

We've [updated](https://github.com/elastic/elasticsearch/pull/41522) the [resiliency status page](https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html), closing off two important issues that were addressed by a multi-year effort, starting with the sequence numbers project in the 5.x and 6.x series, and culminating in the release of 7.0 with the new cluster coordination subsystem:

- *Documents indexed during a network partition cannot be uniquely identified*: We have switched [optimistic concurrency control](https://www.elastic.co/guide/en/elasticsearch/reference/7.0/optimistic-concurrency-control.html) (OCC) from the `_version` field to the new `_seq_no` (sequence number) and `_primary_term` (primary term) fields, which do uniquely identify each operation. To be clear, the `_version` field continues to not uniquely identify a particular version of a document; if you need to do this then you should move to using the `_seq_no` and `_primary_term` fields. All internal consumers that are making use of OCC (e.g. reindex, update-by-query, ...) have been switched to these new fields as well.
- *Replicas can fall out of sync when a primary shard fails*: After a primary failover, the new primary now realigns the replicas with itself by rolling back the replicas to a safe point in the history and sending them the missing operations.

We've recently also worked on more extensive testing in these areas.
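
To make the compare-and-set idea behind the first item above more concrete, here is a toy sketch in Python. It is not Elasticsearch code: `ToyShard`, `StoredDoc`, and `VersionConflict` are hypothetical names invented for this illustration, and the model assumes a single in-memory shard. Only the parameter names `if_seq_no` and `if_primary_term` are taken from the request parameters mentioned in this post.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class VersionConflict(Exception):
    """Raised when the caller's expected (seq_no, primary_term) is stale."""


@dataclass
class StoredDoc:
    source: dict
    seq_no: int        # position of the operation in the shard's history
    primary_term: int  # which primary assigned that sequence number


class ToyShard:
    """Toy in-memory model of one shard (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.docs: Dict[str, StoredDoc] = {}
        self.primary_term = 1  # assumption: bumped on every primary failover
        self.next_seq_no = 0   # assumption: one counter per shard

    def promote_new_primary(self) -> None:
        """Simulate a primary failover by starting a new primary term."""
        self.primary_term += 1

    def index(
        self,
        doc_id: str,
        source: dict,
        if_seq_no: Optional[int] = None,
        if_primary_term: Optional[int] = None,
    ) -> StoredDoc:
        """Write a document, optionally as a compare-and-set operation."""
        current = self.docs.get(doc_id)
        conditional = if_seq_no is not None or if_primary_term is not None
        if conditional:
            # The toy requires both values to match the stored copy exactly.
            expected = (if_seq_no, if_primary_term)
            if current is None or (current.seq_no, current.primary_term) != expected:
                raise VersionConflict(doc_id)
        stored = StoredDoc(dict(source), self.next_seq_no, self.primary_term)
        self.next_seq_no += 1
        self.docs[doc_id] = stored
        return stored
```

In this model a writer reads a document, remembers its `seq_no` and `primary_term`, and passes both back on the next write. If another write got in first, or the values came from an older copy, the pair no longer matches and the write is rejected rather than silently overwriting newer data.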
We added tests for our new optimistic concurrency control structures (`if_seq_no` and `if_primary_term`), checking [linearizability of compare-and-set operations](https://github.com/elastic/elasticsearch/pull/38561). Based on this work, we learned quite a bit about the guarantees that the system provides under certain failure conditions, and will look at strengthening some of those guarantees. This work also helped in uncovering bugs.

We also fixed a problem where a shard that was being [closed during a replica rollback](https://github.com/elastic/elasticsearch/pull/41584) was keeping an active index writer around, causing various follow-up checks to fail.

We've added [stronger consistency checks](https://github.com/elastic/elasticsearch/pull/41614) to our disruption test suite. Besides the existing checks on the `_seq_no` and `_primary_term` fields, they verify that the `_source` and `_version` fields are also fully aligned across all shard copies, so that at the end of each disruption test all shard copies contain exactly the same set of documents.

#### Token Service

Our long-running [change to move security tokens into their own index](https://github.com/elastic/elasticsearch/pull/40742) (and out of the main security index) has been merged. This has been a pretty big effort: we want the change to happen automatically when you upgrade the cluster to a new (7.2+) version, but not while the cluster is in a mixed-version state (when some nodes are on <7.2), and we want it to seamlessly accept any active tokens that were created on the old cluster version and are stored in the main security index. This is part of our long-running effort to make it easier and safer to back up and restore the security index.

We've made a range of changes to the token service for 7.2, some of which forced us to change the format of the tokens that we provide to clients and how we store them in the index.
We are tackling the last part of those changes, which is to [change the way we store token strings in the index](https://github.com/elastic/elasticsearch/issues/40765) so that getting read access to the tokens index does not allow you to authenticate with someone else's token. (All previously released versions of ES had this protection, but the way that was implemented didn't fit with the changes we were making in 7.2, so we removed it during this development cycle and are implementing a different solution before feature freeze for 7.2.)

#### Enrich Processor

The Enrich Processor (formerly referred to as the "Lookup Processor") will allow users to define ingest pipelines that _enrich_ the ingested document with data from another index on the cluster (subject to some limitations). There is still a lot of work left to do, but the core pieces are coming together, which include data-local Lucene queries, a policy runner, and the REST APIs. If you'd like to follow our progress or provide feedback, feel free to check out the [meta issue](https://github.com/elastic/elasticsearch/issues/32789).

We've [merged](https://github.com/elastic/elasticsearch/pull/41532) the first iteration of the enrich processor to the feature branch.

We've also [merged](https://github.com/elastic/elasticsearch/pull/41707) the first iteration of the enrich policy runner. The enrich policy runner is the background task that reads the enrich policy and synchronizes the source index to the specialized `.enrich` index. Under the covers this is implemented with the reindex API and will eventually get support for a cron scheduler.

Finally, we [added an API](https://github.com/elastic/elasticsearch/pull/41553) to list enrich policies and have started the work on the `_execute` API to allow a user to manually run a policy.
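
As a rough illustration of the enrichment idea described in this section, here is a conceptual sketch in plain Python. It does not use or mirror any Elasticsearch API, and the feature is still under development. The function names, the `target_field` parameter, and the sample data are all hypothetical, and a dictionary stands in for the specialized enrich index.

```python
from typing import Dict, Iterable, List


def build_lookup(source_docs: Iterable[dict], match_field: str) -> Dict[object, dict]:
    """Key the source documents by match_field (stand-in for an enrich index)."""
    return {doc[match_field]: doc for doc in source_docs if match_field in doc}


def enrich(
    document: dict,
    lookup: Dict[object, dict],
    match_field: str,
    enrich_fields: List[str],
    target_field: str = "enriched",  # hypothetical name for the added field
) -> dict:
    """Return a copy of document with selected fields from the matching source doc."""
    result = dict(document)
    match = lookup.get(document.get(match_field))
    if match is not None:
        # Copy only the requested fields, skipping any the source doc lacks.
        result[target_field] = {f: match[f] for f in enrich_fields if f in match}
    return result


# Hypothetical sample data: user records act as the source index.
users = [{"email": "ada@example.com", "name": "Ada", "city": "London"}]
lookup = build_lookup(users, "email")
event = {"email": "ada@example.com", "action": "login"}
enriched_event = enrich(event, lookup, "email", ["name", "city"])
```

Building the lookup once and reusing it for every incoming document loosely corresponds to the split described above between the policy runner, which prepares the lookup data, and the processor, which consults it at ingest time.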
#### Snapshot Lifecycle Management

The first cut of the [SLM documentation](https://github.com/elastic/elasticsearch/pull/41510) and an API path change from `/ilm/` to `/slm/` have now been merged to the [feature branch](https://github.com/elastic/elasticsearch/tree/snapshot-lifecycle-management). We've also [introduced](https://github.com/elastic/elasticsearch/pull/41607) two new roles, `manage_slm` and `read_slm`, to allow more fine-grained permissions. Finally, we've [started](https://github.com/elastic/elasticsearch/pull/41707) the work to store the results from SLM's snapshot creation in a dedicated history index. This will allow us to set up alerts and have a history of failed/successful snapshots.

### Lucene

#### Apache Lucene / Solr 8.1

The release branch for Apache Lucene / Solr 8.1 has been cut and the release process has started. We expect the first RC later this week or early next week. For Lucene in particular this will bring:

* a new BKD tree strategy for segment merging that provides a significant performance boost for high dimensions
* the new [Luke](https://issues.apache.org/jira/browse/LUCENE-2562) module
* a new query visitor [API](https://github.com/apache/lucene-solr/pull/581) that allows traversing a query tree efficiently
* [read time attributes](https://github.com/apache/lucene-solr/pull/640) that allow controlling codec-level functionality on a per-reader basis, for instance to load FSTs off-heap per field.

#### Other

* We're [working on the Luwak codebase with Lucene](https://issues.apache.org/jira/browse/LUCENE-8766) to prepare the donation of Luwak to Lucene.
* Can we [improve](https://issues.apache.org/jira/browse/LUCENE-8788) search performance by sorting the segments by an estimated number of hits?
* Some of our JDK 11 builds are hitting a [JVM Bug](https://bugs.openjdk.java.net/browse/JDK-8205399) that's fixed in 12 but not in 11.
* JDK 12 doesn't seem to be bug free either - Lucene is [hitting](https://issues.apache.org/jira/browse/LUCENE-8668) [this](https://bugs.openjdk.java.net/browse/JDK-8219448) bug frequently.
* We are still [discussing](https://issues.apache.org/jira/browse/LUCENE-8757) how we can slice up segments better for parallel search.
* One person's [bug](https://issues.apache.org/jira/browse/LUCENE-8776) is another person's feature...
* Spooky failures are actually [bugs](https://issues.apache.org/jira/browse/LUCENE-8785) sometimes.

### Changes

#### Changes in Elasticsearch

Changes in 8.0:

* Update TLS ciphers and protocols for JDK 11 [#41385](https://github.com/elastic/elasticsearch/pull/41385)
* BREAKING: Parse empty first line in msearch request body as action metadata [#41011](https://github.com/elastic/elasticsearch/pull/41011)
* Suppress illegal access in plugin install [#41620](https://github.com/elastic/elasticsearch/pull/41620)
* Fix: added missing skip [#41492](https://github.com/elastic/elasticsearch/pull/41492)

Changes in 7.2:

* Improve error message for ln/log with negative results in function score [#41609](https://github.com/elastic/elasticsearch/pull/41609)
* Add details to BulkShardRequest#getDescription() [#41711](https://github.com/elastic/elasticsearch/pull/41711)
* Amend `prepareIndexIfNeededThenExecute` for security token refresh [#41697](https://github.com/elastic/elasticsearch/pull/41697)
* Implement Bulk Deletes for GCS Repository [#41368](https://github.com/elastic/elasticsearch/pull/41368)
* Security Tokens moved to a new separate index [#40742](https://github.com/elastic/elasticsearch/pull/40742)
* Simplify initialization of max_seq_no of updates [#41161](https://github.com/elastic/elasticsearch/pull/41161)
* Handle WRAP ops during SSL read [#41611](https://github.com/elastic/elasticsearch/pull/41611)
* Upgrade to Netty 4.1.35 [#41499](https://github.com/elastic/elasticsearch/pull/41499)
* Close and acquire commit during reset engine fix [#41584](https://github.com/elastic/elasticsearch/pull/41584)

Changes in 7.0:

* Run packaging tests on RHEL 8 [#41662](https://github.com/elastic/elasticsearch/pull/41662)
* Fix multi-node parsing in voting config exclusions REST API [#41588](https://github.com/elastic/elasticsearch/pull/41588)

Changes in 6.8:

* Fix for full cluster restart tests [#41723](https://github.com/elastic/elasticsearch/pull/41723)
* Drop distinction in entries for keystore [#41701](https://github.com/elastic/elasticsearch/pull/41701)
* Fix Watcher deadlock that can cause inability to index documents [#41418](https://github.com/elastic/elasticsearch/pull/41418)

Changes in 6.7:

* Bump the bundled JDK to 12.0.1 [#41627](https://github.com/elastic/elasticsearch/pull/41627)
* Change JDK distribution source [#41626](https://github.com/elastic/elasticsearch/pull/41626)

#### Changes in Elasticsearch Management UI

Changes in 7.0:

* [ILM] Surface shrink action in edit form if it's already been set on the policy [#35987](https://github.com/elastic/kibana/pull/35987)

#### Changes in Elasticsearch SQL ODBC Driver

Changes in 7.2:

* Add JDBC's protocol tests as integration tests [#149](https://github.com/elastic/elasticsearch-sql-odbc/pull/149)

Changes in 6.7:

* Consider interval's precision. Allow non-aligned period values as interval encoding [#148](https://github.com/elastic/elasticsearch-sql-odbc/pull/148)

[1]: https://images.contentstack.io/v3/assets/bltefdd0b53724fa2ce/blt5d470f15432be392/5cd04b9ae8ec6ef265db8946/geo_line.png

https://www.elastic.co/blog/this-week-in-elasticsearch-and-apache-lucene-2019-05-05