This Week in Elasticsearch and Apache Lucene - 2019-04-26



### Elasticsearch

#### Watcher UI

There is a PR in code review that will [add support for threshold alert actions](https://github.com/elastic/kibana/pull/35175). We have merged a PR that [updates the threshold alert visualization with a chart from the elastic-charts library](https://github.com/elastic/kibana/pull/35404) and another that adds [support for three new comparison types when building a threshold alert](https://github.com/elastic/kibana/pull/35614). We have also [refined the UI of the watch history detail panel](https://github.com/elastic/kibana/pull/35462) and [added 403 and 404 error feedback](https://github.com/elastic/kibana/pull/35444).

![Create Threshold Alert UI][1]

#### Enrich Processor

We have continued to work on the [enrich processor](https://github.com/elastic/elasticsearch/issues/32789). The [groundwork](https://github.com/elastic/elasticsearch/pull/41010) that allows the enrich processor to access a locally allocated shard has been merged into the feature branch. A [new PR](https://github.com/elastic/elasticsearch/pull/41521) has been opened that adds an enrich processor to decorate documents based on exact lookups. This PR also adds a special field mapper that allows the values being decorated to be retrieved quickly. We have some changes in the works to add [get](https://github.com/elastic/elasticsearch/pull/41384) and [delete](https://github.com/elastic/elasticsearch/pull/41495) enrich policy APIs, and we have completed the put policy API. We also [simplified the EnrichStore](https://github.com/elastic/elasticsearch/pull/41421) (an internal helper used to store and access enrich policies), so that it can be used more easily in the various places where it is needed.

#### Consistent Settings

We are working on [a feature](https://github.com/elastic/elasticsearch/pull/40416) that will allow us to ensure certain secure (keystore) settings are consistent across the whole cluster.

#### API Key privileges

There are now [some additional cluster privileges](https://github.com/elastic/elasticsearch/pull/41488) that will allow API keys to be useful for users that do not have superuser/manage_security privileges in the cluster.

#### Block-Max Score

We have [disabled max-score optimization](https://github.com/elastic/elasticsearch/pull/41361) on queries that contain a mandatory scoring clause with unbounded max score. Lucene 8 has the ability to skip blocks of non-competitive documents. However, some queries don't track their maximum score (script_score, span, ...), so they always return Float.POSITIVE_INFINITY as their maximum score. This can slow down some boolean queries if other clauses have bounded max scores.

#### Field Capabilities

The Field Capabilities API will now report fields that are [missing in some indices](https://github.com/elastic/elasticsearch/pull/34071) in an "unmapped" section. Following this change, we have [enhanced the index resolution in SQL](https://github.com/elastic/elasticsearch/pull/41377) to use field_caps not just for merging but also for individual table discovery (used inside metadata).

#### Querying frozen indices in SQL

[SQL now supports frozen indices](https://github.com/elastic/elasticsearch/pull/41558). In SQL you can now indicate that you want index resolution to include frozen indices in two ways:

1. Through a dedicated `FROZEN` grammar extension (e.g. `SELECT field FROM FROZEN index`)
2. Through a configuration parameter on the drivers: `index.include.frozen: true`

#### Conditional logic in SQL SELECTs

We finished the implementation of [CASE](https://github.com/elastic/elasticsearch/pull/41349). CASE is a powerful ANSI SQL expression that implements the IF ... THEN ... ELSE logic of programming languages. It can be used in the SELECT, WHERE, GROUP BY, ORDER BY and HAVING clauses.
Here is an example:

```
SELECT count(*) AS count,
  CASE WHEN NVL(languages, 0) = 0 THEN 'zero'
       WHEN languages = 1 THEN 'one'
       WHEN languages = 2 THEN 'bilingual'
       WHEN languages = 3 THEN 'trilingual'
       ELSE 'multilingual'
  END AS lang_skills
FROM test_emp
GROUP BY lang_skills
ORDER BY 2;
```

#### Snapshot resiliency

The snapshot repository plugin testing for Azure is now on par with the testing we have for S3 and GCS. We have worked on adding [third-party tests for Azure](https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+third-party-tests-azure/), i.e., CI jobs that run some of our snapshot/restore related tests against the actual Azure service instead of just the mocks we use in local test runs. This also completes the work to add CI capabilities that run basic snapshot/restore tests against the real infrastructure of the three major cloud providers.

Based on the [recent work](https://github.com/elastic/elasticsearch/pull/40322) that allows snapshot repositories to implement deletes more efficiently using bulk operations, we are [switching](https://github.com/elastic/elasticsearch/pull/41368) the GCS repository to this new feature by making use of GCS's capability to batch deletes, which will significantly speed up snapshot deletions on a GCS repository. The same functionality was [recently](https://github.com/elastic/elasticsearch/pull/40322) added for S3 as well.

#### Cluster coordination

We worked with an external contributor to [fix the sample configuration files in our docker-compose docs](https://github.com/elastic/elasticsearch/issues/41394), which were missing a vital `discovery.seed_hosts` setting for all nodes. As a result, a cluster would form fine the first time but then fail to restart properly about half of the time.

We have also adapted the warning message that is periodically logged, with detailed information, when a node cannot discover or elect a master. It now [filters out master-ineligible nodes](https://github.com/elastic/elasticsearch/issues/41344) from the cluster state, as these nodes just add noise and confusion to the log message.

### Lucene

#### Backward offsets

Users are [complaining](https://issues.apache.org/jira/browse/LUCENE-8776) that Lucene's [rejection of backward offsets](https://issues.apache.org/jira/browse/LUCENE-7626), introduced in 7.0, breaks their use case, since they can no longer index compound terms as a synonym of their sub-terms while preserving offsets. This is unfortunate, but the ongoing discussion suggests that Lucene will keep rejecting backward offsets, since doing so allows some algorithms to be implemented without backtracking and allows offsets to be encoded more efficiently.

#### Performance regressions with Java 11

Since Lucene master now requires Java 11, the benchmarks were upgraded to Java 11 as well, but [it was found that this triggered a slight slowdown](https://markmail.org/message/d2pklgnmkpr4fk2i). We suspect it is due to the change of the default garbage collector from ParallelGC to G1. The benchmarks will temporarily force the ParallelGC collector to determine whether the slowdown is actually due to the garbage collector.

#### Other

* Work is ongoing to [align the Luwak codebase with Lucene](https://issues.apache.org/jira/browse/LUCENE-8766) to prepare the donation of Luwak to Lucene.
* We have javadocs for analysis components, but we generally don't [document the name of these components](https://issues.apache.org/jira/browse/LUCENE-8778), making them hard to use.
* An Elasticsearch user reported that [MinHashFilter generates illegal unicode](https://issues.apache.org/jira/browse/LUCENE-8779).
* [Corner cases in the Tessellator were fixed](https://issues.apache.org/jira/browse/LUCENE-8775).
* How should Lucene [map segments to threads](https://issues.apache.org/jira/browse/LUCENE-8757) when configured to parallelize query execution?

### Changes

#### Changes in Elasticsearch

Changes in 8.0:

* Make 0 as invalid value for `min_children` in `has_child` query [#41347](https://github.com/elastic/elasticsearch/pull/41347)

Changes in 7.1:

* Disable max score optimization for queries with unbounded max scores [#41361](https://github.com/elastic/elasticsearch/pull/41361)
* Deprecate support for first line empty in msearch API [#41442](https://github.com/elastic/elasticsearch/pull/41442)
* Improve accuracy for Geo Centroid Aggregation [#41033](https://github.com/elastic/elasticsearch/pull/41033)
* Disallow null/empty or duplicate composite sources [#41359](https://github.com/elastic/elasticsearch/pull/41359)
* Peer recovery should not indefinitely retry on mapping error [#41099](https://github.com/elastic/elasticsearch/pull/41099)
* SSLDriver can transition to CLOSED in handshake [#41458](https://github.com/elastic/elasticsearch/pull/41458)
* Introduce aliases version [#41397](https://github.com/elastic/elasticsearch/pull/41397)
* fix #35262 define deprecations of API's as a whole and urls [#39063](https://github.com/elastic/elasticsearch/pull/39063)
* SQL: Implement IIF(, , ) [#41420](https://github.com/elastic/elasticsearch/pull/41420)
* SQL: Use field caps inside DESCRIBE TABLE as well [#41377](https://github.com/elastic/elasticsearch/pull/41377)
* SQL: Implement CASE... WHEN... THEN... ELSE... END [#41349](https://github.com/elastic/elasticsearch/pull/41349)
* Add ignore_above in ICUCollationKeywordFieldMapper [#40414](https://github.com/elastic/elasticsearch/pull/40414)
* Move keystore-cli to its own tools project [#40787](https://github.com/elastic/elasticsearch/pull/40787)
* Omit non-masters in ClusterFormationFailureHelper [#41344](https://github.com/elastic/elasticsearch/pull/41344)
* Handle unmapped fields in _field_caps API [#34071](https://github.com/elastic/elasticsearch/pull/34071)

Changes in 6.7:

* Fix Has Privilege API check on restricted indices [#41226](https://github.com/elastic/elasticsearch/pull/41226)
* Fix role mapping DN field wildcards for users with NULL DNs [#41343](https://github.com/elastic/elasticsearch/pull/41343)
* Reduce security permissions in CCR plugin [#41391](https://github.com/elastic/elasticsearch/pull/41391)
* SQL: Fix bug with optimization of null related conditionals [#41355](https://github.com/elastic/elasticsearch/pull/41355)

#### Changes in Elasticsearch Management UI

Changes in 6.7:

* [CCR] Retrieve paused state of follower index from ES instead of depending upon the client to provide it [#35342](https://github.com/elastic/kibana/pull/35342)
* [CCR] Allow user to use CCR when security is not enabled. [#35333](https://github.com/elastic/kibana/pull/35333)

#### Changes in Rally Tracks

* Update target throughput [#74](https://github.com/elastic/rally-tracks/pull/74)

[1]: https://images.contentstack.io/v3https://www.elastic.co/assets/bltefdd0b53724fa2ce/blta7f81974851746a4/5cc31327ee4bf29d69c58d00/threshold-alert-ui.png

    https://www.elastic.co/blog/this-week-in-elasticsearch-and-apache-lucene-2019-04-26

 
