For almost a decade, the core Endeca MDEX engine that underpins Oracle Endeca Information Discovery (OEID) has supported one-time indexing (often referred to as a Baseline Update) as well as incremental updates (often referred to as partials). Through all of the incarnations of this functionality, from “partial update pipelines” to “continuous query”, there was one common limitation: update operations could only ever act on a “per-record” basis.
If you’re coming from a SQL/RDBMS background, this was a huge limitation and forced a conceptual change in the way that you think about data. Obviously, Endeca is not (and never was) a relational system, but losing the freedom that SQL provides to update data whenever and wherever you please was often a pretty big limitation, especially at scale. Building an index nightly for 100,000 E-Commerce products is no big deal. Running a daily process to feed 1 million updated records into a 30 million record Endeca Server instance just so that a set of warranty claims could be “aged” from current month to prior month is something completely different.
Thankfully, with the release of the latest set of components for the ETL layer of OEID (called OEID Integrator), huge changes have been made to the interactions available for modifying an Endeca Server instance (now called a “Data Domain”). If you’ve longed for a “SQL-style experience” where records can be updated or deleted from a data store by almost any criteria imaginable, OEID Integrator v3.0 delivers.
I’ve been on a number of projects over the past 5 years where the ability to define a set of records to be deleted from Endeca Server based on an attribute, or a set of attributes, would have been invaluable. There has been invalid financial data in a Spend Analytics application skewing metrics, classified data that users had made visible in an Enterprise Search application causing a security risk, and lots more. With the new Delete component, eliminating this data from the index is a snap.
Let’s say I have an application to allow users to do Data Discovery on financial data from the last year. Every morning, I run an index update to add in the latest data from the previous day. However, I also want to “sunset” older data, say, anything older than 365 days. In previous versions of OEID, I would need to go to my index (or more likely, the system of record), pull the unique identifiers for the affected records, massage them so that I can generate the Endeca Record Specifier (the equivalent of a “primary key”) for each one, and load those into Endeca as delete operations.
In version 3.0, I can just crack open the Delete component and specify an EQL Expression to identify those older records.
And we’re done. I can see this being a huge help to customers that tend to “evolve” their index, appending and modifying data on the fly over time. It also opens up the possibility for applications that follow a more traditional “kill and fill” model to shift their architecture in this direction.
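To make the idea concrete, here is a minimal Python sketch of what a criteria-based delete means for the “sunset” example above. This is not Integrator or EQL code: the record layout, the TransactionDate attribute and the delete_where helper are all hypothetical, and the records live in a plain in-memory list rather than in a Data Domain. It only models the semantics, which is that the records to delete are chosen by a condition rather than by a list of Record Specifiers.

```python
from datetime import date, timedelta

def delete_where(records, predicate):
    """Return the records that do NOT match the predicate (hypothetical helper)."""
    return [r for r in records if not predicate(r)]

# Hypothetical financial records; "id" stands in for the Record Specifier.
records = [
    {"id": "A1", "TransactionDate": date(2012, 3, 1)},
    {"id": "A2", "TransactionDate": date(2012, 9, 15)},
    {"id": "A3", "TransactionDate": date(2013, 3, 27)},
]

today = date(2013, 3, 29)
cutoff = today - timedelta(days=365)  # anything before this date is "older than 365 days"

# The condition plays the role of the EQL expression: no identifiers are looked up first.
kept = delete_where(records, lambda r: r["TransactionDate"] < cutoff)
kept_ids = [r["id"] for r in kept]
print(kept_ids)
```

Note that the caller never touches the “id” values. That is the whole difference from the older approach, where the identifiers had to be pulled and turned into per-record delete operations first.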
By the same token, the new update component opens up a whole new set of options when designing your data architecture. For applications that require “flagging” or aging style operations (navigate by current month, for example), the new Update functionality is a great fit. It supports Adding, Replacing and Deleting Assignments based on an incoming data feed and uses the same EQL mechanism for record identification.
Let’s say it’s April 1st (it’s really only 3 days away) and I want to take all of my March 2013 records and flip their “Current Month” attribute to ‘No’. I create a one-line data file with the data I want to send into the index (this is a VERY simple example; you could have a number of attributes getting updated here).
I feed that into a Modify Records component and specify my EQL Record Set Specifier expression.
And, again, that’s it! I’ve now successfully flipped my March records to No. One quick note here is that the Modify component will also add attributes to your index if they don’t exist. In the above example, I actually ran my update against an index that didn’t include a CurrentMonth attribute for testing purposes and it worked great.
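The same behavior can be sketched in a few lines of Python. Again, this is only a model of the semantics, not Integrator or EQL code: the modify_where helper, the ClaimMonth attribute and the shape of the one-row feed are assumptions made for illustration, and real Endeca attributes can hold multiple assignments, which this simplified dictionary model ignores.

```python
def modify_where(records, predicate, assignments):
    """Replace (or add) the given assignments on every matching record, in place.

    Hypothetical helper; returns the number of records that were changed.
    """
    changed = 0
    for record in records:
        if predicate(record):
            record.update(assignments)  # adds the attribute if it is not there yet
            changed += 1
    return changed

# Hypothetical records with no CurrentMonth attribute at all, as in the test above.
records = [
    {"id": "C1", "ClaimMonth": "2013-02"},
    {"id": "C2", "ClaimMonth": "2013-03"},
    {"id": "C3", "ClaimMonth": "2013-03"},
]

# Stand-in for the one-line data file: a single row of attribute assignments.
feed_row = {"CurrentMonth": "No"}

# The condition plays the role of the EQL Record Set Specifier.
changed = modify_where(records, lambda r: r["ClaimMonth"] == "2013-03", feed_row)
print(changed, records)
```

Only the two March records pick up the new assignment; the February record is left exactly as it was, without a CurrentMonth attribute.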