Quick Hits: Temporal Analysis in Endeca

I try to keep the pulse of the OTN Forum for OEID, or Oracle Endeca Information Discovery.  Of late, a lot of questions have come up around how to handle temporal analysis with Endeca.  Specifically, when producing visualizations by time (e.g. month), how do I ensure that I have a “bucket” for all months, even if my underlying data does not tie back to every month?  A common pain point in the product, to be sure.

To illustrate, say I have “sales” records, like so:

When I load these into my Endeca server and attempt to produce a visualization that totals my sales by month, I wind up with:

 

RETURN foo AS SELECT SUM(SalesAmt) As "TotSales" GROUP BY Month

 

Almost immediately, it jumps out at me that there is no bucket for “5-May”.  Upon investigation, this is *accurate* as I had no sales in May, but far from the visualization I require to properly convey that fact.

The best practice here is to introduce a secondary “record type” that I usually call “Calendar”.  Each record in this record type is a different day, and I include all of the derived attribution I may want for the varying temporal analysis I’d like to perform in my application.  Thus, my new “Calendar” record type might look like:

and so on…

Now when I issue the same EQL statement that powers my chart, all temporal “buckets” are covered by my calendar records.  The calendar records ensure that my GROUP BY is offered any and all buckets, even if there are no sales to total in a particular bucket.  After loading this second record type, I refresh my chart and voila:

chartsalesMay

 

I now have a bucket for “May” and my visualization properly conveys that sales tanked in May and someone needs to lose their job.

Hope this helps.

3 thoughts on “Quick Hits: Temporal Analysis in Endeca

  1. Have you studied performance impact of this approach? As you know, most of the endeca’s customers are annoyed with loading the charts, when the size of the data is large.

    • Hi Srini,

      The performance impact of this approach should be extremely minimal. Even if you’re building your calendar record type to cover the last 10 years of data, this is introducing only another 3,650 records to the Endeca server. Asking the Endeca Server to consider another 3,650 in a query is peanuts.

      In our implementations, even with 10s of millions of records in play, we always shoot for “interactive speeds” with our component loading, including charts. If performance is an issue for your customers, there is plenty that can be done to potentially improve performance — including, but not limited to: MDEX threads, MDEX cache, software versioning, limiting the records before issuing EQL, avoiding EQL pitfalls like “AllBaseRecords” and unnecessary joins, data modeling changes, ensuring the Endeca Server is running on linux bare metal, using sticky sessions with load balancing, and so on.

      Branchbird is well versed in Endeca performance audits so feel free to reach out with any performance concerns.

  2. Pingback: Deploying Oracle Endeca Portlets in WebLogic | Bird's Eye View

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s