The Feature List: Oracle Endeca Information Discovery 3.1

As promised last week, we’ve been compiling a list of all the new features that were added as part of the Oracle Endeca Information Discovery (OEID) 3.1 release earlier this month.

If we’ve missed anything, please shoot us an email and we’ll update the post.

OEID Integrator v3.1

hadoop-cloveretl

The gang at Javlin has implemented some major updates in the past 6 months, especially around big data.  The OEID Integrator releases, obviously, lag a bit behind their corresponding CloverETL release but there’s still a lot to get excited about from both a CloverETL and “pure OEID” standpoint:

  • Base CloverETL version upgraded from 3.3 to 3.4 – full details here
  • HadoopReader / HadoopWriter components
  • Job Control component for executing MapReduce jobs
  • Integrated Hive JDBC connection
  • Language Detection component!

The big takeaway here is the work that the Javlin team has done in terms of integrating their product more closely with the “Big Data” ecosystem.  Endeca has always been a great complementary fit with sophisticated data architectures based on Hadoop and these enhancements will only make it easier.

In keeping with our habit of giving some time to the small wins that add big gains, I really like the Language Detection component.  This had been around “forever” in the old Endeca world of Forge and Dgidx but was rarely used or understood.  It is nice to see this functionality return, as it will play a huge role in multilingual, multinational organizations, especially those with a lot of unstructured data.  Think about a European bank with a large presence in multiple countries trying to hear the “Voice of the Customer”.  Navigating, filtering and summarizing based on a customer’s native language gets so much easier.

OEID Web Acquisition Toolkit (aka KAPOW!)

Adventures in Installing – Oracle Endeca 3.1 Integrator

The newest version of Oracle’s Endeca Information Discovery (OEID v3.1), Oracle’s data discovery platform, was released yesterday morning.  We’ll have a lot to say about the release, the features, and what an upgrade looks like in the coming weeks (for the curious, Oracle’s official press release is here) but top of our minds right now is: “How do I get this installed and up and running?”

After spending a few hours last night with the product, we wanted to share some thoughts on the install and ramp-up process and hopefully save some time for others who are looking to give the product a spin.  The first post concerns the installation of the ETL tool that comes with Oracle Endeca, OEID Integrator.

Installing OEID Integrator

When attempting to install Integrator, I hit a couple of snags.  The first was related to the different form factor that the install has taken vs. previous releases.  In version 3.1, the install has moved away from an “InstallShield-style” experience to more of a download, unzip and script approach.  After downloading the zip file and unpacking it, I ended up with a structure that looks like this:

installing-integrator-3.1

Seeing an install.bat, I decided to click it and dive right in.  After the first couple of prompts, one large change becomes clear.  The Eclipse container that hosts Integrator needs to be downloaded separately prior to installation (RTFM, I guess).

Not a huge deal, but what I found was that it is incredibly important to download a very specific version of Eclipse (Indigo, according to the documentation) in order for the installation to complete successfully.  For example:

I tried to use the latest version of Eclipse, Kepler.  This did not work.
I tried to use Eclipse IDE for J2EE Developers (Indigo).  This did not work.
I used Eclipse IDE for Java Developers (Indigo) and it worked like a charm:

eclipse-indigo

In addition, I would highly recommend running the install script (install.bat) from the command line, rather than through a double-click in Windows Explorer.  Running it via a double-click can make it difficult to diagnose any issues you may encounter since the window closes itself upon completion.  If the product is installed from the command line, a successful installation on Windows should look like this:

integrator-success

Hopefully this saves some time for others looking to ramp up on the latest version of OEID.  We’ll be continuing to push out information as we roll the software out in our organization and upgrade our assets so watch this space.

 

Default and User Friendly Prompting With BI Publisher

As mentioned in the previous post, Dynamic Report Grouping with Oracle BI Publisher, Edgewater Ranzal is working with a client to convert XML Publisher reports to BI Publisher reports. As part of this initiative, we began looking for opportunities to improve the user interface and to create a standard methodology that report developers could use in the future. One of the first areas we focused on was the prompting feature. To that end, we concentrated on:

  • Presenting prompts to the user within the BI Publisher tool
  • Displaying user-entered prompt values within the report
  • Creating an implementation methodology for report developers

As expected, many of the reports had time prompts (date, period, or year), but the existing reports did not have default prompt values.  Although it is not published in any Oracle documentation we have seen, Oracle offers five functions that can be inserted into the Default Value option of the parameter:

{$SYSDATE()$}
{$FIRST_DAY_OF_MONTH()$}
{$LAST_DAY_OF_MONTH()$}
{$FIRST_DAY_OF_YEAR()$}
{$LAST_DAY_OF_YEAR()$}

*Note that you also have to set the Data Type to Date for these parameters. 

Simple arithmetic can be performed with these functions to add some flexibility.  For instance, the previous day’s date would be displayed as

{$SYSDATE() - 1$}

By using these functions in conjunction with the Date String Format in the parameter options section, a variety of date value defaults can be displayed in the prompting section of the report. The following table is a sample of the prompts, Default Value, and Date Format Strings that were deployed at the client:

BI Publisher post 2 1

It is very important to understand that, regardless of the Date Format String setting, the actual value produced by the date functions is the full date string, with an optional number of days added or subtracted. For instance, if the Default Value is set to {$FIRST_DAY_OF_YEAR() + 1$} (first day of year plus one) and the Date Format String is set to MM, the user would still see 01 as the default value because the actual value generated (and then converted to the month number) is 20XX-01-02T00:00:00.000+HH:00 (Jan 2, 20XX).
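To make the order of operations concrete, here is a minimal Python sketch of the behavior described above. It is only an emulation: the helper names (first_day_of_year, default_value) and the example date are hypothetical, and Python's strftime codes stand in for the BI Publisher Date Format String.

```python
from datetime import date, timedelta

def first_day_of_year(today: date) -> date:
    """Hypothetical stand-in for {$FIRST_DAY_OF_YEAR()$}: January 1 of the current year."""
    return date(today.year, 1, 1)

def default_value(base: date, offset_days: int, fmt: str) -> str:
    """Apply the day offset to the full date first, then format it for display."""
    return (base + timedelta(days=offset_days)).strftime(fmt)

today = date(2013, 11, 15)  # arbitrary example date, not from the original post

# Emulates {$FIRST_DAY_OF_YEAR() + 1$} with a Date Format String of MM
month_default = default_value(first_day_of_year(today), 1, "%m")

# Emulates {$SYSDATE() - 1$} with a Date Format String of MM
yesterday_month = default_value(today, -1, "%m")

print(month_default, yesterday_month)
```

The offset always moves the underlying date by whole days; the format string only decides which part of the result the user sees.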

Because the optional numeric value used in the function refers only to days, and no logic can be written into the Default Value function, there is a natural limitation that prohibits reliably generating anything beyond a period and/or year plus or minus one. For instance, if a client wants a prompt default value for two years ago, logic cannot be written to determine whether the current or previous year was a leap year and therefore whether to subtract 365 + 365 = 730 or 365 + 366 = 731 days from the first day of year function (or system date function, depending on your preference).
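A quick way to see the effect of a fixed offset is to simulate it. The Python sketch below assumes the default is built as the system date minus a constant 730 days and then displayed as a year; the function names and the 2012 to 2015 range are illustrative choices, not part of the client implementation.

```python
from datetime import date, timedelta

def naive_two_years_ago(today: date) -> int:
    """Year displayed if the default is the system date minus a fixed 730 days."""
    return (today - timedelta(days=730)).year

def wrong_days(start_year: int, end_year: int) -> list:
    """Collect every date on which the fixed offset lands in the wrong year."""
    bad = []
    day = date(start_year, 1, 1)
    while day.year <= end_year:
        if naive_two_years_ago(day) != day.year - 2:
            bad.append(day)
        day += timedelta(days=1)
    return bad

# One full four-year cycle containing a single leap year (2012)
mismatches = wrong_days(2012, 2015)
print(mismatches)
```

Over that cycle the fixed offset misses only on December 31, 2012 and December 31, 2013, where it lands on January 1 of the following year instead.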

Understandably, this problem would only occur two days every four years (December 31st of both a leap year and the year following a leap year); however, the same logic shows how difficult it is to go back two or more months from any date function, because the number of days in a month varies. We observed an even more complicated version of this issue when the client wanted the default values for a period range to equal the previous quarter (e.g. during Q3, From Period defaults to 04 and To Period defaults to 06). Depending on the current period, the From Period needs to default to three to five periods ago and the To Period to one to three periods ago. Further exacerbating this problem was the year prompt, which, during Q1, needs a default value of the previous year.
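The previous-quarter requirement is easy to state as conditional logic, which is exactly what the Default Value functions cannot express. As a rough Python sketch, assuming twelve monthly periods aligned to calendar quarters (the function name and return shape are hypothetical):

```python
def previous_quarter_defaults(current_period: int, current_year: int):
    """Return (from_period, to_period, year) for the quarter before the current one."""
    if not 1 <= current_period <= 12:
        raise ValueError("period must be between 1 and 12")
    current_quarter = (current_period - 1) // 3 + 1
    if current_quarter == 1:
        # During Q1 the previous quarter is Q4 of the prior year
        prev_quarter, year = 4, current_year - 1
    else:
        prev_quarter, year = current_quarter - 1, current_year
    from_period = (prev_quarter - 1) * 3 + 1  # first period of the previous quarter
    to_period = prev_quarter * 3              # last period of the previous quarter
    return f"{from_period:02d}", f"{to_period:02d}", year

print(previous_quarter_defaults(8, 2013))  # a Q3 period
print(previous_quarter_defaults(2, 2013))  # a Q1 period
```

Note how the distance between the current period and the defaults changes within a quarter, which is why a single fixed offset cannot reproduce it.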

The final piece of the puzzle when using any parameters with the date data type is realizing that the bind value passed to your data model is the full date/time string. Our client exclusively used SQL in their data models; therefore, it was only a matter of using Oracle SQL’s native TO_CHAR function to convert the date/time string to a relationally comparable value as such:

BI Publisher post 2 2

The Ranzal team then looked to streamline and simplify interaction between parameters, parameter input requirement evaluation, and the RTF templates. The client’s reports had up to twelve parameters that required user input, and they used XSLT logic to evaluate whether or not users had supplied values. As mentioned in previous posts, XSLT is not a robust language as it relates to logical evaluations; after all, XSLT was designed to consume XML documents and output new documents (in this case, RTF-based reports). Because of these limitations, the initial RTF templates used the following logic (white space added for clarity):

BI Publisher post 2 3

Using this method, each parameter is evaluated until a null value is found, and then the remaining parameters are evaluated for a null value. When the XSLT consumes the XML, each required parameter for which the user has not entered a value produces an additional warning line. From a developer’s point of view, every additional required parameter means additional lines of code. While the example above only has four required parameters, reports with many required parameters become quite convoluted and difficult to maintain.

Ranzal again turned to the logic processing capabilities of Oracle SQL. Within the data model, we created a new data set to create a parameter status (named PARAM_STAT) to look at the bind values passed by the BI Publisher parameters. We came up with the following SQL template to generate a more succinct warning message within the column value PARAM_STAT (note that n denotes the number of required report parameters):

BI Publisher post 2 4

There is an argument for creating a SQL statement that concatenates all missing parameter names with a comma and then uses logic to correct the punctuation; however, we felt that from a reusability standpoint, it would be best to compartmentalize the statement using the WITH TABLE1 statement. Using the above SQL template, report developers merely have to update the following lines:

  • 4 – 7:  Data model parameter names (e.g. :PRMBU) and report parameter names (e.g. Business Unit)
  • 10:  Data model parameter names (e.g. :PRMBU)
  • 15 – 20:  Replace the PARAM_COUNT comparison values (n, n – 1, and n – 2)

Using the example above with the required parameters for year, period, business unit, and ledger, the following SQL statement was generated:

BI Publisher post 2 5
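As a language-neutral illustration of what a single status value buys you, here is a minimal Python sketch of the same idea: collect the required parameters with no value and return one warning string, or nothing when all are supplied. It is not the SQL template itself; the function name, the message wording, and the punctuation rules are assumptions made for the example.

```python
def param_stat(params: dict):
    """Return a one-line warning naming required parameters with no value, or None."""
    missing = [name for name, value in params.items() if value in (None, "")]
    if not missing:
        return None  # nothing to warn about, so the report can render normally
    if len(missing) == 1:
        names = missing[0]
    elif len(missing) == 2:
        names = " and ".join(missing)
    else:
        names = ", ".join(missing[:-1]) + ", and " + missing[-1]
    return "Please enter a value for: " + names  # hypothetical wording

# Hypothetical bind values for the four required parameters mentioned above
status = param_stat({"Year": "2013", "Period": "", "Business Unit": None, "Ledger": "ACTUALS"})
print(status)
```

Because the template only needs to test whether this one value is present, adding a required parameter changes the data set, not the template logic.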

Using this parameter status value results in a much more succinct XSLT template that needs only to evaluate whether PARAM_STAT has a value (white space added for clarity):

BI Publisher post 2 6

The client has hundreds of BI Publisher reports and plans to continue to develop additional reports as their Oracle Business Intelligence platform becomes the standard reporting tool. By using the SQL template along with the simplified RTF template, the real work becomes creating the table, pivot table, or chart within the RTF template.  Fortunately, the Ranzal team was able to create an Excel-based VBA macro that automates the generation of the majority of the client’s templates. We will discuss this tool in a later post.

These two examples demonstrate the Ranzal team’s commitment to taking a proactive stance to examining current processes and looking for opportunities for improvement.  As we worked through the technical details of this implementation, we carefully balanced the idea of a user-centered experience against the often competing need for a simplified methodology and process for report developers. To accomplish the latter, we went through several phases of technical refinement, demonstrated the process to developers, and provided thorough documentation. This ensures that when the time comes to turn the maintenance of these reports over to the client, there is a complete knowledge transfer as well.

Dynamic Report Grouping With Oracle BI Publisher

Edgewater Ranzal is working with a client to convert XML Publisher and nVision reports to BI Publisher as part of a larger initiative to consolidate reporting under the Oracle Business Intelligence solution.  The client currently does not use the BI Publisher Layout Editor, but rather relies on RTF templates to display results to the user. One nVision report in particular presented a challenge because of the grouping requirements.

The report requires users to enter an Account range, and the resulting tables need to be grouped by individual accounts. There are also seven optional wildcard prompts, including Product and Department.  If the optional prompts are left blank, the results need only to be grouped by the individual accounts; however, if values are entered for the optional prompts, the subsequent results need to be grouped by all resulting unique combinations for the prompts entered. For instance, if the user enters 10000 to 20000 for the account range and all departments beginning with 10, an example report may be:

BI publisher 1

Sample Data Only

If the user adds the wildcard criteria (%) for the optional prompt Product, the results are further grouped by Product as such:

BI publisher 4

Sample Data Only

One approach to meet the client’s needs for this particular nVision report is to use the grouping feature in the Microsoft Word BI Publisher Table Wizard. To fulfill the requirement of grouping only when users enter values in the optional prompts, the XSLT statements that create groupings can be wrapped in IF statements that evaluate the prompts for null entries. For example, Department and Product can be evaluated as such (note that all prompts begin with PRM):

BI publisher 2

This approach has two drawbacks. First, after adding the additional evaluations for the other five optional prompts, the XSLT within the RTF template becomes quite convoluted. Second, and more limiting, the IF statements that end the “group by” statements (below the table template) result in either a non-functioning report or incorrect groupings.

A second and more viable option is to use the Data Model to leverage Oracle SQL, which has more robust logic evaluation capabilities than XSLT. To meet the client’s needs, an additional column was added to evaluate user prompt values (note that all prompt variables begin with PRM, and CHR(13) returns a carriage return):

BI-publisher-5

This column evaluates each optional prompt for a value and, if the user has made an entry, concatenates the common name (e.g. Department or Product) with the value of the respective row and a carriage return; otherwise, a null value is returned. The Microsoft Word BI Publisher Table Wizard feature is then used to generate the XSLT needed to group the table by the new column GRPOPT in a more succinct fashion:

BI publisher 3

By leveraging the Data Model SQL rather than XSLT within the RTF template, Ranzal recreated this nVision report while maintaining the sort of dynamic capabilities normally seen in Oracle Business Intelligence Answers. The additional SQL statements add no overhead to the BI Publisher report, and the report runs as fast as, or faster than, the corresponding nVision report. The delivery of this report opens up new reporting possibilities for the client, and it reinforces Ranzal’s expertise in the Oracle Business Intelligence tool set.
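For readers who want to see the GRPOPT idea outside of SQL, the following Python sketch mimics the behavior described above: for each optional prompt the user filled in, append the prompt's common name, the row's value, and a carriage return; return nothing when no optional prompt was entered. The function signature, the ": " separator, and the sample row are assumptions made for illustration.

```python
def grpopt(row: dict, prompts: dict):
    """Build a grouping label from the optional prompts the user filled in."""
    parts = []
    for label, prompt_value in prompts.items():
        if prompt_value:
            # Prompt was entered: include this row's value for that field,
            # followed by a carriage return (the equivalent of CHR(13))
            parts.append(f"{label}: {row[label]}\r")
    return "".join(parts) or None

row = {"Account": "10000", "Department": "1010", "Product": "A1"}  # sample data only

dept_only = grpopt(row, {"Department": "10%", "Product": None})
dept_and_product = grpopt(row, {"Department": "10%", "Product": "%"})
no_optional_prompts = grpopt(row, {"Department": None, "Product": None})
print(repr(dept_only), repr(dept_and_product), repr(no_optional_prompts))
```

Grouping by Account plus this single label gives one group per unique combination of the prompts actually used, without any conditional grouping logic in the template.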

Data Discovery In Healthcare

A few days ago, QlikTech and Epic announced a technology partnership that will strengthen the integration between their software products as well as provide a forum for their joint customers to share best practices and innovative ways to use both technologies.

For a firm like Ranzal that is currently implementing several population health discovery applications, my first reaction was simply that this partnership made sense.  Both companies are leaders in their respective domains and are very well-regarded.  Beyond that, discovery technologies like Qlik, Tableau and Endeca are quickly establishing a foothold in the blossoming domain of healthcare analytics.  Unlike traditional BI technologies, data discovery tools are meant to quickly mash up disparate data sources and allow users to ask in-the-moment, unanticipated questions.  This alternative approach to analytics is allowing healthcare providers to build self-service discovery applications for broad audiences at speeds unimaginable in the world of the clinical data warehouse.  Since almost all healthcare analytics applications rely on data from the EMR, this partnership seemed natural, if not overdue.

My second reaction was that there was something missing.  In my experience, to get a holistic view of the health system, all of the relevant data must be tapped.  Data discovery on structured data, while powerful, can only tell part of the story.  With 60% of a health system’s data tied up in unstructured medical notes, reports and journals, Qlik is not fully equipped to allow healthcare practitioners to gain a 360 degree view of their health system.

Endeca shines when structured and unstructured data are both required to paint a complete picture.  In healthcare, properly analyzing clinical data can mean drastically better outcomes at lower costs.  Understanding the “why” behind the “what” means properly tapping the narratives in the medical notes, and tools like Endeca are best suited to unlock value when unstructured data is prominent.

QlikView is a powerful tool and one cannot question its ease of use and numerous discovery features.  However, in industries rife with unstructured data, products like Endeca that treat unstructured data as a first-class citizen (in the way they acquire, enrich, model, search, and visualize it) are better suited to unlock the whole story.

So, I couldn’t help but think that a strong partnership could also be made between other EMR vendors and Oracle Endeca.  We spend a lot of time sizing up the relevant technologies in the data discovery space trying to understand differentiators.  For the types of discovery we’re seeing in healthcare, where unstructured data is necessary to tell the whole story, our money remains on Endeca.

The Only Oracle Endeca Information Discovery Specialized Partner

One of the “big things” that we alluded to in our previous post has finally come through and we could not be more excited.

Ranzal is the first, and the only, company to achieve Specialized Status for Oracle’s Endeca Information Discovery (OEID) platform.

Given our background with the product, we knew about a year ago that this was something we wanted to aggressively pursue.  And now, with a big assist from our customer references and partners, we are proud to say we’ve been approved, just in time for Oracle OpenWorld next week.

For those who may not know much about the Specialization program, the program is designed to spotlight Oracle partners who are the recommended go-to partners for a given product line.  In order to achieve this level, companies need to demonstrate that they have:

  • The ability to deliver successful projects and solutions on the OEID platform
  • The ability to sell and evangelize the platform’s capabilities to the market
  • An established customer base that is willing to provide references

With the rigorous success criteria tied to the above objectives (e.g. selling multiple OEID licenses, multiple customer references), it took us a little longer than we would have wanted, but we couldn’t be happier to have this under our belt.

Again, a big thanks to our customers and our Oracle partner representatives.  Drinks on us next week in San Francisco!

Killing the Tag Cloud – Introducing the Concept Cloud

For the last decade, since its first appearance on Flickr, the tag cloud has been ubiquitous as the default construct for visualizing concepts and topics found within text. There’s a beauty in its simplicity: the larger the word on your screen, the more frequently it appears in the data you are analyzing.

Now, nothing against the tag cloud but we’ve been thinking it’s gone a little stale lately. The visuals can be a bit muddled and, while it’s great for identifying single concepts, it does nothing to inform the user of relationships between the identified terms or any sentiment that may be attached to a given concept. The sum total of the visual is differently sized fonts, and good luck to you if you’re trying to draw any conclusions from your data.

Enter the Concept Cloud

Straight out of the lab at Ranzal’s HQ, we feel like this is a huge stride forward in terms of visualizing unstructured concepts and relationships. You’ll notice in the above screen capture, we’ve highlighted 6 terms from a series of physician’s notes related to a patient’s heart issue.

The first thing you’ll notice above is that you still have the visual cue of “size equals frequency”. The most frequently referenced anatomical terms in the data are represented by the largest circles. We then take this concept and go deeper. Wave the mouse over one of the circles and you’ll see that we surface the exact number of references for the associated term.

node-wave-over

In addition, when you plug this visualization into a Data Discovery environment like Oracle Endeca Information Discovery, you get the ability to drill down and further investigate your data and have the cloud react accordingly.  Below, we have another data set featuring key persons and concepts from everyone’s favorite topic, American Politics.  You have the ability to click a circle and narrow your data to records that contain the term you’ve selected.

sentiment

The Concept Cloud is the Tag Cloud “all grown up”. The traditional visual cues of size and frequency remain and, through shaping and shading, they are enhanced with sentiment analysis and a nicer visual experience. The final advance that the Concept Cloud provides, and the problem that drove us to create this solution, is in the connections between the circles. It’s great that key concepts present in the data are identified. However, what about the relationships?

edge-wave-over

This is where we think this visualization separates itself from the pack. Using some “relational/set magic”, the number of times these terms are found in proximity to each other is calculated and used to inform the user visually. When you wave over a line, linked terms are brought to the fore and unrelated terms are faded.  And, just as larger circles can be highlighted to show the exact frequency of terms, the edges or connections provide the same level of precision and can show the exact number of links in common.
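The counting behind the circles and lines can be sketched in a few lines of Python. This is only an illustration of the general idea, not the implementation behind the Concept Cloud: it assumes that “proximity” means two terms appearing in the same record, that terms are single lower-case words, and that whitespace tokenization is good enough. The function name and sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cloud_counts(records, terms):
    """Count per-term frequency (circle size) and pairwise co-occurrence (line weight)."""
    node_counts = Counter()
    edge_counts = Counter()
    for text in records:
        words = set(text.lower().split())
        # Terms of interest found in this record, sorted so each pair has one canonical order
        present = sorted(t for t in terms if t in words)
        node_counts.update(present)
        edge_counts.update(combinations(present, 2))
    return node_counts, edge_counts

records = [
    "aorta valve stenosis",
    "valve replacement",
    "aorta and valve",
    "ventricle",
]
nodes, edges = cloud_counts(records, ["aorta", "valve", "ventricle"])
print(nodes, edges)
```

In this toy data set, “ventricle” gets a circle but no lines, which corresponds to a term that is displayed but disconnected from the others.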

If you look back at the original graphic, it should be apparent that the terms are actually laid out according to how closely they are linked.  Terms that are found in common with one another are arranged more closely than those with a looser association. Terms that are totally unrelated will be shown on the page, but totally disconnected from their cohort.

Also, because this visualization is based on platform-neutral technology (including D3 and SVG) and not Flash, it looks great on mobile, supports zoom in, zoom out and scales beautifully.

One other thing to note on the last diagram is the color coding of different concepts. When pairing this technology with sentiment analysis engines, such as Lexalytics, we can appropriately shade the representative circles for terms and concepts to indicate whether they are being referred to in a positive or negative fashion.

We’re starting to integrate this capability on our current engagements, so please contact us if you’d like to learn more or even if you’d just like to see what this would look like on top of your own data.  We always welcome feedback and suggestions: comment below, tweet at us or send us a message at info [at] ranzal.com.