Data Governance in the Cloud: An Integrated Strategy, A Unified Solution

Have the organizational decisions you are tasked with making placed you in a major dilemma? As a decision-maker in today’s fast-paced economy, you must wonder how to cut costs, improve the bottom line, and still maintain the data quality necessary to make strategic decisions.

Take heart, because it IS possible to balance on-premise and off-premise Enterprise Performance Management (EPM) software while maintaining the integrity and control of your data, providing the quality and assurance needed for success – AND benefiting financially from new Cloud technologies.

Success comes from understanding what each environment requires and creating an integration strategy, consisting of the necessary business processes and software tools, that delivers consistency and integrity for your strategic EPM data.

Past trends called for a tight on-premise coupling of all EPM software to achieve the best results. This strategy required maintaining a large hardware and software infrastructure, and the related personnel, to keep everything running smoothly. The new Cloud “Pod” subscriptions are geared toward reducing those high infrastructure costs, which is a clear financial benefit. As with all things in life, though, moving to Cloud technology has a consequence: Pod technology creates isolated silos of information. Fortunately, there is a straightforward resolution. The key to overcoming this limitation is to understand what each component offers and demands, and to create an integration strategy that bridges the gap.

If you are interested in learning how to bring the various pieces together as a unified solution, or if your organization plans to migrate to the EPM Cloud platform in the future, this whitepaper defines a process to pre-build the integration strategy, making the move to the Cloud easier and reducing the time to migrate.

Download our whitepaper: Data Relationship Management (DRM) for Cloud-Based Technologies:  Using DRM for Data Governance in the Cloud


Announcing PowerDrill for Oracle EID 3.1

If you had to distill what we at Ranzal’s Big Data Practice do down to its essence, it would be using technology to make accessing and managing your data more intuitive and more useful.  Often this takes the form of data modeling and integration, data visualization, or advice on picking the right technology for the problem at hand.

Sometimes, it’s a lot simpler than that.  Sometimes, it’s just giving users a shortcut or an easy way to do more with the tools they have.  Our latest offering, the PowerDrill for Oracle Endeca Information Discovery 3.1, is the quintessential example of this.

When dealing with large and diverse quantities of data, Oracle Endeca Studio is great for a lot of operations.  It enables open text search, offers data visualization, enriches data, surfaces all in-context attributes for slicing and dicing, and helps you find answers both high-level (say, “Sales by Region”) and low-level (like “my best/worst performing product”).  But what about the middle ground?

For example, on our demo site, we have an application that allows users to explore publicly available data related to Parks and Recreation facilities in Chicago.  I’m able to navigate through the data, filter by the types of facilities available (Pools, Basketball Courts, Mini Golf, etc.), and see locations on a map.  Pretty basic exploration.

The Parks of Chicago

Now, let’s say I’m looking for parks that fit a certain set of criteria.  For example, let’s say I’m looking to organize a 3-on-3 basketball tournament somewhere in the city.  I can use my discovery application to very easily find parks that have at least 2 basketball courts.

Navigate By Courts

This leaves me with 80 potential parks that might be candidates for my tournament.  But let’s say I live in the suburbs and I’m not all that familiar with the different neighborhoods of Chicago.  Wouldn’t it be great to use other data sets to explore the areas surrounding these parks quickly and easily?  Enter the PowerDrill.

What You Can Do…

Last week, we announced general availability of our Advanced Visualization Framework (AVF) for Oracle Endeca Information Discovery.  We’ve received a lot of great feedback, and we’re excited to see what our customers and partners can create and discover in a matter of days. Because the AVF is a framework, we’ve already gotten some questions and wanted to address some uncertainty around “what’s in the box”.  For example: Is it really that easy? What capabilities does it have? What out-of-the-box visualizations do I get with the framework?

Ease of Use

If you haven’t already registered and downloaded the documentation and cookbook, I’d encourage you to do so.  When we demoed the first version of the AVF at the Rittman Mead BI Forum in Atlanta this spring, we wrapped up the presentation with a simple “file diff” of a Ranzal AVF visualization: it compared our AVF JavaScript with the corresponding “gallery entry” from the D3 site that we based it on.  In addition to letting us plug one of our favorite utilities (Beyond Compare 3), it illustrated just how little code you need to change to inject powerful JavaScript into the AVF and into OEID.
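
To give a flavor of what that diff shows: a typical D3 gallery entry selects its own DOM node and hard-codes its data source, while a framework-hosted version receives both from the outside. The sketch below is our hypothetical illustration of that pattern; the render signature is not the AVF’s actual API:

// A typical D3 gallery entry wires itself to the page and its own data:
//   var svg = d3.select("body").append("svg");
//   d3.csv("data.csv", function(data) { /* drawing code */ });
//
// Hosted in a framework, the same drawing code is wrapped in a function
// that receives its container, query results, and configuration instead.
// NOTE: this signature is a hypothetical illustration, not the AVF API.
function render(container, records, config) {
  var svg = d3.select(container).append("svg")
      .attr("width", config.width)
      .attr("height", config.height);
  // ...the original gallery drawing code goes here largely unchanged,
  // with "records" standing in for the hard-coded data set.
}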

Capabilities

Talking about the framework is great, but the clearest way to show the capabilities of the AVF is by example.  So, let’s take a deep dive into two of the visualizations we’ve been working on this week.

First up, and it’s a mouthful, is our “micro-choropleth”. We started with a location-specific choropleth (follow the link for a textbook definition) centered on the City of Chicago.  Using the multitude of publicly available shape files for Chicago, the gist of this visualization is to display public data at a micro level, in this case crime statistics at a “Neighborhood” level. It’s completely interactive: it reacts to guided navigation, gives contextual information when you mouse over, and even gives you the details about individual events (i.e. crimes) when you click in.

Great stuff, but what if I don’t want to know about crime in Chicago?  What if I want to track average length of stay in my hospital by where my patients reside?  Similar data, same concept, so how can I transition easily?  Our micro-choropleth has two key capabilities, both enabled by the framework, to account for this.  Not only does it allow my visualization to contain a number of different shape layers by default (JavaScript objects for USA state-by-state, USA states and counties, etc.), it also gives you the ability to add additional ones via Studio (no XML, no code). Once I’ve added the new JavaScript file containing the data shape, I can simply set some configuration to load this totally different geographic data frame rather than Chicago.  I can then switch my geographic configuration (all enabled in my visualization’s definition) to indicate that I’ll be using zip codes rather than Chicago neighborhoods for my shapes. Note that our health care data and medical notes are real, but we de-identify the data, leaving our “public data” at the zip code level of granularity.  From there, I simply change my query to hit population health data and calculate a different metric (length of stay in days), and I’m done!

That’s a pretty “wholesale” change that just got knocked out in a matter of minutes.  It’s even easier to make small tweaks.  For example, notice there are areas of white in my map that can look a little washed out.  These are areas (such as the U.S. Naval Observatory) that have zip codes but lack any permanent residents.  To increase the sharpness of my map, maybe I want to flip the line colors to black.  I can go into the Preferences area and edit CSS to my heart’s content.  In this case, I’ll flip the border class to “black” right through Studio (again, no cracking open the code)… and see the changes occur right away.

The same form factor is valid for the other visualizations we’ve been working on.  The following visualization leverages a D3 force layout to show a node-link analysis between NFL skill position players (it’s Fantasy Football season!) and the things they share in common (college attended, draft year, draft round, etc.).  Below, I’ve narrowed down my data (approximately 10 years’ worth) by selecting some of the traditional powers in the SEC East and limiting to active players. This is an example of one of our “template visualizations”: it shows you relationships and interesting information, but really is intended to show what you can do with your own data.  I don’t think the visualization below will help you win your fantasy league, though it may help you answer a trivia question or two.
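
For readers curious about the plumbing underneath a visualization like the micro-choropleth, here is a minimal, framework-agnostic sketch using the D3 (v3-era) APIs the AVF builds on. The file name, the zip property, and the metric lookup are illustrative assumptions, not the AVF’s actual configuration:

// A minimal choropleth sketch (D3 v3-era APIs; assumes d3 is loaded).
var width = 600, height = 400;
var svg = d3.select("#viz").append("svg")
    .attr("width", width)
    .attr("height", height);

// Project the GeoJSON shapes (e.g. Chicago-area zip codes) onto the canvas.
var projection = d3.geo.mercator()
    .center([-87.65, 41.85])
    .scale(50000)
    .translate([width / 2, height / 2]);
var path = d3.geo.path().projection(projection);

// Color each region by its metric, e.g. average length of stay in days.
var color = d3.scale.linear()
    .domain([0, 10])
    .range(["#f7fbff", "#08306b"]);

// Hypothetical metric keyed by zip code (would come from a data query).
var lengthOfStayByZip = { "60601": 4.2, "60602": 3.1, "60603": 6.8 };

d3.json("chicago_zips.geo.json", function(error, geo) {
  svg.selectAll("path")
      .data(geo.features)
    .enter().append("path")
      .attr("d", path)
      .attr("class", "border")  // the border color then lives in CSS
      .style("fill", function(d) {
        return color(lengthOfStayByZip[d.properties.zip] || 0);
      });
});

Switching from Chicago neighborhoods to zip codes amounts to pointing at a different GeoJSON file and property, and the border-color tweak described above is then a one-line CSS change on the border class.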

However, the true value of the node-link visualization is in realizing how it can be used in real data scenarios.  For example, picture a network of data related to intelligence gathering.  I can visualize people, say known terrorists, and the organizations they are affiliated with.  From there, I can see others who may be affiliated with those organizations in a variety of ways (family relations, telephone calls, emails).  The visualization is interactive and lends itself to exploration through panning, scanning, and re-centering.  It can show all available detail about a given entity or relationship and provide focused detail when things get to be a bit of a jumble. And again, the key is configuration and flexibility over coding: the icons for each college are present on my web server but are driven entirely by the data, and retrieved and rendered using the framework, while the color and behavior of my circles are configurable via CSS.
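
For those wondering what drives the node-link analysis, the heart of a D3 (v3) force layout is only a few lines. The data shape below, players linked to a shared college, is an illustrative assumption rather than the AVF’s actual data contract:

// A minimal node-link sketch using D3 v3's force layout (assumes d3 is loaded).
var svg = d3.select("#viz").append("svg")
    .attr("width", 600).attr("height", 400);

var nodes = [
  { name: "Player A" }, { name: "Player B" }, { name: "Georgia" }
];
var links = [
  { source: 0, target: 2 },  // Player A attended Georgia
  { source: 1, target: 2 }   // Player B attended Georgia
];

var force = d3.layout.force()
    .nodes(nodes)
    .links(links)
    .size([600, 400])
    .linkDistance(80)   // how far apart linked entities settle
    .charge(-200)       // negative charge pushes nodes apart
    .start();

var link = svg.selectAll(".link").data(links)
  .enter().append("line").attr("class", "link");

var node = svg.selectAll(".node").data(nodes)
  .enter().append("circle")
    .attr("class", "node")  // color and hover behavior stay in CSS
    .attr("r", 8)
    .call(force.drag);      // built-in dragging for re-centering

// On each simulation tick, move the SVG elements to the computed positions.
force.on("tick", function() {
  link.attr("x1", function(d) { return d.source.x; })
      .attr("y1", function(d) { return d.source.y; })
      .attr("x2", function(d) { return d.target.x; })
      .attr("y2", function(d) { return d.target.y; });
  node.attr("cx", function(d) { return d.x; })
      .attr("cy", function(d) { return d.y; });
});

Because position and movement are computed by the simulation while appearance lives in CSS classes, color and behavior remain configurable without touching the layout code.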

What’s In The Box?

So, you’re seeing some of the great stuff we’ve been building inside our AVF.  Some of the visualizations are still in progress and some are “proof of concept”, but a lot of it is already packaged up and included. We ship with visualizations for Box Plots, Donut Charts, an Animated Timeline (aka Health and Wealth of Nations), and our Tree Map.  In addition, we ship with almost a dozen code samples for other use cases that can give you a jump start on what you’re trying to create, including a US Choropleth (States and Counties), a number of hierarchical and parent-child discovery visualizations, and a Sunburst chart.

We’ll also be “refreshing the library” on a monthly basis with new visualizations and updates to existing ones.  These updates might range from simple demonstrations of best practices and design patterns to fully fledged supported visualizations built by the Engineering team here in Chicago.  Our customers and partners who are using the framework can expect an update on that front around the first of the month.

As always, feedback and questions welcome at product [at] ranzal.com.

Leveraging Your Organization’s OBI Investment for Data Discovery

Coupling disparate data sets into meaningful “mashups” is a powerful way to test new hypotheses and ask new questions of your organization’s data.  However, more often than not, the most valuable data in your organization has already been transformed and warehoused by IT in order to support the analytics needed to run the business.  Tools that neglect these IT-managed silos don’t allow your organization to tell the most accurate story possible when pursuing its discovery initiatives.  Data discovery should not focus only on the new varieties of data that exist outside your data warehouse: the value of social media and machine-generated data cannot be fully realized until it is paired with the transactional data your organization already stockpiles.

Judging by the heavy investment in a new “self-service” theme in the recently released version 3.1 of Endeca Information Discovery, this truth has not been lost on Oracle.

Companies that are eager to get into the data discovery game, yet are afraid to walk away from the time and effort they’ve poured into their OBI solution, can breathe a little easier.  Oracle has made the proper strides in the Endeca product to incorporate OBI into the discovery experience.

And unlike other discovery products on the market today, access to these IT-managed repositories (like OBI) is centrally managed.  By controlling access to the data and keeping all data “on the platform”, IT can avoid the common “spreadmart” problem that plagues other discovery products.

Rather than explain how OBI has been introduced into the discovery experience, I figured I would show you.  Check out this short four-minute demonstration, which illustrates how your organization can build its own data “mashups” leveraging the valuable data tied up in OBI.

 

 

Chances are that a handful of these tested hypotheses will unlock new ways to measure your business, and these new data mashups will warrant permanent applications made available to larger audiences within your organization.  That need will require IT to “operationalize” your discovery application: introducing data updates, security, and properly sized hardware to support it.

For these IT-provisioned applications, Oracle has also provided some tooling in Endeca to make the job more straightforward.  Specifically, when it comes to OBI, the product now boasts a wizard that will produce an Integrator project with all of the plumbing necessary to pull data tied up in OBI into a discovery application in minutes.  Check out this video to see how:

 

 

It is product investments like these that will allow organizations to realize the transformative effects data discovery can have on their business without having to ignore the substantial BI investments already in place.

As always, please direct any questions or comments to [at] ranzal.com.

Installment 2: Under the Hood of the EBS Endeca Integration

In my first post in the series, I promised to return with more in-depth goodness about the Endeca extensions for EBS.  The more I thought about new topics to write on, the more I was inclined to show the offering firsthand.

Thus, here we are.  I have opted to show you how the integration works “under the hood” instead of boring you with my words.  This first screencast is still somewhat high-level, but it aims to illustrate how the two applications, Endeca and EBS, work together.

In future screencasts, I plan to dive deeper into the integration specifics around data, UI, and configuration.  I also plan to show how the out-of-the-box configuration can be tweaked to help maximize the value of the offering.

 

 

OEID 3.0 First Look — Text Enrichment & Whitespace

I recently spent some cycles building my first POC for a potential customer with OEID v3.0.  After running some of the unstructured data through the text enrichment component, I noticed something odd:

[Screenshot: a chart displaying a “null” bucket]

The charts I configured to group by those salient terms were displaying a “null” bucket.  This bucket was essentially collecting all records that were not tagged with a term.  After a bit of investigation, it seems this is expected behavior in v3.0: the Endeca Server now treats empty, yet non-null, attributes as valid and houses them on the Endeca record.  Empty, non-null attributes are common after employing some of the OOTB text enrichment capabilities in 3.0 (tagging, extraction, regex); an extraction that finds nothing can emit an empty string rather than a null, for example.  Thus, a best-practice treatment for this side effect is warranted.

The good news is that the workaround was very straightforward.

1) Add a “Reformatter” component to the .grf, before the bulk loader, with the same input and output metadata edge definition.  From the Reformatter’s “Source” tab, select “Java Transform Wizard” and give your new transformation class a name like “removeWhitespaces”.  This will create a .java source file and a compiled .class file in your Integrator project’s ./trans directory (where Integrator expects your Java source code to reside).

[Screenshot: the Reformatter’s Java Transform Wizard]

2) Provide the following Java logic in your new “removeWhitespaces” transformation class:

import org.jetel.component.DataRecordTransform;
import org.jetel.data.DataRecord;
import org.jetel.exception.TransformException;
import org.jetel.metadata.DataFieldType;

public class removeWhitespaces extends DataRecordTransform {

    @Override
    public int transform(DataRecord[] arg0, DataRecord[] arg1) throws TransformException {
        // arg0 holds the input records; arg1 holds the output records.
        for (int i = 0; i < arg0.length; i++) {
            DataRecord rec = arg0[i];
            for (int j = 0; j < rec.getNumFields(); j++) {
                // Only string fields can carry the empty-but-non-null values
                // that the Endeca Server would store as real attributes.
                if (rec.getField(j).getMetadata().getDataType().equals(DataFieldType.STRING)) {
                    if (rec.getField(j).getValue() == null
                            || rec.getField(j).getValue().toString().length() == 0) {
                        // Normalize empty strings to true nulls.
                        rec.getField(j).setValue(null);
                    }
                }
                // Copy every field (cleaned or not) to the output record.
                arg1[i].getField(j).setValue(rec.getField(j).getValue());
            }
        }
        return 0;
    }
}

3) Make sure the name of this new class is specified in the “Transform class” input.  Rerun the .grf that loads your data and… profit!

[Screenshot: the chart after the fix, with no “null” bucket]

We look forward to sharing more emerging OEID v3.0 best practices here… and hearing about your approaches as well.