Don’t Fear the Statistics – Using OBI for Statistical Analysis Part 2

Nearly every client Edgewater Ranzal partners with uses statistical averages in their analytic and reporting solutions. As far as statistical functions go, the average is probably the easiest to understand; however, its limitation is that it can be difficult to determine how to rate the individual performance of the contributors to that average.  Consider the following examples:

  • If the average cost of a gallon of milk is $3.20 and the corner convenience store is selling it for $3.45, is that a significant deviation from the average?
  • If the average NFL player’s base salary is $1.86 million and the Tennessee Titans’ Marcus Mariota made $5.5 million, is that an exceptional payout? Is the salary significant when his role as the team’s starting quarterback is considered?
  • Suppose the average gross margin percent for a company’s business units is 58% and one particular business unit’s actual gross margin is 46%. Is that business unit truly underperforming?

It turns out that judging individual performance against an average alone is very subjective. In this post, we explore how the standard deviation around the average can be used to mitigate that subjectivity and how it can be incorporated into data visualizations to identify true outliers.

The NASDAQ-100 comprises the largest domestic and international non-financial companies (based on market capitalization) listed on the Nasdaq Stock Market. It includes technology giants such as Apple and Alphabet (parent company of Google) along with consumer services companies such as Bed Bath & Beyond.  Quarterly gross margin percent from 2007 through Q3 2016 was downloaded and loaded into a data mart leveraged by Oracle Business Intelligence Enterprise Edition (OBIEE) 12c (Q4 2016 data was not available for all companies).  With the exception of Figure 1, the following visualizations were created in OBIEE 12c.

The standard deviation can be thought of as defining ranges that can be used to classify the individual contributors to the average. For instance, the average gross margin percent for the NASDAQ-100 in Q4 2014 was calculated to be 59.9% with a standard deviation of 22.7%.  This can be visualized on a number line as follows:

Figure 1 NASDAQ-100 Q4 2014 Gross Margin % Performance Ranges
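
The ranges above boil down to a simple mean and standard deviation over the companies in the quarter. For readers who want to check the arithmetic outside of OBIEE, here is a minimal sketch in Python with pandas; the file name and column name are illustrative assumptions, not the actual data mart schema.

    import pandas as pd

    # One row per NASDAQ-100 company for a single quarter (e.g. Q4 2014);
    # the file and column names here are hypothetical.
    df = pd.read_csv("nasdaq100_gross_margin_q4_2014.csv")

    avg = df["gross_margin_pct"].mean()   # ~59.9 in the example above
    std = df["gross_margin_pct"].std()    # ~22.7 in the example above

    # Boundaries of the ranges drawn on the number line in Figure 1,
    # from two standard deviations below the average to two above.
    boundaries = [round(avg + k * std, 1) for k in (-2, -1, 0, 1, 2)]
    print(avg, std, boundaries)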

Many real-world events that have variability follow a predictable distribution pattern. For instance, it is expected that approximately 34.1% of the contributors will fall between the average and one standard deviation below it.  From the figure above, it is estimated that approximately 34 of the NASDAQ-100 companies will have a gross margin percent between 37.2% and 59.9%.  The actual distribution can be visualized as follows:

Figure 2 Distribution of NASDAQ-100 Gross Margin %

The NASDAQ-100 companies do not perfectly follow the distribution; there is a fatter spread into the Negative and Positive buckets (Two Standard Deviations down and up). Other, more advanced statistical methods can be used to redefine ranges, but are beyond the scope of this post.
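
One way to reproduce this bucketing outside of OBIEE is to convert each company’s gross margin % into a standard score and bin it. The sketch below continues the illustrative pandas example from above; the range labels are assumed to match the naming used in the visualizations and are not pulled from the actual solution.

    import numpy as np
    import pandas as pd

    # Same illustrative extract as the earlier sketch.
    df = pd.read_csv("nasdaq100_gross_margin_q4_2014.csv")
    avg = df["gross_margin_pct"].mean()
    std = df["gross_margin_pct"].std()

    # Standard score: how many standard deviations each company sits
    # from the quarterly average.
    z = (df["gross_margin_pct"] - avg) / std

    # Assumed mapping of standard deviation ranges to the labels used
    # in the figures ("Extremely Negative" below -2, and so on).
    bins = [-np.inf, -2, -1, 0, 1, 2, np.inf]
    labels = ["Extremely Negative", "Negative", "Moderately Negative",
              "Moderately Positive", "Positive", "Extremely Positive"]
    df["ranking"] = pd.cut(z, bins=bins, labels=labels)

    # Compare the actual spread of companies against the ~34.1% / ~13.6%
    # expectations of a normal distribution.
    print(df["ranking"].value_counts())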

Of course, this visualization simply confirms statistical theories that were proven over a hundred years ago. The true value of analytics is to take statistical theories and turn them into informative visuals.  One method of visualizing the ranking of companies using the standard distribution in OBIEE 12c is through a Treemap:

Figure 3 NASDAQ-100 Distribution Treemap Visualization

The size of the box represents the Gross Margin % while the color aligns with the distribution ranking from Figures 1 and 2. This visualization allows the viewer to understand both the rankings and relative performance at a glance.  It is easy to discern the delineation between above and below average (border between yellow and light green) as well as which companies are herding together.

One of the most powerful and essential aspects of business analytics is the ability to dimensionalize data so it can be sliced and diced. One (of many) reasons this is done is to ensure an “apples to apples” comparison.  For instance, comparing the gross margin percent of Qualcomm (QCOM), a semiconductor and telecommunications company, with that of Ross Stores (ROST), a discount department store, can create a misleading distribution.  Filtering the visualization in Figure 3 by the NASDAQ industry classifications for Technology companies results in the following Treemap:

Figure 4 NASDAQ-100 Technologies Companies Treemap

Notice that Qualcomm has slipped from “Moderately Positive” to “Moderately Negative.” Averages and standard deviations can change dramatically when looking at the components of the whole.  To demonstrate this, consider the following visualization comparing the average and deviation spread of the three largest categories (by number of companies) of the NASDAQ-100:

Figure 5 Average and Standard Deviation by Categories

The border between yellow and light green represents the average, while each band represents one standard deviation. Notice that both the average gross margin % and the standard deviation are higher for Healthcare than for Technology.  Healthcare companies will skew the performance perspective of Technology companies, and this skew worsens when comparing against companies classified as Consumer Services.
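
The per-category averages and standard deviations behind Figure 5 amount to a grouped aggregation. A minimal sketch, assuming the illustrative extract also carries the NASDAQ industry classification in an industry column (again, an assumed name rather than the actual schema):

    import pandas as pd

    df = pd.read_csv("nasdaq100_gross_margin_q4_2014.csv")   # hypothetical extract

    # Average, standard deviation, and company count per industry,
    # mirroring the comparison of Technology, Healthcare, and
    # Consumer Services categories in Figure 5.
    by_industry = (df.groupby("industry")["gross_margin_pct"]
                     .agg(["mean", "std", "count"]))
    print(by_industry)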

As a general rule, a single point is not the best indicator of long-term performance. Although the average and standard deviation for a single quarter were calculated across one hundred companies, they should still be considered a single data point.  Consider the following visualizations, which show a comparative trend for four different companies across the entire date range downloaded:

Figure 6 Gross Margin % Trend for Adobe, Amazon, Electronic Arts, and Priceline

At a glance, viewers can see that Adobe (upper left) consistently beats the average performance, while consumer goods and technology giant Amazon (upper right) had performed below average until recently. Electronic Arts (lower left), a video game developer, seems to have erratic gross margin % returns; however, looking past the noise, the company is nearly always between moderately positive and moderately negative when compared against other NASDAQ-100 companies.  Finally, Priceline (lower right) has been increasing gross margin % consistently and steadily pulling ahead of other NASDAQ-100 companies.  If Priceline’s gross margin % trend continues and the performance of the other companies remains constant, Priceline will move into the “Extremely Positive” gross margin % ranking in Q4 2016 or Q1 2017.

Returning to the questions posed at the beginning of this post:

  • The average cost of a gallon of milk is $3.20 with a standard deviation of $0.08. The corner convenience store selling milk for $3.45 is more than three standard deviations above the average!
  • The average NFL base salary is $1.86 million with a standard deviation of $2.80 million. Comparatively, Marcus Mariota’s $5.50 million salary is more than one standard deviation above the average. However, with the average quarterback base salary being $5.69 million with a standard deviation of $7.17 million, he is actually slightly undercompensated relative to his peers (see the quick z-score sketch below).
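
A z-score is simply the number of standard deviations a value sits from its average; the sketch below reproduces both comparisons using the figures quoted above:

    def z_score(value, mean, std_dev):
        """Number of standard deviations a value sits from the mean."""
        return (value - mean) / std_dev

    print(z_score(3.45, 3.20, 0.08))   # milk: ~3.1 standard deviations above average
    print(z_score(5.50, 1.86, 2.80))   # Mariota vs. all NFL players: ~1.3 above
    print(z_score(5.50, 5.69, 7.17))   # Mariota vs. quarterbacks: ~-0.03, just below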

For the final question, we ask readers to evaluate their own enterprise:

  • Calculate the average gross margin percent for your company’s business units for the quarter and find a business unit that is approximately 10 percentage points below that average. Is it truly underperforming? Are you able to properly classify these business units to gain the greatest insight into relative performance?

Average and standard deviation can be applied to any metric by which a company wishes to evaluate itself. They can also be used in combination with external data to create industry benchmarks.  For instance, if you were to plot your company’s gross margin % performance against the trends above, how would it look?

We want to close this post with the same idea that we closed Part 1 of the “Don’t Fear the Statistics” post: statistical analytics is part science/technology and part art.  Reducing statistical calculations to consumable visualizations is the key.  In the visualizations above, references to “standard deviation” were diligently omitted in favor of familiar terms such as “Moderately Negative.”  Approaches such as this help with change management, adoption, and the acceleration from simple reporting to true analytical insight into business process improvement based on data.

Don’t Fear the Statistics – Using OBI for Statistical Analysis Part 1

Recently, Ranzal has been working with a client in the healthcare space implementing Oracle Business Intelligence (OBI), and a requirement surfaced to translate a scorecard report into an OBI dashboard. One of the data elements was simply captioned “Trend” and colored red, yellow, or green.  It was discovered that this Trend was the slope of a linear regression plot (more on what that means in a moment) and that the color was based on an arbitrarily chosen number.  This immediately raised concerns for the Ranzal team, which then made some suggestions for more pertinent statistical analysis.

To set the stage, this healthcare client’s summarized (and greatly simplified) income statement divides Revenue into Inpatient and Outpatient and Expenses into Total Labor and Non Labor. Revenue and expenses are the primary focus of much of the analytics at an aggregate level.  A single (seemingly arbitrarily chosen) number was used to determine the colored flags for each of these measures, despite Inpatient Revenue and Non Labor Expenses comprising the majority of the revenue and expense amounts, respectively.  If we plot these categories for the first five months of a fiscal year, we see the following (all data have been altered to preserve client confidentiality without overly affecting the overall analytic output):

Figure 1 Revenue and Expense Trend Plot

The trouble with plotting a trend of numbers is that it is sometimes difficult to understand, at a glance, how the organization is performing. In the plots above, clear downward and upward trends can be seen for Inpatient Revenue and Total Labor Expense (respectively).  However, upon closer examination of Outpatient Revenue and Non Labor Expense, there are two upward trending months and two downward trending months.  The overall trend is difficult to discern.

Oracle Business Intelligence Enterprise Edition (OBIEE) 12c introduced a Trendline function that allows the creation of a linear regression trendline. Once this is applied, the trend plots above can be augmented to get a clearer picture of performance:

Figure 2 Revenue and Expense Linear Regression

This trendline uses a simple linear regression formula that is composed of the slope (commonly represented by the letter m) and the intercept (commonly represented by the letter b) in the following formula:

y = mx + b

In our trend plots, the letter y represents the revenue and expense categories and x represents the fiscal periods.

The intercept is where the trendline crosses the y-axis, that is, the value of y when x is equal to zero. For most statistical analyses of this kind, the intercept is unimportant.  The slope can be thought of as the average change in y for each unit change in x; in this case, the average change per fiscal period.  Using OBI, the slope of each revenue and expense category can be calculated and the dashboard updated:

Figure 3 Linear Regression Slope

In the example above, the slope indicates that Inpatient Revenue is decreasing by an average of $291,000 a month.
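
Outside of OBI, the same slope and intercept can be recovered with an ordinary least-squares fit. The sketch below uses Python’s scipy with illustrative monthly values (in $ millions) rather than the client’s actual figures:

    from scipy import stats

    months = [1, 2, 3, 4, 5]                             # fiscal periods (x)
    inpatient_revenue = [30.1, 29.9, 29.5, 29.3, 28.9]   # illustrative values (y), $ millions

    fit = stats.linregress(months, inpatient_revenue)
    print(fit.slope)       # m: ~-0.30, i.e. an average decline of roughly $300,000 per month
    print(fit.intercept)   # b: where the trendline would cross the y-axis at x = 0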

One issue with using the slope is that it is subjective. As was mentioned, our healthcare client had chosen a single arbitrary slope for each of the revenue and expense categories.  The slopes in the example above range from $29 thousand to -$291 thousand.  Complicating matters, the client wanted the ability to run these analyses for individual hospitals, which can dramatically affect the slope.  For instance, a hospital operating in Kansas City will probably not have the same revenue growth (or shrinkage) as a hospital operating in New York City.  To use the slope properly as a quantifiable objective, a target slope would have to be determined for the enterprise and at each granular level expected to be benchmarked (hospital, department, etc.).  This creates some obvious maintenance issues.

A more objective approach is to use the correlation coefficient, a number that ranges from negative one to positive one. A coefficient of one indicates a perfect positive correlation while a coefficient of negative one indicates a perfect negative correlation.  For instance, for most companies, the number of units sold has a high degree of positive correlation to revenue; this corresponds to a correlation coefficient close to one.  For many companies working in the commodities market, the more a competitor’s revenue increases, the lower the possible market share; this is a negative correlation and would result in a correlation coefficient closer to negative one.  A correlation coefficient of zero indicates a lack of any correlation.  For instance, the number of broken arms set in a New York hospital is probably uncorrelated to the number of bowls of soup served by Panera Bread in Kansas City.
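
For reference, the coefficient described here is the Pearson correlation, which is available in most analytics toolkits. A minimal sketch with made-up numbers illustrating a strong positive relationship between units sold and revenue:

    from scipy.stats import pearsonr

    units_sold = [120, 135, 150, 160, 180]
    revenue = [1.20, 1.34, 1.49, 1.61, 1.79]   # illustrative, $ millions

    r, p_value = pearsonr(units_sold, revenue)
    print(r)   # close to +1: revenue rises almost in lockstep with units sold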

It is worth noting that correlation does not imply causation. For example, consider the number of pirate attacks and the number of Microsoft Internet Explorer (IE) users:

Figure 4 IE Usage and Pirate Attacks

The number of pirate attacks and IE users have both been in decline since 2009. As can be seen by the scatter graph on the right, the more pirate attacks, the greater the use of IE.  Regardless, naval security experts are probably not asking for adoption rate reports from Microsoft.

Returning to the client’s use case, adding the correlation coefficient to the dashboard provides a greater understanding of how the company is objectively performing:

Figure 5 Month and Revenue / Expense Category Correlation

Inpatient Revenue has a correlation of -0.69, a moderately strong negative relationship for a metric most businesses want to increase. Meanwhile, Outpatient Revenue has a milder negative correlation of -0.36.  While this should still be a cause for concern, a “wait and see” approach (or a deeper dive into Outpatient Revenue categories) might be more prudent.  Because the range of the correlation coefficient is always negative one to one, filtering this analysis down to a more granular level, such as a hospital or department, will return an objective number that can be interpreted independently of scale.

There are cases in which the subjectivity of the slope is particularly useful. In the case of our client, a full year budget was prepared at the beginning of the fiscal year and periodically updated as the year progressed. The slope of this budget could be used to generate the average dollar change desired per month.  The advantage of this is that it reduces the possible volatility of a particular month into a single number that can be compared to the benchmark.  As a final addition to the dashboard, a full year budget slope was added:

Figure 6 Full Year Budget Slope

With the exception of Non Labor Expenses, this organization is missing the mark on all of their budgetary goals, and the trend indicated by the actual slope and correlation coefficient means this situation is likely to get worse.

A word of warning about statistics in general and the use of the slope and correlation coefficient in particular: both micro and macro trends should be considered, and extreme outliers can mask actual trends.

For an example of micro and macro trends, consider JCPenney, a retailer that has been struggling since 2010. The following visualization (created using Oracle Data Visualization Desktop) charts the quarterly revenue from 2004 Q3 to 2016 Q4 along with the trendline for the entire period.  The bars represent the correlation coefficient from the start of the series through that particular quarter (i.e. the first bar is the correlation between 2004 Q3 and 2004 Q4, the second bar is the correlation across 2004 Q3, 2004 Q4, and 2005 Q1, etc.):

Figure 7 JCPenney Revenue Trend and Correlation

Notice that the first correlation bar is equal to one. When there are only two data points, the correlation coefficient can only be one or negative one (or undefined if the two values are identical).  The next data point and correlation for 2005 Q1 (JCPenney recognizes holiday revenue in Q1 of each year) continue the high correlation streak; however, the following quarter drops the correlation down to 0.35.  The correlation fluctuates quarterly until about 2012 Q2, when the definite downward trend is established.
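
The bars in Figure 7 can be thought of as an expanding, to-date correlation between the quarter sequence and revenue. A small sketch with illustrative quarterly figures (not JCPenney’s actual revenue) shows the mechanics:

    import pandas as pd

    revenue = pd.Series([3.4, 9.1, 3.9, 3.8, 3.5, 9.4, 4.0, 3.7])   # illustrative quarterly revenue
    quarter = pd.Series(range(1, len(revenue) + 1))                 # 1, 2, 3, ...

    # Correlation of everything "so far" at each quarter; with only two
    # points the result can only be +1 or -1 (or undefined if the two
    # values happen to be identical), which is why the first bar is one.
    to_date_corr = revenue.expanding(min_periods=2).corr(quarter)
    print(to_date_corr)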

A savvy analyst will break JCPenney’s performance during this time range into three distinct trends: upward from 2004 to 2008 Q1, a diminished upward trend from 2008 Q2 to 2012 Q1, and then flat but greatly reduced revenue from there:

Figure 8 JCPenney Distinct Trends
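
Fitting a separate trendline to each period is straightforward once the break points are chosen. A sketch using numpy, with an illustrative revenue series and assumed break points rather than JCPenney’s actual figures:

    import numpy as np

    # Illustrative quarterly revenue with three regimes: growth, slower
    # growth, then a flat but much lower level.
    revenue = np.array([4.0, 4.3, 4.6, 5.0, 5.1, 5.2, 5.3, 5.4, 3.1, 3.0, 3.1, 3.0])
    quarters = np.arange(1, len(revenue) + 1)

    # Assumed break points splitting the series into three segments.
    for start, end in [(0, 4), (4, 8), (8, 12)]:
        slope, intercept = np.polyfit(quarters[start:end], revenue[start:end], 1)
        print(f"quarters {start + 1}-{end}: slope {slope:.2f}")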

As an example of how an extreme outlier can affect statistical analysis, consider GTx Incorporated, a pharmaceutical drug developer. In December 2010, GTx recognized $49.9 million in revenue from a partnership with Merck & Co., Inc., which spiked GTx’s revenue (previously averaging $2 million a quarter) to $56.7 million:

Figure 9 GTx Incorporated Revenue Trend

In the visualization above, the orange projected trendline was calculated using revenue from 2004 Q1 through 2009 Q4. The purple trendline is the projection calculated using data through 2010 Q1, which includes the huge revenue spike.  Obviously, the orange trendline is the more accurate of the two due to the exclusion of the extreme data point.
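
The effect of a single spike on a fitted trendline can be demonstrated in a few lines of numpy; the revenue values below are illustrative rather than GTx’s actual quarterly figures:

    import numpy as np

    quarters = np.arange(1, 9)
    revenue = np.array([2.1, 1.9, 2.2, 2.0, 1.8, 2.1, 2.0, 56.7])   # spike in the final quarter

    # Fit a line with and without the extreme final quarter.
    slope_excl, _ = np.polyfit(quarters[:-1], revenue[:-1], 1)
    slope_incl, _ = np.polyfit(quarters, revenue, 1)

    print(slope_excl)   # near zero: the underlying trend is essentially flat
    print(slope_incl)   # strongly positive: the single spike dominates the fit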

Statistical analytics is part science/technology and part art. As with any data and visualizations, a certain degree of intelligent interpretation is needed to determine what it all really means.  Functional users should be focused on what the various statistical interpretations mean and not be distracted by the complexity of the underlying mathematical functions.  Trend visualizations can aid users in understanding how to interpret these statistical calculations.  Many organizations miss opportunities because individuals are unwilling to embrace statistical methods due to a lack of solid education and guidance about what these numbers really mean.  Training, change management, and the creation of rich visualizations can help enterprises harness the capabilities of statistical analysis and extend the role of their business intelligence systems.

Oracle Business Intelligence EPM and Relational Federation – A Strategic Approach

The federation of EPM and relational data sources through Oracle Business Intelligence (OBI) seems straightforward: import the cube, federate and rename, expose it all, and create dashboards and analysis. Due to the technical simplicity of EPM and relational federation, many organizations underestimate the amount of effort needed to implement an OBI solution that properly leverages and extends the capabilities of the EPM and relational data sources.  The OBI implementation process should not be an afterthought, especially if OBI is to be the primary method by which users consume organizational data.  We have assembled ten “Dos and Don’ts” that cover the full lifecycle implementation to help organizations get the most out of their OBI solution.

Do – Design and develop the data sources with input from the OBI implementation team

Especially in implementations where OBI is to be the primary method of consuming data, the OBI implementation team should be heavily involved in Dashboard and Analysis requirements and design. As such, this team will have the knowledge of what data structure is needed to support an efficient and easy-to-use analytic solution.  Asking the OBI implementation team to come in after the data model has been set and create Dashboards and Analyses will often result in workarounds that are error prone, difficult to maintain, and challenging or impossible to scale.

Don’t – View OBI as a one-size-fits-all analytics and reporting tool for the organization

OBI is a powerful and versatile tool capable of addressing a slew of needs; however, it is not a magic bullet. Depending on the application and needs of the organization, Smart View, Financial Reporting, and even BI Publisher have their places in the organization.  Attempting to replicate the capabilities of other analytic and reporting tools through OBI may provide the illusion of capability, but will fall short of user expectations and possibly harm adoption by the rest of the organization.

Do – Have a metadata management process in place before federating data sources

We discussed the rationale for this best practice thoroughly in the post Oracle Business Intelligence – Synchronizing Hierarchical Structures to Enable Federation. To summarize, unsynchronized hierarchical structures between data sources can result in analyses with irreconcilable outcomes, hierarchies that seemingly reorganize while drilling down or up, erroneously shared members, or outright errors in OBI.  A centralized process for managing this metadata, as well as ensuring that all relevant data sources are updated simultaneously, is imperative when federating data sources.

Don’t – Treat OBI as a metadata or master data management tool

This is typically a symptom of not having the OBI implementation team involved during the design of the data models. As a result of this misalignment, clients attempt to shoehorn analyses into the data model by excessively manipulating it with the BI Administration tool (RPD).  Properly leveraged, the BI Administration tool can create an agile analytics solution; however, relying on this tool to fill large gaps between the data model and analytics will result in performance and maintenance issues.

Do – Define a use case, user community, and requirements for all implementations

From proof of concept to full implementations, having the right people involved is imperative. Within your organization:  Who understands the reporting and analytic needs and gaps?  Who understands where the data is coming from?  Who understands what capabilities are needed?  Who is positioned to help user adoption?  Who is asking questions that the organization is struggling to answer?  Any technology implementation that is done with the intent to “throw it against a wall and hope something sticks” is destined to fail; OBI is no different.

Don’t – Expect that users will flock to OBI if EPM is the only data available

We find that when there are both EPM and relational data sources, EPM is often the first to be implemented and exposed through OBI. During these implementations, users are extensively exposed to Smart View, and finance users in particular become enamored with that tool and struggle to immediately see value in OBI.  A Pavlovian response is to simply federate the EPM cube’s relational data source, which typically provides a lower level of detail (or granularity).  While this is sometimes useful, it still does not provide insight that users cannot readily get elsewhere.  Federating additional data sources with EPM cubes should provide additional attributes or measures or provide a simple path to jump from one organizational view of the data to another.  For instance, a financial consolidation EPM cube federated with an operational relational data source provides an easy-to-use analytical solution for managers with responsibilities that straddle both worlds.  These users will quickly adopt OBI and help with future user adoption.

Do – Empower the users

Guided analysis through Dashboards, Analyses, Alerts, and Scorecards is a powerful tool; however, an organization will never address every scenario through this method. Guided analysis should be an introduction to OBI for users, one that quickly develops into self-service.  Within a few months of rolling out the OBI solution, power users should be assembling ad hoc analyses and putting together their own dashboards.  Within a year, most users should be answering basic questions on their own.  Organizations that empower users are not only improving the ROI on OBI, but they are also more agile in addressing changing business landscapes, accelerating user adoption, and reducing the load on (often) overburdened IT organizations.

Don’t – Neglect the performance of any data sources

The demand for data is the epitome of just-in-time logistics. Especially when users are empowered, many organizations find that their data sources and caching strategies are not sufficient for how users actually leverage the data.  EPM and relational data sources both have performance monitoring capabilities that should be evaluated frequently during the months after initial rollout and periodically thereafter, with any deficiencies addressed.  Failing to address performance issues will result in users abandoning or circumventing the analytic tool, resulting in lost productivity and data quality issues.

Do – Pivot to using OBI as an analytics tool instead of simply another reporting tool

Tabular reporting is typically (and should be) the first use for OBI that clients turn to, but this should be viewed as an insertion point and not the final rally point. With capabilities such as graphs, heat matrices, treemaps, gauges, alerts, and trellis, pivoting from reporting to analytics should be the goal.  Answering business-critical questions, quickly understanding the business landscape, and gaining insight is where the true value of OBI lies.  Simply leveraging OBI as another reporting solution is severely handicapping the tool’s return on investment.

Don’t – Let OBI data sources become static

Analytics is one of the few tools that simultaneously changes a business in a deliberate and serendipitous manner. A well-led and strategically executed analytics program can have a lasting contribution to an organization’s goals.  At the same time, users will develop new skills and capabilities as they become familiar with both the tool and the data and begin to ask new questions.  As both the competitive landscape changes and organizational capabilities expand, data models should be evolving to address these new needs.  OBI has the ability to easily expose, slice and dice, and visualize data to answer these questions; the challenge is to not become complacent in providing new data resources to users.

If OBI is to play a role in your organization’s analytic strategy, it should not be an afterthought. Involving implementation team members with the knowledge of OBI’s capabilities from the start can help ease implementation during the later phases, accelerate user adoption, and increase the long term ROI.  Edgewater Ranzal has both the technical and functional implementation experience with OBI to help you evaluate, adjust, and execute your analytic strategy according to these ten “Dos and Don’ts.”

Leveraging Your Organization’s OBI Investment for Data Discovery

Coupling disparate data sets into meaningful “mashups” is a powerful way to test new hypotheses and ask new questions of your organization’s data.  However, more often than not, the most valuable data in your organization has already been transformed and warehoused by IT in order to support the analytics needed to run the business.  Tools that neglect these IT-managed silos don’t allow your organization to tell the most accurate story possible when pursuing its discovery initiatives.  Data discovery should not focus only on the new varieties of data that exist outside your data warehouse.  The value of social media data and machine-generated data cannot be fully realized until it can be paired with the transactional data your organization already stockpiles.

Judging by the heavy investment in a new “self-service” theme in the recently released version 3.1 of Endeca Information Discovery, this truth has not been lost on Oracle.

Companies that are eager to get into the data discovery game, yet are afraid to walk away from the time and effort they’ve poured into their OBI solution, can breathe a little easier.  Oracle has made the proper strides in the Endeca product to incorporate OBI into the discovery experience.

And unlike other discovery products on the market today, the access to these IT-managed repositories (like OBI) is centrally managed.  By controlling access to the data and keeping all data “on the platform”, this centralized management allows IT to avoid the common “spreadmart” problem that plagues other discovery products.

Rather than explain how OBI has been introduced into the discovery experience, I figured I would show you.  Check out this short four-minute demonstration, which illustrates how your organization can build its own data “mashups” leveraging the valuable data tied up in OBI.

 

 

Chances are that a handful of these tested hypotheses will unlock new ways to measure your business.  These new data mashups will warrant permanent applications that are made available to larger audiences within your organization.  The need for more permanent applications will require IT to “operationalize” your discovery application — introducing data updates, security, and properly sized hardware to support the application.

For these IT-provisioned applications, Oracle has also provided some tooling in Endeca to make the job more straightforward.  Specifically, when it comes to OBI, the product now boasts a wizard that will produce an Integrator project with all of the plumbing necessary to pull data tied up in OBI into a discovery application in minutes.  Check out this video to see how:

 

 

It is product investments like these that will allow organizations to realize the transformative effects data discovery can have on their business without having to ignore the substantial BI investments already in place.

As always, please direct any questions or comments to [at] ranzal.com.