Big Data Discovery – Custom Java Transformations Part 2

In a previous post, we walked through how to implement a custom Java transformation in Oracle Big Data Discovery (BDD). While that post was more technical in nature, this follow-up post highlights a detailed use case for the transformations and illustrates how they can be used to augment an existing dataset.

Example: 2015 Chicago Mayoral Election Runoff

For this example, we will use data from the 2015 Mayoral Election Runoff in Chicago. Incumbent Rahm Emanuel defeated challenger “Chuy” Garcia with 55.7% of the popular vote. Results data from the election were compiled and matched up with Chicago communities, which were then subdivided by zip code. A small sample of the data can be seen below:

Sample election data


In its original state, the data already offers some insight into the results of the election, but only at a high level.  By utilizing the custom transformations, it is possible to bring in additional data and find answers to more detailed questions.  For example, what impact did the wealth of Chicago’s communities have on their selection of a candidate?

One indicator of wealth within a community is the median sale price of homes in that area. Naturally, the higher the price of homes, the wealthier the community tends to be. Zillow provides an API that allows users to query for a variety of real estate and demographic information, including the median sale price of homes. Through the custom transformations, we can augment the existing election results with the data available through the API.

The structure of the custom transformation is exactly the same as the ‘Hello World’ example from our previous post. The transformation is initiated in the BDD Custom Transform Editor with the command runExternalPlugin('ZillowAPI.groovy',zip). In this case, the custom Groovy script is called ZillowAPI.groovy, and the field passed to the script is the zip code, zip.
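Conceptually, the transformation runs the script once per record, hands it the selected field (here zip) as args[0], and the result ends up in a new attribute. The Java sketch below is a hypothetical, simplified local model of that flow, not BDD's API: the Row record, the enrich method, and the per-zip cache are all illustrative assumptions. The cache reflects one design choice worth considering when the same zip code appears on several rows, since it avoids querying the API repeatedly for the same value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical local model of runExternalPlugin('ZillowAPI.groovy', zip):
// run a "script" once per record with the zip value as its input and keep
// the result in a new column. This is an illustration, not the BDD API.
class ZipEnrichmentSketch {
    // One row of the election data, plus the column added by the transform.
    record Row(String community, String zip, String medianSalePrice) {}

    static List<Row> enrich(List<Row> rows, Function<String, String> script) {
        // Assumption of this sketch: results are cached per distinct zip, so a
        // zip that appears on several rows triggers only one lookup.
        Map<String, String> cache = new HashMap<>();
        List<Row> out = new ArrayList<>();
        for (Row r : rows) {
            String value = cache.computeIfAbsent(r.zip(), script);
            out.add(new Row(r.community(), r.zip(), value));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("Community A", "60614", null),
            new Row("Community B", "60614", null),
            new Row("Community C", "60629", null));
        // Stand-in "script" that just tags its argument; a real one would call the API.
        enrich(rows, zip -> "lookup:" + zip).forEach(System.out::println);
    }
}
```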

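The script then uses the zip code to build the request string and convert it to the URL required to make the API call:

// The zip code is passed in as the first argument by runExternalPlugin
def zip = args[0]
// Replace <ZILLOW_API_KEY> with your own Zillow API key
String url = "http://www.zillow.com/webservice/GetDemographics.htm?zws-id=<ZILLOW_API_KEY>&zip=" + zip
URL api_url = new URL(url)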
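For readers working outside Groovy, here is a minimal Java sketch of the same URL construction, assuming the zip attribute holds five-digit US zip codes. The class and method names are hypothetical; the endpoint and the zws-id and zip parameters are taken from the snippet above, while the zip validation and the URL-encoding of the key are additions. It returns a java.net.URI, which can be converted with toURL() where a URL object is needed.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper mirroring the Groovy snippet: builds the GetDemographics
// request for one zip code. Validation and encoding are additions to the original.
class ZillowUrlSketch {
    static final String BASE = "http://www.zillow.com/webservice/GetDemographics.htm";

    static URI demographicsUri(String apiKey, String zip) {
        String trimmed = zip == null ? "" : zip.trim();
        // Assumption: the field holds 5-digit US zip codes; reject anything else
        // instead of sending a malformed request.
        if (!trimmed.matches("\\d{5}")) {
            throw new IllegalArgumentException("not a 5-digit zip: " + zip);
        }
        String query = "zws-id=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8)
                + "&zip=" + trimmed;
        return URI.create(BASE + "?" + query);
    }

    public static void main(String[] args) {
        // prints http://www.zillow.com/webservice/GetDemographics.htm?zws-id=MY-KEY&zip=60614
        System.out.println(demographicsUri("MY-KEY", "60614"));
    }
}
```

Rejecting bad zip values up front keeps a stray blank or partial zip from turning into a wasted API call.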

Once the transform script completes, the new median_sale_price field is accessible in BDD:

Updated data in BDD
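The Groovy snippet stops once the URL is built; before median_sale_price can appear, the script still has to request that URL and pull the median sale price out of the response. The Java sketch below covers only the extraction step, using the JDK's built-in DOM parser. The XML layout it expects (a metric element with name="median_sale_price") is a simplified, hypothetical stand-in, not the documented Zillow response format, so the element and attribute names would need to be adapted to the real response.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.OptionalDouble;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Extracts a median sale price from an XML response. The expected layout,
// <metric name="median_sale_price">...</metric>, is a HYPOTHETICAL stand-in,
// not Zillow's documented schema; adapt the names to the real response.
class MedianPriceParser {
    static OptionalDouble medianSalePrice(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations to avoid XML external entity (XXE) issues.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList metrics = doc.getElementsByTagName("metric");
            for (int i = 0; i < metrics.getLength(); i++) {
                Element metric = (Element) metrics.item(i);
                if ("median_sale_price".equals(metric.getAttribute("name"))) {
                    return OptionalDouble.of(Double.parseDouble(metric.getTextContent().trim()));
                }
            }
        } catch (ParserConfigurationException | SAXException | IOException | NumberFormatException e) {
            // Malformed or non-numeric responses fall through to "no value".
        }
        return OptionalDouble.empty();
    }

    public static void main(String[] args) {
        String sample = "<response><metric name=\"median_sale_price\">245000</metric></response>";
        System.out.println(medianSalePrice(sample)); // OptionalDouble[245000.0]
    }
}
```

Returning an empty value instead of throwing means a single bad response leaves that record's field blank rather than failing the whole transform.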


Now that the additional data is available, we can use it to build some visualizations to help answer the question posed earlier.

Median Sale Price by Chicago Community - Created using the Ranzal Data Visualization Portlet*


Percentage for Chuy by Community - Created using the Ranzal Data Visualization Portlet*


The two choropleths above show the median sale price by community and the percentage of votes for “Chuy” by community. Communities in the northeastern sections of the city tend to have the highest median sale prices, while communities in the western and southern sections tend to have lower prices. For median sale price to be a strong indicator of how the communities voted, the map of votes for “Chuy” should show a similar split between the northeast and the south and west. However, the pattern is noticeably different, with votes for “Chuy” distributed across all sections of the map.

Bar-Line chart of Median Sale Price and Percent for Chuy


Looking at the median sale price in conjunction with the percentage of votes for “Chuy” provides an even clearer picture. The bars in the chart above represent the median sale price of homes, sorted in descending order from left to right. The line represents the percentage of votes for “Chuy” in each community. If there were a connection between median sale price and the percentage of votes for “Chuy”, we would expect the line to rise or fall fairly steadily as sale price decreases. Instead, the percentage of votes varies widely from community to community and does not follow an obvious pattern in relation to median sale price. This matches the observations from the two choropleths above.
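The visual check above, whether the line trends steadily up or down as sale price falls, is a question about monotonic association. One way to put a number on it (not something the original analysis reports) is Spearman's rank correlation between per-community median sale price and percent for “Chuy”. A minimal Java sketch, assuming the two measures are available as parallel arrays with one entry per community; the class name is hypothetical:

```java
import java.util.Arrays;
import java.util.Comparator;

// Spearman's rank correlation: the Pearson correlation of the ranks,
// with tied values given the average of the ranks they span.
class RankCorrelation {
    static double[] ranks(double[] v) {
        int n = v.length;
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(a -> v[a]));
        double[] r = new double[n];
        int start = 0;
        while (start < n) {
            int end = start;
            while (end + 1 < n && v[idx[end + 1]] == v[idx[start]]) end++;
            double avg = (start + end) / 2.0 + 1; // ranks are 1-based
            for (int k = start; k <= end; k++) r[idx[k]] = avg;
            start = end + 1;
        }
        return r;
    }

    static double pearson(double[] x, double[] y) {
        double mx = Arrays.stream(x).average().orElse(0);
        double my = Arrays.stream(y).average().orElse(0);
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy); // NaN if either input is constant
    }

    static double spearman(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length arrays of size >= 2");
        }
        return pearson(ranks(x), ranks(y));
    }

    public static void main(String[] args) {
        double[] price = {400000, 300000, 250000, 150000}; // toy values, not the election data
        double[] pct = {20, 45, 30, 50};                   // toy values, not the election data
        System.out.println(spearman(price, pct)); // prints -0.8
    }
}
```

A coefficient near +1 or -1 would correspond to the steadily rising or falling line described above, while a value near 0 matches a line with no consistent trend. Ranks are used instead of raw values so that a monotonic but nonlinear relationship still registers.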

While these findings don’t provide a definitive answer to the initial question as to whether community wealth was a factor in the election results, they do suggest that median sale price is not a good indicator of how Chicago communities voted in the election.  More importantly, this example illustrates how easy it is to utilize custom Java transformations in BDD to answer detailed questions and get more out of your original dataset.

If you would like to learn more about Oracle Big Data Discovery and how it can help your organization, please contact us at info [at] ranzal.com or share your questions and comments with us below.


* – The Ranzal Data Visualization Portlet is a custom portlet developed by Ranzal and is not available out of the box in BDD. If you would like more information on the portlet and its capabilities, please contact us, and stay tuned for a future blog post that will cover the portlet in more detail.
