Case Studies - Big Data Analytics
A large media company needed to track online video streams, web page views, and user clicks in order to measure content usage, advertisement plays, and user engagement. The Company's existing data management hardware and software infrastructure was ineffective and could not easily scale to meet this need. In addition, the Company had recently merged with another media company, creating a need to consolidate and re-architect its data management platform. The new platform had to support Unicode for international languages, and the new architecture had to be scalable, reliable, and economical.
Several proprietary and non-proprietary frameworks were considered for the new "big data" platform. To handle the expected data volumes and processing-time constraints, the Knowledge Solutions (KS) Architect recommended a parallel-processing, clustered approach built on Hadoop technology. KS delivered a design that integrated over 100 daily data sources, and inexpensive commercial off-the-shelf servers were installed and configured to support the new Hadoop ETL processes. A successful proof of concept was first deployed to track the Company's web activity for mainland China; subsequent releases extended coverage to the Company's USA and global properties. Hadoop was initially adopted to replace aging infrastructure, but the business changes that occurred during re-architecture made it essential to successfully processing the merged Company's data.
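The parallel-processing approach KS recommended follows Hadoop's map/reduce pattern: map tasks emit key/value pairs from raw records in parallel, and reduce tasks aggregate them after the shuffle/sort. The sketch below is a minimal local simulation of one such ETL step (counting page views per URL); the field layout and log format are hypothetical, since the case study does not describe the actual schemas:

```python
# Hypothetical sketch of a Hadoop-style map/reduce ETL step:
# counting page views per URL from raw click logs.
from itertools import groupby
from operator import itemgetter

def map_clicks(log_lines):
    """Map phase: emit (url, 1) for each click record.
    Assumes tab-separated lines: timestamp, user_id, url."""
    for line in log_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 3:          # skip malformed records
            _, _, url = fields
            yield (url, 1)

def reduce_counts(mapped):
    """Reduce phase: sum counts per url. Input is sorted by key,
    mirroring the guarantee Hadoop's shuffle/sort provides."""
    for url, group in groupby(sorted(mapped), key=itemgetter(0)):
        yield (url, sum(count for _, count in group))

# Local, single-process simulation of the distributed job:
logs = [
    "2013-01-01T00:00:00\tu1\t/video/123",
    "2013-01-01T00:00:01\tu2\t/video/123",
    "2013-01-01T00:00:02\tu1\t/home",
]
views = dict(reduce_counts(map_clicks(logs)))
```

In a real deployment the map and reduce functions would run as separate tasks across the cluster (for example via Hadoop Streaming), which is what allows batch windows to shrink even as data volumes grow.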
The "Big Data" platform has allowed the Company to realize the following benefits:
- Mitigated CAPEX (up to a 10x reduction compared to proprietary data platform solutions)
- Provided a vendor-agnostic data platform, reducing by months the effort required to switch the proprietary DW database
- Reduced nightly batch windows by 8+ hours while absorbing a 3x increase in data volumes
- Improved HA/fault-tolerance of the system (recorded significantly greater resilience to hardware and network failures)
- Increased historical data tail (substantially longer online data retention) and depth (full fact-level detail for years) for analytics
- Opened a path to new analytical methods and capabilities
- Increased data quality through code refactoring (proper data cleansing is now practical)
- Facilitated methods for creating and maintaining ultra-large dimensions (one exceeds 2 billion rows)
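One common way to maintain a multi-billion-row dimension in a clustered framework (an assumption for illustration; the case study does not describe KS's actual method) is to shard it deterministically by a hash of the natural key, so each node builds and maintains only its own partition and fact records can be routed to the matching shard for key lookup:

```python
import hashlib

def partition_for(natural_key: str, num_partitions: int) -> int:
    """Hypothetical shard router for an ultra-large dimension:
    hash the natural key and map it to a stable partition number.
    md5 is used (not Python's built-in hash) so the assignment is
    deterministic across processes and cluster nodes."""
    digest = hashlib.md5(natural_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# Records for the same natural key always land on the same shard,
# so per-shard surrogate-key maintenance stays a local operation.
shard = partition_for("user-42", 64)
```

Because the routing is stable, both the dimension-maintenance job and the fact-load job can compute the same partition independently, avoiding any single-node bottleneck on a 2+ billion-row table.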