The customer is Asia’s largest global sports media company, with a broadcast reach of over 2.6 billion potential viewers across 140+ countries. As part of its digital transformation program, the customer is looking for an end-to-end data lake solution. Blazeclan proposed an Azure-based data lake solution that is cost-effective, scalable, and reliable.

The Need for a Data Lake Solution

The customer is looking for an end-to-end data lake solution that includes data visualization on a cloud platform. Without a central data store, the customer faces challenges in decision making. It has also been difficult to control the business data coming from social media platforms such as Facebook, Instagram, Twitter and YouTube, which spans paid and organic metrics from each source.

The customer’s key objectives are:

  • Data extraction from the four social media platforms, covering the metrics currently available for each:
    • Facebook – Facebook Insights and Facebook Ads
    • Instagram
    • Twitter – Twitter Ads and Twitter Analytics
    • YouTube – Google Ads and YouTube Analytics
  • Data ingestion from the sources to the data lake storage
  • Data profiling and standardization, provided that the necessary rules are passed on to the technical teams
  • OLAP data modeling for analytical services, i.e. reporting and dashboarding
  • Data reporting and dashboarding for the given sources (social media platforms) against the agreed metrics

Blazeclan’s Azure-based Data Lake Solution

To give the customer a cost-effective, reliable, and scalable data lake, Blazeclan proposed an Azure-based solution covering data ingestion and load, data warehousing, and reporting. It uses Azure services such as Azure Data Factory (or Azure Functions), Azure Data Lake Storage, SQL Data Warehouse and Power BI.

  • Data Extraction: Data from the social media platforms is extracted according to the key metrics available, i.e. organic and paid. Azure Data Factory pulls the data from each source and stores it in Azure Data Lake Storage.
  • Data Ingestion: The extracted data is stored in Azure Data Lake Storage in a file-and-folder structure. This structure is organized by source and by the metrics available for each source.
  • Data Warehousing: A data model is built for online analytical processing (OLAP) and reporting, and is then implemented on Azure SQL Data Warehouse. Azure Data Factory moves the data from Azure Data Lake Storage to the SQL Data Warehouse.
  • Data Visualization: Reports are developed in Power BI for each of the customer’s reporting requirements, using the data available in the data warehouse.
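
The file-and-folder structuring described under Data Ingestion can be sketched as follows. This is only an illustration: the `raw/<source>/<report>/<yyyy>/<mm>/<dd>/` convention, the `extract.json` file name, the `default` report name for Instagram, and the `extract_path` helper are all assumptions, not the layout actually used in the project.

```python
from datetime import date
from pathlib import PurePosixPath

# Sources and report types as listed in the customer's objectives.
# Instagram has no named sub-report in the text, so "default" is a placeholder.
SOURCES = {
    "facebook": ["facebook_insights", "facebook_ads"],
    "instagram": ["default"],
    "twitter": ["twitter_ads", "twitter_analytics"],
    "youtube": ["google_ads", "youtube_analytics"],
}


def extract_path(source: str, report: str, run_date: date, root: str = "raw") -> str:
    """Build a hypothetical data lake path for one daily extract."""
    if report not in SOURCES.get(source, []):
        raise ValueError(f"unknown source/report: {source}/{report}")
    return str(
        PurePosixPath(
            root,
            source,
            report,
            f"{run_date:%Y}",
            f"{run_date:%m}",
            f"{run_date:%d}",
            "extract.json",  # assumed file name
        )
    )


if __name__ == "__main__":
    # Print one path per source/report pair for an example run date.
    for src, reports in SOURCES.items():
        for rep in reports:
            print(extract_path(src, rep, date(2024, 1, 5)))
```

Partitioning by source, report, and date in this way would let a daily incremental load read only the newest folder for each source rather than rescanning the whole lake.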

The solution will enable the customer to achieve well-documented data extraction, ingestion and data warehousing. It will also make the customer’s data architecture future-proof, so that upcoming requirements can be addressed with ease. The data analytics model is robust yet configurable, and can accommodate changes in the customer’s data structure. In addition, the integrated data warehouse and reporting will support the customer’s ad-hoc and on-the-fly reporting requirements.

Key Benefits:

  • High Scalability: The data lake solution gave the customer the ability to scale its infrastructure according to changing business demands. The solution also enabled auto-scaling with minimal or zero downtime.
  • Automation: Automated data collection, storage and management, together with integration into the deployment pipeline, enabled the customer’s application teams to increase their deployment speed. Compared to manual effort, this reduced the time and resources required for the customer’s analytics processes. Automated data ingestion also gives the customer a consistently updated view of business performance and customer behaviour.
  • Cost Optimization: The daily incremental data processing takes approximately 15 minutes in the data lake implemented by the Blazeclan team. The compute resources therefore do not need to run 24 hours a day, which reduces the overall cost for the company. Moreover, thanks to the optimized architectural design, savings remained considerable even as the amount of data ingested and processed in the data lake grew.
  • Analytics-Driven Business Decisions: Consolidating all data sources in a single platform supported data scientists, analysts and business users in making efficient decisions. Furthermore, it helped market leaders to develop effective marketing strategies and make sound business decisions.
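
As a rough illustration of the Cost Optimization point, the share of the day that compute needs to be running for a 15-minute daily load can be worked out directly. This is a back-of-the-envelope sketch: the `compute_share` helper is hypothetical, and actual Azure billing depends on the service tier and pricing model, which are not modeled here.

```python
MINUTES_PER_DAY = 24 * 60


def compute_share(run_minutes: float) -> float:
    """Fraction of a day that compute is active for a daily run of the given length."""
    if not 0 <= run_minutes <= MINUTES_PER_DAY:
        raise ValueError("run_minutes must be between 0 and 1440")
    return run_minutes / MINUTES_PER_DAY


if __name__ == "__main__":
    share = compute_share(15)  # approximate daily processing time from the case study
    print(f"Compute active for {share:.2%} of the day")  # 1.04%
```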

Tech Stack

  • Azure Data Factory
  • Azure Data Lake Storage Gen 2
  • Azure SQL Data Warehouse
  • Azure Functions
  • Power BI
  • Azure Key Vault
  • Azure Log Analytics
  • Azure Active Directory
  • Application Insights
