Thursday, May 13, 2021

Machine Learning System Monitoring

Monitoring a machine learning system is a demanding process that calls for a broad range of technical skills in addition to constant business feedback.



Broadly, there are three kinds of monitoring to focus on:

1) Data: Data monitoring includes drift monitoring, monitoring of data variance, and feature monitoring.
2) Model: Monitoring model accuracy over time to inform decisions on retraining and A/B testing. It also includes model health checks.
3) System: Monitoring system health parameters such as average response time, capacity utilization, number of service requests, and downtime.
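As an illustration of the drift monitoring mentioned under Data, here is a minimal sketch that compares a feature's current values against a baseline sample using the Population Stability Index (PSI). PSI is only one of several possible drift measures and is an assumption here, not something prescribed above. The function name, the bin count, and the smoothing constant `eps` are likewise illustrative choices.

```python
import math
from typing import List, Sequence


def population_stability_index(
    baseline: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a baseline sample and a current sample of one numeric feature.

    Bins are equal-width and derived from the baseline range (an assumption;
    quantile bins are another common choice).
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp so values outside the baseline range land in the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor each fraction at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    expected = bin_fractions(baseline)
    actual = bin_fractions(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


# Hypothetical data: a training-time sample and a shifted production sample.
training_sample = [i / 100 for i in range(1000)]
production_sample = [x + 3.0 for x in training_sample]
psi = population_stability_index(training_sample, production_sample)
```

A PSI of zero means the two binned distributions are identical, and larger values mean a larger shift. The alert threshold is a judgment call that should be agreed with the business for each feature.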







Friday, May 7, 2021

Machine Learning Model Governance Process

Like data governance, predictive modelling has its own governance process. Multiple teams are involved, including but not limited to the core team, the extended team, the decision-making/steering committee, and the implementation team. The governance process typically involves the following steps.

1) Inputs: The generation of a request for a new or updated version of a model

2) Model Need, Design and Direction: Technical process to validate the requirement, scope, and high-level implementation

3) Model Build: Creates the model and develops implementation requirements (along with legal and regulatory considerations)

4) Model Approval: Multistep approval process (technical, business, risk, legal) to validate and confirm the model

5) Model Implementation: Data integrity checks, end-to-end testing, and detailed implementation

6) Monitoring: Post-implementation monitoring, including tracking of data drift
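The steps above can be sketched as a simple ordered workflow in which the approval step is a gate. This is a hypothetical illustration, not a description of any particular governance tool: the `Stage` names, the `next_stage` helper, and the rule that all four sign-offs are needed before implementation are assumptions based on the list above.

```python
from enum import Enum
from typing import Iterable


class Stage(Enum):
    INPUTS = 1
    NEED_DESIGN_DIRECTION = 2
    BUILD = 3
    APPROVAL = 4
    IMPLEMENTATION = 5
    MONITORING = 6


# The four approval tracks named in step 4.
REQUIRED_SIGN_OFFS = frozenset({"technical", "business", "risk", "legal"})


def next_stage(current: Stage, sign_offs: Iterable[str] = ()) -> Stage:
    """Return the stage that follows `current`.

    Leaving the approval stage requires every sign-off in REQUIRED_SIGN_OFFS.
    """
    if current is Stage.MONITORING:
        raise ValueError("Monitoring is the final stage")
    if current is Stage.APPROVAL:
        missing = REQUIRED_SIGN_OFFS - set(sign_offs)
        if missing:
            raise PermissionError(f"missing sign-offs: {sorted(missing)}")
    return Stage(current.value + 1)
```

For example, `next_stage(Stage.APPROVAL, ["technical", "business"])` raises `PermissionError` because the risk and legal sign-offs are still outstanding.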

In addition, the model is reviewed at a regular frequency to decide whether to refresh it.
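A periodic review of this kind could be supported by a simple rule, sketched below under stated assumptions: accuracy is measured once per review period, the most recent measurements are averaged, and a refresh is recommended when that average falls too far below the accuracy recorded at approval time. The names `ReviewDecision` and `review_model`, the window of four periods, and the 0.05 tolerance are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ReviewDecision:
    refresh: bool
    recent_accuracy: float
    reason: str


def review_model(
    accuracy_history: Sequence[float],
    baseline_accuracy: float,
    window: int = 4,
    max_drop: float = 0.05,
) -> ReviewDecision:
    """Recommend a refresh when recent average accuracy drops too far below baseline."""
    if not accuracy_history:
        raise ValueError("accuracy_history must contain at least one measurement")
    recent = accuracy_history[-window:]  # last `window` review periods
    recent_accuracy = sum(recent) / len(recent)
    drop = baseline_accuracy - recent_accuracy
    if drop > max_drop:
        reason = f"accuracy dropped by {drop:.3f}, above the tolerance of {max_drop:.3f}"
        return ReviewDecision(True, recent_accuracy, reason)
    return ReviewDecision(False, recent_accuracy, "accuracy is within tolerance")
```

In practice the decision would also weigh business feedback and the data drift observed during monitoring, so a rule like this is a starting point for the review rather than a replacement for it.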




Saturday, January 2, 2021

Unify Data Warehousing and Advanced Analytics

Most data warehouses today still deal only with structured data. Some of them also use unstructured data from a data lake or a landing layer in front of the warehouse. The CIDR paper referenced below argues that the data warehouse architecture as we know it today will wither in the coming years and be replaced by a new architectural pattern, the Lakehouse, which will (i) be based on open direct-access data formats, such as Apache Parquet, (ii) have first-class support for machine learning and data science, and (iii) offer state-of-the-art performance. Lakehouses can help address several major challenges with data warehouses, including data staleness, reliability, total cost of ownership, data lock-in, and limited use-case support. The paper discusses how the industry is already moving toward Lakehouses and how this shift may affect work in data management. It also reports results from a Lakehouse system using Parquet that is competitive with popular cloud data warehouses on TPC-DS.


Please refer to the architecture below for the evolution of data warehouses. With the increased focus on data science and machine learning, the Lakehouse platform is the future.


Reference: This article draws on the CIDR paper. For more details, please refer to the following link.

http://cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf?utm_source=bambu&utm_medium=social&utm_campaign=advocacy&blaid=1066676