Monitoring a machine learning system is a cumbersome process that demands a broad range of skills in addition to constant business feedback.
Broadly, there are 3 kinds of monitoring that one needs to focus on:
This blog is for those who see their career and passion in data analysis and data science work. The focus is on concepts, with examples eventually discussed in Excel, SAS, R and Python. Happy Learning DataOps :)
Website for practising R on Statistical conceptual Learning: https://statlearning.com Reference Books & Materials: 1) Statis...
Like data governance, a predictive model has its own governance process. It involves multiple teams, including but not limited to the core team, extended team, decision-making/steering committee and implementation team. This governance process typically requires the following steps.
1) Inputs : The generation of a request for a new or updated version of model
2) Model Need, Design and Direction: Technical process to validate the requirement, scope and high level implementation
3) Model Build: Creates the model and develops implementation requirements (along with legal and regulatory considerations)
4) Model Approval: Multistep approval process (technical, business, risk, legal) to affirm and ascertain the model
5) Model Implementation: Data integrity, end to end testing and detailed implementation
6) Monitoring: Post-implementation monitoring of the model, including understanding data drift
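The six steps above can be sketched as an ordered set of stages. This is only an illustration: the names `Stage` and `next_stage` are hypothetical, and the loop from Monitoring back to Inputs is an assumption based on step 1 covering requests for an updated model version.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Governance stages, numbered as in the list above (names are illustrative)."""
    INPUTS = 1
    NEED_DESIGN_DIRECTION = 2
    BUILD = 3
    APPROVAL = 4
    IMPLEMENTATION = 5
    MONITORING = 6


def next_stage(current: Stage) -> Stage:
    """Return the stage that follows `current`.

    Assumption: monitoring can raise a new request, so the cycle
    restarts at INPUTS after MONITORING.
    """
    if current is Stage.MONITORING:
        return Stage.INPUTS
    return Stage(current + 1)


if __name__ == "__main__":
    stage = Stage.INPUTS
    for _ in range(len(Stage)):
        print(stage.value, stage.name)
        stage = next_stage(stage)
```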
In addition to this, there is a model review process at a regular frequency to decide whether to refresh the model.
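One common way to quantify data drift for the monitoring step is the Population Stability Index (PSI), which compares the distribution of a feature at training time with its distribution in production. The post does not prescribe a specific metric, so treat this as a minimal sketch under assumptions: the function name `psi`, the equal-width binning, and the `eps` floor for empty bins are illustrative choices.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a baseline and a recent sample.

    Bins are equal-width over the baseline's range; values in `actual`
    that fall outside that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("baseline sample has no spread; cannot build bins")
    width = (hi - lo) / bins

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        total = len(values)
        # Floor each proportion at eps so the logarithm is always defined.
        return [max(c / total, eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]       # training-time feature values
    recent = [0.5 + i / 200 for i in range(100)]   # production values, shifted upward
    print(f"PSI = {psi(baseline, recent):.3f}")
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift worth raising in the model review, but the threshold should be agreed with the business rather than taken as fixed.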
Most data warehouses today still deal only with structured data. Some of them also use unstructured data from a data lake or a landing layer in front of the warehouse. The CIDR 2021 paper linked below argues that the data warehouse architecture as we know it today will wither in the coming years and be replaced by a new architectural pattern, the Lakehouse, which will (i) be based on open direct-access data formats, such as Apache Parquet, (ii) have first-class support for machine learning and data science, and (iii) offer state-of-the-art performance. Lakehouses can help address several major challenges with data warehouses, including data staleness, reliability, total cost of ownership, data lock-in, and limited use-case support. The paper discusses how the industry is already moving toward Lakehouses and how this shift may affect work in data management. It also reports results from a Lakehouse system using Parquet that is competitive with popular cloud data warehouses on TPC-DS.
Please refer to the architecture below for the evolution of data warehouses. With the increased focus now on data science and machine learning, the Lakehouse platform is the future.
http://cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf?utm_source=bambu&utm_medium=social&utm_campaign=advocacy&blaid=1066676