
Friday, February 16, 2018

Baseline result for a Model (Category: Concept, Level: Intermediate)



There are a few common ways to calculate a baseline result.
A baseline result is the simplest possible prediction. For some problems this may be a random result, and for others it may be the most common prediction.
  • Classification: If you have a classification problem, you can select the class that has the most observations and use that class as the result for all predictions. If the number of observations is equal for all classes in your training dataset, you can select a specific class, or enumerate each class and see which gives the better result in your test harness.
  • Regression: If you are working on a regression problem, you can use a central tendency measure as the result for all predictions, such as the mean or the median.
  • Optimization: If you are working on an optimization problem, you can evaluate a fixed number of random samples from the domain and use the best of them as the result.
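The three baselines above can be sketched in plain Python. This is a minimal illustration rather than a definitive implementation: the function names, the toy data, and the choice of minimising the objective over a one-dimensional interval are all assumptions made for the example.

```python
import random
import statistics
from collections import Counter


def majority_class_baseline(train_labels):
    """Return the most common label in the training set."""
    return Counter(train_labels).most_common(1)[0][0]


def central_tendency_baseline(train_targets, measure="mean"):
    """Return the mean or the median of the training targets."""
    if measure == "mean":
        return statistics.mean(train_targets)
    if measure == "median":
        return statistics.median(train_targets)
    raise ValueError(f"unknown measure: {measure}")


def random_search_baseline(objective, low, high, n_samples=100, seed=0):
    """Minimise `objective` using a fixed number of uniform random samples.

    Assumes a one-dimensional domain [low, high] (hypothetical setup).
    """
    rng = random.Random(seed)
    candidates = [rng.uniform(low, high) for _ in range(n_samples)]
    best_x = min(candidates, key=objective)
    return best_x, objective(best_x)


# Toy data, invented for illustration only.
train_labels = ["spam", "ham", "ham", "ham", "spam"]
print(majority_class_baseline(train_labels))  # ham

train_targets = [2.0, 4.0, 4.0, 10.0]
print(central_tendency_baseline(train_targets, "mean"))    # 5.0
print(central_tendency_baseline(train_targets, "median"))  # 4.0

best_x, best_value = random_search_baseline(lambda x: (x - 3) ** 2, -10, 10)
print(best_x, best_value)
```

Whatever value each function returns is then used as the prediction for every test example (or, for optimization, as the score to beat).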
A baseline is a method that uses heuristics, simple summary statistics, randomness, or machine learning to create predictions for a dataset. You can use these predictions to measure the baseline's performance (e.g., accuracy); this metric then becomes the reference against which you compare any other machine learning algorithm.
In more detail:
A machine learning algorithm tries to learn a function that models the relationship between the input (feature) data and the target variable (or label). When you test it, you will typically measure performance in one way or another. For example, your algorithm may be 75% accurate. But what does this mean? You can infer this meaning by comparing with a baseline's performance.
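To make the 75% figure concrete, here is a small sketch with invented labels. It assumes an imbalanced dataset in which 80% of the examples belong to class 0; the label lists and the "model" predictions are hypothetical and exist only to show the comparison.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced data: 80% of the labels are class 0.
y_train = [0] * 80 + [1] * 20
y_test = [0] * 16 + [1] * 4

# Baseline: always predict the most common training label.
majority = Counter(y_train).most_common(1)[0][0]
baseline_pred = [majority] * len(y_test)
baseline_acc = accuracy(y_test, baseline_pred)

# Hypothetical model output that gets 15 of the 20 test labels right.
model_pred = [0] * 13 + [1] * 3 + [0] * 2 + [1] * 2
model_acc = accuracy(y_test, model_pred)

print(baseline_acc)  # 0.8
print(model_acc)     # 0.75
```

In this setup a model that is 75% accurate is actually worse than always predicting the majority class, which is exactly the kind of conclusion a baseline makes visible.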
Typical baselines include those supported by scikit-learn's "dummy" estimators:
Classification baselines:
  • “stratified”: generates predictions by respecting the training set’s class distribution.
  • “most_frequent”: always predicts the most frequent label in the training set.
  • “prior”: always predicts the class that maximizes the class prior.
  • “uniform”: generates predictions uniformly at random.
  • “constant”: always predicts a constant label that is provided by the user. This is useful for metrics that evaluate a non-majority class.
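A short sketch of how these strategies can be tried with scikit-learn's `DummyClassifier`. The toy arrays are invented for illustration, and the feature values are placeholders because dummy estimators ignore the input features. Scores for the random strategies depend on `random_state`, so only the deterministic results are annotated.

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Toy data (hypothetical): 7 examples of class 0 and 3 of class 1.
X_train = [[0]] * 10  # features are ignored by dummy estimators
y_train = [0] * 7 + [1] * 3
X_test = [[0]] * 5
y_test = [0, 0, 0, 1, 1]

scores = {}
for strategy in ["stratified", "most_frequent", "prior", "uniform"]:
    clf = DummyClassifier(strategy=strategy, random_state=0)
    clf.fit(X_train, y_train)
    scores[strategy] = accuracy_score(y_test, clf.predict(X_test))
    print(strategy, scores[strategy])

# "constant" needs the label to predict; it must appear in y_train.
constant_clf = DummyClassifier(strategy="constant", constant=1)
constant_clf.fit(X_train, y_train)
constant_pred = constant_clf.predict(X_test)
print(list(constant_pred))  # [1, 1, 1, 1, 1]
```

Here “most_frequent” and “prior” both predict class 0 for every test example, so they score 3 out of 5. They differ only in `predict_proba`, where “prior” returns the class prior instead of a one-hot vector.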


Regression baselines:
  • “mean”: always predicts the mean of the training set.
  • “median”: always predicts the median of the training set.
  • “quantile”: always predicts a specified quantile of the training set, provided with the quantile parameter.
  • “constant”: always predicts a constant value that is provided by the user.
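The regression strategies can be tried the same way with scikit-learn's `DummyRegressor`, which also accepts a `"mean"` strategy. The sketch below uses invented targets with one deliberate outlier to show how the mean and the median baselines can differ; the data, the chosen quantile, and the constant are assumptions for the example.

```python
from sklearn.dummy import DummyRegressor

# Toy data (hypothetical): one large outlier in the targets.
X_train = [[0]] * 5  # features are ignored by dummy estimators
y_train = [1.0, 2.0, 3.0, 4.0, 100.0]
X_test = [[0]] * 2

regressors = {
    "mean": DummyRegressor(strategy="mean"),
    "median": DummyRegressor(strategy="median"),
    "quantile": DummyRegressor(strategy="quantile", quantile=0.25),
    "constant": DummyRegressor(strategy="constant", constant=10.0),
}

predictions = {}
for name, reg in regressors.items():
    reg.fit(X_train, y_train)
    # Every test row gets the same value, so keep only the first.
    predictions[name] = float(reg.predict(X_test)[0])
    print(name, predictions[name])
```

With these targets the mean baseline predicts 22.0 while the median baseline predicts 3.0, so the choice of central tendency measure matters when the target distribution is skewed.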
