reading-notes

This Repo required for Asac labs class 2


Project maintained by ManarAbdelkarim Hosted on GitHub Pages — Theme by mattgraham

# Data Science Primer

There are 5 core steps:

1- Exploratory Analysis

First, “get to know” the data. This step should be quick, efficient, and decisive.

2- Data Cleaning

Then, clean your data to avoid many common pitfalls. Better data beats fancier algorithms.

a- Remove Unwanted observations
b- Fix Structural Errors
c- Filter Unwanted Outliers
d- Handle Missing Data
    1- Dropping observations that have missing values
    2- Imputing the missing values based on other observations
  how to deal with missing data?
        a- Missing categorical data: 
                -  Missing categorical data
                  The best way to handle missing data for 
                  this categorical features is to simply label them as ’Missing’!
        b- Missing numeric data
                - For missing numeric data, you should flag and fill the values.

3- Feature Engineering

Next, help your algorithms “focus” on what’s important by creating new features.

a- Infuse Domain Knowledge
b- Create Interaction Features
c- Combine Sparse Classes
d- Add Dummy Variables
e- Remove Unused Features

4- Algorithm Selection

Choose the best, most appropriate algorithms without wasting your time.

    - There are 3 common types of regularized linear regression algorithms: 
            a- Lasso Regression
            Lasso, or LASSO, stands for Least Absolute Shrinkage and Selection Operator.
                * Lasso regression penalizes the absolute size of coefficients
            b- Ridge Regression
                * Ridge regression penalizes the squared size of coefficients.
            c- Elastic-Net
            Elastic-Net is a compromise between Lasso and Ridge.
     - base models of decision trees: 
            a- Random forests
            b- Boosted trees

5- Model Training

    a- Split Dataset
    b- Fit and Tune Models
    c- Select Winning Model

Finally, train your models. This step is pretty formulaic once you’ve done the first

## Key Terminology :

Machine Learning Tasks:

A task is a specific objective for your algorithms.