Feature Selection

# Why you can’t use all features

Not all features are equally relevant for a given problem. Features present in only a few examples:

  • Increase dimensionality: the number of columns (features) grows relative to the number of rows (examples)
  • Increase sparsity: most values in each example vector are zero
  • Increase computational cost: more storage, longer training and inference times, and sometimes dedicated technologies
  • Increase overfitting risk: the model fits noise instead of true patterns
  • Reduce model interpretability: the model becomes harder to analyze and understand
  • Degrade distance metrics: in high dimensions, data points tend to become equidistant, reducing the effectiveness of distance-based algorithms such as kNN and clustering (see the sketch after this list)
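The distance-concentration effect is easy to demonstrate empirically. Here is a minimal sketch using NumPy and SciPy; the point count and dimensionalities are arbitrary illustrative choices:

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# As dimensionality grows, pairwise distances between random points
# concentrate around their mean, so "nearest" loses meaning.
for d in (2, 10, 100, 1000):
    points = rng.random((200, d))  # 200 uniform random points in [0, 1]^d
    dists = pdist(points)          # all pairwise Euclidean distances
    print(f"d={d:4d}  relative spread (std/mean) = {dists.std() / dists.mean():.3f}")
```

The relative spread shrinks steadily as `d` grows, which is why nearest-neighbor queries become less informative in high-dimensional feature spaces.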

# Main Techniques

  • Cutting the Long Tail
    • Cut the features in the tail of the distribution of examples over features, i.e. features that appear in very few examples (first sketch below).
  • Boruta: a wrapper method that keeps only features whose importance beats that of randomized “shadow” copies (second sketch below)
  • L1-Regularization (Lasso Regression): the L1 penalty drives the weights of uninformative features to exactly zero, selecting features as a side effect of training (third sketch below)
  • Task-specific feature selection, e.g. for text (last sketch below)
    • Remove stop words
    • Replace uncommon words with a single label, such as RARE_WORDS
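A minimal long-tail cut, assuming a sparse example-by-feature matrix. The matrix here is synthetic, and the `min_count` threshold is a hypothetical choice to tune per dataset:

```python
import numpy as np
import scipy.sparse as sp

# Synthetic sparse feature matrix: 1,000 examples x 5,000 features.
X = sp.random(1000, 5000, density=0.01, format="csc", random_state=0)

# For each feature, count how many examples it appears in (non-zero entries).
example_counts = np.asarray((X != 0).sum(axis=0)).ravel()

# Features below the threshold form the long tail; drop them.
min_count = 5  # hypothetical threshold, tune per dataset
keep = example_counts >= min_count
X_reduced = X[:, keep]
print(X.shape, "->", X_reduced.shape)
```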
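A sketch of Boruta via the third-party boruta package (BorutaPy) on synthetic data; the estimator settings and `max_iter` are illustrative, not prescriptive:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy  # pip install Boruta

# Synthetic data: 10 informative features hidden among 50.
X, y = make_classification(n_samples=500, n_features=50,
                           n_informative=10, random_state=0)

forest = RandomForestClassifier(n_jobs=-1, max_depth=5, random_state=0)
selector = BorutaPy(forest, n_estimators="auto", max_iter=50, random_state=0)
selector.fit(X, y)  # BorutaPy expects NumPy arrays, not DataFrames

X_selected = X[:, selector.support_]
print("kept", selector.support_.sum(), "of", X.shape[1], "features")
```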
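A Lasso-based selection sketch with scikit-learn; `alpha=1.0` is an arbitrary penalty strength, and in practice you would tune it (e.g. with `LassoCV`):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=50,
                       n_informative=10, noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)  # L1 penalties are scale-sensitive

# A larger alpha zeroes out more weights; zero-weight features are dropped.
lasso = Lasso(alpha=1.0).fit(X, y)
keep = lasso.coef_ != 0
X_selected = X[:, keep]
print("kept", keep.sum(), "of", X.shape[1], "features")
```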
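A sketch of the task-specific text steps above; the stop-word list and `MIN_COUNT` threshold are toy assumptions:

```python
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to"}  # tiny illustrative list
MIN_COUNT = 2  # hypothetical rarity threshold

docs = [
    "the cat sat on the mat",
    "a cat and a dog",
    "the dog chased the cat",
]

# Step 1: tokenize and remove stop words.
tokenized = [[w for w in doc.split() if w not in STOP_WORDS] for doc in docs]

# Step 2: count word frequencies across the corpus, then collapse uncommon
# words into a single RARE_WORDS token so they share one feature column.
counts = Counter(w for doc in tokenized for w in doc)
processed = [[w if counts[w] >= MIN_COUNT else "RARE_WORDS" for w in doc]
             for doc in tokenized]
print(processed)
```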