Naive Bayes
- A simple, probabilistic classifier that combines feature-specific probabilities to predict class labels.
- It assumes all features are conditionally independent of one another given the class (the “naive” assumption), which greatly simplifies computation.
- Works quickly on large datasets and can perform well when there are many features or when classes are highly imbalanced.
Definition
Naive Bayes is a type of machine learning algorithm based on Bayes’ theorem. It is called “naive” because it assumes that all features (variables) in a dataset are independent of each other given the class label. Despite this assumption, naive Bayes classifiers have proven effective in many applications.
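For reference, Bayes’ theorem in standard notation (the formula is standard; the specific symbols here are ours). The posterior probability of class C given features x₁, …, xₙ combines the class prior with the feature likelihoods, and the naive assumption lets the joint likelihood factor into a product:

```latex
P(C \mid x_1, \dots, x_n)
  = \frac{P(x_1, \dots, x_n \mid C)\, P(C)}{P(x_1, \dots, x_n)}
  \propto P(C) \prod_{i=1}^{n} P(x_i \mid C)
```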
Explanation
- Build a naive Bayes classifier by gathering a labeled dataset and extracting features from each instance.
- For each feature, estimate the probability of observing that feature given each class (for example, the probability that a particular word appears given that an email is spam).
- Use Bayes’ theorem to combine these per-feature likelihoods with the class priors into an overall posterior probability for each class given all observed features.
- For classification, either select the class with the highest combined probability or apply a threshold (e.g., 50%) to decide class membership.
- The independence assumption simplifies probability estimation and enables fast training and testing, even on large datasets. However, it can sometimes reduce accuracy compared with algorithms that model feature dependencies. A minimal sketch of this procedure follows the list.
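The sketch below implements the steps above from scratch in Python, using binary (presence/absence) features, Laplace smoothing, and log probabilities for numerical stability. The function names and toy data are illustrative, not from the original.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(instances, labels):
    """Estimate log class priors and per-feature log likelihoods.

    instances: list of sets of binary features (e.g., words present in an email)
    labels: parallel list of class labels
    """
    n = len(labels)
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # class -> feature -> count
    vocab = set()
    for features, label in zip(instances, labels):
        vocab.update(features)
        feature_counts[label].update(features)

    log_priors = {c: math.log(k / n) for c, k in class_counts.items()}
    # Laplace (add-one) smoothing keeps unseen features from zeroing out a class.
    log_likelihoods = {
        c: {
            f: math.log((feature_counts[c][f] + 1) / (class_counts[c] + 2))
            for f in vocab
        }
        for c in class_counts
    }
    return log_priors, log_likelihoods, vocab

def classify(features, log_priors, log_likelihoods, vocab):
    """Return the class with the highest posterior, working in log space."""
    scores = {}
    for c, prior in log_priors.items():
        score = prior
        for f in vocab:
            p = math.exp(log_likelihoods[c][f])
            # Add log P(feature present | class) or log P(feature absent | class).
            score += log_likelihoods[c][f] if f in features else math.log(1 - p)
        scores[c] = score
    return max(scores, key=scores.get)

emails = [{"free", "discount"}, {"meeting", "agenda"},
          {"free", "viagra"}, {"project", "agenda"}]
labels = ["spam", "ham", "spam", "ham"]
model = train_naive_bayes(emails, labels)
print(classify({"free", "offer"}, *model))  # expected: "spam"
```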
Examples
Example 1: Spam Filtering
- Task: classify incoming emails as spam or not spam.
- Data: a dataset of emails labeled spam or not spam.
- Features (examples): the sender’s email address; the subject line of the email; the presence or absence of certain words in the body of the email (e.g. “free,” “discount,” “Viagra”).
- Method:
- Estimate likelihoods such as the probability that the word “free” appears given that an email is spam, by counting spam emails that contain that word and dividing by the total number of spam emails.
- Use Bayes’ theorem to combine the likelihoods from the subject-line, sender, and word-presence features with the spam prior to obtain the probability that the email is spam.
- Apply a threshold probability (e.g. 50%) to decide whether to classify the email as spam; see the sketch after this list.
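One possible implementation of this example uses scikit-learn’s CountVectorizer and BernoulliNB with a 50% decision threshold. The library choice and the toy corpus are our assumptions, not from the original.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

# Toy labeled corpus (illustrative data only).
texts = [
    "free discount offer",
    "meeting agenda attached",
    "free viagra now",
    "project agenda and notes",
]
labels = ["spam", "ham", "spam", "ham"]

# Binary word-presence features, matching the feature description above.
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(texts)

clf = BernoulliNB()
clf.fit(X, labels)

# Apply the 50% threshold on the spam posterior.
new_email = vectorizer.transform(["limited time free discount"])
spam_index = list(clf.classes_).index("spam")
p_spam = clf.predict_proba(new_email)[0, spam_index]
print("spam" if p_spam >= 0.5 else "not spam", round(p_spam, 3))
```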
Example 2: Predicting the Weather
- Task: predict the type of weather given temperature, humidity, and wind speed.
- Data: a dataset of weather observations with features:
- The temperature (in degrees Fahrenheit)
- The humidity (as a percentage)
- The wind speed (in miles per hour)
- The type of weather (e.g. sunny, cloudy, rainy)
- Method:
- Calculate the probability of each weather type given the observed features, for example the probability of rain given a temperature of 60 degrees Fahrenheit, humidity of 80%, and wind speed of 10 mph. Because these features are continuous, exact-match counting is fragile in practice; values are usually binned into ranges or modeled with a per-class distribution such as a Gaussian.
- Use Bayes’ theorem to combine the probabilities from the three features and choose the weather type with the highest resulting probability (see the sketch below).
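A common way to handle continuous features like these is Gaussian naive Bayes, which models each feature with a per-class normal distribution instead of counting exact value matches. The sketch below uses scikit-learn’s GaussianNB on made-up observations; the data and the Gaussian variant are our assumptions, not from the original.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy observations: [temperature °F, humidity %, wind speed mph].
X = np.array([
    [85, 30, 5],   # sunny
    [90, 25, 8],   # sunny
    [70, 60, 10],  # cloudy
    [65, 70, 12],  # cloudy
    [60, 85, 10],  # rainy
    [55, 90, 15],  # rainy
])
y = ["sunny", "sunny", "cloudy", "cloudy", "rainy", "rainy"]

# GaussianNB fits a normal distribution per feature per class,
# so nearby values contribute probability even without exact matches.
clf = GaussianNB()
clf.fit(X, y)

print(clf.predict([[60, 80, 10]]))  # likely "rainy"
print(dict(zip(clf.classes_, clf.predict_proba([[60, 80, 10]])[0].round(3))))
```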
Use cases
- Situations with a very large number of features.
- Datasets that are extremely imbalanced (e.g. one class is much more common than the other).
Notes or pitfalls
- The independence assumption between features can sometimes lead to less accurate predictions than algorithms that model feature dependencies.
- Naive Bayes classifiers remain popular because of their simplicity and because they can be trained and tested quickly, even on large datasets.
Related terms
- Bayes’ theorem
- Classifier
- Features (feature independence)