Features and Labels
In simple terms, a feature is an input, while a label is an output. Suppose you want to conduct research on a dataset containing population statistics. Each person in that dataset is an observation, and the characteristics of those observations are called features. In this case, the features would be gender, age, location, and so on. If you want to extract information about a certain group of people in this dataset, you would input the desired feature values into the system to get back an output.
Tip: A feature is a variable that can be changed independently.
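To make this concrete, here is a minimal sketch in Python of querying such a population dataset by feature values; the sample records are invented for illustration:

```python
# A hypothetical population dataset: each dictionary is one observation,
# and its keys (gender, age, location) are the features.
people = [
    {"gender": "female", "age": 34, "location": "Berlin"},
    {"gender": "male",   "age": 52, "location": "Madrid"},
    {"gender": "female", "age": 29, "location": "Berlin"},
]

# Input the desired feature values to get back the matching group.
berlin_women = [
    p for p in people
    if p["gender"] == "female" and p["location"] == "Berlin"
]
```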
In an email spam detector application, features might include the words in the email’s text, the sender’s address, and the time the email was sent.
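As a sketch, a hypothetical feature extractor for such a spam detector might look like the following; the specific feature names and the “free” keyword check are illustrative assumptions, not a prescribed feature set:

```python
import re
from datetime import datetime

def extract_features(sender: str, sent_at: datetime, body: str) -> dict:
    """Turn one email into a dictionary of features (hypothetical feature set)."""
    words = re.findall(r"[a-z']+", body.lower())
    return {
        "sender_domain": sender.split("@")[-1],  # the sender's address
        "hour_sent": sent_at.hour,               # the time the email was sent
        "num_words": len(words),                 # size of the email body
        "contains_free": int("free" in words),   # a spam-like word in the text
    }

features = extract_features(
    "promo@deals.example.com",
    datetime(2023, 5, 1, 3, 15),
    "Claim your FREE prize now",
)
```

A real spam filter would feed a vector of such features, for many emails, into a classifier along with the spam/not-spam labels.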
A label is the element we are predicting: the y variable in a graph. A label can be the future price of oats, the kind of plant shown in a photo, the summary of an audio clip, or anything else that describes a piece of content. Labeling is also called “tagging”: tags are created and attached to an element based on prior knowledge of that element already stored in the system.
Simple examples of labeled data include
- A picture of a fruit, with an associated label “apple” or “orange”
- A text description for a product review and the score assigned to that product by a user
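The two examples above can be written as (features, label) pairs; the sample values below are invented for illustration:

```python
# Each observation pairs a feature representation (x) with a label (y).

# A picture of a fruit (here summarized by simple features) with its label.
fruit_data = [
    ({"color": "red",    "diameter_cm": 8.0}, "apple"),
    ({"color": "orange", "diameter_cm": 7.5}, "orange"),
]

# A product review's text paired with the score the user assigned.
review_data = [
    ("Great product, arrived quickly", 5),
    ("Stopped working after a week",   1),
]

features, label = fruit_data[0]
```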
Note: The label is the thing we are predicting, the final output of the prediction. In machine learning applications, we can use both labeled and unlabeled data.
Machine Learning Algorithm
A machine learning algorithm is a methodology or framework according to which an AI system conducts its tasks, or, put simply, a set of instructions that directs the entire system on how to learn from data and improve over time without depending on human support. These algorithms are agile, precise, probabilistic, and technically capable of guiding a whole system in uncovering hidden patterns in entangled, noisy, and complex datasets.
The following algorithms cover the essentials of machine learning:
- Linear regression
- Logistic regression
- CART (Classification and Regression Trees)
- Naïve Bayes
- KNN (K-Nearest Neighbors)
- Apriori
- K-means
- PCA (Principal Component Analysis)
- Bagging with random forests
- Boosting with AdaBoost
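To give a flavor of the first algorithm on the list, here is a minimal from-scratch sketch of simple linear regression (ordinary least squares with a single feature); the training data below is invented so that y = 2x + 1 exactly:

```python
def fit_linear_regression(xs, ys):
    """Ordinary least squares for one feature: y is approximated by slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form OLS solution for a single feature.
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Features (x) and labels (y) following y = 2x + 1.
slope, intercept = fit_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
```

The fitted slope and intercept can then be used to predict the label for a new feature value, which is exactly the feature-in, label-out pattern described earlier.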
Note: All of the preceding algorithms can be grouped into categories such as supervised learning algorithms, unsupervised learning algorithms, and ensemble methods. However, this book doesn’t go into the specifics of how these algorithms work; to pass the AI-900 exam, you don’t need to know them in depth.