Perceptron and its implementation in Python
This blog covers the following topics:
1. What is a perceptron?
2. The Algorithm.
3. Implementation.
4. Limitations.
What is a perceptron?
The idea of a Perceptron is analogous to the operating principle of the brain's basic processing unit, the Neuron. A Neuron is made up of many input signals carried by Dendrites, the cell body, and one output signal carried along the Axon. The Neuron fires an action signal when the cell reaches a particular threshold. This action either happens or it doesn't; there is no such thing as a "partial" firing of a neuron.
Similarly, the perceptron has many inputs (often called features) that are fed into a linear unit producing one binary output. Perceptrons can therefore be applied to Binary Classification problems, where each sample is to be identified as belonging to one of two predefined classes.
The Algorithm
Since Perceptrons are Binary Classifiers (0/1), we can define their computation as follows:
f(x) = 1 if w · x + b > 0, else 0
where w is the vector of weights and x is the vector of input features.
Let’s recall that the dot product of two vectors of length n is given by:
w · x = ∑ᵢ₌₁ⁿ wᵢ xᵢ
The quantity b + w · x that f thresholds is a linear combination of the weight and feature vectors. The perceptron is, therefore, a linear classifier: an algorithm that predicts using a linear predictor function.
The weights signify the effectiveness of each feature xᵢ in x on the model's behavior. The higher the weight wᵢ of a feature xᵢ, the higher its influence on the output. The bias b, on the other hand, is like the intercept in a linear equation: a constant that helps the model shift the decision boundary to best fit the data. A convenient trick is to treat the bias as the weight w₀ of an imaginary input feature that is always x₀ = 1.
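To make this concrete, here is a tiny worked example with made-up numbers (these weights are not learned from anything; they only illustrate how b + w · x is computed):

import numpy as np

w = np.array([0.4, -0.2, 0.1, 0.05])   # one weight per feature
x = np.array([5.1, 3.5, 1.4, 0.2])     # one sample's 4 feature values
b = -1.0                               # bias term
score = b + np.dot(w, x)               # 0.4*5.1 - 0.2*3.5 + 0.1*1.4 + 0.05*0.2 - 1.0
print(score)                           # 0.49 -> positive, so the perceptron outputs 1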
The model can be trained using the following algorithm:
1. Set b = w = 0
2. For N iterations, or until the weights do not change:
   (a) for each training example xᵏ with label yᵏ:
      i. if yᵏ − f(xᵏ) = 0, continue
      ii. else, update each weight: △wᵢ = (yᵏ − f(xᵏ)) xᵢᵏ
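Translated almost line for line into NumPy, the steps above might look like the sketch below (the function and variable names are my own; the full version used on the Iris data appears in the Implementation section):

import numpy as np

def train_perceptron(X, y, n_iter):
    # X: (n_samples, n_features) array; y: 0/1 labels
    X = np.insert(X, 0, 1, axis=1)          # prepend x0 = 1 so w[0] plays the role of b
    w = np.zeros(X.shape[1])                # step 1: b = w = 0
    for _ in range(n_iter):                 # step 2: N iterations...
        changed = False
        for xk, yk in zip(X, y):            # step 2(a): each training example
            fx = 1.0 if np.dot(w, xk) > 0 else 0.0
            if yk - fx != 0:                # step ii: misclassified, so update
                w += (yk - fx) * xk         # △w = (yᵏ − f(xᵏ)) xᵏ
                changed = True
        if not changed:                     # ...or until the weights do not change
            break
    return w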
Implementation
The dataset we consider for implementing the Perceptron is the Iris flower dataset. It contains 4 features that describe each flower and classify it as belonging to one of 3 classes. We strip the last 50 rows of the dataset, which belong to the class 'Iris-virginica', and use only the 2 classes 'Iris-setosa' and 'Iris-versicolor', because these classes are linearly separable; on such data the algorithm is guaranteed to converge by eventually finding weights that separate the two classes.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def load_data():
    URL_ = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
    data = pd.read_csv(URL_, header=None)
    print(data)
    # make the dataset linearly separable
    data = data[:100]
    data[4] = np.where(data.iloc[:, -1] == 'Iris-setosa', 0, 1)
    data = np.asmatrix(data, dtype='float64')
    return data

data = load_data()
Visualizing the dataset with 2 of the features, we can see that the dataset can be clearly separated by drawing a straight line between the two groups. Our goal is to write an algorithm that finds that line and classifies all of these data points correctly.
plt.scatter(np.array(data[:50, 0]), np.array(data[:50, 2]), marker='o', label='setosa')
plt.scatter(np.array(data[50:, 0]), np.array(data[50:, 2]), marker='x', label='versicolor')
plt.xlabel('sepal length')   # column 0
plt.ylabel('petal length')   # column 2
plt.legend()
plt.show()
Now we implement the algorithm above as-is and see how it works. We have 4 features and hence 4 weights, one per feature. Remember that we also defined a bias term w₀ that assumes x₀ = 1, making a total of 5 weights.
We have set the number of iterations to 10. This is a hyperparameter, as opposed to the model parameters like w that are learned by the algorithm. In each iteration, the algorithm computes the class (0 or 1) for every data point and updates the weights after each misclassification.
If a sample is misclassified, the weights are updated by a delta that shifts them in the opposite direction of the error, so that if the same sample were classified again, the result would be "less wrong". We classify any output y ≤ 0 as '0' (Iris-setosa) and anything else as '1' (Iris-versicolor).
def perceptron(data, num_iter):
    features = data[:, :-1]
    labels = data[:, -1]
    # set weights to zero (one per feature, plus one for the bias)
    w = np.zeros(shape=(1, features.shape[1] + 1))
    misclassified_ = []
    for epoch in range(num_iter):
        misclassified = 0
        for x, label in zip(features, labels):
            x = np.insert(x, 0, 1)           # prepend x0 = 1 for the bias weight
            y = np.dot(w, x.transpose())
            target = 1.0 if (y > 0) else 0.0
            delta = (label.item(0, 0) - target)
            if (delta):  # misclassified
                misclassified += 1
                w += (delta * x)             # △w = (yᵏ − f(xᵏ)) xᵏ
        misclassified_.append(misclassified)
    return (w, misclassified_)

num_iter = 10
w, misclassified_ = perceptron(data, num_iter)
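The code above stops at training, but once w is learned it can classify a new sample directly. Here is a minimal helper (the name predict is my own), reusing the same bias-insertion convention as perceptron():

def predict(w, x):
    # prepend x0 = 1 to match the bias weight w0
    x = np.insert(np.asarray(x).ravel(), 0, 1)
    return 1.0 if np.dot(np.asarray(w).ravel(), x) > 0 else 0.0

print(predict(w, data[0, :-1]))    # first sample, expected 0 (Iris-setosa)
print(predict(w, data[-1, :-1]))   # last sample, expected 1 (Iris-versicolor)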
Now, let's plot the number of misclassified samples in each iteration. We can see that the algorithm converges by the 4th iteration, i.e., all the samples are classified correctly on the 4th pass through the data.
A property of the Perceptron is that if the dataset is linearly separable, then the algorithm is guaranteed to converge at some point!
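One classical way to make this precise is Novikoff's convergence theorem: if every sample satisfies ‖x‖ ≤ R and some weight vector separates the two classes with margin γ, the perceptron makes at most (R/γ)² mistakes before it converges. The flip side also holds: if the data are not linearly separable, the weights never settle and the loop simply runs for all N iterations.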
epochs = np.arange(1, num_iter + 1)
plt.plot(epochs, misclassified_)
plt.xlabel('iterations')
plt.ylabel('misclassified')
plt.show()
Limitations
1. A single-layer perceptron works only if the dataset is linearly separable.
2. The algorithm handles only Binary Classification problems. However, we can extend it to multiclass classification by introducing one perceptron per class, i.e., each perceptron outputs a 0 or 1 signifying whether or not the sample belongs to that class; see the sketch below.
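As a rough sketch of that one-vs-rest idea (assuming integer class labels 0..K−1 in the last column and reusing the perceptron() function above; the helper names are my own):

def train_one_vs_rest(data, num_classes, num_iter):
    # train one binary perceptron per class on "this class vs. the rest" labels
    weights = []
    for c in range(num_classes):
        binary = data.copy()
        binary[:, -1] = np.where(data[:, -1] == c, 1, 0)
        w, _ = perceptron(binary, num_iter)
        weights.append(w)
    return weights

def predict_one_vs_rest(weights, x):
    # pick the class whose perceptron produces the largest activation w · x
    x = np.insert(np.asarray(x).ravel(), 0, 1)
    scores = [np.dot(np.asarray(w).ravel(), x) for w in weights]
    return int(np.argmax(scores))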