Introduction to Machine Learning

CONTENTS

List of topics that we covered in this article:

  • What is machine learning?
  • Why machine learning?
  • Who uses machine learning?
  • Designing a learning system.
  • Supervised learning.
  • Unsupervised learning.
  • Top machine learning algorithms
  • Linear regression.
  • Naïve Bayes classifier.
  • K-means clustering.
  • Support vector machine algorithms.
  • Artificial neural networks.
  • Decision trees.

 

In this article, we cover only the linear regression algorithm in detail, because it is one of the simplest algorithms for predicting future values. We look at how the algorithm works and how it makes predictions.

 

WHY MACHINE LEARNING?

Machine learning is driving many of the changes we see in technology today, and it is being used to do remarkable things. But most of us do not know what machine learning is, why it matters, or how it is used in practice.

What is machine learning?

Machine learning is the ability of a machine to learn without being explicitly programmed. The machine is trained on past data, and the more training data it is given, the better it learns.

Why machine learning?

Machine learning is a sub-branch of artificial intelligence. To see why it matters, consider where it is used today: self-driving cars, Google Assistant and Netflix all use machine learning. Netflix, for example, is reported to have increased its income by 40% by using it.

Who uses machine learning?

Data scientists use machine learning algorithms to analyse data. Their main goal is to find hidden patterns in the data.

 

Next, we will see step by step how a learning system is designed.

Designing a Learning System

  • Choosing the training Experience
  • Choosing target function.
  • Choosing a representation for the target function.
  • Choosing a function approximation algorithm.
  • The final design.

We can design the learning system using the above steps.

 

The two main machine learning methods are supervised and unsupervised learning. Supervised learning is used in most cases (about 90%), and unsupervised learning in the rest.

 

Supervised learning:

In supervised learning, both the input and the expected output are given, and the model is trained using labelled examples.

  • Example: handwritten digit and character recognition.

 

Unsupervised learning:

In the real world, most of the data we collect is unlabelled.

So unsupervised learning plays a vital role in the real world.

Examples: Google News, social network analysis, astronomical data analysis, etc.

Top machine learning algorithms:

  • Linear Regression
  • Naïve Bayes classifier
  • K-means clustering
  • Support vector machine algorithm
  • Artificial neural networks
  • Decision trees

We use these different machine learning algorithms to solve different problems.

Machine learning is becoming a trend in the real world, and many jobs are expected to be replaced by machine learning jobs in the future. Practitioners use tools like ML Studio, Python, MATLAB and R. The algorithms are automated, which reduces human work, and they learn from data: without data we cannot run any of these algorithms. A main goal of machine learning algorithms is to predict the future.

Learning all these algorithms is important, but so is knowing which algorithm suits a given problem. If we choose or implement the wrong algorithm, we will not get the correct outcome. So choosing an algorithm is an important step in solving a problem. Let us look at an example of how an algorithm is chosen for a particular problem.

Suppose we want to predict future G.D.P growth based on past data. We have to look carefully at which algorithm suits this problem.

Here the value to predict is a number that depends on another number, so linear regression is a good fit for the problem. Now we will see what linear regression is and how this algorithm works.

 

Linear regression is a very simple algorithm and is used in many cases.

Consider the following past data set:

 

S.NO   YEAR   POPULATION   G.D.P
1.     2013   4            4.5
2.     2014   5            4.8
3.     2015   8            6.8
4.     2016   7            7.0
5.     2017   6            6.5
6.     2018   9            ?

Totals for rows 1 to 5: POPULATION = 30, G.D.P = 29.6

Using the above data set, we can predict what the G.D.P will be in the year 2018.

Linear regression is well suited to this problem. First we have to know what “regression” means: finding the relationship between a dependent variable and an independent variable.

Before starting the problem, we need to know the basic concepts:

What is correlation?

  • It measures the strength of the relationship between two variables. Its value ranges from -1 to +1.

What is variance?

  • It measures how far the data values are spread out from their mean.

Linear Regression:

  • y = a + bx

where:

‘y’ is the dependent variable and ‘x’ is the independent variable.

‘a’ (the intercept) and ‘b’ (the slope) are constants.

In this problem, ‘x’ is the population and ‘y’ is the G.D.P. To find ‘a’ and ‘b’ we use the two least-squares formulas below, where n is the number of data points:

                              b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)

                              a = (Σy − bΣx) / n

From the table, n = 5, Σx = 30, Σy = 29.6, Σxy = 184.4 and Σx² = 190, so

                              b = (5 × 184.4 − 30 × 29.6) / (5 × 190 − 30²) = 34 / 50 = 0.68

                              a = (29.6 − 0.68 × 30) / 5 = 1.84

By substituting the values of a and b in the linear regression equation we get ‘y’:

                              y = a + bx

                              y = 1.84 + 0.68x

To get the value of ‘y’ for 2018, we substitute the 2018 population value for ‘x’.

                              Here x = 9 (population in 2018)

                              y = 1.84 + 0.68(9)

                              y = 1.84 + 6.12

                              y = 7.96

So the predicted G.D.P in 2018 is 7.96.

Here the calculation was done by hand, but the same steps can be programmed so that the machine performs them automatically.

The data points do not all lie exactly on the regression line, so we measure how well the line fits using the standard error of the estimate:

                              standard error = √( Σ(y − ŷ)² / (n − 2) )

where ŷ is the value predicted by the line. For this data, Σ(y − ŷ)² = 0.924, so the standard error is √(0.924 / 3) ≈ 0.555.

Finally, we have predicted the G.D.P in the year 2018 using the linear regression algorithm.
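The hand calculation can also be written as a short program. The following is a minimal sketch in Python, assuming the standard least-squares formulas for ‘a’ and ‘b’ and a standard error with n − 2 in the denominator. The function names `fit_line` and `standard_error` are made up for this illustration.

```python
import math

# Data from the table above: population (x) and G.D.P (y) for 2013 to 2017.
population = [4, 5, 8, 7, 6]
gdp = [4.5, 4.8, 6.8, 7.0, 6.5]


def fit_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


def standard_error(xs, ys, a, b):
    """Standard error of the estimate (assumes n - 2 degrees of freedom)."""
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))


a, b = fit_line(population, gdp)
prediction_2018 = a + b * 9  # 9 is the population value given for 2018
error = standard_error(population, gdp, a, b)
print(f"y = {a:.2f} + {b:.2f}x")
print(f"predicted G.D.P for 2018 = {prediction_2018:.2f}")
print(f"standard error of the estimate = {error:.3f}")
```

Running the program on the five known rows and comparing its output with the hand calculation is a quick way to catch arithmetic slips.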

 

 

Classification in Machine Learning | Supervised learning algorithm | Unsupervised learning algorithm | Semi-supervised learning algorithm | Reinforced learning algorithm

What is Machine learning?

Machine learning makes a machine learn on its own from experience, using training examples. It is a branch of artificial intelligence. The learning is done using learning algorithms. These analytical algorithms are suited to large datasets, so machine learning can be applied to big data analytics. The main types of learning algorithm are:

  • Supervised learning algorithm
  • Unsupervised learning algorithm
  • Semi-supervised learning algorithm
  • Reinforcement learning algorithm

Supervised learning algorithm:

In this type of learning algorithm, the machine is trained with labeled data. It can then predict the output for given test data. Supervised learning consists of two parts.

  • Classification
  • Regression

Unsupervised learning algorithm:

In this type, the machine is trained with unlabeled data. It cannot predict an output precisely, but it can draw many inferences from the given training data set.

Semi-supervised learning algorithm:

This is a combination of supervised and unsupervised learning. It typically uses a small amount of labeled data together with a large amount of unlabeled data, which can give better accuracy than using the labeled data alone.

Reinforcement learning algorithm:

Trial and error is the most important method in reinforcement learning. The two main components in this type of algorithm are the agent and the environment, and there is constant communication between the two. After each action, the environment sends reward feedback to the agent, which the agent uses to decide its best next move or action. That feedback is also called the reinforcement signal.

 

What is classification?

Classification is one of the core parts of machine learning. It assigns a new item to one of a set of categories (sub-populations). It is done based on training data consisting of records whose category membership is already known. The value we predict is discrete, for example 0 or 1.

For example, to decide whether a person will be given a credit card, we train the machine with a data set containing the average balance in the account, the number of transactions per month, the profession, the CIBIL score (which is calculated from the previous items), etc. If the CIBIL score is more than 750, the person can be approved for the credit card; otherwise not.

To check whether a person is eligible for a credit card, we plot his or her average balance against the number of transactions and calculate the CIBIL score. If the point lies in the region of rejected applicants, the person cannot get the credit card. If it lies in the region of approved applicants, the person can get the credit card.

Applications of classification:

  • Finding the spam emails.
  • In cancer diagnosis.
  • Self-driving cars.
  • Identifying blood groups.

Types of classification techniques:

  • Rule-based classifier
  • Decision trees
  • Naïve Bayes classifier
  • Support vector machines
  • Artificial neural networks
  • Nearest neighbor(KNN)

Decision tree classification:

It builds a decision tree, which consists of at least two nodes. It can handle both numerical and categorical data. The topmost decision node is called the root node. The tree is learned from a class-labeled dataset. The decision tree can be built using many algorithms. Some of them are:

  • ID3.
  • CART.
  • C4.5.
  • CHAID.
  • HUNT’S.

ID3 is the basic algorithm on which several others build. CART (classification and regression trees) is a related algorithm that builds binary trees and can be used for both classification and regression.

CHAID (chi-square automatic interaction detector) is used for the classification of categorical data. It searches for patterns in large amounts of categorical data, and it makes the relationships among the data easy to visualize.

C4.5 is an extension of ID3. It is also called a statistical classifier.

 

Example of a decision tree using Hunt’s algorithm:

 

What is Hunt’s algorithm?

Hunt’s algorithm is one of the decision tree building algorithms. It works as follows:

  1. If all the records at a node belong to the same class, then the node is a leaf node labeled with that class name.
  2. If the records at a node belong to more than one class, then choose an attribute and split the data into smaller subsets. Repeat for each subset.

Example:

PROFESSION   AVERAGE BALANCE   CIBIL SCORE    APPROVED
Doctor       Low               Sufficient     No
Doctor       High              Sufficient     No
Doctor       High              Insufficient   No
Software     Low               Sufficient     No
Software     Low               Insufficient   No
Software     High              Insufficient   Yes
Software     High              Sufficient     Yes
Business     Low               Sufficient     Yes
Business     High              Insufficient   Yes
Business     High              Sufficient     Yes

 

Given a dataset like this, we can perform decision tree classification using any of the algorithms. Here I'm using Hunt's algorithm.

To build the decision tree we have to calculate the entropy and the information gain of every attribute.

Entropy: entropy measures the impurity of a data set, that is, how mixed the class labels are. It is 0 when all records belong to one class and largest when the classes are equally mixed.

    Entropy(S) = − Σ p_i log2(p_i)

where p_i is the fraction of records in S that belong to class i.

Information gain: it is the difference between the entropy of the parent and the weighted entropy of the subsets produced by splitting on the attribute being considered.

    Gain(S, A) = Entropy(S) − Σ (|S_v| / |S|) × Entropy(S_v)

where S_v is the subset of S in which attribute A has the value v.
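The two measures above can be computed for the credit card table with a few lines of code. This is a minimal sketch in Python that assumes the usual base-2 entropy. The names `entropy`, `information_gain` and the attribute keys are chosen for this illustration only.

```python
from collections import Counter
from math import log2

# Credit card table from the example above:
# (profession, average balance, CIBIL score, approved)
ROWS = [
    ("Doctor", "Low", "Sufficient", "No"),
    ("Doctor", "High", "Sufficient", "No"),
    ("Doctor", "High", "Insufficient", "No"),
    ("Software", "Low", "Sufficient", "No"),
    ("Software", "Low", "Insufficient", "No"),
    ("Software", "High", "Insufficient", "Yes"),
    ("Software", "High", "Sufficient", "Yes"),
    ("Business", "Low", "Sufficient", "Yes"),
    ("Business", "High", "Insufficient", "Yes"),
    ("Business", "High", "Sufficient", "Yes"),
]
ATTRIBUTES = {"profession": 0, "average_balance": 1, "cibil_score": 2}


def entropy(rows):
    """Entropy of the class label (last column) over the given rows."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(rows, column):
    """Parent entropy minus the weighted entropy of the subsets after the split."""
    subsets = {}
    for row in rows:
        subsets.setdefault(row[column], []).append(row)
    weighted = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(rows) - weighted


gains = {name: information_gain(ROWS, col) for name, col in ATTRIBUTES.items()}
best_attribute = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("split on:", best_attribute)
```

The attribute with the largest gain becomes the root node. The same two functions can then be applied to each subset that still contains more than one class.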


Sub-table (records with PROFESSION = Software, the only profession whose records are not all in one class):

PROFESSION   AVERAGE BALANCE   CIBIL SCORE    APPROVED
Software     Low               Sufficient     No
Software     Low               Insufficient   No
Software     High              Insufficient   Yes
Software     High              Sufficient     Yes


 

Rule-based classifier:

In this method, classification is done using IF-THEN rules. It is the simplest form of classification, as the rules can be derived directly from the dataset or from decision trees, neural networks, etc.

The main parts of a rule are:

  • Antecedent: IF part is called the antecedent.
  • Consequent: THEN part is called the consequent.

Example:

PROFESSION   AVERAGE BALANCE   CIBIL SCORE    APPROVED
Doctor       Low               Sufficient     No
Doctor       High              Sufficient     No
Doctor       High              Insufficient   No
Software     Low               Sufficient     No
Software     Low               Insufficient   No
Software     High              Insufficient   Yes
Software     High              Sufficient     Yes
Business     Low               Sufficient     Yes
Business     High              Insufficient   Yes
Business     High              Sufficient     Yes

 

R1) IF profession = SOFTWARE AND average balance = HIGH, THEN credit card can be approved.

R2) IF profession = DOCTOR AND CIBIL score = SUFFICIENT, THEN credit card cannot be approved.

Here we are using two attributes in each rule to reach the conclusion. We may use any number of attributes to make a rule.

Characteristics of rule-based classifiers:

  1. Mutually exclusive.
  2. Mutually exhaustive.

Mutually exclusive:

The rules are mutually exclusive if no two rules are triggered by the same statement (record). The rules derived from the dataset must differ from one another, so that every statement is covered by at most one rule.

                                                     R1 ∩ R2 = ∅

 

Mutually exhaustive:

This is the complement of the previous property. The rule set is exhaustive if there is a rule for every combination of attribute values in the given data set, so that every statement is covered by at least one rule. Overlap between rules is accepted here.

Example: we can make the rules using the following attribute combinations,

  1. Profession & average balance
  2. CIBIL score & average balance
  3. Profession & CIBIL score
  4. Profession & average balance & CIBIL score

Now we have a number of rules at hand. If a statement triggers more than one rule at a time, conflicts arise. So we need conflict resolution strategies for rule-based classifiers.

Conflict resolution strategies:

  • Size-ordering scheme.
  • Rule-ordering scheme.

Size-ordering scheme:

If a statement triggers several rules at once, the size-ordering scheme decides which rule should be activated based on the number of attributes used in each rule.

The rule that tests the most attributes is triggered first, and the rule that tests the fewest is triggered last.

Example:

R1 consists of 2 attributes.

R2 consists of 4 attributes.

R3 consists of 7 attributes.

The incoming statement will trigger R3 first, then R2 if necessary, and then R1.

Rule-ordering scheme:

It consists of two sub-categories, which decide which rule should be triggered.

  • Class-based
  • Rule-based

 

Class-based:

In this type, the incoming statement is checked against a number of classes, and the rule of the matching class is triggered first.

Example:

class1 is from 20 to 30 (20 <= value < 30)

class2 is from 30 to 50 (30 <= value < 50)

class3 is from 50 upwards (value >= 50)

If the statement contains the value 33, class2 will be triggered.

Rule-based:

In this type, the incoming statement is matched against the rules in the order of the priority given to each rule.

Example (rule → priority, where 1 is the highest priority):

Rule1 → 2

Rule2 → 3

Rule3 → 1

Rule4 → 4

If the incoming statement contains the attributes that are present in both rule3 and rule1, then rule3 will be triggered because it has the higher priority.
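The two conflict resolution strategies can be sketched in a few lines. This is only an illustration in Python: the three rules, their priorities and the function names are hypothetical, and priority 1 is taken to be the highest, as in the example above.

```python
# Hypothetical rules: "if" maps attribute -> required value.
RULES = [
    {"name": "R1", "if": {"profession": "Software", "average_balance": "High"},
     "then": "Yes", "priority": 2},
    {"name": "R2", "if": {"cibil_score": "Sufficient"},
     "then": "No", "priority": 1},
    {"name": "R3", "if": {"profession": "Doctor", "average_balance": "Low",
                          "cibil_score": "Sufficient"},
     "then": "No", "priority": 3},
]


def triggered(record, rules):
    """Return the rules whose every condition matches the record."""
    return [r for r in rules
            if all(record.get(attr) == value for attr, value in r["if"].items())]


def size_ordering(record, rules):
    """Among the triggered rules, pick the one that tests the most attributes."""
    matches = triggered(record, rules)
    return max(matches, key=lambda r: len(r["if"])) if matches else None


def rule_ordering(record, rules):
    """Among the triggered rules, pick the one with the best priority (1 is highest)."""
    matches = triggered(record, rules)
    return min(matches, key=lambda r: r["priority"]) if matches else None


record = {"profession": "Software", "average_balance": "High",
          "cibil_score": "Sufficient"}
by_size = size_ordering(record, RULES)
by_priority = rule_ordering(record, RULES)
print("size-ordering picks:", by_size["name"])
print("rule-ordering picks:", by_priority["name"])
```

Here the record triggers both R1 and R2, and the two schemes resolve the conflict differently, which is why the scheme has to be fixed before the classifier is used.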

Approaches for rule-based classifier:

  • Direct method
  • Indirect method

Direct method:

In this method, we infer the rules directly from the given data set. We make the rules from combinations of the attributes in the data set.

 

Sequential covering algorithm:

This is the algorithm used in the direct method to infer the rules.

 

// Initially make the rule set empty

Rule-set = { }

// For every class C, learn rules with the LEARN-ONE-RULE algorithm

for each class C do

                        while the dataset still contains records of class C do

                                    Rule = LEARN-ONE-RULE(dataset, attributes, C)

                                    // Remove the records covered by the new rule, so that the next rule is learned from the remaining records

                                    Remove records covered by Rule from dataset

                                    Rule-set = Rule-set + Rule

                        end while

end for

return Rule-set

 

What is inside LEARN-ONE-RULE?

  1. Consider one class.
  2. Start with an empty rule and check it against the training data.
  3. Find the attribute test that most increases the accuracy of the current rule.
  4. Append that attribute test to the current rule, and repeat from step 3 until the accuracy stops improving.
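The steps above can be sketched as a small program. This is a simplified illustration in Python, not a full implementation: it learns rules for a single target class, it removes the records covered by each new rule (the usual formulation of sequential covering), and it uses plain accuracy to choose attribute tests. All function names are made up for this sketch.

```python
ATTRIBUTES = ["profession", "average_balance", "cibil_score"]
TABLE = [
    ("Doctor", "Low", "Sufficient", "No"),
    ("Doctor", "High", "Sufficient", "No"),
    ("Doctor", "High", "Insufficient", "No"),
    ("Software", "Low", "Sufficient", "No"),
    ("Software", "Low", "Insufficient", "No"),
    ("Software", "High", "Insufficient", "Yes"),
    ("Software", "High", "Sufficient", "Yes"),
    ("Business", "Low", "Sufficient", "Yes"),
    ("Business", "High", "Insufficient", "Yes"),
    ("Business", "High", "Sufficient", "Yes"),
]
ROWS = [dict(zip(ATTRIBUTES + ["approved"], t)) for t in TABLE]


def covers(rule, row):
    """A rule is a dict of attribute -> value; it covers a row if all tests match."""
    return all(row[attr] == value for attr, value in rule.items())


def accuracy(rule, rows, target):
    """Fraction of the rows covered by the rule that belong to the target class."""
    covered = [r for r in rows if covers(rule, r)]
    if not covered:
        return 0.0
    return sum(r["approved"] == target for r in covered) / len(covered)


def learn_one_rule(rows, attributes, target):
    """Greedily add the attribute test that most improves the rule's accuracy."""
    rule, best = {}, accuracy({}, rows, target)
    while best < 1.0:
        candidates = [{**rule, attr: value}
                      for attr in attributes if attr not in rule
                      for value in sorted({r[attr] for r in rows})]
        if not candidates:
            break
        top = max(candidates, key=lambda c: accuracy(c, rows, target))
        top_accuracy = accuracy(top, rows, target)
        if top_accuracy <= best:
            break
        rule, best = top, top_accuracy
    return rule


def sequential_covering(rows, attributes, target):
    """Learn rules one at a time, removing the records each new rule covers."""
    rule_set, remaining = [], list(rows)
    while any(r["approved"] == target for r in remaining):
        rule = learn_one_rule(remaining, attributes, target)
        if not rule:  # no attribute test improves on the empty rule
            break
        rule_set.append(rule)
        remaining = [r for r in remaining if not covers(rule, r)]
    return rule_set


rules = sequential_covering(ROWS, ATTRIBUTES, "Yes")
for rule in rules:
    print("IF", rule, "THEN approved = Yes")
```

Choosing tests by accuracy alone can favour rules that cover very few records, so practical implementations usually combine accuracy with coverage.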

Indirect method:

In this method, we infer the rules from other models, such as:

  • Neural networks.
  • Decision trees.
  • Perceptron models.

Consider decision trees:

 

[Figure: decision tree diagram]

 

If the statement passes through rule1 and rule4 in the above diagram,

            IF (R1 & R4)

                        THEN true.

If the statement passes through rule1, rule3 and rule7 in the above diagram,

            IF (R1 & R3 & R7)

                        THEN true.

The indirect method proceeds like this for every path in the tree.

Rule pruning:

The central idea of rule pruning is to remove the unnecessary rules from the rule set and keep only the rules that contribute to the classification.

Example:

Suppose the classifier actually uses

            R1, R2, R4, R6, R9 only.

But we have R1, R2, R3, R4, R5, R6, R7, R8, R9. We have to remove the remaining rules and keep only the contributing rules.

So, the final rule-set will be R1, R2, R4, R6, R9.

It is mainly used by the C4.5 algorithm, which uses class-based ordering when it generates rules from the decision tree.

Quality measures of rule-based classifiers:

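  1. Coverage:

It is the ratio of the number of tuples covered by the rule to the total number of tuples.

    Coverage = (number of tuples covered by the rule) / (total number of tuples)

Example:

If a rule covers 2 tuples out of 10 tuples, then

2/10 = 20% is the coverage.

 

  2. Accuracy:

It is the ratio of the number of covered tuples that the rule classifies correctly to the number of tuples it covers.

    Accuracy = (number of tuples correctly classified by the rule) / (number of tuples covered by the rule)

Example:

If the rule classifies 4 tuples correctly out of the 4 it covers, then

4/4 = 100% accuracy.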
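As an illustration, the two measures can be computed for rule R1 (profession = Software and average balance = High, then approved) on the credit card table. This is a minimal Python sketch, and the function name `rule_quality` is made up for it.

```python
# Credit card table: (profession, average balance, CIBIL score, approved)
TABLE = [
    ("Doctor", "Low", "Sufficient", "No"),
    ("Doctor", "High", "Sufficient", "No"),
    ("Doctor", "High", "Insufficient", "No"),
    ("Software", "Low", "Sufficient", "No"),
    ("Software", "Low", "Insufficient", "No"),
    ("Software", "High", "Insufficient", "Yes"),
    ("Software", "High", "Sufficient", "Yes"),
    ("Business", "Low", "Sufficient", "Yes"),
    ("Business", "High", "Insufficient", "Yes"),
    ("Business", "High", "Sufficient", "Yes"),
]


def rule_quality(rows, condition, predicted):
    """Return (coverage, accuracy) of the rule 'IF condition THEN predicted'."""
    covered = [row for row in rows if condition(row)]
    correct = [row for row in covered if row[-1] == predicted]
    coverage = len(covered) / len(rows)
    accuracy = len(correct) / len(covered) if covered else 0.0
    return coverage, accuracy


# R1: IF profession = Software AND average balance = High THEN approved = Yes
coverage, accuracy = rule_quality(
    TABLE, lambda row: row[0] == "Software" and row[1] == "High", "Yes")
print(f"coverage = {coverage:.0%}, accuracy = {accuracy:.0%}")
```

R1 covers 2 of the 10 records and both are approved, so the sketch reports 20% coverage and 100% accuracy.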

 

Naïve-Bayes classifier:

It is a classification technique used when we are given a data set and asked for the class label of a new record. We find the class label using Bayes' theorem, assuming that the attributes are independent of each other within each class.

                                        P(A|B) = P(B|A) × P(A) / P(B)

P(A|B) = probability of A given that B has happened (B is known to be true).

P(B|A) = likelihood term, the probability of B given A.

P(A) = probability of A happening.

P(B) = probability of B happening.

Example:

PROFESSION   AVERAGE BALANCE   CIBIL SCORE    APPROVED
Doctor       Low               Sufficient     No
Doctor       High              Sufficient     No
Doctor       High              Insufficient   No
Software     Low               Sufficient     No
Software     Low               Insufficient   No
Software     High              Insufficient   Yes
Software     High              Sufficient     Yes
Business     Low               Sufficient     Yes
Business     High              Insufficient   Yes
Business     High              Sufficient     Yes

 

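If we are given a new record and asked to find its class label,

PROFESSION   AVERAGE BALANCE   CIBIL SCORE    APPROVED
Software     High              Sufficient     ?

we compare, for each class, the prior probability of the class multiplied by the probability of each attribute value within that class:

    Yes: P(Yes) × P(Software|Yes) × P(High|Yes) × P(Sufficient|Yes) = 5/10 × 2/5 × 4/5 × 3/5 = 0.096

    No: P(No) × P(Software|No) × P(High|No) × P(Sufficient|No) = 5/10 × 2/5 × 2/5 × 3/5 = 0.048

The value for Yes is larger. So, the class label is

PROFESSION   AVERAGE BALANCE   CIBIL SCORE    APPROVED
Software     High              Sufficient     Yes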
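The calculation for the new record can be sketched in code. This is a minimal Python illustration that counts frequencies directly from the table and applies no smoothing. The function name `naive_bayes_scores` is made up for this example.

```python
# Credit card table: (profession, average balance, CIBIL score, approved)
TABLE = [
    ("Doctor", "Low", "Sufficient", "No"),
    ("Doctor", "High", "Sufficient", "No"),
    ("Doctor", "High", "Insufficient", "No"),
    ("Software", "Low", "Sufficient", "No"),
    ("Software", "Low", "Insufficient", "No"),
    ("Software", "High", "Insufficient", "Yes"),
    ("Software", "High", "Sufficient", "Yes"),
    ("Business", "Low", "Sufficient", "Yes"),
    ("Business", "High", "Insufficient", "Yes"),
    ("Business", "High", "Sufficient", "Yes"),
]


def naive_bayes_scores(rows, query):
    """For each class, return P(class) times the product of P(value | class)."""
    scores = {}
    for label in sorted({row[-1] for row in rows}):
        in_class = [row for row in rows if row[-1] == label]
        score = len(in_class) / len(rows)  # prior P(class)
        for position, value in enumerate(query):
            matches = sum(1 for row in in_class if row[position] == value)
            score *= matches / len(in_class)  # likelihood P(value | class)
        scores[label] = score
    return scores


scores = naive_bayes_scores(TABLE, ("Software", "High", "Sufficient"))
predicted = max(scores, key=scores.get)
print(scores)
print("class label:", predicted)
```

The scores are not divided by P(B), because that term is the same for every class and does not change which class has the larger value.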

 

Support vector machines:

An SVM classifies a new item into its respective class. It is a discriminative classifier. It returns a hyperplane as its output, and items are categorized according to which side of the hyperplane they fall on. The data points of each class that lie closest to the hyperplane are called support vectors.

The distance from the hyperplane to the support vectors must be maximized.

[Figure: hyperplane separating two classes, with the support vectors on either side]

The distance between the hyperplane and the support vectors is called the margin.

 

The hyperplane is found by solving a quadratic optimization problem, usually with built-in library functions.

The above diagram shows points in 2D that a straight line can separate. If the points cannot be separated by a straight line in 2D, we need to map them into a 3D plot, separate them there, and map the result back to the 2D plot.

We convert the points into the y-z plane so that they are divided into different classes. This can be done with the SKLEARN library.

After converting:

[Figure: the points after conversion, separated by a hyperplane]
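The idea of converting 2D points into 3D can be illustrated without any library. The sketch below is in plain Python and is not an SVM solver: the points are hypothetical, and the mapping z = x² + y² is chosen by hand because one class surrounds the other. It only shows why a flat boundary becomes possible after the conversion.

```python
# Hypothetical 2D points: class 0 lies near the origin, class 1 surrounds it,
# so no straight line in the x-y plane can separate them.
inner = [(0.5, 0.0), (-0.3, 0.4), (0.0, -0.6), (0.2, 0.2)]
outer = [(2.0, 0.0), (-1.5, 1.5), (0.0, -2.2), (1.8, -1.0)]


def lift(point):
    """Map (x, y) to (x, y, z) with z = x^2 + y^2."""
    x, y = point
    return (x, y, x * x + y * y)


inner_z = [lift(p)[2] for p in inner]
outer_z = [lift(p)[2] for p in outer]

# Any plane z = c with max(inner_z) < c < min(outer_z) separates the classes.
# The midpoint is equally far, along the z axis, from the closest point of each class.
threshold = (max(inner_z) + min(outer_z)) / 2


def classify(point):
    """Return 1 if the lifted point is above the plane z = threshold, else 0."""
    return 1 if lift(point)[2] > threshold else 0


print("separating plane: z =", threshold)
print([classify(p) for p in inner], [classify(p) for p in outer])
```

A library such as scikit-learn performs a comparable mapping implicitly through its kernel functions and finds the maximum-margin hyperplane automatically.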

How to know where the hyperplane can be located?

Example:

[Figure: scanned worked example]

 

Artificial neural network:

An artificial neural network is a simulation of biological neurons. Human brains are capable of processing information and making instantaneous decisions in critical situations. Artificial neural networks simulate the human brain, with the natural neurons replaced by artificial neurons.

An artificial neural network takes only numerical data, not categorical data, whereas a decision tree accepts both numerical and categorical data.

The basic structure of an artificial neural network:

 

 

Input layer:

This layer accepts the input values given by the user and passes them to the hidden layer.

Hidden layer:

It computes a real-valued output based on the weighted inputs.

Output layer:

The output from the hidden layer is the input to the output layer, which computes the output of the neural network.

The weighted edges between the nodes are also called synapses (→).

The synapses represent the knowledge gained by the neurons.

 

There are two types of neural networks:

  • Feedforward network.
  • Feedback network.

Feedforward neural network:

This type of network proceeds from left to right, and no feedback is given to the input layer. It is again divided into two types.

  • Fully connected neural networks. In this, every neuron in one layer is connected to every neuron in the next layer.
  • Partially connected neural networks. In this, only some neurons are connected to the neurons in the other layers.

Feedback neural network:

This type of network provides feedback to the input layer, so that the weights can be adjusted according to it and the error can be corrected.

Backpropagation neural network:

This neural network assumes that every neuron is divided into two parts.

  • ∑ → It represents the weighted sum of the inputs to the neuron.

∑ = x1w1 + x2w2 + x3w3 + …

where

x = input given to the input layer.

w = net weight (edge weight) from one neuron to another neuron in a different layer.

  • Activation function → It is applied to the weighted sum and determines the output of the neuron.

What is the activation function?

It defines the output of a node from its weighted sum ∑. There are several activation functions:

  1. Sigmoidal function.
  2. Step function.
  3. Signum function.
  4. Linear function.

Sigmoidal function:

    f(x) = 1 / (1 + e^(−x))

The graph of the sigmoidal function is an S-shaped curve that rises from 0 to 1.

Step function (in its common form, with the threshold at 0):

    f(x) = 1 if x ≥ 0, otherwise 0

The graph of the step function jumps from 0 to 1 at the threshold.

Signum function:

    f(x) = +1 if x > 0, 0 if x = 0, −1 if x < 0

The graph of the signum function jumps from −1 to +1 at x = 0.

Linear function:

    f(x) = x

The graph of the linear function is a straight line through the origin.
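The two parts of a neuron, the weighted sum and the activation function, can be sketched as follows. This is a minimal Python illustration using the common textbook forms of the four functions. The inputs, weights and the default threshold of 0 are hypothetical values chosen for the example.

```python
import math


def weighted_sum(inputs, weights):
    """Summation part of the neuron: x1*w1 + x2*w2 + x3*w3 + ..."""
    return sum(x * w for x, w in zip(inputs, weights))


def sigmoid(s):
    """Smooth S-shaped output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-s))


def step(s, threshold=0.0):
    """1 when the sum reaches the threshold, otherwise 0."""
    return 1 if s >= threshold else 0


def signum(s):
    """+1 for positive sums, -1 for negative sums, 0 for a sum of zero."""
    return (s > 0) - (s < 0)


def linear(s):
    """Passes the sum through unchanged."""
    return s


# Hypothetical inputs and edge weights for a single neuron.
inputs = [1.0, 0.5, -1.0]
weights = [0.4, 0.6, 0.2]

s = weighted_sum(inputs, weights)
for activation in (sigmoid, step, signum, linear):
    print(activation.__name__, activation(s))
```

The same weighted sum gives a different output under each activation function, which is why the choice of function shapes the behavior of the network.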

 

Backpropagation neural network example:

[Figure: scanned worked example]

 

When to consider the neural networks?

  • When the input is raw data (directly from the sensors).
  • When the data is noisy.
  • When the output is discrete or real-valued.
  • When the target function is unknown.

KNN classifier:

It is a classification algorithm that assigns a new item to a class by looking at the K training items nearest to it. The new item takes the class that is most common among those K nearest neighbors.

Example:

[Figure: red circles, blue triangles and a new star-shaped item]

The above diagram consists of two classes: one is represented by red circles and the other by blue triangles. The new item, the star, has to be assigned to either the circle class or the triangle class.

For that, we draw a circle around the star that contains its K nearest neighbors.

Consider the K value as 5.

So, draw a circle that encloses the 5 nearest neighbors.

Among them, count the votes for each class.

       Circle → 3 votes

       Triangle → 2 votes

The circle class has more votes than the triangle class, so the star is assigned to the circle class.
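The voting procedure can be sketched in a few lines. This is a minimal Python illustration using Euclidean distance. The coordinates are hypothetical and were chosen so that, with K = 5, the votes come out as 3 for the circle class and 2 for the triangle class, as in the example.

```python
import math
from collections import Counter

# Hypothetical training points: (x, y, class).
training = [
    (1.0, 1.0, "circle"), (1.5, 2.0, "circle"), (2.0, 1.5, "circle"),
    (0.0, 4.0, "circle"),
    (3.0, 3.0, "triangle"), (3.5, 2.5, "triangle"), (5.0, 5.0, "triangle"),
]


def knn_classify(point, training, k):
    """Return the majority class among the k nearest neighbors, and the vote counts."""
    by_distance = sorted(training, key=lambda t: math.dist(point, t[:2]))
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0], votes


star = (2.2, 2.2)  # the new item to classify
label, votes = knn_classify(star, training, k=5)
print(dict(votes))
print("the star is assigned to:", label)
```

Every prediction sorts all training points by distance, which is the cost referred to in the limitations below.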

Limitation:

  • K must not be a multiple of C (the number of classes).
  • With two classes, K must be an odd number; if it is even, there is a chance of an equal number of votes for each class.
  • The time complexity of classifying a new item is much higher than for the other classifiers, because the distance to every training item has to be computed.
