AI and ML Terms

Figure: the relationship between AI, machine learning, deep learning, and related concepts. Taken from the Deep Learning book by Goodfellow, Bengio, and Courville.


Artificial General Intelligence (AGI):

AGI is the stuff of imagined future AIs: Skynet- or HAL 9000-style machines and agents whose skills and capabilities equal or surpass human capabilities, not just in a particular or singular domain, but across a wide range of domains and modalities. What was historically referred to as AI is now more commonly called AGI, largely because the term AI has come to be applied broadly to systems that operate in particular domains.


Artificial Intelligence (AI):

The premise behind AI was that any aspect of (human) learning or intelligence could be “so precisely described that a machine can be made to simulate it” [1]. While the term “artificial intelligence” might conjure images of science fiction, it can accurately describe many systems that learn, adapt and self-improve in response to inputs.

Data Lake:

A Data Lake is a means of storing raw data from one or more sources, regardless of its structure or lack thereof. The data is kept in its originating format to be queried later.


Deep Learning:

Deep learning refers to neural networks that can learn increasingly abstract representations of inputs by making use of successive layers of neurons. Although the word “deep” suggests many layers, there is little consensus on what that actually means: some state-of-the-art networks are only a few layers deep, while others have hundreds of layers. Additionally, networks can be thought of as deep in space (i.e., many layers stacked one after the other) or deep in time (i.e., a layer that has recurrent connections so that it can process its historical input).



Heuristic:

A heuristic generally refers to a non-optimal yet relatively easy and practical solution to a problem, one that can be improved upon by more sophisticated methods. For example, classifying a binary based upon the series of system calls it makes when executed is a type of heuristic detection.
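As a purely illustrative sketch, a syscall-based heuristic might look like the following. The system-call names and the “suspicious” sequence are invented for illustration, not taken from any real detector:

```python
# Hypothetical heuristic: flag a binary as suspicious if its system-call
# trace contains, in order, a sequence often associated with process
# injection. Call names and the sequence are invented for illustration.
SUSPICIOUS_SEQUENCE = ["OpenProcess", "VirtualAllocEx",
                       "WriteProcessMemory", "CreateRemoteThread"]

def looks_suspicious(syscalls):
    """Return True if SUSPICIOUS_SEQUENCE occurs (in order) within the trace."""
    remaining = iter(syscalls)  # membership tests consume the iterator,
    return all(call in remaining for call in SUSPICIOUS_SEQUENCE)  # so order matters

trace = ["NtQueryInformation", "OpenProcess", "VirtualAllocEx",
         "WriteProcessMemory", "CreateRemoteThread", "NtClose"]
print(looks_suspicious(trace))          # True
print(looks_suspicious(["NtClose"]))    # False
```

Like any heuristic, this is cheap but easy to evade; a model learned over full traces could improve on it.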


Machine Learning:

Machine learning refers to a program’s ability to recognize patterns and to abstract specific examples into generalized knowledge. While humans are good at pattern recognition, it is difficult to define rules for a set of pixels, speech waveforms, or traffic patterns that distinguish one object from another.

As shown in the figure below (from Oliver Selfridge’s 1955 article, Pattern Recognition and Modern Computers), the same inputs can lead to different outputs depending on the context. In the figure, the H in THE and the A in CAT are identical in terms of pixels, but their interpretation as an H or an A relies on the surrounding letters.


Mechanical Turk:

A Mechanical Turk draws its name from an 18th-century automaton designed to play chess at human levels of game-play. Despite appearing to be a mechanical system, it was later revealed to be a hoax, wherein a human hidden inside the machine made it appear as though the machine was playing autonomously.

Today, Mechanical Turks refer to the use of humans to respond to tasks, both to generate labeled data about how best to respond and to give the appearance of autonomous AI. Companies have often used Mechanical Turks to generate sufficiently large datasets of labeled data to train their AI systems.


Neural Networks:

Artificial neural network models are inspired by the biological circuitry that makes up the human brain. At a high level, they make use of “neurons” which are connected to each other by a series of synaptic “weights”. Initially, the weights are selected randomly, which means that the neural network will respond to inputs with random outputs. However, as new inputs are presented, the weights of the model change in order to reduce the errors made by the network.

Though various learning rules exist to train a neural network, at its most basic the learning can be thought of as follows:

  • A neural network is presented with some input, and activity propagates throughout its series of interconnected neurons, until reaching a set of output neurons.
  • These output neurons determine the kind of prediction that the network makes. For example, if we want the network to recognize hand-written digits, we could have 10 output neurons in the network, one for each of the digits 0 through 9, where the neuron with the highest activity in response to an image of a digit denotes the prediction of which digit was seen.
  • At first, the weights between the neurons will all be set to random values, so the first predictions about which digit is in an image will be random.
  • As each image is presented, however, the weights can be adjusted so that the next time the network sees a similar image, it will be more likely to output the correct answer.
  • By adjusting the weights in this manner, a neural network can learn which features and representations are relevant for correctly predicting the class of the image, rather than requiring this knowledge to be predetermined by hand.
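The steps above can be sketched as a toy single-neuron “network” learning the OR function with the classic perceptron rule. This is a minimal illustration of random initialization and error-driven weight updates, not a full multi-layer network:

```python
import random

# Toy single-neuron network: weights start random, and each wrong
# prediction nudges them toward the correct answer (perceptron rule).
random.seed(1)
weights = [random.uniform(-1, 1), random.uniform(-1, 1)]
bias = random.uniform(-1, 1)

# Inputs and targets for the OR function.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

def predict(x):
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

for _ in range(30):                          # a few passes over the data
    for x, target in data:
        error = target - predict(x)          # 0 when the prediction is right
        weights = [w + 0.1 * error * xi for w, xi in zip(weights, x)]
        bias += 0.1 * error

print([predict(x) for x, _ in data])         # [0, 1, 1, 1] after training
```

The network is never told which weights to use; it only ever sees its own errors, which is exactly the loop described in the bullets above.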

A great example of a trained network in action is Google’s Quick Draw.


No Free Lunch (theorem):

 “There is no single algorithm that works best on all problems”. [2] Although the technical form of this theorem is far more nuanced, this is how it is most widely (and colloquially) used.


Reinforcement Learning:

Reinforcement learning refers to learning paradigms in which an agent takes a sequence of actions, each of which may change the state of the environment; rather than receiving explicit feedback about the correct action to take at each step, the agent only receives general reward signals about how well it is performing.

Video games can offer great examples of reinforcement learning: the program observes the state of the environment and then decides to take an action. In video chess the program observes the state of the board and then chooses which piece to move. In a 2D action-adventure game the program observes the pixels on screen and then chooses an action, such as whether to attack an enemy.
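A tiny, purely illustrative version of this reward-driven loop is a two-armed bandit. The actions and payout probabilities below are invented; the point is that the agent never sees the “correct” action, only rewards:

```python
import random

# Epsilon-greedy agent: mostly exploit the action with the best running
# reward estimate, occasionally explore a random action. The payout
# table is hidden from the agent and invented for illustration.
random.seed(42)
payout = {"left": 0.2, "right": 0.8}        # true reward probabilities
value = {"left": 0.0, "right": 0.0}         # the agent's estimates
counts = {"left": 0, "right": 0}

for _ in range(2000):
    if random.random() < 0.1:                           # explore
        action = random.choice(["left", "right"])
    else:                                               # exploit
        action = max(value, key=value.get)
    reward = 1 if random.random() < payout[action] else 0
    counts[action] += 1
    value[action] += (reward - value[action]) / counts[action]  # running mean

print(max(value, key=value.get))            # the agent settles on "right"
```

No step ever tells the agent “right was correct”; the preference emerges from the reward statistics alone.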


Semi-supervised Learning:

Semi-supervised learning refers to situations in which some of the data has labels and some does not. There are a number of techniques specific to semi-supervised learning that make it more than a simple combination of unsupervised and supervised learning.



Signature:

A signature is a specific pattern in data. In network security, this may be a specific byte sequence in a packet or a specific domain. Signatures may be explicit, such as baddomain[.]com, or abstracted, such as bad{1,2}omain[.]com.
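The abstracted example above reads naturally as a regular expression. A small sketch with Python’s re module (note that the [.] brackets in the text are defanging; in an actual pattern the dot is escaped):

```python
import re

# "bad{1,2}omain" matches one or two d's, so a single abstracted
# signature covers both badomain.com and baddomain.com.
signature = re.compile(r"bad{1,2}omain\.com")

for domain in ["badomain.com", "baddomain.com", "gooddomain.com"]:
    print(domain, bool(signature.search(domain)))
# badomain.com True, baddomain.com True, gooddomain.com False
```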

Supervised Learning:

Supervised learning refers to situations in which each instance of the input data is accompanied by pre-determined labels. When the labels are a set of (finite) discrete categories, the learning task will often be referred to as a classification problem, and when the targets are one (or more) continuous variables the task is called regression.

Classification tasks would include predicting what objects are in an image, whether a given email is spam or legitimate, or which category of malware describes a binary file.

A regression task, on the other hand, might include trying to predict the power consumption of a server on a given day, or how many security incidents a given host on a network will give rise to in a certain period of time.
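A minimal sketch of the two flavors, using invented toy data: a 1-nearest-neighbour rule for the discrete (classification) case and a least-squares slope for the continuous (regression) case.

```python
# Classification: labels are discrete categories ("spam" / "ham").
# Features and labels here are invented for illustration.
labeled = [((1.0, 8.0), "spam"), ((1.2, 7.5), "spam"), ((6.0, 1.0), "ham")]

def classify(x):
    """Predict the label of the closest labeled example (1-nearest-neighbour)."""
    return min(labeled,
               key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

print(classify((1.1, 7.8)))   # "spam"

# Regression: the target is a continuous value (e.g. power draw).
xs, ys = [1, 2, 3, 4], [2.0, 4.1, 6.0, 8.1]   # roughly y = 2x
n = len(xs)
slope = (n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / (
        n * sum(x * x for x in xs) - sum(xs) ** 2)
print(round(slope, 1))        # 2.0
```

In both cases the labels/targets are given up front; that is what makes the learning “supervised”.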



Training:

“Training” refers to the use of a dataset to adapt a model so that it performs better according to some objective function.


Unsupervised Learning:

In contrast to supervised methods, unsupervised learning refers to scenarios in which an algorithm or agent must learn from raw data alone, without any pre-determined label. Often, this means learning to group together similar examples in the data (a task known as clustering).

Clustering can be used to determine groups of machines in a network that are similar to one another based on features like the number of internal vs external hosts that they initiate connections to and the numbers of hosts that initiate connections with them.

Alternatively, unsupervised methods can be used for anomaly detection by learning about the properties and statistics of “normal” traffic on a network, so that network connections that deviate too far from the norm can be labeled as anomalous.
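A minimal sketch of that anomaly-detection idea, with invented byte counts standing in for “normal” connections: learn the normal mean and spread, then flag anything far outside them.

```python
import statistics

# Learn the statistics of "normal" connection sizes (invented data),
# then flag connections more than three standard deviations away.
normal_bytes = [500, 520, 480, 510, 495, 505, 490, 515]
mu = statistics.mean(normal_bytes)
sigma = statistics.stdev(normal_bytes)

def is_anomalous(n_bytes):
    return abs(n_bytes - mu) > 3 * sigma

print(is_anomalous(505))     # False: within the normal range
print(is_anomalous(50_000))  # True: far outside it
```

Note that no connection was ever labeled “anomalous” up front; the boundary comes entirely from the statistics of the raw data, which is what makes this unsupervised.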



Validation:

“Validation” refers to the use of a set of data, distinct from the training and testing data, to determine the best ‘hyperparameters’ for a model. Hyperparameters include things like the learning rate, model size, etc.
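A minimal sketch with an invented 1-D dataset: candidate values of a “hyperparameter” (here, a simple decision threshold) are compared by their accuracy on held-out validation data, and the best one is kept.

```python
# (value, label) pairs; in a real workflow a model would be fit on the
# training split, and the validation split used only for comparison.
train      = [(1.0, 0), (1.5, 0), (2.0, 0), (5.5, 1), (6.0, 1), (7.0, 1)]
validation = [(1.8, 0), (2.5, 0), (5.0, 1), (6.5, 1)]

def accuracy(threshold, dataset):
    """Fraction of points correctly classified by the rule 'x > threshold'."""
    return sum((x > threshold) == bool(y) for x, y in dataset) / len(dataset)

# Candidate hyperparameter values, scored on the validation set only.
candidates = [1.0, 2.0, 3.0, 4.0, 6.0]
best = max(candidates, key=lambda t: accuracy(t, validation))
print(best, accuracy(best, validation))   # 3.0 1.0 (3.0 and 4.0 tie; max keeps the first)
```

Because the validation points were never used to choose the candidates, the winning threshold is less likely to be an artifact of one particular dataset.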


[1] A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence; J. McCarthy, M. L. Minsky, N. Rochester, and C. E. Shannon

[2] No Free Lunch Theorems for Optimization; D. H. Wolpert and W. G. Macready
