Artificial intelligence: a concise conceptual introduction

evergreen · May 25, 2020

From intelligence to machine learning

To talk about artificial intelligence, we have to start by having a reasonable definition of intelligence. In his paper On the Measure of Intelligence,¹ Chollet describes two different definitions of intelligence dominating the literature:

Intelligence as a collection of task-specific skills: emphasizing the capacity to achieve goals.
Intelligence as a general learning ability: emphasizing generality and adaptation.

Under either definition, intelligence appears as a property of many natural systems. Life forms exhibit intelligence in various ways, and the challenging task of describing and explaining this phenomenon is at the heart of the cognitive sciences. In parallel, since the field’s founding conference in 1956, the longstanding human effort to build intelligent machines has gone by the name Artificial Intelligence (AI).² Right from the start, the main ideas on how to build intelligent artificial systems gravitated around two approaches:

Learning from experts: focusing on methods to distill knowledge into sets of rules or other explicit representations and loading those descriptions into machines. It is common to refer to methods in this tradition as symbolic AI.
Learning from data: focusing on teaching machines by example. This core idea of allowing computers to learn directly from data is expressed by the umbrella term naming this paradigm: machine learning.

Due to technical challenges and limited success on the symbolic front, many such methods fell into disuse, leading this paradigm to be commonly referred to as Good Old-Fashioned AI (GOFAI). Meanwhile, the growing availability of computing power and data over the last decades drove machine learning to the forefront of AI research. If someone is talking about AI today, they are probably talking about machine learning.

It is crucial to notice that when machine learning appeared as a research field, the “learning from data” approach was already the subject of another discipline: statistics. This overlap between statistics and machine learning has generated some debate over the years, and a good reference to start a discussion on the issue is Breiman’s paper Statistical Modeling: The Two Cultures.³ In a nutshell, the arguments go as follows:

Statistics: models data by assuming a stochastic model and estimating its parameters. Goodness-of-fit tests and residual examination assess the model quality.
Machine learning: models data by treating the data-generating mechanism as complex and unknown. Model quality is assessed by predictive power on held-out data, e.g., through cross-validation.
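To make the contrast concrete, here is a minimal sketch in plain Python. The toy data and the helper names (fit_line, r_squared, cv_mse) are assumptions made for illustration, not taken from Breiman’s paper. The same fitted line is judged twice: once by how well it fits the data it was estimated on, and once by its error on folds it never saw during fitting.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def r_squared(xs, ys, a, b):
    """In-sample goodness of fit: share of variance explained by the line."""
    y_bar = mean(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def cv_mse(xs, ys, k=4):
    """k-fold cross-validation: mean squared error on held-out points."""
    n = len(xs)
    errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th point is held out
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        a, b = fit_line(train_x, train_y)
        errors += [(ys[i] - (a + b * xs[i])) ** 2 for i in test_idx]
    return mean(errors)

# Hypothetical toy data, roughly following y = 2x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

a, b = fit_line(xs, ys)
fit_quality = r_squared(xs, ys, a, b)   # inference-centric check
holdout_error = cv_mse(xs, ys)          # prediction-centric check

print(f"slope={b:.3f} intercept={a:.3f}")
print(f"in-sample R^2={fit_quality:.4f}")
print(f"cross-validated MSE={holdout_error:.4f}")
```

The inference-centric reading focuses on the slope, the intercept, and the fit; the prediction-centric reading cares mainly about the last number.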

Due to the technical nuances of the various solutions proposed by both traditions, it is hard to classify every technique as belonging to one approach or the other. Perhaps it is more useful to acknowledge these “two cultures” as the opposing extremes of a spectrum, where the tools proposed by each tradition are laid out along the continuum between inference-centric and prediction-centric methods.

Now, having this background in mind, let’s move on to the complexities of machine learning solutions. Even if we concentrate the discussion about AI on machine learning, navigating the rich terminology surrounding the field is still no easy task. When discussing such a system, I find it useful to think of every machine-learning solution as consisting of five somewhat independent abstraction layers.

Five layers of a machine learning solution

The five layers stack from the concrete to the abstract: technology provides the computational foundation, data feeds the system, the model defines the space of candidate solutions, the method searches that space for a good fit, and the application turns the result into something useful. They are abstraction layers, not sequential stages — a choice at one layer constrains and informs the others, which is why the diagram draws them as overlapping concerns over a single problem rather than a clean pipeline. The sections that follow walk through each layer from the foundation up, sketching the questions a system designer weighs at each one.

Technology

The technology layer rests on storage, computing, and development.

Every solution depends on computational infrastructure.

Storage: depends on the dataset size, data type, required latency, etc.
Computing: depends on the required processing units, e.g., CPUs and GPUs, and on whether the processing is centralized or distributed.
Development: depends on team expertise, toolset maturity, etc.

As with any other computational system, each of these decisions also has to account for broader concerns such as budget limits, scalability, and maintainability. This layer has many options, ranging from a small system where data sits on your hard drive and is processed in a local Jupyter Notebook to massive pipelines on multi-cloud environments.

Data

Tables, texts, and images are the most common kinds of data.

It may seem obvious, but it’s worth stating: all machine learning solutions depend on data. Knowing which kind of data the system will process allows the system designer to make informed decisions about which modeling techniques are adequate to achieve the desired results. When talking about data, the concerns are usually about:

Data type: the way information is presented to the system. It may come as tables, texts, images, sounds, etc. Each data type presents unique challenges for which specific tools and methods are available.
Dataset size: the amount of available data will affect decisions both on the technology and the model layer. Some models thrive on lots of data, while others are adequate for small datasets.
Dependence: knowing whether each data point in the dataset is independent or whether some sort of dependence structure exists, e.g., time series and graphs, makes a big difference when choosing an adequate modeling technique.
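One place where dependence shows up in practice is in how data is split for evaluation. The sketch below is a minimal illustration in plain Python; the function names, the 25% test fraction, and the toy rows of (day, value) pairs are all assumptions. Independent rows can be shuffled before splitting, whereas time-ordered rows are usually split chronologically so that the model is never evaluated on observations older than the ones it learned from.

```python
import random

def random_split(rows, test_fraction=0.25, seed=0):
    """Split for independent rows: shuffle, then cut."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

def chronological_split(rows, test_fraction=0.25):
    """Split for time series: train on the past, test on the most recent rows.

    Assumes the first field of each row is a sortable timestamp.
    """
    rows = sorted(rows, key=lambda row: row[0])
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

# Hypothetical daily measurements: (day, value).
rows = [(day, 100 + 3 * day) for day in range(1, 9)]

shuffled_train, shuffled_test = random_split(rows)
past, future = chronological_split(rows)

print("random split test days:       ", sorted(day for day, _ in shuffled_test))
print("chronological split test days:", [day for day, _ in future])
```

With a random split the test days can fall anywhere in the series; with the chronological split they are always the latest ones.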

Model

Choosing a model is selecting the space of functions, or the hypothesis set, in which we will search for the best fitting model. Characterizing the previous layers will already inform this decision in some ways: deep learning models, for example, may require bigger datasets and specialized hardware to train. Additionally, there are particular characteristics that are central to this layer:

Capability: if the model can generate new instances resembling its training data, it is called a generative model; if it produces estimates, such as labels or values, based on its inputs, it is called a discriminative model.
Interpretability: the need to explain the model’s predictions, or the relations among the variables within it, is a critical aspect of the model choice.

The direct assessment of the coefficients on a linear model or the thresholds on a decision tree can give us information about the relationship between the variables on the dataset. Other models, such as a Random Forest or deep neural networks, are hard to assess directly and may need additional tools to get some insight into their inner workings.
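As a small illustration of a model that can be read directly, the sketch below fits a one-split decision tree (a decision stump) in plain Python. The feature, the labels, and the name fit_stump are hypothetical, and the exhaustive search over midpoints is just one simple way to pick a split, not a description of any particular library.

```python
def fit_stump(xs, labels):
    """Return the threshold t that minimizes the errors of the rule
    'predict 1 if x > t, else 0' on the training data."""
    points = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(lo + hi) / 2 for lo, hi in zip(points, points[1:])]

    def errors(t):
        return sum((x > t) != bool(y) for x, y in zip(xs, labels))

    return min(candidates, key=errors)

# Hypothetical data: a single numeric feature and a binary outcome.
feature = [18, 22, 25, 31, 40, 45, 52, 60]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]

threshold = fit_stump(feature, outcome)
print(f"learned rule: predict 1 if feature > {threshold}")
```

The whole model is a single number, so the relationship it has learned between the feature and the outcome can be stated in one sentence. An ensemble of hundreds of such trees no longer offers that kind of direct reading.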

Method

The three major learning approaches.

The method, or learning approach, defines how data is used to search for a good model. It will heavily depend on the type of problems we are solving: clustering, classification, regression, control, etc. Most expositions about this aspect of machine learning tend to highlight three major approaches:

Unsupervised learning: where we do not have a particular target variable. The usual method to tackle segmentation and association tasks.
Supervised learning: where we have a particular target variable. The usual method to tackle classification and regression problems.
Reinforcement learning: where an agent learns how to achieve a goal by directly interacting with an environment.
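The difference between the first two approaches can be sketched on the same toy data. The example below is a minimal, assumption-laden illustration in plain Python: the one-dimensional points, the labels, and the function names are all hypothetical. The unsupervised routine (k-means with two clusters) sees only the values; the supervised routine (a nearest-centroid classifier) also sees a target label for each value. Reinforcement learning is left out because it needs an environment to interact with.

```python
from statistics import mean

def two_means(xs, iterations=10):
    """Unsupervised: group 1-D points into two clusters (k-means, k=2).

    Assumes xs contains at least two distinct values.
    """
    c0, c1 = min(xs), max(xs)  # initial centroids
    for _ in range(iterations):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = mean(g0), mean(g1)
    return c0, c1

def fit_centroids(xs, labels):
    """Supervised: compute one centroid per known label."""
    return {
        label: mean(x for x, l in zip(xs, labels) if l == label)
        for label in set(labels)
    }

def predict(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Hypothetical measurements and, for the supervised case, their labels.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = ["small", "small", "small", "large", "large", "large"]

cluster_centers = two_means(values)        # no target variable used
centroids = fit_centroids(values, labels)  # target variable used
print("cluster centers:", cluster_centers)
print("prediction for 4.0:", predict(centroids, 4.0))
```

Both routines end up with similar centers here, but only the supervised one can attach a meaningful name to a new point, and only the supervised one can be scored against known answers.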

Although characterizing the learning approach may seem a mere formality, it is a crucial decision in the design of a machine learning system. Specifying an adequate learning approach for a given problem will guide the entire model training setup, including its assessment methods, learning metrics, and expected results.

It is worth keeping in mind that those three are not the only approaches available. Still, once you clearly understand their characteristics, it will be easier to understand other learning variations, such as semi-supervised, online, adversarial, etc.

Application

Some familiar applications built on machine learning: recommender systems, simulators, predictors, classifiers, and anomaly detectors.

Applications are designed to solve problems, regardless of the underlying technique. The application is the final product of the machine learning system design. Some well-known applications relying on machine learning nowadays are recommender systems, loan classifiers, anomaly detectors, and autonomous vehicles.

It is important to remember that virtually any of those applications could be built using hard-coded rules. Keeping this in mind is a reality check on whether a given solution needs machine learning at all. Software engineering is complex enough on its own; relying on machine learning to build an application adds yet another layer of complexity that can lead to a significant increase in the technical debt backlog.⁴
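One way to run that reality check is to score a hard-coded rule on labeled examples before training anything. The sketch below is a minimal illustration in plain Python; the sensor readings, the fixed band of 10 to 90, and the function names are hypothetical.

```python
def rule_flags_anomaly(reading, low=10.0, high=90.0):
    """Hard-coded rule: flag readings outside a fixed band."""
    return reading < low or reading > high

def accuracy(predict, examples):
    """Fraction of (input, expected) pairs the predictor gets right."""
    hits = sum(predict(x) == expected for x, expected in examples)
    return hits / len(examples)

# Hypothetical labeled readings: (value, is_anomaly).
examples = [
    (50.0, False),
    (95.0, True),
    (5.0, True),
    (88.0, False),
    (85.0, True),   # an anomaly inside the band: the rule misses it
    (12.0, False),
]

baseline = accuracy(rule_flags_anomaly, examples)
print(f"rule-based baseline accuracy: {baseline:.2f}")
```

If a baseline like this already meets the application’s needs, the extra complexity of a learned model may not be justified; if it does not, the baseline still gives the learned model a number to beat.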

Increasingly, applications add intelligence not by training a model but by calling one that another organization hosts behind an API. In that arrangement most of the technology, model, and method layers are abstracted away to the provider; you work mainly at the application layer, with only a thin slice of the data layer — the inputs you send and the outputs you act on. The decomposition is therefore most directly useful when you build the models yourself. When you consume one as a service it still earns its keep: it tells you precisely which concerns you have delegated, and which — data quality, evaluation, and the cost of the new dependency — remain yours.

Conclusion

Artificial intelligence is a complex research field, and having a clear overall picture is very useful, if not necessary, for people dealing with this technology. The next time you face an application built using machine learning, try disentangling each abstraction layer to understand the designer’s choices in each one; it may improve your understanding of the whole system. Finally, I hope this conceptual introduction works as a simple map to help you navigate this vast field. For those wondering how to dig deeper into the subject: when in doubt, I always go back to Russell and Norvig’s excellent Artificial Intelligence: A Modern Approach.⁵

1. Chollet, F. On the Measure of Intelligence. arXiv:1911.01547 [cs] (2019). https://arxiv.org/abs/1911.01547
2. McCarthy, J., Minsky, M., Rochester, N. & Shannon, C. A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence (1955).
3. Breiman, L. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statist. Sci. 16, 199–231 (2001).
4. Sculley, D. et al. Hidden Technical Debt in Machine Learning Systems. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Volume 2, 2503–2511 (MIT Press, Cambridge, MA, USA, 2015).
5. Russell, S. J. & Norvig, P. Artificial Intelligence: A Modern Approach. (Pearson, Hoboken, 2020).