The Zen of TDA


This is an extremely high-level perspective on something that can be described very concretely. The author has found this layer of abstraction very useful for remembering and relating the zoo of variations present in the literature.

In general, data science wants a strategy of the form:

\[\mathrm{Data} \overset{\mathrm{ML}} \longrightarrow \mathrm{Insight}\]

Traditional methods use probabilities:

\[\begin{aligned} \mathrm{Data} &\overset{\mathrm{Stats}}{\longrightarrow} \mathrm{Probabilities} \overset{\mathrm{Prob}}{\longrightarrow} \mathrm{Insight} \\ \mathrm{sample} &\longmapsto \mathrm{(reg.)\ MLE} \longmapsto \mathrm{descriptive\ statistic} \end{aligned}\]

TDA uses spaces:

\[\mathrm{Data} \longrightarrow \mathrm{Spaces} \longrightarrow \mathrm{Insight}\]

It’s based on a simple idea:


Data has shape:

\[\mathrm{Data} \longrightarrow \mathrm{Spaces}\]

and shape has meaning:

\[\mathrm{Spaces} \longrightarrow \mathrm{Insight}\]

In other words, TDA uses spaces to understand data.


We can use sample data to build a graph:

\[\mathrm{sample} \longmapsto \mathrm{graph}\]

and draw that graph:

\[\mathrm{graph} \longmapsto \mathrm{pretty\ picture}\]

or say something about the graph:

\[\mathrm{graph} \longmapsto \mathrm{number\ of\ connected\ components}\]
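To make the sample-to-graph-to-number pipeline concrete, here is a minimal sketch in plain Python. It assumes one particular construction that the text above does not prescribe: two sample points are joined by an edge when their Euclidean distance is at most a threshold `epsilon`. The function names, the threshold, and the toy sample are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def neighborhood_graph(points, epsilon):
    """Return edges (i, j) joining points at Euclidean distance <= epsilon.

    Assumption: this "epsilon-neighborhood" rule is just one of many ways
    to turn a sample into a graph.
    """
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if math.dist(p, q) <= epsilon
    ]


def count_connected_components(num_vertices, edges):
    """Count connected components of a graph using a simple union-find."""
    parent = list(range(num_vertices))

    def find(v):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, j in edges:
        parent[find(i)] = find(j)  # merge the two components

    return len({find(v) for v in range(num_vertices)})


# Hypothetical sample: two well-separated clusters in the plane.
sample = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
edges = neighborhood_graph(sample, epsilon=0.5)
print(count_connected_components(len(sample), edges))  # prints 2
```

The answer depends on the scale at which the graph is built: with a very small `epsilon` every point is its own component, and with a very large one the whole sample collapses into a single component.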

Here is a loose definition:


TDA is the application of constructions, deconstructions, and analyses of spaces to data science.

The theory behind spaces has a name.


Homotopy theory is the (de)construction and analysis of spaces.

This gives a more concise definition of TDA:


TDA uses homotopy theory to understand data.


Unless you have a very strong (e.g. graduate-level) background in mathematics, don’t try to learn this stuff by (only) reading a textbook. Instead:

  1. Find someone who already knows the subject and talk to them about it. For example, go to a conference.

  2. Find a project to use this stuff in.

  3. Read a book.

  4. Combine them.

