Constituent Ingredients of Data Science

The following is my eccentric opinion on what makes up the cross-disciplinary subject of data science, along with a non-exhaustive list of sub-constituents.

  1. Math
    • Probability theory

    • Statistical Learning/Inference
      • Maximum Likelihood

      • Probably Approximately Correct

      • Hypothesis Testing

    • Optimization
      • Calculus

    • Linear Algebra/Sheaf theory
      • Arrays

      • Signal Processing

  2. Software Development
    • Documentation

    • Text Editing
      • IDE, Vim, etc.

    • Fluently reading/writing in high-level programming language(s)

    • Using/creating libraries, APIs, open source software, etc.

    • Development practices
      • Test-Driven Development

      • Agile

      • Continuous integration/delivery

    • Version control

    • Dependency management

    • Effective communication

    • Identification/construction of key performance indicators

    • Product sense

  3. Subject Matter Expertise
    • “Know the data”


I used to have decision making as its own section. It is now subsumed under software development.


I used to have a section on the experimental method. I’ve subsumed it into statistics, which in turn falls under math. This is questionable. What I mean is that the aspects of the experimental method that are relevant to data science fit into statistics, and therefore into math.