Why OOP sucks as an MLE

"Object oriented programs are offered as alternatives to correct ones"

The quote in the subtitle is by the legendary Edsger Dijkstra. As a Machine Learning Engineer with some years of experience across a variety of domains and tech stacks, I've gradually come to the conviction that Object Oriented Programming (OOP) has, unfortunately, become one of the key factors slowing the pace of progress and innovation in the field of Machine Learning.

On the surface, classic OOP concepts like design patterns, abstraction, encapsulation, and polymorphism have an appealing sophistication to them. Managers and developers alike are drawn to the apparent structure and organization they bring to large codebases. All too often, however, teams get so caught up in building elaborate object hierarchies, abiding by principles like the Single Responsibility Principle and Interface Segregation, and throwing around pattern names like Facade, Decorator, and Abstract Factory, that they lose sight of what really matters: shipping a working product in a timely fashion. The siren song of OOP can lure you into feeling like you're making huge productivity gains, when in fact there's frequently a much simpler and more direct path to achieving your goal.

One of the most common complaints from developers about heavily OOP languages and codebases is the sheer amount of boilerplate and ceremonial code required to express even relatively simple program logic, especially in comparison to more functionally oriented alternatives. In a traditional OOP system, the actual business logic is often scattered across a sprawling landscape of classes, each encapsulating its own mutable state, which may interact with the mutable state of other classes in subtle and unpredictable ways. As the application grows in size and complexity, the intricate web of object interactions and dependencies can become a nightmare to test, refactor, debug, and simply reason about. I've lost count of the number of hours I've wasted fruitlessly tracing through labyrinthine call hierarchies, trying to unravel the tangled skein of control flow and state mutations, all to find a single critical line of code. In the Machine Learning world in particular, where so much of the work revolves around mathematical transformations of immutable tensors and dataframes, the OOP paradigm often feels like an awkward and unnatural fit.

So why, then, does OOP still exert such a strong influence over the programming culture and mindset of so many organizations? I believe a large part of it comes down to a generational divide. Many of the grizzled veterans and old hands who currently occupy leadership and decision-making roles came of age back in the heyday of Java's dominance, where OOP reigned supreme and the Gang of Four's Design Patterns was treated with an almost scriptural reverence. When newer languages like Python came on the scene and ushered in a renaissance of functional programming techniques, many of the old guard found it easier to stick with what they knew than invest the considerable time and effort needed to rewire their brains around a functional mindset. As a result, they fell back on a cargo-cult approach of blindly porting over OOP design principles and patterns to Python, shoehorning this elegant multiparadigm language into an ill-fitting Java-esque mold. I've seen this play out at multiple companies, with mandates handed down from on high that all code must be shoved into classes and liberally festooned with all the standard "pillars of OOP."

Despite the immense cost in time and cognitive overhead imposed by these byzantine OOP architectures, many developers, especially those newer to the field, still instinctively reach for them as a way to signal their sophistication and savvy to their peers. If you can casually pepper your code reviews and design docs with a sprinkling of impressive-sounding OOP jargon, you must really know your stuff, right? It takes confidence and a certain hard-won wisdom to resist this temptation, to recognize that true elegance and maintainability arise from simplicity rather than complexity. I have learned this lesson the hard way over the course of my career.

This is not to say that OOP has no place at all in a modern development shop. In particular, for building large, interactive graphical user interfaces, the OOP model of statefulness and message-passing is often a natural fit. When it comes to the core computational kernels and data pipelines of Machine Learning applications, however, I've come to believe that a functional approach offers compelling advantages that deserve serious consideration.

First and foremost, functional programming languages treat functions themselves as first-class citizens – that is, functions can be passed around as values, returned from other functions, and manipulated and transformed just like any other data type. This enables powerful forms of abstraction and code reuse through higher-order functions – functions which accept other functions as arguments or return functions as results. A vivid example of this is the map function found in many functional languages, which takes a function f and a list [x1, x2, ..., xn] and returns the list [f(x1), f(x2), ..., f(xn)] – that is, the result of applying f to each element of the input list. With map, you can write highly general, reusable code that performs arbitrary element-wise transformations on collections, without muddying your logic with extraneous looping constructs.
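
As a minimal sketch of this idea in Python (the helper names square and celsius_to_fahrenheit are invented purely for illustration), map and other higher-order functions let you express element-wise transformations without writing explicit loops:

def square(x):
    # A tiny pure function: the same input always yields the same output.
    return x * x

def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

readings = [0.0, 12.5, 30.0]

# map applies a function to every element and yields the transformed values.
squared = list(map(square, readings))                    # [0.0, 156.25, 900.0]
fahrenheit = list(map(celsius_to_fahrenheit, readings))  # [32.0, 54.5, 86.0]

# Because functions are ordinary values, the transformation itself can be a parameter.
def transform_all(fn, values):
    return [fn(v) for v in values]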

Another key tenet of functional programming is immutability – once a piece of data is created, it is never altered, only transformed into new data. If you need to update a complex value like a dictionary, rather than mutating it in place, you create an entirely new dictionary containing the updated key-value pairs. Immutable data structures eliminate whole classes of subtle, insidious bugs arising from unintended mutations to shared state – if a function can't alter its inputs, and always produces the same output for the same input, reasoning about its behavior becomes drastically easier. Pure functions like this are inherently simpler to test, since you can write isolated unit tests without worrying about your test cases stepping on each other's toes by mutating global state.
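
To make that concrete (the config dictionary and its keys below are invented for the example), updating a value functionally means building a new one rather than mutating the original:

config = {"learning_rate": 0.01, "batch_size": 32}

# Functional style: build a new dictionary; the original is left untouched.
updated_config = {**config, "learning_rate": 0.001}

def with_learning_rate(cfg, lr):
    # Pure function: no mutation, no hidden state, same output for the same input.
    return {**cfg, "learning_rate": lr}

assert config["learning_rate"] == 0.01  # the original dictionary is unchanged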

Immutability and purity dovetail very naturally with the dataflow-oriented nature of many Machine Learning workloads. Training a model typically involves passing immutable batches of data through a pipeline of transformations – normalization, tokenization, feature extraction, etc. The outputs of each stage flow into the inputs of the next, without incurring side effects or mutations along the way. The suitability of the functional paradigm for machine learning is evident in the design of libraries like PyTorch and TensorFlow, whose core APIs are built around composable mathematical operations that return new tensors rather than mutating their arguments. The authors of these libraries clearly recognized that the functional style was a natural fit for the job at hand.
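
Here is one possible sketch of such a stage chain in plain Python; the stage functions clean_text, tokenize, and extract_features are hypothetical stand-ins for whatever transformations a real project needs, not any library's API. Each stage takes a value and returns a new one, so the batch flows through without any stage touching shared state:

def clean_text(batch):
    # Returns a new list; the input batch is never modified.
    return [s.strip().lower() for s in batch]

def tokenize(batch):
    return [s.split() for s in batch]

def extract_features(batch):
    # Toy featurization: token counts stand in for a real feature extractor.
    return [len(tokens) for tokens in batch]

raw_batch = ["  The QUICK brown fox ", "jumps over the lazy dog"]

# Each stage's output feeds the next stage's input, with no side effects.
features = extract_features(tokenize(clean_text(raw_batch)))  # [4, 5]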

To make the advantages of functional programming for machine learning more concrete, let's compare OOP-style and FP-style approaches to a simple task – normalizing an array of data points by subtracting the mean and dividing by the standard deviation. First, an OOP implementation in Python:

import numpy as np


class Normalizer:
    def __init__(self, data):
        # Store the array on the instance; normalize() will replace it.
        self.data = data

    def normalize(self):
        # Standardize each column: subtract the mean and divide by the
        # standard deviation, preserving the original dtype.
        datatype = self.data.dtype
        mean = self.data.mean(axis=0)
        std = self.data.std(axis=0)
        self.data = (self.data - mean) / std
        self.data = self.data.astype(datatype)

    def get_data(self):
        return self.data


data = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
normalizer = Normalizer(data)
normalizer.normalize()
result = normalizer.get_data()

And the equivalent functional version:

import numpy as np


def normalize(data):
    # Standardize each column: subtract the mean, divide by the standard
    # deviation, and preserve the original dtype. The input is never modified.
    datatype = data.dtype
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    return ((data - mean) / std).astype(datatype)


data = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
result = normalize(data)

The functional version is remarkably terse, requiring only half the lines of code of its OOP counterpart. There are no temporary variables, no mutable state, and no methods – just a single pure function that takes an input, performs a series of arithmetic operations, and returns an output. The entire logic of the normalization operation is crisp, clear, and visible at a single glance, unencumbered by boilerplate code or indirection. Extending this example to a real-world scenario, factoring out reusable functional building blocks like this normalization function can lead to concise, elegantly modular codebases. Individual functions for common operations like tokenization, vectorization, feature scaling, etc. can be composed together in various ways to assemble pipelines for training and inference. Adding new features or swapping out one version of a component for another becomes a breeze.
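
To show what that kind of composition can look like, here is a sketch of a tiny pipeline helper; z_score_scale, min_max_scale, and add_bias_column are hypothetical building blocks rather than anything from a real library, and swapping one scaler for the other is a one-entry change:

import numpy as np
from functools import reduce

def pipeline(*stages):
    # Returns a single function that applies each stage, in order, to its input.
    return lambda x: reduce(lambda acc, stage: stage(acc), stages, x)

def z_score_scale(data):
    return (data - data.mean(axis=0)) / data.std(axis=0)

def min_max_scale(data):
    rng = data.max(axis=0) - data.min(axis=0)
    return (data - data.min(axis=0)) / rng

def add_bias_column(data):
    return np.hstack([data, np.ones((data.shape[0], 1))])

# Swapping z_score_scale for min_max_scale is a one-entry change.
preprocess = pipeline(z_score_scale, add_bias_column)
prepared = preprocess(np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]]))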

You may argue that the above example is too simple to be representative, that real Machine Learning systems require more complex architectures that OOP is better equipped to handle. Yet in my experience, that's often not the case. So much of applied Machine Learning consists of shuffling data through a series of transformations, and the functional paradigm is exceptionally well-suited for this mode of computation. Even when building more elaborate systems, I've found that a more functional, de-coupled architecture lends itself to a "separation of concerns" that makes the codebase easier to understand and maintain. Pure, deterministic library functions for data loading, feature extraction and model training can be isolated from the impure, effectful code that handles user interaction, logging, and other "side effects." Compared to the OOP approach of packaging everything into complex, ever-growing "God objects," this tends to result in cleaner, more modular and testable designs.
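
A rough sketch of that separation, with invented names like load_records and split_features_and_labels standing in for real project code, keeps the pure computation in library functions and pushes the effectful parts (file IO, logging) to a thin outer layer:

import json
import logging

def split_features_and_labels(records):
    # Pure: derives new values from its input and touches nothing else.
    features = [r["features"] for r in records]
    labels = [r["label"] for r in records]
    return features, labels

def load_records(path):
    # Impure: reads from disk. Kept at the edge of the program.
    with open(path) as f:
        return [json.loads(line) for line in f]

def main(path):
    # The effectful shell: IO and logging live here, not in the core logic.
    logging.basicConfig(level=logging.INFO)
    records = load_records(path)
    features, labels = split_features_and_labels(records)
    logging.info("Loaded %d examples", len(labels))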

None of this is to say that functional programming is always the right paradigm for every problem in Machine Learning. In particular, some neural network architectures, like LSTMs and transformers, have an intrinsically stateful character that can be awkward to express in a purely functional manner. Even in these cases, however, I would argue that a hybrid approach, in which stateful components are isolated and treated as opaque "black boxes" within an otherwise functional pipeline, is preferable to contorting the entire system into an OOP mold.
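
As one small sketch of that hybrid approach, here a simple running-statistics accumulator stands in for a genuinely stateful component such as an RNN cell; the mutable object is confined to one place, and the rest of the pipeline interacts with it through a plain function:

class RunningMean:
    # A deliberately stateful component, treated as an opaque black box.
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.total += value
        self.count += 1
        return self.total / self.count

def make_running_mean():
    # The rest of the pipeline sees only this function, not the object inside.
    state = RunningMean()
    return state.update

track_loss = make_running_mean()
smoothed = [track_loss(loss) for loss in [0.9, 0.7, 0.5]]  # approx. [0.9, 0.8, 0.7]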

In conclusion, while Object-Oriented Programming certainly has its place in the developer's toolkit, I have become convinced that for the domain of Machine Learning in particular, the advantages of functional programming are too compelling to ignore. By emphasizing immutability, statelessness, purity, and higher-order abstraction, functional programming offers a more natural and productive paradigm for the highly mathematical, dataflow-oriented workloads involved in Machine Learning. The next time you find yourself tempted to reach for a smorgasbord of manager-pleasing OOP design patterns, take a step back and consider: is there a simpler, more functional approach that could better serve your needs and those of your fellow developers? Resisting the siren call of needless complexity is not always easy, but in my hard-won experience, an ounce of simplicity and clarity is worth a pound of "best-practice" OOP indirection.