Two things I’ve learned from 12 years of programming

I’ve been writing code for almost as long as I can remember. I’ve written in BASIC, Pascal, C, C++, C#, JavaScript, Java, and PHP, to name a few. I’ve written small applications, games, utility programs, websites, and classroom projects, as well as applications with more than 750k lines of code. I’ve been paid to do this for over five years, and I did it on my own for fun for an additional seven.

Through all those projects, and all the experiences I’ve had during that time, two things have massively impacted how I read, write, and understand code on a day-to-day basis. These may be completely obvious to you! If so, I applaud your mental faculties. It’s taken me many years to finally boil down some of the finer points of these ideas and see how they have played out in my own code over time.

Objects as Functions and Data

The first construct that has changed me has two parts to it. In the discipline of functional programming there is an idea that the smallest reusable unit is not the object, it’s the function. Furthermore, objects are rather terrible, untestable, and unstable messes of state and data. Yes, I said it, and just incurred the wrath of almost every modern CS professor everywhere. Objects do have their place: they can be useful, and yes, there are instances when an object can and should represent the state of a real physical object, and that is a completely valid way to model small parts of your code. However, with that in mind, you must also realize that writing an object such that it is nearly “unbreakable and uncrashable” is extremely difficult to do well. The tenets of functional programming haven’t caused me to dismiss object-oriented programming entirely, but the way I think about objects has changed drastically.

In the world of functional programming everything is a function or data. Nothing else. Functions take inputs, and functions have outputs. Data is immutable. There is no state because functions are atomic. Something goes in, something comes out. If something is used by the function, it’s passed in. I’ve heard somewhere (Probably from Rich Hickey) that you could reduce 90% of the bugs of a junior programmer by making them declare all their methods as static. I’m not saying to go and do this, but imagine for a moment how a requirement like that would change your code. All of a sudden, you don’t have reference to “this” anymore. You have to pass your object in. You can’t just access a variable. You have to pass it in. With that one change, an instance object doesn’t do anything, because it can’t change itself. Your object is just a collection of data: simple values, pointers, and methods that act on a constrained set of data.

In reality this is how your code actually works under the covers in most languages. The runtime will never physically create multiple copies of your function, no matter how many instances of your object you create. Seriously. When you call “functionThatChangesX()”, which changes variable x in your class, you don’t get multiple instances of “functionThatChangesX()”. The compiler or runtime takes your nice little class function and effectively turns it into “static void functionThatChangesX(classX this)”, where “classX” is the name of your class, and changes x on “this”.
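Python happens to make this visible, because the instance is already an explicit first parameter. The sketch below is only an illustration: the class name Counter and the method change_x are made up for this example, and other languages hide the same mechanism behind different syntax.

```python
class Counter:
    """A hypothetical class with one method that changes its own state."""

    def __init__(self, x=0):
        self.x = x

    def change_x(self, amount):
        # 'self' is just an ordinary first parameter, like 'this' above.
        self.x += amount


a = Counter()
b = Counter()

# The usual method-call syntax...
a.change_x(5)

# ...is shorthand for calling the one shared function with the instance passed in.
Counter.change_x(b, 5)

# Both instances share a single function object; only the data differs.
shared = a.change_x.__func__ is b.change_x.__func__ is Counter.change_x
```

Both calls leave x at 5 on their respective instance, and shared is True: there is one function and two pieces of data.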

See what just happened? Realize it or not, you already live in a world where your code is static and your data is not. There are rules in place that make the reality slightly more complicated (usually enforced by the compiler or runtime), but when it comes down to it, your instance methods are only a shorthand parlor trick to make it look like you can “just access” things.

To take this even a bit further: you understand what a Map is, right? A set of key-value pairs with unique keys. You can literally take an object’s fields and translate them into a dictionary, using the property names as keys, and turn the methods into static functions that act on that dictionary. Have a list of other objects? They become a list of dictionaries. That list becomes a property of your first object. Your functions take a dictionary, or a list, or multiple dictionaries, and spit out new ones. When you start thinking about objects as just lists and dictionaries, there are a lot of really cool things you can do with dictionaries that are just plain hard with objects. What if you want to select a subset of them based on the value of a property? What if you want to select a subset of them based on the sum of a list stored in one of the properties? Easy, if you can write a sum function that works on arbitrary dictionaries given a property name, and a select function, and then just compose them together… I literally would not have to write loops anymore for 90% of the code that I write.
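As a rough sketch of that composition, here is one way it could look with plain lists and dictionaries. The helper names select, sum_of, and total_over, and the sample order data, are all invented for this illustration.

```python
def select(rows, predicate):
    """Return a new list of the dictionaries for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def sum_of(key):
    """Build a function that sums the list stored under `key` in a dictionary."""
    return lambda row: sum(row[key])


def total_over(key, limit):
    """Compose sum_of with a comparison to get a predicate for select."""
    return lambda row: sum_of(key)(row) > limit


# Hypothetical data: each "object" is a dictionary, the collection is a list.
orders = [
    {"customer": "ann", "line_totals": [10, 25, 5]},
    {"customer": "bob", "line_totals": [3, 4]},
    {"customer": "cy", "line_totals": [50]},
]

# Select the orders whose line totals add up to more than 30.
big_orders = select(orders, total_over("line_totals", 30))
big_customers = [order["customer"] for order in big_orders]
```

Nothing here knows what an order is. The same three functions work on any list of dictionaries that holds a list of numbers under some key, and the input list is never modified.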

Now, there is a point to be made that having objects allows you to enforce compile-time checking of properties. This is a completely valid concern. At the end of the day I’m not trying to tout functional programming over object-oriented programming; as with all things, different problems will ultimately require different approaches. There will always be trade-offs. I want to point out that functional programming has made me look at objects differently and use them as tools instead of a golden orb of truth. However, functional programming has also forced me to look at the inherent instability of objects themselves. Objects have three distinct things that cause complexity because they are not separate: state, identity, and value. These three things can always change, cannot be relied on, and force everything that touches the object to deal with them. I cannot count the number of times I’ve tested properties on an object I’ve been passed to see if they are null, or if the object is in a correct state. I cannot count the number of times I’ve spent huge amounts of time trying to decide if instance A and instance B are “the same”, and if they should be considered “the same”. Objects, and more generally any kind of mutable (directly modifiable) value, are prone to changing under your feet. Properties can be changed by other code, and methods or events on the object can change its state without you knowing, even in the middle of a method if your application is threaded.
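To make the difference between identity and value concrete, here is a small sketch using Python’s standard dataclasses module. The Point class is hypothetical, and a frozen dataclass is only one of several ways to get an immutable value.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Point:
    """A hypothetical immutable value: its fields cannot be reassigned."""
    x: int
    y: int


p = Point(1, 2)
q = Point(1, 2)

same_value = p == q       # compared field by field
same_identity = p is q    # two separately constructed instances

# "Changing" a value means building a new one; p itself is untouched.
moved = replace(p, x=10)

# Direct modification is refused, so nobody can change p under your feet.
try:
    p.x = 99
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because p can never change, the question of whether p and q are “the same” reduces to comparing their values, and code that receives p does not need to re-check its state later.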

The point of this whole section is that functional programming has forced me to think through the instability of my code. In an object oriented world, it has opened my eyes to potential problems and allowed me to control them before I’ve actually run into those particular problems. It has allowed me to better understand and break apart chunks of code such that they can be reused.

Ultimately, all I can say is that the more I’ve applied the ideas of functional programming to my own code, the better my code has become, and the easier it’s been to create systems that work simply and effectively with minimal amounts of excess code. The problems of selection, joining, filtering, mutation, reducing, zipping, and so on have been well defined for a long time. Once you can look at a problem and realize that in an object the data and the methods that operate on it are separate, even if they are written together, it becomes far easier to apply solutions to problems, even if you have to write custom join / select / filter / zip functions by hand.

Applications as Graphs

The second construct, which I’ve only recently crystallized into a firm concept that I can apply to my code, may be overly simple: it’s the idea that applications are graphs of objects (if we’re going with a functional mindset, then applications are literally data graphs with static methods that modify and work with subsections of the graph). The running theory in my head is that for any reasonably complex application the code will always be a graph, not a tree. This may seem painfully obvious. After all, we talk about object graphs all the time, there are graphing problems, and there are whole fields devoted to analyzing how graphs can be classified, sorted, traversed, and so on. But until recently I personally had never stopped to realize what that means and how it impacts the way my code is written.

For one, it means you will never be able to get away with passing every object you need in the constructor. Seriously, think about that. I will bet that most of you have heard that the Right Thing To Do™ when writing object-oriented code in the “classic OOP manner” is to pass all your dependencies in through the constructor. That advice is wrong. When you construct a data graph, it is always a two-step process: you create all your objects, and you link all the objects together in a separate pass. This means that at least some of the objects in your application must have some kind of initialize or setup method(s) or event handler subscriptions. You may try to hide it, make it simpler with dependency injection, or limit interaction to just events, but the core idea will apply.

As an overly simple abstract example consider a circular dependency such that A depends on B depends on C that depends on A. There is no way to construct this graph if A takes a B in the constructor, B takes a C in the constructor and C takes an A in the constructor. It’s not possible to have 3 instances that all talk to one another unless you allow for the objects to be initialized or linked to in some way outside of the graph.
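The two passes can be sketched in a few lines. The Node class and its link method below are hypothetical stand-ins for whatever initialize or setup method your real objects would have.

```python
class Node:
    """A hypothetical component that depends on exactly one other component."""

    def __init__(self, name):
        self.name = name
        self.depends_on = None  # cannot be supplied yet; filled in by link()

    def link(self, other):
        self.depends_on = other


# Pass 1: create every object without its dependencies.
a, b, c = Node("A"), Node("B"), Node("C")

# Pass 2: wire the cycle A -> B -> C -> A.
a.link(b)
b.link(c)
c.link(a)
```

If Node required its dependency as a constructor argument, the first line of pass 1 could not be written: each of the three constructor calls would need an instance that does not exist yet.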

The more I’ve considered this, the more I’ve realized how often I try to cram and contort my design into an object tree. It’s absurd. Psychologically, I believe it’s inevitable that we do this to our designs, simply because we’re surrounded by hierarchy. Classes have one parent. Code is organized into directories and folders. File structures have a “Parent”. Organizations have a “Manager”, a “Director”, a “VP”, and so on. You have a “Top” and a “Bottom” of your application. Your application is designed in “Layers”. We miss the power of thinking about our application structure in terms of a graph because it’s not as simple as the top-down structure of a tree. Trees are simpler to talk about and reason through because graphs can take so many forms. So, because thinking about graphs is hard, we try to structure our code into a tree, and we write constructs to talk to other parts of the tree “secretly” so that we don’t expose the dirty little fact that we actually do know about and depend on this other part of the tree. We usually call these events. Or message frameworks. Or whatever. The point is that your code will eventually reference things besides its “parent” or its “children”. Whatever those are.

To further explore this, let’s take an almost real example: let’s say you have a menu bar that needs to know the state of a dial inside a widget inside a group of widgets inside a larger component. In the “tree mindset” you would have to pass the value, when it changed, from the widget to its controller, to its parent controller, to the application controller, to the application controller’s child controller for the menu, to the child controller for the menu item. That is a huge number of dependencies just so your code can be a tree. It means that everybody above you and below you in the tree, along the path from your dial widget to your menu item, needs to know about this value and what to do with it! If you had a way to construct your app such that the menu had direct knowledge of the widget controller (turning your tree into a graph), construction becomes harder (the graph problem that causes us to try to make a tree in the first place), but the overall complexity of your code goes way down.

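Thinking about how a menu will talk with a dial in a tree:

  dial widget → widget controller → parent controller → application controller → menu controller → menu item controller

If you think about how it can talk as a graph:

  menu → widget controller → dial widget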

It should be fairly obvious from these examples how much coupling is required in the first one. Every parent has to know about the combined behavior of every sub-element and be able to handle changes in those sub-components. Furthermore, it somehow has to route data and information up the tree such that it goes down the other branch of the tree and reaches the correct component. There is massive coupling. You may argue that there are interfaces in between, or generic events, or that you use such-and-such messaging framework or data architecture with controller layers. The coupling and dependencies exist, like it or not, and the more you force communication between dependent components up and down the tree or through other messaging frameworks, the harder the application will be to maintain and build on as it gets larger. I’m talking about long-term maintainability and growth, not small 1-3 month mini-projects.

In the second example, the object graph is going to be much harder to construct. You have the exact same problems as when constructing an object graph of data, but the code required for communication between different components is much smaller. As in constructing normal graphs of data, there are two stages: constructing the objects, and wiring the objects together. Is it easy? No. But it drastically reduces the number of moving pieces needed. The less code there is, the less there is to change and maintain. The simpler the code is, the less there is that can go wrong.
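Here is a minimal sketch of those two stages for the dial and the menu. DialController, MenuItem, and the attach method are hypothetical names chosen for this example, not taken from any particular UI framework.

```python
class DialController:
    """Hypothetical controller that owns the current value of the dial."""

    def __init__(self):
        self.value = 0


class MenuItem:
    """Hypothetical menu item that reads the dial directly when asked."""

    def __init__(self):
        self.dial = None  # linked during the wiring stage

    def attach(self, dial):
        self.dial = dial

    def label(self):
        return f"Dial: {self.dial.value}"


# Stage 1: construct the objects.
dial = DialController()
menu_item = MenuItem()

# Stage 2: wire them together with a direct edge in the graph.
menu_item.attach(dial)

# The dial changes; nothing between the two components needs to know.
dial.value = 7
current_label = menu_item.label()
```

The wiring step is the price: something has to know about both objects in order to connect them. In exchange, none of the parent controllers along the tree path has to carry the dial value.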

Final Thoughts

I’ve covered two things: Objects as Functions and Data, and Applications as Graphs. I don’t want you to come away from this with a sudden desire to increase the amount of coupling in your code. That is the last thing I want. I want to get you to look beyond what’s easy to do, beyond what you are comfortable with and to write code that is simple. Simple is not easy. Simple is not “smaller”. Simple is not “more succinct.” Simple is directly correlated to the long term progress and maintainability of the codebase of a project.

When I say easy, or simple, I’m talking about something very specific. I recommend watching Rich Hickey’s talk on the subject, “Simple Made Easy”, but here is the quick definition:

Simple: (vs Complex)

  • One role, one braid, one fold, one strand
  • Does NOT mean one of something, one operation
  • Simple can involve many things, but it is about not interleaving things
  • Simple is an objective notion

Easy: (vs Hard)

  • Near at hand (on our hard drive, tool set, IDE, gem install… etc…)
  • Near to our understanding (It’s familiar)
  • Near our capabilities
  • Easy is relative. For instance, mountain climbing and German are easy for some, and hard for others.

As programmers, we have an infatuation with easy. We like things that can be “up and running in 2 minutes”. We argue about tools. About IDEs. About programming languages. But easy is like a runner in a sprint. You can dash 100 meters really quickly. It may start off really well, but it has no distance capability. As programmers, at the end of the day, our job is to write simple, reliable, correct code. Code that can be reused, built on, and reasoned about in the long term. Functional programming has made me completely rethink how I write code to be simple. Looking at applications as graphs instead of trees has given me the mindset to recognize dependencies and simplify how the projects I’ve worked on have been built.
