Objects from the magic box

Objects are data, functions, behaviors, contracts, everything. If you come from the plain-old-C age, you are familiar with a much simpler way of structuring your code: structures as records of data fields, and functions as collections of transformation steps that operate on these data structures.
The procedural approach to programming is more strictly structured than OOP.

OOP was born out of procedural programming, as an extension. That extension, called classes, did not narrow down the possibilities or add constraints. It opened up a rather complex world of possibilities by allowing a free, unrestricted mix of data and functions called an “object”. One common rant against OOP is that “OOP forces everything to be an object”. Joe Armstrong, designer of the Erlang functional language, expressed this rant very strongly. I think the truth is quite the opposite. It’s not that OOP forces everything into an object, it’s that an object can be everything in OOP, and as such it’s hard to say what the object construct is meant for. I would rather second objections along the lines of Jeff Atwood’s entry arguing that OOP is not a free ride.
A class can be a data structure, in which case the encapsulation traits are probably not very interesting and the class could just as well be a structure. A class can be a collection of methods only, without shared state; in this case it’s like a C module. A class can be a contract, when it contains virtual functions. A class can be the implementation of a contract, with hidden and encapsulated state. A class can be many more things.
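To make this concrete, here is a minimal C# sketch of four of these roles (the class names are mine, purely illustrative):

using System;

// A class as a plain record of data fields; encapsulation adds little here.
public class Point { public double X; public double Y; }

// A class as a stateless collection of functions, much like a C module.
public static class Geometry {
    public static double Distance(Point a, Point b) =>
        Math.Sqrt((a.X - b.X) * (a.X - b.X) + (a.Y - b.Y) * (a.Y - b.Y));
}

// A class as a pure contract.
public abstract class Shape { public abstract double Area(); }

// A class as the implementation of a contract, with hidden state.
public class Circle : Shape {
    private readonly double _radius; // encapsulated, invisible to callers
    public Circle(double radius) { _radius = radius; }
    public override double Area() => Math.PI * _radius * _radius;
}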
I think that one of the productivity issues with OOP, at least the C++ way (and all its derivatives), is that all these different use cases are syntactically represented in the same way, as a class. The class construct is totally devoid of any specialization, and as such it’s both extremely powerful and hard to handle. The software architect needs to specialize the class into a meaningful tool for the problem at hand. OOP in this sense is a meta-programming paradigm, which requires some thoughtful selection of language features and of how these should be bent to the goals of product creation.

This becomes even more evident if you look into all the “companion” language features of C++, like templates, multiple inheritance or friend classes. If you choose OOP, you have to define rules for how to use the language, much more so than in the procedural case. Java and C# made some moderate attempts at specializing the class construct by adding the interface keyword.

It might be interesting to see what an even more constrained OOP language could look like: a language with special syntax for data classes, behavior classes, user interface classes, and so on. A language that naturally leads the developer to choose a nearly optimal tool for the job. Any language designer out there? For the time being, architects are called to do this step in frameworks instead of languages.

So, if OOP requires so much planning and choice of tools, why has it become so popular? In my mind, there are two reasons. First, flexible structuring allows software designers to create libraries and frameworks with exactly the reuse patterns they have in mind and need. As Spiderman said, with great power comes great responsibility, and that’s what OOP gives and demands.

The second, maybe more important, reason is that the object way of decomposing a problem is one of the most natural ways of handling complexity. When you plan your daily work activities, are you concerned about the innards of the car you drive to reach the office? Do you need to know how combustion in the engine works? Do you need to check that the little transistors in your CPU are all switching correctly? Not me. I rely on those things working as expected. I don’t need to know the details of their internal state. I appreciate that somebody encapsulated and hid their variables and workings in convenient packages for me to consume. It’s like this with objects, and it’s like this with human organizations. We all regularly delegate important work to others and trust, maybe after signing some contract, that they will provide us with the results we need. Delegation and work-by-contract define human structures as well as OOP, which is why OOP is popular for large software architectures.

There’s maybe one last perspective. Object orientation might favour static structures over processes made of steps, or state machines whose state keeps changing. By hiding the changing state, OOP can give the impression of a perfect world of static relationships. The word “perfect” in fact comes from the Latin composition per-factum, that is: complete, finished, done. If it’s done, it does not change anymore, and it’s thus static. Clearly a static structure is easier to observe than something that keeps changing, so the perfection of static structures is partly in the eye of the beholder, who can then appreciate all the details. Science, for instance, is about capturing what changes in formulas that do not change and can thus be used for predictions. And it’s not just an observer’s perspective: static, long-lasting structures are more worthy of investigation than brief, temporary situations.
To sum it up, the bias of OOP towards static structures is natural and useful in describing large architectures.


Atomic software patterns

Software engineering, really?

It’s typical of software engineers to feel a little apart in the company of engineers of other specialties. Why? Because engineering and software don’t really get along that nicely. Engineers who build a house, a bridge or even an electronic circuit have little margin for error. That makes their process a little more constrained than, say, the typical write-build-run cycle we are used to. Being more constrained requires self-discipline and a lot of book reading, which usually makes you grow big spectacles and lose hair. Software engineers, on the other hand, have compilers and unit tests. That frees software engineers from discipline and introduces one single constraint, that of a comfortable chair. That has no proven influence on your hair, but surely a comfortable massage chair is more forgiving of a few extra calories in your diet. So if the standard picture of an engineer is a square, that of a software engineer will be more like a circle. If there’s a lot of discipline in bridge engineering, there’s a lot of make-break in software engineering. Finally, a bridge engineer could use their books as construction material. Honestly, how many of you developers could name a book you must keep by your side while you develop?

We are going to fix this empty spot by your side, now and forever. All the knowledge you need, the light you’ve been waiting for, is coming in the following lines. To be fair, there have been attempts in the past. One of the most notable is the arch-famous “Design Patterns” from the Gang of Four, the book of all software pattern books. I argue that the Gang of Four is not the theory, but rather a well-organized set of common practices. Does the Gang of Four contain equations? Nope. It’s full of pictures, so it can’t be an engineering book. Is there one common formula, one law that rules them all? Nope, just some common solutions to common problems.
In order to extract the one law (well, one or two) we need to go back to the basic action of software design. In a previous post I stated that software engineering is the art of copying data. Let me rectify: software programs are artistic ways of copying data. How do we make these software programs?

The GOF pictures contain a lot of UML boxes with many connections, but the simple beginning of it all is one problem we need to solve. Software engineering is the process of splitting that problem into parts to manage its complexity. The GOF pictures are not the basic rules, because they represent systems where the problem has already been split into many, many parts. This splitting has to start somewhere, and that’s where we will find the grounding rule of software engineering.

Rule number 1: If a problem is too big, split it into two problems

By splitting you get two little problems instead of one. The immediate advantage is that the two little problems might be more tractable. If each of the two little problems fits in your brain by itself, you might be able to cook up a solution for each and combine them.

If not, you could buy coffee for a colleague who knows how to solve part A, and pizza for another colleague who’s an expert in dealing with part B.
The split gives you more tractable problems, and the ability to replace one part of the system (the dependent part) without touching the rest of the components.
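As a hedged illustration (the domain and all names are invented): problem A, “produce a sales report”, split into part B (formatting) and part C (data access), with B depending on C.

using System.Collections.Generic;

// Part C: data access. Solvable on its own; knows nothing about reports.
public class SalesData {
    public IEnumerable<decimal> LoadMonthlyTotals() {
        return new decimal[] { 12000m, 9500m, 14300m }; // placeholder data
    }
}

// Part B: formatting. Depends on C, so a formatting expert can work here
// while a storage expert works on SalesData.
public class ReportFormatter {
    private readonly SalesData _data;
    public ReportFormatter(SalesData data) { _data = data; }
    public string Render() => string.Join(", ", _data.LoadMonthlyTotals());
}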

Rule number 2: the way of the dependency

It might seem that by applying rule number 1 we could design all possible software architectures. We split A into B and C, then C into D and E. But if all dependencies go the same way, what we have achieved is that the first component depends on the entire chain; only the last one is a self-standing entity. Rule number 1 is thus only enough as a beginning.

For a split to be successful, the dependency between the two parts needs to be one-directional. That is, if you split A into B and C, it should be that B depends on C and not vice versa.

If you have a two-way dependency, you are specifying a chat between two parties rather than a software design. If B depends on C and vice versa, you cannot use either of the two entities independently, so, looking from far away, those two entities could very well be one. By the same transitive logic, the dependency chain B -> C -> D from above is no better than two components.
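In code, a two-way dependency looks roughly like this (invented names): neither class compiles or tests without the other, so they really are one component.

using System.Collections.Generic;

public class Customer {
    public List<Order> Orders = new List<Order>(); // Customer depends on Order...
    public bool IsVip => Orders.Count > 10;
}

public class Order {
    public Customer Owner; // ...and Order depends right back on Customer.
    public decimal Price;
    public decimal Total() => Owner.IsVip ? Price * 0.9m : Price;
}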

Rule number 2 generates this super useful corollary: if you split A into B, C and D, make sure that there is no chain. In practice this means B and D both depend on C, which acts as the contract or interface between the two.
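A minimal sketch of the corollary (names invented): B and D both depend on C, the contract, and neither knows about the other.

using System.IO;

// C: the contract in the middle.
public interface IStorage { void Save(string key, string value); }

// D: one implementation of the contract.
public class FileStorage : IStorage {
    public void Save(string key, string value) => File.WriteAllText(key, value);
}

// B: the consumer, which only ever sees the contract.
public class Editor {
    private readonly IStorage _storage;
    public Editor(IStorage storage) { _storage = storage; }
    public void SaveDocument(string name, string text) => _storage.Save(name, text);
}

Swapping FileStorage for a database-backed implementation now touches neither Editor nor the contract.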

And that was it. The book of software engineering is composed of two atomic rules from which all patterns derive.

Side note: I have taken the admittedly very biased approach of making software design coincide with object-oriented design. I’ll explain why in another post.

Everything is asynchronous

Asynchronous APIs

In one of the previous posts I stated that all processing that occurs as a result of a user interaction should be delegated to background processing, so that the user interface is always responsive and smooth. In order to keep it (even) simpler, one might say that all classes in an application which deal with data or business logic should only expose asynchronous methods.

To be more specific, we can start by categorizing the classes and components in an application. Some classes are user controls, like buttons, comboboxes or other more complex collections of user-facing components. All the other classes are somehow related to the actual domain of the application, from data model classes to business logic components that deal with business processes. So, if we want to be strict, we can say that all user-interface classes which call into business classes should do so via an asynchronous call that delegates the work to another thread. In the other direction, from background workers to the user interface, UI frameworks typically require all calls to be directed to the thread owning the user interface components (there might be multiple), so our rule is already enforced there.

One of the issues with this approach is that it leads to too much, unwanted parallelism. When business objects start calling other business objects, every call turns into a new thread. The asynchronous calls should be enforced only when coming from a user interface component.
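As a sketch of that strict rule (WPF-style, with an invented business class): the click handler delegates the work to the background and only touches the UI again once the await brings it back onto the UI thread.

using System.Threading;
using System.Threading.Tasks;
using System.Windows;

// Hypothetical business class with a potentially slow, synchronous method.
public class CustomerRepository {
    public int CountActiveCustomers() { Thread.Sleep(2000); return 42; } // stand-in for real work
}

// In the Window’s code-behind:
public partial class MainWindow : Window {
    private async void RefreshButton_Click(object sender, RoutedEventArgs e) {
        var repository = new CustomerRepository();
        // The UI thread is free while the lookup runs in the background.
        int count = await Task.Run(() => repository.CountActiveCustomers());
        // After the await we are back on the UI thread, so this is safe.
        Title = $"Active customers: {count}";
    }
}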

With thread-based APIs, this is difficult to achieve. Whenever you design a business object with a method that can potentially take a very long time, you delegate the work to a background thread. This is appropriate if the caller is the UI, but what if the caller is another business object? It might be a better choice to run the lengthy method in the same thread as the caller. The solution to this problem, as usual in software engineering, comes via a layer of abstraction. The thread is the low-level way of doing parallel computation; the task hides the details of the thread. You can start thousands of tasks, but the runtime (language library or virtual machine) will only execute a reasonable number of tasks in parallel, where reasonable depends on several factors, including the number of actually available CPU cores. Many languages provide some task-based abstraction: C++11, C#, JavaScript, and Java as well (since JDK 8).
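A small C# illustration of that abstraction: ten thousand tasks are queued, but the thread pool decides how many actually run at once.

using System;
using System.Linq;
using System.Threading.Tasks;

class TaskDemo {
    static async Task Main() {
        // Queue 10,000 units of work; the runtime maps them onto a small
        // pool of threads, roughly sized to the number of CPU cores.
        var tasks = Enumerable.Range(0, 10000)
                              .Select(i => Task.Run(() => i * i));
        int[] results = await Task.WhenAll(tasks);
        Console.WriteLine(results.Length); // 10000
    }
}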

While tasks were becoming the trend in parallel programming, I was designing an API which would be used in a mostly asynchronous way. So I asked myself whether I should simply shape the API to return Tasks instead of plain simple results. Back then I chose to offer both task-based and non-task-based (synchronous) APIs. That meant an API like this:

public class A {
    // Synchronous version: blocks until the value is found.
    public int FindSomeValue(int someKey) {...}

    // Task-based version: returns immediately; the result arrives later.
    public Task<int> BeginFindSomeValue(int someKey) {...}
}

Normally you would not clutter an API with utility functions. If the user of the API can easily achieve the desired behavior with the existing API, don’t add anything. The smaller the API, the more understandable, the more usable, the more productive. So why would we want to expose both synchronous and asynchronous APIs? After all, it’s easy to turn a call into an asynchronous call in .NET:

int someValue = await Task.Run(() => FindSomeValue(someKey));
int someOtherValue = someValue + 1;

The previous lines do a lot of things: they start FindSomeValue in another thread (to simplify a little), return control to the caller, and set up a continuation so that when the result of the asynchronous call is available (someValue), the computation can resume and finally perform someValue + 1. So, although not entirely trivial, it’s at least possible to turn synchronous into asynchronous with little code. Why did I put two versions in the API then? The reason is that I wanted to handle the scheduling myself. BeginFindSomeValue used a combination of resources that performed suboptimally when loaded with too many parallel workloads. .NET allows you to specify a custom scheduler, but asking the user of an API to wire up a custom scheduler for every call would put too much work on the user, and ultimately would mean exposing implementation details of the API.

This is the most practical reason to expose both an asynchronous and a synchronous API: custom scheduling. Doing the scheduling internally allows the API implementor to choose how much parallelism to allow for optimal performance. For example, a database might have different scaling characteristics than simple file storage on disk.

.NET schedulers essentially schedule work for the CPU to perform, but in modern computing architectures there’s much more than CPUs: GPUs, remote computing servers, remote data servers. The logic used to schedule tasks on CPUs does not necessarily work well for network-bound or GPU-bound operations. For example, loading the GPU with many more operations than available cores is rather normal, while the ratio of tasks to cores is much lower on CPUs due to the different architectures. The ratio on a network is different again: a gigabit link can “perform” many network calls per second, and in most circumstances will be limited by latency more than bandwidth. Combining CPU, GPU and network workloads thus requires some custom scheduling to achieve the best performance. In these scenarios, explicitly async APIs give the implementors the freedom to keep this advanced scheduling internal.
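To illustrate (a guess at one possible internal shape, not the original code): BeginFindSomeValue can keep its own concurrency budget, tuned to the backing resource and completely hidden from the caller.

using System.Threading;
using System.Threading.Tasks;

public class A {
    // Hypothetical internal budget: at most 4 concurrent lookups, a number
    // tuned to how the backing resource actually scales.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(4);

    public int FindSomeValue(int someKey) {
        return someKey * 2; // placeholder for the real, expensive lookup
    }

    public async Task<int> BeginFindSomeValue(int someKey) {
        await Gate.WaitAsync(); // extra callers queue up here, invisibly
        try {
            return await Task.Run(() => FindSomeValue(someKey));
        } finally {
            Gate.Release();
        }
    }
}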

In all other cases, which version should we expose, synchronous or asynchronous? Unless you or some teammates find the Task API difficult to understand, the synchronous version should be exposed, as the asynchronous one can easily be realized by combining the Task factory methods with the synchronous API. Synchronous APIs are easier to read, and in case parallelism is already achieved by other means (e.g. explicit thread creation), the asynchronous versions would be useless.
What about the ideal solution? If we have some knowledge about the types of tasks, maybe with a little help from the developer, such as an attribute, we could do better than simple CPU scheduling:

// Hypothetical marker attributes, inspected by a resource-aware scheduler.
class HeavyNetworkTrafficAttribute : Attribute {}
class LengthyComputationAttribute : Attribute {}

[HeavyNetworkTraffic]
int FindSomeValue(int a, int b) {...}

[LengthyComputation]
int ComputeSomeValue(int a, int b) {...}

Now, let’s say the typical use case involves calling FindSomeValue, then calling ComputeSomeValue locally. This is in fact quite a realistic scenario, where data fetched remotely is processed locally before display. Let’s say the application submits many such operations: a FindSomeValue followed by a ComputeSomeValue. If two ComputeSomeValue instances are scheduled simultaneously, the available CPU per instance is halved. If two FindSomeValue instances are scheduled in parallel, that might easily be a fine situation for a gigabit Ethernet link. So, ideally, a scheduler which knows what types of resources are used by each task would schedule one ComputeSomeValue task in parallel with a number of FindSomeValue tasks. This level of custom scheduling can be achieved via the .NET Task Parallel Library extension points (custom schedulers). Who knows, maybe in the future the compiler will even be able to turn synchronous calls into asynchronous ones automatically, perhaps by analyzing runtime behavior.
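Short of writing a full custom TaskScheduler, the idea can be approximated with one concurrency gate per resource type. The budgets below are invented, and a real version would read the marker attributes via reflection instead of taking a flag.

using System;
using System.Threading;
using System.Threading.Tasks;

public static class ResourceAwareRunner {
    // Invented budgets: one LengthyComputation per core, but many
    // HeavyNetworkTraffic calls in flight at once.
    private static readonly SemaphoreSlim CpuGate =
        new SemaphoreSlim(Environment.ProcessorCount);
    private static readonly SemaphoreSlim NetworkGate = new SemaphoreSlim(32);

    public static async Task<T> Run<T>(Func<T> work, bool networkBound) {
        SemaphoreSlim gate = networkBound ? NetworkGate : CpuGate;
        await gate.WaitAsync();
        try {
            return await Task.Run(work);
        } finally {
            gate.Release();
        }
    }
}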

Until then, go for synchronous APIs unless you must control the scheduling yourself.
