[ edited 2017-01-08: added performance numbers for gcc at the bottom of the post ]
[ edited 2017-01-14: Perfect forwarding of functions in higher order function ]
Take this somewhat silly traditional C++ code:
It's not very complex, and most C++ developers understand what it does in a few seconds. If this is the only place in the entire code base that elements are selected from a vector of Employees, then all is well. But what if there are several places? The code becomes cluttered with almost identical copies of the same loop. This screams algorithm. None in the standard library is a perfect fit, though. Let's hand roll a simple one:
It's not ideal, but it works with any container type and returns a vector with pointers to the elements in the container that matches the predicate.
With this, the devs_who_make_at_least() function becomes simple:
This is cool, but what if there's more than one place in the code where you want to select employees based on salary or title?
C++14 introduced the auto return type, which is great for writing functions that generates lambdas.
Take for example:
Each of the above functions returns a lambda that has captured the function parameter. These can be quite expressively used with the elements_matching algorithm introduced earlier:
This is powerful on its own right. The code is easy to read. Most, even very inexperienced C++ developers, grasp the intent directly, even if the mechanism may be unclear.
But what if we want more than one criteria, like selecting developers with a certain salary? Let's introduce an and higher order function, that returns true if two contained predicates both return true.
This function template pred_and(), creates a new predicate using two predicates. It will be true if and only if both t(x) and u(x) evaluates to true. It naturally short circuits the logic, so that if t(x) is false, u(x) is never evaluated.
Now finding the super well paid developers who make more that 1M becomes so easy it doesn't even need a separate function anymore.
So what about performance, then? Surely this magic has a cost in confusing the optimiser?
I created a small test program that populates a vector of 5000 employees. 1/4 of them with title = "Developer", and an unrealistically uniform random distribution salary in the range 50000 - 250000. With those the program did 500000 loops, each filtering out the developers that make 150000 or more. This was build with clang++ 3.9.
The output from the linux tool perf are,
first from the hand made loop at the top of this post:
Then from using the algorithm and higher order function:
I must admit I was puzzled by this huge performance advantage of using higher order functions, especially when there has been no attempt to optimise it at all. It turns out there's a small mistake in the hand written loop. It compares the member .title, which is a std::string with the string literal "Developer". This means it must make character by character comparison. The higher order function has_title() captures a std::string, and equal comparison of std::string begins with checking their lengths. If the lengths are different there's no reason to look at the characters. That is the only reason I've seen for the huge gain.
Changing the hand written loop to compare title with a std::string gives this result:
So, it is better performing. Not much. But it is. However, the hand written loop gets copied all over the code base, the algorithm and higher order functions do not. The performance bug could've been made in the has_title() function as well, but it would be one place to fix, not many. Likewise, the elements_matching() algorithm could be optimised, for example with partial loop unrolling to do several comparisons per revolution. That too would be an effort spent once in one place, and not all over the code.
That was the results with clang++-3.9. Let's see how g++ 6.2 fares in comparison. First the higher order function version:
And then the hand written loop, with the fix to compare with std::string:
Here it's clear that gcc does a slightly worse job than clang with optimising the hand written loop, but does a substantially better job at optimising the higher order function version.
As can be seen, the exact result does vary between compilers, but the chances of losing any performance to speak of when using higher order functions are small, and there is the potential for very noticeable performance gains.
So, use algorithms and use higher order functions, for code clarity and for performance.