When Slow Algorithms Are Faster
Below are timing results for two searches. One is a linear search, and one is a binary search, but I’ve hidden which search is which.
| Search | Execution Time |
|---|---|
| ??? | 14 ns |
| ??? | 21 ns |
So which one is which? If you’ve read the title of this post, you might think this is a trick question. Let’s pretend I’m being straight with you though. In our algorithms class, we learned that linear search is O(n), while binary search is O(log n). Since log n is lower than n, obviously the lower number in the table is from the binary search. Right?
Wrong! In this case, the linear search is actually faster.
Big-O notation is a convenient way to talk about how an algorithm performs. You can generally look at a piece of code, or even just a description of the algorithm, and determine its Big-O running time (also known as asymptotic time complexity). This is useful, but it also abstracts away a lot of details that matter in real systems.
The definition of Big-O is roughly that a function f is O(g) if there exist constants C and n₀ such that for every i > n₀, f(i) ≤ C · g(i).
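Written out formally, that’s:

$$ f \in O(g) \iff \exists\, C > 0,\ \exists\, n_0,\ \forall\, i > n_0:\ f(i) \le C \cdot g(i) $$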
Looking at the definition, it hints at two details we’ve abstracted away. The first is that Big-O only holds for sufficiently large problem sizes. We may see different behavior at smaller scales, but eventually the asymptotic complexity dominates.
The second is the C, or the constant factor. The constant factor includes things such as how complex each step of the algorithm is, how fast your CPU is, and so on. It’s quite common for an algorithm to have a better asymptotic complexity but a worse constant factor.
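To make the constant factor concrete, here’s a toy model. The constants below are invented for illustration, not measured from anything:

```julia
# Toy cost models: a cheap O(n) algorithm vs. an O(log n) algorithm whose
# steps each cost 50x as much. Both constants are made up.
cheap_linear(n) = 1.0 * n
pricey_log(n)   = 50.0 * log2(n)

for n in (8, 64, 512, 4096)
    println("n=$n  linear=$(cheap_linear(n))  log=$(round(pricey_log(n), digits=1))")
end
```

Under this model the “worse” O(n) algorithm stays cheaper until n is in the low hundreds, even though it loses badly in the limit.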
We won’t get into it much further in this post, but sometimes an algorithm’s average-case complexity is much better than its worst-case complexity. For example, many algorithms are exponential or worse in the worst case while their average case is significantly better. Quicksort is widely used despite its O(n²) worst case because in practice it usually runs in closer to O(n log n) time.
The Code
Let’s return to the linear versus binary search example. By now, you probably have some idea why the supposedly slower linear search was actually faster. To start out, we’ll take a look at the code for the two algorithms. I’ve been experimenting with the Julia programming language lately, so I decided to implement these algorithms in Julia. We’ll start with the linear search code:
```julia
linear_search(needle::T, haystack::Vector{T}) where {T} = begin
    # Walk the array front to back, returning the index of the first match.
    for (i, x) in enumerate(haystack)
        if needle == x
            return i
        end
    end
    return missing  # not found
end
```
This is about the simplest search that could be written. We iterate through each entry in our haystack, checking whether it’s equal to the thing we’re searching for (the `needle`). If it is, we return the index; otherwise we continue onward. One reason this algorithm can be fast is that it is simple, and CPUs are quite good at running simple code. I ran my benchmarks using integer keys, which are easy for CPUs to work with. When compiled to assembly language, this code has two conditional branches. These branches are extremely easy to predict, because they always go the same direction, except on the last time through the loop. The memory access patterns are also predictable, meaning the CPU may be able to automatically fetch the next piece of data before it’s needed (although I doubt this helps much here, because the problem should be memory bound).
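For example, a quick REPL check (indices are 1-based, as usual in Julia):

```julia
julia> linear_search(5, [2, 5, 9])
2

julia> linear_search(7, [2, 5, 9])
missing
```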
Now let’s see the binary search.
```julia
binary_search(needle::T, haystack::Vector{T}) where {T} = begin
    low, high = 1, length(haystack)
    while low <= high
        # Probe the midpoint of the remaining range.
        mid = (low + high) ÷ 2
        midv = haystack[mid]
        if midv == needle
            return mid
        elseif midv < needle
            low = mid + 1   # needle can only be in the upper half
        else
            high = mid - 1  # needle can only be in the lower half
        end
    end
    return missing  # bounds crossed: not found
end
```
This search depends on having a sorted input array. We keep track of the lowest and highest index that could still hold the needle. Each time through the loop, we check the midpoint between these two bounds. If it holds the value we’re looking for, we return its index. Otherwise, we adjust either the high or low bound and continue until we’ve either found the needle or the bounds have crossed.
It’s apparent from just the number of lines of code that this is a more complex algorithm. Each iteration has more conditional branches, and these are less predictable since sometimes our value will be too high and other times it will be too low. We also have to do more arithmetic in each iteration. In the linear search, we just had to increment the index by one each time. Now we have to do an addition and a division (although the compiler replaces the division with a faster bit shift). Finally, the memory access patterns are not as predictable.
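A quick sanity check at the boundaries (my own example, not from the original benchmarks):

```julia
julia> haystack = [1, 3, 5, 7, 9];

julia> binary_search(9, haystack)   # last element
5

julia> binary_search(4, haystack)   # absent value
missing
```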
Performance
Okay, so binary search is slower for small arrays because each step of the algorithm is more complex. It does eventually get faster, right? Let’s see. Below is a graph showing the execution time for the two searches on arrays of various sizes.
We can see that binary search does start out slower, but after the array size, n, gets above about 80 the binary search does indeed start to perform faster. If we were to extend the graph even further, we’d see that the effect grows even more pronounced. O(log n) is a very slow growing function!
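For anyone who wants to reproduce a curve like this, here’s a minimal harness sketch. I’m assuming the BenchmarkTools package and a worst-case needle (the last element); the post doesn’t say exactly how its timings were gathered:

```julia
using BenchmarkTools  # assumed tooling, not stated in the original post

# Time both searches across a range of array sizes. Searching for the last
# element forces linear search to scan the whole array.
for n in (4, 16, 64, 256, 1024)
    haystack = sort(rand(Int, n))
    needle = haystack[end]
    lin = @belapsed linear_search($needle, $haystack)
    bin = @belapsed binary_search($needle, $haystack)
    println("n=$n  linear=$(round(lin * 1e9, digits=1)) ns  binary=$(round(bin * 1e9, digits=1)) ns")
end
```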
One way of thinking about why binary search is faster is that it generally does significantly fewer comparisons. Binary search takes advantage of the fact that the array is sorted to discard half of the remaining items with each comparison. By contrast, linear search can only discard one item at a time. One consequence is that binary search should perform even better when comparisons are more expensive. So far we’ve been looking at integers, which are extremely fast to compare. Let’s try strings instead.
This is genuinely not what I expected. It now takes until around n=100 for binary search to overtake the linear search. Why is this the case? Julia stores strings with their length alongside their data. This takes a little more space, but it makes the extremely common operation of checking a string’s length very fast (O(1), whereas C’s `strlen` is O(n), where n is the length of the string). The implementation of the comparison operator for strings first checks the lengths, since two strings of different lengths cannot be equal. This avoids looking at the contents of the strings entirely. I generated the strings with random lengths, which means comparing them will usually be quick: most of the time we only have to compare the lengths, so the random string case should look much the same as the integer case we saw earlier.
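The length shortcut is easy to sketch. Something like this captures the idea (my own illustration; Julia’s real string equality lives in Base):

```julia
# Hypothetical sketch of length-first string equality, mirroring the idea
# described above. Not Julia's actual implementation.
function str_eq(a::String, b::String)
    sizeof(a) == sizeof(b) || return false  # different lengths: answered in O(1)
    return codeunits(a) == codeunits(b)     # same length: compare the bytes
end
```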
So why is the random string case slower? The linear search takes almost exactly twice as long for strings as for plain integers. Remember when I said Julia’s string representation uses a little more space by pairing the length of the string with a pointer to the string data, instead of just storing a pointer? This is likely what’s having an effect here. CPUs do not read memory a byte at a time. Instead, they read chunks at a time, on the theory that if you read a piece of memory, you are likely to read nearby memory in the near future. The details vary per CPU, but mine happens to have a 64-byte cache line size, meaning it reads memory 64 bytes at a time. Let’s look at what 64 bytes of an integer array looks like compared to 64 bytes of a string array.
In the pure integer case, a cache line holds eight separate numbers. A cache line in the string case, on the other hand, holds only four (length, pointer) entries. Even if every string has a distinct length, we still have to read twice as much memory. Since the program is memory bound, reading twice as much memory means the program takes twice as long to run.
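The arithmetic is quick to check, using the representation described above and my CPU’s 64-byte lines:

```julia
# Back-of-the-envelope cache-line math, assuming the layout described above:
# 8-byte Int64s vs. 16-byte (length, pointer) string entries.
cache_line = 64                             # bytes per line on my CPU
ints_per_line = cache_line ÷ sizeof(Int64)  # 64 ÷ 8  = 8 integers
strs_per_line = cache_line ÷ 16             # 64 ÷ 16 = 4 string entries
println((ints_per_line, strs_per_line))     # (8, 4)
```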
We seem to have a reasonable explanation for why the linear search takes twice as long, but if you look carefully, you’ll see that binary search is even worse. Binary search seems to take four times as long when searching through strings! We could explain a 2× degradation the same way as with linear search, but where is the extra 2× coming from? The problem stems from the way Julia sorts strings. Consider this example:
```julia
julia> sort(["aa", "a", "b", "bb"])
4-element Array{String,1}:
 "a"
 "aa"
 "b"
 "bb"
```
What we see here is that Julia uses lexicographic ordering. This is a fancy way of saying “alphabetical order,” and most of the time that’s what we want when sorting strings. The problem for us is that in order to compare two strings, we must look at their contents. This means that not only does each cache line we read from the array contain half as many elements as in the integer case, we also need to read an additional cache line to look at the contents of the string. While I’m being somewhat loose with the mathematics here, I suspect that having half as many entries per cache line and needing to read twice as many cache lines per comparison roughly accounts for the 4× reduction in performance for binary search. If we sorted the array by length first (so we got `["a", "b", "aa", "bb"]` instead), I suspect binary search would only be 2× slower.
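For what it’s worth, that length-first ordering is easy to produce with `sort`’s `by` keyword (a binary search over it would then have to compare with the same (length, contents) key, of course):

```julia
julia> sort(["aa", "a", "b", "bb"], by = s -> (sizeof(s), s))
4-element Array{String,1}:
 "a"
 "b"
 "aa"
 "bb"
```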
Anyway, now that we have that behind us, let’s see what happens if we fix the length of the strings. Since all the strings now have the same length, the compare-lengths-first shortcut never applies, so we should see different results. Let’s see!
This is actually rather dramatic! Here each string has length 64. The crossover point where binary search becomes faster has moved from around 110 down to between 10 and 15. Another thing to notice is that all of the searches are much slower. With the random-length strings, linear search took an average of 100 ns at about n=100. Now that the lengths are fixed, linear search hits 100 ns at around n=25, roughly four times slower. The reason is likely the same thing that made binary search four times slower before: comparing two strings now requires reading an additional cache line to look at their contents.
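As an aside, inputs like these are easy to generate. This is my own sketch; the post doesn’t show how its inputs were built:

```julia
using Random  # randstring lives in the Random standard library

# 1000 random-length strings (1 to 64 characters) vs. 1000 fixed-length ones.
random_len = sort([randstring(rand(1:64)) for _ in 1:1000])
fixed_len  = sort([randstring(64)         for _ in 1:1000])
```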
I’d hesitate to draw too general a conclusion from this, but it does appear to support the idea that binary search pays off sooner when comparisons are more expensive.
Conclusion
So what can we learn from this? The main takeaway is to check your assumptions with measurements. Asymptotic analysis is valuable for comparing algorithms, and its conclusions do hold in the large. But quite a bit of code lives in the space where the details still matter. For example, arrays of fewer than 100 elements are common in the programs I write. In those cases, linear search has a good chance of being faster, while also being easier to write and maintain.