Indexing degradations

Let's start with a simple example. This Rust code will perform poorly:

    let arr = ['a', 'b', 'c', 'd', 'e', 'f'];

for i in 0..arr.len() {
println!("{}", arr[i]);
}

This will, of course, work, and it's perfectly safe. We create an index that goes from 0 to the length of the array (6 in this case), but exclude the last one, so the i binding will take the values 0, 1, 2, 3, 4, and 5. For each of them, it will get the element at that index in the array and print it in a new line. There is one problem with this approach though. In C/C++, an equivalent code will simply add the size of the element to the pointer in the array and get the next element, but that sometimes causes issues. Look at this code:

    let arr = ['a', 'b', 'c', 'd', 'e', 'f'];

for i in 0..arr.len() + 1 {
println!("{}", arr[i]);
}

In this case, we are iterating until the array length + 1, and since ranges are exclusive, the last index will be 6. This means it will try to get the seventh element in the array, but there is no seventh element. In C/C++, this will create a buffer overflow and will get whatever is next in the memory. In the case that this memory is outside the program, you will get a segmentation fault, but if it's part of the program, it will print whatever was in that position, leading to leaks. Of course, that is not possible in Rust, since Rust is a memory-safe language, so what will happen?

Well, the answer is surprising—it will panic the program, unwind the stack (call all the destructors of all the variables in the stack), and exit the program safely without trying to access invalid memory. Depending on your perspective, you might think great, I will no longer have buffer overflows, or you might think oh my God, the whole server will go down to prevent a buffer overflow. Well, the second can be mitigated by stopping the panic and recovering the proper server state, already implemented in most frameworks, so it's mostly a win-win.

But is it? How does Rust know if the index is out of bounds? In this simple example, the compiler could know that the arr variable has only six elements, so trying to access the seventh would violate memory constraints. But what about this more complex program:

    fn print_request(req: Request) {
for i in 0..req.content_length {
println!("{}", req.data[i]);
}
}

Here I'm receiving an HTTP request (very naively represented) that has at least one content_length attribute and one data attribute. The first should contain the length of the data field, in a number of bytes, while the second will be a vector of bytes. Let's suppose we don't have the len() function in that data field, and that we trust the content_length attribute. What if somebody were to send us an invalid request with a bigger content_length than the actual length of the content? The compiler wouldn't know this in advance because the request originated at runtime from a TCP connection, but again, Rust must always be memory-safe (unless working in an unsafe scope, which is not the case).

Well, what happens is that the index operation has two parts. First, it checks the bounds of the slice, and if the index is fine, it will return the element; if not, it will panic. And yes, it does this for every indexing operation. So in this case, if the request is a valid request with a supposed 1 million bytes (1 MB), it will compare the index to the length of the vector 1 million times. That is at least 2 million extra instructions (the comparison and the branching for each, at least). That becomes much less efficient than the equivalent C/C++ code.