Perfect hash functions

If the map is known at compile time, and it does not change during the runtime, there is a very, very fast system that can improve by orders of magnitude the use of maps. It's called perfect hash functions, and that's the key to them: they perform the minimum required computation for a hash to know whether it's stored in the hash map. This is because it maps one, and only one, integer for each element. And it has no collisions. Of course, this requires a constant, known hash map at compilation time.

To use them, you will need the phf crate. With this crate, you will be able to define a hash map at compile time in the build.rs file at the same level as the Cargo.toml file and use it with no more overhead than a comparison in your code. Let's see how to configure it.

First, you will need to add the phf_codegen crate as a development dependency. For that, you will need to add a build-dependencies section to your Cargo.toml, with the same syntax as the dependencies section. Then, you will need to create a build.rs file and, inside, you will need something like the following:

extern crate phf_codegen;

use std::path::Path;
use std::env;
use std::fs::File;
use std::io::{BufWriter, Write};

fn main() {
let out_dir = env::var("OUT_DIR").unwrap();
let path = Path::new(&out_dir).join("phf.rs");
let mut file = BufWriter::new(File::create(&path).unwrap());

let map = [("key1", "\"value1\""), ("key2", "\"value2\"")];

write!(
&mut file,
"static MAP: phf::Map<&'static str, &'static str> =\n"
).unwrap();

let mut phf_map = phf_codegen::Map::new();
for &(key, value) in &map {
phf_map.entry(key, value);
}

phf_map.build(&mut file).unwrap();
write!(&mut file, ";\n").unwrap();
}

Let's check what is happening here. The build.rs script is run before the compilation starts (if it's present). We have a map that is an array of key/value tuples. It then creates a code generation map and adds entries one by one to the map. This has to be done in a loop, since the compiler stack could overflow due to deep recursion.

It will write into a file, called phf.rs, starting with a line adding a static variable, and then writing the whole map into the file, ending it with a new line. This means that once the compilation starts, a new file named phf.rs will exist that we can use from our code. How? You will need to directly include the file in your code:

extern crate phf;

include!(concat!(env!("OUT_DIR"), "/phf.rs"));

fn main() {
println!("{}", MAP.get("key1").unwrap());
}

This will print the value associated to key1, in this case, value1.

Note that when creating the map in the build.rs file, the values are written directly, so if you want to put a string, you need to add the quotation marks and escape them. This enables you to add enumeration variants, for example, or to write code directly for values.

Once you have learned how to use compile-time hash maps, you should understand the different kinds of collections the standard library allows you to use, since it will be crucial to the speed and memory footprint of your application.