Slicing collections into smaller pieces

So far we've seen how collections can be filtered to create a new, smaller collection. Lo-Dash also supplies a number of functions that take existing arrays and produce one or more smaller arrays. For example, you might want only the first few items of an array, or only the last few. Arrays can be divided into chunks of smaller arrays, which is useful for batched processing. You can also use the Lo-Dash array tools to remove duplicates, ensuring that every value in your array is unique.

First and last collection portions

With native JavaScript arrays, you can slice off the first portion of an array using the slice() array method. Lo-Dash provides abstractions on top of the native slice() method that state the intent more directly: take(array, 2) reads as "take the first two items", whereas array.slice(0, 2) leaves the reader to work that out from the indices. Further, the Lo-Dash take() function operates on collections, so it'll work with both arrays and strings, as shown in the following code:

var array = [ 
    'Steve',
    'Michelle',
    'Rebecca',
    'Alan'
];

_.take(array, 2);
// → [ "Steve", "Michelle" ]

_.take('lodash', 2).join('');
// → "lo"

There's a difference in the output when using take() on arrays and strings. When applied to an array, it generates a new array, a subset of the original. When applied to a string, however, it returns a new array of individual characters: on its own, _.take('lodash', 2) returns [ 'l', 'o' ]. That's probably not what we're after most of the time, so the preceding code joins these characters back together with an empty string.
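For comparison, here is a minimal sketch of the same two operations using only the native slice() methods. It is plain JavaScript rather than Lo-Dash code, and the variable names are chosen purely for illustration. Note that the native string slice() returns a string, so no join() is needed.

```javascript
var names = [ 'Steve', 'Michelle', 'Rebecca', 'Alan' ];

// Native counterpart of take(names, 2): copy the items at index 0 and 1.
var firstTwo = names.slice(0, 2);
// → [ "Steve", "Michelle" ]

// String.prototype.slice() returns a string, not an array of characters.
var prefix = 'lodash'.slice(0, 2);
// → "lo"
```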

We can slice off the last portions of collections and strings using the takeRight() function. Using the same array and string, you can run the following code to get the last portions of the collections:

_.takeRight(array, 2);
_.takeRight('lodash', 4).join('');

The resulting array looks like [ 'Rebecca', 'Alan' ]. The resulting string looks like 'dash'.

Applying take() to a collection without a count argument slices off the first item. Likewise, applying takeRight() without a count argument slices off the last item. In both cases, the returned value is a one-item array, not the item itself. If you're just after the first or last item of a collection, use the first() and last() Lo-Dash functions respectively.
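The distinction between a one-item array and the item itself can be sketched with native JavaScript. This is an illustration of the shapes of the return values only, not of how Lo-Dash implements these functions.

```javascript
var names = [ 'Steve', 'Michelle', 'Rebecca', 'Alan' ];

// Same shape as take(names) with no count: a one-item array.
var firstAsArray = names.slice(0, 1);
// → [ "Steve" ]

// Same shape as first(names): the item itself.
var firstItem = names[0];
// → "Steve"

// Same shape as takeRight(names) with no count: a one-item array.
var lastAsArray = names.slice(-1);
// → [ "Alan" ]

// Same shape as last(names): the item itself.
var lastItem = names[names.length - 1];
// → "Alan"
```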

Splitting collections into chunks

Sometimes, we're faced with large collections. Really large collections. Especially when using API data, the frontend doesn't always have control over the size of the dataset that's returned. When the API does return a mountain of data, there's a good chance that the code that processes it will lock the UI. We can't simply ask the API for less data so that the UI doesn't freeze, and freezing the UI is unacceptable too.

Lo-Dash iterates through collections very efficiently. It doesn't, however, have any control over the potentially expensive operations carried out by your code. And this is what causes the UI to freeze: not the size of the collection by itself, and not a single expensive operation, but the two factors combined. Running an expensive operation once per item of a large collection is what becomes lethal for UI responsiveness.

The chunk() function is an easy way to split the processing of a really large collection into several smaller tasks. This gives the UI a chance to update between tasks, rendering the pending DOM updates and processing the pending events. The usage of this function can be seen in the following code:

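function process(chunks, index) {
    var chunk = chunks[index];

    // Past the last chunk: nothing left to do.
    if (_.isUndefined(chunk)) {
        return;
    }

    console.log('doing expensive work ' + _.last(chunk));

    // Queue the next chunk so that the call stack can clear first.
    _.defer(_.partial(process, chunks, index + 1));
}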

var collection = _.range(10000),
    chunks = _.chunk(collection, 50);

process(chunks, 0);
// → 
// doing expensive work 49
// doing expensive work 99
// doing expensive work 149

If the preceding code looks intimidating, don't worry. It introduces a few new concepts that you might find confusing. Let's start by explaining at a high level what the code is actually doing. A large collection is created and split into chunks of smaller collections. The process() function does some work with each chunk, then schedules itself to run again on the next chunk, until there are no chunks left.

The collection itself is generated using the range() function, with 10000 integers in it. It's not the content that's important, but rather, the large size. The chunk() function is used to split the large collection into smaller ones. We specify the size we want each chunked collection to be and in this case, we get 200 smaller collections with 50 items each. The processing work is kicked off by the call to process(chunks, 0). The second argument is the index of the first chunk to process.
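To check the arithmetic, here is a minimal stand-in for chunk() written with native slice(). The chunkArray() name is hypothetical, and the sketch only approximates the behavior described above; it is not the Lo-Dash implementation.

```javascript
// Hypothetical stand-in for chunk(): split items into arrays of `size`.
function chunkArray(items, size) {
    var result = [];
    // A size below 1 would never advance the loop, so bail out early.
    if (size < 1) {
        return result;
    }
    for (var i = 0; i < items.length; i += size) {
        result.push(items.slice(i, i + size));
    }
    return result;
}

// Build [ 0, 1, ..., 9999 ] without Lo-Dash.
var numbers = [];
for (var n = 0; n < 10000; n++) {
    numbers.push(n);
}

var pieces = chunkArray(numbers, 50);
pieces.length;      // → 200, that is, Math.ceil(10000 / 50)
pieces[0].length;   // → 50
pieces[199][49];    // → 9999
```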

The process() function itself grabs the next chunk to process based on the index argument. If the chunk is undefined, it means that the end has been reached and there are no more chunks to process. Otherwise, we can start doing expensive processing on the chunk, as illustrated in the example with the console.log() call. Finally, the defer() function schedules the processing of the next chunk. The reason we're using defer() is so that the call stack has a chance to clear, and the DOM operations have a chance to run. If we didn't do this, there would be no point in using chunk() to split the processing. The defer() function expects a callback, and we make one using partial(), which creates a new function with the arguments already supplied to it.

Note

The defer() and partial() functions are covered in much more depth in Chapter 3, Working with Functions.
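The same pattern can be sketched without Lo-Dash. In the following code, processChunks() and its schedule parameter are hypothetical names introduced for illustration. The default scheduler is a zero-delay setTimeout(), which is assumed here to play broadly the same role as defer(); passing a synchronous scheduler instead makes the control flow easy to follow and to test.

```javascript
// Hypothetical helper: run `work` on each chunk, yielding between chunks.
// `schedule` decides how the next step is queued (an assumption of this
// sketch, not part of the Lo-Dash API).
function processChunks(chunks, work, schedule) {
    schedule = schedule || function(fn) { setTimeout(fn, 0); };

    function step(index) {
        if (index >= chunks.length) {
            return; // No chunks left.
        }
        work(chunks[index], index);
        // Queue the next chunk rather than looping, so that with the
        // default scheduler the current call stack can unwind first.
        schedule(function() { step(index + 1); });
    }

    step(0);
}

var seen = [];
processChunks([ [ 0, 1 ], [ 2, 3 ], [ 4 ] ], function(chunk) {
    seen.push(chunk[chunk.length - 1]);
}, function(fn) { fn(); }); // Synchronous scheduler, for demonstration.
// seen → [ 1, 3, 4 ]
```

With the default scheduler, the work callback runs for the first chunk immediately and for each later chunk in its own timer callback, which is what lets the browser handle rendering and events in between.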

How do we know what size to make our array chunks? In the previous code, we chose 50 as the chunk size. But is that an arbitrary decision, or is it based on the typical datasets used in the application? The short answer is that we have to tinker a little and optimize for the common case. This might mean doing something such as figuring out the chunk size based on a percentage of the overall collection size, as shown in the following code:

var collection = _.range(10),
    size = Math.ceil(0.25 * collection.length);
_.chunk(collection, size);
// → 
// [
//   [ 0, 1, 2 ],
//   [ 3, 4, 5 ],
//   [ 6, 7, 8 ],
//   [ 9 ]
// ]

The chunk size here turns out to be 3. The computed value is 2.5, but you take the ceiling of that since there's no such thing as 2.5 collection elements. Besides, what you're interested in is not the exactness of the chunk size, but rather, how close it comes to 25 percent of the collection.

Note

You might have noticed that 3 doesn't divide evenly into 10. The chunk() function is smart enough to not leave out items. Any remaining items that do not fill the chunk size are still included.
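The percentage calculation can be wrapped in a small helper. The chunkSizeFor() name is hypothetical, and the Math.max() guard is an addition of this sketch: it keeps the size from dropping to 0 when the collection is empty.

```javascript
// Hypothetical helper: chunk size as a fraction of the collection length.
// Math.max() guards against a size of 0 for an empty collection.
function chunkSizeFor(length, fraction) {
    return Math.max(1, Math.ceil(fraction * length));
}

chunkSizeFor(10, 0.25);     // → 3
chunkSizeFor(10000, 0.25);  // → 2500
chunkSizeFor(0, 0.25);      // → 1

// Number of chunks produced for 10 items at that size.
Math.ceil(10 / chunkSizeFor(10, 0.25));  // → 4
```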

Building unique arrays

Collections sometimes have unwanted duplicates in them. The duplicates might already be present in the API data, or they might be a side effect of other computations you're performing in the frontend. Regardless of the cause, Lo-Dash provides the tools necessary to quickly generate unique collections.

The uniq() function takes a collection as the input and generates a new collection as output, with any duplicates removed:

var collection = [ 
    'Walter',
    'Brenda',
    'Arthur',
    'Walter'
];

_.uniq(collection);
// → [ "Walter", "Brenda", "Arthur" ]

By default, the potential duplicates are compared against one another using the strict equality operator. In the preceding collection, the duplicate is found and removed because 'Walter' === 'Walter' evaluates to true. You can specify, in more detail, how you want uniq() to compare values. For example, if we had a collection of objects and we only wanted objects that are unique by their name property, we could write _.uniq(collection, 'name'). The function also accepts a callback, which is used to compute the values before they're compared. This is useful in situations where the uniqueness of an object isn't so straightforward, as in the following code:

var collection = [ 
    { first: 'Julie', last: 'Sanders' },
    { first: 'Craig', last: 'Scott' },
    { first: 'Catherine', last: 'Stewart' },
    { first: 'Julie', last: 'Sanders' },
    { first: 'Craig', last: 'Scott' },
    { first: 'Janet', last: 'Jenkins' }
];

_.uniq(collection, function(item) {
    return item.first + item.last;
});
// →
// [
//   { first: "Julie", last: "Sanders" },
//   { first: "Craig", last: "Scott" },
//   { first: "Catherine", last: "Stewart" },
//   { first: "Janet", last: "Jenkins" }
// ]

This code ensures that the uniqueness of each object in the collection is based on the full name. There's no full name property on the objects, perhaps because it isn't needed anywhere else in the application. So, the callback passed to uniq() constructs one on the fly, for the sole purpose of enforcing this constraint.
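To make the mechanism concrete, here is a minimal native sketch of deduplicating by a computed key. The uniqueBy() name is hypothetical and this is not how Lo-Dash implements uniq(). Keys are compared as strings in this sketch, and the '|' separator is a design choice added here so that two different first/last splits of the same letters cannot produce the same key.

```javascript
// Hypothetical stand-in for uniq() with a callback: keep the first item
// seen for each computed key. Keys are coerced to strings.
function uniqueBy(items, keyOf) {
    // No prototype, so keys such as "constructor" cannot collide with
    // inherited properties.
    var seenKeys = Object.create(null);
    return items.filter(function(item) {
        var key = keyOf(item);
        if (seenKeys[key]) {
            return false;
        }
        seenKeys[key] = true;
        return true;
    });
}

var people = [
    { first: 'Julie', last: 'Sanders' },
    { first: 'Craig', last: 'Scott' },
    { first: 'Julie', last: 'Sanders' }
];

var unique = uniqueBy(people, function(item) {
    return item.first + '|' + item.last;
});
// → [ { first: "Julie", last: "Sanders" }, { first: "Craig", last: "Scott" } ]
```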