Rust - Adding intervals correctly

Previously published

This article was previously published on len-learns-rust.com. A full index of these articles can be found here.

Our simple collection of intervals has a major failing, it doesn’t merge intervals and so we end up with as many intervals as we perform insert operations. For example, if we start with an empty collection and add [4] and then [6] we should have [4], [6], which we currently do. However, if we then add [5] we should end up with a single interval, [4,6] rather three separate intervals, [4], [5], [6].

The existing code also allows us to add duplicate values or intervals which we should at least be told about. The obvious next step is to add some failing unit tests that spell out the functionality that we require.

We’ll start with returning false when we try and add a duplicate value:

    #[test]
    fn test_insert_duplicate_value() {
        let mut intervals = Intervals::new();

        assert_eq!(intervals.insert_value(4), true);

        assert_eq!(intervals.dump(), "[4]");

        assert_eq!(intervals.insert_value(4), false);

        assert_eq!(intervals.dump(), "[4]");
    }

Next, some tests that extend an existing interval in the collection, either at the upper end or the lower end:

    #[test]
    fn test_insert_value_extends_lower() {
        let mut intervals = Intervals::new();

        assert_eq!(intervals.insert_value(4), true);

        assert_eq!(intervals.dump(), "[4]");

        assert_eq!(intervals.insert_value(3), true);

        assert_eq!(intervals.dump(), "[3,4]");
    }

    #[test]
    fn test_insert_value_extends_upper() {
        let mut intervals = Intervals::new();

        assert_eq!(intervals.insert_value(4), true);

        assert_eq!(intervals.dump(), "[4]");

        assert_eq!(intervals.insert_value(5), true);

        assert_eq!(intervals.dump(), "[4,5]");
    }

and finally an insert that joins to intervals:

    #[test]
    fn test_insert_value_joins_intervals() {
        let mut intervals = Intervals::new();

        assert_eq!(intervals.insert_value(4), true);

        assert_eq!(intervals.dump(), "[4]");

        assert_eq!(intervals.insert_value(6), true);

        assert_eq!(intervals.dump(), "[4], [6]");

        assert_eq!(intervals.insert_value(5), true);

        assert_eq!(intervals.dump(), "[4,6]");
    }

Given that the interface on the collection allows us to insert either values (intervals with the lower and upper values the same) or true intervals that span multiple values, we should also add tests for the interval versions of insert:

    #[test]
    fn test_insert_overlapping_interval() {
        let mut intervals = Intervals::new();

        assert_eq!(intervals.insert_interval(2, 6), true);

        assert_eq!(intervals.dump(), "[2,6]");

        assert_eq!(intervals.insert_interval(2, 6), false);

        assert_eq!(intervals.dump(), "[2,6]");

        assert_eq!(intervals.insert_interval(3, 4), false);

        assert_eq!(intervals.dump(), "[2,6]");

        assert_eq!(intervals.insert_interval(4, 5), false);

        assert_eq!(intervals.dump(), "[2,6]");

        assert_eq!(intervals.insert_interval(5, 6), false);

        assert_eq!(intervals.dump(), "[2,6]");

        assert_eq!(intervals.insert_interval(5, 7), false);

        assert_eq!(intervals.dump(), "[2,6]");
    }

    #[test]
    fn test_insert_interval_extends_lower() {
        let mut intervals = Intervals::new();

        assert_eq!(intervals.insert_interval(4, 6), true);

        assert_eq!(intervals.dump(), "[4,6]");

        assert_eq!(intervals.insert_interval(1, 3), true);

        assert_eq!(intervals.dump(), "[1,6]");
    }

    #[test]
    fn test_insert_interval_extends_upper() {
        let mut intervals = Intervals::new();

        assert_eq!(intervals.insert_interval(4, 6), true);

        assert_eq!(intervals.dump(), "[4,6]");

        assert_eq!(intervals.insert_interval(7, 9), true);

        assert_eq!(intervals.dump(), "[4,9]");
    }

    #[test]
    fn test_insert_interval_joins_intervals() {
        let mut intervals = Intervals::new();

        assert_eq!(intervals.insert_interval(10, 12), true);

        assert_eq!(intervals.dump(), "[10,12]");

        assert_eq!(intervals.insert_interval(18, 20), true);

        assert_eq!(intervals.dump(), "[10,12], [18,20]");

        assert_eq!(intervals.insert_interval(13, 17), true);

        assert_eq!(intervals.dump(), "[10,20]");
    }

Implementing any of this functionality requires that rather than simply inserting a new interval we first look at the collection to see what we already have around the area where we will be inserting. To achieve this we use std::collections::BTreeSet<>::range() to look up a range of intervals that include the interval we’re about to insert. First we will look for all elements in the set that sort at or after the interval that we’re about to insert. We do this like this:

    fn insert(&mut self, interval: Interval) -> bool {
        let intervals_after = self.intervals.range((Included(&interval), Unbounded));

We then access the first iterator in that range, for our search criteria this could be an interval that matches the interval we’re about to insert exactly, or one that includes it.

    fn insert(&mut self, interval: Interval) -> bool {
        let intervals_after = self.intervals.range((Included(&interval), Unbounded));

        let next_it = intervals_after.min();

        let next_is = next_it.is_some();

        if next_is {
            let next_ref = next_it.unwrap();

            if next_ref.overlaps(&interval)
            {
                return false;
            }
        }

The code above looks at the first element in the returned range and checks that it doesn’t overlap the supplied interval. This allows the test for inserting duplicate values to pass.

Next we do a similar lookup in the collection, but this time for all of the intervals before the one we wish to insert, again including the one we’re inserting:

        let intervals_before = self.intervals.range((Unbounded, Excluded(&interval)));

        let mut prev_it = intervals_before.max();

        let mut prev_is = prev_it.is_some();

        if prev_is {
            let prev_ref = prev_it.unwrap();

            if prev_ref.overlaps(&interval)
            {
                return false;
            }
        }

Note that this time we search from the start of the collection up to the interval that sorts before the one we are about to insert using Excluded(). We then perform the same check to see if this last interval before our insertion place overlaps us. This is due to the fact that [9,20] will sort after [1,10] and but overlaps.

We now potentially have a range of intervals that occur before the interval that we are inserting and a range of intervals that occur after it. These ranges may, or may not be contiguous with the new interval.

Two lookups?

Whilst this approach works for me I can’t help feeling that I’d rather be doing a single lookup into the BTreeSet<> using something like the C++ std::set::lower_bound() and then navigating in either direction to get the before and after intervals.

We’ll start with the case where we have both a ‘before’ and ‘after’ range of intervals and work out if we need to insert or merge into either of these intervals. For now I’ll put this code in a separate function so that it’s easier for me to understand and then call it from insert().

    fn insert_or_join_intervals(& mut self, interval: Interval, next: &Interval, prev: &Interval) {
       if next_is && prev_is {
            self.insert_or_join_intervals(interval, next_it.unwrap(), prev_it.unwrap());
        } else {
            self.intervals.insert(interval);
        }
Compiler says "no!"

Unfortunately, this doesn’t work. I get the following error from the compiler:

error[E0502]: cannot borrow `*self` as mutable because it is also borrowed as immutable
  --> src\intervals.rs:70:13
   |
39 |         let intervals_after = self.intervals.range((Included(&interval), Unbounded));
   |                               ------------------------------------------------------ immutable borrow occurs here
...
70 |             self.insert_or_join_intervals(interval, next_it.unwrap(), prev_it.unwrap());
   |             ^^^^^------------------------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |             |    |
   |             |    immutable borrow later used by call
   |             mutable borrow occurs here

The iterators into the collection of intervals can’t be used to access the intervals that are in the collection if we’re then going to modify the collection… So, to be able to work with the intervals that we’ve looked we need to be able to clone them, this isn’t likely to be that expensive here but it seems unfortunate, perhaps we can refactor this requirement away later, after all we need immutable access to the intervals but, I guess, passing them into a function that can modify the collection they’re in means we could do something stupid. So, for now this means adding Clone to the list of features that we want implemented for the Interval struct.

#[derive(PartialOrd, Eq, PartialEq, Clone)]
pub struct Interval {
    lower: u8,
    upper: u8,
}

and we need to work with them like this:

    fn insert_or_join_intervals(& mut self, interval: Interval, next: Interval, prev: Interval) {
       if next_is && prev_is {
            self.insert_or_join_intervals(interval, next_it.unwrap().clone(), prev_it.unwrap().clone());
        } else {
            self.intervals.insert(interval);
        }

The implementation of the insert or join function is pretty much as you’d expect:

fn insert_or_join_intervals(& mut self, interval: Interval, next: Interval, prev: Interval) {
        let next_extends = next.extends_lower(&interval);

        let prev_extends = prev.extends_upper(&interval);

        if next_extends && prev_extends
        {
            // merges the prev and next intervals

            let new_interval = Interval::new(prev.lower(), next.upper());

            self.intervals.remove(&prev);
            self.intervals.remove(&next);
            self.intervals.insert(new_interval);
        }
        else if next_extends
        {
            // extends the next interval

            let new_interval = Interval::new(interval.lower(), next.upper());

            self.intervals.remove(&next);
            self.intervals.insert(new_interval);
        }
        else if prev_extends
        {
            // extends the previous interval

            let new_interval = Interval::new(prev.lower(), interval.upper());

            self.intervals.remove(&prev);
            self.intervals.insert(new_interval);
        } else {
            self.intervals.insert(interval);
        }
    }

First we work out if the new interval extends either the previous or next intervals and then we either merge with them or add the new interval separately. With this working and the test to join intervals now passing it’s easy to add the other cases to the insert function.

        if next_is && prev_is {
            self.insert_or_join_intervals(interval, next_it.unwrap().clone(), prev_it.unwrap().clone());
        } else if next_is {
            self.insert_or_merge_with_next(interval, next_it.unwrap().clone());
        } else if prev_is {
            self.insert_or_merge_with_prev(interval, prev_it.unwrap().clone());
        } else {
            self.intervals.insert(interval);
        }

and the implementation of the two new functions is simply a case of taking the obvious parts of the insert_or_join_intervals() function and, for now, copy and pasting them into the new functions.

Code duplication?

The reason for the duplication here is that I am trying to make the code easy to understand and to avoid needing to reason about where and when I unwrap the iterator and clone the intervals. There’s probably a nicer way…

All of the insertion and merge tests now pass but the sorting test fails as the rules for insertion have changed. The test is easy to update so that it tests the new rules.

Join in

The code can be found here on GitHub each step on the journey will have one or more separate directories of code, so this article’s code is here this allows for easy comparison of changes at each stage.

Of course, there may be a better way; leave comments if you’d like to help me learn.