Lists

Overview plan

    - Singly-linked lists
        - some characteristics
        - how they are formed
        - basic implementation (in Perl)
    - Indexing into a list
    - Interlude: Tail recursion, TCE
    - List operations
        - map / fmap
        - filter
        - fold
    - Modifying, inserting, deleting list elements
        - mutable versus immutable
        - compare with a mutable Vector (such as Perl's arrays)
    - Copying lists
        - how to deep copy an immutable data-structure ;-)
        - caching Empty (MooseX::Singleton)
        - caching generally (Memoize?)
    - Infinite lists
        - laziness
        - implementation
    - Doubly-linked lists?
        - mutable... immutable?
        - pointers
        - code
        - rewrite with Data::Zipper

    - Exercise: implement a turing machine with a zipper onto an infinite
        (in both directions) list

    - Show me the Code!: implementations of the above concepts
      (or, if available, usages of equivalent, available library functions)
        - Modern Perl with Moose
        - Javascript
        - Ruby
        - Python

Linked lists

The linked list is one of the oldest data structures in computing, dating back to the mid-1950s. While it's a collection type, similar to an Array, it makes very different tradeoffs.

In a dynamic programming language like Perl, Ruby, Python, or Javascript, we very rarely feel the need to look beyond the builtin array types, as they are usually Good Enough. However, the linked list is a fundamental building block, and we'll see that it is key for understanding concepts such as:

Beyond this, linked lists do have some features that arrays don't

A linked list is a structure that contains an element (often called the head) and a pointer to the rest of the list (the tail).

For example, the word "hello" is made up of the list of characters:

    [h.]->[e.]->[l.]->[l.]->[o ]
     ^    ^
   head   tail

The head is the letter "h", and the tail is the list

    [e.]->[l.]->[l.]->[o ]

So we've seen lists of 4 and 5 elements. You could grow the list arbitrarily large, by continuing to add links to new elements.

How about smaller lists?

    1 element:   [o ]
    0 elements:  ()

The 0-element list doesn't have a head or a tail. It's just an empty list. But how about the 1-element list? It has a head, but does it have a tail?

Well, we saw that the tail of the list "hello" was the list "ello". So the tail of "o" should be the list "", i.e. the empty list.

This means that really our lists look like:

    [o.]->()

    [h.]->[e.]->[l.]->[l.]->[o.]->()

Which means that all lists have a head and a tail. Unless they're the empty list, which has neither. i.e. a list can be one of two things:

    ()      the empty list
    [x.]->  a linking cell the connects a value to another list
            (where the other list is either another linking cell
            or an empty list)

A note on the implementations

(This note will be moved to Chapter 1)

The examples in this book are in Perl, but they should be accessible and relevant to all the dynamic languages, for example: Javascript, Python, and Ruby. To this end, the examples all use the Moose library, and in particular, the MooseX::Declare sugar layer. The former gives us a robust, "postmodern" Object Oriented framework that steals the best ideas from CLOS, Smalltalk, ML, and the other dynamic languages. The latter gives us a syntax that has less of the "line-noise" that some might have expected when hearing the word "Perl".

But, never fear, the code is still Perl: Moose itself, in particular, is used in production and is actively maintained by a large team of volunteers and businesses. The sugar layer is elegant, but if you are uncomfortable running macro transformations on your code, you lose little by rewriting the code without.

Of course, it's always nice to see examples in your own language-of-choice, and I hope that people will contribute to this opensource (Creative Commons licensed) book with code samples in Javascript, Python, Ruby, and indeed in any other language.

Implementation

Let's have a look now at how we could represent this in code. As usual, we'll look at examples in Modern Perl. We'll add code and explanations in Ruby, Python, and JavaScript at the end of the chapter.

We could represent the two cases as

    []              an empty array
    [$head, $tail]  an array representing a tuple of head and tail

However, in this book we'll use objects declared using Moose wherever we can. This will give more straight-forward code, as well as having a lot of power that we'll need when we go on to more complex scenarios.

Note that all of the code examples are implemented and tested at https://github.com/osfameron/pure-fp-book/tree/master/code, so you can easily check how things hang together. If there's no implementation in your language, then please fork the code on github and contribute!

As we saw, there are two types of list. From an Object Oriented perspective, we might think of implementing this with an Abstract Base Class (i.e. one that only acts as a superclass, to tie together related functionality and type information, but which can't be instantiated into a ->new object; and then with concrete subclasses. The Moosey way to do this is to use a Role (also known as a trait, and similar in many ways to Ruby's mixins).

    use MooseX::Declare;
    BEGIN { role_type 'List' }

    role List {
        requires 'head';
        requires 'tail';
    }

    class List::Link with List {
        has head => ( is => 'ro', isa => 'Any' );
        has tail => ( is => 'ro', isa => 'List' ),
    }

    class List::Empty with List {
        method head { die "Can't take head of empty list!" }
        method tail { die "Can't take tail of empty list!" }
    }

With this code, we could already create a list, but it'd be a little verbose:

    my $list = List::Link->new(
        head => 'h',
        tail => List::Link->new(
            head => 'e',
            tail => List::Link->new(...

So let's add a little sugar. We'll create a function fromArray that we'll call like so:

    my $list = List->fromArray( 1..10 );

As lists are a recursive data structure, it makes sense to implement this in a recursive way so:

Typically in Perl we might use a condition like if to distinguish between these cases. But in both Functional and Object Oriented programming, it's often much more elegant to use pattern matching (of which an OO "multiple dispatch" is a variant.) Moose provides a rather nice and flexible way of declaring multiply dispatched methods, and we'll take advantage of it.

    use MooseX::MultiMethods;

    # can be called as a class method, or on an object
    multi method fromArray ($class:) {
        return List::Empty->new;
    }
    multi method fromArray ($class: $head, @tail) {
        return List::Link->new(
            head => $head,
            tail => $class->fromArray(@tail),
        );
    }

This kind of recursive code is very elegant. But there is a teensy little problem: dynamic languages were never designed for this kind of code structure, and that means that, by default, they do have a few little hiccups that we need to be aware of.

Tail calls and recursion limits

In the method fromArray above, we recurse back into the same method until we've converted all the elements of the array into a List. Every time we do this, we are making a new method call, which is added to the call stack. This requires storing information about the caller, and to support these very dynamic languages, may require quite a lot of information... And finally, once we get to the base case, and are ready to return the result, the interpreter will then have to hop back through the stack, one level at a time, till it gets back to the top, passing the result information up all the way to the top.

If we make many of these recursive calls, then the interpreter may start to complain about what we're doing. For example, in Perl:

    Deep recursion on subroutine "List::fromArray" at foo.pl line 20

There are ways to eliminate these warnings

    no warnings 'recursion';
    $DB::deep = 100_000_000; # some big number, greater than maximum 
                             # number of recursions

But doing so would only be papering over the synptoms, the root problem would remain: we may blow all the memory allocated to the call stack, and the interpreter may crash.

In languages like Scheme and Haskell, which are built around recursive data structures, the compilers will naturally do Tail Call Elimination. This means that if the last action in a routine is to call another method (often, but not always, the current method) then you can simply jump to that method, without having to add the overhead of storing the call on the stack.

Though Perl can't (yet?) do this kind of optimization automatically, there are two modules on CPAN by Yuval Koogman that will allow you to handle this elegantly, Sub::Call::Tail and Sub::Call::Recur. The latter is faster, but only handles function calls that recurse into themselves. The former can handle tail calls to any subroutine, and also object methods, so we'll use that.

By default, Sub::Call::Tail exports a function called tail. As our List class already has a method tail (meaning something quite different), this could confuse both the reader and perl itself. So we'll take advantage of the facility to import an aliased function, and call it tail_call instead, which makes it much clearer!

 # the slightly nicer 'use' line is implemented in 
 # https://github.com/osfameron/sub-call-tail release pending ;-)

    # take that, recursion!
    use Sub::Import 'Sub::Call::Tail' tail => { -as => 'tail_call' };
    # use Sub::Call::Tail tail => { -as => 'tail_call'};

    multi method fromArray ($self: $head, @tail) {
        tail_call List::Link->new(
            head => $head,
            tail => $self->fromArray(@tail),
        );
    }

There are similar workarounds available for Python, Ruby, and Javascript. They will usually use a goto statement in some form, perhaps a structured one that either jumps back to the beginning of the block.

(Interestingly, though the Perl hack in Sub::Call::Tail is implemented with a goto &function call, which shares the same name as the reviled control-flow operation, it's substantially cleaner than you might fear. This type of goto simply replaces the currently running subroutine with a new one, a little like Unix's exec switches to a new process.)

Getting data out of a list

Indexing into the list

While arrays and lists both store collections of data, the representations make certain tasks easier or more difficult. Accessing an item positionally, for example, is trivial with an array:

     0  1  2  3  4 
    [h][e][l][l][o]

    To access item number 3, find the beginning of the array, and get the 
    contents of the box 3 positions to the right.

With a list, on the other hand, we don't have an easy way to jump to another element, as each cell is stored in a different location in memory.

    [h.]->[e.]->[l.]->[l.]->[o.]->()

Instead we need to visit each cell in turn until we arrive at the right head element.

Obviously, we can't index into an Empty list:

    multi method nth (List::Empty $self: Int $pos) { 
        die "Can't index into an Empty list"; 
    }

And if we want the zero-th element of a list, then we're just looking for its head.

    multi method nth (Int $pos where { $_ == 0 }) {
        return $self->head;
    }

But if we've passed a positive integer, then we can simply recurse down the tail, looking for the nth of the next lowest integer, until we get to 0.

    multi method nth (Int $pos where { $_ > 0 }) {
        tail_call $self->tail->nth( $pos - 1 );
    }

Notice how we've specified all the valid ways to call this method. If we call $list->nth("foo") or $list->nth(-1) then Moose will throw an error for us automatically!

Converting back to array

It may sometimes be convenient to flatten our List structure back to a native array. We could do this again with recursion.

    multi method toArray_recursive (List::Empty $self:) { 
        return ()
    }
    multi method toArray_recursive (List::Link $self:) {
        return ($self->head, $self->tail->toArray_recursive);
    }

But notice how we don't have a convenient function to make a tail call on. The toArray method is called, but as part of the expression only. In general, there are two ways to handle this:

As far as I know, the best general tutorial to these methods using a dynamic programming language is Mark Jason Dominus's Higher Order Perl. It's freely available online at http://hop.perl.plover.com/ to check it out, and it's well worth buying, whatever your programming language of choice.

We'll just briefly look at an example of each technique.

converting the function into a tail-recursive one

We'll need to extract the returned @array and instead pass it onwards within the tail call.

    multi method toArray_stack (List::Empty $self: @stack) {
        return @stack;
    }
    multi method toArray_stack (List::Link $self: @stack) {
        tail_call $self->tail->toArray_stack(@stack, $self->head);
    }

converting the function into an iterative one

Here we will entirely eschew the idea of making a recursive function call, and instead loop through the list using a classic imperative programming statement, the while loop. We'll also manage our own stack, as a variable @stack.

    method toArray_iterative () {
        my @stack;
        my $list = $self;
        while ($list->isa('List::Link')) {
            push @stack, $list->head;
            $list = $list->tail;
        }
        return @stack;
    }

Transformations on lists

Every standard library for manipulating lists should come with a number of basic transformations, for mapping all the values with a function, filtering them according to some criteria, or reducing the collection to a single value. We'll look at each of these in turn.

Applying a function to a list: map

We often want to apply a function to all elements of a collection. For example, we may have a list of user records, and we want a list of full names.

The dynamic languages, for the most part, have already borrowed this functionality from the world of Functional Programming, and often call it map.

    my @users = (
        {
            first_name => 'Bob',
            last_name  => 'Smith',
        },
        {
            first_name => 'Aisha',
            last_name  => 'Chaudhury',
        }
    );

    use signatures;
    sub full_name ($user) {
        return join ' ' => 
            $user->{first_name},
            $user->{last_name};
    }

    my @full_names = map full_name($_), @users;

To define map for a linked list, we of course define it recursively!

    multi method map (List::Empty $self: CodeRef $f) { 
        return $self;
    }
    multi method map (List::Link $self: CodeRef $f) {
        tail_call $self->new( 
            head => $f->($self->head),
            tail => $self->tail->map( $f )
        );
    }

and later:

    my $users = List->fromArray(@users);

    my $names = $users->map( \&full_name );
Functors

Note how we started with a function that operated on a single value:

    full_name :: { first_name, last_name } -> "Full name"

And we've ended with one that operates on whole collections:

    map(full_name) :: [List of {...}] -> [List of "Full name"]

Often, in the world of FP, we'll call this kind of operation a Functor. You'll often see the operation that we just defined as map called fmap instead (i.e. Functor Map). Converting a function that operates on a single value to one that operates on a collection (or any other type of container) is called lifting the function.

filtering the list

Another key operation on any collection type is to filter them by some criteria. All the dynamic languages offer this facility for their native types, usually calling them grep or filter, or in some cases using a more general facility called List Comprehensions. We'll implement the simple filter case, again recursively.

    multi method filter (List::Empty $self: CodeRef $f) { 
        return $self;
    }
    multi method filter (List::Link $self: CodeRef $f) {
        my $head = $self->head;
        if ($f->($head)) {
            tail_call $self->new( 
                head => $head,
                tail => $self->tail->filter( $f ),
            );
        }
        else {
            tail_call $self->tail->filter( $f );
        }
    }

So, we could now write

    # TODO TEST
    use signatures;
    my $odd_numbers = List->fromArray(1..100)->filter(
        sub ($n) { $n % 2 }
    );

Reducing a list with fold

Folds are the result of repeatedly applying an operation to every pair of values in a List until you get a single value. An example could be finding the sum of a list of numbers.

    sum (1,2,3,4,5) = 1 + 2 + 3 + 4 + 5

We could think of the sum as:

    1 + (2 + (3 + (4 + 5))) <-- a right fold

or as

    (((1 + 2) + 3) + 4) + 5 <-- a left fold

Oddly, these two folds, foldRight and foldLeft have rather different characteristics: sometimes one will be faster, or will never complete.

With all these recursive algorithms, we have to consider the "base case", which for lists means "what happens with an empty list?" The typical way to do this is to also include a parameter which is an identity value. For a sum operation, this would be 0.

    sub add ($x,$y) { $x + $y },
    sub sum ($nums) {
        return $nums->fold( \&add, 0 );
    }

Whereas for a multiplicative product it would be 1

    sub multiply ($x,$y) { $x * $y },
    sub product ($nums) {
        return $nums->fold( \&multiply, 1 );
    }

Though in both these cases, the fold's input and output are the same type of value, folds allow the result to be a different type entirely. In general, we call the result an accumulator, as it accumulates the folder data as we progress through the list.

Let's have a look at the left-fold first.

    multi method foldLeft (List::Empty $self: CodeRef $f, $acc) {
        return $acc;
    }
    multi method foldLeft (List::Link $self: CodeRef $f, $acc) {
        tail_call $self->tail->foldLeft($f, $f->($self->head, $acc));
    }

So the left fold is a typical recursive function. Note how we have to pass the accumulator to the first call - often we'd call this the init argument.

While it's easy to see what the init argument should be, in many cases, you may have noticed that my original example omitted one:

    sum (1,2,3,4,5) = 1 + 2 + 3 + 4 + 5

In this case we can simply use the first element in the list to start things off. Often such a fold is labelled with the number 1.

    # TODO test
    method foldLeft1 (List::Link $self: CodeRef $f, $acc) {
        return $self->tail->foldLeft1($f, $self->head);
    }

As we've specified that $self must be a linking list, the method will fail entirely on an empty list. This is entirely expected behaviour for a foldLeft1 or foldRight1.

# TODO case where we have an accumulator, rather than a typical binop

Now let's have a look at a right fold. The empty case is, of course, the same:

    multi method foldRight (List::Empty $self: CodeRef $f, $acc) {
        return $acc;
    }

But note how, unlike foldLeft, the right fold isn't tail recursive:

    multi method foldRight (List::Link $self: CodeRef $f, $acc) {
        tail_call $f->(
            $self->head,
            $self->tail->foldRight($f, $acc),
        );
    }

Though we have a tail call, it's only on the folding function $f. The actual call back into foldRight isn't in the tail position, where it can be easily optimized, but rather inside the call to this folding function.

Later, when we look at infinite lazy lists, we'll come back to another little tweak that languages like Haskell use to optimize right folds.

Modifying the list

We noted earlier that an array was represented as a contiguous area of memory while a list is a series of nodes that point to each other

    ARRAY:
     0  1  2  3  4 
    [h][e][l][l][o]

    LIST
    0     1     2     3     4     5
    [h.]->[e.]->[l.]->[l.]->[o.]->()

We saw above that the code to access the nth element of a list was rather more complicated than for an array. Similarly, other operations are more (or less) complicated. We'll look at updating a single element, inserting a new one, and deletion in turn, comparing the trade-offs between an array and a list.

Updating an element

If we want to change our list to "hullo" instead, then this is easy with an array.

    $array[1] = "u";

    ARRAY is now:
    [h][u][l][l][o]

Of course we've now "lost" the original representation of "hello", because the contents of that array have been entirely written over. Now, we might want to do the same thing with a list

    $list->mutateAt(1, 'u');

    LIST
    0     1     2     3     4     5
    [h.]->[u.]->[l.]->[l.]->[o.]->()
           ^
           changed in-place

And of course we'd do it recursively:

    # NOT TESTED
    multi method mutateAt (List::Empty $self: Int $pos, $new) { 
        die "Can't mutate an Empty list"; 
    }
    multi method mutateAt (Int $pos where { $_ == 0 }, $new) {
        $self->head($new);
    }
    multi method mutateAt (Int $pos where { $_ > 0 }, $new) {
        tail_call $self->tail->mutateAt($pos - 1, $new);
    }

However, that would be a mutation of a value, and in this book, we're mostly interested in immutable, pure data structures. Note that we even declared our attributes with read-only ('ro') accessors.

    class List::Link with List {
        has head => ( is => 'ro', isa => 'Any' );
        has tail => ( is => 'ro', isa => 'List' ),
    }

If we relax these to 'rw', we could make the mutateAt code work. This is how you might typically work with linked lists in a language like C. But of course such a list shares the disadvantage of the array mutation - that we have now lost the original "hello" sequence. Let's see if we can do better.

So, we want to retain the 'e' node, but avoid it, like so:

    LIST
    0     1     2     3     4     5
    [h.]->[e.]->[l.]->[l.]->[o.]->()
        \     /
          [u.]

But of course the node 'h' can't point to two tails! So we could relink it entirely:

    LIST
    0     1     2     3     4     5
    [h.] x{e.}->[l.]->[l.]->[o.]->()
        \     /
          [u.]

But of course that means mutating the tail of the the list 'h', which we've promised we wouldn't do... immutable data-structures are hard!

The trick turns out to be to copy everything before the element we're changing.

    LIST
    0     1     2     3     4     5
    [h.]->[e.]->[l.]->[l.]->[o.]->()
              /
    [h.]->[u.]

We now have 2 different nodes that both contain the value 'h' in their head, but which are in fact different lists: they have entirely different tails. Note, though, that they converge at 'l', so we only need to copy part of the list.

    # NOT TESTED
    multi method updateAt (List::Empty $self: Int $pos, $new) { 
        die "Can't update an Empty list"; 
    }
    multi method updateAt (Int $pos where { $_ == 0 }, $new) {
        List::Link->new({
            head => $new,
            tail => $self->tail,
        });
    }
    multi method updateAt (Int $pos where { $_ > 0 }, $new) {
        tail_call List::Link->new({
            head => $self->head,
            tail => $self->tail->updateAt($pos - 1, $new);
        });
    }

If you think about it, this method is following the same pattern of nth, except that instead of just descending the list till we get to the element we want, we copy the list as we go. Then, once we get to position 0, we link this copied part of the list with the updated element, and the rest of the list.

All this copying may seem inefficient. If you have to do multiple updates, then you will end up copying at least the beginning of the list each time. Tellingly, Haskell's Prelude (the standard library), with a very rich set of functions to manipulate linked lists. entirely omits any version of updateAt.

In fact, a linked list, by default isn't the optimal structure for random access. However, we have seen that the benefit of using an immutable structure is that we get to keep the old and new versions of a structure at the same time, and with minimal memory usage, because of the sharing of data between the two structures.

This technique of data sharing where possible (and cloning with updated values where not) will be a common tool, so it is worth getting to grips with this simple example first, so we'll look now at inserting and deleting elements from a list.

In both these cases, we'll say that the operation is perhaps less convenient with an array.

Inserting a new element

We already saw that arrays are contiguous. We will have reserved a block of memory for the array, so to insert into it, we first have to copy all of the data to the right, and finally insert the element. So, for example, to insert the missing letter into the alphabet, we'd do:

    ARRAY:
     0  1  2  3 
    [a][b][d][e]

    copy
     0  1  2  3  4
    [a][b][d][d][e]

    update
     0  1  2  3  4 
    [a][b][c][d][e]

That looks nice and straight-forward: but of course the array didn't have this extra element reserved! In the dynamic languages what we call an array is usually in fact a dynamic array or vector, which will resize itself behind the scenes. So in the above example, when we try to add a new element to the vector, our insert algorithm must first realise that we're growing the array, and then allocate a new block of space (we'll double the storage space here, to avoid having to re-allocate too often).

    ARRAY:          NEW
     0  1  2  3     0  1  2  3  4  5  6  7
    [a][b][d][e]   [ ][ ][ ][ ][ ][ ][ ][ ]

Then we'll copy the data to the left of the insertion

    ARRAY:          NEW
     0  1  2  3     0  1  2  3  4  5  6  7
    [a][b][d][e]   [a][b][ ][ ][ ][ ][ ][ ]

and to the right

    ARRAY:          NEW
     0  1  2  3     0  1  2  3  4  5  6  7
    [a][b][d][e]   [a][b][ ][d][e][ ][ ][ ]

and of course, add the inserted element. (Now, finally, we can de-allocate the old array from memory).

                    NEW
                    0  1  2  3  4  5  6  7
                   [a][b][c][d][e][ ][ ][ ]

As you can see, this is quite a lot of work, and that's the reason why we pre-allocated a larger range of memory than needed. The next time we insert a value, we can use our first, simpler version, and only need to resize again once we need more than 8 elements. Of course, this does mean that we also need to keep track of the number of items in the array:

                    NEW
                    0  1  2  3  4  5  6  7    #count
                   [a][b][c][d][e][ ][ ][ ]    5

Let's look at how we'd insert an element into a list:

    LIST
    0     1     2     3     4
    [a.]->[b.]->[d.]->[e.]->()

We need to re-point the link from b to a new list with value c that points to the remainder of the list d.

    LIST
    0     1      3     4     5
    [a.]->[b.]   [d.]->[e.]->()
             \   /
             [c.]
             2

If we were using a mutable data-structure, we'd do exactly this, and just mutate the references to point to the right place. With a pure, immutable list, we'll take advantage of sharing as before (by pointing to the e list, which remains unchanged) but we'll also have to copy the start of the list.

    LIST
    0     1     2     3     4     5
    [a.]->[b.]------->[d.]->[e.]->()
                    /
    [a.]->[b.]->[c.]

The insertion code is quite simple, and follows the same structure as updateAt:

    # NOT TESTED
    multi method insertAt (List::Empty $self: Int $pos, $new) { 
        die "Can't insert into an Empty list"; 
    }
    multi method insertAt (Int $pos where { $_ == 0 }, $new) {
        List::Link->new({
            head => $new,
            tail => $self,
        });
    }
    multi method insertAt (Int $pos where { $_ > 0 }, $new) {
        tail_call List::Link->new({
            head => $self->head,
            tail => $self->tail->insertAt($pos - 1, $new);
        });
    }

Just as before, we copy the head of the list up until the point we want to insert. In fact, there is only difference between the routines! Instead of linking the list to the original position's tail, we link to the whole list.

Deleting an element

Deletion from an array will always involve shifting the elements after the one deleted to the left, and then decreasing the element count.

    array
    0  1  2  3  4  5  6  7
   [h][a][l][l][o][ ][ ][ ]  #5

   [h][a][l][o][ ][ ][ ][ ]  #4

At this point, the algorithm might choose to reallocate the memory, to shrink the size of data stored.

Again, using linked lists, this could be trivial.

    LIST
    0     1     2     3     4     5
    [h.]->[a.]->[l.]->[l.]->[o.]->()

    [h.]->[a.]->[l.]------->[o.]->()

But again, if we want to use immutable lists, we'll have a copy and share, as before.

    # NOT TESTED
    multi method deleteAt (List::Empty $self: Int $pos) { 
        die "Can't delete from an Empty list"; 
    }
    multi method deleteAt (Int $pos where { $_ == 0 }) {
        return $self->tail;
    }
    multi method deleteAt (Int $pos where { $_ > 0 }) {
        tail_call List::Link->new({
            head => $self->head,
            tail => $self->tail->deleteAt($pos - 1, $new);
        });
    }

This follows the same structure again, except that once we get to the element we want to delete, we simply skip over its head and return its tail instead.

Copying

TODO: a) we "copied" tail of list by just pointing at it. "copy" of first part was in fact a change. But a *deep copy* of a Pure FP structure is as simple as referring to it. TODO b) MooseX::Singleton

Comparing the array and list approaches

The pros and cons of each approach might include:

It is important to be aware of these trade-offs when deciding whether to look beyond your language's default dynamic array type, or not.

But of course, so far we've only seen lists that are linked in one direction (also known as "singly linked lists".) To be able to fairly compare lists to arrays, we'll now look at at a variant that can be used to traverse the list in both directions.

Later in the book, we'll also look at Trees, which may further address concerns with efficient random access to given elements.

Doubly linked lists

We've seen that it's easy to traverse our linked lists from the beginning, by following the tail until we get to the empty list at the end. This is a fundamental capability, and is used by many fundamental functions, for example map, filter, and fold, as we saw. Occasionally, though, we will want to be able to traverse a list in both directions.

For example, consider a list of tabs in a web browser:

              google.com
              google.com search results
   focused -> guardian.co.uk             
              hop.perl.plover.com/book
  

We might want to move to the next tab to the right (the Higher Order Perl book) or to the left (Google search results).

A classic structure from computer science is the doubly linked list: as the name implies, rather than having just a head and a tail, it will have instead a value and then two links to prev and next

So, the tabs I showed above might be represented as:

   ()<-[.1.]<->[.2.]<->[.3.]<->[.4.]->()

In a mutable programming language, we might iterate each node, pointing it to the next node, and then pointing the next node's prev back at it.

Of course we can't, strictly speaking, do that in a pure style. But we have at least two options:

You'll have noticed that I've left implementing both the workarounds as exercises. Though they're no doubt interesting, and valuable learning projects, it turns out that doubly linked lists may be suboptimal in a purely functional context, and we'll shortly look at a better alternative.

Modifying a doubly linked list

Let's look at what happens when you modify an element in a doubly linked list:

   ()<-[.1.]<->[.2.]<->[.3.]<->[.4.]->()
                     ?\     /?
                       [.5.]

As we saw before, with singly linked lists, we'd have to copy the left hand side of the list, because the list is flowing in the direction of a changed value, and is therefore itself changed.

   ()<-[.1.]<->[.2.]<->[.3.]<->[.4.]->()
                            /?
   ()<-[.1.]<->[.2.]<->[.5.]

So that means going through the fiddly 2-way linking process on the whole of the left side. But notice that I highlighted the phrase "in the direction of", but we're looking at a list that flows in both directions. This means we also need to copy the right hand side!

   ()<-[.1.]<->[.2.]<->[.3.]<->[.4.]->()
                              
   ()<-[.1.]<->[.2.]<->[.5.]<->[.4.]->()

So any change to a doubly linked list requires modifying the whole list! This makes them a poor choice for purely functional programming, at least in the case where you have to make any changes after construction.

Garbage collection

In Perl, memory is allocated using reference counting. Every time a value is referred to by something (a variable, or an object attribute) this is added to its reference count. When a reference is dropped, this is reduced. When the count reaches 0, the object is garbage collected.

Now, for simple cases, this works perfectly. With a singly linked list, for example:

    my $list = List->fromArray(1,2,3)
    
                    [1.]->[2.]->[3.]->()
    referred to:    $list <1>   <2>  <3>
    from   
    count:           1    1     1     1

The first item 1 has a single reference (from $list) and the subsequent references via tail keep the rest of the list in memory. If we now move down into the list, replacing the contents of $list with its own tail:

    $list = $list->tail;

                    [1.]->[2.]->[3.]->()
    referred to:        $list   <2>  <3>
    from   
    count:           0    1     1     1

we can see that Perl is now ready to garbage collect the node 1. Similarly, if we delete $list entirely, then:

    undef $list;
                    [1.]->[2.]->[3.]->()
    referred to:          X     <2>  <3>
    from   
    count:           0    0     1     1

List no longer has a pointer to 2, so in fact this should be:

                    [1.]->[2.]->[3.]->()
    referred to:                X    <3>
    from   
    count:           0    0     0     1

                    [1.]->[2.]->[3.]->()
    referred to:                      X 
    from   
    count:           0    0     0     0

So, for linear structures, this works really well! But let's look at a circular structure, for example, a doubly linked list of 2 elements.

                    ()  <-    [.1.]   <->   [.2.]   ->   ()
    referred to:              $list         
    from           1.prev     2.prev        1.next     2.next
    count:           1          2             1           1

Obviously, we want the list to remain in memory if we move (after all, there's no value in a list that you can traverse in both directions if the direction you came from disappears when you move!)

    $list = $list->next;

                    ()  <-    [.1.]   <->   [.2.]   ->   ()
    referred to:                             $list
    from           1.prev     2.prev        1.next     2.next
    count:           1          1             2           1

But if we delete $list, just look what happens.

    undef $list;

                    ()  <-    [.1.]   <->   [.2.]   ->   ()
    referred to:                             
    from           1.prev     2.prev        1.next     2.next
    count:           1          1             1           1

Though there are no variables pointing to any of these values, they will never get garbage collected! If your program runs long enough, this kind of space leak may eventually become a problem.

Perl has two workarounds for this:

Effectively, this means that you can't rely on automatic garbage collection for a doubly linked list, and would have to manually call a $list->delete method (which then unpicks the references all the way along the list in both directions).

# TODO handwave, further contributions welcome from language experts! Ruby (at least the MRI implementation) and most modern Javascript implementations use mark-and-sweep GC instead. This doesn't suffer the same problem. Python uses a slightly modified reference counting algorithm with cycle detection, so may also avoid the issue.

So...

OK, so the Garbage Collection issue is avoidable, or not an issue, depending on your language. But even if you don't program in Perl, it's interesting to look at the reference counting problem, because it so closely mirrors the issue with updating a list. In general, cycles will interfere with the sharing aspect of functional data structures and, though they may sometimes be necessary or useful, we'll more often find ways to avoid them.

We'll now look at exactly such a way of traversing a list, while being able to get back to the beginning.

A pointer into a list

Let's imagine that we have a singly linked list, that we want to be able to traverse in both directions. We'll create a new data structure for this traversal that will simply point at the position it's currently at.

    [h.]->[e.]->[l.]->[l.]->[o.]->()
      ^
      #

When we start at the beginning, this pointer will simply point to the list headed by "h". From there, it can get the current value, and also move to the next element (its tail). Let's see what happens when we move to the right:

    [h.]->[e.]->[l.]->[l.]->[o.]->()
           ^
           #

Now the cursor is pointing to the list headed by "e". It can move to the next element, but has no link back. Let's fix that!

    [h.]->[e.]->[l.]->[l.]->[o.]->()
        \  ^
           #

This looks ok, but when we move to the next element again, we can see the problem:

    [h.]->[e.]->[l.]->[l.]->[o.]->()
              \  ^
                 #

Now the cursor can point to the "e", but it has no way of going back to "h". What we really need is another list, going in the opposite direction!

        [h.]->[e.]->[l.]->[l.]->[o.]->()
                     ^
    ()<-[h.]<-[e.]<- #

This is quite different from a doubly linked list! The links forwards and backwards are in fact in entirely different structures. There are no circular references at all. The cursor will start pointing to the head of the list, and an empty list pointing backwards:

        [h.]->[e.]->[l.]->[l.]->[o.]->()
         ^
    ()<- #

and finish off pointing at the empty end of the original list, and to a complete copy (in reverse order) of the list's contents.

        [h.]->[e.]->[l.]->[l.]->[o.]->()
                                 ^
    ()<-[h.]<-[e.]<-[l.]<-[l.]<- #

In fact, because of garbage collection, if there are no other references to the list's contents, the picture will be more like:

                                [o.]->()
                                 ^
    ()<-[h.]<-[e.]<-[l.]<-[l.]<- #

We can travel back to the beginning, but we will end up with an entirely new list (albeit an identical copy) that what we started with.