All entries for Wednesday 15 May 2019

May 15, 2019

Datastructures – Linked lists part 1

Back to data structures this month with the linked list. Linked lists are a way of holding data that allows you to add and remove items quickly and easily.

Why not arrays?

First question: why is adding and removing items from an array not quick and/or easy? The problem with adding items is quite simple - arrays have a fixed size so eventually you will run out of spaces in your array to store items. When this happens you have to do something to allocate additional space. Many languages have a function called "realloc" or similar that tries to extend the length of your array but it can only do that if there is unused memory space "above" the location of your array because the array elements have to be arranged one after the other in memory. The concept of "space above" is a bit complex in general and depends on details of your OS etc. but as a general idea if you allocate two arrays then they are placed one after the other in the computer's underlying memory so if you try to realloc the first array then there won't be any space between it and the second array to grow it in. If you can't grow your array like this then you have to allocate new memory to store the bigger array and copy the existing elements in. If you keep adding items then this continual growing of your array can be quite expensive, although this can be mitigated by always growing your array by more elements than you immediately need.

Removing items has the opposite problem. Since arrays are required to be contiguous (can't have gaps in them) you can't just "remove" an item you have to either flag it as empty and ignore it when going through your array in future or take all of the items above the removed element and move them down to pack everything up. The first approach has three problems

You have to use additional memory to flag items as being empty or not
If you are both adding and removing items from your array then since you don't actually recover memory when you remove an item your total memory requirements will grow without bounds
Depending on your algorithm you might have more difficulty getting optimal performance if you have to do fundamentally different things for empty and non-empty array elements

The second approach avoids those problems but on average involves copying half of the elements in your array every time you remove an item which can also be quite expensive.

It is quite possible to build a container based on arrays that you can add and remove items from that has good general performance (C++ std::vector is a good example of one) but they always have to make tradeoffs and if you are doing a lot of adding and removing of arbitrary elements it might be better to use a data structure other than an array.

Linked lists

The idea of a linked list is quite simple. Each element in a linked list is like a link in a chain - linked to the item after them, so you go through the linked list by taking the first item then going to the next item and the next etc. until you reach the end. This is generally implemented using pointers in what are often called "self referential structures", that is structures that contain pointers to themselves. These are easy enough to implement in either C/C++ or Fortran.

struct llitem{
  struct llitem *prev, *next;
};

TYPE :: llitem
  TYPE(llitem), POINTER :: next, prev
END TYPE llitem

These are more or less normal types but there is one more important rule: self referential structures can contain only pointers to their own type, not actual instances of their own type (try removing the *s in C or the POINTER attribute in Fortran and it will fail to compile). This is because types, much like arrays, are laid out contiguously in memory so they can only contain things that the compiler knows the length of and if you have a type that contains an instance of itself then there would be an infinite regression problem because you don't know how big it is until you have finished creating it and you can't create it until you know how big it is. Pointers are all of a fixed size so they work OK.

The structure as given is for what is technically called a doubly linked list because it contains links both to the next item and the previous item in the list. A singly linked list has each item linked only to the next item in the list. Doubly linked lists have some substantial advantages over singly linked lists, notably that you can go through it from either end, but also you can remove an item from the list needing only the item itself (and the list that it is held in if you have several).

Creating linked lists

Creating a linked list is quite easy. You hold a simple pointer to the first element in the list (generally called the head item) and then you simply create the list going down from that. The key thing is that you have to hook up the prev and next links as you go. This isn't too difficult and looks like

#include 
#include 

struct llitem{
  int value;
  struct llitem *next;
  struct llitem *prev;
};

void init_ll(struct llitem * l)
{
  l-> value = -1;
  l->next = NULL;
  l->prev = NULL;
}

int main(int argc, char** argv)
{
  struct llitem *head, *current;
  int i;

  head = malloc(sizeof(struct llitem));
  init_ll(head);
  head->value = 1;
  current = head;
  for (i=0;i<10;++i){
    current->next = malloc(sizeof(struct llitem)); /*Create the next element*/
    init_ll(current->next); /*Initialise it to nullify pointers*/
    current->next->value = current->value + 1; /*Simple counter*/
    current->next->prev = current; /*It's previous pointer should be the current item*/
    current = current->next; /*Now move onwards so the newly created particle is now current*/
  }

  current = head;
  while(current){
    printf("%i\n", current->value);
    current = current->next;
  }

}

PROGRAM test

  IMPLICIT NONE
  TYPE :: llitem
    INTEGER :: value = -1
    TYPE(llitem), POINTER :: next => NULL()
    TYPE(llitem), POINTER :: prev => NULL()
  END TYPE llitem

  TYPE(llitem), POINTER :: head, current
  INTEGER :: i

  ALLOCATE(head) !Create the head
  head%value = 1
  current => head
  DO i = 1, 10
    ALLOCATE(current%next) !Create the next element
    current%next%value = current%value + 1
    current%next%prev => current !The next element's previous is the current element
    current => current%next !Now move onwards so the newly created particle is now current
  END DO

  current => head
  DO WHILE (ASSOCIATED(current))
    PRINT *,current%value
    current => current%next
  END DO

END PROGRAM test

This example also shows how you how to step through the linked list from the head, simply by having a "current" pointer that starts at head and is then incremented by setting current = current->next (or current => current%next in Fortran). This can look a bit odd but it isn't that hard to understand. I start by manually creating the "head" element, using either ALLOCATE or malloc. Once I have a head element I then loop through, each time using the same ALLOCATE or malloc command on the "current->next" pointer, creating a new item every time. In C I then call the ll_init function to setup the values of the struct (in Fortran this is done for me since I gave the elements of my TYPE default values). After this the prev and next pointers are both NULL. This is correct for the next pointer becuase my new item is the last item in the list (it won't be next iteration but right now it is), but I have to set the prev pointer. If my new item is the next element in the chain from my current element then the previous element in the chain from my new element must be my current element so I set that up. After that I just have to repeat until I have added enough items.

Part 2 of this will be in a couple of weeks and will describe how you remove and item from a linked list and how to add new items to the middle of a linked list.