How to Implement Linked Lists in WimpWorks 2
Introduction
A linked list is a structure similar to an array but with many more desirable properties. An array can be considered as one object with a fixed number of sections - once declared, the only way of changing its size is to create a new array and copy the data in (which is wasteful and not possible in some languages). One elegant solution to this problem is to treat each element of an array as a separate object. By creating each element only as required, and linking elements together using pointers, you can have a structure similar to an array that can grow and shrink dynamically at run-time. Data structures like this are called linked lists.
Fig. 1: The two main types of linked list (pictured here as a list of 'client' objects)
Linked lists come in two main varieties - Single and Double. Both can be be tricky to implement as they involve manipulating pointers. Basically, for each element you allocate enough memory to store the data plus additional memory to store either one or two pointers (depending on whether you are linking singly or doubly). Each item in the list contains a pointer to the next item (and, if linking doubly, a pointer to the previous item). Access to the list is facilitated by keeping a pointer to the first item in the list or 'head item'. Because the items are linked by pointers, they do not have to be adjacent in memory and can be created whenever required.
Because of the properties described above, linked lists lend themselves very well to a number of computing problems. Any application which has to keep details on an unknown number of objects will need to use a linked list. However, even if the number of objects is set at a maximum value, linked lists still offer other good advantages. If objects are created by the user of an application in a random order, but need to be presented in a fixed order (an address book for example), linked lists are ideal, as an object can be inserted in the list between any other two objects. This means that (using the address book example) 'people' objects could be inserted into the address book list in any order, but stored and displayed alphabetically by surname. If you are storing complex data structures (comprising of many variables) in a list, then swapping two items is only a case of changing a few pointers. In an array, you would have to copy whole blocks of memory to achieve this. As most sorting algorithms require swapping of objects this means sorting a linked list can be a lot simpler than sorting an array.
Singly Linked Lists vs Doubly Linked Lists
One disadvantage of a linked list against an array is that it does not allow direct access to the individual elements. If you want to read/write a particular item then you have to start at one end of the list and follow the pointers until you get to that item.
Although singly linked lists have a lower memory requirement, because each item only contains a pointer to the next item you cannot traverse singly linked lists from the tail towards the head. If you will need to move in both directions along the list then double is the only option. However, as can be seen from the examples below, adding an item to a doubly linked list is more complex (as it involves more pointer manipulation) while removing an item from a singly linked list is more complicated (because of the lack of pointers in each item).
Pointers and Data Types in BASIC
As BBC BASIC is by nature basic, very little support for composite data types
is present. In C we have, (the union
and struct
declarations) which make writing linked lists a lot easier. As an example,
the C code to declare a new data type to represent a linked list element
could look like this:
typedef struct element
{
struct element *next; /* pointer to next element */
struct element *prev; /* pointer to previous element */
char name[20]; /* 20 characters for the name */
char telephone[20]; /* 20 chars for a 'phone number */
char address[100]; /* 100 chars for an address */
} Address_Book_Entry;
Having defined this, we can create new objects of this type, just as we could create a new integer or string, and access the individual members of the structure.
But in BBC BASIC, as no support for data structures is provided, we have to
use pointers to a block of memory to access the different members in our
structure. In the above example we need to calculate the exact size of an
Address_Book_Entry
, so that we can use CLAIM
to
get the correct amount of memory. In BASIC all pointers (next
and prev
in the example above) take 4 bytes - so our address
book element will take 4+4+20+20+100 bytes in total. In WimpWorks the code
to achieve this would be:
new_element% = CLAIM(148)
Now that we have enough memory, we can start using new_element%
to access it. CLAIM
returns a pointer to the memory it has
allocated, so new_element%
is actually a pointer. For the
sake of good programming practice I would define variables to hold the
values of the offsets (number of bytes from the beginning of the block of
CLAIM
ed memory) of each variable in the structure:
next% = 0
prev% = 4
name% = 8
telephone% = 28
address% = 48
These would probably exists in a procedure that responded to 'Starting Up'
so that they would be defined from the start of the program. In order to
access the data using a pointer (new_element%
in our example)
you use the
'$
',
'?
',
'|
' and
'!
' pointer indirection operators in BASIC:
new_element%?address% = ASC("J") : REM char (1 byte)
new_element%!next% = head_element% : REM integer (4 bytes)
$(new_element%+name) = "Joe Bloggs" : REM string (n bytes)
By defining variables to hold all the offsets you instantly make your code
more robust and more readable. If you have to change the format of your
structure then all you have to do is alter the value of the variables and all
the code that accesses that part of the structure will use the new offset.
Imagine if you decided that the pointer to the previous item should come
first. You would have to change every instance of pointer%!0
to pointer%!4
and vice versa.
Linked List Operations
Throughout all the examples given in the following sections
head_item%
will be a pointer to the head of a linked list of
strings, tail_item%
will be a pointer to the tail of the list.
Each element in the list uses the first word (4 bytes) of its memory to
store a pointer to the next object. In doubly linked list examples the
second word is a pointer to the previous item. The string is stored at a
position in memory directly after the pointers
As recommended above we will define the following offsets for singly linked lists and the constant NULL:
next% = 0
text% = 4
NULL = -1
While for examples that use doubly linked lists we will define :
next% = 0
prev% = 4
text% = 8
NULL = -1
In this article I will discuss, in detail, the operations:
- search
- add
- remove
Though many more are possible, due to space constraints we cannot include them here.
Searching a Linked List for the desired Item
As mentioned earlier, because a linked list doesn't reside in one contiguous block of memory, we have to locate the desired element each time we want to read or alter its contents. We could just store pointers to all the objects we create but we don't know how many objects will be created. To store all the pointers would require a further linked list! So we can't get away from the fact that linked lists will never permit direct access to the objects they contain.
In order to identify which node or element of the list we want, we require some way of uniquely identifying each node. Two methods for this are outlined below :
- Indexing
- The programmer supplies the position of the node in the list. This is similar to the index of an array.
DEF FNlocate_string_indexed(index%)
LOCAL node%
node% = head_item% : REM set the current node to point to
REM the head of the list
WHILE index% > 1 AND node% <> NULL
node% = node%!next% : REM move the node to the next element
index% -= 1 : REM reduce the index.
ENDWHILE : REM stop when we have gone the correct
REM number of steps or hit the end
REM of the list
=node%
Example 1 : Locating an element based on its position
- Keying
- In order to identify each item in the list we need to provide values for some subset of the elements or attributes of the node. These attributes must be like the primary key in a database.
DEF FNlocate_string_keyed(current$)
LOCAL node%
node% = head_item% : REM set the current node to point to
REM the head of the list
IF node% = NULL THEN =NULL : REM if list is empty then exit
WHILE $(node%+text%) <> current$
node% = node%!next% : REM move the node to the next element
IF node% = NULL THEN =NULL : REM if reached end of list then exit
ENDWHILE : REM stop when we have reached the
REM correct string
=node%
Example 2 : Locating an element based on its contents
The two examples above traverse the linked list to locate a given element.
The first will locate the nth element in a list. The second will return the pointer to the element containing a given string.
Adding Items to a List
Because of the nature of a linked list items can be added at any point in the chain without overwriting any existing items. The easiest place to add a new item is at the head or tail of a list. This is because we can keep a pointer to the head/tail element.
To add an item to a linked list you must first claim a block of memory to store the item in. After the memory is claimed you can treat it like a data structure and start filling in the data that the structure stores. This will include any pointers to other objects in the list. These pointers, along with the equivalent pointers for other objects in the list, should be used to link the object in to the list. Once this is done the actual data to be stored at that position can be inserted.
Allocation of memory should be done using the WimpWorks 2 command
CLAIM
.
Adding an item to the head of a list
Since this item is going to be the head of the list, its 'next item'
pointer should point to the current head of the list, and only when that is
the case can the head_item%
variable be changed to point to
the new item.
Fig. 2 : Stages involved in adding an element to the head of a singly linked list
The following examples show how to add a node to the head of a list.
DEF PROCadd_as_head_sing(RETURN head_item%,s$)
LOCAL newItem%
newItem% = CLAIM(LEN(s$) + 4) : REM claim enough memory
newItem%!next% = head_item% : REM make the first word of the new item
REM point to the first item in the list
head_item% = newItem% : REM now that the new item points to the
REM head of the list it is effectively
REM before it in the list and so becomes
REM the new head item.
$(newItem%+text%) = s$ : REM insert the string into the newly
REM allocated memory block after the
REM pointer to the next item
ENDPROC
Example 3 : Adding a node to the head of a singly linked list
DEF PROCadd_as_head_doub(RETURN head_item%,s$)
LOCAL newItem%
newItem% = CLAIM(LEN(s$) + 8) : REM claim enough memory
newItem%!next% = head_item% : REM make the first word of the new item
REM point to the first item in the list
head_item%!prev% = newItem% : REM make the 'previous item' pointer of
REM the current head of the list point
REM back to the new item.
newItem%!prev = NULL : REM as this new item will become the head
REM of the list there will be no items
REM before it so the pointer is NULL (-1)
head_item% = newItem% : REM now that the new item points to the
REM head of the list it is effectively
REM before it in the list and so becomes
REM the new head item.
$(newItem%+text%) = s$ : REM insert the string into the newly
REM allocated memory block at the correct
REM offset.
ENDPROC
Example 4 : Adding a node to the head of a doubly linked list
Adding an item to the tail of a list
As this item is going to be the tail of the list, its 'next item' pointer should be NULL as there is no next item. However, the current tail will no longer be tail once we have finished, so its 'next item' pointer should point to the new item we have created.
DEF PROCadd_as_tail_sing(RETURN tail_item%,s$)
LOCAL newItem%
newItem% = CLAIM(LEN(s$) + 4) : REM claim the memory
tail_item%!next% = newItem% : REM make the first word of the tail item
REM point to the new item
tail_item% = newItem% : REM now that the tail points to the
REM the new item it is effectively
REM before it in the list and so the new
REM item is effectively the new
REM tail item.
$(newItem%+text%) = s$ : REM insert the string into the newly
REM allocated memory block after the
REM pointer to the next item
ENDPROC
Example 5 : Adding a node to the tail of a singly linked list
DEF PROCadd_as_tail_doub(RETURN tail_item%,s$)
LOCAL newItem%
newItem% = CLAIM(LEN(s$) + 8) : REM claim some memory for the node
newItem%!next% = NULL : REM make the first word of the new item
REM point to nothing as it is the end
tail_item%!next% = newItem% : REM make the 'next item' pointer of
REM the current tail of the list point
REM to the new item.
newItem%!prev% = tail_item% : REM as this new item will become the tail
REM of the list it needs to point back
REM to the current tail
tail_item% = newItem% : REM now that the tail points to the
REM the new item it is effectively
REM before it in the list and so the new
REM item is effectively the new
REM tail item.
$(newItem%+text%) = s$ : REM insert the string into the newly
REM allocated memory block after the
REM pointer to the next item
ENDPROC
Example 6 : Adding a node to the tail of a doubly linked list
Adding an item in the middle of the list
To add an item in the middle of the list we need a pointer to the item after which we want to insert the new item.
If the item after which we wish to insert is not the tail, or NULL, then we
must devise a method to insert in the middle of the list. For the purposes
of the steps outlined below let us assume 'before
' is a
pointer to the element that will be before the new one in the list and
that 'new
' will be a pointer to the new item (after we
allocate the memory). One possible method, for a singly linked list, is
detailed below (it might be useful to sketch the pointer alterations on a
bit of scrap paper using a style similar to figures 2 and 3) :
- Get a pointer, '
after
', to the item which will be after 'new
' in the list. This is equal to the 'next element' of 'before
'. - Make the 'next element' pointer for '
new
' equal to the 'next element' pointer for 'before
'. - Make the 'next element' pointer for '
before
' point to 'new
'.
DEF PROCadd_mid_sing(head_list$, before%, s$)
LOCAL new%, after%
new% = CLAIM (LEN(s$) + 4) : REM claim memory
after% = before%!next% : REM step 1
REM as above
new%!next% = after% : REM step 2
before%!next% = new% : REM step 3
$(new%+text%) = s$ : REM fill in the string
ENDPROC
Example 7 : Adding a node in the middle of a singly linked list.
For a doubly linked list this method will have the same effect:
- Get a pointer, '
after
', to the item which will be after 'new
' in the list. This is equal to the 'next element' of 'before
'. - Make the new item's 'next element' pointer point to '
after
'. - Make the new item's 'previous element' pointer point to
'
before
'. - Make the 'previous element' pointer of '
after
' point to 'new
'. - Make the 'next element' pointer of
'before
' point to 'new
'.
Fig. 3 : Stages involved in adding an element to the middle of a doubly linked list.
DEF PROCadd_mid_doub(before%, s$)
LOCAL new%, after%
new% = CLAIM(LEN(s$) + 8) : REM claim memory
after% = before%!next% : REM step 1 as described
REM above.
new%!next% = after% : REM step 2
new%!prev% = before% : REM step 3
after%!prev% = new% : REM step 4
before%!next% = new% : REM step 5
$(new%+text%) = s$ : REM fill in the string.
ENDPROC
Example 8 : Adding a node in the middle of a doubly linked list.
Removing an item from a linked list
When removing a node from a linked list you need first to un-link it from
the list so that all the pointers that keep in place it in the list skip
over it. Secondly you must free up all the memory for that node using
RELEASE()
.
Removing the head node of a linked list
Singly linked lists
In a singly linked list removing the head item is extremely simple. You
must ensure that the pointer you have to the head of the list is altered to
point at the item after the head. But if you move the pointer straight away
you won't be able to call RELEASE
to free the memory because
the only pointer you had to the node you wanted to delete has been changed.
So you must copy the pointer :
DEF PROCdelete_head_sing(RETURN head_item%)
LOCAL delete%
delete% = head_item% : REM copy the pointer to the
REM item being deleted
head_item% = head_item%!next% : REM move the head_item pointer
RELEASE(delete%) : REM release the memory using the
pointer to the old head
ENDPROC
Example 9 : Removing a node from the head of a singly linked list
Doubly Linked Lists
To remove the head element from a doubly linked list is only slightly more complex than for singly linked lists. The method is exactly the same, but with the added step of ensuring the the new head of the list doesn't point back to the now non-existent previous head :
DEF PROCdelete_head_doub(RETURN head_item%)
LOCAL delete%
delete% = head_item% : REM copy the pointer to the
REM item being deleted
head_item% = head_item%!next% : REM move the head_item pointer
head_item%!prev% = NULL : REM ensure that the new head
doesn't point back to the
old one.
RELEASE(delete%) : REM release the memory using the
pointer to the old head
ENDPROC
Example 10 : Removing a node from the head of a doubly linked list
Removing the tail element from a linked list
Doubly linked lists
This is the simplest style of list to remove the tail element from. The 'previous element' pointer of the tail element gives us the new tail element. All we have to do is set its 'next element' pointer to NULL (reflecting the fact that nothing is after it).
DEF PROCdelete_tail_doub(RETURN tail_item%)
LOCAL delete%
delete% = tail_item% : REM copy the pointer to the
REM item being deleted.
tail_item% = tail_item%!prev% : REM move the tail_item pointer.
tail_item%!next% = NULL : REM ensure that the new tail
REM doesn't point forwards to
REM the old one.
RELEASE(delete%) : REM release the memory using the
REM pointer to the old tail.
ENDPROC
Example 11 : Removing a node from the tail of a doubly linked list
Singly linked lists
Because it is not possible to traverse a singly linked list backwards, we must locate the new tail element by traversing it from the front until we find a node which points to the current tail element. After this is done we proceed in a similar manner as for a doubly linked list.
DEF PROCdelete_tail_sing(head_item%, RETURN tail_item%)
LOCAL new_tail%
new_tail% = head_item% : REM set the new_tail to be the
REM current head, this routine
REM will fail if it = NULL
WHILE new_tail%!next% <> tail_item%
new_tail% = new_tail%!next% : REM move the new_tail pointer
REM through the list until its
REM next item pointer is equal
REM to the current tail item
REM don't fall off end of list
IF new_tail% = NULL THEN ENDPROC
ENDWHILE
new_tail%!next% = NULL : REM set the new_tail's next item
REM pointer to NULL (it is now
REM effectively the end of the list)
RELEASE(tail_item%) : REM we still have the original
REM pointer to the tail item so
REM we can free the memory
tail_item% = new_tail% : REM now the new_tail can become the
REM actual tail.
ENDPROC
Example 12 : Removing a node from the tail of a singly linked list
Removing an element from the middle of a list
Doubly linked lists
When there are elements on either side of the one you want to delete you need a pointer to the element before the one being deleted and for a doubly linked list you also need a pointer to the element after. With a doubly linked list this is simple as the node you want to delete will have both these pointers stored in it. All we have to do is ensure that the items on either side have their pointers altered so that the item to be deleted is excluded from the list:
Fig. 4 : Stages involved in removing an element from the middle of a doubly linked list.
DEF PROCdelete_mid_doub(delete%)
LOCAL before%, after%
before% = delete%!prev% : REM get the pointer to the element
REM before the one being deleted
after% = delete%!next% : REM and the one after
before%!next% = after% : REM then we have to link 'before'
REM and 'after' to each other and
after%!prev% = before% : REM bypass 'delete'.
RELEASE(delete%) : REM now it is safe to delete the
REM element in question.
ENDPROC
Example 13 : Removing an element from the middle of a doubly linked list
Singly linked lists
With the singly linked list we have the same complications we did previously for deleting the tail. Without searching through the list, we have no way of getting the item before the one we are deleting. We will have to search for it. Once it is found we can perform similar operations to those required for a doubly linked list.
DEF PROCdelete_mid_sing(head_item%, delete%)
LOCAL before%, after%
before% = head_item% : REM start the search for 'before'
REM at the start of the list
WHILE before%!next% <> delete%
before% = before%!next% : REM stop searching when the next
REM element of before points to
REM the element we want to delete
REM don't fall off end of list,
REM this isn't /really/ necessary
IF before% = NULL THEN ENDPROC
ENDWHILE
after% = delete%!next% : REM get 'after' from the next item
REM pointer of 'delete'
before%!next% = after% : REM then we have to link 'before'
REM to 'after' thus bypassing
: REM 'delete'.
RELEASE(delete%) : REM now it is safe to delete the
REM element in question.
ENDPROC
Example 14 : Removing an element from the middle of a singly linked list
Beware of NULL pointers
If a pointer is passed to a program in a language, it must assume that the pointer is valid as there is no real method of checking. Even advanced, modern languages like Java will throw exceptions when a bad pointer (Java calls them references) is used. A classic example of an invalid pointer is NULL. NULL evaluates to either -1 or 0 in most languages. The memory location 0 or -1 is either at the very start of the address space (usually populated by the program itself) or at the very end which is usually never used as it corresponds to +16Mb from the start of the program. If you try to access these points in memory you will cause your program to crash.
When writing software that uses lots of pointers, it is essential to
consider all the possible states that the system will be in when your
operations are called. If you are changing the value of a pointer you must
make sure that you have another copy of that pointer or that the memory it
points to has been released. Once the last pointer to a block of memory is lost the memory itself is void as we now have no way of knowing where it
is. In the examples above it is assumed that a list already exists (this
is to simplify matters). What would happen if the user tried to add a node
to a non-existent list? One can assume that the values of
head_item%
and tail_item%
would be NULL (and
would certainly be invalid). This would cause havoc.
Another point worth mentioning is that if we try to delete the last node in the list we will also encounter problems. At this point it will be both the head and the tail of list. Which method do we use? Well, we cannot use either. This is because we won't have another element in the list to make the new tail or head item.
Generic add and delete operations
Because of the problems outlined above I have defined generic operations
that catch the problem cases and deal with them. I have assumed that, as
well as defining variables for the offsets of the members in the structures,
the pointers head_item%
and tail_item%
have been
initialised to NULL at start-up.
Add
We need to catch the case when there is no list yet and initialise it. This involves creating the first node and making the head_item% and tail_item% pointer point to it. As this node doesn't point to anything else yet the 'nest item' and 'previous item' pointer are NULL.
DEF PROCAddStringSingle(RETURN head_item%, RETURN tail_item%, link_after%, s$)
LOCAL new_item%
REM catch the non-existent list case
IF head_item% = NULL OR tail_item% = NULL THEN
new_item% = CLAIM(LEN(s$) + 4)
new_item%!next% = NULL
$(new_item%+text%) = s$
head_item% = new_item%
tail_item% = new_item%
ENDPROC
ENDIF
REM if we are here then there must already be a list in place
CASE after% OF
WHEN NULL : REM we don't want the new item to be after anything
PROCadd_head_sing(head_item%, s$)
WHEN tail_item% : REM we want it at the end - NOT in the middle!
PROCadd_tail_sing(tail_item%, s$)
OTHERWISE : REM we want the new item in a place where other items
REM are on either side
PROCadd_mid_sing(head_item%, link_after%, s$)
ENDCASE
ENDPROC
DEF PROCAddStringDouble(RETURN head_item%, RETURN tail_item%, link_after%, s$)
LOCAL new_item%
REM catch the non-existent list case
IF head_item% = NULL OR tail_item% = NULL THEN
new_item% = CLAIM(LEN(s$) + 8)
new_item%!next% = NULL
new_item%!prev% = NULL
$(new_item%+text%) = s$
head_item% = new_item%
tail_item% = new_item%
ENDPROC
ENDIF
REM if we are here then there must already be a list in place
CASE after% OF
WHEN NULL : REM we don't want the new item to be after anything
PROCadd_head_doub(head_item%, s$)
WHEN tail_item% : REM we want it at the end - NOT in the middle!
PROCadd_tail_doub(tail_item%, s$)
OTHERWISE : REM we want the new item in a place where other items
REM are on either side
PROCadd_mid_doub(head_item%, link_after%, s$)
ENDCASE
ENDPROC
Delete
DEF PROCDeleteStringSingle(RETURN head_item%, RETURN tail_item%, delete%)
REM catch the case when we are passed the NULL list
IF head_item% = NULL OR tail_item% = NULL THEN
ENDPROC
ENDIF
REM catch the case when there is only one item left
IF head_item% = tail_item%
head_item% = NULL
tail_item% = NULL
RELEASE (delete%)
ENDPROC
ENDIF
REM if we are here then there must already be a list in place
CASE delete% OF
WHEN NULL : REM can't delete nothing
ENDPROC
WHEN head_item% : REM delete the head item
PROCdel_head_sing(head_item%)
WHEN tail_item% : REM delete the tail item
PROCdel_tail_sing(tail_item%)
OTHERWISE : REM delete an item with an item on either side of it
PROCdel_mid_sing(head_item%, delete%)
ENDCASE
ENDPROC
DEF PROCDeleteStringDouble(RETURN head_item%, RETURN tail_item%, delete%)
REM catch the case when we are passed the NULL list
IF head_item% = NULL OR tail_item% = NULL THEN
ENDPROC
ENDIF
REM catch the case when there is only one item left
IF head_item% = tail_item%
head_item% = NULL
tail_item% = NULL
RELEASE (delete%)
ENDPROC
ENDIF
REM if we are here then there must already be a list in place
CASE delete% OF
WHEN NULL : REM can't delete nothing
ENDPROC
WHEN head_item% : REM delete the head item
PROCdel_head_doub(head_item%)
WHEN tail_item% : REM delete the tail item
PROCdel_tail_doub(tail_item%)
OTHERWISE : REM delete an item with an item on either side of it
PROCdel_mid_doub(delete%)
ENDCASE
ENDPROC
Conclusion
I hope that I have given an insight into how useful the linked list can be. With a pen and paper and a little thought, linked lists are not at all difficult to program. Many algorithms require the swapping of items and the re-ordering of data. With a linked list this can be done simply by changing a few pointers. One of the very interesting points only briefly discussed here is concept of the compound (or composite) data type.