All entries for January 2019
January 30, 2019
Data structures – Queues
This week we're back on data structures with another fundamental one: the queue. Simple data queues are pretty much the same as queues in real life, new items arrive at the back of the queue and data is removed from the front of the queue. They have obvious applications in any kind of program that gets data in from an external source and then has to process it in the order in which it is received. Queues are what are called FIFO data structures, first-in-first-out, because the first data to arrive is also the first data to leave. The alternative LIFO (last-in-first-out) data structure is generally called a "stack" because it is much more like stacking up paper in that the last item that you put down is the first that you pick up again.
The easiest way to implement a queue is using an array combined with a front and a back marker (I'm going to stick with the neutral "marker" description because the general idea doesn't care if you use array indices or pointers or any other mechanism that you want to). The idea is quite simple. You start with an array and when a data item turns up you insert it at the back marker and then move the back marker on one element. Adding an element to a queue is often called "pushing" or "enqueueing" so you might encounter those terms. Schematically, this looks like
When a piece of data is requested from the queue you return the item under the front marker and then move the front marker on by one. This operation is also known by the terms "pop" and "dequeue" so those are also worth remembering.
And you continue doing this, moving the front and back markers as data arrives and leaves.
So far, so good but there are two obvious problems. The simpler one is what happens if data is requested when no data is in the queue. You have to have some mechanism for dealing with this case by indicating that there is no data available but that isn't too hard. The more troublesome one is what happens when the back marker goes off the end of your array? In the case as show, you can't really do anything at all and one way or another the queue will fail. This type of queue implementation is only useful if you have a fixed known number of items that you need to store as they come in and then deal with. It's not much use for getting data until your program stops and dealing with it.
If you want to do that, there are a few general solutions
1) You can get rid of the front marker entirely. Every time you dequeue an item you take the first item in the array and then simply shuffle the other items down one so that the first item is always populated so long as there are any items in the queue. This works well but it does potentially involve a lot of copying or moving of data if you have a large array that is mostly filled since every element of the array has to be shuffled up by one. If you are dealing with objects where a move/copy operation is expensive or are dealing with threaded code where some threads have to wait while this happens then it isn't the best solution
2) You can move to a circular queue. Circular queues are a bit strange and break the analogy with real world queues since queueing in a circle doesn't really work in reality. The idea is pretty simple though. Since you knowthat everything to the left of the front marker is unused, why not put new elements into there as they arrive and move the back marker around with them? Then when the front marker reaches the end of the array it just goes back to the beginning too. The effect is to mean that the front marker chases the back marker round in a circle. Shown schematically, this looks like
This works so long as back never catches up with front. If it does then either it has to stop adding data or it will overwrite data that is already there. It is possible to generalise these two examples to an arbitrarily long array - simply create a new array that is longer and copy the extant data into it and start again - but in most of the situations where you want a queue it should be possible to avoid this since that is another expensive operation (that will also render any pointers that you have to your data invalid so might involve more work to sort out pointers).
Generally queues are used in producer/consumer systems. There is something that is producing data and something else that is operating on it in some way. On average you have to be operating on the data at the same rate as you are producing it or the data will build up without limit, so the queue is only there to buffer data for brief periods while either your producer produces data faster than normal or your consumer consumes it slower than normal. In this case, all that you need is a large enough queue to deal with the largest expected difference in production and consumption rates. Of course, in a lot of systems you can only predict that "this queue is large enough for 99% of the expected variation" so for the other 1% of the time where you run out of space in your queue you have to decide whether it's worse to stall your entire system while you make your queue bigger (this might be OK for a web server for example where a page might take slightly longer to load but otherwise would work as expected) or simply throw away data (for example in a data logging system where any halt in the process would cause data loss but it might be lower if you just throw it away until there's space to store it than waiting while memory is allocated to hold it). Unfortunately, which is the right answer depends very much on the problem that you are working with.
As a final note, there is also a very similar data structure called the double ended queue or deque (pronounced "deck") that is similar but allows you to add and remove data from either end of a queue. They behave very similarly in general but the implementation gets a bit fiddlier because your back and front markers have to do double duty as each other.
January 16, 2019
A super quick post, mainly to point at somebody else's post from a while ago, namely http://kera.name/articles/2013/01/the-dynamic-allocation-of-sandcastles/
In case the link disappears, which it looks like the original post already has, the idea is as follows. This is something between a paraphrase and a direct quote, and all credit goes to the Original Poster quoted in the link above, but not named.
The memory space in the computer is like an area of beach, marked off for the building of sandcastles. To strain the analogy to breaking point, other parts of said beach might be marked off for other purposes (e.g. sunbathing, which has no useful relation to program space, and there is far less import to leaving your towel around).
- When you want to build a sandcastle (create any object or variable) you ask for an area of beach, and one is selected and given to you.
- What this area contains is a load of lumpy sand, usually containing the dregs of other peoples castles, possibly cutting across multiple. It's a mess, basically.
- You, usually, either flatten the sand out, or immediately build a castle there, in both cases ignoring what might be there already.
- When you're done, you surrender the area, but usually don't remove your castle.
- The organisers are then free to give that area to somebody else.
So, what does this mean? Well:
- If you go back to your area (keep around a pointer, reference etc to it), your castle may well still be there, in all its glory. But somebody else can come along at any time and kick it down, and beach security are going to support their right to do so. So, you can't trust the shape of any of it.
- You also can't TELL whether anything has happened to your castle by examining it.
- If you accidentally released the area (delete, dealloc etc) and you've forgotten that, you will, sooner or later, find a bit that's not how you left it, or get your castle kicked down at a crucial moment. Do NOT DO THIS.
- You can't even trust what seems to be a patch of flat sand, unless you flattened it, or requested it pre-flattened (e.g. calloc in c).
- If you inherit somebody else's area, there might be a castle there! Since you can never trust the castle, you can't use it. Somebody might have already been along and undermined all the turrets. If you try and use it as a castle, all kinds of strangeness can ensue.
- BUT suppose somebody has a secret technique for building castles. If they leave one un-destroyed, you might be able to find out all kinds of "secret" things. So, any secret data has to be carefully destroyed. It's not enough just to surrender control of the area. This is just like the need to carefully destroy data on an old hard drive - by actually over-writing it, not just deleting it.
- If you, or somebody else, spills out of their designated area, you generally end up mucking up bits of each-other's castles, usually in such a way as to ruin them. DO NOT DO THIS either.
It's a simple analogy, but it is perfect to explain why it's so easy to miss uninitialised variables, and premature free/delete/dealloc operations. Often things still look perfectly fine! Often all the areas close to yours are unallocated, so you can apparently spill out of your area without consequence, until management changes policy on picking areas and suddenly things go wrong.