Numbering logarithmically

We generally number/index over a list of items sequentially. That is we start from the first item at position 1, then move to the second item at position 2, and so on until we reach the last item. This may be so to perform some processing on every item sequentially.

Let n denote the total number of items. Let p denote a position on the list of items. For a given n, 1<=p<=n. If for example there are n=7 items, we process them sequentially from the first item at position p=1 all the way to that at p=7. We can describe this sequential behaviour with the sequence of positions {1, 2, 3, 4, 5, 6, 7}. So the list is sequentially numbered.

This article presents an algorithm to number or index over a list of items logarithmically (though this may not be a good description!) For example, instead of the behaviour described by the sequence {1, 2, 3, 4, 5, 6, 7}, we may want to go through the items with a behaviour described by the sequence {4, 2, 6, 1, 3, 5, 7}, where we first process the item at position p=4, then that at p=2, then that at p=6, and so on up to that at p=7. So the list is logarithmically numbered.

The reader is assumed to have good knowledge of the binary search tree and the binary search algorithm.

Contents:

  1. Tree level-order traversal
  2. Obtaining p given I in O(log I)
  3. Links

Tree level-order traversal

Let I denote a position on the 'sequence of p values'. Then for n=7 we have:

 position I  p sequentially  p logarithmically 
114
222
336
441
553
665
777

Now, consider the balanced BST (Binary Search Tree) below that describes this 'sequence of p values' for all n=7 positions:
Numbering logarithmically a sequence of size 7.
An in-order traversal of the tree generates all the p values in sequential order: the sequence {1, 2, 3, 4, 5, 6, 7}.
A level-order traversal of the tree generates all the p values in logarithmical order: the sequence {4, 2, 6, 1, 3, 5, 7}.

The C code below uses a queue data structure to implement an algorithm to perform a level-order traversal on a balanced BST. However there is no tree data structure implemented. This is because the nodes are not stored. They are directly computed using the idea of the binary search algorithm. That is the value of a node is taken as the middle item of a sub-sequence of the original sequence. For example for the root node where I=1, the corresponding sub-sequence is the original sequence with start = 1 and stop = 7. So its node value = middle = (1+7)/2 = 4.

The basic idea is that a current node generates its left and right child nodes if they exist. A child node does not exist if its corresponding sub-sequence cannot be created, because the current node has only one element - this is the base case of the binary search algorithm. Each time a node is generated it is pushed to the queue. So 4 generates 2 with sub-sequence {1,2,3}, and 6 with sub-sequence {5,6,7}. 2 generates 1 and 3, 6 generates 5 and 7. The queue content becomes {4, 2, 6, 1, 3, 5, 7}.

#include <stdio.h>

struct node
{
    int start;  // first value of corresponding sub-sequence
    int stop;   // last value of corresponding sub-sequence
    int middle; // node value = middle of corresponding sub-sequence
};
struct node queue[10000];

int main()
{
    int i, n, mid;
    int head, tail;             // head of queue and tail of queue

    printf("Enter n : ");       // get n
    scanf("%d", &n);

    queue[0].start = 1;         // set the initial situation, that is,
    queue[0].stop = n;          // set the root/original sequence.

    tail=1;                     // 'tail' always points to an empty slot

    for(head=0; head!=tail; head++)         // go through the queue content
    {
        mid = (queue[head].start + queue[head].stop)/2;
        queue[head].middle = mid;               // get node value = middle

        if(mid != queue[head].start)            // if a left child exists
        {
            queue[tail].start = queue[head].start;
            queue[tail].stop = mid-1;           // add the left sub-sequence
            tail++;
        }
        if(mid != queue[head].stop)             // if a right child exists
        {
            queue[tail].start = mid+1;
            queue[tail].stop = queue[head].stop; // add the right sub-sequence
            tail++;
        }
    }

    printf(" I    p \r\n");                     // print result
    for(i=1; i<=n; i++)
        printf(" %d    %d \r\n", i, queue[i-1].middle);

    return 0;
}


Obtaining p given I in O(log I)

Instead of progressively generating all the values of p like in the case above, one may want to directly obtain a specific value of p for an arbitrary given value of I. The following algorithm does this in O(log I) running time (ignoring O(log n) storage size).

First, it should be noted that the output here is different from that of the one mentioned above, except for when the total number of items is n=2k-1. This difference is due to difference in how the balanced BST is. While the previous method uses the binary search algorithm to simulate a BST, this method uses a complete binary tree. This is a binary tree for which all levels are full except possibly at the leaves which however are gathered on the left. Because of this property it is possible to, at the root node of the tree, know the number of leaves found in both the left and right sub-trees. This forms the principle behind the algorithm. The following questions are asked:
  1. What is the number of leaves of the tree held by the root node?
  2. What are the number of leaves in the left and right sub-trees of the root node?
  3. What is the corresponding p value of the root node? Of course its I value is 1.
  4. Is the targeted I value found in the left or right sub-tree of the root node?
By answering these questions we move to either the left or right child node. This takes constant time. The process is then repeated recursively. After moving log(I) times down the tree the targeted I value is reached. The C code below uses a stack data structure to implement the algorithm.

#include <stdio.h>

int get_p_from_I (int I, int n)
{
    int stack[100][5];      // stack[node][ I, height, leaves, lower_limit, upper_limit ]

    int p;                  // = p value of current node
    int h;                  // = height of current node
    int c;                  // = location in stack of current node

    int left;               // = number of leaves on left subtree
    int right;              // = number of leaves on right subtree
    int max;                // = maximum leaves on left subtree


    if(n<1 || I<1 || I>n) return -1;

    for(h=0, c=n; c!=1; c/=2, h++); // get h = floor(log2(n))

    left = n+1 - (1<<h);            // get number of leaves = n+1 - 2^h
    if(left == (1<<h)) left=0;      // if full leaves, then same as no leaves

    for(c=0; I!=1; I/=2, c++)       // I/2 is the parent node to I.
        stack[c][0] = I;            // store 'I' while ascending the tree

    // Now 'c' is at the end of stack = floor(log2(I)).
    // Set the initial situation at that location:

    stack[c][0] = 1;                // I=1 for root node
    stack[c][1] = h;                // height of tree of root node
    stack[c][2] = left;             // leaves of tree of root node
    stack[c][3] = 1;                // lower limit to root node
    stack[c][4] = n;                // upper limit to root node


    while(1)    // while descending into left or right child node of current node
    {
        h = stack[c][1]-1;
        left = stack[c][2];         // get leaves of current tree

        if(left==0)                 // if it has no leaves (same as has full leaves),
        {
            right = 0;              // then right subtree also has no leaves

            // p = (lower_limit + upper_limit) / 2
            p = (stack[c][3] + stack[c][4]) / 2;
        }
        else                        // else current tree has leaves
        {
            max = (1<<h);           // = maximum leaves on left subtree = 2^h

            if(left>=max)           // if leaves on left subtree are full
            {
                right = left-max;   // then right subtree has the extra leaves
                left = 0;
                p = stack[c][3] + (max*2)-1; // p = lower_limit + 2^(h+1)-1
            }
            else                    // else right subtree has no leaves
            {
                right = 0;
                p = stack[c][3] + max-1+left; // p = lower_limit + 2^h-1+left
            }
        }
        if(c==0) break;             // c==0 => start of stack
        c--;                        // now move to child node

        if(stack[c][0]%2)       // if 'I' of child is odd, then child is on the right
        {
            stack[c][1] = h;
            stack[c][2] = right;
            stack[c][3] = p+1;              // new lower limit
            stack[c][4] = stack[c+1][4];    // same upper limit as parent node
        }
        else                    // else 'I' of child is even, so child is on the left
        {
            stack[c][1] = h;
            stack[c][2] = left;
            stack[c][3] = stack[c+1][3];    // same lower limit as parent node
            stack[c][4] = p-1;              // new upper limit
        }
    }
    return p;
}

int main()
{
    int i, n;

    printf("Enter n : ");                   // get n
    scanf("%d", &n);

    printf(" I    p \r\n");                 // print result
    for(i=1; i<=n; i++)
        printf(" %d    %d\n", i, get_p_from_I(i, n));

    return 0;
}


Links

  1. http://courses.cs.vt.edu/~cs3114/Fall09/wmcquain/Notes/T03a.BinaryTreeTheorems.pdf