Suppose that we have defined a predicate Order which introduces a relation of (complete) order on some set of Refal terms; thus <Order t.1 t.2> is T if the pair t.1 t.2 is in good order, and F otherwise. A relation of order is reflexive, antisymmetric and transitive; a complete (total) order requires, in addition, that any two terms be comparable.
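A minimal sketch of such a predicate for numbers, with Order understood as less-than-or-equal (it is not part of the algorithms below -- any Order with these properties will do -- and it assumes the built-in function Compare, which returns '-', '0' or '+' when comparing two numbers):

     * Order as numeric less-or-equal (a sketch)
     Order {
          s.1 s.2, <Compare (s.1) s.2>: '+' = F;
          s.1 s.2 = T;
     }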
Given such a relation, we may wish to order a given list of terms so that whenever t.1 precedes t.2 in the list, <Order t.1 t.2> is true. In programming, this ordering procedure is usually referred to as sorting. The terms to be ordered may be, e.g., parenthesized strings with a lexicographic relation Order, or just numbers with Order being the relation less than or equal.
The most straightforward way to order a sequence of terms is to transpose neighboring terms as long as they are not in order:
* Definition of sorting
Sort {
     e.1 t.X t.Y e.2, <Order t.X t.Y>: F
          = <Sort e.1 t.Y t.X e.2>;
     e.1 = e.1;
}
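For example, with Order as less-than-or-equal on numbers, sorting 3 1 2 takes two transposition steps (a hypothetical trace):

     <Sort 3 1 2>        the first inverted pair is 3 1
     <Sort 1 3 2>        the next inverted pair is 3 2
     <Sort 1 2 3>        no inverted pairs; the second sentence applies
     1 2 3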
This function is good as a definition of the problem, but taken as its solution it is grossly inefficient. Indeed, suppose we have made a step of the Refal machine that transposed t.X and t.Y. Now all pairs of neighbors in e.1 t.Y, with the possible exception of the last one (formed by the last term of e.1 and t.Y), must be in good order. Nevertheless, at the next step the Refal machine will check all these pairs again before it comes to the next inverted pair.
An improvement suggests itself: to enclose the ready part in parentheses and compare the last term of the ready (ordered) part with the first term from the still unordered part:
Sort {
      = ;
     t.X e.2 = <Sort1 (t.X) e.2>;
}
Sort1 {
     (e.1 t.X) t.Y e.2, <Order t.X t.Y>:
          { T = <Sort1 (e.1 t.X t.Y) e.2>;
            F = <Sort1 (e.1) t.Y t.X e.2>;
          };
     () t.Y e.2 = <Sort1 (t.Y) e.2>;   /* t.Y has bubbled to the front */
     (e.1) = e.1;
}
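A hypothetical trace for the input 2 3 1 (Order again being numeric less-than-or-equal) shows how the parenthesized ready part grows and shrinks:

     <Sort 2 3 1>
     <Sort1 (2) 3 1>        Order 2 3 is T
     <Sort1 (2 3) 1>        Order 3 1 is F: transpose
     <Sort1 (2) 1 3>        Order 2 1 is F: transpose
     <Sort1 () 1 2 3>       the ready part is empty
     <Sort1 (1) 2 3>
     <Sort1 (1 2) 3>
     <Sort1 (1 2 3)>
     1 2 3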
This algorithm is much better, but it can be improved further. After we transpose t.X and t.Y, the ready part e.1 shrinks by one term and will continue to do so as long as t.Y is out of order with the last term of e.1. When t.Y finds its place, however, the whole original e.1 will be ordered again; therefore, we can improve the algorithm by restoring the original position of the parentheses. Of course, this cannot be done unless we somehow fix their position in the beginning. Thus we come to the algorithm known as insertion sorting: maintain the initial part of the list in the right order, and insert each next term in its place by moving it from right to left:
* Insertion sort
Sort { e.1 = <Sort1 () e.1>; }
Sort1 {
     (e.1) t.Y e.2 = <Sort1 (<Insert e.1 t.Y>) e.2>;
     (e.1) = e.1;
}
* <Insert e.1 t.Y> assumes that e.1 is ordered
* and inserts t.Y into it
Insert {
     e.1 t.X t.Y, <Order t.X t.Y>:
          { T = e.1 t.X t.Y;
            F = <Insert e.1 t.Y> t.X;
          };
     e.1 = e.1;
}
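For example, inserting 2 into the ordered list 1 3 4 (a hypothetical trace, Order being numeric less-than-or-equal):

     <Insert 1 3 4 2>       Order 4 2 is F
     <Insert 1 3 2> 4       Order 3 2 is F
     <Insert 1 2> 3 4       Order 1 2 is T
     1 2 3 4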
With this algorithm the average number of steps required to sort n terms is roughly n^2/2. There are algorithms, such as the merge-sort, which do it in a number of steps proportional to n log n. The idea of the merge-sort is to divide and conquer. In order to sort a list, divide it into two approximately equal parts. Then sort the parts separately and merge them together into one sorted list. When we merge two ordered lists we need not go repeatedly through either of them. We simply compare the first elements of both lists and pick up the lesser one (more precisely, the one which is not greater than the other). Thus the total number of operations in a merge is proportional to n. To sort the halves of the original list we divide them into quarters, etc. Since the total number of terms at each level of division is always n, the time required for all mergings at one level is proportional to n. The number of levels of division necessary to come down to single-element lists is log n. Therefore, the total time is proportional to n log n.
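To see the difference in concrete numbers: for n = 1000, n^2/2 is about 500,000 steps, while n log n (with the logarithm to base 2) is only about 10,000.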
In an implementation of this idea it is better to go in the opposite direction: we first form two-term ordered lists, then merge them into four-term lists, etc.:
* Merge-sort
Sort { e.1 = <Check <Merge <Pairs e.1>>>; }
* Form ordered pairs from a list of terms
Pairs {
     t.1 t.2 e.3, <Order t.1 t.2>:
          { T = (t.1 t.2) <Pairs e.3>;
            F = (t.2 t.1) <Pairs e.3>;
          };
     t.1 = (t.1);   /* the odd term makes a separate list */
      = ;
}
* Merge neighboring lists pairwise
Merge {
     (e.1) (e.2) e.Rest = (<Merge2 (e.1) e.2>) <Merge e.Rest>;
     (e.1) = (e.1);
      = ;
}
* Merge two ordered lists
Merge2 {
     (t.1 e.X) t.2 e.Y, <Order t.1 t.2>:
          { T = t.1 <Merge2 (e.X) t.2 e.Y>;
            F = t.2 <Merge2 (t.1 e.X) e.Y>;
          };
     (e.1) e.2 = e.1 e.2;   /* one of e.1, e.2 is empty */
}
* Check whether there is one list or more
Check {
     (e.1) = e.1;
      = ;   /* empty input: nothing to sort */
     e.1 = <Check <Merge e.1>>;
}
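A hypothetical run on 3 1 4 1 5 (Order being numeric less-than-or-equal) shows one level of merging per call of Merge:

     <Sort 3 1 4 1 5>
     <Check <Merge (1 3)(1 4)(5)>>     ordered pairs formed
     <Check (1 1 3 4)(5)>              first level of merging
     <Check <Merge (1 1 3 4)(5)>>      more than one list is left
     <Check (1 1 3 4 5)>
     1 1 3 4 5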
There are still other algorithms for sorting. Let us consider the quick-sort. We take the first term of the list, t.1, and try to put it in its final place. For this purpose we partition the remaining list into those terms which must precede t.1 -- call this part e.Left -- and those which must follow it -- e.Right. Then the place of t.1 is between these two lists. Now apply this procedure recursively to the two sublists and concatenate:
<Sort e.Left> t.1 <Sort e.Right>

Here is the program:
* Quick-sort
Sort {
      = ;
     t.1 e.2, <Partit () t.1 () e.2>: (e.L) t.1 (e.R)
          = <Sort e.L> t.1 <Sort e.R>;
}
* Partition list e.List by term t.M:
* <Partit (e.Left) t.M (e.Right) e.List>
*    == (e.Left1) t.M (e.Right1)
Partit {
     (e.L) t.M (e.R) = (e.L) t.M (e.R);
     (e.L) t.M (e.R) t.X e.2, <Order t.X t.M>:
          { T = <Partit (e.L t.X) t.M (e.R) e.2>;
            F = <Partit (e.L) t.M (e.R t.X) e.2>;
          };
}
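A hypothetical run on 3 1 4 (Order being numeric less-than-or-equal):

     <Sort 3 1 4>
     <Partit () 3 () 1 4>     evaluates to (1) 3 (4)
     <Sort 1> 3 <Sort 4>
     1 3 4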
Partitioning takes an amount of time that is proportional to the length of the list, so the sum of the times for all partitionings at one recursive level is roughly proportional to n. Since e.Left and e.Right are, on the average, equal in length, it takes roughly log n levels of recursion. Therefore, the full time is, on the average, proportional to n log n. (In the worst case, e.g., when the list is already ordered, every partition is maximally uneven, the recursion goes n levels deep, and the time grows to about n^2/2.)