Sorting on two dimensional processor arrays

Algorithm LS3 sort

A very simple algorithm for sorting on a two-dimensional mesh is LS3 sort by Lang, Schimmler, Schmeck and Schröder [LSSS 85].

Algorithm

Algorithm LS3 sort is based on a merge algorithm that merges four sorted k/2×k/2-arrays to one sorted k×k-array. Sorting direction is the snake. The merge algorithm makes use of the basic operations shuffle and oets.

shuffle

A deck of cards can be mixed by exactly interleaving its two halves. The permutation obtained by this mix is called perfect shuffle or shuffle for short. It is shown here for n = 8 elements:

0 1 2 3 4 5 6 7
0 4 1 5 2 6 3 7

On a processor array the shuffle permutation can be realized by a "triangle" of exchanges between neighbouring elements:

0 1 2 3 4 5 6 7
 ⟷ 
 ⟷  ⟷ 
 ⟷  ⟷  ⟷ 
0 4 1 5 2 6 3 7
oets

The operation oets stands for one step of odd-even transposition sort. In an even step all elements at even positions of the sorting order are compared with their right neighbour elements and exchanged if necessary. In an odd step the same is done with all elements at odd positions. Odd and even steps are applied alternately.

 

 

Procedure ls3-merge(k)

Input:

k×k-array whose k/2×k/2-subarrays are sorted in snake-like direction

Output:

the k×k-array sorted in snake-like direction

Method:

  1. shuffle along the rows;
  2. sort each double column in snake-like direction;
  3. apply 2k oets-steps to the snake;

 

The correctness of procedure ls3-merge is shown using the 0-1-principle.

The initial situation is an array filled with zeroes and ones whose four subarrays are sorted in snake-like direction. This situation is shown in Figure 1a (zeroes are drawn white, ones gray). Each subarray contains a certain number of full rows (a, b, c and d, respectively) and possibly one incomplete row.

After the shuffle operation in Step 1 the situation of Figure 1b is obtained. In every double column there are a + b + c + d ones that stem from the full rows plus at most four more ones from the incomplete rows.

Case 1:  a + b + c + d is even

Sorting the double columns in Step 2 yields a situation as shown in Figure 1c. The ones from the full rows form (a+b+c+d)/2 full rows. Additionally, in each double column there are at most 4 ones from the incomplete rows. These ones form an unsorted zone consisting of at most two rows.

 

Initial situation Situation after the shuffle operation Situation after sorting the double columns 

Figure 1: Proof of correctness of procedure ls3-merge

 

Case 2:  a + b + c + d is odd

If a + b + c + d is odd then the ones from the full rows form a step in each double column (Figure 2). Now if in one double column there were 0 and in another double column there were 4 additional ones from the incomplete rows, then an unsorted zone of three rows would arise.

 

Figure 2: Situation if a+b+c+d is odd 

Figure 2: Situation if a+b+c+d is odd

 

However, this is only possible if all incomplete rows start from the same side (e.g. all from left or all from right). Since the sorting direction is the snake this implies that the numbers a, b, c, d are all even or all odd. But then their sum cannot be odd.

If a + b + c + d is odd then in every double column there are at least one or at most three additional ones. Therefore, also in this case the unsorted zone consists of at most two rows.

Eventually, in Step 3 of the procedure the unsorted zone is sorted. Since it consists of at most 2k elements, 2k oets-steps are sufficient.

By recursive application of procedure ls3-merge an unsorted array is sorted:

 

Algorithm ls3-sort(n)

Input:

unsorted n×n-array

Output:

the sorted n×n-array

Method:

  1. if n > 1 then
    1. apply ls3-sort(n/2) recursively to the four n/2×n/2-subarrays;
    2. ls3-merge(n);

 

Analysis

An analysis of procedure ls3-merge(k) yields the following number of operations:

Step 1: k/2 
Step 2: 2k 
Step 3: 2k 
altogether: 4.5k 

Then algorithm LS3 sort requires T(n)  =  4.5(n + n/2 + n/4 + ... + 2) operations; therefore it holds that

T(n)  ≤ 9n

In [LSSS 85] it is shown that the time complexity of LS3 sort can be reduced to 7n by sorting the double columns in k rather than in 2k steps.

Simulation

The following applet shows execution of LS3 sort with a 32×32-array of zeroes and ones. For better illustration some operations are shown sequentially that would be performed in parallel on a two-dimensional processor array.

 

(Java applet for simulation of LS3 sort)

Conclusion

Algorithm LS3 sort for sorting an n×n-array has the same asymptotic time complexity as the algorithm of Thompson/Kung [TK 77], namely O(n). Regarding its constant it is slower by a factor of about 2. The quality of the algorithm lies in its simplicity together with asymptotic optimality.

References

[LSSS 85]   H.W. Lang, M. Schimmler, H. Schmeck, H. Schröder: Systolic Sorting on a Mesh-Connected Network. IEEE Transactions on Computers C-34, 7, 652-658 (1985)

[TK 77]   C.D. Thompson, H.T. Kung: Sorting on a Mesh-Connected Parallel Computer. Communications of the ACM, 20, 4, 263-271 (1977)

 

Next:   [4-way mergesort]   or   [up]

 


H.W. Lang   mail@hwlang.de   Impressum   Datenschutz
Created: 20.08.2001   Updated: 07.02.2023