SICP Exercise 2.60 allow duplicates

Exercise 2.60.  We specified that a set would be represented as a list with no duplicates. Now suppose we allow duplicates. For instance, the set {1,2,3} could be represented as the list (2 3 2 1 3 2 2). Design procedures element-of-set?, adjoin-set, union-set, and intersection-set that operate on this representation. How does the efficiency of each compare with the corresponding procedure for the non-duplicate representation? Are there applications for which you would use this representation in preference to the non-duplicate one?

SOLUTION

The code and tests are here.

Since we allow duplicates, the union operation is simple: just append one set to the other. This procedure will have the performance characteristics of the primitive 'append' which is Theta(n) where n is the number of elements in one of the sets (depending upon which set 'append' chooses to walk through to the end.)

No change in the implementation of element-of-set?.
When the first occurrence of x is found, we evaluate to true. Else, continue looking and evaluate to false when we reach the end of the set. I can't think of how to make it more efficient. Performance characteristic is Theta(n) where n is the number of elements in the list. Since duplicates are allowed, the list may be longer for the same set so the performance will be accordingly less than when duplicates are not allowed.

To construct the intersection set, we need to find elements that are present in both the sets. So the checks below are necessary. So we don't change this implementation from the previous exercise. Now since duplicate elements are allowed, the lists are likely to be longer, so this procedure will be less efficient simply because it handles larger lists for the same sets. Since intersection-set grows as Theta(n squared), this can become a problem

Comments:

In this implementation, 'adjoin-set' and 'union-set' are more efficient than their previous implementations. 'element-of-set?' and 'intersection-set' have not changed but are likely to degrade slightly because the input lists may be longer. So I would use this implementation of set operations when I know that the sets are unlikely to become too large.

Comments

Popular posts from this blog

SICP Exercise 2.56 differentiation rule

SICP Exercise 1.28 (Miller-Rabin Test)

SICP Exercise 4.18 a alternative strategy for interpreting internal definitions