A Set Class

dhilipkumar · Mar 27, 2009, 04:16 PM

A Set Class

Being a big fan of challenge sites, it happens quite often to me to have to use math classes. One problem with C# is that it doesn't implement a set class so, most of the times, I had to switch to java to solve a particular challenge. Java, in fact, exposes several useful set classes, e.g. HashSet, TreeSet, natively. The problem with java is that, although it has wonderful classes for some purposes, it lacks many things for some others. In some cases, to extend a class in order to modify the original behaviour can be a time consuming task and being always in a short time condition, my choice goes to C# whenever it is possible.

The reason I wrote a C# set class is simple. In java when you manage elements in your set, you must ensure that the objects you're adding to the set belong to a class that implements the hashCode() method. This happens because when an object is stored (searched, deleted, ...) the set class needs to have a very quick way to get to it. This quick way is provided by the object.hashCode() method, that returns an int, which is used as a key to store every element. And here comes the problem: what if I needed to store an object with a different key type? Maybe I'd just need shorts, or maybe I'd need longs, or maybe something completely different. But hashCode() returns an int. This problem happend to me several times, when dealing with sets used to store long values. Since you have to store longs, you must do something in order for each long value to provide an unique key. My first attempt was to split the long in 2 ints (first 32 bits and last 32 bits), then combine them someway, e.g. using xor. Depending on the long values you add to your set, this approach can lead to key duplication for different values, so I had some hard times and eventually came up with a set class made by me, purposely designed to avoid these troubles.
The aid of C# generics namespace was invaluable here.

So, first of all, let's remind you what a set class is supposed to do.
Generally speaking, a set is an unordered collection of objects of one type. Being unordered, you don't expect to have access to one element using myset[index] syntax for example. Generally you scan the set with iterators. Also, no element duplication is allowed in a set collection.

The most common operations you'd like to be able to do on a set are:
# Addition/Deletion of an element
# Checking if an element is already present in the set collection
# Intersection between two sets
# Union between two sets (also called set addition)
# Subtraction of a set from another

The intersection operation between A and B (A*B) produces a set C which contains only those elements that are both in A and B. If A and B are disjoint set (with no elements in common) then C is the empty set.

The union operation between A and B (A+B) produces a set C which contains all the elements of A and B, with no repetition of course.

The subtraction of set B from set A (A-B) produces a set C defined as A-(A*B), which means that every element in common between A and B will be removed from A. Those elements in B which were not present in A won't be considered.

I wanted to do these operations as fast as possible, so I looked carefully to C# collection classes, and I found one which turned out to be quite good.

This class is the generic KeyedCollection<K, T> class, which provides object scanning in O(1): marvellous.
The <K, T> notation means, we're adding instances of the T (generic) class, which produce instances of the K (generic) class as keys. For example T could be a Point class and K could be long.

Now, I must be sure that when I declare a Set class, the T class that I have chosen for the elements is appropriate for working with my set class. Same goes for the K class.
Also, since I want the key not to be restricted to the int class, I need the T class the expose a hash method, to be used to retrieve the key.

To accomplish these requirements I wrote an interface. The reason to make an interface (rather than a class) is very simple: C# allows multiple interface inheritance, but only single class inheritance, so if my T class is already derived from some other class (and possibly multiple interfaces), I can always add the interface inheritance without worring that this could cause problems.

Let's see the interface declaration first:

/*
* T must implement the IHashCode interface
* in order to provide the HashCode method
*/
public interface IHashCode<K> {
K HashCode();
}

As commented out in the code section, T must implement the IHashCode<K> interface, so I can be sure it implements the HashCode() method I need for T to work properly with my set.
The HashCode() method, which will be used in substitution of the object.GetHashCode() one, will return the key (which is of K type).
What if you already have a class that already implements the GetHashCode() method and that'd be fine for you to use? Well, in that case, declare K as int and within the HashCode() method, just return the GetHashCode() value for that object.

Now, how do we tell the Set class that it must derive from KeyedCollection<K, T> and also that T must implement the IHashCode<K> interface? Using constraints.

The declaration code is this:

public class Set<K, T> : KeyedCollection<K, T> where T : IHashCode<K> {...}

As you can see, the Set class must be declared with the T and K classes specifications. The first colon states that the Set class is derived from KeyedCollection, and the where clause states that T must implement the IHashCode<K> interface. So we are sure that things will work fine.

by sfabriz
OSIX

dhilipkumar · Mar 27, 2009, 04:17 PM

Let's see the complete Set class:

public class Set<K, T> : KeyedCollection<K, T> where T : IHashCode<K> {

#region Constructors
/// <summary>
/// Default constructor
/// </summary>
public Set() : this(null) {

}

/// <summary>
/// Additional constructor with a Set for initialize the set
/// </summary>
/// <param name="set">The initialization set</param>
public Set(Set<K, T> set) :base() {
Type t=typeof(K);
if (t!=typeof(Int64) && t!=typeof(Int32) && t!=typeof(Int16)) {
throw new ApplicationException("Only Int16/32/64 allowed for generic parameter K");
}
if (set!=null) {
foreach (T item in set) {
this.Add(item);
}
}
}
#endregion

#region KeyedCollection implementation method
/// <summary>
/// Method to get the T class objects' key
/// </summary>
/// <param name="item">The T object</param>
/// <returns>Returns tipically int or long, depending on K</returns>
protected override K GetKeyForItem(T item) {
return item.HashCode();
}
#endregion

#region Set Methods
/// <summary>
/// Addition method. This method adds to the local collection the element
/// provided if it is not already contained in the collection
/// </summary>
/// <param name="item">The element to add</param>
/// <returns>True if the addition succeds, false otherwise</returns>
public new bool Add(T item) {
if (item!=null && !base.Contains(item.HashCode())) {
base.Add(item);
return true;
} else {
return false;
}
}

/// <summary>
/// Union operation. A=A+B
/// </summary>
/// <param name="unionSet">The set to unite to the local one</param>
public void Union(Set<K, T> unionSet) {
foreach (T item in unionSet) {
this.Add(item);
}
}

/// <summary>
/// Intersection operation. A=A*B
/// </summary>
/// <param name="interSet">The set to intersect to the local one</param>
public void Intersection(Set<K, T> interSet) {
for (int i=base.Count-1;i>=0;i--) {
if (!interSet.Contains(base.HashCode())) {
base.RemoveAt(i);
}
}
}

/// <summary>
/// Difference operation. A=A-B
/// </summary>
/// <param name="diffSet"></param>
public void Difference(Set<K, T> diffSet) {
foreach (T item in diffSet) {
base.Remove(item.HashCode());
}
}
#endregion

#region Helper methods
/// <summary>
/// Helper method to print the set's elements
/// </summary>
public void PrintElements() {
if (base.Count==0) {
Console.WriteLine("Set is empty");
return;
}
StringBuilder sb=new StringBuilder("[");
foreach (T item in base.Items) {
sb.Append(item.ToString());
sb.Append(",");
}
sb.Remove(sb.Length-1, 1);
sb.Append("]");
Console.WriteLine(sb.ToString());
}

#endregion

}

Ok, something to explain here. First of all there are two constructors. The first one is the default one, that produces a brand new Set object. In the other one we pass a Set instance as a parameter, and the set created will be filled with its items. Should be something similar to a clone operation.
Notice that in the constructor I put a check, since I didn't want the key to be different from short, int, long. You can remove this constraint, even though using other objects as keys could slow down the performances.

dhilipkumar · Mar 27, 2009, 04:19 PM

Now, a couple of methods that deserve an explanation:

protected override K GetKeyForItem(T item) {
return item.HashCode();
}

This method helps the Set class retrieving the HashCode() from every T objects it manages. Notice that, accordingly to the HashCode() signature, it returns a K key. This is the only method you must override when deriving from KeyedCollection.

After that, the Add() method:

public new bool Add(T item) {
if (item!=null && !base.Contains(item.HashCode())) {
base.Add(item);
return true;
} else {
return false;
}
}

Quite simple indeed, but with a supplementary information. If the T item is not null and not contained in our set, it will be added, and the method will return true, otherwise it will return false. This was done in order to be able to retrieve informations about items uniqueness in just one shot. You add an item: if the Add() method returns true, then it wasn't in the set, otherwise either it was already there or it was null. It's very useful when you have to store values on which you're going to do cpu-intensive computations. If an item is already in the set then you won't repeat the calculations, saving precious time. Also notice that new keyword in the method declaration. Its purpose is to tell the compiler we are rewriting the method, since it was already present in the base class.

The other methods are very very easy to understand so I won't add anything more.

A couple of interesting facts: since the key retrieving is done by this class in O(1), when intersecting two set of size N, you achieve intersection in O(N), which is pretty fast. In second place, using this class to make cpu-intensive computations (on challenges of course...) I have a running time that is about half the time that java takes to do the same job, with the same code, but using its native set classes, so I'm pretty confident that this class is quite fast.

I'll add here a possible implementation of a test class, which inheriths from IHashCode(), just to show you an example:

class Test : IHashCode<long> {
public long n;
public Test(long n) {
this.n=n;
}

public override string ToString() {
return n.ToString();
}

#region IInt32HashCode Members

// this will return the key. In this case, the number itself
public long HashCode() {
return n;
}

#endregion
}

Ok, it should be everything. Download the files, modify the namespaces accordingly to the project you're inserting them into, then compile and enjoy.

by sfabriz
OSIX

dhoni · Apr 10, 2015, 08:23 PM

the set class .net program can be used with that
with that we can easy to use in this program

Discussion Community

News:

A Set Class

dhilipkumar

dhilipkumar

dhilipkumar

dhoni