One book that I recommend the reading is Clean Code, by Robert Martin. It is a well written book with wonderful techniques to create better code and improve your current programs, so they become easier to read, maintain and understand.
While going through it again, I found an excellent opportunity to improve my skills trying to do some refactoring: in listing 4.7 there is a prime generator function that he uses to show some refactoring concepts and turn int listing 4.8. I then thought do do the same and show my results here.
We can start with the listing converted to C#. This is a very easy task. The original program is written in Java, but converting it to C# is just a matter of one or two small fixes:
using System;
namespace PrimeNumbers
{
/**
* This class Generates prime numbers up to a user specified
* maximum. The algorithm used is the Sieve of Eratosthenes.
* <p>
* Eratosthenes of Cyrene, b. c. 276 BC, Cyrene, Libya --
* d. c. 194, Alexandria. The first man to calculate the
* circumference of the Earth. Also known for working on
* calendars with leap years and ran the library at Alexandria.
* <p>
* The algorithm is quite simple. Given an array of integers
* starting at 2. Cross out all multiples of 2. Find the next
* uncrossed integer, and cross out all of its multiples.
* Repeat untilyou have passed the square root of the maximum
* value.
*
* @author Alphonse
* @version 13 Feb 2002 atp
*/
public class GeneratePrimes
{
/**
* @param maxValue is the generation limit.
*/
public static int[] generatePrimes(int maxValue)
{
if (maxValue >= 2) // the only valid case
{
// declarations
int s = maxValue + 1; // size of array
bool[] f = new bool[s];
int i;
// initialize array to true.
for (i = 0; i < s; i++)
f[i] = true;
// get rid of known non-primes
f[0] = f[1] = false;
// sieve
int j;
for (i = 2; i < Math.Sqrt(s) + 1; i++)
{
if (f[i]) // if i is uncrossed, cross its multiples.
{
for (j = 2 * i; j < s; j += i)
f[j] = false; // multiple is not prime
}
}
// how many primes are there?
int count = 0;
for (i = 0; i < s; i++)
{
if (f[i])
count++; // bump count.
}
int[] primes = new int[count];
// move the primes into the result
for (i = 0, j = 0; i < s; i++)
{
if (f[i]) // if prime
primes[j++] = i;
}
return primes; // return the primes
}
else // maxValue < 2
return new int[0]; // return null array if bad input.
}
}
}
The first step is to put in place some tests, so we can be sure that we are not breaking anything while refactoring the code. In the solution, I added a new Class Library project, named it GeneratePrimes.Tests and added the packages NUnit, NUnit3TestAdapter and FluentAssertions to get fluent assertions in a NUnit test project. Then I added these tests:
using NUnit.Framework;
using FluentAssertions;
namespace PrimeNumbers.Tests
{
[TestFixture]
public class GeneratePrimesTests
{
[Test]
public void GeneratePrimes0ReturnsEmptyArray()
{
var actual = GeneratePrimes.generatePrimes(0);
actual.Should().BeEmpty();
}
[Test]
public void GeneratePrimes1ReturnsEmptyArray()
{
var actual = GeneratePrimes.generatePrimes(1);
actual.Should().BeEmpty();
}
[Test]
public void GeneratePrimes2ReturnsArrayWith2()
{
var actual = GeneratePrimes.generatePrimes(2);
actual.Should().BeEquivalentTo(new[] { 2 });
}
[Test]
public void GeneratePrimes10ReturnsArray()
{
var actual = GeneratePrimes.generatePrimes(10);
actual.Should().BeEquivalentTo(new[] { 2,3,5,7 });
}
[Test]
public void GeneratePrimes10000ReturnsArray()
{
var actual = GeneratePrimes.generatePrimes(10000);
actual.Should().HaveCount(1229).And.EndWith(9973);
}
}
}
These tests check that there are no primes for 0 and 1, one prime for 2, the primes for 10 are 2, 3, 5, 7 and that there are 1229 primes less than 10,000 and the largest one is 9973. Once we run the tests, we can see that the pass and we can start doing our changes.
The easiest fix we can do is to revise the comments at the beginning. We don't need the history of Erasthotenes (you can go to Wikipedia for that). We don't need the author and version, thanks to source control technology 😃. We don't need either the initial comment:
/**
* This class Generates prime numbers up to a user specified
* maximum. The algorithm used is the Sieve of Eratosthenes.
* https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
*/
public class GeneratePrimes
{
public static int[] generatePrimes(int maxValue)
Then we can invert the initial test, to reduce nesting. If we hover the mouse in the line of the first if, an arrow appears at the border, indicating a quick fix:
We can do the quick fix, then eliminate the else clause (don't forget to remove the extra comments that are not needed):
public static int[] generatePrimes(int maxValue)
{
if (maxValue < 2)
return new int[0];
// declarations
int s = maxValue + 1; // size of array
bool[] f = new bool[s];
int i;
Save the code and check that all tests pass. The next step is to rename the variables:
- s can be renamed to sizeOfArray
- f can be renamed as isPrimeArray
Go to the declaration of s and press Ctrl-R-R to rename and rename it tosizeOfArray. Do the same with the f variable. Don't forget to remove the comments (and to run the tests):
int sizeOfArray = maxValue + 1;
bool[] isPrimeArray = new bool[sizeOfArray];
int i;
To go to the next refactorings, we can use the comments as indicators for extracting methods. We can extract the InitializeArray method:
The extracted code isn't what I expected, so I change it to:
private static bool[] InitializeArray(int sizeOfArray)
{
bool[] isPrimeArray = new bool[sizeOfArray];
// initialize array to true.
for (var i = 0; i < sizeOfArray; i++)
isPrimeArray[i] = true;
return isPrimeArray;
}
I can use the code like this:
var isPrimeArray = InitializeArray(sizeOfArray);
After passing the tests, I can refactor the code of InitializeArray to:
private static bool[] InitializeArray(int sizeOfArray)
{
return Enumerable
.Range(0, sizeOfArray)
.Select(n => true)
.ToArray();
}
The next step is the sieve:
The code for the sieve is really bad:
private static void Sieve(int sizeOfArray, bool[] isPrimeArray,
out int i, out int j)
{
// get rid of known non-primes
isPrimeArray[0] = isPrimeArray[1] = false;
for (i = 2; i < Math.Sqrt(sizeOfArray) + 1; i++)
{
if (isPrimeArray[i]) // if i is uncrossed, cross its multiples.
{
for (j = 2 * i; j < sizeOfArray; j += i)
isPrimeArray[j] = false; // multiple is not prime
}
}
}
It has two out parameters (which, for me, is a code smell), and has an error (the out parameter j must be assigned) before exiting the method. So we can change it to remove the out parameters and remove the sizeOfArray parameter:
private static void Sieve(bool[] isPrimeArray)
{
var sizeOfArray = isPrimeArray.Length;
isPrimeArray[0] = isPrimeArray[1] = false;
for (int i = 2; i < Math.Sqrt(sizeOfArray) + 1; i++)
{
if (isPrimeArray[i]) // if i is uncrossed, cross its multiples.
{
for (int j = 2 * i; j < sizeOfArray; j += i)
isPrimeArray[j] = false;
}
}
Then, we can extract the method to count primes:
CountPrimes has the same flaws as Sieve, so we change it to:
private static int CountPrimes(bool[] isPrimeArray)
{
var sizeOfArray = isPrimeArray.Length;
var count = 0;
for (var i = 0; i < sizeOfArray; i++)
{
if (isPrimeArray[i])
count++;
}
return count;
}
We can refactor it to:
private static int CountPrimes(bool[] isPrimeArray) =>
isPrimeArray.Count(i => i);
The next step is MovePrimes:
After we tweak the MovePrimes code, we get:
private static int[] MovePrimes(bool[] isPrimeArray, int count)
{
var sizeOfArray = isPrimeArray.Length;
var primes = new int[count];
for (int i = 0, j = 0; i < sizeOfArray; i++)
{
if (isPrimeArray[i]) // if prime
primes[j++] = i;
}
return primes;
}
Then we can refactor MovePrimes:
private static int[] MovePrimes(bool[] isPrimeArray, int count) =>
isPrimeArray
.Select((p, i) => new { Index = i, IsPrime = p })
.Where(v => v.IsPrime)
.Select(v => v.Index)
.ToArray();
Notice that we aren't using the primes count in this case, so we can remove the calculation of the count and the parameter. After some cleaning and name changing, we get:
public static int[] GetPrimes(int maxValue)
{
if (maxValue < 2)
return new int[0];
bool[] isPrimeArray = InitializeArray(maxValue);
Sieve(isPrimeArray);
return MovePrimes(isPrimeArray);
}
Much cleaner, no? Now, it's easier to read the method, the details are hidden, but the code still runs the same way. We have a more maintainable method, and it shows clearly what it does.
But there is a change we can do here: we are using static methods only. We can then use extension methods and add the keyword this to allow the methods to be used as extension methods. For example, if we change MovePrimes and Sieve to:
private static int[] MovePrimes(this bool[] isPrimeArray) =>
isPrimeArray
.Select((p, i) => new { Index = i, IsPrime = p })
.Where(v => v.IsPrime)
.Select(v => v.Index)
.ToArray();
private static bool[] Sieve(this bool[] isPrimeArray)
{
var sizeOfArray = isPrimeArray.Length;
isPrimeArray[0] = isPrimeArray[1] = false;
for (int i = 2; i < Math.Sqrt(sizeOfArray) + 1; i++)
{
if (isPrimeArray[i]) // if i is uncrossed, cross its multiples.
{
for (int j = 2 * i; j < sizeOfArray; j += i)
isPrimeArray[j] = false;
}
}
return isPrimeArray;
We can have the GetPrimes method to be changed to:
public static int[] PrimesSmallerOrEqual(this int maxValue)
{
if (maxValue < 2)
return new int[0];
return maxValue.InitializeArray()
.Sieve()
.MovePrimes();
}
Cool, no? With this change, the tests become:
public class GeneratePrimesTests
{
[Test]
public void GeneratePrimes0ReturnsEmptyArray()
{
0.PrimesSmallerOrEqual().Should().BeEmpty();
}
[Test]
public void GeneratePrimes1ReturnsEmptyArray()
{
1.PrimesSmallerOrEqual().Should().BeEmpty();
}
[Test]
public void GeneratePrimes2ReturnsArrayWith2()
{
2.PrimesSmallerOrEqual()
.Should().BeEquivalentTo(new[] { 2 });
}
[Test]
public void GeneratePrimes10ReturnsArray()
{
10.PrimesSmallerOrEqual()
.Should().BeEquivalentTo(new[] { 2, 3, 5, 7 });
}
[Test]
public void GeneratePrimes10000ReturnsArray()
{
10000.PrimesSmallerOrEqual()
.Should().HaveCount(1229).And.EndWith(9973);
}
}
The full code is at https://github.com/bsonnino/PrimeNumbers. Each commit there is a phase of the refactoring.