Why I Study Functional Programming

For a while now I’ve been slowly writing a book about learning functional programming. I don’t know if it will ever see the light of day, that isn’t really the point. It’s more of an exercise to help me learn. I thought I might share the introduction here.

The Introduction
I became aware of computers and computer programming around the same time I became aware of the Rubik’s Cube. I was nine years old. The Cube awakened a love of puzzles that has stayed with me to this day. The computer was then and still is the ultimate puzzle.

The Rubik’s cube can be solved, there is an obviously right solution. That right solution is elegant, it looks beautiful. Thirty years on from the cube’s heyday someone solving it still attracts attention and it’s still thrilling to see the colours fall into place with the last few turns.

With a computer there is no right solution. There are endless problems to be solved and many solutions to every problem.

For the first few years of playing with computers the challenge was to get it to do anything at all. I wrote simple childish games. Some were fun to play, all were fun to write.

Then there was a Eureka moment. People do this for a living. I could do this for a living. I could get paid to write computer programs.

Studying programming in college was a joy. It was amazing to discuss programming with people who loved it as much as I did, and were often much better at it than I was.

I loved my first job. I wrote code on my first day, put code into production in my first week. I stayed late almost every night. It wasn’t work. It was as much fun as I had always hoped.

There have been many high points in my career, projects that I’m proud of; not surprisingly, they are also the projects I most enjoyed working on.

There have also been low points. Instead of getting to work at my hobby, I turned my hobby into work. I rarely get the same thrill from coding that I did as a 9 or 10 year old. I rarely get to delight in elegant solutions. I rarely get to build something just for the fun of building it.

I continue to write code in my spare time but the relentless emergence of new technologies, tools, frameworks and platforms has meant that even my play time has become work time, study time. I don’t study for fun, I study to keep up, and I mostly fail at that.

Functional Programming was something I added to the list of things I must learn about. It sat there dormant while other “more important” things bubbled to the top and stole my attention.

In 2012, at the Norwegian Developer Conference Vagif Abilov gave a beautiful presentation on a Functional approach to Conway’s Game of Life using F#.

Seeing his solution fall into place was like seeing the last few turns of a Rubik’s Cube. It was elegant, it was simple, it was right. For someone who was becoming increasingly disillusioned with programming, it was frankly moving.

You didn’t need to be a brilliant programmer with encyclopaedic knowledge of frameworks to understand what he was doing. My 10 year old self could have been dropped into that room and understood it all.

The voting system at NDC involves putting Green, Yellow or Red cards into a box indicating whether you thought the talk was Good, Average or Bad. As I registered my vote I noticed that the box was full of Green. I wasn’t the only one in the room who was impressed by what they saw.

I started learning F# and Functional Programming that evening and I haven’t stopped since.

One of those clever guys I talked programming with back in college recently described learning functional programming as “like learning to program all over again”.

He’s right, it is, but it’s even more fun second time around.

Richard Dalton
Kildare, Ireland
May 2014

Learning To Think Functionally : Readability

I’m going to be a little unethical here and disclose an answer to Project Euler problem 6.

It’s one of the easier ones, but even still, if you haven’t done it, head on over and solve it now before reading any further. It’ll only take a few minutes.

Here’s the problem:

The sum of the squares of the first ten natural numbers is: 385
The square of the sum of the first ten natural numbers is: 3025

Hence the difference between the sum of the squares of the first ten natural numbers and the square of the sum is 3025 − 385 = 2640.

Find the difference between the sum of the squares of the first one hundred natural numbers and the square of the sum.

Simple, right?

Here are a few C# functions that give us the answer we need.

        static int Square(int n)
        {
            return n*n;
        }

        static int SumOfSquares(int first, int count)
        {
            var total = 0;
            for (var i = first; i <= count; i++)
                total += Square(i);
            return total;
        }

        static int SquareOfSum(int first, int count)
        {
            var total = 0;
            for (var i = first; i <= count; i++)
                total += i;
            return Square(total);
        }

We get the solution to the problem by subtracting one from the other:

SquareOfSum(1,100) - SumOfSquares(1,100)

Or, we take a more functional approach to the whole problem and use LINQ:

        static int Square(int n)
        {
            return n*n;
        }

        static int SumOfSquares(int first, int count)
        {
            return Enumerable.Range(first, count).Select(Square).Sum();
        }

        static int SquareOfSum(int first, int count)
        {
            return Square(Enumerable.Range(first, count).Sum());
        }

And we get the result just as before:

SquareOfSum(1,100) - SumOfSquares(1,100)

Those Enumerable ranges are cool. I think it’s easier to figure out what’s going on than with the imperative example. That said, while it’s better, I’ve never found the readability of LINQ to be anything special. It’s ok.

Let’s look at the same code in F#:

let square x = x * x
let sumOfSquares range = range |> List.sumBy square
let squareOfSum range = range |> List.sum |> square

And we can get our answer as follows:

squareOfSum [1 .. 100] - sumOfSquares [1 .. 100]

I think that reads a little better. You do need familiarity with the concept of ‘mapping’ a List, but if you can follow the C#/LINQ example above you probably know all you need to know.

Take a look again at the SquareOfSum function in the LINQ example. Reading left to right we encounter the call to Square first even though it happens last. This is quite common in imperative languages when function calls are nested within function calls.

In F# the Forward-Pipe operator (|>) gets around that problem quite neatly.
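
The pipe isn’t magic either; it’s essentially a one line function in the F# core library. Roughly:

// The forward-pipe operator just flips the order of application: "x |> f" means "f x",
// so the data flows left to right through the chain of functions.
let (|>) x f = f x

// With the pipe we read the steps in the order they actually happen:
[1 .. 10] |> List.sum |> square

// Without it we read them inside out:
square (List.sum [1 .. 10])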

We can go further with the F# example.

Since all we’re doing with those ‘range’ values is piping them into a chain of functions, we can get rid of them entirely and simply compose sumOfSquares and squareOfSum directly from List.sum, List.map and square.

let square x = x * x
let sumOfSquares = List.map square >> List.sum
let squareOfSum = List.sum >> square
squareOfSum [1 .. 100] - sumOfSquares [1 .. 100]

Our sumOfSquares and squareOfSum functions are now “Point Free” in Functional Programming parlance.
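
The >> used here is the forward composition operator, and like the pipe it is just a small function. Roughly:

// Forward composition: (f >> g) builds a new function that applies f, then g.
let (>>) f g = fun x -> g (f x)

// So "List.map square >> List.sum" is a function that maps first and then sums,
// without ever naming the list it will eventually be applied to.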

Data Context Interaction (DCI) in C#

Download Sample Project 10.7 KB

The following is a really quick blast through implementing DCI (Data Context Interaction) in C#. This will be low on explanation and high on code.

The essence of DCI is that our entities should be implemented as simply as possible and should then assume various ‘Roles’ at runtime.

Within the context of a UseCase (e.g. Transfer Funds) the same type of Entity (e.g. Account) can play two different Roles (Source Of Funds and Recipient Of Funds). In theory, organising the code like this makes it easier to comprehend the runtime reality of the behaviour of the system.

Enough theory, let’s start with a test

public void TransferFunds()
{
    var accountA = new Account(100);
    var accountB = new Account(0);

    var fromA = accountA.ActLike<SourceOfFunds>();
    var toB = accountB.ActLike<RecipientOfFunds>();

    new TransferFunds(fromA, toB, 75).Execute();

    Assert.AreEqual(25, accountA.Balance);
    Assert.AreEqual(75, accountB.Balance);
}

The line ‘new TransferFunds(fromA, toB, 75).Execute()’ executes the TransferFunds Use Case (or Context), but it doesn’t accept Accounts, it accepts a SourceOfFunds and a RecipientOfFunds.

Since we don’t want to build logic for every role an Account may play into the Account class, Account doesn’t implement these interfaces directly. Instead we use ImpromptuInterface to have the Accounts ‘Play’ the roles of SourceOfFunds and RecipientOfFunds. See the ‘ActLike’ lines above.

The Code

Let’s take each of the pieces in turn. First let’s look at that TransferFunds ‘Context’.

    public class TransferFunds
    {
        public SourceOfFunds Source { get; private set; }
        public RecipientOfFunds Target { get; private set; }
        public decimal Amount { get; private set; }

        public TransferFunds(SourceOfFunds source, RecipientOfFunds target, decimal amount)
        {
            Source = source;
            Target = target;
            Amount = amount;
        }

        public void Execute()
        {
            if (Source.Balance < Amount)
            {
                throw new ApplicationException("insufficient funds");
            }

            Source.TransferTo(Target, Amount);
        }
    }

The context uses a TransferTo method on the SourceOfFunds. Note the check for insufficient funds. There are a few places we could put this, I’ve duplicated it in one other place. We’ll discuss that further in a minute.

Here are the SourceOfFunds and RecipientOfFunds interfaces or ‘Roles’

    public interface SourceOfFunds
    {
        decimal Balance { get; }
        void Withdraw(decimal amount);
    }

    public interface RecipientOfFunds
    {
        void Deposit(decimal amount);
    }

For a source of funds we need to withdraw cash, and we need a way of checking that there’s enough cash to withdraw (or perhaps we don’t, that’s an interesting question, what if we just let the source of funds worry about that internally?).

For a recipient of funds we need to deposit cash. We don’t care about the balance of a recipient (Unless we want to write a test. Go back again and look at the Asserts in our test).

There is no presumption here that this role will be filled by an Account, nor should there be. A wallet can be a source of, or a recipient of funds. So can a credit card (which may or may not be a type of account).

Note that Source of Funds doesn’t have a ‘TransferTo’ method. We’d expect to see one, that’s the method that’s called by the context.

In actual fact, TransferTo is a ‘Trait’ that is attached to the SourceOfFunds ‘Role’ using an extension method. This split between Context and Traits is one part of DCI that I have a bit of a problem with. It adds an extra level of decoupling, and makes the Role concept explicit, but I’m dubious about whether it pays for itself.

Here’s the Trait

    public static class TransferFundsTrait
    {
        public static void TransferTo(this SourceOfFunds self, RecipientOfFunds recipient, decimal amount)
        {
            if (self.Balance < amount)
            {
                throw new ApplicationException("insufficient funds");
            }
            
            self.Withdraw(amount);
            recipient.Deposit(amount);
        }
    }

I’ve duplicated the ‘insufficient funds’ check. Whether it goes here or in the Context doesn’t matter in this case, both will work, but it’s valid to ask which is better. We could actually have put this check in other places, like in the Account itself. Putting logic like this into the account does limit the ability of the account to play different roles, so that doesn’t seem wise, but deciding between Context and Trait is less clear cut.

Let’s wrap things up with a look at the Account class.

    class Account
    {
        public decimal Balance { get; private set; }

        public Account(decimal initialBalance)
        {
            Balance = initialBalance;
        }

        public void Withdraw(decimal amount)
        {
            Balance -= amount;
        }

        public void Deposit(decimal amount)
        {
            Balance += amount;
        }
    }

It’s only concerned with the initial value, depositing and withdrawing funds. The simpler it is, the easier it is to have it assume different roles in different contexts.

That’s DCI in a nutshell. I like the use of ImpromptuInterface. I find the notion behind DCI interesting but I don’t have a use for it at the moment, and if I did I think I’d need to mull it over for a long time before I’d commit.

Let me know what you think.

Resources

Data Context Interaction (Wikipedia)

Lean Architecture

Understanding the Four Rules of Simple Design

I don’t normally review books, mainly because when it comes to technical books I rarely manage to read them in their entirety.

I have however just finished “Understanding the Four Rules of Simple Design” by Corey Haines and since it’s self published, and promoted pretty much exclusively through word of mouth, I thought it might be worth a short review.

The “4 Rules” mentioned in the title are the XP Simplicity Rules:

Simple Design:

  1. Passes all tests
  2. Expresses developer intent
  3. Has no duplication
  4. Minimizes the number of classes and methods

To expand on the rules and put them in context Corey draws on 5 years of running Code Retreats in which participants work on Conway’s Game Of Life repeatedly during the course of a day, throwing away their code and starting again about every hour or so.

He deliberately avoids separating out the rules and discussing them in isolation from each other. He does this to reinforce the fact that the rules all impact on each other. Mixing them together means the book feels a little less structured than you might expect, but that doesn’t make it any less readable.

At about 90 pages you can easily get through this book in an evening. When forewords, acknowledgements and appendices are excluded the book weighs in quite a bit less than 90 pages. Don’t skim or ignore those sections though. You may be like me and find towards the back a link to the original paper on the Law of Demeter and realise that although you’ve read lots about it you’ve never read the original source. Or you may find the link to a 1971 paper on designing for maintainability.

These links to other sources are one reason why the book is so short. It doesn’t cover the same well trodden ground that has been covered adequately elsewhere. The Law of Demeter is mentioned, but little more than that. The SOLID principles likewise are mentioned, but only to demonstrate that following the four rules tends to deliver SOLID designs.

As I started reading I got a little concerned, the examples of tests and code seemed to be fleshing out a domain model rather than tackling any particular behaviour. I’m no expert on Simple Design but this isn’t the approach I’d take. A few pages later however Corey addresses that very concern and shows how the alternative approach of tracing tests and code back to required behaviour can give better results than state based test driving of models.

Interestingly the book doesn’t implement the Game of Life in full, and doesn’t show how some of the more naive/complex implementations can be greatly simplified. I think that would have been fascinating, but I understand why it wasn’t done. Early in the book Corey is at pains to point out that he wants to avoid focus on “A Good Design” or “The Best Design”. If he had provided anything like a full implementation, it would have been hard to avoid the impression that after 5 years of Code Retreat here was (in his view) “The best design”.

That said, I think the book might have benefited from some additional examples and more detailed discussion. It seemed to end just when things were getting really interesting. The section on inverted composition as an alternative to inheritance threw up a whole host of questions in my mind. Corey concluded that discussion quite quickly with an “I’m not saying this is or is not better” kind of non-conclusion that felt a little unsatisfactory.

If you are the kind of person who reads about and thinks about software design on a regular basis, you may be tempted to write this book off as covering ground you’re familiar with. I wouldn’t do that. For the small investment of time and money that the book requires it does contain a few Aha! moments.

If you don’t read about software design very much then this book is actually a really good place to start. There is nothing complicated or advanced here, and nothing that can’t be put into practice immediately.

If you really haven’t thought about code as design then you WILL be a better developer for having read this book. Maybe not massively better. Nothing that can be read in an evening will make a dramatic difference, there are no short cuts. But, if you are looking at the elephant of mastering software design, and trying to figure out how to eat it, this is a small enough bite to get you started.

I suspect this book might benefit from multiple readings, and I’m sure that to get the most from it you really should follow Corey’s advice and read the additional material that he links to.

One final thought, although the book on its face seems to discuss Object Oriented design, it really isn’t written in an OO specific way. At the moment most of my spare energy is going into understanding how a functional approach can simplify design. I didn’t feel like I was parking that to “go back” to OO thinking. Everything in the book can be applied to Functional Programming. In fact, reading this book through a functional lens actually throws up a lot of extra questions.

Bottom line, I bought this book, I’m glad I did. I’ve read it, and I will be reading it again.

The “Happy Path Model” of software development

Every movement needs an enemy, it galvanises followers, gives a community a sense of some shared identity. Even if Group A aspire to nothing more than to not be like Group B, that is at least something to rally around.

For the Agile community the designated enemy seems to be “The Waterfall Model” and the command and control project management techniques that usually go hand in hand with it.

The anti-waterfall rhetoric from the agile community has been so strong and so consistent that it has created a backlash. “The Waterfall” is increasingly becoming an Alamo for those who aren’t or don’t want to be convinced by talk of “Agile”.

While the Agilistas and Waterfallers slug it out, there is a third way quietly going about its business. “The Happy Path Model” is the most popular software development model in history and still by some margin the most widely used.

Adherents of the happy path are eternal optimists. This time it will be different. Requirements are solid, or at worst there will be minor issues that need clarification. Input data will always be clean, complete and easy to work with. No other priorities will get in the way, and even if they do, that’s why we build in contingency time.

Happy path thinking infects every aspect of a project. Project managers will often start with an end date in mind, simply because someone asked for delivery by that date, and regardless of the scope of the project, that date typically remains, chiselled into a gantt chart, because … happy path.

Developers will be asked for estimates and they will underestimate, because … happy path.

Managers will interpret estimates as commitments, because … happy path.

Even as developers realise they are unlikely to hit milestones, they will remain quiet, convinced they can close the gap, because … happy path.

Managers and developers will both allow the scope of the project to increase, or may actually drive the increase, because … happy path.

It goes deeper, to the very core of the product. Happy path thinking results in inadequate exception handling, logging, and even testing. It can result in entire usage scenarios being missed. The result is often brittle code that is unfit for purpose and difficult, if not impossible, to maintain and enhance.

Deployment may be a manual process, but why bother automating it? On the happy path it will never go wrong.

Testing may be manual, but why bother automating any of it? On the happy path limited regression testing is all you need, nothing major will ever break.

To even consider Continuous Integration, Automated Build and Deployment, Automated Testing, Environment Provisioning, etc. is to allow one’s mind to turn to the dark side, where manual mistakes are made and things go wrong and people have to work late. A place where features that used to work can break because some unrelated piece of code changed.

These people are not to be trusted, they are typically less productive than developers who get their head down and code the happy path. This academic “agile” bullshit is all fine for hippies in California, but here in the real world we have software to deliver, and the happy path is the path we follow.

When stakeholders realise (as they inevitably do) that the happy path isn’t delivering what they actually need, they begin asking for changes. Any change, even a minor change strikes at the very heart of the happy path philosophy. Such requests can create enormous friction between the stakeholders and the development team.

I once worked on a project where a user came asking for a tiny utility that would dramatically reduce some manual work. It was a single form lookup utility, it was read only. It took less than an hour to build and validate. We were not allowed to release it because it was not part of the original specification of the system. Significantly more time was spent arguing to keep this tool out of production than would have been needed to build, validate and deploy it.

The reality is that “Agile” is not “Anti-Waterfall”, “Agile” is “Anti-Happy Path”. Agile is about recognising that at the start of a project we are at our moment of maximum ignorance. It’s about embracing the fact that the act of developing software shines a light into the domain and uncovers hidden corners and edges. It’s about realising that trips off the happy path aren’t exceptions, the happy path is the exception.

Agile promotes practices like Continuous Integration, Automated Build and Deployment, Automated Testing, the ability to spin up and destroy environments quickly and reliably. Many believe that these practices are the point of agile. “Doing” these things means you are “Doing” agile. This is a completely backward view of agile.

Agile teams embrace these practices not because they are how you DO agile, but because they provide the time and space to BE agile. Agility is simply the ability to stray from the happy path with confidence.

Annotate your code with TestCases

I’ve noticed myself doing something when I code that I had never really thought about, it just kind of happened. When I write a test for a very particular type of code, I find myself coding the function right there in the test. Here’s some code that kind of illustrates what I mean.

[TestCase(new byte[] { 97, 98, 99, 32, 100, 101 }, Result = "616263206465")]
public string AsciiBytesToHexString(byte[] asciiBytes)
{

}

Now, there are all sorts of reasons not to like this as a unit test. Why so many bytes? Two would probably suffice. Is the fact that one of the bytes represents a space significant? Could the test name be better?

All fine questions and in the past I might have obsessed about them.

Let me back up for a second and see what I’m actually doing here. I want a function that can take a byte array and output a hex string. That is all I want. This isn’t a particularly difficult task, but there are lots of these little tasks where I might not be 100% sure of the C# syntax, or I just want a quickly runnable sanity check that the code behaves like I want, so, I write a test.

With the TestCase in place, I can code the function. I don’t know or care whether this is the perfect code for the job, whether there is some C# command that does all this for me, I just want to get my test passing. If I can improve it later I will.

[TestCase(new byte[] { 97, 98, 99, 32, 100, 101 }, Result = "616263206465")]
public string AsciiBytesToHexString(byte[] asciiBytes)
{
    var sb = new StringBuilder();

    foreach (var asciiByte in asciiBytes)
        sb.Append(asciiByte.ToString("X"));

    return sb.ToString();
}

Now, strip away the TestCase attribute and you’ve basically got the function you wanted. It takes in a byte array and returns a string. The Test dissolves into the background.

In the past I might have written a test somewhere in a test fixture, then created the function I actually want in a Test-Driven fashion to make the test compile and ultimately pass. That gives me a function in one part of my code base, and the tests somewhere else, perhaps in an assembly that ends with *.Tests.dll.

Having noticed myself coding full implementations inside tests, I started thinking about what I do next. In this case, I would probably cut and paste this method from the Test Fixture into whatever home I think it belongs.

I might or might not leave behind a test calling the function in its new home. If I was just sussing out syntax I’m not sure having a test separate from the function really pays for itself. This isn’t code I expect to break or even ever be refactored significantly.

Looking at that function with the TestCase attribute, it occurs to me that there’s no reason why I can’t take that attribute with me when I move the function to its ultimate home. Doing so means that the tests exist IN the production code, rather than as some separate entity.

I seem to be straying in the direction of code contracts here, but it seems such an obvious step I’m astonished it hadn’t occurred to me sooner.

When you are looking through production code, you can try a find usages on a function to see if it’s referenced by any tests, but how much better is it to see sample inputs and outputs right there above a function in its natural habitat?

This might not be something you would use for every function, but for those small discrete functions that can be illustrated perfectly with one or two test cases this seems like a really useful approach.

And let’s be honest. If a function can’t be illustrated with a few test cases, isn’t it worth asking why not?

Enriching Data with F#

Download Sample Project 4.1 KB

Let’s say I’ve pulled some raw data from a normalised database. It might look something like this:

type RawEmployee =
    {
        EmployeeId: int
        DepartmentId: int
        RegionId: int
        GradeId: int
    }

Lots of foreign keys, not much by way of useful information. Let’s say I’d like to turn it into something like this:

type EnrichedEmployee =
    {
        Name: string option
        Department: string
        Region: string
        Grade: string
    }

Notice Name is a string option which means it’s valid to not have one. Gotta love crazy internet examples. The rest of the fields are strings, they are mandatory.

How do I get from the raw record to the enriched record using something close to clean code?

I’ve tackled variations of this problem and seen it tackled by others, many times in many languages. The results are rarely encouraging. One common pattern is very specific complicated enrichment functions with tons of null checks and if…else littered all over the place. The other extreme (and much worse) is some kind of elaborate generalised enrichment framework with creatures like AbstractEnrichmentFactories and EnrichmentStrategyFactoryFactories lurking around every corner.

My latest effort in this area was for an FSharp project. Here’s how I wanted the finished code to work.

1. I wanted the individual enrichments to be simple functions that I can chain together, perhaps in a list.

let functions = [EnrichName; EnrichDepartment; EnrichRegion; EnrichGrade]

2. I wanted enriching a raw record to be as simple as possible. We’ll figure out how to get the enrichment functions in there later.

let e = EnrichAnEmployee rawEmployee

3. I wanted to be able to enrich a list of raw records as easily as a single record.

let e = EnrichEmployees rawEmployeesList

4. I wanted optional fields so even if their enrichment fails the overall enrichment continues.

5. I wanted mandatory fields, so, if their enrichment fails the overall result is None rather than an EnrichedEmployee.

6. I’d like to not have to completely reinvent this solution for every new type that I want to enrich, but I also don’t want tons of reusable infrastructure code that is difficult to understand.

What follows is a first stab at this, I can already see a few ways it can be cleaned up, but here you go.

Let’s start with a couple of simple lookup functions for resolving IDs to values. We’ll hard code everything for the purposes of this post. A couple of Lists of Tuples should do.

let names = [1,"Richard"; 2,"Tony"; 3,"Tom"]
let departments = [1,"IT"; 2,"HR"; 3,"Finance"]
let regions = [1,"North"; 2,"South"; 3,"East"; 4,"West"]
let grades = [1,"Junior"; 2,"Senior"; 3,"Principal"]

Next we need our actual Enrichment functions, these will use the data above to resolve some field on the raw record to update the enriched record.

let EnrichName (r: RawEmployee) (e: EnrichedEmployee option) = 
    match List.tryFind (fun i -> fst i = r.EmployeeId) names with
        | Some s -> Some {e.Value with Name = Some (snd s)}
        | None -> e

let EnrichDepartment (r: RawEmployee) (e: EnrichedEmployee option)  =
    match List.tryFind (fun i -> fst i = r.DepartmentId) departments with
        | Some s -> Some {e.Value with Department = snd s}
        | None -> None 

let EnrichRegion (r: RawEmployee) (e: EnrichedEmployee option) = 
    match List.tryFind (fun i -> fst i = r.RegionId) regions with
        | Some s -> Some {e.Value with Region = snd s}
        | None -> None 

let EnrichGrade (r: RawEmployee) (e: EnrichedEmployee option) = 
    match List.tryFind (fun i -> fst i = r.GradeId) grades with
        | Some s -> Some {e.Value with Grade = snd s}
        | None -> None 

Note that for Department, Region and Grade if we don’t find a match, the result of the function is None, but for Name if we don’t find a match, the result is e which is the Enriched record, passed on, untouched.

Let’s see all this in action. First we need the ingredients. A list of the enrichment functions we want to apply. A blankEmployee which is the starting point for the enrichment process, and some raw data to enrich.

let functions = [EnrichName; EnrichDepartment; EnrichRegion; EnrichGrade]
let blankEmployee = Some {Name = None; Department = ""; Region = ""; Grade = ""}

let rawEmployees = [{ EmployeeId = 1; DepartmentId = 1; RegionId = 1; GradeId = 1 };
                    { EmployeeId = 4; DepartmentId = 2; RegionId = 2; GradeId = 2 };
                    { EmployeeId = 3; DepartmentId = 3; RegionId = 3; GradeId = 3 };
                    { EmployeeId = 1; DepartmentId = 4; RegionId = 2; GradeId = 3 };
                    { EmployeeId = 2; DepartmentId = 3; RegionId = 4; GradeId = 3 }]

Now through the magic of currying we create the actual function that will apply our list of functions to Enrich an Employee.

let EnrichAnEmployee = Enrich functions blankEmployee

Where did that Enrich function come from? What does it do? Well, that’s the function that actually does all the magic, I haven’t shown it to you yet, but for now let’s press on with the cart before the horse and see what it does, we’ll see how it does it shortly.

Having used the magical Enrich function to create the EnrichAnEmployee function we can use that function as follows

let e = EnrichAnEmployee { EmployeeId = 1; DepartmentId = 1; RegionId = 1; GradeId = 1 }

We can also use it to create a function that will enrich lists of raw Employees

let EnrichEmployees rawList = List.map (fun r -> EnrichAnEmployee r) rawList
let e = EnrichEmployees rawEmployees

Let’s look at the heart of all of this, the Enrich function. It’s actually two functions

let IfSome (f:'e option->'e option) (x: 'e option) =
    match x with
    | Some n -> f x
    | None -> None

let Enrich (functions: ('r -> 'e option -> 'e option) list) (enriched: 'e option) (raw: 'r) =
    let funcs = List.map (fun f -> IfSome (f raw)) functions
    List.fold (fun acc f -> f acc) enriched funcs

Ok, it’s not the prettiest code in the world but that’s mainly down to being explicit about the types of arguments. As Infrastructure Code goes it’s not too bad. For starters there’s only about 10 lines of code to understand. Let’s break it down.

The Enrich function takes three arguments

  1. The list of enrichment functions
  2. The blank employee which is the starting point for the enrichment.
  3. The raw employee record that we want to enrich

Inside the function we do something that you can really only do when functions are truly first class citizens. We first Map the list of functions to essentially bolt on some extra behaviour and turn them into other slightly modified functions, and then we do a Fold to execute the functions in turn and produce a result.

The extra behaviour we bolted on when we mapped the functions is to handle the fact that the output of these functions is an option and so might be a None. If that is the case we don’t want subsequent functions trying to enrich it further.

We basically stick a proxy IfSome function in front of each of our enrichment functions so that it can intercept any None’s and not bother calling the function.

If you’re having any trouble untangling how the Enrich or IfSome functions do their work then drop me a comment below and I’ll go into it in more detail.

You can also download the code and play with it.

Download Sample Project 4.1 KB

A next step that we could take would be to have the enrichment function return more than just an enriched employee. It could return a tuple containing both the enriched Employee and a list of strings or exceptions to capture problems.

E.g. instead of getting None we could get (None, ["Department 4 does not exist"]).

This means we can tag the results of the enrichment without polluting the results themselves with exceptions and warnings.
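
As a rough sketch (not working code from the sample project), one of the enrichment functions might then look like this; the IfSome and Enrich plumbing would need a matching change to thread the message list through the fold:

// Sketch only: the enrichment now returns the (possibly None) record
// plus a list of messages describing anything that went wrong along the way.
let EnrichDepartment (r: RawEmployee) (e: EnrichedEmployee option, messages: string list) =
    match List.tryFind (fun i -> fst i = r.DepartmentId) departments with
    | Some s -> (Some {e.Value with Department = snd s}, messages)
    | None -> (None, (sprintf "Department %d does not exist" r.DepartmentId) :: messages)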

F# and Databases Part Deux (Parameters)

In the last post I showed a handful of lines of code that allowed us to query a database and map the results to a record type in a really simple way. One limitation was that while it could run any SQL query we wanted, it couldn’t handle parameters, and we don’t want to include parameter values directly in the SQL.

Fortunately we can add handling for parameters very easily and learn a few things about F# along the way. Here’s the code from the last post:

module SQLiteLoader
    open System.Data
    open System.Data.SQLite
    let Query (connectionString: string) toType (sql: string) = 
      seq { use cn = new SQLiteConnection(connectionString)
            let cmd = new SQLiteCommand(sql, cn) 
            cmd.CommandType<-CommandType.Text

            cn.Open()
            use reader = cmd.ExecuteReader()
            while reader.Read() do
                yield reader |> toType
         }

And we can call it as follows:

    let ProductLoader = Query @"Data Source=SkuMaker.db3;Version=3;" ToProduct 
    let products = ProductLoader "SELECT * FROM products"   

How can we fit parameters into this setup? What would the ideal solution look like? Perhaps something like this:

    let queryText = "SELECT id, description FROM products where id >= :minId AND id <= :maxId"            
    let parameters = [("minId" min); ("maxId" max)]

    ProductLoader queryText parameters    

We simply create a List of Tuples. Each Tuple represents a name value pair of parameter and value. It could hardly be simpler. Unfortunately it won’t work. When we have a list of Tuples the type or signature of each Tuple in the list must be the same. In the example above we’re in good shape, it’s a list of string*int tuples. But what about the following

    let queryText = "SELECT id, description FROM products where id = :Id AND description = :Description"            
    let parameters = [("Id" 4); ("Description" "Super Dooper Widget")]
    ProductLoader queryText parameters    

To paraphrase a famous quote, One does not simply put a string*int tuple in the same list as a string*string tuple. So, we’re screwed.

Of course, we could commit the heinous crime of mentioning the word Object in the context of a functional language. F# is still a .Net language after all, and its ‘obj’ type is equivalent to ‘Object’ in C#. Any value we would want to use as a parameter value can ultimately be boxed into an ‘obj’. So we can use a list of string*obj tuples to get the job done. We can do this:

    let queryText = "SELECT id, description FROM products WHERE id = :id AND description = :Description"            
    let parameters = [("Id", box 4); ("Description", box "Super Widget")]
    ProductLoader queryText parameters

It’s not bad, we could definitely work with this as a solution. In C# I’d be delighted to be able to magic up a list of tuples this easily, but this is F# and we can take things a little further. Let’s see if we can get rid of the parentheses, and maybe that word box. We could write a function that produces a tuple while boxing the second element in the tuple:

    let param name value =
        (name, box value)

Now our query can be rewritten as:

    let queryText = "SELECT id, description FROM products WHERE id = :id AND description = :Description"            
    let parameters = [param "Id" 4; param "Description" "Super Widget"]
    ProductLoader queryText parameters

It’s important to note that our parameters list still has exactly the same type, it is a list of string*obj tuples. But instead of explicitly creating the tuples inline, we use a function to create them instead. The ‘param “Id” 4’ syntax almost reads like we’ve created a statement in a mini language. This is immensely powerful stuff. Your brain should be spinning with all the potential for creating more literate code without going through the whole process of writing Domain Specific Languages.

The code for our Loader module, including new code for actually using the parameters, now looks like this:

module SQLiteLoader
    open System.Data
    open System.Data.SQLite

    let param name value =
        (name, box value)
        
    let MapParameter (p:(string*obj)) =
        SQLiteParameter(fst p, snd p)

    let Query (connectionString: string) toType (sql: string) (parameters: (string*obj) list)= 
      seq { use cn = new SQLiteConnection(connectionString)
            let cmd = new SQLiteCommand(sql, cn) 
            cmd.CommandType<-CommandType.Text
        
            let SQLiteParams = List.map MapParameter parameters
            List.iter (fun p->cmd.Parameters.Add(p) |> ignore) SQLiteParams

            cn.Open()
            use reader = cmd.ExecuteReader()
            while  reader.Read() do
                yield reader |> toType
         }
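
To see it in use, here’s roughly how the ProductLoader from the top of this post would be created and called now that Query takes a parameter list (ToProduct is the mapping function from the sample project, and the id bounds are just example values):

    // Same partial application as before, then pass the SQL and the parameter list.
    let ProductLoader = Query @"Data Source=SkuMaker.db3;Version=3;" ToProduct
    let queryText = "SELECT id, description FROM products WHERE id >= :minId AND id <= :maxId"
    let products = ProductLoader queryText [param "minId" 1; param "maxId" 100]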

Modifying this for another database like Oracle is a trivial exercise.

F# and Databases

Here’s an F# function that takes a connection string, a SQL query, and wedged between them a function that can convert the results from a DataReader to a Sequence of suitably typed records. Even if you’re unfamiliar with F# this code should look familiar, it’s straightforward ADO.Net.

module SQLiteLoader
    open System.Data
    open System.Data.SQLite
        
    let Query (connectionString: string) toType (sql: string) = 
      seq { use cn = new SQLiteConnection(connectionString)
            let cmd = new SQLiteCommand(sql, cn) 
            cmd.CommandType<-CommandType.Text
        
            cn.Open()
            use reader = cmd.ExecuteReader()
            while reader.Read() do
                yield reader |> toType
         }

Let’s look at that ‘toType’ function more closely. I have a type called Category with two properties, and I have a function ‘ToCategory’ that takes a SQLiteDataReader and sets the properties of a Category. The ‘ToCategory’ function, or another very like it will be passed to the Query function above when the time comes.

module Categories

    open System.Data.SQLite
    open SQLiteLoader

    type Category =
        { 
            Code: string
            Description: string    
        }

    let ToCategory (reader: SQLiteDataReader ) =
        { 
            Code = unbox(reader.["Code"])
            Description = unbox(reader.["Description"])
        }    

Don’t worry about that unbox function. If you come from a C# or VB.Net background you may know about boxing and unboxing, or, more likely, you know of it, but don’t usually give it much of a thought. C# and VB.Net allow for implicit type conversions so boxing and unboxing usually just works. In F# all type conversions must be explicit. This can catch you by surprise because in F# type inference is such an impressive feature it’s a bit of a jolt to be caught out by a casting issue. In this case unbox is just casting the value in the reader (an obj) to the desired type.
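
If explicit conversion is new to you, here’s a tiny illustrative example (not from the project) of what unbox is doing:

let o : obj = box "Widgets"    // an obj holding a string
let s : string = unbox o       // explicit cast back to the underlying type
// let t : string = o          // would not compile; F# won't convert obj to string implicitly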

OK, the stage is set, we have a query that can load data from the database, we have a type that we’d like that data mapped to, and we have a function to do the mapping. Time to have some fun.

open System
open SQLiteLoader
open Categories

let Display category =
    printfn "%s - %s" category.Code category.Description

[<EntryPoint>]
let main argv = 
    let categories = Query @"Data Source=SkuMaker.db3;Version=3;" ToCategory "Select * from Categories"
    Seq.iter Display categories
    Console.ReadKey() |> ignore;
    0

Getting our categories is just a matter of calling Query and giving it a connection string, our mapping function and a sql query.

Passing the connection string every time we want to query the database is going to get a bit tiresome. Let’s see if we can avoid doing that.

[<EntryPoint>]
let main argv = 
    let Loader = Query @"Data Source=SkuMaker.db3;Version=3;"
    let categories = Loader ToCategory "SELECT * FROM Categories"    
    
    Seq.iter Display categories
    Console.ReadKey() |> ignore;
    0

That’s better, we create a new function Loader which is a partial execution of Query. We’ve used Currying to lock in the connection string. Now we can use Loader in place of Query and it will work exactly the same way, except we’ve eliminated the need to keep passing in a connection string.
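
If currying and partial application are new to you, here’s the idea in miniature, away from the database code (the names are made up purely for illustration):

let add x y = x + y      // a curried function: int -> int -> int
let addFive = add 5      // supply only the first argument to get a new function: int -> int
let eight = addFive 3    // 8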

This is pretty cool. We have a function that knows where our Database is, and we just need to give it a mapping function and a sql query and it will return a suitably mapped sequence of results.

We can go further.

Taking Loader as our starting point we can use partial execution (currying) again to lock in the ToCategory mapping function and create a new function CategoryLoader that knows about both the connection string and the mapping to categories. This new function just needs to be passed a sql string.

[<EntryPoint>]
let main argv = 
    let Loader = Query @"Data Source=SkuMaker.db3;Version=3;"    
    let CategoryLoader = Loader ToCategory   
    let categories = CategoryLoader "SELECT * FROM Categories"    
    
    Seq.iter Display categories
    Console.ReadKey() |> ignore;
    0

We can use Loader to create loaders for other ‘Types’ and we can use CategoryLoader for different queries that load categories. And we can even create different Mapping functions if we want different representations of a Category.
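
For example, a loader for a hypothetical Product type needs nothing more than another record and another mapping function (Product and ToProduct are made up here for illustration, following the same pattern as the Categories module):

// Assumes open System.Data.SQLite and open SQLiteLoader, as in the Categories module.
type Product =
    {
        Sku: string
        Description: string
    }

let ToProduct (reader: SQLiteDataReader) =
    {
        Sku = unbox(reader.["Sku"])
        Description = unbox(reader.["Description"])
    }

// Partially apply Query with the new mapping function, just as we did for categories.
// (A bare partial application like Loader gets pinned by F#'s value restriction to the
// first result type it's used with, so for a second type we partially apply Query again.)
let ProductLoader = Query @"Data Source=SkuMaker.db3;Version=3;" ToProduct
let products = ProductLoader "SELECT * FROM Products"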

I’m not really interested in investing any time in creating some all encompassing Data Wrapper for F#, I’ve been down that road in C# and it’s just heartache and struggle and code that ends up more complicated than it needs to be.

In the example above you can see that F# allows us to quickly and easily spin up code that just works, and is lightweight enough that writing it shouldn’t be a huge burden.

The joy of this is that as we find limitations or better ways of doing things we can simply start doing things differently for situations where it works. We’re not left trying to shoehorn whatever problem we have into the Framework we’ve shackled ourselves to.

Don’t scatter logic all over the call stack

Here’s a problem I come across all the time in legacy code and I have been guilty of it myself in the past. This isn’t going to be rocket science, but, apparently, this stuff doesn’t go without saying. Normal caveats about this being a simplified contrived example apply.

Take a look at this code

static void Main(string[] args)
{
    PrintTradeBalance();
    Console.ReadKey();
}

Oh, to have code this simple, right? It’s pretty clear what it does, it prints a trade balance, whatever that is. Let’s not break out the bunting and brass band yet. What does this code ACTUALLY do? Let’s dig into the function.

private static void PrintTradeBalance()
{
    var results = GetTradeBalance();
	
    foreach (var result in results)
        Console.Write("Country {0}, Value {1}\n", result.Country, result.Value);
}

Well, it turns out a TradeBalance isn’t a single thing, it’s a collection; each country has one. Let’s leave the function name aside for a moment. We see the function GETS the trade balances, then prints them. It’s still not really clear what a trade balance is or where it comes from. Let’s keep digging.

private static IEnumerable<Result> GetTradeBalance()
{
    var cashMovements = GetCashMovements();

    var results = cashMovements
        .GroupBy(i => i.Country)
        .Select(r => new Result
        {
            Country = r.Key,
            Value = r.Sum(c => c.ValueIn - c.ValueOut)
        });

	return results;
}

Now we’re getting somewhere. We get CashMovements and aggregate them by Country, Netting the ValueIn against the ValueOut. But what are CashMovements? Where do they come from?

private static IEnumerable<CashMovement> GetCashMovements()
{
    var transactions = GetTransactions();

    var movements = transactions
        .GroupBy(i=>new { i.Country, i.Direction })
        .Select (m=>new CashMovement
        {
            Country = m.Key.Country,
            ValueIn = m.Where(x => x.Direction == "Export").Sum(c => c.Price * c.Quantity),
            ValueOut = m.Where(x => x.Direction == "Import").Sum(c => c.Price * c.Quantity)
        });
        
    return movements;
}

CashMovements are an aggregation of Transactions, which have a Direction (Import/Export), a Price and a Quantity.

We could go further and see where the transactions come from, but let’s assume they are pulled from a DB.

Problems
Oh code! How do I hate thee? Let me count the ways.

  • We had to burrow down three levels deep just to see where the source data came from.
  • The transformation of the data was spread over those functions we drilled into.
  • We encountered the steps of the transformation in reverse order as we drilled down.

Does any of this matter?

Yes. I think it does.

First let’s address one concern you might have. Yes, the aggregation that’s going on here is relatively simple and could probably be done in one function. That’s not really the point. In practice these kinds of transformations, aggregations and enrichment are far more complicated and this drilling down deeper and deeper into function calls is not uncommon.

Having to burrow down into a function should mean we are going into more detail, a lower level of abstraction. In this case that’s not really what’s happening, we are simply drilling down into the NEXT step of the overall algorithm. All of these steps are essentially at the same level of abstraction.

We lose any sense of the overall transformation by stringing together function calls like this. Worse, as mentioned above, the algorithm is actually upside down. We drill down 3 levels to get the source data and then transform it on the way back up the call stack. We could hardly find a less intuitive way of describing the algorithm if we tried.

We’ve also coupled each function to the next. Each of these functions does something useful to data, but we don’t provide the data; the functions pull it out of the next function, which means they need to be aware of the next step in the chain. This is a fundamentally flawed way of thinking about functions, it makes life needlessly complicated. The code is harder to test, harder to understand, harder to maintain.

So, why is this kind of code so common?

I believe it may be a problem in the way we teach programming. We tell students to decompose problems from the top down, pushing complexity down into lower level functions. The result is code like that described above.

There are two kinds of complexity and I don’t think students of programming are adequately taught to understand and handle the different kinds.

The first kind of complexity is domain complexity, the problem at hand. The seemingly arbitrary business rules and edge cases that seem to keep popping up. This kind of complexity is vitally important. This is the stuff your client will ask you to explain, ask you to change, ask you to validate. This sort of complexity shouldn’t be pushed down, it should be pulled out, highlighted.

The transformation of transactions into Trade Balances is a Use Case of our system. The logic needs to be easily accessible and future developers need to be able to grasp what’s happening quickly and change it with confidence.

The other kind of complexity is implementation details. You need to go through a Web Service to get the transactions? That’s an implementation issue. You pull the data from a Database and map it to structs or classes? That’s an implementation issue. These sorts of complexity should be pushed down, hidden, abstracted away so that the business logic stands out.

In the code above we shouldn’t be pushing logic down to lower and lower functions. Spreading the algorithm over the call stack by “pushing down” is nuts. We should be pulling the algorithm up, embracing it, making it jump off the screen when someone opens our code.

This isn’t a rant about functional programming, it doesn’t matter whether you use functional, OO or procedural code the principle here is the same.

The following modified code isn’t astonishing or revolutionary or even beautiful, but it’s better, a little better, and if we could all just get a little better life would be so much easier, legacy code would be a much smaller problem. The changes aren’t even difficult. This is just a question of internalising a very simple idea and you’ll never write code like the code above again.

static void Main(string[] args)
{
    var transactions = GetTransactions();
    var cashMovements = GetCashMovements(transactions);
    var tradeBalances = GetTradeBalances(cashMovements);
    PrintTradeBalances(tradeBalances);
    Console.ReadKey();
}

We get Transactions, turn them into CashMovements, turn those into TradeBalances, and print them. We can now drill down meaningfully into any of those steps to see how each is done. That is a valid use of drilling down, we are going to a lower level of abstraction.

The algorithm also reads the right way around. We start with Transactions and end with TradeBalances.
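
As an aside, this is exactly the shape a functional pipeline gives you for free. In F# (purely a sketch, with hypothetical function names) the whole algorithm would read top to bottom as:

// Each step takes the previous step's output and passes its result on down the pipe.
getTransactions ()
|> getCashMovements
|> getTradeBalances
|> printTradeBalances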

The only change to the functions is that instead of each calling the next function in the chain, each is passed the data that it operates on as a parameter.

private static IEnumerable<CashMovement> GetCashMovements(IEnumerable<Transaction> transactions)
{
    var movements = transactions
        .GroupBy(i => new { i.Country, i.Direction })
        .Select(m => new CashMovement
        {
            Country = m.Key.Country,
            ValueIn = m.Where(x => x.Direction == "Export").Sum(c => c.Price * c.Quantity),
            ValueOut = m.Where(x => x.Direction == "Import").Sum(c => c.Price * c.Quantity)
        });
        
        return movements;
}

private static IEnumerable<TradeBalance> GetTradeBalances(IEnumerable<CashMovement> cashMovements)
{
    var tradeBalances = cashMovements
        .GroupBy(i => i.Country)
        .Select(r => new TradeBalance
        {
            Country = r.Key,
            Value = r.Sum(c => c.ValueIn - c.ValueOut)  
        });

        return tradeBalances;
}

private static void PrintTradeBalances(IEnumerable<TradeBalance> tradeBalances)
{
    foreach (var tradeBalance in tradeBalances)
        Console.Write("Country {0}, Value {1}\n", tradeBalance.Country, tradeBalance.Value);
}