Picks for October, 2016

The title of this post is probably a little misleading, given that a third of November has gone by already, but here goes…

The Ultimate List of Developer Podcasts

The irrepressible John Sonmez has a rather comprehensive list of Developer Podcasts. He first created this list back in 2014, but he has refreshed it recently so it is bang up to date. If you’re a developer looking for something interesting to listen to on your daily commute then you’ll almost certainly find it here.

PythonBytes

Sticking with podcasts, this one was probably too late to make John’s list. Michael Kennedy, the genial host of the popular Talk Python To Me podcast, has teamed up with Brian Okken, host of the Python Testing podcast, to bring us a new podcast, Python Bytes, billed as “Python headlines delivered directly to your earbuds.” At the time of writing there has only been one episode, but it’s a promising start.

The X Macro

While I was researching techniques for implementing a CHIP-8 emulator, I stumbled across some posts about an idea known as the X Macro, a coding technique that leans heavily on the C pre-processor to simplify the generation of things like tables.

Here is a two-part article on the topic by Andrew Lucas, and another article by Walter Bright.

“If you stranded me on a desert island with a computer that only had a Python interpreter on it, then I’d use it to write a C compiler, and if it only had a C compiler then I’d write a Python interpreter.”

C is an old language. And, because I’ve been using it for many years, I thought I’d seen most techniques. But the X Macro was new to me, even though the idea pre-dates C.
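
To give a flavour of the technique, here is a minimal sketch rather than anything taken from the articles above. The opcode names and the OPCODE_LIST macro are made up for illustration. The list is written once, then expanded twice with different definitions of X: once to produce an enum and once to produce a matching table of names, so the two can never drift out of step.

```cpp
#include <cassert>
#include <string>

// Hypothetical instruction list: each entry is X(enum name, mnemonic).
#define OPCODE_LIST       \
    X(OP_LOAD,  "load")   \
    X(OP_STORE, "store")  \
    X(OP_JUMP,  "jump")

// First expansion: turn the list into an enum.
enum Opcode
{
#define X(name, text) name,
    OPCODE_LIST
#undef X
    OP_COUNT  // Number of opcodes in the list.
};

// Second expansion: turn the same list into a table of names.
static const char* const opcode_names[] = {
#define X(name, text) text,
    OPCODE_LIST
#undef X
};

// Look up the mnemonic for an opcode.
std::string opcode_name(Opcode op)
{
    assert(op < OP_COUNT);
    return opcode_names[op];
}
```

Adding a new opcode is then a one-line change to OPCODE_LIST, and both the enum and the table pick it up.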

How it feels to learn JavaScript in 2016

This piece describes a hypothetical programmer who has just picked up a web project and asks a web developer friend for their advice. Is it humour? Reality? Both?

This is simultaneously fascinating and worrying. How many security flaws are introduced by this apparent rush for the latest silver bullets? How many people choose wrong when betting on front-end technology? Or is it that people are always doomed to make that bet over and over because the guy who introduced the last framework and shiny new toolchain, which almost does everything but not quite, has moved on, leaving everyone else to pick up the pieces?

The CATCH Unit Test Framework for C++

CATCH, or C++ Automated Test Cases in Headers, is a header-based test automation framework for C++. It’s exceedingly simple to use and doesn’t require much ceremony.

Visual C++ for Linux Development

From the point of view of someone who works on a cross-platform product, this Visual Studio plugin from Microsoft looks interesting, as it adds the ability to write and compile Linux code from Visual Studio. You can find more information about it here.

I spend a lot of time in Visual Studio, but when I’m working on Linux then I usually reach for vim. They both have their uses, but I’m intrigued by the convenience of being able to do more from Visual Studio. Of course, by now every Emacs user (both of them!) in the world is glowering at me because they do all of their work without leaving the editor.

 

From Duskers to CHIP-8

Duskers

Duskers is a sci-fi roguelike game by Misfits Attic. In it, the player commands small squads of robots, sending them into abandoned spacecraft to salvage them and to figure out “how the universe became a giant graveyard”. It’s an excellent game with that low-tech vibe that you get from movies such as Aliens.

In Duskers the robots (or drones as they’re called in the game) are controlled by typing simple commands, such as telling them which room to go to, or to gather scrap. These commands can be chained together, so that a drone can be instructed to perform a sequence of actions, but they don’t quite go as far as to make the drones programmable.

Now, that got me thinking. “What if you could program the drones in Duskers? What would that game be like? What language would it use?” With its low-tech feel, I thought that the drones would probably use simple 8-bit or 16-bit processors, but they might be programmed in a slightly higher level language, such as Forth, so I set about writing a Forth-like.

That was an interesting exercise in its own right, and I found many excellent resources online that showed how to write a Forth interpreter. But soon after starting coding, I realised that the game I had in mind could have large numbers of drones whose code could be written by different people (anyone remember Robocode?) – so I started looking into how that could be implemented.

Implementing a VM

First approach: Indirect calls

In my first attempt at implementing a Forth virtual machine, its main loop looked something like this:

for (;;)
{
    (*ip++)();
}

This is all very well for a single drone, but it doesn’t cater for multiple drones, each with its own VM. In theory, each VM could be given its own thread, but this would not always be fair. Ideally each VM would be allowed to run for a fixed number of instructions before yielding control, ensuring that no drone could gain an unfair advantage. A possible implementation is shown below:

for (;;)
{
    for (auto i = 0; i < n; i++)
    {
        (*ip++)();
    }
    // Yield here.
}

Here, each VM runs a fixed number of instructions before yielding control. But there’s just one problem. What does that implementation of yield look like? On Windows, Fibers are a distinct possibility as they are manually scheduled, but alas, they’re Windows-only.

Fortunately there’s a cross-platform solution that takes its inspiration from video games. Almost all games have a game loop of some variety, and during each iteration of that game loop many game entities will perform actions. These game entities don’t run as separate threads – they’re just told to update themselves by the game loop, which is effectively a scheduler.

Taking that approach, the main loop of a simple scheduler becomes:

for (;;)
{
    for (auto vm : vms)
    {
        // TODO: Check to see if there's any data for blocked VMs.
        if (!vm->is_blocked)
        {
            vm->step(n_steps);
            if (vm->is_blocked)
            {
                // TODO: This VM has become blocked. Tell the scheduler to
                // wake us when it has data.
            }
        }
    }
}

Here the scheduler calls each VM in turn and tells it to run a fixed number of instructions. And it fits the bill. It’s definitely fair, as each VM gets to run the same number of instructions, and it’s quick too, because ultimately vm->step() is little more than this:

void VM::step(unsigned n)
{
    for (unsigned i = 0; i < n; i++)
    {
        (*ip++)();
    }
}
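
To see the fairness claim in action, the scheduler loop and step() can be combined into a small self-contained sketch. Everything in it is hypothetical: the VM struct, the nop and block instructions and the run_round function are invented for illustration, and it steps through a vector of known functions rather than advancing a raw ip pointer as the loop above does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A toy VM for illustration only; every name here is hypothetical.
struct VM
{
    using Instruction = void (*)(VM&);

    std::vector<Instruction> program;  // The VM's code (must not be empty).
    std::size_t pc = 0;                // Index of the next instruction.
    unsigned long executed = 0;        // Instructions run so far.
    bool is_blocked = false;

    // Run up to n instructions, stopping early if the VM blocks.
    // The program counter wraps around at the end of the program.
    void step(unsigned n)
    {
        assert(!program.empty());
        for (unsigned i = 0; i < n && !is_blocked; i++)
        {
            Instruction instruction = program[pc];
            pc = (pc + 1) % program.size();
            instruction(*this);
            executed++;
        }
    }
};

// Two toy instructions: one does nothing, one blocks the VM.
void nop(VM&) {}
void block(VM& vm) { vm.is_blocked = true; }

// One pass of the scheduler: every runnable VM gets the same budget.
void run_round(std::vector<VM>& vms, unsigned n_steps)
{
    for (auto& vm : vms)
    {
        if (!vm.is_blocked)
        {
            vm.step(n_steps);
        }
    }
}
```

Calling run_round() repeatedly gives every unblocked VM the same number of instructions per pass, and a VM that blocks simply drops out until something clears its is_blocked flag.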

But therein lies another problem. Just precisely what is being called by that invocation of (*ip++)()? Forth is a low level language so it would be possible for it to write to memory, then for that memory to be treated as an address to be called. In other words, it could call anything inside the host program’s address space.

Second approach: Decoding opcodes

Clearly the first approach is a non-starter. If a VM can access memory outside of its own address space then this will cause problems for anyone who made a mistake when programming their drone, because they might inadvertently call the wrong address, most likely crashing the host program. Additionally, there would be nothing to stop a drone programmer from deliberate acts of vandalism, crafting their drone’s program in such a way that it affects other programs. And finally, there is actually nothing to stop the program from having access to the entire address space of the host program.

In short, it might be fast, but it is definitely not secure.

A solution to these security flaws is to replace the address with an opcode, then to decode that opcode to determine which function to call. This immediately has the advantage that it doesn’t allow for calls to arbitrary locations, making it far more secure. There is no mechanism for a given VM to affect anything outside of its own address space as it is now sandboxed.

void VM::step(unsigned n)
{
    for (unsigned i = 0; i < n; i++)
    {
        switch (decode_opcode(*ip++))
        {
        case LOAD:
            // ...
            break;
        case STORE:
            // ...
            break;
        default:
            // Illegal opcode.
            break;
        }
    }
}

However, the downside is that although it is more secure, it is also a lot slower. In the previous implementation, all the CPU had to do was fetch the address of a function from memory then call it – a simple indirect call. Now, it has to fetch the opcode and decode it before calling the function that actually performs the work. That need to decode the opcode adds to the overhead of each VM instruction. In simple tests, the VM implemented with this technique ran at about a third of the speed of the one implemented with the original approach.

Third approach: Opcodes are indexes

There is a better option that can approach the speed of the first approach without its security flaws, and that is to use opcodes as indexes. With this approach, the indirect calls of the first approach are replaced with an index into a table of function addresses. It’s a little slower than the first approach, but it’s definitely workable because the overhead of indexing a table before calling a function is not huge, and there is no way for the VM program to access memory that it shouldn’t.

In its most simple form, minus any handling of illegal instructions, it looks something like this:

void VM::step(unsigned n)
{
    for (unsigned i = 0; i < n; i++)
    {
        auto instruction = table[*ip++];
        instruction();
    }
}
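
One way to put the handling of illegal instructions back is to make the table cover every possible opcode value, so that any byte the VM program supplies maps to a valid function. The following is only a sketch under that assumption: the one-byte opcodes, the handler names and the faulted flag are all made up for illustration and are not taken from my emulator.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

struct VM;
using Handler = void (*)(VM&);

struct VM
{
    std::vector<std::uint8_t> memory;  // VM-visible memory holding one-byte opcodes.
    std::size_t pc = 0;                // Index of the next opcode.
    int accumulator = 0;
    bool halted = false;
    bool faulted = false;              // Set when an illegal opcode is executed.

    void step(unsigned n);
};

// Hypothetical instruction handlers.
void op_halt(VM& vm) { vm.halted = true; }
void op_inc(VM& vm) { vm.accumulator++; }
void op_dec(VM& vm) { vm.accumulator--; }
void op_illegal(VM& vm) { vm.halted = true; vm.faulted = true; }

// Build a 256-entry table so that every possible byte has a handler.
std::array<Handler, 256> make_table()
{
    std::array<Handler, 256> table;
    table.fill(op_illegal);
    table[0x00] = op_halt;
    table[0x01] = op_inc;
    table[0x02] = op_dec;
    return table;
}

const std::array<Handler, 256> table = make_table();

void VM::step(unsigned n)
{
    for (unsigned i = 0; i < n && !halted; i++)
    {
        if (pc >= memory.size())
        {
            halted = true;  // Ran off the end of memory.
            break;
        }
        // The opcode is only ever used as an index into the table.
        Handler instruction = table[memory[pc++]];
        instruction(*this);
    }
}
```

Because an opcode is a single byte and the table has 256 entries, there is no value that the VM program can write that indexes outside the table.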

Final approach: Compile to “shadow” memory

But this can be taken further, particularly now that memory is not the scarce resource that it once was. The idea behind the final approach is to allow the VM full access to its own memory, but when it writes to VM memory, to treat what is written as an opcode and translate it into a pointer to the function that implements the decoded opcode (or an illegal instruction if no such instruction was found) and write that pointer into a separate “shadow” memory that contains nothing but pointers to functions. Running a VM program becomes a case of stepping through shadow memory and invoking the functions via indirect calls.

In effect, this compiles the VM. All of the opcodes are implemented to work against the VM memory only, and can’t go outside of it, but the job of dispatching them goes back to the first approach. It’s secure, because what is written to the “shadow” memory is strictly controlled and is not accessible by the VM program, and it’s fast, because it dispatches by indirect calls. The obvious downside is that it takes more RAM, but there are techniques that could mitigate this, such as by “compiling” only executable pages.
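
Here is a rough sketch of how that could look. It is not the code from my emulator: the four opcodes, the 256-byte memory and all of the names are assumptions made to keep the example short. The important part is write(), which is the only route by which VM memory changes, and which keeps the shadow memory in step with it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

class VM;
using Handler = void (*)(VM&);

// Hypothetical instruction handlers, defined after the VM class.
void op_nop(VM&);
void op_inc(VM&);
void op_halt(VM&);
void op_poke(VM&);
void op_illegal(VM&);

// Translate an opcode byte into the function that implements it.
Handler decode(std::uint8_t opcode)
{
    switch (opcode)
    {
    case 0x00: return op_nop;
    case 0x01: return op_inc;
    case 0x02: return op_halt;
    case 0x03: return op_poke;
    default:   return op_illegal;
    }
}

class VM
{
public:
    // 256 cells, so a one-byte address can never be out of range.
    static constexpr std::size_t kSize = 256;

    VM()
    {
        memory_.fill(0);
        shadow_.fill(decode(0));
    }

    // Every write updates VM memory and "recompiles" the matching shadow cell.
    void write(std::uint8_t address, std::uint8_t value)
    {
        memory_[address] = value;
        shadow_[address] = decode(value);
    }

    std::uint8_t read(std::uint8_t address) const { return memory_[address]; }

    // Dispatch is a plain indirect call through shadow memory.
    void step(unsigned n)
    {
        for (unsigned i = 0; i < n && !halted; i++)
        {
            Handler instruction = shadow_[pc++];  // pc wraps around at 256.
            instruction(*this);
        }
    }

    std::uint8_t pc = 0;
    int accumulator = 0;
    bool halted = false;
    bool faulted = false;

private:
    std::array<std::uint8_t, kSize> memory_;  // What the VM program sees.
    std::array<Handler, kSize> shadow_;       // Function pointers only.
};

void op_nop(VM&) {}
void op_inc(VM& vm) { vm.accumulator++; }
void op_halt(VM& vm) { vm.halted = true; }
void op_illegal(VM& vm) { vm.halted = true; vm.faulted = true; }

// Self-modifying code: store the low byte of the accumulator in the next cell.
void op_poke(VM& vm) { vm.write(vm.pc, static_cast<std::uint8_t>(vm.accumulator)); }
```

The op_poke handler shows why the translation has to happen on every write. A program that increments the accumulator twice and then pokes ends up writing 0x02 into the next cell, and because write() also updates the shadow memory, the VM executes that cell as a halt on its very next step.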

A CHIP-8 Emulator

But what does this have to do with this post’s title, “From Duskers to CHIP-8”?

Well, to test the idea, I thought I’d implement a CHIP-8 emulator. I first heard of CHIP-8 when I read Mario Zechner’s posts on implementing CHIP-8 using Kotlin, and when I saw that the CHIP-8 CPU had a very small instruction set, I thought that it would make a good proof of concept test for this approach. It certainly didn’t hurt that there are lots of programs available for CHIP-8, so it would quickly become clear if it worked or not.

chip-8-cave

If you’re interested in knowing how it turned out, you can get my CHIP-8 emulator from GitHub. It’s written in C++ and uses a minimal bit of SDL to display the screen. It’s currently Windows only, but it should be fairly straightforward to port to Linux.

Interpreting for the Genie

When I was young, my dad bought a TRS-80 compatible computer called a Video Genie. It boasted 16k of RAM, 128×48 monochrome graphics and a 64×16 text display, all powered by a Z80 CPU speeding along at a little under 2MHz. In a way, the name of that computer is what inspired this piece, but ultimately it’s not what this piece is about.

The parable of the genie

Consider this. You’ve just been to the Middle East and bought yourself an old oil lamp that looks strangely valuable. You bring it back home and, after paying import duty, decide to give it a bit of a clean. Then, to your surprise, as you rub the lamp, a genie pops out and offers to grant you a wish – whatever you tell them, they’ll do it for you.

“What, anything?” you ask.

“Yes, anything,” the genie replies. “You’re the boss. Make a wish and I’ll grant it for you. I can do anything you tell me.”

You think for a bit, and after some consideration, you decide that you’d like to be immortal.

“Ok, genie. I’ve made up my mind. I wish to live forever.”

The genie looks at you. “That’s easy enough,” it replies, “but have you thought this through properly? Is that what you really want?”

You think it’s a little odd that the genie, who has previously assured you that they can do anything, seems to be stalling. Perhaps the genie can’t fulfil your wish after all and is trying to get you to change your mind. Perhaps they were boasting about their abilities. So you decide to call their bluff.

“Yes, genie. I want to live forever!”

“But….”

“No buts! That’s what I want. Make it happen!”

After some frantic muttering and waving of hands over what looks suspiciously like a laptop, the genie disappears in a puff of smoke. When the smoke clears, you see a small pill jar on the ground with a label that contains a single word in bold letters. “Immortality.”

With your heart beating strongly, you bend down and pick up the pill jar. Through the slightly tinted glass you see a large pill and a note with a lot of small print that you assume contains the usual warnings about side effects that no one ever bothers to read. You twist off the cap, pour yourself a glass of water and swallow the pill, throwing the jar and the note away.

As you swallow the pill, you feel a tingling inside and you’re certain that it has worked. You’re immortal!

At first, you feel great. You’re immortal. What could possibly go wrong? You live life to the full, enjoying yourself, knowing that it’s going to last forever. Then, about ten years later, you’re standing in front of the mirror and realize that you’re looking decidedly older and that you’re getting old just as quickly as all of your friends.

You conclude that the genie was a fraud, or a practical joke by one of your friends, just as you’d suspected all along.

Many, many years later, you’re celebrating your 100th birthday. You’ve had a good innings and you’re pleased to have reached such a milestone. “Well, I can’t complain,” you think to yourself. “That genie may well have been a fraud, but I’ve lived a long time.”

Then along comes 110. You’re feeling very frail and can’t really walk any more and your eyesight isn’t what it used to be.

Before you know it, you’re 120. Everything hurts. You have arthritis, you need a lot of help to get out of bed and you can barely see.

The years keep piling on, and before you know it, people are hailing the miracle of the oldest person in the world. You’re 150 years old. You feel rather alone because you’ve not only outlived your friends, but you’ve outlived most of their children too.

At some point, your nursing home is due to be demolished and you’re going to be moved to a new one. As the staff are going through your possessions to pack them for the move, one of them finds the old lamp and asks you about it. You ask if you can hold it.

When the lamp is in your hands, you rub it. Sure enough, the genie pops out and offers to grant you a wish. But then the genie blinks and rubs his eyes, seeming to recognise you.

“Oh…” he says. “Immortality, right? I nailed that one, didn’t I?” You’re not sure because of your failing eyesight and poor hearing, but you think that the genie looks and sounds rather pleased with itself.

“What!?!?” you splutter. “Immortality? You didn’t give me what I wanted! I wanted to live forever, to be young and vigorous – not like this!”

“I did try to warn you,” the genie replies. “I can do anything you tell me. You didn’t mention anything about staying young – you just specified immortality. That struck me as odd, and as you didn’t want to discuss it, I left you a note explaining exactly how the pill worked. Didn’t you see it? It was in the jar? You really should have read it.”

Conclusion

And that is programming in a nutshell. You, as a programmer, are in control of this ridiculously powerful genie that can perform millions or even billions of your commands every second. But it will only do exactly what you specify, so you need to be crystal clear about what you want. As a good programmer you know this, so you strive to be precise. But your customers and clients aren’t usually programmers. They’re from the real world, where words are malleable and have multiple meanings. Their “logic” and thought processes are meaningless to the genie.

Sometimes they’re meaningless to you too. After all, what do you know about the retail industry, for example? But if you don’t know about the industry in which you’re a programmer then you need to learn, because it’s your job to interpret for the genie.

Picks for June, 2016

I must apologise for the lack of blog posts this month. I recently switched jobs after 21 years at the same company, and I’ve been busy with my new role. However, here are some things that caught my eye in June.

Elements of Modern C++ Style

I’ve been making an effort to catch up on what I’ve missed in C++ as it has been a while since I’ve used it. On the whole I’ve been pleasantly surprised by the changes in the language. This post by Herb Sutter dates from about five years ago, but it covers many of the features that were introduced into the language in C++11.

Welcome back to C++

Similarly, MSDN has an article Welcome Back to C++ (Modern C++) that goes into more detail, comparing and contrasting code snippets from C++98 with modern C++.

Stop Saying Learning to Code is Easy

Scott Hanselman gives his opinion on the idea that learning to program is somehow easy. He concludes that it isn’t easy, but “It’s rewarding. It’s empowering. It’s worthwhile.” Yes, a thousand times over. It’s all of those things.

As to whether it is easy or not, the answer is probably as much to do with what “it” actually means. For some values of “it”, learning to program is easy, at least if you think that learning to write is easy. It takes time to learn to write, but it is something that most 5 year olds can pick up fairly quickly. However, hardly any 5 year olds become authors when they grow up.

Sometimes “it” is less to do with technical skill and more to do with domain knowledge and the ability to understand what the customer wants, even when they often don’t really know themselves. Having spent more than a few years doing it, I can tell you that a lot of enterprise software is like that. But ultimately it’s about creating a product that solves a problem for the business that can’t be solved off the shelf.

For a large number of applications, programming is the thing that helps you arrive at an outcome rather than being an outcome in itself. There are physicists, biologists and statisticians who write code every day, not to produce code as the end result, but to help them run huge experiments. Again, the parallel is with writing. These aren’t people who program as their profession. They’re people for whom programming is a means to an end. They aren’t programmers, but they can program, much as you’re probably not an author but you almost certainly send emails and write the odd report.

There are finance teams throughout the world who live and breathe Excel, often relying on fragile macros and massively interlinked spreadsheets, because to them that is “the system”. Again, for them, programming is a means to an end so they use Excel to work on problems that their IT department can’t or won’t fix for them.

My own view is that learning to program is easy, but only if you think that learning to drive is easy, that learning to read and write is easy, that learning calculus is easy, or that learning a foreign language is easy. These are all skills that require practice and dedication before you can do them competently, but in each case being able to use the skill has a big pay off. Programming falls into this category too – it does take practice and dedication to reach a certain level of competence, but for those who are willing to put in the work it has a big pay off.

Monster 6502

What can I say about this project, except “wow!” It doesn’t have much practical value, but I applaud the kind of thinking that says, “I know, let’s make a working 6502 processor from discrete components.”

Duskers

I’ve been playing rather a lot of Duskers. It’s a roguelike, set in space, in which you are seemingly the only survivor of some event that seems to have wiped out everything except you and 3 drones. It has a lo-fi feeling to it that is reminiscent of films such as Alien and Silent Running. Let’s just say that I was given a generous Steam voucher as a leaving gift from my last job and I haven’t spent a penny of it yet because I keep coming back to this game.

I’m no reviewer, but if you want to know more then Polygon’s review puts it into words far better than I ever could.

Picks for May, 2016

Here are some things that caught my eye this month.

It’s NAND gates all the way down

Coursera is running Build a Modern Computer from First Principles: From Nand To Tetris starting June 6th. This is a project-centric course in which you build a (simulated) computer from the ground up starting with nothing more than NAND gates. My wife and I took this course last year and enjoyed it very much. When I was young I used to play around with 7400 series TTL chips, building little devices out of logic gates, and dreaming of the day when I’d design and build a CPU. Roll the clock forward a few decades, and now it’s possible to do exactly that for free. So, if you ever wanted to understand the basic principles underlying a computer, and hopefully develop a bit of mechanical sympathy for the hardware, then I can thoroughly recommend this course. If you want to understand those principles but don’t have time to take a course then read Code: The Hidden Language of Computer Hardware and Software by Charles Petzold.

It’s lambdas all the way down

In the late 1980s, I took a course on implementing functional programming languages. It was based around a book, The Implementation of Functional Programming Languages, by Simon Peyton Jones. I didn’t particularly care for the course, mostly because I had a very limited understanding of functional programming, so a large part of the book and the course was lost on me. However, this month I spotted that Microsoft Research had made the book freely available, so I decided to take another look at my old nemesis.

Much to my surprise, it wasn’t all impenetrable. Yes, it talked a lot about lambda calculus, and related concepts, but now that functional programming is far more common than it once was, there is a lot to be found that many people will recognise. For example, the book contains a chapter on list comprehensions, which have found their way into languages such as Python.

However, you won’t find Monads in the book as they didn’t come into being until 1991, which was four years after the book was published.

p5.js

To quote from its website, “p5.js is a JavaScript library that starts with the original goal of Processing, to make coding accessible for artists, designers, educators, and beginners, and reinterprets this for today’s web.” I must admit that I don’t know much about it yet as I only stumbled across it yesterday, but I was particularly taken by the fact that the landing page for one of its libraries, p5.play, has a playable Asteroids clone running in the background.

It looks like it would be a great starting point for a beginner as it has plenty of good examples and guides. I very much admire anything that introduces beginners to programming, because I don’t think it is particularly easy to get started. When I learned to program, getting started was relatively straightforward. You’d switch on your computer, wait a few seconds for it to boot, then you’d see a “Ready>” prompt. Then you’d immediately start typing in a small program, making sure that it fit in a whole 16k of RAM, and, if you were persistent then you’d get the results that you wanted. But the reason that it was so easy to get started was that nearly all computers of the time came with BASIC built in. This wasn’t to everyone’s taste, of course. The great computer science pioneer Edsger Dijkstra considered that “the teaching of BASIC should be rated as a criminal offence: it mutilates the mind beyond recovery.”

Podcasts

Software Engineering Radio: The LMAX architecture

Last month the Software Engineering Radio podcast interviewed Mike Barker about the LMAX architecture. I enjoyed listening to this, as they talked at length about the problems that the architecture was trying to solve and the various approaches that they’d tried before settling on what became the LMAX Disruptor library.

Talk Python to Me: Python performance from the inside-out at Intel

Michael Kennedy’s Talk Python to Me podcast continues to go from strength to strength. I’ve been listening to this podcast on and off for over a year. Earlier this month the topic was Python performance from the inside-out at Intel, in which David Stewart from Intel talked about how they (Intel) get languages – in this case Python – to run more quickly on Intel CPUs.

In Other News…

After more than 20 years working for the same company, I’ve handed in my notice as I was made an offer that I couldn’t refuse. It all happened so quickly in the end – I interviewed on Thursday morning and had an offer by Thursday afternoon. I start work with my new employer at the end of June. I’m delighted to have this new, exciting opportunity and I’m really looking forward to taking the next steps in my career. Naturally my feelings are tinged with sadness at the thought of saying goodbye to my colleagues, some of whom I’ve known for longer than I’ve known my wife.

 

The HTTP Server API

Over the course of the next few posts we’re going to take a look at a specific API in Windows, namely the HTTP Server API, and we’ll explore how it can be used with a view to making it available to Python.

In this post we’ll introduce it by looking at the .NET HttpListener class, and we’ll write a simple C# program to demonstrate it. Then we’ll look at the underlying HTTP Server API, and write a simple C++ program with almost identical functionality, but implemented in terms of the HTTP Server API.

The .NET HttpListener class

First, let’s introduce the API at a point where you may have used it without knowing. If you’re a .NET programmer then you may be aware of System.Net.HttpListener. This class, which is used by OWIN self-host, allows multiple HTTP servers to bind to a single port.

Here is an example C# program that uses HttpListener. It reads a URL from the command line and uses HttpListener to listen for incoming HTTP requests whose prefix is that URL. It loops, reading requests and responding to them with a simple message, until it encounters a DELETE verb, at which point it terminates.

using System;
using System.Net;
using System.Text;

namespace ListenerExample
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create and start a listener.
            var listener = new HttpListener();
            var url = (args.Length > 0) ? args[0] : "http://localhost:9000/api/cs/";
            listener.Prefixes.Add(url);
            listener.Start();

            // Announce that it's running.
            Console.WriteLine("Listening. Please submit requests to: {0}", url);

            while (true)
            {
                // Wait for a request.
                var context = listener.GetContext();
                var request = context.Request;

                // Display some information about the request.
                Console.WriteLine("Full URL: {0}", request.Url.OriginalString);
                Console.WriteLine("    Path: {0}", request.Url.PathAndQuery);

                // Break from the loop if it's the poison pill (a DELETE request).
                if (request.HttpMethod == "DELETE")
                {
                    Console.WriteLine("Asked to stop.");
                    break;
                }

                // Send a response.
                var response = context.Response;
                string responseString = "Hello from C#";
                byte[] buffer = Encoding.UTF8.GetBytes(responseString);
                response.ContentLength64 = buffer.Length;
                response.ContentType = "text/html";
                var output = response.OutputStream;
                output.Write(buffer, 0, buffer.Length);
                output.Close();
                response.Close();
            }

            // Stop listening.
            listener.Stop();
        }
    }
}

Here it is listening for requests on http://localhost:9000/api/endpoint1/.

C:> ListenerExample http://localhost:9000/api/endpoint1/
Listening. Please submit requests to: http://localhost:9000/api/endpoint1/

If we send it a request with cURL then it responds, but only if the request’s URL starts with the URL that was supplied on the command line.

C:> curl http://localhost:9000/api/endpoint1/
Hello from C#

If we use a different prefix then we get a 404 error.

C:> curl http://localhost:9000/api/endpoint2/
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
<HTML><HEAD><TITLE>Not Found</TITLE>
<META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD>
<BODY><h2>Not Found</h2>
<hr><p>HTTP Error 404. The requested resource is not found.</p>
</BODY></HTML>

If we start another instance of the program listening on the same port but use a different URL prefix then it will work. For example, here it is listening to http://localhost:9000/api/endpoint2/.

C:> ListenerExample http://localhost:9000/api/endpoint2/
Listening. Please submit requests to: http://localhost:9000/api/endpoint2/

Now, because one instance of the program is listening on http://localhost:9000/api/endpoint1/ and the other is listening on http://localhost:9000/api/endpoint2/, we get a response from both endpoints.

C:> curl http://localhost:9000/api/endpoint1/
Hello from C#
C:> curl http://localhost:9000/api/endpoint2/
Hello from C#

The URL prefixes have to be unique, otherwise it will complain. For example, attempting to start the program listening on http://localhost:9000/api/endpoint1/ when another instance is already listening on the same endpoint will cause it to fail.

C:> ListenerExample http://localhost:9000/api/endpoint1/

Unhandled Exception: System.Net.HttpListenerException: Failed to listen on prefix 'http://localhost:9000/api/endpoint1/' because it conflicts with an existing registration on the machine.
 at System.Net.HttpListener.AddAllPrefixes()
 at System.Net.HttpListener.Start()
 at ListenerExample.Program.Main(String[] args) in C:\Users\Rod\OneDrive\Projects\HttpServer\ListenerExample\Program.cs:line 15

Similarly, if something is already listening on the port that doesn’t use the underlying API used by HttpListener, then it will fail. For example, let’s use Python to start an HTTP server listening on localhost:9000.

C:> python -m http.server --bind localhost 9000
Serving HTTP on 127.0.0.1 port 9000

Now that the port is in use, the program fails to start as it can’t bind to the port.

C:> ListenerExample http://localhost:9000/api/endpoint1/

Unhandled Exception: System.Net.HttpListenerException: The process cannot access the file because it is being used by another process
 at System.Net.HttpListener.AddAllPrefixes()
 at System.Net.HttpListener.Start()
 at ListenerExample.Program.Main(String[] args) in C:\Users\Rod\OneDrive\Projects\HttpServer\ListenerExample\Program.cs:line 15

This is a different error. Here, the Python HTTP server has bound to the port, so our program which uses HttpListener can’t bind to it.

If all goes well then the program will run forever, until it receives a DELETE request, at which point it releases its resources and stops. You can send a DELETE request with cURL as follows:

C:> curl -X DELETE http://localhost:9000/api/endpoint1/
curl: (56) Recv failure: Connection was reset
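
For comparison, here is a minimal sketch of the same "poison pill" idea using only the Python standard library. It is not the code from this post: the names PoisonPillHandler, send and run_demo are made up for illustration, it uses ordinary sockets rather than HttpListener, and it assumes we want to answer the DELETE request before stopping (so the client sees a 204 response rather than a connection reset).

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class PoisonPillHandler(BaseHTTPRequestHandler):
    """Answers GET requests and stops the server when it receives a DELETE."""

    def do_GET(self):
        body = b"Hello from Python"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_DELETE(self):
        # Assumption: reply first, then stop, so the client gets a clean response.
        self.send_response(204)
        self.end_headers()
        # shutdown() blocks until serve_forever() returns, so it has to be
        # called from a thread other than the one running the server loop.
        threading.Thread(target=self.server.shutdown).start()

    def log_message(self, format, *args):
        pass  # Keep the demo quiet.


def send(port, method, path="/api/endpoint1/"):
    """Send one request to the local server and return (status, body)."""
    connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        connection.request(method, path)
        response = connection.getresponse()
        return response.status, response.read().decode()
    finally:
        connection.close()


def run_demo():
    """Start the server, send a GET then a DELETE, and report what happened."""
    server = HTTPServer(("127.0.0.1", 0), PoisonPillHandler)  # Port 0: any free port.
    port = server.server_address[1]
    worker = threading.Thread(target=server.serve_forever)
    worker.start()
    get_result = send(port, "GET")
    delete_result = send(port, "DELETE")
    worker.join(timeout=5)  # serve_forever() returns once the DELETE is handled.
    server.server_close()
    return get_result, delete_result, worker.is_alive()


print(run_demo())
```

The server loop runs in a worker thread here only so that the same script can also act as the client.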

What’s going on?

We’ve established that programs that use HttpListener can co-exist on the same port as long as they all use HttpListener. Normally, this isn’t possible with sockets, so what is going on here?
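
To see the normal socket behaviour for comparison, here is a small sketch that tries to bind two plain TCP sockets to the same address and port. The function name try_double_bind is made up for this example, and it assumes the default socket options (no SO_REUSEADDR or similar).

```python
import socket


def try_double_bind(host="127.0.0.1"):
    """Bind two plain TCP sockets to the same port; return the second bind's error."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))          # Port 0 asks the OS for any free port.
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # Same address and port as the first socket.
        except OSError as error:
            return error
        return None
    finally:
        first.close()
        second.close()


error = try_double_bind()
print("Second bind failed:" if error else "Second bind succeeded", error)
```

The second bind should be refused with an "address already in use" error, which is why two ordinary socket servers can't share a port.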

Let’s add the -D flag to cURL to see the response headers.

C:> curl -D - http://localhost:9000/api/endpoint1/
HTTP/1.1 200 OK
Content-Length: 13
Content-Type: text/html
Server: Microsoft-HTTPAPI/2.0
Date: Sun, 22 May 2016 16:25:40 GMT

Hello from C#

The clue is in the “Server:” header which contains “Microsoft-HTTPAPI/2.0”.

Searching the .NET reference source for HttpListener reveals calls to functions such as HttpCreateRequestQueue() and HttpReceiveHttpRequest(). These functions are declared in UnsafeNativeMethods.cs, so clearly they’re not implemented in C#.

Further investigation reveals that they’re part of the Windows HTTP Server API.

The HTTP Server API

Several years ago, Microsoft added the HTTP Server API into Windows. It exists in XP and Server 2003, so it has been around for a long time. There are two versions of the API, and as HttpListener uses the second version, let’s do the same. The API itself runs in kernel mode, as shown in this diagram.

We already looked at a C# program that uses the HTTP Server API indirectly (via HttpListener), so let’s see what the C++ equivalent would look like.

#ifndef UNICODE
#define UNICODE
#endif

#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0600
#endif

#ifndef WIN32_LEAN_AND_MEAN
#define WIN32_LEAN_AND_MEAN
#endif

#include <windows.h>
#include <http.h>
#include <stdio.h>
#include <stdlib.h>

#pragma comment(lib, "httpapi.lib")

int wmain(int argc, wchar_t **argv)
{
	// Initialize the API.
	ULONG result = 0;
	HTTPAPI_VERSION version = HTTPAPI_VERSION_2;
	result = HttpInitialize(version, HTTP_INITIALIZE_SERVER, 0);

	// Create server session.
	HTTP_SERVER_SESSION_ID serverSessionId;
	result = HttpCreateServerSession(version, &serverSessionId, 0);

	// Create URL group.
	HTTP_URL_GROUP_ID groupId;
	result = HttpCreateUrlGroup(serverSessionId, &groupId, 0);

	// Create request queue.
	HANDLE requestQueueHandle;
	result = HttpCreateRequestQueue(version, NULL, NULL, 0, &requestQueueHandle);

	// Attach request queue to URL group.
	HTTP_BINDING_INFO info;
	info.Flags.Present = 1;
	info.RequestQueueHandle = requestQueueHandle;
	result = HttpSetUrlGroupProperty(groupId, HttpServerBindingProperty, &info, sizeof(info));

	// Add URLs to URL group.
	PCWSTR url = (argc == 2) ? argv[1] : L"http://localhost:9000/api/cpp/";
	result = HttpAddUrlToUrlGroup(groupId, url, 0, 0);

	// Announce that it is running.
	wprintf(L"Listening. Please submit requests to: %s\n", url);

	for (;;)
	{
		// Wait for a request.
		HTTP_REQUEST_ID requestId = 0;
		HTTP_SET_NULL_ID(&requestId);
		int bufferSize = 4096;
		int requestSize = sizeof(HTTP_REQUEST) + bufferSize;
		BYTE *buffer = new BYTE[requestSize];
		PHTTP_REQUEST pRequest = (PHTTP_REQUEST)buffer;
		RtlZeroMemory(buffer, requestSize);
		ULONG bytesReturned;
		result = HttpReceiveHttpRequest(
			requestQueueHandle,
			requestId,
			HTTP_RECEIVE_REQUEST_FLAG_COPY_BODY,
			pRequest,
			requestSize,
			&bytesReturned,
			NULL
		);

		// Display some information about the request.
		wprintf(L"Full URL: %ws\n", pRequest->CookedUrl.pFullUrl);
		wprintf(L"    Path: %ws\n", pRequest->CookedUrl.pAbsPath);

		// Break from the loop if it's the poison pill (a DELETE request).
		if (pRequest->Verb == HttpVerbDELETE)
		{
			wprintf(L"Asked to stop.\n");
			delete[] buffer;	// Free the request buffer before leaving the loop.
			break;
		}

		// Respond to the request.
		HTTP_RESPONSE response;
		RtlZeroMemory(&response, sizeof(response));
		response.StatusCode = 200;
		response.pReason = "OK";
		response.ReasonLength = (USHORT)strlen(response.pReason);

		// Add a header to the response.
		response.Headers.KnownHeaders[HttpHeaderContentType].pRawValue = "text/html";
		response.Headers.KnownHeaders[HttpHeaderContentType].RawValueLength = (USHORT)strlen(response.Headers.KnownHeaders[HttpHeaderContentType].pRawValue);

		// Add an entity chunk to the response.
		PSTR pEntityString = "Hello from C++";
		HTTP_DATA_CHUNK dataChunk;
		dataChunk.DataChunkType = HttpDataChunkFromMemory;
		dataChunk.FromMemory.pBuffer = pEntityString;
		dataChunk.FromMemory.BufferLength = (ULONG)strlen(pEntityString);
		response.EntityChunkCount = 1;
		response.pEntityChunks = &dataChunk;

		result = HttpSendHttpResponse(
			requestQueueHandle,
			pRequest->RequestId,
			0,
			&response,
			NULL,
			NULL,	// &bytesSent (optional)
			NULL,
			0,
			NULL,
			NULL
		);

		delete[] buffer;	// Allocated with new[], so it must be released with delete[].
	}

	// Remove URLs from URL group.
	result = HttpRemoveUrlFromUrlGroup(groupId, url, 0);

	// Detach the request queue from the URL group.
	info.Flags.Present = 0;
	info.RequestQueueHandle = NULL;
	result = HttpSetUrlGroupProperty(groupId, HttpServerBindingProperty, &info, sizeof(info));

	// Shut down the request queue.
	result = HttpShutdownRequestQueue(requestQueueHandle);

	// Close down the API.
	result = HttpTerminate(HTTP_INITIALIZE_SERVER, NULL);

	return 0;
}

If we run this, it behaves in much the same way as its C# equivalent. And when we invoke it with curl -D, we see that the “Server:” header is set to “Microsoft-HTTPAPI/2.0” as before.

C:> curl -sD - http://localhost:9000/api/endpoint1/
HTTP/1.1 200 OK
Content-Type: text/html
Server: Microsoft-HTTPAPI/2.0
Date: Sun, 22 May 2016 18:10:59 GMT
Content-Length: 14

Hello from C++

Similarly, if we run another instance, or run it alongside its C# equivalent, then they can all listen on the same port as long as their URL prefixes are unique.
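
One way to picture this sharing is as a table of URL prefixes, each owned by one listener, with incoming requests handed to whichever listener registered the best matching prefix. The sketch below is a toy model in Python, not the HTTP Server API itself: the class PrefixTable and its methods are hypothetical, and the "longest matching prefix wins" rule is an assumption about how routing behaves.

```python
class PrefixTable:
    """Toy model: maps URL prefixes to the name of the listener that owns them."""

    def __init__(self):
        self._owners = {}

    def register(self, prefix, owner):
        """Register a prefix; registering the same prefix twice is an error."""
        if prefix in self._owners:
            raise ValueError(
                "Prefix {!r} conflicts with an existing registration".format(prefix))
        self._owners[prefix] = owner

    def route(self, url):
        """Return the owner of the longest registered prefix of url, or None."""
        matches = [prefix for prefix in self._owners if url.startswith(prefix)]
        if not matches:
            return None  # Nothing registered for this URL (a 404 in the examples above).
        return self._owners[max(matches, key=len)]


table = PrefixTable()
table.register("http://localhost:9000/api/", "catch-all listener")
table.register("http://localhost:9000/api/endpoint1/", "C# listener")
table.register("http://localhost:9000/api/cpp/", "C++ listener")

print(table.route("http://localhost:9000/api/endpoint1/"))
print(table.route("http://localhost:9000/api/cpp/items"))
print(table.route("http://localhost:9000/other/"))
```

In this model the first two requests go to the C# and C++ listeners even though the catch-all prefix also matches, and the last one matches nothing.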

Summary

This has been a quick introduction to the .NET HttpListener and the underlying Windows HTTP Server API, presented in the form of two simple, synchronous servers that appear to have identical behaviour. In the next post in this series we’ll look at how we might go about using the HTTP Server API from Python.

API discovery with OWIN self-host

If you write APIs with ASP.NET Web API then you probably keep the API code in a separate project in Visual Studio. If this is the case, then your startup project is likely to be a different ASP.NET Web API project that, when run, is hosted by IIS or IIS Express. This works well enough, but the startup time is slow, which can make iterating on an idea rather frustrating.

Use OWIN self-host for a faster startup

OWIN self-host has a much faster startup, which makes it ideal for this scenario. The startup for an OWIN self-host project looks remarkably similar to its IIS-hosted equivalent, except that you start the server yourself.

Here’s some typical startup code. Nothing unusual here – it configures the routing and makes sure that Web API is in the pipeline.

    class Startup
    {
        public void Configuration(IAppBuilder app)
        {
            HttpConfiguration config = new HttpConfiguration();
            config.Routes.MapHttpRoute(
                name: "DefaultApi",
                routeTemplate: "api/{controller}/{id}",
                defaults: new { id = RouteParameter.Optional }
                );
            app.UseWebApi(config);
        }
    }

And here’s the program that starts the server. Again, nothing unusual. It starts a self-hosted server and runs it until you press ENTER.

    class Program
    {
        static void Main(string[] args)
        {
            string baseAddress = "http://localhost:9000/";

            using (WebApp.Start<Startup>(url: baseAddress))
            {
                Console.WriteLine("Server listening on {0}", baseAddress);
                Console.WriteLine("Press [ENTER] to end");
                Console.ReadLine();
            }
        }
    }

Problem: The API controllers can’t be found

However, if your API is in a separate project then you might see this message when you invoke it (here I’m invoking it with cURL):

C:\Users\Rod>curl --silent http://localhost:9000/api/fortunes
{"Message":"No HTTP resource was found that matches the request URI 'http://loca
lhost:9000/api/fortunes'.","MessageDetail":"No type was found that matches the c
ontroller named 'fortunes'."}

Your first inclination might be to think that you’ve forgotten to add a reference to your API project, but even with a reference in place you will still see this message.

The reason for the message is that your OWIN startup project doesn’t automatically discover and load the API code’s assembly, whereas the IIS-hosted equivalent does. So, when Web API tries to find your controller, as described in Routing and Action Selection in Web API, it will never find it because the controller’s assembly was never loaded.

Solution: Load the assembly

The solution is simple. Load the assembly. There are at least a couple of ways of doing this. One is to use an IAssembliesResolver, and the other is to ensure the assembly is loaded by having the startup project use it directly.

Using an IAssembliesResolver

You could solve the problem by overriding the default assemblies resolver, as detailed in Customizing controller discovery in ASP.NET Web API. This works very well. Here’s an example:

    class Startup
    {
        public void Configuration(IAppBuilder app)
        {
            HttpConfiguration config = new HttpConfiguration();
            config.Routes.MapHttpRoute(
                name: "DefaultApi",
                routeTemplate: "api/{controller}/{id}",
                defaults: new { id = RouteParameter.Optional }
                );
            config.Services.Replace(typeof(IAssembliesResolver), new AssembliesResolver());
            app.UseWebApi(config);
        }
    }

    class AssembliesResolver : DefaultAssembliesResolver
    {
        public override ICollection<Assembly> GetAssemblies()
        {
            ICollection<Assembly> assemblies = base.GetAssemblies();
            var apiAssembly = Assembly.LoadFrom(@"API.dll");
            assemblies.Add(apiAssembly);
            return assemblies;
        }
    }

Using a dependency

There’s an even simpler option, which is to have the startup project make use of something in the API project. For example, here’s an ApiInfo class:

    public static class ApiInfo
    {
        public static readonly string Help = "Use GET /api/fortunes to access this API";
    }

Incidentally, if you’re wondering why the string is static readonly rather than const then here’s why.

And here’s a modified version of the program that starts the server. Now it displays some help text from the ApiInfo class:

    class Program
    {
        static void Main(string[] args)
        {
            string baseAddress = "http://localhost:9000/";

            using (WebApp.Start<Startup>(url: baseAddress))
            {
                Console.WriteLine("Server listening on {0}", baseAddress);
                Console.WriteLine(ApiInfo.Help);
                Console.WriteLine("Press [ENTER] to end");
                Console.ReadLine();
            }
        }
    }

Now, when the self-hosted project starts, it displays information about the API, using information from the API project. And because it does this, the API assembly is guaranteed to be loaded, so the controllers will be discovered.

So here’s the output window. The server is running, and it has displayed the simple help text from the ApiInfo class.

Server listening on http://localhost:9000/
Use GET /api/fortunes to access this API
Press [ENTER] to end

So, does it work?

C:\Users\Rod>curl --silent http://localhost:9000/api/fortunes
"A language that doesn't affect the way you think about programming is not worth
 knowing."

True enough!

Code

The code for this post can be found on GitHub.

Python in Visual Studio – Part 2

In this post we’re going to look at some features of Python Tools for Visual Studio (PTVS) that weren’t covered in Python in Visual Studio.

Generating a project’s dependencies

If you are working on a project then it is likely that at some point it will depend on other packages downloaded from the Python Package Index (PyPI). If you want other people to use your project then they will also need to install those dependencies.

Fortunately the pip package installer (and therefore PTVS) can create a list of those dependencies and save it to a file, which by convention is named requirements.txt.

To create a requirements file in PTVS…

  • Right click on the Python environment in Solution Explorer.
  • Select Generate requirements.txt.

A new file requirements.txt will be added to your project. If you open it, you’ll see that it contains a list of package names and versions, as shown in this example.

astroid==1.4.5
colorama==0.3.7
httpie==0.9.3
lazy-object-proxy==1.2.2
Pygments==2.1.3
pylint==1.5.5
requests==2.10.0
six==1.10.0
tornado==4.3
wrapt==1.10.8

Here, the packages are pinned to specific versions with ‘==’, but this doesn’t have to be the case. In this example, tornado is pinned to version 4.3, but what if a new version came out, say, 4.3.1, or what if we knew that any version from 4.3 to 4.5 would work just fine?

To cover that scenario we can write this:

tornado>=4.3,<=4.5

For more information on the syntax of requirements files take a look at this documentation in the pip user guide.
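
To get a feel for which versions a range like this accepts, here is a deliberately simplified sketch. The helper names parse_version, pad and in_range are made up, and it only understands plain dotted numbers such as 4.3.1; pip’s real version matching handles far more cases (pre-releases, wildcards and so on).

```python
def parse_version(text):
    """Turn a simple dotted version such as '4.3.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def pad(version, length):
    """Pad with zeros so that 4.5 and 4.5.0 compare as equal."""
    return version + (0,) * (length - len(version))


def in_range(candidate, lowest, highest):
    """Return True if lowest <= candidate <= highest."""
    versions = [parse_version(v) for v in (candidate, lowest, highest)]
    width = max(len(v) for v in versions)
    candidate_v, lowest_v, highest_v = (pad(v, width) for v in versions)
    return lowest_v <= candidate_v <= highest_v


# Which of these would satisfy tornado>=4.3,<=4.5?
for version in ("4.2", "4.3", "4.3.1", "4.5", "4.5.1"):
    print(version, in_range(version, "4.3", "4.5"))
```

Note that 4.5.1 falls outside the range because it is greater than 4.5. If the intention is to accept every 4.5.x release then an upper bound of <4.6 expresses that more accurately.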

Installing a project’s dependencies

Similarly, a project’s dependencies can be installed from requirements.txt. You would typically do this if you’d checked a project out of version control, or downloaded it from somewhere like GitHub, because you almost certainly wouldn’t expect to find all of a project’s dependencies checked into version control.

To install a project’s dependencies…

  • Right click on the Python environment in Solution Explorer.
  • Select Install from requirements.txt.

Python Tools for Visual Studio will download and install all of the dependencies listed in requirements.txt, along with any of their dependencies.

Running and debugging individual programs

To run or debug an individual Python program in your project…

  • Find the file under the project in Solution Explorer.
  • Right click on it and select Start without Debugging or Start with Debugging.

Setting the startup file

When running a Python project in Visual Studio, it needs to know which file in the project should be executed.

To set a project’s startup file…

  • Find the file under the project in Solution Explorer.
  • Right click on it and select Set as startup file.

This is the file that will be run when you press F5 to debug your project, or Ctrl+F5 to run it without debugging.

Executing in Python interactive

In Python in Visual Studio, we saw how to run the Python interactive shell. Sometimes it is useful to run a program or module, then, once it has finished, use the interactive shell to explore its state. PTVS allows us to do this by running the project in the interactive shell.

To run the project in the Python interactive shell…

  • Go to Debug > Execute Project in Python Interactive.

execute_interactive

In the example above, I ran a simple project in the Python interactive shell. Once it finished, its functions, classes and modules remained active, so I was able to list them with dir() then run the say_hello() function again.

The Class View

Visual Studio has a Class View that lets us navigate quickly around the classes in the project. The class view displays a tree containing the projects in the solution, the files in each project, and the classes and functions in each file. In the example below, the class NamedTupleReader is selected in the top pane, and all of its methods and their parameters are shown in the bottom pane. Double-clicking on one of these methods will bring up that method in the editor.

class_view

Setting conditions in breakpoints

In Python in Visual Studio, I touched on the fact that it was possible to set breakpoints for debugging Python programs, but I didn’t go into further detail, such as the fact that PTVS supports conditional breakpoints.

To set a conditional breakpoint…

  • Left click in the margin to set a breakpoint
  • Right click on the breakpoint and select Conditions…
  • Use the dropdown to set the condition type, eg, Conditional Expression and enter the condition that will cause the breakpoint to trigger.
  • Continue to add conditions as required, then click Close once done.

In the following example, a conditional breakpoint is set to trigger only when the first letter of name is “D”, ie, when name[0] == 'D'.

breakpoint_condition
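
The condition is just a Python expression, so it can be tried out as ordinary code before typing it into the dialog. This is a small sketch with made-up names and sample data. How the debugger itself reacts when a condition raises an exception isn’t covered here.

```python
def would_break(name):
    """The breakpoint condition from the example, written as a function."""
    return name[0] == 'D'


def would_break_safely(name):
    """An alternative condition that also copes with an empty string."""
    return name.startswith('D')


names = ['Alice', 'Dave', 'Bob', 'Diana']
print([name for name in names if would_break(name)])

# name[0] raises IndexError for an empty string, whereas startswith() doesn't.
print(would_break_safely(''))
```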

Setting actions in breakpoints

Similarly, Python Tools for Visual Studio supports adding actions to breakpoints that can log a message to the output window. Actions don’t have to stop execution (which really makes them tracepoints rather than breakpoints). I have to say that I find them extremely useful when debugging, when I want to see the value of a variable without necessarily having to stop the program or insert print() calls.

To add an action to a breakpoint…

  • Left click in the margin to set a breakpoint
  • Right click on the breakpoint and select Actions…
  • Enter a message in the edit box, using braces, ie, { and }, to interpolate variables.

One thing I’ve noticed that’s different about breakpoint actions when debugging Python is that they don’t seem to support $FUNCTION to display the function name, unlike C# which does. I don’t know if this is a bug – but it always seems to display <unknown> when I do it.

In the following example, an action breakpoint is set to log the message “name is {name}” to the output window whenever execution crosses that line.

breakpoint_action

When you need the command line

We’ve seen that you can do much of your Python programming and debugging from the comfort of Visual Studio. But there are scenarios where you might want to drop down to the command line, for example, to use Python command line tools. If you were testing a web API, say, then you might want to install the httpie package into your virtual environment, as this lets you make HTTP requests from the command line. It’s similar to cURL, but simpler to use.

If you were to just open a command prompt directly and type python then you would get the globally installed version of Python, and httpie would most likely not be on the path. How, then, do you get access to a project’s virtual environment from the command line?

To open a command prompt in a project’s virtual environment…

  • Right click on the Python environment in Solution Explorer.
  • Select Open Command Prompt Here…

This will open a command prompt in the base directory of the environment. The command window’s title will reflect the name of the environment.

If you were to type python or use any command line tools such as pip or httpie in this command prompt then you would be using the versions from the virtual environment, not from the globally installed Python. In the example below, the output from pip freeze shows the packages that are installed in the environment.

command_prompt
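
If you’re ever unsure which Python a command prompt is using, you can ask the interpreter itself. The following is a small sketch. The function name describe_environment is made up, and the check assumes an environment created by venv or virtualenv.

```python
import sys


def describe_environment():
    """Report which interpreter is running and whether it is a virtual environment."""
    # In a venv, sys.prefix points at the environment while sys.base_prefix
    # points at the Python installation it was created from. Older versions
    # of virtualenv set sys.real_prefix instead.
    virtual = hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "virtual": virtual,
    }


for key, value in describe_environment().items():
    print("{:>10}: {}".format(key, value))
```

Run from the environment’s command prompt, the executable and prefix should point into the project’s virtual environment rather than the global installation.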

Finally

I wrote these posts to give Visual Studio users an overview of how to perform everyday Python tasks from within Visual Studio. I think PTVS turns Visual Studio into an excellent Python IDE, and I hope that its support for Python will encourage you to give Python a try.

 

 

Python in Visual Studio

So, you’re a long time user of Visual Studio who has heard all about Python from that annoyingly productive guy in the office, and you want to give it a try but you don’t know where to start. Well, help is at hand, because you can do all of your Python development, debugging and package management without ever leaving the comfort of Visual Studio, and that’s what I’ll guide you through in this post.

Installing Python

Installing Python is very straightforward as it has a Windows installer.

  • Download the latest version of Python and install it, making sure to check the option to add it to your PATH.

At the time of writing, the latest version is Python 3.5.1. If you know that you need a different version then by all means go ahead, but in general I’d advise you to go for the latest version of Python 3 unless you have a good reason not to.

Once the installer finishes, Python should be on your PATH.

  • Open a command prompt and type python to check that it is installed and on your PATH.

You should see a prompt similar to this:

C:\Users\Rod>python
Python 3.5.1 (v3.5.1:37a07cee5969, Dec 6 2015, 01:38:48) [MSC v.1900 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> ^Z

  • Press Ctrl-Z followed by Enter to exit.

Installing Python Tools for Visual Studio

Python Tools for Visual Studio (PTVS) turns Visual Studio into a highly capable Python IDE, with all of the features that you’d expect to find such as IntelliSense.

Creating a Python project in Visual Studio

To create a Python project in Visual Studio…

  • Open Visual Studio and go to File > New > Project…
  • Look for the Python tag under Templates and pick Python Application for the project type as we’re just creating a simple “hello, world” application.
  • Give the project a name, e.g., “hello_python”.
create_python_project
Creating a new Python project

Creating, running and debugging a Python program

Now you’re in a position to start writing some Python. Visual Studio has already added an empty Python file to the project, so open it and start editing.

If you haven’t used Python before then enter this…

def say_hello(name='world'):
    print('Hello, {}!'.format(name))

if __name__ == '__main__':
    name = input('What is your name? ')
    say_hello(name)

As you type, IntelliSense should kick in and start suggesting completions for you. If not, it is almost certainly because IntelliSense hasn’t finished building its completion database for this Python environment. You can check on its status by opening the Python Environments window (Tools > Python Tools > Python Environments) and selecting IntelliSense from the dropdown.

To run the program, press Ctrl+F5 as you would with any other project. It will run and display its output in a console window.

PTVS has excellent debugger integration, so if you want to debug your program then set a breakpoint, just as you would with a C# application, then run the program with F5.

Installing packages – part 1

In much the same way that Visual Studio has NuGet, Python has a package manager called pip. This is typically run from the command line, but Visual Studio wouldn’t be Visual Studio without allowing us to do everything from the IDE.

  • Go to Solution Explorer and open up Python Environments, which is immediately under your project.

This will display a list of the packages that are installed in your Python environment in addition to the built-in packages that come with every Python distribution. On the PC that I’m using now, it shows the four packages that are installed in my global Python 3.5 environment, along with their versions.

global_python_solution_explorer
Packages in the global Python environment

If you wanted to install a new package at this point then you would right click on the environment name Python 3.5 (global default) then select Install Python Package… from the menu.

However, take a close look at the name of the environment. It contains the word global, alerting us to the fact that installing a package into this environment would in fact install the package into the global Python installation. In other words, we wouldn’t be installing a package just for our project or solution – we’d be installing it for all projects that use this version of Python.

Occasionally this might be exactly what you want or need, but it’s more likely that different projects will need different sets of packages. You could install everything into your global Python environment, but at some point you will encounter a dependency conflict.

And that’s where Python has another trick up its sleeve in the form of virtual environments.

Creating a virtual environment

In a sentence, a virtual environment is a copy of the Python executable and its packages in their own self-contained directory tree. This means that if we create a virtual environment then we can install packages into it without affecting the global Python installation.

In Solution Explorer, under your project’s name, you should see a section labelled Python Environments.

  • Right-click on Python Environments and select Add Virtual Environment…
  • Select the global Python interpreter that you want to base your new virtual environment on, select a location for it, then click Create.

By default, the new virtual environment will be installed into env in your project’s directory. This is sensible, as you will probably want to keep your environment with your project. There are also valid arguments for putting it elsewhere, but for our purposes keeping it with the project is fine.

create_venv2
Creating a virtual environment

The new virtual environment now appears under Python Environments in Solution Explorer. If you expand it then, as before, you’ll see a list of installed packages.

new_venv_in_solution_explorer2
Packages in the virtual environment

Now we have a virtual environment just for our project, so we can install packages into it without worrying if it is going to affect other environments.

For further information take a look at what the Python documentation has to say on the topic, or at the documentation on installing packages.
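
Outside Visual Studio, Python 3 can create a virtual environment with the venv module from its standard library. The sketch below creates one in a temporary directory and lists what ends up inside it. The function name create_environment is made up, and the post doesn’t say whether PTVS uses this module under the hood, so treat it as an illustration of the idea rather than of what the IDE does.

```python
import os
import tempfile
import venv


def create_environment(directory):
    """Create a virtual environment in directory and return its top-level contents."""
    # with_pip=False keeps the sketch quick; pass True to bootstrap pip as well.
    venv.create(directory, with_pip=False)
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as scratch:
    env_dir = os.path.join(scratch, "env")
    print(create_environment(env_dir))
```

The listing should include a pyvenv.cfg file, which records the Python installation that the environment was based on, plus the directories that hold the interpreter and the installed packages.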

Installing packages – part 2

Now that we have a virtual environment, we can install packages for our project without polluting the global Python installation.

To install a package…

  • Right click on the Python environment in Solution Explorer.
  • Select Install Python Package… from the menu.
  • Type the name of the package, eg, requests, into the text field then click OK.
install_package
Installing the ‘requests’ package from PyPI

Once you click OK then pip will download the package and its dependencies from the Python Package Index (PyPI) and install it into the virtual environment.

Here I’ve chosen to install requests, which bills itself as “Python HTTP for Humans”. Go ahead and install it as we’ll use it in the next section.

Once the package has installed then it will show up in the list of installed packages.

Running Python interactively

Python has an interactive shell, where you can type Python code and execute it immediately. PTVS makes this shell available in Visual Studio, and naturally enough it enhances it with the Visual Studio magic of IntelliSense.

To open an interactive window…

  • Right click on the Python environment in Solution Explorer.
  • Select Open Interactive Window

You can run Python commands in the interactive shell.

  • Type the following into the interactive shell…
import requests
result = requests.get('http://www.bbc.co.uk')
result.status_code
result.headers

As you type, you’ll see IntelliSense kicking in. It also has pretty good help – watch what happens if you hover the mouse cursor above the word get for example.

Footnote

This post is partially in response to my colleague, Nicholas, who on Friday as I was leaving the office said to me, “Rod, I’ve finally decided to download and learn Python.” Like me, he spends a large part of his day job writing C# in Visual Studio. I felt it was important to get him off on the right foot with package management and virtualenvs, but I also knew that he would get on far better with Python if I could show him how to do everything from an environment that he knew well, rather than selling him on the merits of an alternative such as PyCharm.

A NamedTupleReader for CSV files

Python has a csv module that can handle a wide variety of csv formats. Let’s explore this, then see if we can augment it a little.

First, let’s take a simple csv file. This one, eans.csv, relates bar codes (EANs) to their product codes and descriptions.

eans.csv
ean,product_code,description
5123412341234,FOOD123,"Tin o'Beans"
5123412341235,FOOD235,"Basic Soup"
5123412341236,FOOD236,"Basic Soup"
5123412341236,CARS345,"Big Wide Tyres"
5123412341237,CARS346,"Starter Motor"

We can read this file using csv.reader().

import csv

with open('eans.csv') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)

This gives us each row, including the header row, as a list. So, if we want to access an individual field then we need to use its list index, eg, row[0] would give us the EAN field, which isn’t particularly readable.

Using csv.DictReader() we can get each row as a dictionary whose fields are taken from the fieldnames in the first row. This lets us access individual fields with row[fieldname].

import csv

with open('eans.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print('         EAN {}'.format(row['ean']))
        print('Product Code {}'.format(row['product_code']))
        print(' Description {}'.format(row['description']))

But we can do better. It would be nicer to be able to type row.ean rather than row['ean']. And that’s precisely what we can do with collections.namedtuple.

import csv
from collections import namedtuple

with open('eans.csv') as csvfile:
    reader = csv.reader(csvfile)
    # Read the fieldnames from the first row and use them to construct a namedtuple.
    fields = next(reader)
    EAN = namedtuple('EAN', fields)
    for row in reader:
        record = EAN(*row)
        print('         EAN {}'.format(record.ean))
        print('Product Code {}'.format(record.product_code))
        print(' Description {}'.format(record.description))

Using namedtuple also has some memory advantages, as namedtuple instances don’t have per-instance dictionaries.
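
As a rough illustration of the memory point, the sketch below compares a namedtuple with a dict holding the same row. The sample values come from eans.csv. The exact sizes depend on the Python version and platform, so none are quoted here.

```python
import sys
from collections import namedtuple

EAN = namedtuple('EAN', ['ean', 'product_code', 'description'])
values = ('5123412341234', 'FOOD123', "Tin o'Beans")

as_tuple = EAN(*values)
as_dict = dict(zip(EAN._fields, values))

# namedtuple classes define __slots__ = (), so instances carry no __dict__.
print(hasattr(as_tuple, '__dict__'))

# getsizeof() measures only the container itself, not the strings it refers to.
print(sys.getsizeof(as_tuple), sys.getsizeof(as_dict))
```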

However, the example above has some flaws. It won’t work if a header field contains spaces, nor if it is a keyword, nor if it is an invalid identifier, as shown in the following csv file, eans2.csv.

ean,product code,class, description
5123412341234,123,FOOD,"Tin o'Beans"
5123412341235,234,FOOD,"Basic Soup"
5123412341236,236,FOOD,"Basic Soup"
5123412341236,345,CARS,"Big Wide Tyres"
5123412341237,346,CARS,"Starter Motor"

We can fix that.

First, let’s get rid of the spaces. Use str.strip() to remove leading and trailing spaces, then use str.replace() to replace spaces between words with underscores.

    # Remove leading and trailing spaces, then replace any remaining spaces with underscores.
    fields = (f.strip() for f in fields)
    fields = (f.replace(' ', '_') for f in fields)

Next, find fieldnames that aren’t valid identifiers using str.isidentifier() and prefix them with a valid identifier.

    # Prefix non-identifiers with 'field_'
    fields = ('field_' + f if not f.isidentifier() else f for f in fields)

Then we look for keywords using keyword.iskeyword() and append an underscore to them.

    # Append '_' to keywords.
    fields = ((f + '_' if keyword.iskeyword(f) else f) for f in fields)

Finally, one question remains: how do we deal with a fieldname that has fallen through the cracks and is still not a valid identifier? One option, shown below, uses namedtuple’s rename parameter to replace invalid identifiers with a positional identifier, eg, '_0' for the first field.

    # And the rename=True gives us a fallback.
    self.record_type = namedtuple(self.name, fields, rename=True)
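
To see what these steps do to a real header, here is a sketch that applies them in order. The function name sanitise is made up, it uses lists rather than generators so that the intermediate results can be printed, and the second header is invented to show the rename fallback.

```python
import keyword
from collections import namedtuple


def sanitise(fields):
    """Apply the clean-up steps above to a list of header fields."""
    fields = [f.strip() for f in fields]
    fields = [f.replace(' ', '_') for f in fields]
    fields = ['field_' + f if not f.isidentifier() else f for f in fields]
    fields = [f + '_' if keyword.iskeyword(f) else f for f in fields]
    return fields


# The header of eans2.csv, as csv.reader would return it.
print(sanitise(['ean', 'product code', 'class', ' description']))

# An invented header with a name that is still invalid after the clean-up.
awkward = sanitise(['ean', '2nd code', 'unit-price'])
Record = namedtuple('Record', awkward, rename=True)
print(awkward)
print(Record._fields)
```

The eans2.csv header becomes ean, product_code, class_ and description. In the second header, '2nd code' becomes field_2nd_code, but 'unit-price' becomes field_unit-price, which still isn’t a valid identifier, so rename=True replaces it with _2.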

Now we can put it all together in NamedTupleReader. This acts much like csv.DictReader() but each row is a namedtuple rather than a dict.

import csv
import keyword
from collections import namedtuple


class NamedTupleReader:
    def __init__(self, f, name='Record', dialect='excel', *args, **kwargs):
        self.reader = csv.reader(f, dialect, *args, **kwargs)
        self.record_type = None
        self.name = name

    def __iter__(self):
        return self

    def __next__(self):
        # Define the record type based on the first row.
        if self.record_type is None:
            fields = next(self.reader)

            # Remove leading and trailing spaces.
            fields = (f.strip() for f in fields)

            # Replace any remaining spaces with underscores.
            fields = (f.replace(' ', '_') for f in fields)

            # Prefix non-identifiers with 'field_'.
            fields = ('field_' + f if not f.isidentifier() else f for f in fields)

            # Append '_' to keywords.
            fields = ((f + '_' if keyword.iskeyword(f) else f) for f in fields)

            # And the rename=True gives us a fallback.
            self.record_type = namedtuple(self.name, fields, rename=True)

        # Skip blank rows.
        row = next(self.reader)
        while row == []:
            row = next(self.reader)

        return self.record_type(*row)

We could take this further by parameterising things like fieldname replacement for non-identifiers, or allowing fieldnames to be overridden, but this covers most common uses.

You can find NamedTupleReader in this gist.