Feb 09 2014

Live By The Code

Category: Codeface, Uncategorized | Marcus @ 6:48 pm

Rather scarily, I worked out last week that I’ve been using “curly brace” languages for 20 years. I started off with Borland Turbo C++ when I was still at school in 1994, and I’ve gone through Perl, PHP and most recently C#, which I use for enterprise-grade stuff at work. Before that I played with BBC BASIC in the 80s on an Acorn Electron, and later QBASIC and QuickBASIC for DOS. In 1997 I did a BEng degree in software engineering and got a job as a software developer. I’m now in the process of applying for promotion to senior or lead grade, and hoping to be sponsored to do an Open University MSc in Computer Science with Software Engineering.

With that kind of background, naturally I was interested to see what the Year of Code initiative was about. According to the website, “Code is the language we use
to instruct computers. We use code to build websites and apps, design clothes, publish books, make games and music, and to get the most from technology. Getting to know code is really important. It means you can be creative with computers, start your own business or boost your earning potential. It is really simple to learn and anyone can do it – not just rocket scientists.” Oh dear. Nothing like a collection of vacuous buzzwords to get things started. In theory you can use “code” to publish books (see LaTeX or PostScript, for example), but I daresay most people would use a word processor or DTP package to do it. My final degree project involved writing a MIDI sequencer, but I think a keyboard or a copy of GarageBand might be a bit easier for making music.

Let’s look at the sentence “It is really simple to learn and anyone can do it – not just rocket scientists” in some more detail. As I say, I have a degree and over 20 years’ experience, so I like to think I know a bit about programming (let’s not call it “coding”). I also know that there’s a lot that I don’t know. Starting off, what exactly is programming? As far as I’m concerned, it’s the task of designing, implementing and testing a set of instructions that tell a computer what to do. Not a particularly complex definition, but it excludes quite a lot of things that some might call “coding”. An HTML web page is just a text file, so that doesn’t really count. Using a design package to draw something is just a high-tech alternative to using a pencil and paper. Difficult? It can be. Programming? No.

There are lots of different ways to approach programming, but most of them agree that simply sitting in front of a computer with the programming environment open is not a good way to start. A common misconception, one that Lottie Dexter seems to share, is that the software development process goes like this:

  1. Come up with a vague idea of what you want to do
  2. Magic happens
  3. A complete and finished program appears

However what really happens, especially for big projects, is more like this:

  1. Specification agreed with customer
  2. Detailed design work takes place including deciding what technologies to use
  3. Initial draft of code is written
  4. Review and do further development if needed
  5. Test
  6. Review test results. Recode and carry out further testing if required
  7. Hand over to customer who may want to carry out their own testing
  8. Release

On small projects (like the classic “hello world” program) some of these steps may be missed out but you still want to make sure the thing behaves correctly and is robust enough to handle what gets thrown at it. Why does it matter? Here’s a basic C program that asks the user to enter their name. Unfortunately it has a pretty major security flaw.

#include <stdio.h>

int main (void)
{
	char Name[30];

	printf ("Please enter your name:");
	gets (Name);	/* dangerous: no bounds check on the input */
	printf ("Hello %s", Name);

	return 0;
}
What could go wrong here? There’s space to store a name of up to 30 characters (actually 29, because of the way C terminates strings), but what happens if someone enters 30 or more? It overwrites part of the program’s memory. Depending on who does it, they could put something in there that makes the program do something it wasn’t originally designed to do. This is called a buffer overrun vulnerability, and it’s a major source of malware. Making the code just a little more complex makes it a lot more secure:


#include <stdio.h>

int main (void)
{
	char Name[30];

	printf ("Please enter your name:");
	fgets (Name, sizeof(Name), stdin);	/* reads at most 29 characters plus the terminator */
	printf ("Hello %s", Name);

	return 0;
}

This might confuse the Year of Code crowd, but it shows how a subtle change can have massive consequences. Computers do exactly what they’re told, even when it’s dangerous. There are safeguards: my compiler refused to compile the first program with the dangerous gets() call. However, not all problems are as easy to catch, which is why you need to know what you’re doing. On a personal computer a crash might just be inconvenient, but on a big system like a banking database it could be very expensive if someone breaks into customer records and steals lots of money.

A large part of programming is algorithms. Techniques for things like sorting and finding data, reading and writing files, or using memory have been around for a long time. A lot of them come from mathematical concepts, especially areas like formal logic, functions, formulas and matrix arithmetic. It might not be rocket science, but it is a complex science of its own. When you record music, a formula converts the sound into something suitable for storing on disk. When you move a shape around on screen, ultimately this is done through a set of matrix transformations. There are libraries that will do a lot of the work for you, but you do need to understand how they work to get the most out of them.

One thing I’ve seen in some of the code I maintain is stuff that’s badly hacked together. Rather than stop an error from occurring in the first place, let it happen anyway and just ignore it if it’s not important. Forget coming up with useful names for things. Just have things called “x” or “zotz”. If you’ve been brought in as a contractor, don’t bother documenting what you’re doing. Source code might not physically decay in the same way that a steel bridge might, but technologies cease to be supported and other parts of a system might change. I know offshore developers are popular in certain places, but that’s because they’re cheap, and they’re cheap for a reason.

Bearing all this in mind, how would I teach programming? Start off getting the principles right:

  1. Define what you’re going to do
  2. Break it down into logical steps
  3. Decide how you know if it’s working properly
  4. Select appropriate technologies and techniques
  5. Write the code
  6. Test it
  7. Fix any bugs and test it again
  8. If it works correctly, release/deploy/publish it

These principles are actually a major part of engineering so they’d carry over pretty well into other subjects. Putting together a flat pack wardrobe? Following a recipe? Building a suspension bridge? You get the idea. Technologies change, so the turtle graphics in Logo that I did at school wouldn’t really cut it now and it wasn’t exactly riveting back then either. Programming is a creative task so I’d leave some room for originality.

As for what language, I like the idea of something that’s graphically appealing and based on something that is used commercially. I spent quite a lot of time playing around with the graphics libraries in Borland Turbo C++ for DOS when I was first getting started in the mid 90s. These days I’d probably suggest one of the .Net Express languages on Windows, or something with a graphical IDE and based on C or Java on other platforms. The important bit is learning generic principles rather than any particular language. If you can understand program flow and some of the ideas behind things like object oriented programming in one language, it’s easy to transfer them to another.

Later I might suggest Java and an Android emulator if people wanted to get into mobile phone apps. I wouldn’t expect learners to write the next Angry Birds, but again, the aim is to understand the principles and to have something to show at the end of it. I know there are teaching languages like Scratch and MS Small Basic available, but I’d prefer people to get started with something that they don’t have to unlearn later.

It’s definitely worth at least mentioning some of the laws and politics behind certain technologies. Open vs closed source would obviously be a key point when it comes to choice of technologies. Keeping data secure is another important point: both stopping bad people getting in, and understanding why, just because you have the technical capability to do something, it isn’t necessarily a good idea. DRM and copyright laws are also a topic worth discussing, but I’d go for a more balanced approach than just “copyright theft is a crime” (which of course it isn’t). As a programmer you’re creating intellectual property that you might want to share under something like the Creative Commons or GNU licences.

All this might be a very different approach to the team of non-technical venture capitalists and “entrepreneurs” in charge of the Year of Code program, but as someone who works with very large systems where secure and reliable programming is required, I like to think I have a few ideas of my own. Farming the nasty techy stuff out to somewhere that can do it cheaply is all well and good, but you still need to be able to understand what they’re doing and provide guidance to make sure they get it right.


Mar 17 2013

Starting and Salting

Category: Codeface | Marcus @ 6:35 pm

Hello. I’ve been meaning to start a technical blog for a while covering some interesting aspects of what I do at work. I work for the IT services division of a large telecoms company, mostly developing complex web applications in ASP.Net, C#, Oracle and SQL Server. The specifics are sometimes sensitive so I won’t go into those, but sometimes there are techniques that come out of it that are worth discussing.

To start things off, password salting. This is a technique for preventing passwords being stored in plain text where it’s easy for them to be stolen. A very basic login table in a database may have something like this:

Username Password
Alice 123456
Bob password

If someone manages to get a copy of this table, they can easily log in as any user. Why not generate some summary of the password, called a hash, and store that instead? For a very simple method you could just add up the ASCII values of each character and store that. If you store the hash, you can compare it to a hashed version of whatever someone tries to use as a password. However, this simple algorithm has the disadvantage that different passwords might have the same hash, known as a collision. Someone using 123456 as a password would have the same hash as someone using 235416, so you could get in with either. There are better algorithms that reduce the problem of collisions. If we use the SHA-1 algorithm, our table above becomes this:

Username Hashed_Password
Alice 7c4a8d09ca3762af61e59520943dc26494f8941b
Bob 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8

If someone then tries to log in as Alice with a password of 123457, they get a very different hash value of 908f704ccaadfd86a74407d234c7bde30f2744fe.

To get an SHA1 hash in SQL Server, use:

SELECT HASHBYTES ('SHA1', <password>)

Obviously this is an improvement over just storing the password. However, hashed passwords are vulnerable to a rainbow table attack, which is basically a lookup table of hashes and their unhashed equivalents. Also, if several people have the same hash, you know they also have the same password. To avoid this, it would be good to make passwords more complex, which is where salting comes in. A salt is a random string of characters that is different for each user. Take your pick of how to generate one. I like the NEWID() function in SQL Server, which generates a random GUID each time it’s called:

SELECT NEWID()

Once you have the salt, it can be stored alongside the user details. It doesn’t need to be hashed or encrypted. The password hash is now computed over <password><salt>, or

SELECT HASHBYTES ('SHA1', <password> + <salt>)

This would make our table like this:

Username Salt Hashed_Password
Alice D0157CCE-C811-4E4E-8C4F-AD4ED37023AF 5f9f5a293710fdad1abbdb02bc4ac2eccfd81e7e
Bob F45D4893-1A46-4C2F-B783-E0903E2EB5CF 45843ca5737eccc04690dd566ee933a414a74a62

To compare an incoming password attempt, we combine it with the stored salt, hash the result and see if it matches the stored hash. In T-SQL:

SET @pass = '123457'
SELECT CASE WHEN HASHBYTES ('SHA1', @pass + salt)  = Hashed_Password THEN 'Match' ELSE 'No match' END
FROM usernames

A bit more complex than just comparing two strings, but a lot more secure. As long as some method of generating hashes is available it isn’t too difficult to implement (the .Net library has some methods, and I’d be very surprised if Oracle and Java didn’t have some as well) and it can save a system from being compromised. The main drawback of hashing and salting passwords is that it is one way: a hash can’t be reversed, so if you forget your password it has to be reset rather than recovered. I’ve used the SHA-1 algorithm here because it’s one of the most common, but better ones are available. SQL Server 2012 supports SHA-2, and SHA-3 is in the process of being finalized.
