Search:
Local Thoughts logo
CS @ UWO in London, Ontario, Canada

Student and faculty blogs for the Department of Computer Science at The University of Western Ontario.

904181
Full

Monday, July 21, 2008 - 10:15pm by Mike


I’m so pleased with my progress on the Ca compiler, I’ve decided to provide an ad hoc link to it in its current state. I’m not confident enough to give proper instructions for it though, as it is for hardcore haxx0rz only.

I’ve made the switch from make to ant for the build system. There are a number of things which are bizarre about ant. For instance, failonerror should default to true in every case. Perhaps there are some cases where silent failure can be acceptable, which can be made explicit in those cases with a failonerror=”false”, but in a build system, I have a hard time believing that an error should, by default, be considered a success. Also there is no convenient built-in way to copy a file while retaining permissions.

Aside from that, I like the ant model of things. Java is not such a horrible way to describe how to build something.

As for the compiler itself, there are a number of bug fixes still in the queue, such that it still can’t be used for useful work. For useless work, however, it is quite fabulous, if I say so myself.

Even more exciting is I’ve run into a theoretical problem which is proving very difficult to prove. Research is one of the few vocations where your very metric of “success” is the number of problems you create. I’m off for a camping trip over the next few days which will give me some time to mull it over.

Next month I’ll be taking a trip back to my alma mater. I’ve been invited by my undergraduate supervisor to help him and his post-doctoral student on a new project he’s starting up. It’s a programming language based on co-inductive types where each program is guaranteed to halt in polynomial time. My job is to come up with, ideally, some clever ways of implementing it efficiently. I don’t know many more details than that, but it sounds like a dream project for me.

902270
Full

Monday, July 21, 2008 - 12:05am by Geoff Wozniak

Bruce Corcoran doesn’t see much point to film criticism, particularly when it comes to children’s films.

We’ve all seen it. Critics review Disney or DreamWorks animated movies. “Kung Fu Panda” and “WALL*E” are two of the latest victims. Critics had issues with the fact one went for looks over plot depth, and said the other had a weak story line.

Bruce apparently missed the fact that both films are getting good reviews and WALL*E is considered to be a masterpiece of storytelling.  He also doesn’t seem to understand that a critic’s job isn’t necessarily to judge whether a children’s film is good for children or not.  A decent critic will consider the notion when writing for the local paper, but not when considering the merits of the art form.  His point amounts to the fact that people have a difference of opinion.

And really, children are more sophisticated than he may think.  I, for one, saw through the Earnest movies and related ilk when I was a kid.  I only wanted to see them so I wouldn’t have to take part in church-related activities.

894222
Full

Tuesday, July 15, 2008 - 4:24pm by Mike


I apologize for the long delays between posts. Rest assured no news is good news, and it’s just business as usual.

One bit of exciting news, though, is that I’ve received an email from my old supervisor, Robin Cockett, at the University of Calgary. He has a project for a pretty cool sounding programming language based on Martin Hofmann’s work—where programs are restricted to polynomial time—and needs someone to worry about the implementation details. I’m pretty excited about the prospects.

Also I’ve decided to turn my idle attention to my Zipit, borrowed from my friend Albert. It’s kind of a fun little thing, if annoying to type on, but sports a 60MHz ARMv4 chip in it. I’m toying with the idea of having my compiler target not just C, but also ARM. I’ve got an ARM cross-assembler installed, so it’s just a matter of working out the networking details.

ARM is a surprisingly beautiful ISA. Everyone knows it for its pervasive use of conditional instructions, of course, but its addressing modes are quite nice as well. My only other exposure to pre- and post-index addressing was with the PDP-11, but ARM does it in a much cleaner way.

881649
Full

Tuesday, July 8, 2008 - 3:23am by Dan Siemon

This probably isn’t news to many people by now but CBC’s Search Engine will not be returning in the fall. What a loss. To me Search Engine is a great example of what a radio show and Podcast can be. The show had strong audience participation and felt almost more like a blog post than a traditional radio show. More importantly, Search Engine covered digital issues such as Copyright reform in a way that is greatly needed at this time.

I really hope that CBC will reconsider this cancellation. Public broadcasters need to bring in young people and new listeners. A new and experimental format like Search Engine is a great way to accomplish this. The huge amount of interest in this spring’s Copyright reform bill shows that many Canadians are becoming aware of the topics Search Engine covered. Now is not the time to give up on this show.

Fortunately it looks like Search Engine’s sister show, Spark, is still going to continue.

869032
Full

Sunday, June 29, 2008 - 7:14pm by Dan Siemon

About a month ago Karen and I spent the weekend in Chicago. If you haven’t been to Chicago I would recommend it. We had a great time.

One of the highlights and quite possibly the coolest museum exhibit I’ve seen was U-505 at the Museum of Science and Industry. This was worth the trip by itself. I have a bunch more pictures here.

U-505

868966
Full

Sunday, June 29, 2008 - 6:45pm by Dan Siemon

For the first time in almost month I had a bit of free time for experimentation today so I decided it was time I set up my home network to use IPv6. I’ve tried to keep up on the development and deployment of IPv6 but besides setting up a few internal network nodes with IPv6 addresses I haven’t played with it much in the past.

Background

Before I get into configuration here are a few links to some of the better articles and videos that I discovered today. Some of these are pretty technical.

The End of the (IPv4) World

Current projections on the time frame for IPv4 address exhaustion.

What this prediction is saying is that some time between late 2009 and late 2011, and most likely in mid-2010, when you ask your local RIR for another allocation of IPv4 addresses your request is going to be denied.

IPv6 Transition Tools and Tui

This article describes 6to4 and Teredo which are two technologies that aim to ease the transition from IPv4 to IPv6.

IPv6 Deployment: Just where are we?

Current IPv6 usage estimations.

Google IPv6 Conference 2008

Videos from Google’s IPv6 conference in May 2008.

IPv6 content

Other than gaining some experience with IPv6 there really isn’t a lot of benefit to using IPv6 yet. However, if you are in any way involved with IP networking it may be time to start learning about IPv6. Current projections have the IPv4 address space being exhausted in mid-2010 (The End of the (IPv4) World).

At the moment there are very few sites available which are accessible via IPv6 and even fewer are IPv6 only. Google has setup an IPv6 version of their main search site at ipv6.google.com. It is unfortunate that Google does not have an IPv6 (AAAA) record for www.google.com yet but given that providing an AAAA record to some hosts without IPv6 connectivity can cause problems, their choice is not surprising. Hopefully these kinks can be worked out as more people gain experience with IPv6. Rather then trying to remember to type ipv6.google.com all of the time I have locally aliased www.google.com to ipv6.google.com. Everything seems to be working normally so far.

SixXS maintains a list of IPv6 content at Cool IPv6 Stuff. A couple of highlights from this list are the official Beijing 2008 Olympic website and some IPv6 only BitTorrent trackers.

It has been said that the availability of porn is what really drove the adoption of the Internet. In this spirit The Great IPv6 Experiment is collecting copyright licenses to a large amount of commercial pornography and regular television shows for distribution only via IPv6. The project is due to launch sometime “soon”. What a great idea for an experiment.

A short tutorial for IPv6 and 6to4 on Fedora 9

Getting things setup turned out to be pretty easy on Fedora. I expect the same is true of any Linux distribution although the details will differ. Since most ISPs do not have native IPv6 support, special technologies are required to connect to other IPv6 nodes over IPv4. I chose 6to4 to connect to the IPv6 network. 6to4 requires a public IP address so if you are behind NAT look into using Teredo instead. Incidentally, Teredo is supported by Windows Vista.

The first step is to enable IPv6 and 6to4 on the publicly facing network interface. The required configuration file can be found in /etc/sysconfig/network-scripts/ifcfg-XXX where XXX is the name of your publicly facing network interface. Some or all of this may be configurable through system-config-network and other GUI tools but I tend to stick to configuration files. I added the following lines:

IPV6INIT=yes
IPV6TO4INIT=yes
IPV6_CONTROL_RADVD=yes

A few new entries were also required in the global network configuration (/etc/sysconfig/network).

NETWORKING_IPV6=yes
IPV6FORWARDING=yes
IPV6_ROUTER=yes
IPV6_DEFAULTDEV="tun6to4"

Note that if the computer you are configuring is not going to act as a IPv6 gateway for other hosts on your network you probably don’t want to add IPV6FORWARDING and IPV6_ROUTER. After editing these files restart the network service.

/sbin/service network restart

You should now have a new network interface named tun6to4 with an IPv6 address starting with 2002 assigned to it. 2002 (hexadecimal notation) is the first sixteen bits of the IPv6 address space dedicated to 6to4.

/sbin/ifconfig tun6to4

Now try pinging an IPv6 addresses.

ping6 ipv6.google.com

If you can reach ipv6.google.com you have working IPv6 connectivity. If you are not configuring an IPv6 gateway you can ignore everything below this point.

In order for the configured host to act as a gateway for IPv6 traffic it needs to advertise the IPv6 network prefix to the rest of your network. IPv6 doesn’t require DHCP for automatic address configuration but does require prefix announcement so the local node can figure out its IPv6 address. Prefix advertisements are handled by the radvd daemon. Below is the configuration I used (/etc/radvd.conf). Note the leading zeros in the prefix. This indicates that radvd should create the IPv6 prefix using the special 6to4 format.

interface eth0
{
        AdvSendAdvert on;
        MinRtrAdvInterval 30;
        MaxRtrAdvInterval 100;

        prefix 0:0:0:0001::/64
        {
                AdvOnLink on;
                AdvAutonomous on;
                AdvRouterAddr off;
                Base6to4Interface ppp0;
                AdvPreferredLifetime 120;
                AdvValidLifetime 300;
        };
};

After restarting radvd the other IPv6 capable nodes on your local network should also be automatically assigned an IPv6 address starting with 2002.

/sbin/service radvd start

I’m not sure if this is the way it is supposed to work or not, but eth0 on my gateway never obtains a 2002 IPv6 address automatically (this is box radvd is running on). As a result, I assigned the IPv6 address manually. Since my external IPv4 address never changes this isn’t a problem for me but it seems wrong to have to manually change the interface address if the external IPv4 address changes even though radvd will correctly advertise the new IPv6 prefix to the rest of the network automatically.

If you already know how IPv6 addresses are constructed skip this paragraph. In what will likely be the most common deployment model, IPv6 addresses are constructed of two parts: a 64-bit network identifier (prefix) and a 64-bit host identifier. The network identifier is assigned by the ISP. In the case of 6to4 the network prefix is constructed by using your public IPv4 address in combination with the first sixteen bits of the address being set to 2002. The host or node identifier is constructed by extending the 48-bit MAC address to 64-bits.

Determining the IPv6 address to assign to the internal interface (eth0) is a little tricky. First get the network prefix portion of the IPv6 address assigned to the tun6to4 interface. You want everything before the /16. This is the first 64-bits of your IPv6 address. Then look at the link-local IPv6 address which is automatically created on eth0. This address will start with fe80. The last 64-bits of this address is also the last 64-bits of the new address because this is the MAC address of the network interface. Copy everything after “fe80::”. Append this to the previously obtained network prefix separating the values with a colon. You now have the IPv6 address. Append an “IPV6ADDR=” line to /etc/sysconfig/network-scripts/ifcfg-eth0 and restart the network service (or the interface only if you like). You should now be able to ping6 between network nodes using the 2002 prefixed IPv6 addresses.

Once you have established connectivity between the nodes try ping6ing ipv6.google.com from the internal network nodes. If the ping fails you will likely have to investigate the iptables and ip6tables rules on both the gateway and the internal nodes.

862554
Full

Wednesday, June 25, 2008 - 4:03pm by Geoff Wozniak

Out in Bridgewater, Nova Scotia, Penny Boudreau is being charged with the first-degree murder of her daughter, Karissa.  The death of Karissa was distressing enough, but the fact that these incidents are rare in the town gets people riled up.

It seems, however, that a fair number of people out in Bridgewater don’t seem to understand the concept of innocent until proven guilty.  At a brief court appearance recently, people gathered outside decided to taunt her.  Stuff like, “Kill some more kids!” and “Child killer!”.  Even worse was the comment by someone about the trial that, “It helps to us to rest, to give us a little peace.”

I hate to break it to these people, but the police don’t arrest only the guilty, nor are the guilty the only people who get charged with crimes.  Just ask Anthony Hanemaayer, who was recently exonerated of a crime he was implicated in over 20 years ago.  The crime?  Attempted rape of a 15-year old girl.  Who really did it?  Paul Bernardo, who confessed last year — and they just exonerated the guy now.  How about Guy Paul Morin, if you’re interested in high-profile cases.

This kind of attitude shows why there is a system of law set up in the first place.  People judging ahead of time should be ashamed of themselves.

857912
Full

Monday, June 23, 2008 - 5:19am by Geoff Wozniak

For the last month, I’ve basically been locked away writing my dissertation.  Here are some things I have noticed.

  • Your thoughts tend to get very focused.  My thesis is the adaptation of some previous papers I have written, but with a lot more formalism to lay out the idea more clearly.  Papers often have limited space, which means you can rely a bit more on intuition.  I still have to be concise, but more rigourous.  This means careful consideration of the argument.
  • Consequently, figuring out what to write about next can be time-consuming.  I try to avoid jumping between ideas in paragraphs too much to prevent discontinuity.  Further to that, I don’t like excessive sectioning to delineate ideas.  I try to put a narrative together and give it some flow without forcing it.
  • It pays incredible dividends to sit down with a piece of paper and sketch out the main points of your argument before trying to write anything.  Outlines are invaluable.
  • Having a reasonably diverse vocabulary makes your first draft come out a little faster.  I don’t sit for a long time trying to figure out how to say something.  Most of my time is spent figuring out what I should really be saying.
  • I like writing.  It’s difficult, but rewarding. (Most of the above is really a side-effect of sitting down to write.)
  • I really, really appreciate LaTeX.  I prefer TextMate to Emacs for the actual editing.
  • Version control is something that should be mandatory for any writing/text editing software.  Failing that, it should be in the operating system.  Time Machine saved me untold hours of work from an incident that erased a section of a chapter I really didn’t enjoy writing.  I didn’t notice the section was gone until hours later (and I was being careful!  Or so I thought).  With Time Machine, I restored it in under five minutes.

 

845041
Full

Saturday, June 14, 2008 - 4:08pm by Dan Siemon

Where Google App Engine Spanks Amazon’s Web Services: S3, EC2, Simple DB, SQS

A short summary of the differences between Google’s App Engine and Amazon AWS.

Pricing for App Engine has also been announced.

840511
Full

Thursday, June 12, 2008 - 12:52am by Dan Siemon

Hunting memory leaks in Python

Interesting post showing how to use the Python garbage collector’s introspection features and Graphviz to track down object reference problems.

819659
Full

Thursday, May 29, 2008 - 8:55pm by Geoff Wozniak

Earlier this week I accepted a position with TVWorks as a software developer. The team I’ll be joining seems to be doing some interesting work and I’m looking forward to working there. It gives me a chance to get some industry experience and put together some research projects I have in mind.

In conjunction with the start of gainful employment, I’ll be finishing my thesis. (For those interested, it’s about something I’m calling behavioural synthesis, where I propose some programming constructs for obtaining structure given behaviour.) It’s been a long road with the graduate work, and at times I wasn’t sure where I would be ending up with it. I’m in the process of getting everything put together and aim to have it submitted by the end of June.

I also had a paper accepted to the European Lisp Workshop. Assuming everything works out and I get to travel to Cyprus, that ought to be fun.

778912
Full

Monday, May 5, 2008 - 1:27pm by Geoff Wozniak

This glossary is making the rounds right now, but is amusing. It also goes to show that the terminology I’ve seen in some science writing that makes me cringe has been around for while. (This was published in 1957.)

777187
Full

Saturday, May 3, 2008 - 5:53pm by Dan Siemon

Gin, Television, and Social Surplus

People new to open source software, blogging and other participatory Internet activities often wonder where others find the time. In short, it comes from not wasting a lot of time on things such as TV. The two-way nature of the Internet has made it possible for normal people to be part of the creative process in their spare time in a way that one-way media like TV and radio do not.

The article linked to above refers to the time wasted on TV etc as the cognitive surplus. It even goes on to define a ‘cognitive unit’ based on the total amount of work that has gone into creating Wikipedia. Using this unit, the amount of cognitive resources that are wasted on TV every year is estimated at 2,000 Wikipedias or 200 billion hours in the U.S. alone.

The linked article is worth your time. The suggested link between Gin, TV and societal change is fascinating.

774248
Full

Thursday, May 1, 2008 - 8:33pm by Mike


Last week, my co-supervisor gave me the Steam Boiler Problem. It’s a classical problem for specifications languages. You have a computer monitoring a steam boiler: a giant tank of water with a heat source and a steam pipe at the top. There are water pumps to control—to add more water into the tank—and a series of sensors. Essentially, given complex specifications, you have to get the steam boiler into a functioning state and keep it that way. Sensors can start malfunctioning, too, to make it more difficult.

It’s a good problem because it forced me to look at how Ca is to work with in the flesh, in a complex example. Even though I’m not done, the conclusion is that it needs a lot of syntactic sugar built on top of it.

Some of them are rather superficial, if time-consuming. For example, currently patterns cannot overlap at all in Ca, so something like:

l {
  Cons 5 xs -> ...;
  Cons x xs -> ...;
  _ -> ...;
};

Is not allowed, though it really should be. I had no idea before now how irritating it is not having that feature. There are other minor syntax features like that that need to be addressed.

The more interesting case is what to do with the catamorphisms. Theoretically they work fine, almost. In practice, they’re a bit cumbersome. When I said “almost” before, I meant they worked fine up to Loïc Colson’s famous inférieur (”minimum”) problem. It pertains to primitive recursive schemes like the one I’m working with, were writing the “minimum function”—finding the minimum of two values—has greater time complexity than in an unrestricted computational model.

What this boils down to in my mind is the inability to perform a catamorphism over two objects simultaneously. Or stated in a more Haskell-ish way, there’s no efficient way to write the zip function.

One solution floating in my mind is to make the catamorphism construct explicitly allow zipping. It’s not too bad adding it in. The syntax would be something like so:

(l, n) {
  (Cons x xs, p+1) -> ...;
  (Nil, 0) -> ...;
};

Where l is a list and n is a natural number.

Another possible solution, which I quite like, is the idea of making catamorphisms parametric. So you could do something like so:

{ i ->
  Cons x xs -> i * x + @xs(x);
  Nil -> i - 2;
} l 0;

The variable i is a parameter over the catamorphism, initially 0. I like this is as it makes a lot of things more natural and doesn’t add any computational power. The syntax for the catamorphism has to change, but I think this is for the best anyway.

760388
Full

Wednesday, April 23, 2008 - 10:39pm by Geoff Wozniak

Yesterday was the final exam for the course I’ve been lecturing. It’s been a wild ride. For good or bad, I took on some challenges with this course because I really enjoy the course material. That can be dangerous as you can get caught up in your own enthusiasm and maybe miss that the students don’t have a clue what you’re talking about. Also, I’m busy trying to finish with my doctoral work, so I didn’t really need the extra work. But I needed the money. Money takes precedence, since food is something I need to live.

Some of the things I did were risky, but different and potentialy interesting.

  • A heavy focus on assignments. I pushed the students here. The work stressed thought over brute-force attacking the problem, which is something they didn’t seem to get used to. Most assignments had a short, elegant solution if you sat down, did some research, read documentation and tried things out. The students seemed the most comfortable writing brute-force approaches.
  • Which leads to the next point: heavy emphasis on trying things out. The textbook I elected to use was supposed to be something you work your way through by entering the examples. I have doubts that most students put any serious effort into doing this judging from the assignment submissions. The pitfall of this was that, as a student, if you got behind, it was very difficult to catch up. That is generally true of university courses anyway and I went on the assumption that this was a condition they had seen before, since it was a third-year course.
  • Minimal lecture notes to encourage/nudge students to actually work through the textbook. I spent a lot of time putting lectures together that complemented the textbook, but did not regurgitate its content. Some of the lectures went over very well, some didn’t. I got the impression the lectures were appreciated, but I still got requests for more comprehensive lecture notes. I just said, “read the textbook”.
  • A modern looking website in the style of a blog. It was dead simple to do, so I did it. I am of the opinion that our department has a very poor web setup for course instructors seeking to use the web in their courses. I was aiming to demonstrate ways of using the departmental website to do reasonably interesting things that students could probably relate to, given the ubiquity of the web. My criticisms of our department setup, however, is a discussion that should be held in a different context.
  • All my assignments were done electronically. Submissions were done through the departmental electronic submission system. I then wrote some programs to process these and turn them into PDF files, which the teaching assistant and I marked. Feedback was provided by annotating the PDF files using Preview. I returned the assignments using the course web page and password-protecting the files. Student could only access their own files. Almost no paper was used at all for the course.
  • I made a screencast to demonstrate the desired behaviour for some assignment code.
  • For the exam, I created a question for each student based on something they submitted in an assignment. This was more difficult than I thought, but it was immensely satisfying. I strove for fairness above anything else, but I also wanted to connect the material to the work of the student by using the student’s own work to demonstrate some concept. This might encourage students to reflect on what they do in the future, outside of the course.

Overall, I enjoyed it, even if it was incredibly difficult. I certainly hope the students enjoyed it (and the feedback has been positive, so far). I would certainly do it again, but I would only do it were I to be paid more. Now the job search is going to supplant teaching as something that consumes my time.

756079
Full

Monday, April 21, 2008 - 10:56pm by Andrew Delong


I have to do this. I should be able to afford US$4,120 some day not too far into the future. Let me know if you want to go too :)

752807
Full

Saturday, April 19, 2008 - 5:28pm by Dan Siemon

Environment Canada is nice enough to publish radar data for their many weather radars across the country. The Exeter WSO radar covers the area in which I live.

A friend of mine has created a great Google maps mashup which makes use of the data provided by Environment Canada in combination with the KML file format which has recently been standarized. Note that because of the multiple overlay layers you may need to turn off some of the data (checkboxes on the right) to make Google maps faster.

Environment Canada Weather Radar on Google Maps

Environment Canada Weather Radar KMZ (Google Earth)

Kier has also made a couple of blog posts describing the development of the mashup [1, 2].

740880
Full

Saturday, April 12, 2008 - 10:30pm by Geoff Wozniak

Now that classes have ended and I don’t have to prepare lectures (just an exam and a marking scheme), I can get back to writing some more here.

I just have to say that I thoroughly enjoyed the teaching the course, but I am very glad that I am done. I have a dissertation to write. (And I only taught the course for the money. Really.)

725978
Full

Friday, April 4, 2008 - 5:08am by Andrew Delong


I had to dig around for simple but non-obvious steps in order to get a paper to pass IEEE’s PDF eXpress validation tool, so I thought I’d share them. These steps are for latex/dvips/ps2pdf tools on windows.

First, if your paper size or margins appear incorrect in the .ps/pdf, (no matter what size you put on the dvips command line!), try adding the following to your .tex file after the \usepackage statements.

\special{papersize=8.5in,11in}

Next make sure the hyperref package is disabled since IEEE doesn’t like bookmarks, backrefs, or hyperlinks. If you use \url for formatting you can define your own replacement.

%\usepackage[...]{hyperref}
\newcommand{\url}[1]{{\small \tt #1}}

And finally page numbers should be disabled with the following

\pagestyle{empty} % disable page numbers
\begin{document}
...
\maketitle
\thispagestyle{empty} % spurious number on page 1

If your dvi looks fine, then force Type 1 (outline) fonts when converting to postscript via

> dvips -P pdf paper_final.dvi

and subset-embed all fonts (including standard ones) in the final PDF with

> ps2pdf -dEmbedAllFonts#true -dPDFSETTINGS#/printer
         paper_final.ps paper_final.pdf

For some reason ps2pdf won’t embed standard fonts unless you output for `printer’. Since hyperref is disabled, you can embed pdftitle and pdfauthor by adding the following after the papersize special if you really care.

\AtBeginDocument{%
  \begingroup
  \special{! /pdfmark where
             {pop} {userdict /pdfmark /cleartomark load put} ifelse
             [ /Author (Your Name Here)
               /Title (Your Title Here)
               /Subject (Booktitle Here)
               /DOCINFO pdfmark
         userdict /pdfmark /cleartomark load put
  }
  \endgroup
}

719273
Full

Monday, March 31, 2008 - 10:40pm by Mike


I’ve been thinking about the compiler again and how best to implement higher-order functions, once time comes to do that. Currently the compiler uses a lambda lifter. I.e., there are no closures. Every function is lifted out of any scope to the top-level, collecting extra parameters along the way. This is fine for first-order programs, since you can change every call to a nested function into a call with extra arguments. However, in first order programs, this causes problems because the arities have to match. For instance:

f x = let g y = x + y; in
  g (x + 4);

Gets transformed into:

f_g x y = x + y;
f x = g x (x + 4);

Every function is now at the global level and g has all the variables it needs. This causes problems with higher-order functions though:

f x zs = let addX y = x + y; in
  map addX zs;

Naïvely this is:

f_addX x y = x + y;
f x zs = map (f_addX x) zs;

Ahh, but the first argument to map is a closure, not a function! We should properly implement closures as lambda lifted functions, so we can’t very well implement lambda lifted functions are closures! There are myriad other problems with this approach, but suffice it to say it won’t work.

Anyway, I was contemplating abandoning lambda lifting entirely and just leaving closures implemented as closures in C. The GNU C Compiler has a little-known and little-used, sadly, feature usually called nested functions. They’re half-closures in a way. They’re closures up until their lexical scope disappears. Well, such is the C way. But they would do what I needed.

I was reading the original 1988 Usenix paper which describes the technique, usually called “trampolining,” to implement nested functions quite smartly in C. As I was reading, I discovered that the author had originally intended to implement unnamed functions in C!

He gives two suggestions for syntax. I like the first one he suggests, which looks something like (int (*)(int x) { return x + 2; }). For instance, you could do something like (int (*)(int x) { return x + 2; })(3), which would evaluate to 5. Perhaps not the most useful example, but you get the idea.

These unnamed functions are really no more of a bother to implement than named functions, once you allow nested functions, but it appears this feature was never implemented in GCC! I suppose to get good use out of them you’d need a less bulky syntax. Type inference could help there.

Anyway, the point of this post was to register my dismay at GCC never implementing unnamed functions, despite there being no technical reason for not doing so. They can be quite handy, those unnamed functions.

For what it’s worth, I think I’ve decided against relying on GCC’s trampolining to do higher-order functions and lambda abstractions in Ca. It is very clever and very efficient, but the semantics are a bit hairy for the 21st century. For example, I just looked at some SPARC assembler that GCC produced for it, and it involves a system call to make to the stack executable. In today’s security conscious kernels, executing the stack, as trampolining necessarily requires, is less practical than it once was. Maybe one day we can get rid of these blunt MMUs, but that’s a rant for another time. In any case, MMU concerns aside, executing the stack mucks with the instruction cache on architectures with separate caches. As I said, hairy semantics.

In Breuel’s paper, he sets up other simpler solutions for implementing closures as being impractical due to their changing calling conventions. Well, I’m making my own language and I can make my own calling conventions. I expect that higher-order functions will simply be passed as two pointers instead of one. It saves everyone a lot of headaches.

716686
Full

Sunday, March 30, 2008 - 4:00am by Geoff Wozniak

The symbolism of Earth Hour is well-intentioned, but I’m doubtful of any significant changes coming about in the habits of people as a result of it. Despite my skepticism, I take part because I feel it’s important.

Beata and I went for a walk during Earth Hour to see what others in downtown Hamilton were doing for it. Turns out it wasn’t much.

For the most part, all the apartment buildings around our house were well-lit. One was rather dark, but the apartment buildings gave us no indication that it was Earth Hour.

Most business had the lights on as usual, but we did come across three taking part in the event: La Luna, the Sheraton and Honest Lawyer. None of them had all the lights off (there are safety issues, of course), but they certainly had fewer lights on than usual. La Luna had candles for each table, the lobby of the Sheraton had minimal lighting with some live music, and Honest Lawyer had nearly all the lights off with the big screen TVs provided the necessary illumination. The Board of Education office and City Hall looked like they were off as well.

Aside from these places, everything else was lit up like a Christmas tree. The Standard Life Building in Jackson Square was close to fully lit on at least 3 floors. If there was a big cleaning crew there or people working late on a Saturday, we didn’t see any of them. Many businesses had the full array of lights on, even the superfluous neon ones lining store windows that don’t seem to say anything. Every Tim Horton’s outlet we saw (3, in total) was ablaze with light, seemingly oblivious to the whole idea.

Overall, it was pretty disappointing. Based on the propensity of light we saw, apathy seemed to be the primary reaction, although we couldn’t rule out ignorance. Still, we spent the hour walking around, capping it off with a visit to a variety store, just to see two teenage kids get booted out for shoplifting.

Earth Hour didn’t exactly boost my opinion of this town.

711553
Full

Thursday, March 27, 2008 - 2:17am by Dan Siemon

There has been lots of discussion and buzz around the Amazon Web Services (AWS) lately. I posted a few links about this last week. Most of the articles that I have read on AWS speak of it from a high level. General discussions about how the service allows your web application to increase capacity as required are interesting but I was curious about the interface that these services present application developers and to the Internet. More specifically, how do the AWS interfaces compare with normal server colocation services.

Amazon AWS is actually a collection of services. The Elastic Compute Cloud (EC2) is the service most commonly discussed. Other interesting services that are part of AWS include the Simple Storage Service (S3), SimpleDB and the Simple Queue Service (SQS). This article will only discuss EC2 but does not aim to be a EC2 tutorial. Amazon provides a good user guide if you are sufficiently interested.

Everything below comes from an afternoon of experimentation with EC2. Please leave a comment with any corrections or other useful bits of information you might have.

Signing up

EC2 operates on a pay for what you use model. As a result you need a credit card to use EC2 so Amazon can bill you once per month based on your usage. The first step is to sign-up for an AWS account. This account will give you access to AWS documentation and other content. After you have an AWS account you can then enroll in EC2. It is at this point that the credit card is required.

All interaction with EC2 occurs over web service APIs. Both REST and SOAP style interfaces are supported. Web service authentication occurs via X.509 certificates or secret values depending on the web service API used. Amazon nicely offers to generate an X.509 certificate and public/private keys for you. Letting Amazon create the keys and the certificate is probably a good idea for most people since it is not an entirely trivial task. However, depending on how paranoid you are you might want to create the keys locally. Amazon says they don’t store the private keys they generate and I have no reason to doubt them but generating the keys locally reduces the possibility that your private key will be compromised.

It’s all about virtual machines

The fundamental unit in EC2 is a virtual machine. If you have experience with Xen or VMWare you can think of EC2 as a giant computer capable of hosting thousands of virtual machines. In fact, the virtualization technology used by EC2 is Xen. At present only Linux based operating systems are supported but Amazon says that they are working towards supporting additional OSs  in the future. Since Xen already has the capability to host Windows and other operating systems this certainly should be possible.

All virtual machine images in EC2 are stored in Amazon’s S3 data storage service. Think of S3 as a file system in this context. Each virtual machine image stored in S3 is assigned an Amazon Machine Image (AMI) identifier. It is this identifier that serves as the name of the virtual machine image within EC2.

Virtual machine images within EC2 can be instantiated to become a running instance. Many instances of an image can be running at any one time.  Each instance has its own disks, memory, network connection etc so it is completely independent from the other instances booted from the same image. Think of the virtual machine image as an operating system installation disk. This is all very similar to VMWare and other virtualization technologies.

Amazon and the AWS community provide a large number of AMIs for various Linux distributions. Some are general images while others are configured to immediately run a Ruby on Rails application or fill some other specialized role. Of course it is also possible to create new AMIs either for public or private use. Private images are encrypted such that only EC2 has access to them. Since private images will likely contain proprietary code this is a necessary feature.

For an example of why you might want multiple images consider a three tier web application which consists of a web server tier, application tier and a database tier. By having an AMI for each of these machine types the application author can quickly bring new virtual machines in any tier online without having to make configuration changes after the new instance has booted. EC2 also allows a small amount of data to be passed to new instances. This data can be used like command line arguments. For example the address of a database server could be passed to the new instance.

Interacting with EC2

All interaction with EC2 occurs via very extensive web service APIs. Creating and destroying new instances is trivial as is obtaining information on the running instances. There is even a system in place for instances to obtain information about themselves such as their public IP address. Where applicable, such as when starting a new virtual machine instance, these web service calls must be authenticated via a X.509 certificate or a secret value.

Since not everyone will want to write their own EC2 management software Amazon provides a set of command line utilities (written in Java) which wrap the web service APIs. This allows the user to start, stop and manage EC2 instances from the command line.

Creating a new instance is as simple as:

./ec2-run-instances ami-f937d290 -k amazon

The ‘-k amazon’ specifies the name of the SSH private key to use. I’ll come back to this in a bit. Starting ten instances of this image can be accomplished by adding ‘-n 10′

./ec2-run-instances ami-f937d290 -n 10 -k amazon

It is also possible to look at the virtual machines console output. Unfortunately, this is read-only. Management activities are not possible via the console. In this case the instance identifier is passed not the AMI.

./ec2-get-console-output i-8fad57e6

Again, all of the management activities happen via web service APIs so you can build whatever management software you require.

What do the VMs look like?

Hardware platforms

At present Amazon offers three different virtual hardware platforms.

  1. Small instance:  1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160 GB of instance storage, 32-bit platform.
  2. Large instance: 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform.
  3. Extra large instance: 15 GB of memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform.

Data storage

The storage layout of a small instance running a Fedora 8 image looks like the following:

 -bash-3.2# cat /proc/partitions
major minor  #blocks  name

   8     2  156352512 sda2
   8     3     917504 sda3
   8     1    1639424 sda1
-bash-3.2# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             1.6G  1.4G  140M  91% /
none                  851M     0  851M   0% /dev/shm
/dev/sda2             147G  188M  140G   1% /mnt

Output from top:

Tasks:  49 total,   1 running,  48 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.6%us,  1.0%sy,  0.0%ni, 96.1%id,  0.6%wa,  0.0%hi,  0.0%si,  1.7%st
Mem:   1740944k total,    88904k used,  1652040k free,     4520k buffers
Swap:   917496k total,        0k used,   917496k free,    33424k cached

Data persistence

The disk partitions attached to each instance are allocated when the reservation is created. While these file systems will survive a reboot they will not survive shutting the instance down. Also note that Amazon makes it clear that internal maintenance may shut down virtual machines. This basically means that you cannot consider the disks attached to the reservations as anything more then temporary storage. It is expected that applications running on the EC2 platform will make use of the S3 data storage service for data persistence. In fact, signing up for the EC2 service automatically gives access to S3.

Network configuration

Once instantiated each virtual machine has a single Ethernet interface and is assigned two IP addresses. The IP address assigned to the Ethernet interface is a RFC-1918 (private) address. This address can be used for communication between EC2 instances. The second address is a globally unique IP address. This address is not actually assigned to an interface on the virtual machine. Instead NAT is used to map the external address to the internal address. This allows the instance to be directly addressed from anywhere on the Internet but does limit communication to using the protocols supported by Amazon’s NAT system. At present traffic to and from the virtual machines is limited to the common transport layer protocols (TCP and UDP) making it impossible to use other transport protocols such as SCTP or DCCP.

Both the internal and external IP addresses are assigned to new instances at boot time. EC2 does not support static IP address assignment.

Authentication and Security

Firewall

Amazon implements firewall functionality in the NAT system which handles all public Internet traffic going to and from the EC2 instances. When instantiated each instance can be assigned a group name or use the default group. The group name functions like an access list. Changing the access rules associated with a group is accomplished with the ec2-authorize command. The following example allows SSH, HTTP and HTTPS to a group named ‘webserver’.

 ec2-authorize webserver -P tcp -p 22
 ec2-authorize webserver -P tcp -p 80
 ec2-authorize webserver -P tcp -p 443

Instance authentication

The authentication method used to connect to an EC2 instance depends on whether or not you build your own images. If you build your own image you can use whatever authentication or management solution you like. Obvious examples include configuring the image with predefined usernames and passwords and using SSH or perhaps Webadmin. Installing SSH keys for each user and disabling password authentication is probably the best choice.

Authentication when using the publicly available images is a little more complicated.  Having a default user/password combination or even default user SSH keys would allow other users to easily login to an instance booted from a publicly available image. To get around this problem Amazon has created a system whereby you can register an SSH key with EC2. During the virtual machine imaging process the public portion of this SSH key is installed as the user key for the root user.

How is this different from normal server co-location?

The biggest difference between server colocation and EC2 is the ephemeral nature of the resources in EC2. This is a positive property in that it is trivial to obtain new resources in EC2. On the negative side of things the fact that ‘machines’ can disappear and that other resources such as IP address assignments are unpredictable adds new complexities.

Machine failure

Amazon states that servers can be shut down during maintenance periods and of course hardware failures will happen. Both of these events will result in virtual machine instances ‘failing’. Since disks and therefore the data that they contain disappear when instances die it seems that the complete failure of individual virtual servers is going to be a more common event than one might expect with traditional server co-location. Consider that a massive power failure event in Amazon’s data center(s) will be the equivalent to a traditional colocation facility being destroyed. Not only do you temporarily lose operational capability but each and every server and the data they were processing and storing would be gone.

In reality every large scale web service should plan for large failure events and individual server failure is also expected to happen regularly given enough nodes. Perhaps deployment on EC2 will make these events just enough more likely to force developers to address them rather than implicitly assuming that they will never occur.

If anyone reading this has experience using EC2 I would love to hear about how often you experience virtual machine failure.

HTTP load balancing

Another interesting complication comes from the fact that EC2 does not support static IP address assignments. Often large web deployments include a device operating as a load balancer in front of many web servers. This may be a specialized device or another server running something like mod_proxy. Using example.com as an example, a typical deployment would point the DNS A records for www.example.com to the load balancer devices. When colocating a server it is normal to be assigned a block of IP addresses for your devices. This makes it easy to replace a failed load balancer node without requiring DNS changes. However, in the case of EC2 you do not know the IP address of your load balancer node until it has booted. As already discussed, this node can disappear and when its replacement comes back online it will be assigned a different IP address.

This presents a problem because the Internet’s DNS infrastructure relies on the ability of DNS servers to cache information. The length of time that a particular DNS record is cached is called the time to live (TTL). Within the TTL time a DNS server will simply return the last values it obtained for www.example.com rather than traversing the DNS hierarchy to obtain a new answer. The dynamic nature of IP address assignment inside EC2 does not mix well with long TTL values. Imagine a TTL value of one day for www.example.com and the failure of the load balancer node. The result would be up to a full day where portions of the Internet would be unable to reach www.example.com. Perhaps more inconvenient would be the user seeing another EC2 customer’s site if the address was reassigned.

In order to work around this problem one solution is to use a very low TTL value. This is the approach taken by AideRSS.

 $ dig www.aiderss.com

; < <>> DiG 9.5.0b1 < <>> www.aiderss.com
;; global options:  printcmd
;; Got answer:
;; ->>HEADER< <- opcode: QUERY, status: NOERROR, id: 3721
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 5, ADDITIONAL: 5

;; QUESTION SECTION:
;www.aiderss.com.               IN      A

;; ANSWER SECTION:
www.aiderss.com.        3600    IN      CNAME   aiderss.com.
aiderss.com.            60      IN      A       72.44.48.168
. . .
$ host 72.44.48.168
168.48.44.72.in-addr.arpa domain name pointer ec2-72-44-48-168.compute-1.amazonaws.com.

AideRSS is using a sixty second TTL for the aiderss.com A record. This means that every sixty seconds all DNS servers must expire the cached value and go looking for a new value.

Another site hosted on EC2 is Mogulus (just found them when looking for EC2 customers). They take a slightly nicer approach to this problem.

$ dig www.mogulus.com

; < <>> DiG 9.5.0b1 < <>> www.mogulus.com
;; global options:  printcmd
;; Got answer:
;; ->>HEADER< <- opcode: QUERY, status: NOERROR, id: 46281
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 2, ADDITIONAL: 2

;; QUESTION SECTION:
;www.mogulus.com.               IN      A

;; ANSWER SECTION:
www.mogulus.com.        7200    IN      A       67.202.12.112
www.mogulus.com.        7200    IN      A       72.44.57.45
$ host 67.202.12.112
112.12.202.67.in-addr.arpa domain name pointer ec2-67-202-12-112.z-1.compute-1.amazonaws.com.
$ host 72.44.57.45
45.57.44.72.in-addr.arpa domain name pointer ec2-72-44-57-45.z-1.compute-1.amazonaws.com.

Rather than a single A record with a very low TTL Mogulus uses two A records pointing to two different EC2 nodes and a TTL of 7200 seconds (two hours).

Personally, I consider these low TTL values (especially the 60s one) to be mildly anti-social behavior because it forces additional work on DNS servers throughout the Internet to deal with a local problem. Amazon should consider adding the ability to statically provision IP addresses. This would allow the Internet facing EC2 nodes to have consistent addresses and thereby reduce the failover problems. Like everything else in EC2, this could be charged by usage. I’d be happy to pay a few dollars (5, 10, x?) a month for single IPv4 address within EC2 that I could assign to a node of my choosing.

Pricing

Unless you are reading this close to the date it was written it is probably a good idea to visit Amazon for pricing information instead of relying on the data here.

Instance time

When I first starting investigating EC2 I misinterpreted EC2’s pricing. I thought that instance usage was charged on a CPU time basis. This would effectively mean that an idle server would cost next to nothing. The correct interpretation is that billing is based on how long the instance is running not how much CPU it uses. The current EC2 pricing is:

  • $0.10/hour - Small Instance
  • $0.40/hour - Large Instance
  • $0.80/hour - Extra Large Instance

This makes the constant use of a single small instance cost $70/month. Pretty reasonable especially when you consider that you do not have to buy the hardware.

Data transfer

Using EC2 also incurs data transfer charges.

  • Data transfer into EC2 from the Internet: $0.10/GB.
  • Data transfer out of EC2 to the Internet: $0.18/GB (gets cheaper if you use > 10TB/month).

Data transfer between EC2 nodes and to/from the S3 persistent storage service is free. Note that S3 has its own pricing structure.

Summary

In a lot of ways EC2 is similar to server location services. At its lowest level EC2 gives you a ’server’ to work with. Given the prices outlined above using EC2 as a colocation replacement may be a good choice depending on your requirements.

What really makes EC2 interesting is its API and dynamic nature. The EC2 API makes it possible for resources such as servers and the hosting environment in general to become a component of your application instead of something which the application is built on. Applications built on EC2 have the ability to automatically add and remove nodes as demands change. Replacing failed nodes can also be automated. Giving applications the ability to respond to their environment is very intriguing idea. Somehow it makes the application seem more alive.

709415
Full

Wednesday, March 26, 2008 - 1:43am by Dan Siemon

I finally got around to watching A new way to look at networking yesterday. This is a talk given by Van Jacobson at Google in 2006 (yes, it has been on my todo list for a long time).This is definitely worth watching if you are interested in networking.

A couple of quick comments (These are not particularly deep or anything. This is mostly for my own reference later.):

  • He says that the current Internet was designed for conversations between end nodes but we’re using it for information dissemination.
    • Me: This distinction relies on the data being disseminated to each user being identical. However, in the vast majority of cases even data that on the surface is identical such as web site content is actually unique for each visitor. Any site with advertisements or with customizable features are good examples. As a result we are still using the Internet for conversations in most situations.
  • He outlines the development of networking:
    • The phone network was about connecting wires. Conversations were implicit.
    • The Internet added metadata (the source and destination) to the data which allowed for a much more resilient network to be created. The Internet is about conversations between end nodes.
    • He wants to add another layer where content is addressable rather than the source or destination.
  • He argues for making implicit information explicit so the network can make more intelligent decisions.
    • This is what IP did by adding the source and destination to data.
  • His idea of identifying the data not the source or destination is very interesting. A consequences of this model is that data must be immutable, identifiable and build in metadata such as the version and the date. It strikes me how the internal operation of the Git version control system matches these requirements.
707494
Full

Tuesday, March 25, 2008 - 3:31am by Geoff Wozniak

I hope to post more interesting and provocative things in the near future. Right now, I’m too busy with teaching, writing a dissertation and looking for a job. In the meantime…

An amusing experiment in marketing, except the messages hold no particular meaning.

My favourite exchange happens when the person is holding up a sign that says “Hyp-hens”:

Woman: “Why are you doing this?”

Guy with sign: “I am informing people about hyphens.”

704655
Full

Saturday, March 22, 2008 - 11:48pm by Mike


The functional programming language community seems to be moderately abuzz over the recent release of the Disciplined Disciple Compiler. The first announcement I saw for it I ignored almost completely. Dialect of Haskell, allows destructive updates, allows mixing lazy and strict, etc. None of that really grabbed me.

Then this afternoon I saw another blog mentioned it on the Planet Haskell feed. This time it was mentioned with the phrase “region, effect, and closure inference” and immediately I had to read more about it.

The compiler seems to be a Google summer of code project by an Australian Ph.D. student. There is no paper yet! This means trying to figure out what’s going on involves combing through all the wiki pages and reading some tea leaves. Regions—à la Tofte—are not used for memory allocation. Darn! They seem to be used as part of the effect system, e.g., to mark certain regions of memory as being pure/constant/read-only or allowing destructive updates.

All of the magic is in the type system. It looks very well thought-out and very rich and is all orthogonal to the usual Haskell type system.

It looks like an interesting language to follow. I shall be keeping track of its developments.

From a syntactic stand-point, it seems to further my belief that you need to choose a “side” on the strict versus lazy issue. Disciple should in theory nicely and transparently mix strictly evaluated expressions with lazily evaluated ones. In practice this means things are strict by default but can be made lazy with annotations.

In my mind, what defines a language as being “strict” or “lazy” is the standard library. A language which truly is agnostic on the issue should allow, for example, map to be used in either a strict or lazy manner rather than having both a strict map and a lazy map. I don’t see an easy way to get this, so for the time being it seems languages have to pick sides.

In a similar vein, one of the nice features of Disciple is that its effect system ends the duality in Haskell between pure and impure functions. For instance, in Haskell you have a map which works on pure functions and a mapM which works on monadic functions. No such division exists in Disciple.