Thursday, September 3, 2009

How Many Bits Does it Take to Make Me?

After reading Bunnie's article, On Influenza A (H1N1) I came up with the idea for this post. You should definitely read his article, but here's a brief summary:

DNA is comprised of pairs of molecules, each pair can be one of four choices. In computer science lingo, each pair is represented by 2 bits. The H1N1 virus is 26,022 pairs long, or is represented by 3.2 kbytes.

How much data is 3.2 kbytes? Not very much. A compiled Hello World program in C is around 12 kbytes! It takes four times as much data to dump "Hello World" to screen than to kill a human! Here are some other interesting data points:

HIV: ~2.4 kilobytes
One page typed: ~2 kilobytes

The entire HIV genome takes up less space than the cell phone bill I pay each month!

Human DNA: ~730 megabytes
Windows XP Installation Disc: ~580 megabytes
Windows Vista Installation Disc: ~1.9 gigabytes

At some point between Windows XP and Windows Vista, operating systems became more complicated than people!

Fruit Fly: ~40 megabytes
Damn Small Linux: ~50 megabytes

Even the smallest of linux distributions is put to shame by a fruit fly.

Rice: ~105 megabytes
Canine: ~600 megabytes
Puppy Linux: ~103 megabytes

Puppy Linux is actually much closer in size to the genome of a grain of rice than that of an actual dog. Perhaps they should rename the project?

I chose to measure the size of the installation media used for each of these operating systems because this represents the minimum size required to create a functioning system. I felt that this measure is most similar to that of DNA. While your DNA does not describe every single detail about you, it does contain the necessary information to create you.

Disclaimer: I got the numbers for all the different OS sizes and genome lengths just from googling. I am probably wrong on some of them, but unfortunately I forgot to keep track of where they all came from. If you see something wrong with my numbers, let me know. I converted number of base pairs to megabytes by typing the following into google: (# of base pairs) * 2 bits in megabytes

Saturday, August 15, 2009

StackOverflow Experiment Results

Thanks to many of the users for pointing me to the official data dump, available here, I was able to complete my experiment.

I measured, using the number of questions asked containing a specific tag, the activity of various programming languages throughout the week. My hypothesis is: Newer dynamic languages like Ruby and Python will see a rise in questions ask
ed on the weekend while more corporate languages like C# and Java will see a dropoff in activity on the weekend.

My theory is that programmers choose to use languages like Python and Ruby for their personal projects, despite their weaknesses, because these languages are more fun to program in. Since programmers tend to work on these projects at night and on the weekends, they will probably be asking questions related to their projects during these times.

Fortunately, the results supported my hypothesis. A plot (made u
sing Python) of the relative number of questions asked per day of the week is shown below. The values were computed by calculating the percentage of questions asked for each topic relative to the total number of questions being asked. This controls for the overall drop in traffic to on the weekends.

Python and Ruby both have a sharp rise on the weekend, while C# and Java both fall off. The fall of C# is quite a bit more pronounced than that of Java, but the effect is still clear. Another interesting note is that the two "workweek languages" both have a rise in activity on Mondays. Maybe programmers leave work Friday and continue to mull over problems at work during the weekend, then ask their problems early Monday morning.

Even though the relative activity of Python and Ruby rises on the weekend, it is still important to note that C# still sees activity around three times higher. This shows that there are still more people using C# than Python on the weekend, just not as many as during the week.

I'm not too sure exactly what the implications of these results are. Let me know what you think.

Wednesday, July 29, 2009

StackOverflow Experiment

I've been an active member of for a little over a month now. I've gotten some great help from the community on a variety of problems, both work related and personal. This gave me an idea to run a little experiment, piggy-backing off of the SO community.

The goal of this experiment is to determine if certain computer languages are used primarily at work, while others are used primarily for freetime/personal projects. Basically, I want to validate the stereotype that languages like C++ and Java are used in corporate settings while programmers tend to choose languages like Python or Ruby for their own projects.

I am attempting to measure this based on the assumption that most work-related projects are worked on during the week from around 9AM-5PM, and that freetime projects are worked on during evenings and on weekends. I created a script that will assess the number of questions asked on a topic at various times throughout the day and week.

My hypothesis is that the number of questions asked per hour on languages like Ruby and Python will be higher on evenings and weekends than during the work week, while the number of questions asked per hour on C++ and Java type languages will be highest during the work week.

I am going to define the work-week as 6am to 8pm Eastern Time, Monday through Friday. I feel that this time range is broad enough to capture work times across the United States, but not too broad to include much evening time for the East Coast. I also realize that the total activity on SO can vary throughout the day, so I am measuring the number of total questions asked as well to act as a control.

I also acknowledge that the number of questions asked on a topic is not the very best measure of a topic's activity. However, people do usually ask questions about what they're currently working on. I might post statistics for both questions asked and page views.

I'll have the results in a day or so, whenever I get the time to make the pretty charts.

Friday, July 17, 2009

Semantic Web Blues

At a talk I attended last week, Ralph Swick from the W3C described the current state of Semantic Web technology, and where the W3C would like to take it. The talk was great, but a couple of things Ralph said really stood out as problems in taking the Sematic Web forward.

While describing how the Sematic Web works, Ralph used the phrase "One man's metadata is another man's data." This really struck me. The metadata that we generate automatically while taking pictures on our cameras, saving documents in Word and reading emails can be incredibly valuable to the Sematic Web.

An image of a building is not that useful on it's own, but when you add the name of the photographer, the time the image was taken and the exact Lat/Long coordinates of the camera, a computer might be able to figure out what the name of the building is. Standardized meta data like this is going to be key in making the Semantic Web useful.

Unfortunately, the culture of the web today doesn't recognize this. Metadata is considered useless. Companies like Google and Yahoo even reccomend stripping it from images to decrease page loading times. Unfortunately, the cost of moving a couple extra bits over the wire outweighs the context gained from knowing where an image was taken.

This culture of minimization on the web has to change before the Semantic Web can take off. Next time you start to strip the metadata from your files to save space, remember that one man's garbage is another man's treasure. The few bytes you're throwing away could be incredibly useful to someone else. With today's hard drive prices, keeping an extra 10 or 100 megabytes around isn't costing you very much.

Tuesday, July 14, 2009

G-Code Interpreter

As part of my 3D-Printer project, I'm writing a G-Code interpreter to run on an Arduino. My machine will support a fairly limited subset of G-Code, as well as a few extra commands specific to my machine.
  1. Using the partial G-Code spec available from LinuxCNC, I developed a rudimentary parser. Here's the control flow of the parser:
  2. Split G-Code file into lines (blocks)
  3. Search each block for valid words in specific order: X, Y, Z, S, F, T, M, G
  4. Handle each word appropriately (set feed, set speed, move, etc.)

There are some notes here:

Location values must be maintained statically so code like this will run:
G01 Z1.0 F.05 X0.0
Speeds and feeds must be read first so code like this will run:
G01 Z1.0 F.05
Since my printer only has one toolhead, I changed the T word to represent temperature. I realize this is an awful and confusing decision, and will probably change it eventually. Here's my (messy) code.
void  Printer::parseBlock(char *string)
//Location must be maintained
static Location location = Location(0,0,0);

//Control words: Feed rate, extrusion speed and temperature
int s = 0;
int f = 0;
int t = 0;

//G Mode is persistent
static int g = 0;

//Machine Option is not persistent
int m = 0;

//Search string for accepted codes, then execute
//order is important, be careful when changing

char *p = string;

//Position First
if ((p = strpbrk(string, "X")))
sscanf(p, "X%f", &(location.iX));
if ((p = strpbrk(string, "Y")))
sscanf(p, "Y%f", &(location.iY));

if ((p = strpbrk(string, "Z")))
sscanf(p, "Z%f", &(location.iZ));

//Speed, Temp and Feed
if ((p = strpbrk(string, "S")))
sscanf(p, "S%d", &s);

if ((p = strpbrk(string, "F")))
sscanf(p, "F%d", &f);

if ((p = strpbrk(string, "T")))
sscanf(p, "T%d", &t);

//Operation Codes (M)
//Conflicts are resolved by order
//AKA: conflicts are not resolved
p = string;
while ((p = strpbrk(p, "M")))
sscanf(p, "M%d", &m);
//Move to next spot

//G Codes
//Conflicts are resolved in order, AKA not resolved
p = string;
while ((p = strpbrk(p, "G")))
sscanf(p, "G%d", &g);
G(g, location);


Thursday, July 9, 2009

Simmons LED Display Part 4: Django Web App

This is part four of my series on the Simmons LED Display. I'm going to describe the implementation of the web-based front end.

Here are some of the specs I wrote up for the web applic
  1. Users go to the Simmons LED website to enter messages.
  2. The Simmons LED website tells the user when their message will be displayed.
  3. The Simmons LED server displays the message at the given time for a fixed time (2 minutes).
  4. Only Simmons residents will be able to post messages.
  5. Messages will be profanity/obscenity/inappropriate filtered.
It's a pretty simple idea, the only non-straightforward aspect is some misdirection in the form of a database. The database allows for separation betw
een the display code and the web page code. Here's the flow of the program:
  1. A user visits the Simmons LED website.
  2. The user is allowed access if they are a Simmons resident.
  3. A message gets entered to the website.
  4. The website profanity filters the message.
  5. The website saves the message, with the current time, into the database.
  6. The website asks the database how many messages there are in line.
  7. The website tells the user when their message will be displayed.
  8. The display service repeatedly checks the database for new messages.
  9. The display service displays the oldest message for two minutes, then marks it as displayed.
The purpose of the database is to buffer the inputs, in case tons of users post messages all at once.

The Django code is shown below:
Note: this code doesn't contain the profanity filter. I am planning on adding this as a validator in the message model.

The (very messy) html templates are coming soon.

Wednesday, July 8, 2009

3D Printing

Even though 3D printing seems like something out of a science fiction novel, the technology has been around for quite a few years now. Machines exist today which can take computer-generated models such as the mouse shown below (from Wikipedia), and turn them into real-life objects.
The quality of these printed objects is on par with most other manufacturing and prototyping processes and their ease of use is unparalleled. Companies like Dimension, Objet and Desktop Factory all make and sell plug and play 3D printers.

If this technology is so great and easy to use, why doesn't everyone have a 3D printer? The problem is cost. The cheapest printers made by Dimension and Desktop Factory cost around $5000! These companies are advertising these machines as breakthrough devices, and yet they cost 10% of the average US household income.

These companies need to take a step back if they want to put a 3D printer in every home in America. A (completely unscientific) survey I conducted recently on the Amazon Mechanical Turk suggested that people would jump at the opportunity to buy a desktop 3D printer in the price range of $200-$500, even if it meant lower print quality than the existing machines.

A device like this would be invaluable for home use. Coupled with an online object library such as Thingiverse and the rise of easy to use 3D CAD software like Google Sketchup, home repairs will be easier than ever. The low cost would make itself up in fewer trips to the hardware store in no time. Need a new kitchen utensil? Break the battery cover off of your TV remote? No problem. Click print and you're all set.

This is the exact goal of the RepRap project: to put an inexpensive 3D printer in every home. This group has made tremendous progress in designing and building self-reprodudicing 3D printers: printers that can print other printers. This idea is pretty alluring; imaging building a device that can build a copy of itself. The growth would be exponential.

The only problem is that the technology to print parts like electronics and motors simply doesn't exist yet. This means that at best, the RepRap can currently print around 60% of its own parts. Unfortunately, a 3D printer that can print 60% of its parts is just as useful as one that can print 0% of its parts. Until this technology arrives, I feel the RepRap will be stuck in engineering labs and hobbyists basements.

This is where a project I've been working on since last January comes in. Geoff Tsai and I have been working to design and build a low-cost printer ($200-$500), similar to the RepRap, that will be manufactured using traditional techniques rather than 3D printing. We built our first prototype in May and were successful in printing a few small test parts. I'm working on improving our software and improving the reliabilty of our machine.

I hope to have a second prototype completed by the end of the fall. More updates will be coming!

Tuesday, July 7, 2009

Gmail Out of Beta

As a Gmail user for the past four years, it's pretty hard to believe that Gmail is actually out of beta. I guess it's a good thing Google decided not to wait until Duke Nukem Forever came out. I wonder what this means for the future of the internet? Regardless, I'll be using Gmail with the Beta tag enabled. I fear change.

Monday, July 6, 2009

CNC Line Following Algorithm

This is a temporary break to my series on the Simmons Hall LED Display. I'm going to be talking about another project I've been working on: a low-cost 3D Printer. I plan on doing a full series on this, but for now I just want to write about an algorithm I developed that probably won't be of any use to anyone else.

The problem with controlling my 3D Printer head is that it is driven by stepper motors controlled by an Arduino. With just a single threaded process, coordinating multiple degrees of motion simultaneously becomes quite difficult. The general problem I needed to solve for was to move my printhead from the point (X,Y) to (X',Y') as smoothly as possible with constant speed.

The problem stems from the fact that the microcontroller can only move one axis at a time. Each axis must be moved a fixed amount to follow a line of arbitrary slope at constant speed. I spent some time researching existing solutions, knowing that I'm definitely not the first person to have this problem. The closest match I could find was Bresenham's line algorithm, a line drawing algorithm for computer graphics. The image below shows the result of this algorithm (from Wikipedia).

Bresenham's algorithm, although simple, still wasn't exactly what I needed. Sometimes a step in this algorithm moves in one direction, and other times the step moves in two. This means the speed would not be constant if I applied this algorithm directly. Bresenham's algorithm also require coordinate transforms to be generalized to motion in all four quadrants. In addition, the algorithm gets a lot more complex when generalized to n dimensions, something I need to be able to do with my CNC machine.

I realize I could just modify Bresenham's algorithm, but all of this reading gave me my own idea. Here's the summary of my algorithm, in two dimensions:
  1. Treat each step as a decision
  2. The options are each direction: up, down, left, right
  3. Take a fake step in each direction from the current position
  4. Compute the cartesian distance from each new position to the goal
  5. Take a step in the direction that minimizes this distance.
I'm sure there are a ton of ways to optimize this algorithm. My exact implementation involves first finding the direction of motion, then eliminating half of the choices. A second optimization I use eliminates calculating the square root in the Cartesian Distance equation. Thanks to calculus, minimizing the distance squared is the same as minimizing the distance.

Here's some code in Python to run the algorithm:
def stepFromCurrentToGoal(z0, y0, x1, y1):
#base case
if (x0==x1 and y0==y1):

xIter = 1 if (x1 > x0) else -1

yIter = 1 if (x1 > x0) else -1

#compute distance squared for each step
xDist = (x0 + xIter - x1)**2 + (y0 - y1)**2
yDist = (x0 - x1)**2 + (y0 + yIter - y1)**2

#step to minimize distance
if xDist <= yDist:
print "Step in X From: " + str(x0) + " to: " + str(x0+xIter)
print "Step in Y From: " + str(y0) + " to: " + str(y0+yIter)

#keep going recursively
stepFromCurrentToGoal(x0, y0, x1, y1)
Let me know what you think.

HooPaid on Facebook in Public Beta

A web/mobile application I've been working on for almost a year now is now going into public beta. The app, HooPaid, is a social-mobile transaction management system for groups of friends. The basic idea is this:

You and your friends/roommates/acquantainces go out to dinner a couple times a month. Your friend Joe always forgets his wallet because he comes to dinner straight from track practice, Sue wants to pay with her credit card for the airline miles she uses to go visit her sick grandmother across the country, Dave only carries twenties, etc... At the end of the meal, someone always owes someone else. The goal of HooPaid is to capture these casual transactions and alleviate the stress associated with social lending.

You no longer have to scribble IOUs on napkins and post-it notes, only to lose them and have no idea how much money you're owed or you owe. HooPaid also consolidates transactions between you and your friends, so you only need to worry about the running tab. I plan on posting a series of articles on the implementation of HooPaid, but for now I would really appreciate any help testing the application. You can add the application on Facebook here.

There are a few known bugs that I'm working on fixing, but don't hesitate to report anything you find.

Simmons LED Display Part 3: Software Implementation

This is part 3 of my series of posts on the Simmons LED Display. I'm going to discuss my implementation of the control software used to operate the display.

The software stack is comprised of four parts: a web service used to input messages, a cron-type service used to keep checking for new messages to display, a driver service used to convert the message strings into synchronized pin timings and the microcontroller code used to actually turn on and off the LEDs.

Even though the C/C++ environment of the Arduino is super-easy as far as microcontrollers go, I chose to keep most of the logic on the computer running the web service. This meant I could implement the hard code in Python, making my life much easier.

An overview of the software stack is shown to the left. It might look like a mess: that's because it is one. I'll attach all of the source code files soon, as well as instructions on how to use them.

Coming Next: Django App Sources

Saturday, July 4, 2009

Simmons LED Display Part 2: Hardware Implementation

I chose to use the Arduino microcontroller for this project for many reasons. They're relatively cheap, easy to program and even easier to use. You literally just plug the device into your computer via a standard USB cable and you're ready to program. The Arduinos each have 14 total I/O pins, two of which are used by the onboard serial communications chip. This means you have 12 free pins to play with on each device, if you still want to communicate with a computer.

I needed 36 total pins (6 by 6), and I wasn't confident enough to try to learn how to build a multiplexing circuit, so I just decided to use multiple Arduinos. This made even more sense considering I needed to light up LEDs on multiple floors of my building, and wanted to avoid long cable runs. For symmetry, I bought 4 Arduinos, to have 2 on each floor.

After several unsuccessful attempts to make the power boards on Radio Shack stripboards, I finally discovered Winford Engineering. They carry excellent prototyping products. Some pictures of the completed power boards are shown to the left. Each power circuit is just one MOSFET and one 2 Ohm Resistor. The Digikey part numbers I used are: IRFZ14PBF and 23J2R0E. The schematics are coming soon.


Simmons LED Display Part 1: Architecture

This is the first post of a series documenting the Simmons Hall LED Display I worked on in September 2008. The idea for the project came from an earlier project by Dheera Venkatraman. Over the years, Dheera's display fell apart. I decided to reimplement the 6 by 6 Luxeon LED Display as a way to learn a little bit more about electronics and microcontrollers.

The display spans two floors of Simmons Hall, an undergraduate dormitory at MIT. Housed in the student gym, the display is visible far into Boston at night and is capable of scrolling most alphanumeric characters to display messages.

Just a little warning before I get into the architecture of this project: I know very little about electronics, so most of the decisions I made about the implementation of this display were uneducated guesses and probably represent the worst way to do most things. Now, onto the design.

I wanted the display to be very easily programmable. This meant that anyone should be able to change the message without needing to know how to program a
PIC or rewire the circuit. I also wanted to eventually build a web-interface for the display, so people could change the message in real time without having to even see the circuitry.

I also wanted to keep the electronics simple and expandable in case someone wanted to maintain it later. With these things in mind, the general architecture I arrived at is shown to the left.

I would have a computer connected to multiple microcontrollers, each controlling multiple LED's. The computer's role would be to decide which LED's needed to be turned on and for how long to print a given message. The computer would then tell the microcontrollers to turn pins on and off to form messages. The microcontrollers would act as slaves, listening to the computer for instructions.
Cracking open my copy of The Art Of Electronics, I found a small problem. The Luxeon LEDs I planned on using drew a full amp of power each. No microcontroller I could find could source that much current. I got some help from a couple electrical engineers and designed a power circuit for the LEDs that could be operated by a TTL signal from a microcontroller. The architecture then changed to what is shown to the left.

The microcontrollers toggle MOSFETs on the power boards which light up the LEDs. I wanted to place a power board on each floor to avoid running my own power cables too far.

First Post

I'm planning on using this blog to document what I do for fun as well as to collect my thoughts on the current goings-on in today's engineering world. I'm going to start with a backlog of the projects I've worked on over the past couple years both as an undergraduate at MIT and just for fun. Look for some documentation of my projects coming soon.