Every Day We’re Shuffling…

A friend of mine is working on a shuffling algorithm. Here’s what he’s got:

int *shuffle2(int numItems)
  //int index[numItems];
  int *randomIndex = malloc(sizeof(int)*numItems);
  int i,j;
  for(i = 0; i < numItems; i++)
      //index[i] = 0;
      randomIndex[i] = -1;

  int randomNum = rand()%numItems;
  for(j = 0; j < numItems; j++)
      while(randomIndex[randomNum] != -1)
	  randomNum = rand()%numItems;//random(0, numItems-1);
      //index[randomNum] = 1;
      randomIndex[randomNum] = j;
  return randomIndex;

It has certain problems. I’ll cover those later. After I figure them out.

But until then, I will study Chi-Squared tests.


Posted in Uncategorized | Leave a comment

WikiRank (using PageRank algorithm)

So for grins I implemented pagerank on Wikipedia (this was actually last month). I figured I’d share my results (and code, although it’s kind of a hack job) in case anyone was curious how it turned out.


The code is here: http://pillow.rscheme.org/pagerank-src.tar.gz


There are two folders: pagecount/ contains code to download the November page view data and aggregate it. pagerank/ contains all the code to parse the Wikipediaenwiki dump, parse it, and run pagerank on the result.


If you run:

gcc -O2 sparsematrix.c pagerank.c main.c -lpthread -lm -o pagerank

Inside pagerank/ you will get an executable file called “pagerank”, which will perform the pagerank algorithm on an arbitrary network specified in the file “network”. There’s a small test network in the file testdata, which is the one Dr. Cline showed us in class (in the slides). It’ll output the pagerank data as a vector, one element on each line, into the file vector-out.


The file toolchain.sh shows how I ran it. On the Wikipedia data you need about 20GB of disk space and 16GB of RAM to run everything comfortably. It ran in about 2-3 hours.


Final results are here, in a CSV file: http://pillow.rscheme.org/wikirank-results.csv.gz (warning: 286MB compressed, a 20 minute download or so). Also, it’s bigger than most spreadsheet programs will accept.


There are three columns, the name of the article, the expected rank according to pagerank, and the actual rank based on page view data from November 2014.


The pagerank algorithm worked by multiplying the markov matrix by a vector 1000 times. I’m not sure if that’s enough, but the eigenvector wasn’t changing by a measurable amount (I measured the angle between successive iterations), so I assumed it was good.


Okay, now that the technical stuff is over, now some pictures. I made a plot, where X was the expected rank from pagerank and Y was the actual rank in November, then made a density chart out of that. In this image, red means more points and blue means fewer points:

Odd little graphic (The Z axis is the log of the number of points in the cell). You would expect it to be square, except that a lot of pages didn’t have empirical pagerank data attached to them, so they had to be cut out. The scale is 512 = 16,000,000


Ideally, if the pagerank algorithm perfectly predicted the actual page ranks, then the graph would be white except for a bright red line along Y=X.


We don’t see that, exactly. We see a sort of plume in the lower left corner, which in general follows a line. This plume shows that pagerank in general got the higher-ranked pages right, and in general didn’t guess that unpopular pages would be popular or vice versa.


Over towards the right, we see a few bands of lighter blue. These are presumably because certain pages on Wikipedia aren’t linked to often, and aren’t visited very often either (it’s not hard to think of an example or two). I can imagine there are some clusters of pages which are like this – perhaps bridges in Iran, or Gymnastics competitions in China. These clusters would form those vertical bands.


Anyway, here’s a zoomed-in image of the plume:

As you can see, it’s slightly more likely that a page that’s popular in reality gets ranked lowly by the pagerank than it is for a page that pagerank expects to be popular gets rankly lowly in reality. (remember, lower ranks are more popular)


I would imagine that a lot of the error would be because Wikipedia’s traffic is driven primarily by what’s happening in the news, rather than networking effects like the internet’s traffic.


Anyway, I guess the result of all this is that pagerank actually works. It may still be magic, but it’s magic which actually works. Also a working sparse matrix pagerank program, if you ever need one.

Posted in Uncategorized | Leave a comment

Farewell, 2014

Happy New Year! (in about 30 minutes, at least)

Goodbye, 2014. It was nice knowing you, but frankly I’m glad we’re moving on.

2015 should be a blast. Let’s wait and see!


Posted in Uncategorized | Leave a comment

Destination: Italy


Average food consumption per person per day: 5.26 lbs

Average water consumption per person per day: 4.18 lbs

Average mass consumption per person per day: 9.44 lbs

Someone will consume 94 pounds over the course of 10 days. Give or take.

Hypothetically, if you had to ship a friend to Italy, it would take 10 days by UPS, and the box would weigh roughly 224 pounds. If they fit inside a 3 foot cube box, which would be a bit cramped but survivable, then the billable weight is 282 pounds. As long as you stay under that, the trip from Austin, TX, USA to Salerno, Italy would cost $553 and take about 8 days.

That’s assuming that the person in question was worth 1 dollar. The EPA values a human life at $5.5 million dollars, but UPS won’t insure that much.

If I value my shippable friend at $1 million, the price rises dramatically to $8996. An airline ticket costs about $1500 from the US to Rome, so the break even point there is when the person in question was worth $120 thousand. Then the air freight cost would equal $1500 or so also.


Posted in Uncategorized | Leave a comment

Spooktacular Pillow Renovations!

And by “renovation” I really just mean “it’s back to working.”

Have a great rest of October!


Posted in Uncategorized | Leave a comment

Signed Integer Arithmetic Doesn’t Always Overflow

I’m in a computer architecture class at UT. This is fine and dandy. Recently we got a homework/lab, and some people started noticing that one of the reference functions didn’t perform correctly on all machines. Here’s the offending code:

int test_fitsBits(int x, int n)
  int TMin_n = -(1 << (n-1));
  int TMax_n = (1 << (n-1)) - 1;
  return x >= TMin_n && x <= TMax_n;

This function is supposed to return 1 if the number x can be represented as a two’s-complement number with n bits, and 0 otherwise. For instance, test_fitsBits(3, 2)=1 and test_fitsBits(-3, 2)=0

Can you spot the error? It took me an hour to determine what the error was and then another hour to determine that the error wasn’t a compiler error. If you enjoy hunting for bugs like you enjoy a good murder mystery novella, read on. Otherwise, you might want to stop here, because this is the tale of how I learned that two’s complement is not required by C spec.

So I’m getting errors on Ubuntu when I compile with clang3.5 using “-O -Wall”, but not when I use gcc 4.8 on the same machine. So there’s gotta be some sort of optimization thing going on there.

Oh, but check this out. From tests.c, this is how GCC compiled the fitsBits function:

        leal    -1(%rsi), %ecx
        movl    $1, %eax
        sall    %cl, %eax
        movl    %eax, %ecx
        leal    -1(%rax), %eax
        cmpl    %eax, %edi
        setle   %al
        negl    %ecx
        cmpl    %ecx, %edi
        setge   %dl
        movzbl  %dl, %edx
        andl    %edx, %eax

And how LLVM did the same thing:

test_fitsBits:                          # @test_fitsBits
# BB#0:
                                        # kill: ESI<def> ESI<kill> RSI<def>
        leal    -1(%rsi), %ecx
        movl    $1, %eax
                                        # kill: CL<def> CL<kill> ECX<kill>
        shll    %cl, %eax
        movl    %eax, %ecx
        negl    %ecx
        cmpl    %edi, %eax
        setg    %al
        cmpl    %ecx, %edi
        setge   %cl
        andb    %al, %cl
        movzbl  %cl, %eax

The code was, of course:

int test_fitsBits(int x, int n)
  int TMin_n = -(1 << (n-1));
  int TMax_n = (1 << (n-1)) - 1;
  return x >= TMin_n && x <= TMax_n;

So the LLVM code is setting %al to 1 if %edi > %eax. Then it sets %cl if %ecx >= %edi. Then it ANDs %al and %cl and returns the result.


  • %edi is the argument x
  • %cl (and %ecx) is the argument n – 1 (via the lea instruction)
  • sets %eax to 1<<%cl
  • copies %eax to %ecx
  • (arithmetically) negates %ecx (so %ecx = ~%ecx + 1)
  • sets %al if %eax > %edi (this is the “x <= TMax_n” comparison)
  • sets %cl if %edi >= %ecx (this is the “x >= TMin_n” comparison)
  • returns %al AND %cl

GCC, on the other hand:

  • %edi is the argument x
  • %cl (and %ecx) is the argument n – 1 (via the lea instruction)
  • sets %eax to 1<<%cl
  • copies %eax to %ecx
  • Subtracts 1 from %eax (via the lea instruction)
  • sets %al if %edi <= %eax (this is the “x <= TMax_n” comparison)
  • (arithmetically) negates %ecx (so %ecx = ~%ecx + 1)
  • sets %dl if %edi >= %ecx, and copies the result to %edx padding with zeros (this is the “x >= TMin_n” comparison)
  • returns %edx AND %eax.

So let’s get down into the nitty-gritty. What are the differences?

  • GCC subtracts 1 from %eax before checking if %edi <= %eax.
  • LLVM doesn’t subtract 1, but instead checks for %eax > %edi.

Which are the same (the case where %eax is Tmin will never happen).

Hrm, that didn’t pan out. Well, time to look at the errors. The errors that occur with LLVM only occur when n==32==0x20.

Time for some traditional debugging. I added this printed right before the return in fitsBits:

printf("%i %i %i %i\n", TMin_n, TMax_n, x>=TMin_n, x<=TMax_n);

And what do we get? The errors still occur in LLVM, so that’s a good sign. However, check this out:

-2147483648 2147483647 1 0
Test fitsBits(1[0x1],32[0x20]) failed.
  Gives 1[0x1].  Should be 0[0x0]

So it’s checking to see if 1>=-2147483648 and getting 1, which is good. Then it checks to see if 1<=2147483647, but it’s getting 0. What gives?

Alright, back to the assembly output. Some differences between them:

  • LLVM’s check is setting %al if %eax > %edi.
  • GCC sets %al if %edi <= %eax.

So let’s say n is 32. Then %cl is 31. %eax becomes 1 shifted 31 times, or 0x80 00 00 00. This is where trouble begins – keen eyes will notice that this is a large negative number. INT_MIN, to be exact. GCC subtracts 1 from this number to get 0x7F FF FF FF, which is a large positive number – in fact, all 32 bit numbers are less than this. However, LLVM optimized away the -1 and did the comparison directly against a large negative number.

This leads us to the simplest reproducible manifestation of the bug: Subtracting 1 from a number, and checking to see if it’s <= something. Like so:


int fn(int x, int n)
  int v = x - 1;
  return n <= v;

int main(int argc, char **argv)
  int x = INT_MIN;
  int n = 14;
  if (fn(x, n)) {
  } else {
  return 0;

And the output:

lane@beryllium:~/ut/fall2014/CS429/lab1$ gcc -O -Wall llvm-bug.c 
lane@beryllium:~/ut/fall2014/CS429/lab1$ ./a.out 
lane@beryllium:~/ut/fall2014/CS429/lab1$ clang -O -Wall llvm-bug.c 
lane@beryllium:~/ut/fall2014/CS429/lab1$ ./a.out 

So this could be either a weird quirk in the spec, or a bug in the LLVM compiler. We check without using the -O flag:

lane@beryllium:~/ut/fall2014/CS429/lab1$ gcc -Wall llvm-bug.c 
lane@beryllium:~/ut/fall2014/CS429/lab1$ ./a.out 
lane@beryllium:~/ut/fall2014/CS429/lab1$ clang -Wall llvm-bug.c 
lane@beryllium:~/ut/fall2014/CS429/lab1$ ./a.out 

Hrmm… Looks like the C spec is on our side here. Unless there’s something undefined, as Mr. Selfridge pointed out could be the case.

Unfortunately, the C99 specification costs 198 Swiss Francs to download, and I have 0 Swiss Francs.

Fortunately, things happened and my formal verification friends lent me their copy. At least that’s the story I’m going by.

Quick digression: C99 only specifies up to 63 nested levels of parenthesis within a full expression or declarator. Or 63 significant initial characters in an internal identifier or macro name. So if you have two macros, each named 64 a’s followed by a “1” or a “2”, respectively, then according to the C99 specification they are the same name. Also, you can’t have more than 1023 members in a single structure or union.

Un-digress. Note that the C99 spec limits for ints are that ints are at least 16 bits, longs are at least 32 bits, and long longs are at least 64 bits.

Also note that the unsigned overflow is defined as being the modulo. We knew that.

If you have a copy of ISO/IEC 9899:1999, you can follow along. Note in section that two’s complement is not required for representing integers. If we go back to section we see that the limits for int are -32767 to 32767, where a 16-bit two’s complement representation would be -32768 to 32767.

According to the definition of addition (6.5.6(5)), “the result of the binary + operator is the sum of the operands.” Thank you, I didn’t know that before I read that. I feel enlightened now.

But then I found my search button, and went to 3.4.3(3), an example of undefined behavior. They say that “an example of undefined behavior is the behavior on integer overflow.”

Go back to 6.2.5(9), which says that “a computation involving unsigned operands can never overflow.”

Go to the way bottom, section H.2.2(1) states that “The signed C integer types […] are compatible with LIA-1. […] An implementation that defined signed integer types as also being modulo need not detect integer overflow.” (emphasis added)

Alright, so what’s the take? C99 doesn’t define how signed integer overflow works. So all that stuff we learned in class may or may not apply, and you just need to know when it does and when it doesn’t. Not that anyone’s read this far.

Going to the “C Extensions to the C Language Family” page for GCC, we learn that they do not define signed integer overflow either.

LLVM also doesn’t define it.

So it’s not a compiler bug. Mr. Selfridge was correct. I guess if you’re a part of the formal verification group you get good insight into these things.

And so we see that GCC also has the same bug:

lane@beryllium:~/ut/fall2014/CS429/lab1$ gcc -O1 -Wall llvm-bug.c 
lane@beryllium:~/ut/fall2014/CS429/lab1$ ./a.out 
lane@beryllium:~/ut/fall2014/CS429/lab1$ gcc -O2 -Wall llvm-bug.c
lane@beryllium:~/ut/fall2014/CS429/lab1$ ./a.out 

And so, from the analysis above, I suggest that fitsBits in test.c be changed to the following code for uniformity across C optimizers:

int test_fitsBits(int x, int n)
  int TMin_n = -(1 << (n-1));
  int TMax_n = (1 << (n-1)) - 1;
  if (n >= (sizeof(int)<<3)) return 1;
  return x >= TMin_n && x <= TMax_n;


Posted in Uncategorized | Leave a comment

Flickr Image Resolutions

Out of a sample of 6511 random cat images taken from Flickr, 640×640 represents 31% of all image resolutions.

Below is my chart covering the cumulative distribution of the 731 distinct image resolutions:


As we can clearly see, 10% of resolutions make up 75% of images on Flickr.

Also, the average size of a Flickr image is 1.7MB. They recently announced that they have 6 billion images, for a total of 10PB of images. 10PB of images would use 1,700 WD60EFRX 6TB drives, which retail for $300 each (and yes, you can buy 6TB drives off the shelf). The total cost would be $510,000 for a single set of drives, not counting the computers to run them, and different image resolutions, and replica sets, and so forth. Google did a study in 2007 which found that hard drives which are constantly spinning have an expected lifespan of around 5 or 6 years (you have to do some extrapolating to get that, and it’s an estimate). That comes out to 24 expected failures per month, or $7,200 per month in replacements.

We can assume that Flickr has around 10 datacenters, as a sort of fermi approximation (and CDN caching, etc). If the computers to run the drives cost about as much as the drives itself (which seems reasonable), then we get a total bill of $10 million to setup all of the datacenters, plus $72,000 per month in replacement drives, plus $1 million/yr in staff and $1 million/yr in housing.

Amazon S3 sells data for $0.0275/GB/month over 5PB/month. That comes out to $280,500 per month, or $3.4 million per year. Slightly cheaper than the in-house datacenter.

What is the point of all this? Compression. Every 1% of compression that they can eke out of their images saves them at least $27,000 per year, and that’s a number that’s only growing (unless Flickr folds). So if you can manage to make a JPEG 3% smaller, you could make a hundred grand a year for the rest of your life off the savings.

Flickr is but a small fish in a big pond. According to Quora, Facebook currently has 90 billion pictures, 15x the size of Flickr, and adding 6 billion pictures per month. Someone is uploading all of Flickr to Facebook every month. Using our numbers from above, Facebook spends $5 million per month on new hard drives to put pictures on, not counting replacements. A 1% improvement saves them $50,000 per month.

Something to think about, next time you run a large datacenter.


Posted in Uncategorized | Leave a comment

Dear Prior Owner of Eric Foner’s “Give Me Liberty!” Second Edition Textbook

Dear Prior Owner of Eric Foner’s “Give Me Liberty!” Second Edition Textbook:

Your note was received intact, whoever you are. And you’re right.

photo (4)It did make me smile. Thank you.


Posted in Uncategorized | Leave a comment

REST Performance

Rest performance: It’s a measure of how much rest you can get in a given span of time.

REST performance: It’s a measure of how much REST you can get in a given span of time.

Similar, but subtly and importantly different. My tool will measure the second, but not the first.


Do, check it out. Here’s the sample perf.def.js file, broken up into bite-sized chunks with explanations.

var perf_utils = require(PERF_UTILS_PATH);
var mkpoisson = perf_utils.mkpoisson;

Note that the .def.js file is just a NodeJS file, like any other. It gets called with the global PERF_UTILS_PATH variable, which defines the path to the perf utils file. This file contains several one useful utilities, such as mkpoisson. You’ll see how this is useful later.


var ClientInstance = function() {
    this.client_id = "";

This defines what consistant state is. perf-rest works by maintaining a whole bunch of instances, and moving them between different states according to rules that we’ll define below.  One way to think of this is sort of like cookies on a web browser – your web browser may make multiple requests to a web site, but the cookies stay roughly constant. For example here, I’ve defined a variable “client_id” which is a unique ID which the server gives to us.


var ClientRequests = {
    "base": {
	hostname: "localhost",
	port: 3000,
	headers: {
	    "Content-Type": "application/json"

This is where we start defining the different possible requests that can be made to the REST server in question. Note that this is not a complete request: it is only a base request which other requests build upon. Much like an object-oriented language (I realize that my audience is going to be more and more Java-oriented, sadly, so you can think of it like Java where classes can subclass other classes).

    "authenticated": {
	oncall: function(state, params) {
	    this.headers = {client_id: state.client_id};
	parent: "base" // The parent request

Woah! I was just talking about subclassing requests! And here we are. An “authenticated” request has all of the fields of a “base” request.

It also has the oncall function defined. This function is run when the request is run, and is given the instance state. This state is used to create a header. After the oncall function is run, the headers are merged with the headers defined in the base request. Headers are the only thing that are merged; everything else is overwritten. For instance, if the authenticated request had specified a hostname, then it would overwrite the one in it’s parents.

    "login": {
	parent: "base",
	path: "/login",
	method: "GET",
	onfinish: function(res, state) {
	    state.client_id = res.client_id;
	    //console.log("We are client "+state.client_id+"!");

Notice the new onfinish function. This function is called with the parsed response from the request. Note that it’s parsed as JSON: If the remote server doesn’t respond in JSON, then that’s so sad for you. File an issue on Github and I’ll fix it.

The onfinish function, as you can see here, can be used to update state from the server’s response.

    "perform_expensive_action": {
	oncall: function(state, params) {
	    var delayTime = Math.random()*10.0;

	    // Look! You can specify these in the oncall function
	    this.path = "/delay";
	    this.body = {delaytime: delayTime};
	parent: "authenticated",
	// path is specified in the oncall function
	method: "POST"

Hey, you figure it out. Let’s move on to how these requests apply.


…not like Alabama or Texas. Like Markov.

var ClientStates = {
    "preinit": { // Stores the delay for before init is called
	delay: mkpoisson(60.0)

There are three special states, preinit, init, and exit. You will see each of these.

Preinit specifies the delay between when the program starts and when the init state is called. Note here the mkpoisson we noticed before. “delay” can be either a number or a function: If it’s a number, then that exact literal number of seconds is used as the delay. If it’s a function, then that function is called and the return value is used as the number of seconds in the delay. What mkpoisson(x) does is return a function which returns a poisson random number centered on x. That is, by setting delay to be “mkpoisson(60.0)”, we have said that the delay for preinit is a poisson-distributed random number centered on 60.0.

    "init": {
	request: "login",
	transition: [{ // Specify the transition after the request
	    prob: 1.0,
	    dest: "delay_action"

The second of the three special states. When an instance enters the “init” state it is said to be “active” or “alive”, and appears in the output’s instance (“client”) count.

That’s the only special thing about init. Otherwise, it’s just a normal state. “request” specifies which of the above requests gets used. The request gets sent once, when the state is entered. If you don’t want to send a request, use the placeholder “noop”.

“transition” defines which state we transition to after the request is completed. It is a list of probabilities, delays, and destinations. One of these objects is picked out of the list based on the flat probability distribution defined by the “prob”. Note that the sum of the “prob” fields must add to 1, or else undefined behavior occurs.

Once one of the paths is selected, a delay is inserted according to the “delay” field. Obviously, in this instance, the “delay” field is omitted, so the delay is taken to be zero (that is, an immediate transfer to the next state). The “destination” field specifies the name of the state to move to.

    "delay_action": {
	request: "perform_expensive_action",
	transition: [{
	    prob: 0.95,
	    delay: mkpoisson(30.0),
	    dest: "delay_action"
	}, {
	    prob: 0.05,
	    dest: "exit"

Here, we can see the delay field in use. After the request “perform_expensive_action” is performed, there is a 95% chance that after a delay (defined by a poisson distributed random number centered at 30.0) the instance in question will move to the “delay_action” state. Which, by happenchance, is this state. So it’s a loop. With a 5% probability, the instance will immediately move to the “exit” state. While fairly self-explanatory, I feel I should spell it out. When the exit state is reached, the instance is no longer included in the output’s count of instances/clients, and nothing more can be done.

One last thing before I finish this file.

module.exports = {
    Requests: ClientRequests,
    States: ClientStates,
    Instance: ClientInstance,
    numClients: 1000

This exports the requests, states, instance definition, and the number of clients.


Here I’ve described in pretty good detail my perf-rest project’s definition files. The actual usage of the program is described on the Github page. If you have any questions or comments, please feel free to email me or track me down or file an issue with the Github issue reporter. Also, pull requests are cool. Or requests for support. Or just drop a line and say you used it.


Posted in Uncategorized | Leave a comment

3d Photoshop

Recently, a group of researchers at CMU developed a program which uses off-the-shelf 3d models to edit images in 3d. I’m terrible at explaining, so here’s a link to the thing: (also so I don’t lose it…)


Also, new panorama!


The camera mount changed to be a servo ziptied to a block of wood. My suspicion is that the wood absorbs most of the high-frequency interference we saw with the prior panorama.


Posted in Uncategorized | Leave a comment