
In Growing a Data-Driven Startup – Part 1, I discussed using MongoDB instead of a traditional SQL-based RDBMS to aggregate and process statistical data for my startup, Codiqa. That post ended with a rather long script that summed click counts and landing page signup counts to produce conversion rates for the various ad campaigns and referral sources we have been using. In this post I've modified that script to use MongoDB's Map/Reduce features, which cuts down on cruft and makes it more idiomatic.

The new script is here:

var landings = db['landing'];

var before = new Date();
var result = landings.mapReduce(function() {
    // Map: emit one count per landing, keyed by its campaign
    emit(this.campaign, { count: 1 });
  },
  function(key, values) {
    // Reduce: sum the landing counts emitted for this campaign
    var reduced = { campaign: key, count: 0, numSignups: 0, conversion: 0 };
    values.forEach(function(value) {
      reduced.count += value.count;
    });
    // Look up how many signups this campaign produced
    reduced.numSignups = db['beta_interest'].count({ campaign: key });

    // Conversion rate as a percentage, guarding against division by zero
    if (reduced.count === 0) {
      reduced.conversion = 0;
    } else {
      reduced.conversion = (reduced.numSignups / reduced.count) * 100;
    }
    return reduced;
  },
  {
    out: {
      replace: 'landing2conv'
    }
  }
);

// The result of the map/reduce operation is stored in the collection named by `out` ('landing2conv')
var resultCollection = db[result.result].find();

var totalLandings = landings.count();
var totalSignups = db['beta_interest'].count();
print(totalLandings + ' landings');
print(totalSignups + ' signups');

var totalConversionRate = (totalSignups / totalLandings) * 100;
print('Conversion rate: ' + totalConversionRate.toFixed(2) + '%');
print('Campaign conversions:');
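resultCollection.forEach(function(obj) {
  var val = obj.value;
  // MongoDB skips reduce for keys with a single emitted value, so those
  // entries still have the raw { count: 1 } shape; fill in the rest here.
  if (val.numSignups === undefined) {
    val.campaign = obj._id;
    val.numSignups = db['beta_interest'].count({ campaign: obj._id });
    val.conversion = (val.numSignups / val.count) * 100;
  }
  print('\t' + val.campaign + ': ' + val.conversion.toFixed(2) + '% [' + val.numSignups + '/' + val.count + ']');
});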
var after = new Date();
var diff = after.getTime() - before.getTime();
print('\nFinished in ' + diff + 'ms');

You'll notice that instead of manually grouping entries together, I've used the emit function to output one entry per landing, keyed by campaign, so that MongoDB groups the entries with the same campaign key for me. The reduce function then sums the click counts for each campaign's landing entries, grabs the total number of signups for that campaign, and calculates the campaign's conversion rate.
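To make the grouping step concrete, here is a minimal sketch in plain JavaScript that imitates what the map and reduce phases do, with no MongoDB involved. The sample arrays and the names mapPhase and reducePhase are made up for illustration; they are not part of MongoDB's API. Unlike MongoDB, which does not call reduce for a key with only one emitted value, this sketch calls it for every key.

```javascript
// Hypothetical sample data standing in for the `landing` and
// `beta_interest` collections.
var landings = [
  { campaign: 'adwords' },
  { campaign: 'adwords' },
  { campaign: 'adwords' },
  { campaign: 'adwords' },
  { campaign: 'twitter' },
  { campaign: 'twitter' }
];
var signups = [{ campaign: 'adwords' }, { campaign: 'twitter' }];

// Map phase: "emit" one { count: 1 } per landing, grouped by campaign key.
function mapPhase(docs) {
  var groups = {};
  docs.forEach(function(doc) {
    (groups[doc.campaign] = groups[doc.campaign] || []).push({ count: 1 });
  });
  return groups;
}

// Reduce phase: sum the counts for one key, then derive the conversion rate.
function reducePhase(key, values) {
  var reduced = { campaign: key, count: 0, numSignups: 0, conversion: 0 };
  values.forEach(function(value) {
    reduced.count += value.count;
  });
  reduced.numSignups = signups.filter(function(s) {
    return s.campaign === key;
  }).length;
  reduced.conversion = reduced.count === 0
    ? 0
    : (reduced.numSignups / reduced.count) * 100;
  return reduced;
}

var groups = mapPhase(landings);
var conversions = Object.keys(groups).map(function(key) {
  return reducePhase(key, groups[key]);
});

conversions.forEach(function(c) {
  console.log(c.campaign + ': ' + c.conversion.toFixed(2) + '% [' +
    c.numSignups + '/' + c.count + ']');
});
// prints:
// adwords: 25.00% [1/4]
// twitter: 50.00% [1/2]
```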

When running a map/reduce job in MongoDB, the output rows can either be returned in memory or stored in a collection. In this case I've decided to output my rows to a collection that is replaced with the results of the map/reduce operation each time the script is run. I enjoy how natural it feels to use collections as an output destination for the map/reduce operation.
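As a rough sketch of the two output choices: the options argument of mapReduce accepts out: { inline: 1 } to get the rows back in the command result, or out: { replace: ... } to write them to a collection as the script above does. The helper readResults is hypothetical, written only to show where the rows end up in each case, and it assumes the result field holds the output collection name, as the script above does.

```javascript
// Two possible values for the options argument of mapReduce.
var inlineOptions = { out: { inline: 1 } };                // rows returned in memory
var replaceOptions = { out: { replace: 'landing2conv' } }; // rows written to a collection

// Hypothetical helper: fetch the output rows whichever option was used.
function readResults(db, result) {
  if (Array.isArray(result.results)) {
    // Inline output: the rows are in the `results` array of the command result.
    return result.results;
  }
  // Collection output: `result` holds the name of the output collection.
  return db[result.result].find().toArray();
}

// In the mongo shell this might be used as:
// var rows = readResults(db, landings.mapReduce(mapFn, reduceFn, inlineOptions));
```

Inline output avoids leaving a collection behind, but the rows have to fit in a single result document, so the collection option is the safer choice as the number of campaigns grows.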

One pitfall of this script is that execution time has gone up quite a bit (about 3x) compared with the original script. Adding an index on the campaign field helped decrease execution time, and there are some other places I could optimize. The map/reduce operation should scale better as the data set grows, and I find that the new script strikes a satisfactory balance of performance, readability, and maintainability.
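For reference, adding such an index might look like the sketch below. The helper indexCampaign is hypothetical, and indexing beta_interest is an assumption on my part here, chosen because the reduce function queries that collection by campaign once per key. Newer shells call the method createIndex, while older ones only have ensureIndex, so the helper tries both.

```javascript
// Hypothetical helper: add an ascending index on the campaign field.
function indexCampaign(collection) {
  var spec = { campaign: 1 };
  if (typeof collection.createIndex === 'function') {
    return collection.createIndex(spec);
  }
  // Older shells only provide ensureIndex.
  return collection.ensureIndex(spec);
}

// In the mongo shell (collection choice is an assumption):
// indexCampaign(db['beta_interest']);
```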

Max

Hi, I'm Max, Co-founder of Codiqa, the easiest way to build jQuery Mobile prototypes. I'd love to talk with you: follow me!
