Building a pricing screen which reflects local currency

different-currency-notes-money.jpg

On my morning walks, I have been enjoying listening to SaaS related podcasts, and yesterday morning, I was listening to Jane Portman's "UI Breakfast" podcast in which she was talking to Rob Turlinckx about SaaS Pricing pages.

Now, we have just done a revamp of our own HR Partner pricing page, which actually meets most of the suggestions of what they talked about in the podcast, i.e. offering multiple currencies, showing the user's local currency automatically upon loading, showing monthly vs annual pricing etc.

HRP Pricing Page.png

Our page is quite complex (thankfully I have a great UX designer on my team who made it as easy to use as possible), so I thought I would expand upon a couple of things we did to display pricing in various currencies.  Most importantly, I want to show how we detect the user's location and show them the relevant pricing in their own local currency (defaulting back to USD if they are in a location outside of our usual pricing).

What I have done is to put together a simple pricing page on GitHub which you are welcome to explore and dissect.  My aim was to achieve localised pricing (a) with minimal JavaScript code, (b) all on one page and (c) for absolutely FREE - no calling upon expensive landing page or A/B testing services to generate different pricing pages at all!

This is what it looks like (yep, I'm a coder, not a designer!):

TEST Pricing Page.png

Take a look at it running live here: https://cyberferret.github.io/LocalCurrencyPricing/

Here is a Gist of the actual web page code:

<!DOCTYPE html>
<html lang="en">

  <head>

    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
    <meta name="description" content="">
    <meta name="author" content="">

    <title>Demonstration of dynamic currency display</title>

    <!-- Bootstrap core CSS -->
    <link href="vendor/bootstrap/css/bootstrap.min.css" rel="stylesheet">

    <!-- Custom styles for this template -->
    <link href="css/heroic-features.css" rel="stylesheet">

  </head>

  <body>

    <!-- Navigation -->
    <nav class="navbar navbar-expand-lg navbar-dark bg-dark fixed-top">
      <div class="container">
        <a class="navbar-brand" href="#">Widgets Inc.</a>
        <button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarResponsive" aria-controls="navbarResponsive" aria-expanded="false" aria-label="Toggle navigation">
          <span class="navbar-toggler-icon"></span>
        </button>
        <div class="collapse navbar-collapse" id="navbarResponsive">
          <ul class="navbar-nav ml-auto">
            <li class="nav-item active">
              <a class="nav-link" href="#">Home
                <span class="sr-only">(current)</span>
              </a>
            </li>
            <li class="nav-item">
              <a class="nav-link" href="#">About</a>
            </li>
            <li class="nav-item">
              <a class="nav-link" href="#">Services</a>
            </li>
            <li class="nav-item">
              <a class="nav-link" href="#">Contact</a>
            </li>
          </ul>
        </div>
      </div>
    </nav>

    <!-- Page Content -->
    <div class="container">

      <!-- Jumbotron Header -->
      <header class="jumbotron my-4">
        <h1 class="display-3">Hello there!</h1>
        <p class="lead">The pricing shown below should correspond to your location (or default to the US) like the pricing page on our <a href="https://www.hrpartner.io" target="_blank">HR Partner</a> site.</p>
        <a href="https://www.hrpartner.io/pricing.html" class="btn btn-primary btn-lg">See it in action!</a>
      </header>

      <div class="row text-center">
        <div class="col-lg-12 m-2">
          <p>Show me pricing in: </p>
          <div class="btn-group mb-3" role="group" aria-label="Select Currency">
            <button type="button" class="btn btn-secondary" onclick="displayPrice('USD');">USD</button>
            <button type="button" class="btn btn-secondary" onclick="displayPrice('GBP');">GBP</button>
            <button type="button" class="btn btn-secondary" onclick="displayPrice('EUR');">EUR</button>
            <button type="button" class="btn btn-secondary" onclick="displayPrice('AUD');">AUD</button>
          </div>
        </div>
      </div>

      <!-- Page Features -->
      <div class="row text-center">

        <div class="col-lg-3 col-md-6 mb-4">
          <div class="card">
            <div class="card-header">
              <h2 class="text-primary">FREE</h2>
            </div>
            <div class="card-body">
              <h4 class="card-title pricing USD_pricing">USD $0</h4>
              <h4 class="card-title pricing GBP_pricing collapse">GBP &pound;0</h4>
              <h4 class="card-title pricing EUR_pricing collapse">EUR &euro;0</h4>
              <h4 class="card-title pricing AUD_pricing collapse">AUD $0</h4>
              <p class="card-text text-info">Our free plan will suit the Scrooge McDuck's among you.</p>
            </div>
            <div class="card-footer">
              <a href="#" class="btn btn-primary">Find Out More!</a>
            </div>
          </div>
        </div>

        <div class="col-lg-3 col-md-6 mb-4">
          <div class="card">
            <div class="card-header">
              <h2 class="text-primary">Basic</h2>
            </div>
            <div class="card-body">
              <h4 class="card-title pricing USD_pricing">USD $10</h4>
              <h4 class="card-title pricing GBP_pricing collapse">GBP &pound;8</h4>
              <h4 class="card-title pricing EUR_pricing collapse">EUR &euro;9</h4>
              <h4 class="card-title pricing AUD_pricing collapse">AUD $14</h4>
              <p class="card-text text-info">This basic plan should get you kick started.</p>
            </div>
            <div class="card-footer">
              <a href="#" class="btn btn-primary">Find Out More!</a>
            </div>
          </div>
        </div>

        <div class="col-lg-3 col-md-6 mb-4">
          <div class="card">
            <div class="card-header">
              <h2 class="text-primary">Medium</h2>
            </div>
            <div class="card-body">
              <h4 class="card-title pricing USD_pricing">USD $50</h4>
              <h4 class="card-title pricing GBP_pricing collapse">GBP &pound;38</h4>
              <h4 class="card-title pricing EUR_pricing collapse">EUR &euro;42</h4>
              <h4 class="card-title pricing AUD_pricing collapse">AUD $67</h4>
              <p class="card-text text-info">For businesses that really need all the bells and whistles.</p>
            </div>
            <div class="card-footer">
              <a href="#" class="btn btn-primary">Find Out More!</a>
            </div>
          </div>
        </div>

        <div class="col-lg-3 col-md-6 mb-4">
          <div class="card">
            <div class="card-header">
              <h2 class="text-primary">Enterprise</h2>
            </div>
            <div class="card-body">
              <h4 class="card-title pricing USD_pricing">USD $200</h4>
              <h4 class="card-title pricing GBP_pricing collapse">GBP &pound;152</h4>
              <h4 class="card-title pricing EUR_pricing collapse">EUR &euro;171</h4>
              <h4 class="card-title pricing AUD_pricing collapse">AUD $270</h4>
              <p class="card-text text-info">If you have more money than Elon Musk, then this is the plan for you.</p>
            </div>
            <div class="card-footer">
              <a href="#" class="btn btn-primary">Find Out More!</a>
            </div>
          </div>
        </div>


      </div>
      <!-- /.row -->

    </div>
    <!-- /.container -->

    <!-- Footer -->
    <footer class="py-5 bg-dark">
      <div class="container">
        <p class="m-0 text-center text-white">Copyright &copy; Widgets Inc. 2018</p>
      </div>
      <!-- /.container -->
    </footer>

    <!-- Bootstrap core JavaScript -->
    <script src="vendor/jquery/jquery.min.js"></script>
    <script src="vendor/bootstrap/js/bootstrap.bundle.min.js"></script>

    <script>
      $(document).ready(function () {
        $.get("https://api.ipdata.co?api-key=e1173cbb1676c06b2136abfc7ca95e0c10b8ee98623bfbb0c47b0aaa", function (response) {
          var detectedCurrency = response.currency.code;
          displayPrice(detectedCurrency)
        }, "jsonp");
      });

      displayPrice = function(currency) {
        // First, lets hide all the current pricing
        $(".pricing").hide();
        // Is the currency within the valid range of currencies that we wish to show?
        if (["USD", "AUD", "GBP", "EUR"].indexOf(currency) > -1) {
          // If yes, then show the currency
          $("." + currency + "_pricing").show();
        } else {
          // If no, then just show USD pricing
          $(".USD_pricing").show();
        }
      }
    
    </script>

  </body>

</html>

 

If you want the full source code (with the Bootstrap and jQuery libraries etc. so you can test on your own server), then you can clone my code from my Github repository:  https://github.com/CyberFerret/LocalCurrencyPricing

Let's break down the code here.

Firstly, I am using a simple Bootstrap 4 page layout, which has a header block, then 4 columns for the pricing.  If you look at each pricing column though, I have included the 4 currencies that I want to show:

        <div class="col-lg-3 col-md-6 mb-4">
          <div class="card">
            <div class="card-header">
              <h2 class="text-primary">FREE</h2>
            </div>
            <div class="card-body">
              <h4 class="card-title pricing USD_pricing">USD $0</h4>
              <h4 class="card-title pricing GBP_pricing collapse">GBP &pound;0</h4>
              <h4 class="card-title pricing EUR_pricing collapse">EUR &euro;0</h4>
              <h4 class="card-title pricing AUD_pricing collapse">AUD $0</h4>
              <p class="card-text text-info">Our free plan will suit the Scrooge McDuck's among you.</p>
            </div>
            <div class="card-footer">
              <a href="#" class="btn btn-primary">Find Out More!</a>
            </div>
          </div>
        </div>

But have a look at the `collapse` class used in all but the USD pricing in the source code.  What this does is 'collapses' the other currencies so that they are not visible upon page load, instead showing you the USD pricing as a default.

I have also given all the pricing <h4> tags the classes `pricing` and `XXX_pricing` (where XXX is the 3 letter currency code for each locale).  You will see how I use these later to both hide and show the relevant pricing via a simple JavaScript function.

Now let's look at the JavaScript code at the bottom of the page.  There are two blocks we need to look at, namely:

      $(document).ready(function () {
        $.get("https://api.ipdata.co?api-key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", function (response) {
          var detectedCurrency = response.currency.code;
          displayPrice(detectedCurrency)
        }, "jsonp");
      });

This bit of code waits until the page's DOM is ready, then goes off to the IPData.co free service to query the user's locale information, including the currency code associated with their locale.

This information is contained within the JSON response from the IPData service, and you can get to it via the response.currency.code property.

Be warned that this call, even though it is done as a background asynchronous AJAX call, can take a few seconds to return a result - which is why we show the default USD pricing initially, rather than no pricing at all.  This can cause a disconcerting flicker upon page load, so you may want to show NO pricing on your own page as a default, which you can do by adding the `collapse` class to ALL pricing lines initially.  It is entirely up to you.
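One way to soften the slow-lookup problem is to give the lookup a deadline and fall back to USD if it has not answered in time (or has failed, for example because an ad blocker stopped the request).  The sketch below is not part of the demo page: `withTimeout` and `detectCurrency` are hypothetical helper names, and `lookup` stands in for whatever function performs the IPData request and resolves to a currency code.

```javascript
// Resolve with the lookup's result, or with `fallback` if the lookup takes
// longer than `ms` milliseconds or fails outright.
function withTimeout(promise, ms, fallback) {
  var timer;
  var deadline = new Promise(function (resolve) {
    timer = setTimeout(function () { resolve(fallback); }, ms);
  });
  var guarded = Promise.resolve(promise).catch(function () { return fallback; });
  return Promise.race([guarded, deadline]).then(function (value) {
    clearTimeout(timer); // no need to keep the timer once we have an answer
    return value;
  });
}

// `lookup` is a hypothetical function returning a promise of a currency code.
function detectCurrency(lookup, ms) {
  return withTimeout(lookup(), ms, "USD");
}
```

On the page you would then hand the result to `displayPrice()`, for example `detectCurrency(myLookup, 2000).then(displayPrice);`, where the 2 second limit is an arbitrary choice.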

TIP: Just remember to replace the 'xxxxxxxxx' dummy API key above with your free one that you get from IPData.co!

The next bit of JavaScript is the one that manipulates the page DOM to show or hide the relevant currencies:

      displayPrice = function(currency) {
        // First, lets hide all the current pricing
        $(".pricing").hide();
        // Is the currency within the valid range of currencies that we wish to show?
        if (["USD", "AUD", "GBP", "EUR"].indexOf(currency) > -1) {
          // If yes, then show the currency
          $("." + currency + "_pricing").show();
        } else {
          // If no, then just show USD pricing
          $(".USD_pricing").show();
        }
      }

This function takes just one parameter, the 3 letter currency code, and then it:

  1. Hides ALL the pricing lines by default, then
  2. Checks the currency code to see if it is one of the 4 allowed codes on our page, then
  3. If it is allowed, shows the pricing which has the class of `[Currency Code]_pricing`, or
  4. Otherwise, shows the default USD pricing lines.
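
The steps above boil down to one decision: do we have pricing for this code or not?  Here is a minimal sketch of that decision as a standalone function, which makes it easy to test without a page.  `resolveCurrency` and `SUPPORTED` are names introduced here for illustration and do not appear in the demo code.

```javascript
// The currencies we have pricing lines for on the page.
var SUPPORTED = ["USD", "AUD", "GBP", "EUR"];

// Return the currency whose pricing should be shown: the requested one when
// we have pricing for it, otherwise the USD default.
function resolveCurrency(code) {
  return SUPPORTED.indexOf(code) > -1 ? code : "USD";
}

console.log(resolveCurrency("GBP")); // GBP
console.log(resolveCurrency("JPY")); // USD
```

A missing or unexpected value (for instance `undefined` when the lookup returns nothing useful) also falls through to USD, so the page never ends up with no pricing visible.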

 

One last thing - I realise that sometimes users may want to see the pricing in a currency other than their local one, most often to compare the USD values against other services they use, so we should give them the ability to do so.

That is why I placed the button group between the header and the pricing boxes, asking which currency they want to see.  Clicking on any of the buttons will call the `displayPrice()` function to show that locale's currency.

      <div class="row text-center">
        <div class="col-lg-12 m-2">
          <p>Show me pricing in: </p>
          <div class="btn-group mb-3" role="group" aria-label="Select Currency">
            <button type="button" class="btn btn-secondary" onclick="displayPrice('USD');">USD</button>
            <button type="button" class="btn btn-secondary" onclick="displayPrice('GBP');">GBP</button>
            <button type="button" class="btn btn-secondary" onclick="displayPrice('EUR');">EUR</button>
            <button type="button" class="btn btn-secondary" onclick="displayPrice('AUD');">AUD</button>
          </div>
        </div>
      </div>

 

That's it!  Pretty easy (and cheap), isn't it?  No need for a complex content management system, or PHP/Ruby scripting etc.  This can all be done on a free website hosting platform like Amazon S3 (which we use) or Github Pages etc. 

Have fun with it.  For my next post, I might showcase how to use a free foreign currency exchange API to dynamically calculate the other pricing based on that day's exchange rates!

 

Is verbosity helpful when designing app screens?

Apologies for the lengthy absence from posting on here.  Now that I have grown my HR Partner team a little, I have some spare time on my hands, plus some renewed motivation and energy to work on improving the system with them.

As a "programmer pretending to be a designer", I am always accused of making my application screens just too verbose.  I tend to pepper the screen real estate with hints, tips and (what I think are) helpful snippets of information that will make the user's life easier.

Of course, when we did some real world UX testing a few months ago, I was astounded to see that most users simply didn't read the information presented to them, but instead would look for distinct CTA (call to action) links or buttons and try those out instead.

This has made me rethink my whole verbose strategy, and made me remove a lot of excess wording from many of our HR app's screens (with the able assistance and guidance of both my talented former and new UX designer).  Conceptually, this has been a hard thing for me to do - removing what I thought were helpful prompts, and replacing them with an image or single word link to our help pages.

However, there are some screens where detailed explanations ARE still necessary - mainly the screen which deals with importing a CSV file into HR Partner.  Seeing as this is a screen which a lot of our new users use, as well as the fact that we have absolutely no control over the layout and format of the CSV import file the customer supplies, I thought that some extra explanations at the bottom of the import screen may be useful to guide them to a pain free import process.

Here was the old explanation text at the bottom of the CSV import screen:

Screen Shot 2018-07-10 at 9.16.28 am.png

As you can see - very wordy.  But what niggled at my UX designer the most was that the explanations for Gender, Departments, Locations etc. were still fairly vague, and worse still, prompted them to leave the import screen and go to another screen in order to look at what the valid import options were.

What she suggested was that we actually present the valid options all on the one screen, which means that they can check and modify their import file without having to leave this app screen, and get the chance to be distracted or lose interest.

So, the new screen looks like:

Screen Shot 2018-07-10 at 9.09.45 am.png

Because categories such as Department or Employment Status only have about 5 or 6 items in them, it was no problem to actually list them out on this screen directly.  As a bonus, we also modified the import code to use some default values if the information supplied in the import file was missing or invalid.

We actually added more words to the mix, but I am hoping that in this instance, the extra information will help the user to create a better import file and have a better user experience at the end of the day.

Can you think of any other way we can improve on this? I'd love to hear your thoughts in the comments.

 

Racing Along - Building a Telemetry system using Crystal & RethinkDB

f1-david-acosta-allely-shutterstock-com-edit.jpg

Like most younger lads, I often dreamed of being a Formula 1 race car driver, and I have fond memories of watching the likes of Ayrton Senna, Alain Prost, Nigel Mansell etc. race around Adelaide in the late 80's.  The smell, action and romance of F1 always appealed to me.

Alas, my driving skills are barely passable on the public roads, so a race track is a far safer place without me hurling a one ton machine around it.  I have kept in touch with the technological advances within the competition though, and am amazed at how far it has come these days.  I distinctly remember Jackie Stewart stopping the race commentary back in the 80's so we could hear one of the first radio transmissions between driver and engineer.  I think it was Alain Prost, and the quality of the transmission was so bad that no one could work out what Prost was saying.

Nowadays, a wealth of data is sent between race car and the engineers in the pit wall, and even to the main team HQ across the other side of the world - who often know the health of the car far better than the driver piloting it at 300km/h.

Back to me.  I've been vicariously working out my lost race driver frustrations on Codemasters' F1 games for the past few years, which are quite realistic, with better graphics and simulation each year.  I only recently found out that Codemasters actually supplies a telemetry feed from their game via UDP, in real time.  I was excited to see so many third party vendors creating apps and race accessories that use this feed (e.g. steering wheels with speed, engine rev and gear displays on them).

Last weekend I thought to myself - "Why don't I try and create a racing telemetry dashboard? The kind that the race engineers or the team engineers back in HQ would use?"  Could I, in fact, create a real time dashboard that ran in a web browser and could let someone on the other side of the world watch my car statistics in real time as I blasted around a track?

Well, let's start with the F1 2017 game itself.  It can send a UDP stream to a specific address and port, or just broadcast the stream on a subnet on a specific port.  The secret is to try and latch on to that stream, and either store it, or preferably send it on to another display in real time.

The question was, what technology could I use to grab this UDP feed?  Well, I have recently been dabbling with a new language called Crystal.  It is very similar to Ruby, which I have been using on all my web apps in the past few years, however instead of being an interpreted language, it is compiled, which gives it blazing speed.

Speed is the key here (and not only on the track).  The UDP data is transmitted at anything from 20 to 60Hz.  At those rates, a typical 90 second race lap produces roughly 1,800 to 5,400 packets of data.

I decided that I would need to do two things - capture that stream of data into a database for later historical reporting, AND also parse and send this data along to any web browsers that were listening, which meant I had to use a constant connection system like Websockets.  Now, the other bonus is that Crystal's Websocket support is top class too!

So what I did was to write a small (about 150 lines) Crystal app that could do this.  I ended up using the Kemal framework for Crystal, because I needed to build out some fancy display screens etc., and Kemal brings all the MVC goodies to the Crystal language.

Straight away, I came across the first problem I would encounter with trying to consume a constant stream of telemetry data.  Codemasters sends the data as a packet of around 70 Float numbers.  Luckily, they document what the numbers indicate on their forums, but I first have to consume the packet, then parse it to extract the bits of data I need (i.e. the current gear selected, the engine revs, the brake temperatures for each of the 4 tyres etc.).  Then I need to store that information in RethinkDB (which is one of my favourite NoSQL systems out there today), and THEN send the (parsed) packet data to any listening web browser with an active websocket connection.  Whew.

But really, the core of that took only about 20 lines of code (excluding the parsing of the 70 odd parameters).  How could I do this effectively?  Well, Crystal has its own lightweight take on concurrency, which it calls Fibers.  I would simply consume the incoming UDP packets on one fiber, then spawn another fiber to do the parsing, saving and handing off of the data to the websocket!  It worked beautifully.

Here is a shortened version of the core code that does this bit:

SOCKETS = [] of HTTP::WebSocket
raw_data = Bytes.new(280)

# fire up the UDP listener
puts "UDP Server listening..."
server = UDPSocket.new
server.bind "0.0.0.0", 27003
udp_active = false

# now connect to rethinkdb
puts "Connecting to RethinkDB..."
conn = r.connect(host: "localhost")

def convert_data(raw_data, offset)
  pos = offset * 4
  slice = {raw_data[pos].to_u8, raw_data[pos+1].to_u8, raw_data[pos+2].to_u8, raw_data[pos+3].to_u8}
  return pointerof(slice).as(Float32*).value.to_f64
end

ws "/telemetry" do |socket|
  # Add this socket to the array
  SOCKETS << socket
  # clear out any old data collected in the UDP stream
  server.flush
  puts "Socket server opening..."
  udp_active = true
  
  socket.on_close do
    puts "Socket closing..."
    SOCKETS.delete socket
    # Stop receiving the UDP stream when the last socket closes
    udp_active = false if SOCKETS.empty?
  end

  spawn do
    while udp_active
      bytes_read, client_addr = server.receive(raw_data)
      telemetry_data["m_time"] = convert_data(raw_data, 0)
      telemetry_data["m_lapTime"] = convert_data(raw_data, 1)
      telemetry_data["m_lapDistance"] = convert_data(raw_data, 2)
      telemetry_data["m_totalDistance"] = convert_data(raw_data, 3)
      # ... lots of similar conversion lines snipped ...
      telemetry_data["m_last_lap_time"] = convert_data(raw_data, 62)
      telemetry_data["m_max_rpm"] = convert_data(raw_data, 63)
      telemetry_data["m_idle_rpm"] = convert_data(raw_data, 64)
      telemetry_data["m_max_gears"] = convert_data(raw_data, 65)
      telemetry_data["m_sessionType"] = convert_data(raw_data, 66)
      telemetry_data["m_drsAllowed"] = convert_data(raw_data, 67)
      telemetry_data["m_track_number"] = convert_data(raw_data, 68)
      telemetry_data["m_vehicleFIAFlags"] = convert_data(raw_data, 69)
      xmit = telemetry_data.to_json
      r.db("telemetry").table("race_data").insert(telemetry_data).run(conn)    
      begin
        SOCKETS.each {|thesocket| thesocket.send xmit}
      rescue
        puts "Socket send error!"
      end
    end
  end

end

NOTE: I chose port 27003 for the UDP listening port.  27 was the late, great Ayrton Senna's racing number, and he won 003 World Driver's Championships in his time!

That is really the core of the system. The first few lines set up a UDP listener, and also the connection to RethinkDB.  Then there is a short routine I define which takes each group of 4 incoming bytes (a little endian 32-bit float), reinterprets it as a Float32 and widens it to the Float64 value used in the rest of the app.  Then there is the Websocket listener, which spawns a fiber that grabs the incoming packets and processes each one as it comes in.
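
The same byte-to-float conversion can be sketched in JavaScript, for example if you wanted to decode a raw packet in Node or in the browser rather than in Crystal.  This is an illustration only, not code from the project: `readFloat` is a hypothetical name, and the 280 byte size assumes the packet of 70 four-byte floats described above.

```javascript
// Read the 32-bit little endian float stored in slot `offset`
// (slot 0 = bytes 0..3, slot 1 = bytes 4..7, and so on).
function readFloat(packet, offset) {
  var view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  return view.getFloat32(offset * 4, true); // true = little endian
}

// Build a fake 280 byte packet to try it out.
var packet = new Uint8Array(280);
var writer = new DataView(packet.buffer);
writer.setFloat32(0, 12.5, true);  // slot 0: m_time
writer.setFloat32(4, 83.25, true); // slot 1: m_lapTime

console.log(readFloat(packet, 0)); // 12.5
console.log(readFloat(packet, 1)); // 83.25
```

Passing the byte order explicitly means the result does not depend on the machine the code runs on.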

The rest of the system is a pretty basic Bootstrap based web site with 3 pages.  Oh yeah - Crystal serves up these web pages as well, along with customising sections via ECR templates.  Not bad for a single executable that is only around 2MB when compiled!

There is a Live page which uses a Websocket listener to stream the live data to various realtime moving FLOT graphs, as well as the car position on a track map:
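
For a flavour of what the browser side of such a Live page involves, here is a minimal sketch.  It is not the project's actual code: `pushSample`, `connectTelemetry` and the 200 point window are assumptions made for illustration, while the `/telemetry` path and the `m_time` and `m_lapTime` fields come from the server code above.  The graphing call itself is left to a caller-supplied callback.

```javascript
// Keep only the most recent `maxPoints` [x, y] pairs for a moving graph.
function pushSample(series, x, y, maxPoints) {
  series.push([x, y]);
  while (series.length > maxPoints) {
    series.shift(); // drop the oldest point
  }
  return series;
}

// Browser only: open the socket and feed each JSON message into a series.
// `field` is the telemetry key to plot; `redraw` is a caller-supplied
// callback that redraws the graph from the series.
function connectTelemetry(url, field, redraw) {
  var series = [];
  var socket = new WebSocket(url);
  socket.onmessage = function (event) {
    var data = JSON.parse(event.data);
    pushSample(series, data.m_time, data[field], 200);
    redraw(series);
  };
  return socket;
}

// Example (host and port are assumptions):
//   connectTelemetry("ws://localhost:3000/telemetry", "m_lapTime", drawGraph);
```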

 

Then there is a historical data page which allows the engineer to plot race data lap by lap for an already run race:

F1 Historic Telemetry.png

Then a Timing page which shows lap times extracted from the data stream:

F1 Lap Times.png
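
One plausible way to pull lap times out of the stored stream is sketched below.  It assumes that `m_last_lap_time` holds the previous lap's time and only changes when a lap is completed; the field name appears in the server code, but that behaviour is an assumption rather than something confirmed here, and `extractLapTimes` is a hypothetical helper.

```javascript
// Walk the samples in order and record m_last_lap_time each time it changes
// to a new non-zero value.
function extractLapTimes(samples) {
  var laps = [];
  var previous = 0;
  samples.forEach(function (sample) {
    var last = sample.m_last_lap_time;
    if (last > 0 && last !== previous) {
      laps.push(last);
    }
    previous = last;
  });
  return laps;
}
```

Note that two consecutive laps with exactly the same time would be counted once by this approach, so a real implementation would probably also watch a lap counter.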

No space or time to go into those parts in detail here, so I might save those for another blog post.

My main intent with this post was to try and learn Crystal, and to see if I could build a robust and fast Websocket server.  Mission achieved.

I must say I had great fun using this system - I actually had my son play the game on our PS4 while I watched him on my iMac web browser from my office on a different floor of the house altogether.  I could even tell when he struggled on certain parts of the track (the game sends car position data in real time too), and I could see when he was over revving his engines or cooking his brakes trying to pass another car.  This was a 10/10 as far as a fun project goes, no matter the impracticality of it.

 

Building a face recognition app in under an hour

Over the weekend, I was flicking through my Amazon AWS console, and I noticed a new service on there called 'Rekognition'.  I guess it was the mangled spelling that caught my attention, but I wondered what this service was? Amazon has a habit of adding new services to their platform with alarming regularity, and this one slipped past my radar somehow.

So I dived in and checked it out, and it turns out that in late 2016, Amazon released their own image recognition engine on their platform.  It not only does facial recognition, but general photo object identification too.  It is still fairly new, so the details were sketchy, but I was immediately excited to try it out.  Long story short, within an hour, I had knocked up a quick sample web page that could grab photos from my PC camera and perform basic facial recognition on it.  Want to know how to do the same? Read on...

I had dabbled in facial recognition technology before, using third party libraries, along with the Microsoft Face API, but the effort of putting together even a rudimentary prototype was fraught with complexity and a steep learning curve.  But while browsing the Rekognition docs (thin as they are), I realised that the AWS API was actually quite simple to use, while seemingly quite powerful.  I couldn't wait, and decided to jump in feet first to knock up a quick prototype.

The Objective

I wanted a 'quick and dirty' single web page that would allow me to grab a photo using my iMac camera, and perform some basic recognition on the photo - basically, I wanted to identify the user sitting in front of the camera.

The Amazon Rekognition service allows you to create one or more collections.  A collection is simply a, well, collection of facial vectors for sample photos that you tell it to save.  NOTE: The service doesn't store the actual photos, but a JSON representation of measurements obtained from a reference photo.

Once you have a collection on Amazon, you can then take a subject photo and have it compare the features of the subject to its reference collection, and return the closest match.  Sounds simple, doesn't it?  And it is.  To be honest, coding the front end of this web page to get the camera data actually took longer than the back end to perform the recognition - by a factor of 3 to 1!!

So, in short, the web page lets you (1) create or delete a collection of facial data on Amazon, (2) upload face data via a captured photo to your collection, and (3) compare new photos to the existing collection to find a match.
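
To make those three operations concrete, here is a rough JavaScript sketch of the request parameters involved and of how a match could be read from a search response.  The demo's back end is written in Ruby, so this is an illustration rather than its code.  The field names (`CollectionId`, `Image.Bytes`, `ExternalImageId`, `MaxFaces`, `FaceMatchThreshold`, `FaceMatches`, `Similarity`) are those of the Rekognition CreateCollection, IndexFaces and SearchFacesByImage operations; the helper names and the 80% threshold are assumptions.

```javascript
// (1) Parameters for creating a collection.
function createCollectionParams(collectionId) {
  return { CollectionId: collectionId };
}

// (2) Parameters for adding a face; the person's name travels along as
// ExternalImageId so it can be read back when a match is found.
function indexFaceParams(collectionId, imageBytes, name) {
  return {
    CollectionId: collectionId,
    Image: { Bytes: imageBytes },
    ExternalImageId: name
  };
}

// (3) Parameters for comparing a new photo against the collection.
function searchFaceParams(collectionId, imageBytes) {
  return {
    CollectionId: collectionId,
    Image: { Bytes: imageBytes },
    MaxFaces: 1,
    FaceMatchThreshold: 80 // assumed threshold, in percent
  };
}

// Pick the name of the closest match from a search response, or null.
function bestMatchName(response) {
  var matches = (response && response.FaceMatches) || [];
  if (matches.length === 0) return null;
  var best = matches.reduce(function (a, b) {
    return b.Similarity > a.Similarity ? b : a;
  });
  return best.Face.ExternalImageId;
}
```

Sending the image as raw bytes in `Image.Bytes` is what avoids the extra S3 upload step.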

Oh, and as a tricky extra (4), I also added in the Amazon Polly service to this demo so that after recognising a photo, the page will broadcast a verbal, customised greeting to the person named in the photo!

The Front End

My first question was what library to use to capture the image using my iMac camera.  After a quick Google search, I found the amazing JPEG Camera library on GitHub by amw, which allows you to use a standard HTML5 canvas to perform the capture, or fallback to a Flash widget for older browsers.  I quickly grabbed the library, and modified the example javascript file for my needs.

The Back End

For the back end, I knocked up a quick Sinatra project.  Sinatra is a lightweight Ruby based framework that could do all the heavy lifting with AWS.  I have actually used Sinatra extensively (well, Padrino actually) to build all my web apps, and highly recommend the platform.

Note: Amazon's Rekognition examples actually promote uploading the source photos used in their API to an Amazon S3 bucket first, then processing them.  I wanted to avoid this double step and send the image data directly to their API instead, which I managed to do.

I also managed to do a similar thing with their Polly greeting.  Instead of saving the audio to an MP3 file and playing that, I managed to encode the MP3 data directly into an <audio> tag on the page and play it from there!
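
Embedding audio directly in a page like this is typically done with a base64 data URI.  Here is a minimal sketch of the idea in JavaScript (Node), not the demo's Ruby code; `audioTag` is a hypothetical helper and `mp3Bytes` stands for the MP3 data returned by Polly.

```javascript
// Build an <audio> tag whose source is the MP3 data itself, base64 encoded,
// so no file has to be written to disk or served separately.
function audioTag(mp3Bytes) {
  var base64 = Buffer.from(mp3Bytes).toString("base64");
  return '<audio autoplay src="data:audio/mpeg;base64,' + base64 + '"></audio>';
}
```

The trade-off is page size: base64 adds roughly a third to the size of the audio, which is fine for a short greeting.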

The Code

I have placed all the code for this project on my GitHub page.  Feel free to grab it, fork it and improve it as you like.  I will endeavour to explain the code in more detail here.

The Steps

First things first, you will need an Amazon AWS account.  I won't go into the details of setting that up here, because there are many articles you can find on Google for doing so.

Creating an AWS IAM User

But once you are set up on AWS, the first thing we need to do is to create an Amazon IAM (Identity & Access Management) user which has the permissions to use the Rekognition service.  Oh, we will also set up permissions for Amazon's Polly service as well, because once I got started on these new services, I could not stop.

In the Amazon console, click on 'Services' in the top left corner, then choose 'IAM' from the vast list of Amazon services.  Then, on the left hand side menu, click on 'Users'.  This should show you a list of existing IAM users that you have created on the console, if you have done so in the past.

Click on the 'Add User' blue button on the top of this list to add a new IAM user.

Give the user a recognisable name (more for your own reference), and make sure you tick 'Programmatic Access' as you will be using this IAM in an API call.

Next is the permissions settings.  Make sure you click the THIRD box on the screen, that says 'Attach existing policies directly'.  Then, on the 'Filter: Policy Type' search box below that, type in 'rekognition' (note the Amazonian spelling) to filter only the Rekognition policies. Choose 'AmazonRekognitionFullAccess' from the list by placing a check mark next to it.

Next, change the search filter to 'polly', and place a check mark next to 'AmazonPollyFullAccess'.

Nearly there.  This IAM user now has full permissions for Amazon Rekognition and Amazon Polly.  Click on 'Next: Review' on the bottom right.

On the review page, you should see 2 Managed Policies giving you full access to Rekognition and Polly.  If you don't, go back and re-select the policies as per the previous step.  If you do, then click 'Create User' on the bottom right.

Now this page is IMPORTANT.  Make a note of the AWS Key and Secret that you are given on this page, as we will need to incorporate it into our application below.  

This is the ONLY time that you will be shown the key/secret for this user, so please copy and paste the info somewhere safe, and download the CSV file from this page with the information in it and keep it safe as well.

Download the Code

The next step is to download the sample code from my GitHub page so you can modify it as necessary.  Go to this link and either download the code as a ZIP file, or perform a 'git clone' to clone it to your working folder.

First thing you need to do is to create a file called '.env' in your working folder, and enter these two lines, substituting your Amazon IAM Key and Secret in there (Note: These are NOT real key details below):

export AWS_KEY=A1B2C3D4E5J6K7L10
export AWS_SECRET=T/9rt344Ur+ln89we3552H5uKp901

You can also just run these two lines in your command shell (Linux and OSX) to set them as environment variables that the app can use.  Windows users can run them too, just replace the 'export' prefix with 'set'.
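As a rough sketch of how an app might pick these two values up and fail early if one is missing, here is a small Ruby helper.  The name aws_credentials is my own invention for illustration and is not taken from the demo code; only the AWS_KEY and AWS_SECRET variable names come from the steps above.

```ruby
# Hypothetical helper (not from the demo repo): read the IAM key and secret
# from an environment-like object, raising a clear error if either is missing.
def aws_credentials(env = ENV)
  key    = env.fetch('AWS_KEY')    { raise KeyError, 'AWS_KEY is not set' }
  secret = env.fetch('AWS_SECRET') { raise KeyError, 'AWS_SECRET is not set' }
  { access_key_id: key, secret_access_key: secret }
end

# Example with a plain Hash standing in for ENV (these are NOT real keys):
fake_env = { 'AWS_KEY' => 'A1B2C3', 'AWS_SECRET' => 'not-a-real-secret' }
puts aws_credentials(fake_env)[:access_key_id]
```

Calling aws_credentials with no argument reads from the real environment, so a missing variable shows up as soon as the app starts rather than on the first API call.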

Now, if you have Ruby installed on your system (Note: No need for full Ruby on Rails, just the basic Ruby language is all you need), then you can run

bundle install

to install all the pre-requisites (Sinatra etc.), then you can type

ruby faceapp.rb

to actually run the app.  This should start up a web server on port 4567, so you can fire up your browser and go to 

http://localhost:4567

to see the web page and begin testing.

Using the App

The web page itself is fairly simple.  You should see a live streaming image on the top center, which is the feed from your on board camera.

The first thing you will need to do is to create a collection by clicking the link at the very bottom left of the page.  This will create an empty collection on Amazon's servers to hold your image data.  Note that the default name for this collection is 'faceapp_test', but you can change that in the faceapp.rb Ruby code (line 17).

Then, to begin adding faces to your collection, ask several people to sit down in front of your PC or tablet/phone, one at a time, and make sure ONLY their face is in the photo frame (multiple faces will make the scan fail).  Once ready, enter their name in the text input box and click the 'Add to collection' button.  You should see a message that their facial data has been added to the database.

Once you have built up several faces in your database, then you can get random people to sit down in front of the camera and click on 'Compare image'.  Hopefully for people who have been already added to the collection, you should get back their name on screen, as well as a verbal greeting personalised to their name.

Please note that the usual way for Amazon Rekognition to work is to upload the JPEG/PNG photo to an Amazon S3 Bucket, then run the processing from there, but I wanted to bypass that double step and actually send the photo data directly to Rekognition as a Base64 encoded byte stream.  Fortunately, the aws-sdk for Ruby allows you to do both methods.
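To make the two options concrete, here is a minimal sketch of how the image parameter differs between them.  The helper name rekognition_image is hypothetical (it is not in the demo code); the hash shapes follow the Image structure that the aws-sdk Rekognition client accepts, with either a bytes entry or an s3_object entry.

```ruby
# Hypothetical helper (not from the demo repo): build the :image parameter
# for a Rekognition call, either from in-memory image data or an S3 object.
def rekognition_image(bytes: nil, bucket: nil, key: nil)
  return { bytes: bytes } if bytes
  raise ArgumentError, 'need bytes, or both bucket and key' unless bucket && key

  { s3_object: { bucket: bucket, name: key } }
end

# Direct route: pass the photo data itself, no S3 upload needed.
p rekognition_image(bytes: 'jpeg-data-here').keys

# S3 route: point Rekognition at a previously uploaded file.
p rekognition_image(bucket: 'my-photos', key: 'face.jpg')

# With a real client this would be used roughly like so (assumption, not run here):
#   client.search_faces_by_image(collection_id: 'faceapp_test',
#                                image: rekognition_image(bytes: jpeg))
```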

Let's walk through the code now.

First of all, let's take a look at the web page's raw HTML itself.

https://github.com/CyberFerret/FaceRekognition-Demo/blob/master/views/faceapp.erb

This is a really simple page that should be self explanatory to anyone familiar with HTML creation.  Just a series of named divs, as well as buttons and links.  Note that we are using jQuery, and also Moment.js for the custom greeting.  Of note is the faceapp.js code, which does all the tricky stuff, and the links to the JPEG camera library.

You may also notice the <audio> tags at the bottom of the file, and you may ask what this is all about - well, this is going to be the placeholder for the audio greeting we send to the user (see below).

Let's break down the main app js file.

https://github.com/CyberFerret/FaceRekognition-Demo/blob/master/public/js/faceapp.js

This sets up the JPEG Camera library to show the camera feed on screen, and process the upload of the images.

The add_to_collection() function is straightforward, in that it takes the captured image from the camera, then does a post to the /upload endpoint along with the user's name as the parameter.  The function will check that you have actually entered a name or it will not continue, as you need a short name as a unique identifier for this facial data.

The upload function simply checks that the call to /upload finished cleanly, and either displays a success message or the error if it doesn't.

The compare_image() function is what gets called when you click the, well, 'Compare image' button.  It simply grabs a frame from the camera, and POSTs the photo data to the /compare endpoint.  This endpoint will return either an error, or else a JSON structure containing the id (name) of the found face, as well as the percentage confidence.
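For illustration only, here is one way a server could turn a list of face matches into that kind of JSON reply.  This is a sketch under assumptions: the function name compare_result, the JSON key names and the plain hashes standing in for the Rekognition response are all mine, and the demo's actual /compare endpoint may be structured differently.

```ruby
require 'json'

# Hypothetical helper (not from the demo repo): pick the strongest match and
# report its name and similarity, or an error when nothing matched.
# Each match is a plain Hash standing in for an SDK response object.
def compare_result(face_matches)
  best = face_matches.max_by { |match| match[:similarity] }
  return { error: 'No matching face found' }.to_json if best.nil?

  { id: best[:face][:external_image_id],
    confidence: best[:similarity].round(1) }.to_json
end

matches = [
  { similarity: 81.25, face: { external_image_id: 'Bob' } },
  { similarity: 98.73, face: { external_image_id: 'Alice' } }
]
puts compare_result(matches)
puts compare_result([])
```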

If there is a successful face match, the function will then go ahead and send the name of the found face to the /speech endpoint.  This endpoint calls the Amazon Polly service to convert the custom greeting to an MP3 file that can be played back to the user.

The Amazon Polly service returns the greeting as a binary MP3 stream, so we take this IO stream, Base64 encode it, and place it as an encoded source in the <audio> placeholder tag on our web page.  We can then call .play() on that element to play the MP3 through the user's speakers using the HTML5 audio element.

This is also the first time I have placed encoded data in the audio src attribute, rather than a link to a physical MP3 file, and I am glad to report that it worked a treat!
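The encoding step can be sketched in a few lines of Ruby.  The helper name mp3_data_uri is hypothetical, and the three sample bytes are a stand-in for the real MP3 data that would come back from Polly.

```ruby
require 'base64'

# Hypothetical helper (not from the demo repo): wrap raw MP3 bytes in a
# data URI that can be used directly as the src of an <audio> element.
def mp3_data_uri(mp3_bytes)
  "data:audio/mpeg;base64,#{Base64.strict_encode64(mp3_bytes)}"
end

# Stand-in bytes only; a real greeting would be the audio stream from Polly.
sample = 'ID3'.b
puts mp3_data_uri(sample)
```

strict_encode64 is used here because it does not insert line breaks, which keeps the whole data URI on one line.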

Lastly on the app js file is the greetingTime() function.  All this does is work out whether to say 'good morning/afternoon/evening' depending on the user's time of day.  A lot of code for something so simple, but I wanted the custom greeting they hear to be tailored to their time of day.
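The idea boils down to mapping the hour of the day to a phrase.  Here is a small sketch of that logic, written in Ruby rather than the JavaScript and Moment.js used on the page; the function name and the exact cut-off hours (noon and 6pm) are my assumptions, not taken from faceapp.js.

```ruby
# Hypothetical sketch (the real greetingTime() is JavaScript): choose a
# greeting from the hour of the day, given as an integer from 0 to 23.
def greeting_for(hour)
  case hour
  when 0...12  then 'Good morning'
  when 12...18 then 'Good afternoon'
  else              'Good evening'
  end
end

puts greeting_for(Time.now.hour)
```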

Finally, let's look at the Ruby code for the Sinatra app.

https://github.com/CyberFerret/FaceRekognition-Demo/blob/master/faceapp.rb

Pretty straightforward Sinatra stuff here.  The top is just the requires that we need for the various AWS SDK and other libraries.

Then there is a block setting up the AWS authentication configuration, and the default collection name that we will be using (which you can feel free to change).

Then, the rest of the code is simply the endpoints that Sinatra will listen for.  It listens for a GET on '/' in order to display the actual web page to the end user, and it also listens for POST calls to /upload, /compare and /speech, which the JavaScript file above posts data to.  Each of these endpoints needs only about 3 or 4 lines of code to carry out the facial recognition and speech tasks, all of which are covered in the AWS SDK documentation.

That's about all that I can think of to share at this point.  Please have fun with the project, and let me know what you end up building with it.  Personally, I am using this project as a starting block for some amazing new features that I would love to have in our main web app HR Partner.

Good Luck, and enjoy your facial recognition/speech synthesis journey.