I recently needed to run Varnish (a very fast web cache for busy sites) in a situation that also required HTTPS on the same box. Unfortunately, Varnish does not handle crypto, which is probably a good thing given how easy it is for programmers to make mistakes in their code, rendering the security useless!

Whilst recipes for Stunnel and Varnish together exist, information on running them on the same box whilst still presenting the original source IP to Varnish for logging/load balancing purposes was scarce – the configuration below “worked for me”, at least on Debian 7.0 (Wheezy). You will need the xt_mark module, which should be part of most distributions but which I found was missing from some hosted boxes and VMs with custom kernels. The specific versions of software we are running are based on:

  • Linux 3.13.5
  • Varnish 3.0.6
  • STunnel 5.24

If you are running Varnish 4, the VCL here will require rewriting as it has been updated from version 3.

IPTables – mark traffic from source port 8088 for routing
iptables -t mangle -A OUTPUT -p tcp -m multiport --sports 8088 -j MARK --set-xmark 0x1/0xffffffff

Routing configuration – anything marked by IPTables, send back to the local box. These two can be added under iface lo as “post-up” commands if you’re on a Debian box.
ip rule add fwmark 1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100

STunnel configuration. The connect IP MUST be an IP on the box other than loopback, i.e. it will not work if you specify 127.0.0.1.

[https]
accept = 443
connect = 10.1.1.1:8088
transparent = source

From default.vcl:
import std;

sub vcl_recv {
    // Set header variables in a sensible way.
    remove req.http.X-Forwarded-Proto;

    if (server.port == 8088) {
        set req.http.X-Forwarded-Proto = "https";
    } else {
        set req.http.X-Forwarded-Proto = "http";
    }

    set req.http.X-Forwarded-For = client.ip;
    std.collect(req.http.X-Forwarded-For);
}

sub vcl_hash {
    // SSL data returned may be different from non-SSL.
    // (E.g. including https:// in URLs)
    hash_data(server.port);
}

Clojure Meetup as part of the Cambridge Non-dysfunctional Programmers group, in Metails office at 50 St Andrews St, Cambridge


This month’s meetup of the Cambridge NonDysFunctional Programmers will be hosted here at Metail’s Cambridge office next Thursday (26th November) from 6.30pm. I (Ray Miller) will be giving a hands-on introduction to web development with Clojure, where attendees get to implement their first Clojure web application from the ground up.

Along the way, we’ll learn about the Compojure routing library, Ring requests and responses, middleware, and generating HTML with hiccup. Time permitting, we’ll also cover interacting with a relational database and using buddy to add session-based authentication and authorization to our application.

The theme running through the tutorial is implementation of an ad server (an example I shamelessly stole from Dan Benjamin’s Meet Sinatra screencast). This demo application delivers a JavaScript snippet to embed random ads in a web page and tracks user click-throughs. It also provides an administrative interface for reporting and managing ads. If you’ve already seen Dan’s screencast, you’ll see how Clojure compares with Sinatra to implement the same application.

See the Meetup page for full details and to sign up.

 

Shortly after I joined Metail in late summer 2011 there was a typical English bit of weather; namely an apocalyptic downfall of rain just as I was leaving work on a Wednesday. I was immediately transported back to a Kenyan balcony on which I had spent many happy hours as a proper colonial – sipping a G&T and watching the sun go down. The temperature was right, the sun was low in the sky, the tree was in full bloom (well it had leaves on it at any rate) and a tropical storm was in the air. Thinking that it was only fair to share this experience with those who had not been fortunate enough to have exposure to the original I packed my bag on Friday with a selection of gins, tonic and appropriate accoutrements.

This isn't www.xkcd.com

Unpacking the essentials after moving office

This went down well with colleagues and every Friday since we have endeavoured to gather and raise a glass to the end of the week. It has provided a great opportunity to relax and to meet other team mates and their partners and children. It also allowed teams to bond and cross-team conversations to happen. We also got the chance to hear the result of Nick’s nimble fingers (this harks back to the halcyon days when there was room to swing a cat and strum a guitar upstairs in 16), and share in the occasional sing song.

As the team has expanded I have not been able to support the whole cost, so I have asked for contributions of £10 a month towards it, as well as widening the group who make the drinks each week. I welcome any suggestions or requests for particular drinks. We have progressed from simple gins and tonics to brambles, why nots, and even non gin-based drinks – manhattans, daiquiris, sidecars and orange brulées spring immediately to mind. Most of our shopping is done at Cambridge Wine Merchants, who are always happy to help us out with ideas or substitutions for ingredients.

Oh and finally like any good dealer – your first few hits are free.

 

This is not http://www.smbc-comics.com/

Ace of Clubs Daiquiris made by Ian Taylor and photographed by Andrew Dunn

For anyone thinking of setting up something similar, the following price structure was designed to be as inclusive as possible and has been in place as the office head count has more than quadrupled. It has kept the club in the black and allowed it to provide something alcoholic and non-alcoholic for everyone at Christmas. Membership of the club is purely optional.

Members: £10 a month (first month free)

Interns: drink free

Partners: drink free

Guests: drink free

Non-Members from the office: drink free

Most of Metail’s try-it-on tech is delivered as a single page JavaScript application that embeds inside retailer sites, calling back to Metail-hosted services in Europe over HTTPS/JSON. In this post, Nick Day describes some of the more difficult trade-offs that we had to make in order to get the best use out of Content Distribution Networks and improve the speed of our app.

tl;dr When you’re using a CDN to accelerate the delivery of a 3rd party HTTPS / AJAX app to users far away, you end up having to choose your devils. We chose the one called “pre-flight OPTIONS request”.

We’re currently working with Dafiti, a Brazilian company whose user base is almost exclusively in-country. Our web-application and web-cache servers are based in the UK, so to aid user-experience we’ve moved as much as possible of our static and dynamic content so it’s served through a CDN (CloudFront, which handily has edge nodes in São Paulo).  

The last piece of this “CDN-ifying” has been to serve the initial HTML page from CloudFront.  Naturally, this has a large impact on page load speeds as the browser can do nothing else until this resource has been obtained; so fulfilling the request from a nearby CloudFront node is hugely preferable to having it trot under the Atlantic to hit our servers and then back.  However, as this page is now being served from the CDN’s domain, browser security constraints mean that any AJAX requests being made from it must either:

  1. also go through the CDN domain, or
  2. be served directly from your service domain using cross-origin resource sharing (CORS)

For cacheable content obtained through AJAX (in our case things like garment information) the preferred option is clear; serve it through the CDN.  The decision is more involved for content that you don’t want to or can’t cache, like requests that you need to be up-to-date (e.g. account details) or update requests. In this case your choice is for the lesser of two evils.

Serving through the CDN:

  • you’re making every one of these requests take an indirect route, as the CDN will be passing them all on for your servers to handle
  • on top of that there’s an extra hit; SSL de/encryption must happen in the CDN for these requests to be forwarded
  • you have to manage potentially complex CDN config to do the right thing for each request.  You wouldn’t have to do this with your typical CDN-cached content, but in practice you may need to add cookies and headers to a whitelist for the CDN to forward.  CloudFront makes this problem extra-fun by lacking a programmatic way to update/version your distribution configuration, making it easy to take down your service accidentally, and hard to roll back.
  • you’re paying (real money) each time you route one of these requests (somewhat unnecessarily) through a CDN

Alternatively, if you send these requests directly to your service domain:

  • you’ll need to implement and maintain CORS handling for those endpoints being requested from other domains
  • for requests deemed “non-simple” by the CORS spec (methods other than GET/HEAD or POST; using Content-Type other than a very strict set, which surprisingly does not include JSON; or using custom headers) the browser will automatically make a pre-flight OPTIONS request to check that it is able to make the request before actually making the one you desire.  This is a big deal if your longer-than-you’d-like cross-Atlantic request suddenly becomes two!  It was this point that almost pushed me toward taking the CDN route for these requests.
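As a rough illustration of the “non-simple” rule in the last bullet, here is a minimal sketch that guesses whether a browser would send a pre-flight for a given request. The function name `needsPreflight` and the shape of its arguments are made up for this example, and the lists only cover the method, header and Content-Type rules mentioned above; the real CORS spec has further conditions, so treat this as an approximation rather than a reference implementation.

```javascript
// Sketch only: approximates the CORS "simple request" rules described above.
const SIMPLE_METHODS = new Set(["GET", "HEAD", "POST"]);
const SIMPLE_HEADERS = new Set([
  "accept",
  "accept-language",
  "content-language",
  "content-type",
]);
const SIMPLE_CONTENT_TYPES = new Set([
  "application/x-www-form-urlencoded",
  "multipart/form-data",
  "text/plain",
]);

// `headers` is a plain object holding only the headers the page's own
// script sets (hypothetical shape, chosen for this example).
function needsPreflight(method, headers = {}) {
  if (!SIMPLE_METHODS.has(method.toUpperCase())) {
    return true; // e.g. PUT or DELETE
  }
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (!SIMPLE_HEADERS.has(lower)) {
      return true; // custom header
    }
    if (lower === "content-type") {
      // Ignore parameters such as "; charset=utf-8".
      const mediaType = String(value).split(";")[0].trim().toLowerCase();
      if (!SIMPLE_CONTENT_TYPES.has(mediaType)) {
        return true; // includes application/json
      }
    }
  }
  return false;
}
```

With this sketch, a JSON POST such as `needsPreflight("POST", { "Content-Type": "application/json" })` is flagged as needing a pre-flight, while a plain GET is not.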

Fortunately for us, there are two saving graces with this latter choice of sending the requests directly to our service domain:

  1. All of the requests in our app that require a pre-flight request are non-blocking; that is, they don’t require a user to wait before viewing part of, or accessing functionality in, the app. They’re all “update in the background” kinds of requests. Granted, if these take too long you could imagine timing issues creeping in, but we should be coding defensively to guard against those kinds of asynchronous problems anyway.
  2. The CORS spec allows you to specify a caching max-age for the OPTIONS requests. Of course, since you’re not going through a CDN, this caching will be on a per-user basis against the web-cache. At the moment, unfortunately for us, even a cache hit must come to Europe. The answer here is that it’s probably going to make sense for us to move web-caches closer to where the users are, even if the apps stay rooted in the UK.  Another slightly niggling point is that some browsers currently have an enforced maximum cache period that’s quite short for pre-flight requests (e.g. Chrome is 5 minutes).  As this is shorter than our average user session, it means that it’s likely a user will have to make those same requests more than once per session.  From half a world away, that’s not ideal.
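To make the max-age point concrete, here is a small sketch of how a service might build the response headers for a pre-flight OPTIONS request. `Access-Control-Max-Age` is the standard header for the caching period; everything else here (the `preflightHeaders` name, the allowed origin, the method and header lists, the default of 600 seconds) is a placeholder for illustration and not our actual configuration.

```javascript
// Hypothetical allow-list; the origin below is a placeholder.
const ALLOWED_ORIGINS = new Set(["https://cdn.example.com"]);

// Returns the CORS headers for a pre-flight response, or null when the
// origin is not allowed (the caller would then reply without CORS headers).
function preflightHeaders(origin, maxAgeSeconds = 600) {
  if (!ALLOWED_ORIGINS.has(origin)) {
    return null;
  }
  return {
    "Access-Control-Allow-Origin": origin,
    // Assumed method and header lists, for illustration only.
    "Access-Control-Allow-Methods": "GET, POST, PUT, DELETE",
    "Access-Control-Allow-Headers": "Content-Type",
    // How long the browser may cache this pre-flight result, in seconds.
    // Browsers can cap this at a lower value of their own.
    "Access-Control-Max-Age": String(maxAgeSeconds),
    // The response depends on the Origin request header.
    "Vary": "Origin",
  };
}
```

Because browsers may enforce their own ceiling, asking for a longer max-age than the browser allows should do no harm, but it will not buy any extra caching either.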

Taking all of these points into consideration, we favoured sending requests directly to our service domain; preferring to take the hit with the “background” requests that require a pre-flight in order to improve performance of most of our non-caching requests.  

As hosts of the excellent Data Insights Cambridge meetup, we’re excited for the upcoming talk by Dr Sacha Krstulovic from Audio Analytic Ltd. (who also happen to be our downstairs neighbours!). Dr Krstulovic will be speaking about ‘Filling the sensory gap in Big Data’.

What is Filling the sensory gap in Big Data?

Data analysis creates value by uncovering relationships between various types of data. Whilst there is lots of thought put into developing new data analysis techniques, this talk, on the other hand, focuses on the nature of the data itself. As a matter of fact, the evolution of technology across time has unlocked access to data layers of an increasingly rich and complex nature. As a consequence, new value propositions have arisen not only from new analysis techniques, but also from treating new types of data. Nowadays, this movement has reached as deep as allowing us to think of humans as having an equivalent data self in various computer systems. However, at the cutting edge of this movement, there is still an untapped opportunity: that of creating value from exploring one of the richest and hardest to reach data sets, human sensory data. As a case study, Audio Analytic is exploring this opportunity in the domain of acoustic data.

The Speaker

Dr Sacha Krstulovic

Dr Sacha Krstulovic is the VP of Technology at Audio Analytic, a company building a significant leadership in automatic sound recognition. Before joining AA, Sacha was a Senior Research Engineer at Nuance’s Advanced Speech Group (Nuance ASG), where he worked on pushing the limits of large scale speech recognition services such as Voicemail-to-Text and Voice-Based Mobile Assistants (Apple Siri type services). Prior to that, he was a Research Engineer at Toshiba Research Europe Ltd., developing novel Text-To-Speech synthesis approaches able to learn from data. He is the author and co-author of two book chapters, two international patents and several articles in international journals and conferences.

The meetup is scheduled for Thursday, November 12, 2015 at 7:00 pm at 50 St Andrew’s St, CB2 3AH. We hope to see you there, just sign up for it on the Data Insights Cambridge meetup page.

As part of the Visualization Team at Metail, I spend a large proportion of my time staring at renders of models, hoping that too much virtual flesh isn’t being exposed. It’s an onerous task and any automation to make my life easier is always appreciated. Can we get computers to make sure “naughty bits” aren’t being accidentally shown to our customers?

Detecting Flesh Colours

The diversity of flesh colours is quite large; the following is a small selection of MeModel renders:

Feet Skin Colours


The range of colours considered “fleshy” is further compounded by the synthesised lighting environment: shadows tend to push the colours towards black, highlights towards white.

The first question to consider is which colour space to use for detecting flesh colours. There’s quite a bit of informal debate about this. Some propose perceptual spaces such as HSL or HSV, but I’m going to stick my neck out and say that there’s no reason not to use plain, old RGB [1]. Or, more precisely, to overcome the effects of fluctuating lighting levels, some form of normalized RGB. For example:

M = max(R, G, B), or 1 iff G ≡ B ≡ 0
R* = R ÷ M
G* = G ÷ M
B* = B ÷ M

Two observations can be made at this point:

  1. If G ≡ B ≡ 0, we’re in such a dark place that we cannot tell whether we’re looking at flesh or not; and
  2. The dominant colour channel in all (healthy) human skin is red (green-skinned lizards and blue Venusians are unsupported), so M ≡ R.

Typically, we further assume that the least dominant colour channel is blue. This leads to a pleasing heuristic:

(R, G, B) is flesh only if R > G > B
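The normalisation and the R > G > B heuristic above can be sketched in a few lines of JavaScript. The function names `normalizeRgb` and `mightBeFlesh` are made up for this example, and the sketch follows the definitions above literally (including M = 1 when G and B are both zero); it is only the coarse first-pass test, not a full classifier.

```javascript
// Normalise an RGB triple by its largest channel, following the
// definition above. Channel scale (0..1 or 0..255) does not matter.
function normalizeRgb(r, g, b) {
  // M is 1 when G and B are both zero (too dark to judge),
  // otherwise the largest of the three channels.
  const m = (g === 0 && b === 0) ? 1 : Math.max(r, g, b);
  return { r: r / m, g: g / m, b: b / m };
}

// Necessary (not sufficient) condition for a pixel to be flesh:
// red dominates, blue is the weakest channel.
function mightBeFlesh(r, g, b) {
  return r > g && g > b;
}
```

Since the test only compares channels with each other, it gives the same answer before and after normalisation, which is the point of choosing a lighting-independent heuristic.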

Marrying this up with a hue/brightness wheel gives us a generous segment of potentially “fleshy” colours:

Hue/Brightness Wheel


There are various refinements to this linear programming technique to further partition the colour space, including:

  • J. Kovac, P. Peer, and F. Solina, “Human skin colour clustering for face detection” in Proceedings of EUROCON 2003. Computer as a Tool. The IEEE Region 8, 2003
  • A. A.-S. Saleh, “A simple and novel method for skin detection and face locating and tracking” in Asia-Pacific Conference on Computer-Human Interaction 2004 (APCHI 2004), LNCS 3101, 2004
  • D. B. Swift, “Evaluating graphic image files for objectionable content” US Patent US 6895111 B1, 2006
  • G. Osman, M. S. Hitam and M. N. Ismail, “Enhanced skin colour classifier using RGB Ratio model” in International Journal on Soft Computing (IJSC) Vol.3, No.4, November 2012

For example, one in-house heuristic we tried could be coded in JavaScript as:

function IsSkin(rgba) {
  if ((rgba.a > 0.9) && (rgba.r > rgba.g) && (rgba.g > rgba.b)) {
    var g = rgba.g / rgba.r;
    if ((g > 0.6) && (g < 0.9)) {
      var b = rgba.b / rgba.r;
      var expected_b = g * 1.28 - 0.35;
      return Math.abs(b - expected_b) < 0.05;
    }
  }
  return false;
}

However, there are some fundamental flaws with merely classifying each pixel as either “fleshy” or “non-fleshy”:

  1. The portion of the colour space that is taken up by human hair, although larger, overlaps the space taken up by flesh tones. This makes distinguishing between long hair lying on top of clothes and actual flesh very difficult.
  2. Many clothes or parts of clothes are deliberately designed to be flesh-coloured or “nude” looking.
  3. You do not know if the “fleshy” pixels are at naughty locations of the body or not.
  4. As the body shape parameters of the MeModel change, the locations of the “naughty bits” change.

We’ll address these issues in the next part. However, even with fairly simplistic heuristics, the techniques discussed thus far reduce the number of images that have to be pored over by humans, as opposed to automated systems, by up to 90%.
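As a rough sketch of how a per-pixel test can cut down the pile of images needing human eyes, the snippet below runs the `IsSkin` heuristic from earlier over a flat array of RGBA bytes (the layout used by canvas `ImageData.data`) and flags an image when the fraction of “fleshy” pixels is high. The helper names `skinFraction` and `needsHumanReview`, the assumption that `IsSkin` expects channels in the range 0 to 1, and the 5% threshold are all illustrative guesses, not the values or pipeline we actually use.

```javascript
// Copied from the heuristic shown earlier in this post.
function IsSkin(rgba) {
  if ((rgba.a > 0.9) && (rgba.r > rgba.g) && (rgba.g > rgba.b)) {
    var g = rgba.g / rgba.r;
    if ((g > 0.6) && (g < 0.9)) {
      var b = rgba.b / rgba.r;
      var expected_b = g * 1.28 - 0.35;
      return Math.abs(b - expected_b) < 0.05;
    }
  }
  return false;
}

// Fraction of (nearly) opaque pixels classified as skin.
// `pixels` is a flat array of RGBA bytes in the range 0..255.
function skinFraction(pixels) {
  let opaque = 0;
  let skin = 0;
  for (let i = 0; i + 3 < pixels.length; i += 4) {
    const rgba = {
      r: pixels[i] / 255,
      g: pixels[i + 1] / 255,
      b: pixels[i + 2] / 255,
      a: pixels[i + 3] / 255,
    };
    if (rgba.a > 0.9) {
      opaque += 1;
      if (IsSkin(rgba)) {
        skin += 1;
      }
    }
  }
  return opaque === 0 ? 0 : skin / opaque;
}

// Hypothetical triage rule: only images above the threshold go to a human.
function needsHumanReview(pixels, threshold = 0.05) {
  return skinFraction(pixels) > threshold;
}
```

A whole-image fraction like this says nothing about where the fleshy pixels are, which is exactly the limitation listed in the flaws above.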

Footnote 1

The maximal range of hues generally considered as skin tones by most flesh pixel detectors is 0° ≤ H ≤ 60° (red to yellow).

This range equates to R ≥ G ≥ B, as can be seen in the chart on the right.

Furthermore, the saturation and value (or lightness) components of HSV (or HSL) detectors are usually discarded. Therefore, the “flesh predicate” can be constructed purely from relative comparisons between R, G and B, or pairwise differences thereof. This is why I believe the RGB colour space to be no less accurate than other colour spaces.

AWS Meetup

On the 22nd October Metail sponsored and hosted the third Cambridge AWS User Group (who have a few photos of the event up). It was great to welcome the meetup organiser Jon Green, AWS tech evangelist Ian Massingham (the main speaker for the evening), and around 25 others to our office “town hall” for drinks, snacks (including honey roasted cashews), networking and of course the talks.

Ian had two spots on the schedule. The first was a round-up of the announcements made last week at AWS re:Invent. I’m not going to produce my own recap of Ian’s summary: A Cloud Guru gave a nice summary of the keynotes at re:Invent in their Medium blog post, and there’s a nice summary with follow-on links over at trendmicro. There were a few questions from the audience last night, all of which Ian answered to their satisfaction. Seems AWS is doing something right 🙂 My favourite bit of the talk was during the overview of AWS’ now beta IoT platform (apparently the first thing AWS themselves have referred to as a platform) and the consumer stories from re:Invent. Here the focus was largely around an application in farming, mentioning the possibility of fitting monitoring devices to animals and being able to buzz them to wake them up. As someone who grew up in the valleys of Wales surrounded by sheep, I found the thought of experimentally setting the state of your flock to be awake quite amusing. The scale of farming in the US is slightly larger than what I’m used to, so that’s a lot more sheep to wake up through patchy signal coverage ;).

Ian’s second talk was a demo of the new EC2 container service. I’ve played with Docker before but it’s not my area of expertise; however, the audience seemed suitably impressed and there was a good bit of discussion at the end.

Having bribed my way onto the agenda by getting Metail to host (and doing the literal leg work to get the drinks and snacks to the office), it was good to give a talk to an engaged audience. I opted to go into some depth on the tracking and batch processing steps of our data architecture and skimmed over the insights and how we actually drive the pipeline. I’m hoping to get invited back to go into a bit more depth. Elastic MapReduce is a slightly more niche service in AWS, and is perhaps being overtaken by the use of real time systems such as AWS’ Kinesis family. The talk itself is up on slideshare and you can see me in action on the @CambridgeAWS twitter account 🙂

We are often complimented on the selection of beers which we put on at the meetups at Metail. This is in large part owing to the excellent choice found at the King’s Parade branch of Cambridge Wine Merchants; it makes a nice lunch time outing to go over and choose the selection for the evening’s meetup. Having said that, a little data insight from our event hosting is that diet coke is the most popular drink.

Photo cc Ian M.

I’m heading off to the 6th 3D Body Scanning Technologies conference (3DBST) for two days next week (27, 28 October).

Each day will host parallel sessions about 3D human body scanning and its industrial applications. According to the conference programme, this year they focus on

  • 3D body scanning technologies
  • body measurements and its application in fashion industry
  • medical applications
  • scanning for health and sport
  • anthropometric studies & surveys
  • body scanning assessment & use

As you can imagine, there’s plenty there that could be relevant to what we are doing in Metail R&D to create better MeModels and clothing visualisation. I’m particularly looking forward to the following topics and tech talks:

  • Mobile body modelling solution 
    • Precise and Automatic Anthropometric Measurement Extraction Using Template Registration (DFKI, Germany)
    • Challenges of Designing a 3D Camera for Mobile Handsets and Tablets (Alces Technology, USA)
    • Volume Extraction from Body Scans for Bra Sizing (University Erlangen-Nuremberg, Erlangen, Germany)
  • Low cost scanning solutions
    • From Handheld Tablets to Retail Booths – New 3D Body Scanning Solutions (Size Stream, USA)
    • Low-Cost Data-Driven 3D Reconstruction and its Application (IBV, Spain)
    • Unlocking the Body as a Digital Platform and the Rise of Reliable Consumer Scanners (Body Labs Inc., USA)
  • Cloth/garment modelling
    • 3D Product Development for Loose-Fitting Garments Based on Parametric Human Models (TU Dresden and ITU, Germany)
    • Determination of the Air Gap Thickness Underneath the Garment for Lower Body Using 3D Body Scanning
    • A Dense Surface Motion Capture System for Accurate Acquisition of Cloth Deformation (Surrey uni, UK)

Apart from the tech talks there are demos from scanning companies such as Styku (USA), 3dMD (UK/USA), Size Stream (USA), VITRONIC (Germany), Sizzy (France), IBV (Spain), TechMed 3D (Canada).

If you’re going to the conference and want to meet up for a coffee or even a refreshing beer after a long tiring conference, please contact me at dongjoe@metail.com

 

We’re delighted to be hosting and sponsoring the 3rd Cambridge AWS User Group meetup tonight, where we’ll be getting a rundown of this year’s AWS re:Invent and I’ll give an introduction to Metail’s data pipeline with a focus on how we’re using and configuring some of AWS’ services to power Metail’s data insights. As sponsors we’ll be providing beer, soft drinks and wine as well as some snacks (I’m a big fan of honey roasted cashews and in charge of purchasing, so feel free to comment on this post with any special requests).

It’s the first of these events we’re hosting here at Metail and I’ve used it as an excuse to write a presentation for the group. Although the meetup is moving around various hosts and sponsors in Cambridge you can expect to see it back at Metail in future, and hopefully a follow up/continuation of my talk in a future meetup.

Here’s the summary of the event from the user group page:

Docker, containers and the like. Plus, those of us who made it back from Vegas alive report on our findings.

The BIG news…if you’re in the Internet of Things business, a database or data analytics specialist, are into Docker containers, or use Kinesis or Lambda, there were some major announcements – and you will definitely want to be here to hear them!

This meeting is hosted, and snacks and drinks sponsored, by Metail.

We’ve got a lot to get through, so the meeting will start at 7pm sharp, with doors open from 6:30 – please be prompt!

 

The presentations:

• Notices and news – Jon Green (Adeptium Consulting) and Brief Intro to Metail – Gareth Rogers (Metail)

• re:Invent 2015 Debrief – Ian Massingham (AWS) (and/or Jon Green) – all the gory details and new announcements!

• Metail’s Data Pipeline and AWS – Gareth Rogers

• Docker and Amazon Container Service – Ian Massingham

• [Possibly] From Zero to Hero in AWS – follow-up – Tom Clark

Hope to see you later! We also host the Cambridge R Users Group, Data Insights Cambridge, DevOps Cambridge and Cambridge NonDysFunctional Programmers so if you can’t make it, there are plenty of other opportunities to come along to Metail for some drinks, snacks and tech chat.