Catching petty thieves with black magic and data

#software-development

This post has sat in my drafts for over a year now. The contents yield no result, but I hope this might inspire you to do something crazy with your day. Honestly, this idiot who ran over a cyclist inspired me to publish this post.

Last week's video highlighted the dangerous design of NYC bike lanes. Today a cyclist in a bike lane (yes, that's a bike lane) sandwiched between two driving lanes was rear-ended by a van. #VisionZero appropriately describes the street lighting. @StreetsblogNYC @TransAlt @NYC_DOT pic.twitter.com/L1XvWf7YY6
— Jessica (@Thund3r_H4wk) November 12, 2019

This post is just for shits and giggles. While the actual goal is to catch some petty thieves who did about €400 in damages to a van, I’m not exactly sure whether I’ll succeed in that at the time of writing. Let me tell you a little bit of the backstory first.

So this all started with some petty thieves driving past our company’s building and seeing a lone van standing in the parking lot. Not thinking too brightly that early in the day, they decided to see what was in it. Without looking up or around they approached the van, looked into it, and decided it was worth breaking open.

Best thing? It’s all caught on camera.

What’s our problem then? Didn’t we get ‘em? Well, not exactly. We’re not able to see their license plate due to overexposure. The first thing we tried was altering the exposure of our imagery to see if some numbers would show up, but no. That did not work.

Digging down

So what are our options here? What information do we have, and how can we get some more information?

The car: the car seems to be a Citroën C4 manufactured somewhere between 2010 and 2015. Seems to have a somewhat light color.

License plate: Not known

And that is all we know. Our ultimate goal is to find out the license plate. So we have to combine all the information we have from the video feeds, and possibly more. The most information we can get about the license plate is this still as they’re driving away.

We have pixels! Playing around with our color curves we can make the pixels clearer! (Actually I’ll use the inverse of the picture, as a white background is easier to work with.)
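For the curious, the "invert and play with the curves" step can be approximated in a few lines. This is only a minimal sketch, assuming the crop is a grayscale grid of 0-255 values; the `crop` below is made up for illustration and is not the actual still, and a plain linear stretch is a stand-in for whatever curve you would drag around in an image editor.

```python
def invert(pixels, max_value=255):
    """Return the negative of a grayscale image given as rows of ints."""
    return [[max_value - p for p in row] for row in pixels]


def stretch_levels(pixels, max_value=255):
    """Linearly rescale so the darkest pixel becomes 0 and the brightest max_value."""
    flat = [p for row in pixels for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # completely flat image: nothing to stretch
        return [row[:] for row in pixels]
    scale = max_value / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in pixels]


# Hypothetical 2x4 crop of an overexposed plate: everything is close to white.
crop = [
    [250, 243, 251, 240],
    [248, 252, 241, 249],
]

# Invert first, then spread the small differences over the full 0-255 range.
enhanced = stretch_levels(invert(crop))
for row in enhanced:
    print(row)
```

The stretch does not create information. It only makes the tiny differences that survived the overexposure visible to the eye.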

Next step: we’re going to check whether we can get pixels to match up with the license plate! How? We take the same font and render it so that it’s about 3 pixels high. Yup. Should work. We could brute-force the license plate, but why bother? There’s only a limited set of possibilities. See this Wikipedia page for more information about the license plate system in use in the Netherlands. But first let’s narrow down the possible license plates. To do this we need to know when this car was manufactured. Based on the look of the brake lights I’d say it was somewhere between 2010 and 2015. (For reference pictures, visit https://www.cars-data.com/en/citroen-c4-2010/445)

According to Wikipedia the following ranges of license plates have been issued to cars:

00-KBB-1, registration 2009/2010
00-LBB-1, registration 2010
00-NBB-1, registration 2010/2011
00-PBB-1, registration 2011
00-RBB-1, registration 2011
00-SBB-1, registration 2011
00-TBB-1, registration 2012
00-XBB-1, registration 2012
00-ZBB-1, registration 2012/2013
1-KBB-00, registration 2013
1-SBB-00, registration 2013
1-TBB-00, registration 2013/2014
1-XBB-00, registration 2014
1-ZBB-00, registration 2014/2015
GB-001-B, registration 2015
HB-001-B, registration 2015/2016

To limit the possibilities even more, registrations with ‘SD’ or ‘SS’ are not issued. According to Wikipedia: “Nowadays the letters used do not include vowels, so as to avoid profane or obscene language. To avoid confusion with a zero, the letters C and Q are also omitted. Letters and numbers are issued in strict alphabetical/numeric order.” This effectively leaves us with B, D, F, G, H, J, K, L, M, N, P, R, S, T, V, W, X and Z (18 characters) to use as letters in license plates. But why bother calculating all these possible combinations? There’s a data set we can use!
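If you did want to enumerate the combinations, the quoted rule translates into code fairly directly. A rough sketch under a few assumptions of mine: Y is skipped too (the list above leaves it out), a letter group is rejected whenever it contains ‘SS’ or ‘SD’ anywhere, and only the three-letter group of a plate such as 00-KBB-1 is generated.

```python
from itertools import product
import string

VOWELS = set("AEIOU")
# Assumption: Y is left out as well, matching the list of letters above.
EXCLUDED = VOWELS | set("CQY")
LETTERS = [c for c in string.ascii_uppercase if c not in EXCLUDED]

# Assumption: a letter group is skipped when it contains one of these pairs.
BANNED_PAIRS = ("SS", "SD")


def letter_groups(length=3):
    """Yield every letter group of the given length that passes the rules."""
    for combo in product(LETTERS, repeat=length):
        group = "".join(combo)
        if not any(pair in group for pair in BANNED_PAIRS):
            yield group


groups = list(letter_groups())
print(len(LETTERS), "letters,", len(groups), "three-letter groups")
```

Multiply that by the digit positions and the number of series and the search space grows quickly, which is exactly why a dataset of plates that actually exist is the nicer starting point.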

Datasets! 🎉

So the RDW (the Dutch authority responsible for issuing license plates) has some datasets which are freely available (see https://opendata.rdw.nl/ for the sets). One of these contains the license plates in combination with quite a lot of information about the car. Information like the brand, model, manufacture date and more technical information than I care to know is in this dataset. After downloading all this data as a CSV file it ended up being a 7.1 GB extraction. Nice.

We’re going to filter this data set with the following criteria:

Brand: Citroën
Model: C4, 5-door hatchback
Type: Passenger car
Registration: within the ranges listed above.
Still allowed on the road
Accepted on the road somewhere between 2010 and 2015 (basically an alternative filter for the registrations)
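A filter along these lines can be sketched with nothing but the standard library. Treat everything specific here as an assumption on my part: the column names, the `CITROEN` and `Personenauto` values and the `YYYYMMDD` date format should all be checked against the header of the CSV you actually download, and the rows in `sample` are made up. The body type and "still allowed on the road" checks are left out to keep it short.

```python
import csv
import io
from datetime import date, datetime

# Assumed column names and date format; verify them against your own export.
COL_PLATE = "Kenteken"
COL_TYPE = "Voertuigsoort"
COL_BRAND = "Merk"
COL_MODEL = "Handelsbenaming"
COL_FIRST_ADMISSION = "Datum eerste toelating"
DATE_FORMAT = "%Y%m%d"


def is_candidate(row, first=date(2010, 1, 1), last=date(2015, 12, 31)):
    """True when a row matches brand, model, vehicle type and admission window."""
    if row[COL_BRAND].strip().upper() != "CITROEN":
        return False
    if row[COL_MODEL].strip().upper() != "C4":
        return False
    if row[COL_TYPE].strip().lower() != "personenauto":
        return False
    try:
        admitted = datetime.strptime(row[COL_FIRST_ADMISSION], DATE_FORMAT).date()
    except ValueError:  # empty or malformed date
        return False
    return first <= admitted <= last


def candidate_plates(csv_file):
    """Read the CSV row by row so the multi-gigabyte file never sits in memory."""
    for row in csv.DictReader(csv_file):
        if is_candidate(row):
            yield row[COL_PLATE]


# Made-up rows standing in for the real file.
sample = io.StringIO(
    "Kenteken,Voertuigsoort,Merk,Handelsbenaming,Datum eerste toelating\n"
    "00KBB1,Personenauto,CITROEN,C4,20100315\n"
    "00LBB2,Personenauto,CITROEN,C4,20090101\n"
    "00NBB3,Personenauto,PEUGEOT,308,20110101\n"
    "00PBB4,Bedrijfsauto,CITROEN,C4,20110101\n"
)
matches = list(candidate_plates(sample))
print(matches)
```

For the real thing you would pass an opened file instead of `sample`. Because `candidate_plates` is a generator, memory use stays flat no matter how large the export is.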

After the first filter session we’re down to 4987 possible cars. Nice. Compared with the 14.1 million records in the dataset we’re only working with about 0.035% of the original amount of data. Looks good.

Let’s see if there are ways to generate images of a quality at least as bad as that of our security cameras. A quick Google search came up with this StackOverflow thread. As we’re doing nothing fancy here we can just copy and paste this into our LinqPad window. Just a quick test to see whether it works and voilà. I think this is something we can continue with.

Code generating images containing letters representing license plates.

But our problem is that the result is so bad we can barely make it out. Okay. Resize it! Back to StackOverflow. The resulting image looks like this. (Browser resizing makes the image blurry. I’m too lazy to turn that off right now.)

License plate, but resized to a nearly unreadable format.

Just about right if you ask me, as we only have a 3x13 grid of pixels.
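The resize itself is nothing magical. Here is a minimal sketch of shrinking by block averaging, assuming a grayscale grid of 0-255 values; it is a plain-Python illustration and not the StackOverflow snippet I used, and the striped `big` image is just a stand-in for a rendered plate.

```python
def downscale(pixels, out_h, out_w):
    """Shrink a grayscale image (rows of ints) by averaging rectangular blocks."""
    in_h, in_w = len(pixels), len(pixels[0])
    result = []
    for oy in range(out_h):
        # Source rows covered by this output row (always at least one).
        y0 = oy * in_h // out_h
        y1 = max((oy + 1) * in_h // out_h, y0 + 1)
        row = []
        for ox in range(out_w):
            # Source columns covered by this output column.
            x0 = ox * in_w // out_w
            x1 = max((ox + 1) * in_w // out_w, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) // len(block))
        result.append(row)
    return result


# Stand-in for a rendered plate: 6x26 pixels of vertical stripes, 2 pixels wide.
big = [[255 if (x // 2) % 2 == 0 else 0 for x in range(26)] for _ in range(6)]

small = downscale(big, 3, 13)
for row in small:
    print(row)
```

Averaging whole blocks is a crude model of what the camera sensor does to a distant plate, which is the property we want here: the generated image should be degraded in roughly the same way as the footage.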

Proceeding further?

Now what? We’d have to match bit arrays. Given that this data is pretty high dimensional, I decided to use K-Means clustering to try and find the relative distance between number plates, and then select the plates which were most closely related. Even though the proof of concept worked, and I could produce results, I don’t really believe this method is practically viable. Factors like optics, skewed angles and a far from accurate depiction of a number plate are all problems which have to be experimented with and validated before they can be used in a project like this.
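To give an idea of what "distance between number plates" means in practice, here is a deliberately simpler stand-in for the clustering: a plain nearest-neighbour ranking by Euclidean distance between pixel grids. This is not the K-Means approach from my script, and the grids and `candidate-N` labels below are toy values made up for the example.

```python
import math


def flatten(pixels):
    """Turn a grid of pixel rows into one flat feature vector."""
    return [p for row in pixels for p in row]


def distance(a, b):
    """Euclidean distance between two equally sized pixel grids."""
    va, vb = flatten(a), flatten(b)
    if len(va) != len(vb):
        raise ValueError("grids must have the same shape")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))


def rank_candidates(observed, rendered):
    """Return (distance, label) pairs, most similar candidate first."""
    return sorted((distance(observed, grid), label) for label, grid in rendered.items())


# Toy 2x3 grids; the real ones would be 3x13.
observed = [[0, 255, 0], [255, 0, 255]]
rendered = {
    "candidate-1": [[0, 250, 10], [240, 0, 255]],
    "candidate-2": [[255, 0, 255], [0, 255, 0]],
    "candidate-3": [[0, 255, 0], [255, 0, 0]],
}

ranking = rank_candidates(observed, rendered)
for dist, label in ranking:
    print(f"{label}: {dist:.1f}")
```

The same caveats apply as to the clustering: a small distance only means the rendered grid looks like the footage under my rendering assumptions, not that the plate is the right one.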

All these factors made me lose interest in pursuing this experiment any further. While it ‘might’ work, there are many unknowns which make this experiment even more complex than it already seemed to be initially.

For those who are actually interested in the technical part, I have posted the script I used for all this on GitHub!