Daniel Ballinger's FishOfPrey.com: Teaching Salesforce to drive with Einstein Vision Services

tl;dr. Skip to the results.

Technology is not good or bad. It is what we do with it that matters. AI is a gift! May we use it for the betterment of all mankind! pic.twitter.com/E09tuBYCnZ
— Marc Benioff (@Benioff) September 17, 2017

There's something in that tweet that speaks to me. Maybe it's time to pull a Tony Stark and take a new direction with my creations. Moving away from the world of ~~weapons~~ questionable development and more towards something for the public good. Something topical.

Tesla, Apple, Uber, Google, Toyota, BMW, Ford, these are just a few of the companies working on creating self-driving cars. Maybe I should dabble with that too.

My first challenge was that my R&D budget doesn't stretch quite as far as the aforementioned companies. While they all have market values measured in the billions my resources are a little more modest. So the kiwi Number Eight Wire Mentality will come into play when LIDAR isn't an option.

Steve Rogers: Big man in a suit of armor. Take that off, what are you?
Tony Stark: Genius billionaire playboy philanthropist.

* Actual project did not require hammering an anvil... yet

It's another way I fancy myself a bit like Tony Stark. Less like the genius, billionaire, playboy Stark and more like trapped in a cave with only the resources at hand Stark. I'll make do with whatever I've got on hand. Some servos, a Raspberry Pi, an old web cam, a massive amount of cloud computing resources.

That last point is important, while Stark has J.A.R.V.I.S. I've got Einstein Platform Services.

I don't have a background in AI as a Data Scientist. My AI exposure has been more from copious amounts of science fiction and that one trimester back in university where I did a single paper. I'm assuming the depiction of AI in the media is about as accurate as the depiction of computing in general. So while it's fun to take inspiration from an 80's documentary on self driving cars the most useful learning from something like Knight Rider is that Larson scanners make AI's look cool (more on that later).

Elephant in the room

This is probably a good point to pause for a minute and note that, yes, I know that a solely cloud based AI isn't the best option, or even a very good option, for an autonomous vehicle. An image classification AI would only be part of a larger collection of sensor input used to make a self driving car. Connectivity and latency issues will pretty much rule out any full scale testing or moving at significant speed. That, and, I don't think my vehicle insurance would cover crashing the family car while it was trying to navigate a tricky intersection by itself. But I want to see how far I can push it anyway with a purely optical solution using a single camera.

I know that a number of the components I'm using on the prototype are less than ideal. I.e. using continuous rotation servos rather than stepper motors which would have given a more precise measure of the distance traveled. See my earlier point above that I'm mostly working with what I've got on hand. Feel free to contact me if you have a spare sum of cash burning a whole in your pocket that you want to invest in something like this.

Remember, this is just supposed to be a Mark 1 prototype. If it starts to look promising it can be refined in the future.

The Overly Simplistic Plan

The general idea for a minimalistic vehicle is:

A single forward facing web cam to capture the current position on the road and what is immediately ahead.
A Raspberry Pi to capture the image and process it.
Einstien Predictive Vision Services to identify what is in the current image and the implied probable next course of action to take.
Some basic servo based motor functions to perform the required movements.
Repeat from step (2)

I say overly simplistic here in that it doesn't really localize the position of the vehicle in the environment or provide it with an accurate course to follow. Instead it just makes a best guess prediction based on what is currently seen immediately ahead. There is no feedback mechanism based on where the car was previously or how well it is following the required trajectory.

Hardware - Some assembly required

In the ideal world I'd have repurposed an RC toy car that had full steering and drive capabilities already. Or even just 3D printed a basic car. This would have given a realistic steering behavior via the front wheels. Instead I've started out with skid steering and a dolly wheel on a lego frame. This was quick and easy to put together and drive with some servos from a previous project.

While simple is good for v1, it was still not without it's challenges.

The first problem was that I couldn't just go from zero to full power on the drive wheel servos. It tended to either damage the lego gears or send the whole vehicle rolling over backwards. I needed to accelerate smoothly to get predictable movement.

Custom LiPo based high drain power supply

Another challenge was providing a portable power source. Originally I was powering the whole setup off a fairly stock standard portable cellphone charger to power both the Pi and the servovs. However, when I powered up the servos to move forward the Raspberry Pi would reset. The voltage sag when driving the motors was enough to reset the Pi. I was able to work around this by using one of the LiPo (Lithium polymer) batteries from my quadcopters with a high C rating (they can deliver a lot of power quickly if required) and a pair of step down transformers.

Software

The standard software control loop is:

Capture webcam image
Send image off to the Einstein Image service for the steering Model.
Based on the result with the highest probability, activate servos to move the vehicle
Repeat

Capturing the image is fairly straight forward.

Since latency was already going to make fast movement impractical I opted to relay the calls via Salesforce Apex services. This made the authentication easier as I only needed to establish and maintain a valid session with the Salesforce instance to then access the Predictive Vision API via Apex. It also meant I didn't need to reimplement the metamind API and could instead use René Winkelmeyer's salesforce-einstein-platform-apex Github project. This already included workarounds for posting multipart requests required by the API.

Training

This is something that has evolved greatly since my last project using Einstein Vision Services on the automated sprinkler. Previously you created a dataset from a pre-collected set of examples and then created your model from that to make the predictions with. If it needed refining you repeated the entire process. Now you can add feedback directly to a dataset and retrain the model in-place with v2 of the Metamind API. No more adding all the same examples again.

I found it really useful to have four operating modes for the control software to facilitate the different training scenarios.

The first mode would kick in if no DataSet could be found. In these case the car would wait for input at each step. Using a wireless keyboard I could indicate the correct course of action to take. The image visible at the time would be captured and filed away with the required label. This made creating the initial DataSet much easier.

The second mode was used if the DataSet existed but didn't have a Model created yet. In this case the input I gave could be used to directly create examples. This mode wasn't so useful and I'd tend to skip it.

The third option applied if the Model could be found. In this case I could still indicate the correct course of action to take. If the model returned the expected prediction nothing further needed to occur. However, if the highest prediction differed from the expected result I could directly add the required feedback into the model.

The final forth mode was to use the model to make predictions and take action solely off the highest prediction returned.

Control theory - PID control

The earlier statement about the software action "Based on the result, activate servos to move the vehicle" is a gross simplification of what is required to have a vehicle follow a desired trajectory. In the real world you don't merely turn your vehicle full lock left or right to follow the desired course (what is referred to as bang-bang steering). Instead you make adjustments proportional to how far off the desired trajectory you are, how rapidly you are approaching the desired trajectory, and finally, if there are any environmental factors that are affecting the steering of the vehicle.

This should be familiar to anyone who makes and flys quadcopters (another interest of mine). It's referred to as PID (proportional–integral–derivative) Control and provides the automated mechanism to keep the vehicle/quadcopter at the desired state where the output needs to adapt to meet the desired input. The following video does an excellent job of explaining how it works.

With my v1 prototype the control loop speed is so slow and the positioning so basic that I'm practically going with bang-bang style steering. It is less than ideal as most scenarios require fine motor control to accurately follow the curve of the road ahead. Definitely something to look at improving in v2

The prototype results

Training example

This video shows the feedback training process for an existing model. The initial model was created off a fairly small number of examples in the dataset. I then refined it with feedback whenever the top prediction didn't match what I indicated it should be.

I've added some basic voice prompts to indicate what is occurring as it mostly runs headless during normal operation.

Self steering example

In this video I'm using an entirely different dataset to the training example above. Getting single laps out of this setup was a bit of a challenge. The real world is far less forgiving than software when something goes wrong. Through trial an error I found it was really important to get the camera angle right to give the best prediction for the current position. It was also important to adjust the amount of movement per action. Trying to make large corrective movements, particularly around corners, would often result in getting irrecoverably off course. This comes back to the "bang-bang" style steering mentioned above.

I've edited out some of the pauses to make it more watchable. So don't consider this a real time representation of how fast it actually goes.

This process was also not without its share of failures. The following video shows a more realistic speed for the device and what happens when it misjudges the cornering.

Conclusions

Lets start with the Million Dollar Question -

Can I use this today to convert my car into an AI powered automated driving machine and then live a life of leisure as it plays taxi for the all the trips the kids need to make?

Umm, no. Even if you're happy with it only being around 50% confident that it should go in a straight line rather than swerve off to the right on occasion there are still a number of things holding it back.

For example, while it could be extended to recognize a stop sign there is currently no way to judge how far away that stop sign is¹.

That's not even taking into account other things that real roads have, like traffic, pedestrians, roadworks, ... At least I'm still someway off before I need to start worrying about things like the Trolley car problem

Future Improvements

A couple of days before this post came out Salesforce released the Einstein Object Detection API. It's very similar to the Image recognition API I used above except in addition to the identified label in the image it will also give you a bounding box indicating where it was detected. With something like this I could then guess the relative position of the object in question to the current camera direction and maybe even distance based on a know size and how large the bounding box is coming back as. Definitely something to investigate for the MK2.