We all know why the chicken crossed the road – to get to the other side. BUT, do we know how he crossed the road?
The answer is he probably used his eyesight to check for any incoming cars, determine which direction to walk in, and check the status of the pedestrian crosswalk light.
HOWEVER, what if he was blind?
Would he still cross the road? Would he risk it just to get to the other side? Would he just hope to get lucky? Would he ask someone else for help? What if there isn’t anyone else to help him? What if he accidentally walks into oncoming traffic? Oh no! Poor chicken.
In reality, blind navigation is a huge problem that affects many people around the world.
The World Health Organization estimates that there are 39 million blind people in the world. That’s 39 million people who can’t see anything when crossing the street.
These people have to deal with increased risk every day of their lives because our society wasn’t designed to account for their disadvantage.
This is partially addressed through traditional white canes and guide dogs. Both are extremely good at helping blind people avoid obstacles, but neither can provide sufficient insight into the visual aspects of the user’s surroundings.
The white cane is, of course, just a stick. And guide dogs, while they do have eyes, have limited colour vision (like all dogs, they can’t tell red from green) and therefore can’t perceive colour-based street signs or traffic signals.
This is why something as simple as crossing the street is one of the biggest challenges for blind people. Neither of these traditional technologies is capable of properly addressing it.
Currently, one of two things ends up happening. Either the person gets someone to assist them and help them cross the street, or they listen for the direction of the moving cars and then cross while trying both to avoid them and to maintain a somewhat straight path.
However, this doesn’t adequately address the problem at all. What if there’s no one around to help? That entirely kills option 1.
And with option 2, what if it’s too noisy for you to hear the cars? Maybe it’s raining or you’re standing next to a loud construction site and your hearing is impaired by the noise.
In addition, have you ever tried to walk directly towards a target without knowing its exact direction or being able to reorient yourself by using your eyes? It’s incredibly difficult. It’s very likely that you would veer off to the left or right both of which have dangerous vehicles and obstacles in the way.
So how do we fix this?
I was thinking about this problem when I realised something I find unbelievable.
We’ve put so much effort and so many resources into using object detection AI to create self-driving cars and help cars navigate; however, we’ve used none of that same technology to help disadvantaged people navigate.
There’s been such a huge focus on self-driving cars and other cool applications that we’ve completely forgotten about helping an important group of people in our society.
Then I thought how can I change that?
In theory, all I’d have to do is take the same tech used by self-driving cars to recognise traffic lights and then implement it into a format and device that can be used by blind people.
My first step was recreating the software because of course I can’t just steal car companies’ software.
For my program, I identified 3 criteria that are essential to creating a usable product.
- The first is that it has to be accurate. It can’t misread traffic lights; that would be dangerous for the user.
- The second is that it has to run in real time. If it’s going to be usable for real-time navigation, this is a must.
- The third is that it has to be able to run on an affordable and small processor. This is because the device has to be carried on the person and it should be reasonably priced. For the processor, I ended up choosing a Raspberry Pi because of its low price and small size.
However, achieving all 3 of these things is much harder than it sounds, because they conflict with one another.
With object detection, there is always a trade-off between speed and accuracy. And if you’re running the program on a relatively weak processor like the one found on the Raspberry Pi, the problem becomes 10X worse because the available computational power is severely limited.
So I had to make an accurate model that could run in real time even with low processing power.
To achieve this and the related speed/accuracy requirements, I had to focus a lot on model architecture.
A traditional R-CNN, while very accurate, was far too slow to be functional. On the other hand, most models designed to run fast were not very accurate.
While trying to balance speed and accuracy, I found success with a Single Shot Detector (SSD) model from the TensorFlow model zoo. It was able to provide pretty great results, but only after the somewhat challenging process of training it.
The first thing I had to do was collect and label a dataset to use for the training process. I went out and took 200 images of pedestrian crosswalk lights, 100 of them in the stop state and 100 of them in the walk state.
For each image, I then manually placed a bounding box around the light and labelled it with its correct class, stop or walk.
These bounding boxes and labels are necessary to tell the computer where the light in each image is and what state it’s in.
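The post doesn’t say which labelling tool or format was used. As an illustration only, here’s what one annotation might look like in the Pascal VOC XML format that common labelling tools like labelImg produce (the filename and coordinates are made up):

```xml
<annotation>
  <filename>walk_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <!-- the class label: "stop" or "walk" -->
    <name>walk</name>
    <!-- the bounding box drawn around the light, in pixels -->
    <bndbox>
      <xmin>285</xmin><ymin>120</ymin>
      <xmax>330</xmax><ymax>205</ymax>
    </bndbox>
  </object>
</annotation>
```

One such file per image gives the training pipeline both pieces of information it needs: where the light is and what state it’s in.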
After completing the labelling process, I divided the dataset into training and testing sets.
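The post doesn’t state the split ratio; a common choice is 80/20, which on 200 images gives 160 for training and 40 for testing. A minimal sketch of such a split, assuming that ratio and hypothetical filenames:

```python
import random

def split_dataset(examples, train_fraction=0.8, seed=42):
    """Shuffle labelled examples and split them into train/test sets."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)  # fixed seed for reproducibility
    cut = int(len(examples) * train_fraction)
    return examples[:cut], examples[cut:]

# 200 labelled images: 100 "stop" + 100 "walk" (hypothetical names)
images = [f"stop_{i:03d}.jpg" for i in range(100)] + \
         [f"walk_{i:03d}.jpg" for i in range(100)]
train, test = split_dataset(images)
print(len(train), len(test))  # 160 40
```

Shuffling before splitting matters here: without it, the test set would contain only "walk" images, since the list is built with all "stop" images first.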
Now that I had the model architecture and datasets I could perform the actual model training. This is when the model uses the dataset to learn how to accurately detect traffic lights and their states from images and video feeds.
For the model training, because my dataset size was relatively small I decided to leverage the machine learning method of transfer learning.
Transfer learning allows you to take an already trained object detection model and retrain it on your own dataset to fit your problem.
Because the pre-trained model used in the process was trained on a huge dataset (for example, the COCO dataset, which contains over 200 thousand images), it has already learned to recognize patterns in images extremely well.
Because of this, when you’re retraining it on your dataset, it’s then able to learn the patterns for that dataset a lot better than it otherwise would've.
This produces a final model that’s very accurate even though your dataset might not have been very large, and it also reduces the training time!
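The post doesn’t show the training setup. With the TensorFlow Object Detection API (which the model zoo models plug into), transfer learning is typically configured in the pipeline config by pointing it at the pre-trained COCO checkpoint. A sketch with illustrative paths and values, not the author’s actual settings:

```
model {
  ssd {
    num_classes: 2  # just "stop" and "walk", instead of COCO's 90 classes
    ...
  }
}
train_config {
  batch_size: 8
  # start from weights already trained on COCO rather than from scratch
  fine_tune_checkpoint: "ssd_mobilenet_v2_coco/model.ckpt"
  from_detection_checkpoint: true
}
```

The `fine_tune_checkpoint` line is what makes this transfer learning rather than training from scratch: the model starts out already knowing general image features, and only has to adapt them to the two new classes.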
But that’s not to say this worked perfectly on the first try, nope that’d obviously be too easy.
In general, a lot of machine learning is trying different configurations and methods until you find the one that works the best for your specific problem. This was no different.
It took 11 tries of changing model architectures and altering dataset configurations/labelling before I finally got a model that best satisfied the speed and accuracy requirements.
However, I’m extremely happy to say that despite this, it works astonishingly well.
Alright, so once I had the model it was time to figure out how to make it useable.
That’s when I designed my first prototype wearable.
This device is worn on your head and it works by taking in a video feed through the Pi camera on the front. The feed is then processed by the Raspberry Pi which is housed inside the 3D printed casing. The Pi then runs the object detection model on the video feed in real time.
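The post describes this pipeline in prose. As a sketch of the per-frame step, here’s how the model’s raw output might be reduced to a single usable detection. The 0.5 score threshold and the class-id mapping are assumptions, not the author’s actual code; the tensor layout mirrors the TensorFlow Object Detection API’s standard outputs:

```python
import numpy as np

LABELS = {1: "stop", 2: "walk"}  # assumed class-id mapping

def best_detection(boxes, scores, classes, min_score=0.5):
    """Pick the most confident detection in a frame, if any.

    boxes, scores and classes mirror the standard output tensors of a
    TensorFlow object detection model: each box is
    [ymin, xmin, ymax, xmax] in coordinates normalised to 0..1.
    Returns (label, box), or None if nothing clears the threshold.
    """
    idx = int(np.argmax(scores))
    if scores[idx] < min_score:
        return None
    return LABELS[int(classes[idx])], boxes[idx]

# One fake frame's worth of model output
boxes = np.array([[0.1, 0.4, 0.3, 0.5], [0.2, 0.2, 0.4, 0.3]])
scores = np.array([0.91, 0.40])
classes = np.array([2, 1])
label, box = best_detection(boxes, scores, classes)
print(label)  # walk
```

The device would run something like this on every frame from the Pi camera and then translate the result into buzzer feedback.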
Based on the output of the model the device gives haptic feedback through 3 buzzers located inside the front of the headband. There’s one on the left, one on the right, and one in the middle.
If the model detects the traffic light to be in the stop state, the left and right buzzers will buzz on and off, informing the user that they shouldn’t cross.
If the model detects the light to be in the walk state however, one of the buzzers will buzz continuously informing the user that it’s ok to walk.
But not only that, don’t forget that along with its state the model is also capable of detecting where the pedestrian traffic light is! This means it can help the user walk in the correct direction and stay out of traffic as well.
When the light is detected to be in its walk state, the device will choose which buzzer to buzz depending on where the light is. If the light is straight ahead of the user the middle buzzer will buzz continuously. However, if the light is to the user’s left it will be the left buzzer and if it’s to the right it’ll be the right buzzer.
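The feedback rules above can be sketched as a small decision function. The one-third/two-thirds screen boundaries are an assumption for illustration; the post doesn’t specify how "left", "middle", and "right" are divided:

```python
def choose_feedback(label, box_center_x, frame_width):
    """Map a detection to haptic feedback.

    label: "stop" or "walk" (the two classes the model was trained on)
    box_center_x: horizontal centre of the detected light, in pixels
    frame_width: width of the camera frame, in pixels
    Returns (buzzers to drive, buzz pattern).
    """
    if label == "stop":
        # Stop: pulse the left and right buzzers on and off together
        return (["left", "right"], "pulse")
    # Walk: buzz the single buzzer pointing toward the light.
    # The thirds-of-frame boundaries below are an assumed choice.
    if box_center_x < frame_width / 3:
        return (["left"], "continuous")
    elif box_center_x > 2 * frame_width / 3:
        return (["right"], "continuous")
    return (["middle"], "continuous")

print(choose_feedback("walk", 320, 640))  # (['middle'], 'continuous')
```

On the actual device each returned buzzer name would correspond to a GPIO pin driving one of the three buzzers in the headband.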
This solves the problem of the user not knowing the state of the light, and the problem of not knowing the exact direction to walk in.
Now they know if it’s safe or not to walk and they know the exact direction to walk in!
In addition, all of this is powered by a rechargeable battery that is stored in the case right next to the Pi, making the whole device portable, functional, practical, and easily useable.
There are still 39 million blind people in the world who suffer from navigational challenges like this every day, but hopefully one day this tech will be able to make their lives at least a little bit easier.
Thanks for reading!
Here are some further links you might want to check out:
Project Website -> https://mikhailszugalew.com/NavAssistAi.html
YouTube Video -> https://www.youtube.com/watch?v=7-VyBwpMNIY
My LinkedIn -> https://www.linkedin.com/in/mikhail-szugalew/