Session: IMECE Undergraduate Research and Design Exposition
Paper Number: 121110
Title: Deep Neural Networks Based Visual Odometry and Object Avoidance Using Stereo Vision
The prospects of autonomous vehicles in the transportation industry have grown rapidly in recent years. To ensure safe navigation on a shared road where both autonomous and traditional vehicles are present, a robust and effective real-time collision-avoidance sensor is critical. Whereas in traditional vehicles the situational awareness of the driver plays a crucial role in collision avoidance, in autonomous vehicles this function must be achieved through a combination of multiple sensors. Recently, object detection and identification employing only vision-based sensors and leveraging learning frameworks have become increasingly popular.

The present research integrates a learning paradigm, namely deep neural networks (DNNs), with a stereo vision camera on a ground rover to estimate the distance between the host vehicle and other vehicles on a shared road. The DNN model is implemented in Python with OpenCV. The stereo camera was calibrated using OpenCV and images of a checkerboard pattern to eliminate radial distortion, enabling depth and speed calculations. Following successful camera calibration, an object detection and identification function was integrated with the camera. The detector uses the COCO object dataset, and a program was written to create labeled bounding boxes around detected objects. The deep neural network trained on COCO identified these objects with 90% accuracy and 60% confidence.

Subsequently, object distance and speed estimation capabilities were incorporated into the rover program. Distance and depth were computed by triangulation from the stereo camera's image feed: the disparity of an object is obtained from the offset between the object's positions in the left and right stereo images, and depth is then recovered through its inverse relationship with disparity. Paired with the calibrated stereo camera, this calculation enabled reliable estimation of the speed of a moving object. The triangulation runs in a loop over sequential stereo frames, and a companion calculation outputs the computed distances of the various detected objects.

Additional calculations account for the motion of the rover itself with respect to the objects in the camera's field of view. With this compensation, each object with zero relative speed returns a static depth change of zero, computed from the object's depth with respect to the rover in previous images. Any object moving at a non-zero relative speed therefore deviates from this static label, and its current depth is compared with its previous depth to calculate its speed between successive frames. The direction of motion is coarsely determined from the pixel locations of the object's centroid in the two frames, mapped back to a common image frame.

Finally, an object avoidance function was programmed on the rover using these calculations. Beyond basic redirection behaviors, the program compares every depth estimate for a stationary object against a fixed distance threshold; once the depth of an approaching object reaches this value, the rover performs an avoidance turn.
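The following is a minimal sketch of the depth-from-disparity and frame-to-frame speed computation described above, in the Python/OpenCV setting the abstract names. It assumes a calibrated, rectified stereo pair; the focal length, baseline, frame rate, block-matching settings, and function names are illustrative placeholders rather than the rover's actual code.

import cv2
import numpy as np

# Illustrative constants (placeholders, not the rover's calibration values).
FOCAL_PX = 700.0     # focal length in pixels, from stereo calibration
BASELINE_M = 0.06    # distance between the two camera centers, meters
FPS = 30.0           # camera frame rate, frames per second

def depth_from_disparity(disparity_px):
    # Depth is inversely proportional to disparity: Z = f * B / d.
    if disparity_px <= 0:
        return float("inf")   # no valid match; treat as infinitely far
    return FOCAL_PX * BASELINE_M / disparity_px

def object_depth(left_gray, right_gray, bbox):
    # Median disparity inside a detection's bounding box gives a robust
    # single depth estimate for that object.
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64,
                                    blockSize=9)
    # OpenCV returns fixed-point disparities scaled by 16.
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    x, y, w, h = bbox
    patch = disparity[y:y + h, x:x + w]
    valid = patch[patch > 0]
    if valid.size == 0:
        return float("inf")
    return depth_from_disparity(float(np.median(valid)))

def relative_speed(depth_prev_m, depth_curr_m):
    # Speed from the change in depth between two successive frames;
    # positive values mean the object is approaching the rover.
    return (depth_prev_m - depth_curr_m) * FPS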
For moving objects, the algorithm instead computes a dynamic distance threshold based on the speed of the approaching object. A secondary calculation estimates the number of frames until the object will reach that dynamic threshold and compares it to a secondary, frame-count threshold; the rover safely redirects once the secondary threshold is met. This project has shown how modest the resources required for relative speed estimation of objects within the view of a stereo camera can be: pairing a basic stereo camera with a deep neural network for object detection and triangulation algorithms has proven to be a relatively low-cost way to recreate the systems powering an autonomous vehicle.
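The avoidance decision described above might be sketched as follows, combining the fixed threshold for stationary objects with the speed-scaled dynamic threshold and frame-count test for moving ones; all constants are hypothetical tuning values, not those used on the rover.

# Hypothetical tuning constants, not the rover's actual parameters.
STATIC_THRESHOLD_M = 0.5    # fixed distance threshold, stationary objects
REACTION_GAIN_S = 1.0       # seconds of look-ahead added per m/s of speed
FRAMES_THRESHOLD = 15       # secondary threshold: frames until crossing
FPS = 30.0                  # camera frame rate, frames per second

def should_avoid(depth_m, approach_speed_mps):
    # Stationary (or receding) objects: compare depth to the static threshold.
    if approach_speed_mps <= 0:
        return depth_m <= STATIC_THRESHOLD_M
    # Moving objects: the threshold grows with the object's approach speed.
    dynamic_threshold_m = STATIC_THRESHOLD_M + REACTION_GAIN_S * approach_speed_mps
    # Frames until the object crosses the dynamic threshold at its current speed.
    frames_to_threshold = (depth_m - dynamic_threshold_m) / approach_speed_mps * FPS
    # Redirect once the crossing is fewer than FRAMES_THRESHOLD frames away.
    return frames_to_threshold <= FRAMES_THRESHOLD

In the frame loop, should_avoid(depth, relative_speed(prev_depth, depth)) would then trigger the rover's avoidance turn.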
Presenting Author: Aayan Adnan, Colleyville Heritage High School
Presenting Author Biography: Aayan Adnan is a senior at Colleyville Heritage High School, Class of 2024. He currently holds the Life rank in Scouts of America, is working toward the Eagle rank, and tutors math at Mathnasium.
Authors:
Aayan Adnan, Colleyville Heritage High School
Rafi Chowdhury, Colleyville Heritage High School
Neel Koney, Trinity Valley School
Kamesh Subbarao, The University of Texas at Arlington
Paper Type: Undergraduate Expo