The study of visual perception has engaged philosophers and scientists for centuries, if not millennia. The problem is still not well understood, despite new results from machine learning and technologies such as deep convolutional neural networks. How photons impinging on the retina are converted into visual perceptions in the mind has been the subject of volumes of scholarly articles and debate. We won’t solve that problem here, but I do wish to explore one area of the discussion: eye movements and the construction of the visual field.
So much visual research starts with a simple static image (ImageNet now contains over 14 million such classified images) and tries to derive a response to it. The computer is presented with an image of a bowl of fruit and asked, “Where is the banana?”
Unfortunately, that’s not how our eyes work. They are not Nikon cameras sitting on a tripod taking JPEG snaps 30 times a second. They are physiological organs, sensitive to light and constantly in motion, mounted in a skull that is itself moving all the time, atop a body that changes position every 500 milliseconds. And yet our perceived visual world seems stable and solid, bearing almost no resemblance to the train-wreck of photons actually being received by the retina. How can that be?
This is the first of several posts here exploring the question of how a visual field is constructed from an eye that is in motion. What methods might the visual cortex use to start reconstructing a stable visual field from a lens and sensor that are not stable?
The first method is a well-known and simple technique, found in cameras and photo-processing programs, for creating panoramic images. We’ve all seen the super-super-wide shots of the Grand Canyon and other scenic places. It goes by the popular name of “photo stitching” and uses multiple key-points in adjacent frames to carefully line up and blend two images. Specifically, key-points in frame A are matched, using some similarity criterion, to key-points in frame B, and then a perspective transform (a homography) is applied to A so that it fits cleanly into B. The result is a beautiful A+B image. OpenCV has a number of functions to make this process simple to do.
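To make the mechanics concrete, here is a minimal Python/OpenCV sketch of that pipeline, assuming two overlapping BGR frames. The function name `stitch_pair`, its parameters, and the crude paste-over blend are mine for illustration, not part of any particular library:

```python
import cv2
import numpy as np

def stitch_pair(img_a, img_b, min_matches=10):
    """Warp frame A into frame B's coordinate space and combine them."""
    # 1. Detect key-points and compute descriptors in both frames.
    orb = cv2.ORB_create(nfeatures=2000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    # 2. Match descriptors; Hamming distance suits ORB's binary descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
    if len(matches) < min_matches:
        raise ValueError("not enough key-point matches to stitch")

    # 3. Estimate the perspective transform (homography) from A to B,
    #    letting RANSAC discard mismatched key-point pairs.
    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(pts_a, pts_b, cv2.RANSAC, 5.0)

    # 4. Warp A onto a canvas wide enough for both, then lay B on top
    #    (a real stitcher would blend the seam instead of overwriting).
    h, w = img_b.shape[:2]
    canvas = cv2.warpPerspective(img_a, H, (w * 2, h))
    canvas[0:h, 0:w] = img_b
    return canvas
```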
This is an example of a visual field scan constructed using a stitching algorithm. The results are quite decent, and one can speculate about where in the visual cortex the brain might do something analogous to photo stitching. More details, along with the OpenCV code used to create this final image, will be provided as soon as time permits.
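In the meantime, for readers who want to experiment, OpenCV also exposes the whole pipeline through one high-level class. A minimal sketch, assuming a handful of overlapping frames on disk (the file names below are placeholders, not the actual scan frames):

```python
import cv2

# Assumption: these are overlapping frames saved from a scan of a scene.
frames = [cv2.imread(p) for p in ("scan_01.jpg", "scan_02.jpg", "scan_03.jpg")]

# Stitcher bundles key-point detection, matching, homography estimation,
# seam finding, and blending into a single call.
stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)
status, panorama = stitcher.stitch(frames)
if status == cv2.Stitcher_OK:
    cv2.imwrite("visual_field.jpg", panorama)
else:
    print(f"stitching failed with status {status}")
```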