Radial and nonradial image distortion

Images taken by a camera are always subject to distortions – there simply is no flawless camera, no flawless lens. People often tend to think of image distortion as an aesthetic problem (the horizon of your sunset-on-the-ocean photograph being warped to a curve) or even as an aesthetic desirable (some people just like the distorted look of pictures taken by a fisheye lens).

In photogrammetry, image distortion is not just an aesthetic issue but a serious problem that has to be controlled: if the distortions of the images fed into a photogrammetric process are not corrected, the results will not be less pretty but simply wrong.

The most common type of image distortion created by an optical camera is radial distortion. The visual effect is that straight lines appear to bulge away from or bow towards the centre of the image. Because of this appearance, these distortions are called barrel and pincushion distortion, respectively.

Barrel distortion affecting the photograph of a sheet of square gridded paper (Nikon L101).


While basic camera calibration or camera resectioning establishes the intrinsic camera parameters principal point (xy coordinate), image format (x and y dimensions) and focal length, camera calibration for photogrammetric applications usually also includes the correction of radial distortions. A common way to do this is by applying Brown’s distortion model, a set of equations which relates undistorted pixel coordinates to the original (distorted) pixel coordinates depending on the pixel’s radial position in relation to the image’s principal point. Radial image distortions (if they are not extreme) are usually well modelled by photogrammetric software.
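To make the idea a bit more concrete, here is a minimal sketch of the radial part of such a model. It is not the implementation of any particular photogrammetric package: the function names, the use of normalised coordinates and the two coefficients k1 and k2 are assumptions for illustration, and sign and direction conventions differ between software products.

```python
def apply_radial_distortion(x, y, k1, k2=0.0, cx=0.0, cy=0.0):
    """Map an undistorted point (x, y) to its radially distorted position.

    Coordinates are assumed to be normalised (roughly -1..1) and given in the
    same units as the principal point (cx, cy). k1 and k2 are hypothetical
    radial coefficients, not values from a real camera.
    """
    dx, dy = x - cx, y - cy
    r2 = dx * dx + dy * dy                   # squared distance from principal point
    factor = 1.0 + k1 * r2 + k2 * r2 * r2    # radial scaling at this distance
    return cx + dx * factor, cy + dy * factor


def remove_radial_distortion(xd, yd, k1, k2=0.0, cx=0.0, cy=0.0, iterations=20):
    """Invert the model by fixed-point iteration (suitable for mild distortion)."""
    x, y = xd, yd                            # start from the distorted position
    for _ in range(iterations):
        dx, dy = x - cx, y - cy
        r2 = dx * dx + dy * dy
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x = cx + (xd - cx) / factor
        y = cy + (yd - cy) / factor
    return x, y
```

In this convention a negative k1 pulls points towards the principal point, and the pull grows with radial distance, which is what produces the barrel look. The inverse has no simple closed form, hence the iteration.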

A second, less common but much more serious type of image distortion is non-radial distortion. For example, if the sensor pixels were not arranged in a perfectly square grid, so that the pixel spacing differed between the x and y directions, the resulting images would appear compressed or elongated. Fortunately, this is generally not the case. Much more common is a parallelogram distortion produced by many low-cost flat-bed and slide scanners: if the sensor array is not oriented exactly perpendicular to the direction of sensor movement, a scanned rectangle will become a parallelogram in the scan. This can become very relevant when using scanned aerial photographs.
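The scanner case can be sketched as a simple shear. This is a simplification under the assumption that the only defect is a small, constant skew angle of the sensor line; the function names and the angle are hypothetical.

```python
import math


def shear_scan(x, y, skew_deg):
    """Forward model: where a point ends up when the sensor line is skewed.

    Assumes a pure shear: each scan line y is shifted sideways in proportion
    to its position along the direction of sensor movement.
    """
    s = math.tan(math.radians(skew_deg))
    return x + s * y, y


def unshear_scan(x, y, skew_deg):
    """Undo the shear, assuming the skew angle is known (e.g. from a scanned grid)."""
    s = math.tan(math.radians(skew_deg))
    return x - s * y, y
```

With an assumed skew of 1 degree, a point 100 pixels down the scan is shifted sideways by about 1.75 pixels, so the corners of a scanned rectangle no longer form right angles.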

One potentially important source of non-radial distortion is known as the rolling shutter effect. This refers to camera shutters which don’t expose the entire image at once. Instead, a narrow shutter opening moves across the image plane. Video and cell phone cameras generally don’t even have a mechanical shutter. In these cameras, the rolling shutter effect is due to the image sensor being read line by line. If the camera has a fast image processor, the sensor lines can be read within a very short time (the fastest possible shutter speed of a camera giving an indication of how fast the image processor is). And the slower the image processor, the bigger the problem.
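A rough feeling for the size of the effect can be had from a small sketch. It assumes that the rows are read one after another at a constant rate and that the camera pans at a constant angular rate around a vertical axis; the function name and all parameter values are made up for illustration.

```python
import math


def rolling_shutter_shift(row, n_rows, readout_time_s, pan_rate_deg_s, focal_px):
    """Horizontal shift (in pixels) of an image row relative to row 0.

    row            -- index of the row, counted in readout order
    n_rows         -- total number of sensor rows
    readout_time_s -- assumed time needed to read the whole sensor
    pan_rate_deg_s -- assumed constant camera rotation rate
    focal_px       -- focal length expressed in pixels
    """
    t = row * readout_time_s / n_rows             # delay until this row is read
    angle = math.radians(pan_rate_deg_s) * t      # rotation accumulated by then
    return focal_px * math.tan(angle)
```

With a focal length of 1000 pixels, a camera that has turned by just 1 degree between the first and the last row shifts the last row by about 17 pixels. Since the shift grows from row to row, vertical edges come out slanted rather than blurred.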

The rolling shutter problem can be very severe. I have tested this by moving an LG 350ME phone while taking pictures, and the results are quite astonishing:

Photograph taken without camera movement.


Camera rotated from left to right while photograph was taken (LG 350ME).

Camera rotated from right to left while photograph was taken (LG 350ME).


Camera rotated upwards while photograph was taken (LG 350ME).


Camera rotated downwards while photograph was taken (LG 350ME).

Now, the problem is quite apparent in these images. But it can sometimes be difficult to see whether there is a rolling shutter problem or not. Imagine the pictures showed not buildings but a landscape – it would be much harder to see the distortion, because the image does not necessarily appear blurred. And what’s worst for photogrammetric applications: there is no practical way to correct rolling shutter distortions (well, you could of course add high-speed accelerometer and gyro sensors to record camera movements and then figure out how to use these data to correct the distortions, but buying a better camera would be much cheaper and easier…).

7500 lines of code: the open source Lidar Visualisation Toolbox LiVT

One of the first things I did when I began working with lidar-based elevation models back in 2006/2007 was to think about how to best visualise these data to be able to discern subtle surface morphology. The “standard” shaded relief just didn’t convince me, and of course I wanted to get as much as possible out of the data. So I started developing algorithms and writing simple software tools to implement them. When I started working full time with lidar data in 2009, I had to upscale this a bit and add some data management and automation. Still, these tools were meant to be used for one particular project (the archaeological prospection of Germany’s federal state Baden-Württemberg; since 2010 with support by the European Commission’s Culture Programme through the multinational project Archaeolandscapes Europe), and I was the only person working on the project. It didn’t matter that I had to make changes to the code every now and then. The amount of code grew step by step: I saw a poster about sky-view factor visualisation at the AARG conference in Bucharest in 2010, came back and added it to the other algorithms. I had other ideas or found interesting algorithms in the literature and added them. And so on.

Once in a while people would ask me if I could process data for them or if they could get the software that I was using. What software? The software I used for the project was (and still is) a makeshift collection of tools implemented in VBA under MS Excel, with a dedicated user interface showing a map of Baden-Württemberg, all the data directories and many of the parameters written directly into the code and the geocoordinates coded into the file names. Works fine for me, but it isn’t portable at all. It would not work on a different computer, and it would be a lot of work to adapt it to a different project.

With the obvious demand for software which I could share with my colleagues, I finally decided to create a portable stand-alone program in which all the visualisation algorithms would be implemented. Easy enough, just take the code from VBA, adapt it to VB2010, tidy it up and, voilà, create an executable. Well… it wasn’t quite that easy. Creating software that others could simply download and use turned out to be quite different from just writing some code to implement one or another algorithm. And it took much more time than I had expected. Finally, there was some pressure to finish and release at least a beta version, because I had promised to give a software/visualisation tutorial (using my software) at the CAA Workshop in Berlin in January.

LiVT screenshot

Coming back from that workshop, I was inspired by Hubert Mara’s talk to add yet another visualisation technique: multi-scale invariant integrals. And finally, it was decided that what had by then been named the Lidar Visualisation Toolbox (LiVT) was more or less ready to be published at http://sourceforge.net/projects/livt/ as open source software under the GNU General Public License. It’s still very much a beta version, and the code is certainly not as tidy as I would want it to be. But finally it’s out there, and if anyone is willing to help improve it, just let me know!

3D acquisition techniques: Structure from Motion

Structure from Motion (SfM), usually combined with and often loosely equated with Multi-View Stereo (MVS), is a technique for creating three-dimensional digital models from a set of photographs. Fundamentally, it is a stereo vision approach, that is, it uses the parallax (the difference in the apparent relative position of an object if seen from different directions) to derive 3D information (distance/depth) from 2D images. (Look at something that has foreground and background, e.g. a plant on your window sill and the house on the other side of the street, and alternate between your right and left eye – the shift you see is the parallax.)
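For the simplest possible case, the relation between parallax and distance fits into a few lines. The sketch assumes an idealised stereo pair: two identical cameras looking in parallel directions, separated sideways by a known baseline. The function name and the numbers are illustrative only; real SfM has to solve for the camera positions as well.

```python
def depth_from_parallax(focal_px, baseline_m, disparity_px):
    """Distance to a point from its parallax in an idealised stereo pair.

    focal_px     -- focal length expressed in pixels
    baseline_m   -- sideways distance between the two viewpoints, in metres
    disparity_px -- shift of the point between the two images, in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px
```

Taking 6.5 cm as the distance between the eyes and an assumed focal length of 1000 pixels, a shift of 65 pixels corresponds to a distance of 1 m (the plant), a shift of 3.25 pixels to 20 m (the house). Near objects shift a lot, distant ones hardly at all.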

What you need for SfM are several (2, 3, … thousands) photographs capturing an object from many different directions. Then, there are by now several software products that can perform SfM 3D reconstruction. VisualSfM (open source) and Agisoft Photoscan ($179 standard, $3499 professional) appear to be among the more popular solutions. There are several more, and there are quality, speed and usability differences, but essentially they all do the same thing.

A series of photographs of a rock in the desert.


The software will first analyse the photographs to find many (usually several thousand) characteristic feature points within each image, using SIFT (Scale-Invariant Feature Transform) or similar algorithms. The feature points are defined by analysing the surrounding pixels which will help to recognise similar feature points in other images. Algorithms like SIFT (rather than just trying to directly match small regions among images, e.g. by correlation) ensure that this works irrespective of scale or rotation.
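How descriptors help to recognise the same point in another image can be sketched with plain lists of numbers. This is not SIFT itself: the descriptors here are short made-up vectors, and the matching rule is the nearest-neighbour ratio test that is commonly used together with SIFT. The function name and the threshold of 0.8 are assumptions.

```python
import math


def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return index pairs (i, j) where desc_a[i] is matched to desc_b[j].

    A candidate is accepted only if its nearest neighbour in desc_b is
    clearly closer than the second nearest one (ratio test); ambiguous
    candidates are dropped.
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Euclidean distance from this descriptor to every descriptor in b
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue                      # ratio test needs two neighbours
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches
```

A short usage example with three hypothetical descriptors per image:

```python
image_a = [(0, 0, 1), (1, 1, 1), (3, 3, 3)]
image_b = [(0, 0, 1.1), (5, 5, 5), (1, 1, 0.9)]
pairs = match_descriptors(image_a, image_b)
```

The first two descriptors of image_a each have one clear counterpart in image_b, while the third is about equally far from two candidates and is therefore left unmatched.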

The next step is to match these feature points among images to find corresponding feature points. Some feature points will not be found in any other image – so what, there are many more. Some others will be mismatched, so these outliers have to be recognised and excluded. Then comes the actual heart of the SfM approach: bundle adjustment. The theory for this has been around for decades, but only with powerful computer hardware did it become possible to actually do this (who wants to do billions of computations on a piece of paper…). What happens during bundle adjustment is that the software tries to find appropriate camera calibration parameters and the relative positions of the cameras and of the feature points on the object. It is quite a challenging optimisation problem, but after all it’s not magic, just very sophisticated number crunching.
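What bundle adjustment tries to make small is the reprojection error: the difference between where a feature point was observed in an image and where the current model says it should appear. The following sketch only evaluates this error; it does not perform the optimisation. It assumes a strongly simplified camera (no rotation, no lens distortion, looking along the z axis), and all names are hypothetical.

```python
def project(point, camera_pos, focal_px):
    """Project a 3D point into a simplified pinhole camera looking along +z."""
    x, y, z = (p - c for p, c in zip(point, camera_pos))
    return focal_px * x / z, focal_px * y / z


def reprojection_error(points, cameras, observations, focal_px):
    """Sum of squared pixel residuals over all observations.

    points       -- list of 3D point coordinates (x, y, z)
    cameras      -- list of camera positions (x, y, z)
    observations -- list of (camera_index, point_index, u, v), where (u, v)
                    is the measured image position relative to the principal point
    """
    total = 0.0
    for cam_idx, pt_idx, u, v in observations:
        pu, pv = project(points[pt_idx], cameras[cam_idx], focal_px)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total
```

A real bundle adjustment varies the point coordinates, camera positions, camera orientations and calibration parameters all at once until this sum stops decreasing, which is where the billions of computations come from.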

The algorithm starts by using two images. It assumes the known camera focal length (from the JPG’s EXIF data) or (if no data are available) a standard lens. It then uses the relative (2D) positions of the feature points in the two images to derive a rough estimate of the 3D coordinates of these points relative to the camera positions, then iteratively refines both the camera parameters and the 3D point coordinates until either a certain number of iterations or a certain threshold in the variance of the point positions has been reached. Then, the next image is added, and the software tries to fit the matched feature points of that image into the existing 3D model, iteratively refines the parameters and so on. After all images have been added, an additional bundle adjustment is run to refine the entire model. And then, voilà, there’s your sparse 3D point cloud and your modelled camera calibrations and positions.

Sparse point cloud with modelled camera positions.


Now, there are three or four things still worth considering. One is that you may observe slight differences in the modelled camera calibration parameters or 3D data when you process the same photographs more than once. That’s a relatively common issue when dealing with iteratively optimised data. What comes out of the process is not the “absolute truth” but something that is usually quite close to it.

The second point is that the 3D model somehow floats in space – it is not referenced to any coordinate system. Depending on the software, referencing can be done directly by inserting control points with known positions or by using known camera positions as control points. Alternatively, referencing can be done in external software (for example by applying a coordinate transform to the point cloud using Java Graticule 3D). For some purposes, the non-referenced model is good enough, for others you might at least want to scale it, for yet others a complete rotate-scale-translate transformation to reference the model to a coordinate system is necessary.
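A rotate-scale-translate transformation is easy to apply once its parameters are known. The sketch below only applies such a transformation; finding the parameters from control points is a separate estimation problem that is not shown. The function names are hypothetical, and the rotation is restricted to a turn around the z axis to keep the example short.

```python
import math


def rotation_z(angle_deg):
    """Rotation matrix (as three rows) for a turn around the z axis."""
    a = math.radians(angle_deg)
    return [(math.cos(a), -math.sin(a), 0.0),
            (math.sin(a), math.cos(a), 0.0),
            (0.0, 0.0, 1.0)]


def similarity_transform(points, scale, rotation, translation):
    """Apply p' = scale * R * p + t to every 3D point."""
    out = []
    for p in points:
        # rotate: multiply the 3x3 matrix with the point
        rotated = [sum(r * c for r, c in zip(row, p)) for row in rotation]
        # then scale and shift into the target coordinate system
        out.append(tuple(scale * v + t for v, t in zip(rotated, translation)))
    return out
```

If only scaling is needed, the scale factor can be taken as the ratio between a distance measured on the real object and the same distance measured in the model.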

The third point is that you may want a denser point cloud. To get this, software like VisualSfM or Agisoft Photoscan uses the camera positions, orientation and calibration together with the sparse point cloud and the actual images to compute depth (distance) maps: for every pixel in the image (but usually resampled to a lower resolution), the distance between camera and object is computed. From these distance maps, a dense point cloud is then created.
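The last step, turning a distance map into points, can be sketched as follows. The sketch assumes that the map stores the depth along the viewing axis of the camera (rather than the distance along each individual ray), that pixels without data are marked with None, and that the points are wanted in the coordinate system of the camera. All names are hypothetical.

```python
def depth_map_to_points(depth, focal_px, cx, cy):
    """Back-project a depth map into 3D points in camera coordinates.

    depth    -- list of rows; each entry is a depth value or None (no data)
    focal_px -- focal length expressed in pixels
    cx, cy   -- principal point in pixel coordinates
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None:
                continue                  # skip pixels without a depth estimate
            x = (u - cx) * z / focal_px   # sideways offset grows with depth
            y = (v - cy) * z / focal_px
            points.append((x, y, z))
    return points
```

To merge the points from several images into one dense cloud, each set would additionally have to be moved from its camera coordinate system into the common model coordinate system using the modelled camera position and orientation.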

Finally, what if you need a meshed model? It is either created directly after dense point cloud generation (Agisoft Photoscan), or meshing can be done in external software.


Meshed model of the rock in the desert.