Structure from motion: tips and tricks

While 3D modelling is neither the only nor the most important technique that I use in my work, I do spend some time playing, exploring and using structure-from-motion to create 3D models and orthophotos of objects, sites and landscapes. Occasionally someone asks me either general or specific questions about photo acquisition, processing etc. That’s why I am sharing my experiences here. This is not a manual but rather a collection of tips and tricks based on my own experiences as well as on discussions with colleagues and in the Agisoft user forum.

Things you can do with Structure-from-Motion

  • create 3D point clouds describing an object
  • create meshed 3D models
  • create digital surface models
  • create digital terrain models
  • create orthophotos

What you need

  • a set of overlapping digital photographs taken from different positions relative to the object
  • SfM software (I mostly use Agisoft Photoscan [This is not advertising – I do not get anything from Agisoft for mentioning their software],  but there is also free software like VisualSfM that can be used.)

Camera choice

  • Barely useable to very good 3D models can be created with any camera. In many cases (and up to a certain limit), illumination, image acquisition and processing details have a stronger influence on the result than the type of camera used.
  • Fixed focal length, fixed focus cameras without any movement of lens elements relative to one another or relative to the sensor will allow grouped camera calibration which will produce better and more reliable results.
  • If lens elements can be expected to move between images (even very slightly) relative to one another or relative to the sensor (zoom, focus, image stabilisation), separate camera calibration will often produce better results.
  • Many cell phone and video cameras are subject to the rolling shutter problem: sensor lines are read sequentially, producing non-radial distortion which cannot be dealt with by the algorithms implemented in Photoscan. Image alignment may fail or be very poor if the camera was moved relative to the object, even though the images may not appear blurred. Such cameras are usable, but even more care should be taken to avoid movement.
  • For small objects, cameras with a physically small sensor are often better suited because depth of field will be wider. Narrow depth of field can be very problematic for very small objects. Focus stacking is an option but requires greatly increased time and effort.
  • Avoid extreme wide angle lenses. Camera calibration will not be very good for these, and the very large view angle changes between images will make it harder for the algorithm to find matching points.
  • Scanned paper photographs, negatives and slides: While this type of imagery is also useable for SfM, it can be very problematic. There are at least three reasons for this: (i) Many (low-quality) scanners introduce non-radial distortions somewhat similar to the rolling shutter problem. This is because the scan line is often not perfectly perpendicular to the direction of its movement across the image. An additional source of distortion can be warping of the images, negatives or slides. (ii) In most cases, the scan does not cover the entire image frame, i.e. the edges are cut off to an unknown and often unequal extent. This means that the centre of the digital image may be quite far from the principal point of the camera, and this is something that results in poor camera calibration. (iii) Many analog images suffer from grain, dust or scratches which can result in poor 3D modelling results due to spurious matching points and noisy depth maps.
  • Scanned aerial photographs taken with a professional camera (including fiducial markers): These can be very good and may often be the only available source, but they can suffer from the same problems as other paper/negative photographs, in particular if they were digitised using a low-quality scanner. However, if all fiducial markers are visible in the images, the distortions due to scanning can (in principle) be removed to a large extent.
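
The point about sensor size and depth of field can be checked with a rough calculation. The following is only a sketch using the standard thin-lens depth-of-field formulas; the sensor diagonals, the 50 mm equivalent focal length, the f/8 aperture, the 500 mm object distance and the "diagonal / 1500" circle of confusion are all illustrative assumptions, not values from any particular camera.

```python
import math

FULL_FRAME_DIAGONAL_MM = 43.3  # diagonal of a 36 x 24 mm sensor


def depth_of_field(focal_mm, f_number, coc_mm, distance_mm):
    """Return (near, far) limits of acceptable sharpness in mm (thin-lens model)."""
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = (distance_mm * (hyperfocal - focal_mm)
            / (hyperfocal + distance_mm - 2 * focal_mm))
    if distance_mm >= hyperfocal:
        far = math.inf  # everything beyond the near limit is acceptably sharp
    else:
        far = distance_mm * (hyperfocal - focal_mm) / (hyperfocal - distance_mm)
    return near, far


def dof_for_sensor(sensor_diagonal_mm, equiv_focal_mm=50.0,
                   f_number=8.0, distance_mm=500.0):
    """Total depth of field (mm) for a sensor, keeping the field of view fixed.

    Assumptions: the focal length is scaled by the crop factor so that the
    framing stays the same, and the circle of confusion is diagonal / 1500.
    """
    crop_factor = FULL_FRAME_DIAGONAL_MM / sensor_diagonal_mm
    focal_mm = equiv_focal_mm / crop_factor
    coc_mm = sensor_diagonal_mm / 1500.0
    near, far = depth_of_field(focal_mm, f_number, coc_mm, distance_mm)
    return far - near


if __name__ == "__main__":
    # 7.7 mm is roughly the diagonal of a 1/2.3-inch compact camera sensor.
    for name, diagonal in [("full frame", 43.3), ("1/2.3-inch compact", 7.7)]:
        print(f"{name}: depth of field about {dof_for_sensor(diagonal):.0f} mm")
```

With the same framing, aperture and distance, the small sensor gives a depth of field several times wider than full frame, which is why compact cameras can be the easier choice for small objects.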


Illumination

  • Agisoft Photoscan is able to produce decent 3D models even if images with very different illumination are used in a single project. However, camera alignment and 3D reconstruction will usually be better if changes in illumination are minimised.
  • The entire 3D reconstruction process relies on image texture. If possible, objects should be illuminated so as to enhance image texture. Texture-less surfaces cannot be modelled. Contrast stretching (applying the same contrast stretch to all images) can help if texture is poor. Projecting a point pattern or an image onto the object may be an option in some cases.
  • Avoid hard shadows, because 3D reconstruction in the shadowed areas may be poor.
  • Avoid built-in or direct flash, because this will result in very different illumination for every image and also hard shadows. If you need to use a flash, use tripods to set up one or several remotely triggered flashes with soft boxes, then take your images with this stable illumination. Of course, the same goes for other lights: soften the illumination and keep it as constant as possible between images. Don’t forget that you and your camera may cast undesirable shadows.
  • Using a lens-mounted circular LED illumination has been reported to work well because it creates a relatively even illumination of the field of view.
  • Glossy surfaces (water, wet surfaces, metal etc.) are problematic. Eliminating or reducing specular reflections / gloss will be very beneficial for camera calibration, alignment and 3D reconstruction. This can be done by allowing surfaces to dry before image capture, modifying illumination to minimise gloss and applying a polarising filter.
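
The contrast stretching mentioned above only helps if every image gets exactly the same stretch. A minimal sketch of that idea follows, assuming greyscale images held as nested lists of 0-255 values; in practice the pixels would come from an imaging library. The function names and the 2%/98% percentile limits are illustrative choices, not something prescribed by Photoscan.

```python
def common_stretch_limits(images, low_pct=2.0, high_pct=98.0):
    """Find one (low, high) grey-value pair from the pixels of ALL images."""
    values = sorted(v for image in images for row in image for v in row)

    def pick(pct):
        # Nearest-rank percentile on the pooled, sorted pixel values.
        return values[round(pct / 100.0 * (len(values) - 1))]

    return pick(low_pct), pick(high_pct)


def apply_stretch(image, low, high):
    """Linearly map low..high to 0..255, clipping values outside that range."""
    if high <= low:
        raise ValueError("stretch limits must satisfy low < high")
    scale = 255.0 / (high - low)
    return [[min(255, max(0, round((v - low) * scale))) for v in row]
            for row in image]


if __name__ == "__main__":
    # Two tiny low-contrast "images" (hypothetical data for illustration).
    image_a = [[100, 110], [120, 130]]
    image_b = [[105, 120], [125, 140]]
    low, high = common_stretch_limits([image_a, image_b], 0.0, 100.0)
    for image in (image_a, image_b):
        print(apply_stretch(image, low, high))
```

Because the limits are computed once from the pooled pixels, a given grey value maps to the same output value in every image, so the stretch does not introduce new brightness differences between images.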

Image capture

  • Capture sufficient images with an overlap of at least 67% (i.e. every single point of the object is seen in at least 3 images). Often, an overlap of 80-90% will be a good choice. It does not cost much to take more pictures, but if you take too few you may never be able to go back and take more. If using a wide angle lens (often in interior settings), increase overlap to above 90%, because view angle changes from image to image will be very large, which will make it difficult for the algorithm to find matching points.
  • Avoid “corridor mapping” (i.e. a single or very few long parallel strips) because in such cases small errors tend to accumulate; this can lead to an overall warped model. When working with the pro version of Photoscan, using camera GPS positions as control points in the image alignment can reduce this problem.
  • When capturing an object, capture images all the way around it even if you are ultimately not interested in all sides. Closing the circle can greatly improve model quality as it reduces warping. For example, terrestrial image acquisition of the façade of a long building or wall will be equivalent to a “corridor mapping” approach as only one or very few parallel strips of images are acquired. Taking images all around the building will improve model accuracy.
  • Re-orienting the camera relative to the object (i.e. not only capturing parallel strips but adding several additional strips perpendicular to the others) usually improves camera calibration and image alignment.
  • Shallow view angles can result in poor (or failed) camera alignment. View angle should usually be between 45° and 90°. View angle change from image to image should be low (less than 45°). If views from very different directions are required, adding images at intermediate view angles will greatly improve camera alignment. The same goes for strips: If several strips of images are acquired under different view angles, view angle changes between strips should be low (less than 45°). Adding an additional strip in-between will often greatly improve camera alignment. Generally, taking pictures from an elevated position allows steeper view angles. While flying is expensive, drones, kites and poles can be options to get elevated viewing positions.
  • Edges: If the object to be captured has relatively sharp edges (c. 90° or less), use higher overlap to make sure that the edges will be well-represented in the model.
  • Background masking: In principle, Photoscan is able to model both background and foreground, and then you can simply create your model from the object in the foreground and ignore those parts of the model that are background. However, background can be problematic. One reason for this is that there may be movement in the background (people walking past the scene, clouds drifting across the sky etc.) which can have a negative effect on camera calibration and alignment. Another reason is that parts of the background may accidentally become part of the model you want to capture. Background masking therefore improves modelling results. Furthermore, by reducing the image areas that the software has to deal with, processing will also be faster. Using a blue (or any other colour not present in the object) screen can make background masking much easier and faster. Using a computer monitor as the screen has the advantage that there will be (almost) no shadow on the background.
  • Full body capture: The minimum number of photographs to properly capture an object from all directions is larger than one would expect when only thinking about the overlap constraint. Because view incidence angle in many parts of the object is very shallow (even more so if a wider angle lens is used at close distance), more images are necessary to get good results. As a first approximation, this will mean that you need to acquire one or several roughly parallel circles (c. 16-20 images each) around the centre of the object plus several more to properly capture top and bottom. Background masking becomes particularly important for full body capture, and becomes indispensable if the object is rotated (rather than the camera moved around the object).
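
The overlap and circle recommendations above translate into simple planning numbers. The sketch below is a rough planning aid under a pinhole-camera assumption, for a single strip of images taken at a constant distance; the function names and the example camera values (23.5 mm sensor width, 35 mm lens, 10 m distance) are hypothetical.

```python
import math


def footprint_m(distance_m, sensor_mm, focal_mm):
    """Width of the area seen in one image at the given distance (pinhole model)."""
    return distance_m * sensor_mm / focal_mm


def exposure_spacing_m(footprint, overlap):
    """Distance to move between exposures for a given overlap fraction (0..1)."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in the range [0, 1)")
    return footprint * (1.0 - overlap)


def images_per_point(overlap):
    """Approximate number of images in which an interior point of a strip appears."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in the range [0, 1)")
    return 1.0 / (1.0 - overlap)


def ring_positions(radius_m, n_images):
    """Camera positions (x, y, angle in degrees) evenly spaced on one circle."""
    step_deg = 360.0 / n_images
    return [(radius_m * math.cos(math.radians(i * step_deg)),
             radius_m * math.sin(math.radians(i * step_deg)),
             i * step_deg)
            for i in range(n_images)]


if __name__ == "__main__":
    width = footprint_m(distance_m=10.0, sensor_mm=23.5, focal_mm=35.0)
    for overlap in (0.67, 0.80, 0.90):
        print(f"overlap {overlap:.0%}: move {exposure_spacing_m(width, overlap):.2f} m, "
              f"each point in about {images_per_point(overlap):.1f} images")
    for n in (16, 20):
        print(f"{n} images per circle: one image every {360.0 / n:.1f} degrees")
```

At 67% overlap each point falls in roughly three images, matching the minimum given above, and 16-20 images per circle correspond to steps of 18° to 22.5°, comfortably below the 45° limit on view angle change between images.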