What is computer vision and image analysis?
Computer vision is the use of AI to understand the contents of images and video. Given a photo of a damaged car, it can identify the affected panels and estimate severity. Given a product on a conveyor belt, it can flag defects. Given a document photo, it can classify it as a driver's licence, a rates notice, or a contract. Given a construction site photo, it can detect whether workers are wearing safety equipment.
Modern computer vision is powered by deep learning models trained on large image datasets. The good news for most applications is that you do not need to train your own model — major cloud providers offer pre-trained models for common tasks (object detection, image classification, text extraction from images, label detection) via APIs. Custom model training is only needed when the task is domain-specific enough that general models cannot achieve the required accuracy.
The practical result is that image analysis tasks that previously required human review can be automated or significantly accelerated, at a cost per image that is often a fraction of a cent.
When does your app need it?
- You receive photos from customers or field workers and need to categorise, assess, or act on their contents without manual review
- Your insurance, property management, or trade services platform processes damage or inspection photos that need structured assessment
- You need to automatically classify documents uploaded by users (passports, licences, certificates, invoices) and route them accordingly
- You run a manufacturing, food production, or product assembly operation and want to automate visual quality control
- Your platform involves vehicles and you need to read or verify licence plates
- You want to moderate user-uploaded images automatically — detecting inappropriate content before it is published
How much does it cost?
Adding computer vision and image analysis to an app typically takes 13–27 hours of development, or roughly $2,000–$6,000 AUD.
At the simpler end, this covers a single image analysis use case using a pre-trained cloud API, with results stored and surfaced in your application. At the more complex end, it includes multiple analysis types, a human review workflow for edge cases, custom prompt engineering for LLM-based vision analysis, and (if pre-trained models are insufficient) data labelling and custom model fine-tuning.
How it's typically built
For most Australian business applications, the starting point is a managed vision API. Amazon Rekognition handles object and label detection, face detection, and content moderation. (It can also match faces to known individuals, but the privacy implications mean identifying people is rarely appropriate.) Google Cloud Vision and Azure AI Vision (formerly Azure Computer Vision) offer similar capabilities. For tasks that require understanding context and generating a description or assessment, such as "describe the damage visible in this photo", vision-capable large language models such as GPT-4 with vision and Claude are highly capable and are often the right tool.
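As a rough illustration of what calling a managed vision API involves, the sketch below wraps Amazon Rekognition's label detection for an image that is already in S3. The function name, bucket, key, and default values are hypothetical. The client is passed in rather than created inside the function, so the same code can be exercised with a stand-in object during testing.

```python
from typing import Any, List, Tuple


def detect_labels_for_s3_image(
    client: Any,
    bucket: str,
    key: str,
    min_confidence: float = 50.0,  # hypothetical default, tune per use case
    max_labels: int = 10,          # hypothetical default
) -> List[Tuple[str, float]]:
    """Return (label, confidence) pairs for an image stored in S3.

    `client` is assumed to be a Rekognition client, for example one
    created with boto3.client("rekognition").
    """
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    # Each returned label has a name and a confidence score from 0 to 100.
    return [
        (label["Name"], label["Confidence"])
        for label in response.get("Labels", [])
    ]
```

In a real application the call would need AWS credentials and error handling (throttling, unsupported image formats), which this sketch leaves out.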
Images are uploaded to your storage (S3 or equivalent), then the URL or image bytes are passed to the vision API. The API returns structured results — labels with confidence scores, bounding boxes, extracted text, or a natural language description — which your application processes and stores. A confidence threshold determines which results are acted on automatically and which are queued for human review. For custom use cases where general models underperform, fine-tuning an open-source model (such as a YOLO variant for object detection) on a labelled dataset of your own images is the path to higher accuracy, but it adds significant time and requires a training dataset.
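The confidence-threshold step described above can be sketched in a few lines. This is a minimal illustration and not a production design: the `Detection` type, the queue names, and the threshold of 90 are all assumptions, and the right threshold depends on how costly a wrong automatic decision is for your use case.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Detection:
    label: str
    confidence: float  # assumed to be on a 0-100 scale


def route_detections(
    detections: Iterable[Detection],
    auto_threshold: float = 90.0,  # hypothetical cut-off
) -> Dict[str, List[Detection]]:
    """Split detections into those acted on automatically and those
    queued for human review, based on a single confidence threshold."""
    routed: Dict[str, List[Detection]] = {"auto": [], "review": []}
    for detection in detections:
        queue = "auto" if detection.confidence >= auto_threshold else "review"
        routed[queue].append(detection)
    return routed


# Example usage with made-up results for a damage photo.
results = [
    Detection("dented bumper", 96.2),
    Detection("cracked windscreen", 64.8),
]
queues = route_detections(results)
```

With these made-up scores, the first detection would be handled automatically and the second would go to a reviewer. Many systems use different thresholds per label, since some mistakes are more expensive than others.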
Questions to ask your developer
- Which vision task do you need? Classification, object detection, text extraction, damage assessment, and content moderation are different problems with different tooling.
- How accurate does it need to be, and what is the cost of a mistake? For high-stakes decisions (insurance payouts, safety compliance), a human-in-the-loop review stage is important.
- Will pre-trained models be sufficient, or do you need custom training? Custom training requires a labelled dataset and significantly more development time.
- What volume of images will be processed? Per-image API costs are small but accumulate; at significant volume, evaluate whether on-premises or self-hosted models are more economical.
- Are there privacy obligations around the images? Images containing people's faces, medical information, or identity documents have specific handling requirements under Australia's Privacy Act 1988.
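On the volume question above, the comparison between per-image API pricing and a fixed-cost self-hosted model comes down to simple arithmetic. The sketch below shows the shape of that calculation. Both function names are hypothetical and every dollar figure is a placeholder, not a real provider price; substitute current pricing and your own hosting estimate.

```python
def monthly_api_cost(images_per_month: int, price_per_image: float) -> float:
    """Total monthly spend on a pay-per-image vision API."""
    return images_per_month * price_per_image


def break_even_volume(self_hosted_monthly_cost: float, price_per_image: float) -> float:
    """Monthly image count above which a fixed-cost self-hosted model
    becomes cheaper than paying per image."""
    if price_per_image <= 0:
        raise ValueError("price_per_image must be positive")
    return self_hosted_monthly_cost / price_per_image


# Placeholder figures for illustration only.
api_cost = monthly_api_cost(images_per_month=200_000, price_per_image=0.001)
threshold = break_even_volume(self_hosted_monthly_cost=1_500.0, price_per_image=0.001)
```

This ignores the engineering time needed to run and maintain a self-hosted model, which in practice usually pushes the real break-even point higher.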
See also: AI document extraction · AI text generation · App cost calculator