Understanding the Shape Detection API: A Developer's Guide to Visual Recognition
A hands-on guide to the Shape Detection API: what it can (and can't) do, code examples for BarcodeDetector, FaceDetector, and TextDetector, and practical patterns, fallbacks, and performance tips for building visual recognition features in web apps.

Why the Shape Detection API matters
Modern web apps increasingly rely on visual recognition: scanning barcodes in-store, capturing text from receipts, or detecting faces for AR experiences. The Shape Detection API (sometimes referred to as the Face/Text/Barcode Detection APIs) exposes native browser capabilities to detect shapes (faces), text (OCR), and barcodes from images or video - enabling faster, lower-latency, and more battery-friendly recognition than many pure-JavaScript libraries.
Before diving in: browser support is mixed. While BarcodeDetector has seen broader adoption in Chromium-based browsers, TextDetector and FaceDetector availability varies. Always feature-detect and provide fallbacks. For current compatibility and details, see the MDN docs and the original spec:
- MDN: BarcodeDetector, FaceDetector, TextDetector - https://developer.mozilla.org/
- Web spec / discussion: WICG Shape Detection - https://github.com/WICG/shape-detection-api
- A useful compatibility article: web.dev on Shape Detection - https://web.dev/shape-detection/
Overview: the three detectors
- BarcodeDetector: Detects barcodes and returns a type and raw value (e.g. QR, EAN13). Good for QR scanners and product scanners.
- TextDetector: Performs optical character recognition (OCR) and returns detected text blocks and bounding boxes.
- FaceDetector: Returns face bounding boxes and optional landmarks (eyes, nose, mouth). Useful for face tracking, blur/anonymization, or AR alignment.
All detectors share a similar programming model: construct the detector object, call .detect() with an image source (ImageBitmap, HTMLVideoElement, HTMLImageElement, HTMLCanvasElement, ImageData), and process the asynchronous results.
Feature detection and supported formats
Always test for API availability before calling it. Example:
// Feature detection
// Note: the await below needs a module script or an enclosing async function.
if ('BarcodeDetector' in window) {
  const supported = await BarcodeDetector.getSupportedFormats();
  console.log('Barcode formats supported:', supported);
} else {
  console.warn('BarcodeDetector not available - falling back.');
}
const hasFace = 'FaceDetector' in window;
const hasText = 'TextDetector' in window;
BarcodeDetector.getSupportedFormats() returns a promise that resolves to an array of format strings (e.g. "qr_code", "ean_13"). Use that to restrict scanning to supported formats for best performance.
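One way to apply that advice is to intersect the formats you care about with what the browser reports. The following is a minimal sketch: `pickFormats` and `createRestrictedDetector` are hypothetical helper names, not part of the API, and the list of wanted formats is only an illustration.

```javascript
// Hypothetical helper: keep only the wanted formats that the browser reports as supported.
function pickFormats(wanted, supported) {
  const available = new Set(supported);
  return wanted.filter((format) => available.has(format));
}

// Usage sketch (browser only): build a detector restricted to usable formats.
async function createRestrictedDetector(wanted) {
  const supported = await BarcodeDetector.getSupportedFormats();
  const formats = pickFormats(wanted, supported);
  if (formats.length === 0) return null; // nothing usable: take the fallback path
  return new BarcodeDetector({ formats });
}
```

Returning null when no wanted format is available lets the caller switch to a fallback library instead of constructing a detector that can never match.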
Basic examples
Below are compact examples for camera-based detection with each detector. Each example follows this pattern:
- Request camera access
- Draw the frame (or pass the video element) to the detector
- Draw overlay boxes on a canvas
1) Barcode detector (camera -> QR/barcode scanner)
<video id="video" autoplay playsinline></video> <canvas id="overlay"></canvas>
async function startBarcodeScanning() {
  if (!('BarcodeDetector' in window)) {
    console.error('BarcodeDetector not supported');
    return;
  }
  const formats = await BarcodeDetector.getSupportedFormats();
  // Choose formats you care about or omit to allow defaults
  const detector = new BarcodeDetector({ formats: formats });
  const video = document.getElementById('video');
  const overlay = document.getElementById('overlay');
  const ctx = overlay.getContext('2d');
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: 'environment' },
  });
  video.srcObject = stream;
  video.addEventListener('loadedmetadata', () => {
    overlay.width = video.videoWidth;
    overlay.height = video.videoHeight;
    requestAnimationFrame(scanFrame);
  });
  async function scanFrame() {
    try {
      const barcodes = await detector.detect(video);
      ctx.clearRect(0, 0, overlay.width, overlay.height);
      for (const b of barcodes) {
        const { rawValue, boundingBox, format } = b;
        // Draw bounding box
        ctx.strokeStyle = 'lime';
        ctx.lineWidth = 4;
        ctx.strokeRect(
          boundingBox.x,
          boundingBox.y,
          boundingBox.width,
          boundingBox.height
        );
        ctx.fillStyle = 'lime';
        ctx.fillText(
          `${format}: ${rawValue}`,
          boundingBox.x + 6,
          boundingBox.y + 20
        );
      }
    } catch (err) {
      console.error('Detection error', err);
    }
    // Throttle scanning rate by spacing frames if needed
    requestAnimationFrame(scanFrame);
  }
}
startBarcodeScanning();
This scans the live camera and overlays results. Use requestAnimationFrame to match the display rate; if performance is a problem, consider using a fixed interval (e.g. scan every 200ms) or resizing the video to a smaller canvas for detection.
2) Text (OCR)
if ('TextDetector' in window) {
  const textDetector = new TextDetector();
  const results = await textDetector.detect(imageElementOrCanvasOrVideo);
  // results: array of {rawValue, boundingBox, cornerPoints}
}
Camera-based OCR is similar to barcode detection: feed the video element or a downscaled canvas frame to textDetector.detect(...). The API returns text blocks with bounding boxes, which you can use for overlays or to crop regions of the image before sending them to a translation service.
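Cropping a detected block usually means padding its bounding box slightly and clamping it to the image edges. The sketch below shows one way to do that; `toCropRect` and `cropBlock` are hypothetical helper names, and the 8-pixel default padding is an arbitrary assumption rather than a recommended value.

```javascript
// Hypothetical helper: expand a detected box by `padding` pixels and clamp it to the image.
function toCropRect(box, imageWidth, imageHeight, padding = 8) {
  const left = Math.max(0, Math.floor(box.x - padding));
  const top = Math.max(0, Math.floor(box.y - padding));
  const right = Math.min(imageWidth, Math.ceil(box.x + box.width + padding));
  const bottom = Math.min(imageHeight, Math.ceil(box.y + box.height + padding));
  return {
    x: left,
    y: top,
    width: Math.max(0, right - left),
    height: Math.max(0, bottom - top),
  };
}

// Usage sketch (browser only): copy one detected block into its own canvas.
function cropBlock(source, block, sourceWidth, sourceHeight) {
  const r = toCropRect(block.boundingBox, sourceWidth, sourceHeight);
  const canvas = document.createElement('canvas');
  canvas.width = r.width;
  canvas.height = r.height;
  canvas
    .getContext('2d')
    .drawImage(source, r.x, r.y, r.width, r.height, 0, 0, r.width, r.height);
  return canvas;
}
```

The resulting canvas can then be exported with canvas.toBlob() if the crop has to leave the device.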
3) Face detection
if ('FaceDetector' in window) {
  // Options: fastMode, maxDetectedFaces
  const faceDetector = new FaceDetector({
    fastMode: true,
    maxDetectedFaces: 4,
  });
  const faces = await faceDetector.detect(videoOrCanvas);
  // faces -> array of {boundingBox, landmarks}
}
Detected faces include bounding boxes and an array of landmark points (if supported). Use them to position UI elements (e.g., sunglasses overlay), blur faces, or anonymize images.
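To position something like a sunglasses overlay, the eye landmarks can be reduced to a centre point, a width, and a tilt angle. This is a rough sketch that assumes each landmark has the shape `{ type, locations: [{ x, y }, ...] }` described in the WICG draft; `centroid` and `eyeOverlayPlacement` are hypothetical helper names, and implementations may report fewer landmarks or none at all.

```javascript
// Average a landmark's points into a single centre point.
function centroid(points) {
  const sum = points.reduce(
    (acc, p) => ({ x: acc.x + p.x, y: acc.y + p.y }),
    { x: 0, y: 0 }
  );
  return { x: sum.x / points.length, y: sum.y / points.length };
}

// Hypothetical helper: derive a centre, width and tilt for an eye-level overlay.
// Returns null when fewer than two eye landmarks are reported.
function eyeOverlayPlacement(face) {
  const eyes = (face.landmarks || [])
    .filter((l) => l.type === 'eye' && l.locations.length > 0)
    .map((l) => centroid(l.locations))
    .sort((a, b) => a.x - b.x); // left-most eye in image space first
  if (eyes.length < 2) return null;
  const [left, right] = eyes;
  const dx = right.x - left.x;
  const dy = right.y - left.y;
  return {
    center: { x: (left.x + right.x) / 2, y: (left.y + right.y) / 2 },
    eyeDistance: Math.hypot(dx, dy),
    angle: Math.atan2(dy, dx), // radians, suitable for ctx.rotate()
  };
}
```

When the function returns null, falling back to the face's boundingBox for placement keeps the overlay working on browsers that report no landmarks.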
Practical use cases and patterns
- QR checkout/loyalty flows: Use BarcodeDetector to read QR codes linking to product pages or coupons.
- Live translation: Combine TextDetector with a translation API. Extract detected rectangles, OCR text blocks, and send them for translation, then overlay translated text back on the client.
- Barcode inventory scanning: For warehouse applications, BarcodeDetector gives a fast client-side option without server round-trips.
- AR face overlays: Use FaceDetector landmarks for placing hats, masks, or makeup effects; use fastMode for real-time responsiveness.
- Privacy-preserving analytics: Detect and blur faces on the client before uploading videos.
Fallbacks and progressive enhancement
Because support varies, provide fallbacks:
- OCR fallback: Tesseract.js (pure JS OCR) - https://github.com/naptha/tesseract.js
- Barcode fallback: jsQR (QR only) - https://github.com/cozmo/jsQR; ZXing compiled to WASM or other JS ports - https://github.com/zxing-js/library
- Face fallback: face-api.js (TensorFlow.js), or MediaPipe Face Detection (via WASM or hosting a small inference engine).
Example fallback structure:
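async function detectBarcodeWithFallback(source) {
  if ('BarcodeDetector' in window) {
    const detector = new BarcodeDetector();
    return detector.detect(source);
  }
  // Fallback to jsQR (assumes the jsQR library is loaded; ZXing is an alternative).
  // drawToCanvasAndGetImageData is a placeholder for your own helper:
  // draw source to a small canvas and return its ImageData.
  const imageData = drawToCanvasAndGetImageData(source);
  const result = jsQR(imageData.data, imageData.width, imageData.height);
  if (!result) return [];
  // jsQR reports four corner points, not a rectangle, so derive the bounding box.
  const loc = result.location;
  const corners = [
    loc.topLeftCorner,
    loc.topRightCorner,
    loc.bottomRightCorner,
    loc.bottomLeftCorner,
  ];
  const xs = corners.map((p) => p.x);
  const ys = corners.map((p) => p.y);
  const x = Math.min(...xs);
  const y = Math.min(...ys);
  const boundingBox = { x, y, width: Math.max(...xs) - x, height: Math.max(...ys) - y };
  return [{ rawValue: result.data, format: 'qr_code', cornerPoints: corners, boundingBox }];
}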
Performance considerations
- Downscale frames: Run detection on a resized canvas instead of full camera resolution. Many detectors don’t need pixel-perfect detail.
- Throttle frames: Process every Nth frame or at a fixed interval rather than every animation frame.
- Limit concurrency: Avoid re-entering .detect() if a previous promise hasn’t resolved.
- Use fastMode (FaceDetector option) if available.
- Consider OffscreenCanvas + Web Worker for heavy fallback libraries (e.g., Tesseract.js) to avoid blocking the UI thread.
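The downscaling advice above can be sketched as follows. The helper names (`fitWithin`, `scaleBoxUp`, `detectDownscaled`) are hypothetical, and the 640-pixel limit is an assumption to tune per device, not a value taken from the spec. Boxes found on the small frame are scaled back so overlays still line up with the full-size video.

```javascript
// Hypothetical helper: pick a detection size whose longest side is at most `maxSide`.
function fitWithin(width, height, maxSide = 640) {
  const scale = Math.min(1, maxSide / Math.max(width, height));
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
    scale,
  };
}

// Map a box found on the downscaled frame back to full-resolution coordinates.
function scaleBoxUp(box, scale) {
  return {
    x: box.x / scale,
    y: box.y / scale,
    width: box.width / scale,
    height: box.height / scale,
  };
}

// Usage sketch (browser only): detect on a small scratch canvas, report full-size boxes.
async function detectDownscaled(detector, video, scratchCanvas, maxSide = 640) {
  if (!video.videoWidth || !video.videoHeight) return []; // no frame available yet
  const size = fitWithin(video.videoWidth, video.videoHeight, maxSide);
  scratchCanvas.width = size.width;
  scratchCanvas.height = size.height;
  scratchCanvas.getContext('2d').drawImage(video, 0, 0, size.width, size.height);
  const results = await detector.detect(scratchCanvas);
  return results.map((result) => ({
    result,
    boundingBox: scaleBoxUp(result.boundingBox, size.scale),
  }));
}
```

Small barcodes or fine print may stop being detected if the frame is shrunk too far, so it is worth testing the limit against real samples.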
Example throttling pattern:
let processing = false;
async function maybeDetect() {
  if (processing) return;
  processing = true;
  try {
    await detector.detect(video);
  } finally {
    processing = false;
  }
  setTimeout(maybeDetect, 150); // schedule the next scan 150ms after this one finishes
}
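The "every Nth frame" alternative from the list above could look like the sketch below. `everyNth` and `startFrameSkippingLoop` are hypothetical names, `n` is assumed to be a positive integer, and the choice of 5 frames is arbitrary.

```javascript
// Hypothetical helper: wrap `fn` so it only runs on every `n`-th call.
function everyNth(n, fn) {
  let count = 0;
  return (...args) => {
    count += 1;
    if (count % n !== 0) return undefined; // skip this frame
    return fn(...args);
  };
}

// Usage sketch (browser only): detect on every 5th animation frame,
// skipping a turn if the previous detect() call is still pending.
function startFrameSkippingLoop(detector, video, onResults, n = 5) {
  let busy = false;
  const detectSometimes = everyNth(n, async () => {
    if (busy) return;
    busy = true;
    try {
      onResults(await detector.detect(video));
    } catch (err) {
      console.error('Detection error', err);
    } finally {
      busy = false;
    }
  });
  function onFrame() {
    detectSometimes();
    requestAnimationFrame(onFrame);
  }
  requestAnimationFrame(onFrame);
}
```

Compared with a timer, counting frames keeps detection aligned with rendering, at the cost of a scan rate that varies with the display's refresh rate.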
Privacy, security, and permissions
- Secure context: Many detection APIs require HTTPS.
- Camera access: Use getUserMedia and explicitly request the minimal permissions you need (e.g., using facingMode: 'environment' for scanners).
- Local processing: One advantage of the Shape Detection API is on-device processing (no images need to be uploaded for detection). When you do send images to a server (e.g., for translation), explicitly inform users.
- Anonymization: If you store or transmit face data, consider hashing or blurring options and follow applicable privacy laws.
Debugging and testing tips
- Visual overlays are invaluable for debugging bounding boxes and landmark coordinates.
- Log raw detector output to inspect format differences across browsers.
- Test on real devices, especially mobile cameras which behave differently (auto-focus, orientation).
- Use sample images (barcodes, printed text, faces) to validate behavior under controlled lighting.
Limitations and caveats
- Inconsistent support: Not all detectors are available in all browsers.
- Non-deterministic OCR accuracy: Lighting, skew, fonts, and languages affect TextDetector results.
- FaceDetector is not a face recognition API: it detects faces but does not identify people. Do not use it for identification without explicit consent and legal review.
Complete example: Detect barcodes, text, and faces from a single camera feed
This high-level sketch shows how you might combine detectors. The real implementation needs careful flow control and fallbacks.
// Sketch - combine detectors if available.
// Assumes startCamera() is your own helper that resolves to a playing <video> element.
async function initAllDetectors() {
  const detectors = {};
  if ('BarcodeDetector' in window) detectors.barcode = new BarcodeDetector();
  if ('TextDetector' in window) detectors.text = new TextDetector();
  if ('FaceDetector' in window) {
    detectors.face = new FaceDetector({ fastMode: true });
  }
  const names = Object.keys(detectors);
  const video = await startCamera();
  const overlay = document.getElementById('overlay');
  const ctx = overlay.getContext('2d');
  // Match the canvas to the camera frame so detection coordinates line up
  overlay.width = video.videoWidth;
  overlay.height = video.videoHeight;
  async function loop() {
    ctx.drawImage(video, 0, 0, overlay.width, overlay.height);
    const imageSource = overlay; // canvas works as ImageBitmapSource
    // Run detectors in parallel if supported, but be careful to avoid overlapping heavy CPU work
    const tasks = names.map((name) => detectors[name].detect(imageSource));
    try {
      const results = await Promise.all(tasks);
      // results[i] belongs to the detector named names[i]; draw each set here
    } catch (err) {
      console.error('Detection error', err);
    }
    setTimeout(loop, 100);
  }
  loop();
}
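Promise.all rejects as soon as any one detector fails, which discards results from the detectors that succeeded. A possible refinement, sketched below with the hypothetical helpers `labelSettled` and `runDetectors`, uses Promise.allSettled and keys each outcome by detector name. The `detectors` argument is assumed to be an object like the one built in the sketch above.

```javascript
// Hypothetical helper: pair Promise.allSettled() output with detector names.
function labelSettled(names, settled) {
  const labelled = {};
  names.forEach((name, i) => {
    const outcome = settled[i];
    labelled[name] =
      outcome.status === 'fulfilled'
        ? { ok: true, results: outcome.value }
        : { ok: false, error: outcome.reason };
  });
  return labelled;
}

// Run every available detector on one source; a single failure does not discard the rest.
async function runDetectors(detectors, source) {
  const names = Object.keys(detectors);
  const settled = await Promise.allSettled(
    names.map((name) => detectors[name].detect(source))
  );
  return labelSettled(names, settled);
}
```

A caller can then draw `labelled.barcode.results` when `ok` is true and log `error` otherwise, without one failing detector blanking the whole overlay.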
When to choose native API vs. JavaScript libraries
Choose Shape Detection API when:
- The API is available on your target browsers.
- You need lower latency and better power/perf behavior.
- You want to avoid shipping heavy WASM/JS models.
Choose JS/WASM libraries when:
- You require consistent cross-browser behavior.
- You need advanced features (e.g., multi-language OCR tuning, face recognition, custom ML models).
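In practice the choice is often made per task at runtime. The sketch below reports which detectors are exposed so the app can load fallback libraries only where they are needed. `chooseBackends` is a hypothetical helper, and the `scope` parameter exists only so the function can be exercised with a stub object.

```javascript
// Hypothetical helper: report, per task, whether the native detector is exposed.
// `scope` defaults to the global object so the function can be tested with a stub.
function chooseBackends(scope = globalThis) {
  const pick = (name) =>
    typeof scope[name] === 'function' ? 'native' : 'fallback';
  return {
    barcode: pick('BarcodeDetector'),
    text: pick('TextDetector'),
    face: pick('FaceDetector'),
  };
}
```

A 'native' result only means the constructor exists. It is still worth checking BarcodeDetector.getSupportedFormats() and handling detect() errors, since the underlying platform may not provide every capability.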
Additional resources
- MDN: BarcodeDetector - https://developer.mozilla.org/en-US/docs/Web/API/BarcodeDetector
- MDN: FaceDetector - https://developer.mozilla.org/en-US/docs/Web/API/FaceDetector
- MDN: TextDetector - https://developer.mozilla.org/en-US/docs/Web/API/TextDetector
- web.dev: Shape Detection API - https://web.dev/shape-detection/
- WICG repo: Shape Detection API spec & issues - https://github.com/WICG/shape-detection-api
- Tesseract.js (OCR fallback) - https://github.com/naptha/tesseract.js
- jsQR (QR fallback) - https://github.com/cozmo/jsQR
- zxing-js (barcode fallback) - https://github.com/zxing-js/library
- face-api.js (face detection fallback) - https://github.com/justadudewhohacks/face-api.js
Final thoughts
The Shape Detection API provides elegant, native primitives for vision tasks that used to require heavy client-side libraries or server-side processing. When supported, it can significantly simplify and speed up tasks like scanning barcodes, extracting text, or tracking faces. Design your app with progressive enhancement: detect availability, use native APIs where possible, and fall back to well-known JS/WASM libraries when needed. Pay attention to performance and privacy so you build fast, responsible visual recognition features.