Unleashing Creativity: Building a Real-Time Video Journal with the MediaStream Recording API
Learn how to build a real-time video journaling app using the MediaStream Recording API and HTML5 - record, overlay captions in real time, store entries locally, and stream chunks to a server for near-instant saving.

What you’ll build and why it matters
Imagine a personal video journal that records your day-by-day reflections, adds live captions or stickers, and saves each moment safely to your device - plus uploads chunks to the server as you speak so nothing is lost. That’s what you’ll build here: a creative, privacy-first, real-time video journaling app using the MediaStream Recording API.
By the end of this article you’ll have a clear architecture, working code samples for live preview with overlays, chunked recording with near-real-time upload, and offline-first local storage (IndexedDB), plus production considerations (privacy, compatibility, and performance).
High-level architecture
- Capture user camera + microphone via getUserMedia.
- Optionally draw the camera frames to a canvas to add overlays (timestamps, captions, stickers).
- Use the canvas’s captureStream() (or the raw media stream) as input to MediaRecorder.
- Record in short timeslices (e.g., 1s) so dataavailable fires frequently, enabling near-real-time uploading and safe partial saves.
- Persist each chunk with metadata (timestamp, tags) to IndexedDB for offline resilience.
- Reassemble or stream chunks to a backend endpoint for server-side storage or processing.
Prerequisites and browser support
- Secure context (HTTPS). getUserMedia and MediaRecorder require it.
- Modern Chromium/Firefox browsers have good support. Safari (especially iOS) has historically lagged - test before shipping. See browser compatibility: MediaRecorder (MDN).
- Optional: Web Speech API for live transcription (also browser-dependent): Web Speech API.
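These support caveats can be checked up front. A minimal feature-detection sketch (the typeof guards make it safe to run anywhere, so you can decide whether to show the record UI or fall back to audio-only or typed entries):

```javascript
// Feature-detect the capture APIs before showing the record UI.
function getCaptureSupport() {
  const hasGetUserMedia =
    typeof navigator !== 'undefined' &&
    !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
  const hasMediaRecorder = typeof MediaRecorder !== 'undefined';
  const hasCanvasCapture =
    typeof HTMLCanvasElement !== 'undefined' &&
    'captureStream' in HTMLCanvasElement.prototype;
  return { hasGetUserMedia, hasMediaRecorder, hasCanvasCapture };
}
```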
Key Web APIs covered
- getUserMedia - obtain camera + microphone: getUserMedia (MDN)
- MediaRecorder - record MediaStream and receive data blobs: MediaRecorder (MDN)
- HTMLCanvasElement.captureStream - record video with overlays: captureStream (MDN)
- IndexedDB - local storage for blobs and metadata: IndexedDB (MDN)
UI sketch (HTML)
A minimal UI gives you:
- live preview video
- record / stop buttons
- caption input and sticker buttons
- timeline / gallery of entries
Example HTML (skeleton):
<!-- index.html (skeletal) -->
<div id="app">
<video id="preview" autoplay playsinline muted></video>
<canvas id="overlay" style="display:none;"></canvas>
<div class="controls">
<button id="startBtn">Start</button>
<button id="stopBtn" disabled>Stop</button>
<input id="captionInput" placeholder="Add a live caption" />
</div>
<div id="entries"></div>
</div>
You’ll draw the live camera into the canvas to apply overlays (text, timestamp, sticker) and call canvas.captureStream() to get a composed MediaStream for recording.
Core JavaScript: capture, overlay, record, upload, save
Below is an opinionated, production-like approach broken into focused functions.
// app.js (focused)
const preview = document.getElementById('preview');
const overlayCanvas = document.getElementById('overlay');
const startBtn = document.getElementById('startBtn');
const stopBtn = document.getElementById('stopBtn');
const captionInput = document.getElementById('captionInput');
let cameraStream = null; // raw camera MediaStream
let composedStream = null; // canvas.captureStream() if using overlays
let recorder = null; // MediaRecorder instance
let chunkQueue = []; // chunks produced by recorder
let recordingStart = null;
let drawReq = null;
// Initialize camera
async function initCamera() {
cameraStream = await navigator.mediaDevices.getUserMedia({
video: { facingMode: 'user', width: { ideal: 1280 } },
audio: true,
});
// Show raw preview (muted to avoid echo)
preview.srcObject = cameraStream;
// Setup canvas same size as video track
const track = cameraStream.getVideoTracks()[0];
const settings = track.getSettings();
overlayCanvas.width = settings.width || 1280;
overlayCanvas.height = settings.height || 720;
// Start draw loop to apply overlays
startDrawLoop();
// Build composed stream from canvas + original audio track
const canvasStream = overlayCanvas.captureStream(30); // 30fps
const audioTrack = cameraStream.getAudioTracks()[0];
composedStream = new MediaStream([
...canvasStream.getVideoTracks(),
audioTrack,
]);
}
function startDrawLoop() {
const ctx = overlayCanvas.getContext('2d');
const videoEl = document.createElement('video');
videoEl.srcObject = cameraStream;
videoEl.muted = true;
videoEl.playsInline = true; // required for inline playback on iOS
videoEl.play();
function draw() {
// Draw video frame
ctx.drawImage(videoEl, 0, 0, overlayCanvas.width, overlayCanvas.height);
// Example overlay: translucent timestamp
ctx.fillStyle = 'rgba(0,0,0,0.5)';
ctx.fillRect(10, overlayCanvas.height - 50, 260, 40);
ctx.fillStyle = '#fff';
ctx.font = '18px sans-serif';
ctx.fillText(new Date().toLocaleString(), 20, overlayCanvas.height - 22);
// Live caption (from input)
const caption = captionInput.value;
if (caption) {
ctx.fillStyle = 'rgba(255,255,255,0.8)';
ctx.font = '24px sans-serif';
ctx.fillText(caption, 20, 40);
}
drawReq = requestAnimationFrame(draw);
}
draw();
}
// Start recording with chunk timeslice (ms)
function startRecording(timeslice = 1000) {
if (!composedStream) throw new Error('No stream available');
const options = getSupportedMimeType();
recorder = new MediaRecorder(composedStream, options);
chunkQueue = [];
recordingStart = Date.now();
recorder.ondataavailable = async e => {
if (e.data && e.data.size > 0) {
const chunkMeta = {
timestamp: Date.now(),
elapsedMs: Date.now() - recordingStart, // offset from recording start, used for ordering
};
// Save chunk locally first
await saveChunkToIndexedDB(e.data, chunkMeta);
// Optionally: upload chunk to server immediately
uploadChunk(e.data, chunkMeta).catch(console.error);
}
};
recorder.onstop = async () => {
// Optionally assemble final blob from DB or simply mark entry
console.log('Recording stopped');
};
recorder.start(timeslice); // timeslice in ms triggers ondataavailable periodically
}
function stopRecording() {
if (recorder && recorder.state !== 'inactive') recorder.stop();
recordingStart = null;
}
// Pick a supported mimetype
function getSupportedMimeType() {
const candidates = [
'video/webm;codecs=vp9,opus',
'video/webm;codecs=vp8,opus',
'video/webm;codecs=h264,opus',
'video/mp4', // Safari records MP4 rather than WebM
];
for (const m of candidates) {
if (MediaRecorder.isTypeSupported && MediaRecorder.isTypeSupported(m))
return { mimeType: m };
}
return {};
}
// Basic chunk upload (POST each chunk as it becomes available)
async function uploadChunk(blob, meta) {
const form = new FormData();
form.append('chunk', blob, 'segment.webm');
form.append('meta', JSON.stringify(meta));
const res = await fetch('/upload-chunk', { method: 'POST', body: form });
if (!res.ok) throw new Error('Upload failed');
}
// Minimal IndexedDB saving (using a simple helper)
function openDB() {
return new Promise((resolve, reject) => {
const req = indexedDB.open('video-journal', 1);
req.onupgradeneeded = () => {
const db = req.result;
db.createObjectStore('chunks', { keyPath: 'id', autoIncrement: true });
db.createObjectStore('entries', { keyPath: 'id', autoIncrement: true });
};
req.onsuccess = () => resolve(req.result);
req.onerror = () => reject(req.error);
});
}
async function saveChunkToIndexedDB(blob, meta) {
const db = await openDB();
const tx = db.transaction('chunks', 'readwrite');
tx.objectStore('chunks').add({ blob, meta });
// IndexedDB transactions have no `complete` property; wrap the events in a Promise.
return new Promise((resolve, reject) => {
tx.oncomplete = () => resolve();
tx.onerror = () => reject(tx.error);
});
}
// Wire buttons
startBtn.onclick = async () => {
startBtn.disabled = true;
stopBtn.disabled = false;
try {
if (!cameraStream) await initCamera();
startRecording(1000); // emit a chunk every second
} catch (err) {
// getUserMedia rejects if the user denies permission
console.error('Could not start recording', err);
startBtn.disabled = false;
stopBtn.disabled = true;
}
};
stopBtn.onclick = () => {
startBtn.disabled = false;
stopBtn.disabled = true;
stopRecording();
};
// Handle cleanup on page unload
window.addEventListener('beforeunload', () => {
if (drawReq) cancelAnimationFrame(drawReq);
cameraStream && cameraStream.getTracks().forEach(t => t.stop());
});
Notes on the above:
- Using a canvas allows you to draw UI overlays (captions, stickers, timestamps) and record them as part of the video via captureStream().
- The recorder.start(timeslice) causes ondataavailable to be called repeatedly with short blobs - perfect for chunked upload.
Server-side: receiving chunked uploads (Node/Express example)
This server accepts each chunk and stores it. In production you might stream directly to object storage (S3/GCS) or append to an existing file.
// server.js (Express)
const express = require('express');
const multer = require('multer');
const fs = require('fs');
const path = require('path');
const upload = multer({ dest: 'tmp/' });
const app = express();
app.post('/upload-chunk', upload.single('chunk'), (req, res) => {
const meta = JSON.parse(req.body.meta || '{}');
const tempPath = req.file.path;
const targetDir = path.join(
__dirname,
'uploads',
String(meta.sessionId || 'default')
);
if (!fs.existsSync(targetDir)) fs.mkdirSync(targetDir, { recursive: true });
const finalPath = path.join(
targetDir,
`${Date.now()}-${req.file.originalname}`
);
fs.rename(tempPath, finalPath, err => {
if (err) return res.status(500).send('Failed');
res.send('OK');
});
});
app.listen(3000);
This example keeps things simple. In production:
- Include session IDs and ordering metadata so you can reassemble chunks server-side.
- Use S3 multipart uploads or append to a file using streaming.
Reassembling chunks into a single video
Two approaches:
- Reassemble on the server using chunk order and append them (or create a container) - reliable and allows server-side transcoding.
- Keep chunks in IndexedDB client-side and merge them into one Blob for playback/download:
// Merge blobs client-side
const blobs = [
/* array of Blob segments */
];
const final = new Blob(blobs, { type: 'video/webm' });
const url = URL.createObjectURL(final);
videoEl.src = url; // play or download
Merging client-side is simple but can be memory-heavy for long recordings.
Optional: Live transcription and searchable entries
You can add the Web Speech API to create live captions and searchable metadata. Transcriptions can be stored alongside chunk metadata so users can jump to moments by searching text. Example starter: Web Speech API (MDN).
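A hedged starter sketch, assuming a Chromium-family browser (where the constructor is prefixed as webkitSpeechRecognition): wire final recognition results into timestamped caption entries, then search them with a small pure helper.

```javascript
// Start live transcription; returns the recognizer, or null if unsupported.
function startTranscription(onResult) {
  const SR =
    typeof window !== 'undefined' &&
    (window.SpeechRecognition || window.webkitSpeechRecognition);
  if (!SR) return null; // unsupported: fall back to typed captions
  const rec = new SR();
  rec.continuous = true;
  rec.interimResults = false;
  rec.onresult = e => {
    const last = e.results[e.results.length - 1];
    onResult({ text: last[0].transcript, timestamp: Date.now() });
  };
  rec.start();
  return rec;
}

// Pure helper: find caption entries matching a query, case-insensitively.
function searchTranscripts(entries, query) {
  const q = query.toLowerCase();
  return entries.filter(e => e.text.toLowerCase().includes(q));
}
```

Storing each result's timestamp next to the chunk metadata is what lets users jump from a search hit to the right moment in a recording.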
UX and privacy considerations
- Ask for minimal permissions (camera + microphone only when needed).
- Make it clear where recordings are stored and whether chunks are uploaded immediately or kept local.
- Provide an easy export / delete flow so users control their data.
- For sensitive users, offer client-side encryption before upload (crypto.subtle) so the server never sees plaintext.
Performance tips
- Record in short chunks (500ms–2000ms) for near-real-time upload and quick recovery after crashes.
- Avoid excessively high resolutions on mobile to reduce CPU and network usage. Provide a quality selector.
- Use hardware-accelerated codecs where available (browser-dependent).
- If you need continuous streaming to a server (low-latency), consider a WebRTC peer connection instead of MediaRecorder.
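The quality-selector idea can be sketched as a small preset table applied with MediaStreamTrack.applyConstraints. The preset names and numbers below are illustrative defaults, not values from any spec:

```javascript
// Named quality presets; "ideal" lets the browser pick the closest match.
const QUALITY_PRESETS = {
  low: { width: { ideal: 640 }, height: { ideal: 360 }, frameRate: { ideal: 15 } },
  medium: { width: { ideal: 1280 }, height: { ideal: 720 }, frameRate: { ideal: 24 } },
  high: { width: { ideal: 1920 }, height: { ideal: 1080 }, frameRate: { ideal: 30 } },
};

function videoConstraintsFor(preset) {
  return QUALITY_PRESETS[preset] || QUALITY_PRESETS.medium; // safe fallback
}

// Re-constrain the running camera track without restarting the stream.
async function setQuality(stream, preset) {
  const track = stream.getVideoTracks()[0];
  await track.applyConstraints(videoConstraintsFor(preset));
}
```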
Cross-browser gotchas
- Safari only gained MediaRecorder in 14.1, and it records MP4/H.264 rather than WebM. Check mimetypes with isTypeSupported and provide fallbacks (e.g., offer audio-only in older browsers).
- Mobile browsers may suspend background JavaScript. Use small chunks and save frequently.
Next steps / features to add
- Visual timeline of entries with thumbnails generated from blob frames.
- Rich stickers, filters, and animated overlays drawn to the canvas.
- Automatic sentiment/tone tags using a server ML model.
- End-to-end encryption for private diaries.
- Syncing and deduplication across devices.
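For the first item, one hedged approach: load the recorded Blob into an off-screen video element, seek slightly in, and draw a single frame onto a small canvas. fitWithin is a pure sizing helper; makeThumbnail is browser-only.

```javascript
// Pure helper: scale dimensions to fit a max width, preserving aspect
// ratio and never upscaling.
function fitWithin(w, h, maxW) {
  const scale = Math.min(1, maxW / w);
  return { width: Math.round(w * scale), height: Math.round(h * scale) };
}

// Grab one frame from a recorded Blob as a JPEG thumbnail Blob.
function makeThumbnail(blob, maxW = 160, seekTo = 0.5) {
  return new Promise((resolve, reject) => {
    const video = document.createElement('video');
    video.muted = true;
    video.src = URL.createObjectURL(blob);
    video.onloadeddata = () => { video.currentTime = seekTo; };
    video.onseeked = () => {
      const { width, height } = fitWithin(video.videoWidth, video.videoHeight, maxW);
      const canvas = document.createElement('canvas');
      canvas.width = width;
      canvas.height = height;
      canvas.getContext('2d').drawImage(video, 0, 0, width, height);
      URL.revokeObjectURL(video.src);
      canvas.toBlob(resolve, 'image/jpeg', 0.8);
    };
    video.onerror = reject;
  });
}
```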
References
- MediaRecorder API (MDN): https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder
- getUserMedia (MDN): https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getUserMedia
- HTMLCanvasElement.captureStream (MDN): https://developer.mozilla.org/en-US/docs/Web/API/HTMLCanvasElement/captureStream
- IndexedDB API (MDN): https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API
- Web Speech API (MDN): https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API
Final thoughts - go make something personal
You’re not just building a recorder - you’re creating a tool for personal expression. Chunked recording + live overlays unlock creative workflows: short daily reflections, mood tags, searchable memories. Start small: a robust preview, short timeslices, and safe local storage. Then iterate - add style, transcription, secure sync, and watch your users capture life as it happens.



