NIP-A0: Voice Messages

NIP-A0 gives short voice notes their own Nostr event kinds, with direct audio URLs, reply structure through NIP-22 and optional waveform and duration metadata through imeta.

Publishing and media: draft, optional, voice

NIP: A0
Root voice: kind 1222
Voice reply: kind 1244
Recommended duration: up to 60 seconds
Recommended format: audio/mp4 with AAC or Opus
Preview metadata: waveform and duration in imeta

Voice notes need a shape that is not just a file link

Voice messages sit between chat, social posting and media. A user expects them to feel immediate, short and playable in-line. A bare MP4 or OGG URL inside a normal note does not tell a client whether it needs to render a voice-note player, show a waveform, limit duration or treat the event as a reply.

NIP-A0 defines root voice messages as kind 1222 and voice replies as kind 1244. The content is a direct URL to an audio file. The spec recommends short recordings, typically no longer than 60 seconds, and recommends audio/mp4 with AAC or Opus for broad compatibility.
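As a rough illustration of how small the root event is, here is a minimal sketch of an unsigned kind 1222 event. The URL, the placeholder public key and the helper name `build_root_voice_event` are assumptions made for this example; computing the event id and signature is left out.

```python
import json
import time
from typing import Optional

ROOT_VOICE_KIND = 1222  # root voice message kind described above


def build_root_voice_event(audio_url: str, pubkey: str,
                           created_at: Optional[int] = None) -> dict:
    """Return an unsigned event whose content is the direct audio URL."""
    return {
        "pubkey": pubkey,
        "created_at": int(time.time()) if created_at is None else created_at,
        "kind": ROOT_VOICE_KIND,
        "tags": [],            # optional metadata such as hashtags would go here
        "content": audio_url,  # the content is only the URL, no surrounding text
    }


event = build_root_voice_event(
    "https://media.example.com/voice/abc123.m4a",  # hypothetical media URL
    pubkey="a" * 64,                               # placeholder hex public key
    created_at=1752501052,
)
print(json.dumps(event, indent=2))
```

A real client would still hash and sign this structure before publishing it to relays.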

The event stays simple on purpose. It does not define live audio, rooms or long podcast episodes. It defines the small product behavior people recognize from messaging apps: press, speak, send, listen.

Two event kinds plus optional waveform metadata

Kind 1222 is the root voice message. Kind 1244 is the reply form and must follow the NIP-22 comment structure. Both put the audio URL in content. Tags can include normal Nostr metadata such as hashtags, geohashes or reply references where relevant.
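The reply structure can be sketched as follows. This assumes the NIP-22 convention of uppercase tags for the root of the thread and lowercase tags for the direct parent; the helper name, relay hint and sample ids are hypothetical, and the exact tag layout should be checked against NIP-22 before relying on it.

```python
from typing import Dict, List

VOICE_REPLY_KIND = 1244  # voice reply kind described above


def build_voice_reply(audio_url: str, root: Dict, parent: Dict,
                      relay_hint: str = "") -> Dict:
    """Return an unsigned voice reply pointing at a thread root and a parent.

    `root` and `parent` are dicts with "id", "pubkey" and "kind" keys.
    """
    tags: List[List[str]] = [
        # Root scope (uppercase), assumed from NIP-22
        ["E", root["id"], relay_hint, root["pubkey"]],
        ["K", str(root["kind"])],
        ["P", root["pubkey"]],
        # Direct parent (lowercase), assumed from NIP-22
        ["e", parent["id"], relay_hint, parent["pubkey"]],
        ["k", str(parent["kind"])],
        ["p", parent["pubkey"]],
    ]
    return {"kind": VOICE_REPLY_KIND, "tags": tags, "content": audio_url}


root_event = {"id": "1" * 64, "pubkey": "a" * 64, "kind": 1222}  # sample values
reply = build_voice_reply(
    "https://media.example.com/voice/reply.m4a",  # hypothetical media URL
    root=root_event,
    parent=root_event,  # replying directly to the root voice message
)
print(reply["tags"])
```

When the reply answers another reply, `parent` would be the kind 1244 event while `root` stays the original kind 1222 event.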

The optional visual layer comes through NIP-92 imeta. A voice message can include waveform values and duration so a client can draw a compact audio preview without first downloading the whole file.
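A minimal sketch of such a tag, assuming the NIP-92 layout where each entry after "imeta" is a space-delimited key and value. Encoding the waveform as space-separated integers and the duration as whole seconds is an assumption in this example, as is the helper name; the official file is the reference for the exact encoding.

```python
from typing import List, Sequence


def build_voice_imeta(url: str, waveform: Sequence[float],
                      duration_seconds: float) -> List[str]:
    """Build an imeta tag carrying url, waveform and duration entries."""
    # Assumption: amplitudes are written as space-separated integers.
    values = " ".join(str(int(v)) for v in waveform)
    return [
        "imeta",
        f"url {url}",
        f"waveform {values}",
        # Assumption: duration is rounded to whole seconds.
        f"duration {int(round(duration_seconds))}",
    ]


tag = build_voice_imeta(
    "https://media.example.com/voice/abc123.m4a",  # hypothetical media URL
    waveform=[0, 7, 35, 8, 100],
    duration_seconds=8.2,
)
print(tag)
```

The `url` entry repeats the URL from content so a client can match the metadata to the audio file it describes.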

That waveform field matters for UX. A voice note without duration or visual shape feels like a random download. A voice note with metadata feels like a native conversation object.

Fabian added the voice-message NIP in July 2025

The visible file history is short. Fabian added NIP-A0 in July 2025 through PR #1984, then updated the audio format and waveform recommendation days later through PR #1990. That makes it one of the younger NIPs in the current set.

The official example uses a Blossom URL, which is a useful clue about the media stack around it. NIP-A0 defines the voice event; Blossom or another media server still hosts the audio file. NIP-92 supplies the metadata language.

For people, that means the standard is not a full voice platform. It is a small bridge between short-form audio UX and the existing Nostr media system.

First visible addition: 2025-07 by Fabian
Waveform update: PR #1990

Clients need to make recording limits visible

A good implementation enforces or clearly warns about the 60-second expectation, uploads to a media server, writes an imeta tag with duration and waveform, and renders a small player that does not surprise the user with large downloads.
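Two of those steps, the recording limit and the waveform preview, can be sketched like this. The 50-second warning threshold, the bin count and the 0 to 100 amplitude scale are assumptions chosen for illustration, not values taken from the spec.

```python
from typing import List, Sequence

MAX_SECONDS = 60   # recommended ceiling mentioned above
WARN_SECONDS = 50  # hypothetical point at which the UI starts warning


def recording_state(elapsed_seconds: float) -> str:
    """Map elapsed recording time to a simple UI state."""
    if elapsed_seconds >= MAX_SECONDS:
        return "limit_reached"
    if elapsed_seconds >= WARN_SECONDS:
        return "warn"
    return "ok"


def downsample_peaks(samples: Sequence[float], bins: int = 50) -> List[int]:
    """Reduce raw samples in the range -1..1 to at most `bins` peaks in 0..100."""
    if not samples or bins <= 0:
        return []
    bins = min(bins, len(samples))
    peaks = []
    for i in range(bins):
        # Integer arithmetic keeps every chunk non-empty and covers all samples.
        start = i * len(samples) // bins
        end = (i + 1) * len(samples) // bins
        peaks.append(round(max(abs(s) for s in samples[start:end]) * 100))
    return peaks


print(recording_state(12.0), recording_state(55.0), recording_state(60.0))
print(downsample_peaks([0.0, 0.5, -1.0, 0.25], bins=2))
```

Keeping the preview to a few dozen values keeps the event small while still giving the player a recognizable shape.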

Voice replies need to behave like comments through NIP-22 so they can attach to the right parent. If a client treats every voice file as a standalone social post, conversations become fragmented.

Accessibility also matters. A future client may add transcript metadata, but even now the UI can show duration, playback speed and an obvious pause state.

1222: Root voice message.

1244: Voice reply via NIP-22.

waveform: Small visual preview.

duration: Audio length in seconds.
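On the reading side, a client can pull these fields out of an imeta tag before fetching any audio. The sketch below assumes space-delimited key and value entries and a whitespace-separated numeric waveform; the function names are hypothetical, and malformed values are skipped rather than raised.

```python
from typing import Dict, List


def parse_voice_imeta(tag: List[str]) -> Dict:
    """Extract url, duration and waveform from an imeta tag, if present."""
    if not tag or tag[0] != "imeta":
        return {}
    fields = {}
    for entry in tag[1:]:
        key, _, value = entry.partition(" ")
        fields[key] = value
    parsed: Dict = {"url": fields.get("url")}
    try:
        if "duration" in fields:
            parsed["duration"] = float(fields["duration"])
    except ValueError:
        pass  # ignore a malformed duration instead of failing the render
    try:
        if "waveform" in fields:
            parsed["waveform"] = [float(v) for v in fields["waveform"].split()]
    except ValueError:
        pass  # ignore a malformed waveform as well
    return parsed


def format_duration(seconds: float) -> str:
    """Render seconds as m:ss for a compact player label."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


meta = parse_voice_imeta(
    ["imeta", "url https://media.example.com/a.m4a", "waveform 0 7 35", "duration 8"]
)
print(meta, format_duration(meta["duration"]))
```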

Voice adds privacy and moderation weight

Voice can reveal identity, background sound, location clues and emotion more directly than text. Clients should not treat uploads casually, especially when files are stored on public media servers.

Moderation is harder too. Audio abuse is less searchable than text, so communities need reporting and playback-safe defaults.

Read NIP-A0 in the wild

NIP-A0 brings voice messages into the event model. Voice can make Nostr more human across languages and communities where text misses tone.

Audio also exposes people quickly. Storage, consent, waveform previews, transcription, deletion requests and replay risk need as much care as the recording button.

What changes when you actually use it

For you, NIP-A0: Voice Messages is felt when a post becomes a durable object: article, file, image, video, audio, bookmark, wiki entry or source reference. The question is whether the work still makes sense after one app, host or relay disappears. The concrete pieces (kind 1222, kind 1244, content and tags) decide whether the object carries enough context to survive.

What changes for builders and operators

For builders, NIP-A0: Voice Messages is context preservation. Store enough title, tag, author, hash, URL, media, preview and reference material that another interface can rebuild the object. If your feature depends on a private database to make sense, the NIP is not doing the portability work yet.

What the official file makes concrete

The official file is organized around these sections: Specification; Event Kind 1222 and Kind 1244; Visual representation with imeta (NIP-92) tag (optional); Examples; and Root Voice Message Example. Inspect kind 1222, kind 1244, content, tags, t and imeta, because these are the pieces most likely to surface as product behavior. Read it beside NIP-22 and NIP-92 before treating it as isolated.

NIP-A0: Voice Messages protects context. Titles, media, hashes, source links, timestamps and references decide whether work survives beyond one app.

Where it breaks

The failure mode in NIP-A0: Voice Messages is link rot with a nice interface. Media disappears, metadata lies, source URLs change, hashes are missing or an article loses its addressable identity. The page needs to make durability part of the feature, not an afterthought.
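One hedge against silently swapped or corrupted media is to compare downloaded bytes with a known SHA-256 digest. This sketch assumes the client already has an expected hex digest from somewhere it trusts, for example a hash field in the event metadata or a hash-addressed media URL; where that digest comes from is outside this example, and the function name is hypothetical.

```python
import hashlib


def matches_sha256(data: bytes, expected_hex: str) -> bool:
    """Return True when the bytes hash to the expected SHA-256 hex digest."""
    return hashlib.sha256(data).hexdigest() == expected_hex.strip().lower()


audio_bytes = b"example audio bytes"                  # stand-in for a download
expected = hashlib.sha256(audio_bytes).hexdigest()    # stand-in for a published hash
print(matches_sha256(audio_bytes, expected))
print(matches_sha256(b"tampered bytes", expected))
```

A mismatch does not say what went wrong, only that the file is not the one the digest described, which is enough to stop playback and show a clear error.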

Where this appears outside the markdown

In the ecosystem, NIP-A0: Voice Messages is part of the creator and archive layer. It decides whether writing, media, files, bookmarks, wiki material or source references remain understandable after the first app disappears. That is why media standards need to talk about storage, provenance and recovery, not only presentation.

The nearby-standard trap

The nearby-standard trap in NIP-A0: Voice Messages is flattening every creative object into a note with a link. Articles, videos, files, torrents, highlights, images, wiki entries and bookmarks carry different metadata and storage pressure. Read NIP-22 and NIP-92 so the product does not throw away the part that made the object portable.

Language that keeps the feature honest

Good product copy for NIP-A0: Voice Messages names the object and the storage. It says article, file, image, video, bookmark, wiki page, torrent, highlight or podcast episode, then tells you where the signed metadata ends and where external hosting begins.

What this page does not promise

NIP-A0: Voice Messages does not guarantee that published work survives forever. It can carry richer metadata, hashes, references or addressability, but files still need hosts, relays still need retention, and clients still need to render the object faithfully. Treat the NIP as the signed map of the work, then check where the actual bytes, previews and source links live.

Read it as a field test

Start NIP-A0: Voice Messages with the object you want to keep: article, file, media, bookmark, repository, torrent, wiki entry or podcast episode. Then trace which parts are signed, which parts are hosted, and which parts another client can reconstruct from kind 1222, kind 1244, content and tags. That is the difference between portable publishing and a pretty link preview.

Where the standard earns trust

The source links give you places to test the interpretation in public: Voice Messages PR #1984, Waveform update PR #1990, NIP-92 Media Attachments, Blossom. Use those links to move from the spec to live libraries, mirrors, pull requests, guides or products.

Official NIP-A0 source is the anchor for exact wording, and NIP-A0 commit history shows how that wording moved over time. The strongest secondary clues here are Voice Messages PR #1984, Waveform update PR #1990, NIP-92 Media Attachments. Treat this evidence chain as part of the article, not as footnotes. A NIP page becomes useful when you can move from claim to source to working behavior without guessing.

Keep the chain visible for NIP-A0: Voice Messages: first the human promise, then kind 1222, kind 1244, content and tags, then the implementation record, then the real-world failure case. That order keeps NIP-A0 useful without turning it into marketing copy or protocol trivia.

Three questions to carry forward

Where do the signed metadata and the actual media or file bytes part ways?
Can the object still be identified by hash, address, title, author and source if the first URL breaks?
Does a second client know enough from kind 1222 and kind 1244 to render the work without private context?

What to verify before you rely on it

Find kind 1222, kind 1244 and content in the official file and check where the UI exposes the same concept.
Read NIP-22 and NIP-92 as context before treating NIP-A0 as a complete product story.
Open at least one implementation, mirror, pull request or library source from the source links before trusting that the idea is mature.
Test the unhappy path: missing relays, stale metadata, invalid signatures, blocked events, expired state, revoked permissions or unavailable media.
Write the user-facing copy in plain language. If a standard changes authority, privacy, money, moderation or recovery, say that before the click.
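Part of that checklist can be automated. The following is a minimal sketch of a sanity check a client might run on an incoming voice event before rendering a player. The function name and the problem messages are hypothetical, and the check for an uppercase root tag on replies is an assumption based on the NIP-22 comment structure; signature verification and media availability are not covered.

```python
from typing import Dict, List
from urllib.parse import urlparse

VOICE_KINDS = {1222, 1244}


def voice_event_problems(event: Dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    kind = event.get("kind")
    if kind not in VOICE_KINDS:
        problems.append("kind is not 1222 or 1244")
    parsed = urlparse(str(event.get("content", "")).strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("content is not an http(s) URL")
    if kind == 1244:
        tag_names = {t[0] for t in event.get("tags", []) if t}
        # Assumption: a NIP-22 reply carries an uppercase root tag (E, A or I).
        if not tag_names & {"E", "A", "I"}:
            problems.append("reply has no root reference tag")
    return problems


good = {"kind": 1222, "content": "https://media.example.com/a.m4a", "tags": []}
bad = {"kind": 1244, "content": "not a url", "tags": []}
print(voice_event_problems(good))
print(voice_event_problems(bad))
```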

Direct sources

Use these sources for NIP-A0: Voice Messages in this order: the official NIP-A0 source for the current wording; the NIP-A0 commit history for the change record; and Voice Messages PR #1984, Waveform update PR #1990 and NIP-92 Media Attachments for public context. The article gives you the consequence in plain language, but the source trail is where exact fields, status notes, unresolved debates and implementation proof stay checkable.