Transmitting data via ultrasound without any special equipment

admin

Jun 27, 2025 - 17:45


Fri 27 June 2025


There are secret messages flying all around you all the time, mostly transmitted as electromagnetic waves going from antenna to antenna.

ELO’s “Secret Messages” is a song about posting conspiracy theories via WiFi.

But what if you need to get a few bytes from device A to device B (one of the hard problems in computer science!) and you don’t feel like making sure they’re both connected to the same network? Well, fortunately, another channel is available to us - sound, or for a more fun and less-audible experience, ultrasound.

What even is ultrasound?

Sound that you can hear is, physically, the air vibrating: being compressed and relaxed again and again, rapidly. Our ears (or our computers’ ears, microphones) can pick up this pressure variation. A computer microphone picks it up pretty much directly (and that is, in fact, what you see when you look at an audio waveform: a measure of the pressure relative to ambient), but the ear, our perception, and most useful audio processing algorithms work in the frequency domain.

While you can take entire classes on this, the short version is: every audio signal can be decomposed into frequency components, sine waves with a certain number of vibrations per second (the frequency, in Hz), which then lets you see how much energy the signal has at each frequency. If you’ve ever seen a bunch of bars go up and down on some kind of audio player, that’s exactly the frequency domain, usually with lower pitch / lower frequencies on the left and higher on the right, showing how much of the sound you are hearing is low-pitched, bassy rumbling, versus mid tones, versus highs.
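To make “decomposed into frequency components” a bit more concrete, here is a small sketch (not part of the original project) of a naive discrete Fourier transform in JavaScript. It feeds in a pure 1000 Hz sine and reports which frequency bin holds the most energy; the sample rate, window length and helper names are arbitrary choices for illustration.

```javascript
// Illustrative helpers (names are made up for this sketch).

// Naive discrete Fourier transform: how much energy the signal has in each
// frequency bin. O(n^2), which is fine for a demo; real code uses an FFT.
function dftMagnitudes(samples) {
  const n = samples.length;
  const mags = new Array(Math.floor(n / 2) + 1).fill(0); // bins 0 .. n/2
  for (let k = 0; k < mags.length; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (2 * Math.PI * k * t) / n;
      re += samples[t] * Math.cos(angle);
      im -= samples[t] * Math.sin(angle);
    }
    mags[k] = Math.hypot(re, im);
  }
  return mags;
}

// A pure sine tone at `freq` Hz, sampled at `sampleRate` Hz.
function sineTone(freq, sampleRate, length) {
  return Array.from({ length }, (_, t) => Math.sin((2 * Math.PI * freq * t) / sampleRate));
}

// Index of the bin with the most energy.
function peakBin(mags) {
  return mags.reduce((best, m, k) => (m > mags[best] ? k : best), 0);
}

const sampleRate = 8000; // low rate keeps the example small
const n = 64;            // each bin is sampleRate / n = 125 Hz wide
const mags = dftMagnitudes(sineTone(1000, sampleRate, n));
const k = peakBin(mags);
console.log(k, k * (sampleRate / n)); // 8 1000
```

With 64 samples at 8000 Hz, each bin covers 125 Hz, so the 1000 Hz tone lands in bin 8. Real-world code (including the browser’s analyser nodes) uses a fast Fourier transform instead of this O(n²) loop, but the result is the same kind of spectrum.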

That wording already implies something: there is some reasonable low-mid-high range for these frequencies. For us humans, that range runs from somewhere around 10 to 20 Hz (sub-bass, more of a thing you feel in your stomach than really hear) up to about 20000 Hz, though as people grow older, that upper limit decreases quite a bit, and they become unable to hear the highest tones¹. Sound past that 20000 Hz barrier is what we call ultrasound: sound that is beyond (most) human ears’ ability to hear.

Ultrasound and you(r computer)

The first thing that comes to mind when you hear “ultrasound” might be the thing a doctor uses to look at what’s going on inside of you (most prominently, to check how a baby that is growing inside of you is doing). So you might be forgiven for thinking of it as something that requires special devices with fun names like transducer or whatever. Not so! While our technical devices are, of course, designed to operate broadly in the same range as our ears, they generally can go a little bit beyond that. Yes, even your shitty laptop speakers and microphone are, technically, ultrasound-capable devices! And while they usually start to cut off frequencies pretty heavily once you get to about 18000 Hz or thereabouts, for technical reasons, you can probably still get a signal through, and it’ll be almost or completely inaudible to most people².
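One hard ceiling sits underneath all of this: a digital audio system sampling at a given rate can only represent frequencies below half that rate (the Nyquist frequency). At the common 44.1 kHz and 48 kHz sample rates, that leaves a couple of kHz of room above 20000 Hz, which bounds how far into ultrasound a normal sound card can go at all. A minimal sketch of the arithmetic, with made-up helper names:

```javascript
// Hypothetical helpers for illustration.

// The highest frequency a digital audio system can represent is half its
// sample rate (the Nyquist frequency).
function nyquist(sampleRate) {
  return sampleRate / 2;
}

// Can a tone at `freq` Hz be represented at all at this sample rate?
// (Whether a given speaker or microphone reproduces it well is a separate,
// hardware-dependent question.)
function isRepresentable(freq, sampleRate) {
  return freq > 0 && freq < nyquist(sampleRate);
}

for (const rate of [44100, 48000]) {
  console.log(rate, nyquist(rate), isRepresentable(19000, rate), isRepresentable(23000, rate));
}
// 44100 22050 true false
// 48000 24000 true true
```

This only checks what the sample rate allows; the actual cutoff of your particular speaker and microphone sits somewhere below that.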

So let’s transmit some data

To recap:

We can play audio from a normal computer speaker that is not audible to most people
We can record that audio with a normal computer microphone

So if we can encode some data into that audio signal somewhere in that high frequency range, we can transmit data! How do we do that? This is usually the point where we’d have to write some Python or C code, which you’d have to run yourself if you wanted to play along. Fortunately (?)³, there is now WebAudio, which is audio for the web, so instead, I can link you to a website! I’ve implemented two variants:

The thing that comes to mind immediately when you think of “how to transmit letters over audio”: Morse code dots and dashes. I’m going to skip over this one, since it didn’t work super well and is a real pain to decode with a computer, so I pivoted to:
A sort of return-to-zero frequency shift keying, where the data is encoded as shifts in the pitch of a beep: between bits, the tone returns to a fixed pilot pitch, and each bit is a short excursion to one of two other pitches, one for 0 and one for 1.

Open the website on both your phone and your laptop and try shooting some messages back and forth!

Here’s the relevant excerpt from the encoder:

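// Turn a sequence of 0s and 1s (as a string) into an RTZ FSK audio signal.
// Relies on values defined elsewhere on the page: audioCtx (the AudioContext),
// fp / f0 / f1 (pilot, "0" and "1" tone frequencies in Hz) and the timings
// fadeTime, ramp, pilotDur, dataDur and bitDuration (in seconds).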
function playSequence(bits) {
    const now = audioCtx.currentTime;

    // The oscillator generates a sine wave at a specified frequency
    const osc = audioCtx.createOscillator();
    const gainNode = audioCtx.createGain();

    // Route oscillator -> gain -> speakers, start on the pilot tone,
    // and fade in the gain to avoid a turn-on click
    osc.connect(gainNode).connect(audioCtx.destination);
    osc.frequency.setValueAtTime(fp, now);
    gainNode.gain.setValueAtTime(0, now);
    gainNode.gain.linearRampToValueAtTime(1, now + fadeTime);

    // Send all the bits
    const startTime = now + fadeTime;
    const rampTime = ramp;
    for (let i = 0; i < bits.length; i++) {
        // Pick the data tone for this bit: f1 for '1', f0 for '0'
        const bit = bits[i];
        const dataFreq = bit === '1' ? f1 : f0;

        // Compute start and end times
        const t0 = startTime + i * bitDuration;
        const t1 = t0 + pilotDur;
        const t2 = t1 + dataDur;

        // Transition from pilot tone to one of two data tones
        osc.frequency.setValueAtTime(fp, t1 - rampTime);
        osc.frequency.linearRampToValueAtTime(dataFreq, t1);

        // Transition from data tone back to pilot tone
        osc.frequency.setValueAtTime(dataFreq, t2 - rampTime);
        osc.frequency.linearRampToValueAtTime(fp, t2);
    }

    // Fade out the oscillator to avoid turn-off click
    const endTime = startTime + bits.length * bitDuration;
    gainNode.gain.setValueAtTime(1, endTime);
    gainNode.gain.linearRampToValueAtTime(0, endTime + fadeTime);

    // Activate
    osc.start(now);
    osc.stop(endTime + fadeTime + 0.001);
}
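Stripped of the Web Audio scheduling, the timing in playSequence boils down to a list of tone segments. The sketch below computes that list for a bit string. It assumes bitDuration is exactly pilotDur + dataDur, ignores the short frequency ramps and the fade in/out, and its frequencies and durations are hypothetical example values, not the ones the page uses.

```javascript
// Hypothetical parameters; the real page defines its own values.
const params = {
  fp: 18000,      // pilot tone (Hz)
  f0: 18500,      // tone for a 0 bit (Hz)
  f1: 19000,      // tone for a 1 bit (Hz)
  pilotDur: 0.05, // seconds of pilot tone per bit
  dataDur: 0.05,  // seconds of data tone per bit
};

// List the tone segments the oscillator plays for a bit string: every bit
// is a stretch of pilot tone followed by one of the two data tones.
// Assumes bitDuration = pilotDur + dataDur; ramps and fades are ignored.
function rtzFskSchedule(bits, { fp, f0, f1, pilotDur, dataDur }) {
  const bitDuration = pilotDur + dataDur;
  const segments = [];
  for (let i = 0; i < bits.length; i++) {
    const t0 = i * bitDuration; // start of this bit
    const t1 = t0 + pilotDur;   // pilot -> data transition
    segments.push({ start: t0, end: t1, freq: fp });
    segments.push({ start: t1, end: t1 + dataDur, freq: bits[i] === '1' ? f1 : f0 });
  }
  return segments;
}

for (const s of rtzFskSchedule('10', params)) {
  console.log(s.start.toFixed(2), s.end.toFixed(2), s.freq);
}
```

Because the tone always comes back to the pilot frequency between bits, two identical bits in a row still produce two distinct frequency jumps, which is what lets the receiver count them.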

The characters are encoded as 8-bit ASCII, with a preamble of ten 0 bits before the actual signal, and each character framed by a 1 bit on either side (a start and a stop bit), which makes it reasonably easy to find the start of the message.
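As a hedged reconstruction of that framing (the page’s own text-to-bits code is not part of this excerpt), a string could be turned into bits like this. The bit order, least significant bit first, is taken from the decoder further down; encodeMessage is a hypothetical name.

```javascript
// Hypothetical encoder matching the framing described above: a preamble of
// ten 0 bits, then for every character a start bit (1), its 8 bits least
// significant first, and a stop bit (1). Only 8-bit characters are allowed.
function encodeMessage(text) {
  let bits = '0'.repeat(10); // preamble
  for (const ch of text) {
    const code = ch.charCodeAt(0);
    if (code > 0xff) throw new RangeError(`not an 8-bit character: ${ch}`);
    let frame = '1'; // start bit
    for (let i = 0; i < 8; i++) {
      frame += (code >> i) & 1 ? '1' : '0'; // LSB first
    }
    frame += '1'; // stop bit
    bits += frame;
  }
  return bits;
}

console.log(encodeMessage('A')); // 00000000001100000101
```

Since every frame starts and ends with a 1, the longest run of 0s inside a message is eight bits, so the ten-zero preamble cannot accidentally show up in the middle of the data.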

The resulting audio looks like this, with the waveform below, and a spectrogram - essentially, a lot of those spectrum analyzers laid on the side and taped to each other, with the colour getting more intense the higher the bar - on top:

On the receiving end, the shifts in frequency are detected (by computing the spectrum and looking at which frequency has the most energy in it), turned back into bits, and finally, characters.

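// Runs once per analysis step; how it is scheduled is not shown in this excerpt.
// Relies on page globals: the AnalyserNodes analyserP / analyser0 / analyser1,
// the tone frequencies fp / f0 / f1, margin and threshold (in dB), and the
// receiver state rxState / bufferBits.
function process() {
    // Get the spectrum (in dB) from each analyser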
    const dP = new Float32Array(analyserP.frequencyBinCount);
    const d0 = new Float32Array(analyser0.frequencyBinCount);
    const d1 = new Float32Array(analyser1.frequencyBinCount);
    analyserP.getFloatFrequencyData(dP);
    analyser0.getFloatFrequencyData(d0);
    analyser1.getFloatFrequencyData(d1);

    // Pick the bin with the correct frequency
    const binP = Math.round(fp / (audioCtx.sampleRate / analyserP.fftSize));
    const bin0 = Math.round(f0 / (audioCtx.sampleRate / analyser0.fftSize));
    const bin1 = Math.round(f1 / (audioCtx.sampleRate / analyser1.fftSize));
    const maxP = dP[binP];
    const max0 = d0[bin0];
    const max1 = d1[bin1];

    // Edge detect
    if (rxState === 'pilot') {
        if (max0 > maxP + margin && max0 > threshold) {
            bufferBits += '0';
            rxState = 'data';
        } else if (max1 > maxP + margin && max1 > threshold) {
            bufferBits += '1';
            rxState = 'data';
        }
    } else if (rxState === 'data') {
        if (maxP > max0 + margin && maxP > max1 + margin && maxP > threshold) {
            rxState = 'pilot';
        }
    }
    decodeBuffer();
}

// Detect preamble and decode frames
function decodeBuffer() {
    // Find preamble: ten consecutive 0 bits
    const preamble = '0000000000';
    const preambleIdx = bufferBits.indexOf(preamble, 0);
    if (preambleIdx >= 0) {
        // Found preamble, cut everything before it from buffer
        bufferBits = bufferBits.slice(preambleIdx + preamble.length);
        log('Syncing to preamble at index ' + preambleIdx + ', remaining bits: ' + bufferBits.length);
        return;
    }

    if (bufferBits.length < 10) return; // Need at least 10 bits for a frame
    log('Buffer bits: ' + bufferBits);

    // Parse frame: start bit (1), 8 data bits LSB first, stop bit (1)
    const frame = bufferBits.slice(0, 10);
    if (frame[0] !== '1' || frame[9] !== '1') {
        // Frame invalid, drop one bit and retry
        log('Invalid frame start or end: ' + frame);
        bufferBits = bufferBits.slice(1);
    } else {
        // Valid frame, decode data
        let c = 0;
        for (let i = 0; i < 8; i++)
            if (frame[1 + i] === '1') c |= (1 << i);
        const char = String.fromCharCode(c);
        recvOutput.value += char;
        log('Decoded char: ' + char);
        bufferBits = bufferBits.slice(10);
    }
}
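To poke at the receiver logic without a microphone, the bin arithmetic and the edge detector from process() can be pulled out into pure functions. This is a hypothetical refactoring sketch: frequencyToBin and detectBits are made-up names, and the sample rate, FFT size, dB readings, margin and threshold in the usage lines are invented example values, not the page’s settings.

```javascript
// Which FFT bin holds a given frequency? Each bin is sampleRate / fftSize Hz wide.
function frequencyToBin(freq, sampleRate, fftSize) {
  return Math.round(freq / (sampleRate / fftSize));
}

// Offline version of the edge detector in process(): feed it one
// { p, z, o } reading per analysis step (levels in dB at the pilot, "0" and
// "1" tones) and it returns the bits it detects.
function detectBits(frames, margin, threshold) {
  let state = 'pilot';
  let bits = '';
  for (const { p, z, o } of frames) {
    if (state === 'pilot') {
      // Waiting for a jump from the pilot tone to a data tone
      if (z > p + margin && z > threshold) {
        bits += '0';
        state = 'data';
      } else if (o > p + margin && o > threshold) {
        bits += '1';
        state = 'data';
      }
    } else if (p > z + margin && p > o + margin && p > threshold) {
      // Back on the pilot tone: ready for the next bit
      state = 'pilot';
    }
  }
  return bits;
}

console.log(frequencyToBin(18000, 48000, 2048)); // 768

// Synthetic readings: pilot, "1" tone held for two steps, pilot, "0" tone.
const frames = [
  { p: -30, z: -90, o: -90 },
  { p: -90, z: -90, o: -30 },
  { p: -90, z: -90, o: -30 },
  { p: -30, z: -90, o: -90 },
  { p: -90, z: -30, o: -90 },
];
console.log(detectBits(frames, 10, -60)); // prints: 10
```

Note that the “1” tone is held for two analysis steps but only produces one bit: the detector only emits a bit on the pilot-to-data edge, which is the point of returning to the pilot tone between bits.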

Does it work?

Sure does:

Please don’t ask how many times I had to rerecord this before it worked cleanly⁴.

You can even kind of find the audio of the encoded message in the audio track of that video (not audible, of course, at least not to my ears).

There are many issues - it’s not robust against interference at all (and once decoding breaks down, it probably won’t recover), there’s almost no error detection and certainly no correction, it’s slow (if I go past 10 or so bits per second, things break), the frequencies I use are relatively low because that makes things easier (so with headphones and if you are younger than me, you can probably hear it beep), etc. It could be vastly improved by someone who is better at DSP or better at JS than me.
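For a taste of what even minimal error detection would look like, one classic option (not something the original code does) is a per-frame parity bit: the sender adds one bit so that every frame carries an even number of 1s, and the receiver rejects frames where that no longer holds. This catches any single flipped bit but not an even number of flips, and it cannot correct anything. The frame layout and function names below are hypothetical, and frames grow from 10 to 11 bits, so the decoder’s frame length would have to change too.

```javascript
// Hypothetical extension, not part of the original code.
// Frame: start(1), 8 data bits LSB first, even-parity bit, stop(1).
function encodeFrameWithParity(code) {
  let data = '';
  for (let i = 0; i < 8; i++) data += (code >> i) & 1 ? '1' : '0';
  const ones = [...data].filter((b) => b === '1').length;
  const parity = ones % 2 === 0 ? '0' : '1'; // makes the count of 1s even
  return '1' + data + parity + '1';
}

// Returns the character code, or null if framing or parity check fails.
function decodeFrameWithParity(frame) {
  if (frame.length !== 11 || frame[0] !== '1' || frame[10] !== '1') return null;
  const payload = frame.slice(1, 10); // 8 data bits + parity bit
  const ones = [...payload].filter((b) => b === '1').length;
  if (ones % 2 !== 0) return null; // odd number of 1s: something flipped
  let code = 0;
  for (let i = 0; i < 8; i++) if (payload[i] === '1') code |= 1 << i;
  return code;
}

const frame = encodeFrameWithParity('A'.charCodeAt(0));
console.log(frame, decodeFrameWithParity(frame)); // 11000001001 65
// Flip one data bit to simulate interference
const corrupted = frame.slice(0, 3) + (frame[3] === '1' ? '0' : '1') + frame.slice(4);
console.log(decodeFrameWithParity(corrupted)); // null
```

Something like Reed-Solomon goes much further (it can also repair damaged bytes), but the overall shape is the same: spend a few extra bits per message to make errors visible.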

If that is you, feel free to grab the code and add, I don’t know, Reed-Solomon error correction for the bitstream (which I tried to do but all the libraries I found are confusing to use for someone who isn’t that good at JS, and I don’t want to implement it myself), or a decoder that isn’t quite so ad-hoc, or whatever else comes to mind. The source code is available via view source, or here: https://gist.github.com/halcy/d20b0bc2de82ceae2f6ba8a83901b265. Build the secret ultrasound proximity chat application of all our dreams, exfiltrate some data clandestinely, I’m sure you can come up with something.

The twist

So, it’s a neat, but pretty useless hack, right? Not entirely: this type of signal is actually used by calling software, in addition to other stuff like Bluetooth beacons, to help figure out whether there are other devices close to you that are already in a given meeting. Though there, I presume, the encoding is a bit more robust. Get yourself a spectrum analyzer app for your phone and go hunting for the signal the next time you join a call!

Anyways, I hope you have a good day today, and remember:

¹ That’s the principle behind the “anti-loitering” devices that the neighbourhood dipshit no one likes installed on his porch to keep away the no-good youth: they make annoying noise in the high frequency range.
² Even for young, healthy people, hearing very high frequencies is a lot harder.
³ WebAudio feels like kind of a mess, but then, I don’t know JS well. Also, apparently the thing I am using is already deprecated, so, oops?
⁴ Three times. I had to rerecord it three times.

I just wanted to post my theory that ELO are time travelers, honestly.

