ActionScript 3: Sound.extract() Demystified or How to Draw a Waveform in Flash

I was working on a Flash project recently and had to draw the waveforms of certain files that were included in the project. To my surprise, there wasn't much information on the Internet on how to achieve that in Flash.

I was able to locate a couple of nice articles on the topic: Rendering spectrums with Sound.extract() & Plotting a Sound Wave in Flash AS3. Although the code examples were really good, I realized that from a beginner's point of view they were a bit hard to understand. I also felt that some key concepts a beginner needs to know were missing from these articles: how sound data is stored, how byte arrays work, what exactly Sound.extract() does etc.

On the other hand, the official Adobe Documentation concerning the same topic was also lacking detailed information on certain aspects of the Sound & ByteArray classes that one needs to know in order to understand these classes fully, so that's what made me write this article.

If you have a certain amount of programming experience, the articles I mentioned above are more than a good start and you probably won't notice the lack of information I'm talking about, but I thought it would be nice if there was a tutorial directed towards beginners with little programming experience, so that they don't have to read dozens of pages & extra articles to understand what's going on.

First of all, let me explain what a waveform is, because I've seen different people giving it different names. A waveform is the visual representation of an audio signal (see the screenshot below).

A Waveform

Waveforms are often used in audio editors & audio players. One good example of a Flash application using waveforms is the SoundCloud audio player.

To render a waveform of a certain audio file, you need to access and process its raw sound data. Sound data is stored as a sequence of sample blocks. A sample block is simply a collection of byte sequences. Each sequence of bytes stores a number and corresponds to a certain audio channel. For example, in a mono file each sample block contains only 1 sequence of bytes, while in a stereo file you would have 2 sequences of bytes per sample block - 1 for the left channel & 1 for the right one. The number stored in each sequence controls the fluctuation of the audio speaker (if you're unsure what this is, you may want to read this basic explanation on How Speakers Work), or in other words: it "tells" the speaker coil how much to push or pull on the speaker cone in order to move the speaker diaphragm. This movement/fluctuation is in fact what produces the sound we hear.

You may be really confused at this point, but things will get clearer once you see how raw sound data is stored in Adobe Flash.

Sounds in Flash are stored in Sound objects. To extract the raw sound data of a Sound object, you use its extract() method. No matter what file you imported, the raw sound data is always returned as 44100Hz Stereo with the sample type being a 32-bit floating-point value. It doesn't matter whether your original file was mono or stereo, whether its sample rate was 44100Hz or lower, or whether its sample type was 8-bit, 16-bit or 32-bit: the sound data returned by the extract() method is always converted to 44100Hz Stereo, 32-bit, which in turn means:

  • Stereo: each sample block contains 2 sequences of bytes: 1 for the left channel, 1 for the right one
  • 32-bit Sample Type: each byte sequence is 32 bits long and since 1 byte = 8 bits, each byte sequence is 4 bytes long, or in other words each sample block contains a total of 8 bytes.
  • 44100Hz Sample Rate: 1 second of audio is represented by 44100 sample blocks. Since the file is Stereo (2 channels), every sample block holds 2 samples, which makes a total of 88200 samples per second - 44100 for the left channel + 44100 for the right channel
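
To make the numbers in the list above concrete, here is a small sketch of the arithmetic. The variable names and the bytesForDuration() helper are made up for illustration, they are not part of the Flash API:

```actionscript
// Format of the data returned by Sound.extract() (names are illustrative only)
var SAMPLE_RATE:uint = 44100;      // sample blocks per second
var CHANNELS:uint = 2;             // left + right
var BYTES_PER_SAMPLE:uint = 4;     // one 32-bit floating point number

var BYTES_PER_BLOCK:uint = CHANNELS * BYTES_PER_SAMPLE;   // 8 bytes
var SAMPLES_PER_SECOND:uint = SAMPLE_RATE * CHANNELS;     // 88200 samples
var BYTES_PER_SECOND:uint = SAMPLE_RATE * BYTES_PER_BLOCK; // 352800 bytes

// Hypothetical helper: how many bytes of raw sound data a given duration takes up
function bytesForDuration(milliseconds:Number):uint
{
    var blocks:uint = Math.floor((milliseconds / 1000) * SAMPLE_RATE);
    return blocks * BYTES_PER_BLOCK;
}

trace(bytesForDuration(250)); // 11025 sample blocks * 8 bytes = 88200
```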

So, the Sound.extract() method stores the raw sound data in a ByteArray. Each element of the ByteArray contains a byte (8 bits) of information. This means we need to combine every 4 elements in order to form a 32-bit sequence. Since we are working with Stereo data, a sample block contains 2 of these sequences, so every 8 elements (bytes) in the ByteArray form a sample block (see the schematic).

Sample Blocks schematic

But what do these bytes actually represent? Every sequence of 4 bytes forms a 32-bit floating point number between -1 and 1. "-1" means the speaker cone is pulled at its maximum, while "1" means a maximum push on the speaker cone. "0" on the other hand means "silence".

Having to combine every 4 bytes in order to form a 32-bit floating point number of course might be a tedious process. Fortunately, the Adobe guys thought about this and provided the ByteArray class with a readFloat() method. What readFloat() does is to read a sequence of 4 bytes (32 bits) and return the corresponding floating point number.
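
If you want to see readFloat() in action without loading a sound, the following sketch builds a single sample block by hand and reads it back. The values 0.5 and -0.25 are arbitrary and only serve the illustration:

```actionscript
import flash.utils.ByteArray;

// Build one hand-made sample block: left = 0.5, right = -0.25
var block:ByteArray = new ByteArray();
block.writeFloat(0.5);     // bytes 0-3: left channel
block.writeFloat(-0.25);   // bytes 4-7: right channel
trace(block.length);       // 8

// Writing moved the pointer to the end, so rewind before reading
block.position = 0;

var leftValue:Number = block.readFloat();
trace(leftValue, block.position);   // 0.5 4
var rightValue:Number = block.readFloat();
trace(rightValue, block.position);  // -0.25 8
trace(block.bytesAvailable);        // 0
```

Every call to readFloat() consumes 4 bytes, so two calls in a row consume exactly one sample block.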

But how do we actually interpret the data and display it on screen? Well, in a typical waveform the X Axis shows how much time has elapsed since the start of the audio file and the Y Axis shows the speaker fluctuation (see the figure below).

Waveform data interpretation
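
Here is a minimal sketch of that mapping. The function names, centerY and amplitude are assumptions chosen for the illustration. Unlike the full listing later in this article, the sketch also flips the sign of the sample, because the Y Axis in Flash grows downwards:

```actionscript
// Assumed layout values, chosen only for illustration
var blocksPerSecond:uint = 44100; // sample blocks per second returned by extract()
var centerY:Number = 150;         // y position of the "silence" line, in pixels
var amplitude:Number = 100;       // pixels between silence and a full-scale sample

// X Axis: the time (in seconds) at which a given sample block is played
function blockToSeconds(blockIndex:uint):Number
{
    return blockIndex / blocksPerSecond;
}

// Y Axis: screen coordinate for a sample value between -1 and 1.
// We subtract, so that positive values are drawn above the silence line.
function sampleToY(sample:Number):Number
{
    return centerY - sample * amplitude;
}

trace(blockToSeconds(11025)); // 0.25
trace(sampleToY(0));          // 150
trace(sampleToY(1));          // 50
trace(sampleToY(-1));         // 250
```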

However, if you have, say, a 5 minute audio file, that's 13,230,000 sample blocks - over 26 million individual samples if you count both channels. That's an insanely big number - imagine you have 1 pixel on the X Axis representing a sample block, you would need about 6890 24" computer monitors at 1920x1080 screen resolution placed side by side to be able to see the waveform of the entire file. No, actually you won't need that many monitors, because Flash will probably stop responding long before the data is actually displayed, after all that's over 13 million drawing operations per channel as well!
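
For the curious, this is the back-of-the-envelope calculation for a 5 minute file (the variable names are just for illustration):

```actionscript
var blocksPerSecond:uint = 44100;
var durationSeconds:uint = 5 * 60;  // 5 minutes
var monitorWidth:uint = 1920;       // pixels along the X Axis of one monitor

var totalBlocks:uint = durationSeconds * blocksPerSecond;
trace(totalBlocks);                 // 13230000 sample blocks
trace(totalBlocks * 2);             // 26460000 samples (left + right)
trace(totalBlocks / monitorWidth);  // 6890.625 monitors at 1 pixel per sample block
```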

So, what you have to do is summarize the data before you put it on screen. In the example code I'm going to show you below, I split each second into 4 equal parts (that's 250ms or 11025 sample blocks), but you can really experiment with any ratio. For each channel, I analyze these 11025 sample blocks and determine the minimum and maximum values, then I draw a line that connects these values, then I analyze the next sequence of 11025 sample blocks etc. until I'm out of raw sound data to analyze. This results in a good looking waveform that is also accurate, and the fewer sample blocks you summarize per step, the more accurate the waveform. Of course, this is a very, very simple waveform rendering algorithm and there are much more advanced ones out there, but it's a good start in understanding how to draw waveforms in Flash.
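
The summarizing step on its own can be sketched as a small function. leftChannelPeaks() is a hypothetical helper written for this explanation (the full listing below does the same work inline and for both channels), and the 800 pixel target width is an arbitrary assumption:

```actionscript
import flash.utils.ByteArray;

// Hypothetical helper: returns one {min, max} pair per chunk for the left channel.
// "data" must hold extract() output (8 bytes per sample block) with position = 0.
function leftChannelPeaks(data:ByteArray, blocksPerChunk:uint):Array
{
    var peaks:Array = [];
    if (blocksPerChunk == 0) return peaks; // nothing sensible to summarize

    var bytesPerChunk:uint = blocksPerChunk * 8;
    while (data.bytesAvailable >= bytesPerChunk)
    {
        // Start from the extremes, so the first sample always replaces them
        var min:Number = Number.MAX_VALUE;
        var max:Number = -Number.MAX_VALUE;
        for (var i:uint = 0; i < blocksPerChunk; i++)
        {
            var leftSample:Number = data.readFloat(); // left channel
            data.readFloat();                         // skip the right channel
            if (leftSample < min) min = leftSample;
            if (leftSample > max) max = leftSample;
        }
        peaks.push({min: min, max: max});
    }
    return peaks;
}

// Instead of a fixed 250ms, you could derive the chunk size from the width
// you want the waveform to have, e.g. a 5 minute sound in roughly 800 pixels:
var blocksInSound:uint = 300 * 44100;
var blocksPerPixel:uint = Math.ceil(blocksInSound / 800);
trace(blocksPerPixel); // 16538
```

Any sample blocks left over at the end that don't fill a whole chunk are simply ignored, just like in the full listing.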

Here's a screenshot from my example waveform drawing application:

Waveform drawing application screenshot

Here's the actual example code:

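import flash.display.Sprite;  // we need a sprite to draw the waveform to
import flash.events.Event;          // needed for the ENTER_FRAME handler at the end
import flash.events.KeyboardEvent;  // needed for the arrow key handlers at the end
import flash.media.Sound;     // we need the sound class to extract raw sound
                              // from our Sound objects
import flash.utils.ByteArray; // we need the byteArray class to store the sound
                              // data extracted from our objects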

// I have imported a random MP3 to my Flash Project and gave it class name "TestSound"
// So I just create a new Sound object using that "TestSound" MP3 that's in my library.
var sound:Sound = new TestSound();

// We will store the raw sound data in a ByteArray called "soundData"
var soundData:ByteArray = new ByteArray();

// We need two sprites to draw the waveforms for the left and right channel
var waveformLeft:Sprite = new Sprite();  
var waveformRight:Sprite = new Sprite();

// We set a basic line style and reset the drawing position for each Sprite
waveformLeft.graphics.moveTo(0,0);  
waveformLeft.graphics.lineStyle(1,0xff0000);  
waveformRight.graphics.moveTo(0,0);  
waveformRight.graphics.lineStyle(1,0xff0000);

// this is how we extract the raw sound data from our Sound object. The tricky
// part here is that the extract() method requires two parameters to be passed to it.
//
// - A reference to a ByteArray object in which the extracted raw sound data will be placed,
//       which is the "soundData" ByteArray we defined above
//
// - A length parameter that specifies number of sample blocks to extract from that Sound
//   object. Since we already know the data will be returned as 32-bit, 44100Hz Stereo, we
//   can easily calculate the number of sound blocks by multiplying the sound length in
//   seconds by the number of samples blocks per second (aka sample rate), which is 44100,
//   but because the "length" property of the Sound object returns a value in milliseconds,
//   we need to do a conversion here dividing it by 1000, so we can get a value in seconds,
//  hence:
//
//  Total Sample Blocks to Extract = (sound.length/1000)*44100;
sound.extract(soundData,Math.floor((sound.length/1000)*44100));

// the extract() method places the file pointer at the end of the ByteArray (something not
// mentioned in the Adobe Flash Documentation). Therefore, we need to reset the file pointer
// to the beginning of the ByteArray Object, which is position = 0
soundData.position = 0;

// for drawing purposes, we set our initial X Axis position to 0 and define a variable called
// "xStep", which determines how many pixels along the X Axis to move with each drawing step.
//
// As I have decided to draw lines connecting the minimum and maximum values of every 11025
// sample blocks (or every 250ms), that means with an "xStep" value of 1 pixel, 250ms = 2 pixels,
// or 1 pixel = 125ms
var xPos:uint = 0;  
var xStep:uint = 1;

// since the raw sound data is a floating point number between -1 and 1, it will be really hard
// to notice the fluctuations if we use that number as-is to draw our waveform lines. Therefore
// we define a "yRatio" variable to expand the visible range of the fluctuations that can be
// drawn on screen. I chose the number "100" based on trial and error - you can experiment with
// different numbers.
var yRatio:uint = 100;

// we keep looping while there are enough bytes left for a full chunk of 11025 sample blocks
// (11025 sample blocks * 8 bytes = 88200 bytes). We determine the number of bytes left using
// the "bytesAvailable" property of the "soundData" ByteArray object.
while(soundData.bytesAvailable >= 88200)
{
    // the minimums start at the largest possible Number and the maximums at the most negative
    // one. Note that Number.MIN_VALUE is the smallest POSITIVE Number, so it can't be used here
    var leftMin:Number = Number.MAX_VALUE;   // a variable to store the minimum value for the Left Channel
    var leftMax:Number = -Number.MAX_VALUE;  // a variable to store the maximum value for the Left Channel
    var rightMin:Number = Number.MAX_VALUE;  // a variable to store the minimum value for the Right Channel
    var rightMax:Number = -Number.MAX_VALUE; // a variable to store the maximum value for the Right Channel
    for (var i:uint = 0;i<11025;i++) // analyze every 11025 sample blocks and determine their
    {                                // minimum and maximum values
            // we use the "readFloat()" method of the "soundData" ByteArray object to retrieve
            // the raw sound data for the left and right channels. When you call "readFloat()"
            // it retrieves the next 4 bytes from the ByteArray object, converts them to a
            // 32-bit single precision floating point number and moves the file pointer to the
            // next sequence of 4 bytes (not explained in the Adobe Flash Documentation)

            // read raw sound data for left channel (4 bytes/32 bits)
            var leftChannel:Number = soundData.readFloat();
            // read raw sound data for right channel (next 4 bytes/32 bits)
            var rightChannel:Number = soundData.readFloat();
            // 4 bytes + 4 bytes = 8 bytes = 1 sample block, remember? :)

            // check if we have new minimum or maximum values for the left or right channels
            if (leftChannel < leftMin) leftMin = leftChannel;
            if (leftChannel > leftMax) leftMax = leftChannel;
            if (rightChannel < rightMin) rightMin = rightChannel;
            if (rightChannel > rightMax) rightMax = rightChannel;
    }
    // draw lines connecting the minimum and maximum values of the left and right channels
    // to their corresponding sprites.
    waveformLeft.graphics.lineTo(xPos,leftMin*yRatio);
    waveformRight.graphics.lineTo(xPos,rightMin*yRatio);
    xPos += xStep;
    waveformLeft.graphics.lineTo(xPos,leftMax*yRatio);
    waveformRight.graphics.lineTo(xPos,rightMax*yRatio);
    xPos += xStep;
}

// at this point the waveforms have been drawn to our left channel and right channel sprites.
// it's time to position these sprites relative to the Stage and add them to the Stage as well.
waveformLeft.x = 0;  
waveformLeft.y = 150;  
waveformRight.x = 0;  
waveformRight.y = 450;  
stage.addChild(waveformLeft);  
stage.addChild(waveformRight);

// and this is just a little extra, use the code as-is, it's not relevant to the actual
// waveform drawing process. it just enables you to scroll left and right using the
// arrow keys, so you can see the entire waveform.
var leftKey:Boolean = false;
var rightKey:Boolean = false;
var scrollStep:uint = 100;  
stage.addEventListener(KeyboardEvent.KEY_DOWN,OnKeyDown);  
stage.addEventListener(KeyboardEvent.KEY_UP,OnKeyUp);  
stage.addEventListener(Event.ENTER_FRAME,OnEnterFrame);

function OnKeyDown(e:KeyboardEvent):void  
{
        switch(e.keyCode)
        {
                case 39:
                        rightKey = true;
                        leftKey = false;
                        break;
                case 37:
                        rightKey = false;
                        leftKey = true;
                        break;
        }
}

function OnKeyUp(e:KeyboardEvent):void  
{
        switch(e.keyCode)
        {
                case 39:
                        rightKey = false;
                        break;
                case 37:
                        leftKey = false;
                        break;
        }
}

function OnEnterFrame(e:Event):void  
{
        if (rightKey)
        {
                waveformLeft.x -= scrollStep;
                waveformRight.x -= scrollStep;
        }
        if (leftKey)
        {
                waveformLeft.x += scrollStep;
                waveformRight.x += scrollStep;
        }
}
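
The listing above extracts the whole sound with a single extract() call. As a variation, here is a sketch that extracts the sound one chunk at a time, relying on the fact that extract() returns the number of sample blocks it actually wrote and that repeated calls without a start position continue where the previous call stopped. The variable names are my own, and "TestSound" is the same library class used above. Note that the loop still runs within a single frame, so this only keeps the ByteArray small, it does not stop Flash from being busy while it runs:

```actionscript
import flash.media.Sound;
import flash.utils.ByteArray;

var chunkedSound:Sound = new TestSound();
var chunk:ByteArray = new ByteArray();
var blocksPerCall:uint = 11025;   // 250ms worth of sample blocks per call
var totalBlocksRead:Number = 0;

// the first call starts explicitly at sample block 0
var blocksRead:Number = chunkedSound.extract(chunk, blocksPerCall, 0);
while (blocksRead > 0)
{
    totalBlocksRead += blocksRead;

    // extract() leaves the pointer at the end of the data it wrote
    chunk.position = 0;
    // ... read blocksRead sample blocks from "chunk" with readFloat() here ...

    // empty the ByteArray (this also resets its position to 0) and fetch the next chunk
    chunk.clear();
    blocksRead = chunkedSound.extract(chunk, blocksPerCall);
}
trace(totalBlocksRead); // total number of sample blocks in the sound
```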

Download the *.fla file

For more advanced examples, check these articles:

So that's about it, I hope this tutorial was useful, don't hesitate to write a comment if you have any additional questions or I was unclear at some point.

Marin Bezhanov
