4800bps Speech codec  SDK  v.1.0 

Programmer's Manual

Introduction

The 4800bps Speech codec Software Development Kit (SDK) is designed for developers of Internet telephony, Web-based voice communication, voice mail, and voice chat applications who want an easy way to add a voice codec to their programs.

The 4800bps Speech codec SDK includes Codec4800.dll, which contains the speech encoder and decoder, together with implementation examples and free support.

Codec4800.dll exposes both DLL and COM interfaces. Examples for each interface are included in the SDK.

The speech codec compresses a digitized speech signal to an output bitrate of 4800 bps and decompresses it.

A Voice Activity Detector (VAD), which distinguishes voice from pauses, is embedded in the speech codec. The VAD allows channel resources to be used more efficiently, and it adapts to the noise level.
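To illustrate the possible channel saving, the following minimal sketch estimates the average bitrate when an application chooses not to transmit frames that the VAD marks as pauses. Dropping pause frames entirely is an assumption about the application, not a documented codec behavior, and the function name AverageBitrate is hypothetical.

```cpp
#include <cassert>

// Codec constants taken from this manual.
const int kCodedFrameBytes = 18;    // bytes per coded frame
const int kFrameSamples    = 240;   // samples per frame
const int kSampleRateHz    = 8000;  // sampling frequency

// Hypothetical helper: average bitrate in bits per second when only the
// frames flagged as speech are sent. Assumption: pause frames are dropped
// by the application and cost no channel capacity.
double AverageBitrate(int speechFrames, int totalFrames)
{
    assert(totalFrames > 0 && speechFrames >= 0 && speechFrames <= totalFrames);
    const double seconds = double(totalFrames) * kFrameSamples / kSampleRateHz;
    const double bits    = double(speechFrames) * kCodedFrameBytes * 8;
    return bits / seconds;
}
```

Under this assumption, a conversation in which half of the frames are pauses would need about half of the nominal 4800 bps on average.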

The 4800bps speech codec is compatible with the Java version of the codec. For more information, please contact us.

Codec performance:

Encoder input speech signal: PCM format, 8000 Hz sampling frequency, 16 bits per sample.

Decoder output speech signal format and parameters are the same.

Encoder output bitrate: 4800 bits per second.

Algorithmic delay: < 60ms.

Frame length: 240 samples.

Coded frame length: 18 bytes.
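The figures above are related by simple arithmetic. The sketch below derives the frame duration and the coded frame size from the sampling frequency, frame length, and bitrate given in this manual. All function names are hypothetical and serve only as an illustration.

```cpp
#include <cassert>

// Parameters stated in this manual.
const int kSampleRateHz  = 8000;  // samples per second
const int kBitsPerSample = 16;    // PCM sample width
const int kFrameSamples  = 240;   // samples per frame
const int kBitrateBps    = 4800;  // encoder output bitrate

// Duration of one frame: 240 / 8000 s = 30 ms.
int FrameDurationMs() { return kFrameSamples * 1000 / kSampleRateHz; }

// Size of one uncompressed frame: 240 samples * 2 bytes = 480 bytes.
int RawFrameBytes() { return kFrameSamples * kBitsPerSample / 8; }

// Size of one coded frame: 4800 bit/s * 0.030 s / 8 = 18 bytes.
int CodedFrameBytes() { return kBitrateBps * kFrameSamples / kSampleRateHz / 8; }

// Bitrate of the uncompressed PCM input: 8000 * 16 = 128000 bit/s.
int RawBitrateBps() { return kSampleRateHz * kBitsPerSample; }

// Checks that the derived values agree with the figures quoted above.
void CheckManualFigures()
{
    assert(CodedFrameBytes() == 18);
    assert(RawFrameBytes() == 480);
}
```

The ratio of 480 input bytes to 18 output bytes per frame corresponds to a compression factor of roughly 26.7.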

How to use the 4800bps speech codec classes

The SDK includes:

1.       COM implementation examples,

2.       DLL implementation example.

All examples are written for Visual C++ 6.0.

COM implementation Example1 compresses a digitized voice file and decompresses it. The input file for compression must be in PCM format with an 8000 Hz sampling frequency and 16 bits per sample. The output file format after decompression is the same.

COM implementation Example2 compresses the digitized voice frame by frame. This approach is appropriate for most applications.

The DLL implementation example compresses and decompresses the digitized voice frame by frame.
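When planning buffers for frame-by-frame processing, it can help to know how many frames a block of PCM data occupies. The sketch below computes this from the 480-byte input frame and 18-byte coded frame sizes given in this manual. Zero-padding a trailing partial frame is an assumption about the application, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kRawFrameBytes   = 480;  // 240 samples * 2 bytes
const std::size_t kCodedFrameBytes = 18;   // one compressed frame

// Number of frames needed to cover pcmBytes of 16-bit PCM input.
// Assumption: a trailing partial frame is zero-padded by the caller,
// so it still counts as one full frame.
std::size_t FramesNeeded(std::size_t pcmBytes)
{
    assert(pcmBytes % 2 == 0);  // input must hold whole 16-bit samples
    return (pcmBytes + kRawFrameBytes - 1) / kRawFrameBytes;
}

// Total size of the compressed data for pcmBytes of input.
std::size_t CompressedBytes(std::size_t pcmBytes)
{
    return FramesNeeded(pcmBytes) * kCodedFrameBytes;
}
```

For example, 10 seconds of speech (160000 bytes of PCM) spans 334 frames and compresses to 6012 bytes under these assumptions.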

The speech codec is used in the Voice Recording Applet (http://www.vimas.com/ve_record_applet_sdk.htm) and in Web Voice Mail (http://www.vimas.com/ve_voice_mail.htm). You can test the voice quality online on these sites. Keep in mind that the voice quality depends on the microphone you use, so use a good microphone for testing.

The trial version of the speech codec has the same functionality as the licensed version, but the encoder can process only 10 seconds of digitized speech.

For free support contact us.

New!   Available now!

1.       The  16kbps wideband speech codec SDK in C++ and Java.

2.       The 4800bps speech codec SDK in Java.

    Contact us.

 

Reference Guide

a) DLL implementation

Codec4800.dll includes 4 public methods:

1.  HANDLE WINAPI Init( );

2.  void WINAPI DeInit( HANDLE hCodec );

3.  short WINAPI FrameEncoder( HANDLE hCodec, const short*, unsigned char*, short );

4.  void WINAPI FrameDecoder( HANDLE hCodec, const unsigned char*, short*, short );
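The four functions form a simple lifecycle: create the codec object, encode or decode frames, then delete the object. The sketch below shows one possible way to wrap that lifecycle so that DeInit is always called. It is not part of the SDK: CodecHandle, CodecApi, CodecSession, and the Stub functions are hypothetical names, the WINAPI calling convention is left out so that the code compiles on any platform, and the stubs stand in for the real entry points of Codec4800.dll. Treating a null handle as failure and passing 0 for the reserved decoder argument are assumptions.

```cpp
#include <cassert>

// Stand-in for the Windows HANDLE type so the sketch compiles anywhere.
typedef void* CodecHandle;

// Function-pointer types mirroring the four prototypes listed above.
// The WINAPI calling convention is omitted in this portable sketch.
typedef CodecHandle (*InitFn)();
typedef void  (*DeInitFn)(CodecHandle);
typedef short (*FrameEncoderFn)(CodecHandle, const short*, unsigned char*, short);
typedef void  (*FrameDecoderFn)(CodecHandle, const unsigned char*, short*, short);

// Hypothetical table of entry points; a real program would fill it
// with the functions obtained from Codec4800.dll.
struct CodecApi {
    InitFn         init;
    DeInitFn       deinit;
    FrameEncoderFn encode;
    FrameDecoderFn decode;
};

// Hypothetical RAII wrapper: Init() in the constructor, DeInit() in the
// destructor, so the codec object is always released.
class CodecSession {
public:
    explicit CodecSession(const CodecApi& api) : api_(api), handle_(api.init()) {}
    ~CodecSession() { if (handle_ != 0) api_.deinit(handle_); }

    // Assumption: a null handle signals failure (not stated in the manual).
    bool ok() const { return handle_ != 0; }

    // Encodes 240 samples into 18 bytes; returns the VAD decision (1 or 0).
    short Encode(const short* pcm, unsigned char* coded, short vadThreshold = 50) {
        assert(ok());
        return api_.encode(handle_, pcm, coded, vadThreshold);
    }

    // Decodes 18 bytes into 240 samples. The last argument is reserved;
    // passing 0 is an assumption.
    void Decode(const unsigned char* coded, short* pcm) {
        assert(ok());
        api_.decode(handle_, coded, pcm, 0);
    }

private:
    CodecSession(const CodecSession&);             // not copyable
    CodecSession& operator=(const CodecSession&);
    CodecApi    api_;
    CodecHandle handle_;
};

// Stub entry points so the sketch runs without the DLL.
static int g_liveCodecs = 0;
static CodecHandle StubInit() { ++g_liveCodecs; return &g_liveCodecs; }
static void StubDeInit(CodecHandle) { --g_liveCodecs; }
static short StubEncode(CodecHandle, const short*, unsigned char* out, short) {
    for (int i = 0; i < 18; ++i) out[i] = 0;   // fake 18-byte coded frame
    return 1;                                  // pretend the frame is speech
}
static void StubDecode(CodecHandle, const unsigned char*, short* out, short) {
    for (int i = 0; i < 240; ++i) out[i] = 0;  // fake 240 decoded samples
}
CodecApi MakeStubApi() {
    CodecApi api = { StubInit, StubDeInit, StubEncode, StubDecode };
    return api;
}
```

With the stub table, a caller constructs a CodecSession from MakeStubApi(), passes a 240-sample buffer and an 18-byte buffer to Encode and Decode, and lets the destructor release the codec object.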

 

 

1. HANDLE   WINAPI Init( );

 

Init( )

 

Prototype:

HANDLE   WINAPI Init( )

Description:

Creates the codec object

Parameters:

 None.

Return value:

 Pointer to the codec object

 

 

2.  void      WINAPI DeInit(  HANDLE hCodec)

 

DeInit( )

 

Prototype:

void   WINAPI DeInit(  HANDLE hCodec  )

Description:

Deletes the codec object

Parameters:

Pointer to the codec object

Return value:

 None.

 

 

 

3.  short     WINAPI FrameEncoder(   HANDLE hCodec,   const short*,    unsigned char* ,     short  )

 

FrameEncoder()

 

Prototype:

short     WINAPI FrameEncoder( HANDLE hCodec,   const short*,    unsigned char* ,     short  )

Description:

Encodes one 240-sample speech frame and decides whether it is voice or a pause.

Parameters:

 

HANDLE hCodec

Pointer to the codec object.

const short*   

Input data. Pointer to an array that contains 240 samples of the input speech. Each sample is 2 bytes, so the array size is 480 bytes. In each sample the LSB (Least Significant Byte) comes first and the MSB (Most Significant Byte) comes second.

unsigned char*

Output data. Pointer to an array that receives the 18 bytes of the compressed speech frame.

  short 

Input data. Parameter for VAD threshold adjustment. Recommended value is 50.

Return value:

 

short

Output data. VAD decision.  1 – frame is speech, 0 – frame is pause.
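The byte order described for the input array (LSB first) matters when samples arrive as raw bytes, for example from a file or a network buffer. The sketch below assembles one 240-sample frame from 480 raw bytes in a way that does not depend on the byte order of the host machine. BytesToFrame is a hypothetical helper, not an SDK function.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kFrameSamples = 240;  // samples per frame, per this manual

// Builds one frame of 16-bit samples from 480 raw bytes in which each
// sample is stored LSB first, MSB second.
void BytesToFrame(const unsigned char* bytes, short* samples)
{
    assert(bytes != 0 && samples != 0);
    for (std::size_t i = 0; i < kFrameSamples; ++i) {
        const int lo = bytes[2 * i];        // least significant byte
        const int hi = bytes[2 * i + 1];    // most significant byte
        int value = lo | (hi << 8);         // 0 .. 65535
        if (value >= 0x8000) {
            value -= 0x10000;               // map to the signed 16-bit range
        }
        samples[i] = static_cast<short>(value);
    }
}
```

The resulting array can then be passed as the input data argument of FrameEncoder.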

 

4.  void     WINAPI FrameDecoder(   HANDLE hCodec,    const unsigned char* ,     short*,       short  )

 

FrameDecoder()

 

Prototype:

 void  WINAPI FrameDecoder(HANDLE hCodec,  const unsigned char* ,     short*,       short  )

Description:

 Decodes one 240-sample speech frame.

Parameters:

 

HANDLE hCodec

Pointer to the codec object.

const unsigned char*

Input data. Pointer to an array that contains the 18 bytes of a compressed speech frame.

short*

Output data. Pointer to an array that receives 240 samples of the decoded speech. Each sample is 2 bytes, so the array size is 480 bytes. In each sample the LSB (Least Significant Byte) comes first and the MSB (Most Significant Byte) comes second.

short

Input data. Reserved for a future lost-frame compensation mechanism.

Return value:

None.
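After decoding, an application often needs to store or send the 240 output samples as raw bytes. The sketch below writes one decoded frame as 480 bytes, LSB first, independently of the byte order of the host machine. FrameToBytes is a hypothetical helper, not an SDK function.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kFrameSamples = 240;  // samples per frame, per this manual

// Serializes one frame of 16-bit samples into 480 bytes, storing each
// sample LSB first, MSB second.
void FrameToBytes(const short* samples, unsigned char* bytes)
{
    assert(samples != 0 && bytes != 0);
    for (std::size_t i = 0; i < kFrameSamples; ++i) {
        // Keep only the low 16 bits of the (possibly negative) sample.
        const unsigned value = static_cast<unsigned>(samples[i]) & 0xFFFFu;
        bytes[2 * i]     = static_cast<unsigned char>(value & 0xFFu);  // LSB
        bytes[2 * i + 1] = static_cast<unsigned char>(value >> 8);     // MSB
    }
}
```

The output buffer must hold at least 480 bytes for each decoded frame.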

 

b) COM implementation  (C++)

 

1. HRESULT Encode([in] VARIANT Source, [in] VARIANT Dest, [in] short Tresh, [out,retval] VARIANT_BOOL* pIsSpeech)

2. HRESULT Decode([in] VARIANT Source, [in] VARIANT Dest, [in] VARIANT_BOOL LossFrame)

3. HRESULT Reset( )

4. [propget] HRESULT FrameSize([out, retval] short* retval)

5. [propget] HRESULT CodedFrameSize([out, retval] short* retval)

1.   HRESULT Encode([in] VARIANT Source, [in] VARIANT Dest, [in] short Tresh, [out,retval] VARIANT_BOOL* pIsSpeech)

 

Encode( )

 

Prototype:

HRESULT Encode( [in] VARIANT Source,  [in] VARIANT Dest,  [in] short Tresh, [out,retval] VARIANT_BOOL* pIsSpeech)

Description:

Encodes one 240-sample speech frame and decides whether it is voice or a pause.

Parameters:

 

[in] VARIANT Source

Input data object. It can represent:

a)       an array that contains 240 samples of the input speech frame in PCM format with an 8000 Hz sampling frequency. Each sample is 2 bytes, so the array size is 480 bytes. In each sample the LSB (Least Significant Byte) comes first and the MSB (Most Significant Byte) comes second;

b)       the name of a digitized voice file in PCM format with an 8000 Hz sampling frequency and 16 bits per sample.

[in] VARIANT Dest

Output data object. It can represent:

a)       an array that receives the 18 bytes of the compressed speech frame;

b)       the name of the compressed voice file.

[in] short Tresh

Input data. Parameter for Voice Activity Detector (VAD) threshold adjustment. Recommended value is 50.

[out,retval] VARIANT_BOOL* pIsSpeech

Output data. VAD decision.  True  – frame is speech, false – frame is pause.

Return value:

None.
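A common pitfall with VARIANT_BOOL results such as pIsSpeech is comparing them with 1. In the Windows headers VARIANT_BOOL is a 16-bit short, VARIANT_TRUE is -1, and VARIANT_FALSE is 0, so the safe test is a comparison against the false value. The sketch below uses local stand-ins for these definitions so that it compiles without the Windows headers. VariantBool, IsSpeechFrame, and CountSpeechFrames are hypothetical names.

```cpp
#include <cassert>

// Local stand-ins mirroring the Windows definitions of VARIANT_BOOL,
// VARIANT_TRUE (-1) and VARIANT_FALSE (0).
typedef short VariantBool;
const VariantBool kVariantTrue  = -1;
const VariantBool kVariantFalse = 0;

// Interprets a VAD result: any value other than the false value is speech.
bool IsSpeechFrame(VariantBool isSpeech)
{
    return isSpeech != kVariantFalse;
}

// Counts how many of the collected VAD results mark a speech frame.
int CountSpeechFrames(const VariantBool* flags, int count)
{
    assert(flags != 0 || count == 0);
    int speech = 0;
    for (int i = 0; i < count; ++i) {
        if (IsSpeechFrame(flags[i])) {
            ++speech;
        }
    }
    return speech;
}
```

Comparing against the false value also works for components that report true as 1 rather than -1.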

 

2.  HRESULT Decode( [in] VARIANT Source, [in] VARIANT Dest, [in] VARIANT_BOOL LossFrame)

 

Decode( )

 

Prototype:

 HRESULT Decode(  [in]  VARIANT Source,  [in]  VARIANT Dest, [in] VARIANT_BOOL LossFrame  )

Description:

 Decodes one 240-sample speech frame.

Parameters:

 

[in] VARIANT Source

Input data object. It can represent:

a)       an array that contains the 18 bytes of a compressed speech frame;

b)       the name of a compressed speech file. The file is a sequence of consecutive 18-byte frames.

[in] VARIANT Dest

Output data object. It can represent:

a)       an array that receives 240 samples of the decoded speech in PCM format with an 8000 Hz sampling frequency. Each sample is 2 bytes, so the array size is 480 bytes. In each sample the LSB (Least Significant Byte) comes first and the MSB (Most Significant Byte) comes second;

b)       the name of the decompressed speech file in PCM format with an 8000 Hz sampling frequency and 16 bits per sample.

[in] VARIANT_BOOL LossFrame 

Input data. Reserved for a future lost-frame compensation mechanism.

Return value:

 None.
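Because a compressed speech file is a sequence of consecutive 18-byte frames, its size determines the number of frames and the size of the decoded output. The sketch below shows that bookkeeping. It assumes that the file carries no header or trailer, which the manual does not state explicitly, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kCodedFrameBytes = 18;   // one compressed frame
const std::size_t kRawFrameBytes   = 480;  // 240 decoded samples * 2 bytes

// True when the compressed data is a whole number of 18-byte frames.
// Assumption: the file has no header or trailer.
bool IsWholeFrames(std::size_t codedBytes)
{
    return codedBytes % kCodedFrameBytes == 0;
}

// Byte offset at which the frame with the given zero-based index starts.
std::size_t FrameOffset(std::size_t frameIndex)
{
    return frameIndex * kCodedFrameBytes;
}

// Size of the PCM data produced by decoding codedBytes of input.
std::size_t DecodedBytes(std::size_t codedBytes)
{
    assert(IsWholeFrames(codedBytes));
    return codedBytes / kCodedFrameBytes * kRawFrameBytes;
}
```

A file whose size is not a multiple of 18 bytes is probably truncated or uses a different layout, so checking this before decoding is a cheap sanity test.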

 

 3. HRESULT    Reset( )

 

Reset( )

 

Prototype:

HRESULT         Reset( )

Description:

Sets the variables of the encoder and decoder to their initial values.

Parameters:

None.

Return value:

 None.

           

4. [propget] HRESULT FrameSize( [out, retval]  short* retval)

 

FrameSize ( )

 

Prototype:

[propget] HRESULT FrameSize( [out, retval]  short* retval )

Description:

Returns the size of the input frame in bytes. This value is always 480.

Parameters:

 

[out, retval] short* retval

Output data. Size of the input frame in bytes. This value is always 480.

Return value:

None.

 

5. [propget] HRESULT CodedFrameSize( [out, retval] short* retval )

 

CodedFrameSize ( )

 

Prototype:

[propget] HRESULT CodedFrameSize( [out, retval] short* retval)

Description:

Returns the size of the output (coded) frame in bytes. This value is always 18.

Parameters:

 

[out, retval] short* retval

Output data. Size of the output (coded) frame in bytes. This value is always 18.

Return value:

None.

 

Copyright © VIMAS Technologies, 2001-2002.