Voice

Introduction

The integration of voice into data networks has now become a reality, so this document gives an overview of traditional circuit-switched voice operation and the elements that allow it to become part of a data network. We then look at current packetised voice over data technologies, in particular Voice over IP (VoIP).

Traditional Switched Circuit Voice Operation

Switch Topology

In the USA, 1974 saw an anti-trust suit brought against AT&T because its very large installed base of telephone switches gave it an overwhelming monopoly within the voice provision market. The Modification of Final Judgment in 1982 resulted in AT&T's local call access being given to seven Regional Holding Companies, which were nicknamed 'Baby Bells'. AT&T kept its manufacturing businesses and its long distance services.

The structure for telephone providers now means that local telephony services are provided by a Local Exchange Carrier (LEC). The LEC is, however, restricted in its operation to within its Local Access Transport Area (LATA). Calls between LATAs have to be handled by an Inter Exchange Carrier (IEC or IXC). There are 200 LATAs within the US.

The services offered by telephone companies include Plain Old Telephone Service (POTS) and Custom Local Area Signalling Services (CLASS), which enhances POTS by providing call screening, security and display features. Also available is Advanced Intelligent Networking (AIN), which brings the CLASS-type features back into a centralised database. An example is Centrex, which provides a virtual PBX with most of the features of a PBX being supplied by the CO.

The Central Office (CO) switches are provided by the Telephony service providers and provide the service for the Public Switched Telephone Network (PSTN) and for business environments. Businesses can also have their own switch called a Private Branch Exchange (PBX) which is a smaller version of the CO switch. The connections between the CO switches are called Interoffice Trunks and carry all the calls. Circuits that connect the CO switch to the business Private Branch Exchange (PBX) are called CO Trunks whereas trunks between PBXs are called Tie Trunks. A user that requires access to the CO trunk normally dials a code such as '9' to access it. Trunks provide the paths between switches and often have many circuits which are 'grabbed' and 'released' as and when required for calls.

The CO switches forming the PSTN also provide the connections to domestic and small business telephones. Each connection is serviced via the Local Loop which is a two-wire connection. The PSTN is circuit-switched and guarantees end-to-end connection during the call. The resources associated with that call are tied up for the duration of the call.

Traditionally, telephones use analogue technology, however many organisations also use digital telephones that contain analogue to digital converters. Most PBXs are digital.

Although the PBX is a cut-down version of the CO switch, the Key Switch or Key System is NOT a cut-down version of the PBX. Key Switches tend to support up to a maximum of 250 users. Incoming calls tend to be visible to all users, and these users can grab outside lines directly rather than having to go through the PBX switch by dialling 9. Key switches tend to be analogue-based and they cannot switch between trunks, so they cannot re-route calls out on to a different trunk.

PBX Connections

PBXs are linked to COs via trunk links such as E1 links, which carry 30 x 64kbps channels where each channel can take one call. You can also get trunk links between PBXs, which are called private trunk links. Other types of links that can act as trunks include T1 (DS1), ISDN BRI, ISDN PRI and fractional T1/E1.
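
The channel arithmetic behind these trunk types can be checked with a short sketch. The figure of 30 x 64kbps voice channels for E1 comes from the text above. The 32-timeslot E1 frame, the 24-channel T1 and its 8kbps of framing are standard figures added here for comparison, and the function names are purely illustrative.

```python
CHANNEL_KBPS = 64  # one voice channel carries one call


def e1_rates():
    """Return (line_rate_kbps, voice_payload_kbps) for an E1 trunk."""
    timeslots = 32       # total timeslots in an E1 frame
    voice_channels = 30  # the other two carry framing and signalling
    return timeslots * CHANNEL_KBPS, voice_channels * CHANNEL_KBPS


def t1_rates():
    """Return (line_rate_kbps, voice_payload_kbps) for a T1 (DS1) trunk."""
    voice_channels = 24
    framing_kbps = 8     # framing overhead added to the 24 channels
    payload = voice_channels * CHANNEL_KBPS
    return payload + framing_kbps, payload


if __name__ == "__main__":
    print("E1:", e1_rates())
    print("T1:", t1_rates())
```

The difference between line rate and voice payload is the capacity that is not available for calls.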

When linking a number of PBXs e.g. for a large organisation with a number of offices, the ideal would be to fully mesh the links.

A fully-meshed topology is expensive in both WAN link costs and interface card costs. As you can see in the above diagram, the fully-meshed topology requires significantly more circuits (6) and more interfaces than a partially-meshed topology (only 3 circuits). A Tandem Switched approach is often used instead as a compromise. The difficulty with Tandem Switched networks is that there are now multiple hops to some destinations, which incurs delay in the call. In the above example, for A to reach B, the call must first be routed to C before being routed to B. The PBXs involved in Tandem Switching must be able to route calls from an inbound trunk onto an outbound trunk, and two timeslots are used.
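
The circuit counts quoted above follow from simple counting. The sketch below assumes the diagram shows four PBXs (the only site count for which a full mesh needs 6 circuits) and that the partial mesh is the minimum that still connects every site. The function names are illustrative only.

```python
def full_mesh_circuits(sites: int) -> int:
    """Every pair of PBXs gets its own circuit: n(n-1)/2."""
    return sites * (sites - 1) // 2


def full_mesh_interfaces(sites: int) -> int:
    """Each circuit terminates on one interface at each end."""
    return 2 * full_mesh_circuits(sites)


def minimum_tandem_circuits(sites: int) -> int:
    """Fewest circuits that still connect every PBX: n-1."""
    return max(sites - 1, 0)


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, full_mesh_circuits(n), minimum_tandem_circuits(n))
```

The gap widens quickly as sites are added, which is why tandem switching is attractive despite the extra hops.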

The Private Branch Exchange (PBX)

The mechanical PBX was invented by an undertaker called Strowger in Kansas, USA around 1889. The idea was to replace a manual operator and allow the caller to decide where a call was to be completed.

The PBX or PABX ('A' for Automatic) contains many features which can include:

Lowest Cost Path Routing - having the preferred paths, based on cost, line quality, reliability and billing information.
Automatic Call Distribution (ACD) - finding available telephones in a pool, often called Hunt Groups.
Voice Mail
Call Forwarding - automatic forwarding of calls when a telephone is unavailable.
Calling Line Identification (CLI) - that maps the caller's number to a name in a database.
Calling Number Blocking - blocking of unwanted numbers.
Voice Conferencing - allowing a conversation to occur between more than two people.

The CO switch contains a Battery that provides power for both ringing and for the call itself. Local power to the basic analogue telephone is not required. The switch also contains the following components that enable the basic telephone to function:

Current Detector - this monitors whether the circuit is open (On Hook) where no current is flowing; or closed (Off Hook) where current is flowing.
Dial Tone Generator - this tone indicates that the switch has recognised a request by a user to make a call.
Digit Register - this recognises and deals with the dialled digits.
Ring Generator - this sends a ringing signal to the called party.

Connections to the switch occur via the Terminal Interface and these connections tend to be the trunk, the lines and the telephone connections. The transmission paths between the end devices are provided by the Circuit Switching Network portion of the switch, whereas the Control Complex provides the following:

Call Setup
Call Supervision
Call Disconnection
Memory
Logic

An important feature of the PBX is its Call Accounting ability. A PBX maintains Call Detail Records (CDR), or Station Message Detail Recording (SMDR), and outputs this information to an external computer, often as 60 character length strings. The information often contains the length and cost of calls per user, department etc., plus where the calls were directed, i.e. over the trunk, local or international.

Telephone services can be provided in different ways: as analogue Plain Old Telephone Service (POTS), as CLASS, or as AIN, where the intelligent features are held in a central database and the system operates like a virtual PBX spread over several sites. An example of AIN is the Centrex system.

Central Office Exchange Service (Centrex)

Centrex is a way of off-loading the responsibility and cost of maintaining a PBX on to the Central Office, which houses the digital switch instead. Centrex is a service which provides reliability, resilience, support for all types of telephone equipment, support for Direct Inward Dialling (DID), flexible upgrades, no risk of obsolete equipment and unlimited expansion.

Centrex can have high recurring costs, plus the response time for changes or additions may not be as speedy as if the organisation had its own PBX. In some countries, certain features that exist on a private PBX may not be permitted on a Centrex system.

Telephone Call Operation

Using Loop-Start signalling, traditional telephone systems operate more or less as follows:

The telephone starts off by being On Hook (Idle).
The caller lifts the handset. This is called going Off Hook and tells the switch that the caller wishes to make a call; the telephone 'seizes' the line.
The initial electrical circuit is set up because going Off Hook closes the circuit and the battery can send current. The CO switch now knows that a call is being requested and acknowledges the seizure of the line by 'Winking' the circuit.
The telephone switch, either public or a Private Branch Exchange (PBX), returns a dial tone (350Hz + 440Hz in the UK) which informs the caller that the switch is ready to receive dialled digits.
The number to be called is dialled.
In a private organisation, if it is an external call then the PBX makes a routing decision and, using network signalling setup messages, requests a 64kbps slot in the trunk link to the Central Office (CO) e.g. E1 or T1.
The CO sets up a path based on the number. It does this by 'seizing' a circuit and sending a request to the destination PBX.
The PBX at the other end learns of the call.
The PBX at the other end sets up an AC voltage (20 - 47Hz) for the ringing of the remote telephone.
The local PBX sends a ringback tone to the caller to inform them that the phone is ringing at the other end.
The telephone handset is picked up and the loop is established local to the called party.
The ringing voltage and ringback tones are removed from the circuit.
Acoustic couplers in the phones convert the speech into modulating current that is transmitted end-to-end.
Part of the signal is fed back into the talking person's earpiece. This is called Sidetone and is a comfort signal.
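
The sequence above can be read as a small state machine. The following is a simplified sketch only: the state and event names are invented for illustration, and real switches track many more conditions (timeouts, busy, congestion and so on).

```python
# (current state, event) -> next state, following the steps listed above.
TRANSITIONS = {
    ("on_hook", "lift_handset"): "dial_tone",
    ("dial_tone", "digit"): "collecting_digits",
    ("collecting_digits", "digit"): "collecting_digits",
    ("collecting_digits", "path_set_up"): "ringing",
    ("ringing", "called_party_answers"): "connected",
}


def next_state(state: str, event: str) -> str:
    """Return the state reached from `state` when `event` occurs."""
    if event == "hang_up":
        return "on_hook"  # replacing the handset always opens the loop
    # Events that make no sense in the current state are ignored.
    return TRANSITIONS.get((state, event), state)


def run_call(events, state: str = "on_hook") -> str:
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state


if __name__ == "__main__":
    call = ["lift_handset", "digit", "digit", "path_set_up",
            "called_party_answers"]
    print(run_call(call))
```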

Release signals can vary from switch manufacturer to switch manufacturer. Some switches are able to measure the time from going off hook until the first digit is dialled. If this exceeds a pre-defined time limit then the loop may be connected to an announcement and/or a Receiver Off Hook (ROH) tone.

The signalling between the subscriber switches and the telephony service providers can be identified as follows:

Supervisory Signalling - electrical voltages and tones that can be heard are used to signify call status as follows:
- On-hook - produces an open circuit which does not allow any signalling, only the ringer can operate.
- Off-hook - lifting the handset closes the circuit and allows the telephone switch to send an audible dial tone to the receiver.
- Ringing - the switch sends a ringing voltage to the destination telephone as notification of an incoming call. Also an audible ringing tone is sent to the caller telephone to indicate that the call is progressing. This tone takes the form of a pattern called Cadence. In Europe this Cadence takes the form of a double ring (two rings of 0.4s separated by 0.2s) followed by two seconds of silence, whereas in the US it takes the form of two seconds of ring followed by four seconds of silence.
Address Signalling - there are two types of dialling:
- Pulse Dialling - this is the original form of dialling a number. The telephone has a rotary dial mounted on to a spring that returns the dial to its original position after it is turned. The switch identifies each number by how many makes and breaks are made of the local loop. The ratio of make to break must be 40% : 60%. The number of make/break cycles corresponds to the number being dialled. Each position on the rotary dial corresponds to a different number. Typically the cam that causes the makes and breaks will give 10-20 pulses a second.
- Tone Dialling - Now more commonly used is the Dual Tone Multi-Frequency (DTMF) method that uses the concept of the keypad where each key position is represented by two tones. Each row is assigned a different low frequency whilst each column is assigned a different high frequency.

When a key is pressed, two tones are sent to the telephone company, a low frequency tone and a high frequency tone, which identify the key being pressed in much the same way that X and Y co-ordinates identify a point on a graph.
Informational Signalling - The following tones are used to describe the call progress:
- Dial Tone - (Continuous 350Hz + 440Hz) indicates that the switch is ready to receive digits.
- Busy Tone - (480Hz + 620Hz, 0.5s on and 0.5s off) indicates that the other end is busy.
- Line Ring Back - (440Hz + 480Hz, 2s on and 4s off) means that the telephone company is in the process of completing a call on behalf of the caller.
- PBX Ring Back - (440Hz + 480Hz, 1s on and 3s off) means that the switch is in the process of completing a call on behalf of the caller.
- Congestion - (480Hz + 620Hz, 0.2s on and 0.3s off) means that there is congestion in the network along the path so that the call cannot be set up.
- Reorder - (480Hz + 620Hz, 0.3s on and 0.2s off) means that all the circuits are busy on the local switch.
- Receiver Off Hook - (1400Hz + 2060Hz + 2450Hz + 2600Hz, 0.1s on and 0.1s off) means that the other end has left the receiver off the hook.
- No Such Number - (Continuous 200Hz + 400Hz) means that the dialled number does not exist.
- Confirmation Tone - (Noise at a frequency of 1Hz sounds like a slow rasping noise) means that the call setup is being attempted.
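
The row and column idea behind Tone Dialling can be written as a small lookup. The text above does not list the individual DTMF frequencies, so the values below are the standard DTMF assignments added here for illustration. The fourth column (A to D) exists in the standard but is absent from ordinary telephone keypads.

```python
LOW_HZ = (697, 770, 852, 941)       # one low frequency per keypad row
HIGH_HZ = (1209, 1336, 1477, 1633)  # one high frequency per keypad column
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(key: str) -> tuple:
    """Return the (low_hz, high_hz) pair sent when `key` is pressed."""
    if len(key) == 1:
        for row_index, row in enumerate(KEYPAD):
            column_index = row.find(key)
            if column_index != -1:
                return LOW_HZ[row_index], HIGH_HZ[column_index]
    raise ValueError(f"not a DTMF key: {key!r}")


if __name__ == "__main__":
    for key in "5#0":
        print(key, dtmf_pair(key))
```

For example, '5' sits in the second row and second column, so it is sent as 770Hz plus 1336Hz.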

Analogue Interfaces

Foreign Exchange Station (FXS)

The Foreign Exchange Station (FXS) interface provides an analogue connection to a Group 3 fax or analogue phone. The FXS interface imitates a switch and provides power, ring voltage and dial tone just as a PBX telephone port would. The trunk side of a Key system or lines going to the CO switch from a PBX would use an FXS port.

Normally an FXS port used for an analogue phone would be set to Loop Start signalling, whereas if a Key System or PBX is connected then Ground Start signalling would be preferred (see later for signalling). The Call Progress Tone is country dependent and includes the dial tone, busy tone and the ring back tone. The Cadence is also country dependent and defines how the ringing voltage is sent when a call is required; in the UK this is one short ring followed by a longer ring.

Foreign Exchange Office (FXO)

The Foreign Exchange Office (FXO) interface allows you to make an analogue connection to a remote switch, either a CO switch in the PSTN or a remote PBX. The switch sees the FXO interface as a telephone, so an FXO port connects to the station side of the PBX. N.B. this is different from an FXS interface, which expects a telephone to be connected TO it, i.e. it needs a dial tone. The FXO interface provides pulse or tone dialling. This means that you can connect between an FXS interface and an FXO interface, thereby providing a Foreign Exchange (FX) Trunk. This allows you to set up a long distance extension for a local phone line (called an Off-Premises Extension or OPX).

The signalling method used is normally Ground Start. You also configure the number of rings before the FXO port answers a call; this allows you to redirect calls on a router after, say, 4 rings if the call is not answered. The FXO port should also be configured for the dial type (pulse or DTMF) for outbound dialling. FXO ports should be able to support Supervisory Disconnect, where the port can detect the 350ms drop in power from a connected switch and interpret this as a call disconnect.

E&M (Earth and Magneto)

The E&M (Earth and Magneto, or RecEive and TransMit, or Ear and Mouth) interface is used for two-way analogue trunking between PBXs or network switches. The trunk link carries E&M Lead Signalling which the carriers use to connect to the network Composite Signalling (CX), Direct Current Signalling (DX) and Simplex Signalling (SX) circuits. Nowadays newer digital technology has replaced these other circuits, leaving us the E&M Lead Signalling trunks that have yet to be replaced, mainly because the low capacity circuits that they are used for do not warrant the upgrade yet.

There are two ends to the trunk circuit, the Signalling Unit side and the Trunk Circuit side, which is the PBX side. If a PBX needs to route a call across the trunk then it must make a request on the signal leads to seize the trunk. Lead signalling occurs on separate wires from the voice wires and is independent of how the voice wires are cabled (the voice wires, or audio path, can be 2-wire or 4-wire). As far as the signal leads are concerned, the E-lead is used for inbound signalling (from the signalling equipment to the trunk equipment) whereas the M-lead is used for outbound signalling. Each of these leads has its own Ground wire. The local PBX (Trunk equipment) makes a request to seize the trunk by sending a current over the M-lead. The remote PBX detects this request on its E-lead. Once the call is complete the remote PBX signals using the M-lead.

The signalling used with E&M can be Wink-Start, Delay-Start or Immediate-Start (see later).

The problem with single-wire signalling leads is that, although they had little impact in the old-world electro-mechanical systems, nowadays sensitive electronics can be adversely affected by arcing and EM interference. Signalling is far better carried out on balanced 2-wire circuits. As a result, new E&M interfaces were introduced. There are five types of E&M interfaces.

E&M Type I

One wire is the E-lead; one wire the M-lead; one pair is used for the transmitted voice and one pair is used for the received voice. The PBX supplies the power for both the M-lead and E-lead and they have to use a common ground, thereby restricting the use of E&M Type I to within the same building.

The signalling end (CO) generates the E signal by the E-lead being connected to local Ground for Off-Hook and open for On-hook. The PBX (Trunk end) can then detect a current through a resistor. The M-lead is at 8v (85mA) when Off-hook and connected to local Ground when On-Hook. The PBX generates an M-signal by connecting to Battery and the signalling end detects the resultant current through a resistor.

E&M Type II

One wire is the E-lead; one wire the M-lead; one wire is the Signal ground (SG) for the E-lead; one wire is the Signal Battery (SB) for the M-lead; one pair is used for the transmitted voice and one pair is used for the received voice. Having separate returns (SG and SB) for the signalling leads allows the PBXs to exist in separate buildings.

The E-lead (signalling device to trunk circuit) is open circuit when On-Hook and goes to Signal Ground (SG) when Off-Hook and requesting a path. The M-lead (trunk circuit to the signalling device) is open circuit when On-Hook and connects to the Signal Battery (SB) when Off-Hook.

The sensor on the M lead may be biased towards -24v. The diode prevents this negative voltage appearing on the M lead if it is On-hook (open circuit).

E&M Type III

One wire is the E-lead; one wire the M-lead; one wire is the Signal ground (SG) for the E-lead; one wire is the Signal Battery (SB) for the M-lead; one pair is used for the transmitted voice and one pair is used for the received voice. The one difference between Type III and Type II is that with Type III at the PBX (Trunk Circuit) the M-lead has a relay that connects it to SG by default. This means that when the PBX wants to signal using the M-lead, it first has to disconnect the relay. This prevents spurious signals on the M-lead from signalling by mistake.

The E-lead is open circuit when On-Hook and goes to Ground when Off-Hook and requesting a path. The current is much lower on the E lead due to the high resistance E lead detectors used. The M-lead is also open circuit when On-Hook and connects to the Battery when Off-Hook. The blocking diode on the M lead does not necessarily have to be there with Type III.

E&M Type IV

The E-lead is open circuit when On-Hook and goes to Ground when Off-Hook and requesting a path. The M-lead is also open circuit when On-Hook and also connects to the Ground when Off-Hook. Both circuits operate identically.

E&M Type V

One wire is the E-lead; one wire the M-lead; one pair is used for the transmitted voice and one pair is used for the received voice. The PBX supplies the power for the E-lead, the other end supplies power for the M-lead (N.B. this is where Type V differs from Type I) and they use their local ground rather than common ground over SB and SG as in Type IV so this is unbalanced.

The E-lead is open circuit when On-Hook (Idle). The signalling end generates the E signal by the E-lead being connected to Ground (Off-Hook). The voltage can vary between -48v and -2v. The PBX can then detect a current through a resistor. The M-lead is open circuit when On-Hook. The PBX generates an M-signal by connecting to Ground and the signalling end detects the resultant current through a resistor.

Type V interfaces can be connected back to back.

Interface Timers

The following timers are often used to modify how a voice interface behaves:

Ringing Timeout - How long a telephone is rung when nobody picks up the remote end.
Initial Timeout - How long a dial tone will be sent before the first digit is dialled.
Interdigit Timeout - How long the port waits after a digit has been dialled, before the next digit is dialled.
DTMF Digit Timing - This is how long the DTMF digit signal lasts.
DTMF Interdigit Timing - This is how long the gap lasts between the DTMF digit signals.
Hookflash In Timing - A Hookflash indicates that the caller wants to do something with the call, such as transfer it. The 'Hookflash In' time applies to an incoming hookflash: if it is set to be quite long then the calling phone has to be left on-hook for a long time before the call is cleared. Conversely, if the time is too short then a hookflash may be mis-interpreted as the caller hanging up. This is used where telephones have Recall keys.
Hookflash Out Timing - This is the Hookflash time that the voice port sends out.
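
To make the Initial Timeout and Interdigit Timeout concrete, here is a minimal sketch of digit collection on a voice port. The default values of 10 and 3 seconds are hypothetical, chosen only for the example, as are the function and parameter names. Real equipment uses its own defaults and usually also matches the collected digits against a dial plan.

```python
def collect_digits(presses, initial_timeout=10.0, interdigit_timeout=3.0):
    """Return the digits accepted before a timeout expires.

    `presses` is a time-ordered list of (seconds_since_off_hook, digit).
    Both timeout defaults are illustrative assumptions.
    """
    digits = ""
    deadline = initial_timeout  # the first digit must arrive before this
    for when, digit in presses:
        if when > deadline:
            break               # timed out: stop collecting
        digits += digit
        deadline = when + interdigit_timeout  # restart timer per digit
    return digits


if __name__ == "__main__":
    # The third digit arrives 4.5s after the second, so it is ignored.
    print(collect_digits([(1.0, "9"), (1.5, "1"), (6.0, "2")]))
```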

Telephone Connections

In the UK the following diagram indicates the correct pin numbers for the LJU and the RJ45 as specified by BT and AT&T respectively:

BT LJU circuit

The normal UK analogue telephone line into a home or a direct line to an office requires a socket wired according to the above diagrams.

The 'Wiring' end is typically the outside line wiring coming in from, say, BT, and the Line Jack Unit (LJU) is where the telephone is plugged in. The first thing to note is that Surge Protection (SP) is required across the 'A' leg (Signal) and the 'B' leg (Battery) for direct lines. Secondly, a capacitor is installed in line of the 'Bell' leg which provides the capability for the handset to ring using the signal from the 'B' leg. Only the 'A' and 'B' legs need to be run in the direct-line wiring to the socket, however these two plus the ring circuit are picked up at the LJU and run into the handset.

The socket and apparatus wiring is coded as follows:

BK - Black
W - White
G - Green
B - Blue
R - Red
WG - White with Green banding
WB - White with Blue banding
WO - White with Orange banding
OW - Orange with White banding
BW - Blue with White banding
GW - Green with White banding

Note 1: - The wiring to the socket is of a single core type which is secured into the socket terminals using an insulation displacement tool.

Note 2: - The cable most often used to connect apparatus to the wall socket is a multi-cored tinsel type wire manufactured for its flexibility.

With tinsel cable it's important that the correct tools are used when making terminations as the individual strands of wire are very difficult to solder satisfactorily.

From the diagram it is clear that the wiring arrangement is quite different to that of most countries.

Some countries, such as Ireland and New Zealand have a similar wiring arrangement but use a different type of socket.

The principal difference between the UK and other countries is the incorporation of a voltage arrestor device and components (470kohm resistor and a 1.8uF capacitor) into the wall socket.

The voltage arrestor device is a Gas Discharge Tube (GDT) component intended to short circuit the A-Wire to the B-Wire in the event of voltages exceeding approximately 250V becoming present on the telephone line.

This type of device is relatively slow acting and has been superseded by the installation of a polyswitch type of device in the line interface of most newly designed products.

The 470kohm resistor and 1.8uF capacitor are installed in the wall socket to allow testing of the telephone line from the telephone exchange.

The wall socket also contains a connection to the telecoms apparatus intended to suppress inductive spikes which are generated when loop-disconnect dialling into electro-mechanical exchanges which terminate the phone circuit with a relay coil.

Note: - These exchanges are presently being phased out of operation, which, coupled with MF4 detectors being a design feature of the replacement exchanges is resulting in the diminishing use of loop-disconnect dialling.

Voice adapters

In the UK, if a direct line is run across data structured cabling, then a Full Master LJU-RJ45 voice adapter will be required at the sockets. The Full Master adapter contains the Surge Protection required for direct lines that do not go through a local switch, and it is wired according to the diagram above and the diagram below. In 110 patching installations, only one-pair patching is required to patch the blue pair carrying the A and B legs.

Analogue voice circuits that are extensions served by a local switch, wired across SCS and use earth recall for services such as transference of extensions from one handset to another, require PABX Master LJU-RJ45 voice adapters. The bell circuit is still present but the surge protection is not required at the outlet since the local switch takes care of surge protection at the point that the outside direct lines run into the switch. In 110 patching installations, two-pair patching is needed to allow for the patching of the clean earth, which is jumpered on the white of the orange pair (i.e. the third termination on the 110 block).

Analogue voice circuits wired across SCS that use 'Time Break' as a means of control, only require 'Secondary' LJU-RJ45 voice adapters. These adapters only have LJU pins 2 and 5 wired straight through to RJ45 pins 5 and 4. In 110 patching installations this presents the voice circuit on the blue pair.

Different Voice Pinouts

Digital telephone systems are becoming more common and they sometimes require proprietary voice adapters. Sometimes LJU pins 1 and 6 are used for intelligent handsets and consoles, so in order to carry out patching on the blue pair in a 110 patching installation, a 'Digital' LJU-RJ45 adapter would need to be constructed where LJU pins 1 and 6 were wired to RJ45 pins 5 and 4.

BT's Meridian Switch uses LJU pins 3 and 4, so a 'Digital' adapter needs to be wired such that LJU pins 3 and 4 are wired to RJ45 pins 5 and 4, thus allowing 110 patching on the blue pair.

There are digital telephone systems that use LJU pins 2 and 5 and in such cases a standard Secondary LJU-RJ45 adapter is fine for the job.
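
The adapter wirings described in this section can be summarised as pin maps. The sketch below records only the LJU-to-RJ45 mappings stated in the text; the adapter keys such as "digital_1_6" are labels invented for this example.

```python
# LJU pin -> RJ45 pin for each adapter wiring described above.
ADAPTER_PINOUTS = {
    "secondary": {2: 5, 5: 4},    # standard Secondary adapter
    "digital_1_6": {1: 5, 6: 4},  # systems that use LJU pins 1 and 6
    "meridian": {3: 5, 4: 4},     # BT Meridian uses LJU pins 3 and 4
}

BLUE_PAIR_RJ45_PINS = {4, 5}      # the pair used for 110 patching here


def rj45_pin(adapter: str, lju_pin: int) -> int:
    """Return the RJ45 pin that a given LJU pin is wired to."""
    return ADAPTER_PINOUTS[adapter][lju_pin]


def lands_on_blue_pair(adapter: str) -> bool:
    """Check that the adapter presents its circuit on RJ45 pins 4 and 5."""
    return set(ADAPTER_PINOUTS[adapter].values()) == BLUE_PAIR_RJ45_PINS


if __name__ == "__main__":
    for name in ADAPTER_PINOUTS:
        print(name, lands_on_blue_pair(name))
```

Whichever LJU pins the handset uses, each adapter delivers the circuit on the same RJ45 pair, so the patching stays uniform.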

Analogue Signalling

Loop-Start

Domestic and small office telephones are connected to the PSTN CO switch via a pair of wires called the Local Loop. The signalling used in this situation is called Loop-Start and Loop-Disconnect Signalling. Loop-Start is the most common form of signalling in the analogue environment and it provides the following services:

Public Telephone Service (PTS)
Manual or Automatic data service
Message Telecommunications Service (MTS)
Attendant call service on a manual PBX.
One-way incoming service to an attendant or Automatic Call Distribution (ACD) service

One wire of the local loop is called the Tip, which is connected to ground, and the other wire is called the Ring, which is connected to the negative side of the 48v DC Battery. Picking up the telephone handset takes it Off Hook and makes a connection across these wires, thereby allowing current to flow. The switch sends a dial tone to the receiver of the phone that has gone off hook, thereby informing the caller that the switch is ready to receive dialled digits. The digits are sent either via pulses or via DTMF tones. The bell is always connected to the switch, however a capacitor blocks the DC current flow from the battery in the switch. Once the number has been dialled, the remote end is notified by the AC ringing voltage applied at between 20 - 47Hz (the traditional operation of the telephone was described earlier).
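
For the pulse case, the make/break timing described earlier under Pulse Dialling works out as in the sketch below. It assumes the common convention that the digit 0 is sent as ten pulses, which the text does not state, and it uses whole milliseconds. The function name and the returned field names are illustrative.

```python
def pulse_timing(digit: int, pulses_per_second: int = 10,
                 make_percent: int = 40) -> dict:
    """Timing of the loop makes and breaks for one dialled digit."""
    if not 0 <= digit <= 9:
        raise ValueError("digit must be between 0 and 9")
    pulses = 10 if digit == 0 else digit     # assumption: 0 = ten pulses
    period_ms = 1000 // pulses_per_second    # one make/break cycle
    make_ms = period_ms * make_percent // 100
    break_ms = period_ms - make_ms
    return {
        "pulses": pulses,
        "make_ms": make_ms,
        "break_ms": break_ms,
        "total_ms": pulses * period_ms,
    }


if __name__ == "__main__":
    print(pulse_timing(3))
    print(pulse_timing(0, pulses_per_second=20))
```

At 10 pulses a second a single digit can take up to a full second to send, which is one reason DTMF is faster.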

Ground-Start

One problem associated with Loop-Start Signalling, particularly where there are a large number of calls, is that the trunk can be seized from both ends at the same time, so that you end up with someone already at the other end. This is called Glare. It arises because of the time interval between the seizure of a trunk at one end and the trunk subsequently being made busy at the other. Originally, a method where the user had to wait for a long timeout (up to 40 seconds) was used. After the timeout a particular tone would be heard which encouraged the user to replace the handset and try again. Ground-Start Signalling (also called 'Earth Start') is a modified form of Loop-Start Signalling whereby there is current detection at both ends, which is used to request and then confirm that the trunk is available before it is seized. When a local PBX seizes the trunk it grounds one of the wires, which informs the other end. This limits the possibility of glare, at least outside of 100ms. Electronic switches can detect glare by timing the wink start or delay-dial signal, maybe even switching the call to another trunk.

The ground-start line conductors transmit common battery loop supervision, loop dial pulses/DTMF dial tones, alerts and the voice signal. The lines can send a 'Start to Dial' signal rather than wait for a dial tone, they can send a message indicating a new call and they can detect call disconnects and unauthorised calls.

When in the Idle state, the phone has an open circuit Tip (T) to Ring (R). The phone also has a 10-20,000 ohm Ground Detector that links it to ground and detects an Off-hook from the network.

When in Call Initiated state, the phone closes contact S which causes current to flow on the Ring side. The Network sees this and responds by closing contact N. This results in the Tip being grounded and the Ground Detector in the phone sees this.

If the Network makes the disconnection by opening N and removing ground from the Tip, then the current stops flowing. The phone waits 350ms to determine that this is an actual disconnect as opposed to an Open Switching Interval (OSI). If the phone makes the disconnection, then it opens the loop so that the line appears busy until the network removes ground from the tip and the line can return to Idle. An OSI is where both Ground and Battery are removed for a maximum period of 350ms in between state changes. There are never less than 100ms between OSIs.
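
The 350ms rule above amounts to a simple threshold test on how long ground has been absent from the Tip. A minimal sketch, with illustrative names and with the boundary case of exactly 350ms treated as an OSI:

```python
OSI_MAX_MS = 350  # longest Open Switching Interval, from the text above


def classify_ground_loss(duration_ms: int) -> str:
    """Decide what a loss of Tip ground of the given length means."""
    if duration_ms > OSI_MAX_MS:
        return "disconnect"              # the network has cleared the call
    return "open_switching_interval"     # transient, the call continues


if __name__ == "__main__":
    for ms in (120, 350, 500):
        print(ms, classify_ground_loss(ms))
```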

Multi-Frequency Signalling

MF 4 and MF 5 are used on tie trunks between PBXs and use multi-frequency tones on the same wires as the voice signal.

Signalling System Direct Current (SSDC)

'On/Off' DC current signalling used on the voice pair between switches within a city.

Signalling System Alternating Current (SSAC)

Tone signalling used on the voice pair between switches located in different cities.

AC-15

Rather than use DC, AC-15 uses Alternating Current (hence AC) for signalling and is mainly used in the UK. 'Idle' is indicated by the frequency tone 2280Hz. Turning this frequency on and off can be used for the signalling or DTMF can be used. AC-15 can run to almost anywhere, DC versions such as DC-5 and DC-10 have a limit of 10km. There are different versions of AC-15, A, B, C and D. Because AC-15 uses tones and AC voltage, if you are to communicate with a switch that uses E&M, FXS or FXO, you will require a converter box.

Wink Start Supervision Signalling

Wink Start is used for E&M trunk seizure and goes through the following steps:

The trunk ends signal On-hook to both ends when idle.
The caller goes Off-hook
The calling switch activates the M-lead.
The called switch sets up memory ready for the dialled digits but still sends an idle On-hook signal to the calling office.
When the caller is attached at the called switch, the called switch sends a Wink Off-hook Connect signal (voltage set to -48v for anything between 140-350ms) on the E-lead. The typical duration of the Wink will depend on the manufacturer's switch e.g. 1/1A ESS (150ms), 3 ESS (140ms), 5ESS (250ms), EWSD (180ms), DMS-10 (200ms), DMS-100F (10-250ms). Because distortion of this Off-hook Wink can happen, the other office switch needs to recognise Off-hook winks of this duration. Anything beyond 350ms can be assumed to be Glare or an error condition which may redirect the call, or signal for maintenance depending on the switch type.
The calling switch receives the Wink acknowledgement on its E-lead. This Start-Dial (On-hook to Off-hook) occurs a minimum of 210ms after the reception of the connect signal for electro-mechanical switches and the 1/1A ESS switch. This allows these switches to see at least 100ms of Off-Hook Wink after the signal has traversed the network and been recognised.
The calling switch sends the DTMF digits on the voice pair
The called device answers and the called switch activates the M-lead and keeps it at -48v for the length of the call.
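
The wink timing rule in the steps above can be sketched as a small classifier. This is only an illustration: the function name and the returned labels are hypothetical, the 140-350ms window is taken from the text above, and treating anything shorter than 140ms as unrecognised is an assumption rather than something a particular switch specification states.

```python
# Wink duration limits quoted in the text above (milliseconds).
WINK_MIN_MS = 140
WINK_MAX_MS = 350

def classify_wink(duration_ms):
    """Classify an Off-hook pulse seen on the E-lead by its duration."""
    if duration_ms < WINK_MIN_MS:
        return "too short"        # assumption: not recognised as a wink
    if duration_ms <= WINK_MAX_MS:
        return "wink"             # valid start-dial acknowledgement
    return "glare or error"       # anything beyond 350ms

# A 200ms pulse, e.g. the DMS-10 figure quoted above.
print(classify_wink(200))
```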

Immediate Start Supervision Signalling

The 'wink' in Wink Start may be too short to detect for some PBXs and in these circumstances Immediate Start can be used instead. The sequence of events is as follows:

The calling switch activates the M-lead to seize the line.
The calling switch waits for at least 150ms and then sends the dialled digits on the voice pair irrespective of whether an acknowledgement 'wink' is sent or not.
The called switch activates the M-lead when the called device answers.
The called switch then acknowledges the calling switch.

Delay Start Supervision Signalling

Delay Start is used when the switch equipment is mechanically based and therefore very slow to respond. The sequence of events is as follows:

On a call being made that requires the trunk, the calling switch activates the M-lead.
The called switch activates its M-lead as an acknowledgement.
The called switch makes the appropriate changes to its mechanical systems ready for dialling.
The called switch deactivates its M-lead as a signal to the calling switch that it is ready to receive digits.
The calling switch then sends its DTMF digits on the voice pair.
The called device answers the call.
The called switch activates its M-lead as acknowledgement that the call has been answered.

Direct Inward Dial (DID)

DID is also known as Direct Dial In (DDI) and allows an external device to dial directly to a PBX extension without the need for an attendant. It also allows extra lines to be added with minimal cost. DID requires address signalling (dial pulsing or DTMF) to be carried through to the extension phone using Wink Start, Delay Start or Immediate Start. In addition, Loop Reverse Battery Supervision is used.

DID trunks only allow inbound calls, they also gain their battery from the local switch rather than the CO switch. The extension numbers that require DID are configured in the CO switch which then directs calls to these numbers on to the DID trunk rather than the normal trunk.

If the DID trunk lines are all busy then the caller will receive a busy tone even if the normal trunks are fine. The DID calls cannot be intercepted by the attendant.

Virtual Direct Inward Dial (VDID) allows incoming calls to be handled by an Auto-Attendant and route calls to extensions based on the calling number (even if CLID is blocked).

Quality Of Voice And Echo On Analogue Circuits

Quality is affected by a number of factors. The level of power at which voice is sent and received is important. The following power levels are good guidelines:

Analogue voice routers should have the receive power set to around -3dB.
Europe and North America telephones transmit at a power level of -9dB.
Asia and South America telephones transmit at a power level of -14dB.

The power level needs to be strong enough so as to ensure that the signal is audible at the remote end, but not too strong so that echo results. The voice provider can adjust power levels to analogue devices. If the signal reaches the switch and there is too much input gain applied, then the signal can be clipped (i.e. the power level is above PCM codes) and distorted. The same is true if the output gain at the remote end is too low or the input gain locally is too low, in this situation even DTMF tones can be missed.

Another factor that affects quality is echo. If the delay between the original sound and the echo is greater than 30ms then this can start to become a problem for most people. The loudness of an echo is also very important.

Two wires are used for all signalling and the voice in the local loop (voice receive and transmit occur on the same pair), however this converts to 4 wires for the voice signal and other wires for the signalling between switches. When the voice is converted from two wires to four wires then there is a chance of Electrical Echo (reflection) being created due to an impedance mismatch. Normally on long cable runs, echo is attenuated, however when data networks sit between two analogue ends the analogue runs are much shorter which gives less chance for echo to be attenuated.

You can also experience Acoustic Echo when using speakerphones and headsets. This is because the loudspeaker sounds are picked up by the microphone and sent back to the caller. Causes of echo are listed below:

Cable length - 1ms delay per 200km
Satellites - 250ms delay per hop
Voice encoding techniques - 0.75 to 1.6ms delay
Compression - 0.5 to 100ms delay
Acoustic Coupling (hands free phones)
Incorrect impedance of equipment

Echo Suppression can be implemented by suppressing voice on the return path to prevent the feedback, resulting in half-duplex voice communication where the louder conversant wins. This causes a problem with Modem handshaking, so a tone of frequency 2025Hz is sent by the answering modem in order to turn off the voice suppression.

A more sophisticated method of dealing with Echo is Echo Cancellation, which works on the receiving end by synthesising a replica of the echo (creating its own codebook) and subtracting this from the actual echo. This technique allows full-duplex operation to continue. You may notice initial echo occurring at the beginning of a conversation but then it dies away. If echo is a problem on both ends then echo cancellation needs to be operating on both ends.
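
The idea of building a replica of the echo and subtracting it can be illustrated with an adaptive filter. The text does not say which algorithm a real canceller uses; the sketch below uses a normalised least mean squares (NLMS) filter, a common textbook choice, purely as an illustration. The echo path (two delayed, attenuated copies of the far-end signal), the tap count and the step size are all hypothetical values.

```python
import random

def simulate_echo(far_end):
    """Hypothetical echo path: two delayed, attenuated reflections."""
    echo = []
    for n in range(len(far_end)):
        delayed_2 = far_end[n - 2] if n >= 2 else 0.0
        delayed_5 = far_end[n - 5] if n >= 5 else 0.0
        echo.append(0.6 * delayed_2 + 0.2 * delayed_5)
    return echo

def nlms_echo_canceller(far_end, returned, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo replica from the returned signal."""
    weights = [0.0] * taps          # current model of the echo path
    history = [0.0] * taps          # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, returned):
        history = [x] + history[:-1]
        replica = sum(w * h for w, h in zip(weights, history))
        error = d - replica         # what is left after cancellation
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual

rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
residual = nlms_echo_canceller(far_end, simulate_echo(far_end))

early_energy = sum(e * e for e in residual[:50])
late_energy = sum(e * e for e in residual[-50:])
print("residual echo energy, first 50 samples:", early_energy)
print("residual echo energy, last 50 samples:", late_energy)
```

The residual is largest while the filter is still learning the echo path and shrinks as it converges, which matches the initial echo that dies away described above.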

Digital Signalling

Refer to Digital Signalling for detail on DS0, T1 and E1 signalling.

The summary of options for digital voice ports are as follows:

T1 - Superframe (SF) or Extended Superframe (ESF) with line encoding AMI or B8ZS.
E1 - CRC4 or no-CRC4 with line encoding AMI or HDB3.
Basic Rate ISDN (BRI) - this interface can also be used for digital PBX connectivity giving 2 voice channels and a 16kbps D-channel for the Q.931 signalling.

CO switches contain D-Channel Banks which convert from analogue voice and signalling to digital voice and signalling. A D1 Channel Bank outputs DS-1 (T1) or E1 with the digital voice channels and signalling multiplexed. Newer channel banks have appeared giving higher densities. The D2 Channel Bank supports 96 channels for every 72 channels that the D1 supports. The D3 and D4 support 144 channels. Most recently the Digital Carrier Trunk has been produced which is more manageable being smaller. PBXs use different digital signalling systems depending on manufacturer. Signalling systems based on CAS or CCS rely on standards and allow interoperability between voice switches. These include ISDN BRI and ISDN PRI interfaces using Q.931 or Q.SIG. Switch protocols that transport PBX features can be translated when these protocols run across standard signalling systems. A point-to-multipoint topology will require translation.

Single Frequency (SF) is a method used to convert E&M supervisory tones or dial pulses to a single voice frequency tone. If the trunk is idle then the SF tone is present. If the trunk is seized then the SF tone represents the dial pulses in bursts of tone.

It is not uncommon for non-standard signalling systems to be used as manufacturers aim to gain the edge on available features. Examples include the following:

Avaya
- Distributed Communications System (DCS) - not based on ISDN but T1/E1. Uses two signalling channels. Uses HDLC framed data signalling to allow transparency of features.
- DCS+ - based on ISDN PRI with one signalling channel. Also uses HDLC framed data signalling.
- Expansion Port Network (EPN) - a circuit emulation protocol used to connect PBXs in separate buildings creating a single logical PBX. You can use CES for this connection.
- Call Management System (CMS) - uses another signalling channel called BX.25 to perform call centre reporting at a central location for multiple sites.
- Non-Facility Associated Signalling (NFAS) - configuration which is non-standard where one D-channel can provide signalling for up to 300 B-channels depending on the implementation. These signalling protocols need to be transported transparently across from router to router because the routers will not understand the protocols.
Nortel
- Meridian Customer Defined Network (MCDN) - based on ISDN PRI with Q.931, however there are extensions. Used to connect Nortel PBXs and DMS CO switches and can support multiple B channels (nB+D) similar to NFAS.
- ISDN Signalling Link (ISL) - a bit like MCDN but allows the D-channel to be any sort of serial connection on any channel number and allows analogue B channels.
- Virtual Network Service (VNS) - this is a version of ISL where the B channels become switched circuits.
- DMS-100 - PRI that provides MCDN to a CO switch
- DMS-250 - PRI that provides MCDN to an IXC CO switch
- SL-100 - PRI that provides MCDN to a PBX.
BT
- Digital Private Network Signalling System (DPNSS)
- Digital Access Signalling System (DASS#2) - uses slot 16 on an E1.

If the proprietary signalling uses one CCS signalling D channel (e.g. DCS+ and DPNSS) then you can forward the frames transparently over HDLC, Frame Relay or ATM. If the proprietary signalling uses more than one CCS channel (e.g. DCS) then you need to use a TDM cross-connect method over HDLC or Frame Relay. This is where the D channels are put into a TDM group and are not restricted to channel 16, or channel 23. You can also put the D channel through ATM CES or even Serial Tunneling.

Voice Port Connections

Voice ports are used for various types of connection:

Local Calls - these calls do not use the network, PSTN or Data.
Private Line Automatic Ringdown (PLAR) - a hot line where going off hook automatically connects two phones.
On-Net - calls are routed on a data network that belongs to the organisation.
Off-Net - calls are routed on to an external PSTN network.
PBX-to-PBX - calls are routed across private tie lines between PBXs.
VoIP Gatekeeper to VoIP Gatekeeper calls - calls can be routed between VoIP gatekeepers in an IP Telephony infrastructure.

Packetised Voice Over Data

Overview

Voice networks have normally been separated from Data networks and have therefore incurred greater liabilities, such as the doubling up of the Wide Area Links, whilst the equipment and support costs have been high to cater for the separate networks. Packetising voice provides opportunities to combine some or all of these elements resulting in greater efficiency. Voice packets have mainly used ATM, Frame Relay or IP as the medium over which to travel.

A number of challenges arise when changing from a circuit-switched voice network to a packet-switched voice network. These can be summarised as:

Losing packets, which cause clipped sounds.
Packet delay, the ITU have stated in G.114 that a fixed network delay in one direction should not exceed 150ms. Network delay can also have a variable element to it due to the speed of the serial lines resulting in Serialisation Delay.
Jitter, where you can have periods of congestion as packets fill up interface queues causing the packet delay to change from packet to packet i.e. variable delay.
Reliability - data networks have not been able to boast the 99.999% reliability that voice networks can. A combination of multiple servers, distributed network devices, redundant power and network links allows the data networks to approach the 'Five nines' reliability expected of a voice network.
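
Serialisation Delay is simply the time taken to clock a packet's bits on to the line, so it can be estimated from the packet size and the line speed. The sketch below is a minimal illustration; the 1500 byte packet and the three line speeds are example values chosen here, not figures from the text.

```python
G114_ONE_WAY_LIMIT_MS = 150   # one-way delay limit quoted from G.114 above

def serialisation_delay_ms(packet_bytes, line_bps):
    """Time in milliseconds to place one packet on a serial line."""
    return packet_bytes * 8 / line_bps * 1000

# Example: a 1500 byte data packet queued ahead of a voice packet.
for line_bps in (64_000, 512_000, 2_048_000):
    delay = serialisation_delay_ms(1500, line_bps)
    verdict = "exceeds" if delay > G114_ONE_WAY_LIMIT_MS else "is within"
    print(f"{line_bps} bps: {delay:.2f} ms, which {verdict} the 150ms budget")
```

On a slow link a single large data packet can use up the whole delay budget on its own, which is why serialisation matters most on low speed serial lines.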

These challenges are dealt with in detail in Quality Of Service.

Technologies that packetise voice also provide opportunities to expand on the services that are provided by traditional circuit-switched voice systems. In the IP environment these new services utilise technologies such as XML, JAVA and TAPI that aid in the integration of voice and data plus multimedia and video technologies that enable a more complete communication experience. The devices are not limited to just voice phones but can include web-based phones, phone software and video phones running connected via Ethernet and TCP/IP.

Setting up and controlling calls is carried out in a very different way in a packetised voice environment. Call control can be centralised using Call Agents or distributed using voice gateways that can handle calls and make routing decisions. IP-based protocols such as H.323, MGCP, SGCP and SIP are used extensively in the VoIP environment for signalling and call control. These protocols give rise to devices such as Gatekeepers (that allocate bandwidth), Gateways (that translate between VoIP and PSTN networks), Multipoint Control Units (MCU) (that provide a gathering point for video conferencing), and Application servers (for voice-mail and call attendants).

Digitisation of Voice

Nyquist discovered that when human speech was being digitised it was important to sample the analogue speech signal at more than twice its highest frequency in order for the reproduced sound to be of reasonable quality. That is, when the digitised signal is decoded at the receiving end, the original sound could be reproduced accurately. Take the following simple sine wave:

If we sample precisely at twice the frequency of the wave e.g. on the circles, then there is a danger of completely missing the peaks and troughs in the sound wave and therefore resulting in a lower quality sample. If we sample at four times the frequency i.e. on the triangles, then the dotted sample is the result which more closely resembles the waveform.

Because human speech has a frequency range of 300 - 3400Hz, the CCITT recommendations are to build circuits to cater for this frequency range. A band-pass filter is used to isolate this frequency range. Rounding the top frequency to 4000Hz gives a sample rate of 8000 samples/sec, i.e. one sample every 125us. The sample output is a Pulse Amplitude Modulation (PAM).

There are problems with the Signal-to-Quantising Noise Ratio (SQR) of pure 8-bit encoded signals because the volume (amplitude) is reduced from the original analogue signal so the PAM signal is then Quantised where an integer code is assigned to each amplitude of each sample. The integers come from a scale made of 8 divisions called Chords which are more concentrated near the origin where the low level tones are, in a logarithmic way. This means that there is less distortion of the lower tones (larger signal to noise ratio) and suits the logarithmic nature of the human ear. A linear uniform quantisation would result in poorer sound quality at lower amplitudes. Each chord is split into 16 equally spaced voltage divisions (0 to 7 positive and 0 to 7 negative).

Within G.711 two methods of companding (compressing and expanding) the voice signal have been developed. These methods apply digital values to analogue signals. Bell labs developed the U-law method of logarithmic quantisation used in North America and Japan. U-law (or 'mew-law') tends to have a lower idle noise than A-law.

The ITU modified this in G.711 to A-law which is used throughout the rest of the world.

If one end of a trunk uses U-law and the other end uses A-law then the U-law end must make the change to A-law. A-law has slightly better signal-to-noise ratio for low amplitude signals than U-law. Quantisation Error is the difference between the quantised signal and the original analogue signal.
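
The logarithmic shape of the two companding laws can be seen from their continuous compression curves. This is a sketch only: G.711 itself encodes with a segmented (chord based) approximation of these curves, and the function names here are hypothetical. The constants 255 for U-law (mu-law) and 87.6 for A-law are the standard values; inputs are assumed to be normalised to the range -1 to +1.

```python
import math

MU = 255.0   # mu-law (U-law) constant
A = 87.6     # A-law constant

def mu_law_compress(x):
    """Continuous mu-law curve for x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def a_law_compress(x):
    """Continuous A-law curve for x in [-1, 1]."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))                     # linear region near zero
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))   # logarithmic region
    return math.copysign(y, x)

# Quiet inputs are stretched over far more of the output scale than loud ones.
for level in (0.01, 0.1, 0.5, 1.0):
    print(level, round(mu_law_compress(level), 3), round(a_law_compress(level), 3))
```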

If each integer code is given an 8 bit binary value then 64kbps would be the required bandwidth for digitised voice. This is called DS0. This form of digital encoding of voice is called Pulse Code Modulation (PCM) and is defined with the ITU G.711 recommendation. With analogue telephones PCM is carried out centrally at the switch whereas with ISDN telephones PCM is carried out locally.
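
The arithmetic behind the 8000 samples/sec, 125us and 64kbps figures can be written out as follows. The variable names are illustrative only.

```python
highest_frequency_hz = 4000                      # top of the voice band, rounded up
sample_rate = 2 * highest_frequency_hz           # Nyquist: at least twice the highest frequency
sample_interval_us = 1_000_000 // sample_rate    # microseconds between samples
bits_per_sample = 8                              # one PCM code per sample
ds0_bps = sample_rate * bits_per_sample          # bandwidth of one digitised voice channel

print(sample_rate, "samples/sec, one every", sample_interval_us, "us,", ds0_bps, "bps")
```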

Compression of Voice

Waveform Compression

Waveform coders produce a non-linear approximation of the waveform. We have seen one form of voice compression called Pulse Code Modulation (PCM) which is a Waveform Compression Algorithm that just looks at the waveform irrespective of the voice patterns. Another Waveform Algorithm is Adaptive Differential Pulse Code Modulation (ADPCM). ADPCM takes 8000 samples per second and uses, for example, 4 bits for each of the 8000 samples (giving 8000 x 4 = 32kbps bandwidth requirement). This is called the Quantisation Granularity. Using 4 bits means that there are 2⁴ = 16 different bit values instead of the 2⁸ = 256 values given by the 8 bits of standard PCM. Each bit value represents a change from the value of the previous sample, with the assumption that differences are never likely to be more than 4 bits change. Every so often a full marker value is sent rather than just the differences from the previous sample. Using 4 bits instead of 8 bits means that ADPCM uses 32 Kbps so gives better use of bandwidth. The ITU designate this as compression standard G.726r32. Using 3 bits per sample is defined in G.726r24 and uses 24 Kbps of bandwidth whereas using 2 bits per sample is defined in G.726r16 and uses 16 Kbps. There is also a G.726r40. The encoding delay is typically less than 1ms which makes ADPCM very attractive, particularly in environments where there is Tandem Switching. The 'Adaptive' in ADPCM refers to the fact that the quantisation granularity changes automatically depending on the Signal-to-Quantising Noise Ratio (SQR).

Adaptive Differential Pulse Code Modulation (ADPCM) dynamically reduces how many bits are used for sampling as the network becomes more congested, 40 Kbps -> 32 Kbps -> 24 Kbps -> 16 Kbps. ADPCM gives very little delay (typically less than 1ms) even when conversion occurs to PCM and back to ADPCM.
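
The relationship between bits per sample, the number of quantisation values and the G.726 bit rates can be checked with a few lines. The function names are illustrative only.

```python
SAMPLE_RATE = 8000   # samples per second, as for PCM

def adpcm_rate_bps(bits_per_sample):
    """Bit rate of a stream carrying this many bits for every sample."""
    return SAMPLE_RATE * bits_per_sample

def quantisation_values(bits_per_sample):
    """Number of distinct codes available with this many bits."""
    return 2 ** bits_per_sample

for bits in (5, 4, 3, 2):
    print(f"{bits} bits/sample: {quantisation_values(bits)} values, {adpcm_rate_bps(bits)} bps")
```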

Vocoder

Linear Predictive Coding (LPC) is an example of a vocoder. A vocoder synthesises the voice. This synthesis results in a voice that lacks in emotion and it is therefore difficult to identify the speaker. Compression can end up with a stream as low as 2.4kbps and the stream is typically half-duplex.

Hybrid Compression

Another form of compression is that provided by Multipulse Maximum Likelihood Quantisation (MPMLQ). Defined by G.723 this uses an algorithm that looks ahead and requires a bandwidth of 5.3kbps (G.723r53) or 6.3kbps (G.723r63). G.723 is used for video and requires up to 30 MIPS processing power.

A hybrid compression form uses Source Compression and takes the voice signals into account when compressing, i.e. it performs voice modelling using Fourier analysis. Hybrid coding comes under the broad spectrum of Analysis by Synthesis (AbS) coding where analysis is continually performed on the speech and the algorithm attempts to predict the waveform in the near future (around 5ms). This occurs via a feedback loop and adds around 5ms of delay to the voice path. The most common form of this is Code Excited Linear Predictive (CELP) or Algebraic Code Excited Linear Prediction (ACELP). This can provide high quality voice reproduction at low bit rates. With CELP voice signals are compressed as follows:

The 8-bit PCM signal is converted to a 16-bit linear PCM sample.
The speech is analysed and compressed with a vector quantiser.
A Vector Quantiser Codebook is used to learn and predict the voice waveform. The codebook is a collection of human voice waveforms called Diphones that make up speech. The codebook has an index typically of 1024 entries (represented by 10 bits). There is also a gain value made up of 5 bits. This controls the power.
The coder is initiated (or 'excited') by white noise, the code assigned to each sound is the index of that sound within the codebook.
The resultant code, or index, is sent to the far end for decoding back into the voice waveform using the code as an index and looking the sound up in the same codebook at the other end.
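
The principle of sending a codebook index instead of the waveform itself can be shown with a toy vector quantiser. This is a heavily simplified sketch and not the CELP algorithm: the four-entry codebook and its values are invented for illustration (the text describes a codebook of typically 1024 entries), the gain value is omitted, and the best match is chosen by simple squared error.

```python
# Hypothetical codebook shared by both ends; each entry is a short waveform shape.
CODEBOOK = [
    (0.0, 0.0, 0.0, 0.0),
    (1.0, 0.5, -0.5, -1.0),
    (-1.0, -0.5, 0.5, 1.0),
    (0.5, 1.0, 0.5, 0.0),
]

def encode(block):
    """Return the index of the codebook entry closest to this block of samples."""
    def squared_error(entry):
        return sum((b - e) ** 2 for b, e in zip(block, entry))
    return min(range(len(CODEBOOK)), key=lambda i: squared_error(CODEBOOK[i]))

def decode(index):
    """The far end looks the waveform up in its own copy of the codebook."""
    return CODEBOOK[index]

index = encode((0.9, 0.4, -0.6, -1.1))
print("index sent:", index, "waveform rebuilt:", decode(index))
```

Only the small index crosses the network, which is where the bandwidth saving comes from; the price is that the rebuilt waveform is the nearest codebook shape rather than the exact input.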

The CELP standard produces voice at 4.8kbps.

One version of CELP is called Low Delay Codebook Excited Linear Predictive (LDCELP) and is defined by G.728. LDCELP uses a small codebook and operates at 16 Kbps plus there is no lookup thereby minimising the delay to between 2 and 5ms, hence 'Low Delay'. A 10-bit codeword is assigned to every block of 5 speech samples. Four codewords are grouped together into a sub-frame which takes 2.5ms to encode, and two sub-frames are transmitted at a time, i.e. 5ms per pair.
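
The LDCELP figures quoted above are consistent with each other, as the following arithmetic shows. It assumes the usual 8000 samples/sec rate; the constant names are illustrative.

```python
SAMPLE_RATE = 8000            # samples per second (assumed, as for PCM)
SAMPLES_PER_CODEWORD = 5
BITS_PER_CODEWORD = 10
CODEWORDS_PER_SUBFRAME = 4

codeword_us = SAMPLES_PER_CODEWORD * 1_000_000 // SAMPLE_RATE    # speech covered by one codeword
bit_rate_bps = BITS_PER_CODEWORD * SAMPLE_RATE // SAMPLES_PER_CODEWORD
subframe_us = CODEWORDS_PER_SUBFRAME * codeword_us               # one sub-frame
pair_us = 2 * subframe_us                                        # two sub-frames sent together

print(codeword_us, "us per codeword,", bit_rate_bps, "bps")
print(subframe_us, "us per sub-frame,", pair_us, "us per pair")
```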

Another version of CELP is Conjugate Structure Algebraic CELP (CS-ACELP) and is defined in G.729. CS-ACELP has almost the same perceived level of quality as PCM and is at least as good as ADPCM at 32kbps. CS-ACELP operates at 8 Kbps of bandwidth and works by using sound pattern matching against multiple PCM bytes; 80-byte frames take 10ms to translate. CS-ACELP performs a 5ms look ahead to predict the next wave pattern plus it also reduces noise and does pitch-synthesis filtering. G.729 is able to model nuances and accents in human speech but requires about 20 MIPS of processing power.

G.729 has two variants. Annex A (G.729A) is less processor intensive (requires about 11 MIPS) and allows double the number of calls as plain G.729. Annex B (G.729B) adds Voice Activity Detection (VAD) and Comfort Noise Generation (CNG) which work together to reduce bandwidth used. You can combine Annex B with G.729A to give G.729AB. The G.729 variants can generally interoperate with each other.

The bandwidths used by these algorithms that we have talked about are just the actual data bandwidths and do not take into account the packet headers of the protocols being used to carry the data. For instance, if you are using G.711 over Frame Relay, then you need to take into account the Frame Relay header (2 bytes), the FRF.11 header (3 bytes), the Flag (1 byte) and the FCS (2 bytes). The required bandwidth is calculated by codec bandwidth x (payload + overhead)/payload. For G.711 and a 20 byte payload this gives us 64 x (20 + 8)/20 = 89.6kbps (roughly 90kbps). If G.729 is used instead then the same calculation gives us 8 x (20 + 8)/20 = 11.2kbps. If the payload increases to say 100 bytes for a G.729 call then the calculation gives us 8 x (100 + 8)/100 = 8.64kbps. You can see that the greater the payload, the less bandwidth is required. The default payload size for G.729 is normally 30 bytes for a voice packet. For G.711 the default payload size is 240 bytes.
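
The formula above translates directly into a small function. The function and constant names are illustrative; the overhead figure is the 8 bytes of Frame Relay and FRF.11 framing listed in the text.

```python
# Frame Relay header + FRF.11 header + Flag + FCS, in bytes.
FRAME_RELAY_OVERHEAD = 2 + 3 + 1 + 2

def required_kbps(codec_kbps, payload_bytes, overhead_bytes=FRAME_RELAY_OVERHEAD):
    """codec bandwidth x (payload + overhead) / payload"""
    return codec_kbps * (payload_bytes + overhead_bytes) / payload_bytes

print(round(required_kbps(64, 20), 2))    # G.711, 20 byte payload
print(round(required_kbps(8, 20), 2))     # G.729, 20 byte payload
print(round(required_kbps(8, 100), 2))    # G.729, 100 byte payload
```

Larger payloads spread the fixed header cost over more voice bytes, at the price of a longer wait to fill each packet.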

ATM cells have greater overhead because of the reduced size of 53 bytes. As well as the 5 byte header there is also the 8 byte AAL5 trailer, in addition the ATM Forum have adopted the FRF.11 header in the form of the VoX header. This takes up a further 3 bytes leaving 37 bytes for the voice data. If the default G.729 payload is used then this leaves 7 bytes wasted which is padded. A calculation using these figures gives us a required bandwidth of 8 x (30 + 20)/30 = 13.3kbps, this compares with 8 x (30 + 8)/30 = 10.1kbps when G.729 is used in Frame Relay. Using G.711 over ATM is more problematic because the default payload size of 240 bytes does not fit into one cell and so has to be spread over a number of cells.

This table lists the codecs and their respective speeds and bandwidth requirements for given sample sizes.

Codec	Codec Speed	Sample Size (bytes)	Frame Relay (bps)	Frame Relay with cRTP (bps)	Ethernet (bps)	Ethernet with cRTP (bps)
G.711	64000	240	76267	66133	78933	68800
G.711	64000	160	82400	67200	86400	71200
G.726r32	32000	120	44267	34133	46933	36800
G.726r32	32000	80	50400	35200	54400	39200
G.726r24	24000	80	37800	26400	40800	29400
G.726r24	24000	60	42400	27200	46400	31200
G.726r16	16000	80	25200	17600	27200	19600
G.726r16	16000	40	34400	19200	38400	23200
G.728	16000	80	25200	17600	27200	19600
G.728	16000	40	34400	19200	38400	23200
G.729	8000	40	17200	9600	19200	11600
G.729	8000	20	26400	11200	30400	15200
G.723r63	6300	48	12338	7350	13650	8663
G.723r63	6300	24	18375	8400	21000	11025
G.723r53	5300	40	11395	6360	12720	7685
G.723r53	5300	20	17490	7420	20140	10070
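
The figures in the table follow the same payload-plus-overhead formula. The per-packet overhead totals used below are not stated in the text; they are back-calculated from the table values themselves and should be treated as an assumption, as should the function name.

```python
# Total header bytes per packet, inferred from the table (an assumption).
OVERHEAD_BYTES = {
    "Frame Relay": 46,
    "Frame Relay with cRTP": 8,
    "Ethernet": 56,
    "Ethernet with cRTP": 18,
}

def table_row(codec_bps, sample_bytes):
    """Bandwidth in bps for each transport, rounded half up to a whole number."""
    row = {}
    for transport, overhead in OVERHEAD_BYTES.items():
        bps = codec_bps * (sample_bytes + overhead) / sample_bytes
        row[transport] = int(bps + 0.5)
    return row

print("G.711, 240 bytes:", table_row(64000, 240))
print("G.729, 20 bytes:", table_row(8000, 20))
```

Comparing the two rows shows how much more a small sample size suffers from header overhead, and how much of it compressed RTP recovers.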

Voice Quality

Human speech uses a bandwidth of 100Hz to 10000Hz (if you include harmonics) with most of the speech occurring between 100Hz and 3000Hz. The more bandwidth that is allocated to cater for human speech the more faithful is the sound to the original, this is called Fidelity. Human speech quality is also affected by Echo, Delay and Jitter (Delay variation). Jitter is often a symptom of voice over data networks.

A subjective test used by the ITU for assessing the quality of the sound is the Mean Opinion Score (MOS). The MOS is a statistical measurement of voice quality based on human opinion of a certain spoken sentence. In English the sentence used is "Nowadays, a chicken leg is a rare dish". The Ratings are as follows:

1 - Unsatisfactory - Very annoying distortion which is objectionable
2 - Poor - Annoying distortion but not objectionable
3 - Fair - Perceptible distortion that is slightly annoying
4 - Good - Slight perceptible level of distortion but not annoying
5 - Excellent - Imperceptible level of distortion

The following table gives examples of comparative scores regarding the different types of compression:

Method	MOS
PCM G.711	4.1
ADPCM 32K G.726	3.85
ADPCM 24K G.726	3.5
ADPCM 16K G.726	3.0
LDCELP G.728	3.61
CS-ACELP G.729	3.92
CS-ACELP G.729a	3.9
GSM	3.3/3.4
G.723.1 MPMLQ (6.3kbps)	3.9
G.723.1 ACELP (5.3kbps)	3.8

A score of 4.0 is considered to be toll quality. These scores are reassessed regularly and change with time. One thing to bear in mind is that delay is not taken into account with the MOS.

The following table gives examples of comparative MOS scores for G.729 under different conditions:

Condition	MOS
Average speech	3.92
Low input level	3.54
Two tandem hops	3.46
Three tandem hops	2.68
5% bit error rate	3.24
5% frame error rate	3.02

The ITU has a number of voice quality standards, these are:

G.111 - Overall Loudness Rating (OLR)
G.113 - Quantisation Distortion Requirements for the International Calculated Planning Impairment Factor (ICPIF) which is a Total Impairment Value (Itot) that is the sum of the following:
- Io - not good enough OLR or high circuit noise
- Iq - PCM quantisation distortion
- Idte - talker echo
- Idd - long one-way transmission times
- Ie - distortion from special equipment such as low bit-rate decoders.
The following guidelines are recommended:
- 5 - Very good
- 10 - Good
- 20 - Adequate
- 30 - Limiting Case
- 45 - Exceptional limiting case
- 55 - Customers likely to react strongly
G.114 - end-to-end delay recommendations
- 1-150ms - suitable for voice
- 150-250ms - starts to affect voice quality
- 250-400ms - can be annoying (satellite delay can be as much as 500ms one-way, hence why VoIP on a satellite link is not feasible in an interactive way)
- >400ms - unacceptable
G.131 - if the one-way delay exceeds 25ms then echo cancellers must be used.
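
The ICPIF calculation described under G.113 is a straightforward sum that is then compared against the guideline values. In the sketch below each guideline figure is treated as the upper bound of its band, which is an interpretation rather than something the text states, and the impairment values passed in are made up for illustration.

```python
# Guideline values from the list above, treated as upper bounds (an assumption).
ICPIF_GUIDELINES = [
    (5, "Very good"),
    (10, "Good"),
    (20, "Adequate"),
    (30, "Limiting case"),
    (45, "Exceptional limiting case"),
    (55, "Customers likely to react strongly"),
]

def icpif_total(io, iq, idte, idd, ie):
    """Itot is the sum of the individual impairment factors."""
    return io + iq + idte + idd + ie

def icpif_rating(itot):
    """Return the first guideline band whose value is not exceeded."""
    for limit, label in ICPIF_GUIDELINES:
        if itot <= limit:
            return label
    return ICPIF_GUIDELINES[-1][1]

# Hypothetical call where only the codec contributes any impairment.
itot = icpif_total(io=0, iq=0, idte=0, idd=0, ie=11)
print(itot, icpif_rating(itot))
```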

A more objective voice quality measurement exists called Perceptual Speech Quality Measurement (PSQM) and was originally defined by the ITU in the standard P.861. PSQM uses a rating scale of 0 to 6.5 and is sometimes mapped to the MOS rating scale of 0 to 5. The test equipment implements PSQM by comparing the transmitted speech to the original input in real time. Accuracy is rated at more than 90% c.f. MOS. This information can be linked to SNMP-based management systems.

BT also developed a voice quality measurement algorithm called Perceptual Analysis Measurement System (PAMS) that is used to predict the effect on voice quality measurement scores when various waveform codecs, languages etc. are used. The ITU has now combined this PAMS with PSQM to form an updated standard P.862 that can give a more objective prediction of subjective scoring systems.

Silence Suppression and Comfort Noise

On average only 20-30% of the time in a conversation is actually used for talking, the rest is silence. Rather than keep transmitting the silence as normally happens, it is more bandwidth efficient to stop transmitting; sometimes up to 35% of the bandwidth can be saved. This is known as Silence Activity Detection (SAD) or conversely Voice Activity Detection (VAD). A packet is sent called a Silence Indicator (SID) to notify the other end that the voice activity power level has dropped below a certain threshold e.g. -50dBm. VAD requires a 5ms look-ahead buffer so this adds delay to the voice path. You would look to use VAD only on WAN circuits.
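
The threshold test at the heart of VAD can be sketched as a simple energy check on each frame of samples. This is a minimal illustration and not how any particular codec implements it: the level is measured in dB relative to a full-scale sample value of 1.0 rather than in dBm (which needs a reference impedance and level), there is no look-ahead buffer, and the function names are hypothetical.

```python
import math

def frame_level_db(samples):
    """Mean power of a frame in dB relative to full scale (sample value 1.0)."""
    power = sum(s * s for s in samples) / len(samples)
    if power == 0.0:
        return float("-inf")        # digital silence
    return 10.0 * math.log10(power)

def is_silence(samples, threshold_db=-50.0):
    """True when the frame is quiet enough that it need not be transmitted."""
    return frame_level_db(samples) < threshold_db

speech_frame = [0.5, -0.5] * 80       # loud frame, about -6dB
quiet_frame = [0.001, -0.001] * 80    # faint background, about -60dB

print(is_silence(speech_frame), is_silence(quiet_frame))
```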

There is an issue with VAD in that pure silence is off-putting to the users so techniques are employed to introduce white (or pink) noise locally to simulate this Background noise.

Related to silence is the concept of Sidetone. This just plays the speaker's voice through the earpiece locally so that the speaker does not think that there is a faulty handset.

Fax

The Group 3 Fax or the Modem is designed to run on the analogue network even though it operates digitally internally. No silence suppression or compression can be applied and even though the Fax typically just uses 9.6Kbps, a whole 64Kbps channel is used. This is because the analogue signal is continuous and no silence suppression can be used, nor compression as you cannot lose any of the digital information. Faxes and Modems use a 2100Hz tone to identify themselves to the switch. The standard analogue Fax protocol is T.30.

Fax speeds include the following:

Single Frequency Tone
V.21 - 300bps
V.27ter - 2400bps and 4800bps
V.29 - 7200bps and 9600bps - requires 9600bps demodulated or 40kbps ADPCM.
V.33 - 12000bps and 14400bps
V.17 - 7200bps and 9600bps

Traditionally, fax machines have differed in their facilities offered and T.30 compatibility. In addition, their tolerance of packet delays and receive errors is low because fax machines use synchronous modems which have no built in flow control. If a calling fax does not receive a response from a receiving fax within normally 3 seconds, the whole message is transmitted again. Proprietary local spoofing techniques can ease this issue of delay that can be incurred between fax machines over great distances.

Fax Relay

When running Fax over a VoIP network, the ITU standard T.38 can be used. A DSP in a T.38 Gateway detects the Fax tones and operates Fax Relay. On detection of the analogue Fax signal a normal VoIP call is set up; this is defined by T.38, which carries T.30 over IP. The DSP then converts the analogue signal coming from the Fax machine to a digital bit stream. This bitstream is sent within VoIP packets at the speed 9.6kbps and these packets are tagged as Fax VoIP packets. This saves bandwidth compared with the 64kbps normally taken up by the Fax call when it is traditionally converted to PCM. The T.38 gateway at the other end detects the tagged Fax-VoIP packets and the DSP there converts the bit stream back to an analogue signal that will be received by the remote Fax machine. T.38 allows you to direct bitstreams to PCs containing T.38-compliant Fax software thereby allowing greater flexibility.

If the delay is large on a path, rather than lose fax relay packets it is a good idea to increase the buffer size to several hundred milliseconds because real time interaction is not important.

It is also possible to use FRF.11 to send Fax Relay over Frame Relay.

Fax Store and Forward

ITU's T.37 standard allows you to send faxes by converting them into E-mail attachments. These attachments are TIFF files of the faxes themselves. An On-Ramp gateway performs the conversion to E-mail and attachment. This E-mail is stored and routed by SMTP servers throughout the IP network to end up at an Off-Ramp gateway where conversion back to Group 3 fax is performed. Mechanisms within Extended Simple Mail Transfer Protocol (ESMTP) provide extra features such as delivery confirmation of faxes.

Fax Passthrough

If T.30 fax data is NOT compressed or demodulated i.e. just a G.711 PCM 64kbps channel that is transported without any Voice Activity Detection (VAD), then two faxes can talk to each other directly over the VoIP network.

Modem

Modem Passthrough

Modem Passthrough operates in the same way as Fax Passthrough: a G.711 PCM 64kbps channel is transported without any Voice Activity Detection (VAD), so that two modems can talk to each other directly over the VoIP network.

Modem Relay

The modem analogue signals are converted into digital format by a gateway, and these signals are transported using Simple Packet Relay Transport (SPRT) which uses UDP across the IP network to a remote gateway. The remote gateway converts the signals back to analogue and forwards the signal on to the remote modem.

Traffic Engineering

When designing a voice network it is necessary to size trunks and equipment ports to suit. In order to do this you need to gather information and statistics from the PSTN carriers, the Call Detail Records (CDR) in the PBX and telephone bills. The PSTN can give statistics on the number of calls offered, the number of abandoned calls and when all the trunks are busy, these are called Peg counts.

The PSTN can also give the Grade of Service (GoS) rating for a trunk group. The GoS is a measure of the probability that a call is blocked, for instance one call out of 100 being blocked is given by P(.01) and one call blocked out of 1000 is given by P(.001). This probability applies to the busiest period of the day.

The PSTN can also provide the total amount of traffic carried per trunk. The number of trunks needed for the voice traffic in a particular location is based on peak daily traffic. A carrier will provide the number of calls carried but will not give the number of calls offered i.e. attempted. Only the local PBX can tell you how many calls were offered and therefore how many calls failed.

If the voice traffic is to run over a data network, you also have to take into account the statistics provided by SNMP management stations, network analysers and router interface statistics. You need to ensure that data delay and throughput is not impaired as well as the GoS for the voice traffic. If the data peak demand occurs at similar times throughout the day to the voice peak demands, then this has to be taken into account when designing the voice network.

The offered traffic load (A) is made up of the product of the number of originated calls in an hour (C) and the average holding time for a call (T) i.e. A = C x T. The average holding time is not just the average time that a call takes but includes the call set up and tear down as well as incomplete calls. This is normally calculated by taking the average call length and adding up to 16% to it. Quite often billing records round up the duration of a call to the next minute rather than the nearest minute. This means that they are overstated by an average of 30 seconds each call. For traffic calculations, if you are using the billing records you need to factor in a reduction by multiplying the number of calls by 0.5 minute to obtain the number of excess minutes.

The concept of the Busy Hour is used to represent the number of call attempts during the busiest hour that the organisation experiences on its telephone network. If you have access to the CDR records then to work out the busiest hour, take the 10 busiest days in a year, sum the traffic on an hourly basis, find the busiest hour and work out the average amount of time a call takes (average duration).

If you do not have access to a year's worth of traffic information then you could take a month's worth (about 22 working days) determine an average day's worth and multiply that by 17%. The reason for doing this is that busy hour traffic represents about 17% of all the traffic that occurs in one day.
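The billing correction and the 17% rule of thumb described above can be sketched as follows. This is only an illustration of the arithmetic in the text; the function names and the example month (60,000 billed minutes over 10,000 calls) are made up for the purpose.

```python
def adjusted_minutes(billed_minutes, number_of_calls, rounding_excess=0.5):
    """Remove the average 0.5 minute per call added by rounding up to the next minute."""
    return billed_minutes - number_of_calls * rounding_excess


def busy_hour_minutes(month_minutes, working_days=22, busy_hour_share=0.17):
    """Estimate busy hour call minutes from roughly one month of traffic."""
    average_day = month_minutes / working_days
    return average_day * busy_hour_share


# Hypothetical figures, not taken from any real billing record.
true_minutes = adjusted_minutes(60000, 10000)
estimate = busy_hour_minutes(true_minutes)
print("Corrected monthly minutes:", true_minutes)
print("Estimated busy hour minutes: %.1f" % estimate)
print("Estimated busy hour erlangs: %.2f" % (estimate / 60))
```

The defaults of 22 working days and 17% come from the text; other organisations may see a different busy hour share, so both are left as parameters.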

The next thing to calculate is the amount of traffic a trunk can handle in an hour, normally we calculate this for the Busy Hour. This traffic volume is measured in Erlangs, a unit which is dimensionless. See https://www.erlang.com/what-is-an-erlang/. For example, if each of the 100 users in an organisation makes 12 calls in the busy hour with an average duration of 6 minutes per call, then the offered traffic load (A) is given by C x T which is 12 x 100 x 6 giving 7200 minutes. Because an Erlang is based on an hour, this then gives us a value for the Busy Hour of 7200 / 60 = 120 erlangs.

An Erlang is sometimes equated to 60 call minutes (3600 call seconds or 36 centum call seconds, CCS).
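As a quick check of the A = C x T calculation, the worked example of 100 users making 12 calls of 6 minutes each can be reproduced in a few lines. The function names are illustrative only.

```python
def offered_load_erlangs(calls_in_busy_hour, avg_holding_minutes):
    """A = C x T, converted from call minutes to erlangs (call hours per hour)."""
    return calls_in_busy_hour * avg_holding_minutes / 60


def erlangs_to_ccs(erlangs):
    """One erlang equals 36 centum call seconds (CCS)."""
    return erlangs * 36


users = 100
calls_per_user = 12
holding_minutes = 6

load = offered_load_erlangs(users * calls_per_user, holding_minutes)
print("Offered load:", load, "erlangs")          # 120.0 erlangs
print("Offered load:", erlangs_to_ccs(load), "CCS")
```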

Erlang models are:

Erlang B - if overflow paths exist when trunks are busy, DID trunks are required to allow PSTN rerouting (there are more people than calls). This is used most of the time.
Erlang B Extended - when no alternative path exists the caller hears a busy signal.
Erlang C - used in call centres where there are more calls than people and calls are placed on hold if no bandwidth is available.

When traffic engineering your aim is to maintain or exceed the GoS. To do this you need to work out how many trunks you will need now that you know the erlangs in a busy hour. This requires a look at three areas:

Possible sources of calls - The more possible sources of calls exist, the wider the distribution in arrival times and call duration times.
Arrival characteristics - calls that come from independent sources are close to random in their arrival characteristics; the more sources there are, the more closely they follow the Poisson Distribution (which approaches a bell-shaped curve for large call volumes), where the peak of the curve represents the most probable number of calls being made in the busy hour. Smooth traffic patterns are not random and are due to reliance on other applications (call centres, tele-marketing etc.), so the Poisson distribution is not suitable for them. Similarly, Bursty traffic is not random either.
Handling Lost Calls - these can be dealt with in three different ways:
- Lost Calls Cleared (LCC) - if the system is busy the call is cleared, this underestimates the number of trunks required.
- Lost Call Held (LCH) - even if the call fails to connect, the assumption is made that the call is active and the call is redialled continuously during the average call hold time. This over estimates the number of trunks required.
- Lost Calls Delayed (LCD) - the call is placed in a queue until the system can deal with it.

The complexity of traffic engineering necessitates the use of erlang tables or calculators, to work out the number of trunks required given that you know the volume of traffic in erlangs and you know the target GoS. The most common table used is Erlang B which uses the Poisson Distribution, is based on an infinite number of sources and uses LCC for lost call assumptions. When you have multiple sites and multiple trunks between those sites, it is often necessary to create a Call Density Matrix that has branch-to-branch and branch-to-HQ entries for the busy hour call minutes. You can use this matrix to work out the erlangs on a site-to-site basis.
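Where a printed Erlang B table is not to hand, the same figures can be computed directly. The sketch below uses the standard recursive form of the Erlang B formula, B(0) = 1 and B(n) = A x B(n-1) / (n + A x B(n-1)), and then searches for the smallest number of trunks that meets the target GoS. The function names are illustrative, and it should be treated as a cross-check on a table rather than a replacement for proper traffic engineering tools.

```python
def erlang_b(traffic_erlangs, trunks):
    """Probability that a call is blocked on 'trunks' circuits (lost calls cleared)."""
    blocking = 1.0  # with zero trunks every call is blocked
    for n in range(1, trunks + 1):
        blocking = traffic_erlangs * blocking / (n + traffic_erlangs * blocking)
    return blocking


def trunks_required(traffic_erlangs, target_gos):
    """Smallest number of trunks whose blocking probability does not exceed the GoS."""
    trunks = 0
    while erlang_b(traffic_erlangs, trunks) > target_gos:
        trunks += 1
    return trunks


# 10 erlangs of busy hour traffic at a GoS of P(.01)
print(trunks_required(10, 0.01))   # 18
```

Running `trunks_required(120, 0.01)` gives the trunk count for the 120 erlang example used earlier in this section.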

When calculating trunk sizes for a VoIP network you need to find out how much data bandwidth each call will take. This will depend on the codec and sample size being used. The earlier table gives an idea of bandwidth used on a per call basis. Multiplying the appropriate bandwidth by the number of calls allows you to work out the trunk size.

An Erlang is continuous use of one trunk, designed around the busy hour. If we JUST look at this however, then most of the time the system is over specified, therefore aim to have a percentage of the calls blocked.

Use Erlang and data rate conversion tables for VoIP, VoFR and VoATM to calculate bandwidth. Other factors affecting bandwidth usage include Voice Activity Detection (VAD), Music on Hold (MOH) and the RTCP stream. It is therefore a good idea to add a little extra when sizing bandwidth requirements.

Voice Over Frame Relay (VoFR)

VoFR allows you to run voice and data over the same WAN infrastructure which has management and cost benefits, plus the frame header overhead is low. It can be used to replace a tie line with a PVC (maintaining PBX features) or to provide an Off Premises Extension (OPX) to a PBX via a router.

In order for voice to run over Frame Relay, fragmentation of the data frames needs to occur to allow steady voice traffic. This fragmentation can be a proprietary format, end-to-end FRF.12 or FRF.11 annex C. For QoS on slow links all DLCIs on an interface must be fragmented.

FRF.12 is useful when PVCs are sharing the same physical link or when VoIP is being used over the Frame Relay. The fragmentation header is omitted on frames less than the fragment size so just the largest frames (those larger than the fragmentation threshold) are fragmented. FRF.12 has no knowledge of what is in the frame whether data or VoIP, so both get fragmented.

In FRF.11 annex C, VoFR frames are all fragmented and all packets no matter the size contain the fragmentation header. FRF.11 is therefore used just for Voice over Frame Relay fragmentation over one DLCI.

For more detail on fragmentation go to Frame Relay Fragmentation and Voice Over Frame Relay.

If you want to centrally control billing and administration then you can set up a hub-spoke Frame Relay WAN where the central HQ is the hub and tandem switching occurs for calls between spoke sites:

When using the WAN links we need to convert to a more efficient codec, in this example we have used G.729. This gains us the benefit of bandwidth savings. There is a problem however. Take the example where a call is made from site B to site C:

The call initiates as a G.711 call via the PBX.
At the router the conversion to G.729 is made and the call is routed over to the HQ for central billing.
The router at the HQ converts the call back to G.711 so that the PBX can manage and route the call.
The PBX realises that the call is destined for site C so pushes it out to the HQ router where it is again converted to G.729 encoding.
On arrival at the site C router the call is converted back to G.711 where it is finally sent to the recipient off the PBX there.

This is called Tromboning where several compressions and decompressions occur within one call. This then adds delay and deteriorates the quality of the call. One way around this is for the PBX to be able to understand the codec, or another way is for the router at the HQ to reroute the call to site C without troubling the PBX at the HQ.

If the routers have the ability to operate dial plans, then routing of calls based on the dialled number could be carried out at the router. Tandem switching could therefore be eliminated altogether since the Frame Relay cloud ends up acting as a large virtual voice switch.

To calculate the voice payload size we use the formula Payload (bytes) = sample size (ms) x data rate(kbps)/8. If you were running G.711 over Frame Relay then a 20ms voice sample would have a payload size of 20ms x 64kbps/8 = 160bytes. The actual Frame Relay frame size is therefore 167 bytes because we need to include the 7 byte Frame Relay header (including FRF.11, sequence number and CRC). Remember that we were looking at a 20ms sound sample, so for one second of speech there will be 167 x 1000/20 x 8 = 66.8 kilobits per second bandwidth being used.

Performing the same calculation for G.729 at 8kbps with the same 20ms sample size gives a bandwidth usage of 10.8kbps.
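The two VoFR figures above can be reproduced with a short calculation. This is a sketch of the formula in the text only; the function names are illustrative and the 7 byte header is the value quoted above.

```python
def payload_bytes(sample_ms, codec_kbps):
    """Payload (bytes) = sample size (ms) x data rate (kbps) / 8."""
    return sample_ms * codec_kbps / 8


def vofr_bandwidth_kbps(sample_ms, codec_kbps, header_bytes=7):
    """Per-call bandwidth including the Frame Relay header quoted in the text."""
    frame_bytes = payload_bytes(sample_ms, codec_kbps) + header_bytes
    frames_per_second = 1000 / sample_ms
    return frame_bytes * 8 * frames_per_second / 1000


print("G.711, 20ms: %.1f kbps" % vofr_bandwidth_kbps(20, 64))  # 66.8 kbps
print("G.729, 20ms: %.1f kbps" % vofr_bandwidth_kbps(20, 8))   # 10.8 kbps
```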

Voice Over ATM (VoATM)

ATM is explained in ATM.

AAL5 is frequently used for data due to the fact that all 48 bytes are available for the payload. If we took a typical 20ms sample of voice and encoded it in G.729, then we would end up with a payload of 20 bytes. Because of the fixed cell size of ATM the remaining 28 bytes of the payload would be padded out. This would mean that for every 20ms sample there would be 20 bytes of data and 28 bytes of overhead. Given that the cell header is 5 bytes resulting in a 53 byte cell for each 20ms voice sample this produces a bandwidth requirement of 53 x 8 x 1000/20 = 21.2kbps for each call. This could be considered inefficient because of the 28 bytes of padding. Provided that the delay budget allows it, you could increase the sample size to say 30ms or more to reduce the wasted bandwidth from the padding bytes. Even so, Frame Relay is more efficient. For good quality voice it is good to stick to 20ms samples (50 packets per second).
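The effect of the fixed cell size can be explored with a small calculation. The sketch below follows the simplified model used above (one 53 byte cell carrying up to 48 bytes of payload, padded if necessary). The optional `trailer_bytes` parameter is an addition for experimenting with the 8 byte AAL5 overhead; it does not change the G.729 result because the sample still fits in one cell.

```python
import math

CELL_BYTES = 53
CELL_PAYLOAD_BYTES = 48


def voatm_cells_per_sample(sample_ms, codec_kbps, trailer_bytes=0):
    """Number of ATM cells needed to carry one voice sample."""
    payload = sample_ms * codec_kbps / 8
    return math.ceil((payload + trailer_bytes) / CELL_PAYLOAD_BYTES)


def voatm_bandwidth_kbps(sample_ms, codec_kbps, trailer_bytes=0):
    """Per-call bandwidth once padding and the 5 byte cell header are included."""
    cells = voatm_cells_per_sample(sample_ms, codec_kbps, trailer_bytes)
    samples_per_second = 1000 / sample_ms
    return cells * CELL_BYTES * 8 * samples_per_second / 1000


print("G.729, 20ms: %.1f kbps" % voatm_bandwidth_kbps(20, 8))   # 21.2 kbps
print("G.729, 30ms: %.1f kbps" % voatm_bandwidth_kbps(30, 8))
print("G.711, 20ms: %.1f kbps" % voatm_bandwidth_kbps(20, 64))
```

Comparing the 20ms and 30ms lines shows how a larger sample reduces the share of each cell that is wasted on padding.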

You can use Circuit Emulation Services (CES) (which uses AAL1) to replace a leased line between PBXs. The TDM format is converted to ATM cells and the PCM stream is placed into the cells without being compressed so no DSPs are used and the delay is very low. There is no internal echo cancellation so this may have to be added externally. With multiple sites attaching to an HQ you would need to run the hub site PBX as a tandem switch because there is no opportunity for routers to re-route calls based on dialled digits.

Unstructured CES takes the unmodified clear channel T1/E1 data stream across emulating the whole E1/T1 interface. A voice channel fills the whole payload of the cell. This is good for equipment that uses proprietary framing. Structured CES maintains the channelised/fractional T1/E1 DS-0 information and allows you to have multiple voice channels within the payload. TDM devices can then be removed.

For low speed links (<768kbps) you really need to fragment and interleave the larger packets in order to prevent delays to the voice traffic on the interface. AAL5 does not support LFI so we either have to have separate PVCs for voice (and have it contracted at VBR-rt) or we employ MLP over ATM which provides LFI for low speed links. Many routers only support one instance of SAR at a time so having multiple PVCs is not going to help here. Routers such as Cisco's IGX or Nortel's Passport have ATM backplanes that deal with this issue nicely.

When deploying MLP over ATM, ideally you want the fragments to fit into an exact number of cells to ensure the greatest use of the payload when using AAL5. When making the fragment size calculations it is worth bearing in mind that the AAL5 overhead is 8 bytes whilst the MLP over ATM overhead is 10 bytes.
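One way to pick a fragment size is to work backwards from a whole number of cells. The sketch below assumes that each fragment carries exactly the two overheads quoted above (8 bytes of AAL5 and 10 bytes of MLP over ATM) and nothing else; real equipment may add other encapsulation bytes, so check the vendor documentation before relying on these numbers.

```python
import math

CELL_PAYLOAD_BYTES = 48
AAL5_OVERHEAD_BYTES = 8    # figure quoted in the text
MLP_OVERHEAD_BYTES = 10    # figure quoted in the text


def fragment_size_for_cells(cells):
    """Largest fragment that, with both overheads, exactly fills 'cells' cells."""
    return cells * CELL_PAYLOAD_BYTES - AAL5_OVERHEAD_BYTES - MLP_OVERHEAD_BYTES


def padding_for_fragment(fragment_bytes):
    """Bytes of padding wasted when a fragment does not fill its last cell."""
    total = fragment_bytes + AAL5_OVERHEAD_BYTES + MLP_OVERHEAD_BYTES
    cells = math.ceil(total / CELL_PAYLOAD_BYTES)
    return cells * CELL_PAYLOAD_BYTES - total


for cells in (2, 3, 4):
    size = fragment_size_for_cells(cells)
    print(cells, "cells ->", size, "byte fragment, padding", padding_for_fragment(size))
```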

If a WAN network is implementing interworking between Frame Relay and ATM using FRF.5 then there are likely to be quite large delays emanating from the interworking switches. This makes it unsuitable for voice traffic.

Voice Over IP (VoIP)

Overview

VoIP is fast becoming the data method for voice packet transport. IP is more flexible than either ATM or Frame Relay, not only because of the quicker re-routing and resilient capabilities, but also because of the extra features that can be bolted on to the IP environment to greatly increase the number of applications that the VoIP environment can utilise. VoIP has some quality issues that are different from traditional voice; these include Jitter, packet loss and queuing problems when small voice packets compete with large data packets. These issues are dealt with in detail in Quality of Service.

Real-Time Transport Protocol (RTP)

In a Voice over IP environment Real-Time Transport Protocol is an Internet standard (RFC 1889) used to transport real-time voice data. TCP is used for the H.323 signalling protocols, and UDP for SIP and MGCP. RTP uses UDP for transport because if packets get lost, there is no point in re-sending the data. This diagram illustrates the RTP header:

Version - currently at version 2
Padding - indicates if padding bits have been added to the end
Extension - indicates that a header extension has been included
Contributing Source Count - this is the number of Contributing Source identifiers in the Contributing Source field.
Marker
Payload Type - the codec being used
Sequence - the first one is randomly generated and this number indicates if a packet has been lost
Time Stamp - the time stamp of the first octet of data, the first one being randomly generated.
Synchronisation Source - this is a random number that is used to identify a particular data stream when multiplexing data streams.
Contributing Source - if multiple streams are multiplexed, then the source stream numbers are listed here. There could be no streams at all or up to 15, therefore you could have up to 15 x 32-bit numbers here.

The RTP header is 12 bytes in length (not including the Contributing Source stream list which could add another 60 bytes made up of 4 bytes x 15) and follows the 8-byte UDP header and the 20-byte IP header. If you are running VoIP through a VPN then you have a VPN header to consider which can be from 20 to 60 bytes, plus an additional IP header of 20 bytes.
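To make the field layout concrete, the following sketch decodes the header fields listed above from the raw bytes of an RTP packet using Python's standard `struct` module. The function name and the hand-built sample packet are illustrative only, and header extensions are flagged but not decoded.

```python
import struct

RTP_FIXED_HEADER_LEN = 12  # bytes, before any Contributing Source entries


def parse_rtp_header(packet):
    """Decode the fixed RTP header and any Contributing Source identifiers."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the 12-byte RTP header")

    first, second, sequence, timestamp, ssrc = struct.unpack(
        "!BBHII", packet[:RTP_FIXED_HEADER_LEN]
    )
    csrc_count = first & 0x0F
    header_len = RTP_FIXED_HEADER_LEN + 4 * csrc_count
    if len(packet) < header_len:
        raise ValueError("packet truncated inside the Contributing Source list")

    csrc_list = struct.unpack(
        "!%dI" % csrc_count, packet[RTP_FIXED_HEADER_LEN:header_len]
    )
    return {
        "version": first >> 6,
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc_list": list(csrc_list),
        "header_len": header_len,
    }


# Hand-built sample: version 2, payload type 18 (G.729), 20 bytes of payload.
sample = bytes([0x80, 0x12]) + struct.pack("!HII", 1000, 160, 0x1234ABCD) + b"\x00" * 20
header = parse_rtp_header(sample)
print(header["version"], header["payload_type"], header["sequence"], header["header_len"])
```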

RTP has the ability to identify the payload and timestamp the packets, plus it sequences the packets and monitors the packet delivery, re-ordering them if necessary. Normally RTP uses the even UDP ports 16384 up to 32767.

When using RTP, a technique called Compressed RTP (cRTP) can be utilised whereby the IP header (20 bytes), UDP header (8 bytes) and the RTP header (12 bytes) can be compressed from the usual 40 bytes down to normally 2 bytes, or 4 bytes if the UDP checksum is used. This is suitable for slow point-to-point links (< 2Mbps), preferably using hardware for the compression.
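The saving from cRTP is easiest to see as a per-call bandwidth figure. The sketch below counts only the voice payload plus the IP/UDP/RTP headers (40 bytes uncompressed, 2 or 4 bytes compressed); it deliberately ignores the layer 2 header, which depends on the link type, so real figures will be somewhat higher. The function name is illustrative.

```python
def voip_bandwidth_kbps(sample_ms, codec_kbps, header_bytes=40):
    """Per-call bandwidth for payload plus IP/UDP/RTP headers, excluding layer 2."""
    payload = sample_ms * codec_kbps / 8
    packets_per_second = 1000 / sample_ms
    return (payload + header_bytes) * 8 * packets_per_second / 1000


# G.729 with a 20ms sample size
print("No compression:          %.1f kbps" % voip_bandwidth_kbps(20, 8))      # 24.0
print("cRTP, no UDP checksum:   %.1f kbps" % voip_bandwidth_kbps(20, 8, 2))   # 8.8
print("cRTP, with UDP checksum: %.1f kbps" % voip_bandwidth_kbps(20, 8, 4))   # 9.6
```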

The Real-Time Transport Control Protocol (RTCP), which is defined alongside RTP in RFC 1889, is used to transport control information and services about current RTP sessions i.e. it monitors the bearer. It also carries a canonical name which is an identifier of the source of the RTP stream. This is used by the transport layer at the receiving end in order to synchronise audio with video. The RTCP information includes jitter, delay and packet loss as well as packet counts. In addition, RTCP includes time information such as the NTP time as known by the sender. There must be at least one RTCP packet every 5 seconds and as RTP traffic increases so RTCP increases as a set percentage of the RTP traffic (5%). RTCP also uses UDP ports between 16384 and 32767, and the port number used for a session is the odd-numbered port next to the even-numbered RTP port used for the RTP session. A one-way telephone conversation (e.g. Music on Hold) uses one RTP stream and one RTCP session, therefore a two-way telephone conversation uses two RTP UDP ports and two RTCP UDP ports.
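The even/odd port pairing can be expressed in a few lines. This is a minimal sketch assuming the 16384 to 32767 range quoted above; the function name is illustrative and other deployments may use a different range.

```python
RTP_PORT_MIN = 16384
RTP_PORT_MAX = 32767


def rtcp_port_for(rtp_port):
    """Return the odd RTCP port that accompanies an even RTP port."""
    if not RTP_PORT_MIN <= rtp_port <= RTP_PORT_MAX:
        raise ValueError("RTP port outside the usual 16384-32767 range")
    if rtp_port % 2 != 0:
        raise ValueError("RTP sessions normally use an even port")
    return rtp_port + 1


# A two-way call: one RTP/RTCP pair in each direction.
for rtp_port in (16384, 18000):
    print("RTP", rtp_port, "-> RTCP", rtcp_port_for(rtp_port))
```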

Signalling and Call Control

VoFR and VoATM are fine for simple point-to-point topologies but for Voice over data to be a serious contender to traditional voice systems there needs to be a scalable way of building these topologies and communicating within them and this is where VoIP comes in.

One required element is a Gateway that connects and translates between a traditional analogue telephony system and an IP-based telephony system. Such a Gateway should be able to connect via analogue ports such as FXO, FXS and E&M as well as digital voice ports such as E1, T1 and ISDN. In addition, this gateway should be able to translate and interact with an IP-based telephony system via Ethernet/IP connections as well as having the ability to make call routing and call management decisions using whatever IP-based call control mechanism is being used. The gateway also is required if you are using more than one IP-based call control system, as you need to translate between them.

The call control system is a vital element to the VoIP environment and controls how calls are managed within the IP network. The control signalling is handled separately from the actual voice streams. Umbrella call control systems include H.323, SIP, MGCP, SCCP (Cisco's version called 'Skinny') and Megaco (H.248). The call control mechanism will not only set up the RTP/RTCP sessions but also negotiate parameters such as codec, media type, bit rate and other features about the call. There is a need to monitor the resources used by each call and to maintain a database of the call records. This provides the ability then to control who is allowed to call and what resources they are allowed to use. Call control gives you the ability to route a call based on the dialled number, this therefore requires a way of registering and resolving addresses (numbers). Using the call control system in an IP environment you can decide whether to administer the calls from a centralised point or in a distributed way.

H.323

As a Call Control Protocol H.323 has four main components:

Terminal - an intelligent endpoint which could be a phone, video device, PC software etc.
Gateway - an endpoint device that converts from the PSTN (non H.323) to the H.323 environment
Gatekeeper - address translation between zones, admission and bandwidth control
Multipoint Control Unit - allows point-to-multipoint communications with multiple H.323 terminals.

The H.323 umbrella set of protocols was originally designed to manage multimedia traffic over LANs and WANs, and was used originally for video conferencing. H.323 has been extended in version 2 to cater for the VoIP environment and is the most widely used call control protocol for VoIP. The call control encoding uses Abstract Syntax Notation (ASN.1). H.323 versions 3 and 4 have been developed recently and allow greater flexibility in the choice of transport protocol (UDP or TCP). H.323 has its origins in ISDN's Q.931 and uses the G-series voice coders as well as the H-series video coders. H.323 version 2 covers the following areas:

RAS Signalling Channel - H.225.0 is used in Registration, Admission and Status (RAS) messages between endpoints (gateways or terminals) and Gatekeepers dealing with registrations, bandwidth changes etc. RAS basically says 'hi, I'm here with my IP address and phone number'. RAS uses UDP port 1719 for the RAS messages and UDP port 1718 for multicast gatekeeper discovery. If there are no gatekeepers, then there are no RAS messages.
Call Signalling Channel - H.225.0 based on Q.931, H.225 allows endpoints to use call setup procedures in order to create connections with other endpoints. This uses TCP port 1720.
Call Control Channel - H.245 transmits control messages between VoIP components such as signalling, capabilities, timers, mode requests etc. The capabilities are the IP addresses, the ports to be used and the codec.

The following diagram illustrates the structure of H.323 in IP:

The H.323 terminal is designed mainly for audio communication and it can interact with other multimedia terminals:

H.310 terminals on Broadband ISDN
H.320 terminals on ISDN
H.321 terminals on Broadband ISDN
H.322 terminals on guaranteed QoS LANs
H.324 terminals on Switched Circuit Networks (SCN) and wireless networks

The H.323 allows for an optional Gatekeeper that can provide the following functions:

Address translation
Admission control
Bandwidth management and control
Management of Zones
Call Control signalling, Management and Authorisation

An example of bandwidth management is when a G.711 call comes in say from device A and goes to device B, then B needs to transfer this call to device C on a remote site that uses G.729. The bandwidth requirement changes. H.323v2 can do this, H.323v1 could not.

The Gatekeeper gives scalability to a VoIP design and can rival the traditional telephony topology.

RAS messages are listed below:

Gatekeeper Messages
- GRQ - GatekeeperRequest sent by an endpoint to the gatekeeper multicast address 224.0.1.41
- GCF - GatekeeperConfirm
- GRJ - GatekeeperReject
Registration Messages
- RRQ - RegistrationRequest sent by an endpoint to its Gatekeeper
- RCF - RegistrationConfirm
- RRJ - RegistrationReject
Unregistration Messages
- URQ - UnregistrationRequest sent by an endpoint to unregister
- UCF - UnregistrationConfirm
- URJ - UnregistrationReject
Bandwidth Change Messages
- BRQ - BandwidthChangeRequest sent by an endpoint
- BCF - BandwidthChangeConfirm
- BRJ - BandwidthChangeReject
Location Messages
- LRQ - LocationRequest sent by an endpoint or Gatekeeper either to a known Gatekeeper or to the Gatekeeper multicast address. This is a request to translate an E.164 address/number.
- LCF - LocationConfirm
- LRJ - LocationReject
Call Admission Messages
- ARQ - AdmissionRequest sent by an endpoint to a gatekeeper including the remote endpoint and the required bandwidth for the call.
- ACF - AdmissionConfirm
- ARJ - AdmissionReject
Disengage Messages
- DRQ - DisengageRequest
- DCF - DisengageConfirm
- DRJ - DisengageReject
Status Messages
- IRQ - InfoRequest
- IRR - InfoRequestResponse
- IACK - InfoRequestAck
- INAK - InfoRequestNack

In a large VoIP telephone network it is impractical to configure dial peers for every single phone, so the idea of an H.323 Gatekeeper has been introduced. It holds a database of phone numbers and host names (IP addresses) that is referenced by VoIP routers (also called Gateways). Gateways are Voice Capable Routers (VCR) that convert analogue voice to digital, convert PSTN call control to H.323 call control and provide call setup and call clearing. Gatekeepers translate phone numbers (E.164 addresses) to IP addresses and provide zone management for scalability etc.

The following diagram illustrates the sequence of events the H.323 protocol architecture goes through when operating with multiple Gatekeepers:

Each gateway has a dial peer configured to point to their own Gatekeeper rather than have lots of dial peers one for each phone number. This is analogous to the IP default gateway.

Take the worst case scenario where no devices know about each other, using the numbered arrows, the sequence of events when phone A wants to call Phone B operates as follows:

Registration Request (RRQ) - (H.225 (RAS) on UDP port 1719) I am GatewayA with IP address x.x.x.x and my E.164 number.
Registration Confirm (RCF) back from the Gatekeeper.
Admission Request (ARQ) - I have an extension number 100 and I want a certain amount of bandwidth to call it, can I register?
Admission Confirm (ACF) - GatekeeperA registers GatewayA, the number is now on the database. The Gatekeeper can reject the registration if it wants to.
Admission Request (ARQ) - Where is phone number 200? What is its IP address?
Location Request (LRQ) - Where is phone number 200? What is its IP address?
Location Confirm (LCF) - 200 is GatewayB.
Admission Confirm (ACF) - 200 is GatewayB.
H.225 setup - setup the call to 200 using H.225.0 on TCP port 1720.
Admission Request (ARQ) - from GatewayB, can I accept the incoming call?
Admission Confirm (ACF) - GatekeeperB says yes you can.
Response to call setup - from GatewayB
H.245 exchange - exchange capabilities and open the logical channel.
RTP - UDP media exchange i.e. the voice packets, between the endpoints.
RTCP - UDP RTP Control channel set up between endpoints.
DisengageRequest (DRQ) - From both Gateways to their respective Gatekeepers once the Phones have completed the call.
DisengageConfirm (DCF) - From both Gatekeepers to their respective Gateways.

Databases can be localised to zones rather than have setup traffic all over the Wide area to just one database. Referencing the zones, or routing to these zones, is done via the area code e.g. 0207 for London. A Supergatekeeper (or Directory Gatekeeper) can be configured that only knows the area codes rather than the individual phone numbers. This hierarchical arrangement is similar in nature to DNS.
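The area code lookup performed by a Directory Gatekeeper can be pictured as a longest-prefix match. The sketch below is purely illustrative: the prefix table and the gatekeeper names are hypothetical, and a real gatekeeper resolves zones through LRQ/LCF exchanges rather than a local dictionary.

```python
# Hypothetical prefix table: area code -> gatekeeper responsible for that zone.
ZONE_GATEKEEPERS = {
    "0207": "gk-london.example",
    "0161": "gk-manchester.example",
    "01": "gk-regional.example",
}


def find_zone_gatekeeper(dialled_number, zones):
    """Return the gatekeeper whose prefix is the longest match, or None."""
    matches = [prefix for prefix in zones if dialled_number.startswith(prefix)]
    if not matches:
        return None
    return zones[max(matches, key=len)]


print(find_zone_gatekeeper("02071234567", ZONE_GATEKEEPERS))  # gk-london.example
print(find_zone_gatekeeper("01611234567", ZONE_GATEKEEPERS))  # gk-manchester.example
print(find_zone_gatekeeper("01131234567", ZONE_GATEKEEPERS))  # gk-regional.example
```

Choosing the longest matching prefix lets a specific zone take precedence over a more general one, in the same way that a more specific route wins in IP routing.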

If no Gatekeepers are involved then each gateway needs to know the other gateway by IP address or DNS name. These gateways set up a TCP H.225.0 call signalling channel between themselves rather than use a Gatekeeper. There is no need for the endpoints to go through the RAS registration procedure on UDP.

In H.323 v1 the gateway went through the whole RAS registration process every 30 seconds. In H.323 v2 the full registration need only occur at the start but within the RRQ message the endpoint states a TTL. The gatekeeper responds by decrementing the TTL in the RCF message. Just before it expires the endpoint sends an RRQ with the keepalive bit set to TRUE which refreshes the registration for that endpoint. Because there are a number of transactions going on within the H.323 set up, there is the capability in H.323 v2 of speeding up the call process by utilising Fast Connect Call Setup. When a Gateway initiates a Call setup with another Gateway using H.225 on TCP port 1720, the control channel using H.245 is combined with this so that capabilities and logical channel setup are exchanged within the same session. The RTP/RTCP streams are still separate.

Resilience

Because of the critical nature of the Gateway and Gatekeeper, there are methods in design that provide resilience. Using a protocol such as HSRP or VRRP, multiple gatekeepers can share the same virtual MAC and IP addresses. Only one is active; the other is on standby in case of failure. Flows are momentarily disrupted on a failure as the failover is not stateful.

A Gateway can be set up with multiple Gatekeepers from which it can pick one to use in case one has failed, or it can multicast out in order to find a Gatekeeper. H.323 allows an endpoint to be associated with only one Gatekeeper at a time.

Gatekeepers send each other location requests when trying to find endpoints. If more than one Gatekeeper is configured for a particular prefix, then any one of these Gatekeepers can respond. Similarly, multiple Gateways can also be configured with the same prefix.

An additional element is the prepending of the Technology Prefix to the dialled number. This may be done by the gateway or the gatekeeper. Either way the gatekeeper checks the prefix and examines its technology prefix table to see which gateway(s) are registered with that prefix. The prefix identifies the capabilities of the gateway and therefore that which the call requires. The ITU have defined technology prefix characters, some of which are as follows:

1# - Voice Gateways
2# - H.320 Gateways (ISDN video conferencing)
3# - Voicemail Gateways

Conferencing

Conferences where more than two users communicate, can take a number of forms. H.323 provides support for Multipoint conferences via the following components:

Multipoint Controller (MC) - sets up an H.245 Control channel with each endpoint.
Multipoint Processor (MP) - processes and mixes streams so that multiple streams can be sent to one or more endpoints.
Multipoint Control Unit (MCU) - a unit that contains the MC and may also have an MP.

The Centralised Conference is where the endpoints have their data, audio and video channels connected to the MP. This allows each endpoint to operate using different codecs and the MCU can decode into PCM for commonality. In a Decentralised Conference, the endpoints multicast the data, audio and video streams to each other rather than be connected to a central MP. This means that the same codecs must be used. H.323 does allow a Hybrid where one stream (e.g. audio) may operate in a centralised manner whilst another stream operates in a decentralised manner. An Ad-hoc Conference is where two endpoints in a point-to-point call decide to convert their call into a conference and invite others to join them. They either use an MC that is nearby or a Gatekeeper.

Implications On Security

H.323 can use many ports, so a firewall has to understand H.323 and look for call set ups before it allows through the UDP ports that RTP/RTCP use. In order to do this the firewall has to keep track of the flows and has to remove the allowed ports from its table when the respective flows have finished.

H.323 supports the concept of an H.323 Proxy Server. You may wish to provide network security for the IP telephony endpoints such that remote endpoints are unable to see the local endpoints. The Proxy server can act not only on behalf of the Gatekeeper but also on behalf of an endpoint. When a local endpoint wishes to reach a remote endpoint, communication occurs between the local endpoint and its local Gatekeeper. The local Gatekeeper finds the remote Gatekeeper, which refers the local Gatekeeper to the remote Proxy. The local Gatekeeper tells the local endpoint that it needs to talk to the local Proxy. The local and remote Proxies then talk to each other, using their respective Gatekeepers, as they complete the call between the local and remote endpoints.

Session Initiation Protocol (SIP)

RFC 2543, the original RFC for SIP, has been superseded by RFC 3261 (SIP v2). SIP provides the signalling and control that establishes, maintains and terminates multimedia sessions. SIP uses the concept of session invitations, together with protocols such as Session Announcement Protocol (SAP) (RFC 2974) and Session Description Protocol (SDP) (RFC 2327). The signalling runs over TCP or UDP.

Addressing is dealt with using HTTP, E.164 and E-mail. Location of services is managed by DNS (DNS SRV record) and call routing is dealt with by Telephony Routing over IP (TRIP). Using text-based protocols makes SIP easier to troubleshoot. SIP supports Intelligent Network (IN) telephony subscriber services such as name mapping, redirection and personal mobility.

SIP sessions are peer-to-peer where the peers are called User Agents (UA). A User Agent Client (UAC) initiates a request whereas a User Agent Server (UAS) contacts the destination and responds to the request on behalf of the destination. Telephones and Gateways can be UACs or UASs.

As well as UACs and UASs, there are also SIP servers:

Proxy Server - forwards SIP requests on behalf of clients or other Proxy Servers. Proxy Servers can perform call routing, access control and security.
Redirect Server - tells the UA which server it should communicate with.
Registrar Server - processes requests from clients that register their location.
Location Server - provides address resolution to Proxy or Redirect Servers, either using its own tools or by accessing other tools such as Finger, rwhois or LDAP.

Messages are based on RFC 822 and RFC 2068 (HTTP) and there are two types, Request and Response.

Request - containing Request Line, Header Line and Message Body.
Response - containing Status Line, Header Line and Message Body.

There are four types of header: General, Entity, Request and Response.

The Request line contains a Method that determines what the recipient (e.g. a server) should do. There are six methods:

INVITE - client invites a server to join in a session, includes session parameters.
ACK - client confirms that it has received the final response to an INVITE.
BYE - client or server initiates the termination of the call.
CANCEL - client or server cancels any request.
OPTIONS - client obtains the server capabilities.
REGISTER - client provides its location information to a server; the registration is periodically refreshed.
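
To show how the Method sits in the Request line, here is a minimal sketch that splits a Request line into its three space-separated parts (method, request URI and protocol version) and checks the method against the six listed above. The function name parse_request_line is hypothetical, and no header or body handling is attempted.

```python
SIP_METHODS = {"INVITE", "ACK", "BYE", "CANCEL", "OPTIONS", "REGISTER"}


def parse_request_line(line: str) -> dict:
    """Split 'METHOD request-uri SIP-version' and validate the method."""
    parts = line.strip().split(" ")
    if len(parts) != 3:
        raise ValueError("expected: METHOD request-uri version")
    method, request_uri, version = parts
    if method not in SIP_METHODS:
        raise ValueError(f"unknown method: {method}")
    return {"method": method, "uri": request_uri, "version": version}


print(parse_request_line("INVITE sip:jsmith@test.com SIP/2.0"))
```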

Response messages use codes similar to HTTP and are grouped as follows:

1XX - Information
2XX - Successful
3XX - Redirection
4XX - Client error
5XX - Server error
6XX - Global Failure
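
Because the grouping depends only on the first digit of the code, classifying a response is simple. The sketch below uses the group names from the list above; the function name response_class is hypothetical.

```python
RESPONSE_CLASSES = {
    1: "Information",
    2: "Successful",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
    6: "Global Failure",
}


def response_class(status_code: int) -> str:
    """Map a three-digit SIP status code to its group."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a SIP status code: {status_code}")
    return RESPONSE_CLASSES[status_code // 100]


print(response_class(180))  # Information
print(response_class(404))  # Client error
```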

A SIP address contains an optional user ID, a host description (domain name or IP address) and optional parameters (e.g. password). Identification of the address begins with sip: (or sips: for secure SIP) and could simply take the form sip:jsmith@test.com or maybe sip:jsmith@10.1.1.1. You could have more complex addresses such as sips:23057731944@test.com;user=phone, indicating the use of secure SIP and E.164 addressing. Another example is sip:113957216;password=pass@test.com, where a user ID is being used instead of an E.164 address, plus a password has been assigned.
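
The address structure can be illustrated with a small parser. This is a simplified sketch, not a complete SIP URI parser: it assumes the form scheme:user@host;param=value, treats anything before the last @ as the user part (so parameters embedded in the user part are kept verbatim), and the function name parse_sip_uri is hypothetical.

```python
def parse_sip_uri(uri: str) -> dict:
    """Split a SIP address into scheme, optional user, host and parameters."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme not in ("sip", "sips"):
        raise ValueError("address must begin with sip: or sips:")
    # Everything before the last '@' is the optional user part.
    user, _, host_part = rest.rpartition("@")
    host, *param_items = host_part.split(";")
    params = {}
    for item in param_items:
        key, _, value = item.partition("=")
        params[key] = value
    return {
        "secure": scheme == "sips",
        "user": user or None,
        "host": host,
        "params": params,
    }


print(parse_sip_uri("sips:23057731944@test.com;user=phone"))
```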

Endpoints (UAs) register addresses with the Registrar server. An address can be resolved by a variety of means as described earlier.

Call Setup Using Direct Communication

If the UAC knows the UAS address then they communicate directly as follows:

Call Setup Via A SIP Proxy Server

Call Setup Via A SIP Redirect Server

For resilience, multiple Proxy and/or Redirect servers can be configured on the UAs. Additionally, each server can be configured with the same DNS name.

Media Gateway Control Protocol (MGCP)

Overview

H.323 and SIP operate as peer-to-peer signalling control protocols where endpoints have intelligence. Simple Gateway Control Protocol (SGCP) (developed by Telcordia) takes a different approach, based on stimulus and response, where the endpoints are dumb. Cisco has its own stimulus-based protocol called Skinny Client Control Protocol (SCCP) (also known as Skinny), which is used on its IP phones. Level 3 also developed Internet Protocol Device Control (IPDC). IPDC and SGCP were designed to improve on H.323, and were combined with the backing of the IETF into MGCP v1. Lucent designed Media Device Control Protocol (MDCP) and MEGACO developed the Media Gateway Controller. The aim is to combine the benefits of these with MGCP v1 to create an enhanced MGCP.

RFC 2705 describes MGCP version 1.0 and RFC 2805 defines its architecture. MGCP is standardised by the IETF and has a centralised architecture where a central Call Agent acts as a Media Gateway Controller. Gateways and Endpoints rely on the Call Agent for instruction. E.164 addressing is used, and communication between the Call Agent and the Endpoints/Gateways is text-based, runs over UDP and uses SDP to describe the session parameters.

Endpoints

The Call Agent needs to understand the various types of Endpoints that exist along with their capabilities. These are as follows:

DS0 - single channel
Analogue Line - e.g. FXS, FXO
Announcement Server access point
Interactive Voice Response (IVR) access point.
Conference Bridge access point
Packet Relay - an access point that bridges between incompatible gateways
Wiretap access point - for recording and playing back communications.
ATM trunk-side interface - an audio channel in an ATM network.

An endpoint has an identifier which is made up of a local name and the domain name of the gateway. These are separated by an @, an example is endpointname@gatewaydomain.com.

Gateways

Here are the seven types of Gateway:

Trunk Gateway SS7 User Part (ISUP) - ISDN signalling endpoints
Trunk Gateway Multifrequency (MF) - digital or analogue MF signalling endpoints
Network Access Server (NAS) - connects to endpoints that use modems for data
Combined NAS and VoIP Gateway - connects to endpoints that use modems for data and VoIP
Access Gateway - supports digital and analogue endpoints attached to a PBX.
Residential Gateway - connects to endpoints that have traditional analogue interfaces
Announcement server - connects to endpoints that access announcement servers

Call Setup and Connections

The Calls and Connections in MGCP by default use UDP port 2427 and centre around the Call Agent as illustrated below:

The Call Agent sends a Notification Request (RQNT) asking the Gateways to notify it of themselves and their endpoints. The Gateways duly comply. The RQNT often contains relevant Event/Signal packages plus a Digit Map so that the gateway(s) can collect digits before notifying (NTFY) the Call Agent. In addition, the Call Agent might include events for the gateway to monitor.
The Call Agent tells the gateway to create the connection (CRCX) plus which RTP ports to use. The gateways respond, in this case Gateway B informs the Call Agent which session parameters to use e.g. RTP and RTCP ports. Gateway B's CRCX response is an encapsulated RQNT in SDP.
The Call Agent now tells Gateway A to modify its session parameters to match those of Gateway B. The MDCX is encapsulated RQNT in SDP.
The RTP media stream starts.
At the end of the call, Endpoint A hangs up and Gateway A notifies the Call Agent. The Call Agent then instructs Gateway A to delete the connection (DLCX). Once this has been acknowledged, the Call Agent instructs Gateway B to delete the connection as well.

For multipoint calls, the Call Agent instructs an endpoint to set up the additional connections and expects the endpoint to be capable of doing so.

Rather than the gateway having to communicate with the Call Agent every time a digit is dialled, the Dial Plan is loaded on to the gateway in the form of a Digit Map.
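
The idea of matching dialled digits locally can be sketched by translating a Digit Map into a regular expression. This is a simplified illustration that assumes only a subset of the notation: literal digits, x for any digit, bracketed ranges such as [1-7], a trailing . meaning zero or more repeats of the preceding element, and | between alternatives. Timers are not handled, and the map used in the example is invented.

```python
import re


def digit_map_to_regex(digit_map: str):
    """Translate a simplified Digit Map such as '(0|[1-7]xxx)' into a regex."""
    body = digit_map.strip()
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]
    alternatives = []
    for alt in body.split("|"):
        pattern = ""
        i = 0
        while i < len(alt):
            ch = alt[i]
            if ch == "x":            # any single digit
                pattern += r"\d"
            elif ch == ".":          # zero or more of the preceding element
                pattern += "*"
            elif ch == "[":          # copy a range such as [1-7] unchanged
                end = alt.index("]", i)
                pattern += alt[i:end + 1]
                i = end
            else:                    # literal digit or symbol
                pattern += re.escape(ch)
            i += 1
        alternatives.append(pattern)
    return re.compile("(?:" + "|".join(alternatives) + ")")


def is_complete(pattern, dialled: str) -> bool:
    """True when the digits collected so far match one alternative exactly."""
    return pattern.fullmatch(dialled) is not None


example_map = digit_map_to_regex("(0|[1-7]xxx|9011x.)")
print(is_complete(example_map, "2345"))  # True
print(is_complete(example_map, "234"))   # False
```

With a map like this on the gateway, only a completed match needs to be reported to the Call Agent in a NTFY.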

Events

Events are monitored by the Gateway and the Call Agent instructs the Gateway what to do when these events occur. MGCP Events are defined as follows:

Continuity Detected
Continuity Tone
DTMF digits
Fax Tones
Flash Hook
Modem Tones
Going Off-hook (code = hd)
Going On-hook

Signals

The following Signals are used by the Call Agent to instruct the gateway:

Answer Tone
Busy Tone
Call Waiting Tone
Confirm Tone
Continuity Test
Continuity Tone
Dial Tone (code = dl)
Distinctive Ringing
DTMF Tones
Intercept Tone
Network Congestion Tone
Off-hook Warning Tone
Pre-emption Tone
Ring Back Tone
Ringing

Control Commands

There are nine Control Commands used by the Call Agent and the Gateways. We have briefly touched on a few of these earlier when looking at the call setup.

EPCF (EndpointConfiguration) - the Call Agent sends this to tell the gateway the coding characteristics (e.g. A-law or mu-law) expected on the line side of an endpoint interface.
RQNT (NotificationRequest) - the Call Agent tells the gateway to look out for an endpoint event and to take an action.
NTFY (Notify) - the gateway notifies the Call Agent that an event has occurred.
CRCX (CreateConnection) - the Call Agent tells the gateway to set up a connection with an endpoint.
MDCX (ModifyConnection) - the Call Agent tells the gateway to update the session parameters for a connection.
DLCX (DeleteConnection) - the Call Agent could send this or the gateway might if it lacks resources.
AUEP (AuditEndpoint) - the Call Agent sends this to obtain the status of an endpoint.
AUCX (AuditConnection) - the Call Agent sends this to obtain the status of a connection.
RSIP (RestartInProgress) - the gateway sends this to tell the Call Agent that an endpoint, or a group of endpoints, is being taken out of service or brought back into service.
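
Since MGCP is text-based, the first line of a command can be taken apart with ordinary string handling. The sketch below assumes a command line of the form "verb transaction-id endpoint MGCP version"; the function name parse_command_line, the transaction number and the endpoint name are made up for illustration.

```python
MGCP_VERBS = {
    "EPCF": "EndpointConfiguration",  # spelt EPCF in RFC 2705
    "RQNT": "NotificationRequest",
    "NTFY": "Notify",
    "CRCX": "CreateConnection",
    "MDCX": "ModifyConnection",
    "DLCX": "DeleteConnection",
    "AUEP": "AuditEndpoint",
    "AUCX": "AuditConnection",
    "RSIP": "RestartInProgress",
}


def parse_command_line(line: str) -> dict:
    """Parse the first line of an MGCP command (simplified)."""
    parts = line.split()
    if len(parts) != 5 or parts[3] != "MGCP":
        raise ValueError("expected: VERB transaction-id endpoint MGCP version")
    verb, transaction_id, endpoint, _, version = parts
    if verb not in MGCP_VERBS:
        raise ValueError(f"unknown verb: {verb}")
    # The endpoint identifier is local-name@gateway-domain.
    local_name, _, gateway_domain = endpoint.partition("@")
    return {
        "command": MGCP_VERBS[verb],
        "transaction_id": transaction_id,
        "local_name": local_name,
        "gateway_domain": gateway_domain,
        "version": version,
    }


print(parse_command_line("CRCX 1204 endpointname@gatewaydomain.com MGCP 1.0"))
```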

Packages

Events and Signals are grouped together in Packages. A Package contains events and signals that are relevant to a particular type of endpoint. RFC 2705 defines the following ten packages:

G - Generic Media
D - DTMF
M - MF
T - Trunk
L - Line
H - Handset
R - RTP
N - Network Access Server (NAS)
A - Announcement Server
S - Script

In addition, RFC 3064 defines CAS packages and RFC 3149 defines business phone packages.

Gateways often handle different types of endpoints, so different types of gateway are assigned different packages:

Trunk Gateway SS7 User Part (ISUP) - G, D, T, R
Trunk Gateway Multifrequency (MF) - G, M, D, T, R
Network Access Server (NAS) - G, M, T, N
Combined NAS and VoIP Gateway - G, M, D, T, N, R
Access Gateway (VoIP) - G, D, M, R
Access Gateway (VoIP & NAS) - G, D, M, N, R
Residential Gateway - G, D, L, R
Announcement server - A, R
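
The assignments above amount to a lookup table, which makes it easy to ask which gateway types handle a given package. The sketch below copies the table as listed; the dictionary and function names are hypothetical.

```python
# Package letters per gateway type, as listed above.
GATEWAY_PACKAGES = {
    "Trunk Gateway (ISUP)": {"G", "D", "T", "R"},
    "Trunk Gateway (MF)": {"G", "M", "D", "T", "R"},
    "Network Access Server (NAS)": {"G", "M", "T", "N"},
    "Combined NAS and VoIP Gateway": {"G", "M", "D", "T", "N", "R"},
    "Access Gateway (VoIP)": {"G", "D", "M", "R"},
    "Access Gateway (VoIP & NAS)": {"G", "D", "M", "N", "R"},
    "Residential Gateway": {"G", "D", "L", "R"},
    "Announcement server": {"A", "R"},
}


def gateways_supporting(package: str) -> list:
    """Return the gateway types that are assigned the given package letter."""
    return sorted(name for name, packages in GATEWAY_PACKAGES.items()
                  if package in packages)


print(gateways_supporting("L"))  # ['Residential Gateway']
print(gateways_supporting("N"))
```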

VoIP Implementations

Computer Telephony Integration (CTI) - interaction with customer database information based on the dialling number, softphones, IVR, call centre management systems etc.
Unified Messaging - linking with Microsoft Exchange allows interaction with E-mail, VoIP calls, FAX and messaging applications such as MSN. Individuals can use any of these media to make contact.
IP Centrex - central control of IP telephony functions rather than have an organisation manage its own system
Hospitality - provision of long distance LAN and voice access from a hotel environment
Multi-tenant
Pre and Post paid Calling Card
Hoot and Holler - always on multi-user conferences
Collaborative Computing - using distributed servers, applications such as whiteboard software, video streaming, FTP and IP phones can be used to provide a work environment for a group.
Call Centres
Toll Bypass - bypass the TDM networks provided by the PSTN
Voice XML - Voice Extensible Markup Language allows you to use voice to control web applications.

Voice Mail

In general, Voice mail operates as follows:

Caller A calls B
After 3 rings the call is diverted to the voice mail which is attached to the PBX.
The PBX instructs the voice mail system to play the recorded greeting and Caller A leaves a message.
The PBX sets the voltage on B's line so that a Message Waiting Indicator (MWI) lights up.

The information passed between the Voice Mail system and the PBX includes the calling number, the called number, message waiting information and the reason for the call not being answered. Signalling between the PBX and the Voice Mail system can be passed in a variety of ways:

Bellcore's standard Simplified Messaging Desk Interface (SMDI) which uses an out of band serial link.
Proprietary In-band signalling using DTMF tones.
Voice mail line cards that emulate the PBX.

Linking Voice Mail systems together across the PSTN is generally carried out using the standard but inefficient Audio Messaging Interchange Specification (AMIS). There is a proposed standard called Voice Profile for Internet Mail (VPIM) which uses TCP/IP, SMTP and MIME to link voice mail systems across an IP network. VPIMv2 required the use of G.726 only, whereas VPIMv3 supports G.711, G.726 and G.723.1 as well as Microsoft-Global System Mobile (MS-GSM).

The Dial Plan

A Dial Plan is a set of rules that governs what becomes of incoming and outgoing calls. Getting the dial plan correct at the beginning can not only save a lot of money but also ease the administration of the voice system, provide good security and improve reliability.

Whether the voice network is a traditional PSTN-based one, a VoIP network or a mixture, the dial plan structure is essential for efficient call routing and management. As well as each country having its own national dial plan, individual organisations also need to devise their own internal dial plan that makes sense and uses the most efficient paths for calls.

Cost savings are made by keeping calls on-net as much as possible so that you bypass the tolls imposed by the PSTN. These same routing configurations can be used to provide resilience so that, should there be a problem with the network, the call can be routed off-net. The dial plan can determine who is allowed to use expensive CPU and DSP resources and thereby prevent overloading of resources. The dial plan is analogous to static IP routing, except that it uses E.164 numbers instead of IP addresses.
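
The analogy with static IP routing can be made concrete with a longest-prefix match on the dialled digits. This is only a sketch: the route table, trunk names and the function route_for are hypothetical, and the empty prefix plays the role of a default route that sends the call off-net.

```python
# Hypothetical static routes: number prefix -> outgoing path.
ROUTES = {
    "": "pstn-trunk",            # default route (off-net)
    "44": "uk-tie-trunk",        # on-net path towards UK sites
    "4420": "london-tie-trunk",  # more specific on-net path
}


def route_for(number: str) -> str:
    """Pick the route whose prefix is the longest match for the number."""
    matching = [prefix for prefix in ROUTES if number.startswith(prefix)]
    best = max(matching, key=len)
    return ROUTES[best]


print(route_for("442079460000"))  # london-tie-trunk
print(route_for("12125550123"))   # pstn-trunk
```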

PBXs implement dial plans using tables which would typically be the following:

Lead Digit Table - for the first digit e.g. '0' or '9'
Route List - how calls are routed based on time of day, permissions and available capacity
Special Number Table
Local Number Table
Time Table - gives the ability to route groups of numbers depending on the time of day.
Class of Service Table - this contains access groups that determine the type of access users have e.g.
- COS 1 - Lobby phone with emergency access only and maybe one internal number
- COS 2 - Admin phone with internal and emergency access only
- COS 3 - Sales phone with local, long distance and emergency access
- COS 4 - Managers phone with local, long distance, emergency and some premium lines access
- COS 5 - Executive phone with no restrictions
Auto Attendant Table
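
A Class of Service Table like the one above can be modelled as a set of permitted call types per class. The sketch below is illustrative only: the call type labels and the function may_call are invented, and it assumes that the Sales and Managers classes may also make internal calls, which the table does not state explicitly.

```python
# Permitted call types per Class of Service (None means no restrictions).
COS_ALLOWED = {
    1: {"emergency"},
    2: {"internal", "emergency"},
    3: {"internal", "local", "long_distance", "emergency"},
    4: {"internal", "local", "long_distance", "emergency", "premium"},
    5: None,
}


def may_call(cos: int, call_type: str) -> bool:
    """Return True if the class of service permits this type of call."""
    allowed = COS_ALLOWED[cos]
    return allowed is None or call_type in allowed


print(may_call(1, "long_distance"))  # False
print(may_call(5, "premium"))        # True
```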

The dial plan can be configured to limit calls to mobile phones, or limit inbound calls. The Class of Service groups can be used to limit access to features.

The dial plan can also be used to select different providers. In the US, 10-10 dialling provides the ability to select a different provider. To do this the PBX needs to be able to strip and insert digits accordingly in order to influence the routing.

Basic Enterprise Dial Plan

The dial plan will include the following:

Internal extensions are mapped to DID numbers so that external callers can go straight to the individual within the organisation.
Outgoing calls will present the switchboard number as the calling number rather than the internal extension number.
Inter-office calls use 4 digits to give plenty of numbers.
A number starting with 9 will be broken out on to the PSTN straightaway.
Emergency numbers such as 999 or 911 are directed straight out on to the PSTN.
Special extensions are created for other sites so that calls can remain On-net.
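
A few of these rules can be sketched as a simple classifier. It is a minimal illustration with assumptions: the emergency check is made first so that 999 is not mistaken for an outside-line call, the leading 9 is stripped before the call is passed to the PSTN, internal extensions do not begin with 9, and the function name classify is hypothetical.

```python
EMERGENCY_NUMBERS = {"999", "911"}


def classify(dialled: str) -> tuple:
    """Return (destination, digits to send) for a dialled string."""
    if dialled in EMERGENCY_NUMBERS:
        return ("pstn", dialled)        # straight out, digits untouched
    if dialled.startswith("9"):
        return ("pstn", dialled[1:])    # strip the access digit (assumption)
    if len(dialled) == 4 and dialled.isdigit():
        return ("on-net", dialled)      # 4-digit inter-office extension
    return ("unknown", dialled)


print(classify("999"))
print(classify("902079460000"))
print(classify("4321"))
```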

North American Dial Plan

The aim with a dial plan is to make it scalable in order to make call routing simpler. This is often done hierarchically by summarising number addresses. The ITU developed the E.164 numbering system to provide some form of international agreement on numbering plans. The North American numbering plan is based on 10 digits where 3 digits are used for the area code and 7 digits for the phone number. On seeing an area code such as '123' the CO switch can ignore the following 7 digits and forward the call on to the relevant area switch straightaway, reducing the post dial delay.

The North American Dial Plan adheres to the E.164 international plan and takes the following form:

Transit Network (Long Distance Carrier) - 3-4 digits
Country Code - 1-3 digits
Area Code - 3 digits
Exchange - 3 digits
Extension - 4 digits
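
The last three fields can be separated with simple slicing once the number is reduced to its 10 digits. The sketch below is illustrative: the function name split_nanp is hypothetical, a leading country code of 1 is dropped if present, and the Transit Network field is not handled.

```python
def split_nanp(number: str) -> dict:
    """Split a North American number into area code, exchange and extension."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    if len(digits) != 10:
        raise ValueError("expected a 10-digit number")
    return {
        "area_code": digits[:3],
        "exchange": digits[3:6],
        "extension": digits[6:],
    }


print(split_nanp("(212) 555-0123"))
```

A switch that only needs to find the next hop can stop after reading area_code, which is the summarisation described above.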

The US emergency service number is 911 and was introduced in 1968 to unify all the different emergency service numbers and to ensure consistency. This has been augmented by the introduction of Enhanced 911 (E911) and is required by law in some places. E911 has the following enhancements:

Automatic Number Identification (ANI) so that the calling number can be identified.
Automatic Location Identification (ALI) so that the caller can be found to within 1000ft, because the information included with the number gives the address, office and nearest emergency department.

CLI information is always passed through the network. If it is blocked, then it is the remote PBX that does the blocking.

A specialised switch at the Central Office called a Selective Router (SR) is used to route emergency calls and links in with the office PBX. The emergency calls are answered by personnel located at the Public Safety Answering Point (PSAP). In a Multiline Telephone System (MLTS) such as an office, the location of the individual caller is difficult to establish because the ALI information just indicates where the PBX is located, not the individual. This is different of course for a domestic caller. To get round this problem, the PBX maintains an ANI-ALI database that contains the location of extension numbers. The PSAP retrieves this information from the PBX in order to know more precisely where the call originated.
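
The ANI-ALI database is essentially a mapping from a calling number to a location record. The sketch below is purely illustrative: the numbers, addresses and the function locate are invented, and it assumes that the PBX's own location is returned when an extension has no entry.

```python
# Location of the PBX itself, used when no finer detail is known.
PBX_LOCATION = {"address": "1 Example Street", "detail": "main building"}

# Hypothetical ANI-ALI database: calling number -> location of that extension.
ANI_ALI = {
    "2125550123": {"address": "1 Example Street", "detail": "floor 3, office 12"},
    "2125550124": {"address": "1 Example Street", "detail": "floor 1, reception"},
}


def locate(ani: str) -> dict:
    """Return the most precise location known for a calling number."""
    return ANI_ALI.get(ani, PBX_LOCATION)


print(locate("2125550123"))
print(locate("2125559999"))
```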
