Integration of voice into data networks has now become a reality, so this document gives
an overview of traditional circuit-switched
voice operation and the elements that allow it to become part of a data network. We then look at
current packetised voice-over-data technologies, in particular Voice over IP (VoIP).
Traditional Switched Circuit Voice Operation
In the USA 1974 saw an anti-trust suit brought against AT&T due to its very large userbase of telephone switches
giving it an overwhelming monopoly within the voice provision market. The Modified Final Judgement in 1982
resulted in AT&T having its local call access given to seven Regional Holding Companies which were
nick-named 'Baby Bells'. AT&T kept its manufacturing businesses and its long distance services.
The structure for telephone providers now means that local telephony services are provided by a Local Exchange Carrier (LEC).
The LEC is however restricted in its operation to within its Local Access Transport Area (LATA). Calls between
LATAs have to be handled by an Inter Exchange Carrier (IEC or IXC). There are 200 LATAs within the US.
The services that are offered by telephone companies include Plain Old Telephone Service (POTS) and
Custom Local Area Signalling Services (CLASS) which enhances POTS by providing call screening, security
and display features. Also available is Advanced Intelligent Networking (AIN), which brings the CLASS-type
features back into a centralised database. An example is Centrex, which provides a virtual PBX with most of the features of a PBX
being supplied by the CO.
The Central Office (CO) switches are provided by the Telephony service providers and provide the
service for the Public Switched Telephone Network (PSTN) and for business environments.
Businesses can also have their own switch called a Private Branch Exchange (PBX)
which is a smaller version of the CO switch. The connections between the
CO switches are called Interoffice Trunks and carry all the calls. Circuits that connect the
CO switch to the business Private Branch Exchange (PBX) are called CO Trunks
whereas trunks between PBXs are called Tie Trunks. A user that requires access
to the CO trunk normally dials a code such as '9' to access it.
Trunks provide the paths between switches and often have many circuits which are 'grabbed' and
'released' as and when required for calls.
The CO switches forming the PSTN also provide the connections to domestic and small business
telephones. Each connection is serviced via the Local Loop which is a two-wire connection.
The PSTN is circuit-switched and guarantees end-to-end connection during the call. The resources
associated with that call are tied up for the duration of the call.
Traditionally, telephones use analogue technology, however many organisations also use digital
telephones that contain analogue to digital converters. Most PBXs are digital.
Although the PBX is a cut-down version of the CO switch, the Key Switch or Key System is NOT a cut-down
version of the PBX. Key Switches tend to support up to a maximum of 250 users. Incoming calls tend to be visible to all users,
and these users can grab outside lines directly rather than having to go through the PBX switch by dialling 9.
Key Switches tend to be analogue-based and cannot switch between trunks, so they cannot re-route calls out on
to a different trunk.
PBXs are linked to COs via trunk links such as E1 links, which are able to take 30 x 64kbps
channels where each channel can take one call. You can also get trunk links between PBXs, which are
called private trunk links. Other types of links that can act as trunks include T1 (DS1), ISDN BRI,
ISDN PRI and fractional T1/E1.
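As a rough check on the channel arithmetic above, the sketch below works out the call-carrying bandwidth of a trunk and how many trunks a given number of simultaneous calls would need. The 30-channel E1 figure comes from the text; the 24-channel T1 figure and the function names are assumptions added purely for illustration.

```python
CHANNEL_KBPS = 64  # one voice call occupies one 64 kbps channel

# Voice channels per trunk type. E1 = 30 is from the text above;
# T1 = 24 is an added assumption for comparison.
VOICE_CHANNELS = {"E1": 30, "T1": 24}


def trunk_voice_capacity_kbps(trunk_type: str) -> int:
    """Return the total bandwidth available for calls on one trunk."""
    return VOICE_CHANNELS[trunk_type] * CHANNEL_KBPS


def trunks_needed(simultaneous_calls: int, trunk_type: str) -> int:
    """Return how many trunks are needed to carry the given number of calls."""
    per_trunk = VOICE_CHANNELS[trunk_type]
    return -(-simultaneous_calls // per_trunk)  # ceiling division


if __name__ == "__main__":
    print(trunk_voice_capacity_kbps("E1"))  # 30 channels x 64 kbps
    print(trunks_needed(45, "E1"))          # 45 calls do not fit on one E1
```

The sketch ignores signalling and framing timeslots, so it only describes the bandwidth that carries calls.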
When linking a number of PBXs e.g. for a large organisation with a number of offices, the ideal would
be to fully mesh the links.
A fully-meshed topology is expensive both in WAN link costs and interface card costs. As you can see in the above diagram, the fully-meshed topology
requires significantly more circuits (6) and more interfaces compared to a partially-meshed topology (only 3 circuits).
A Tandem Switched approach is often used instead as a compromise. The difficulty with Tandem switched networks is
that there are now multiple hops to some destinations which incur delay in the call. In the above example, for A to reach B,
the call must first be routed to C before being routed to B. The PBXs involved in Tandem Switching must be able to
route calls from an inbound trunk onto an outbound trunk and two timeslots are used.
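The circuit counts and the extra hop described above can be reproduced with a small sketch. It assumes four sites named A to D and a partial mesh in which every site links only to C; the diagram is not reproduced here, so this layout and the function names are hypothetical.

```python
from collections import deque


def full_mesh_circuits(sites: int) -> int:
    """Every site links directly to every other site: n(n-1)/2 circuits."""
    return sites * (sites - 1) // 2


def hop_count(links, src, dst):
    """Breadth-first search; returns the number of trunks a call traverses."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == dst:
            return hops
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None  # destination not reachable


# Hypothetical tandem layout: C acts as the tandem switch for A, B and D.
TANDEM_LINKS = [("A", "C"), ("B", "C"), ("D", "C")]

if __name__ == "__main__":
    print(full_mesh_circuits(4))             # circuits for a 4-site full mesh
    print(len(TANDEM_LINKS))                 # circuits in the tandem layout
    print(hop_count(TANDEM_LINKS, "A", "B")) # A reaches B via C
```

Each extra hop in the tandem layout consumes an inbound and an outbound timeslot on the tandem PBX, which is the trade-off against the lower circuit count.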
The Private Branch Exchange (PBX)
The mechanical PBX was invented by an undertaker called Strowger in Kansas, USA around 1889.
The idea was to replace a manual operator and allow the caller to decide where a call was to be routed.
The PBX or PABX ('A' for Automatic) contains many features which can include:
- Lowest Cost Path Routing - having the preferred paths, based on cost, line quality, reliability and
- Automatic Call Distribution (ACD) - finding available telephones in a pool, often called Hunt Groups.
- Voice Mail
- Call Forwarding - automatic forwarding of calls when a telephone is unavailable.
- Calling Line Identification (CLI) - that maps the caller's number to a name in a database.
- Calling Number Blocking - blocking of unwanted numbers.
- Voice Conferencing - allowing a conversation to occur between more than two people.
The CO switch contains a Battery that provides power for both ringing and for the call itself.
Local power to the basic analogue telephone is not required. The switch also contains the following
components that enable the basic telephone to function:
- Current Detector - this monitors whether the circuit is open (On Hook) where no current
is flowing; or closed (Off Hook) where current is flowing.
- Dial Tone Generator - this tone indicates that the switch has recognised a request by
a user to make a call.
- Digit Register - this recognises and deals with the dialled digits.
- Ring Generator - this sends a ringing signal to the called party.
Connections to the switch occur via the Terminal Interface and these connections tend to be
the trunk, the lines and the telephone connections. The transmission paths between the end devices are
provided by the Circuit Switching Network portion of the switch, whereas the Control
Complex provides the following:
- Call Setup
- Call Supervision
- Call Disconnection
An important feature of the PBX is its Call Accounting ability. A PBX maintains Call Detail
Records (CDR), or Station Message Detail Recording (SMDR), and outputs this information
to an external computer, often as 60-character strings. The information often contains the
length and cost of calls per user, department etc., plus where the calls were directed, i.e.
over the trunk, local or international.
Telephone services can be provided in different ways: the analogue Plain Old Telephone Service
(POTS), CLASS, and AIN, where intelligent features are held in a central
database and the system operates like a virtual PBX spread over several sites. An example of AIN
is the Centrex system.
Central Office Exchange Service (Centrex)
Centrex is a way of off-loading the responsibility and cost of maintaining a PBX on to the Central
Office which houses the Digital switch instead. Centrex is a service which provides reliability,
resilience, support for all types of telephone equipment, support for DID,
flexible upgrades, no risk of obsolete equipment and unlimited expansion.
Centrex can have high recurring costs, plus the response time for changes or additions may not be
as speedy as if the organisation had its own PBX. In some countries, certain features that exist on
a private PBX may not be permitted on a Centrex system.
Telephone Call Operation
Using Loop-Start signalling, traditional telephone systems operate more or less as follows:
- The telephone starts off by being On Hook (Idle).
- The caller lifts the handset. This is called going Off Hook and tells the switch that you
wish to make a call; the telephone 'seizes' the line.
- The initial electrical circuit is set up because going Off Hook closes the circuit
so that the battery can send current. The CO switch now knows that a call is being requested
and acknowledges the seizure of the line by 'Winking' the circuit.
- The telephone switch, either public or a Private Branch Exchange (PBX), returns a dial tone
(2500Hz in the UK) which informs the caller that the switch is ready to receive dialled digits.
- The number to be called is dialled.
- In a private organisation if it is an external call then the PBX makes a routing decision and
using network signalling setup messages,
requests a 64Kbps slot in the trunk link to the Central Office (CO) e.g. E1 or T1.
- The CO sets up a path based on the number, it does this by 'seizing' a circuit and sending a request
to the destination PBX.
- The PBX at the other end learns of the call.
- The PBX at the other end sets up an AC voltage (20 - 47Hz) for the ringing of the remote telephone.
- The local PBX sends a ringback tone to the caller to inform them that the phone is ringing at the remote end.
- The telephone handset is picked up and the loop is established local to the called party.
- The ringing voltage and ringback tones are removed from the circuit.
- Acoustic couplers in the phones convert the speech into modulating current that is transmitted
- Part of the signal is fed back into the talking person's earpiece. This is called Sidetone
and is a comfort signal.
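The sequence above can be read as a small state machine on the calling side. The following is a minimal sketch under that reading; the state and event names are invented for illustration and do not correspond to any particular switch.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()        # On Hook
    DIAL_TONE = auto()   # Off Hook, switch ready for digits
    DIALLING = auto()    # at least one digit received
    RINGING = auto()     # far end is being rung, caller hears ringback
    CONNECTED = auto()   # called party has answered


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.IDLE, "off_hook"): State.DIAL_TONE,
    (State.DIAL_TONE, "digit"): State.DIALLING,
    (State.DIALLING, "digit"): State.DIALLING,
    (State.DIALLING, "route_found"): State.RINGING,
    (State.RINGING, "answer"): State.CONNECTED,
}


def next_state(state: State, event: str) -> State:
    """Apply one event; going on hook always returns the line to idle."""
    if event == "on_hook":
        return State.IDLE
    # Events that make no sense in the current state are ignored.
    return TRANSITIONS.get((state, event), state)


def run(events, state: State = State.IDLE) -> State:
    """Feed a sequence of events through the state machine."""
    for event in events:
        state = next_state(state, event)
    return state


if __name__ == "__main__":
    print(run(["off_hook", "digit", "digit", "route_found", "answer"]))
```

A real switch tracks far more detail (timeouts, tones, trunk seizure), so this only captures the order of the main steps.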
Release signals can vary from switch manufacturer to switch manufacturer. Some switches are able
to measure the time from going off hook until the first digit is dialled. If this exceeds a pre-defined
time limit then the loop may be connected to an announcement and/or a Receiver Off Hook (ROH)
The signalling between the subscriber switches and the telephony service providers can be identified
- Supervisory Signalling - electrical voltages and tones that can be heard are used to
signify call status as follows:
- On-hook - produces an open circuit which does not allow any signalling, only
the ringer can operate.
- Off-hook - lifting the handset closes the circuit and allows the telephone switch
to send an audible dial tone to the receiver.
- Ringing - the switch sends a ringing voltage to the destination telephone as notification
of an incoming call. Also an audible ringing tone is sent to the caller telephone to indicate
that the call is progressing. This tone takes the form of a pattern called Cadence.
In Europe this Cadence takes the form of a double ring (duration of 0.4s separated by 0.2s)
followed by two seconds of silence, whereas in the US it takes the form of two seconds of ring
followed by four seconds of silence.
- Address Signalling - there are two types of dialling:
- Pulse Dialling - this is the original form of dialling a number. The telephone
has a rotary dial mounted on to a spring that returns the dial to its original position
when it is turned. Each number is identified by the switch by how many makes and breaks
are made of the local loop. The ratio of make to break must be 40% : 60%. The number
of make/break cycles corresponds to the number being dialled. Each position on the rotary dial
corresponds to a different number. Typically the cam that causes the makes and breaks
will give 10-20 pulses a second.
- Tone Dialling - Now more commonly used is the Dual Tone Multi-Frequency (DTMF)
method that uses the concept of the keypad where each key position is represented by two tones.
Each row is assigned a different low frequency whilst each column is assigned a different
high frequency. When a key is pressed, two tones are sent to the telephone company, a low
frequency tone and a high frequency tone, which identify the key being pressed in much the same
way X and Y co-ordinates identify a point on a graph.
- Informational Signalling - The following tones are used to describe the call progress:
- Dial Tone - (Continuous 350Hz + 440Hz) indicates that the switch is ready to receive dialled digits.
- Busy Tone - (480Hz + 620Hz, 0.5s on and 0.5s off) indicates that the other end is busy.
- Line Ring Back - (440Hz + 480Hz, 2s on and 4s off) means that the telephone company is
in the process of completing a call on behalf of the caller.
- PBX Ring Back - (440Hz + 480Hz, 1s on and 3s off) means that the switch is
in the process of completing a call on behalf of the caller.
- Congestion - (480Hz + 620Hz, 0.2s on and 0.3s off) means that there is congestion
in the network along the path so that the call cannot be set up.
- Reorder - (480Hz + 620Hz, 0.3s on and 0.2s off) means that all the circuits are busy
on the local switch.
- Receiver Off Hook - (1400Hz + 2060Hz + 2450Hz + 2600Hz, 0.1s on and 0.1s off) means
that the other end has left the receiver off the hook.
- No Such Number - (Continuous 200Hz + 400Hz) means that the dialled number does not exist.
- Confirmation Tone - (Noise at a frequency of 1Hz sounds like a slow rasping noise)
means that the call setup is being attempted.
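The row and column idea behind DTMF described above can be sketched as a lookup plus a simple tone generator. The text does not list the frequencies, so the values below are the standard DTMF assignments for a 12-key pad, added here as an assumption; the function names, the 8000 samples per second rate and the 0.1 second duration are illustrative choices.

```python
import math

# Standard DTMF frequencies for a 12-key pad (not listed in the text above).
ROW_HZ = (697, 770, 852, 941)   # low-frequency group, one per keypad row
COL_HZ = (1209, 1336, 1477)     # high-frequency group, one per keypad column
KEYPAD = ("123", "456", "789", "*0#")


def dtmf_frequencies(key: str):
    """Return the (low, high) frequency pair for a single keypad key."""
    if len(key) != 1:
        raise ValueError("expected a single key")
    for row_index, row in enumerate(KEYPAD):
        col_index = row.find(key)
        if col_index != -1:
            return ROW_HZ[row_index], COL_HZ[col_index]
    raise ValueError(f"not a keypad key: {key!r}")


def dtmf_samples(key: str, duration_s: float = 0.1, sample_rate: int = 8000):
    """Return the two tones for a key, summed and scaled into [-1.0, 1.0]."""
    low, high = dtmf_frequencies(key)
    count = round(duration_s * sample_rate)
    return [
        0.5 * (math.sin(2 * math.pi * low * n / sample_rate)
               + math.sin(2 * math.pi * high * n / sample_rate))
        for n in range(count)
    ]


if __name__ == "__main__":
    print(dtmf_frequencies("5"))
    print(len(dtmf_samples("5")))
```

The same two-sine approach would also produce the call progress tones listed above, using the frequency pairs and on/off timings given for each tone.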
Foreign Exchange Station (FXS)
The Foreign Exchange Station (FXS) interface provides an analogue connection to a Group 3 fax
or analogue phone. The FXS interface imitates a switch and provides power, ring voltage and
dial tone just as a PBX telephone port would. The trunk side of a Key system or lines going to the CO switch from
a PBX would use an FXS port.
Normally an FXS port used for an analogue phone would be set to Loop Start signalling, whereas if a Key System or PBX
is connected then Ground Start signalling would be preferred (see later for signalling).
The Call Progress Tone is country dependent
and includes the dial tone, busy tone and the ring back tone. The Cadence is also country dependent and
defines how the ringing voltage is sent when a call is required; in the UK this is one short ring followed by a longer ring.
Foreign Exchange Office (FXO)
The Foreign Exchange Office (FXO) interface allows you to make an analogue connection to
a remote switch either a CO switch in the PSTN or a remote PBX. The switch sees the FXO interface
as a telephone and so an FXO port connects to the station side of the PBX.
N.B. this is different from an FXS interface which expects a telephone to be connected
TO it i.e. it needs a dial tone. The FXO interface provides pulse or tone dialling.
This means that you can connect between an FXS interface and an FXO interface thereby providing
a Foreign Exchange (FX) Trunk. This allows you to set up a long distance extension
for a local phone line (called an Off-Premises Extension or OPX).
The signalling method used is normally Ground Start. You also configure the number of rings before the FXO port
answers a call; this allows you to redirect calls on a router, say after 4 rings, if you do not answer.
The FXO port should also be configured for the dial type (pulse or DTMF) for outbound dialling.
FXO ports should be able to support Supervisory Disconnect where the port can detect the 350ms drop in power
from a connected switch and interpret this as a call disconnect.
E&M (Earth and Magneto)
The E&M (Earth and Magneto) (or RecEive and TransMit, or Ear and Mark)
interface is used for two-way analogue trunking between PBXs or network switches. The trunk link
carries E&M Lead Signalling which the carriers use to connect to the network Composite
Signalling (CX), Direct Current Signalling (DX) and Simplex Signalling (SX) circuits.
Nowadays newer digital technology has replaced these other circuits leaving us the E&M Lead Signalling
trunks that have yet to be replaced, mainly because the low capacity circuits that they are used for
do not warrant the upgrade yet.
There are two ends to the trunk circuit, the Signalling Unit side and the Trunk Circuit
side which is the PBX side. If a PBX needs to route a call across the trunk then it must make a request
on the signal leads to seize the trunk. Lead signalling occurs on separate wires from the voice wires
and is independent of how the voice wires are cabled (The voice wires (audio path) can be 2 or 4-wire).
As far as the signal leads are concerned,
the E-lead is used for inbound signalling (from signalling equipment to the trunk equipment)
whereas the M-lead is used for outbound signalling. Each of these leads has its own Ground wire.
The local PBX (Trunk equipment)
makes a request to seize the trunk by sending a current over the M-lead. The remote PBX
detects this request on its E-lead. Once the call is complete the remote PBX signals using the M-lead.
The signalling used with E&M can be Wink-Start, Delay-Start or Immediate-Start.
The problem with single wire signalling leads is that although they have little impact in the old-world
electro-mechanical systems, nowadays sensitive electronics can be adversely affected by arcing and
EM interference. Signalling is far better carried out on balanced 2-wire circuits. As a result,
new E&M interfaces were introduced. There are five types of E&M interfaces.
E&M Type I
One wire is the E-lead; one wire the M-lead; one pair is used for the transmitted voice and one pair
is used for the received voice. The PBX supplies the power for both the M-lead and E-lead
and they have to use a common ground thereby restricting the use of E&M Type to within the same
The signalling end (CO) generates the E signal by the E-lead
being connected to local Ground for Off-Hook and open for On-hook. The PBX (Trunk end) can then detect
a current through a resistor. The M-lead is at 8v (85mA) when Off-hook and connected to local Ground
when On-Hook. The PBX generates an M-signal by connecting to Battery and the signalling end detects the
resultant current through a resistor.
E&M Type II
One wire is the E-lead; one wire the M-lead; one wire is the Signal ground (SG) for the
E-lead; one wire is the Signal Battery (SB) for the M-lead;
one pair is used for the transmitted voice and one pair is used for the received voice.
Having separate returns (SG and SB) for the signalling leads allows the PBXs to exist in
The E-lead (signalling device to trunk circuit) is open circuit when On-Hook and goes to Signal Ground
(SG) when Off-Hook and requesting a path. The M-lead (trunk circuit to the signalling device) is open
circuit when On-Hook and connects to the Signal Battery (SB) when Off-Hook.
The sensor on the M lead may be biased towards -24v. The diode prevents this negative voltage
appearing on the M lead if it is On-hook (open circuit).
E&M Type III
One wire is the E-lead; one wire the M-lead; one wire is the Signal ground (SG) for the
E-lead; one wire is the Signal Battery (SB) for the M-lead;
one pair is used for the transmitted voice and one pair is used for the received voice.
The one difference between Type III and Type II is that with Type III at the PBX (Trunk Circuit) the
M-lead has a relay that connects it to SG by default. This means that when the PBX
wants to signal using the M-lead, it first has to disconnect the relay. This prevents
spurious signals on the M-lead from signalling by mistake.
The E-lead is open circuit when On-Hook and goes to Ground when Off-Hook and requesting a path.
The current is much lower on the E lead due to the high resistance E lead detectors used.
The M-lead is also open circuit when On-Hook and connects to the Battery when Off-Hook.
The blocking diode on the M lead does not necessarily have to be there with Type III.
E&M Type IV
One wire is the E-lead; one wire the M-lead; one wire is the Signal ground (SG) for the
E-lead; one wire is the Signal Battery (SB) for the M-lead;
one pair is used for the transmitted voice and one pair is used for the received voice.
Both the SG and SB are grounded.
The E-lead is open circuit when On-Hook and goes to Ground when Off-Hook and requesting a path.
The M-lead is also open circuit when On-Hook and also connects to the Ground when Off-Hook.
Both circuits operate identically.
E&M Type V
One wire is the E-lead; one wire the M-lead; one pair is used for the transmitted voice and one pair
is used for the received voice. The PBX supplies the power for the E-lead, the other end
supplies power for the M-lead (N.B. this is where Type V differs from Type I)
and they use their local ground rather than common ground over SB and SG as in Type IV so this is unbalanced.
The E-lead is open circuit when On-Hook (Idle).
The signalling end generates the E signal by the E-lead being connected to Ground (Off-Hook).
The voltage can vary between -48v and -2v. The PBX
can then detect a current through a resistor. The M-lead is open circuit when On-Hook.
The PBX generates an M-signal by connecting to Ground
and the signalling end detects the resultant current through a resistor.
Type V interfaces can be connected back to back.
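The five interface types can be summarised as a lookup table of signalling wires and lead states. The sketch below simply transcribes the descriptions above into a data structure; the class and field names are invented, and for Type III the default relay to SG is not modelled, so treat it as a reading aid rather than a wiring reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EMInterface:
    signalling_wires: tuple  # wires used for signalling only
    e_on_hook: str
    e_off_hook: str
    m_on_hook: str
    m_off_hook: str


# Lead states as described in the sections above.
EM_TYPES = {
    "I": EMInterface(("E", "M"), "open", "ground", "ground", "battery"),
    "II": EMInterface(("E", "M", "SG", "SB"), "open", "SG", "open", "SB"),
    # Type III: states as listed above; the default relay to SG is not modelled.
    "III": EMInterface(("E", "M", "SG", "SB"), "open", "ground", "open", "battery"),
    "IV": EMInterface(("E", "M", "SG", "SB"), "open", "ground", "open", "ground"),
    "V": EMInterface(("E", "M"), "open", "ground", "open", "ground"),
}


def total_wires(em_type: str, audio_wires: int = 4) -> int:
    """Signalling wires plus the audio path, which can be 2-wire or 4-wire."""
    if audio_wires not in (2, 4):
        raise ValueError("audio path must be 2-wire or 4-wire")
    return len(EM_TYPES[em_type].signalling_wires) + audio_wires


if __name__ == "__main__":
    for name, interface in EM_TYPES.items():
        print(name, total_wires(name), interface.m_off_hook)
```

With a 4-wire audio path this gives six wires for Types I and V and eight wires for Types II, III and IV.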
The following timers are often used to modify how a voice interface behaves:
- Ringing Timeout - How long a telephone is rung when nobody picks up the remote end.
- Initial Timeout - How long a dial tone will be sent before the first digit is dialled.
- Interdigit Timeout - How long the port waits after a digit has been dialled, before the next digit is dialled.
- DTMF Digit Timing - This is how long the DTMF digit signal lasts.
- DTMF Interdigit Timing - This is how long the gap lasts between the DTMF digit signals.
- Hookflash In Timing - A hookflash indicates that the caller wants to do something with the call, such as transfer it.
The 'Hookflash In' time applies to incoming hookflashes: if it is set to be quite a long time then
the calling phone has to be left on-hook for a long time before the call is cleared. Conversely, if the time is too short
then a hookflash may be misinterpreted as the caller hanging up. This is used where telephones have Recall keys.
- Hookflash Out Timing - This is the Hookflash time that the voice port sends out.
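The timers above could be grouped into a single configuration object, sketched below. Every default value is an illustrative placeholder and not a recommendation or a vendor default; the class, field and function names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VoicePortTimers:
    # All defaults are illustrative placeholders only.
    ringing_timeout_s: float = 180.0
    initial_timeout_s: float = 10.0
    interdigit_timeout_s: float = 10.0
    dtmf_digit_ms: int = 100
    dtmf_interdigit_ms: int = 100
    hookflash_in_ms: int = 600
    hookflash_out_ms: int = 300


def classify_on_hook_interval(duration_ms: int, timers: VoicePortTimers) -> str:
    """A brief on-hook interval is a hookflash; a longer one is a hang-up."""
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    if duration_ms <= timers.hookflash_in_ms:
        return "hookflash"
    return "disconnect"


def dial_duration_ms(number: str, timers: VoicePortTimers) -> int:
    """Time to send a number as DTMF: one tone per digit plus the gaps between."""
    digits = len(number)
    if digits == 0:
        return 0
    return digits * timers.dtmf_digit_ms + (digits - 1) * timers.dtmf_interdigit_ms


if __name__ == "__main__":
    timers = VoicePortTimers()
    print(classify_on_hook_interval(250, timers))
    print(classify_on_hook_interval(1500, timers))
    print(dial_duration_ms("1234", timers))
```

Raising `hookflash_in_ms` in this sketch makes more on-hook intervals count as hookflashes, which mirrors the trade-off described for the Hookflash In timer.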
In the UK the following diagram indicates the correct pin numbers for the LJU and the RJ45 as
specified by BT and AT&T respectively:
BT LJU circuit
The normal UK analogue telephone line into a home or a direct line to an office requires a socket wired according to the above diagrams.
The 'Wiring' end is typically the outside line wiring coming in from, say, BT, and the Line Jack Unit (LJU) is where the
telephone is plugged in. The first thing to note is that Surge Protection (SP) is required across the 'A' leg (Signal)
and the 'B' leg (Battery) for direct lines. Secondly, a capacitor is installed in line of the 'Bell' leg which provides
the capability for the handset to ring using the signal from the 'B' leg. Only the 'A' and 'B' legs need to be run in the
direct-line wiring to the socket, however these two plus the ring circuit are picked up at the LJU and run into the handset.
The socket and apparatus wiring is coded as follows:
- BK - Black
- W - White
- G - Green
- B - Blue
- R - Red
- WG - White with Green banding
- WB - White with Blue banding
- WO - White with Orange banding
- OW - Orange with White banding
- NW - Blue with White banding
- GW - Green with White banding
Note 1: - The wiring to the socket is of a single core type which is
secured into the socket terminals using an insulation displacement tool.
Note 2: - The cable most often used to connect apparatus to the wall socket is a multi-cored
tinsel type wire manufactured for its flexibility.
With tinsel cable it's important that the correct tools are used when making terminations as the
individual strands of wire are very difficult to solder satisfactorily.
From the diagram it is clear that the wiring arrangement
is quite different to that of most countries.
Some countries, such as Ireland and New Zealand have a similar
wiring arrangement but use a different type of socket.
The principal differences between the UK and other countries are
the incorporation of a voltage arrestor device and components
(470kohm resistor and a 1.8uF capacitor) into the
The voltage arrestor device is a Gas Discharge Tube (GDT) component
intended to short circuit the A-Wire to the B-Wire in the event of voltages
exceeding approximately 250V becoming present on the telephone line.
This type of device is relatively slow acting and has been superseded by the
installation of a polyswitch type of device in the line interface of most newly
The 470kohm resistor and 1.8uF capacitor are installed in the wall socket to
allow testing of the telephone line from the telephone exchange.
The wall socket also contains a connection to the telecoms apparatus intended to
suppress inductive spikes which are generated when loop-disconnect dialling into
electro-mechanical exchanges which terminate the phone circuit with a relay coil.
Note: - These exchanges are presently being phased out of operation, which,
coupled with MF4 detectors being a design feature of the replacement exchanges is resulting
in the diminishing use of loop-disconnect dialling.
In the UK, if a direct line is run across data structured cabling, then a Full Master LJU-RJ45 voice adapter will be required
at the sockets. The Full Master adapter contains Surge Protection required for direct lines that do not go through a local switch, and it
is wired according to the diagram above and the diagram below. In 110 patching installations, only one pair patching is required to patch the
blue pair carrying the A and B legs.
Analogue voice circuits that are extensions served by a local switch, are wired across SCS and use earth recall for services
such as transference of extensions from one handset to another require PABX Master LJU-RJ45 voice adapters. The bell
circuit is still present but the surge protection is not required at the outlet since the local switch takes care of surge
protection at the point that the outside direct lines run into the switch. In 110 patching installations, two-pair patching
is needed to allow for the patching of the clean earth, which is jumpered on the white of the orange pair (i.e. the third
termination on the 110 block).
Analogue voice circuits wired across SCS that use 'Time Break' as a means of control, only require 'Secondary' LJU-RJ45 voice
adapters. These adapters only have LJU pins 2 and 5 wired straight through to RJ45 pins 5 and 4. In 110 patching installations
this presents the voice circuit on the blue pair.
Different Voice Pinouts
Digital telephone systems are becoming more common and they sometimes require proprietary voice adapters. Sometimes LJU pins
1 and 6 are used for intelligent handsets and consoles, so in order to carry out patching on the blue pair in a 110 patching
installation, a 'Digital' LJU-RJ45 adapter would need to be constructed where LJU pins 1 and 6 were wired to RJ45 pins 5 and 4.
BT's Meridian Switch uses LJU pins 3 and 4, so a 'Digital' adapter needs to be wired such that LJU pins 3 and 4 are wired to
RJ45 pins 5 and 4, thus allowing 110 patching on the blue pair.
There are digital telephone systems that use LJU pins 2 and 5 and in such cases a standard Secondary LJU-RJ45 adapter is fine
for the job.
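The adapter wirings described in this section can be captured as simple pin maps. The sketch below records only the LJU to RJ45 pin pairs stated above and checks that each adapter presents the circuit on RJ45 pins 4 and 5 (the blue pair); the dictionary keys and function names are hypothetical labels.

```python
# LJU pin -> RJ45 pin, as stated in the text. The keys are hypothetical labels.
ADAPTER_PINOUTS = {
    "secondary": {2: 5, 5: 4},     # standard Secondary adapter
    "digital_1_6": {1: 5, 6: 4},   # intelligent handsets and consoles
    "digital_3_4": {3: 5, 4: 4},   # systems using LJU pins 3 and 4
}

BLUE_PAIR_RJ45_PINS = {4, 5}


def rj45_pin(adapter: str, lju_pin: int):
    """Return the RJ45 pin an LJU pin is wired to, or None if it is not wired."""
    return ADAPTER_PINOUTS[adapter].get(lju_pin)


def presents_on_blue_pair(adapter: str) -> bool:
    """True if the adapter delivers its circuit on RJ45 pins 4 and 5."""
    return set(ADAPTER_PINOUTS[adapter].values()) == BLUE_PAIR_RJ45_PINS


if __name__ == "__main__":
    for name in ADAPTER_PINOUTS:
        print(name, presents_on_blue_pair(name))
```

The Full Master and PABX Master adapters are left out because they also involve the bell circuit, surge protection or clean earth, which the text describes by diagram rather than by pin number.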
Domestic and small office telephones are connected to the PSTN CO switch via a pair of wires called
the Local Loop. The signalling used in this situation is called Loop-Start and
Loop-Disconnect Signalling. Loop-Start is the most common form of signalling in the analogue
environment and it provides the following services:
- Public Telephone Service (PTS)
- Manual or Automatic data service
- Message Telecommunications Service (MTS)
- Attendant call service on a manual PBX.
- One-way incoming service to an attendant or Automatic Call Distribution (ACD) service
One wire of the local loop is called the Tip, which is connected to ground,
and the other wire is called Ring, which is connected to the negative side of the
48v DC Battery. Picking up the telephone handset takes it Off Hook and makes a
connection on these wires thereby allowing current to flow. The switch sends a dial tone to the
receiver of the phone that has gone off hook, thereby informing the caller that the switch is ready
to receive dial digits. The digits are either sent via pulses or via DTMF dial tones.
The bell is always connected to the
switch however a capacitor prevents the DC current flow from the battery in the switch.
When dialling occurs the remote end is notified by the AC ringing voltage applied at between
20 - 47Hz (The traditional operation of the telephone was described earlier).
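Sending digits as pulses can be sketched as a train of loop breaks and makes, using the 40% : 60% make-to-break ratio and the 10 pulses per second rate quoted earlier for rotary dials. Treating the digit 0 as ten pulses is an assumption (it is the common convention but is not stated in the text), and the function name is hypothetical.

```python
def pulse_train(digit: int, pulses_per_second: int = 10, break_percent: int = 60):
    """Return a list of (state, milliseconds) pairs for one dialled digit.

    Each pulse is a break of the local loop followed by a make.
    """
    if not 0 <= digit <= 9:
        raise ValueError("digit must be between 0 and 9")
    pulses = 10 if digit == 0 else digit  # assumption: 0 is sent as ten pulses
    period_ms = 1000 // pulses_per_second
    break_ms = period_ms * break_percent // 100
    make_ms = period_ms - break_ms
    train = []
    for _ in range(pulses):
        train.append(("break", break_ms))
        train.append(("make", make_ms))
    return train


if __name__ == "__main__":
    print(pulse_train(3))
    print(sum(ms for _, ms in pulse_train(7)))  # total time to send a 7
```

A real dial also leaves an inter-digit pause so that the switch can tell where one digit ends and the next begins; that pause is not modelled here.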
One problem associated with Loop-Start Signalling, particularly where there are a large number of
calls, is that you can experience a situation where the trunk is seized from both ends at the same
time so that you end up with someone already at the other end. This is called Glare.
This is due to the time interval between the seizure of a trunk at one
end and the subsequent making busy of the trunk at the other. Originally, a method where the user had
to wait for a long timeout (up to 40 seconds) was used. After the timeout a particular tone would
be heard which encouraged the user to replace the handset and try again.
Ground-Start Signalling (also called 'Earth Start') is a modified form of Loop-Start Signalling
whereby there is current detection at both ends which is used to request and then confirm that the trunk
is available before it is seized. When a local PBX seizes the trunk it grounds one of the wires which
informs the other end. This limits the possibility of glare at least outside of 100ms. Electronic
switches can detect glare by timing the wink start or delay-dial signal, maybe even switching the
call to another trunk.
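The glare window described above can be illustrated with a very small sketch. It assumes that two seizures of the same trunk collide when they occur within 100ms of each other, and shows the fallback of moving the call to another trunk; the function names and the trunk numbering are hypothetical.

```python
GLARE_WINDOW_MS = 100  # window quoted above for ground-start trunks


def is_glare(seize_a_ms: int, seize_b_ms: int, window_ms: int = GLARE_WINDOW_MS) -> bool:
    """True if both ends seized the trunk within the glare window."""
    return abs(seize_a_ms - seize_b_ms) < window_ms


def next_free_trunk(busy: set, glared: set, trunk_count: int):
    """Pick the lowest-numbered trunk that is neither busy nor in glare."""
    for trunk in range(trunk_count):
        if trunk not in busy and trunk not in glared:
            return trunk
    return None  # nothing available, the call cannot be re-routed


if __name__ == "__main__":
    print(is_glare(1000, 1040))   # seizures 40ms apart collide
    print(is_glare(1000, 1300))   # seizures 300ms apart do not
    print(next_free_trunk(busy={0}, glared={1}, trunk_count=4))
```

The sketch treats the window as a fixed constant; in practice the exposure depends on how quickly each end detects the other's seizure.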
The ground-start line conductors transmit common battery loop supervision, loop dial pulses/DTMF dial
tones, alerts and the voice signal. The lines can send a 'Start to Dial' signal rather than wait for
a dial tone, they can send a message indicating a new call and they can detect call disconnects and
When in the Idle state, the phone has an open circuit Tip (T) to Ring (R).
The phone also has a 10-20,000 ohm Ground Detector that links it to ground and detects an
Off-hook from the network.
When in Call Initiated state, the phone closes contact S which causes current to flow
on the Ring side. The Network sees this and responds by closing contact N. This results
in the Tip being grounded and the Ground Detector in the phone sees this.
If the Network makes the disconnection by opening N and removing ground from the Tip,
then the current stops flowing. The phone waits 350ms to determine that this is an actual
disconnect as opposed to an Open Switching Interval (OSI). If the phone makes the
disconnection, then it opens the loop so that the line appears busy until the network
removes ground from the tip and the line can return to Idle. An OSI is where both Ground
and Battery are removed for a maximum period of 350ms in between state changes. There are never
less than 100ms between OSIs.
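Telling a real disconnect from an Open Switching Interval comes down to timing the loss of current against the 350ms threshold described above. The sketch below is one way to express that rule over a series of sampled line states; the sample format and function names are assumptions made for illustration.

```python
DISCONNECT_MS = 350  # interruptions up to this long are treated as an OSI


def classify_interruption(duration_ms: int) -> str:
    """Classify a loss of current by how long it lasted."""
    return "disconnect" if duration_ms > DISCONNECT_MS else "osi"


def first_disconnect(samples):
    """Find the start time of the first interruption longer than 350ms.

    samples is a time-ordered list of (time_ms, current_present) pairs.
    Returns the time the interruption began, or None if only OSIs were seen.
    """
    lost_at = None
    for time_ms, current_present in samples:
        if current_present:
            lost_at = None  # current came back, so it was only an OSI
        else:
            if lost_at is None:
                lost_at = time_ms
            if time_ms - lost_at > DISCONNECT_MS:
                return lost_at
    return None


if __name__ == "__main__":
    line = [(0, True), (100, False), (300, False), (400, True),
            (500, False), (700, False), (900, False)]
    print(first_disconnect(line))
```

In the sample data the first interruption ends before the threshold is exceeded and is ignored, while the second persists and is reported as a disconnect.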
MF 4 and MF 5 are used on tie trunks between PBXs and use multi-frequency tones on the
same wires as the voice signal.
Signalling System Direct Current (SSDC)
'On/Off' DC current signalling used on the voice pair between switches within a city.
Signalling System Alternating Current (SSAC)
Tone signalling used on the voice pair between switches located in different cities.
Rather than use DC, AC-15 uses Alternating Current (hence AC) for signalling and is mainly used in the UK.
'Idle' is indicated by the frequency tone
2280Hz. Turning this frequency on and off can be used for the signalling, or DTMF can be used. AC-15 can run
to almost anywhere, whereas DC versions such as DC-5 and DC-10 have a limit of 10km. There are different versions of AC-15: A, B, C and D.
Because AC-15 uses tones and AC voltage, if you are to communicate with a switch that uses E&M, FXS or FXO, you will require
a converter box.
Wink Start Supervision Signalling
Wink Start is used for E&M trunk seizure and goes through the following steps:
- The trunk ends signal On-hook to both ends when idle.
- The caller goes Off-hook
- The calling switch activates the M-lead.
- The called switch sets up memory ready for the dialled digits but still sends an idle On-hook
signal to the calling office.
- When the caller is attached at the called switch, the called switch sends a Wink Off-hook Connect
signal (voltage set to -48v for anything between 140-350ms) on the E-lead. The typical duration
of the Wink will depend on the manufacturer's switch e.g. 1/1A-ESS (150ms), 3 ESS (140ms),
5ESS (250ms), EWSD (180ms), DMS-10 (200ms), DMS-100F (10-250ms). Because distortion of this
Off-hook Wink can happen, the other office switch needs to recognise Off-hook winks of this duration.
Anything beyond 350ms can be assumed to be Glare or an error condition which may redirect the call, or
signal for maintenance depending on the switch type.
- The calling switch receives the Wink acknowledgement on its E-lead.
This Start-Dial (On-hook to Off-hook) occurs a minimum of 210ms after the reception of the
connect signal for electro-mechanical switches and the 1/1A-ESS switch. This allows these switches
to see at least 100ms of Off-Hook Wink after the signal has traversed the network and been recognised.
- The calling switch sends the DTMF digits on the voice pair
- The called device answers and the called switch activates the M-lead and keeps it at
-48v for the length of the call.
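The wink timing in the steps above can be sketched as a check made by the calling switch. The 140ms and 350ms limits come from the text; the treatment of anything shorter than 140ms as "too-short" is an assumption, since the text does not say how such a pulse is handled, and the function name is hypothetical.

```python
WINK_MIN_MS = 140  # shortest wink duration quoted in the text
WINK_MAX_MS = 350  # beyond this the text treats the signal as Glare or an error


def classify_wink(duration_ms):
    """Classify an Off-hook pulse seen on the E-lead by its duration."""
    if duration_ms < WINK_MIN_MS:
        return "too-short"        # assumption: not covered by the text
    if duration_ms <= WINK_MAX_MS:
        return "wink"             # valid acknowledgement, digits may be sent
    return "glare-or-error"       # redirect the call or signal for maintenance


for ms in (100, 150, 250, 400):
    print(ms, classify_wink(ms))
```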
Immediate Start Supervision Signalling
The 'wink' in Wink Start may be too short to detect for some PBXs and in these circumstances Immediate
Start can be used instead. The sequence of events is as follows:
- The calling switch activates the M-lead to seize the line.
- The calling switch waits for at least 150ms and then sends the dialled digits on the voice
pair irrespective of whether an acknowledgement 'wink' is sent or not.
- The called switch activates the M-lead when the called device answers.
- The called switch then acknowledges the calling switch.
Delay Start Supervision Signalling
Delay Start is used when the switch equipment is mechanically based and therefore very slow
to respond. The sequence of events is as follows:
- On a call being made that requires the trunk, the calling switch activates the M-lead.
- The called switch activates its M-lead as an acknowledgement.
- The called switch makes the appropriate changes to its mechanical systems ready for dialling.
- The called switch deactivates its M-lead as a signal to the calling switch that it is
ready to receive digits.
- The calling switch then sends its DTMF digits on the voice pair.
- The called device answers the call.
- The called switch activates its M-lead as acknowledgement that the call has been answered.
Direct Inward Dial (DID)
DID is also known as Direct Dial In (DDI) and allows an external device to dial directly
to a PBX extension without the need for an attendant. It also allows extra lines to be added with
minimal cost. DID requires address signalling (dial pulsing or DTMF)
to be carried through to the extension phone using Wink Start, Delay Start or Immediate Start.
In addition, Loop Reverse Battery Supervision is used.
DID trunks only allow inbound calls. They also gain their battery from the local switch rather than the
CO switch. The extension numbers that require DID are configured in the CO switch which then directs
calls to these numbers on to the DID trunk rather than the normal trunk.
If the DID trunk lines are all busy then the caller will receive a busy tone even if the normal trunks
are fine. The DID calls cannot be intercepted by the attendant.
Virtual Direct Inward Dial (VDID) allows incoming calls to be handled by an Auto-Attendant and
routed to extensions based on the calling number (even if CLID is blocked).
Quality Of Voice And Echo On Analogue Circuits
Quality is affected by a number of factors. The level of power at which voice is sent and received is important. The following power
levels are good guidelines:
- Analogue voice routers should have the receive power set to around -3dB.
- Europe and North America telephones transmit at a power level of -9dB.
- Asia and South America telephones transmit at a power level of -14dB.
The power level needs
to be strong enough to ensure that the signal is audible at the remote end, but not so strong that echo results. The voice
provider can adjust power levels to analogue devices. If the signal reaches the switch and there is too much input gain applied,
then the signal can be clipped (i.e. the power level is above the range of the PCM codes)
and distorted. Problems also arise if the output gain at the remote end or the input gain locally
is too low; in this situation even DTMF tones can be missed.
Another factor that affects quality is echo.
If the delay between the original sound and the echo is greater than 30ms then this can start to
become a problem for most people. The loudness of an echo is also very important.
Two wires are used for all signalling and the voice in the local loop
(voice receive and transmit occur on the same pair), however this converts
to 4 wires for the voice signal and other wires for the signalling between switches. When the
voice is converted from two wires to four wires then there is a chance of
Electrical Echo (reflection) being created due to an impedance mismatch. Normally on long
cable runs, echo is attenuated, however when data networks sit between two analogue ends the analogue
runs are much shorter which gives less chance for echo to be attenuated.
You can also experience Acoustic Echo when using speakerphones and headsets. This is because the
loudspeaker sounds are picked up by the microphone and sent back to the caller. Causes of echo
are listed below:
- Cable length - 1ms delay per 200km
- Satellites - 250ms delay per hop
- Voice encoding techniques - 0.75 to 1.6ms delay
- Compression - 0.5 to 100ms delay
- Acoustic Coupling (hands free phones)
- Incorrect impedance of equipment
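The delay figures in the list above can be combined into a rough echo budget. This is a sketch only: the cable and satellite figures and the 30ms threshold come from the text, while the assumption that the echo delay is twice the one-way delay (out and back) and the function names are illustrative.

```python
CABLE_MS_PER_KM = 1 / 200      # 1ms per 200km, from the list above
SATELLITE_MS_PER_HOP = 250     # from the list above
ECHO_THRESHOLD_MS = 30         # delay at which echo starts to be a problem


def one_way_delay_ms(cable_km=0.0, satellite_hops=0, codec_ms=0.0):
    """Sum the one-way delay contributions for a path."""
    return (cable_km * CABLE_MS_PER_KM
            + satellite_hops * SATELLITE_MS_PER_HOP
            + codec_ms)


def echo_is_noticeable(cable_km=0.0, satellite_hops=0, codec_ms=0.0):
    # Assumption: the echo travels out and back, so its delay is twice the one-way figure.
    round_trip = 2 * one_way_delay_ms(cable_km, satellite_hops, codec_ms)
    return round_trip > ECHO_THRESHOLD_MS


print(one_way_delay_ms(cable_km=2000))
print(echo_is_noticeable(cable_km=2000))
print(echo_is_noticeable(satellite_hops=1))
```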
Echo Suppression can be implemented by suppressing voice on the return path to prevent the feedback.
This results in half-duplex voice communication where the louder conversant wins.
This causes a problem with Modem handshaking, so a tone of frequency 2025Hz is sent by the answering
modem in order to turn off the voice suppression.
A more sophisticated method of dealing with Echo is Echo Cancellation which works on the
receiving end by synthesising a replica of the echo (creating its own codebook)
and subtracting this from the actual echo. This technique allows full-duplex operation to continue.
You may notice initial echo occurring at the beginning of a conversation but then it dies away.
If echo is a problem on both ends then echo cancellation needs to be operating
on both ends.
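The idea of synthesising a replica of the echo and subtracting it can be illustrated with an adaptive filter. The sketch below uses a normalised LMS (NLMS) filter, which is one common textbook approach and not necessarily what any particular product uses. The echo path (a copy of the far-end signal two samples late at half amplitude), the tap count and the step size are all hypothetical values chosen for the demonstration.

```python
import random


def nlms_echo_canceller(far_end, mic, taps=4, mu=0.5, eps=1e-8):
    """Return (residual, weights) after adaptively subtracting an echo estimate."""
    weights = [0.0] * taps
    history = [0.0] * taps            # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate     # what is left after cancellation
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
# Hypothetical echo path: the far-end signal returns 2 samples later at half amplitude.
mic = [0.0, 0.0] + [0.5 * x for x in far_end[:-2]]

residual, weights = nlms_echo_canceller(far_end, mic)
print("largest early residual:", max(abs(e) for e in residual[:20]))
print("largest late residual: ", max(abs(e) for e in residual[-100:]))
```

The residual is largest in the first few samples and shrinks as the filter learns the echo path, which matches the initial echo that dies away at the start of a conversation.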
Refer to Digital Signalling
for detail on DS0, T1 and E1 signalling.
The summary of options for digital voice ports are as follows:
- T1 - Superframe (SF) or Extended Superframe (ESF) with line encoding AMI or B8ZS.
- E1 - CRC4 or no-CRC4 with line encoding AMI or HDB3.
- Basic Rate ISDN (BRI) - this interface can also be used for digital PBX connectivity giving
2 voice channels and a 16kbps D-channel for the Q.931 signalling.
CO switches contain D-Channel Banks which convert from analogue voice and signalling
to digital voice and signalling. A D1 Channel Bank outputs DS-1 (T1) or E1 with
the digital voice channels and signalling multiplexed. Newer channel banks have appeared giving
higher densities. The D2 Channel Bank supports 96 channels for every 72 channels
that the D1 supports. The D3 and D4 support 144 channels. Most recently
the Digital Carrier Trunk has been produced which is more manageable being smaller.
PBXs use different digital signalling systems depending on manufacturer. Signalling systems
based on CAS or CCS rely on standards and allow interoperability between voice switches.
These include ISDN BRI and ISDN PRI interfaces using Q.931 or
Q.SIG. Switch protocols that transport PBX features can be translated
when these protocols run across standard signalling systems. A point-to-multipoint topology
will require translation.
Single Frequency (SF) is a method used to convert E&M supervisory tones or dial pulses to a single
voice frequency tone. If the trunk is idle then the SF tone is present. If the trunk is seized then the
SF tone represents the dial pulses in bursts of tone.
It is not uncommon for non-standard signalling systems to be used as manufacturers aim to gain the
edge on available features. Examples include the following:
- Distributed Communications System (DCS) - not based on ISDN but T1/E1. Uses two
signalling channels. Uses HDLC framed data signalling to allow transparency of features.
- DCS+ - based on ISDN PRI with one signalling channel. Also uses HDLC framed data
- Expansion Port Network (EPN) - a circuit emulation protocol used to connect PBXs
in separate buildings creating a single logical PBX. You can use CES for this connection.
- Call Management System (CMS) - uses another signalling channel called BX.25
to perform call centre reporting at a central location for multiple sites.
- Non-Facility Associated Signalling (NFAS) - configuration which is non-standard where
one D-channel can provide signalling for up to 300 B-channels depending on the implementation.
These signalling protocols need to be transported transparently from router to router
because the routers will not understand the protocols.
- Meridian Customer Defined Network (MCDN) - based on ISDN PRI with Q.931, however
there are extensions. Used to connect Nortel PBXs and DMS CO switches and can support multiple
B channels (nB+D) similar to NFAS.
- ISDN Signalling Link (ISL) - a bit like MCDN but allows the D-channel to be any sort
of serial connection on any channel number and allows analogue B channels.
- Virtual Network Service (VNS) - this is a version of ISL where the B channels become
- DMS-100 - PRI that provides MCDN to a CO switch
- DMS-250 - PRI that provides MCDN to an IXC CO switch
- SL-100 - PRI that provides MCDN to a PBX.
- Digital Private Network Signalling System (DPNSS)
- Digital Access Signalling System (DASS#2) - uses slot 16 on an E1.
If the proprietary signalling uses one CCS signalling D channel (e.g. DCS+ and DPNSS) then you can forward the
frames transparently over HDLC, Frame Relay or ATM. If the proprietary signalling uses more than
one CCS channel (e.g. DCS) then you need to use a TDM cross-connect method over HDLC or Frame Relay.
This is where the D channels are put into a TDM group and are not restricted to channel 16, or channel
23. You can also put the D channel through ATM CES or even Serial Tunneling.
Voice Port Connections
Voice ports are used for various types of connection:
- Local Calls - these calls do not use the network, PSTN or Data.
- Private Line Automatic Ringdown (PLAR) - a hot line where going off hook automatically connects two phones.
- On-Net - calls are routed on a data network that belongs to the organisation.
- Off-Net - calls are routed on to an external PSTN network.
- PBX-to-PBX - calls are routed across private tie lines between PBXs.
- VoIP Gatekeeper to VoIP Gatekeeper calls - calls can be routed between VoIP gatekeepers in an IP Telephony infrastructure.
Packetised Voice Over Data
Voice networks have normally been separated from Data networks and therefore have incurred greater
liabilities such as the doubling up of the Wide Area Links whilst the equipment
and support costs have been high to cater for the separate networks. Packetising voice provides
opportunities to combine some or all of these elements resulting in greater efficiency.
Voice packets have mainly used ATM, Frame Relay or IP as the medium over which to travel.
A number of challenges arise when changing from a circuit-switched voice network to a packet-switched
voice network. These can be summarised as:
- Losing packets, which cause clipped sounds.
- Packet delay. The ITU states in G.114 that a fixed network delay in one direction should
not exceed 150ms. Network delay can also have a variable element to it due to the speed of the serial
lines resulting in Serialisation Delay.
- Jitter, where you can have periods of congestion as packets fill up interface queues causing the
packet delay to change from packet to packet i.e. variable delay.
- Reliability - data networks have not been able to boast the 99.999% reliability that voice networks
can. A combination of multiple servers, distributed network devices, redundant power and network links
allows the data networks to approach the 'Five nines' reliability expected of a voice network.
These challenges are dealt with in detail in Quality Of Service.
Technologies that packetise voice also provide opportunities to expand on the services that are
provided by traditional circuit-switched voice systems. In the IP environment
these new services utilise technologies
such as XML, JAVA and TAPI that aid in the integration of voice and data plus multimedia and video
technologies that enable a more complete communication experience. The devices are not limited
to just voice phones but can include web-based phones, phone software and video phones
connected via Ethernet and TCP/IP.
Setting up and controlling calls is carried out in a very different way in a packetised voice
environment. Call control can be centralised using Call Agents or distributed using
voice gateways that can handle calls and make routing decisions. IP-based protocols such as
H.323, MGCP, SGCP and SIP are used extensively in the VoIP environment for signalling and
call control. These protocols give rise to devices such as Gatekeepers (that allocate bandwidth),
Gateways (that translate between VoIP and PSTN networks), Multipoint Control Units (MCU) (that
provide a gathering point for video conferencing), and Application servers (for voice-mail
and call attendants).
Digitisation of Voice
Nyquist discovered that when human speech was being digitised it was important to sample the
analogue speech signal at more than twice its highest frequency in order for the reproduced sound to be of
reasonable quality. That is, when the digitised signal is decoded at the receiving end, the original
sound could be reproduced accurately. Take the following simple sine wave:
If we sample precisely at twice the frequency of the wave e.g. on the circles, then there is a danger of completely
missing the peaks and troughs in the sound wave and therefore resulting in a lower quality sample.
If we sample at four times the frequency i.e. on the triangles, then the dotted sample is the result
which more closely resembles the waveform.
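The effect described above can be reproduced numerically. This sketch assumes a 1000Hz sine wave and that sampling starts on a zero crossing; both are illustrative choices, as is the function name.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, count):
    """Sample a unit-amplitude sine wave starting at phase zero."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]


# Exactly twice the frequency: every sample lands on a zero crossing,
# so the peaks and troughs are missed completely.
at_2x = sample_sine(1000, 2000, 8)

# Four times the frequency: samples land on zero, peak, zero, trough.
at_4x = sample_sine(1000, 4000, 8)

print([round(s, 3) for s in at_2x])
print([round(s, 3) for s in at_4x])
```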
Because human speech has a frequency range of 300 - 3400Hz, the CCITT recommendations are
to build circuits to cater for this frequency range. A band-pass
filter is used to isolate this frequency range. Rounding the top frequency to 4000Hz gives a sample
rate of 8000 samples/sec, i.e. one sample every 125us. The sample output is a Pulse Amplitude
Modulation (PAM) signal.
There are problems with the Signal to Quantisation noise Ratio (SQR) of pure 8-bit encoded signals because the volume
(amplitude) is reduced from the original analogue signal so
the PAM signal is then Quantised where an integer code is assigned to each amplitude of each
sample. The integers come from a scale made of 8 divisions called Chords which are more
concentrated near the origin where the low level tones are, in a logarithmic way.
This means that there is less distortion of the lower tones (larger signal to noise ratio) and suits
the logarithmic nature of the human ear. A linear uniform quantisation would result in poorer sound
quality at lower amplitudes. Each chord is split into 16 equally spaced voltage divisions (0 to 7
positive and 0 to 7 negative).
Within G.711 two methods of companding (compacting and expanding) the voice signal have been developed.
These methods apply digital values to analogue signals.
Bell labs developed the U-law method of logarithmic quantisation used in North America
and Japan. U-law (or 'mew-law') tends to have a lower idle noise than A-law.
The ITU modified this in G.711 to A-law which is used throughout the rest of the world.
If one end of a trunk uses U-law and the other end uses A-law then the U-law end must
make the change to A-law. A-law has slightly better signal-to-noise ratio for low
amplitude signals than U-law.
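The logarithmic shape of U-law companding can be illustrated with the continuous formula F(x) = sgn(x) ln(1 + u|x|) / ln(1 + u) with u = 255. This is a sketch of the smooth curve only; G.711 itself uses a piecewise (chord and step) approximation of it, and the function names here are hypothetical.

```python
import math

MU = 255  # companding constant used for U-law


def mu_law_compress(x):
    """Map a linear sample in -1.0..1.0 onto the logarithmic U-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


for x in (0.01, 0.1, 0.5, 1.0):
    y = mu_law_compress(x)
    print(f"input {x:5.2f} -> compressed {y:.3f} -> expanded {mu_law_expand(y):.3f}")
```

A quiet input at 1% of full scale is mapped to well over a fifth of the output range, which is why low-level tones keep more resolution than they would under uniform quantisation.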
Quantisation Error is the difference between the quantised signal and the original analogue signal.
If each integer code is given an 8 bit binary value then 64kbps would be the required bandwidth for
digitised voice. This is called DS0. This form of digital encoding of voice is called Pulse Code
Modulation (PCM) and is defined in the ITU G.711 recommendation. With analogue telephones
PCM is carried out centrally at the switch whereas with ISDN telephones PCM is carried out locally.
Compression of Voice
Waveform coders produce a non-linear approximation of the waveform.
We have seen one form of voice compression called Pulse Code Modulation (PCM) which is a Waveform
Compression Algorithm that just looks at the waveform irrespective of the voice patterns.
Another Waveform Algorithm is Adaptive Differential Pulse Code Modulation
(ADPCM). ADPCM takes 8000 samples per second and uses for example, 4 bits for each of the
8000 samples (giving 8000 x 4 = 32kbps bandwidth requirement). This is called the
Quantisation Granularity. Using 4 bits means that there are
2^4 = 16 different bit values, compared with the 2^8 = 256 values given by 8 bits in standard PCM.
Each bit value represents a change from the value of the previous
sample, with the assumption that differences are never likely to be more than 4 bits change.
Every so often a full marker value is sent rather than just the differences from the previous sample.
Using 4 bits instead of 8 bits means that ADPCM uses 32 Kbps so gives better use of bandwidth.
The ITU designate this as compression standard G.726r32. Using 3 bits per sample is defined in
G.726r24 and uses 24 Kbps of bandwidth whereas using 2 bits per sample is defined in G.726r16 and uses
16 Kbps. There is also a G.726r40.
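The relationship between bits per sample, the number of distinct code values and the bit rate can be checked with a few lines of arithmetic. The 8000 samples per second figure comes from the text; the 5 bits per sample for G.726r40 is inferred from 40 Kbps divided by 8000 samples, and the function name is hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # samples per second, as quoted in the text


def coding_summary(bits_per_sample):
    """Return (distinct code values, bit rate in kbps) for a given sample width."""
    levels = 2 ** bits_per_sample
    kbps = SAMPLE_RATE_HZ * bits_per_sample / 1000
    return levels, kbps


for name, bits in [("G.711 PCM", 8), ("G.726r40", 5), ("G.726r32", 4),
                   ("G.726r24", 3), ("G.726r16", 2)]:
    levels, kbps = coding_summary(bits)
    print(f"{name}: {bits} bits, {levels} values, {kbps:g} kbps")
```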
The encoding delay is typically less than 1ms which makes ADPCM very attractive, particularly
in environments where there is Tandem Switching. The 'Adaptive' in ADPCM refers to the fact that
the quantisation granularity changes automatically depending on the Signal-to-Quantising Noise
Ratio (SQR).
Adaptive Differential Pulse Code Modulation (ADPCM) dynamically reduces how many bits are used
for sampling as the network becomes more congested, 40 Kbps -> 32 Kbps -> 24 Kbps ->
16 Kbps. ADPCM gives very little delay (typically less than 1ms) even when conversion occurs to PCM
and back to ADPCM.
Linear Predictive Coding (LPC) is an example of a vocoder. A vocoder synthesises the voice.
This synthesis results in a voice that lacks in emotion and it is therefore difficult to identify
the speaker. Compression can end up with a stream as low as 2.4kbps and the stream is typically
Another form of compression is that provided by Multipulse Maximum Likelihood Quantisation
(MPMLQ). Defined by G.723 this uses an algorithm that looks ahead and
requires a bandwidth of 5.3kbps (G.723r53) or 6.3kbps (G.723r63).
G.723 is used for video and requires up to 30 MIPS processing power.
A hybrid compression form uses Source Compression and takes the voice signals into
account when compressing i.e. they perform voice modelling using fourier analysis.
Hybrid coding comes under the broad spectrum of Analysis by Synthesis
(AbS) coding where analysis is continually performed on the speech and the algorithm attempts
to predict the waveform in the near future (around 5ms). This occurs via a feedback loop and adds
a small delay of about 5ms to the voice path.
The most common form of this is Code Excited Linear Predictive (CELP) or
Algebraic Code Excited Linear Prediction (ACELP). This
can provide high quality voice reproduction at low bit rates. With CELP
voice signals are compressed as follows:
- The 8-bit PCM signal is converted to a 16-bit linear PCM sample.
- The speech is analysed and compressed with a vector quantiser.
- A Vector Quantiser Codebook is used to learn and predict the voice waveform. The codebook is a
collection of human voice waveforms called Diphones that make up speech. The codebook
has an index typically of 1024 entries (represented by 10 bits). There is also a gain value
made up of 5 bits. This controls the power.
- The coder is initiated (or 'excited') by white noise, the code assigned to each sound is the
index of that sound within the codebook.
- The resultant code, or index, is sent to the far end for decoding back into the voice waveform
using the code as an index and looking the sound up in the same codebook at the other end.
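The codebook steps above amount to a nearest-match search at the sender and a table lookup at the receiver. The sketch below shows only that index exchange; it leaves out the gain value, the excitation and the linear prediction filter, and the four-entry codebook of three-sample shapes is hypothetical (the text quotes around 1024 entries).

```python
# Hypothetical codebook shared by both ends; a real one has about 1024 entries.
CODEBOOK = [
    (0.0, 0.0, 0.0),
    (0.5, 1.0, 0.5),
    (-0.5, -1.0, -0.5),
    (1.0, 0.0, -1.0),
]


def squared_error(entry, frame):
    """Sum of squared differences between a codebook entry and a frame."""
    return sum((a - b) ** 2 for a, b in zip(entry, frame))


def encode(frame):
    """Return the index of the codebook entry closest to the frame."""
    return min(range(len(CODEBOOK)),
               key=lambda i: squared_error(CODEBOOK[i], frame))


def decode(index):
    """Look the index up in the same codebook at the far end."""
    return CODEBOOK[index]


frame = (0.4, 0.9, 0.6)
index = encode(frame)
print("index sent:", index, "decoded shape:", decode(index))
```

Only the index crosses the network, which is why the bit rate is so much lower than sending the samples themselves.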
The CELP standard produces voice at 4.8kbps.
One version of CELP is called Low Delay Codebook Excited Linear Predictive (LDCELP)
and is defined by G.728. LDCELP uses a small codebook and operates at 16 Kbps. There
is no lookup, which minimises
the delay to between 2 and 5ms, hence 'Low Delay'. A 10-bit codeword is assigned to every block
of 5 speech samples. Four codewords are grouped together into a sub-frame which takes 2.5ms
to encode, and two sub-frames are transmitted at a time, i.e. 5ms per pair.
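The LDCELP figures quoted above are consistent with each other, as this short calculation shows. All the inputs come from the text; only the variable names are invented.

```python
SAMPLE_RATE_HZ = 8000          # samples per second
SAMPLES_PER_CODEWORD = 5       # one codeword covers 5 speech samples
BITS_PER_CODEWORD = 10
CODEWORDS_PER_SUBFRAME = 4
SUBFRAMES_PER_TRANSMISSION = 2

codewords_per_second = SAMPLE_RATE_HZ / SAMPLES_PER_CODEWORD
bit_rate_bps = codewords_per_second * BITS_PER_CODEWORD
subframe_ms = (CODEWORDS_PER_SUBFRAME * SAMPLES_PER_CODEWORD
               * 1000 / SAMPLE_RATE_HZ)
transmission_ms = subframe_ms * SUBFRAMES_PER_TRANSMISSION

print("bit rate (bps):", bit_rate_bps)
print("sub-frame (ms):", subframe_ms)
print("pair of sub-frames (ms):", transmission_ms)
```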
Another version of CELP is Conjugate Structure Algebraic CELP (CS-ACELP) and is defined in
G.729. CS-ACELP has almost the same perceived level of quality as PCM and is at least as good
as ADPCM at 32kbps. CS-ACELP
operates at 8 Kbps of bandwidth and works by using sound pattern matching against multiple PCM bytes; 80-byte
frames take 10ms to translate. CS-ACELP performs a 5ms look ahead to predict the next wave
pattern and it also reduces noise and does pitch-synthesis filtering. G.729 is able to model nuances
and accents in human speech but requires about 20 MIPS of processing power.
G.729 has two variants. Annex A (G.729A) is less processor intensive (requires about 11 MIPS)
and allows double the number
of calls as plain G.729. Annex B (G.729B) adds Voice Activity Detection (VAD) and
Comfort Noise Generation (CNG) which work together to reduce bandwidth used.
You can combine Annex B with G.729A to give G.729AB. The G.729 variants can generally interoperate
with each other.
The bandwidths used by these algorithms that we have talked about are just the actual data bandwidths
and do not take into account the packet headers of the protocols being used to carry the data.
For instance, if you are using G.711 over Frame Relay, then you need to take into account the
Frame Relay header (2 bytes), the FRF.11 header (3 bytes), the Flag (1 byte) and the FCS (2 bytes).
The required bandwidth is calculated by codec bandwidth x (payload + overhead)/payload.
For G.711 and a 20 byte payload this gives us 64 x (20 + 8)/20 = 90kbps. If G.729 is used instead
then the same calculation gives us 8 x (20 + 8)/20 = 11.2kbps. If the payload increases to say 100 bytes
for a G.729 call then the calculation gives us 8 x (100 + 8)/100 = 8.64kbps. You can see that the
greater the payload, the less bandwidth is required. The default payload size for G.729 is normally
30 bytes for a voice packet. For G.711 the default payload size is 240 bytes.
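The formula above is easy to turn into a small helper for comparing payload sizes. The Frame Relay overhead of 8 bytes is the sum of the header sizes listed in the text; the function name is hypothetical.

```python
FRAME_RELAY_OVERHEAD_BYTES = 2 + 3 + 1 + 2  # FR header, FRF.11 header, Flag, FCS


def required_kbps(codec_kbps, payload_bytes,
                  overhead_bytes=FRAME_RELAY_OVERHEAD_BYTES):
    """codec bandwidth x (payload + overhead) / payload"""
    return codec_kbps * (payload_bytes + overhead_bytes) / payload_bytes


for codec, kbps, payload in [("G.711", 64, 20), ("G.729", 8, 20),
                             ("G.729", 8, 30), ("G.729", 8, 100)]:
    print(f"{codec} with {payload}-byte payload: "
          f"{required_kbps(kbps, payload):.2f} kbps")
```

Larger payloads spread the fixed overhead over more voice data, at the cost of a longer packetisation delay.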
ATM cells have greater overhead because of the reduced size of 53 bytes. As well as the 5 byte header
there is also the 8 byte AAL5 trailer. In addition the ATM Forum has adopted the FRF.11 header in the
form of the VoX header, which takes up a further 3 bytes, leaving 37 bytes for the voice data. If the
default G.729 payload is used then 7 bytes are wasted as padding, so the overhead per cell is
5 + 8 + 3 + 7 = 23 bytes. A calculation using
these figures gives us a required bandwidth of 8 x (30 + 23)/30 = 14.1kbps, this compares with
8 x (30 + 8)/30 = 10.1kbps when G.729 is used in Frame Relay. Using G.711 over ATM is more problematic
because the default payload size of 240 bytes does not fit into one cell and so has to be spread over
a number of cells.
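The cell arithmetic can be sketched as follows. It takes the text's figure of 37 bytes of voice data per cell at face value and assumes, for illustration only, that every cell of a multi-cell payload carries the same per-cell overhead; the function names are hypothetical.

```python
import math

CELL_BYTES = 53
VOICE_BYTES_PER_CELL = CELL_BYTES - 5 - 8 - 3   # header, AAL5 trailer, VoX header


def cells_needed(payload_bytes):
    """Number of cells required to carry one voice payload."""
    return math.ceil(payload_bytes / VOICE_BYTES_PER_CELL)


def padding_bytes(payload_bytes):
    """Unused voice bytes in the final cell."""
    return cells_needed(payload_bytes) * VOICE_BYTES_PER_CELL - payload_bytes


for codec, payload in [("G.729", 30), ("G.711", 240)]:
    print(f"{codec}: {payload}-byte payload needs {cells_needed(payload)} cell(s), "
          f"{padding_bytes(payload)} bytes of padding")
```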
This table lists the codecs and their respective speeds and bandwidth requirements for given
[Table not recovered. Column headings: Sample Size (bytes) | Frame Relay (bps) | Frame Relay with cRTP (bps) | Ethernet with cRTP (bps)]
Human speech uses a bandwidth of 100Hz to 10000Hz (if you include harmonics)
with most of the speech occurring between 100Hz and 3000Hz. The more bandwidth that is allocated
to cater for human speech the more faithful is the sound to the original, this is called Fidelity.
Human speech quality is also affected by Echo, Delay and Jitter (Delay variation).
Jitter is often a symptom of voice over data networks.
A subjective test used by the ITU for assessing the quality of the sound is the Mean Opinion Score (MOS).
The MOS is a statistical measurement of voice quality based on human opinion of a certain spoken
sentence. In English the sentence used is "Nowadays, a chicken leg is a rare dish".
The Ratings are as follows:
- 1 - Unsatisfactory - Very annoying distortion which is objectionable
- 2 - Poor - Annoying distortion but not objectionable
- 3 - Fair - Perceptible distortion that is slightly annoying
- 4 - Good - Slight perceptible level of distortion but not annoying
- 5 - Excellent - Imperceptible level of distortion
The following table gives examples of comparative scores regarding the different types of compression:
|ADPCM 32K G.726
|ADPCM 24K G.726
|ADPCM 16K G.726
|G.723.1 MPMLQ (6.3kbps)
|G.723.1 ACELP (5.3kbps)
A score of 4.0 is considered to be toll quality.
These scores are reassessed regularly and change with time. One thing to bear in mind is that delay is
not taken into account with the MOS.
The following table gives examples of comparative MOS scores for G.729 under different conditions:
|Low input level
|Two tandem hops
|Three tandem hops
|5% bit error rate
|5% frame error rate
The ITU has a number of voice quality standards, these are:
- G.111 - Overall Loudness Rating (OLR)
- G.113 - Quantisation Distortion Requirements for the International Calculated
Planning Impairment Factor (ICPIF) which is a Total Impairment Value (Itot)
that is the sum of the following:
- Io - not good enough OLR or high circuit noise
- Iq - PCM quantisation distortion
- Idte - talker echo
- Idd - long one-way transmission times
- Ie - distortion from special equipment such as low bit-rate decoders.
- The following guidelines are recommended:
- 5 - Very good
- 10 - Good
- 20 - Adequate
- 30 - Limiting Case
- 45 - Exceptional limiting case
- 55 - Customers likely to react strongly
- G.114 - end-to-end delay recommendations
- 1-150ms - suitable for voice
- 150-250ms - starts to affect voice quality
- 250-400ms - can be annoying (satellite delay can be as much as 500ms one-way, hence why
VoIP on a satellite link is not feasible in an interactive way)
- >400ms - unacceptable
- G.131 - if the one-way delay exceeds 25ms then echo cancellers must be used.
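The G.114 delay bands and the G.131 echo canceller threshold listed above can be expressed as a simple lookup. The band boundaries come from the list; which band an exact boundary value falls into is an assumption, and the function names are hypothetical.

```python
def classify_one_way_delay(delay_ms):
    """Map a one-way delay onto the G.114 bands listed in the text."""
    if delay_ms < 0:
        raise ValueError("delay cannot be negative")
    if delay_ms <= 150:
        return "suitable for voice"
    if delay_ms <= 250:
        return "starts to affect voice quality"
    if delay_ms <= 400:
        return "can be annoying"
    return "unacceptable"


def needs_echo_canceller(delay_ms):
    """G.131 guideline from the text: cancel echo when one-way delay exceeds 25ms."""
    return delay_ms > 25


for ms in (20, 120, 200, 300, 500):
    print(ms, classify_one_way_delay(ms), needs_echo_canceller(ms))
```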
A more objective voice quality measurement exists called Perceptual Speech Quality Measurement
(PSQM) and was originally defined by the ITU in the standard P.861. PSQM uses a rating
scale of 0 to 6.5 and is sometimes mapped to the MOS rating scale of 1
to 5. The test equipment implements PSQM by comparing the transmitted speech to the
original input in real time. Accuracy is rated at more than 90% compared with MOS. This information
can be linked to SNMP-based management systems.
BT also developed a voice quality measurement algorithm called Perceptual Analysis Measurement
System (PAMS) that is used to predict the effect on voice quality measurement scores when
various waveform codecs, languages etc. are used. The ITU has now combined this PAMS with PSQM
to form an updated standard P.862 that can give a more objective prediction of subjective
Silence Suppression and Comfort Noise
On average only 20-30% of the time in a conversation is actually used for talking, the rest is silence.
Rather than keep transmitting the silence as normally happens, it is more bandwidth efficient to stop
transmitting; sometimes up to 35% of the bandwidth can be saved.
This is known as Silence Activity Detection (SAD) or conversely Voice Activity
Detection (VAD). A packet called a Silence Indicator (SID) is sent to notify the other
end that the voice activity power level has dropped below a certain threshold e.g. -50dBm.
VAD requires a 5ms look-ahead buffer so this adds delay to the voice path. You would look to use
VAD only on WAN circuits.
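The threshold test at the heart of VAD can be sketched as an energy check on a block of samples. This is a simplification: it measures the RMS level in dB relative to full scale rather than in dBm, it ignores the look-ahead buffer and any hangover timing, and the function names and the default threshold of -50 are illustrative.

```python
import math


def level_db(samples):
    """RMS level of a block in dB relative to full scale (samples in -1.0..1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")      # digital silence
    return 10 * math.log10(mean_square)


def is_speech(samples, threshold_db=-50.0):
    """True if the block is loud enough to be worth transmitting."""
    return level_db(samples) >= threshold_db


loud_block = [0.5, -0.5] * 40        # roughly -6 dB
quiet_block = [0.001, -0.001] * 40   # roughly -60 dB
print(is_speech(loud_block), is_speech(quiet_block))
```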
There is an issue with VAD in that pure silence is off-putting to the users so
techniques are employed to introduce white (or pink) noise locally to simulate this Background
Related to silence is the concept of Sidetone. This just plays the speaker's voice through
the earpiece locally so that the speaker does not think that there is a faulty handset.
The Group 3 Fax or the Modem is designed to run on the analogue network even though it operates digitally
internally. No silence suppression or compression can be applied and even though the Fax typically
just uses 9.6Kbps, a whole 64Kbps channel is used. This is because the analogue signal is continuous and no silence suppression can be
used, nor compression as you cannot lose any of the digital information. Faxes and Modems use a 2100Hz tone to identify themselves to the
switch. The standard analogue Fax protocol is T.30.
Fax speeds include the following:
- Single Frequency Tone
- V.21 - 300bps
- V.27ter - 2400bps and 4800bps
- V.29 - 7200bps and 9600bps - requires 9600bps demodulated or 40kbps ADPCM.
- V.33 - 12000bps and 14400bps
- V.17 - 7200bps and 9600bps
Traditionally, fax machines have differed in their facilities offered and T.30 compatibility.
In addition, their tolerance of packet delays and receive errors is low because fax machines use
synchronous modems which have no built in flow control. If a calling fax does not receive a response
from a receiving fax within normally 3 seconds, the whole message is transmitted again. Proprietary
local spoofing techniques can ease this issue of delay that can be incurred between fax machines
over great distances.
When running Fax over a VoIP network, the ITU standard T.38 is used. A DSP in a T.38 Gateway detects
the Fax tones and operates Fax Relay. On detection of the analogue Fax signal a normal VoIP call
is set up; this is defined by T.38, which is T.30 over IP. This DSP then converts the analogue signal
coming from the Fax machine to a digital bit stream.
This bitstream is sent within VoIP packets at the speed 9.6kbps and these packets are tagged as Fax
VoIP packets. This saves bandwidth compared with the 64kbps normally taken up by the Fax call when it is
traditionally converted to PCM.
The T.38 gateway at the other end detects the tagged Fax-VoIP packets and the DSP there converts the
bit stream back to an analogue signal that will be received by the remote Fax machine. T.38 allows
you to direct bitstreams to PCs containing T.38-compliant Fax software, thereby allowing greater flexibility.
If the delay is large on a path, rather than lose fax relay packets it is a good idea to increase
the buffer size to several hundred milliseconds because real time interaction is not important.
It is also possible to use FRF.11 to send Fax Relay over Frame Relay.
Fax Store and Forward
ITU's T.37 standard allows you to send faxes by converting them into E-mail attachments. These attachments are TIFF files of the
faxes themselves. An On-Ramp gateway performs the conversion to E-mail and attachment. This E-mail is stored and routed
by SMTP servers throughout the IP network to end up at an Off-Ramp gateway where conversion back to Group 3 fax is
performed. Mechanisms within Extended Simple Mail Transfer Protocol (ESMTP) provide extra features such as delivery confirmation
If T.30 fax data is NOT compressed or demodulated i.e. just a G.711 PCM 64kbps channel that is transported without any Voice Activity
Detection (VAD), then two faxes can talk to each other directly over the VoIP network.
Modem Passthrough operates in the same way as Fax Passthrough: when just a G.711 PCM 64kbps channel is transported without any
Voice Activity Detection (VAD), two modems can talk to each other directly over the VoIP network.
The modem analogue signals are converted into digital format by a gateway, and these signals are transported using
Simple Packet Relay Transport (SPRT) which uses UDP across the IP network to a remote gateway. The remote gateway
converts the signals back to analogue and forwards the signal on to the remote modem.
When designing a voice network it is necessary to size trunks and equipment ports to suit. In order
to do this you need to gather information and statistics from the PSTN carriers, the Call Detail Records
(CDR) in the PBX and telephone bills. The PSTN can give statistics on the number of calls offered,
the number of abandoned calls and when all the trunks are busy; these are called Peg counts.
The PSTN can also give the Grade of Service (GoS) rating for a trunk group. The GoS is a
measure of the probability that a call is blocked, for instance one call out of 100 being blocked
is given by P(.01) and one call blocked out of 1000 is given by P(.001). This
probability applies to the busiest period of the day.
The PSTN can also provide the
total amount of traffic carried per trunk. The number of trunks needed for the voice traffic
in a particular location is based on peak daily traffic. A carrier will provide the number of calls
carried but will not give the number of calls offered i.e. attempted. Only the local PBX can
tell you how many calls were offered and therefore how many calls failed.
If the voice traffic is to run over a data network, you also have to take into account the statistics
provided by SNMP management stations, network analysers and router interface statistics. You need
to ensure that data delay and throughput is not impaired as well as the GoS for the voice traffic.
If the data peak demand occurs at similar times throughout the day to the voice peak demands, then
this has to be taken into account when designing the voice network.
The offered traffic load (A) is made up of the product of the number of originated calls
in an hour (C) and the average holding time for a call (T) i.e. A = C x T.
The average holding time is not just the average time that a call takes but includes the call
set up and tear down as well as incomplete calls. This is normally calculated by taking the
average call length and adding up to 16% to it. Quite often billing records round up the duration of
a call to the next minute rather than the nearest minute. This means that they are overstated
by an average of 30 seconds each call. For traffic calculations, if you are using the billing
records you need to factor in a reduction: multiply the number of calls by 0.5 minute to
obtain the number of excess minutes and subtract these from the billed total.
The concept of the Busy Hour is used to represent the number of call attempts during the
busiest hour that the organisation experiences on its telephone network. If you have access to
the CDR records then to work out the busiest hour, take the 10 busiest days in a year, sum the traffic on
an hourly basis, find the busiest hour and work out the average amount of time a call takes.
If you do not have access to a year's worth of traffic information then you could take a month's
worth (about 22 working days) determine an average day's worth and multiply that by 17%. The reason for doing
this is that busy hour traffic represents about 17% of all the traffic that occurs in one day.
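As a rough illustration of the month-based estimate above, the following Python sketch averages a month's call minutes over 22 working days and takes 17% of that as the busy hour figure. The function name and the sample monthly total are assumptions made purely for illustration.

```python
def busy_hour_minutes(monthly_call_minutes, working_days=22, busy_hour_share=0.17):
    """Estimate busy hour call minutes from one month of call minutes.

    working_days and busy_hour_share follow the rule of thumb in the text
    (about 22 working days, busy hour is about 17% of a day's traffic).
    """
    average_day = monthly_call_minutes / working_days
    return average_day * busy_hour_share


# Hypothetical monthly total taken from billing or CDR data.
estimate = busy_hour_minutes(132000)
print("Estimated busy hour call minutes:", round(estimate))
```

Dividing the result by 60 converts the busy hour call minutes into erlangs.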
The next thing to calculate is the amount of traffic a trunk can handle in an hour; normally we calculate
this for the Busy Hour. This traffic volume is measured in Erlangs, a measurement
which is dimensionless. See https://www.erlang.com/what-is-an-erlang/
For example, if each user in an organisation of 100 makes 12 calls in the
busy hour with an average duration of 6 minutes per call, then the offered traffic load (A) is given
by C x T which is 12 x 100 x 6 giving 7200 minutes. Because an Erlang is based on an hour, this
then gives us a value for the Busy Hour of 7200 / 60 = 120 erlangs.
An Erlang is sometimes equated to 60 call minutes (3600 call seconds or 36 centum call seconds, CCS).
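The arithmetic in the example above can be reproduced with a short Python sketch. The function names are hypothetical; the figures (100 users, 12 calls each, 6 minutes per call, 36 CCS per erlang) are the ones used in the text.

```python
def offered_load_erlangs(calls_in_busy_hour, avg_holding_minutes):
    """A = C x T, expressed in erlangs (call minutes divided by 60)."""
    return calls_in_busy_hour * avg_holding_minutes / 60.0


def erlangs_to_ccs(erlangs):
    """One erlang equals 36 centum call seconds (CCS)."""
    return erlangs * 36.0


# 100 users each making 12 calls of 6 minutes in the busy hour.
load = offered_load_erlangs(100 * 12, 6)
print("Busy hour load in erlangs:", load)
print("Busy hour load in CCS:", erlangs_to_ccs(load))
```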
Erlang models are:
- Erlang B - if overflow paths exist when trunks are busy, DID trunks are required to allow
PSTN rerouting (there are more people than calls). This is used most of the time.
- Erlang B Extended - when no alternative path exists the caller hears a busy signal.
- Erlang C - used in call centres where there are more calls than people and calls are
placed on hold if no bandwidth is available.
When traffic engineering your aim is to maintain or exceed the GoS. To do this you need to work out
how many trunks you will need now that you know the erlangs in a busy hour. This requires a look
at three areas:
- Possible sources of calls - The more possible sources of calls exist, the wider the
distribution in arrival times and call duration times.
- Arrival characteristics - calls that come from independent sources are close to random
in their arrival characteristics; the more there are, the more closely they follow the
Poisson Distribution (Bell curve), where the peak probability of a certain number of calls being
made in the busy hour is represented by the peak of the curve. Smooth traffic patterns are not
random and are due to reliance on other applications (call centres, tele-marketing etc.). The
Poisson distribution is therefore not suitable. Similarly, Bursty traffic is not random either.
- Handling Lost Calls - these can be dealt with in three different ways:
- Lost Calls Cleared (LCC) - if the system is busy the call is cleared, this
underestimates the number of trunks required.
- Lost Calls Held (LCH) - even if the call fails to connect, the assumption is made
that the call is active and the call is redialled continuously during the average call hold time.
This overestimates the number of trunks required.
- Lost Calls Delayed (LCD) - the call is placed in a queue until the system can deal with it.
The complexity of traffic engineering necessitates the use of erlang tables or calculators
to work out the number of trunks required, given that you know the volume of traffic in erlangs
and the target GoS. The most common table used is Erlang B, which uses the Poisson Distribution,
assumes an infinite number of sources and uses LCC for lost call assumptions. When you have multiple sites and multiple
trunks between those sites, it is often necessary to create a Call Density Matrix that
has branch-to-branch and branch-to-HQ entries for the busy hour call minutes. You can use
this matrix to work out the erlangs on a site-to-site basis.
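The lookup that an Erlang B table performs can be sketched in Python. This is a minimal sketch using the standard recursive form of the Erlang B formula; the function names are hypothetical, and any real design should still be checked against published tables or a trusted calculator.

```python
def erlang_b(traffic_erlangs, trunks):
    """Blocking probability for the given offered load and trunk count.

    Uses the recursion B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)),
    which avoids the large factorials of the closed-form expression.
    """
    blocking = 1.0
    for n in range(1, trunks + 1):
        blocking = traffic_erlangs * blocking / (n + traffic_erlangs * blocking)
    return blocking


def trunks_for_gos(traffic_erlangs, target_gos):
    """Smallest number of trunks whose blocking does not exceed the GoS."""
    trunks = 0
    while erlang_b(traffic_erlangs, trunks) > target_gos:
        trunks += 1
    return trunks


# The 120 erlang busy hour from the earlier example at a GoS of P(.01).
needed = trunks_for_gos(120, 0.01)
print("Trunks needed:", needed, "blocking:", erlang_b(120, needed))
```

The same two functions can be applied to each cell of a Call Density Matrix once the busy hour call minutes have been converted to erlangs.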
When calculating trunk sizes for a VoIP network you need to find out how much data bandwidth each
call will take. This will depend on the codec and sample size being used. The earlier table
gives an idea of bandwidth used on a per call basis. Multiplying the appropriate bandwidth by the
number of calls allows you to work out the trunk size.
An Erlang is the continuous use of one trunk for an hour, and trunk sizing is designed around the busy hour.
If we JUST look at this, however, then most of the time the system is over-specified; therefore aim to accept
a small percentage of the calls being blocked.
Use Erlang and data rate conversion tables for VoIP, VoFR and VoATM to calculate bandwidth. Other
factors affecting bandwidth usage include Voice Activity Detection (VAD), Music on Hold (MOH) and
the RTCP stream. It is therefore a good idea to add a little extra when sizing bandwidth requirements.
Voice Over Frame Relay (VoFR)
VoFR allows you to run voice and data over the same WAN infrastructure which has management and cost
benefits, plus the frame header overhead is low. It can be used to replace a tie line
with a PVC (maintaining PBX features) or to provide an Off Premises Extension (OPX) to a PBX
via a router.
In order for voice to run over Frame Relay, fragmentation of the data frames needs to occur to allow
steady voice traffic. This fragmentation can be a proprietary format, end-to-end FRF.12 or FRF.11
annex C. For QoS on slow links all DLCIs on an interface must be fragmented.
FRF.12 is useful when PVCs are sharing the same physical link or when VoIP is being used over the Frame
Relay. The fragmentation header is omitted on frames less than the fragment size so just the largest
frames (those larger than the fragmentation threshold) are fragmented. FRF.12 has no knowledge of what
is in the frame whether data or VoIP, so both get fragmented.
In FRF.11 annex C, VoFR frames are all fragmented and all packets no matter the size
contain the fragmentation header. FRF.11 is therefore used just for Voice over Frame Relay fragmentation
over one DLCI.
If you want to centrally control billing and administration then you can set up a hub-spoke Frame
Relay WAN where the central HQ is the hub and tandem switching occurs for calls between spoke sites.
When using the WAN links we need to convert to a more efficient codec; in this example we have
used G.729. This gains us the benefit of bandwidth savings. There is a problem however.
Take the example where a call is made from site B to site C:
- The call initiates as a G.711 call via the PBX.
- At the router the conversion to G.729 is made and the call is routed
over to the HQ for central billing.
- The router at the HQ converts the call back to G.711 so that
the PBX can manage and route the call.
- The PBX realises that the call is destined for site C
so pushes it out to the HQ router where it is again converted to G.729 encoding.
- On arrival at the site C router the call is converted back to G.711 where it is finally sent
to the recipient off the PBX there.
This is called Tromboning where several compressions and decompressions occur within one call.
This then adds delay and deteriorates the quality of the call. One way around this is for the PBX
to be able to understand the codec, or another way is for the router at the HQ to reroute the call
to site C without troubling the PBX at the HQ.
If the routers have the ability to operate dial plans, then routing of calls based on
the dialled number could be carried
out at the router. Tandem switching could therefore be eliminated altogether since the Frame Relay
cloud ends up acting as a large virtual voice switch.
To calculate the voice payload size we use the formula Payload (bytes) = sample size (ms)
x data rate(kbps)/8.
If you were running G.711 over Frame Relay then a 20ms voice sample would have a payload size
of 20ms x 64kbps/8 = 160bytes.
The actual Frame Relay frame size is therefore 167 bytes because we need to include the 7 byte
Frame Relay header (including FRF.11, sequence number and CRC). Remember that we were looking
at a 20ms sound sample, so for one second of speech there will be 167 x 1000/20 x 8 = 66.8 kilobits
per second bandwidth being used.
Performing the same calculation for G.729 at 8kbps with the same 20ms sample size gives a bandwidth
usage of 10.8kbps.
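The two calculations above follow the same pattern, which can be written as a small Python sketch. The function names are hypothetical; the 7 byte header is the Frame Relay overhead quoted in the text.

```python
def payload_bytes(sample_ms, codec_kbps):
    """Payload (bytes) = sample size (ms) x data rate (kbps) / 8."""
    return sample_ms * codec_kbps / 8


def vofr_bandwidth_kbps(sample_ms, codec_kbps, header_bytes=7):
    """Per-call bandwidth for voice frames carried over Frame Relay."""
    frame_bytes = payload_bytes(sample_ms, codec_kbps) + header_bytes
    frames_per_second = 1000 / sample_ms
    return frame_bytes * 8 * frames_per_second / 1000


print("G.711, 20ms samples:", vofr_bandwidth_kbps(20, 64), "kbps")
print("G.729, 20ms samples:", vofr_bandwidth_kbps(20, 8), "kbps")
```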
Voice Over ATM (VoATM)
AAL5 is frequently used for data due to the fact that all 48 bytes are available for the
payload. If we took a typical 20ms sample of voice and encoded it in G.729, then we would end up with
a payload of 20 bytes. Because of the fixed cell size of ATM the remaining 28 bytes of the payload
would be padded out. This would mean that for every 20ms sample there would be 20 bytes of data
and 28 bytes of overhead. Given that the cell header is 5 bytes resulting in a 53 byte cell
for each 20ms voice sample this produces a bandwidth requirement of 53 x 8 x 1000/20 = 21.2kbps
for each call. This could be considered inefficient because of the 28 bytes of padding. Provided
that the delay budget allows it, you could increase the sample size to say 30ms or more to reduce
the wasted bandwidth from the padding bytes. Even so, Frame Relay is more efficient. For good quality
voice it is good to stick to 20ms samples (50 packets per second).
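The effect of the fixed cell size can be explored with a short Python sketch. It follows the simplified arithmetic used above (48 bytes of payload per 53 byte cell, with no allowance for the AAL5 trailer), and the function names are hypothetical.

```python
import math

CELL_BYTES = 53          # 5 byte header plus 48 byte payload
CELL_PAYLOAD_BYTES = 48


def cells_per_sample(sample_ms, codec_kbps):
    """Number of cells needed to carry one voice sample."""
    payload = sample_ms * codec_kbps / 8
    return math.ceil(payload / CELL_PAYLOAD_BYTES)


def padding_bytes(sample_ms, codec_kbps):
    """Payload bytes wasted on padding for each voice sample."""
    payload = sample_ms * codec_kbps / 8
    return cells_per_sample(sample_ms, codec_kbps) * CELL_PAYLOAD_BYTES - payload


def voatm_bandwidth_kbps(sample_ms, codec_kbps):
    """Per-call bandwidth when each sample is sent in whole cells."""
    cells = cells_per_sample(sample_ms, codec_kbps)
    return cells * CELL_BYTES * 8 * (1000 / sample_ms) / 1000


for sample_ms in (20, 30):
    print(sample_ms, "ms G.729:",
          round(voatm_bandwidth_kbps(sample_ms, 8), 1), "kbps,",
          padding_bytes(sample_ms, 8), "bytes of padding")
```

Moving from 20ms to 30ms samples keeps each sample within one cell while cutting the padding, which is the trade-off against delay described above.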
You can use Circuit Emulation Services (CES) (which uses AAL1) to replace a leased line
between PBXs. The TDM format is converted to ATM cells and the PCM stream is placed into the
cells without being compressed so no DSPs are used and the delay is very low. There is no
internal echo cancellation so this may have to be added externally. With multiple sites
attaching to an HQ you would need to run the hub site PBX as a tandem switch because there
is no opportunity for routers to re-route calls based on dialled digits.
Unstructured CES takes the unmodified clear channel T1/E1 data stream across emulating the whole
E1/T1 interface. A voice channel fills the whole payload of the cell. This is good
for equipment that uses proprietary framing. Structured CES
maintains the channelised/fractional T1/E1 DS-0 information and allows you to have multiple
voice channels within the payload. TDM devices can then be removed.
For low speed links (<768kbps) you really need to fragment and interleave the larger packets in order to
prevent delays to the voice traffic on the interface. AAL5 does not support LFI so we either
have to have separate PVCs for voice (and have it contracted at VBR-rt)
or we employ MLP over ATM which provides LFI
for low speed links. Many routers only support one instance of SAR at a time so having multiple
PVCs is not going to help here. Routers such as Cisco's IGX or Nortel's Passport have ATM
backplanes that deal with this issue nicely.
When deploying MLP over ATM, ideally you want the fragments to fit into an exact number of cells to
ensure the greatest use of the payload when using AAL5. When making the fragment size calculations
it is worth bearing in mind that the AAL5 overhead is 8 bytes whilst the MLP over ATM overhead is
If a WAN network is implementing interworking between Frame Relay and ATM using FRF.5 then there
are likely to be quite large delays emanating from the interworking switches. This makes it
unsuitable for voice traffic.
Voice Over IP (VoIP)
VoIP is fast becoming the data method for voice packet transport. IP is more flexible than either
ATM or Frame Relay, not only because of the quicker re-routing and resilient capabilities, but also
because of the extra features that can be bolted on to the IP environment to exponentially increase
the number of applications that the VoIP environment can utilise. VoIP has some quality issues
that are different from traditional voice, these include Jitter, packet loss and queuing problems
when small voice packets compete with large data packets. These issues are dealt with in detail
in Quality of Service.
Real-Time Transport Protocol (RTP)
In a Voice over IP environment Real-Time Transport Protocol is an Internet standard
used to transport real-time voice data. TCP is used for the H.323 signalling protocols, and UDP
for SIP and MGCP. RTP uses UDP for transport because if packets get lost, there is no point in re-sending the data.
This diagram illustrates the RTP header:
- Version - currently at version 2
- Padding - indicates if padding bits have been added to the end
- Extension - indicates that a header extension has been included
- Contributing Source Count - this is the number of Contributing Source identifiers
in the Contributing Source field.
- Payload Type - the codec being used
- Sequence - the first one is randomly generated and this number indicates if a packet
has been lost
- Time Stamp - the time stamp of the first octet of data, the first one being randomly generated
- Synchronisation Source - this is a random number that is used to identify a particular
data stream when multiplexing data streams.
- Contributing Source - if multiple streams are multiplexed, then the source stream numbers
are listed here. There could be no streams at all or up to 15, therefore you could have up to
15 x 32-bit numbers here.
The RTP header is 12 bytes in length (not including the Contributing Source stream list which
could add another 60 bytes made up of 4 bytes x 15)
and follows the 8-byte UDP header and the 20-byte IP header.
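To make the field layout concrete, here is a minimal Python sketch that unpacks the fixed 12 byte RTP header and any Contributing Source entries from a packet. The function name and the sample packet are invented for illustration; the sketch also extracts the one-bit marker flag that sits in front of the Payload Type in the standard header, and it does not attempt to handle header extensions.

```python
import struct


def parse_rtp_header(packet):
    """Unpack the fixed RTP header and the optional CSRC list."""
    if len(packet) < 12:
        raise ValueError("RTP header needs at least 12 bytes")
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F                 # 0 to 15 contributing sources
    header_len = 12 + 4 * csrc_count
    if len(packet) < header_len:
        raise ValueError("packet is shorter than its CSRC list")
    csrc = list(struct.unpack("!%dI" % csrc_count, packet[12:header_len]))
    return {
        "version": first >> 6,
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": csrc,
        "header_len": header_len,
    }


# Invented example: version 2, static payload type 18 (G.729), 20 byte payload.
sample = struct.pack("!BBHII", 0x80, 18, 1, 160, 0x11223344) + b"\x00" * 20
header = parse_rtp_header(sample)
print(header["version"], header["payload_type"], header["header_len"])
```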
If you are running VoIP through a VPN then you have a VPN header to consider which can be from 20 to 60
bytes, plus an additional IP header of 20 bytes.
RTP has the ability to identify the payload and timestamp the packets, plus it sequences the packets
and monitors the packet delivery, re-ordering them if necessary. Normally RTP uses the even UDP ports
16384 up to 32767.
When using RTP, a technique called Compressed RTP (cRTP) can be utilised whereby the IP header
(20 bytes), UDP header (8 bytes) and the RTP header (12 bytes) can be compressed from the usual 40
bytes down to normally 2 bytes, or 4 bytes if the UDP checksum is used. This is suitable for
slow point-to-point links (< 2Mbps), preferably using hardware for the compression.
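The saving from cRTP can be estimated with the same payload formula used earlier for Frame Relay. The Python sketch below is an approximation under stated assumptions: it counts only the IP, UDP and RTP headers, ignores any layer 2 framing, and uses a hypothetical function name.

```python
def voip_bandwidth_kbps(sample_ms, codec_kbps, header_bytes=40):
    """Per-call IP bandwidth; header_bytes is IP + UDP + RTP (40 by default)."""
    payload = sample_ms * codec_kbps / 8
    packets_per_second = 1000 / sample_ms
    return (payload + header_bytes) * 8 * packets_per_second / 1000


# G.729 with 20ms samples: uncompressed headers, then cRTP at 2 and 4 bytes.
for label, header in (("no cRTP", 40), ("cRTP", 2), ("cRTP + UDP checksum", 4)):
    print(label, round(voip_bandwidth_kbps(20, 8, header), 1), "kbps")
```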
The protocol Real-Time Transport Control Protocol (RTCP) is used to transport control
information and services about current RTP sessions i.e. it monitors the bearer. It also carries a canonical name
which is an identifier of the source of the RTP stream. This is used by the transport layer
at the receiving end in order to synchronise audio with video.
The RTCP information includes jitter, delay and packet loss as well
as packet counts. In addition, RTCP includes time information such as the NTP time as known by
the sender. There must be at least one RTCP packet every 5 seconds and as RTP traffic increases
so RTCP increases as a set percentage of the RTP traffic (5%). RTCP also uses UDP
ports between 16384 and 32767, and the port number
used for a session is the odd-numbered port next to the even-numbered RTP port used for the
RTP session. A one-way telephone conversation (e.g. Music on Hold) uses one RTP stream and one
RTCP session, therefore a two-way telephone conversation uses two RTP UDP ports and two RTCP UDP ports.
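The even/odd port pairing described above can be expressed as a small Python helper. This is only a sketch; the function name is hypothetical and the port range is the one quoted in the text.

```python
RTP_PORT_MIN = 16384
RTP_PORT_MAX = 32767


def rtcp_port_for(rtp_port):
    """Return the odd RTCP port that accompanies an even RTP port."""
    if not RTP_PORT_MIN <= rtp_port <= RTP_PORT_MAX:
        raise ValueError("port is outside the RTP range quoted in the text")
    if rtp_port % 2 != 0:
        raise ValueError("RTP is expected on an even-numbered port")
    return rtp_port + 1


# A two-way call uses one RTP/RTCP pair in each direction (ports invented).
for rtp_port in (16384, 20000):
    print("RTP", rtp_port, "-> RTCP", rtcp_port_for(rtp_port))
```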
Signalling and Call Control
VoFR and VoATM are fine for simple point-to-point topologies but for Voice over data to be a serious
contender to traditional voice systems there needs to be a scalable way of building these topologies
and communicating within them and this is where VoIP comes in.
One required element is a Gateway that connects and translates between a traditional analogue
telephony system and an IP-based telephony system. Such a Gateway should be able to connect via
analogue ports such as FXO, FXS and E&M as well as digital voice ports such as E1, T1 and ISDN.
In addition, this gateway should be able to translate and interact with an IP-based telephony system
via Ethernet/IP connections as well as having the ability to make call routing and call management
decisions using whatever IP-based call control mechanism is being used. The gateway also is required
if you are using more than one IP-based call control system, as you need to translate between them.
The call control system is a vital
element to the VoIP environment and controls how calls are managed within the IP network. The control
signalling is handled separately from the actual voice streams. Umbrella call control systems include
H.323, SIP, MGCP, SCCP (Cisco's version called 'Skinny') and Megaco (H.248). The call control mechanism will not only
set up the RTP/RTCP sessions but also negotiate parameters such as codec, media type, bit rate and
other features about the call. There is a need to monitor the resources used by each call and to maintain
a database of the call records. This provides the ability then to control who is allowed to call
and what resources they are allowed to use.
Call control gives you the ability to route a call based on the dialled
number, this therefore requires a way of registering and resolving addresses (numbers).
Using the call control system in an IP environment you can decide whether to administer the calls from
a centralised point or in a distributed way.
As a Call Control Protocol H.323 has four main components:
- Terminal - an intelligent endpoint which could be a phone, video device, PC software etc.
- Gateway - an endpoint device that converts from the PSTN (non H.323) to the H.323 environment
- Gatekeeper - address translation between zones, admission and bandwidth control
- Multipoint Control Unit - allows point-to-multipoint communications with multiple H.323
The H.323 umbrella set of protocols was originally designed to manage multimedia traffic over LANs
and WANs, and was used originally for video conferencing. H.323 has been extended in version 2 to cater
for the VoIP
environment and is the most widely used call control protocol for VoIP. The call control encoding uses
Abstract Syntax Notation One (ASN.1). H.323 versions 3 and 4 have been developed recently and allow
greater flexibility in the choice of transport protocol (UDP or TCP). H.323 has its origins
in ISDN's Q.931 and uses the G-series voice coders as well as the H-series video coders.
H.323 version 2 covers the following areas:
- RAS Signalling Channel - H.225.0 is used in Registration, Admission and
Status (RAS) messages between endpoints (gateways or terminals) and Gatekeepers dealing with registrations, bandwidth
changes etc. RAS basically says 'hi, I'm here with my IP address and phone number'.
RAS uses UDP port 1719 for the RAS messages and UDP port 1718 for multicast gatekeeper
discovery. If there are no gatekeepers, then there are no RAS messages.
- Call Signalling Channel - H.225.0 based on Q.931, H.225 allows endpoints to use
call setup procedures in order to create connections with other endpoints. This uses TCP port 1720.
- Call Control Channel - H.245 transmits control messages between VoIP components
such as signalling, capabilities, timers, mode requests etc. The capabilities are the IP addresses,
the ports to be used and the codec.
The following diagram illustrates the structure of H.323 in IP:
The H.323 terminal is designed mainly for audio communication and it can interact with other multimedia terminals:
- H.310 terminals on Broadband ISDN
- H.320 terminals on ISDN
- H.321 terminals on Broadband ISDN
- H.322 terminals on guaranteed QoS LANs
- H.324 terminals on Switched Circuit Networks (SCN) and wireless networks
H.323 allows for an optional Gatekeeper that can provide the following:
- Address translation
- Admission control
- Bandwidth management and control
- Management of Zones
- Call Control signalling, Management and Authorisation
An example of bandwidth management is when a G.711 call comes in say from device A and
goes to device B, then B needs to transfer this call to device C on a remote
site that uses G.729. The bandwidth requirement changes. H.323v2 can do this, H.323v1 could not.
The Gatekeeper gives scalability to a VoIP design and can rival the traditional telephony topology.
RAS messages are listed below:
- Gatekeeper Messages
- GRQ - GatekeeperRequest sent by an endpoint to the gatekeeper multicast address 224.0.1.41
- GCF - GatekeeperConfirm
- GRJ - GatekeeperReject
- Registration Messages
- RRQ - RegistrationRequest sent by an endpoint to its Gatekeeper
- RCF - RegistrationConfirm
- RRJ - RegistrationReject
- Unregistration Messages
- URQ - UnregistrationRequest sent by an endpoint to unregister
- UCF - UnregistrationConfirm
- URJ - UnregistrationReject
- Bandwidth Change Messages
- BRQ - BandwidthChangeRequest sent by an endpoint
- BCF - BandwidthChangeConfirm
- BRJ - BandwidthChangeReject
- Location Messages
- LRQ - LocationRequest sent by an endpoint or Gatekeeper either to a known Gatekeeper
or to the Gatekeeper multicast address. This is a request to translate an E.164 address/number.
- LCF - LocationConfirm
- LRJ - LocationReject
- Call Admission Messages
- ARQ - AdmissionRequest sent by an endpoint to a gatekeeper including the remote endpoint
and the required bandwidth for the call.
- ACF - AdmissionConfirm
- ARJ - AdmissionReject
- Disengage Messages
- DRQ - DisengageRequest
- DCF - DisengageConfirm
- DRJ - DisengageReject
- Status Messages
- IRQ - InfoRequest
- IRR - InfoRequestResponse
- IACK - InfoRequestAck
- INAK - InfoRequestNack
In a large VoIP telephone network it is impractical to configure dial peers for every single phone, so the idea
of an H.323 Gatekeeper has been introduced. It holds a database of phone numbers and host names (IP addresses) that
is referenced by VoIP routers (also called Gateways). Gateways are Voice Capable Routers (VCR) that convert
analogue voice to digital and PSTN call control to H.323, and provide call setup and call clearing.
Gatekeepers translate phone numbers (E.164 addresses) to IP addresses and provide zone management for
The following diagram illustrates the sequence of events the H.323 protocol architecture goes through when
operating with multiple Gatekeepers:
Each gateway has a dial peer configured to point to its own Gatekeeper rather than having lots of
dial peers, one for each phone number. This is analogous to the IP default gateway.
Take the worst case scenario where no devices know about each other,
using the numbered arrows, the sequence of events when phone A wants to call Phone B operates as follows:
- Registration Request (RRQ) - (H.225 (RAS) on UDP port 1719) I am GatewayA with IP
address x.x.x.x and my E.164 number.
- Registration Confirm (RCF) back from the Gatekeeper.
- Admission Request (ARQ) - I have an extension number 100
and I want a certain amount of bandwidth to call it, can I register?
- Admission Confirm (ACF) - GatekeeperA registers GatewayA, the number is now on the database.
The Gatekeeper can reject the registration if it wants to.
- Admission Request (ARQ) - Where is phone number 200? What is its IP address?
- Location Request (LRQ) - Where is phone number 200? What is its IP address?
- Location Confirm (LCF) - 200 is GatewayB.
- Admission Confirm (ACF) - 200 is GatewayB.
- H.225 setup - setup the call to 200 using H.225.0 on TCP port 1720.
- Admission Request (ARQ) - from GatewayB, can I accept the incoming call?
- Admission Confirm (ACF) - GatekeeperB says yes you can.
- Response to call setup - from GatewayB
- H.245 exchange - exchange capabilities and open the logical channel.
- RTP - UDP media exchange i.e. the voice packets, between the endpoints.
- RTCP - UDP RTP Control channel set up between endpoints.
- DisengageRequest (DRQ) - From both Gateways to their respective Gatekeepers
once the Phones have completed the call.
- DisengageConfirm (DCF) - From both Gatekeepers to their respective Gateways.
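The address resolution part of this sequence can be modelled very loosely in Python. The sketch below is not an H.323 implementation: it ignores transport, bandwidth checks and the H.225/H.245 exchanges, and the class, method names and IP addresses are all invented for illustration. It only shows how a registration, an admission request and a location request to a neighbouring Gatekeeper relate to each other.

```python
class Gatekeeper:
    """Toy model of a Gatekeeper's E.164 to IP address database."""

    def __init__(self, name):
        self.name = name
        self.registrations = {}   # E.164 number -> gateway IP address
        self.neighbours = []      # other Gatekeepers that can be sent an LRQ

    def register(self, number, gateway_ip):
        """RRQ from a gateway, answered with RCF."""
        self.registrations[number] = gateway_ip
        return "RCF"

    def locate(self, number):
        """LRQ from another Gatekeeper, answered with LCF or LRJ."""
        ip = self.registrations.get(number)
        return ("LCF", ip) if ip else ("LRJ", None)

    def admit(self, number):
        """ARQ from a local gateway, answered with ACF or ARJ."""
        ip = self.registrations.get(number)
        if ip is None:
            # Number is not local, so ask each neighbour in turn.
            for neighbour in self.neighbours:
                reply, ip = neighbour.locate(number)
                if reply == "LCF":
                    break
        return ("ACF", ip) if ip else ("ARJ", None)


gk_a = Gatekeeper("GatekeeperA")
gk_b = Gatekeeper("GatekeeperB")
gk_a.neighbours.append(gk_b)
gk_a.register("100", "192.0.2.1")   # GatewayA registers extension 100
gk_b.register("200", "192.0.2.2")   # GatewayB registers extension 200
print(gk_a.admit("200"))            # GatewayA asks where 200 is
```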
Databases can be localised to zones rather than have setup traffic all over the Wide area to just one database.
Referencing the zones, or routing to these zones, is done via the area code e.g. 0207 for London.
A Supergatekeeper (or Directory Gatekeeper) can be configured that only knows the area
codes rather than the individual phone numbers. This hierarchical arrangement is similar
in nature to DNS.
If no Gatekeepers are involved then the gateways need to know of the other gateway via IP address
or DNS name. These gateways set up TCP H.225.0 call signalling channel between themselves rather
than use a Gatekeeper. There is no need for the endpoints to go through the RAS on UDP registration process.
In H.323 v1 the gateway went through the whole RAS registration process every 30 seconds.
In H.323 v2 the full registration need only occur at the start but within the RRQ message the
endpoint states a TTL. The gatekeeper responds by decrementing the TTL in the RCF message.
Just before it expires the endpoint sends a RRQ with the keepalive bit set to TRUE which
refreshes the registration for that endpoint.
Because there are a number of transactions going on within the H.323 set up, there is the capability
in H.323 v2 of speeding up the call process by utilising Fast Connect Call Setup.
When a Gateway initiates a Call setup with another Gateway using H.225 on TCP port 1720,
then the control channel using H.245 is combined with this so that capabilities and logical channel
setup are exchanged within the same session. The RTP/RTCP streams are still separate.
Because of the critical nature of the Gateway and Gatekeeper, there are methods in design that provide
resilience. Using a protocol such as HSRP or VRRP, multiple gatekeepers can share the same virtual
MAC and IP addresses. Only one is active; the other is on standby in case of failure. Flows are momentarily
disrupted on a failure as the failover is not stateful.
A Gateway can be set up with multiple Gatekeepers from which it can pick one to use
in case one has failed, or it can multicast out in order to find a Gatekeeper.
H.323 allows an endpoint to be associated with only one Gatekeeper at a time.
Gatekeepers send each other location requests when trying to find endpoints. If more than one Gatekeeper
is configured for a particular prefix, then any one of these Gatekeepers can respond.
Similarly, multiple Gateways can also be configured with the same prefix.
An additional element is the prepending of the Technology Prefix to the dialled number.
This may be done by the gateway or the gatekeeper. Either way the gatekeeper checks the prefix
and examines its technology prefix table to see which gateway(s) are registered with that prefix.
The prefix identifies the capabilities of the gateway and therefore that which the call requires.
The ITU have defined technology prefix characters, some of which are as follows:
- 1# - Voice Gateways
- 2# - H.320 Gateways (ISDN video conferencing)
- 3# - Voicemail Gateways
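A Gatekeeper's check of the technology prefix can be pictured with a short Python sketch. The table holds only the three prefixes listed above, and the function name and dialled numbers are hypothetical.

```python
TECH_PREFIXES = {
    "1#": "Voice Gateways",
    "2#": "H.320 Gateways",
    "3#": "Voicemail Gateways",
}


def split_tech_prefix(dialled):
    """Split a dialled string into (technology prefix, number).

    Returns (None, dialled) when no known prefix has been prepended.
    """
    prefix, separator, number = dialled.partition("#")
    if separator and prefix + "#" in TECH_PREFIXES:
        return prefix + "#", number
    return None, dialled


for dialled in ("1#200", "200"):
    prefix, number = split_tech_prefix(dialled)
    print(dialled, "->", TECH_PREFIXES.get(prefix, "no prefix"), number)
```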
Conferences where more than two users communicate can take a number of forms.
H.323 provides support for Multipoint conferences via the following components:
- Multipoint Controller (MC) - sets up an H.245 Control channel with each endpoint.
- Multipoint Processor (MP) - processes and mixes streams so that multiple streams can be sent to
one or more endpoints.
- Multipoint Control Unit (MCU) - a unit that contains the MC and may also have an MP.
The Centralised Conference is where the endpoints have their data, audio and video channels
connected to the MP. This allows each endpoint to operate using different codecs and the MCU can
decode into PCM for commonality. In a Decentralised Conference, the endpoints multicast
the data, audio and video streams to each other rather than be connected to a central MP. This means
that the same codecs must be used. H.323 does allow a Hybrid where one stream (e.g. audio)
may operate in a centralised manner whilst another stream operates in a decentralised manner.
An Ad-hoc Conference is where two endpoints in a call decide to convert their call into
a point-to-point conference and invite others to join them. They either use an MC that is nearby or
Implications On Security
H.323 can use many ports, so a firewall has to understand H.323 and look for call set ups before it
allows through the UDP ports that RTP/RTCP use. In order to do this the firewall has to keep track of the flows
and has to remove the allowed ports from its table when the respective flows have finished.
H.323 supports the concept of an H.323 Proxy Server. You may have the situation where you may
wish to provide network security for the IP telephony endpoints such that remote endpoints are unable to
see the local endpoints. The Proxy server not only can act on behalf of the Gatekeeper, it can also
act on behalf of an endpoint. When a local endpoint wishes to reach a remote endpoint, communication
occurs between the local endpoint and its local Gatekeeper. The local Gatekeeper finds the remote
Gatekeeper who refers the local Gatekeeper to the remote Proxy. The local Gatekeeper tells the local
endpoint that it needs to talk to the local Proxy. Both local and remote Proxies talk and use
their respective gatekeepers as they complete the call between the local and remote endpoints.
Session Initiation Protocol (SIP)
The original RFC for SIP, RFC 2543, has since been superseded.
SIP is used to provide signalling and
control which establishes, maintains and terminates multimedia sessions. SIP uses the concept of
session invitations using protocols such as Session Announcement Protocol (SAP)
and Session Description Protocol (SDP). The signalling sits on
TCP or UDP.
Addressing is dealt with using HTTP, E.164 and E-mail. Location of services is managed by DNS
(DNS SRV record) and call routing is dealt with by Telephony Routing over IP (TRIP).
Using text-based protocols makes SIP easier to troubleshoot. SIP supports Intelligent Network
(IN) telephony subscriber services such as name mapping, redirection and personal mobility.
SIP sessions are peer-to-peer where the peers are called User Agents (UA). A User Agent
Client (UAC) initiates a request whereas a User Agent Server (UAS) contacts the destination
and responds to the request on behalf of the destination. Telephones and Gateways can be UACs or UASs.
As well as UACs and UASs, there are also SIP servers:
- Proxy Server - forwards SIP requests on behalf of clients or other Proxy Servers. Proxy
Servers can perform call routing, access control and security.
- Redirect Server - tells the UA which server it should communicate with.
- Registrar Server - processes requests from clients that register their location.
- Location Server - provides address resolution to Proxy or Redirect Servers, either using
its own tools or by accessing other tools such as Finger, rwhois or LDAP.
Messages follow the text format of RFC 822, as HTTP does, and there are two types:
- Request - containing Request Line, Header Line and Message Body.
- Response - containing Status Line, Header Line and Message Body.
There are four headers, General, Entity, Request and Response.
The Request line contains a Method that determines what the recipient (e.g. a server) should do.
There are six methods:
- INVITE - client invites a server to join in a session, includes session parameters.
- ACK - client confirms that it has received the final response to an INVITE.
- BYE - client or server initiates the termination of the call.
- CANCEL - client or server cancels a pending request.
- OPTIONS - client obtains the server capabilities.
- REGISTER - provides location information to a server, refreshed periodically.
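A request with the layout described above (Request Line, headers, body) can be taken apart with a few string operations. The sketch below is illustrative only: it assumes CRLF line endings and one value per header name, the function name is hypothetical, and the addresses are placeholders.

```python
def parse_sip_request(text):
    """Split a SIP request into (method, request_uri, version, headers, body)."""
    # A blank line separates the header section from the message body.
    head, _, body = text.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # The Request Line carries the Method, the target address and the version.
    method, request_uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, request_uri, version, headers, body


example = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "To: <sip:bob@example.com>\r\n"
    "From: <sip:alice@example.org>\r\n"
    "CSeq: 1 INVITE\r\n"
    "\r\n"
    "v=0"
)
method, uri, version, headers, body = parse_sip_request(example)
print(method, uri, headers["CSeq"])
```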
Response messages use codes similar to HTTP and are grouped as follows:
- 1XX - Information
- 2XX - Successful
- 3XX - Redirection
- 4XX - Client error
- 5XX - Server error
- 6XX - Global Failure
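Because the first digit of the code identifies the group, a Status Line can be classified with integer division. A minimal sketch, assuming the line has the form "version code reason"; the function name is hypothetical.

```python
RESPONSE_CLASSES = {
    1: "Information",
    2: "Successful",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
    6: "Global Failure",
}


def classify_status_line(status_line):
    """Return (code, class name) for a line such as 'SIP/2.0 180 Ringing'."""
    _version, code_text, _reason = status_line.split(" ", 2)
    code = int(code_text)
    # The hundreds digit selects the response group.
    return code, RESPONSE_CLASSES[code // 100]


print(classify_status_line("SIP/2.0 180 Ringing"))
print(classify_status_line("SIP/2.0 486 Busy Here"))
```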
A SIP address contains an optional user ID, a host description (domain name or IP address) and optional
parameters (e.g. password). Identification of the address begins with sip: (or sips:
for secure SIP) and could simply take the form sip:firstname.lastname@example.org or maybe
sip:email@example.com. You could have more complex addresses such as
sips:firstname.lastname@example.org;user=phone, indicating the use of secure SIP and E.164 addressing.
Another example is sip:113957216;email@example.com, where a user ID is being used instead
of an E.164 address, plus a password has been assigned.
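The simpler address forms can be split into their parts as sketched below. This is a hypothetical helper, not a full SIP URI parser: it assumes the form scheme:user@host;parameters, with any parameters following the host, and the addresses used are placeholders.

```python
def parse_sip_address(address):
    """Split a simple SIP address into scheme flag, user, host and parameters."""
    scheme, _, rest = address.partition(":")
    if scheme not in ("sip", "sips"):
        raise ValueError("not a SIP address: " + address)
    # Assumption: parameters, if any, follow the host and are separated by ';'.
    rest, *param_parts = rest.split(";")
    user, at_sign, host = rest.rpartition("@")
    params = {}
    for part in param_parts:
        key, _, value = part.partition("=")
        params[key] = value
    return {
        "secure": scheme == "sips",
        "user": user if at_sign else None,  # the user ID is optional
        "host": host,
        "params": params,
    }


print(parse_sip_address("sips:alice@example.com;user=phone"))
print(parse_sip_address("sip:example.com"))
```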
Endpoints (UAs) register addresses with the Registrar server. An address can be resolved by a variety
of means as described earlier.
Call Setup Using Direct Communication
If the UAC knows the UAS address then they communicate directly as follows:
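A typical direct exchange can be sketched as an ordered list of messages. The 180 Ringing provisional response and the choice of the UAC as the side that hangs up are assumptions made for illustration; either side may send the BYE.

```python
# (sender, receiver, message) in the order the messages are sent.
DIRECT_CALL_FLOW = [
    ("UAC", "UAS", "INVITE"),       # invitation carrying the session parameters
    ("UAS", "UAC", "180 Ringing"),  # provisional (1XX) response
    ("UAS", "UAC", "200 OK"),       # the call has been answered
    ("UAC", "UAS", "ACK"),          # final response received; media can now flow
    ("UAC", "UAS", "BYE"),          # one side ends the call
    ("UAS", "UAC", "200 OK"),       # termination confirmed
]


def render_flow(flow):
    """Return one readable line per message in the flow."""
    return [f"{src} -> {dst}: {message}" for src, dst, message in flow]


for line in render_flow(DIRECT_CALL_FLOW):
    print(line)
```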
Call Setup Via A SIP Proxy Server
Call Setup Via A SIP Redirect Server
For resilience, multiple Proxy and/or Redirect servers can be configured on the UAs. Additionally,
each server can be configured with the same DNS name.
Media Gateway Control Protocol (MGCP)
H.323 and SIP operate as peer-to-peer signalling control protocols where endpoints have intelligence.
Simple Gateway Control
Protocol (SGCP), developed by Telcordia, takes a different approach based on stimulus and response
where the endpoints are dumb. Cisco has its own protocol of this kind called Skinny
Client Control Protocol (SCCP), also known as Skinny, which is used on its IP phones.
Level 3 also developed Internet Protocol Device Control (IPDC). IPDC and SGCP were designed
to improve on H.323 and were combined, with the backing of the IETF, into MGCP v1.
Lucent designed Media Device Control Protocol (MDCP) and MEGACO developed the Media
Gateway Controller. The aim is to combine the benefits of these with MGCP v1 to create
an enhanced MGCP.
MGCP version 1.0 is standardised by the IETF and its architecture is
defined in RFC 2805. It has a centralised
architecture where a central Call Agent
acts as a Media Gateway Controller and the Endpoints and Gateways
rely on the Call Agent for instruction. E.164 addressing is used and
communication between the Call Agent and Endpoints/Gateways uses text-based messages carrying SDP over UDP.
The Call Agent needs to understand the various types of Endpoints that exist along with their
capabilities. These are as follows:
- DS0 - single channel
- Analogue Line - e.g. FXS, FXO
- Announcement Server access point
- Interactive Voice Response (IVR) access point.
- Conference Bridge access point
- Packet Relay - an access point that bridges between incompatible gateways
- Wiretap access point - for recording and playing back communications.
- ATM trunk-side interface - an audio channel in an ATM network.
An endpoint has an identifier which is made up of a local name and the domain name of the gateway.
These are separated by an @, an example is firstname.lastname@example.org.
Here are the seven types of Gateway:
- Trunk Gateway SS7 User Part (ISUP) - ISDN signalling endpoints
- Trunk Gateway Multifrequency (MF) - digital or analogue MF signalling endpoints
- Network Access Server (NAS) - connects to endpoints that use modems for data
- Combined NAS and VoIP Gateway - connects to endpoints that use modems for data and VoIP
- Access Gateway - supports digital and analogue endpoints attached to a PBX.
- Residential Gateway - connects to endpoints that have traditional analogue interfaces
- Announcement server - connects to endpoints that access announcement servers
Call Setup and Connections
The Calls and Connections in MGCP by default use UDP port 2427 and
centre around the Call Agent. A typical call between Gateway A and Gateway B proceeds as follows:
- The Call Agent sends each Gateway a notification request (RQNT) asking to be notified about
the Gateway and its endpoints. The Gateways
duly comply. The RQNT often contains relevant Event/Signal packages plus a Digit Map so that
the gateway(s) can collect digits before notifying (NTFY) the Call Agent. In addition, the Call Agent might
include events for the gateway to monitor.
- The Call Agent tells the gateway to create the connection (CRCX) plus which RTP ports to use.
The gateways respond, in this case Gateway B informs the Call Agent which session parameters to use
e.g. RTP and RTCP ports. Gateway B's CRCX response is an encapsulated RQNT in SDP.
- The Call Agent now tells Gateway A to modify its session parameters to match those of Gateway B.
The MDCX is encapsulated RQNT in SDP.
- The RTP media stream starts.
- At the end of the call, Endpoint A hangs up and Gateway A notifies the Call Agent. The Call Agent
then instructs Gateway A to delete the call (DLCX). Once this has been acknowledged, the Call Agent instructs
Gateway B to delete the call as well.
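The Call Agent's part in the steps above can be sketched as a function that maps a notified event to the commands it sends out. The event names and gateway labels are hypothetical, and acknowledgements and SDP contents are left out for brevity.

```python
def call_agent_react(event, gateway_a="Gateway A", gateway_b="Gateway B"):
    """Return the (target, command) pairs the Call Agent sends for a notified event."""
    if event == "digits-collected":
        # Create a connection on each side, then update A with B's session parameters.
        return [(gateway_a, "CRCX"), (gateway_b, "CRCX"), (gateway_a, "MDCX")]
    if event == "on-hook":
        # Tear down the originating side first, then the far side.
        return [(gateway_a, "DLCX"), (gateway_b, "DLCX")]
    return []


for event in ("digits-collected", "on-hook"):
    for target, command in call_agent_react(event):
        print(f"{event}: Call Agent -> {target}: {command}")
```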
For multipoint calls, the Call Agent instructs an endpoint to host the conference and expects it to be capable of doing so.
Rather than the gateway having to communicate with the Call Agent every time a digit is dialled,
the Dial Plan is loaded on to the gateway in the form of a Digit Map.
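How a gateway might test collected digits against a Digit Map can be sketched by translating the map into a regular expression. This is a simplified illustration: it assumes the notation where x is any digit, a dot repeats the preceding element zero or more times, T marks a timer (ignored here) and alternatives are separated by a vertical bar. The map itself is made up, and the * and # keys are not handled.

```python
import re


def digit_map_to_regex(digit_map):
    """Translate a simplified digit map into a compiled regular expression."""
    alternatives = digit_map.strip("()").split("|")
    patterns = []
    for alt in alternatives:
        alt = alt.replace("T", "")     # timer marker is ignored in this sketch
        alt = alt.replace("x", r"\d")  # x stands for any single digit
        alt = alt.replace(".", "*")    # '.' repeats the preceding element
        patterns.append(alt)
    return re.compile("(?:" + "|".join(patterns) + ")")


def dial_complete(compiled_map, digits):
    """Return True when the collected digits match one entry completely."""
    return compiled_map.fullmatch(digits) is not None


example_map = digit_map_to_regex("(0T|[1-7]xxx|9011x.T)")
print(dial_complete(example_map, "2345"))
print(dial_complete(example_map, "234"))
```

With a map like this on the gateway, the Call Agent is only notified once a complete number has been collected.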
Events are monitored by the Gateway and the Call Agent instructs the Gateway what to do when these events
occur. MGCP Events are defined as follows:
- Continuity Detected
- Continuity Tone
- DTMF digits
- Fax Tones
- Flash Hook
- Modem Tones
- Going Off-hook (code = hd)
- Going On-hook
The following Signals are used by the Call Agent to instruct the gateway:
- Answer Tone
- Busy Tone
- Call Waiting Tone
- Confirm Tone
- Continuity Test
- Continuity Tone
- Dial Tone (code = dl)
- Distinctive Ringing
- DTMF Tones
- Intercept Tone
- Network Congestion Tone
- Off-hook Warning Tone
- Pre-emption Tone
- Ring Back Tone
There are nine Control Commands used by the Call Agent and the Gateways. We have briefly
touched on a few of these earlier when looking at the call setup.
- EPCF (EndpointConfiguration) - the Call Agent sends this to specify the coding
characteristics of an endpoint interface.
- RQNT (NotificationRequest) - the Call Agent tells the gateway to look out for an endpoint
event and to take an action.
- NTFY (Notify) - the gateway notifies the Call Agent that an event has occurred.
- CRCX (CreateConnection) - the Call Agent tells the gateway to set up a connection with
- MDCX (ModifyConnection) - the Call Agent tells the gateway to update the session parameters
for a connection.
- DLCX (DeleteConnection) - the Call Agent could send this or the gateway might if it lacks
- AUEP (AuditEndpoint) - the Call Agent sends this to obtain the status of an endpoint.
- AUCX (AuditConnection) - the Call Agent sends this to obtain the status of a connection.
- RSIP (RestartInProgress) - the gateway sends this to tell the Call Agent that the endpoints
are no longer available.
Events and Signals are grouped together in Packages. A Package contains events and signals
that are relevant to a particular type of endpoint.
The following ten packages are defined:
- G - Generic Media
- D - DTMF
- M - MF
- T - Trunk
- L - Line
- H - Handset
- R - RTP
- N - Network Access Server (NAS)
- A - Announcement Server
- S - Script
In addition, RFC 3064 defines CAS packages and RFC 3149 defines business phone packages.
Gateways often handle different types of endpoints, so different types of gateway are assigned different
packages:
- Trunk Gateway SS7 User Part (ISUP) - G, D, T, R
- Trunk Gateway Multifrequency (MF) - G, M, D, T, R
- Network Access Server (NAS) - G, M, T, N
- Combined NAS and VoIP Gateway - G, M, D, T, N, R
- Access Gateway (VoIP) - G, D, M, R
- Access Gateway (VoIP & NAS) - G, D, M, N, R
- Residential Gateway - G, D, L, R
- Announcement server - A, R
- Computer Telephony Integration (CTI) - interaction with customer database information based
on the dialling number, softphones, IVR, call centre management systems etc.
- Unified Messaging - linking with Microsoft Exchange allows interaction with E-mail,
VoIP calls, FAX and messaging applications such as MSN. Individuals can use any of these media to make
- IP Centrex - central control of IP telephony functions rather than have an organisation
manage its own system
- Hospitality - provision of long distance LAN and voice access from a hotel environment
- Pre and Post paid Calling Card
- Hoot and Holler - always on multi-user conferences
- Collaborative Computing - using distributed servers, applications such as whiteboard software,
video streaming, FTP, IP phones etc. can be used to provide a work environment for a group.
- Call Centres
- Toll Bypass - bypass the TDM networks provided by the PSTN
- Voice XML - Voice Extensible Markup Language allows you to use voice to control web applications.
In general, Voice mail operates as follows:
- Caller A calls B
- After 3 rings the call is diverted to the voice mail which is attached to the PBX.
- The PBX instructs the voice mail system to play the recorded greeting and Caller A leaves a message.
- The PBX sets the voltage on B's line so that a Message Waiting Indicator (MWI) lights up.
The information passed between the Voice Mail system and the PBX includes the calling number, the called
number, message waiting information and the reason for the call not being answered.
As far as the PBX connection to the Voice Mail system is concerned there are a variety of ways in which
signalling is passed:
- Bellcore's standard Simplified Messaging Desk Interface (SMDI) which uses an out of band
- Proprietary In-band signalling using DTMF tones.
- Voice mail line cards that emulate the PBX.
Linking Voice Mail systems together across the PSTN is generally carried out using the standard but
inefficient Audio Messaging Interchange Specification (AMIS). There is a proposed standard
called Voice Profile for Internet Mail (VPIM) which uses TCP/IP, SMTP and MIME to link
voice mail systems across an IP network. VPIMv2 required the use of G.726 only; VPIMv3, however,
supports G.711, G.726 and G.723.1 as well as Microsoft-Global System Mobile (MS-GSM).
The Dial Plan
A Dial Plan is a set of rules that governs what becomes of incoming and outgoing calls. Getting the
dial plan correct at the beginning can not only save a lot of money but also ease the
administration of the voice system, provide good security and improve reliability.
Whether the voice network is a traditional PSTN-based one, a VoIP network or a mixture, the dial plan
structure is essential for efficient call routing and management. As well as each country having
its own national dial plan, individual organisations also need to devise their own internal dial plan
that makes sense and uses the most efficient paths for calls.
Cost savings are made by keeping calls on-net as much as possible so that you bypass the tolls imposed
by the PSTN. These same routing configurations can be used to provide resilience so that should there
be a problem with the network, the call can be routed off-net.
The dial plan can determine who is allowed to use expensive CPU and DSP resources and thereby
prevent overloading of resources. The dial plan is analogous to static IP routing, except that we are
using E.164 numbers instead of IP addresses.
PBXs implement dial plans using tables which would typically be the following:
- Lead Digit Table - for the first digit e.g. '0' or '9'
- Route List - how calls are routed based on time of day, permissions and available capacity
- Special Number Table
- Local Number Table
- Time Table - gives the ability to route groups of numbers depending on the time of day.
- Class of Service Table - this contains access groups that determine the type of access users have e.g.
- COS 1 - Lobby phone with emergency access only and maybe one internal number
- COS 2 - Admin phone with internal and emergency access only
- COS 3 - Sales phone with local, long distance and emergency access
- COS 4 - Managers phone with local, long distance, emergency and some premium lines access
- COS 5 - Executive phone with no restrictions
- Auto Attendant Table
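A Class of Service Table of this kind can be sketched as a mapping from COS level to the call types it allows. The call-type names are hypothetical, and internal access for COS 3 and 4 is an assumption (the list above does not mention it). None is used to mean no restrictions.

```python
COS_PERMISSIONS = {
    1: {"emergency"},
    2: {"internal", "emergency"},
    3: {"internal", "local", "long_distance", "emergency"},
    4: {"internal", "local", "long_distance", "emergency", "premium"},
    5: None,  # executive phone: no restrictions
}


def call_permitted(cos, call_type):
    """Return True if a phone in the given Class of Service may place this call."""
    allowed = COS_PERMISSIONS[cos]
    return allowed is None or call_type in allowed


print(call_permitted(1, "local"))
print(call_permitted(3, "long_distance"))
```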
The dial plan can be configured to limit calls to mobile phones, or limit inbound calls. The
Class of Service groups can be used to limit access to features.
You may also use the dial plan to select different providers. In the US 10-10 dialling provides the
ability to select a different provider. To do this the PBX needs to be able to strip and insert
digits accordingly in order to influence the routing.
Basic Enterprise Dial Plan
The dial plan will include the following:
- Internal extensions are mapped to DID numbers so that external callers can go straight to the
individual within the organisation.
- Outgoing calls will present the switchboard number as the calling number rather than the internal extension number.
- Inter-office calls use 4 digits to give plenty of numbers.
- A number starting with 9 will be broken out on to the PSTN straightaway.
- Emergency numbers such as 999 or 911 are directed straight out on to the PSTN.
- Special extensions are created for other sites so that calls can remain On-net.
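The routing rules in this list can be sketched as a single decision function. It is a simplified illustration: the function name and route labels are hypothetical, stripping the leading 9 before the call goes to the PSTN is an assumption, and emergency numbers are checked first because they also begin with 9.

```python
EMERGENCY_NUMBERS = {"999", "911"}


def route_call(dialled):
    """Return (route, digits to send) for a dialled string."""
    if dialled in EMERGENCY_NUMBERS:
        return ("pstn", dialled)      # emergency calls go straight out
    if dialled.startswith("9"):
        return ("pstn", dialled[1:])  # assumption: the access digit is stripped
    if len(dialled) == 4 and dialled.isdigit():
        return ("on-net", dialled)    # 4-digit inter-office extension
    return ("reject", dialled)


for number in ("911", "902071234567", "4321", "12"):
    print(number, route_call(number))
```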
North American Dial Plan
The aim with a dial plan is to make it scalable so that call routing is simpler. This is
often done hierarchically by summarising number addresses. The ITU
developed the E.164 numbering system to provide some form of international agreement
on numbering plans. The North American numbering plan is based on 10 digits where
3 digits are used for the area code and 7 digits for the phone number. On seeing an area code
such as '123' the CO switch can ignore the following 7 digits and forward the call on to the
relevant area switch straightaway, reducing the post-dial delay.
The North American Dial Plan adheres to the E.164 international plan and takes the following form:
- Transit Network (Long Distance Carrier) - 3-4 digits
- Country Code - 1-3 digits
- Area Code - 3 digits
- Exchange - 3 digits
- Extension - 4 digits
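Splitting a number into the area code, exchange and extension fields is a matter of slicing. A minimal sketch, assuming the input is a 10-digit number optionally preceded by the country code 1; the function name and the sample number are illustrative.

```python
def split_nanp_number(number):
    """Split a North American number into country code, area code, exchange, extension."""
    digits = "".join(ch for ch in number if ch.isdigit())
    # Drop a leading country code so that 10 digits remain.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        raise ValueError("expected a 10-digit number, got: " + number)
    return {
        "country_code": "1",
        "area_code": digits[:3],
        "exchange": digits[3:6],
        "extension": digits[6:],
    }


print(split_nanp_number("+1 (212) 555-0123"))
```

A switch that routes on the area code alone only needs the first three of these digits, which is what keeps the post-dial delay low.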
The US emergency service number is 911; it was introduced in 1968 to unify all the different
emergency service numbers and to ensure consistency. This has been augmented by the introduction
of Enhanced 911 (E911), which is required by law in some places. E911 has the following features:
- Automatic Number Identification (ANI) so the calling number can be identified.
- Automatic Location Identification (ALI) so the caller can be found within 1000ft,
because the information included with the number includes the address, office and nearest emergency
CLI information is always passed through the network. If it is blocked, then it is the remote PBX
that does the blocking.
A specialised switch at the Central Office called a Selective Router (SR) is used to route
emergency calls and links in with the office PBX. Emergency calls
are answered by personnel located at the Public Safety Answering Point (PSAP). In a
Multiline Telephone System (MLTS) such as an office, the location of the individual caller
is difficult to place because the ALI information just indicates where the PBX is located, not
the individual. This is different of course for a domestic caller. To get round this problem
of knowing where an office caller is located, the PBX maintains an ANI-ALI database that contains
the location of each extension number. The PSAP retrieves this information from the PBX in order to
know more precisely where the call originated.
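The ANI-ALI lookup can be pictured as a table keyed by extension, falling back to the PBX's own location when an extension is unknown. The records, field names and fallback behaviour are all hypothetical and only illustrate the idea.

```python
# Hypothetical ANI-ALI records held by the PBX: extension -> location details.
ANI_ALI_DATABASE = {
    "2001": {"building": "Main office", "floor": 2, "room": "2.14"},
    "3107": {"building": "Warehouse", "floor": 0, "room": "Loading bay"},
}

PBX_LOCATION = {"building": "Main office", "floor": None, "room": None}


def locate_caller(extension):
    """Return the most precise location known for an extension."""
    # Without a record, only the location of the PBX itself is known.
    return ANI_ALI_DATABASE.get(extension, PBX_LOCATION)


print(locate_caller("3107"))
print(locate_caller("9999"))
```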