Automation of Mobile Radio Network Performance and Fault Management

The document is a thesis submitted in partial fulfillment of a Master of Science degree. It examines the automation of performance and fault management in mobile radio networks. Specifically, it aims to analyze the current performance management process and identify problems, then propose automated solutions to enhance productivity. The scope is limited to the European 3G UMTS Terrestrial Radio Access Network.


Automation of Mobile Radio Network Performance and Fault Management
(Matkapuhelinradioverkon suorituskyvyn- ja vianhallinnan automatisointi)

A thesis submitted in partial fulfilment of the requirements for the degree of Master of Science

Espoo 28.2.2007

Helsinki University of Technology


Department of Electrical and Communications Engineering

Author: Magnus Wallström magnus.wallstrom@nokia.com


Supervisor: Timo Korhonen timo.korhonen@tkk.fi
Instructor: Mikko Lamberg, MSc (Tech), Nokia Networks mikko.lamberg@nokia.com

1 2007-02-28 / Magnus Wallström


Contents
• Introduction and background
• Literature review:
• Architecture of Mobile Radio Access Networks
• State of the art in management of mobile networks as defined by 3GPP
• Performance management data
• Functionality scenarios in UTRAN
• Methods of the practical study
• Results:
• Results (I/IX): Current PM and FM organisation and process
• Results (II/IX): Example on current PM and FM process (1/3)
• Results (III/IX): Example on current PM and FM process (2/3)
• Results (IV/IX): Example on current PM and FM process (3/3)
• Results (V/IX): Analysis of the current organisation and process
• Results (VI/IX): Problems in current PM and FM process, interrelationship and why-why diagrams of the process problems
• Results (VII/IX): Summary of the analysis
• Results (VIII/IX): Solution for automated investigation
• Results (IX/IX): Implementation of the solution
• Conclusions of the thesis
• References

2 2007-02-14 / Magnus Wallström


Key concepts
• 3GPP = Third Generation Partnership Project, a project that develops the GSM and UMTS specifications in cooperation with vendors, operators and standardisation organisations.
• Fault management = Functions that enable the detection and location of failures in the network and the scheduling of repairs. 3GPP specifies the requirements of the concept.
• Mobile radio access network = A network that provides wireless access to users through a radio interface and allows mobile users to move between coverage areas without losing the connection, i.e. handover.
• Performance management = Functions that enable the performance measurements of network services. 3GPP specifies the requirements of the concept.



Introduction and background
How to enhance the productivity of the UTRAN performance management investigations?

• Research area: Mobile Radio Network Performance and Fault Management


• Research questions:
• What is a mobile radio access network and how is it managed?
• What is the current performance management set-up in the organisation under study?
• What is the organisation and communication structure?
• What is the process to tackle performance problems in UTRAN?
• What are the problems of the current set-up?
• What could be solutions to the root problems found in the current set-up?
• Scope:
• Limited to European 3G mobile radio network = UTRAN (UMTS Terrestrial Radio Access Network)
• Other major mobile radio network technologies are: GERAN (GSM radio network), WiMAX (WMAN) and WiFi (WLAN)



Architecture of Mobile Radio Access Networks
• General architecture
• UE – User Equipment
• Consists of Mobile Equipment (ME) and Subscriber Identity Module (SIM) for the end-user to access the mobile network
• RAN – Radio Access Network
• Currently the most popular mobile RANs are UTRAN and GERAN
• Other radio access technologies are LTE, WiMAX and WiFi
• CN – Core Network
• All RANs are attached to a CN that provides switching and access to services in PSTN and any IP network
• OSS – Operations Support System
• All parts of the mobile network may be managed by a centralised system
[Figure: general architecture of the mobile network. Elements: UE, RAN (UTRAN, GERAN), CN, PSTN, IP, OSS. Interfaces: Uu / Um / .. between UE and RAN, Iu / A / Gb / .. between RAN and CN, O&M to the OSS.]

• UTRAN architecture
• Network elements
• RNC – Radio Network Controller
• Node B, also known as BTS – Base Transceiver Station
• A – ATM transmission nodes
• Interfaces
• IuCS: RNC to Circuit Switched Core Network (voice and video calls)
• IuPS: RNC to Packet Switched Core Network (data calls)
• Iur: RNC to RNC
• Iub: RNC to BTS
• Uu: BTS to UE
• O&M: OSS to any network element: RNC, BTS, ATM nodes and CN elements (MSC, HLR, SGSN, GGSN etc.)
[Figure: UTRAN architecture. Elements: UE, BTS (B), ATM nodes (A), RNC, CN (CS and PS), OSS. Interfaces: Uu, Iub, Iur, Iu (CS/PS), O&M.]



State of the art in management of mobile networks as defined by 3GPP
• Network management areas relevant to RAN technical support
• Performance Management (PM)
• Keeps track of the network performance status and analyses the effects of configuration changes in the network [3GPP TS 32.101]
• Is based on measurements that are continuously recorded in the network elements
• Fault Management (FM)
• Consists of fault detection, fault localisation, fault reporting, fault correction and fault repair [3GPP TS 32.111-1]
• Is based mainly on alarms and system logs that the network elements produce
• Software Management (SWM)
• Covers software request management, installation, customer feedback and software fault management, i.e. detection of software faults and finding resolutions to the problems. This duty is close to and overlaps with fault management (FM) [3GPP TS 32.101]
• Configuration Management (CM)
• Controls the operational parameters of network elements [3GPP TS 32.600]

• Process of applying the network management:
1. Performance monitored (PM)
2. Faults localised (FM)
Depending on the type of failure (parameter or software?):
3a. Configuration changed (CM)
or
3b. Software defect(s) corrected (SWM)
4. Monitor the performance (step 1)



Functionality scenarios in UTRAN
• Control plane, i.e. signaling, on RRC connection (Radio Resource Control)
• Major purpose: set up and release a call
• User plane, i.e. the traffic, on RAB connections (Radio Access Bearer):
• Major purpose: define the QoS class of the call:
• Conversational class, RT (Real Time), applications: CS voice and video calls
• Streaming class, RT, applications: CS streaming video
• Interactive class, NRT (Non RT), applications: PS (Packet Switched) web browsing
• Background class, NRT, applications: emails, MMS (Multimedia Messaging Service)
• Signaling scenarios:
• MTC (Mobile Terminated Call) scenario
1. Paging: RNC sends an “RRC Paging Type 1” message to the Uu interface
2. RRC connection setup: The paged UE responds by starting the radio control connection establishment procedure: (1.) The UE sends an “RRC Connection Request” message to the RNC (“RRC Connection Setup Attempt” counter is updated). (2.) The RNC tries to allocate radio resources (BTS) and, if successful, responds with an “RRC: Connection Setup” message (“RRC Connection Setup Complete” counter is updated). (3.) Finally the UE responds with an “RRC: Connection Setup Complete” message (“RRC Access Complete” counter is updated).
3. Transaction reasoning: RNC and CN negotiate on the transaction type
4. Authentication and security procedure: UMTS subscriber and network authenticate each other, and other security mechanisms are activated
5. RAB setup for transaction: Actual communication resources for the transaction are allocated.
6. Transaction: UE has an active user plane bearer connection across the whole UMTS network
7. RAB release for transaction clearing: Network resources related to the transaction are released, i.e. all the active RAB connections for a UE are released
8. RRC connection release: Radio control connection between the UE and the UTRAN is released
• Mobility (handover scenario):
1. Measurement: the UE sends a radio-link measurement report to the RNC
2. Decision: the final decision to make a handover is made in the RNC by the RRM handover control algorithms. The decision is based on the handover criteria and algorithm parameters
3. Execution: handover signalling between e.g. UE and RNC, and radio resource allocation e.g. in BTS
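
The counter updates in step 2 of the MTC scenario can be sketched as a small message-to-counter mapping. This is a simplified illustration, not a vendor counter definition: the message and counter names are taken from the slide text, while `MESSAGE_TO_COUNTER`, `count_rrc_events` and the example trace are hypothetical.

```python
from collections import Counter

# Assumption: one counter is stepped per observed message, as the slide
# describes for RRC connection setup. Real counters have more trigger rules.
MESSAGE_TO_COUNTER = {
    "RRC Connection Request": "RRC Connection Setup Attempt",
    "RRC: Connection Setup": "RRC Connection Setup Complete",
    "RRC: Connection Setup Complete": "RRC Access Complete",
}


def count_rrc_events(messages):
    """Return counter values for a sequence of observed RRC message names."""
    counters = Counter()
    for message in messages:
        counter_name = MESSAGE_TO_COUNTER.get(message)
        if counter_name is not None:
            counters[counter_name] += 1
    return counters


# Hypothetical trace: two UEs answer a page, the second one never sends
# the final "RRC: Connection Setup Complete" message.
trace = [
    "RRC Connection Request",
    "RRC: Connection Setup",
    "RRC: Connection Setup Complete",
    "RRC Connection Request",
    "RRC: Connection Setup",
]
counters = count_rrc_events(trace)
for name, value in sorted(counters.items()):
    print(name, value)
```

A gap between "RRC Connection Setup Attempt" and "RRC Access Complete" is what later shows up as a lowered RRC access ratio.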



Performance management data
• Performance counters
• UTRAN collects thousands of counters that measure the number of specific events
• E.g. RRC Setup Attempts, RRC Setup Completes, RRC Setup Attempt Failure RNC, RRC Setup Attempt Failure BTS etc.
• KPI (Key Performance Indicator)
• Most often calculated from performance counters as a relative %-value
• Relative %-KPIs are comparable between networks of different sizes; absolute values are not, because the amount of traffic varies
• Form: KPI = (a formula of performance counters)
• Examples:
• RRC_Acc% = “RRC access complete ratio” = “RRC Access Completes” / “RRC Setup Attempts”
• CSSR, Call Setup Success Rate (voice call) = RRC_Acc% * (RAB_voice_attempts - RAB_voice_failures) / RAB_voice_attempts
• CCSR, Call Completion Success Rate (voice call) = (RAB_active_voice_successful_completes) / (RAB_active_voice_failures + RAB_active_voice_successful_completes)
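
As a worked example of the KPI form above, the following sketch computes the three example KPIs from counter values. It makes two assumptions: the RAB success ratio in CSSR uses voice RAB attempts as its denominator, and CCSR is the share of active voice RABs that complete successfully, so that 100 % is the best value. The function names and the counter values are illustrative only.

```python
def ratio_percent(numerator, denominator):
    """Return 100 * numerator / denominator, or None if the denominator is 0."""
    if denominator == 0:
        return None
    return 100.0 * numerator / denominator


def rrc_access_ratio(rrc_access_completes, rrc_setup_attempts):
    """RRC_Acc%: RRC Access Completes relative to RRC Setup Attempts."""
    return ratio_percent(rrc_access_completes, rrc_setup_attempts)


def cssr_voice(rrc_acc_percent, rab_voice_attempts, rab_voice_failures):
    """CSSR: RRC access ratio scaled by the voice RAB setup success ratio."""
    if rrc_acc_percent is None or rab_voice_attempts == 0:
        return None
    successes = rab_voice_attempts - rab_voice_failures
    return rrc_acc_percent * successes / rab_voice_attempts


def ccsr_voice(successful_completes, active_failures):
    """CCSR: share of active voice RABs that end in a successful completion."""
    return ratio_percent(successful_completes,
                         successful_completes + active_failures)


# Made-up counter values for one measurement period.
rrc_acc = rrc_access_ratio(rrc_access_completes=950, rrc_setup_attempts=1000)
cssr = cssr_voice(rrc_acc, rab_voice_attempts=900, rab_voice_failures=18)
ccsr = ccsr_voice(successful_completes=870, active_failures=12)
print(round(rrc_acc, 2), round(cssr, 2), round(ccsr, 2))
```

Returning `None` for an empty measurement period keeps periods without traffic from being reported as 0 % or 100 %.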



Methods of the practical study
• Based on UCD (User Centered Design) process and framework
• Chronologically the practical study had three phases:
I. Study and define the current process and organisation
a. Study: interview
b. Study: focus group
c. Study: contextual enquiry
II. Analyse the current set-up
a. Analysis: brainstorming
b. Analysis: affinity diagram
c. Analysis: double teams
d. Analysis: interrelationship diagram
e. Analysis: why-why-analysis
III. Develop an enhanced process
a. Solution: brainstorming
b. Solution: SWOT analysis
c. Solution: UML diagrams



Results (I/IX):
Current PM and FM organisation and process
• Organisation-wise, Technical support is the communicator between the local customer contact teams and the product line R&D organisation.
1. Local teams communicate the performance status of the customer networks to the technical support.
2. Technical support investigates and analyses the performance degradations and makes decisions to fix them in co-operation with R&D.
3. R&D’s responsibility is to develop corrections to the system, if no other solution is effective.
[Figure: organisation. Labels: Customer A, Customer B, Customer C, Local team A, Local team B, Technical Support, R&D.]

• The process follows the three phases of the root-cause analysis methodology:
• Investigation (maps to PM [3GPP])
• Analysis (maps to FM [3GPP])
• Decision (maps to SWM and CM [3GPP])
• Each phase of the process has deliverables that are utilised in the later phases.
[Figure: process steps per phase.
Investigation (~PM): 1. Get top level PM data. 2. Find performance dips. 3. Get detail level PM data. 4. Find problematic sites. 5. Gather logs, alarms etc.
Analysis (~FM): 6. Analyse the logs and other detail level data that was gathered during the investigation.
Decision (~CM, ~SM): 7. Generate solutions.]



Results (II/IX):
Example on current PM and FM process (1/3)
1. Get KPIs and failure counters for the required top object (i.e. RNC)
• Achieved by using a reporting tool that collects the needed counters from the OSS measurement database and calculates the KPI values based on the counters. Manual post-processing of the data gives the graphical output.
2. Find measurement periods where there is a dip in performance:
• Call setup performance: at 11 the CSSR KPI has had poor values. The phenomenon has been partly ongoing during the next hour
• Retainability: high drop call ratio at 16. Counter diagram verifies that the drop in CCSR is due to high number of RAB active failures.
[Figure: two hourly charts for 2006/12/23, hours 00 to 23. KPIs: CSSR and CCSR (KPI [%]) with RRC setup attempts (RRC Connection Setup Attempts). Failure counters: RRC setup failures, RRC access failures, RAB setup failures, RAB access failures, RAB active failures, RRC active failures, with RRC setup attempts.]

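
Step 2, finding the measurement periods with a performance dip, can be sketched as a threshold check over the hourly KPI values followed by grouping of consecutive hours. This is a minimal sketch under stated assumptions: the 90 % threshold, the function names and the hourly values are invented for illustration and are not read from the chart.

```python
def find_dips(hourly_kpi, threshold_percent):
    """Return the hours whose KPI value is below the threshold, in time order."""
    return [hour for hour, value in sorted(hourly_kpi.items())
            if value < threshold_percent]


def group_consecutive(hours):
    """Merge consecutive hours into (first_hour, last_hour) periods."""
    periods = []
    for hour in hours:
        if periods and hour == periods[-1][1] + 1:
            periods[-1] = (periods[-1][0], hour)
        else:
            periods.append((hour, hour))
    return periods


# Illustrative hourly CSSR values in percent (assumed, not chart data).
cssr_by_hour = {hour: 98.5 for hour in range(24)}
cssr_by_hour[11] = 68.0
cssr_by_hour[12] = 85.0

dip_hours = find_dips(cssr_by_hour, threshold_percent=90.0)
dip_periods = group_consecutive(dip_hours)
print(dip_periods)
```

A fixed threshold is the simplest possible rule; a comparison against the same hour on previous days would be a natural refinement, but the slides do not say which rule is used in practice.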
Results (III/IX):
Example on current PM and FM process (2/3)
3. Get the KPIs and failure counters on BTS level.
• This can be achieved using the same reporting tool as in the first phase. The output is an extensive list of all the BTSs under one RNC, with all measurement periods and counters for each BTS.
4. Find the network elements that are causing the performance dip.
• After post-processing the data, the results are lists of the BTSs that are the main contributors to the performance dips

[Table: measurement period 2006/12/23 11]
time | RNC id | BTS id | Cell id | CSSR | CCSR | RRC setup attempts | RRC setup failures | RRC access failures | RAB setup failures | RAB access failures | RAB active failures | RRC active failures
2006/12/23 11 | 1 | 104 | 1041 | 15.02 | 99.01 | 1345 | 1132 | 4 | 3 | 4 | 2 | 4
2006/12/23 11 | 1 | 104 | 1042 | 19.19 | 96.97 | 1032 | 820 | 4 | 5 | 5 | 6 | 5
2006/12/23 11 | 1 | 104 | 1043 | 22.26 | 97.76 | 602 | 452 | 7 | 8 | 1 | 3 | 4
2006/12/23 11 | 1 | 69 | 691 | 56.76 | 99.05 | 185 | 69 | 6 | 2 | 3 | 1 | 2
2006/12/23 11 | 1 | 69 | 693 | 59.60 | 96.61 | 99 | 24 | 7 | 8 | 1 | 2 | 2
2006/12/23 11 | 1 | 201 | 2011 | 64.29 | 96.30 | 84 | 19 | 6 | 3 | 2 | 2 | 3

[Table: measurement period 2006/12/23 16]
time | RNC id | BTS id | Cell id | CSSR | CCSR | RRC setup attempts | RRC setup failures | RRC access failures | RAB setup failures | RAB access failures | RAB active failures | RRC active failures
2006/12/23 16 | 1 | 123 | 1231 | 98.46 | 15.80 | 1234 | 8 | 4 | 3 | 4 | 1023 | 4
2006/12/23 16 | 1 | 123 | 1232 | 98.45 | 24.70 | 1032 | 3 | 4 | 4 | 5 | 765 | 5
2006/12/23 16 | 1 | 123 | 1233 | 97.51 | 45.32 | 602 | 4 | 7 | 3 | 1 | 321 | 4
2006/12/23 16 | 1 | 69 | 691 | 96.62 | 91.40 | 385 | 2 | 6 | 2 | 3 | 32 | 2
2006/12/23 16 | 1 | 69 | 693 | 97.66 | 91.62 | 342 | 1 | 3 | 3 | 1 | 28 | 2
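
Step 4 can be illustrated with the hour-11 rows listed above. The sketch below rests on an observation rather than a documented definition: the tabulated CSSR and CCSR values are reproduced when the four counters that follow the RRC setup attempts are treated as setup and access failures and the next counter as RAB active failures. The function names and this column grouping are assumptions.

```python
# Hour-11 rows: (cell id, RRC setup attempts,
#                [four setup and access failure counters], RAB active failures).
ROWS_HOUR_11 = [
    (1041, 1345, [1132, 4, 3, 4], 2),
    (1042, 1032, [820, 4, 5, 5], 6),
    (1043, 602, [452, 7, 8, 1], 3),
    (691, 185, [69, 6, 2, 3], 1),
    (693, 99, [24, 7, 8, 1], 2),
    (2011, 84, [19, 6, 3, 2], 2),
]


def table_kpis(attempts, setup_access_failures, rab_active_failures):
    """Return (CSSR, CCSR) in percent rounded to two decimals, or None."""
    established = attempts - sum(setup_access_failures)
    if attempts == 0 or established == 0:
        return None
    cssr = 100.0 * established / attempts
    ccsr = 100.0 * (established - rab_active_failures) / established
    return round(cssr, 2), round(ccsr, 2)


def rank_by_setup_failures(rows):
    """Return cell ids ordered from most to fewest setup and access failures."""
    ordered = sorted(rows, key=lambda row: sum(row[2]), reverse=True)
    return [cell for cell, _, _, _ in ordered]


kpis_by_cell = {cell: table_kpis(attempts, failures, active)
                for cell, attempts, failures, active in ROWS_HOUR_11}
worst_cells = rank_by_setup_failures(ROWS_HOUR_11)[:3]
print(kpis_by_cell[1041], worst_cells)
```

Ranking by the absolute number of failures, rather than by the KPI percentage, puts the cells that contribute most to the RNC-level dip first, which is what the post-processing in step 4 is after.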



Results (IV/IX):
Example on current PM and FM process (3/3)
5. Gather the system logs for those network elements that are the main contributors to the RNC performance dip.
• Achieved by connecting to the network element’s O&M unit either by manual command procedures or using a tool that automates the procedure. The log files are usually in binary format, so they need to be opened by a parser or converted to textual format before the analysis can take place.
6. Analyse the detailed data.
• The format of the data is vendor specific, i.e. not defined in public specifications => no general guidance can be set for the analysis itself.
• Highly dependent on the individual system specialists who can handle the versatile analysis and produce reliable results. In this context the analysis can be treated as a black box, which has system data, i.e. logs, parameters, alarms, counters and KPIs, as input, and a set of root causes for the performance problem as output.
7. Generate a solution to the root cause.
• Needs the presence of a skilled system specialist. Depending on the type of solution, finding a working solution might need a trial-and-error approach.
• Before the solution is applied to a live network, it is tested in the vendor’s test bed. Some network operators also have test beds of their own, on which they verify the solutions, e.g. SW corrections, before they are installed in the live network.



Results (V/IX):
Analysis of the current organisation and process
• Main problems:
• Problems in current organisation operation
• 7.2.1 High travel costs
• 7.2.2 Troubleshooting poorly controlled
• Problems in current PM and FM process
• 7.3.2 NE logs not available for performance dips
• 7.3.3 Alarms not mapped to performance dips
• 7.3.4 Configuration data not available for performance dips
• 7.3.5 Internal failures not distinguished from external causes
• 7.3.6 Investigation is time consuming



Results (VI/IX): Problems in current PM and FM process,
Interrelationship- and why-why-diagrams of the process problems
[Figure: interrelationship diagram and why-why diagram of the process problems. Node labels:]
• Internal failures not distinguished from external
• No reference points
• Investigation is time consuming
• No consistent set of tools for performance management and troubleshooting available
• Manual work
• Alarms not mapped
• NE logs are too large to be saved continuously for a long time
• Alarms not collected
• Performance data is not available to base decision to gather logs
• CM data not available
• Too much traffic per RNC CPU and memory capacity for continuous monitoring
• RNC CPU/MEM capacity too small for monitoring
• NE logs not available
• Lack of competence



Results (VII/IX):
Summary of the analysis
• The analysis set two general requirements for the solution:
• Support fault management analysis conducted by system specialists. The solution should be able to collect relevant fault management (FM) data, i.e. NE logs, configuration data and alarms, for troubleshooting. The relevance of the FM data is evaluated on the basis of the performance measurement data, which may be collected either from the OSS or from the RNC.
• Support general reporting of performance conducted by operator and vendor performance management bodies. The solution should produce scalable reports of the performance measurement data. Reports should represent the performance data both on the whole network level and on the individual network element level, down to the level of a single cell. Other statistical requirements are aggregation over time and that the data can be averaged.
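
The reporting requirement, the same KPI on cell, network element and whole-network level and aggregated over time, can be sketched by summing the underlying counters per group and only then forming the ratio. The row layout, the field names and the values below are hypothetical; the slides do not describe how the reports are actually computed.

```python
from collections import defaultdict


def aggregate(rows, key):
    """Sum the 'success' and 'attempts' counters of rows grouped by key(row)."""
    totals = defaultdict(lambda: {"success": 0, "attempts": 0})
    for row in rows:
        group = totals[key(row)]
        group["success"] += row["success"]
        group["attempts"] += row["attempts"]
    return dict(totals)


def kpi_percent(counters):
    """Return the success ratio in percent, or None without attempts."""
    if counters["attempts"] == 0:
        return None
    return 100.0 * counters["success"] / counters["attempts"]


# Hypothetical cell-level rows for one RNC.
rows = [
    {"rnc": 1, "bts": 104, "cell": 1041, "hour": 11, "success": 202, "attempts": 1345},
    {"rnc": 1, "bts": 104, "cell": 1042, "hour": 11, "success": 198, "attempts": 1032},
    {"rnc": 1, "bts": 69, "cell": 691, "hour": 11, "success": 105, "attempts": 185},
    {"rnc": 1, "bts": 69, "cell": 691, "hour": 12, "success": 180, "attempts": 190},
]

by_bts = aggregate(rows, key=lambda row: row["bts"])    # network element level
by_hour = aggregate(rows, key=lambda row: row["hour"])  # aggregation over time
by_rnc = aggregate(rows, key=lambda row: row["rnc"])    # whole RNC

for bts, counters in sorted(by_bts.items()):
    print(bts, round(kpi_percent(counters), 2))
```

Summing counters before dividing weights each cell by its traffic; averaging the cell-level percentages instead would let a low-traffic cell distort the higher-level figure.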



Results (VIII/IX):
Solution for automated investigation
INPUT:
• “Connection to a live network”. The developed solution requires a working connection to the network, either remote or on-site. This avoids limiting the parts of the network from which the data can be gathered, i.e. the OSS, NEs or some other databases in the network.

[Figure: “the solution”: automated investigation of the network performance, covering the Investigation (~PM) phase. Input: connection to a live network. Outputs: system log files for the failures that have occurred in the live network, used by the Analysis (~FM) and Decision (~CM, ~SM) phases, and an overview report of the live network performance.]

OUTPUT:
• System log files and other detail data for the failures that have occurred in the live network. The root-cause analysis phase utilises this data to make decisions.
• Overview reporting of the network performance that can be utilised in reporting the status of the network to company management and to the customer, i.e. the network operator.



Results (IX/IX):
Implementation of the solution
• The distributed system consists of five separate applications:
• RNC monitor
• RNC static performance data fetcher
• OSS data fetcher
• Processor & Report (application)
• Report (server)
[Figure: the applications drawn on the UTRAN architecture (BTS, ATM nodes, RNC, CN, OSS; interfaces Iub, Iur, Iu (CS/PS), O&M). Labels: Sol1: OSS data fetcher. Sol2: RNC monitor / data fetcher. Sol3: Log & alarm fetcher. Sol4: Processor. Sol5: Report.]

Conclusions of the thesis
• Summary: the thesis studied practical problems of mobile radio network management:
• Conclusion: UTRAN vendor technical support requires a distributed system of troubleshooting tools to enhance its troubleshooting processes
• The purpose of the troubleshooting tools is to enhance the performance investigation by automating the gathering of the performance and other relevant network behaviour data for the time periods where the network suffers from low performance
• The reasoning of the solution is based on
• Current troubleshooting set-up study:
• Organisation: vendor home base technical support that is a link between the local teams, which are located near the operated networks, and the vendor R&D. On special occasions, e.g. a new product release or an emergency situation in a network, the organisation may adjust itself by temporarily transferring system specialists to work locally near the operated network.
• Process: The practical performance and fault management process consists of three phases: investigation, analysis and decision.
• The analysis of the current set-up:
• Currently the main problem is the inefficiency of the first phase, i.e. investigation, in the performance and fault management process.
• Generalisation of the results
• The same principles are applicable to performance and fault management of other radio networks (e.g. GERAN)
• Utilisation of an OSS in data gathering makes the solution more portable to other radio network systems
• Typically an OSS uses relational SQL databases. Different radio networks have different performance indicators, so the same tools may be used after modifying the SQL queries, which is a straightforward process
• Future work
• The scope was limited to investigation. The complex analysis phase also has demanding development needs.
• The technical support organisation requires product processes to manage the development and maintenance of the troubleshooting tools.
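
The portability point in the generalisation, that only the SQL queries change between radio network systems, can be sketched with an in-memory SQLite database standing in for the OSS measurement database. Everything specific here is hypothetical: the table names, column names, the GERAN counter names, the `bsc_id` column and the values are invented for illustration and do not describe any real OSS schema.

```python
import sqlite3

# Hypothetical per-technology KPI queries. Only these strings would change
# when the same tool is pointed at another radio network system.
KPI_QUERIES = {
    "UTRAN": (
        "SELECT hour, 100.0 * SUM(rrc_access_completes) / SUM(rrc_setup_attempts) "
        "FROM utran_rrc WHERE rnc_id = ? GROUP BY hour ORDER BY hour"
    ),
    "GERAN": (
        "SELECT hour, 100.0 * SUM(assignment_successes) / SUM(assignment_attempts) "
        "FROM geran_assignment WHERE bsc_id = ? GROUP BY hour ORDER BY hour"
    ),
}


def fetch_kpi(connection, technology, controller_id):
    """Return (hour, KPI percent) rows for one controller."""
    cursor = connection.execute(KPI_QUERIES[technology], (controller_id,))
    return cursor.fetchall()


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE utran_rrc (rnc_id INTEGER, cell_id INTEGER, hour INTEGER, "
    "rrc_setup_attempts INTEGER, rrc_access_completes INTEGER)"
)
connection.execute(
    "CREATE TABLE geran_assignment (bsc_id INTEGER, cell_id INTEGER, hour INTEGER, "
    "assignment_attempts INTEGER, assignment_successes INTEGER)"
)
connection.executemany(
    "INSERT INTO utran_rrc VALUES (?, ?, ?, ?, ?)",
    [(1, 1041, 11, 1000, 150), (1, 1042, 11, 1000, 250), (1, 1041, 12, 500, 495)],
)
connection.execute("INSERT INTO geran_assignment VALUES (7, 1, 11, 200, 190)")

utran_rows = fetch_kpi(connection, "UTRAN", 1)
geran_rows = fetch_kpi(connection, "GERAN", 7)
print(utran_rows, geran_rows)
connection.close()
```

The controller id is passed as a query parameter rather than formatted into the SQL string, so the per-technology part stays limited to the query text itself.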



References
• Standards and Technical Specifications
• 3GPP: GSM, 3G and LTE
• IEEE: WiFi and WiMAX
• Commercial material
• Nokia Multiradio: http://www.nokia.com/NOKIA_COM_1/Microsites/NokiaWorld/Press/Multiradio_Press_Backgrounder.pdf
• Cisco WiMAX: http://www.cisco.com/en/US/netsol/ns616/networking_solutions_customer_profile0900aecd80334a23.html
• Previous theses
• Kujala, Kimmo (2006) Expert System for Mobile Network Troubleshooting. Master’s thesis, TKK / Department of Electrical and Communications Engineering, 2006. 72 p.
• An attempt to build an automated fault analysis tool system. The result of the thesis was that automated analysis is still unreliable.
• Utriainen, Juha (2004) UTRAN Operation System Security. Master’s thesis, TKK / Department of Electrical and Communications Engineering, 2004. 64 p.
• Gives a good overview of the UTRAN O&M (Operation and Maintenance)
• Handbooks
• Kaaranen, Heikki (2005) UMTS Networks – Architecture, Mobility and Services. Second Edition. John Wiley & Sons. ISBN: 0470011033
• Nielsen, Jakob (1993) Usability Engineering. Boston: Academic Press, 1993.
