IOT MSC Electronics Notes
more clarification. Although it seems that every object capable of connecting to the Internet
will fall into the “Things” category, this notation is used to encompass a more generic set of
entities, including smart devices, sensors, human beings, and any other object that is aware of
its context and is able to communicate with other entities, making it accessible at anytime,
anywhere. This implies that objects are required to be accessible without any time or place
restrictions.
Ubiquitous connectivity is a crucial requirement of IoT, and, to fulfil it, applications
need to support a diverse set of devices and communication protocols, from tiny sensors
capable of sensing and reporting a desired factor, to powerful back-end servers that are utilized
for data analysis and knowledge extraction. This also requires integration of mobile devices,
edge devices like routers and smart hubs, and humans in the loop as controllers.
Initially, Radio-Frequency Identification (RFID) was the dominant technology
behind IoT development, but with further technological advances, wireless sensor
networks (WSN) and Bluetooth-enabled devices accelerated the mainstream adoption of
IoT. These technologies and IoT applications have been extensively surveyed previously.
A smart network is a communication infrastructure characterized by the following
functionalities:
• standardization and openness of the communication standards used, from layers
interfacing with the physical world (ie, tags and sensors), to the communication layers
between nodes and with the Internet;
• object addressability (direct IP address) and multifunctionality
HUMAN IN THE LOOP
IoT is also identified as an enabler for machine-to-machine, human-to-machine, and
human-to-environment interactions. With the increase in the number of smart devices and
the adoption of new protocols such as IPv6, the trend of IoT is expected to shift toward the
fusion of smart and autonomous networks of Internet-capable objects equipped with the
ubiquitous computing paradigm. Involving humans in the loop of IoT offers numerous
advantages to a wide range of applications, including emergency management and healthcare.
Therefore, another essential role of IoT is to build a collaborative system that is capable of
responding effectively to an event captured via sensors, by effectively discovering crowds and
successfully communicating information across the discovered crowds of different domains.
(1) expanding the communication channel between objects by providing a more
integrated communication environment
(2) Facilitating the automation and control process, whereby administrators can
manage each object’s status via remote consoles;
(3) saving in the overall cost of implementation, deployment, and maintenance, by
providing detailed measurements and the ability to check the status of devices remotely
IoT ARCHITECTURES
The building blocks of IoT are sensory devices, remote service invocation,
communication networks, and context-aware processing of events. Reliability is considered
the most important design factor in IoT, so careful consideration is needed in designing failure
recovery and scalability. Additionally, since mobility and dynamic change of location have
become an integral part of IoT systems with the widespread use of smartphones,
state-of-the-art architectures need a certain level of adaptability to properly handle dynamic
interactions within the whole ecosystem. Different service and presentation layers are shown
in this architecture. Service layers include event processing and analytics, resource
management and service discovery, as well as message aggregation and Enterprise Service Bus
(ESB) services built on top of communication and physical layers. The presentation side
includes API management, which is essential for defining and sharing system services, and
web-based dashboards.
SOA-BASED ARCHITECTURE
In IoT, service-oriented architecture (SOA) might be imperative for the service
providers and users. SOA ensures the interoperability among the heterogeneous devices.
Let us consider a generic SOA consisting of four layers, with distinguished
functionalities as follows:
• Sensing layer is integrated with available hardware objects to sense the status of things
• Network layer is the infrastructure that supports wireless or wired connections
among things
• Service layer is to create and manage services required by users or applications
• Interfaces layer consists of the interaction methods with users or applications
Generally, in such an architecture a complex system is divided into subsystems that are
loosely coupled and can be reused later (the modular decomposability feature), which
provides an easy way to maintain the whole system by taking care of its individual
components. This can ensure that, in the case of a component failure, the rest of the system
can still operate normally. This is of immense value for the effective design of an IoT
application architecture, where reliability is the most significant parameter.
SOA has been used intensively in WSN, due to its appropriate level of abstraction and
the advantages of its modular design. Bringing these benefits to IoT, SOA has the potential to
augment the level of interoperability and scalability among the objects in IoT.
Moreover, from the user’s perspective, all services are abstracted into common sets, removing
the extra complexity of dealing with different layers and protocols.
API-ORIENTED ARCHITECTURE
Likewise, building APIs for IoT applications helps the service provider attract more
customers while focusing on the functionality of their products rather than on presentation. In
addition, it is easier to enable multitenancy through the security features of modern Web APIs,
such as OAuth. APIs are also capable of boosting an organization’s service exposition and
commercialization, and they provide more efficient service monitoring and pricing tools than
previous service-oriented approaches.
The figure illustrates the main elements of the OpenIoT software architecture along
with their interactions and functionalities, in particular:
• The Sensor Middleware, which collects, filters, and combines data streams stemming
from virtual sensors
• The Cloud Computing Infrastructure, which enables the storage of data streams
stemming from the sensor middleware, thereby acting as a cloud database.
• The Directory Service, which stores information about all the sensors that are
available in the OpenIoT platform. It also provides the means (ie, services) for
registering sensors with the directory, as well as for the look-up (ie, discovery) of
sensors.
• The Global Scheduler, which processes all the requests for on-demand deployment of
services, and ensures their proper access to the resources (eg, data streams) that they
require.
• The Local Scheduler component, which is executed at the level of the Sensor
Middleware, and ensures the optimized access to the resources managed by sensor
middleware instances (ie, GSN nodes in the case of the OpenIoT implementation).
• The Service Delivery and Utility Manager, which performs a dual role. On the one
hand, it combines the data streams as indicated by service workflows within the
OpenIoT system, in order to deliver the requested service.
• The Request Definition tool, which enables the specification of service requests to the
OpenIoT platform
• The Request Presentation component, which is in charge of the visualization of the
outputs of an IoT service.
• The Configuration and Monitoring component, which enables management and
configuration functionalities over the sensors, and the IoT services that are deployed
within the platform
INTERNET PRINCIPLES
INTERNET COMMUNICATIONS: AN OVERVIEW
Suppose that you wanted to send a message to the authors of this book, but you didn’t
have the postal address, and you didn’t have any way to look up our phone number. You
remember that we’re from the UK, and London is the biggest city in the UK. So, you send a
postcard to your cousin Bob, who lives there. Your cousin sees that the postcard is for some
crazy hardware and technology people. So, he puts the postcard in an envelope and drops it off
at the London Hackspace because the guys there probably know what to do with it. At the
Hackspace, Jonty picks up the envelope and sees that it’s for some people in Liverpool. Like
all good Londoners, Jonty never goes anywhere to the north of Watford, but he remembers that
Manchester is in the north too. So he calls up the Manchester Digital Laboratory (MadLab),
opens the envelope to read the contents, and says, “Hey, I’ve got this message for Adrian and
Hakim in Liverpool. Can you pass it on?” The guys at MadLab ask whether anyone knows
who we are, and it turns out that Hwa Young does. So, the next time she comes to Liverpool,
she delivers the postcard to us.
This story illustrates how Internet protocols work: various protocols, sockets,
applications, routers, and many other devices cooperate to carry data communications
between machines.
IP
The preceding scenario describes how the Internet Protocol (IP) works. Data is sent
from one machine to another in a packet, with a destination address and a source address in a
standardised format (a “protocol”). Just like the original sender of the message in the example,
the sending machine doesn’t always know the best route to the destination in advance. Most of
the time, the packets of data have to go through a number of intermediary machines, called
routers, to reach their destination. The underlying networks aren’t always the same: just as we
used the phone, the postal service, and delivery by hand, so data packets can be sent over wired
or wireless networks, through the phone system, or over satellite links.
In our example, a postcard was placed in an envelope before getting passed onwards.
This happens with Internet packets, too. So, an IP packet is a block of data along with the same
kind of information you would write on a physical envelope: the name and address of the
server, and so on. But if an IP packet ever gets transmitted across your local wired network via
an Ethernet cable—the cable that connects your home broadband router or your office local
area network (LAN) to a desktop PC—then the whole packet will get bundled up into another
type of envelope, an Ethernet Frame, which adds additional information about how to complete
the last few steps of its journey to your computer.
TCP
The Transmission Control Protocol (TCP) is the most commonly used transport
protocol on the Internet. It is built on top of the basic IP protocol and adds sequence
numbers, acknowledgements, and retransmissions. This means that a message sent with TCP
can be arbitrarily long and gives the sender some assurance that it actually arrived at the
destination intact. Because the combination of TCP and IP is so useful, many services are built
on it in turn, such as email and the HTTP protocol that transmits information across the World
Wide Web.
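The role of sequence numbers can be illustrated with a small sketch. It is a deliberate simplification and not how a real TCP stack is implemented: real TCP numbers bytes rather than whole segments, and the function names `segment` and `reassemble` are hypothetical helpers chosen for this example.

```python
def segment(message: bytes, size: int):
    """Split a message into (sequence_number, chunk) pairs of at most `size` bytes."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]


def reassemble(segments, expected_count: int):
    """Rebuild the message from segments that may arrive out of order.

    Returns (message, []) when every segment is present, or (None, missing)
    where `missing` lists the sequence numbers a sender would retransmit.
    """
    received = {}
    for seq, chunk in segments:
        received[seq] = chunk  # a duplicate simply overwrites the same slot
    missing = [seq for seq in range(expected_count) if seq not in received]
    if missing:
        return None, missing
    return b"".join(received[seq] for seq in range(expected_count)), []
```

Because each chunk carries its own number, the receiver can restore the original order and can tell the sender exactly which pieces never arrived.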
UDP
In the User Datagram Protocol (UDP), each message may or may not arrive. No handshake or
retransmission occurs, nor is there any delay to wait for messages in sequence. These
limitations make TCP preferable for many of the tasks that Internet of Things devices will be
used for. UDP is useful for applications such as streaming data, which can cope with minor
errors but do not tolerate delays. Voice over IP (VoIP), that is, computer-based telephony such
as Skype, is one example: missing one packet might cause a tiny glitch in the sound quality, but
waiting for several packets to arrive in the right order could make the speech too jittery to be
easy to understand. UDP is also the transport for some very important protocols which provide
common, low-level functionality, such as DNS and DHCP, which relate to the discovery and
resolution of devices on the network.
IP ADDRESSES
IP addresses are numbers. In Internet Protocol version 4 (IPv4), almost 4.3 billion IP
addresses are possible: 4,294,967,296 to be precise, or 2^32. Though that is convenient for
computers, it’s tough for humans to read, so IP addresses are usually written as four 8-bit
numbers separated by dots (from 0.0.0.0 to 255.255.255.255).
Every machine on the Internet has at least one IP address. That means every computer, every
network-connected printer, every smartphone, and every Internet of Things device has one.
8.8.8.x — One of several IP ranges assigned to Google.
192.168.x.x — A range assigned for private networks. Your home or office network router
may well assign IP addresses in this range.
10.x.x.x — Another private range.
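The relationship between the 32-bit number and the dotted form can be sketched in a few lines. This is a minimal illustration using Python's standard `ipaddress` module; the helper names `to_dotted_quad` and `is_private` are hypothetical, and only the two private ranges listed above are checked.

```python
import ipaddress


def to_dotted_quad(value: int) -> str:
    """Render a 32-bit integer as four 8-bit numbers separated by dots."""
    if not 0 <= value < 2**32:
        raise ValueError("an IPv4 address is a 32-bit number")
    octets = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)


def is_private(address: str) -> bool:
    """Check membership of the private ranges 10.x.x.x and 192.168.x.x."""
    ip = ipaddress.ip_address(address)
    private_ranges = (
        ipaddress.ip_network("10.0.0.0/8"),
        ipaddress.ip_network("192.168.0.0/16"),
    )
    return any(ip in network for network in private_ranges)
```

For instance, `to_dotted_quad(134744072)` gives the familiar form of one of the addresses in the 8.8.8.x range.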
DNS
Although computers can easily handle 32-bit numbers, even formatted as dotted
quads they are easy for most humans to forget. The Domain Name System (DNS) helps our
feeble brains navigate the Internet. Domain names such as the following are familiar to us
from the web, or perhaps from email or other services:
google.com
bbc.co.uk
wiley.com
arduino.cc
Each domain name has a top-level domain (TLD), like .com or .uk, which further subdivides
into .co.uk and .gov.uk, and so on.
The domains then have information about where to direct calls to individual machines
or services. For example, the DNS records for .google.com know where to point you for the
following:
www.google.com
mail.google.com
calendar.google.com
Addresses are usually assigned with the Dynamic Host Configuration Protocol (DHCP).
When the device tries to connect, instead of checking its
internal configuration for its address, it sends a message to the router asking for an address.
The router assigns it an address. This is not a static IP address which belongs to the device
indefinitely; rather, it is a temporary “lease” which is selected dynamically according to which
addresses are currently available.
IPv6
When IP was standardised, few could have predicted how quickly the 4.3 billion
addresses that IPv4 allowed for would be allocated. The expected growth of the Internet of
Things can only speed up this trend.
Enter IPv6, which uses 128-bit addresses, usually displayed to users as eight groups of
four hexadecimal digits, for example 2001:0db8:85a3:0042:0000:8a2e:0370:7334. The
address space (2^128) is so huge that you could assign the same number of addresses as the
whole of IPv4 to every person on the planet.
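The size comparison can be checked with simple arithmetic. The sketch below uses Python's standard `ipaddress` module; the function names are hypothetical, and the figure of roughly eight billion people used in the usage note is an assumption that does not come from these notes.

```python
import ipaddress

IPV4_SPACE = 2**32    # number of IPv4 addresses
IPV6_SPACE = 2**128   # number of IPv6 addresses


def ipv4_sized_blocks() -> int:
    """How many complete IPv4-sized address spaces fit into IPv6."""
    return IPV6_SPACE // IPV4_SPACE  # equals 2**96


def short_form(address: str) -> str:
    """Return the usual shortened notation (leading zeros dropped)."""
    return ipaddress.IPv6Address(address).compressed
```

Since 2^96 is vastly larger than eight billion, every person could indeed receive a block the size of the whole IPv4 space. `short_form` shows how the long address is normally abbreviated when displayed.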
IPv6 and Powering Devices
We can see that an explosion in the number of Internet of
Things devices will almost certainly need IPv6 in the future. But we also have to consider the
power consumption of all these devices. We know that we can regularly charge and maintain
a small handful of devices. At any one moment, we might have a laptop, a tablet, a phone, a
camera, and a music player plugged in to charge. The constant juggling of power sockets,
chargers, and cables is feasible but fiddly. The requirements for large numbers of devices,
however, are very different. The devices should be low power and very reliable, while still
being capable of connecting to the Internet. Perhaps to accomplish this, these devices will team
together in a mesh network.
MAC ADDRESSES
As well as an IP address, every network-connected device also has a MAC address,
which is like the final address on a physical envelope in our analogy. It is used to differentiate
different machines on the same physical network so that they can exchange packets. This
relates to the lowest-level “link layer” of the TCP/IP stack. Though MAC addresses are
globally unique, they don’t typically get used outside of one Ethernet network (for example,
beyond your home router). So, when an IP message is routed, it hops from node to node, and
when it finally reaches a node which knows where the physical machine is, that node passes
the message to the device associated with that MAC address. MAC stands for Media Access
Control. It is a 48-bit number, usually written as six groups of two hexadecimal digits
separated by colons, for example: 01:23:45:67:89:ab
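As a small illustration of the notation, the following sketch converts between the colon-separated text form and the underlying 48-bit number. The helper names `mac_to_int` and `int_to_mac` are hypothetical and exist only for this example.

```python
def mac_to_int(mac: str) -> int:
    """Parse a colon-separated MAC address into its 48-bit integer value."""
    groups = mac.strip().split(":")
    if len(groups) != 6:
        raise ValueError("expected six colon-separated groups")
    value = 0
    for group in groups:
        byte = int(group, 16)
        if not 0 <= byte <= 0xFF:
            raise ValueError(f"group out of range: {group!r}")
        value = (value << 8) | byte  # append the next 8 bits
    return value


def int_to_mac(value: int) -> str:
    """Format a 48-bit integer as six two-digit hexadecimal groups."""
    if not 0 <= value < 2**48:
        raise ValueError("a MAC address is a 48-bit number")
    return ":".join(f"{(value >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
```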
Most devices, such as your laptop, come with the MAC address burned into their
Ethernet chips. Some chips, such as the Arduino Ethernet’s WizNet, don’t have a hard-coded
MAC address,
Similarly, when you send a TCP/IP message over the Internet, you have to send it to the right
port. TCP ports, unlike entrances to the Capulet house, are referred to by numbers (from 0 to
65535).
Ports 0–1023 are “well-known ports”, and only a system process or an administrator can
bind to them (that is, run a service that listens on them).
Ports 1024–49151 are “registered”, so that common applications can have a usual port
number. However, most services are able to bind any port number in this range. The Internet
Assigned Numbers Authority (IANA) is responsible for registering the numbers in these
ranges. People can and do abuse them, especially in the range 1024–49151, but unless you
know what you’re doing, you are better off using either the correct assigned port or (for an
entirely custom application) a port above 49151.
You see custom port numbers if a machine has more than one web server; for example, in
development you might have another server, bound to port 8080:
http://www.example.com:8080
Or if you are developing a website locally, you may be able to test it with a built-in test web
server which binds to a free port. For example, Jekyll (the lightweight blog engine) has a test
server that runs on port 4000:
http://localhost:4000
The secure (encrypted) HTTPS usually runs on port 443. So, these two URLs are equivalent:
https://www.example.com
https://www.example.com:443
OTHER COMMON PORTS
Even if you will rarely need a complete catalogue of all port numbers for services, you can
rapidly start to memorize port numbers for the common services that you use daily. For
example, you will very likely come across the following ports regularly:
◾ 80 HTTP
◾ 8080 HTTP (for testing servers)
◾ 443 HTTPS
◾ 22 SSH (Secure Shell)
◾ 23 Telnet
◾ 25 SMTP (outbound email)
◾ 110 POP3 (inbound email)
◾ 143 IMAP (inbound email)
All of these services are in fact application layer protocols.
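The three port ranges described above can be captured in a short classification helper. This is only a sketch: the function name `port_range` is hypothetical, and the label "dynamic/private" for ports above 49151 follows IANA's usual naming rather than anything stated in these notes.

```python
def port_range(port: int) -> str:
    """Classify a TCP/UDP port number into its numeric range."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers run from 0 to 65535")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic/private"  # label is an assumption, see the lead-in


# A few of the ports mentioned in the text, for illustration only.
EXAMPLE_PORTS = {22: "SSH", 25: "SMTP", 80: "HTTP", 443: "HTTPS", 8080: "HTTP (testing)"}

if __name__ == "__main__":
    for number, service in sorted(EXAMPLE_PORTS.items()):
        print(number, service, port_range(number))
```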
Socket programming is a way for two processes, on the same machine or on different
machines across a network, to communicate with each other. Two sockets communicate, one on
the client and one on the server.
A socket is a communications connection point (endpoint) that you can name and
address in a network. Socket programming shows how to use socket APIs to establish
communication links between remote and local processes.
The processes that use a socket can reside on the same system or different systems on
different networks. Sockets are useful for both stand-alone and network applications. Sockets
allow you to exchange information between processes on the same machine or across a
network, distribute work to the most efficient machine, and they easily allow access to
centralized data. Socket application program interfaces (APIs) are the network standard for
TCP/IP. A wide range of operating systems support socket APIs.
A socket’s address consists of an IP address and a port. The server application starts to listen
for clients on the defined port. The client establishes a connection to the IP address of the
server and the port it has opened. The communication can then continue bidirectionally.
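The listen, connect, and exchange steps can be sketched with Python's standard `socket` module. This is a minimal illustration under stated assumptions, not a production server: it runs on the loopback address, serves a single client, answers by upper-casing the message, and the function names `start_echo_server` and `send_message` are hypothetical.

```python
import socket
import threading


def start_echo_server(host: str = "127.0.0.1", port: int = 0):
    """Listen on (host, port) and answer one client; port 0 lets the OS pick a free port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    address = server.getsockname()  # the (IP, port) pair clients must connect to

    def serve_one_client():
        connection, _peer = server.accept()
        with connection:
            data = connection.recv(1024)      # short messages only in this sketch
            connection.sendall(data.upper())  # reply on the same connection
        server.close()

    threading.Thread(target=serve_one_client, daemon=True).start()
    return address


def send_message(address, message: bytes) -> bytes:
    """Connect to the server's IP and port, send a message, and return the reply."""
    with socket.create_connection(address, timeout=5) as client:
        client.sendall(message)
        return client.recv(1024)
```

A typical use is `address = start_echo_server()` followed by `send_message(address, b"hello")`; the same connection carries data in both directions.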
Routing
When a device has multiple paths to reach a destination, it always selects one path by
preferring it over the others. This selection process is termed routing. Routing is done by
special network devices called routers, or it can be done by means of software processes.
Software-based routers have limited functionality and limited scope.
A router is always configured with some default route. A default route tells the router
where to forward a packet if no route is found for a specific destination. If multiple paths
exist to reach the same destination, the router can make its decision based on the
following information:
• Hop Count
• Bandwidth
• Metric
• Prefix-length
• Delay
Routes can be statically configured or dynamically learnt. One route can be configured to
be preferred over others.
Unicast routing
Most of the traffic on the Internet and intranets, known as unicast data or unicast traffic,
is sent to a specified destination. Routing unicast data over the Internet is called unicast
routing. It is the simplest form of routing because the destination is already known. Hence the
router just has to look up the routing table and forward the packet to the next hop.
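The table look-up can be sketched as follows, using prefix length as the selection criterion and a default route as the fallback. This is an illustrative sketch only: the routing table contents, the next-hop addresses (taken from a documentation range), and the function name `next_hop` are all assumptions made for the example.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, next-hop address).
ROUTES = [
    ("0.0.0.0/0", "192.0.2.1"),    # default route, matches every destination
    ("10.0.0.0/8", "192.0.2.2"),
    ("10.1.0.0/16", "192.0.2.3"),  # more specific than 10.0.0.0/8
]


def next_hop(destination: str, routes=ROUTES):
    """Return the next hop of the most specific (longest-prefix) matching route."""
    address = ipaddress.ip_address(destination)
    matches = []
    for prefix, hop in routes:
        network = ipaddress.ip_network(prefix)
        if address in network:
            matches.append((network.prefixlen, hop))
    if not matches:
        return None  # no route at all, not even a default route
    _length, hop = max(matches)  # the longest prefix wins
    return hop
```

With this table, a packet for 10.1.2.3 matches all three entries, and the /16 route is chosen because it is the most specific.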
Broadcast routing
By default, broadcast packets are not routed or forwarded by the routers on any
network. Routers create broadcast domains. However, a router can be configured to forward
broadcasts in some special cases. A broadcast message is destined for all network devices.
Broadcast routing can be done in two ways (algorithms):
• A router creates a data packet and then sends it to each host one by one. In this case,
the router creates multiple copies of a single data packet with different destination
addresses. All packets are sent as unicast, but because they are sent to all hosts, the effect
is as if the router were broadcasting. This method consumes a lot of bandwidth, and the
router must know the destination address of each node.
• Alternatively, when a router receives a packet that is to be broadcast, it simply floods the
packet out of all interfaces. All routers are configured in the same way.
This method is easy on the router's CPU but may cause the problem of duplicate packets
received from peer routers.
Reverse path forwarding is a technique in which a router knows in advance the
predecessor from which it should receive a broadcast. This technique is used to detect
and discard duplicates.
Multicast Routing
Multicast routing is a special case of broadcast routing, with significant differences and
challenges. In broadcast routing, packets are sent to all nodes even if they do not want them. In
multicast routing, the data is sent only to the nodes which want to receive the packets.
The router must know that there are nodes which wish to receive the multicast packets (or
stream); only then should it forward them. Multicast routing uses a spanning tree protocol to
avoid looping.
Multicast routing also uses the reverse path forwarding technique to detect and discard
duplicates and loops.
Anycast Routing
Anycast packet forwarding is a mechanism where multiple hosts can have the same logical
address. When a packet destined for this logical address is received, it is sent to the host which
is nearest in the routing topology.
Anycast routing is done with the help of a DNS server. Whenever an anycast packet is received,
DNS is queried to determine where to send it. DNS provides the IP address of the nearest
host configured for that logical address.
There are two kinds of routing protocols available to route unicast packets:
Distance Vector is a simple routing protocol which takes routing decisions based on the
number of hops between source and destination. A route with fewer hops is considered
the best route. Every router advertises its set of best routes to other routers. Ultimately,
all routers build up their view of the network topology based on the advertisements of their
peer routers. An example is the Routing Information Protocol (RIP).
Link State is a slightly more complicated protocol than Distance Vector. It takes into
account the states of the links of all the routers in a network. This technique helps routers
build a common graph of the entire network. All routers then calculate their best path
for routing purposes. Examples are Open Shortest Path First (OSPF) and Intermediate
System to Intermediate System (IS-IS).
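The distance-vector idea of learning routes from neighbours' advertisements can be sketched as a small simulation. It is a simplified model rather than an implementation of RIP: every link counts as one hop, all routers exchange tables in synchronous rounds, and the function name `distance_vector` is hypothetical.

```python
def distance_vector(links):
    """Compute hop-count tables from a list of (router_a, router_b) links.

    Returns {router: {destination: hops}} once no advertisement
    improves any table any further.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Initially each router only knows the route to itself.
    tables = {router: {router: 0} for router in neighbours}

    changed = True
    while changed:
        changed = False
        # Every router advertises a copy of its current table.
        adverts = {router: dict(table) for router, table in tables.items()}
        for router, table in tables.items():
            for neighbour in neighbours[router]:
                for destination, hops in adverts[neighbour].items():
                    # Reaching the destination via this neighbour costs one more hop.
                    if hops + 1 < table.get(destination, float("inf")):
                        table[destination] = hops + 1
                        changed = True
    return tables
```

In a chain A, B, C, D, router A first learns about B, then about C through B's advertisement, and finally about D, which mirrors how knowledge spreads one hop per exchange.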
Multicast Routing Protocols
Unicast routing protocols use graphs while Multicast routing protocols use trees, i.e. spanning
tree to avoid loops. The optimal tree is called shortest path spanning tree.
• DVMRP - Distance Vector Multicast Routing Protocol
• MOSPF - Multicast Open Shortest Path First
• CBT - Core Based Tree
• PIM - Protocol independent Multicast
Protocol Independent Multicast is commonly used now. It has two flavors:
• PIM Dense Mode
This mode uses source-based trees. It is used in dense environment such as LAN.
• PIM Sparse Mode
This mode uses shared trees. It is used in sparse environment such as WAN.
Routing Algorithms
Flooding
Flooding is the simplest method of packet forwarding. When a packet is received, the router
sends it out of all interfaces except the one on which it was received. This creates too much
burden on the network and leaves lots of duplicate packets wandering in the network.
Time to Live (TTL) can be used to avoid infinite looping of packets. Another approach,
called selective flooding, reduces the overhead on the network. In this method, the router
does not flood out of all the interfaces, but only selected ones.
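The effect of TTL on flooding can be seen in a small simulation. This is a sketch under simplifying assumptions: the topology is a plain adjacency dictionary, no duplicate suppression is performed, and the function name `flood` is hypothetical.

```python
from collections import deque


def flood(topology, source, ttl):
    """Flood a packet from `source` and count the copies that are transmitted.

    `topology` maps each node to a list of its neighbours. Each node forwards
    the packet on every link except the one it arrived on, until the TTL
    reaches zero. Returns (set of nodes reached, number of transmissions).
    """
    reached = {source}
    transmissions = 0
    queue = deque([(source, None, ttl)])  # (node, arrived_from, remaining TTL)
    while queue:
        node, arrived_from, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL expired: the packet is not forwarded further
        for neighbour in topology[node]:
            if neighbour == arrived_from:
                continue  # never send back on the incoming interface
            transmissions += 1
            reached.add(neighbour)
            queue.append((neighbour, node, remaining - 1))
    return reached, transmissions
```

On a triangle of three routers, a TTL of 1 already reaches every node with two transmissions; larger TTL values only add duplicate copies, which is the overhead described above.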
Shortest Path
Routing decisions in networks are mostly taken on the basis of the cost between source and
destination. Hop count plays a major role here. Shortest path is a technique which uses various
algorithms to decide on a path with the minimum number of hops.
Common shortest path algorithms are:
• Dijkstra's algorithm
• Bellman Ford algorithm
• Floyd Warshall algorithm
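As an illustration of the first algorithm in the list, here is a compact sketch of Dijkstra's algorithm using a priority queue. The graph format (a dictionary of neighbour-to-cost dictionaries) and the function name `dijkstra` are assumptions made for this example; setting every cost to 1 reduces it to a hop-count calculation.

```python
import heapq


def dijkstra(graph, source):
    """Return the lowest total cost from `source` to every reachable node.

    `graph` maps each node to a dict of {neighbour: non-negative link cost}.
    """
    distance = {source: 0}
    heap = [(0, source)]  # (cost so far, node)
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > distance.get(node, float("inf")):
            continue  # stale entry: a cheaper path was already found
        for neighbour, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < distance.get(neighbour, float("inf")):
                distance[neighbour] = new_cost
                heapq.heappush(heap, (new_cost, neighbour))
    return distance
```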
• In real-world scenarios, networks under the same administration are generally scattered
geographically. There may be a requirement to connect two different networks of the
same kind as well as of different kinds. Routing between two networks is called
internetworking.
• Networks can be considered different based on various parameters such as protocol,
topology, Layer-2 network, and addressing scheme.
• In internetworking, routers have knowledge of each other’s addresses and of the addresses
beyond them. They can be statically configured to reach a different network, or they can
learn routes by using an internetworking routing protocol.
• Routing protocols which are used within an organization or administration are called
Interior Gateway Protocols (IGP). RIP and OSPF are examples of IGPs. Routing between
different organizations or administrations uses an Exterior Gateway Protocol (EGP), and
in practice there is only one EGP in use, the Border Gateway Protocol (BGP).
Tunneling
• If there are two geographically separate networks which want to communicate with
each other, they may deploy a dedicated line between them, or they have to pass their data
through intermediate networks.
• Tunneling is a mechanism by which two or more networks of the same kind communicate
with each other by bypassing the complexities of intermediate networks. Tunneling is
configured at both ends.
• When data enters one end of the tunnel, it is tagged. This tagged data is then
routed inside the intermediate or transit network to reach the other end of the tunnel. When
the data exits the tunnel, its tag is removed and it is delivered to the other part of the network.
• Both ends appear as if they are directly connected, and tagging lets the data travel through
the transit network without any modifications.
Packet Fragmentation
• Most Ethernet segments have their maximum transmission unit (MTU) fixed at 1500
bytes. A data packet can be longer or shorter than this, depending upon the
application. Devices in the transit path also have hardware and software
capabilities which determine what amount of data the device can handle and what size of
packet it can process.
• If the data packet size is less than or equal to the size of packet the transit network can
handle, it is processed normally. If the packet is larger, it is broken into smaller pieces
and then forwarded. This is called packet fragmentation. Each fragment contains the
same destination and source address and is routed through the transit path easily. At the
receiving end the fragments are assembled again.
• If a packet with its DF (don’t fragment) bit set to 1 comes to a router which cannot handle
the packet because of its length, the packet is dropped.
• When a packet received by a router has its MF (more fragments) bit set to 1, the
router knows that it is a fragmented packet and that further parts of the original packet are
on the way.
• If a packet is fragmented into pieces that are too small, the overhead increases. If the
fragments are too large, an intermediate router may not be able to process them and they
might get dropped.
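The fragmentation rules above can be sketched as a small calculation. The sketch rests on assumptions that are not stated in these notes: an IPv4 header of 20 bytes without options, and fragment data sizes that are multiples of 8 bytes (as IPv4 fragment offsets require). The function name `fragment` is hypothetical.

```python
def fragment(payload_len: int, mtu: int, header_len: int = 20, dont_fragment: bool = False):
    """Plan how a packet's payload is split to fit a link with the given MTU.

    Returns a list of (offset_in_bytes, data_length, more_fragments) tuples,
    or None when the packet is too large and its DF bit forbids fragmenting.
    """
    if header_len + payload_len <= mtu:
        return [(0, payload_len, False)]  # fits as it is, MF = 0
    if dont_fragment:
        return None  # the router drops the packet
    # Each fragment carries its own header; data size is rounded down to 8 bytes.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more_fragments = offset + size < payload_len  # MF = 1 on all but the last
        fragments.append((offset, size, more_fragments))
        offset += size
    return fragments
```

For a 4000-byte payload and an MTU of 1500 bytes this yields three fragments, and only the last one has its MF bit cleared.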
Internetwork
• Web sites
• E-mail
• Instant Messaging
• Blogging
• Social Media
• Marketing
• Networking
• Resource Sharing
• Audio and Video Streaming
Ethernet
• Ethernet is a widely deployed LAN technology. It was invented by Bob
Metcalfe and D. R. Boggs in 1973 and was standardized as IEEE 802.3 in 1983.
• Ethernet shares media. A network which uses shared media has a high probability of data
collision. Ethernet uses Carrier Sense Multiple Access with Collision Detection (CSMA/CD)
to detect collisions. When a collision occurs in Ethernet, the hosts involved
back off, wait for some random amount of time, and then re-transmit the data.
• An Ethernet connector is a network interface card equipped with a 48-bit MAC address.
This helps other Ethernet devices to identify and communicate with remote devices on the
Ethernet.
• Traditional Ethernet uses the 10BASE-T specification. The number 10 denotes a speed of
10 Mbps, BASE stands for baseband, and T stands for twisted pair. 10BASE-T
Ethernet provides transmission speeds up to 10 Mbps and uses twisted-pair cable (commonly
Cat-5) with RJ-45 connectors. Ethernet follows a star topology with segment
lengths up to 100 meters. All devices are connected to a hub/switch in a star fashion.
Fast-Ethernet
• To meet the needs of fast-emerging software and hardware technologies, Ethernet
was extended as Fast-Ethernet. It can run on UTP and optical fiber, and
provides speeds up to 100 Mbps. This standard is named 100BASE-T in IEEE
802.3 and uses Cat-5 twisted pair cable. It uses the CSMA/CD technique for wired media
sharing among the Ethernet hosts, whereas wireless LANs use CSMA/CA (CA stands for
Collision Avoidance).
• Fast Ethernet on fiber is defined under the 100BASE-FX standard, which provides speeds
up to 100 Mbps on fiber. Ethernet over fiber can be extended up to 412 meters in
half-duplex mode and can reach a maximum of 2000 meters in full-duplex mode over
multimode fibers.
Giga-Ethernet
• After being introduced in 1995, Fast-Ethernet enjoyed its high-speed status for only
3 years, until Giga-Ethernet was introduced. Giga-Ethernet provides speeds up to 1000
Mbps. IEEE 802.3ab standardizes Giga-Ethernet over UTP using Cat-5, Cat-5e,
and Cat-6 cables. IEEE 802.3z defines Giga-Ethernet over fiber.
Virtual LAN
• A LAN uses Ethernet, which in turn works on shared media. Shared media in Ethernet
create one single broadcast domain and one single collision domain. The introduction of
switches to Ethernet has removed the single collision domain issue, and each device
connected to a switch works in its own separate collision domain. But even switches cannot
divide a network into separate broadcast domains.
• A Virtual LAN (VLAN) is a solution to divide a single broadcast domain into multiple
broadcast domains. A host in one VLAN cannot speak to a host in another. By default, all
hosts are placed into the same VLAN.
• In this diagram, different VLANs are depicted in different color codes. Hosts in one
VLAN, even if connected to the same switch, cannot see or speak to hosts in
different VLANs. VLAN is a Layer-2 technology which works closely with Ethernet. To
route packets between two different VLANs, a Layer-3 device such as a router is
required.
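The separation of broadcast domains can be modelled in a few lines. This is a toy model rather than switch firmware: hosts are mapped to VLAN numbers in a plain dictionary, and the host names, VLAN numbers, and function names are all hypothetical.

```python
def broadcast_domain(vlan_of_host, sender):
    """Return the hosts that receive a broadcast frame sent by `sender`."""
    vlan = vlan_of_host[sender]
    return sorted(host for host, host_vlan in vlan_of_host.items()
                  if host_vlan == vlan and host != sender)


def needs_router(vlan_of_host, host_a, host_b):
    """Two hosts in different VLANs can only talk through a Layer-3 device."""
    return vlan_of_host[host_a] != vlan_of_host[host_b]


# Hypothetical assignment of hosts on one switch to two VLANs.
HOSTS = {"pc1": 10, "pc2": 10, "pc3": 20, "printer": 20}
```

With this assignment, a broadcast from pc1 reaches only pc2, even though all four hosts are attached to the same switch.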
Wireless LANs are those Local Area Networks that use high frequency radio waves instead
of cables for connecting the devices in LAN. Users connected by WLANs can move around
within the area of network coverage. Most WLANs are based upon the standard IEEE 802.11
or WiFi.
IEEE 802.11 Architecture
The components of an IEEE 802.11 architecture are as follows
1) Stations (STA) − Stations comprise all devices and equipment that are connected to the
wireless LAN. A station can be of two types:
• Wireless Access Point (WAP) − WAPs or simply access points (AP) are generally
wireless routers that form the base stations.
• Client − Clients are workstations, computers, laptops, printers, smartphones, etc.
Each station has a wireless network interface controller.
2) Basic Service Set (BSS) − A basic service set is a group of stations communicating at
the physical layer. A BSS can be of two categories, depending upon the mode of operation:
• Infrastructure BSS − Here, the devices communicate with other devices through
access points.
• Independent BSS − Here, the devices communicate on a peer-to-peer basis in an ad hoc
manner.
3) Extended Service Set (ESS) − It is the set of all connected BSSs.
4) Distribution System (DS) − It connects the access points in an ESS.
Advantages of WLANs
• They provide clutter free homes, offices and other networked places.
• WLANs are scalable in nature, i.e. devices may be added to or removed from the
network more easily than in wired LANs.
• The system is portable within the network coverage and access to the network is not
bounded by the length of the cables.
• Installation and setup is much easier than wired counterparts.
• The equipment and setup costs are reduced.
Disadvantages of WLANs
• Since radio waves are used for communications, the signals are noisier with more
interference from nearby systems.
• Greater care is needed to encrypt information. WLANs are also more prone to errors,
so they require greater bandwidth than wired LANs.
• WLANs are slower than wired LANs.
• Cellular Networks
• A cellular network is a mobile network spread over a wide area and used to access the
network. A mobile device is connected to its base station over an air interface,
using physical- and link-layer protocols. Each base station is connected to a Mobile
Switching Center, which connects mobiles to wide area networks and helps with call
setup and mobility management.
• Wi-Fi Networks
• Wi-Fi is commonly expanded as Wireless Fidelity. It is used to create a wireless network
of devices to access the internet. It follows the IEEE 802.11 standards. Each Wi-Fi device
connects to the WLAN through a wireless access point (AP) to access the internet.
• Following are some of the important differences between Cellular Networks and Wi-Fi
Networks
No. | Key | Cellular network | Wi-Fi network
1 | Internet access | Cellular networks are based on mobile phones/devices using cellular signals to connect to the internet. | Wi-Fi uses radio-frequency waves to provide high-speed internet access to connected devices.
2 | Standard | Cellular networks are based on mobile phones and use networks spread over a wide area. | Wi-Fi is a wireless network technology following the IEEE 802.11 standards.
3 | Range | Cellular networks are dependent on network range availability. | Wi-Fi has a limited range.
4 | Data plans | Mobile phones have a plan up to which data can be consumed. | Wi-Fi has no such limits or plans up to which data can be consumed.
5 | Speed | Mobile phone network access speed is generally slow compared to Wi-Fi. | Wi-Fi is considerably faster than a cellular network.
Wireless cellular systems solve the problem of spectral congestion and increase user
capacity. The features of cellular systems are as follows −
• Offer very high capacity in a limited spectrum.
• Reuse of radio channel in different cells.
• Enable a fixed number of channels to serve an arbitrarily large number of users by
reusing the channel throughout the coverage region.
• Communication is always between mobile and base station (not directly between
mobiles).
• Each cellular base station is allocated a group of radio channels within a small
geographic area called a cell.
• Neighbouring cells are assigned different channel groups.
• By limiting the coverage area to within the boundary of the cell, the channel groups
may be reused to cover different cells.
• Keep interference levels within tolerable limits.
• Frequency reuse or frequency planning.
• Organization of Wireless Cellular Network.
A cellular network is organized into multiple low-power transmitters, each of 100 W or less.
Shape of Cells
The coverage area of a cellular network is divided into cells, each cell having its own antenna
for transmitting signals. Each cell has its own frequencies. Data communication in a cell
is served by its base station, which comprises a transmitter, a receiver, and a control unit.
The shape of cells can be either square or hexagon −
Square
A square cell has four neighbors at distance d and four at distance √2 d.
Hexagon
A hexagon cell shape is highly recommended for its easy coverage and calculations. It offers
the following advantages −
Frequency Reuse
Frequency reuse is the concept of using the same radio frequencies in cells of a given area
that are separated by a considerable distance, so that communication can be established
with minimal interference.
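The idea of reusing channels in cells that are far enough apart can be sketched numerically. The formulas below are the standard hexagonal-geometry results (cluster size N = i² + ij + j² and co-channel reuse distance D = R·√(3N)); they are not stated in these notes, and the cell radius and channel count are illustrative assumptions.

```python
import math


def cluster_size(i: int, j: int) -> int:
    """Cells per cluster on a hexagonal grid for shift parameters (i, j)."""
    return i * i + i * j + j * j


def reuse_distance(cell_radius_m: float, n: int) -> float:
    """Distance between the centres of the nearest co-channel cells."""
    return cell_radius_m * math.sqrt(3 * n)


def channels_per_cell(total_channels: int, n: int) -> int:
    """Channels available in each cell when the set is split over one cluster."""
    return total_channels // n


# Illustrative values only: 1 km cells, 420 channels, (i, j) = (2, 1).
n = cluster_size(2, 1)
print("cluster size:", n)
print("reuse distance (m):", round(reuse_distance(1000.0, n), 1))
print("channels per cell:", channels_per_cell(420, n))
```

A smaller cluster size gives each cell more channels but brings co-channel cells closer together, which is the trade-off behind keeping interference within tolerable limits.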
Frequency reuse offers the following benefits −
As the networks of smart objects and IP merge, there is a high probability of security
vulnerabilities due to protocol translations, incompatible security infrastructures, etc. The
enterprise security model has been marked by two chief tenets:
• Security has been focused on best-of-breed applications and appliances: solutions for
firewall, for network security, for data security, for content security, and so forth.
• Security has been perimeter-based, meaning organizations secured the end device and the
server, and reacted to recognized intrusions or threats such as viruses or DoS attacks.
The complex operational technologies make it difficult to design a robust security
architecture in IoT. It is a common opinion that in the near future IP will be the base common
network protocol for IoT. This does not imply that all objects will be able to run IP. In contrast,
there will always be tiny devices, such as tiny sensors or Radiofrequency Identification (RFID)
tags, that will be organized in closed networks implementing very simple and application-
specific communication protocols and that eventually will be connected to an external network
through a proper gateway. In short, the heterogeneous characteristics of the networks make it
harder to implement certain IP-based security systems such as symmetric cryptosystems.
The IoT provides interconnectedness of people and things on a vast scale with billions
of devices. It is at once a huge opportunity for better efficiency and better services, as well as
a huge opportunity for hackers to compromise security and privacy. It may be noted that one
of the key elements of the state-of-the-art security in the Internet is the use of advanced
cryptographic algorithms needing substantial processing power. However, IoT devices are
based on low-end processors or microcontrollers that have low processing power and
memory, and are not designed with security as a priority design goal. Privacy is enforced
through encryption, and identity is confirmed through authentication. These mechanisms
rely on the following:
• Cryptographic primitives such as the Advanced Encryption Standard (AES) cipher, the Secure
Hash Algorithm (SHA-2) family, and the public-key ciphers RSA and elliptic-curve cryptography (ECC).
• The Transport Layer Security (TLS) protocol, and its predecessor the Secure Sockets Layer (SSL)
protocol, which provide authentication and information encryption using the ciphers
mentioned.
• Public-Key Infrastructure (PKI) provides the building blocks for authentication and trust
through a digital certificate standard and Certificate Authorities (CA).
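As a small illustration of the building blocks listed above, the following sketch hashes a sensor payload with SHA-256 and authenticates it with a keyed hash. HMAC is not named in these notes; it is used here only as a standard-library stand-in for message authentication, and the payload and key are made-up values.

```python
import hashlib
import hmac
import secrets


def sha256_hex(payload: bytes) -> str:
    """SHA-2 (SHA-256) digest of the payload, as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def sign(key: bytes, payload: bytes) -> bytes:
    """Keyed authentication tag (HMAC-SHA256) over the payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, payload: bytes, tag: bytes) -> bool:
    """Constant-time check that the tag matches the payload."""
    return hmac.compare_digest(sign(key, payload), tag)


# Hypothetical shared key and sensor reading, for illustration only.
key = secrets.token_bytes(32)
reading = b'{"sensor": "SHT15", "temp_c": 21.4}'
tag = sign(key, reading)

print("digest:", sha256_hex(reading))
print("untampered message accepted:", verify(key, reading, tag))
print("tampered message accepted:", verify(key, reading + b" ", tag))
```

Even this light-weight check costs processing time and memory on every message, which is the constraint the text raises for low-end microcontrollers.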
Current IoT implementations have gaps in terms of implementing the above security
mechanisms, even though these mechanisms have widespread adoption in the IP networks
interchange, privacy aware identification, mobility, and IP network dynamics. Many
applications will process sensitive health monitoring or biometric data, so the demand for
cryptographic components that can be efficiently implemented is strong and growing.
IoT devices and applications add a layer of complexity over the generic issue of
privacy over the Internet, for example due to generation of traceable characteristics and
attributes of individuals. IoT devices in healthcare present a major concern, since these
devices and applications typically generate large volumes of data on individual patients
through continuous monitoring of vital parameters. In this case, it is crucial to delink the
identities of the device from that of the individual, through mechanisms such as data
anonymization. Data anonymization is the process of either encrypting or removing
personally identifiable information from data sets, so that the originator of the data remains
anonymous.
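The delinking step described above can be sketched as follows. This is a minimal illustration, not a complete anonymization scheme: the field names are hypothetical, and replacing an identifier with a keyed hash is strictly pseudonymization, which is weaker than removing the identifier altogether.

```python
import hashlib
import hmac

# Hypothetical field names; a real data set would define its own.
PII_FIELDS = {"name", "address", "phone"}


def anonymize(record: dict, secret: bytes) -> dict:
    """Drop direct identifiers and replace the patient ID with an opaque token."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in PII_FIELDS and key != "patient_id"
    }
    token = hmac.new(secret, str(record["patient_id"]).encode("utf-8"),
                     hashlib.sha256).hexdigest()[:16]
    cleaned["subject_token"] = token
    return cleaned


record = {"patient_id": "P-001", "name": "A. Example", "phone": "000",
          "heart_rate": 72}
print(anonymize(record, secret=b"illustrative-secret"))
```

Because the same patient always maps to the same token, readings can still be grouped per subject for analysis without exposing who the subject is, provided the secret stays on the trusted side.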
• It must provide enterprise scalability
• A minimal level of electronics and computing experience is required
• Higher-level programming languages are preferred
The most obvious issue is that most sensors cannot communicate via the Internet with
backend services. This is due to the cost of providing Wi-Fi or wired Ethernet
connections to every sensor, and also to the electrical power required. Supporting a
full TCP/IP networking stack on every sensor is not required if a low-power wireless network
is available. It is for these reasons that local sensors often communicate via a gateway.
• The sensors and actuators are application-specific and will require some thought to ensure
that the data collected meets the accuracy and sampling frequency required for analysis.
• The gateway is responsible for communicating with the sensors and actuators as well as the
backend services; it is a translator between localized interfaces, such as hard-wired sensors
and remote backend systems. It basically adds TCP/IP capability to sensors.
• The backend services are predominantly used to store the IoT device data but can include
additional functionality such as individual device configuration and analysis algorithms.
Communication between these three components ensures that data collected by an IoT
device can be transported via the Internet to backend services. The communication must be
bidirectional so that the configuration of the gateway can be updated.
SENSOR TO GATEWAY COMMUNICATION
The sensors themselves usually have low-level electronic interfaces for
communication. For example, I2C and SPI are common serial-communication buses capable
of linking multiple electronic components. Connecting one or more sensors to the serial bus
of a gateway device is a simple way of creating an IoT device.
Wired Gateway Interfaces
• Inter-Integrated Circuit (I2C) or Two Wire Interface (TWI) is a serial bus capable of
hosting multiple master and multiple slave devices using just two connections: Serial Data
Line (SDA) and Serial Clock Line (SCL). It is an easy-to-implement bus, well understood
and commonly used, making it a good choice for IoT sensor communications. The clock
frequency and total bus capacitance will generally limit the range to a few meters, but in
theory, distances in excess of 100 m are possible.
• Serial Peripheral Interface (SPI) is a serial bus capable of hosting a single master with
multiple slave devices per bus. It uses three connections plus one connection per slave
device: Serial Clock (SCLK); Master Output, Slave Input (MOSI); Master Input, Slave
Output (MISO); and Slave Select (SS). SPI is very common and has a lower power
consumption than I2C, but requires more connections.
• Pulse-Width Modulation (PWM) is not an interface, but rather a technique to encode a
message by varying the power supplied by a digital pin. It works by switching the PWM pin on
and off rapidly and varying the fraction of time the pin is on (the duty cycle), which
lowers the average voltage supplied to the load. It is often implemented in hardware to
ensure accurate timing of each pulse, permitting switching frequencies of up to 1 MHz.
For example, a 10-bit resolution on a 5-V digital pin will allow for 1024 different
average voltage outputs between 0 and 5 V. This could relate to motor speed or
servo angle.
• Universal Asynchronous Receiver/Transmitter (UART) refers to the hardware that converts
parallel to serial communications, and is often simply called serial.
• A Controller Area Network (CAN) is a message-based protocol defined by ISO 11898-1
that allows multiple-master device communication. It has been developed over the last 30
years for in-vehicle electronic networking, and has been adopted by the automotive,
railway, and aerospace industries.
• Analog-to-Digital Converter (ADC) is a device that converts a continuous analog voltage to
a digital number. The number of bits used to store this number determines the resolution of
the ADC.
• General Purpose Input Output (GPIO) is a general-purpose pin whose behaviour can be
controlled by the user at runtime. Some of these GPIO pins can be configured to function
as Interrupt Request (IRQ) inputs.
• 1-Wire is a device communications bus-system capable of hosting multiple slave devices
connected to one single master, using a serial protocol for communication with a single data
line plus ground reference.
• X10 is a protocol for communication between automated home-electronic devices. It
controls any device plugged into an electric powerline by signalling digital information in the
form of short radio-frequency bursts.
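The PWM and ADC entries in the list above both come down to simple arithmetic on a bit resolution. The sketch below works through the 10-bit, 5-V case mentioned in the text; treating the highest count as "always on" and the ADC step as the reference voltage divided by 2^bits are conventions assumed here, and real hardware may differ.

```python
def pwm_average_voltage(duty_count: int, resolution_bits: int = 10,
                        supply_v: float = 5.0) -> float:
    """Average output voltage for a PWM duty setting.

    Assumes count 0 is always off and the highest count is always on.
    """
    max_count = 2 ** resolution_bits - 1
    if not 0 <= duty_count <= max_count:
        raise ValueError("duty_count out of range for this resolution")
    return supply_v * duty_count / max_count


def adc_step_v(resolution_bits: int, reference_v: float) -> float:
    """Smallest voltage change an ADC of this resolution can distinguish."""
    return reference_v / 2 ** resolution_bits


print("PWM levels at 10 bits:", 2 ** 10)
print("half duty (count 512):", round(pwm_average_voltage(512), 3), "V")
print("10-bit ADC step at 5 V:", round(adc_step_v(10, 5.0) * 1000, 3), "mV")
```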
SENSORS
We will now discuss the sensors required to build the environmental-sensing IoT
gateway device for weather monitoring. The wind vane has eight magnetic reed switches
arranged in a dial; each reed switch is connected to a different-sized resistor. As the vane
changes direction, a magnet moves, making and breaking the switches, and therefore
changing the resistance of the circuit. The magnet is sized such that when it is halfway
between two reed switches both will be connected. This will give a total of 16 distinct circuit
The three-cup hemispherical anemometer is used to calculate the wind speed. As the
anemometer cups rotate, a single magnet on the device closes a reed switch, completing the
circuit. This particular device has two reed switches, resulting in the circuit closing twice per
revolution. Connecting one end
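The sensor behaviour described in this section can be turned into numbers with a little arithmetic. The sketch below is a rough illustration under stated assumptions: the 16 wind-vane positions are taken to be evenly spaced, and the distance of air travel per anemometer revolution is a hypothetical calibration factor that would come from the sensor's datasheet.

```python
PULSES_PER_REVOLUTION = 2   # two reed-switch closures per revolution (from the text)
VANE_POSITIONS = 16         # distinct wind-vane states (from the text)


def rotations_per_second(pulse_count: int, interval_s: float) -> float:
    """Anemometer rotation rate from pulses counted over an interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return pulse_count / PULSES_PER_REVOLUTION / interval_s


def wind_speed(pulse_count: int, interval_s: float,
               metres_per_revolution: float) -> float:
    """Wind speed in m/s; metres_per_revolution is a hypothetical calibration."""
    return rotations_per_second(pulse_count, interval_s) * metres_per_revolution


def vane_angle_deg(position_index: int) -> float:
    """Direction in degrees, assuming evenly spaced positions from 0 degrees."""
    return (position_index % VANE_POSITIONS) * 360.0 / VANE_POSITIONS


print("rotations/s:", rotations_per_second(20, 5.0))
print("wind speed (m/s):", wind_speed(20, 5.0, metres_per_revolution=1.5))
print("vane position 4:", vane_angle_deg(4), "degrees")
```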
THE GATEWAY
Next, we consider the IoT gateway device, which can be split into the hardware and
the software. Selecting the hardware to sit between sensors and an Internet connection
requires careful thought. In its simplest form, the gateway reads data from the sensor’s
electronic interface and transmits it to a destination across the Internet. The specific IoT
application area will drive the hardware selection.
In this example IoT device, the following features are identified as requirements, and are used
to select the gateway hardware:
• Electronic interface
• Data persistence. Durable storage is required with sufficient capacity to collect data at the
maximum rate at which it is generated.
• Wired or wireless Internet stack:
• Data encryption:
• Data-processing capability: The more sensors that get added, the more data will be
produced. The gateway needs sufficient processing power and bandwidth to cope with peak
demands.
• Programmability: The hardware needs to be simple to program, preferably in a high-level
language or by using nonhardware-specific libraries.
• Low cost and commercially available: The hardware needs to be available, but cost also
needs to be considered.
GATEWAY HARDWARE
As new hardware is constantly appearing, we divide the available hardware into
groups and select the preferable hardware for the gateway, based on the previously outlined
requirements. It is worth revisiting the hardware market frequently to identify better-suited
hardware. A microprocessor usually only offers processing power and requires RAM and
storage to be added as separate chips on a PCB. This results in a lot of additional work and
power, whereas a microcontroller has most of the basics embedded on a single chip.
Microcontrollers tend to have a low clock speed and are suitable for real-time applications.
More powerful
microcontrollers exist, and are often referred to as a System on a Chip (SoC), although the
distinction between them is blurred. The term microcontroller often refers to low-memory,
low-performance, single-chip devices, whereas a SoC usually supports an operating system
such as Linux or Windows, although this is not a definition. The development board, often
called an evaluation board, will expose all the electronic interfaces, and provide a way to
program and power the chosen chip.
GATEWAY SOFTWARE
The more powerful a microcontroller or SoC, the more complicated the software
becomes; for example, there will be more interrupts and electronic interfaces. To manage the
processing power efficiently, supporting libraries are required. The Arduino platform offers a
great many libraries for all sorts of hardware, but ultimately its limited CPU and processing
capabilities, combined with the lack of a full TCP/IP stack, make it unsuitable for a gateway.
The .NetMF platform is open source and supports high-level programming languages such as
C# and Visual Basic, and provides libraries for SSL, but it only runs on a limited set of
hardware. Overall, .NetMF would be a good contender for the weather-station gateway.
The Mbed platform is excellent for IoT devices: it supports SSL, and there is a good selection
of supported hardware. Programming is done in C/C++, which may be difficult for beginners.
The Microsoft Windows 10 IoT platform is a full OS that is targeted toward IoT
devices. It runs the Windows 10 kernel and supports high-level programming languages such
as C#. There are libraries to support most electronic interfaces, and it is designed to
interoperate with cloud backend services. The supported hardware is currently very limited,
making it unsuitable for the weather-station gateway.
There are many variants of Linux available for embedded systems. They are generally
well supported, and are capable of interfacing with a wide range of hardware. Many
programming languages, environments, and libraries are available, and software programs are
easily ported to new hardware.
Hardware capable of running a full Linux operating system is more expensive than a
simple microcontroller; for example, the memory requirements mean that off-chip memory is
required. The processor needs to be more powerful, and the total electrical power
consumption of the IoT device will be considerably greater than that of a custom
microcontroller device. Platforms such as Mbed may be better suited to this. There are many
variants of Linux that run on ARM-based single-board computers, such as Android,
Slackware, Gentoo, openSUSE, Fedora, and Arch Linux, to name a few. Although they are
all similar, selecting a variant is down to personal preference and compatibility with the
chosen hardware.
DATA TRANSMISSION
The sensor data needs to be transmitted to the backend services via the Internet. To do
this we have an IoT gateway with either wired or wireless Internet access. The data is
packaged for transmission and received by an online service. Traditionally the online service
would use Remote Procedure Calls (RPC) or Simple Object Access Protocol (SOAP), but
these are rather heavyweight and have been superseded by protocols such as Representational
State Transfer (REST) and frameworks such as Apache Thrift. SOAP transmits messages
over different application protocols such as HTTP. It is designed to allow interoperability
between different services by exchanging messages, but is considered rather verbose. The
default message transmission is XML but this can be substituted by binary encoding to
reduce the message size. REST uses HTTP to transmit messages at the application layer. This
way, any device which can communicate via HTTP can interact with a REST backend
service. HTTP is well understood, and widely supported in many languages on many
different hardware platforms, providing a greater level of flexibility for IoT devices. It is
lightweight and supports different message payloads, for example, JSON and MIME.
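To make the REST option concrete, the following sketch builds an HTTP POST request carrying a JSON payload using only the Python standard library. The URL, the `/readings` path, and the field names are hypothetical; the request is constructed but not sent, so the example runs without a backend.

```python
import json
import urllib.request


def build_rest_request(base_url: str, reading: dict) -> urllib.request.Request:
    """Package a sensor reading as a JSON body for an HTTP POST."""
    body = json.dumps(reading).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/readings",          # hypothetical resource path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical backend address and payload, for illustration only.
req = build_rest_request("http://backend.example/api",
                         {"sensor": "SHT15", "temp_c": 21.4, "humidity_pct": 48.0})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# Sending would be a single call, e.g.:
# urllib.request.urlopen(req, timeout=10)
```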
Neither SOAP nor REST was created with IoT devices in mind. By looking at these
and other technologies, we can list the top IoT data-transmission requirements:
• Efficient data-transmission packet size: Remote IoT devices using either mobile phone or
satellite Internet connections need to preserve bandwidth and data-transmission costs.
• Reliable transmission: Messages need to be resent or batched, for example, where Internet
connectivity is intermittent.
• IoT message persistence: Unsent messages need to be stored on the IoT device to survive
crashes and power outages.
• Supported by a wide range of programming languages: The message needs to be easy to
build, using a wide range of languages, and hardware suitable for IoT devices.
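The first requirement above, an efficient packet size, can be illustrated by encoding the same reading twice: once as JSON text and once as a fixed binary layout. The field names and the binary layout are assumptions chosen for this sketch, not a format used by the weather station.

```python
import json
import struct

# Hypothetical reading: station number, temperature, relative humidity.
reading = {"station": 7, "temp_c": 21.4, "humidity_pct": 48.0}

# Text encoding: self-describing but verbose.
json_bytes = json.dumps(reading).encode("utf-8")

# Binary encoding: little-endian unsigned 16-bit integer plus two 32-bit floats.
BINARY_LAYOUT = "<Hff"
binary_bytes = struct.pack(BINARY_LAYOUT, reading["station"],
                           reading["temp_c"], reading["humidity_pct"])

print("JSON size (bytes):", len(json_bytes))
print("binary size (bytes):", len(binary_bytes))

# The receiver must know the layout in advance to decode the message.
station, temp_c, humidity_pct = struct.unpack(BINARY_LAYOUT, binary_bytes)
print("decoded:", station, round(temp_c, 1), round(humidity_pct, 1))
```

The binary form is smaller but only works if both ends agree on the layout, which is one reason a messaging framework with a defined wire format is attractive.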
Based on these requirements, it is clear that we need more than just a messaging
protocol: we need a message framework. Message Queue Telemetry Transport (MQTT),
Extensible Messaging and Presence Protocol (XMPP), Data Distribution Service (DDS), and
Advanced Message Queuing Protocol (AMQP) are good examples.
ADVANCED MESSAGE QUEUING PROTOCOL
We have chosen to use the Advanced Message Queuing Protocol (AMQP) as our
messaging protocol, as it is supported by many different server implementations, and the
client is available across platforms and programming languages. The main advantage is that
AMQP is a wire-level protocol, not requiring a verbose HTTP packet. This means that data is
sent more efficiently in smaller, specifically encoded packets.
Our IoT device sends AMQP messages to a server, which then passes them on to the
readers. There are many AMQP-supported architectures, ranging from publish/subscribe to
content-based routing, and they are worth investigating for more complex scenarios.
We could run any AMQP-compatible server, such as Apache Qpid, RabbitMQ, or
Apache ActiveMQ, but we have chosen to use the Windows Server AMQP-compatible
service, called Windows Service Bus v1.0, which can run on any Windows server on-site
installation. We selected the Windows Server Service Bus because it is identical to the cloud-
based.
The weather-station application code is implemented in Python, and so we use the
Apache Qpid library to post messages to the Service Bus. Currently, all sensor data is stored
on the IoT device in an SQL database, which acts as a buffer before the data is transmitted. It
should be possible to use an AMQP broker to automatically persist messages until
transmission is possible; however, this proved to be problematic and made debugging
difficult.
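The buffering approach described above, storing readings locally and deleting them only after they have been sent, can be sketched with SQLite from the Python standard library. The table name and the `send` callable are hypothetical stand-ins; in the weather station the sender would be the AMQP client, which is not shown here.

```python
import json
import sqlite3


def open_buffer(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local outbox used as a transmission buffer."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def enqueue(conn: sqlite3.Connection, reading: dict) -> None:
    """Persist a reading before any attempt is made to transmit it."""
    conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(reading),))
    conn.commit()


def flush(conn: sqlite3.Connection, send) -> int:
    """Send buffered readings oldest first; stop at the first network failure."""
    sent = 0
    rows = conn.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
    for row_id, payload in rows:
        try:
            send(payload)  # hypothetical transport call
        except OSError:
            break          # keep the remaining rows for the next attempt
        conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
        conn.commit()
        sent += 1
    return sent


def pending(conn: sqlite3.Connection) -> int:
    """Number of readings still waiting to be transmitted."""
    return conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

Using a file path instead of `":memory:"` lets unsent readings survive crashes and power outages, which matches the message-persistence requirement listed earlier.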
BACKEND PROCESSING
Sending data to a backend service is not the end of the IoT data story: some sources
predict that the number of IoT devices will exceed 26 billion by the year 2020, excluding
PCs, smartphones, and tablets. This means that any IoT backend service will need to be able to
scale and maintain a high level of availability, just to receive and store the data produced by
IoT devices. This can cause issues. Let’s assume you run a custom backend server to receive
the IoT data streams. During application or operating-system updates the service may be
offline, wasting valuable IoT data bandwidth. It is possible to run a failover service on
enterprise-grade software and hardware to improve reliability, but this increases the
complexity of the setup. We have made two key decisions for the backend services:
(1) the messages will be orchestrated using a service bus supporting the Advanced Message
Queuing Protocol (AMQP), and
(2) the processing will be orchestrated using the Hadoop platform and supporting
infrastructure.
The objective of the backend services varies according to the application; generally,
there will be both compute and storage components. Fig. 15.4 shows the backend
requirements for just the SHT15 sensor (temperature and humidity). The data is received
from the service bus, which is written to by all IoT devices. The data is cleaned, to ensure
basic validation, and to separate data from multiple sensors into a format suitable for storage.
The data is then forked into two streams:
• One stream ensures that near-real-time critical data is put on a second service bus for
consumption. In our example, the temperature and humidity data are formatted on a service
bus ready for consumption by a Connect the Dots service, which displays near-real-time data
graphs on a web page.
• Another stream persists the data to storage, ensuring it is available for postprocessing. The
postprocessing is split into two categories: data analytics/machine learning and application
logic. The data analytics and machine learning are used to explore the datasets and locate
anomalies, for example, to detect sensor failure. The application logic in this example
processes and filters the data to calculate the daily/weekly/monthly minimum and maximum
temperature values. The data is also formatted and pushed to external services, such as
Twitter and the Weather Observations Website (WOW).
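The cleaning and application-logic steps described above can be sketched in a few lines. This is a plain Python illustration rather than the Hadoop-based pipeline the notes describe; the field names and the plausibility limits used for validation are assumptions made for the example.

```python
from datetime import datetime


def is_valid(reading: dict) -> bool:
    """Basic validation with hypothetical plausibility limits."""
    return (-40.0 <= reading["temp_c"] <= 125.0
            and 0.0 <= reading["humidity_pct"] <= 100.0)


def daily_min_max(readings: list) -> dict:
    """Minimum and maximum temperature per calendar day, skipping bad readings."""
    result = {}
    for reading in readings:
        if not is_valid(reading):
            continue
        day = datetime.fromisoformat(reading["timestamp"]).date()
        temp = reading["temp_c"]
        low, high = result.get(day, (temp, temp))
        result[day] = (min(low, temp), max(high, temp))
    return result


readings = [
    {"timestamp": "2024-01-01T06:00:00", "temp_c": 3.0, "humidity_pct": 80.0},
    {"timestamp": "2024-01-01T14:00:00", "temp_c": 7.5, "humidity_pct": 65.0},
    {"timestamp": "2024-01-01T15:00:00", "temp_c": 999.0, "humidity_pct": 65.0},
    {"timestamp": "2024-01-02T06:00:00", "temp_c": 1.0, "humidity_pct": 85.0},
]
for day, (low, high) in sorted(daily_min_max(readings).items()):
    print(day, "min", low, "max", high)
```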
Data Processing Framework
Managing the compute and storage requirements of an IoT infrastructure will get
more complex as the number of devices, users, and supported services increases. This can be
made easier by using an existing framework; we have selected Apache Hadoop. Apache
Hadoop is a versatile software framework which can utilize a cluster of computers for
distributed storage and distributed compute. There is a strong ecosystem of supporting
modules for data analysis, but more importantly, it is supported by multiple cloud providers.
All our IoT weather-station data is deposited on a service bus and consumed by a single
Hadoop instance. All the data is stored in Hadoop on HDFS, and the application code is
orchestrated using Apache Storm, a near-real-time data-processing engine. There are a huge
number of Hadoop-related projects that assist with processing and storing data.
CHARACTERISTICS AND CHALLENGES
CHARACTERISTICS OF IoV
Vehicular networks are mainly composed of vehicle nodes, which behave quite
differently from other wireless nodes. Therefore, a vehicular network has several
characteristics that may affect the design of IoV technologies. Some of the characteristics will
bring challenges to IoV technological development, whereas some others may bring benefit.
1. Highly dynamic topology: Compared to common mobile nodes, vehicles may move at
quite a high speed. This causes the topology of a vehicular network to change frequently.
Such high dynamicity in network topology must be carefully considered in IoV development.
2. Variable network density: The network density in IoV varies, depending on the traffic
density, which can be very high in the case of a traffic jam, or very low, as in suburban
traffic. At either extreme the network may frequently disconnect.
3. Large-scale network: The network scale could be large in dense, urban areas, such as city
centers, highways, and at entrances to big cities.
4. Geographical communication: Compared to other networks that use unicast or multicast,
where the communication endpoints are defined by ID or group ID, vehicular networks
often have a new type of communication, which addresses the geographical areas where
packets need to be forwarded (e.g., in safe-driving applications).
5. Predictable mobility: Vehicular networks differ from other types of mobile ad-hoc
networks in which nodes move in a random way. Vehicles, on the other hand, are constrained
by road topology and layout, by the requirement to obey road signs and traffic lights, and by
responding to other moving vehicles, leading to predictability in terms of their mobility.
6. Sufficient energy and storage: A common characteristic of nodes in vehicular networks is
that nodes have ample energy and computing power (including both storage and processing),
since nodes are cars instead of small handheld devices.
7. Various communication environments: Vehicular networks are usually operated in two
typical communication environments. In highway traffic scenarios, the environment is
relatively simple and straightforward (e.g., constrained one-dimensional movement), whereas
in city conditions it becomes much more complex. The streets in a city are often separated by
buildings, trees, and other obstacles; therefore, there is not always a direct line of
communication in the direction of intended data communication.
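The geographical communication in item 4 addresses packets to an area rather than to a node ID. One simple way to picture this is a circular target area: a vehicle accepts or forwards a packet only if its own position lies inside the circle. The sketch below uses the haversine great-circle distance; the circular area and the coordinates are assumptions for illustration, not a description of any particular IoV protocol.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_target_area(vehicle: tuple, centre: tuple, radius_m: float) -> bool:
    """True if the vehicle position falls inside the addressed circular area."""
    return haversine_m(vehicle[0], vehicle[1], centre[0], centre[1]) <= radius_m


# Hypothetical hazard location and a 200 m warning radius.
hazard = (0.0, 0.0)
print(in_target_area((0.001, 0.0), hazard, 200.0))  # roughly 111 m away
print(in_target_area((0.010, 0.0), hazard, 200.0))  # roughly 1.1 km away
```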
CHALLENGES IN IoV
The objective of IoV is to integrate multiple users, multiple vehicles, multiple things,
and multiple networks, to always provide the best-connected communication capability that is
manageable, controllable, operational, and credible. It composes a truly complex system.
Moreover, the applications of IoV are quite different from those of other networks, and,
consequently, many special requirements arise. Both of these two aspects bring new technical
challenges to IoV research and development.
1. Poor network connectivity and stability: High mobility and rapid changes of
topology lead to frequent network disconnections and link failures, so message loss
is common. Prolonging the lifetime of communication links is therefore a constant
challenge.
2. Hard delay constraints: Many IoV applications have hard delay constraints, although they
may not require a high data rate or bandwidth. For example, in an automatic highway system,
when a brake event happens, the message must be transferred and arrive within a certain
time to avoid a car crash. In this kind of application, a guaranteed minimal delay for each
message, rather than a low average delay, is crucial.
3. High reliability requirements: Transportation and driving-related applications are usually
safety sensitive, so they require high reliability. However, due
to complex network architecture, large network scale, and poor stability of network topology,
achieving high reliability in IoV is far from trivial. Special design is needed at
various layers, from networking protocols to applications.
4. High scalability requirements: High scalability is another big challenge in IoV. As
mentioned before, IoV is usually very large in terms of node number and deployment
territory. Such a large scale certainly requires high scalability in IoV technology.
5. Security and privacy: Keeping a reasonable balance between security and privacy is
one of the main challenges in IoV. The receipt of trustworthy information from its source is
important for the receiver. However, this trusted information can violate the privacy needs of
the sender.
6. Service sustainability: Assuring the sustainability of service provision in IoV is still a
challenging task, calling for highly intelligent methods, as well as a user-friendly
network-mechanism design. There are challenges in adjusting all vehicles to provide
sustainable services over heterogeneous networks in real time, as they are
subject to limited network bandwidth, mixed wireless access, lower service platforms, and a
complex city environment.
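The point made under hard delay constraints, that an acceptable average can hide individual late messages, is easy to show with numbers. The delay values and the 100 ms deadline below are invented for illustration only.

```python
def delay_report(delays_ms: list, deadline_ms: float) -> dict:
    """Summarize message delays against a per-message deadline."""
    if not delays_ms:
        raise ValueError("delays_ms must not be empty")
    return {
        "average_ms": sum(delays_ms) / len(delays_ms),
        "worst_ms": max(delays_ms),
        "deadline_misses": sum(1 for d in delays_ms if d > deadline_ms),
    }


# Hypothetical brake-warning delays: the average is below the deadline,
# yet one message still arrives far too late.
report = delay_report([10, 12, 11, 250, 9], deadline_ms=100)
print(report)
```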
Python
Python is a general-purpose, high-level programming language, suitable for
providing a solid foundation to the reader in the area of cloud computing. The main
characteristics of Python are:
1) Multi-paradigm programming language: Python supports more than one programming
paradigm, including object-oriented programming and structured programming.
2) Interpreted language: Python is an interpreted language and does not require an explicit
compilation step. The Python interpreter executes the program source code directly,
statement by statement, as a processor or scripting engine does.
3) Interactive language: Python provides an interactive mode in which the user can submit
commands at the Python prompt and interact with the interpreter directly.
Datatypes
Every value in Python has a datatype. Since everything is an object in Python
programming, data types are actually classes and variables are instances (objects) of these
classes. There are various data types in Python. Some of the important types are listed below.
Python doesn’t have variables as such, but instead has object references. When it comes to
immutable objects like ints and strs, there is no discernible difference between a variable
and an object reference.
Identifiers and Keywords
In Python, what really happens is that we bind an object reference to refer to the object in
memory that holds the data. The names we give to our object references are called identifiers
or just plain names. When we create a data item, we can either assign it to a variable or
insert it into a collection.
A valid Python identifier is a nonempty sequence of characters of any length that consists of a
“start character” and zero or more “continuation characters”. Such an identifier must obey a
couple of rules and ought to follow certain conventions. The first rule concerns the start and
continuation characters. The start character can be anything that Unicode considers to be a
letter, including the ASCII letters, the underscore (“_”), as well as the letters from most non-
English languages. Each continuation character can be any character that is permitted as a
start character.
The second rule is that no identifier can have the same name as one of Python’s keywords, so
we cannot use any of the names shown in Table 2.1
and continue except global lambda pass while
When an invalid identifier is used, it causes a SyntaxError exception to be raised. In each
case the part of the error message that appears in parentheses varies, so we have replaced it
with an ellipsis.
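The two identifier rules above can be checked programmatically. The following is a minimal sketch using the standard str.isidentifier() method and the keyword module; the helper name is_valid_name and the sample names are illustrative assumptions, not part of the original notes.

```python
import keyword

def is_valid_name(name):
    """Return True if name is a legal identifier and not a keyword."""
    # Rule 1: start and continuation characters; rule 2: not a keyword.
    return name.isidentifier() and not keyword.iskeyword(name)

# Hypothetical sample names chosen to exercise both rules.
candidates = ["total", "_count", "taux_d_intérêt", "2nd_value", "lambda", "while"]
results = {name: is_valid_name(name) for name in candidates}

for name, ok in results.items():
    print(name, "->", "valid" if ok else "invalid")
```

The full list of keywords for the running interpreter is available as keyword.kwlist.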
1. Python Numbers
Integers, floating-point numbers, and complex numbers fall under the Python numbers
category. They are defined as the int, float, and complex classes in Python. We can use the type()
function to find out which class a variable or a value belongs to, and the isinstance() function to
check whether an object belongs to a particular class.
Script.py
a = 5
print(a, "is of type", type(a))
a = 2.0
print(a, "is of type", type(a))
a = 1+2j
print(a, "is complex number?", isinstance(1+2j, complex))
Integers can be of any length; they are limited only by the memory available. A floating-point
number is accurate to about 15 significant decimal digits. Integers and floating-point numbers
are distinguished by the presence of a decimal point: 1 is an integer, 1.0 is a floating-point
number. Complex numbers are written in the form x + yj, where x is the real part and y is the
imaginary part. Here are some examples.
>>> a = 1234567890123456789
>>> a
1234567890123456789
>>> b = 0.1234567890123456789
>>> b
0.12345678901234568
>>> c = 1+2j
>>> c
(1+2j)
Booleans
There are two built-in Boolean objects: True and False. Like all other Python data types, the
bool data type can be called as a function—with no arguments it returns False, with a bool
argument it returns a copy of the argument, and with any other argument it attempts to
convert the given object to a bool. All the built-in and standard library data types can be
converted to produce a Boolean value, and it is easy to provide Boolean conversions for
custom data types.
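As a small illustration of these conversions, the sketch below calls bool() on several built-in values and on a custom class. The Basket class is a hypothetical example; it relies on the standard __bool__() special method to define its truth value.

```python
# bool() with no argument, and with a few built-in values.
builtin_results = {
    "bool()": bool(),
    "bool(0)": bool(0),
    "bool('')": bool(""),
    "bool([0])": bool([0]),      # a non-empty list is True, whatever it holds
    "bool(None)": bool(None),
}

class Basket:
    """Hypothetical custom type that is True only when it holds items."""
    def __init__(self, items=()):
        self.items = list(items)

    def __bool__(self):
        return len(self.items) > 0

empty_basket = Basket()
full_basket = Basket(["apple"])

for expression, value in builtin_results.items():
    print(expression, "=", value)
print("empty basket:", bool(empty_basket), "full basket:", bool(full_basket))
```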
Floating-Point Types
Python provides three kinds of floating-point values: the built-in float and complex types, and
the decimal.Decimal type from the standard library. All three are immutable. Type float holds
double-precision floating-point numbers whose range depends on the C compiler Python was
built with.
Floating-Point Numbers
All the numeric operators and functions can be used with floats. The float data type
can be called as a function—with no arguments it returns 0.0, with a float argument it returns
a copy of the argument, and with any other argument it attempts to convert the given object to
a float. When used for conversions a string argument can be given, either using simple
decimal notation or using exponential notation.
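The calls described above can be tried directly. This is a short sketch with arbitrarily chosen sample values; it also shows that a string that cannot be converted raises ValueError.

```python
# float() with no argument, a number, and strings in both notations.
values = [
    float(),            # no argument
    float(3),           # integer argument
    float("2.5"),       # simple decimal notation
    float("1e3"),       # exponential notation
    float(" -4.7E-2 "), # surrounding whitespace is ignored
]
print(values)

# A string that does not describe a number cannot be converted.
try:
    float("twelve")
    conversion_failed = False
except ValueError:
    conversion_failed = True
print("conversion failed:", conversion_failed)
```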
Complex Numbers
The complex data type is an immutable type that holds a pair of floats, one representing the
real part and the other the imaginary part of a complex number. Literal complex numbers are
written with the real and imaginary parts joined by a + or - sign, and with the imaginary part
followed by a j. Here are some examples: 3.5+2j, 0.5j, 4+0j, -1-3.7j.
Decimal Numbers
In many applications the numerical inaccuracies that can occur when using floats
don’t matter, and in any case are far outweighed by the speed of calculation that floats offer.
When exact decimal arithmetic is needed, decimal numbers can be created using the
decimal.Decimal() function. This function can take an integer or a string argument. Since
Python 3.2 it also accepts a float, but because floats are held inexactly the resulting decimal
carries the float's binary approximation, so a string is preferred when an exact value is
intended. If a string is used, it can use simple decimal notation or exponential notation.
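The difference between inexact floats and exact decimals can be seen in a short sketch. The values 0.1, 0.2, and 1.5e3 are arbitrary examples chosen for illustration.

```python
from decimal import Decimal

# Floats are held in binary, so some decimal fractions are inexact.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)

# Decimals created from strings represent the written value exactly.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))

# Strings may use simple decimal notation or exponential notation.
from_exponent = Decimal("1.5e3")
from_integer = Decimal(1500)
print(from_exponent == from_integer)
```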
2. Python List
A list is an ordered sequence of items. It is one of the most used datatypes in Python and is
very flexible. The items in a list do not all need to be of the same type. Declaring a list is
straightforward: items separated by commas are enclosed within brackets [].
>>> a = [1, 2.2, 'python']
We can use the indexing and slicing operator [ ] to extract an item or a range of items from a
list. Indexing starts from 0 in Python.
Script.py
a = [5,10,15,20,25,30,35,40]
# a[2] = 15
print("a[2] = ", a[2])
# a[0:3] = [5, 10, 15]
print("a[0:3] = ", a[0:3])
# a[5:] = [30, 35, 40]
print("a[5:] = ", a[5:])
Lists are mutable, meaning that the values of a list's elements can be altered.
>>> a = [1,2,3]
>>> a[2]=4
>>> a
[1, 2, 4]
3. Python Tuple
A tuple is an ordered sequence of items, the same as a list. The only difference is that tuples
are immutable: once created, a tuple cannot be modified. Tuples are used to write-protect data
and are usually faster than lists because they cannot change dynamically. A tuple is defined
within parentheses (), with items separated by commas.
>>> t = (5,'program', 1+3j)
Script.py
t = (5,'program', 1+3j)
# t[1] = 'program'
print("t[1] = ", t[1])
# t[0:3] = (5, 'program', (1+3j))
print("t[0:3] = ", t[0:3])
# Generates error (TypeError)
# Tuples are immutable
t[0] = 10
4. Python Strings
Strings are represented by the immutable str data type, which holds a sequence of
Unicode characters. The str data type can be called as a function to create string objects:
with no arguments it returns an empty string, with a nonstring argument it returns the string
form of the argument, and with a string argument it returns a copy of the string. The str()
function can also be used as a conversion function, in which case the first argument should be
a string or something convertible to a string, with up to two optional string arguments being
passed, one specifying the encoding to use and the other specifying how to handle encoding
errors.
String literals are created using quotes, and we are free to use single or double quotes
provided we use the same at both ends. In addition, we can use a triple quoted string; this is
Python-speak for a string that begins and ends with three quote characters.
>>> s = "This is a string"
>>> s = '''a multiline
... string'''
Python uses newline as its statement terminator, except inside parentheses (()), square
brackets ([]), braces ({}), or triple quoted strings. Newlines can be used without formality in
triple quoted strings, and we can include newlines in any string literal using the \n escape
sequence.
Comparing Strings
Strings support the usual comparison operators <, <=, ==, !=, >, and >=. These operators
compare strings byte by byte in memory. Two problems arise when performing comparisons, such
as when sorting lists of strings. The first problem is that some Unicode characters
can be represented by two or more different byte sequences; this can be solved by normalizing
the strings before comparing them. The second problem is that the
sorting of some characters is language-specific. Lower- or uppercasing all the strings
compared produces a more natural English language ordering. Normalizing is unlikely to be
needed unless the strings are from external sources like files or network sockets.
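Both problems can be reproduced in a few lines. This is a minimal sketch using the standard unicodedata.normalize() function with the NFC form; the sample strings are illustrative assumptions.

```python
import unicodedata

# The same visible text, stored as two different character sequences.
composed = "caf\u00e9"       # e-acute as a single code point
decomposed = "cafe\u0301"    # plain e followed by a combining acute accent

equal_raw = composed == decomposed
equal_normalized = (unicodedata.normalize("NFC", composed)
                    == unicodedata.normalize("NFC", decomposed))
print(equal_raw, equal_normalized)

# Uppercase letters sort before lowercase ones unless case is folded first.
words = ["banana", "Cherry", "apple"]
plain_order = sorted(words)
natural_order = sorted(words, key=str.lower)
print(plain_order)
print(natural_order)
```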
Like lists and tuples, strings can be used with the slicing operator [ ]. Strings are immutable.
5. Python Set
A set is an unordered collection of unique items. It is defined by values separated by commas
inside braces { }.
Script.py
a = {5,2,3,1,4}
# printing set variable
print("a = ", a)
# data type of variable a
print(type(a))
We can perform set operations like union and intersection on two sets. Sets have unique
values; they eliminate duplicates. Since a set is an unordered collection, indexing has no
meaning. Hence the slicing operator [] does not work.
6. Python Dictionary
A dictionary is a collection of key-value pairs. It is generally used when we have a
huge amount of data. Dictionaries are optimized for retrieving data. We must know the key to
retrieve the value. In Python, dictionaries are defined within braces {} with each item being a
pair in the form key: value. Values can be of any type; keys can be of any hashable
(immutable) type.
>>> d = {1:'value','key':2}
>>> type(d)
<class 'dict'>
We use a key to retrieve the respective value, but not the other way around.
Script.py
d = {1:'value','key':2}
print(type(d))
print("d[1] = ", d[1])
print("d['key'] = ", d['key'])
# Generates error (KeyError)
print("d[2] = ", d[2])
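The set operations and dictionary lookups described above can be sketched as follows. The variable names and sample values are illustrative assumptions; dict.get() and the in operator are standard ways to avoid the KeyError shown in the script.

```python
# Set operations: union, intersection, and duplicate elimination.
evens = {2, 4, 6, 8}
small = {1, 2, 3, 4}
union = evens | small
common = evens & small
unique = set([1, 1, 2, 2, 3])
print(union, common, unique)

# Dictionary lookups that do not raise KeyError for a missing key.
d = {1: 'value', 'key': 2}
missing = d.get(2, 'no such key')   # returns the default instead of raising
has_key = 'key' in d                # membership tests look at keys, not values
print(missing, has_key)
```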
For developing GUI programs that must run on any or all Python desktop platforms (e.g.,
Windows, Mac OS X, and Linux), using only a standard Python installation with no
additional libraries, there is just one choice: Tk, which is available through the standard
library's tkinter module.
When a GUI program is run it normally begins by creating its main window and all of
the main window’s widgets, such as the menu bar, toolbars, the central area, and the status
bar. Once the window has been created, like a server program, the GUI program simply
waits. Whereas a server waits for client programs to connect to it, a GUI program waits for
user interaction such as mouse clicks and key presses. This is illustrated in contrast to console
programs in Figure 15.1. The GUI program does not wait passively; it runs an event loop,
which in pseudocode looks like this:
while True:
    event = getNextEvent()
    if event:
        if event == Terminate:
            break
        processEvent(event)
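The pseudocode above can be turned into a runnable simulation without any GUI library. This is only a sketch: the TERMINATE sentinel, the run_event_loop() function, and the use of a deque as the event source are assumptions made for illustration, and a real toolkit such as Tk would block waiting for the next event instead of stopping when the queue is empty.

```python
from collections import deque

TERMINATE = "terminate"   # hypothetical sentinel standing in for Terminate

def run_event_loop(events):
    """Process queued events in order until TERMINATE is seen."""
    queue = deque(events)
    handled = []
    while True:
        if not queue:
            break                       # a real GUI would wait here instead
        event = queue.popleft()         # stands in for getNextEvent()
        if event == TERMINATE:
            break
        handled.append("handled " + event)   # stands in for processEvent()
    return handled

log = run_event_loop(["click", "keypress", TERMINATE, "click"])
print(log)
```

Events queued after the terminate event are never processed, which mirrors the break in the pseudocode.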
Dialog-Style Programs
The first program we will look at is the Interest program. This is a dialog-style
program (i.e., it has no menus), which the user can use to perform compound interest
calculations. The program is shown in Figure. In most object-oriented GUI programs, a
custom class is used to represent a single main window or dialog, with most of the widgets it
contains being instances of standard widgets, such as buttons or checkboxes, supplied by the
library. Like most cross-platform GUI libraries, Tk doesn’t really make a distinction between
a window and a widget—a window is simply a widget that has no widget parent (i.e., it is not
contained inside another widget). Widgets that don’t have a widget parent (windows) are
automatically supplied with a frame and window decorations (such as a title bar and close
button), and they usually contain other widgets. Most widgets are created as children of
another widget (and are contained inside their parent), whereas windows are created as
children of the tkinter.Tk object, an object that conceptually represents the application, and
something we will return to later on. In addition to distinguishing between widgets and
windows (also called top-level widgets), the parent–child relationships help ensure that
widgets are deleted in the right order and that child widgets are automatically deleted when
their parent is deleted.
Control Flow Statements
We mentioned earlier that each statement encountered in a .py file is executed in turn,
starting with the first one and progressing line by line. The flow of control can be diverted by
a function or method call or by a control structure, such as a conditional branch or a loop
statement. Control is also diverted when an exception is raised.
A Boolean expression is anything that can be evaluated to produce a Boolean value
(True or False). In Python, such an expression evaluates to False if it is the predefined
constant False, the special object None, an empty sequence or collection or a numeric data
item of value 0; anything else is considered to be True. When we create our own custom data
types, we can decide for ourselves what they should return in a Boolean context. In Python-speak
a block of code, that is, a sequence of one or more statements, is called a suite. Because
some of Python’s syntax requires that a suite be present, Python provides the keyword pass
which is a statement that does nothing and that can be used where a suite is required but
where no processing is necessary.
Control Structures
Python provides conditional branching with if statements and looping with while and for … in
statements. Python also has a conditional expression; this is a kind of if statement that is
Python’s answer to the ternary operator (?:) used in C-style languages.
Conditional Branching
This is the general syntax for Python’s conditional branch statement:
if boolean_expression1:
    suite1
elif boolean_expression2:
    suite2
...
elif boolean_expressionN:
    suiteN
else:
    else_suite
There can be zero or more elif clauses, and the final else clause is optional. If we want to
account for a particular case, but want to do nothing if it occurs, we can use pass as that
branch’s suite. In some cases, we can reduce an if … else statement down to a single conditional
expression. The syntax for a conditional expression is:
expression1 if boolean_expression else expression2
If the boolean_expression evaluates to True, the result of the conditional expression is
expression1; otherwise, the result is expression2. One common programming pattern is to set
a variable to a default value, and then change the value if necessary, for example, due to a
request by the user, or to account for the platform on which the program is being run. Here is
the pattern using a conventional if statement:
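A minimal sketch of this default-then-override pattern follows, together with the equivalent conditional expression. The names offset, margin, and width and the values used are illustrative assumptions rather than a listing from these notes.

```python
import sys

# Conventional if statement: set a default, then change it if necessary.
offset = 20
if not sys.platform.startswith("win"):
    offset = 10

# The same logic written as a single conditional expression.
offset_expr = 20 if sys.platform.startswith("win") else 10

# Parentheses keep the conditional expression separate from the arithmetic.
margin = False
width = 100 + (10 if margin else 0)       # the 100 is always included
unintended = 100 + 10 if margin else 0    # parsed as (100 + 10) if margin else 0

print(offset, offset_expr, width, unintended)
```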
The parentheses also make things clearer for human readers. Conditional expressions can be
used to improve messages printed for users.
For example, when reporting the number of files processed, instead of printing “0 file(s)”, “1
file(s)”, and similar, we could use a couple of conditional expressions:
print("{0} file{1}".format((count if count != 0 else "no"),
("s" if count != 1 else "")))
This will print “no files”, “1 file”, “2 files”, and similar, which gives a much more
professional impression.
Looping
Python provides a while loop and a for … in loop, both of which have a fuller syntax than
their basic forms. This is the complete syntax of the while loop:
while boolean_expression:
    while_suite
else:
    else_suite
The else clause is optional. As long as the boolean_expression is True, the while
block’s suite is executed. If the boolean_expression is or becomes False, the loop terminates,
and if the optional else clause is present, its suite is executed. Inside the while block’s suite, if
a continue statement is executed, control is immediately returned to the top of the loop, and
the boolean_expression is evaluated again. If the loop does not terminate normally (for
example, because of a break statement, a return statement, or an exception), any
optional else clause’s suite is skipped.
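The behaviour of the else clause is easiest to see in a search loop. The function below is a hypothetical example written for illustration: the else suite runs only when the loop condition becomes False, and is skipped when break ends the loop.

```python
def find_index(items, target):
    """Return the position of target in items, or -1 if it is absent."""
    index = 0
    while index < len(items):
        if items[index] == target:
            break            # abnormal termination: the else suite is skipped
        index += 1
    else:
        index = -1           # the condition became False: target was not found
    return index

print(find_index([5, 10, 15], 10))
print(find_index([5, 10, 15], 99))
```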
for Loops
Like a while loop, the full syntax of the for … in loop also includes an optional else clause:
for expression in iterable:
    for_suite
else:
    else_suite
The expression is normally either a single variable or a sequence of variables, usually in the
form of a tuple. If a tuple or list is used for the expression, each item is unpacked into the
expression’s items.
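As a sketch of unpacking in a for … in loop, the example below iterates over a list of 2-tuples. The sensor-style data and the variable names are assumptions made for illustration.

```python
# Each item is a (room, temperature) tuple, unpacked into two variables.
readings = [("kitchen", 21.5), ("garage", 12.0), ("attic", 30.25)]

hot_rooms = []
for room, temperature in readings:
    if temperature > 25:
        hot_rooms.append(room)

# The else suite runs because the loop finishes without executing break.
for room, temperature in readings:
    if temperature < 0:
        status = "freezing reading found"
        break
else:
    status = "no freezing readings"

print(hot_rooms, status)
```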
Exception Handling
Python indicates errors and exceptional conditions by raising exceptions, although some
third-party Python libraries use more old-fashioned techniques, such as “error” return values.
Catching and Raising Exceptions
Exceptions are caught using try … except blocks, whose general syntax is:
try:
    try_suite
except exception_group1 as variable1:
    except_suite1
…
except exception_groupN as variableN:
    except_suiteN
else:
    else_suite
finally:
    finally_suite
There must be at least one except block, but both the else and the finally blocks are
optional. The else block’s suite is executed when the try block’s suite has finished
normally—but it is not executed if an exception occurs. If there is a finally block, it is always
executed at the end.
Each except clause’s exception group can be a single exception or a parenthesized
tuple of exceptions. For each group, the as variable part is optional; if used, the variable
contains the exception that occurred, and can be accessed in the exception block’s suite.
If an exception occurs in the try block’s suite, each except clause is tried in turn. If
the exception matches an exception group, the corresponding suite is executed. To match an
exception group, the exception must be of the same type as one of the exception types listed
in the group, or a subclass of one of them.
For example, if a KeyError exception occurs in a dictionary lookup, the first except
clause that has an Exception class will match since KeyError is an (indirect) subclass of
Exception. If no group lists Exception (as is normally the case), but one did have a
LookupError, the KeyError will match, because KeyError is a subclass of LookupError. And
if no group lists Exception or LookupError, but one does list KeyError, then that group will
match. Figure 4.1 shows an extract from the exception hierarchy.
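The matching rules and the order in which the suites run can be observed with a small function. This is a sketch: lookup() and its events list are hypothetical names introduced to record which suites execute.

```python
def lookup(table, key):
    """Return (value, events), where events records the suites that ran."""
    events = []
    try:
        value = table[key]
    except LookupError as err:      # matches KeyError and IndexError subclasses
        events.append("caught " + type(err).__name__)
        value = None
    else:
        events.append("found")      # runs only if no exception occurred
    finally:
        events.append("done")       # always runs at the end
    return value, events

print(lookup({"a": 1}, "a"))
print(lookup({"a": 1}, "b"))
print(lookup([1, 2], 5))
```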
Custom Exceptions
Custom exceptions are custom data types (classes). Since it is easy to create simple custom
exception types, we will show the syntax here:
class exceptionName(baseException): pass
The base class should be Exception or a class that inherits from Exception. One use of
custom exceptions is to break out of deeply nested loops. For example, if we have a table
object that holds records (rows), which hold fields (columns), which have multiple values
(items), we could search for a particular value with code like this:
found = False
for row, record in enumerate(table):
    for column, field in enumerate(record):
        for index, item in enumerate(field):
            if item == target:
                found = True
                break
        if found:
            break
    if found:
        break
if found:
    print("found at ({0}, {1}, {2})".format(row, column, index))
else:
    print("not found")
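The same search can be written with a custom exception, which removes the found flag and the repeated break statements. This is a sketch under assumed names: FoundException, locate(), and the sample table are illustrative.

```python
class FoundException(Exception): pass

def locate(table, target):
    """Return (row, column, index) of target in a nested table, or None."""
    try:
        for row, record in enumerate(table):
            for column, field in enumerate(record):
                for index, item in enumerate(field):
                    if item == target:
                        raise FoundException()   # leaves all three loops at once
    except FoundException:
        return (row, column, index)
    else:
        return None                              # loops finished without a match

# Hypothetical table: rows hold fields, and fields hold multiple values.
table = [[[1, 2], [3]], [[4], [5, 6]]]
print(locate(table, 6))
print(locate(table, 7))
```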
Custom Functions
Functions are a means by which we can package up and parameterize functionality. Four
kinds of functions can be created in Python: global functions, local functions, lambda
functions, and methods.
Every function we have created so far has been a global function. Global objects
(including functions) are accessible to any code in the same module in which the object is
created. Global objects can also be accessed from other modules.
Local functions (also called nested functions) are functions that are defined inside other
functions. These functions are visible only to the function where they are defined; they are
especially useful for creating small helper functions that have no use elsewhere.
Lambda functions are expressions, so they can be created at their point of use; however, they
are much more limited than normal functions.
Methods are functions that are associated with a particular data type and can be used only in
conjunction with that data type; they are covered together with object-oriented programming.
Python provides many built-in functions, and the standard library and third-party libraries add
hundreds more (thousands if we count all the methods), so in many cases the function we
want has already been written.
The general syntax for creating a (global or local) function is:
def functionName(parameters):
    suite
The parameters are optional, and if there is more than one they are written as a
sequence of comma-separated identifiers, or as a sequence of identifier=value pairs as we will
discuss shortly. For example, here is a function that calculates the area of a triangle using
Heron’s formula:
import math

def heron(a, b, c):
    s = (a + b + c) / 2
    return math.sqrt(s * (s - a) * (s - b) * (s - c))
Inside the function, each parameter, a, b, and c, is initialized with the corresponding value
that was passed as an argument. When the function is called, we must supply all of the
arguments, for example, heron(3, 4, 5). If we give too few or too many arguments, a TypeError
exception will be raised. When we do a call like this we are said to be using positional
arguments, because each argument passed is set as the value of the parameter in the
corresponding position.
Names and Docstring
Using good names for a function and its parameters goes a long way toward making
the purpose and use of the function clear to other programmers—and to ourselves sometime
after we have created the function. Here are a few rules of thumb that you might like to
consider.
• Use a naming scheme, and use it consistently. In this book we use UPPERCASE for
constants, TitleCase for classes, and camelCase for GUI functions and methods.
• For all names, avoid abbreviations, unless they are both standardized and widely used.
• Be proportional with variable and parameter names: x is a perfectly good name for an
x-coordinate and i is fine for a loop counter, but in general the name should be long enough to
be descriptive. The name should describe the data’s meaning rather than its type (e.g.,
amount_due rather than money), unless the use is generic to a particular type.
• Functions and methods should have names that say what they do or what they return
(depending on their emphasis), but never how they do it, since that might change.
Argument and Parameter Unpacking
We saw in the previous chapter that we can use the sequence unpacking operator (*) to
supply positional arguments. For example, if we wanted to compute the area of a triangle and
had the lengths of the sides in a list, we could make the call like this, heron(sides[0], sides[1],
sides[2]), or simply unpack the list and do the much simpler call, heron(*sides). And if the
list (or other sequence) has more items than the function has parameters, we can use slicing to
extract exactly the right number of arguments.
Accessing Variables in the Global Scope
It is sometimes convenient to have a few global variables that are accessed by various functions in
the program. This is usually okay for “constants”, but is not a good practice for variables,
although for short one-off programs it isn’t always unreasonable.
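Argument unpacking and a module-level constant read by a function can both be sketched in a few lines. The PRECISION constant, the rounded_area() helper, and the sample lists are assumptions introduced for illustration; heron() is repeated so that the example is self-contained.

```python
import math

PRECISION = 2   # hypothetical global "constant": read by functions, never reassigned

def heron(a, b, c):
    s = (a + b + c) / 2
    return math.sqrt(s * (s - a) * (s - b) * (s - c))

def rounded_area(*sides):
    # Parameter unpacking collects the arguments; argument unpacking passes them on.
    return round(heron(*sides), PRECISION)

sides = [3, 4, 5]
area = heron(*sides)                       # same as heron(sides[0], sides[1], sides[2])

measurements = [3, 4, 5, "unused label"]   # more items than heron() has parameters
sliced_area = heron(*measurements[:3])     # slicing extracts exactly three arguments

try:
    heron(*sides[:2])                      # too few arguments
    too_few_raised = False
except TypeError:
    too_few_raised = True

print(area, sliced_area, rounded_area(3, 4, 5), too_few_raised)
```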
Lambda Functions
Lambda functions are functions created using the following syntax:
lambda parameters: expression
The parameters are optional, and if supplied they are normally just comma-separated variable
names, that is, positional arguments, although the complete argument syntax supported by def
statements can be used. The expression cannot contain branches or loops (although
conditional expressions are allowed), and cannot have a return (or yield)
statement. The result of a lambda expression is an anonymous function. When a lambda
function is called it returns the result of computing the expression as its result. If the
expression is a tuple it should be enclosed in parentheses.
Here is a simple lambda function for adding an s (or not) depending on whether
its argument is 1:
s = lambda x: "" if x == 1 else "s"
The lambda expression returns an anonymous function which we assign to the variable s.
Any (callable) variable can be called using parentheses, so given the count of files processed
in some operation we could output a message using the s() function like this:
print("{0} file{1} processed".format(count, s(count)))
Lambda functions are often used as the key function for the built-in sorted() function and for
the list.sort() method. Suppose we have a list of elements as 3-tuples of (group, number,
name), and we wanted to sort this list in various ways. Here is an example of such a list
elements = [(2, 12, "Mg"), (1, 11, "Na"), (1, 3, "Li"), (2, 4, "Be")]
If we sort this list, we get this result:
[(1, 3, 'Li'), (1, 11, 'Na'), (2, 4, 'Be'), (2, 12, 'Mg')]
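Lambda key functions let us sort the same list in other ways. The orderings chosen below (by name, by number in descending order, and by name ignoring case) are illustrative assumptions.

```python
elements = [(2, 12, "Mg"), (1, 11, "Na"), (1, 3, "Li"), (2, 4, "Be")]

# Default ordering compares the tuples item by item: group, then number, then name.
default_order = sorted(elements)

# Sort by name only.
by_name = sorted(elements, key=lambda e: e[2])

# Sort by number, largest first.
by_number_desc = sorted(elements, key=lambda e: e[1], reverse=True)

# Sort by case-insensitive name, then number; the tuple key is parenthesized.
by_name_then_number = sorted(elements, key=lambda e: (e[2].lower(), e[1]))

print(by_name)
print(by_number_desc)
```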
In Python every built-in and library class and every class we create is derived directly or
indirectly from the ultimate base class—object. Figure 6.1 illustrates some of the inheritance
terminology.
Python also supports duck typing: “if it walks like a duck and quacks like a duck, it is
a duck”. In other words, if we want to call certain methods on an object, it doesn’t matter
what class the object is, only that it has the methods we want to call. Inheritance is used to
model is-a relationships, that is, where a class’s objects are essentially the same as some other
class’s objects, but with some variations, such as extra data attributes and extra methods.
Another approach is to use aggregation (also called composition)—this is where a
class includes one or more instance variables that are of other classes. Aggregation is used to
model has-a relationships. In Python, every class uses inheritance—because all custom
classes have object as their ultimate base class, and most classes also use aggregation since
most classes have instance variables of various types. In earlier chapters we created custom
classes: custom exceptions. Here are two new syntaxes for creating custom classes:
class className:
    suite
class className(base_classes):
    suite
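Inheritance, aggregation, and duck typing can be shown together in one short sketch. All of the class names here (Engine, Vehicle, Truck, Drone) are hypothetical and chosen only for illustration.

```python
class Engine:
    def __init__(self, power_kw):
        self.power_kw = power_kw

class Vehicle:
    def __init__(self, name, engine):
        self.name = name
        self.engine = engine          # aggregation: a Vehicle has-a Engine

    def describe(self):
        return "{0} ({1} kW)".format(self.name, self.engine.power_kw)

class Truck(Vehicle):                 # inheritance: a Truck is-a Vehicle
    def __init__(self, name, engine, payload_t):
        super().__init__(name, engine)
        self.payload_t = payload_t

    def describe(self):
        return super().describe() + ", payload {0} t".format(self.payload_t)

class Drone:                          # unrelated class that also has describe()
    def describe(self):
        return "drone"

# Duck typing: only the presence of describe() matters, not the class.
things = [Vehicle("car", Engine(90)), Truck("lorry", Engine(300), 12), Drone()]
descriptions = [thing.describe() for thing in things]
print(descriptions)
```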
Custom Classes
Since the exception subclasses we created did not add any new attributes (no instance
data or methods) we used a suite of pass and since the suite was just one statement, we put it
on the same line as the class statement itself. One of the advantages of object orientation is
that if we have a class, we can specialize it. This means that we make a new class that inherits
all the attributes (data and methods) from the original class, usually so that we can add or
replace methods or add more instance variables. The ability to subclass is one of the great
advantages offered by object-oriented programming since it makes it straightforward to use
an existing class that has tried and tested functionality as the basis for a new class that
extends the original, adding new data attributes or new functionality in a very clean and direct
way. Furthermore, we can pass objects of our new class to functions and methods that were
written for the original class and they will work correctly. We use the term base class to refer
to a class that is inherited; a base class may be the immediate ancestor, or may be further up
the inheritance tree. Another term for base class is super class. We use the term subclass or
derived class for a class that inherits from (i.e., specializes) another class.
Attributes and Methods
Let’s start with a very simple class, Point, that holds an (x, y) coordinate. The class is
in file Shape.py, and its complete implementation (excluding docstrings) is shown here:
import math

class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y
    def distance_from_origin(self):
        return math.hypot(self.x, self.y)
    def __eq__(self, other):
        return self.x == other.x and self.y == other.y
    def __repr__(self):
        return "Point({0.x!r}, {0.y!r})".format(self)
    def __str__(self):
        return "({0.x!r}, {0.y!r})".format(self)

# Circle inherits from Point; the class line and __init__ are assumed
# from the methods and text that follow, which use self.radius and super().
class Circle(Point):
    def __init__(self, radius, x=0, y=0):
        super().__init__(x, y)
        self.radius = radius
    def edge_distance_from_origin(self):
        return abs(self.distance_from_origin() - self.radius)
    def area(self):
        return math.pi * (self.radius ** 2)
    def circumference(self):
        return 2 * math.pi * self.radius
    def __eq__(self, other):
        return self.radius == other.radius and super().__eq__(other)
    def __repr__(self):
        return "Circle({0.radius!r}, {0.x!r}, {0.y!r})".format(self)
    def __str__(self):
        return repr(self)
Inheritance is achieved simply by listing the class (or classes) that we want our class to
inherit in the class line. Here we have inherited the Point class—the inheritance hierarchy for
Circle is shown in Figure 6.3.
Polymorphism means that any object of a given class can be used as though it were an object
of any of its class’s base classes. This is why when we create a subclass, we need to
implement only the additional methods we require and have to reimplement only those
existing methods we want to replace. And when reimplementing methods, we can use the
base class’s implementation if necessary, by using super() inside the reimplementation.
Custom Functions
Every function in Python returns a value, although it is perfectly acceptable to ignore the
return value. The return value is either a single value or a tuple of values, and the values
returned can be collections, so there are no practical limitations on what we can return. We
can leave a function at any point by using the return statement. If we use return with no
arguments, or if we don’t have a return statement at all, the function will return None. Some
functions have parameters for which there can be a sensible default. For example, here is a
function that counts the letters in a string, defaulting to the ASCII letters: