Project Report
On
IMD Scraper
Submitted By:-
ASHISH ATTRI
Regn. No. – ST19168
B.Tech (ECE) - 4th year
HMRITM, GGSIPU
I am extremely grateful and remain indebted to our project coordinator, Mr. Sankar
Nath, Scientist ‘E’, for being a source of inspiration and for his constant support in the
design, implementation and evaluation of this project. I am thankful to him for his
constructive criticism and invaluable suggestions, which benefitted me a lot
while developing the project report. Through this column, it is my utmost
pleasure to express gratitude to him for his motivation for hard work, his
encouragement, co-operation and consent, without which I would have failed to
accomplish this project.
I thank all the lab staff, who were directly or indirectly instrumental in enabling me to
stay committed to the project.
I would also like to thank Dr. Shyam Lal Singh, Scientist ‘F’ (ISSD), for his constant
support and motivation. Last but not least, I would like to thank the
‘Telecommunication Department’ for providing me with this golden opportunity to work for the
meteorological department, which will certainly help me build skills for my career.
CONTENTS
I. IMD’s Mandate
II. Telecommunication in IMD
III. Objective
IV. Libraries Used
V. Program Code
VI. Result
VII. References
IMD's MANDATE
To warn against severe weather phenomena such as tropical cyclones, dust storms,
heavy rains and snow, cold and heat waves, etc., which cause destruction of life and
property.
Background
Organization
The Directorate of Telecommunication was set up in IMD at New Delhi in 1969 to cater
to the needs of the National Meteorological Service and to strengthen meteorological
telecommunication in India. Since its inception, IMD has maintained an extensive
telecommunication network for speedy collection of meteorological information, both
basic data and processed products, from across the globe. The main hub of
IMD’s telecommunication network is located at its headquarters in New Delhi and is
known internationally as a Regional Telecommunication Hub (RTH) under the aegis of
the World Meteorological Organization.
GTS Network
RTH New Delhi is directly connected with WMC Moscow, RTH Tokyo, RTH Cairo,
RTH Beijing, RTH Toulouse, RTH Jeddah and WMC Melbourne, located on the MTN
(Main Telecommunication Network), and with RTHs Bangkok and Tehran and NMCs
Colombo, Dhaka, Karachi, Kathmandu, Male, Muscat and Yangon in the RMTNs.
· Collect and disseminate meteorological data and products within its area
of responsibility.
· Exchange such data/products with other RTHs.
· Collect the bulletins from its associated NMCs, viz. Colombo, Dhaka, Karachi,
Kathmandu, Male, and Yangon, and transmit them directly in the appropriate form on the
Main Telecommunication Network.
· Transmit on the Main Telecommunication Network directly, as internationally
agreed and in the appropriate form, the processed meteorological information
produced by the RSMC, New Delhi.
· Relay selectively on the circuits of the Main Telecommunication Network, as
agreed, the bulletins which it receives from these circuits and/or from RTHs not
situated on the Main Telecommunication Network.
· Ensure the selective distribution of bulletins to the associated NMCs and to the
RTHs not situated on the Main Telecommunication Network which it serves.
· Before relaying a message issued from its zone of responsibility (as an RTH
located on the MTN) on the GTS, check the parts of the message related to
telecommunication in order to maintain standard
telecommunication procedures.
· Establish data dissemination systems (terrestrial and/or via satellite) as required in
accordance with regional plans.
· Carry out the monitoring of the operation of the GTS of the WWW.
· Maintain the Catalogue of Meteorological Bulletins as regards bulletins issued
from the zone for which it is responsible, i.e. Bangladesh, Bhutan, India,
Maldives, Myanmar, Nepal, Pakistan and Sri Lanka, for the collection, exchange
and distribution of data, also including data from Antarctica, as
appropriate.
· Collect observational data from its own territory and from other members according to
bilateral agreements, as well as observational data from aircraft and ships received
by centers located within the area of responsibility.
· Compile such data into bulletins and transmit them on the GTS, in
compliance with standard telecommunication procedures.
· Receive and distribute, in accordance with bilateral agreements, observational
data and processed meteorological information, to meet the requirements of the
Members concerned.
The WMO Information System (WIS) is a project designed for regional and global
connectivity to collect and distribute the information meant for routine global
dissemination, while serving as collection and distribution centers in their areas of
responsibilities; providing entry points, through unified portals and comprehensive
metadata catalogues, for any request for data held within the WIS.
WIS consists of two parallel parts: GTS and DAR (Discovery, Access and Retrieval). WMO
continues its efforts to enhance and improve the GTS, while new
DAR functionality is integrated into all WMO and related international programmes.
At the center of DAR is a catalogue of the entire WIS. GISCs collect all metadata from
WIS Centers in their areas of responsibility, and exchange metadata sets with one
another.
Organization of WIS-
· Services relating to data intended for global distribution (known as a GISC Cache)
NCs (National Centers) can be established in each WMO member state. They are
responsible for collecting national observation data and submitting them to the WIS
network. NCs are also responsible for domestic data distribution networks and in-
country authorization of WIS users.
In addition to the above services, the ISSD also provides the local area network (LAN) at various
divisions of IMD HQ for collection and processing of meteorological
information/products. It links the DGM building (Mausam Bhawan), the Satellite Meteorological
Division building and the DDGM (UI) building in a ring using L3 switches and an optical fiber
link for backbone support. The main internet links provided by ISPs (Internet Service
Providers) are terminated at the ISS Division in Mausam Bhawan, and the other buildings are connected
from the DGM building through the optical fiber link. The Seismology Division is located in the SatMet
building and the Earthquake Risk Evaluation Centre (EREC) is located in the Annex building.
Demand for satellite and radar images/products increases during bad weather and cyclone periods,
and demand for seismological reports increases during earthquake/tsunami activity.
Moreover, satellite and radar image products are also required by our
forecasters/modelers as well as the Cyclone Warning Division located in Mausam Bhawan.
As such, the operational activities depend on the performance of the LAN, and its
uptime has to be kept very high.
Objective-
Why do we need it?
IMD is responsible for forecasting weather and climate conditions across
India. Notifications about weather and climate are released on the IMD website
(www.imd.gov.in). But what if someone wants daily updates on weather and
climatic conditions and is not able to visit the site? Without a notification,
an individual has to keep visiting IMD’s website
to check whether an update has been released or not.
So it becomes necessary that information about climatic conditions be sent to the
individual. But how?
The information sent to the readers must be the information they want. To
solve this problem, there is a need for a script or program which can send
notifications to readers via email. The script for this project is written in Python.
Web scraping is the process of automating the extraction of data from websites quickly and
efficiently. With the help of web scraping, you can extract data from a website onto your
computer, no matter how large the data is.
Moreover, websites may have data that you cannot copy and paste. Web scraping can
help you extract that data as well.
That is not all. Even if you can copy and paste some data, you still have to convert or save
it in a format of your choice, which a scraper can do automatically.
Web scraping a web page involves fetching it and extracting from it. Fetching is the
downloading of a page (which a browser does when you view the page). Therefore, web
crawling is a main component of web scraping, to fetch pages for later processing. Once
fetched, then extraction can take place. The content of a page may be parsed, searched,
reformatted, its data copied into a spreadsheet, and so on. Web scrapers typically take
something out of a page, to make use of it for another purpose somewhere else. An
example would be to find and copy names and phone numbers, or companies and their
urls, to a list (contact scraping).
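The fetch-then-extract split described above can be sketched with the Python standard library alone. This is only an illustration and not the project's script: the class `LinkCollector`, the function `extract_links` and the sample HTML are hypothetical, and the fetch step appears only as a comment so that the example runs offline.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a href=...> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, href) pairs
        self._href = None    # href of the <a> tag currently open
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Extraction step: parse the page text and return its links."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Fetching would normally come first, for example:
#   html = urllib.request.urlopen(url).read().decode("utf-8")
# A fixed string stands in for the downloaded page here.
sample_html = """
<ul>
  <li><a href="/press/1.pdf">Press release one</a></li>
  <li><a href="/press/2.pdf">Press release two</a></li>
</ul>
"""

for title, link in extract_links(sample_html):
    print(title, "->", link)
```

Keeping extraction in its own function means the same parsing code can be exercised on a saved page without contacting the website.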
Description -
‘main_file.py’ is the script, written in Python. This file is responsible for requesting
the website, scraping it, fetching the title and link of each post/article
released on the website and sending them as an e-mail to the e-mail IDs of the readers.
‘download.jpg’ is the image of the IMD logo, present in the same folder.
‘maillist.txt’ is the text file in which the e-mail addresses of readers are listed for
mailing. To add a new e-mail address, append it on a new
line and save the file.
Every e-mail address added to the file should be in the proper format.
Example: email_address@gmail.com
This ensures that the e-mail is sent successfully to the address provided in the text
file.
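One way to read such a mailing list and skip badly formatted lines is sketched below. The function `load_recipients` and the deliberately simple pattern are assumptions made for illustration; the project script may handle the file differently.

```python
import re

# Deliberately simple check: something@something.tld with no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def load_recipients(lines):
    """Return the well-formed addresses, skipping blank and malformed lines."""
    recipients = []
    for line in lines:
        address = line.strip()
        if address and EMAIL_PATTERN.match(address):
            recipients.append(address)
    return recipients


# With the real file this would be:
#   with open("maillist.txt") as handle:
#       recipients = load_recipients(handle)
sample_lines = ["email_address@gmail.com\n", "\n", "not-an-address\n"]
print(load_recipients(sample_lines))
```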
Libraries Used-
I. Beautiful Soup –
Beautiful Soup is a popular Python package for parsing HTML and XML documents,
including documents with malformed markup such as non-closed tags (it is named
after "tag soup"). It parses a web page and provides a convenient interface to
navigate its content. It creates a parse tree for parsed pages that can be used to
extract data from HTML, which is useful for web scraping. It works with your favorite
parser to provide idiomatic ways of navigating, searching, and modifying the parse tree.
It commonly saves programmers hours or days of work.
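A short sketch of this behaviour, assuming the `bs4` package is installed and using Python's built-in `html.parser` as the underlying parser. The HTML fragment is made up for illustration and is malformed on purpose.

```python
from bs4 import BeautifulSoup

# Malformed on purpose: the <p> and <li> tags are never closed.
broken_html = """
<html><body>
<p class="note">Latest updates
<ul>
  <li><a href="/a.pdf">Bulletin A</a>
  <li><a href="/b.pdf">Bulletin B</a>
</ul>
</body></html>
"""

# Build the parse tree despite the missing closing tags.
soup = BeautifulSoup(broken_html, "html.parser")

# Search the tree for every link and read its text and href attribute.
titles = [a.get_text(strip=True) for a in soup.find_all("a")]
hrefs = [a["href"] for a in soup.find_all("a")]
print(titles)
print(hrefs)
```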
II. Requests –
Types of Requests – Six main types of request are possible with the help of
the requests module:
1. Get() request
2. Post() request
3. Put() request
4. Delete() request
5. Head() request
6. Options() request
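Each of these corresponds to an HTTP method and to a function of the same name in the requests module (`requests.get`, `requests.post`, and so on). The sketch below, which assumes the `requests` package is installed, only prepares one request per method without sending anything; the URL is a placeholder.

```python
import requests

URL = "https://example.com/resource"  # placeholder URL, never contacted

methods = ["GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS"]

# Request(...).prepare() builds the request object without sending it.
prepared = [requests.Request(method, URL).prepare() for method in methods]

for request in prepared:
    print(request.method, request.url)

# Sending for real would look like this (needs network access):
#   response = requests.get(URL)
#   print(response.status_code)
```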
III. SMTPLIB –
Python comes with the built-in smtplib module for sending emails using the Simple Mail
Transfer Protocol (SMTP). smtplib uses the RFC 821 protocol for SMTP.
When you send emails through Python, you should make sure that your SMTP
connection is encrypted, so that your message and login credentials are not easily
accessed by others. SSL (Secure Sockets Layer) and TLS (Transport Layer Security) are
two protocols that can be used to encrypt an SMTP connection. It’s not necessary to use
either of these when using a local debugging server.
There are two ways to start a secure connection with your email server:
Start an SMTP connection that is secured from the beginning using SMTP_SSL().
Start an unsecured SMTP connection that can then be encrypted using .starttls().
Keep in mind that Gmail requires that you connect to port 465 if using SMTP_SSL(), and
to port 587 when using .starttls().
The code example below creates a secure connection with Gmail’s SMTP server, using
the SMTP_SSL() of smtplib to initiate a TLS-encrypted connection. The default context of
SSL validates the host name and its certificates and optimizes the security of the
connection. Make sure to fill in your own email address instead of my@gmail.com:
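A minimal sketch of this connection follows. The helper names `make_context` and `send_with_ssl` are hypothetical, `smtp.gmail.com` is assumed as Gmail's SMTP host, and the function is only defined here, not called, because calling it needs real credentials and network access.

```python
import smtplib
import ssl

SMTP_HOST = "smtp.gmail.com"  # assumed Gmail SMTP host
SSL_PORT = 465                # port for SMTP over SSL


def make_context():
    """Default SSL context: verifies the certificate and the host name."""
    return ssl.create_default_context()


def send_with_ssl(sender, password, receiver, message):
    """Log in over a connection that is encrypted from the start and send one message."""
    with smtplib.SMTP_SSL(SMTP_HOST, SSL_PORT, context=make_context()) as server:
        server.login(sender, password)
        server.sendmail(sender, receiver, message)
```

A call could look like `send_with_ssl("my@gmail.com", input("Password: "), receiver, message)`, so that the password is typed at run time rather than stored in the code.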
Using with smtplib.SMTP_SSL() as server: makes sure that the connection is automatically
closed at the end of the indented code block. If port is zero, or not specified,
.SMTP_SSL() will use the standard port for SMTP over SSL (port 465).
It’s not safe practice to store your email password in your code, especially if you intend
to share it with others. Instead, use input() to let the user type in their password when
running the script, as in the example above. If you don’t want your password to show on
your screen when you type it, you can import the getpass module and use .getpass()
instead for blind input of your password.
Option 2: Using .starttls()
Instead of using .SMTP_SSL() to create a connection that is secure from the outset, we
can create an unsecured SMTP connection and encrypt it using .starttls().
The code snippet below uses the construction server = SMTP(), rather than the format
with SMTP() as server: which we used in the previous example. To make sure that your
code doesn’t crash when something goes wrong, put your main code in a try block, and
let an except block print any error messages to stdout:
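A minimal sketch of this pattern, under the same assumptions as before (Gmail's host name, a hypothetical helper `send_with_starttls`). It adds one guard the description does not mention: `server` starts as `None` so that the `finally` block does not fail if the connection was never opened.

```python
import smtplib
import ssl


def send_with_starttls(sender, password, receiver, message,
                       host="smtp.gmail.com", port=587, timeout=10):
    """Return True if the message was sent, False if anything failed."""
    context = ssl.create_default_context()
    server = None
    try:
        server = smtplib.SMTP(host, port, timeout=timeout)
        server.ehlo()                      # identify ourselves to the server
        server.starttls(context=context)   # upgrade to an encrypted connection
        server.ehlo()                      # identify again over the secure channel
        server.login(sender, password)
        server.sendmail(sender, receiver, message)
        return True
    except Exception as error:
        print(error)                       # report the problem on stdout
        return False
    finally:
        # Close the connection only if it was actually opened.
        if server is not None:
            try:
                server.quit()
            except smtplib.SMTPServerDisconnected:
                pass
```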
IV. EMAIL Mime –
Python’s built-in email package allows you to structure fancier emails, which can
then be transferred with smtplib as you have done already. Below, you’ll learn how to use
the email package to send emails with HTML content and attachments.
If you want to format the text in your email (bold, italics, and so on), or if you want to
add any images, hyperlinks, or responsive content, then HTML comes in very handy.
Today’s most common type of email is the MIME (Multipurpose Internet Mail
Extensions) Multipart email, combining HTML and plain-text. MIME messages are
handled by Python’s email.mime module. For a detailed description, check the
documentation.
As not all email clients display HTML content by default, and some people choose only
to receive plain-text emails for security reasons, it is important to include a plain-text
alternative for HTML messages. As the email client will render the last multipart
attachment first, make sure to add the HTML message after the plain-text version.
In the example below, our MIMEText() objects will contain the HTML and plain-text
versions of our message, and the MIMEMultipart("alternative") instance combines these
into a single message with two alternative rendering options:
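A minimal sketch of this construction is given here. The helper `build_message`, the addresses and the message text are placeholders chosen for illustration.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(sender, receiver, subject, text, html):
    """Combine a plain-text and an HTML version into one alternative message."""
    message = MIMEMultipart("alternative")
    message["Subject"] = subject
    message["From"] = sender
    message["To"] = receiver
    # Plain text first, HTML last: clients try to render the last part first.
    message.attach(MIMEText(text, "plain"))
    message.attach(MIMEText(html, "html"))
    return message


msg = build_message(
    "my@gmail.com",
    "reader@example.com",
    "IMD update",
    "New bulletin: https://example.com/bulletin",
    '<html><body><p>New bulletin: '
    '<a href="https://example.com/bulletin">read it</a></p></body></html>',
)
print(msg.get_content_type())
print([part.get_content_type() for part in msg.get_payload()])
```

The finished message can be passed to smtplib as a string, for example `server.sendmail(sender, receiver, msg.as_string())`.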
In this example, you first define the plain-text and HTML message as string literals, and
then store them as plain/html MIMEText objects. These can then be added in this order
to the MIMEMultipart("alternative") message and sent through your secure connection
with the email server. Remember to add the HTML message after the plain-text
alternative, as email clients will try to render the last subpart first.
Program Code-