pygeonlp is open-source software for geotagging/geoparsing
Japanese natural-language text to extract place names.
More detailed Japanese documentation and API references are available in the /docs/source directory. You can also find the latest online documentation at PyGeoNLP Reference.
Import pygeonlp.api and initialize it by specifying the directory where the place-name database is stored.
>>> import pygeonlp.api as api
>>> api.init(db_dir='mydic')
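Then, run geoparse("text to parse"). The example sentence below means "The National Institute of Informatics is located in Chiyoda-ku."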
>>> result = api.geoparse("国立情報学研究所は千代田区にあります。")
The result is a list of dict objects, one per word, each carrying part-of-speech (POS) and spatial attributes.
Each dict is a GeoJSON Feature, so JSON-encoding it yields a GeoJSON representation.
>>> import json
>>> print(json.dumps(result, indent=2, ensure_ascii=False))
[
  {
    "type": "Feature",
    "geometry": null,
    "properties": {
      "surface": "国立",
      "node_type": "NORMAL",
      "morphemes": {
        "conjugated_form": "名詞-固有名詞-地名語",
        "conjugation_type": "*",
        "original_form": "国立",
        "pos": "名詞",
        "prononciation": "コクリツ",
        "subclass1": "固有名詞",
        "subclass2": "地名修飾語",
        "subclass3": "*",
        "surface": "国立",
        "yomi": "コクリツ"
      }
    }
  },
  ...
  {
    "type": "Feature",
    "geometry": {
      "type": "Point",
      "coordinates": [ ... ]
    },
    "properties": {
      "surface": "千代田区",
      "node_type": "GEOWORD",
      "morphemes": {
        "conjugated_form": "*",
        "conjugation_type": "*",
        "original_form": "千代田区",
        "pos": "名詞",
        "prononciation": "",
        "subclass1": "固有名詞",
        "subclass2": "地名語",
        "subclass3": "WWIY7G:千代田区",
        "surface": "千代田区",
        "yomi": ""
      },
      "geoword_properties": {
        "address": "東京都千代田区",
        "body": "千代田",
        "body_variants": "千代田",
        "code": {},
        "countyname": "",
        "countyname_variants": "",
        "dictionary_id": 1,
        "entry_id": "13101A1968",
        "geolod_id": "WWIY7G",
        "hypernym": [ ... ],
        "latitude": "35.69400300",
        "longitude": "139.75363400",
        "ne_class": "市区町村",
        "prefname": "東京都",
        "prefname_variants": "東京都",
        "source": "1/千代田区役所/千代田区九段南1-2-1/P34-14_13.xml",
        "suffix": [ ... ],
        "valid_from": "",
        "valid_to": "",
        "dictionary_identifier": "geonlp:geoshape-city"
      }
    }
  },
  {
    "type": "Feature",
    "geometry": null,
    "properties": {
      "surface": "に",
      "node_type": "NORMAL",
      "morphemes": {
        "conjugated_form": "*",
        "conjugation_type": "*",
        "original_form": "に",
        "pos": "助詞",
        "prononciation": "ニ",
        "subclass1": "格助詞",
        "subclass2": "一般",
        "subclass3": "*",
        "surface": "に",
        "yomi": "ニ"
      }
    }
  },
  ...
]
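Since the result is a plain list of Feature dicts, standard Python is enough to post-process it. The sketch below relies only on keys visible in the output above (`geometry`, `properties.surface`, `properties.node_type`): it wraps the list in a GeoJSON `FeatureCollection` so the whole result can be saved as a single GeoJSON document, and it collects the words tagged as place names (`node_type` equal to `"GEOWORD"`). The helper names `to_feature_collection` and `place_names` and the abbreviated `sample` list are hypothetical; with pygeonlp installed you would pass the list returned by `api.geoparse()` instead.

```python
import json
from typing import Any

Feature = dict[str, Any]


def to_feature_collection(features: list[Feature]) -> dict[str, Any]:
    """Wrap a list of GeoJSON Feature dicts in a FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


def place_names(features: list[Feature]) -> list[tuple[str, Any]]:
    """Return (surface, coordinates) for each feature whose node_type is GEOWORD.

    coordinates is None if a GEOWORD feature happens to have a null geometry.
    """
    found = []
    for feature in features:
        props = feature.get("properties", {})
        if props.get("node_type") != "GEOWORD":
            continue
        geometry = feature.get("geometry")
        found.append((props.get("surface"), geometry["coordinates"] if geometry else None))
    return found


# Hypothetical, heavily abbreviated stand-in for api.geoparse() output.
# The Point uses the latitude/longitude of 千代田区 shown above, in GeoJSON
# [longitude, latitude] order.
sample = [
    {"type": "Feature", "geometry": None,
     "properties": {"surface": "国立", "node_type": "NORMAL"}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [139.753634, 35.694003]},
     "properties": {"surface": "千代田区", "node_type": "GEOWORD"}},
    {"type": "Feature", "geometry": None,
     "properties": {"surface": "に", "node_type": "NORMAL"}},
]

print(place_names(sample))  # [('千代田区', [139.753634, 35.694003])]
print(json.dumps(to_feature_collection(sample), ensure_ascii=False, indent=2))
```

Filtering on `node_type` rather than on a non-null `geometry` keeps the selection tied to how the words were tagged, not to whether coordinates happen to be present.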
pygeonlp requires the MeCab C++ library and a UTF-8 dictionary for Japanese morphological analysis.
The C++ part of the implementation also depends on Boost C++.
$ sudo apt install libmecab-dev mecab-ipadic-utf8 libboost-all-dev
The pygeonlp package can be installed with pip.
It is recommended that you upgrade pip and setuptools to the latest versions first.
$ pip install --upgrade pip setuptools
$ pip install pygeonlp
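After these installation steps, a quick check can confirm that the pieces are visible from Python. This is a minimal sketch, assuming that `libmecab-dev` puts the `mecab-config` helper on `PATH` (as it does on Debian/Ubuntu); it only tests that the command and the `pygeonlp` package can be found, not that they work. The function name `check_prerequisites` is hypothetical.

```python
import importlib.util
import shutil


def check_prerequisites() -> dict[str, bool]:
    """Report whether the pieces installed by the steps above can be found.

    This only checks visibility, not that MeCab or pygeonlp actually work.
    """
    return {
        # Assumption: libmecab-dev provides the `mecab-config` helper on PATH.
        "mecab-config on PATH": shutil.which("mecab-config") is not None,
        # find_spec() returns None when a top-level package cannot be found.
        "pygeonlp importable": importlib.util.find_spec("pygeonlp") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```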
The database must be prepared before first use.
Prepare the database
Run the following to register the basic place-name word analysis dictionaries (…, *.csv) included in this package into the database under mydic/.
>>> import pygeonlp.api as api
>>> api.setup_basic_database(db_dir='mydic/')
This command registers three dictionaries:
- "Prefectures of Japan"
- "Historical Administrative Area Data Set Beta Dictionary of Place Names"
- "Railroad Stations in Japan (2019)"
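Because the database only has to be prepared once, a script that may run repeatedly can guard the setup call. The sketch below is a hypothetical helper, not part of pygeonlp: it calls only `api.setup_basic_database()` and `api.init()` as shown above, and it assumes, as a crude heuristic, that a missing or empty directory means the database has not been prepared yet.

```python
import os


def needs_setup(db_dir: str) -> bool:
    """Heuristic: treat a missing or empty directory as 'database not prepared yet'."""
    return not os.path.isdir(db_dir) or not os.listdir(db_dir)


def init_geoparser(db_dir: str = "mydic") -> None:
    """Register the basic dictionaries on first use, then initialize the API."""
    # Imported here so that needs_setup() can be used without pygeonlp installed.
    import pygeonlp.api as api

    if needs_setup(db_dir):
        # First run: register the basic dictionaries shipped with the package.
        api.setup_basic_database(db_dir=db_dir)
    api.init(db_dir=db_dir)
```

Call `init_geoparser("mydic")` once per process, then `api.geoparse(...)` as before. An interrupted setup can leave a non-empty but incomplete directory that this heuristic would treat as ready; deleting the directory and rerunning starts over.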
If the GDAL library is installed, pygeonlp can use spatial distance to disambiguate multiple place names that share the same name, which improves accuracy.
You can also use spatial filters.
$ sudo apt install libgdal-dev
$ pip install gdal
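As a toy illustration of the general idea only (this is not pygeonlp's internal algorithm, which is not described here): among candidates sharing a name, prefer the one nearest to some reference point, measured with the haversine great-circle distance. The sketch is pure Python and does not use GDAL. Candidates are assumed to carry `latitude`/`longitude` strings like `geoword_properties` in the output above; `haversine_km`, `nearest_candidate`, and the candidate entries with their placeholder coordinates are all hypothetical.

```python
import math
from typing import Any

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_candidate(candidates: list[dict[str, Any]],
                      ref_lat: float, ref_lon: float) -> dict[str, Any]:
    """Pick the same-name candidate closest to a reference point.

    Each candidate is assumed to carry "latitude"/"longitude" as strings,
    like geoword_properties in the geoparse() output.
    """
    return min(
        candidates,
        key=lambda c: haversine_km(float(c["latitude"]), float(c["longitude"]),
                                   ref_lat, ref_lon),
    )


# Hypothetical candidates with placeholder coordinates; the reference point is
# the latitude/longitude of 千代田区 from the geoparse() output.
candidates = [
    {"label": "candidate A", "latitude": "35.60000000", "longitude": "139.60000000"},
    {"label": "candidate B", "latitude": "34.50000000", "longitude": "133.20000000"},
]
print(nearest_candidate(candidates, 35.694003, 139.753634)["label"])  # candidate A
```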
pygeonlp can also geocode addresses if an address dictionary for
jageocoder is installed.
See the jageocoder documentation for installation instructions.
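Whether these optional features are usable depends on packages outside pygeonlp. The hypothetical helper below reports which of the optional Python packages can be found, assuming the GDAL bindings are imported as `osgeo` and jageocoder as `jageocoder`. It does not verify that a jageocoder address dictionary is installed, and it does not reflect how pygeonlp itself detects these packages.

```python
import importlib.util

# Assumed import names: the GDAL Python bindings install the "osgeo" package,
# and jageocoder is imported as "jageocoder".
OPTIONAL_MODULES = {
    "spatial distance / spatial filters (GDAL)": "osgeo",
    "address geocoding (jageocoder)": "jageocoder",
}


def optional_features() -> dict[str, bool]:
    """Map each optional feature to whether its Python package can be found.

    For jageocoder this does not check that an address dictionary is installed.
    """
    return {
        feature: importlib.util.find_spec(module) is not None
        for feature, module in OPTIONAL_MODULES.items()
    }


if __name__ == "__main__":
    for feature, available in optional_features().items():
        print(f"{feature}: {'available' if available else 'not found'}")
```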
Run the unit tests with python test
Use pip to uninstall:
$ pip uninstall pygeonlp
Registering a place-name word analysis dictionary creates an SQLite3 database and some other files in the specified directory.
To delete them, remove the whole directory:
$ rm -r mydic/
This software is supported by DIAS (Data Integration and Analysis System) and ROIS-DS CODH (Center for Open Data in the Humanities).
It was also supported by JST (Japan Science and Technology Agency) PRESTO program.