LINTO-PUNCTUATION

LinTO-Punctuation is a LinTO service for punctuation prediction. It predicts punctuation from raw text or raw transcription.

LinTO-Punctuation can either be used as a standalone punctuation service or deployed as a micro-service.

Table of content

Pre-requisites

Models

The punctuation service relies on a trained recasing and punctuation prediction model.

Some models trained on Common Crawl are available on recasepunc for the following the languages:

French
- fr-txt.large.19000
- fr.22000
English
- en.23000
Italian
- it.22000
Chinese
- zh.24000

Docker

The punctuation service requires docker up and running.

For GPU capabilities, it is also needed to install nvidia-container-toolkit.

(micro-service) Service broker

The punctuation only entry point in job mode are tasks posted on a REDIS message broker using Celery.

Deploy

linto-punctuation can be deployed two different ways:

As a standalone punctuation service through an HTTP API.
As a micro-service connected to a task queue.

1- First step is to build the image:

git clone https://github.com/linto-ai/linto-punctuation.git
cd linto-punctuation
docker build . -t linto-punctuation:latest

or

docker pull registry.linto.ai/lintoai/linto-punctuation:latest

2- Download the models

Have the punctuation model ready at <MODEL_PATH>.

HTTP

1- Fill the .env

cp .env_default .env

Fill the .env with your values.

Parameters:

Variables	Description	Example
SERVICE_NAME	The service's name	my_punctuation_service
CONCURRENCY	Number of worker	> 1

2- Run with docker

docker run --rm \
-v <MODEL_PATH>:/usr/src/app/model-store/model \
-p HOST_SERVING_PORT:80 \
--env-file .env \
linto-punctuation:latest

Also add --gpus all as an option to enable GPU capabilities.

This will run a container providing an http API binded on the host HOST_SERVING_PORT port.

Micro-service

LinTO-Punctuation can be deployed as a microservice. Used this way, the container spawn celery workers waiting for punctuation tasks on a dedicated task queue. LinTO-Punctuation in task mode requires a configured REDIS broker.

You need a message broker up and running at MY_SERVICE_BROKER. Instance are typically deployed as services in a docker swarm using the docker compose command:

1- Fill the .env

cp .env_default .env

Fill the .env with your values.

Parameters:

Variables	Description	Example
SERVICES_BROKER	Service broker uri	redis://my_redis_broker:6379
BROKER_PASS	Service broker password (Leave empty if there is no password)	my_password
QUEUE_NAME	(Optionnal) overide the generated queue's name (See Queue name bellow)	my_queue
SERVICE_NAME	Service's name	punctuation-ml
LANGUAGE	Language code as a BCP-47 code	en-US or * or languages separated by "\|"
MODEL_INFO	Human readable description of the model	"Bert based model for french punctuation prediction"
CONCURRENCY	Number of worker (1 worker = 1 cpu)	>1

Do not use spaces or character "_" for SERVICE_NAME or language.

2- Fill the docker-compose.yml

#docker-compose.yml

version: '3.7'

services:
  punctuation-service:
    image: linto-punctuation:latest
    volumes:
      - /my/path/to/models/punctuation.mar:/usr/src/app/model-store/model
    env_file: .env
    deploy:
      replicas: 1
    networks:
      - your-net

networks:
  your-net:
    external: true

3- Run with docker compose

docker stack deploy --resolve-image always --compose-file docker-compose.yml your_stack

Queue name:

By default the service queue name is generated using SERVICE_NAME and LANGUAGE: punctuation_{LANGUAGE}_{SERVICE_NAME}.

The queue name can be overided using the QUEUE_NAME env variable.

Service discovery:

As a micro-service, the instance will register itself in the service registry for discovery. The service information are stored as a JSON object in redis's db0 under the id service:{HOST_NAME}.

The following information are registered:

{
  "service_name": $SERVICE_NAME,
  "host_name": $HOST_NAME,
  "service_type": "punctuation",
  "service_language": $LANGUAGE,
  "queue_name": $QUEUE_NAME,
  "version": "1.2.0", # This repository's version
  "info": "Punctuation model for french punctuation prediction",
  "last_alive": 65478213,
  "concurrency": 1
}

Usages

HTTP API

/healthcheck

Returns the state of the API

Method: GET

Returns "1" if healthcheck passes.

/punctuation

Punctuation API

Method: POST
Response content: text/plain or application/json
Body: A json object structured as follows:

{
  "sentences": [
    "this is sentence 1", "is that a second sentence", "yet an other sentence"
  ]
}

Return the punctuated text as a json object structured as follows:

{
  "punctuated_sentences": [
    "This is sentence 1",
    "Is that a second sentence ?",
    "Yet an other sentence"
  ]
}

/docs

The /docs route offers a OpenAPI/swagger interface.

Using Celery

Punctuation-Worker accepts celery tasks with the following arguments: text: Union[str, List[str]]

text: (str or list) A sentence or a list of sentences.

Return format

Returns a string or a list of string depending on the input parameter.

Test

Curl

You can test you http API using curl:

curl -X POST "http://YOUR_SERVICE:YOUR_PORT/punctuation" -H  "accept: application/json" -H  "Content-Type: application/json" -d "{  \"sentences\": [    \"this is sentence 1\", \"is that a second sentence\", \"yet an other sentence\"  ]}"

License

This project is developped under the AGPLv3 License (see LICENSE).

Acknowledgments

recasepunc Python library to train recasing and punctuation models, and to apply them (License BSD 3).

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
.github/workflows		.github/workflows
celery_app		celery_app
document		document
http_server		http_server
punctuation		punctuation
.env_default		.env_default
.gitignore		.gitignore
Dockerfile		Dockerfile
Dockerfile.cpu		Dockerfile.cpu
Jenkinsfile		Jenkinsfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
RELEASE.md		RELEASE.md
config.properties		config.properties
docker-compose.yml		docker-compose.yml
docker-entrypoint.sh		docker-entrypoint.sh
healthcheck.sh		healthcheck.sh
requirements.cpu.txt		requirements.cpu.txt
requirements.txt		requirements.txt
wait-for-it.sh		wait-for-it.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LINTO-PUNCTUATION

Table of content

Pre-requisites

Models

Docker

(micro-service) Service broker

Deploy

HTTP

Micro-service

Usages

HTTP API

/healthcheck

/punctuation

/docs

Using Celery

Return format

Test

Curl

License

Acknowledgments

About

Releases

Packages

Contributors 3

Languages

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

License

linto-ai/linto-punctuation

Folders and files

Latest commit

History

Repository files navigation

LINTO-PUNCTUATION

Table of content

Pre-requisites

Models

Docker

(micro-service) Service broker

Deploy

HTTP

Micro-service

Usages

HTTP API

/healthcheck

/punctuation

/docs

Using Celery

Return format

Test

Curl

License

Acknowledgments

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Packages