Text_LanguageDetect

PHP library to identify human languages from text samples. Returns a confidence score for each candidate language.

Installation

PEAR

$ pear install Text_LanguageDetect

Composer

$ composer require pear/text_languagedetect

Usage

Also see the examples in the docs/ directory and the official documentation.

Language detection

Simple language detection:

<?php
require_once 'Text/LanguageDetect.php';

$text = 'Was wäre, wenn ich Ihnen das jetzt sagen würde?';

$ld = new Text_LanguageDetect();
$language = $ld->detectSimple($text);

echo $language;
//output: german

Show the three most probable languages with their confidence scores:

<?php
require_once 'Text/LanguageDetect.php';

$text = 'Was wäre, wenn ich Ihnen das jetzt sagen würde?';

$ld = new Text_LanguageDetect();
//3 most probable languages
$results = $ld->detect($text, 3);

foreach ($results as $language => $confidence) {
    echo $language . ': ' . number_format($confidence, 2) . "\n";
}

//output:
//german: 0.35
//dutch: 0.25
//swedish: 0.20
?>

Language code

Instead of returning the full language name, the ISO 639-1 two-letter or ISO 639-2 three-letter code can be returned:

<?php
require_once 'Text/LanguageDetect.php';
$ld = new Text_LanguageDetect();

//will output the ISO 639-1 two-letter language code
// "de"
$ld->setNameMode(2);
echo $ld->detectSimple('Das ist ein kleiner Text') . "\n";

//will output the ISO 639-2 three-letter language code
// "deu"
$ld->setNameMode(3);
echo $ld->detectSimple('Das ist ein kleiner Text') . "\n";
?>

Supported languages

  • albanian
  • arabic
  • azeri
  • bengali
  • bulgarian
  • cebuano
  • croatian
  • czech
  • danish
  • dutch
  • english
  • estonian
  • farsi
  • finnish
  • french
  • german
  • hausa
  • hawaiian
  • hindi
  • hungarian
  • icelandic
  • indonesian
  • italian
  • kazakh
  • kyrgyz
  • latin
  • latvian
  • lithuanian
  • macedonian
  • mongolian
  • nepali
  • norwegian
  • pashto
  • pidgin
  • polish
  • portuguese
  • romanian
  • russian
  • serbian
  • slovak
  • slovene
  • somali
  • spanish
  • swahili
  • swedish
  • tagalog
  • turkish
  • ukrainian
  • urdu
  • uzbek
  • vietnamese
  • welsh

Links

Homepage
http://pear.php.net/package/Text_LanguageDetect
Bug tracker
http://pear.php.net/bugs/search.php?cmd=display&package_name[]=Text_LanguageDetect
Documentation
http://pear.php.net/package/Text_LanguageDetect/docs
Unit test status
https://travis-ci.org/pear/Text_LanguageDetect

Notes

Where does the data come from?

I don't recall where I got the original data set. It's just the frequencies of 3-letter combinations (trigrams) in each supported language. It could be generated from a few random Wikipedia pages in each language.
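
To make "frequencies of 3-letter combinations" concrete, here is a minimal sketch of how such counts could be produced from a text sample. It is not the code used to build the shipped data set, and `trigramFrequencies()` is a hypothetical helper, not part of the Text_LanguageDetect API. The normalization (lowercasing, replacing non-letters with spaces, padding with one space on each side) is an assumption made for illustration. The sketch needs PHP 7 or later with the mbstring extension.

```php
<?php
/**
 * Count how often each 3-character sequence occurs in a text sample.
 *
 * Hypothetical helper for illustration only; it is not part of
 * Text_LanguageDetect and its normalization rules are assumptions.
 *
 * @param string $text UTF-8 encoded sample text
 *
 * @return array Trigram => count, most frequent first
 */
function trigramFrequencies(string $text): array
{
    // Lowercase, then collapse every run of non-letters into one space
    $text = mb_strtolower($text, 'UTF-8');
    $text = preg_replace('/[^\p{L}]+/u', ' ', $text);

    // Pad with one space on each side so word boundaries count as well
    $text = ' ' . trim($text) . ' ';

    $counts = [];
    $length = mb_strlen($text, 'UTF-8');

    // Slide a 3-character window across the normalized text
    for ($i = 0; $i + 3 <= $length; $i++) {
        $trigram = mb_substr($text, $i, 3, 'UTF-8');
        $counts[$trigram] = ($counts[$trigram] ?? 0) + 1;
    }

    // Sort by count, highest first, keeping the trigram keys
    arsort($counts);

    return $counts;
}

// Print the five most frequent trigrams of a short sample
$frequencies = trigramFrequencies('Das ist ein kleiner Text');
foreach (array_slice($frequencies, 0, 5, true) as $trigram => $count) {
    echo "'" . $trigram . "': " . $count . "\n";
}
```

Running this over a larger body of text per language, for example a handful of Wikipedia articles, would give one frequency table per language, which is the kind of data the note above describes.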
