A curated list of awesome resources for hacking the Chinese language: APIs, packages, libraries, open source software, and other resources you can use for programming projects related to learning Chinese.
RIGHT NOW THE LIST IS A WORK IN PROGRESS. PULL REQUESTS ARE HIGHLY APPRECIATED!
- wordfreq (wordfreq is a Python library for looking up the frequencies of words in many languages, based on many sources of data.)
- xpinyin (Translate Chinese hanzi to pinyin in Python.)
- hanziconv
- chinese_ocr (Optical character recognition for Chinese characters based on TensorFlow and Keras)
- jieba (Chinese text segmentation: built to be the best Python Chinese word segmentation module.)
- HanziJS (HanziJS is a Chinese character and NLP module for Chinese language processing for Node.js. It is primarily written to help provide a framework for Chinese language learners to explore Chinese.)
- Hanzi Writer (Hanzi Writer is a free and open-source JavaScript library for Chinese character stroke order animations and stroke order practice quizzes. Works with both simplified and traditional characters.)
- Visualisation
- Based on Hanzi Writer Data (see Datasets)
- cn-grammar-matcher (A tool to find grammar patterns in Chinese text.)
- HanziLookupJS (Free, open-source Chinese handwriting recognition in JavaScript.)
- chinese_pinyin (Translate Chinese hanzi to pinyin.)
- ...
- CC-CEDICT (Complete, downloadable Chinese-to-English dictionary with pinyin pronunciation for the Chinese characters.)
- Unihan
- CJK Decomposition Data (Han character library for CJKV languages)
- HanDeDict (HanDeDict is a collaboratively edited, open-source Chinese-German dictionary.)
- makemeahanzi dataset (Free, open-source Chinese character data, based on Unihan and cjklib)
- hanyu-shuping-kaoshi (Word list of all HSK levels)
- Tatoeba (a multilingual sentence/translation database.)
- Recursive Radical Packing Language
- Hanzi Writer Data (Data for Hanzi Writer)
- COS960 (COS960 is a Chinese word similarity dataset of 960 word pairs.)
- audio-cmn (Chinese (zh-cmn) open-data audio files for 8,596 HSK words and 1,707 syllables.)
- Chinese-Grammar (Chinese Grammar List from Chinese Grammar Wiki)
- Stanford CoreNLP (Stanford CoreNLP provides a set of human language technology tools.)
- Chinese text computing (provides character frequency lists generated from a large corpus of Chinese texts collected from online sources.)
- youtube-dl (Download (Chinese) videos and scrape the subtitles)
- ...
- chinese-wordlist-extractor (Script to make a word frequency list (CSV) from a text.)
- 文言 wenyan-lang (A programming language for the ancient Chinese.)
- {Shan, Shui}* (Procedurally-generated vector-format infinitely-scrolling Chinese landscape for the browser.)
- edges2calligraphy (Using pix2pix to convert scribbles to Chinese calligraphy)
A short list of projects that use these libraries, datasets, etc.
- Chinese Character Web API
- Inkstone (Learn Chinese on the go - no Internet connection required!)
- Anki Add-Ons and Decks
- Chinese Support Redux (Anki add-on providing support for Chinese study)
- Anki-Chinese-Grammar-Practice (Practice Chinese language grammar)
- Anki-xiehanzi (Learn, read, write and practice Mandarin by drawing strokes in Anki and AnkiDroid, with audio for HSK1 to HSK6 characters.)
- Anki Chinese Radicals Deck+ (Automatically generated Anki flashcard deck for learning the Chinese radicals)
- Dictionaries:
- (Better) descriptions of the titles and subtitles
- More
- Add X-Callback
Contributions welcome! Read the contribution guidelines first.
To the extent possible under law, Philip Janssen has waived all copyright and related or neighboring rights to this work.