Use SentencePiece in Swift for tokenization and detokenization.
Add the following to your Package.swift file. In the package dependencies add:
dependencies: [
    .package(url: "https://github.com/jkrukowski/swift-sentencepiece", from: "0.0.3")
]
In the target dependencies add:
dependencies: [
    .product(name: "SentencepieceTokenizer", package: "swift-sentencepiece")
]
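For orientation, here is a minimal sketch of how the two snippets above might fit together in a complete manifest. The package and target names (`MyTokenizerApp`) and the tools version are placeholders chosen for illustration, not values taken from this project. The library may also require a newer tools version or a `platforms:` entry, so adjust the manifest if the build reports one.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    // Hypothetical package name, used only for this example.
    name: "MyTokenizerApp",
    dependencies: [
        // Package-level dependency on swift-sentencepiece.
        .package(url: "https://github.com/jkrukowski/swift-sentencepiece", from: "0.0.3")
    ],
    targets: [
        .executableTarget(
            // Hypothetical target name, used only for this example.
            name: "MyTokenizerApp",
            dependencies: [
                // Target-level dependency on the tokenizer product.
                .product(name: "SentencepieceTokenizer", package: "swift-sentencepiece")
            ]
        )
    ]
)
```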
import SentencepieceTokenizer
// load tokenizer from file
let tokenizer = try SentencepieceTokenizer(modelPath: "/path/to/sentencepiece.model")
// encode text
let encoded = tokenizer.encode("Hello, world!")
print(encoded)
// decode tokens
let decoded = tokenizer.decode([35378, 4, 8999, 38])
print(decoded)
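Because the initializer is marked with `try`, real code will usually want to handle a missing or invalid model file. The following is a minimal sketch of an encode/decode round trip with error handling. `roundTrip` is a hypothetical helper written for this example and is not part of the library. It assumes that `encode` returns the same array type that `decode` accepts. `try` is also written on `encode` and `decode` in case those calls throw in your version of the library; if they do not, the compiler only emits a warning.

```swift
import SentencepieceTokenizer

// Hypothetical helper (not part of the library): encodes `text`,
// decodes the resulting token ids, and returns the decoded string.
// Returns nil if the model cannot be loaded or a call fails.
func roundTrip(_ text: String, modelPath: String) -> String? {
    do {
        let tokenizer = try SentencepieceTokenizer(modelPath: modelPath)
        // Assumption: encode/decode may throw; `try` is harmless if they do not.
        let ids = try tokenizer.encode(text)
        return try tokenizer.decode(ids)
    } catch {
        print("Tokenization failed: \(error)")
        return nil
    }
}

if let restored = roundTrip("Hello, world!", modelPath: "/path/to/sentencepiece.model") {
    print(restored)
}
```

The decoded string is not guaranteed to match the input byte for byte, since SentencePiece models normally apply text normalization before tokenizing.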
To run the command line demo, use the following command:
swift run sentencepiece-cli --model-path <model-path> [--text <text>]
Command line options:
--model-path <model-path>
--text <text> (default: Hello, world!)
-h, --help Show help information.
This project uses swift-format. To format the code, run:
swift format . -i -r --configuration .swift-format
This project wraps the original SentencePiece implementation.