Interfaces with the 'Hugging Face' tokenizers library to provide implementations of today's most widely used tokenizers, such as the 'Byte-Pair Encoding' algorithm <https://huggingface.co/docs/tokenizers/index>. The library is extremely fast at both training new vocabularies and tokenizing text.
Version: | 0.1.4 |
Depends: | R (≥ 4.2.0) |
Imports: | R6, cli |
Suggests: | rmarkdown, testthat (≥ 3.0.0), hfhub (≥ 0.1.1), withr |
Published: | 2024-09-04 |
DOI: | 10.32614/CRAN.package.tok |
Author: | Daniel Falbel [aut, cre], Posit [cph] |
Maintainer: | Daniel Falbel <daniel at posit.co> |
BugReports: | https://github.com/mlverse/tok/issues |
License: | MIT + file LICENSE |
URL: | https://github.com/mlverse/tok |
NeedsCompilation: | yes |
SystemRequirements: | Rust toolchain with cargo, libclang/llvm-config |
Materials: | README, NEWS |
CRAN checks: | tok results [issues need fixing before 2024-10-11] |
Reference manual: | tok.pdf |
Package source: | tok_0.1.4.tar.gz |
Windows binaries: | r-devel: tok_0.1.4.zip, r-release: tok_0.1.4.zip, r-oldrel: tok_0.1.4.zip |
macOS binaries: | r-release (arm64): tok_0.1.4.tgz, r-oldrel (arm64): tok_0.1.4.tgz, r-release (x86_64): tok_0.1.4.tgz, r-oldrel (x86_64): tok_0.1.4.tgz |
Old sources: | tok archive |
Reverse imports: | sacRebleu |
Please use the canonical form https://CRAN.R-project.org/package=tok to link to this page.