Apply 'Wordpiece' (<doi:10.48550/arXiv.1609.08144>) tokenization to input text, given an appropriate vocabulary. The 'BERT' (<doi:10.48550/arXiv.1810.04805>) tokenization conventions are used by default.
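As a rough illustration of what tokenizing "given an appropriate vocabulary" means, the sketch below shows greedy longest-match-first WordPiece splitting with the BERT-style "##" continuation prefix. It is written in Python for illustration only and is not this package's R interface or implementation. The function names, the toy vocabulary, the "[UNK]" token, and the max_chars limit are all assumptions made for the example, and the text normalization is simplified to lowercasing and whitespace splitting.

```python
def wordpiece_pieces(word, vocab, unk_token="[UNK]", max_chars=100):
    """Split one word into word pieces by greedy longest-match-first lookup.

    Hypothetical helper for illustration; not the 'wordpiece' package API.
    """
    if len(word) > max_chars:
        return [unk_token]
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Non-initial pieces carry the "##" continuation prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matched at this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab, unk_token="[UNK]"):
    """Simplified normalization: lowercase, then split on whitespace only."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece_pieces(word, vocab, unk_token))
    return tokens


# Toy vocabulary (assumed for this example only).
toy_vocab = {"the", "un", "##aff", "##able", "[UNK]"}
result = tokenize("The unaffable", toy_vocab)
```

With this toy vocabulary, "unaffable" splits into "un", "##aff", "##able", while a word with no matching pieces collapses to the unknown token. A real BERT-style tokenizer also separates punctuation and applies further text cleanup before the piece lookup.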
Version: | 2.1.3 |
Depends: | R (≥ 3.3.0) |
Imports: | dlr (≥ 1.0.0), fastmatch (≥ 1.1), memoise (≥ 2.0.0), piecemaker (≥ 1.0.0), rlang, stringi (≥ 1.0), wordpiece.data (≥ 1.0.2) |
Suggests: | covr, knitr, rmarkdown, testthat (≥ 3.0.0) |
Published: | 2022-03-03 |
DOI: | 10.32614/CRAN.package.wordpiece |
Author: | Jonathan Bratt [aut, cre], Jon Harmon [aut], Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph] |
Maintainer: | Jonathan Bratt <jonathan.bratt at macmillan.com> |
BugReports: | https://github.com/macmillancontentscience/wordpiece/issues |
License: | Apache License (≥ 2) |
URL: | https://github.com/macmillancontentscience/wordpiece |
NeedsCompilation: | no |
Materials: | README NEWS |
CRAN checks: | wordpiece results |
Reference manual: | wordpiece.pdf |
Vignettes: | Using wordpiece |
Package source: | wordpiece_2.1.3.tar.gz |
Windows binaries: | r-devel: wordpiece_2.1.3.zip, r-release: wordpiece_2.1.3.zip, r-oldrel: wordpiece_2.1.3.zip |
macOS binaries: | r-release (arm64): wordpiece_2.1.3.tgz, r-oldrel (arm64): wordpiece_2.1.3.tgz, r-release (x86_64): wordpiece_2.1.3.tgz, r-oldrel (x86_64): wordpiece_2.1.3.tgz |
Old sources: | wordpiece archive |
Reverse suggests: | textrecipes |
Please use the canonical form https://CRAN.R-project.org/package=wordpiece to link to this page.