* `igraph::pagerank`
* Moved `smart_stopwords` to be internal data so that the package doesn't need to be explicitly loaded with `library` to be able to parse
* Changed `idf(d, t) = log( n / df(d, t) )` to `idf(d, t) = log( n / df(d, t) ) + 1` to avoid zeroing out common word tfidf values
* Added a tidy alternative to `lexRank`:
* Added `unnest_sentences` and `unnest_sentences_` to parse sentences in a dataframe following tidy data principles
* Added `bind_lexrank` and `bind_lexrank_` to calculate lexrank scores for sentences in a dataframe following tidy data principles (`unnest_sentences` & `bind_lexrank` can be used on a df in a magrittr pipeline)
* `sentenceSimil`
now calculated using Rcpp. Improves speed by ~25%-30% over the old implementation using the `proxy` package
* Added logic to avoid naming conflicts in `proxy::pr_DB` in `sentenceSimil` (#1, @AdamSpannbauer)
* Added a check and error for cases where no sentences are above the threshold in `lexRankFromSimil` (#2, @AdamSpannbauer)
* `tokenize` now has stricter punctuation removal: it removes all non-alphanumeric characters as opposed to removing only `[:punct:]`
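The idf change above can be checked numerically. This is a standalone sketch in plain R, not package code, showing why the `+ 1` keeps common terms from zeroing out:

```r
# Consider a term that appears in every one of n documents:
n  <- 10   # total number of documents
df <- 10   # document frequency of the term (appears in all of them)

idf_old <- log(n / df)       # log(1) = 0, so the term's tfidf zeroes out
idf_new <- log(n / df) + 1   # = 1, so common terms keep a nonzero weight

c(idf_old, idf_new)
```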
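The tidy workflow mentioned above (`unnest_sentences` followed by `bind_lexrank` in a magrittr pipeline) might look like the sketch below. The column names (`sents`, `text`, `doc_id`) and the `level` argument reflect one reading of the changelog and may differ from the package's exact API:

```r
library(lexRankr)   # assumes lexRankr and magrittr are installed
library(magrittr)

df <- data.frame(
  doc_id = 1:2,
  text = c("Cats are great. Cats nap all day.",
           "Dogs are loyal. Dogs love to play."),
  stringsAsFactors = FALSE
)

df %>%
  unnest_sentences(sents, text) %>%                 # one row per parsed sentence
  bind_lexrank(sents, doc_id, level = "sentences")  # appends a lexrank score column
```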
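The stricter `tokenize` cleaning can be illustrated by comparing the two approaches; the `[^[:alnum:] ]` pattern here is an assumed stand-in for the package's actual implementation:

```r
# Two cleaning strategies compared (illustrative, not lexRankr's exact code).
x <- "tab\tseparated, values!"

old <- gsub("[[:punct:]]", "", x)    # drops "," and "!" but keeps the tab
new <- gsub("[^[:alnum:] ]", "", x)  # drops every character that is not a
                                     # letter, digit, or space -- tab included

old
new
```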