Some minor documentation improvements.
str_trunc()
now correctly truncates strings when side
is "left"
or "center"
(@UchidaMizuki, #512).
stringr functions now consistently implement the tidyverse recycling rules (#372). There are two main changes:
Only vectors of length 1 are recycled. Previously, (e.g.) str_detect(letters, c("x", "y"))
worked, but it now errors.
str_c()
ignores NULLs
, rather than treating them as length 0 vectors.
Additionally, many more arguments now throw errors, rather than warnings, if supplied the wrong type of input.
regex()
and friends now generate class names with stringr_
prefix (#384).
str_detect()
, str_starts()
, str_ends()
and str_subset()
now error when used with either an empty string (""
) or a boundary()
. These operations didn’t really make sense (str_detect(x, "")
returned TRUE
for all non-empty strings) and made it easy to make mistakes when programming.
Many tweaks to the documentation to make it more useful and consistent.
New vignette("from-base")
by @sastoudt provides a comprehensive comparison between base R functions and their stringr equivalents. It’s designed to help you move to stringr if you’re already familiar with base R string functions (#266).
New str_escape()
escapes regular expression metacharacters, providing an alternative to fixed()
if you want to compose a pattern from user supplied strings (#408).
New str_equal()
compares two character vectors using unicode rules, optionally ignoring case (#381).
str_extract()
can now optionally extract a capturing group instead of the complete match (#420).
New str_flatten_comma()
is a special case of str_flatten()
designed for comma separated flattening and can correctly apply the Oxford commas when there are only two elements (#444).
New str_split_1()
is tailored for the special case of splitting up a single string (#409).
New str_split_i()
extract a single piece from a string (#278, @bfgray3).
New str_like()
allows the use of SQL wildcards (#280, @rjpat).
New str_rank()
to complete the set of order/rank/sort functions (#353).
New str_sub_all()
to extract multiple substrings from each string.
New str_unique()
is a wrapper around stri_unique()
and returns unique string values in a character vector (#249, @seasmith).
str_view()
uses ANSI colouring rather than an HTML widget (#370). This works in more places and requires fewer dependencies. It includes a number of other small improvements:
and
pattern` (#407).str_view_all()
redundant (and hence deprecated) (#455).New str_width()
returns the display width of a string (#380).
stringr is now licensed as MIT (#351).
Better error message if you supply a non-string pattern (#378).
A new data source for sentences
has fixed many small errors.
str_extract()
and str_exctract_all()
now work correctly when pattern
is a boundary()
.
str_flatten()
gains a last
argument that optionally override the final separator (#377). It gains a na.rm
argument to remove missing values (since it’s a summary function) (#439).
str_pad()
gains use_width
argument to control whether to use the total code point width or the number of code points as “width” of a string (#190).
str_replace()
and str_replace_all()
can use standard tidyverse formula shorthand for replacement
function (#331).
str_starts()
and str_ends()
now correctly respect regex operator precedence (@carlganz).
str_wrap()
breaks only at whitespace by default; set whitespace_only = FALSE
to return to the previous behaviour (#335, @rjpat).
word()
now returns all the sentence when using a negative start
parameter that is greater or equal than the number of words. (@pdelboca, #245)
Hot patch release to resolve R CMD check failures.
str_interp()
now renders lists consistently independent on the presence of additional placeholders (@amhrasmussen).
New str_starts()
and str_ends()
functions to detect patterns at the beginning or end of strings (@jonthegeek, #258).
str_subset()
, str_detect()
, and str_which()
get negate
argument, which is useful when you want the elements that do NOT match (#259, @yutannihilation).
New str_to_sentence()
function to capitalize with sentence case (@jonthegeek, #202).
str_replace_all()
with a named vector now respects modifier functions (#207)
str_trunc()
is once again vectorised correctly (#203, @austin3dickey).
str_view()
handles NA
values more gracefully (#217). I’ve also tweaked the sizing policy so hopefully it should work better in notebooks, while preserving the existing behaviour in knit documents (#232).
Error : object ‘ignore.case’ is not exported by 'namespace:stringr'
. This is because the long deprecated str_join()
, ignore.case()
and perl()
have now been removed.str_glue()
and str_glue_data()
provide convenient wrappers around glue
and glue_data()
from the glue package (#157).
str_flatten()
is a wrapper around stri_flatten()
and clearly conveys flattening a character vector into a single string (#186).
str_remove()
and str_remove_all()
functions. These wrap str_replace()
and str_replace_all()
to remove patterns from strings. (@Shians, #178)
str_squish()
removes spaces from both the left and right side of strings, and also converts multiple space (or space-like characters) to a single space within strings (@stephlocke, #197).
str_sub()
gains omit_na
argument for ignoring NA
. Accordingly, str_replace()
now ignores NA
s and keeps the original strings. (@yutannihilation, #164)
str_trunc()
now preserves NAs (@ClaytonJY, #162)
str_trunc()
now throws an error when width
is shorter than ellipsis
(@ClaytonJY, #163).
Long deprecated str_join()
, ignore.case()
and perl()
have now been removed.
str_match_all()
now returns NA if an optional group doesn’t match (previously it returned ""). This is more consistent with str_match()
and other match failures (#134).In str_replace()
, replacement
can now be a function that is called once for each match and whose return value is used to replace the match.
New str_which()
mimics grep()
(#129).
A new vignette (vignette("regular-expressions")
) describes the details of the regular expressions supported by stringr. The main vignette (vignette("stringr")
) has been updated to give a high-level overview of the package.
str_order()
and str_sort()
gain explicit numeric
argument for sorting mixed numbers and strings.
str_replace_all()
now throws an error if replacement
is not a character vector. If replacement
is NA_character_
it replaces the complete string with replaces with NA
(#124).
All functions that take a locale (e.g. str_to_lower()
and str_sort()
) default to “en” (English) to ensure that the default is consistent across platforms.
Add sample datasets: fruit
, words
and sentences
.
fixed()
, regex()
, and coll()
now throw an error if you use them with anything other than a plain string (#60). I’ve clarified that the replacement for perl()
is regex()
not regexp()
(#61). boundary()
has improved defaults when splitting on non-word boundaries (#58, @lmullen).
str_detect()
now can detect boundaries (by checking for a str_count()
> 0) (#120). str_subset()
works similarly.
str_extract()
and str_extract_all()
now work with boundary()
. This is particularly useful if you want to extract logical constructs like words or sentences. str_extract_all()
respects the simplify
argument when used with fixed()
matches.
str_subset()
now respects custom options for fixed()
patterns (#79, @gagolews).
str_replace()
and str_replace_all()
now behave correctly when a replacement string contains $
s, \\\\1
, etc. (#83, #99).
str_split()
gains a simplify
argument to match str_extract_all()
etc.
str_view()
and str_view_all()
create HTML widgets that display regular expression matches (#96).
word()
returns NA
for indexes greater than number of words (#112).
stringr is now powered by stringi instead of base R regular expressions. This improves unicode and support, and makes most operations considerably faster. If you find stringr inadequate for your string processing needs, I highly recommend looking at stringi in more detail.
stringr gains a vignette, currently a straight forward update of the article that appeared in the R Journal.
str_c()
now returns a zero length vector if any of its inputs are zero length vectors. This is consistent with all other functions, and standard R recycling rules. Similarly, using str_c("x", NA)
now yields NA
. If you want "xNA"
, use str_replace_na()
on the inputs.
str_replace_all()
gains a convenient syntax for applying multiple pairs of pattern and replacement to the same vector:
str_match()
now returns NA if an optional group doesn’t match (previously it returned ""). This is more consistent with str_extract()
and other match failures.
New str_subset()
keeps values that match a pattern. It’s a convenient wrapper for x[str_detect(x)]
(#21, @jiho).
New str_order()
and str_sort()
allow you to sort and order strings in a specified locale.
New str_conv()
to convert strings from specified encoding to UTF-8.
New modifier boundary()
allows you to count, locate and split by character, word, line and sentence boundaries.
The documentation got a lot of love, and very similar functions ( e.g. first and all variants) are now documented together. This should hopefully make it easier to locate the function you need.
ignore.case(x)
has been deprecated in favour of fixed|regex|coll(x, ignore.case = TRUE)
, perl(x)
has been deprecated in favour of regex(x)
.
str_join()
is deprecated, please use str_c()
instead.
fixed path in str_wrap
example so works for more R installations.
remove dependency on plyr
Zero input to str_split_fixed
returns 0 row matrix with n
columns
Export str_join
new modifier perl
that switches to Perl regular expressions
str_match
now uses new base function regmatches
to extract matches - this should hopefully be faster than my previous pure R algorithm
new str_wrap
function which gives strwrap
output in a more convenient format
new word
function extract words from a string given user defined separator (thanks to suggestion by David Cooper)
str_locate
now returns consistent type when matching empty string (thanks to Stavros Macrakis)
new str_count
counts number of matches in a string.
str_pad
and str_trim
receive performance tweaks - for large vectors this should give at least a two order of magnitude speed up
str_length returns NA for invalid multibyte strings
fix small bug in internal recyclable
function