data.table::merge() wrapper
library(joyn)
library(data.table)
x1 = data.table(id = c(1L, 1L, 2L, 3L, NA_integer_),
                t  = c(1L, 2L, 1L, 2L, NA_integer_),
                x  = 11:15)
y1 = data.table(id = c(1, 2, 4),
                y  = c(11L, 15L, 16))
x2 = data.table(id1 = c(1, 1, 2, 3, 3),
                id2 = c(1, 1, 2, 3, 4),
                t   = c(1L, 2L, 1L, 2L, NA_integer_),
                x   = c(16, 12, NA, NA, 15))
y2 = data.table(id  = c(1, 2, 5, 6, 3),
                id2 = c(1, 1, 2, 3, 4),
                y   = c(11L, 15L, 20L, 13L, 10L),
                x   = c(16:20))
This vignette describes the use of the joyn merge() function.
joyn::merge resembles the usability of base::merge and data.table::merge, while also incorporating the additional features that characterize joyn. In fact, joyn::merge masks the other two.
Suppose you want to merge x1 and y1. First, notice that while base::merge is principally for data frames, joyn::merge coerces x and y to data tables if they are not already.
By default, merge will join by the shared column name(s) in x and y.
# Example not specifying the key
merge(x = x1,
y = y1)
#>
#> ── JOYn Report ──
#>
#>   .joyn n percent
#> 1     x 2   66.7%
#> 2     y 1   33.3%
#> 3 total 3    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id and y
#> ⚠ Warning: The keys supplied uniquely identify y, therefore a m:1 join is
#> executed
#>       id     t     x     y  .joyn
#>    <num> <int> <int> <num> <fctr>
#> 1:     1     1    11    11  x & y
#> 2:     1     2    12    11  x & y
#> 3:     2     1    13    15  x & y
# Example specifying the key
merge(x = x1,
y = y1,
by = "id")
#>
#> ── JOYn Report ──
#>
#>   .joyn n percent
#> 1     x 2   66.7%
#> 2     y 1   33.3%
#> 3 total 3    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id and y
#> ⚠ Warning: The keys supplied uniquely identify y, therefore a m:1 join is
#> executed
#>       id     t     x     y  .joyn
#>    <num> <int> <int> <num> <fctr>
#> 1:     1     1    11    11  x & y
#> 2:     1     2    12    11  x & y
#> 3:     2     1    13    15  x & y
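Both calls above end up using id as the key. To check which columns two tables share before omitting by, base R is enough. This is a minimal sketch, not joyn's internal code: x1_df and y1_df are hypothetical plain-data-frame copies of x1 and y1, used so the snippet runs without extra packages.

```r
# Hypothetical data-frame copies of x1 and y1 (illustration only)
x1_df <- data.frame(id = c(1L, 1L, 2L, 3L, NA_integer_),
                    t  = c(1L, 2L, 1L, 2L, NA_integer_),
                    x  = 11:15)
y1_df <- data.frame(id = c(1, 2, 4),
                    y  = c(11, 15, 16))

# Column names present in both tables: the candidate default join key
shared_cols <- intersect(names(x1_df), names(y1_df))
print(shared_cols)
```

If more than one name is shared, all of them would form the key, so passing by explicitly keeps the join predictable.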
As usual, if the columns you want to join by don't have the same name, you need to tell merge which columns you want to join by: by.x for the x data frame column name, and by.y for the y one. For example,
df1 <- data.frame(id = c(1L, 1L, 2L, 3L, NA_integer_, NA_integer_),
                  t  = c(1L, 2L, 1L, 2L, NA_integer_, 4L),
                  x  = 11:16)
df2 <- data.frame(id = c(1, 2, 4, NA_integer_, 8),
                  y  = c(11L, 15L, 16, 17L, 18L),
                  t  = c(13:17))
merge(x = df1,
      y = df2,
      by.x = "x",
      by.y = "y")
#>
#> ── JOYn Report ──
#>
#>   .joyn n percent
#> 1     x 3    100%
#> 2     y 2   66.7%
#> 3 total 3    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables keyby1 from id, keyby1, and t
#> ⚠ Warning: The keys supplied uniquely identify both x and y, therefore a 1:1
#> join is executed
#>   id.x t.x  x id.y t.y .joyn
#> 1    1   1 11    1  13 x & y
#> 2   NA  NA 15    2  14 x & y
#> 3   NA   4 16    4  15 x & y
By default, sort is TRUE, so the merged table is sorted by the by.x column. Notice that the output table distinguishes the non-by columns id and t coming from x from those coming from y by adding the .x and .y suffixes, which occurs because the no.dups argument is set to TRUE by default.
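For comparison, the same by.x/by.y call can be run with base::merge, whose interface joyn::merge resembles. The sketch below uses base R only; base_merged is an illustrative name, and the result has no .joyn reporting column. In base R the .x and .y endings are controlled by the suffixes argument, written out here at its default value.

```r
df1 <- data.frame(id = c(1L, 1L, 2L, 3L, NA_integer_, NA_integer_),
                  t  = c(1L, 2L, 1L, 2L, NA_integer_, 4L),
                  x  = 11:16)
df2 <- data.frame(id = c(1, 2, 4, NA, 8),
                  y  = c(11, 15, 16, 17, 18),
                  t  = 13:17)

# Inner join on differently named key columns, sorted by the key
base_merged <- base::merge(x = df1, y = df2,
                           by.x = "x", by.y = "y",
                           suffixes = c(".x", ".y"),
                           sort = TRUE)

# The key keeps the name from x; id and t appear once per source table
print(names(base_merged))
print(nrow(base_merged))
```

Only the key values present in both tables (11, 15 and 16) survive, because all defaults to FALSE in base::merge.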
Like the primary joyn() function, merge() offers a number of arguments to verify and control the merge.¹
For example, joyn::joyn lets you execute one-to-one, one-to-many, many-to-one and many-to-many joins. Similarly, merge accepts the match_type argument:
# Example with many to many merge
joyn::merge(x = x2,
y = y2,
by.x = "id1",
by.y = "id2",
match_type = "m:m")
#>
#> ── JOYn Report ──
#>
#>   .joyn n percent
#> 1     y 1   14.3%
#> 2 x & y 6   85.7%
#> 3 total 7    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables keyby1 from id, keyby1, y, and x
#> ⚠ Warning: Supplied both by and by.x/by.y. by argument will be ignored.
#>      id1   id2     t   x.x    id     y   x.y  .joyn
#>    <num> <num> <int> <num> <num> <int> <int> <fctr>
#> 1:     1     1     1    16     1    11    16  x & y
#> 2:     1     1     1    16     2    15    17  x & y
#> 3:     1     1     2    12     1    11    16  x & y
#> 4:     1     1     2    12     2    15    17  x & y
#> 5:     2     2     1    NA     5    20    18  x & y
#> 6:     3     3     2    NA     6    13    19  x & y
#> 7:     3     4    NA    15     6    13    19  x & y
# Example with many to one merge
joyn::merge(x = x1,
y = y1,
by = "id",
match_type = "m:1")
#>
#> ── JOYn Report ──
#>
#>   .joyn n percent
#> 1     x 2   66.7%
#> 2     y 1   33.3%
#> 3 total 3    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id and y
#> ⚠ Warning: Supplied both by and by.x/by.y. by argument will be ignored.
#>       id     t     x     y  .joyn
#>    <num> <int> <int> <num> <fctr>
#> 1:     1     1    11    11  x & y
#> 2:     1     2    12    11  x & y
#> 3:     2     1    13    15  x & y
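The match type describes whether the key uniquely identifies the rows of each table. A rough way to check this before choosing match_type is sketched below with base R. key_is_unique is a hypothetical helper written for this illustration, not a joyn function, and the data frames hold only copies of the key columns of x1, y1, x2 and y2.

```r
# Hypothetical helper: TRUE when no key combination is repeated
key_is_unique <- function(df, by) {
  anyDuplicated(df[, by, drop = FALSE]) == 0L
}

# Key columns copied from x1 and y1 (joined on "id")
x1_keys <- data.frame(id = c(1L, 1L, 2L, 3L, NA_integer_))
y1_keys <- data.frame(id = c(1, 2, 4))
x1_unique <- key_is_unique(x1_keys, "id")   # id = 1 appears twice in x1
y1_unique <- key_is_unique(y1_keys, "id")   # every id in y1 is distinct

# Key columns copied from x2 and y2 (joined on id1 and id2)
x2_keys <- data.frame(id1 = c(1, 1, 2, 3, 3))
y2_keys <- data.frame(id2 = c(1, 1, 2, 3, 4))
x2_unique <- key_is_unique(x2_keys, "id1")
y2_unique <- key_is_unique(y2_keys, "id2")

print(c(x1 = x1_unique, y1 = y1_unique, x2 = x2_unique, y2 = y2_unique))
```

A table with a repeated key sits on the "m" side and one with a unique key on the "1" side, which is consistent with the m:1 and m:m calls above.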
In a similar way, you can use all the other options available in joyn(), e.g., for keeping common variables, updating NAs and values, or displaying messages, which you can explore in the “Advanced functionalities” article.
¹ See the “Advanced functionalities” article for more details.