In the web application, variables in a dataset are displayed in a list on the left side of the screen.
Typically, when you import a dataset, the variable list is flat, but it can be organized into an accordion-like hierarchy. The variable organizer in the Crunch GUI allows you to organize your variables visually, but you can also manage this metadata from R using the crunch package.
This variable hierarchy can be thought of like a file system on your computer, with files (variables) organized into directories (folders). As such, the main functions you use to manage it are reminiscent of a file system.
- cd() changes directories, i.e. selects a folder
- mkdir() makes a directory, i.e. creates a folder
- mv() moves variables and folders to a different folder
- rmdir() removes a directory, i.e. deletes a folder

Like a file system, you can express the “path” to a folder as a string separated by a “/” delimiter, like this:
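As a minimal sketch, a path to a “Cola” folder nested inside a “Brands” folder could be written as below. The folder names are hypothetical and are not part of the dataset used later. Base R's file path helpers appear here only to underline the file system analogy; they are not part of the crunch API.

```r
# Hypothetical folder names, used only to illustrate the path syntax
path <- "Brands/Cola"

basename(path)  # the folder itself: "Cola"
dirname(path)   # its parent folder: "Brands"
```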
If your folder names should legitimately have a “/” in them, you can set a different character to be the path separator. See ?mkdir or any of the other functions’ help files for details.
Paths can be expressed relative to the current object (a folder or, in this case, the dataset, which translates to its top-level "/" root folder in path specification), and the file system’s special path segments ".." (go up a level) and "." (this level) are also supported. We’ll use those in examples below.
You can also specify paths as a character vector of path segments, one element per folder level, which is equivalent to the delimited-string form. One or the other way may be more convenient, depending on what you’re trying to accomplish.
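To make the equivalence concrete, here is a small base R sketch that does not call crunch at all. The folder names are hypothetical, and the conversion shown only illustrates how the two notations correspond; it is not a claim about how crunch parses paths internally.

```r
# Two notations for the same hypothetical nested folder
path_string   <- "Brands/Cola"
path_segments <- c("Brands", "Cola")

# Joining the segments with "/" gives the string form ...
identical(paste(path_segments, collapse = "/"), path_string)

# ... and splitting the string on "/" gives the segments back
identical(strsplit(path_string, "/", fixed = TRUE)[[1]], path_segments)
```

Either form could then be supplied as the path argument to cd(), mkdir(), mv(), or rmdir().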
These four functions all take a dataset or a folder as the first argument, and they return the same object that was passed in, except for cd, which returns the selected folder. As such, they are designed to work with magrittr-style piping (%>%) for convenience in chaining together steps, though they don’t require it.
To get started, let’s pick up the dataset we used in the array variables vignette and view its starting layout. We can do that by selecting the root folder (“/”) and printing it:
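A sketch of that step, assuming ds is the Crunch dataset already loaded in your session (for example with loadDataset()) and that you are logged in:

```r
library(crunch)
library(magrittr)  # provides %>%

# Assumes `ds` is a Crunch dataset already loaded in this session
ds %>%
    cd("/") %>%   # select the root folder
    print()       # print its contents
```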
(The print() isn’t strictly necessary here, as cd will return the folder and thus it will print by default, but we’ll use different print arguments later, so it’s included here both for explicitness and illustration.)
It’s flat: there are no folders here, only variables. If you’re importing data from a data.frame or a file, like an SPSS file, this is where you’ll begin.
Let’s make some folders and move some variables into them. To start, I know that the demographic variables are at the back of the dataset, so let’s make a “Demos” folder and move variables 21 to 37 into it:
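This step can be sketched as a two-call pipeline, again assuming ds is the loaded dataset. The indices 21:37 come from the text above and refer to positions in this particular dataset.

```r
library(crunch)
library(magrittr)  # provides %>%

# Assumes `ds` is a Crunch dataset already loaded in this session
ds %>%
    mkdir("Demos") %>%     # create the folder at the top level
    mv(21:37, "Demos")     # move variables 21 through 37 into it
```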
Now when I print the top-level directory again, I see a “Demos” folder and don’t see those demographic variables:
mv() can reference variables or folders within a level in several ways. Numeric indices like we just used probably won’t be the most common way you’ll do it: names work just as well and are more transparent. Let’s move the first variable, perc_skipped, into “Demos” as well:
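A sketch of the same move by name, assuming ds is the loaded dataset. The final cd("/") selects the root folder so that it is returned from the pipeline and printed.

```r
library(crunch)
library(magrittr)  # provides %>%

# Assumes `ds` is a Crunch dataset already loaded in this session
ds %>%
    mv("perc_skipped", "Demos") %>%  # reference the variable by name
    cd("/")                          # return (and so print) the root folder
```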
A side note: although the last step of that chain was cd(), we haven’t changed state in our R session. There is no “working folder” set globally. cd() is a function that returns a folder; if we had assigned the return from the function (pipeline) to some object, we could then pass that in to another function to “start” in that folder.
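As an illustration of that point, the folder returned by cd() can be kept in an ordinary R object and used as the starting point for later calls. The object name demos is a hypothetical choice, and ds is assumed to be the loaded dataset.

```r
library(crunch)
library(magrittr)  # provides %>%

# Assumes `ds` is a Crunch dataset already loaded in this session
demos <- cd(ds, "Demos")  # `demos` is a hypothetical name for the returned folder

# Later calls can start from that folder instead of from the dataset
demos %>% print()
```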
Another way we can identify variables is by using the dplyr-like functions starts_with, ends_with, matches, and contains. Let’s use matches to move all of the questions about Edward Snowden or Bradley (Chelsea) Manning to a folder for the topical questions in this week’s survey:
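A sketch of that selection follows. The regular expression is an assumption about how the relevant variables are named in this dataset; adjust it to match your own variable names. The folder name “This week” is the one used in the examples further down.

```r
library(crunch)
library(magrittr)  # provides %>%

# Assumes `ds` is a Crunch dataset already loaded in this session.
# The pattern "snowden|manning" is an assumed naming convention.
ds %>%
    mkdir("This week") %>%
    mv(matches("snowden|manning", ignore.case = TRUE), "This week")
```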
We can also select all variables in a folder using the variables function (or all folders within a folder using folders). Let’s move all remaining variables from the top level folder to a folder called “Tracking questions”. To do this, we do need to explicitly change to the top level folder.
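A sketch of this step, assuming ds is the loaded dataset. It relies on mv() ensuring that the target folder exists, which is how the cd() help page describes its create argument; if in doubt, call mkdir("Tracking questions") first.

```r
library(crunch)
library(magrittr)  # provides %>% and the "." placeholder

# Assumes `ds` is a Crunch dataset already loaded in this session
ds %>%
    cd("/") %>%                             # scope: the top-level folder only
    mv(variables(.), "Tracking questions")  # "." is the folder piped in
```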
(Curious about the “dot” notation? See the magrittr docs.)
The reason we change to the top level folder here is that there is a subtle difference between passing ds to mv() versus cd(ds, "/"). Whatever object, dataset or folder, that is passed into mv() determines the scope from which the objects to move are selected. If you pass the dataset in, you can select any variables in the dataset, regardless of what folder they’re in. If you pass in a folder, you’re selecting just from that folder’s contents. It can be convenient to find all variables that match some criteria across the whole dataset to move them, but sometimes we don’t want that. In this case, we wanted only the variables sitting in the top level folder, not nested in other folders, so we wanted variables(cd(ds, "/")) and not variables(ds).
Now, our variable tree has some structure. Let’s use print(folder, depth = 1) to see these folders and their contents one level deep:
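In pipeline form, and assuming ds is the loaded dataset, that could look like:

```r
library(crunch)
library(magrittr)  # provides %>%

# Assumes `ds` is a Crunch dataset already loaded in this session
ds %>%
    cd("/") %>%
    print(depth = 1)  # show each top-level folder and its direct contents
```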
We can create folders within folders as well. In the “This week” folder, we have a set of questions about Edward Snowden. Let’s nest them inside their own subfolder inside “This week”:
    ds %>%
        cd("This week") %>%
        mkdir("Snowden") %>%
        mv(matches("snowden", ignore.case = TRUE), "Snowden") %>%
        cd("..") %>%
        print(depth = 2)
Note how we used ".." to change folders up a level, as you can in a file system. We did that just so we can print the folder structure at the top level (and to illustrate that you can specify relative paths).
You could also do this using the full path segments. mkdir will recursively make all path segments it needs in order to ensure that the target folder exists.
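A sketch of the full-path variant, starting from the dataset rather than from the “This week” folder. It assumes ds is the loaded dataset and reuses the same matches() pattern as the relative-path version.

```r
library(crunch)
library(magrittr)  # provides %>%

# Assumes `ds` is a Crunch dataset already loaded in this session
ds %>%
    mkdir("This week/Snowden") %>%  # creates any missing path segments
    mv(matches("snowden", ignore.case = TRUE), "This week/Snowden") %>%
    cd("/") %>%
    print(depth = 2)
```

Because the dataset is passed in, matches() here selects from the whole dataset rather than from a single folder.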
Folders themselves have names, which we can set with setName():
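For example, renaming “Demos” to “Demographics” (the name used for that folder later in this document) might look like this, assuming ds is the loaded dataset:

```r
library(crunch)
library(magrittr)  # provides %>%

# Assumes `ds` is a Crunch dataset already loaded in this session
ds %>%
    cd("Demos") %>%           # select the folder to rename
    setName("Demographics")   # give it its new name
```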
We can also set the names of the objects contained in a folder with setNames():
    ds %>%
        cd("Demos") %>%
        setNames(c("Birth Year", "Gender", "Political Ideo. (3 category)",
            "Political Ideo. (7 category)", "Political Ideo. (7 category; other)",
            "Race", "Education", "Marital Status", "Phone", "Family Income", "Region",
            "State", "Weight", "Voter Registration (new)", "Is a voter?",
            "Voter Registration (old)", "Voter Registration"))
Unlike files in a file system, variables within folders are ordered. Let’s move “Demographics” to the end. One way to do that is with the setOrder function. This lets you provide a specific order, but it requires you to specify all of the folder’s contents. Let’s use that function to put “Tracking questions” first and “Demographics” last:
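A sketch of the reordering, assuming ds is the loaded dataset and that the top level contains exactly the three folders created above. It also assumes setOrder() accepts folder names as well as integer indices; check ?setOrder if this does not behave as expected.

```r
library(crunch)
library(magrittr)  # provides %>%

# Assumes `ds` is a Crunch dataset already loaded in this session and that
# the top level holds exactly these three folders
ds %>%
    cd("/") %>%
    setOrder(c("Tracking questions", "This week", "Demographics"))
```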
The cleanest way to delete a folder is with rmdir():
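A minimal sketch, using a hypothetical folder named “Old questions” that is not part of the walkthrough above, and assuming ds is the loaded dataset. Check the folder's contents before running anything like this against real data.

```r
library(crunch)
library(magrittr)  # provides %>%

# Assumes `ds` is a Crunch dataset already loaded in this session.
# "Old questions" is a hypothetical folder name used only for illustration.
ds %>%
    rmdir("Old questions")
```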
This deletes the folder and all variables contained within it.