`prune (DirPath)`

The prune() function removes all the documents that are stored under the given directory path. Logically this can be accomplished by using find() to get the names of all such documents, and then calling remove() on each one. However, it will be desirable to optimise this as a distinct operation because we can plan its execution a little better than a literal set of concurrent remove() calls. These would be likely to conflict with one another, and they would execute a lot of incremental unlink() operations to remove items from directories one at a time. If we’re deleting an entire directory recursively then we know that we’ll want to delete all its internal directories entirely, rather than incrementally removing their children.

To perform prune(dir), we still perform a find(dir) call for the directory to locate all the documents to remove, and to load the shards containing all the intermediate directories into the cache. We then plan a set of rm() calls for those documents and any directories inside dir. We delete the directory tree bottom-up starting with the leaf documents; an item can be removed once all its children are removed. For example, consider the tree we saw earlier:

/
└─┬ path/
  ├── a.txt
  └─┬ to/
    ├── b.txt
    └── c.txt

A call to prune('/path/') would first call find('/path/'), returning the list ['/path/a.txt', '/path/to/b.txt', '/path/to/c.txt']. We immediately schedule an rm() for each of these; we’ll give each operation an identifying label:

A: rm('/path/a.txt')
B: rm('/path/to/b.txt')
C: rm('/path/to/c.txt')

There is one internal directory in this tree: /path/to/. This can be removed once all its documents have been removed:

D: rm('/path/to/'), depends on B and C

Finally /path/ itself can be removed once its direct children have been removed:

E: rm('/path/'), depends on A and D

The removal must be planned bottom-up for the same reason that the unlink() calls in remove() must go bottom-up. In a tree containing a single document, /path/to/file.txt, a call to prune('/path/') will perform basically the same operations as remove('/path/to/file.txt'), and they’ll be prone to the same race conditions unless we make the removal of a directory dependent on the removal of its children.

If any of the writes encounters a conflict, the whole operation should restart from the initial find() call. A successful execution of prune() should end with nothing inside the given directory, so if a concurrent update() occurs on a path inside the directory, we must call find() again to get an updated view of the directory’s contents. This find() call may happen when the directory has been partially deleted, which is another reason to make sure deletion goes bottom-up. prune() should be considered successful if all its writes after the initial find() complete without conflicts.

Once all the rm() calls have been completed, prune() can perform the same check that remove() does for empty parent directories. It should at least unlink() the pruned directory from its parent, and may perform further unlink() calls if this leaves the parent directory empty, and so on.

Putting all this together, the full execution of prune('/path/') on the above tree produces the following graph.

                     ┌───────────────────┐
                     │ rm('/path/a.txt') │
                     └──────────────────\┘
                                         \
┌──────────────────────┐                 ┌\─────────────┐   ┌──────────────────────┐
│ rm('/path/to/b.txt') │                 │ rm('/path/') ----- unlink('/', 'path/') │
└─────────────────────\┘                 └/─────────────┘   └──────────────────────┘
                       \                 /
                       ┌\───────────────/┐
                       │ rm('/path/to/') │
                       └/────────────────┘
                       /
┌─────────────────────/┐
│ rm('/path/to/c.txt') │
└──────────────────────┘

Keyboard shortcuts

EscoDB Internal Design

prune (DirPath)

`prune (DirPath)`