Introduction
============

A spatial index is a data structure designed to quickly find all elements
that intersect a given query shape. The word *index* is used in the database
sense, not as an integer identifier. Here, "quickly" means that we do not need
to look through all elements to figure out which ones intersect the query
shape.

brain-indexer uses an implementation of an R-tree, see `boost::rtree`_. The
idea behind an R-tree is that the leaves of the tree contain the bounding
boxes of the elements, and each internal node stores the bounding box of its
descendants. This structure is depicted in :numref:`index`.

.. _boost::rtree: https://www.boost.org/doc/libs/1_80_0/libs/geometry/doc/html/geometry/spatial_indexes.html

.. _index:
.. figure:: img/index.png
    :scale: 20 %

    The gray non-axis-aligned boxes represent the elements in the tree.
    The gray outlines represent the bounding boxes of internal nodes.

Given such a tree, a query only needs to descend into a subtree if the query
shape intersects the bounding box of that subtree. By design, this bounding
box is stored in the root of the subtree. This is shown in :numref:`query`.

.. _query:
.. figure:: img/query.png
    :scale: 20 %

    The yellow box is the query shape. The elements found by this query are
    shown in green; all other elements are drawn in gray. The outlines show
    which parts of the tree need to be considered when performing a query.
    At the first level the entire left side is excluded, then the lower
    half. At the third level, both subtrees need to be examined.

The trick is to create the tree such that the bounding boxes of internal
nodes overlap as little as possible.

One typical workflow for indexes is to create them once up front, store them,
and open them whenever one needs to perform spatial queries.
Naturally, there are other workflows, which will also be covered, but for now
it's enough to know that building the index often costs less than the naive
approach, even for few elements and a modest number of queries, say
thousands.

Because it's common (at BBP) that someone has precomputed the index for you,
we start by explaining the syntax of queries.

Using Existing Indexes
----------------------

If you prefer a hands-on approach, you might like to continue with the
Jupyter Notebook ``basic_tutorial.ipynb`` or any of the examples in
``examples/``.

Given an index stored at ``path_to_index``, one may want to open the index
and perform queries.

Opening An Existing Index
~~~~~~~~~~~~~~~~~~~~~~~~~

Indexes are usually stored in their own folder. This folder contains a file
called ``meta_data.json``. To open an index, one may simply call

.. code-block:: python

    import brain_indexer

    index = brain_indexer.open_index(path_to_index)

where ``path_to_index`` is the path to the folder containing the index. This
can be used for all variants of indexes.

Performing Simple Queries
~~~~~~~~~~~~~~~~~~~~~~~~~

After opening the index, one may query it as follows:

.. code-block:: python

    results = index.box_query(min_points, max_points)
    results = index.sphere_query(center, radius)

The former returns all elements that intersect the box defined by the corners
``min_points`` and ``max_points``. The latter is used when the query shape is
a sphere. The detailed documentation of :ref:`queries <Queries>` contains
several examples.

Creating An Index on the Fly
----------------------------

Workflows that require repeated queries will benefit from using a spatial
index, even for quite a small number of indexed elements. Therefore, it is
useful to create small indexes on the fly. This section describes the
available API for this task. If you're trying to precompute an index for
later use, you might prefer the :ref:`CLI applications <CLI Interface>`.

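
For comparison, the naive approach tests every element against the query
shape on every query. The following pure-Python sketch is not part of
brain-indexer; the names ``sphere_intersects_box`` and ``naive_box_query``
are hypothetical and only illustrate the linear scan that a spatial index
avoids, assuming spherical elements and an axis-aligned query box.

.. code-block:: python

    def sphere_intersects_box(center, radius, min_corner, max_corner):
        """Return True if a sphere touches an axis-aligned box."""
        # Squared distance from the center to the closest point of the box.
        dist2 = 0.0
        for c, lo, hi in zip(center, min_corner, max_corner):
            nearest = min(max(c, lo), hi)
            dist2 += (c - nearest) ** 2
        return dist2 <= radius ** 2


    def naive_box_query(centers, radii, min_corner, max_corner):
        """Linear scan: every element is tested for every query."""
        return [
            i
            for i, (c, r) in enumerate(zip(centers, radii))
            if sphere_intersects_box(c, r, min_corner, max_corner)
        ]


    # Illustrative data: three small spheres along the x-axis.
    centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    radii = [0.25, 0.25, 0.25]
    hits = naive_box_query(centers, radii, (0.5, -1.0, -1.0), (1.5, 1.0, 1.0))

The cost of each such query grows linearly with the number of elements,
whereas a tree-based index can skip whole subtrees whose bounding box does
not intersect the query shape.
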
Indexing Nodes
~~~~~~~~~~~~~~

A common case is to create a spatial index of spheres which have an id (e.g.
somas identified by their gid). ``SphereIndexBuilder`` is, in this case, the
most appropriate class. Its ``from_numpy`` method accepts the components
(centroids, radii, and ids) as individual numpy arrays.

.. code-block:: python

    from brain_indexer import SphereIndexBuilder
    import numpy as np

    ids = np.arange(3, dtype=np.intp)
    centroids = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0]], dtype=np.float32)
    radius = np.ones(3, dtype=np.float32)
    index = SphereIndexBuilder.from_numpy(centroids, radius, ids)

Indexing Morphologies
~~~~~~~~~~~~~~~~~~~~~

In brain-indexer the term *morphologies* refers to discrete neurons
consisting of somas and segments. Morphology indexes can be built directly
from SONATA input files, for example by using ``MorphIndexBuilder`` as
follows:

.. code-block:: python

    from brain_indexer import MorphIndexBuilder

    index = MorphIndexBuilder.from_sonata_file(morph_dir, nodes_h5)

where ``morph_dir`` is the path of the directory containing the morphologies
in either ASCII, SWC or HDF5 format. The builder method has a keyword
argument which allows one to optionally specify the GIDs of all neurons to be
indexed. By passing the keyword argument ``output_dir``, the index is stored
at the specified location and can be opened and reused at a later point in
time.

Indexing Synapses
~~~~~~~~~~~~~~~~~

Another common example is to create a spatial index of synapses imported from
a SONATA file. In this case ``SynapseIndexBuilder`` is the appropriate class
to use:

.. code-block:: python

    from brain_indexer import SynapseIndexBuilder
    from libsonata import Selection

    index = SynapseIndexBuilder.from_sonata_file(EDGE_FILE, "All")

Building a synapse index through this API enables queries to fetch any
attributes of the synapse stored in the SONATA file. Please see
:ref:`Queries` for more information about how to perform queries.

Passing the keyword argument ``output_dir`` ensures that the index is also
stored to disk.

Precomputing Indexes For Later Use
----------------------------------

When the number of indexed elements is large, considerable resources are
needed to compute the index. Therefore, it can make sense to precompute the
index once and store it for later (frequent) reuse. The most convenient way
is through the CLI applications.

Note that indexes can exceed the amount of available RAM; in this case,
please consult `Large Indexes`_.

.. _`CLI Interface`:

Command Line Interface
~~~~~~~~~~~~~~~~~~~~~~

There are three executables:

* ``brain-indexer-circuit`` is convenient for indexing both segments and
  synapses when the circuit is defined in a SONATA circuit configuration
  file. Therefore, if you already have a circuit config file, this is the
  right command to use.

  .. command-output:: brain-indexer-circuit --help

* ``brain-indexer-nodes`` is convenient for indexing segments if one wants to
  specify the paths of the input files directly.

  .. command-output:: brain-indexer-nodes --help

* ``brain-indexer-synapses`` is like ``brain-indexer-nodes`` but for
  synapses.

  .. command-output:: brain-indexer-synapses --help

Large Indexes
~~~~~~~~~~~~~

brain-indexer implements Multi-Indexing for indexing large circuits. Multi
indexes subdivide the volume to be indexed into small subvolumes and use MPI
to create a subindex for each of these subvolumes. More information can be
found :ref:`here `.
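
To give a rough intuition for subdividing a volume, the following
pure-Python sketch groups points by the cell of a regular grid that contains
them. This is only an illustration under stated assumptions: the function
``assign_to_subvolumes`` and the regular grid are hypothetical, and the
partitioning scheme and MPI distribution actually used by brain-indexer are
not shown here and may differ.

.. code-block:: python

    from collections import defaultdict


    def assign_to_subvolumes(points, cell_size):
        """Group point positions by the grid cell that contains them."""
        cells = defaultdict(list)
        for i, point in enumerate(points):
            # Floor division maps a coordinate to its integer cell index.
            key = tuple(int(x // cell_size) for x in point)
            cells[key].append(i)
        return dict(cells)


    # Illustrative data: four points spread over three unit cells.
    points = [
        (0.5, 0.5, 0.5),
        (0.7, 0.1, 0.9),
        (1.5, 0.5, 0.5),
        (-0.5, 0.0, 0.0),
    ]
    subvolumes = assign_to_subvolumes(points, cell_size=1.0)

In a setup of this kind, each group could be indexed independently, which is
what makes building the subindexes in parallel possible.
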