Multiple Populations

Multiple populations are handled by indexing each population separately and writing a metadata file that facilitates loading the multi-population index. Likewise, a query is performed by querying each per-population index separately.
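
The idea of querying each per-population index separately can be sketched with a toy stand-in. The classes ToySinglePopulationIndex and ToyMultiPopulationIndex below are hypothetical and are not part of brain-indexer. They use one-dimensional points and plain lists only to show how the per-population results end up in a dictionary:

```python
class ToySinglePopulationIndex:
    """Hypothetical stand-in for one per-population index (1D points only)."""

    def __init__(self, points):
        # points: mapping of gid -> coordinate along a single axis
        self._points = dict(points)

    def box_query(self, lower, upper):
        # Return the gids whose coordinate lies in [lower, upper].
        return sorted(g for g, x in self._points.items() if lower <= x <= upper)


class ToyMultiPopulationIndex:
    """Hypothetical wrapper that queries each per-population index in turn."""

    def __init__(self, indexes):
        # indexes: mapping of population name -> single-population index
        self._indexes = dict(indexes)

    @property
    def populations(self):
        return list(self._indexes)

    def box_query(self, lower, upper):
        # One independent query per population, collected by population name.
        return {
            name: idx.box_query(lower, upper)
            for name, idx in self._indexes.items()
        }


index = ToyMultiPopulationIndex({
    "NodeA__NodeA__chemical": ToySinglePopulationIndex({12: 0.5, 3290: 2.0}),
    "NodeA__NodeB__chemical": ToySinglePopulationIndex({22: 0.7, 2309: 9.0}),
})
results = index.box_query(0.0, 1.0)
print(results)
```

The real index stores its data on disk and works in three dimensions. Only the dispatch pattern is illustrated here.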

Multi-Population Queries

In brain-indexer, this is implemented by MultiPopulationIndex, which supports the same API as regular indexes, with a few additions. See Queries for more details.

The difference between single-population and multi-population indexes is that, for multi-population indexes, the result of a query is a dictionary of single-population results, keyed by population name. For example:

>>> index.box_query(*window, fields="gid")
{
  "NodeA__NodeA__chemical": np.array([12, 3290, ..., ]),
  "NodeA__NodeB__chemical": np.array([22, 2309, ..., ]),
  ...
}

>>> index.box_query(*window, fields=["gid", "radius"])
{
  "NodeA__NodeA__chemical": {
    "gid": np.array([12, 3290, ..., ]),
    "radius": np.array([0.23, 0.34, ...])
  },
  "NodeA__NodeB__chemical": {
    "gid": ...
    "radius": ...
  }
}
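
Code that consumes such results has to handle both layouts: an array per population when fields is a string, and a dict of arrays per population when fields is a list. The helper count_hits below is a hypothetical sketch, not a brain-indexer function. It assumes that all field arrays of one population have the same length, and it uses plain lists with made-up values in place of NumPy arrays:

```python
def count_hits(results):
    """Count query hits per population for either multi-population layout."""
    counts = {}
    for population, result in results.items():
        if isinstance(result, dict):
            # `fields` was a list: assume every field has one entry per hit.
            first_field = next(iter(result.values()), [])
            counts[population] = len(first_field)
        else:
            # `fields` was a string: the result is the array itself.
            counts[population] = len(result)
    return counts


# Hypothetical results, shaped like the two examples above.
gid_only = {
    "NodeA__NodeA__chemical": [12, 3290],
    "NodeA__NodeB__chemical": [22, 2309, 4100],
}
gid_and_radius = {
    "NodeA__NodeA__chemical": {"gid": [12, 3290], "radius": [0.23, 0.34]},
    "NodeA__NodeB__chemical": {"gid": [22], "radius": [0.5]},
}

print(count_hits(gid_only))
print(count_hits(gid_and_radius))
```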

Keyword Argument: populations

The query is restricted to the given populations, passed either as a single name or as a list of names. The default is to return the results for all populations. Example:

>>> index.box_query(*window, fields="gid", populations="NodeA__NodeA__chemical")
{
  "NodeA__NodeA__chemical": np.array([12, 3290, ..., ])
}

>>> all_but_one = index.populations[:-1]
>>> index.box_query(*window, fields="gid", populations=all_but_one)
{
  "NodeA__NodeA__chemical": np.array([12, 3290, ..., ]),
  "NodeA__NodeB__chemical": np.array([22, 2309, ..., ]),
  ...
}
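
Calling code that forwards a populations argument may want to normalize it first. The function select_populations below is a hypothetical helper, not part of brain-indexer. It mirrors the three cases shown above (no restriction, a single name, a list of names). Raising KeyError for unknown names is an assumption of this sketch, not documented behaviour:

```python
def select_populations(available, populations=None):
    """Return the list of population names a query should cover."""
    if populations is None:
        # Default: no restriction, use every available population.
        return list(available)
    if isinstance(populations, str):
        # A single name is treated like a one-element list.
        populations = [populations]
    unknown = [p for p in populations if p not in available]
    if unknown:
        # Assumption of this sketch: unknown names are an error.
        raise KeyError(f"unknown populations: {unknown}")
    return list(populations)


available = ["NodeA__NodeA__chemical", "NodeA__NodeB__chemical"]
print(select_populations(available))
print(select_populations(available, "NodeA__NodeB__chemical"))
print(select_populations(available, available[:-1]))
```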

Keyword Argument: population_mode

This option is slightly advanced and only of interest if you’re writing code that should be generic across both single- and multi-population indexes. It controls the return type of queries, i.e., whether the results are wrapped in a dict keyed by population.

There are three options:

  • None This is the default. In this mode, the query result for a multi-population index is a dictionary of single-population query results, while a single-population index simply returns a single-population result.

  • "single" This forces a multi-population index to behave like a single-population index, i.e., the single-population query result is not wrapped in a dict. This requires that the query involves exactly one population.

  • "multi" This forces a single-population index to wrap its result as if it were a multi-population index. Note that the name of the population is unspecified.
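
The effect of the three modes on the return shape can be sketched as follows. The function apply_population_mode is hypothetical and only models the behaviour described in the list above. It is not how brain-indexer implements the option, and the error types are assumptions:

```python
def apply_population_mode(results, is_multi_index, population_mode=None):
    """Shape per-population results according to `population_mode`.

    results: dict mapping population name -> single-population result.
    is_multi_index: whether the index being queried is multi-population.
    """
    if population_mode is None:
        # Default: each kind of index keeps its natural return shape.
        population_mode = "multi" if is_multi_index else "single"

    if population_mode == "multi":
        # Always keep the dict keyed by population name.
        return dict(results)

    if population_mode == "single":
        # Unwrapping is only meaningful for exactly one population.
        if len(results) != 1:
            raise ValueError("'single' requires exactly one population")
        return next(iter(results.values()))

    raise ValueError(f"unknown population_mode: {population_mode!r}")


one_population = {"NodeA__NodeA__chemical": [12, 3290]}
print(apply_population_mode(one_population, is_multi_index=True))
print(apply_population_mode(one_population, is_multi_index=True,
                            population_mode="single"))
```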

Writing Generic Code

This section contains tips on how to use the API to write code that behaves correctly whether the index is a single-population index or a multi-population index.

The potential traps:

import numpy as np

def special_query(index, window):
    """Query the index suitable for scientific Usecase A."""

    results = index.box_query(*window, fields="gid")

    # Bad: breaks for multi-population indexes, where `results` is a
    # dict keyed by population name rather than an array of GIDs.
    largest_gid = np.max(results)

    # Bad: fails for single-population indexes, where `results` is not
    # keyed by population name.
    largest_gid = np.max(results["NodeA__NodeA__chemical"])

    if largest_gid > 1000:
        print("Large GID spotted.")

Usecase 1: Single Population Queries

The piece of code knows it’s only dealing with a single population. In this case, we can coerce the result into the single-population format:

import numpy as np

def special_query(index, window, population=None):
    """Query the index suitable for scientific Usecase A."""

    results = index.box_query(
        *window, fields="gid", populations=population,
        population_mode="single"
    )

    # Good: works for both single- and multi-population indexes,
    # because of `population_mode="single"`.
    largest_gid = np.max(results)

    if largest_gid > 1000:
        print("Large GID spotted.")

Usecase 2: Multiple Population Queries

In this use case, the code knows how to handle multiple populations if present. One can then choose to always use the multi-population return type:

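import numpy as np

def special_query(index, window, populations=None):
    """Print the largest GID of each population."""

    results = index.box_query(
        *window, fields="gid", populations=populations,
        population_mode="multi"
    )

    # With `population_mode="multi"` the result is always a dict keyed
    # by population name, even for single-population indexes.
    for pop, result in results.items():
        largest_gid = np.max(result)
        print(f"{pop=}: {largest_gid}")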
