Multiple Populations
Multiple populations are handled by indexing each population separately and writing a meta-data file that facilitates loading the multi-population index. Similarly, a query is performed by querying each index separately.
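The fan-out described above can be illustrated with a small toy model. This is only a sketch of the idea, not brain-indexer's implementation: `ToyIndex` and `ToyMultiIndex` are hypothetical names, the points are plain tuples, and the meta-data file is left out entirely.

```python
class ToyIndex:
    """Hypothetical stand-in for a single-population index."""

    def __init__(self, points):
        # points: list of (gid, (x, y, z)) tuples
        self._points = list(points)

    def box_query(self, min_corner, max_corner):
        """Return the ids of all points inside the axis-aligned box."""
        return [
            gid
            for gid, xyz in self._points
            if all(lo <= c <= hi for lo, c, hi in zip(min_corner, xyz, max_corner))
        ]


class ToyMultiIndex:
    """Hypothetical wrapper that forwards a query to one index per population."""

    def __init__(self, indexes):
        self._indexes = dict(indexes)

    @property
    def populations(self):
        return list(self._indexes)

    def box_query(self, min_corner, max_corner, populations=None):
        # Default: query every population; a single name is also accepted.
        if populations is None:
            populations = self.populations
        elif isinstance(populations, str):
            populations = [populations]
        return {
            name: self._indexes[name].box_query(min_corner, max_corner)
            for name in populations
        }


multi = ToyMultiIndex({
    "A": ToyIndex([(1, (0.0, 0.0, 0.0)), (2, (5.0, 5.0, 5.0))]),
    "B": ToyIndex([(7, (0.5, 0.5, 0.5))]),
})
window = ((-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))
hits = multi.box_query(*window)
```

Each population is queried independently and the per-population answers are collected under the population name, which is the shape of result the rest of this page works with.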
Multi-Population Queries
In brain-indexer, this is implemented by MultiPopulationIndex, which
supports the same API as regular indexes, with a few additions. See
Queries for more details.
The difference between single-population and multi-population indexes is that, for multi-population indexes, the result of a query is a dictionary of single-population results, keyed by population name. For example:
>>> index.box_query(*window, fields="gid")
{
    "NodeA__NodeA__chemical": np.array([12, 3290, ..., ]),
    "NodeA__NodeB__chemical": np.array([22, 2309, ..., ]),
    ...
}
>>> index.box_query(*window, fields=["gid", "radius"])
{
    "NodeA__NodeA__chemical": {
        "gid": np.array([12, 3290, ..., ]),
        "radius": np.array([0.23, 0.34, ...])
    },
    "NodeA__NodeB__chemical": {
        "gid": ...,
        "radius": ...
    }
}
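Because both result shapes are plain dictionaries, they can be post-processed with ordinary Python. The following is a minimal sketch that counts hits per population for either shape. `count_hits` is a hypothetical helper, not part of brain-indexer; plain lists stand in for NumPy arrays, and the code assumes that every requested field holds one entry per hit.

```python
def count_hits(results):
    """Return {population: number of hits} for a multi-population result."""
    counts = {}
    for population, result in results.items():
        if isinstance(result, dict):
            # Several fields were requested. Assumption: each field has one
            # entry per hit, so any one field gives the hit count.
            first_field = next(iter(result.values()), [])
            counts[population] = len(first_field)
        else:
            # A single field was requested: the result is one array-like.
            counts[population] = len(result)
    return counts


example = {
    "NodeA__NodeA__chemical": {"gid": [12, 3290], "radius": [0.23, 0.34]},
    "NodeA__NodeB__chemical": {"gid": [22], "radius": [0.5]},
}
counts = count_hits(example)
```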
Keyword Argument: populations
The query is restricted to the given populations, passed either as a single population name or as a list of names. The default is to return the results for all populations. Example:
>>> index.box_query(*window, fields="gid", populations="NodeA__NodeA__chemical")
{
    "NodeA__NodeA__chemical": np.array([12, 3290, ..., ])
}
>>> all_but_one = index.populations[:-1]
>>> index.box_query(*window, fields="gid", populations=all_but_one)
{
    "NodeA__NodeA__chemical": np.array([12, 3290, ..., ]),
    "NodeA__NodeB__chemical": np.array([22, 2309, ..., ]),
    ...
}
Keyword Argument: population_mode
This option is slightly advanced and only of interest if you're writing code that should be generic across both single- and multi-population indexes. It controls the return type of queries, i.e., whether the result is wrapped in a dict keyed by population name.
There are three options:
None
    This is the default. In this mode the query result for a multi-population index is a dictionary of single-population query results, while a single-population index simply returns a single-population result.
"single"
    This forces a multi-population index to behave like a single-population index, i.e., the single-population query result is not wrapped in a dict. This requires that the query involves exactly one population.
"multi"
    This forces a single-population index to wrap its result as if it were a multi-population index. Note that the name of the population is unspecified.
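The three modes can be summarised as a small model of the documented behaviour. This is a sketch, not brain-indexer's code: `shape_result` and its parameters are hypothetical, and the choice of `ValueError` for an invalid request is an assumption.

```python
def shape_result(per_population, index_is_multi, population_mode=None):
    """Model how `population_mode` decides the shape of a query result.

    per_population: dict mapping population name to its single-population result.
    index_is_multi: whether the queried index is a multi-population index.
    """
    if population_mode is None:
        # Default: each kind of index uses its native return type.
        population_mode = "multi" if index_is_multi else "single"

    if population_mode == "multi":
        # Always keep the dict keyed by population name.
        return dict(per_population)

    if population_mode == "single":
        # Unwrapping only makes sense for exactly one population.
        if len(per_population) != 1:
            raise ValueError("'single' requires exactly one population.")
        (result,) = per_population.values()
        return result

    raise ValueError(f"Unknown population_mode: {population_mode!r}")


wrapped = shape_result({"A": [12, 3290]}, index_is_multi=True)
unwrapped = shape_result({"A": [12, 3290]}, index_is_multi=True,
                         population_mode="single")
```

Here `wrapped` keeps the population key while `unwrapped` is the bare single-population result; requesting `"single"` for two populations raises an error in this model.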
Writing Generic Code
This section contains tips on how to use the API to write code that behaves well when it is not known in advance whether the index is a single-population or a multi-population index.
The potential traps:
import numpy as np

def special_query(index, window):
    """Query the index suitable for scientific Usecase A."""
    # A list of fields yields a dict keyed by field name.
    results = index.box_query(*window, fields=["gid"])

    # Bad: fails for multi-population indexes, whose result is keyed by
    # population name and has no "gid" key at the top level.
    largest_gid = np.max(results["gid"])

    # Bad: fails for single-population indexes, whose result has no
    # population key.
    largest_gid = np.max(results["NodeA__NodeA__chemical"]["gid"])

    if largest_gid > 1000:
        print("Large GID spotted.")
Usecase 1: Single Population Queries
The code knows that it is only dealing with a single population. In this case, we can coerce the result into the single-population format:
import numpy as np

def special_query(index, window, population=None):
    """Query the index suitable for scientific Usecase A."""
    # The keyword argument is `populations`; here it names one population.
    results = index.box_query(
        *window, fields=["gid"], populations=population,
        population_mode="single"
    )

    # Good: works for both single- and multi-population indexes,
    # because of `population_mode="single"`.
    largest_gid = np.max(results["gid"])

    if largest_gid > 1000:
        print("Large GID spotted.")
Usecase 2: Multiple Population Queries
In this usecase the code knows how to handle multiple populations if present. One can then choose to always use the multi-population return type:
import numpy as np

def special_query(index, window, populations=None):
    """Print the largest GID of each population."""
    results = index.box_query(
        *window, fields=["gid"], populations=populations,
        population_mode="multi"
    )

    # Works for both kinds of index: the result is always keyed by population.
    for pop, result in results.items():
        largest_gid = np.max(result["gid"])
        print(f"{pop=}: {largest_gid}")