# j0251 Connectome: Agent Guide

A concise, cookbook-style reference for AI coding agents (Claude and similar) working with the j0251 songbird basal-ganglia connectome. Two access modes are covered: the public REST API (good for ad-hoc queries, no install) and the raw NumPy arrays (good for batch analyses).

## Citation

If you use this dataset or the API in your work, please cite:

> Rother, A., Januszewski, M., Jain, V., Fee, M.S., Kornfeld, J. (2025).
> *The songbird basal ganglia connectome.* bioRxiv.
> See for related work.

## Dataset

- **j0251**: adult zebra finch *area X*, 256 × 256 × 384 µm volume, acquired at the MPI for Medical Research (Heidelberg).
- **Version** of these files: `72_seg_20210127_agglo2_syn_20220811_celltypes_20230822` (filtered).
- **~18.36 M synapses**, **~4.26 M neurons**.
- All spatial coordinates are in **nanometers**.
- Acquisition: J. Kornfeld. Segmentation: M. Januszewski (V. Jain's team, Google Research). SyConn processing: P. Schubert, A. Rother, H. Ahmad, J. Kornfeld.

## Cell types

Index → name (used in the `pre_celltype`, `post_celltype`, `cell_type` columns and as filter values in the REST API):

```
0  ASTRO     6  INT1     12  MIGR
1  DA        7  INT2     13  MSN
2  FRAG      8  INT3     14  OLIGO
3  GPe       9  LMAN     15  STN
4  GPi      10  LTS      16  TAN
5  HVC      11  MICRO    17  unclassified
```

**Important:** `HVC`, `LMAN`, and `DA` are **axon-only** in this dataset. They **cannot** be used as `post_celltype`; the API will return an error.

## Post-morphology codes

```
0  spine_neck    2  shaft    255  unknown
1  spine_head    3  soma
```

---

## REST API (recommended for most queries)

Base URL: `https://syconn.esc.mpcdf.mpg.de`. The two JSON endpoints accept rich filters and tolerate concurrent clients.

### `GET /j0251/synapses/json`

Query synapses with filters and choose which fields to return.

| Parameter | Default | Notes |
|---|---|---|
| `pre_celltype` | (none) | Cell type name (e.g. `MSN`). Omit both `pre_celltype` and `post_celltype` to query all synapses. |
| `post_celltype` | (none) | Cell type name. **Not** `HVC`/`LMAN`/`DA`. |
| `pre_neuron_ids`, `post_neuron_ids` | (none) | Comma-separated IDs. |
| `prob_min` | `0.6` | Synapse probability threshold. |
| `size_min` | `0.0` | Min synapse size in µm². |
| `post_morph` | `-1` (all) | `0`=spine_neck, `1`=spine_head, `2`=shaft, `3`=soma. |
| `min_pre_axon`, `min_pre_soma`, `min_pre_dend` | `0.0` | Skeleton-length filters on the presynaptic neuron, in µm. |
| `min_post_axon`, `min_post_soma`, `min_post_dend` | `0.0` | Same for the postsynaptic neuron. |
| `return_fields` | `size` | Comma-separated. Available: `size,x,y,z,prob,pre_id,post_id,post_morph,pre_celltype,post_celltype`. |
| `max_results` | unlimited | Caps result count. **Sampled randomly from the filtered pool, not the first N.** |

Examples:

```bash
# 100 random MSN→GPi synapses with coordinates and size
curl "https://syconn.esc.mpcdf.mpg.de/j0251/synapses/json?\
pre_celltype=MSN&post_celltype=GPi&\
return_fields=x,y,z,size,prob&max_results=100"

# All synapses (random 100 k sample) with cell-type tags
curl "https://syconn.esc.mpcdf.mpg.de/j0251/synapses/json?\
max_results=100000&return_fields=pre_celltype,post_celltype,size"

# HVC→MSN spine-head synapses on well-reconstructed neurons only
curl "https://syconn.esc.mpcdf.mpg.de/j0251/synapses/json?\
pre_celltype=HVC&post_celltype=MSN&post_morph=1&\
min_pre_axon=50&min_post_dend=50&max_results=5000"
```

Response shape:

```json
{
  "x": [15861, 15197, ...],
  "y": [6697, 4352, ...],
  "z": [1380, 1727, ...],
  "size": [0.2137, 0.2868, ...],
  "prob": [0.9208, 0.9082, ...],
  "count": 5,
  "total_size": 1.1337,
  "mean_size": 0.2268
}
```

### `GET /j0251/neurons/json`

Find neurons matching cell-type, connectivity, and reconstruction-length criteria. Returns `{"neuron_ids": [...], "count": N}`.

| Parameter | Notes |
|---|---|
| `celltype` | Source cell type. |
| `target_celltype` | Cell type for connectivity filtering. |
| `direction` | `outgoing` or `incoming` (relative to `celltype`). |
| `min_synapses` | Min # synapses to/from `target_celltype`. |
| `min_celltype_certainty` | 0.0–1.0; certainty threshold on the cell-type prediction. |
| `min_axon_length`, `min_dendrite_length`, `min_soma_length` | µm. |
| `prob_min` | Probability threshold used while counting synapses. |

Example:

```bash
# HVC neurons with ≥50 synapses to MSN, well-reconstructed axons,
# high-confidence cell-type call
curl "https://syconn.esc.mpcdf.mpg.de/j0251/neurons/json?\
celltype=HVC&target_celltype=MSN&direction=outgoing&\
min_synapses=50&min_celltype_certainty=0.8&min_axon_length=50.0"
```

### `GET /j0251/skeleton?neuron_id=<id>` (binary)

Layout: `uint32 num_vertices, uint32 num_edges, float32×3×N vertices, uint32×2×M edges`.

```python
import struct

import numpy as np
import requests

r = requests.get("https://syconn.esc.mpcdf.mpg.de/j0251/skeleton?neuron_id=930416054")
r.raise_for_status()
b = r.content

# Header: two uint32 counts. Little-endian byte order is assumed here.
nv, ne = struct.unpack_from("<II", b, 0)

# Parse the payload according to the layout above: vertices, then edge index pairs.
vertices = np.frombuffer(b, dtype="<f4", count=nv * 3, offset=8).reshape(nv, 3)
edges = np.frombuffer(b, dtype="<u4", count=ne * 2, offset=8 + nv * 12).reshape(ne, 2)
```

### `GET /notebook/j0251/<version>/sv/<id>:0:<id>_mesh` (binary)

Layout: `uint32 num_vertices, float32×3×N vertices`. Subsample by 20–30× for WebGL.

```bash
curl "https://syconn.esc.mpcdf.mpg.de/notebook/j0251/72_seg_20210127_agglo2_syn_20220811_celltypes_20230822/sv/930416054:0:930416054_mesh" -o mesh.bin
```

### Common pitfalls

- **`prob_min` defaults to `0.6`.** Lower it explicitly if you need recall.
- **`max_results` is a uniform random sample.** Don't assume the first row is representative; don't paginate by changing `max_results`.
- **`HVC`/`LMAN`/`DA` cannot be `post_celltype`.** Server returns `ValueError: "Cell type 'X' cannot be used as post_celltype (axon-only in dataset)"`.
- **Coordinates are in nanometers**, not voxels.
- **Cell type names are case-sensitive.**
- **Empty result on a plausible query:** check `prob_min`, check that the pre/post assignment is biologically possible (axon-only types cannot be postsynaptic), and check the casing of cell type names.

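
The synapse query parameters and pitfalls above can also be exercised from Python. The following is a minimal sketch using only the standard library; the helper names `synapse_query_url` and `fetch_synapses` and the constants `BASE_URL` and `AXON_ONLY` are hypothetical and not part of the API. It builds the query string from the documented parameters and rejects axon-only types as `post_celltype` before any request is sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://syconn.esc.mpcdf.mpg.de"
AXON_ONLY = {"HVC", "LMAN", "DA"}  # documented as invalid for post_celltype


def synapse_query_url(pre_celltype=None, post_celltype=None,
                      return_fields=("size",), prob_min=0.6,
                      max_results=None, **extra):
    """Build a /j0251/synapses/json URL (hypothetical helper, not part of the API)."""
    if post_celltype in AXON_ONLY:
        raise ValueError(f"{post_celltype} is axon-only and cannot be post_celltype")
    params = {"prob_min": prob_min, "return_fields": ",".join(return_fields)}
    if pre_celltype is not None:
        params["pre_celltype"] = pre_celltype
    if post_celltype is not None:
        params["post_celltype"] = post_celltype
    if max_results is not None:
        params["max_results"] = int(max_results)
    params.update(extra)  # any other documented filter, e.g. post_morph=1
    # Keep commas unescaped so return_fields stays a comma-separated list.
    return f"{BASE_URL}/j0251/synapses/json?{urlencode(params, safe=',')}"


def fetch_synapses(timeout=60, **query):
    """Send the query and return the decoded JSON dict (performs a network request)."""
    with urlopen(synapse_query_url(**query), timeout=timeout) as resp:
        return json.load(resp)
```

A call such as `fetch_synapses(pre_celltype="MSN", post_celltype="GPi", return_fields=("x", "y", "z", "size"), max_results=100)` would then mirror the first `curl` example. Because `max_results` samples randomly, two identical calls should not be expected to return the same rows.
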
---

## Raw NumPy arrays (for batch analyses)

Direct downloads (~712 MB total):

| URL | Size | Description |
|---|---|---|
| [`/connectome/data/j0251_…_filtered_synapses.npy`](/connectome/data/j0251_72_seg_20210127_agglo2_syn_20220811_celltypes_20230822_filtered_synapses.npy) | ~613 MB | 18,361,224 synapses, structured array |
| [`/connectome/data/j0251_…_filtered_neurons.npy`](/connectome/data/j0251_72_seg_20210127_agglo2_syn_20220811_celltypes_20230822_filtered_neurons.npy) | ~94 MB | 4,261,298 neurons, structured array |
| [`/connectome/data/j0251_…_filtered_celltype_mapping.txt`](/connectome/data/j0251_72_seg_20210127_agglo2_syn_20220811_celltypes_20230822_filtered_celltype_mapping.txt) | <1 KB | `index: name` per line |
| [`/connectome/data/j0251_…_filtered_post_morph_mapping.txt`](/connectome/data/j0251_72_seg_20210127_agglo2_syn_20220811_celltypes_20230822_filtered_post_morph_mapping.txt) | <1 KB | `index: name` per line |

### Synapse array schema

```
size           float16   # µm²
pre_id         uint64    # presynaptic neuron ID
post_id        uint64    # postsynaptic neuron ID
prob           float16   # synapse probability
x, y, z        uint32    # coordinates in nanometers
post_morph     uint8     # 0=spine_neck 1=spine_head 2=shaft 3=soma 255=unknown
pre_celltype   uint8     # see celltype mapping
post_celltype  uint8     # see celltype mapping
```

### Neuron array schema

```
id                    uint64    # neuron ID
cell_type             uint8     # see celltype mapping
celltype_certainty    float16   # 0..1
skel_length_axon      float32   # µm
skel_length_soma      float32   # µm
skel_length_dendrite  float32   # µm
```

### Loading and querying in Python

`mmap_mode='r'` keeps the 613 MB synapse file out of process RAM; only the slices you access are paged in. This is sufficient for filtering.

```python
import numpy as np

syn = np.load("j0251_..._filtered_synapses.npy", mmap_mode="r")
neu = np.load("j0251_..._filtered_neurons.npy", mmap_mode="r")


# Build name<->id maps from the .txt files (format: "index: name" per line)
def load_mapping(path):
    name2id, id2name = {}, {}
    with open(path) as f:
        for line in f:
            if ":" not in line:
                continue
            i, name = line.split(":", 1)
            i, name = int(i.strip()), name.strip()
            name2id[name] = i
            id2name[i] = name
    return name2id, id2name


ct_name2id, ct_id2name = load_mapping("j0251_..._filtered_celltype_mapping.txt")

# Example 1: count HVC→MSN synapses with prob ≥ 0.6 and size ≥ 0.05 µm²
HVC, MSN = ct_name2id["HVC"], ct_name2id["MSN"]
mask = (syn["pre_celltype"] == HVC) & (syn["post_celltype"] == MSN) \
     & (syn["prob"] >= 0.6) & (syn["size"] >= 0.05)
print("HVC→MSN synapses:", int(mask.sum()))

# Example 2: per-celltype-pair synapse counts (full connectivity matrix)
N = 18  # number of cell types
counts = np.zeros((N, N), dtype=np.int64)
np.add.at(counts, (syn["pre_celltype"], syn["post_celltype"]), 1)
# counts[ct_name2id["HVC"], ct_name2id["MSN"]] is the HVC→MSN total

# Example 3: well-reconstructed MSNs (long dendrites, confident cell-type call)
mask = (neu["cell_type"] == MSN) & (neu["skel_length_dendrite"] > 200) \
     & (neu["celltype_certainty"] > 0.8)
msn_ids = neu["id"][mask]
```

### Performance notes

- The arrays are **not** sorted by `pre_id`/`post_id`, so single-neuron lookups do a full scan (~50–100 ms for 18 M rows on warm cache). For repeated per-neuron queries, build a sparse index once or use the REST API.
- `size` and `prob` are `float16`: fine for filtering, but cast to `float32` before doing math you care about (`syn["size"].astype(np.float32)`).
- For interactive analyses with minimal setup, prefer the REST API. The raw arrays are best when you need to scan all 18 M rows (e.g. a full connectivity matrix in one pass) or need columns the API doesn't expose.

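
One way to avoid the repeated full scans mentioned in the performance notes is to sort an ID column once and answer each per-neuron query with a binary search. The sketch below is one possible approach, not a method prescribed by the dataset authors; `build_id_index` and `rows_for_neuron` are hypothetical helper names, and the small array stands in for `syn["pre_id"]`.

```python
import numpy as np


def build_id_index(ids):
    """Sort an ID column once so later lookups are binary searches.

    Returns (order, sorted_ids), where `order` holds the original row
    indices arranged by ascending ID.
    """
    ids = np.asarray(ids)
    order = np.argsort(ids, kind="stable")  # stable: ties keep row order
    return order, ids[order]


def rows_for_neuron(order, sorted_ids, neuron_id):
    """Return the original row indices whose ID equals `neuron_id`."""
    # Cast the key to the column dtype to avoid uint64/int64 promotion issues.
    key = sorted_ids.dtype.type(neuron_id)
    lo = np.searchsorted(sorted_ids, key, side="left")
    hi = np.searchsorted(sorted_ids, key, side="right")
    return order[lo:hi]


# Tiny synthetic stand-in for syn["pre_id"]; real IDs come from the .npy file.
pre_id = np.array([7, 3, 7, 9, 3, 7], dtype=np.uint64)
order, sorted_ids = build_id_index(pre_id)
rows = rows_for_neuron(order, sorted_ids, 7)
print(rows)  # row indices of all entries with pre_id == 7
```

With the real data, `syn[rows_for_neuron(order, sorted_ids, some_id)]` would select that neuron's outgoing synapses. The index is held in RAM: for 18,361,224 rows, `order` (typically 64-bit integers) and `sorted_ids` each take roughly 147 MB, so build it only when many per-neuron lookups are planned.
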
---

## Where to go for more depth

- **Interactive notebook with live examples:**
- **Project home and related publications:**
- **The reference paper:** Rother et al. (2025), bioRxiv