# brainprep 🧠

Preprocessing for pediatric (1-7 yo) MRI brain data

This preprocessing follows the BIDS standard and writes all outputs directly into your dataset’s `derivatives/brainprep/` folder. It works with any BIDS-compliant dataset that provides an `exclude.yaml` file (as described below). After raw-image QC, the workflow runs in three steps and supports both cross-sectional and longitudinal datasets.

---

## Install `brainprep` (using `environment.yml`)

1. **Create the conda environment**

   ```bash
   conda env create -f environment.yml
   ```

2. **Activate it**

   ```bash
   conda activate brainprep
   ```

3. **Run a script**

   ```bash
   python brainprep.py --help
   ```

## 1) Create the input list (`create_input_txt.py`)

This step scans one or more **BIDS root** directories and writes a text file containing the **absolute paths** to images that should be preprocessed by `brainprep.py`.
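
As a rough illustration of what this step does, the sketch below globs one BIDS root using the layout patterns documented in this README and writes one quoted absolute path per line. It is not the code of `create_input_txt.py`: the function names `collect_images` and `write_input_list` are hypothetical, and the sketch only covers images under `anat/`, without exclusions or filters.

```python
from pathlib import Path

# Layout patterns as documented in this README (anat images only).
LAYOUT_PATTERNS = {
    "long": "sub-*/ses-*/anat/*_{modality}.nii.gz",
    "cross": "sub-*/anat/*_{modality}.nii.gz",
}


def collect_images(bids_root, layout, modality="T1w"):
    """Return sorted absolute paths matching the layout (hypothetical helper)."""
    root = Path(bids_root).resolve()
    pattern = LAYOUT_PATTERNS[layout].format(modality=modality)
    return sorted(root.glob(pattern))


def write_input_list(paths, out_file):
    """Write one double-quoted absolute path per line (hypothetical helper)."""
    Path(out_file).write_text("".join(f'"{p}"\n' for p in paths))
```

Calling `write_input_list(collect_images("/path/to/bids", "long"), "to_preprocess.txt")` would produce a file in the same shape as the list that `brainprep.py` reads via `--inputs`.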

📌 **Additional instructions** (`exclude.yaml`, layouts, age filters, examples)

### Where `exclude.yaml` lives (default)

By default the script looks for the following file under each BIDS root:

```
/code/qc/raw/exclude.yaml
```

You can change the filename with `-e/--exclude-file`, but the folder is expected to be `code/qc/raw/` under each BIDS root.

### What the output file contains

The output is a plain text file (e.g., `to_preprocess_hc-calgary-preschool.txt`) with **one quoted absolute path per line**, like:

```
"/path/to/bids/sub-10001/ses-001/anat/sub-10001_ses-001_T1w.nii.gz"
"/path/to/bids/sub-10002/ses-002/anat/sub-10002_ses-002_T1w.nii.gz"
...
```

This list is later passed to `brainprep.py` via `--inputs`.

---

### `exclude.yaml` format

`exclude.yaml` can be either:

**(A) a YAML list**

```yaml
- sub-10001
- sub-10002_ses-001
- sub-10003_ses-002_run-02
```

**(B) a dict with a list under one of these keys:** `exclude`, `exclude_paths`, `exclude_images`

```yaml
exclude:
  - sub-10001
  - sub-10002_ses-001
  - sub-10003_ses-002_run-02
```

#### What entries mean

You can exclude at different levels:

* `sub-10001` → excludes **all sessions/files** for that subject
* `sub-10001_ses-001` → excludes **all files** under that session
* `sub-10001_ses-001_run-02` → excludes **that specific run** (for `T1w`/`T2w`, this targets `.../anat/_.nii.gz`)
* You can also provide **path-like patterns / globs**, e.g.

  * `sub-10001/ses-001/**`
  * `sub-*/ses-*/anat/*_T1w.nii.gz`

Notes:

* Excludes are matched against the **relative path** from the dataset root.
* Run-level exclusions only trigger when the exclude string contains `_run-`.

---

### Longitudinal vs cross-sectional layouts

You must specify a layout for each dataset root:

* `-l long` expects: `sub-*/ses-*/anat/*_T1w.nii.gz`
* `-l cross` expects: `sub-*/anat/*_T1w.nii.gz`

You can pass **multiple datasets** at once and give one layout per dataset.
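
The `exclude.yaml` rules above (list or dict form, subject/session/run entries, and glob patterns) can be sketched roughly as follows. This is an illustration of the documented behavior, not the script's actual implementation: `normalize_excludes` and `is_excluded` are hypothetical names, the input is assumed to be already-parsed YAML, and globs are matched with `fnmatch`, where `*` also crosses `/`.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Keys documented in this README for the dict form of exclude.yaml.
EXCLUDE_KEYS = ("exclude", "exclude_paths", "exclude_images")


def normalize_excludes(data):
    """Return a flat list of exclude strings from parsed YAML (list or dict)."""
    if isinstance(data, list):
        return [str(entry) for entry in data]
    if isinstance(data, dict):
        for key in EXCLUDE_KEYS:
            if isinstance(data.get(key), list):
                return [str(entry) for entry in data[key]]
    raise ValueError("exclude.yaml must be a list or a dict with a known key")


def is_excluded(rel_path, entries):
    """Check a dataset-relative POSIX path against the exclude entries."""
    parts = PurePosixPath(rel_path).parts
    # Directory names plus the filename's underscore-separated entities.
    filename_entities = parts[-1].split(".", 1)[0].split("_")
    tokens = set(parts[:-1]) | set(filename_entities)
    for entry in entries:
        if "/" in entry or "*" in entry:
            # Path-like pattern: glob match against the relative path.
            if fnmatchcase(rel_path, entry):
                return True
        elif all(token in tokens for token in entry.split("_")):
            # sub / sub_ses / sub_ses_run entry: every entity must be present.
            return True
    return False
```

In practice the parsed data would come from a YAML loader (for example `yaml.safe_load` from PyYAML, which is an assumption about the environment rather than something this README states).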

---

### Optional filters

#### Subject filter

Restrict to specific subjects:

* `--subjects sub-10001 sub-10010`
* or `--subjects-file subjects.txt` (one subject per line)

#### Age filter (in months)

You can filter sessions by age:

* `--min-age-months 12 --max-age-months 84`

Age is read from a table file (the delimiter, tab or comma, is auto-detected):

* `--age-tsv `
* If not provided: defaults to `/participants.tsv`
* Column names are configurable:

  * `--age-pid-col participant_id`
  * `--age-ses-col session` (used only for longitudinal layout)
  * `--age-col age`
* `--age-units years|months` (default: years)

Behavior:

* If age is missing or not parseable → the session is **excluded** (conservative).
* For longitudinal datasets, age is looked up by `(participant_id, session)`.
* If a session label ends with `mo` (e.g., `ses-24mo`) and the age is not found in the TSV, the label can be used as a fallback.

---

### Common commands

**Single longitudinal dataset (T1w), using default exclude.yaml location**

```bash
python create_input_txt.py /home/andjela/joplin-intra-inter/hc-calgary-preschool \
  -l long \
  --modality T1w \
  -o to_preprocess_hc-calgary-preschool.txt
```

**With age range (1–7 years)**

```bash
python create_input_txt.py /home/andjela/joplin-intra-inter/hc-calgary-preschool \
  -l long \
  --modality T1w \
  --min-age-months 12 --max-age-months 84 \
  --age-tsv /home/andjela/joplin-intra-inter/hc-calgary-preschool/participants.tsv \
  --age-col age --age-units years \
  -o to_preprocess_hc-calgary-preschool_12to84mo.txt
```

**Multiple datasets at once**

```bash
python create_input_txt.py \
  /home/andjela/joplin-intra-inter/hc-bcp \
  /home/andjela/joplin-intra-inter/hc-calgary-preschool \
  -l long long \
  --modality T1w \
  -o to_preprocess_all.txt
```

**Non-anat modality (example: dwi), using a recursive pattern**

```bash
python create_input_txt.py /path/to/bids \
  -l long \
  --modality dwi \
  -o to_preprocess_dwi.txt
```
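
For readers who want to reason about the age filter used in the commands above, here is a minimal sketch of the documented behavior: unit conversion to months, the `ses-<N>mo` label fallback, and conservative exclusion when no age is available. The helper names are hypothetical, and treating both bounds as inclusive is an assumption that this README does not confirm.

```python
import math
import re


def age_in_months(raw, units="years", session=None):
    """Convert a TSV age value to months, or return None if unavailable."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        value = math.nan
    if not math.isnan(value):
        return value * 12.0 if units == "years" else value
    # Fallback: a session label such as "ses-24mo" carries the age in months.
    match = re.fullmatch(r"ses-(\d+(?:\.\d+)?)mo", session or "")
    return float(match.group(1)) if match else None


def keep_session(age_months, min_months=None, max_months=None):
    """Keep a session only if its age is known and inside the bounds."""
    if age_months is None:
        return False  # missing or unparseable age: excluded (conservative)
    if min_months is not None and age_months < min_months:
        return False
    if max_months is not None and age_months > max_months:
        return False
    return True
```

With `--min-age-months 12 --max-age-months 84`, a participant listed as `2.5` years maps to 30 months and is kept, while a row with `n/a` and a session label like `ses-001` is dropped.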

---

## 2) Run the preprocessing (`brainprep.py`)

This step reads the image list produced by `create_input_txt.py` and writes BIDS-derivatives outputs directly into the dataset:

```
/derivatives/brainprep/sub-*/ses-*/anat/
```

### Pipeline steps and outputs

For each input `sub-*/ses-*/anat/*_T1w.nii.gz`, the following steps are applied:

1. **SynthStrip (brain extraction)**

   * Tool: `mri_synthstrip`
   * Outputs:

     * `sub-*_ses-*_desc-synthstrip_T1w.nii.gz`
     * `sub-*_ses-*_desc-synthstrip_mask.nii.gz`

2. **N4 bias-field correction**

   * Tool: `N4BiasFieldCorrection`
   * Input: skull-stripped image from SynthStrip
   * Output:

     * `sub-*_ses-*_desc-n4_T1w.nii.gz`
   * Optional skip: if the input path is listed in `--no-bfc`, the N4 output is replaced by a link/copy of the SynthStrip image.

3. **Affine registration to a template**

   * Tool: `antsRegistrationSyNQuick.sh`
   * Input: N4 output
   * Outputs (template space):

     * `sub-*_ses-*_space-