# Investigate and fix "qtl2::calc_genoprob()" failing with "negative length vectors are not allowed"

## Tags 

* Assigned: Flisso
* type: bug 
* status: in progress
* interested: alexm
* keywords: cross, qtl2, calc_genoprob, bugs

## Description
Running a subset of the genotype and founder CSV files through qtl2 to generate founder-aware smoothed genotypes.
The script is crashing with the following error message:
 
```sh
calc_genoprob failing with negative length vectors are not allowed
```
For reference, see the "qtl2_hmm_pipeline.R" script:
=> https://github.com/fetche-lab/HS-rats-2026/blob/main/genotypes/new_processing/tests/chr1/codes/qtl2_hmm_pipeline.R

Key findings from the run and the error:

* Map and IDs were consistent:
* - 50,000 markers
* - no duplicate marker IDs
* - monotonic increasing cM
* Genotype dimensions:
* - HS genotypes: 1499 x 50000
* - Founder genotypes: 8 x 50000
* Error cause matched integer-overflow conditions:
* - The original workflow tried to allocate a genotype-probability object of about 1499 * 50000 * 36 = 2,698,200,000 elements, which exceeds R's 32-bit integer limit on vector length (2,147,483,647) and triggers "negative length vectors are not allowed".
* The workaround was to split the files into chunks of 5000 lines, but the calc_genoprob() runtime is still the culprit.
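
The size arithmetic above can be checked in base R. This is only a minimal sketch: the variable names are illustrative and not taken from "qtl2_hmm_pipeline.R". It assumes the factor of 36 is the number of unordered genotype states for 8 founders (8 * 9 / 2), and that the whole object is sized as individuals x markers x states.

```r
# Dimensions reported in the findings above
n_ind <- 1499    # HS individuals
n_mar <- 50000   # markers
n_gen <- 36      # assumed: 8 founders -> 8 * 9 / 2 genotype states

int_max <- .Machine$integer.max   # 2147483647

# Numeric literals are doubles in R, so this product does not overflow
n_cells <- n_ind * n_mar * n_gen
exceeds_limit <- n_cells > int_max   # TRUE for the full marker set

# Largest marker count that keeps a single block under the limit
max_markers <- floor(int_max / (n_ind * n_gen))

# Marker index chunks of 5000, as in the workaround
chunk_size <- 5000
chunks <- split(seq_len(n_mar), ceiling(seq_len(n_mar) / chunk_size))

# Element count for one chunk
cells_per_chunk <- n_ind * chunk_size * n_gen
```

Under these assumptions a chunk of 5000 markers is well below the limit, so any remaining failure or slowness would need a different explanation than the allocation size.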

## Tasks
* [x] error: "calc_genoprob failing with negative length vectors are not allowed"
* [ ] Re-run the script on the specified chunks
* [ ] Evaluate the smoothed output for validity and interpretability
* [ ] Use the proximal/distal founder-aware markers to extract SNPs from the original geno file
* [ ] Or extend the script with a function that performs this extraction
* [ ] Test the results with GEMMA and R/qtl2 mapping