Age | Commit message (Collapse) | Author |
|
Check for special files and hidden files that might be inadvertently
added to the zip bundle by the operating system in use that might
share names and/or extensions with the main bundle files.
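
A minimal sketch of such a check, assuming the unwanted entries are `__MACOSX` folders and dot-prefixed names of the kind some operating systems add to archives. The function name `extraneous_members` is hypothetical and not taken from the codebase:

```python
import io
from pathlib import PurePosixPath
from zipfile import ZipFile


def extraneous_members(zfile: ZipFile) -> tuple[str, ...]:
    """Return archive members that look like OS-generated extras."""
    def _is_extra(name: str) -> bool:
        parts = PurePosixPath(name).parts
        # Assumption: "__MACOSX" folders and dot-prefixed names are never
        # legitimate bundle files.
        return any(part == "__MACOSX" or part.startswith(".") for part in parts)

    return tuple(name for name in zfile.namelist() if _is_extra(name))


# Build a small in-memory bundle to exercise the check.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as bundle:
    for member in ("control.json", "geno.csv", "__MACOSX/._geno.csv", ".DS_Store"):
        bundle.writestr(member, "")

with ZipFile(buffer) as bundle:
    extras = extraneous_members(bundle)
```

Here `extras` holds only the two OS-generated entries, so they can be reported without touching the real bundle files.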
|
|
Ensure that **ALL** samples/cases/individuals mentioned in any of the
pheno files actually exist in at least one of the geno files.
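
The check can be sketched roughly as a set difference, assuming the sample identifiers have already been read from each file. The function name and the shape of its arguments are illustrative assumptions, not the actual implementation:

```python
def samples_missing_from_geno(pheno_files, geno_files):
    """Map each pheno file to the samples found in no geno file.

    Both arguments are dicts of filename -> iterable of sample identifiers
    (a hypothetical shape chosen for this sketch).
    """
    known = set()
    for ids in geno_files.values():
        known.update(ids)
    return {
        filename: tuple(sorted(set(ids) - known))
        for filename, ids in pheno_files.items()
        if set(ids) - known
    }


pheno = {"pheno1.csv": ["1", "2", "3"], "pheno2.csv": ["2", "9"]}
geno = {"geno1.csv": ["1", "2"], "geno2.csv": ["3"]}
missing = samples_missing_from_geno(pheno, geno)
```

A sample counts as present if it appears in at least one geno file, so only `"9"` is reported here.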
|
|
Provide the function 'read_file_data' in the 'r_qtl.r_qtl2' module to
read each file in the bundle separately.
The function 'file_data' in the 'r_qtl.r_qtl2' module reads *ALL* the
files of a particular type (e.g. geno files) and returns a single
generator object with the data from *ALL* the files. This does not
lend itself well to error checking.
We needed a way to check for errors, and report them for each and
every file in the bundle, for easier tracking and fixing.
|
|
Add a function that, given the path to the zip file, will read the
control data. It creates its own context manager.
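
A sketch of what such a function might look like. R/qtl2 control files may be YAML or JSON; this example handles only JSON to stay within the standard library, and both the name `read_control_json` and the "single top-level `.json` file" rule are assumptions made for illustration:

```python
import io
import json
from zipfile import ZipFile


def read_control_json(zipfile_path):
    """Read the control data, opening and closing the archive internally."""
    with ZipFile(zipfile_path, "r") as zfile:
        # Assumption: the bundle holds exactly one top-level JSON control file.
        candidates = [
            name for name in zfile.namelist()
            if "/" not in name and name.endswith(".json")
        ]
        if len(candidates) != 1:
            raise ValueError(f"Expected one control file, found {len(candidates)}")
        with zfile.open(candidates[0]) as control_file:
            return json.load(control_file)


# In-memory bundle standing in for a zip file on disk.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as bundle:
    bundle.writestr("control.json", json.dumps({"geno": "geno.csv", "sep": ","}))
    bundle.writestr("geno.csv", "id,m1\n1,A\n")

control = read_control_json(buffer)
```

Because the function manages the archive itself, callers only pass a path (or file-like object) and never handle the `ZipFile` directly.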
|
|
Add a function to ensure the values in the geno files are all listed
in the control data under the "genotypes" key.
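
One possible shape for this check, assuming each geno row has been parsed into a dict with an `"id"` key and that missing-value codes are also allowed. The function name and the error tuple layout are hypothetical:

```python
def unlisted_genotype_values(rows, genotypes, na_strings=("NA",)):
    """Return (row number, column, value) for values not in the control data."""
    # Assumption: missing-value codes are acceptable alongside genotype codes.
    allowed = set(genotypes) | set(na_strings)
    return tuple(
        (row_number, column, value)
        for row_number, row in enumerate(rows, start=1)
        for column, value in row.items()
        if column != "id" and value not in allowed
    )


control = {"genotypes": {"A": 1, "H": 2, "B": 3}}
rows = [
    {"id": "1", "m1": "A", "m2": "H"},
    {"id": "2", "m1": "X", "m2": "NA"},
]
errors = unlisted_genotype_values(rows, control["genotypes"])
```

Only the `"X"` in the second row is flagged, since `"NA"` is treated as a missing-value code rather than a genotype.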
|
|
* Display any and all errors on the UI
* Move `validate_bundle` to QC module and refactor to use
`missing_files`
|
|
`na.strings` has a default value of "NA" as stated in
https://kbroman.org/qtl2/assets/vignettes/input_files.html#CSV_files
quote:
> Missing value codes will be specified in the control file (as
> na.strings, with default value "NA") and will apply across all
> files, so a missing value code for one file cannot be an allowed
> value in another file.
for `comment.char`:
> The CSV files can include a header with a set of comment lines
> initiated by a value specified in the control file as comment.char
> (with default value "#").
for `sep`:
The default separator is expected to be the comma, as stated in
https://kbroman.org/qtl2/assets/vignettes/input_files.html#field-separator
quote:
> If the data files use a separator other than a comma ...
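
The defaults quoted above can be applied with a simple dictionary merge. This is a sketch only; the names `CONTROL_DEFAULTS` and `with_defaults` are assumptions and not part of the codebase:

```python
# Default values as documented in the R/qtl2 input file vignette.
CONTROL_DEFAULTS = {
    "na.strings": ["NA"],
    "comment.char": "#",
    "sep": ",",
}


def with_defaults(control):
    """Return the control data with unset keys filled from the defaults."""
    # Values given in the control file take precedence over the defaults.
    return {**CONTROL_DEFAULTS, **control}


control = with_defaults({"sep": "\t", "geno": "geno.tsv"})
```

Keys present in the control file win, so `sep` stays a tab here while `na.strings` and `comment.char` fall back to their documented defaults.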
|
|
There was a bug where the `na.strings` were not processed correctly if
the user called the `r_qtl.r_qtl2.file_data(...)` function without
explicitly providing the `process_*` arguments.
This commit fixes that.
|
|
Since the R/qtl2 bundle generator could name the identifier column
anything, this commit converts the incoming identifier column name
into something explicit that we know and can use.
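
A minimal sketch of the idea, assuming the identifier is always the first column and that `"id"` is the explicit name chosen. Both the function name and the `id_key` parameter are hypothetical:

```python
import csv
import io


def rows_with_explicit_id(lines, sep=",", id_key="id"):
    """Yield each row as a dict, renaming the first column to `id_key`."""
    reader = csv.reader(lines, delimiter=sep)
    header = next(reader)
    # Assumption: the identifier column is always the first one, whatever
    # the bundle generator happened to call it.
    fields = [id_key] + header[1:]
    for row in reader:
        yield dict(zip(fields, row))


content = io.StringIO("mouse,m1,m2\n1,A,B\n2,H,A\n")
records = list(rows_with_explicit_id(content))
```

Downstream code can then rely on `record["id"]` regardless of whether the source file used `mouse`, `id`, or something else.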
|
|
The validation checks ensure that whatever files are listed in the
control file exist in the zip file bundle. It is still possible,
however, that the code tries to read a file that does not exist in the
bundle and is not listed in the control file. In those cases, raise the
appropriate exception.
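
Roughly, this could be done by translating the `KeyError` that `ZipFile.open` raises for an unknown member into a domain-specific exception. The `MissingFileError` class and `open_member` helper are hypothetical names used only for this sketch:

```python
import io
from zipfile import ZipFile


class MissingFileError(Exception):
    """Hypothetical exception for a file that is absent from the bundle."""


def open_member(zfile, name):
    """Open a bundle member, raising MissingFileError if it is absent."""
    try:
        return zfile.open(name)
    except KeyError as exc:
        # ZipFile.open raises KeyError when the archive has no such member.
        raise MissingFileError(f"'{name}' does not exist in the bundle.") from exc


buffer = io.BytesIO()
with ZipFile(buffer, "w") as bundle:
    bundle.writestr("geno.csv", "id,m1\n1,A\n")

with ZipFile(buffer) as bundle:
    try:
        open_member(bundle, "pheno.csv")
        raised = False
    except MissingFileError:
        raised = True
```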
|
|
|
* Add tests for parsing "founder_geno" files
* Extract common file parsing structure out to more general function
* Use generic function to parse "founder_geno" file in test
|
|
|
The processing of transposed files is similar across files. This
commit extracts the common parts into a separate function.
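
As an illustration of the kind of shared step involved (not the actual extracted function), a transposed file has markers as rows and individuals as columns, so flipping it back yields the usual one-row-per-individual layout. The name `untranspose` is an assumption:

```python
def untranspose(rows):
    """Swap rows and columns of an already-parsed transposed file."""
    # zip(*rows) stops at the shortest row, so ragged input is truncated;
    # a real implementation would probably want to report that as an error.
    return [list(column) for column in zip(*rows)]


transposed = [
    ["id", "1", "2"],
    ["m1", "A", "H"],
    ["m2", "B", "A"],
]
normal = untranspose(transposed)
```

After the flip, the first row is the header (`id`, `m1`, `m2`) and every following row describes one individual, so the non-transposed parsing path can be reused.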
|
|
The processing of transposed files is probably going to be very
similar, thus the need to extract some reusable code from the
geno-file-specific function in preparation.
|
|
Since the processing of non-transposed files is mostly similar,
abstract away the common operations into a separate function and use
the function instead of repeating the same pattern of code throughout
the codebase.
|
|
|
Check the parsing of non-transposed geno files.
Leave in a failing test for transposed geno files.
|
|
|
* Set up error objects.
* Read the control data.
|