-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRY refactoring #60
DRY refactoring #60
Conversation
Goal is to refactor code so that there is minimal duplication, easily extensible to enable metatranscriptomics and future Assay types, and breakout all of the functionality from the main plugin function so they can be tested properly.
Some Additional modifications added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @charles-cowart, I have some concerns about the possibility of duplication of code; please let me know what you think.
@antgonza I've pushed a commit of quick updates based on your feedback, thanks! I haven't responded to all of your concerns yet, but I will shortly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the comments before. Now, a general question, if you are "extracting" the general functions from Amplicon/Metagenomic Step and adding them to the base class Step, does it really make sense to have a klp_util.py file? I mean basically you have to access method X in Amplicon/Metagenomic Step, that will call X method in Step and then do X in klp_util.py; no? Would it be better to remove one step and just go either Amplicon/Metagenomic Step -> klp_util.py or Amplicon/Metagenomic Step -> Step and remove klp_util.py?
Removed configuration.json. Additional updates based on testing and feedback.
We're definitely on the same page. The idea was to incorporate klp_util.py into the base Step class. I left them as is for the time being, just because so many other things had changed already. |
I think this will be the only blocker before merging this initial PR and this would be the base for future ones, could you do this now before waiting for a better time? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments.
job_output = [join(raw_fastq_files_path, x) for x in | ||
listdir(raw_fastq_files_path)] | ||
job_output = [x for x in job_output if isfile(x) and x.endswith( | ||
'fastq.gz') and not basename(x).startswith('Undetermined')] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess these 2 can be merged, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be better to keep them separate, because you want x in isfile(x) in the second one to be the full path created by join(raw_fastq_files_path, x) in the first one and if we merged them that would mean we'd have something like:
job_output = [join(raw_fastq_files_path, x) for x in listdir(raw_fastq_files_path) if isfile(join(raw_fastq_files_path, x)) and x.endswith('fastq.gz') and not basename(x).startswith('Undetermined')]]
The join() has to be present twice.
def generate_prep_file(self): | ||
config = self.pipeline.configuration['seqpro'] | ||
|
||
seqpro_path = config['seqpro_path'].replace('seqpro', 'seqpro_mf') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the plan to remove seqpro_mf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an easy enough change on this side of things, and a little bit of refactoring for metapool package. I can make the change by the end of the week in between other things.
config = self.pipeline.configuration['bcl-convert'] | ||
job = super()._convert_bcl_to_fastq(config, | ||
self.pipeline.sample_sheet.path) | ||
self.fsr.write(job.audit(self.pipeline.get_sample_ids()), 'ConvertJob') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like at the end of each method something similar to this line is called, should it be more general? I mean something like self.failed_samples_audit('ConvertJob')
where that calls something like self.write(job.audit(self.pipeline.get_sample_ids()), audit_name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this out, but it seems like it will just create another function something like the following:
def _failed_samples_audit(self, job, step_name):
self.fsr.write(job.audit(self.pipeline.get_sample_ids()), step_name)
It doesn't really do much. The meat of each audit is in each Job class, while the structure and writing of each failed samples record is in the FailedSamplesRecord class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments.
* DRY refactoring (#60) * WIP DRY refactoring Goal is to refactor code so that there is minimal duplication, easily extensible to enable metatranscriptomics and future Assay types, and breakout all of the functionality from the main plugin function so they can be tested properly. * Forgot to add new Step files to DRY refactor. Some Additional modifications added. * Laying out new test classes * Added support for base Step tests * Ensure output_dir is created * bugfix * Temporarily set mg-scripts dependency to development version * .gitignore fix * First test w/pseudo job-submission added. * CI Fixes * Fix for CI * debug CI. test works as intended locally * debugging CI * ci debug * ci debug * ci debug * Added test for base QCJob * Added MetagenomicStep tests * Updated testing infrastructure * Added AmpliconStep tests * flake8 * setup now points to merged mg-scripts. * Updated code to create fake files in a searchable path. * Bugfix * flake8 * Easy updates based on feedback * Removed configuration.json Removed configuration.json. Additional updates based on testing and feedback. * Merged klp_util.py into Step and klp.py based on feedback * Removed test_klp_util.py * Updates based on testing * Updates based on feedback * Changes based on feedback * Add new file * Changes based on feedback * bugfix * Testing updates (#61) * Updates based on testing on qiita-rc * Updates based on feedback * Updates based on feedback * Removed sn_tid_map_by_project as a constructor parameter for Step() * flake8 * Updates based on feedback * Added test for get_special_map() * Added test_get_tube_ids_from_qiita() * Split BaseStepTests into two halves. Tests in BaseStepTests were being inherited by child BaseStepTests and run two more times (one for Amplicon and one for Metagenomic). Partitioning out general tests into a new child class allows all children to inherit configuration, but not inherit tests. * Updates based on feedback * Updates based on feedback * Migrated pipeline calls into a single function Migrated pipeline calls into a single function to preserve their sequence, offer potential alternatives, and include a debug option to not update Qiita. * Adding coveralls to project * bugfix * test coveralls * Updating coveralls * Testing codecov integration * testing codecov * Testing codecov --------- Co-authored-by: Charles Cowart <42684307+charles-cowart@users.noreply.github.com>
This code was originally reviewed and approved as part of the following PRs: #60 #61 Code refactored to be more extensible to new Assay types. Also, monolithic unit-tests that required testing on Qiita-RC were removed. New unittests are more numerous and test smaller self-contained functionality. Qiita is no longer needed for testing; FakeQiita class is now used to emulate canned Qiita API queries. Job submission and SLURM responses are also emulated w/fake binaries.
This code was originally reviewed and approved as part of the following PRs: #60 #61 Code refactored to be more extensible to new Assay types. Also, monolithic unit-tests that required testing on Qiita-RC were removed. New unittests are more numerous and test smaller self-contained functionality. Qiita is no longer needed for testing; FakeQiita class is now used to emulate canned Qiita API queries. Job submission and SLURM responses are also emulated w/fake binaries.
This code was originally reviewed and approved as part of the following PRs: #60 #61 Code refactored to be more extensible to new Assay types. Also, monolithic unit-tests that required testing on Qiita-RC were removed. New unittests are more numerous and test smaller self-contained functionality. Qiita is no longer needed for testing; FakeQiita class is now used to emulate canned Qiita API queries. Job submission and SLURM responses are also emulated w/fake binaries.
Goal is to refactor code so that there is minimal duplication, easily extensible to enable metatranscriptomics and future Assay types, and breakout all of the functionality from the main plugin function so they can be tested properly.