This section defines the criteria and thresholds used to determine whether a sequence proceeds further in the analysis or is excluded at this stage.
Knockout Criteria
If any of the knockout conditions is met, the sequence will be excluded, and the program will stop processing it. These criteria ensure that sequences not meeting basic requirements are filtered out early.
- GC Knockout (%): Specify the minimum GC content percentage. If the sequence's GC content falls below this value, it will be excluded from further analysis.
Example: Enter 10 to exclude sequences with GC content below 10%.
- Length Knockout (bp): Define the minimum sequence length in base pairs (bp). If the sequence is shorter than this value, it will not proceed further.
Example: Enter 2000 to exclude sequences shorter than 2000 bp.
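The knockout logic above can be sketched as follows. This is a minimal illustration, not the program's actual code: the function names `gc_percent` and `is_knocked_out` and their parameter names are hypothetical, and the defaults reuse the example values 10 and 2000.

```python
def gc_percent(seq: str) -> float:
    """Return GC content as a percentage of the sequence length."""
    if not seq:
        return 0.0
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def is_knocked_out(seq: str, gc_knockout: float = 10.0,
                   length_knockout: int = 2000) -> bool:
    """Return True if ANY knockout condition is met (hypothetical helper)."""
    too_short = len(seq) < length_knockout      # Length Knockout (bp)
    low_gc = gc_percent(seq) < gc_knockout      # GC Knockout (%)
    return too_short or low_gc
```

A single failed condition is enough to exclude the sequence, which is why the two checks are combined with `or`.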
Validation Thresholds
Validation halts only when every threshold indicates the absence of an OriC (origin of replication). A sequence that fails only some of the thresholds continues, so a sequence is excluded only when all checks agree.
- Min. GC Content Threshold (%): Set the minimum acceptable GC content percentage for validation. Sequences with GC content below this value are flagged.
Example: Enter 30 to flag sequences with GC content below 30%.
- Entropy Threshold: Define the minimum entropy value. Entropy represents the complexity of the sequence; lower values indicate less complexity, which might signal an invalid sequence.
Example: Enter 1.8 to flag sequences with entropy below 1.8.
- GC Skew Threshold: Specify the maximum allowable GC skew value. GC skew measures the imbalance between guanine and cytosine nucleotides in the sequence, commonly computed as (G - C) / (G + C).
Example: Enter 0.05 to flag sequences with a GC skew above 0.05.
- Variance Threshold: Enter the maximum allowable variance value for sequence validation. Variance helps identify significant fluctuations in the data.
Example: Enter 1,000,000 to flag sequences with variance exceeding this threshold.
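The way the four thresholds combine can be sketched as follows. This is an illustration under stated assumptions, not the program's implementation: all function and parameter names are hypothetical, entropy is assumed to be Shannon entropy in bits per base (0 to 2 for A/C/G/T), GC skew is assumed to be the signed value (G - C) / (G + C), and variance is passed in precomputed because this section does not say what it is calculated over.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy in bits per base (assumed definition of 'entropy')."""
    if not seq:
        return 0.0
    n = len(seq)
    counts = Counter(seq.upper())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def gc_skew(seq: str) -> float:
    """Signed GC skew, (G - C) / (G + C); 0.0 if the sequence has no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def failed_checks(gc, entropy, skew, variance,
                  min_gc=30.0, min_entropy=1.8,
                  max_gc_skew=0.05, max_variance=1_000_000.0):
    """Return the names of the thresholds that flag the sequence."""
    flags = {
        "gc": gc < min_gc,                  # below minimum GC content
        "entropy": entropy < min_entropy,   # below minimum entropy
        "gc_skew": skew > max_gc_skew,      # above maximum GC skew
        "variance": variance > max_variance,  # above maximum variance
    }
    return [name for name, flagged in flags.items() if flagged]


def should_halt(gc, entropy, skew, variance, **thresholds) -> bool:
    """Halt only when ALL four thresholds flag the sequence."""
    return len(failed_checks(gc, entropy, skew, variance, **thresholds)) == 4
```

Unlike the knockout criteria, a single flagged threshold does not stop validation here; `should_halt` returns True only when every check fails.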