I am trying to create a program for my biology research.
I sequence is required to take:
NNNNNNNNNNCCNNAGTGNGNACAGACGACGGGCCCTGGCCCCTCGCACACCCTGGACCA AGTCAATCGCACCCACTTCCCTTTCTTCTCGGATGTCAAGGGCGACCACCGGTTGGTGTT GAGCGTCGTGGAGACCACCGTTCTGGGGCTCATCTTTGTCGTCTCACTGCTGGGCAACGT GTGTGCTCTAGTGCTGGTGGCGCGCCGTCGGCGCCGTGGGGCGACAGCCAGCCTGGTGCT CAACCTCTTCTGCGCGGATTTGCTCTTCACCAGCGCCATCCCTCTAGTGCTCGTCGTGCG CTGGACTGAGGCCTGGCTGTTGGGGCCCGTCGTCTGCCACCTGCTCTTCTACGTGATGAC AATGAGCGGCAGCGTCACGATCCTCACACTGGCCGCGGTCAGCCTGGAGCGCATGGTGTG CATCGTGCGCCTCCGGCGCGGCTTGAGCGGCCCGGGGCGGCGGACTCAGGCGGCACTGCT GGCTTTCATATGGGGTTACTCGGCGCTCGCCGCGCTGCCCCTCTGCATCTTGTTCCGCGT GGTCCCGCAGCGCCTTCCCGGCGGGGACCAGGAAATTCCGATTTGCACATTGGATTGGCC CAACCGCATAGGAGAAATCTCATGGGATGTGTTTTTTGTGACTTTGAACTTCCTGGTGCC GGGACTGGTCATTGTGATCAGTTACTCCAAAATTTTACAGATCACGAAAGCATCGCGGAA GAGGCTTACGCTGAGCTTGGCATACTCTGAGAGCCACCAGATCCGAGTGTCCCAACAA GA CTACC GACTCTTCCGCACGCTCTTCCTGCTCATGGTTTCCTTCTTCATCATGTGGAGTCC CATCATCATCACCATCCTCNCATCTTGATCCAAAACTTCCGGCAGGACCTGGNCATCTGG NCATCCCTTTTCTTCTGGGNNGTNNNNNCACGTTGCNACTCTNCCTAAANCCCATACTGT ANNANATGNCGCTNNNAGGAANGAATGGAGGAANANTTTTTGNNNNNNNNN
... and the beginning and end of last week en first N removes everything. In other words, it's something to look like this:
ACAGACGACGGGCCCTGGCCCCTCGCACACCCTGGACCA AGTCAATCGCACCCACTTCCCTTTCTTCTCGGATGTCAAGGGCGACCACCGGTTGGTGTT GAGCGTCGTGGAGACCACCGTTCTGGGGCTCATCTTTGTCGTCTCACTGCTGGGCAACGT GTGTGCTCTAGTGCTGGTGGCGCGCCGTCGGCGCCGTGGGGCGACAGCCAGCCTGGTGCT CAACCTCTTCTGCGCGGATTTGCTCTTCACCAGCGCCATCCCTCTAGTGCTCGTCGTGCG CTGGACTGAGGCCTGGCTGTTGGGGCCCGTCGTCTGCCACCTGCTCTTCTACGTGATGAC AATGAGCGGCAGCGTCACGATCCTCACACTGGCCGCGGTCAGCCTGGAGCGCATGGTGTG CATCGTGCGCCTCCGGCGCGGCTTGAGCGGCCCGGGGCGGCGGACTCAGGCGGCACTGCT GGCTTTCATATGGGGTTACTCGGCGCTCGCCGCGCTGCCCCTCTGCATCTTGTTCCGCGT GGTCCCGCAGCGCCTTCCCGGCGGGGACCAGGAAATTCCGATTTGCACATTGGATTGGCC CAACCGCATAGGAGAAATCTCATGGGATGTGTTTTTTGTGACTTTGAACTTCCTGGTGCC GGGACTGGTCATTGTGATCAGTTACTCCAAAATTTTACAGATCACGAAAGCATCGCGGAA GAGGCTTACGCTGAGCTTGGCATACTCTGAGAGCCACCAGATCCGAGTGTCCCAACAAGA C TACCGACTCTTCCGCACGCTCTTCCTGCTCATGGTTTCCTTCTTCATCATGTGGAGTCC CATCATCATCACCATCCTC
How do I do this?
I think you can see the longest sequence of non-N characters in these inputs.
Otherwise, you do not have any rule to separate the previous N
from the first N
in the suffix. Not entirely different about N
, you start (before ACAGAC ...
) and the next N
( GN
) that he chooses the longest sequence, in fact, at the very beginning 10 n Compared to and at the end 9, nothing seems to be anything special about N. .
The easiest way to do this is to catch only the all views and stay the longest:
max (s.split ('N'), key = lane) If you have some additional rules on top of it - for example, the longest sequence whose length is divisible by three (which The same thing is the case) -You can do the same basic thing: max (sql sql ('n') for seq if lane (seq)% 3 == 0), key = lane)
Comments
Post a Comment