python - How to cut up a string in a specific way, retaining what is useful? -


I am trying to create a program for my biology research.

I sequence is required to take:

  NNNNNNNNNNCCNNAGTGNGNACAGACGACGGGCCCTGGCCCCTCGCACACCCTGGACCA AGTCAATCGCACCCACTTCCCTTTCTTCTCGGATGTCAAGGGCGACCACCGGTTGGTGTT GAGCGTCGTGGAGACCACCGTTCTGGGGCTCATCTTTGTCGTCTCACTGCTGGGCAACGT GTGTGCTCTAGTGCTGGTGGCGCGCCGTCGGCGCCGTGGGGCGACAGCCAGCCTGGTGCT CAACCTCTTCTGCGCGGATTTGCTCTTCACCAGCGCCATCCCTCTAGTGCTCGTCGTGCG CTGGACTGAGGCCTGGCTGTTGGGGCCCGTCGTCTGCCACCTGCTCTTCTACGTGATGAC AATGAGCGGCAGCGTCACGATCCTCACACTGGCCGCGGTCAGCCTGGAGCGCATGGTGTG CATCGTGCGCCTCCGGCGCGGCTTGAGCGGCCCGGGGCGGCGGACTCAGGCGGCACTGCT GGCTTTCATATGGGGTTACTCGGCGCTCGCCGCGCTGCCCCTCTGCATCTTGTTCCGCGT GGTCCCGCAGCGCCTTCCCGGCGGGGACCAGGAAATTCCGATTTGCACATTGGATTGGCC CAACCGCATAGGAGAAATCTCATGGGATGTGTTTTTTGTGACTTTGAACTTCCTGGTGCC GGGACTGGTCATTGTGATCAGTTACTCCAAAATTTTACAGATCACGAAAGCATCGCGGAA GAGGCTTACGCTGAGCTTGGCATACTCTGAGAGCCACCAGATCCGAGTGTCCCAACAA GA CTACC GACTCTTCCGCACGCTCTTCCTGCTCATGGTTTCCTTCTTCATCATGTGGAGTCC CATCATCATCACCATCCTCNCATCTTGATCCAAAACTTCCGGCAGGACCTGGNCATCTGG NCATCCCTTTTCTTCTGGGNNGTNNNNNCACGTTGCNACTCTNCCTAAANCCCATACTGT ANNANATGNCGCTNNNAGGAANGAATGGAGGAANANTTTTTGNNNNNNNNN  

... and the beginning and end of last week en first N removes everything. In other words, it's something to look like this:

  ACAGACGACGGGCCCTGGCCCCTCGCACACCCTGGACCA AGTCAATCGCACCCACTTCCCTTTCTTCTCGGATGTCAAGGGCGACCACCGGTTGGTGTT GAGCGTCGTGGAGACCACCGTTCTGGGGCTCATCTTTGTCGTCTCACTGCTGGGCAACGT GTGTGCTCTAGTGCTGGTGGCGCGCCGTCGGCGCCGTGGGGCGACAGCCAGCCTGGTGCT CAACCTCTTCTGCGCGGATTTGCTCTTCACCAGCGCCATCCCTCTAGTGCTCGTCGTGCG CTGGACTGAGGCCTGGCTGTTGGGGCCCGTCGTCTGCCACCTGCTCTTCTACGTGATGAC AATGAGCGGCAGCGTCACGATCCTCACACTGGCCGCGGTCAGCCTGGAGCGCATGGTGTG CATCGTGCGCCTCCGGCGCGGCTTGAGCGGCCCGGGGCGGCGGACTCAGGCGGCACTGCT GGCTTTCATATGGGGTTACTCGGCGCTCGCCGCGCTGCCCCTCTGCATCTTGTTCCGCGT GGTCCCGCAGCGCCTTCCCGGCGGGGACCAGGAAATTCCGATTTGCACATTGGATTGGCC CAACCGCATAGGAGAAATCTCATGGGATGTGTTTTTTGTGACTTTGAACTTCCTGGTGCC GGGACTGGTCATTGTGATCAGTTACTCCAAAATTTTACAGATCACGAAAGCATCGCGGAA GAGGCTTACGCTGAGCTTGGCATACTCTGAGAGCCACCAGATCCGAGTGTCCCAACAAGA C TACCGACTCTTCCGCACGCTCTTCCTGCTCATGGTTTCCTTCTTCATCATGTGGAGTCC CATCATCATCACCATCCTC  

How do I do this?

I think you can see the longest sequence of non-N characters in these inputs.

Otherwise, you do not have any rule to separate the previous N from the first N in the suffix. Not entirely different about N , you start (before ACAGAC ... ) and the next N (), or, for that matter, except for the previous one ( GN ) that he chooses the longest sequence, in fact, at the very beginning 10 n Compared to and at the end 9, nothing seems to be anything special about N. .

The easiest way to do this is to catch only the all views and stay the longest:

  max (s.split ('N'), key = lane) If you have some additional rules on top of it - for example, the longest sequence whose length is divisible by three (which The same thing is the case) -You can do the same basic thing:  
  max (sql sql ('n') for seq if lane (seq)% 3 == 0), key = lane)  

Comments