Sequencing experimental design

I went through an exercise this weekend, trying to list the questions I ask myself when designing a sequencing-based experiment to describe microbial communities. I find myself in many collaborations where I have to work through these questions with investigators who have little to no experience with sequencing, its opportunities, and its limitations.

I thought I’d put the list here for some feedback from the universe.

Should you even perform/use/initiate sequencing?

  • What is your research question?
  • What is your budget?
  • What are your resources? Samples? Computational?
  • Do you have a hypothesis – or what do you expect to see? Do you have previous evidence to suggest your expectations?
  • What kind of data do you have already or plan on getting?
  • What kind of data outputs do you expect? need? want? have?
  • Are there datasets that already exist and can answer your questions?
  • How many treatments / gradients are being compared?

What kind of / how much sequencing do you need?

  • Do you want to characterize differences or identify significant differences? How many replicates do you minimally need?
  • Do you have appropriate positive / negative controls? (Thanks @markstenglein)
  • Are you trying to identify some specific genes? How much do you know about what you are looking for? How much is known in general?
  • Once you get this data, are you prepared for the analysis?
  • How much does the quality of the data matter – how much resolution do you need?
  • How specific do you need to be? (Do you need to identify mobile genetic elements and their host species? Or carbon metabolism and phyla? Do you need to identify strain variation?)
  • How much do you need to sample? (e.g., do you need excellent characterization of the 10% most abundant organisms, or decent characterization of 90% of organisms?)
  • Do you have a good reference database? Or do you need to develop one? Is this reference database applicable to the samples you are studying?
  • If I describe every gene in your sample, how much will you actually use?
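
To put rough numbers on the "how much do you need to sample" question, here is a minimal back-of-the-envelope sketch in Python. It is not a method from this post: it assumes that reads are drawn from each organism in proportion to its relative abundance and that mean coverage is simply (reads × read length) / genome size. The function name `reads_needed` and the example values (a 5 Mb genome, 150 bp reads, 10x target coverage) are hypothetical and chosen only for illustration. Real communities will deviate from this because of extraction bias, uneven coverage, and variable genome sizes.

```python
import math


def reads_needed(target_coverage, genome_size_bp, relative_abundance, read_length_bp):
    """Estimate total reads needed so one organism reaches a mean coverage.

    Assumes (illustration only) that the fraction of reads from an organism
    equals its relative abundance, and that
    coverage = reads_from_organism * read_length / genome_size.
    """
    if not 0 < relative_abundance <= 1:
        raise ValueError("relative_abundance must be in (0, 1]")
    if min(target_coverage, genome_size_bp, read_length_bp) <= 0:
        raise ValueError("coverage, genome size, and read length must be positive")
    # Reads that must come from this organism, scaled up by its rarity.
    total = target_coverage * genome_size_bp / (relative_abundance * read_length_bp)
    return math.ceil(total)


if __name__ == "__main__":
    # Hypothetical scenario: 5 Mb genomes, 150 bp reads, 10x target coverage.
    for abundance in (0.10, 0.01, 0.001):
        n = reads_needed(10, 5_000_000, abundance, 150)
        print(f"relative abundance {abundance:.3f}: about {n:,} reads")
```

Under these assumptions the required depth scales inversely with abundance, so each tenfold drop in an organism's abundance costs roughly ten times more sequencing. That is one way to frame the trade-off between excellent characterization of the abundant organisms and decent characterization of nearly everything.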

What kind of collaborator are you / are you looking for?

  • Do you want a collaborator who helps you understand the biological question? Or one who assists with data analysis?

How happy will you be if:

  • I gave you just the raw sequencing files
  • I gave you an assembly of partial genes? Of whole genomes?
  • I gave you a species/function-abundance matrix
  • Only 10, 30, 50, or 80% of your sequencing reads could be identified
  • Sequences were identified as significantly different but we had little idea what they are
  • I told you who is there
  • I told you who is there and what they are doing
  • I developed a reference that is more specific to your system
  • The analysis took 1, 2, 3, or more than 6 months
  • All your data and the analysis were openly accessible

I’d love to get feedback on what kinds of questions others are thinking of.

Cheers, Adina
