Looking to the Stars
An interview with Alisha Holloway, Director of Bioinformatics at Phylos Bioscience, Portland, Oregon and Assistant Adjunct Professor, Department of Epidemiology and Biostatistics, University of California, San Francisco (UCSF), USA.
I was at graduate school when the human genome sequence was published in 2001, and I decided then to get into data analysis at the whole-genome level. Much as I enjoyed working in the wet laboratory, bioinformatics gives you the opportunity to do experiments very quickly. You can identify a dataset, ask some interesting questions, and get almost immediate answers. It’s like solving a puzzle. In my postdoc I investigated genome evolution in fruit flies, before moving on to medical genomics at Gladstone Institutes and UCSF.
I had been following the legalization of cannabis in many states, and first heard about Phylos at a conference in San Francisco. I was immediately struck by how well their scientific program fit with my research interests, and soon after joined the company to lead the bioinformatics team.
Phylos is a company using genetic tools to help the cannabis industry. For example, one of our products is a DNA sex test, which allows growers to cull unwanted male plants just seven days after germination, saving significant time and money compared to traditional methods. We have also launched an ambitious effort to identify varieties of cannabis and their relationships to each other. What is the genetic diversity between cannabis varieties? Are varieties always what they purport to be? And how has cannabis evolved over time?
You can go to a dispensary and buy ‘Sour Diesel’, but it may not be the plant you think you are getting. That’s important, because different varieties have amazingly different flavor and chemical profiles, which in large part determine the effect on consumers. By better understanding how varieties are related to each other, and what genetic background produces what chemical profile, we can breed varieties that have beautiful flavor profiles or can treat specific medical conditions.
We have analyzed thousands of samples from seed companies, growers and collectors in 80 different countries. Some are nonviable seeds, and some are small pieces of plant stem washed in isopropanol to remove any THC and make them legal to ship. In places where we can’t ship any part of the plant, people can send us DNA.
We extract DNA from samples, and amplify targeted regions of the genome that have a variable single nucleotide polymorphism (SNP) – a variation in a specific position in the genetic code. We analyze differences in these SNPs using computational tools that allow us to determine identity by descent (familial relationships) and variability (which tells growers how much genetic variation they can expect in any offspring). To present this data in an easy-to-understand way, we use principal components analysis to map each variety in 3D space – the Phylos Galaxy (http://galaxy.phylosbioscience.com/).
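For readers curious how principal components analysis can place samples in 3D space, here is a minimal sketch in Python. It is not the Phylos pipeline: the 0/1/2 allele-count coding, the function name `pca_3d` and the toy genotypes are all assumptions made up for illustration.

```python
import numpy as np


def pca_3d(genotypes):
    """Project samples onto their first three principal components.

    genotypes: (n_samples, n_snps) array of alternate-allele counts
    (0, 1 or 2 per SNP). This coding is an assumption for illustration.
    """
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)  # center each SNP column on zero
    # SVD of the centered matrix; u * s gives each sample's coordinates
    # along the principal axes, ordered by variance explained.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    k = min(3, len(s))
    return u[:, :k] * s[:k]


# Toy data (made-up values): six samples, eight SNPs, two loose groups.
toy = np.array([
    [0, 0, 1, 2, 2, 0, 1, 0],
    [0, 1, 1, 2, 2, 0, 1, 0],
    [0, 0, 1, 2, 1, 0, 1, 0],
    [2, 2, 0, 0, 0, 2, 1, 2],
    [2, 1, 0, 0, 0, 2, 1, 2],
    [2, 2, 0, 1, 0, 2, 0, 2],
])
coords = pca_3d(toy)
print(coords.round(2))  # one (x, y, z) row per sample
```

In this toy example the first coordinate separates the two groups of samples, which is the kind of clustering that shows up visually when each row is plotted as a point in 3D.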
Each star in the Galaxy represents a sample from which we have sequenced DNA. Closely related varieties are connected by lines that are based on the identity-by-descent analyses, to indicate a familial relationship or clone. The color of each node is determined by population subdivision – we’ve found evidence for six distinct subpopulations so far, and varieties are color coded by their membership in one or more subpopulations.
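As a rough illustration of how lines between closely related samples might be drawn, the sketch below uses pairwise identity by state (the fraction of alleles two samples share at each SNP). This is a simpler stand-in, not the identity-by-descent analysis described above, and the 0/1/2 genotype coding, the function names and the 0.9 threshold are hypothetical.

```python
import numpy as np


def ibs_similarity(genotypes):
    """Pairwise identity-by-state similarity, scaled to the range 0..1.

    genotypes: (n_samples, n_snps) array of allele counts (0, 1 or 2).
    At each SNP two samples share 2 - |g_i - g_j| alleles.
    """
    g = np.asarray(genotypes, dtype=float)
    diff = np.abs(g[:, None, :] - g[None, :, :])  # (n, n, n_snps)
    return 1.0 - diff.mean(axis=2) / 2.0


def related_pairs(genotypes, threshold=0.9):
    """Return index pairs whose similarity meets a (hypothetical) cutoff."""
    sim = ibs_similarity(genotypes)
    n = sim.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if sim[i, j] >= threshold]


# Toy data (made-up values): samples 0 and 1 are identical, like clones;
# sample 2 differs from them at one SNP; sample 3 is unrelated.
toy_samples = np.array([
    [0, 1, 2, 2, 0, 1],
    [0, 1, 2, 2, 0, 1],
    [0, 1, 2, 1, 0, 1],
    [2, 0, 0, 0, 2, 1],
])
edges = related_pairs(toy_samples)
print(edges)
```

Each returned pair would correspond to one connecting line; identical genotypes score exactly 1.0, which is how a clone would appear under this simplified measure.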
The Galaxy was released to the public in April 2016 and people are having a lot of fun exploring related varieties and the clusters that form. Growers are using their genetic reports as a marketing tool, to promote the uniqueness of their strains, and breeders are using the information to decide which varieties to cross. It’s useful on a lot of levels, and it’s pretty satisfying for us to reach so many people with a cool, interactive tool.
We’re trying to be very open about the data that we have – everyone can see every variety that’s on the Galaxy, and we’re also releasing all of the data (with permission) through the Open Cannabis Project (http://opencannabisproject.org/). There is a lot of work to be done in the field of cannabis genetics and genomics, and the more data we can release in an open format, the more questions researchers can ask and answer. Every time we add a sample to the Galaxy it’s a new piece of information – a data point that no-one has had before.
Perhaps surprisingly, a lot of the anecdotal stories about the origins of various strains are really panning out. Growers are often very knowledgeable about the pedigree of their plants and can accurately identify the variety and its parent varieties. As genotyping becomes more widespread, one positive result is likely to be greater consistency and accuracy in naming of varieties – we’ll no longer see ten names for the same variety. That’s not to say that growers can’t distinguish their product from others of the same variety, as even cloned plants can have significant differences when grown under different conditions.
We have a lot of support in the local community and the industry as a whole. Breeders and growers are excited to learn how varieties are related and what kind of genetic variation there is in their plants. Often, they have been crossing varieties to create new strains and are fascinated to find out how genetically different or similar the parent plants are, and how that is reflected in the offspring.
Ultimately, to really understand the plant we need genomics, genetics and chemical analysis. If we can combine genetic information with chemical profiles and other characteristics, we’ll be able to develop markers for different traits, and so improve the plant – not just in flavor and chemical profile but in ease of cultivation.
Sure, I get a few giggles and jokes when I tell people I’m researching cannabis but, on the whole, they are grateful that real science is being done to understand the plant and how we can use it to our best advantage.