A TEXT BOOK ON COMPUTATIONAL MOLECULAR BIOLOGY- BY ZAHOORULLAH S MD: UNIT VI: TAXONOMY AND PHYLOGENY

Basic concepts in systematics

There is an amazing diversity of life, both living and extinct. For biologists to communicate with each other about these many organisms, there must also be a classification of these organisms into groups. Ideally, the classification should be meaningful, and not arbitrary — it should be based on the evolutionary history of life, such that it predicts properties of newly discovered or poorly known organisms.

Classification, however, is only one aspect of the much larger field of phylogenetic systematics. Systematics is an attempt to understand the evolutionary interrelationships of living things, trying to interpret the way in which life has diversified and changed over time. While classification is primarily the creation of names for groups, systematics goes beyond this to elucidate new theories of the mechanisms of evolution.

Systematics, then, is the study of the pattern of relationships among taxa; it is no less than understanding the history of all life. But history is not something we can see. It has happened once and leaves only clues as to the actual events. Biologists in general and systematists in particular use these clues to build hypotheses or models of the history. We hope to convince you that only with a hypothesis of history can we truly discuss evolution.

The everlasting words of Father Jacobus (from Hesse's Magister Ludi):

“To study history one must know in advance that one is attempting something fundamentally impossible, yet necessary and highly important. To study history means submitting to chaos and nevertheless retaining faith in order and meaning. It is a very serious task, young man, and possibly a tragic one.”

Systematics

Biological systematics is the study of the diversification of living forms, both past and present, and the relationships among living things through time. Relationships are visualized as evolutionary trees. Phylogenies have two components, branching order and branch length. Phylogenetic trees of species and higher taxa are used to study the evolution of traits (e.g., anatomical or molecular characteristics) and the distribution of organisms (biogeography). Systematics, in other words, is used to understand the evolutionary history of life on Earth.

Cladistics

Cladistics is a particular method of hypothesizing relationships among organisms. Like other methods, it has its own set of assumptions, procedures, and limitations. Cladistics is now widely accepted as the preferred method for phylogenetic analysis, because it provides an explicit and testable hypothesis of organismal relationships.

The basic idea behind cladistics is that members of a group share a common evolutionary history, and are "closely related," more so to members of the same group than to other organisms. These groups are recognized by sharing unique features which were not present in distant ancestors. These shared derived characteristics are called synapomorphies.

Note that it is not enough for organisms to share characteristics; in fact, two organisms may share a great many characteristics and not be considered members of the same group. For example, consider a jellyfish, a starfish, and a human; which two are most closely related? The jellyfish and starfish both live in the water, have radial symmetry, and are invertebrates, so you might suppose that they belong together in a group. This would not reflect evolutionary relationships, however, since the starfish and human are actually more closely related. It is not just the presence of shared characteristics which is important, but the presence of shared derived characteristics. In the example above, all three characteristics are believed to have been present in the common ancestor of all animals, and so are trivial for determining relationships, since all three organisms in question belong to the group "animals." While humans are different from the other two organisms, they differ only in characteristics which arose newly in an ancestor which is not shared with the other two. Choosing the right characters is one of the most important steps in a cladistic analysis.
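The difference between shared characters and shared derived characters can be sketched in a few lines of Python. The character matrix and the assumed ancestral states below are illustrative assumptions chosen for this example; they are not data from the text, and the function names are hypothetical.

```python
# Hypothetical character matrix: 1 = state present, 0 = state absent.
CHARACTERS = {
    "aquatic":                  {"jellyfish": 1, "starfish": 1, "human": 0},
    "no_backbone":              {"jellyfish": 1, "starfish": 1, "human": 0},
    "three_germ_layers":        {"jellyfish": 0, "starfish": 1, "human": 1},
    "deuterostome_development": {"jellyfish": 0, "starfish": 1, "human": 1},
}

# Assumed state of each character in the common ancestor of all three taxa.
ANCESTRAL = {
    "aquatic": 1,
    "no_backbone": 1,
    "three_germ_layers": 0,
    "deuterostome_development": 0,
}

def shared_characters(taxon_a, taxon_b):
    """Characters on which two taxa agree, whatever the ancestral state was."""
    return sorted(n for n, s in CHARACTERS.items() if s[taxon_a] == s[taxon_b])

def synapomorphies(taxon_a, taxon_b):
    """Characters on which two taxa share a state that differs from the ancestral one."""
    return sorted(
        n for n, s in CHARACTERS.items()
        if s[taxon_a] == s[taxon_b] != ANCESTRAL[n]
    )

print(shared_characters("jellyfish", "starfish"))  # agreement alone
print(synapomorphies("jellyfish", "starfish"))     # shared derived states only
print(synapomorphies("starfish", "human"))
```

Under these assumptions the jellyfish and starfish agree on several characters but share no derived state, while the starfish and human share two, which is the pattern the paragraph above describes.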

What assumptions do cladists make?

There are three basic assumptions in cladistics:

Any group of organisms is related by descent from a common ancestor.
There is a bifurcating pattern of cladogenesis.
Change in characteristics occurs in lineages over time.

The first assumption is a general assumption made for all evolutionary biology. It essentially means that life arose on earth only once, and therefore all organisms are related in some way or other. Because of this, we can take any collection of organisms and determine a meaningful pattern of relationships, provided we have the right kind of information. Again, the assumption states that all the diversity of life on earth has been produced through the reproduction of existing organisms.

The second assumption is perhaps the most controversial; that is, that new kinds of organisms may arise when existing species or populations divide into exactly two groups. There are many biologists who hold that multiple new lineages can arise from a single originating population at the same time, or near enough in time to be indistinguishable from such an event. While this model could conceivably occur, it is not currently known how often this has actually happened. The other objection raised against this assumption is the possibility of interbreeding between distinct groups. This, however, is a general problem of reconstructing evolutionary history, and although it cannot currently be handled well by cladistic methods, no other system has yet been devised which accounts for it.

The final assumption, that characteristics of organisms change over time, is the most important assumption in cladistics. It is only when characteristics change that we are able to recognize different lineages or groups. The convention is to call the "original" state of the characteristic plesiomorphic and the "changed" state apomorphic. The terms "primitive" and "derived" have also been used for these states, but they are often avoided by cladists, since those terms have been much abused in the past.

Methodology of a Cladistic Analysis

HOW TO CONSTRUCT CLADOGRAMS

Here is an outline of the steps necessary for completing a cladistic analysis. Don't be fooled, however, by the simplicity of these steps. Seeing a real cladistic analysis out to fruition can be a difficult and time consuming task.

Choose the taxa whose evolutionary relationships interest you. These taxa must be clades if you hope to come up with plausible results.

Determine the characters (features of the organisms) and examine each taxon to determine the character states (decide whether each taxon does or does not have each character). All taxa must be unique.

Determine the polarity of characters (whether each character state is original or derived in each taxon). Note that this step is not absolutely necessary in some computer algorithms. Examining the character states in outgroups to the taxa you are considering helps you determine the polarity.
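Outgroup comparison can be sketched as follows. This is a minimal illustration that assumes the state found in the outgroup is the original one; the function name `polarize` and the example character are hypothetical.

```python
def polarize(ingroup_states, outgroup_state):
    """Label each ingroup taxon's state as plesiomorphic or apomorphic.

    Assumption: the state observed in the outgroup is the original state.
    """
    return {
        taxon: "plesiomorphic" if state == outgroup_state else "apomorphic"
        for taxon, state in ingroup_states.items()
    }

# Illustrative character: four limbs, with the lamprey as the outgroup.
four_limbs = {"shark": "absent", "frog": "present", "human": "present"}
polarity = polarize(four_limbs, outgroup_state="absent")

for taxon, label in polarity.items():
    print(taxon, label)
```

Here the frog and human carry the derived state, so under this assumption four limbs would count as a candidate synapomorphy uniting them.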

Group taxa by synapomorphies (shared derived characteristics) not plesiomorphies (original, or "primitive", characteristics).

Work out conflicts that arise by some clearly stated method, usually parsimony (minimizing the number of conflicts).

Build your cladogram, which is NOT an evolutionary tree, following these rules:

All taxa go on the endpoints of the cladogram, never at nodes.

All cladogram nodes must have a list of synapomorphies which are common to all taxa above the node (unless the character is later modified).

All synapomorphies appear on the cladogram only once unless the character state was derived separately by evolutionary parallelism.

To accomplish the task of creating a good cladogram, you must use your judgement. Ask yourself the following questions and answer them carefully.

Could a supposed synapomorphy be the result of independent evolutionary development?

Are your characters chosen well?

Should you consider other characters?

Should you consider additional taxa?
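The grouping and conflict-checking steps outlined above can be sketched in code. This is not a full parsimony search; it only lists the group each derived character implies and flags pairs of characters whose groups cannot both be clades. The matrix is an illustrative assumption with polarity already decided, and the function names are hypothetical.

```python
from itertools import combinations

# Illustrative matrix: 0 = plesiomorphic state, 1 = apomorphic state.
MATRIX = {
    "lamprey": {"jaws": 0, "four_limbs": 0, "amnion": 0, "hair": 0},
    "shark":   {"jaws": 1, "four_limbs": 0, "amnion": 0, "hair": 0},
    "frog":    {"jaws": 1, "four_limbs": 1, "amnion": 0, "hair": 0},
    "lizard":  {"jaws": 1, "four_limbs": 1, "amnion": 1, "hair": 0},
    "mouse":   {"jaws": 1, "four_limbs": 1, "amnion": 1, "hair": 1},
}

def candidate_clades(matrix):
    """Map each character to the set of taxa that show its derived state."""
    characters = next(iter(matrix.values())).keys()
    return {
        c: frozenset(t for t, states in matrix.items() if states[c] == 1)
        for c in characters
    }

def conflicts(clades):
    """Pairs of characters whose groups overlap without one containing the other."""
    bad = []
    for (c1, g1), (c2, g2) in combinations(clades.items(), 2):
        if g1 & g2 and not (g1 <= g2 or g2 <= g1):
            bad.append((c1, c2))
    return bad

clades = candidate_clades(MATRIX)
# Print the groups from most to least inclusive, i.e. from the base of the cladogram up.
for name, group in sorted(clades.items(), key=lambda item: -len(item[1])):
    print(f"{name}: {sorted(group)}")
print("conflicting character pairs:", conflicts(clades))
```

When no conflicts are reported, the nested groups can be drawn directly as a cladogram. When conflicts appear, a stated criterion such as parsimony is needed to choose among the competing groupings.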

Implications of Cladistics

The output from a phylogenetic analysis is a hypothesis of relationship of different taxa. This hypothesis can be represented as a cladogram, a branching diagram. Cladograms bear a lot in common with the notion of family trees. In a family tree we trace back our ancestry. For example, in the family tree on the right, the ancestors of all the rest of the family are the initial black dot and yellow square. These ancestors give rise to three children, one of which mates and has two children. We can all trace our lineages back to one set of ancestors.

All species have ancestors too. So, for example, sometime in the past an ancestral species (father) of Homo sapiens walked the earth. This ancestor went extinct (died), but left descendent species (children).

In family trees, we can talk coherently about real ancestors. In biology, the ancestors are often gone, sometimes without a trace. All we have left are the children. Reading cladograms is much like reading a family tree. Both are rich in information. Cladograms, like family trees, tell the pattern of ancestry and descent. Unlike family trees, ancestors in cladistics ideally give rise to only two descendant species. Also unlike family trees, new species form from the splitting of old species. In speciation, it does not take two to tango. The formation of the two descendant species is called a splitting event. The ancestor is usually assumed to "die" after the splitting event.

In the first tree, labelled Cladogram A, notice the small circles. These mark the nodes of the tree. The stems of the tree end with the taxa under consideration. At each node a splitting event occurs. The node therefore represents the end of the ancestral taxon, and the stems represent the species that split from the ancestor. The two taxa that split from the node are called sister taxa. They are called sister taxa because they are like siblings descended from the same parent or ancestor. Sister taxa must be more closely related to one another than to any other group because they share a close common ancestor. In the same way, you are more closely related to your siblings than to anyone else since you share common parents. Let's focus on node C in Cladogram A. At the node, the ancestor goes extinct but leaves two siblings hypothesized to be humans and gorillas. Humans and gorillas are sister taxa and are more closely related to one another than either is to chimpanzees or baboons.

Working down the tree we come to node B. At this node the ancestor of the humans and gorillas split from the chimpanzees. Therefore the chimpanzees' sister taxon is the human/gorilla ancestor. A sister taxon can be an ancestor and all its descendants. We call an ancestor plus all its descendants a clade. A cladogram shows us hypothesized clades.

Finally we come to node A. Here, we find the splitting event that led to the baboons and the ancestor to the chimpanzees, humans and gorillas. By working our way down the cladogram we have learned the pattern of splitting. We have found out that chimpanzees, humans and gorillas are more closely related to each other than to baboons. In this example, baboons are the outgroup.
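The reading of Cladogram A given above can be reproduced with a small sketch. The nested-pair representation and the function names are assumptions made for illustration; the branching order itself follows the description of nodes A, B, and C.

```python
# Cladogram A as described in the text, written as nested pairs.
CLADOGRAM_A = ("baboon", ("chimpanzee", ("human", "gorilla")))

def leaves(node):
    """Return the set of taxa at the tips below a node."""
    if isinstance(node, str):
        return {node}
    left, right = node
    return leaves(left) | leaves(right)

def clades(node):
    """List every clade with more than one taxon, from the root upward."""
    if isinstance(node, str):
        return []
    left, right = node
    return [leaves(node)] + clades(left) + clades(right)

def sister(node, taxon):
    """Return the tips of the sister group of a taxon, or None if it is absent."""
    if isinstance(node, str):
        return None
    left, right = node
    if left == taxon:
        return leaves(right)
    if right == taxon:
        return leaves(left)
    return sister(left, taxon) or sister(right, taxon)

print(sister(CLADOGRAM_A, "human"))
print(sister(CLADOGRAM_A, "chimpanzee"))
for clade in clades(CLADOGRAM_A):
    print(sorted(clade))
```

Swapping the tip names in the nested pairs is enough to express a different hypothesis of relationships and see how the sister groups and clades change.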

Now, how in the world did we manufacture Cladogram A? We mentioned that it was a hypothesis. What if we chose another hypothesis, like Cladogram B or Cladogram C? We would change the pattern of speciation events. In Cladogram B, humans and chimpanzees are sister taxa, and in Cladogram C, chimps and gorillas are sister taxa.

Which of the three cladograms presented above is correct? None of the cladograms can be proved correct, but Cladogram B is the best supported of the three based on character data and is therefore hypothesized to best reflect the true branching pattern.

Manufacturing cladograms which show hypotheses of ancestry and descent requires that we analyze characters and find those characters that unite clades.

Cladistics is useful for creating systems of classification.

Cladistics is now the most commonly used method to classify organisms. Why do we need to classify organisms? Well, consider the bewildering variety of organisms that have ever lived on Earth, from jellyfish to bacteria; studying that variety is what paleontologists do for a living. How is it possible that paleontologists, let alone other biologists, are able to communicate their ideas about such a diverse topic as the history of life? Well, it's obvious that a system of classification is needed. That is, we need words like beetle or conifer so that we can talk about many organisms at one time. In fact, the history of formal classification schemes in biology is long, dating from the 1700s, well before Darwin proposed his theory of natural selection. Today, cladistics is the method of choice for classifying life because it recognizes and employs evolutionary theory.

Cladistics predicts the properties of organisms.

As with any other system in science, a model is most useful when it not only describes what has been observed, but when it predicts that which has not yet been observed. Cladistics produces hypotheses about the relationships of organisms in a way that, unlike other systems, predicts properties of the organisms. This can be especially important in cases when particular genes or biological compounds are being sought. Such genes and compounds are being sought all the time by companies interested in improving crop yield or disease resistance, and in the search for medicines. Only an hypothesis based on evolutionary theory, such as cladistic hypotheses, can be used for these endeavours.

Cladistics helps to elucidate mechanisms of evolution.

Unlike previous systems of analyzing relationships, cladistics is explicitly evolutionary. Because of this it is possible to examine the way in which characters change within groups over time: the direction in which characters change, and the relative frequency with which they change. It is also possible to compare the descendants of a single ancestor to look at patterns of origin and extinction in these groups, or to look at relative size and diversity of the groups. Perhaps the most important feature of cladistics is its use in testing long-standing hypotheses about adaptation. For many years, since even before Darwin, it has been popular to tell "stories" about how certain traits of organisms came to be. With cladistics, it is possible to determine whether these stories have merit, or whether they should be abandoned in favor of a competing hypothesis. For instance, it was long said that the orb-weaving spiders, with their intricate and orderly webs, had evolved from spiders with cobweb-like webs. The cladistic analysis of these spiders showed that, in fact, orb-weaving was the primitive state, and that cobweb-weaving had evolved from spiders with more orderly webs. This situation has been repeated in many groups with many traits, including studies of parasitism, geographic distribution, and pollination.

Taxonomy and Phylogeny

Taxonomy is the theoretical study of classification and the principles, procedures and rules thereof. Essentially, taxonomy deals with the ways in which we group living things together. Phylogeny refers to evolutionary history.

Taxonomy has a long history, with Aristotle giving the first detailed classification of living things. His classification of animals was:

§ Blooded (vertebrates)

    § Viviparous quadrupeds (land mammals)

    § Birds

    § Oviparous quadrupeds (reptiles and amphibians)

    § Fish

    § Cetaceans (Aristotle did not realize their mammalian nature)

§ Bloodless (invertebrates)

    § Land arthropods (insects, arachnids, myriapods)

    § Aquatic arthropods (mostly crustaceans)

    § Shelled animals (shelled mollusks, echinoderms, etc.)

    § Soft animals (cephalopods, etc.)

    § Plant-animals (cnidarians, etc., which superficially resemble plants)

However, he made no effort to classify plants or fungi. Modern approaches to taxonomy, while obviously more diverse than in Aristotle's time, can be lumped into three major schools: phenetic, phylogenetic (cladistic), and evolutionary.

Phenetic Taxonomy

Phenetics is an approach to grouping organisms based on total (or "raw") similarity. Although its history dates back centuries to the French botanist Michel Adanson, phenetics underwent something of a renaissance in the 60's, 70's and early 80's in response to a growing dissatisfaction with what its practitioners viewed as the arbitrary and nonquantitative approaches that rose to prominence in the 1950's. Particularly troubling was the evolutionary taxonomy of Ernst Mayr, George Gaylord Simpson and others (described more fully below) that appeared to be more "art" than science.

On this basis, rigorous computationally-driven clustering methods were developed to combat these perceived problems. Information about organisms would be gathered, fed into computers, and out would come hierarchical arrangements of organisms based on overall similarity, typically arranged in a "tree" of sorts called a phenogram.

It is important to draw a distinction between phenetics as an approach to taxonomy, and phenetics as a tool for deciphering the evolutionary relationships of organisms. Although phenetic clustering can be, and has been, used to generate phylogenetic trees, to the phenetic taxonomist any convergence of a phenogram on a phylogenetic tree is purely coincidental. Even if the groups arrived at phenetically were nothing like the groups we would discover if we had a chance to look at the true tree of life, it would not matter; there are reasons, pheneticists hold, for representing living things this way independent of evolution.

Phylogenetic (Cladistic) Taxonomy

Since the dawn of taxonomic science, its practitioners had arranged groups by emphasizing certain characters. The group we know as birds was delimited because its members all had feathers and, as far as was known when Linnaeus was writing, were toothless. Invertebrates lacked a notochord; vertebrates possessed one. And while groups appeared to be nested within each other, this was seen as just another part of God's special creation.

With Charles Darwin and modern evolutionary biology, however, scientists, including Darwin himself, began to understand that: "Our classifications will come to be, as far as they can be so made, genealogies; and will then truly give what may be called the plan of creation." Such genealogies are more usually called "phylogenies", using a word invented by Ernst Haeckel, a dedicated investigator of such arrangements. And the idea that taxa are to represent groupings defined by evolution has been an integral part of biology ever since.

Cladistics, invented by entomologist Willi Hennig in the 1950's, is the sort of rigorous application of the concept of evolution to taxonomy that Darwin envisioned. Phylogenies are established by what distinctive features their members share to the exclusion of more distantly related organisms. Thus, if one wants to identify some subgroup of insects, features that all insects have in common, like six legs and segmented bodies, are useless. One has to use features or combinations of features that only that subgroup has, like front wings becoming hard wing covers for Coleoptera (beetles) or scaly wings for Lepidoptera (butterflies and moths). One constructs phylogenies with this technique by trying to find the family tree that involves the fewest feature changes (steps), and thus the smallest amount of convergent evolution. Like phenetics, cladistics is almost always done by computer.

Groups are then delimited on this basis. Unlike the taxonomists of yore, phylogenetic taxonomists recognize only monophyletic groups: groups derived from a single common ancestor that contain all descendants of that ancestor. Thus, any definition of invertebrate that contains all forms without a notochord but not their descendants, i.e., those with a notochord, is seen as artificial and unscientific.

Evolutionary Taxonomy

Finally, traditional taxonomy is essentially a hybrid of these two, though sometimes a very subjective one. Traditional taxonomy has admitted numerous taxa that are cladistically illegitimate, because they exclude some of the descendants of the ancestors of their members. Such traditional taxa as Pisces, Reptilia, etc. are cladistically illegitimate, because legged vertebrates are descended from fish and birds from reptiles. "Evolutionary taxonomy" is essentially traditional taxonomy with evolution taken into account.
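Whether a traditional taxon is cladistically legitimate can be checked mechanically once a tree is assumed. The sketch below uses a deliberately simplified amniote tree that is an assumption for illustration, not a result from the text, and the function names are hypothetical.

```python
# Simplified, assumed tree written as nested pairs (illustration only).
TREE = ("mammals", ("lizards", ("crocodiles", "birds")))

def tips(node):
    """Return the set of tip names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))

def clade_sets(node):
    """Collect the tip set of every node in the tree, including single tips."""
    if isinstance(node, str):
        return [tips(node)]
    found = [tips(node)]
    for child in node:
        found.extend(clade_sets(child))
    return found

def is_monophyletic(tree, group):
    """A group is monophyletic if it is exactly the tip set of some node."""
    return frozenset(group) in clade_sets(tree)

print(is_monophyletic(TREE, {"lizards", "crocodiles"}))           # reptiles without birds
print(is_monophyletic(TREE, {"lizards", "crocodiles", "birds"}))  # reptiles with birds
```

On this assumed tree, a Reptilia that leaves out birds fails the test because it omits some descendants of the group's common ancestor, while the same group with birds included passes.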

For the most part, hierarchy-based taxonomy has been very successful, but one-celled organisms often practice lateral gene transfer, and sometimes even lateral genome transfer (endosymbiosis), which cause difficulty for such taxonomies. In such cases, the "proper" taxonomies are those of genes or sets of genes, though as explained above, some genes serve as a reasonable proxy for the whole organism.

Nomenclature

The criteria by which we group organisms are one thing; the manner in which we give them names, what those names mean and how we define them, is another. The approach to naming groups (nomenclature) most familiar to all of us was invented by Carolus Linnaeus. He invented binomial nomenclature by snipping the then-often-used <genus> + <lots of attributes of a species> down to <genus> + <some distinctive attribute of a species>. He also rationalized nomenclature, using the same name for both sexes and for adults and juveniles of a species. Like many of his contemporaries, he used Latin, which has the useful feature of being nationalistically neutral since the fall of the Roman Empire.

His hierarchy of taxa (singular: taxon) was kingdom, class, order, genus, and species, but later taxonomists added phylum, division, family, lots of sub- and supertaxa, and even such taxa as domain, cohort, tribe, and section.

Taxonomic names and parts of names come from a variety of sources, though they must all be Latinized. Aside from personal and place names, taxonomic name parts are almost always words drawn from Latin and Classical Greek, with other languages occasionally represented. They are often common names (Homo, Canis, Bos, Equus, Columba, Salmo, Apis, Lilium, Rosa, Quercus, Pinus, etc.), and also words for various features and descriptions of them. Compound words are very common, though this sometimes leads to very long and difficult to pronounce names like Strongylocentrotus purpuratus (the purple sea urchin, found off the North American coast of the Pacific, often used as a model system). Higher-level taxa are often named after genera that they contain.

Several taxonomic ranks have standardized suffixes. Animal families end in -idae, plant families in -aceae, bird orders in -iformes, plant orders in -ales, etc. However, genera and lower-ranking taxa do not; genus names are singular nouns, while species names are either singular nouns, adjectives, or genitives (Latin's of-case). Also, taxon names above the genus are all plurals or collectives, whether or not they have some standardized suffix. Such conventions allow comparison of the ranks of different taxa at a glance.
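The suffix conventions listed above lend themselves to a simple lookup. The sketch below covers only the four suffixes named in the text; the function name `guess_rank` is hypothetical, and the result is a heuristic, since a name can end in one of these letter sequences by coincidence.

```python
# Suffix-to-rank pairs taken from the paragraph above.
SUFFIX_RANKS = [
    ("idae", "animal family"),
    ("aceae", "plant family"),
    ("iformes", "bird order"),
    ("ales", "plant order"),
]

def guess_rank(name):
    """Guess the rank of a taxon name from its suffix, or return None."""
    lowered = name.lower()
    for suffix, rank in SUFFIX_RANKS:
        if lowered.endswith(suffix):
            return rank
    return None  # genera and lower-ranking taxa carry no standard suffix

for taxon in ["Hominidae", "Rosaceae", "Passeriformes", "Fagales", "Homo"]:
    print(taxon, "->", guess_rank(taxon))
```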

Many organisms have received different names from different taxonomists; such conflicts are resolved by using the first-bestowed name. Thus, Apatosaurus pushed out Brontosaurus and Hyracotherium pushed out Eohippus. Although the international codes of nomenclature have no rules against it, this rule of priority has meant that some inappropriate names -- names that don't accurately reflect the content or characters of taxa -- have survived. The chimpanzee, Pan troglodytes, got its species name because Linnaeus had believed that it lives in caves; it actually lives in forests, making Pan silvanus more appropriate. Also, Basilosaurus ("king lizard") turned out to be an early cetacean rather than a marine reptile upon closer examination. The Venus Flytrap, Dionaea muscipula, might be more appropriately named something like Insecticaptrix muscipula or even Insecticaptrix carolinensis (Insect-taker/grabber (f., like planta), from Carolina).

Nature of data used in taxonomy and phylogeny

From the time of Charles Darwin, it has been the dream of many biologists to reconstruct the evolutionary history of all organisms on Earth and express it in the form of a phylogenetic tree. Phylogeny uses evolutionary distance, or evolutionary relationship, as a way of classifying organisms (taxonomy).

Phylogenetic relationship between organisms is given by the degree and kind of evolutionary distance. To understand this concept better, let us define taxonomy. Taxonomy is the science of naming, classifying and describing organisms. Taxonomists arrange the different organisms in taxa (groups). These are then further grouped together depending on biological similarities. This grouping of taxa reflects the degree of biological similarity.

Systematics takes taxonomy one step further by elucidating new methods and theories that can be used to classify species. This classification is based on similarity traits and possible mechanisms of evolution. In the 1950s, Willi Hennig, a German biologist, proposed that systematics should reflect the known evolutionary history of lineages, an approach he called phylogenetic systematics. Therefore, phylogenetic systematics is the field that deals with identifying and understanding the evolutionary relationships among many different kinds of organisms.

Phylogenetic relationships have traditionally been studied based on morphological data. Scientists used to examine different traits or characteristics and tried to establish the degree of relatedness between organisms. Then scientists realized that not all shared characteristics are useful in studying relationships between organisms. This discovery led to an approach to systematics called cladistics. Cladistics is the study of phylogenetic relationships based on shared, derived characteristics. There are two types of characteristics, primitive traits and derived traits, which are described below.

Primitive traits are characteristics of organisms that were present in the ancestor of the group that is under study. They do not indicate anything about the relationships of species within a group because they are inherited from the ancestor to all of the members of the group. Derived traits are characteristics of organisms that have evolved within the group under study. These characteristics were not present in the ancestor. They are useful because they can help explain why some species have common traits. The most likely explanation for the presence of a trait that was not present in the ancestor of the whole group is that it evolved from a more recent ancestor.

Two extensive groups of analyses exist to examine phylogenetic relationships: Phenetic methods and cladistic methods. Phenetic methods, or numerical taxonomy, use various measures of overall similarity for the ranking of species. They can use any number or type of characters, but the data has to be converted into a numerical value. The organisms are compared to each other for all of the characters and then the similarities are calculated. After this, the organisms are clustered based on the similarities. These clusters are called phenograms. They do not necessarily reflect evolutionary relatedness. The cladistic method is based on the idea that members of a group share a common evolutionary history and are more closely related to members of the same group than to any other organisms. The shared derived characteristics are called synapomorphies.
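The phenetic procedure described above (score characters numerically, compute similarities, then cluster) can be sketched as follows. The character scores are hypothetical, the simple matching coefficient is one of several possible similarity measures, and average-linkage merging is an assumed choice of clustering rule rather than the only one in use.

```python
from itertools import combinations

# Hypothetical binary character scores (1 = present, 0 = absent) for four OTUs.
SCORES = {
    "A": [1, 1, 0, 0, 1, 0],
    "B": [1, 1, 0, 1, 1, 0],
    "C": [0, 1, 1, 1, 0, 1],
    "D": [0, 0, 1, 1, 0, 0],
}

def simple_matching(x, y):
    """Fraction of characters on which two OTUs agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

def members(cluster):
    """Flatten a nested-pair cluster into its OTU names."""
    if isinstance(cluster, str):
        return [cluster]
    return members(cluster[0]) + members(cluster[1])

def average_similarity(c1, c2):
    """Mean pairwise similarity between the members of two clusters."""
    pairs = [(a, b) for a in members(c1) for b in members(c2)]
    return sum(simple_matching(SCORES[a], SCORES[b]) for a, b in pairs) / len(pairs)

def phenogram(names):
    """Repeatedly merge the two most similar clusters into a nested pair."""
    clusters = list(names)
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: average_similarity(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [(c1, c2)]
    return clusters[0]

print(phenogram(SCORES))
```

The nested pairs returned by `phenogram` record only overall similarity. Nothing in the calculation distinguishes primitive from derived states, which is why such clusters need not reflect evolutionary relatedness.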

The introduction of two important tools has dramatically improved the study of phylogenetics. The first tool is the development of computer algorithms capable of constructing phylogenetic trees. The second tool is the use of molecular sequence data for phylogenetic studies.

Phylogenetics can use both molecular and morphological data in order to classify organisms. Molecular methods are based on studies of gene sequences. The assumption of this methodology is that the similarities between genomes of organisms will help to develop an understanding of the taxonomic relationship among these species. Morphological methods use the phenotype as the base of phylogeny. These two methods are related since the genome strongly contributes to the phenotype of the organisms. In general, organisms with more similar genes are more closely related. The advantage of molecular methods is that it makes possible the study of genes without a morphological expression.

As previously mentioned, closely related species share a more recent common ancestor than distantly related species. The relationships between species can be represented by a phylogenetic tree. This is a graphical representation that has nodes and branches. The nodes represent taxonomic units. Branches reflect the relationships of these nodes in terms of descendants. The branch length usually indicates some form of evolutionary distance. The actual existing species called the operational taxonomic units (OTUs) are at the tip of the branches on the external nodes.
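A tree of nodes and branches with branch lengths can be held in a small data structure. The sketch below is one possible representation; the class name `Node`, the helper names, and all branch lengths are hypothetical. The output uses Newick notation, a widely used text format for trees.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str = ""        # empty for internal nodes (hypothetical ancestors)
    length: float = 0.0   # length of the branch leading to this node
    children: list = field(default_factory=list)

    def is_tip(self):
        return not self.children

def otus(node):
    """Names of the operational taxonomic units at the external nodes."""
    if node.is_tip():
        return [node.name]
    return [name for child in node.children for name in otus(child)]

def to_newick(node, root=True):
    """Serialize the tree in Newick notation with branch lengths."""
    if node.is_tip():
        text = node.name
    else:
        text = "(" + ",".join(to_newick(c, root=False) for c in node.children) + ")"
    return text + ";" if root else f"{text}:{node.length}"

# Branch lengths below are made-up values for illustration.
tree = Node(children=[
    Node("baboon", 0.3),
    Node(length=0.1, children=[
        Node("chimpanzee", 0.2),
        Node(length=0.05, children=[Node("human", 0.1), Node("gorilla", 0.1)]),
    ]),
])

print(otus(tree))
print(to_newick(tree))
```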

Tree construction methods
Some methods have been proposed for the construction of phylogenetic trees. They can be classified into two groups, the cladistic methods (maximum parsimony and maximum likelihood) and the phenetic method (distance matrix method).

Maximum parsimony rests on the principle that simple hypotheses are preferable to complicated ones. The tree constructed by this method is the one that requires the smallest number of evolutionary changes to explain the phylogeny of the species under study. In practice, the method compares candidate trees and chooses the one with the fewest evolutionary steps (nucleotide substitutions, in the context of DNA sequences).
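Counting the evolutionary steps a tree requires can be sketched with Fitch's small-parsimony procedure, a standard way of scoring a fixed rooted binary tree. The aligned sequences below are invented for illustration and are not real primate data; the three candidate trees are written as nested pairs.

```python
# Invented aligned sequences (illustration only).
SEQS = {
    "baboon":     "ACATGC",
    "gorilla":    "ACACAC",
    "chimpanzee": "ATGTAT",
    "human":      "ATGCAC",
}

TREES = {
    "A": ("baboon", ("chimpanzee", ("human", "gorilla"))),
    "B": ("baboon", ("gorilla", ("human", "chimpanzee"))),
    "C": ("baboon", ("human", ("chimpanzee", "gorilla"))),
}

def fitch(node, site, seqs):
    """Return (candidate ancestral states, minimum changes) for one site."""
    if isinstance(node, str):
        return {seqs[node][site]}, 0
    left, right = node
    left_states, left_cost = fitch(left, site, seqs)
    right_states, right_cost = fitch(right, site, seqs)
    common = left_states & right_states
    if common:
        # The children can agree on a state, so no change is needed here.
        return common, left_cost + right_cost
    # The children disagree, so one change is counted at this node.
    return left_states | right_states, left_cost + right_cost + 1

def parsimony_score(tree, seqs):
    """Total number of changes the tree requires, summed over all sites."""
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch(tree, site, seqs)[1] for site in range(n_sites))

scores = {label: parsimony_score(tree, SEQS) for label, tree in TREES.items()}
best = min(scores, key=scores.get)
print(scores, "most parsimonious:", best)
```

With only three candidate trees an exhaustive comparison is trivial. The number of possible trees grows very quickly with the number of taxa, so real analyses rely on search strategies rather than scoring every tree.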

Maximum likelihood evaluates the topologies of different trees and chooses the best one under a specified model. The model describes the evolutionary process that can account for the conversion of one sequence into another. The parameters estimated for each topology are the branch lengths.
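The idea of choosing a branch length that maximizes likelihood can be illustrated in the simplest possible case: two aligned sequences joined by a single branch. The sketch assumes the Jukes-Cantor (JC69) substitution model, which is a choice made here for illustration and is not named in the text; the sequences are invented, and the grid search stands in for the numerical optimizers used in practice.

```python
import math

def jc69_log_likelihood(seq1, seq2, d):
    """Log-likelihood of two aligned sequences separated by d substitutions per site."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability that a site keeps the same base
    p_diff = 0.25 - 0.25 * decay   # probability of one particular different base
    total = 0.0
    for a, b in zip(seq1, seq2):
        # 0.25 is the assumed equilibrium frequency of the base in seq1.
        total += math.log(0.25 * (p_same if a == b else p_diff))
    return total

def best_branch_length(seq1, seq2, step=0.001, upper=2.0):
    """Grid search for the branch length with the highest likelihood."""
    candidates = [step * i for i in range(1, int(upper / step) + 1)]
    return max(candidates, key=lambda d: jc69_log_likelihood(seq1, seq2, d))

# Invented sequences that differ at 2 of 20 sites.
seq1 = "ACGTACGTACGTACGTACGT"
seq2 = "ACGTACGAACGTACGTACTT"

estimate = best_branch_length(seq1, seq2)
closed_form = -0.75 * math.log(1.0 - 4.0 * 0.1 / 3.0)  # JC69 formula for p = 0.1
print(round(estimate, 3), round(closed_form, 3))
```

For a full tree, the same kind of calculation is carried out over every branch of each candidate topology, and the topology with the highest overall likelihood is preferred.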

Distance matrix is a phenetic approach preferred by many molecular biologists for DNA and protein work. This method estimates the mean number of changes (per site in sequence) in two taxa that have descended from a common ancestor. There is much information in the gene sequences that must be simplified in order to compare only two species at a time. The relevant measure is the number of differences in these two sequences, a measure that can be interpreted as the distance between the species in terms of relatedness.
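A pairwise distance matrix of the kind described above can be computed as follows. The sequences are invented for illustration, the function names are hypothetical, and the Jukes-Cantor correction is an assumed choice for converting the observed proportion of differences into an estimated number of changes per site.

```python
import math
from itertools import combinations

# Invented aligned sequences (illustration only).
SEQS = {
    "human":      "ACGTACGTAC",
    "chimpanzee": "ACGTACGTAT",
    "gorilla":    "ACGTACGAGC",
    "baboon":     "ACTTACGAGT",
}

def p_distance(seq1, seq2):
    """Observed proportion of sites at which two sequences differ."""
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)

def jc69_distance(seq1, seq2):
    """Estimated changes per site, correcting for multiple hits at a site."""
    p = p_distance(seq1, seq2)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)  # undefined once p reaches 0.75

def distance_matrix(seqs, distance=jc69_distance):
    """Distance for every unordered pair of taxa."""
    return {(a, b): distance(seqs[a], seqs[b]) for a, b in combinations(seqs, 2)}

matrix = distance_matrix(SEQS)
closest = min(matrix, key=matrix.get)
for pair, d in matrix.items():
    print(pair, round(d, 3))
print("closest pair:", closest)
```

A tree-building step would then repeatedly join the closest pair and recompute distances, which is how the matrix is turned into a tree.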

Molecular phylogeny was first suggested in 1962 by Pauling and Zuckerkandl. They noted that the rates of amino acid substitution in animal hemoglobin were roughly constant over time. They described the molecules as documents of evolutionary history. The molecular method has many advantages. Genotypes can be read directly, organisms can be compared even if they are morphologically very different and this method does not depend on phenotype.
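If substitution rates are roughly constant, as noted above, a sequence distance can be converted into an approximate divergence time. The sketch below assumes a strict clock; the function name and both input values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def divergence_time(distance, rate):
    """Time since two lineages split, assuming a constant substitution rate.

    distance: substitutions per site separating the two sequences
    rate: substitutions per site per year along a single lineage
    """
    # Changes accumulate along both lineages, hence the factor of two.
    return distance / (2.0 * rate)

# Hypothetical inputs: 0.02 substitutions per site, 1e-9 per site per year.
years = divergence_time(distance=0.02, rate=1e-9)
print(f"about {years / 1e6:.0f} million years")
```

The estimate is only as good as the assumed rate, which in practice is calibrated against independently dated events.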

Phylogeny is currently used in many fields such as molecular biology, genetics, evolution, development, behaviour, epidemiology, ecology, systematics, conservation biology, and forensics. Biologists can infer hypotheses from the structure of phylogenetic trees and establish models of different events in evolutionary history. Phylogeny is an exceptional way to organize evolutionary information. Through these methods, scientists can analyse and elucidate different processes of life on Earth.

Today, biologists calculate that there are about 5 to 10 million species of organisms. Different lines of evidence, including gene sequencing, suggest that all organisms are genetically related and may descend from a common ancestor. This relationship can be represented by an evolutionary tree, like the Tree of Life. The Tree of Life is a project that is focused on understanding the origin of diversity among species using phylogeny.

Molecular evolution

Molecular evolution is, in part, the process of evolution at the scale of DNA, RNA, and proteins. Molecular evolution emerged as a scientific field in the 1960s as researchers from molecular biology, evolutionary biology and population genetics sought to understand recent discoveries on the structure and function of nucleic acids and proteins. Some of the key topics that spurred development of the field have been the evolution of enzyme function, the use of nucleic acid divergence as a "molecular clock" to study species divergence, and the origin of noncoding DNA.

Recent advances in genomics, including whole-genome sequencing, high-throughput protein characterization, and bioinformatics, have led to a dramatic increase in studies on the topic. In the 2000s, some of the active topics have been the role of gene duplication in the emergence of novel gene function, the extent of adaptive molecular evolution versus neutral processes of mutation and drift, and the identification of molecular changes responsible for various human characteristics, especially those pertaining to infection, disease, and cognition.

Molecular systematics is a product of the traditional field of systematics and molecular genetics. It is the process of using data on the molecular constitution of biological organisms' DNA, RNA, or both, in order to resolve questions in systematics, i.e. about their correct scientific classification or taxonomy from the point of view of evolutionary biology.

Molecular systematics has been made possible by the availability of techniques for DNA sequencing, which allow the determination of the exact sequence of nucleotides or bases in either DNA or RNA. At present it is still a long and expensive process to sequence the entire genome of an organism, and this has been done for only a few species. However, it is quite feasible to determine the sequence of a defined area of a particular chromosome. Typical molecular systematic analyses require the sequencing of around 1000 base pairs.

Genomic evolution is the process that changes the structure (sequence) or size of a genome over time.

The study of genome evolution involves multiple fields such as structural analysis of the genome, the study of genomic parasites, gene and ancient genome duplications, polyploidy, and comparative genomics. Evolutionary biologists are interested in five specific questions regarding the evolution of the genome:

How did the genome evolve into its current size?

What is the content of the genome: is it mostly junk or not?

What is the distribution of genes within a genome?

What is the composition of the nucleotides within the genome?

How does translation of the genetic code evolve?

For more information, see the link below:

http://kim.bio.upenn.edu/~simola/GCBPrelim/compbio/evolution-phylogenetics/evolution-phylogenetics.pdf

Definition and description of Phylogenetic trees and types of trees

Phylogenetic Tree:

Look at the tree below and describe what sister relationships it depicts:

Tree 1:

Species 5 is the sister group to species 6

Species 3 is the sister group to species 4

The clade of species 3+4 is the sister group to the clade of species 5+6

The clade of species (3/4)+(5/6) is sister to species 2

Species 1 is sister to the clade containing all the other species

Remember, trees can be rotated at the nodes without changing the topology (i.e. without changing the relationships represented in the tree).

For example, look at the tree below.

Tree 2:

Although tree 2 looks superficially quite different from tree 1, examine the relationships we deduced from tree 1 and see if they still apply.

Species 5 is the sister group to species 6. . . YES

Species 3 is the sister group to species 4. . . YES

The clade of species 3+4 is the sister group to the clade of species 5+6. . . YES

The clade of species (3/4)+(5/6) is sister to species 2. . . YES

Species 1 is sister to the clade containing all the other species. . . YES

THUS, Tree 1 and Tree 2 are identical trees that are simply drawn differently (rotated at the nodes).

Now examine tree 3 below in the same way.

Tree 3:

Species 5 is the sister group to species 6. . . NO

Species 3 is the sister group to species 4. . . YES

The clade of species 3+4 is the sister group to the clade of species 5+6. . . NO

The clade of species (3/4)+(5/6) is sister to species 2. . . NO

Species 1 is sister to the clade containing all the other species. . . YES

Thus, although some branches in the tree are similar to tree 1 and tree 2, tree 3 is NOT the same tree!
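
The comparison of the three trees can be sketched in code by writing each tree as nested pairs and collecting its sister relationships. The topology of tree 1 follows from the relationships listed above, and tree 2 is the same tree with its nodes rotated. The figure for tree 3 is not reproduced here, so the tree_3 used below is a hypothetical topology chosen only because it gives the same YES/NO answers as the list above.

```python
def leaves(node):
    """Return the set of species found below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))

def sister_pairs(node):
    """Collect every sister relationship as an unordered pair of species sets."""
    if isinstance(node, str):
        return set()
    left, right = node
    pairs = {frozenset([leaves(left), leaves(right)])}
    return pairs | sister_pairs(left) | sister_pairs(right)

def are_sisters(tree, group_a, group_b):
    """True if the two groups are the two branches arising from one node."""
    return frozenset([frozenset(group_a), frozenset(group_b)]) in sister_pairs(tree)

tree_1 = ("1", ("2", (("3", "4"), ("5", "6"))))
tree_2 = (((("6", "5"), ("4", "3")), "2"), "1")   # tree 1 rotated at its nodes
tree_3 = ("1", ("5", ("2", ("6", ("3", "4")))))   # hypothetical stand-in for tree 3

print(sister_pairs(tree_1) == sister_pairs(tree_2))   # same relationships?
print(sister_pairs(tree_1) == sister_pairs(tree_3))
print(are_sisters(tree_3, {"5"}, {"6"}))
```

Because the relationships are stored as unordered sets, swapping the left and right branch at any node leaves them unchanged, which is exactly why rotation does not alter the topology.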

Inferring Ancestral Characteristics on a Tree:

Look at the tree below and try to infer the character state (blue or red) of the ancestral node.

Let's begin by guessing that the mystery ancestor was red.

In this case, blue flowers would have to evolve twice in order to explain the tree (i.e. 2 evolutionary steps).

Now let's see what happens if we guess blue to be the ancestral character.

In this case, the simplest explanation is that red flowers evolved once (one evolutionary step). Since it calls for only one change rather than two, assuming blue to be the character state at the ancestral node is the most parsimonious explanation, and therefore the preferred one.
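
The step counting above can be sketched as a small parsimony calculation. The figure is not reproduced here, so the tree and flower colors below are assumptions: species 2, 5 and 6 are taken to be blue and species 3 and 4 red, arranged as in tree 1 without species 1. The function name is hypothetical.

```python
INF = float("inf")
STATES = ("blue", "red")

def subtree_costs(node, tip_states):
    """For each possible state at this node, return the minimum number of
    changes needed in the subtree below it."""
    if isinstance(node, str):
        # A tip has a fixed, observed state; any other state is impossible.
        return {s: (0 if tip_states[node] == s else INF) for s in STATES}
    child_costs = [subtree_costs(child, tip_states) for child in node]
    costs = {}
    for s in STATES:
        # Each child picks its cheapest state; a change along the branch costs 1.
        costs[s] = sum(min(c[t] + (s != t) for t in STATES) for c in child_costs)
    return costs

# Assumed tree and flower colors (not taken from the missing figure).
tree = ("2", (("3", "4"), ("5", "6")))
colors = {"2": "blue", "3": "red", "4": "red", "5": "blue", "6": "blue"}

steps = subtree_costs(tree, colors)
print(steps)
```

Under these assumptions a red ancestor needs two steps and a blue ancestor needs one, which agrees with the reasoning in the text.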

Monophyly vs. Paraphyly:

The tree below first shows an outgroup (family Outgroupaceae) to the left, and some other species labeled 2 - 6.

If we want to name these ingroup species as families, we want to be sure to name only monophyletic groups. At first, a logical way to group these ingroup plants as families could be to segregate them by color into two families, Blueaceae (containing species 2, 5, and 6) and Redaceae (species 3 and 4).

First, we'll examine whether Redaceae is a good monophyletic group. Follow the branches down to the node that represents the common ancestor of species 3 and 4.

We can easily see that Redaceae would include that ancestor and all of its descendants (only species 3 and 4). Red color is a derived trait that is shared between those two species (a synapomorphy), and thus groups them together as a monophyletic group.

Now find the common ancestor of all of the blue species.

Follow all the descendant branches from that common ancestor. The red species 3 and 4 are descendants of that ancestor too!

Grouping the blue species together based on the retention of an ancestral character state makes it a paraphyletic group. Thus, naming species 2, 5, and 6 as the family Blueaceae is not acceptable.

So what are some monophyletic groups we could name as families? If we include species 3 and 4 in the Blueaceae (even though they're red!), it makes it a monophyletic family.

Taxonomists who combine formerly recognized families in order to make a larger monophyletic family are often called lumpers. Another way to create monophyletic groups is to be what is called a splitter. Splitters divide the larger monophyletic group into smaller monophyletic families, often with some families containing only one genus or species.

A splitter might divide the ingroup into three families in order to retain Redaceae as its own family. To do this, you must separate out species 2 as its own family. In this case, you could still call species 5 and 6 'Blueaceae' since they form a monophyletic clade. Species 2 would have its own family name (Neoblueaceae, perhaps?). And Redaceae would be a monophyletic family that is no longer included in a larger paraphyletic group.

Both splitting and lumping are equally correct and viable solutions, as long as the resulting groups are monophyletic. Sometimes lumping results in excessively large groups or causes a popular, easily recognized taxon to be eliminated at the family level. Splitting can lead to an excessive number of recognized families and to the disintegration of natural groups at the family level. Both philosophies are widely applied, and both give headaches to botanists, who constantly have to relearn plants under a new naming system.
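
The monophyly test used throughout this section can be sketched as a check on whether a proposed family is exactly the set of species descending from one node. The tree below is an assumption reconstructed from the description in the text (the figure itself is not reproduced), and the function names are hypothetical.

```python
def collect_clades(node, found):
    """Record the set of species below every node and return this node's set."""
    if isinstance(node, str):
        tips = frozenset([node])
    else:
        tips = frozenset().union(*(collect_clades(child, found) for child in node))
    found.add(tips)
    return tips

def is_monophyletic(tree, group):
    """A group is monophyletic if it is exactly the set of tips below one node."""
    found = set()
    collect_clades(tree, found)
    return frozenset(group) in found

# Assumed topology: the outgroup, then species 2, then the clades 3+4 and 5+6.
tree = ("Outgroup", ("2", (("3", "4"), ("5", "6"))))

print(is_monophyletic(tree, {"3", "4"}))                 # Redaceae
print(is_monophyletic(tree, {"2", "5", "6"}))            # Blueaceae by color
print(is_monophyletic(tree, {"2", "3", "4", "5", "6"}))  # the lumper's family
print(is_monophyletic(tree, {"5", "6"}))                 # the splitter's Blueaceae
```

Under this assumed topology, Blueaceae defined by color fails the test because the clade that contains species 2, 5 and 6 also contains species 3 and 4, while the lumper's and the splitter's families all pass.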

Polytomy:

When a phylogenetic tree has more than two branches radiating from a node, it is called a polytomy. Polytomies arise when the relationships between a group of taxa are unresolved. For example, look at the polytomy in the tree below

The relationships between species 3, 4, and 5 are unresolved. Species 4 and 5 could be a sister group, with species 3 as the sister group to their clade.

Alternatively, species 3 and 5 could be a sister group, or species 3 and 4 could be a sister group.

The polytomy shown in the original tree means that any one of the three trees that followed might be correct, but the node is unresolved, and we just don't know the true relationships there.
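
The three possible resolutions of this polytomy can be listed with a short sketch. The function name is hypothetical, and each resolution is written as the remaining species followed by the pair treated as sisters.

```python
from itertools import combinations

def resolve_trichotomy(taxa):
    """List the three fully resolved trees for a node with three branches."""
    resolutions = []
    for pair in combinations(taxa, 2):
        # The taxon left out of the pair becomes sister to the pair's clade.
        (other,) = [t for t in taxa if t not in pair]
        resolutions.append((other, pair))
    return resolutions

for resolved in resolve_trichotomy(["3", "4", "5"]):
    print(resolved)
```

Each printed tree is one hypothesis about the true relationships. The polytomy is a statement that the available data cannot yet choose among them.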