# Research and Technology - Forensic Science Communications - April 2005

April 2005 - Volume 7 - Number 2

Research and Technology

# Evaluating Mixed DNA Profiles with the Presence of Relatives: Theory, Method, and Computer Software

Yue-Qing Hu

Research Postgraduate Student

Department of Statistics and Actuarial Science

University of Hong Kong

Hong Kong, China

and

Department of Mathematics

Southeast University

Nanjing, China

Wing K. Fung

Professor of Statistics

Department of Statistics and Actuarial Science

University of Hong Kong

Hong Kong, China

Jian Lu

Research Postgraduate Student

Department of Mathematics

Southeast University

Nanjing, China

#### Abstract

Theory and methods are provided for evaluating the evidential weight of a DNA mixture in a subpopulation when related contributors are present. An efficient, user-friendly, Windows-based program for handling such mixture problems has been developed. The software is illustrated by analyzing a case with various relationships between the persons involved. The effects of different levels of population substructure, kinship relationships, and defense propositions are also demonstrated numerically.

#### Introduction

The evaluation of DNA mixtures is needed in many situations, such as rape cases, in which the crime stain may come from more than one person. The likelihood ratio is a measure commonly used to assess the weight of DNA evidence. General formulas for calculating the likelihood ratio for mixtures were provided by Weir et al. (1997) and Fukshansky and Bar (1998) under Hardy-Weinberg equilibrium. When all involved people come from the same subpopulation, Curran et al. (1999) and Fung and Hu (2000A) derived formulas for evaluating the likelihood ratios. Fung and Hu (2000B) reported a general formula for calculating likelihood ratios for DNA mixtures based on Recommendation 4.1 of the second National Research Council Report (1996) on the evaluation of forensic DNA evidence.

In some situations, the interpretation of DNA evidence involves relatives of the suspects and/or perpetrators. The assessment of the weight of DNA evidence when a relative of a suspect is among the possible perpetrators has been discussed in the literature. Evett (1992) obtained a formula for the likelihood ratio in a case where the defense was “it was my brother.” Brookfield (1994) evaluated the effect on the likelihood ratio of the suspect and the contributor of the crime stain being relatives (see also Belin et al. 1997; Fung et al. 2002; Sjerps and Kloosterman 1999). All these papers, however, studied the effect of a relative on the likelihood ratio for single-source DNA samples. For DNA mixtures, Fukshansky and Bar (2000) and Hu and Fung (2003) developed formulas for evaluating the likelihood ratios when relatives are involved. Both works assumed Hardy-Weinberg equilibrium in the population to which all involved persons belong.

In this paper, the theory and method for interpreting a DNA mixture in the presence of relatives are discussed for the case in which all involved people are of the same subpopulation origin and allele frequencies are available only for the total population. A general formula for the match probability is provided for two situations: the suspect is not available for typing but his or her relative is, or two of the unknown contributors are related.

A Windows-based program was developed to analyze this problem. It is easy to use, can handle various kinds of relationships between relatives simultaneously, and requires only minimal input, which makes it less susceptible to transcription errors. The method and the software are illustrated with an example using various sets of hypotheses. The software can also handle common mixture problems involving only unrelated persons.

#### Match Probability Formulas

Let $M$ be the set of distinct alleles in the DNA mixture, $K$ the genotype(s) of the typed person(s), and $H$ a proposition about who contributed to the mixed stain. Typically, $H$ declares some typed persons and some unknown persons as the contributors of the mixture. The weight of DNA evidence is usually measured by the likelihood ratio, which is a ratio of two probabilities of the form $P(\text{Evidence} \mid H)$, one evaluated under the prosecution proposition and one under the defense proposition. Writing $P(\text{Evidence} \mid H) = P(K \mid H)\,P(M \mid K, H)$ and using the fact that $P(K \mid H) = P(K)$ (the genotypes of the typed persons do not depend on who contributed the stain), the calculation of the likelihood ratio reduces to the evaluation of the match probability $P(M \mid K, H)$ (Hu and Fung 2003). The notation used in this paper is listed in Table 1.
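The reduction described in the preceding paragraph can be written out explicitly. With $H_p$ and $H_d$ denoting the prosecution and defense propositions, the common factor $P(K)$ cancels from numerator and denominator:

$$
\mathrm{LR}
= \frac{P(\text{Evidence} \mid H_p)}{P(\text{Evidence} \mid H_d)}
= \frac{P(K \mid H_p)\,P(M \mid K, H_p)}{P(K \mid H_d)\,P(M \mid K, H_d)}
= \frac{P(K)\,P(M \mid K, H_p)}{P(K)\,P(M \mid K, H_d)}
= \frac{P(M \mid K, H_p)}{P(M \mid K, H_d)}.
$$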

In this paper, all involved people are from the same subpopulation, and the allele frequencies for this subpopulation are unavailable. Hardy-Weinberg equilibrium is assumed to hold within that subpopulation but not in the total population. For convenience, let $i = 1, 2, \ldots$ denote the alleles and $p_i$ the corresponding allele frequencies in the total population. The quantity $\theta$ describes the degree of relatedness between pairs of alleles within the subpopulation, relative to the total population (Evett and Weir 1998). The recursive formula of Balding and Nichols (1994) can be used to calculate the joint probability of a sample of alleles from the subpopulation. For example,

$$P(i,i,j) = \frac{(1-\theta)\,p_i p_j\,\{\theta + (1-\theta)\,p_i\}}{1+\theta}.$$

Guidance on estimating $\theta$ can be found in the second National Research Council Report (1996); the values $\theta = 0$, 0.01, and 0.03 are used for illustration in the case example.
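The Balding-Nichols recursion can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' software; the function names `sample_probability` and `p_iij` and the allele frequencies are hypothetical. When $n$ alleles have already been drawn and $m_a$ of them are of type $a$, the next allele is of type $a$ with probability $(m_a\theta + (1-\theta)p_a)/(1 + (n-1)\theta)$; multiplying these terms reproduces the closed form for $P(i,i,j)$.

```python
from collections import Counter
from typing import Mapping, Sequence


def sample_probability(alleles: Sequence[str], freqs: Mapping[str, float], theta: float) -> float:
    """Joint probability of drawing `alleles` (in the given order) from one
    subpopulation, using the Balding-Nichols (1994) sampling formula."""
    prob = 1.0
    seen = Counter()  # how many alleles of each type have been drawn so far
    for n, allele in enumerate(alleles):
        # n alleles already drawn; seen[allele] of them are of this type.
        # For n = 0 the expression reduces to the population frequency p_a.
        numerator = seen[allele] * theta + (1 - theta) * freqs[allele]
        denominator = 1 + (n - 1) * theta
        prob *= numerator / denominator
        seen[allele] += 1
    return prob


def p_iij(p_i: float, p_j: float, theta: float) -> float:
    """Closed form of P(i, i, j) from the text."""
    return (1 - theta) * p_i * p_j * (theta + (1 - theta) * p_i) / (1 + theta)


if __name__ == "__main__":
    freqs = {"i": 0.1, "j": 0.2}  # illustrative frequencies, not from the paper
    for theta in (0.0, 0.01, 0.03):
        print(theta, sample_probability(["i", "i", "j"], freqs, theta), p_iij(0.1, 0.2, theta))
```

Because the sampling model is exchangeable, the order in which the alleles are listed does not change the result.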

Consider a criminal case with a mixed stain $M$. One man is suspected by the police, but his DNA profile is unavailable for some reason; instead, one of his relatives is typed. In this situation, the hypothesis about the contributors of the mixed stain can be stated as follows:

$H_1$: One relative of the typed person $T = t_1 t_2$, and $x - 1$ other unrelated unknowns, were the contributors.

A formula for the match probability $P(M \mid K, H_1)$ has been developed (Fung and Hu 2004) and is listed in Table 2. Note that $(k_0, 2k_1, k_2)$ in Table 2 are the kinship coefficients (Fung and Hu 2004) of $T$ and his or her relative declared in $H_1$.

Two related persons (e.g., full siblings) may be involved in a case as unknown contributors. In this circumstance, the proposition for the source of the mixture is

$H_2$: Two related persons and $x - 2$ other unrelated unknowns were the contributors.

The resulting match probability $P(M \mid K, H_2)$ can be obtained according to Fung and Hu (2004), and the formula is again reported in Table 2. Here $(k_0, 2k_1, k_2)$ in Table 2 are the kinship coefficients of the two related persons declared in $H_2$.
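How the kinship coefficients $(k_0, 2k_1, k_2)$ enter a match probability can be illustrated by brute force in the simplest setting. The sketch below is not the Table 2 formula. It assumes $\theta = 0$ (Hardy-Weinberg equilibrium, no substructure), takes $x = 2$ so that the two relatives are the only unknown contributors, assumes they are unrelated to the typed persons, and reads $(k_0, 2k_1, k_2)$ as the probabilities that the two relatives share 0, 1, or 2 alleles identical by descent (IBD). All function names and frequencies are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import Dict, Iterable, Tuple

Genotype = Tuple[str, str]


def hw_prob(g: Genotype, p: Dict[str, float]) -> float:
    """Hardy-Weinberg genotype probability (theta = 0)."""
    a, b = g
    return p[a] ** 2 if a == b else 2 * p[a] * p[b]


def one_ibd_prob(g2: Genotype, g1: Genotype, p: Dict[str, float]) -> float:
    """P(g2 | g1) when exactly one allele is IBD: the shared allele is either
    allele of g1 with probability 1/2, the other allele of g2 is random."""
    c, d = g2
    total = 0.0
    for t in g1:
        if c == d:
            total += (0.5 * p[c]) if t == c else 0.0
        else:
            total += 0.5 * ((p[d] if t == c else 0.0) + (p[c] if t == d else 0.0))
    return total


def pair_prob(g1: Genotype, g2: Genotype, k, p: Dict[str, float]) -> float:
    """Joint genotype probability of two relatives with k = (k0, 2k1, k2)."""
    k0, two_k1, k2 = k
    identical = 1.0 if g1 == g2 else 0.0
    return hw_prob(g1, p) * (k0 * hw_prob(g2, p) + two_k1 * one_ibd_prob(g2, g1, p) + k2 * identical)


def match_prob_h2(mixture: Iterable[str], required: Iterable[str], k, p: Dict[str, float]) -> float:
    """P(M | K, H2) for x = 2. Both genotypes use only alleles of M and together
    carry every allele in `required` (alleles of M not explained by typed contributors)."""
    genotypes = list(combinations_with_replacement(sorted(set(mixture)), 2))
    need = set(required)
    return sum(
        pair_prob(g1, g2, k, p)
        for g1 in genotypes
        for g2 in genotypes
        if need <= set(g1) | set(g2)
    )


if __name__ == "__main__":
    # Hypothetical locus: M = {a, b, c, d}, victim typed ab, so the relatives must supply c and d.
    freqs = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.15}
    for name, k in {"unrelated": (1, 0, 0), "full siblings": (0.25, 0.5, 0.25)}.items():
        print(name, match_prob_h2(["a", "b", "c", "d"], ["c", "d"], k, freqs))
```

With $(k_0, 2k_1, k_2) = (1, 0, 0)$ the sum reduces to the unrelated-contributor result of Weir et al. (1997), which provides a convenient cross-check.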

When the suspect is unavailable and one of his or her blood relatives is typed instead, $P(M \mid K, H_1)$ can be used to evaluate the weight of the DNA evidence. Seven possible genotypes of the typed person $T$ were considered, and the associated match probabilities are listed in Table 3. The formulas for $P(M \mid K, H_1)$ and $P(M \mid K, H_2)$ include as special cases the results of Fukshansky and Bar (1998), Fung and Hu (2000A), Hu and Fung (2003), and Weir et al. (1997). Moreover, the expressions for $P(M \mid K, H_1)$ in Table 3 and $P(M \mid K, H_2)$ in Table 2 show that evaluating the likelihood ratio with a computer program is straightforward. Accordingly, Windows-based software for handling such calculations has been developed. In the next section, the formulas and the software are applied to the interpretation of a DNA mixture case involving a blood relative of the typed person.
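The special case that these formulas cover, unrelated unknowns in a subpopulation (Curran et al. 1999; Fung and Hu 2000A), can also be computed by brute force as a cross-check. The sketch below is not the authors' closed-form formula or software. It conditions on the alleles of all typed persons with the Balding-Nichols recursion, enumerates every ordered sequence of the $2x$ unknown alleles drawn from $M$, and keeps the sequences that carry all alleles of $M$ not explained by the typed contributors. Function names and the single-locus data are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Sequence


def sample_probability(alleles: Sequence[str], freqs: Dict[str, float], theta: float) -> float:
    """Balding-Nichols joint probability of an ordered sample of alleles."""
    prob = 1.0
    seen = Counter()
    for n, allele in enumerate(alleles):
        prob *= (seen[allele] * theta + (1 - theta) * freqs[allele]) / (1 + (n - 1) * theta)
        seen[allele] += 1
    return prob


def match_prob_unrelated(mixture: Iterable[str], required: Iterable[str], x: int,
                         typed_alleles: Sequence[str], freqs: Dict[str, float],
                         theta: float) -> float:
    """P(M | K, H) when H declares x unrelated unknown contributors.

    Sums, over ordered sequences of the 2x unknown alleles drawn from M that
    contain every allele in `required`, the probability of the sequence
    conditional on the alleles of all typed persons.
    """
    typed = list(typed_alleles)
    base = sample_probability(typed, freqs, theta)
    need = set(required)
    total = 0.0
    for seq in product(sorted(set(mixture)), repeat=2 * x):
        if need <= set(seq):
            total += sample_probability(typed + list(seq), freqs, theta) / base
    return total


if __name__ == "__main__":
    # Hypothetical locus: M = {a, b, c}; victim ab and suspect cc are typed.
    freqs = {"a": 0.1, "b": 0.2, "c": 0.3}
    typed = ["a", "b", "c", "c"]
    mixture = ["a", "b", "c"]
    for theta in (0.0, 0.01, 0.03):
        p_hp = match_prob_unrelated(mixture, [], 0, typed, freqs, theta)     # victim + suspect
        p_hd = match_prob_unrelated(mixture, ["c"], 1, typed, freqs, theta)  # victim + one unknown
        print(theta, p_hp / p_hd)
```

The likelihood ratio at a locus is the ratio of the two match probabilities. Because the enumeration grows as $|M|^{2x}$, this approach is practical only as a cross-check for small $x$; closed-form expressions such as those in Tables 2 and 3 avoid the enumeration.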

#### Example

An example from Fukshansky and Bar (1998), extended to involve possibly related persons, was analyzed. The mixed stain was assumed to have been contributed by three persons: the victim and two unidentified assailants. The victim and the suspects $S_1$ and $S_2$ were typed at three loci: DQα, FES, and F13A1. Table 4 presents the alleles of the mixed stain, the genotypes, and the allele frequencies (Fukshansky and Bar 1998). On the basis of their genotypes, neither suspect can be excluded as a donor of the mixed stain. The prosecution hypothesis is therefore:

$H_p$: The contributors of the mixed stain were the victim and the two suspects, $S_1$ and $S_2$.

For the defense, two kinds of situations were considered: two related unknown persons are among the contributors, or a suspect is unavailable for typing and his or her relative is typed instead. Specifically, the following seven defense hypotheses are considered.

$H_{d1}$: The contributors were the victim, one untyped relative of $S_1$, and one unknown.

$H_{d2}$: The contributors were the victim, one untyped relative of $S_2$, and one unknown.

$H_{d3}$: The contributors were the victim, $S_1$, and one untyped relative of $S_2$.

$H_{d4}$: The contributors were the victim, $S_2$, and one untyped relative of $S_1$.

$H_{d5}$: The contributors were the victim, $S_1$, and one untyped relative of $S_1$.

$H_{d6}$: The contributors were the victim, $S_2$, and one untyped relative of $S_2$.

$H_{d7}$: The contributors were the victim and two related unknowns.

Note that the present study includes the three cases discussed by Fukshansky and Bar (1998).

All involved people are assumed to come from the same subpopulation. Six commonly encountered relationships are considered: unrelated, parent-child, full siblings, half siblings (equivalently grandparent-grandchild or uncle-nephew, because their kinship coefficients are the same), first cousins, and second cousins. With the software, the likelihood ratio for each locus and the overall likelihood ratio are readily computed for the prosecution proposition $H_p$ against each of the defense propositions $H_{d1}, H_{d2}, \ldots, H_{d7}$ under each of these six relationships.
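For reference, the six relationships can be encoded by their standard IBD probabilities for non-inbred relatives. The sketch assumes that $(k_0, 2k_1, k_2)$ are the probabilities that two relatives share 0, 1, or 2 alleles identical by descent; the dictionary `KINSHIP` and the helper `coancestry` are hypothetical names. The coancestry coefficient $2k_1/4 + k_2/2$ follows directly from these probabilities, and the table makes plain why half siblings, grandparent-grandchild, and uncle-nephew pairs can be treated together: they share the triple $(1/2, 1/2, 0)$.

```python
from fractions import Fraction

# (k0, 2k1, k2): probabilities of sharing 0, 1, or 2 alleles IBD.
KINSHIP = {
    "unrelated": (Fraction(1), Fraction(0), Fraction(0)),
    "parent-child": (Fraction(0), Fraction(1), Fraction(0)),
    "full siblings": (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)),
    # grandparent-grandchild and uncle-nephew share these values
    "half siblings": (Fraction(1, 2), Fraction(1, 2), Fraction(0)),
    "first cousins": (Fraction(3, 4), Fraction(1, 4), Fraction(0)),
    "second cousins": (Fraction(15, 16), Fraction(1, 16), Fraction(0)),
}


def coancestry(k):
    """Probability that one allele drawn at random from each relative is IBD."""
    k0, two_k1, k2 = k
    return two_k1 / 4 + k2 / 2


if __name__ == "__main__":
    for name, k in KINSHIP.items():
        assert sum(k) == 1, f"IBD probabilities for {name} must sum to 1"
        print(f"{name:15s} (k0, 2k1, k2) = ({', '.join(map(str, k))})  coancestry = {coancestry(k)}")
```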

Consider testing $H_p$ against $H_{d1}$. When the first locus is analyzed, the user enters the population frequency database file, the value of $\theta$, the number and names of the typed persons, and the prosecution and defense propositions; these do not need to be entered again. For each remaining locus, only the number of alleles in the mixture, the associated allele names, and the genotypes of the typed persons are required. The allele frequencies do not have to be entered, because they are read from the database file. Figure 1 shows, for illustration, the program window with the likelihood ratio for each locus and the overall likelihood ratio. A different value of $\theta$ may also be selected to view the associated likelihood ratios.
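If loci are treated as independent, as is usual, the overall likelihood ratio is the product of the per-locus values. The sketch below assumes that independence and uses placeholder numbers for the three loci of the example (they are not taken from Table 5); `overall_lr` and `overall_by_theta` are hypothetical names, not functions of the authors' program. Keeping one table per value of $\theta$ mirrors the option of switching $\theta$ to view the associated likelihood ratios.

```python
import math
from typing import Dict, Mapping


def overall_lr(per_locus: Mapping[str, float]) -> float:
    """Overall likelihood ratio, assuming independence across loci."""
    return math.prod(per_locus.values())


def overall_by_theta(table: Mapping[float, Mapping[str, float]]) -> Dict[float, float]:
    """Overall likelihood ratio for each value of theta."""
    return {theta: overall_lr(loci) for theta, loci in table.items()}


if __name__ == "__main__":
    # Placeholder per-locus values for illustration only (not from Table 5).
    lrs = {
        0.0: {"DQα": 5.0, "FES": 4.0, "F13A1": 2.5},
        0.03: {"DQα": 4.0, "FES": 3.0, "F13A1": 2.0},
    }
    for theta, lr in overall_by_theta(lrs).items():
        print(f"theta = {theta}: overall LR = {lr:g}, log10 = {math.log10(lr):.2f}")
```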

Table 5 gives the likelihood ratios for the various defense propositions under different values of $\theta$ and different kinship relationships. Although Table 5 contains 21 × 6 = 126 overall likelihood ratios, the program needs to be run only seven times, once for each defense proposition. The findings are summarized as follows:

- The likelihood ratio decreases as $\theta$ increases, and the drop can be large. For example, under $H_{d1}$ with the unrelated relationship, the likelihood ratio is 1402 at $\theta = 0.03$, only about one-third of its value of 4464 at $\theta = 0$.

- For $H_{d1}, \ldots, H_{d4}$, the likelihood ratio attains its highest value under the unrelated relationship and its lowest under the full-sibling relationship. At $\theta = 0$, the likelihood ratio for full siblings is only about two percent of that for unrelated persons under $H_{d3}$ and $H_{d4}$. For each of the studied relationships, the likelihood ratio is lowest under $H_{d4}$ among all the defense propositions, followed by $H_{d3}$. Both $H_{d3}$ and $H_{d4}$ are therefore strong defense strategies if such explanations are acceptable to the court. For these two propositions, the likelihood ratio for parent-child is also only about four percent of that for unrelated persons.

- The likelihood ratio for unrelated persons is not always the highest among the studied relationships. Under each of $H_{d5}$, $H_{d6}$, and $H_{d7}$ it is the lowest, and the highest value is attained instead by full siblings or parent-child.

#### Conclusions

Theory and methods are provided for interpreting a DNA mixture in a subpopulation when two of the unknown contributors are related, or when a contributor is not available for typing but his or her relative is. The method also covers the common situation in which the unknown contributors are unrelated. It is illustrated using an example from Fukshansky and Bar (1998). Under the first four defense propositions, the likelihood ratio is highest when the persons are unrelated and lowest when they are full siblings; in these cases, the full-sibling likelihood ratio may be only a few percent of the unrelated one. Under the other three defense propositions, however, the full-sibling likelihood ratio can be the highest.

Windows-based software for calculating the likelihood ratio has been developed. It may be viewed at http://www.saasweb.hku.hk/staff/wingfung/. To reduce effort and minimize input errors, the locus names and allele frequencies are read from the file containing the population frequencies. In addition, entries such as $\theta$, the number and names of the typed persons, the numbers of known and unknown contributors under both the prosecution and defense propositions, and the relationships among the unknown contributors and/or typed persons, once specified at the beginning, are retained and need not be reentered. For subsequent loci, only the number and names of the alleles in the mixture and the genotypes of the typed persons need to be specified. Because so little input is required, the software is less susceptible to transcription errors. The DNA mixture can also be evaluated simultaneously with frequency data from different ethnic groups, a feature that is useful in countries with multiple racial groups. The results can easily be saved to a file for cross-checking and reporting. A common mixture problem with nine loci can be evaluated in ten minutes. This greatly reduces the time needed, and improves accuracy during input, even for common DNA mixture problems involving only unrelated persons (Curran et al. 1999; Fung and Hu 2000A).

**Acknowledgments**

This work is supported in part by the Hong Kong RGC Competitive Earmarked Research Grant (HKU 7022/04P) and the National Natural Science Foundation of China (10329102).

#### References

Balding, D. J. and Nichols, R. A. DNA profile match probability calculation: How to allow for population stratification, relatedness, database selection and single bands, *Forensic Science International* (1994) 64:125-140.

Belin, T. R., Gjertson, D. W., and Hu, M. Y. Summarizing DNA evidence when relatives are possible suspects, *Journal of the American Statistical Association* (1997) 92:706-716.

Brookfield, J. F. Y. The effect of relatives on the likelihood ratio associated with DNA profile evidence in criminal cases, *Journal of the Forensic Science Society* (1994) 34:193-197.

Curran, J. M., Triggs, C. M., Buckleton, J., and Weir, B. S. Interpreting DNA mixtures in structured populations, *Journal of Forensic Sciences* (1999) 44:987-995.

Evett, I. W. Evaluating DNA profiles in a case where the defense is “It was my brother”, *Journal of the Forensic Science Society* (1992) 32:5-14.

Evett, I. W. and Weir, B. S. *Interpreting DNA Evidence*. Sinauer, Sunderland, Massachusetts, 1998.

Fukshansky, N. and Bar, W. Interpreting forensic DNA evidence on the basis of hypotheses testing, *International Journal of Legal Medicine* (1998) 111:62-66.

Fukshansky, N. and Bar, W. Biostatistics for mixed stain: The case of tested relatives of a non-tested suspect, *International Journal of Legal Medicine* (2000) 114:78-82.

Fung, W. K., Chung, Y. K., and Wong, D. M. Power of exclusion revisited: Probability of excluding relatives of the true father from paternity, *International Journal of Legal Medicine* (2002) 116:64-67.

Fung, W. K. and Hu, Y. Q. Interpreting forensic DNA mixtures: Allowing for uncertainty in population substructure and dependence, *Journal of the Royal Statistical Society A* (2000A) 163:241-254.

Fung, W. K. and Hu, Y. Q. Interpreting DNA mixture based on NRC-II recommendation 4.1, *Forensic Science Communications* [Online]. (2000B).

Fung, W. K. and Hu, Y. Q. Interpreting DNA mixtures with related contributors in subdivided populations, *Scandinavian Journal of Statistics* (2004) 31:115-130.

Hu, Y. Q. and Fung, W. K. Interpreting DNA mixtures with the presence of relatives, *International Journal of Legal Medicine* (2003) 117:39-45.

National Research Council. *Evaluation of Forensic DNA Evidence*. National Academy Press, Washington, DC, 1996.

Sjerps, M. and Kloosterman, A. D. On the consequences of DNA profile mismatches for close relatives of an excluded suspect, *International Journal of Legal Medicine* (1999) 112:176-180.

Weir, B. S., Triggs, C. M., Starling, L., Stowell, L. I., Walsh, K. A. J., and Buckleton, J. Interpreting DNA mixtures, *Journal of Forensic Sciences* (1997) 42:213-222.