Since 1990, numerous public repositories of microarray data have been created to store vast genomic data sets. Our hypothesis is that a secondary analysis of an available hepatocellular carcinoma (HCC) public data set could generate new findings and additional hypotheses.
Methods
The Gene Expression Omnibus at the National Center for Biotechnology Information was queried for available data sets specific for ‘HCC’ and ‘clinical data.’ Genes that passed filtering and normalization criteria were analyzed using the class comparison and prediction functions in BRB-ArrayTools. Ingenuity pathway analysis software was used to identify gene networks that were potentially up- or down-regulated.
Results
The file GDS274, which measured gene expression in primary HCC lesions with or without hepatic metastases from a cohort of Chinese patients, was identified as an appropriate data set and was imported into BRB-ArrayTools. A total of 9984 genes passed filtering criteria. Clinical data demonstrated that alpha fetoprotein (AFP) >100 ng/mL was predictive of worse survival (HR 5.87, 95% confidence interval: 1.11–31.0). A class comparison between patients with AFP >100 and those with AFP ≤100 demonstrated 92 genes to be differentially expressed. Ingenuity pathway analyses demonstrated the top networks associated with the observed gene expression.
Conclusions
Using available HCC microarray data, we identified genes differentially expressed based on AFP >100. Canonical pathway analysis demonstrated functional gene pathways and associated upstream regulators. This study maximizes the use of publicly available data by generating new findings. Secondary analyses of these data sets should be considered by investigators before embarking on new genomic experiments.
High throughput genomic technologies are increasingly being used to identify therapeutic targets and risk factors for specific diseases in this era of personalized medicine [
]. The use of high throughput technologies has generated vast amounts of genomic data. Since 1990, numerous public repositories of microarray data have been created. At the present time, a prerequisite to the publication of microarray data is that the results must be publicly available to the research community [
]. Authors describing a newly sequenced genome, gene, or protein must deposit the primary data in a permanent, public data repository, such as the DNA Data Bank of Japan, European Bioinformatics Institute, and the National Center for Biotechnology Information (NCBI) or ArrayExpress [
The established databases allow researchers, at their discretion, to submit some or all of the clinical data associated with a microarray experiment. The standardization of data formatting facilitates further data analyses. This common format makes it easier for researchers to access, query, and share data [
The online Gene Expression Omnibus, a public functional genomics data repository at the NCBI (http://www.ncbi.nlm.nih.gov/gds) was queried for available data sets. The specific search included (“carcinoma, hepatocellular” [MeSH Terms] OR HCC [All Fields]) AND (“patients” [MeSH Terms] OR patient [All Fields]) AND (“mortality” [Subheading] OR “survival” [MeSH Terms] OR survival [All Fields]). The genomic data (GDS274) file met the search criteria and was imported into BRB-ArrayTools version 4.2 (National Cancer Institute), available at http://linus.nci.nih.gov/BRB-ArrayTools.html [
]. This data set represents primary lesions with or without hepatic metastases in patients with hepatitis B-induced HCC.
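The search described above can be sketched as a reproducible query. The snippet below is only an illustration: it assumes the NCBI E-utilities `esearch` endpoint with the `gds` database, which the text does not state was used (the original search may have been run through the web interface), and `build_search_url` is a hypothetical helper name. The query string follows the terms quoted above, with straight quotation marks. No network request is made.

```python
from urllib.parse import urlencode

# Search terms as quoted in the text above.
QUERY = (
    '("carcinoma, hepatocellular"[MeSH Terms] OR HCC[All Fields]) '
    'AND ("patients"[MeSH Terms] OR patient[All Fields]) '
    'AND ("mortality"[Subheading] OR "survival"[MeSH Terms] OR survival[All Fields])'
)

# Assumption: the E-utilities esearch endpoint, queried against the "gds" database.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term: str, retmax: int = 20) -> str:
    """Return an esearch URL for the given term; nothing is fetched."""
    params = {"db": "gds", "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url(QUERY))
```

Recording the query in this form makes it easier to rerun the search later and check whether newer data sets meet the same criteria.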
2.2 Clinical data
Deidentified available patient data were analyzed for overall survival using Kaplan–Meier survival analysis. Data included age, primary tumor size, type of surgical resection, portal vein involvement, presence of multiple tumors, cirrhosis, serum alpha fetoprotein (AFP), vital status, and survival time. A Cox proportional hazards model was used to determine the effects of multiple independent predictor variables on overall survival. The final multivariate model was created using the backward, stepwise method of covariate elimination to consider a wide range of possible best models [
]. STATA 12 (StataCorp, College Station, TX) statistical software was used for all analyses.
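For readers unfamiliar with the Kaplan–Meier method, the product-limit estimate can be sketched in a few lines. This is a minimal illustration in Python rather than the STATA procedure used in the study, and the follow-up times and event indicators are hypothetical values, not data from GDS274.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival, curve = 1.0, []
    for t in sorted(deaths):
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up data in months (not taken from GDS274).
times = [6, 12, 12, 18, 24, 30]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"t={t:>2}  S(t)={s:.3f}")
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is why the estimate only changes at observed deaths.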
2.3 Microarray class comparison
Using BRB-ArrayTools, genes that passed filtering and normalization criteria were analyzed using the class comparison function, which compares gene expression among predefined classes and presumes that the data consist of experiments on different samples representative of the classes. We identified genes that were differentially expressed among classes using a multivariate permutation test [
]. The test statistics used were random variance t-statistics for each gene. Although t-statistics were used, the multivariate permutation test is nonparametric and does not require the assumption of Gaussian distributions. In the class comparison analysis, technical replicates of the same sample were averaged.
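The idea that a permutation test remains nonparametric even when a t-statistic is used can be illustrated for a single gene. The sketch below is a simplification under stated assumptions: it uses an ordinary pooled-variance t-statistic instead of the random variance t-statistic, it tests one gene instead of applying the multivariate procedure in BRB-ArrayTools, and the expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def t_stat(a, b):
    """Pooled-variance two-sample t-statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def permutation_p_value(a, b):
    """Exact two-sided p-value over every relabeling of the pooled samples."""
    pooled = list(a) + list(b)
    observed = abs(t_stat(a, b))
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        grp_a = [pooled[i] for i in idx]
        grp_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(t_stat(grp_a, grp_b)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical log-expression values for one gene in two AFP classes.
high_afp = [8.1, 7.9, 8.4, 8.6]
low_afp = [6.9, 7.2, 7.0, 7.4]
print(permutation_p_value(high_afp, low_afp))
```

The p-value comes from the distribution of the statistic under relabeling of the samples, not from a theoretical t distribution, so no Gaussian assumption is needed.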
2.4 Canonical pathway analysis
Ingenuity Pathway Analysis (IPA) software for complex genomics data (Ingenuity Systems, www.ingenuity.com, Redwood City, CA) was used to examine differentially expressed genes [
]. The analysis settings reference set was the Ingenuity Knowledge Bases (genes + endogenous chemicals). IPA was used to assess for network-associated functions and well-characterized molecular signaling (canonical) pathways. This computational approach investigates the network behavior as a system. The Ingenuity software scans the list of input genes to identify networks (i.e., relationships between genes) using data in the Ingenuity Pathways Knowledge Base, a manually curated database of functional interactions extracted from peer-reviewed publications [
]. A Fisher exact test is performed to determine the likelihood that the observed overlap between the input gene set and the genes present in each identified network, or a larger overlap, would arise by chance (i.e., from a random input gene set). IPA predicts which upstream regulators are activated or inhibited, based on known relationships, to explain the up- and down-regulated genes. The IPA software describes an “upstream regulator” as any molecule that can affect the expression of another molecule.
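The overlap test described above corresponds to the right tail of a hypergeometric distribution. The following sketch shows that calculation in general form; it is not the IPA implementation, `overlap_p_value` is a hypothetical function name, and apart from the input set size of 92 genes, every number in the example call is made up for illustration.

```python
from math import comb


def overlap_p_value(reference_size, network_size, input_size, overlap):
    """Right-tailed Fisher exact (hypergeometric) p-value.

    Probability that a random gene set of `input_size`, drawn from
    `reference_size` genes, shares at least `overlap` genes with a
    network containing `network_size` genes.
    """
    upper = min(network_size, input_size)
    total = comb(reference_size, input_size)
    tail = sum(
        comb(network_size, k) * comb(reference_size - network_size, input_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Input size (92) is from the text; the reference size, network size,
# and overlap below are hypothetical.
print(overlap_p_value(20000, 35, 92, 12))
```

A small p-value indicates that the network contains more of the input genes than a random gene list of the same size would be expected to share with it.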
3. Results
3.1 Online search
The data set GDS274 “HCC metastasis” was identified at Gene Expression Omnibus and imported into BRB-ArrayTools. The data from this microarray experiment were obtained from hepatitis B virus (HBV) positive HCC patients (n = 40) in China. GDS274 included primary HCC tumors and matched intrahepatic metastases (i.e., a primary tumor and an intrahepatic metastasis from the same patient). As originally published by Ye et al. [
], the mean patient age was 50 y (range: 36–74). The median diameter of the primary HCC was 7.2 cm (range 1.3–17.5). Thirty-two cases (80%) had underlying cirrhosis and 98% of the patients were HBV-positive. Serum AFP was >20 ng/mL in 68% of patients.
3.2 Clinical data
Deidentified, individual patient data were included in the GDS274 data set. A Cox proportional hazards model was created to determine predictors of survival. Age, tumor size, portal vein involvement, stage, and AFP >100 were found to have a P value <0.20 on univariate analysis. In the final multivariate analysis, only AFP >100 (HR 5.87) was predictive of worse survival (Table 1).
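The width of the reported confidence interval can be put in context with a short calculation. The sketch below assumes the interval is a standard Wald interval, symmetric on the log scale, which is the usual form for Cox model output but is not stated in the text. It uses the interval reported in the abstract (1.11–31.0) to recover the log hazard ratio and its standard error.

```python
from math import exp, log

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_hr_and_se(lower, upper):
    """Recover the log hazard ratio and its standard error from a 95% CI."""
    log_hr = (log(lower) + log(upper)) / 2
    se = (log(upper) - log(lower)) / (2 * Z_95)
    return log_hr, se


# Interval reported in the abstract: HR 5.87, 95% CI 1.11-31.0.
log_hr, se = log_hr_and_se(1.11, 31.0)
print(f"HR from CI midpoint = {exp(log_hr):.2f}")
print(f"SE(log HR)          = {se:.3f}")
```

Under that assumption, the midpoint of the interval on the log scale reproduces the reported hazard ratio, and the large standard error is what one would expect from a cohort of 40 patients.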
After filtering and normalization, 9984 genes passed inclusion criteria. A class comparison (modified t-test) demonstrated 92 genes were differentially expressed based on classifying patients as either (a) AFP >100 or (b) AFP ≤100. The full list of genes is available online at http://www.uth.tmc.edu/scleroderma/Supplemental_data.html. The list of 92 genes was imported into Ingenuity software. The top nine upregulated and nine downregulated genes are listed in Table 2. The top network-associated functions were [
] neurologic disease/cardiovascular disease/heart failure (Figure) and carbohydrate metabolism/lipid metabolism/small molecule biochemistry. The top biologic functions are listed in Table 3. The top canonical pathways involved, defined by the ratio of observed up- or down-regulated genes that belong to a defined pathway, were tyrosine degradation I and lipopolysaccharide/interleukin 1 mediated inhibition of retinoid X receptor function (Table 4). In addition, IPA analyzed upstream regulators of the observed gene set and identified peroxisome proliferator-activated receptor alpha (PPARα) and hepatocyte nuclear factor 4 alpha (HNF4α) as upstream regulators (Table 4).
Figure. Neurologic disease, cardiovascular disease, and heart failure network. Caption: solid line = direct interaction, dashed line = indirect interaction, A = activation, B = binding, E = expression, I = inhibition, LO = localization, M = biochemical modification, P = phosphorylation, PP = protein–protein binding, RB = regulation of binding, T = transcription, and TR = translocation.
Using an online public repository of microarray data, we performed a secondary analysis of existing HCC gene expression data. The initial microarray experiment investigated differential gene expression between primary HBV-positive HCC tumors and paired intrahepatic metastases [
]. The study authors found Osteopontin was upregulated in metastatic HCC. For our present study, we analyzed the clinical and gene expression data as well as known canonical pathways and networks. Elevated AFP >100 was [
] associated with differential gene expression. AFP has been established as a tumor marker for HCC, and serum levels predict survival; however, the regulation of AFP gene expression is less clear [
] examined primary HCC tumors and did not identify a gene profile associated with survival. However, analysis of adjacent noncancerous liver parenchyma identified a gene profile related to normal liver function highly correlated with survival. They also identified a gene signature associated with inflammation that predicted poor survival. Sun et al. [
] examined expression between primary HCC and surrounding liver tissues, which resulted in a nine-gene profile associated with cell cycle and immune response that predicted survival in HCC samples. Although there is discrepancy between many of these studies, it is important to note the clinical specimens, patient populations, experimental design, and means of measurement all differ. The samples used to produce our results are from primary tumors and matched intrahepatic metastases, a distinctly different experimental model.
Although important for prognosis, gene expression profiles predictive of survival may have limited therapeutic utility because the interactions and significant relationships are not well characterized in a simple list of up- or down-regulated genes. Since GDS274 was created, various pathway analysis tools have become available to identify biological pathways and to unravel the intricate complexity of gene expression. Using software capable of modeling and understanding genomic networks, we were able to build on the initial data gained from the significance analysis of microarrays experiment. Analyses using this approach generate lists of differentially expressed genes; however, the biological relevance of a list of up- or down-regulated genes is not readily apparent [
]. Using IPA to better understand the output of the list of genes, we applied our results of differential gene expression to a known gene ontology database to examine potential gene pathways and networks. This subsequent analysis provides a broad understanding of functional gene expression as it goes beyond simple gene clustering [
The genes we identified involve signaling pathways implicated in hepatocarcinogenesis and other candidate genes not well characterized. We found S100p to be downregulated, and this gene has been studied in various gastrointestinal cancers [
]. Further investigation of the IPA network data, illustrated in the Figure, reveals interactions with transcription factors. In turn, a transcription factor, such as nuclear factor-kappa B, interacts with mitogen-activated protein kinase transcription factors, which have been shown to increase the proliferation and invasion of HCC cells in vitro [
]. IPA demonstrated that the upstream regulators (PPARα and HNF4α) of the AFP >100 class may also interact with these transcription factors. Sustained activation of PPARα by agonists has been linked to HCC due to sustained oxidative stress, endoplasmic reticulum stress, and liver cell proliferation [
]. These preliminary observations may generate hypotheses for future studies by providing investigators with candidate molecular targets for novel therapeutic agents.
This study has several important limitations. Foremost, the researcher must rely on secondhand data and has no ability to double check the internal validity of the data. However, each microarray chip has internal controls as a quality control measure. Second, the generalizability of the results to U.S. patients is unknown because >70% of HCC patients in the U.S. present with advanced cirrhosis. This data set included Chinese HCC patients who were primarily HBV-positive, in contrast to HCC patients in the U.S., who are typically hepatitis C virus positive. Third, in this data set, surgical resection was the only therapy provided; thus, additional locoregional or systemic therapies could not be included as covariates. Because all patients included in this study underwent surgical resection, most patients likely had preserved hepatic function with mild cirrhosis. Lastly, long-term follow-up of patients in this data set is not available.
], Osteopontin was found to be upregulated in metastatic HCC. In our current results, there was no correlation between Osteopontin and the list of upregulated genes in the AFP >100 class. There may be several reasons for this observation. In the original study, Osteopontin was found to be upregulated when global gene expression was compared between 10 primary HCC lesions and 10 HCC lesions with portal vein tumor thrombus. Our study used the same data set, but included all samples from 40 patients and technical replicates from the same patient were averaged.
Currently, the Functional Genomics Data Society requires all authors using microarray data to submit a complete data set to the NCBI [
]. This study maximizes the utility of available data by generating additional findings and hypotheses. Subsequent studies are needed to validate these results. Nonetheless, secondary analyses of existing microarray and clinical data sets should be considered by investigators before embarking on new, costly genomic experiments. Resulting gene profiles may be useful in elucidating mechanisms of carcinogenesis, identifying novel therapeutic targets, and stratifying patients in clinical trials. Considering the enormous amount of genomic data stored in public repositories, further analyses of these data with newer software may prove useful.
Acknowledgment
Author contributions: W.J.C. was responsible for conception, study design, data collection, and manuscript composition. K.C.T. and T.K.F. were responsible for scientific review and manuscript revision. W.J.C. and T.K.F. were responsible for data analysis and interpretation.
Financial support: Funded in part by American Cancer Society (MSRG-12-178-01-PCSM).
References
Liu X., Niu T., Liu X., et al. Microarray profiling of HepG2 cells ectopically expressing NDRG2.
Microarray analysis revealed dysregulation of multiple genes associated with chemoresistance to As(2)O(3) and increased tumor aggressiveness in a newly established arsenic-resistant ovarian cancer cell line, OVCAR-3/AsR.