Lineage retrieval for scientific data processing: a survey

Published: 01 March 2005

Abstract

Scientific research relies as much on the dissemination and exchange of data sets as on the publication of conclusions. Accurately tracking the lineage (origin and subsequent processing history) of scientific data sets is thus imperative for the complete documentation of scientific work. Researchers are effectively prevented from determining, preserving, or providing the lineage of the computational data products they use and create, however, because of the lack of a definitive model for lineage retrieval and a poor fit between current data management tools and scientific software. Based on a comprehensive survey of lineage research and previous prototypes, we present a metamodel to help identify and assess the basic components of systems that provide lineage retrieval for scientific data products.

Published In

ACM Computing Surveys  Volume 37, Issue 1
March 2005
81 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/1057977
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Data lineage
  2. audit
  3. data provenance
  4. scientific data
  5. scientific workflow

Qualifiers

  • Article

Cited By

  • (2024) Modelo de autoria de metadados / Modelo de autoría de metadatos / Metadata authoring model. Informação & Informação 28:4 (1-37). DOI: 10.5433/1981-8920.2023v28n4p1. Online publication date: 28-Sep-2024
  • (2023) Transactional Python for Durable Machine Learning: Vision, Challenges, and Feasibility. Proceedings of the Seventh Workshop on Data Management for End-to-End Machine Learning (1-5). DOI: 10.1145/3595360.3595855. Online publication date: 18-Jun-2023
  • (2023) Data Provenance in Security and Privacy. ACM Computing Surveys 55:14s (1-35). DOI: 10.1145/3593294. Online publication date: 22-Apr-2023
  • (2023) Preserving File Provenance Using Principles of Blockchain to Ensure Scientific Reproducibility. 2023 IEEE 19th International Conference on e-Science (e-Science) (1-7). DOI: 10.1109/e-Science58273.2023.10254665. Online publication date: 9-Oct-2023
  • (2022) Structure-based knowledge acquisition from electronic lab notebooks for research data provenance documentation. Journal of Biomedical Semantics 13:1. DOI: 10.1186/s13326-021-00257-x. Online publication date: 31-Jan-2022
  • (2022) Subdivisions and crossroads: Identifying hidden community structures in a data archive’s citation network. Quantitative Science Studies 3:3 (694-714). DOI: 10.1162/qss_a_00209. Online publication date: 1-Nov-2022
  • (2022) Challenges of Provenance in Scientific Workflow Management Systems. 2022 IEEE/ACM Workshop on Workflows in Support of Large-Scale Science (WORKS) (10-18). DOI: 10.1109/WORKS56498.2022.00007. Online publication date: Nov-2022
  • (2022) Augmented lineage: traceability of data analysis including complex UDF processing. The VLDB Journal — The International Journal on Very Large Data Bases 32:5 (963-983). DOI: 10.1007/s00778-022-00769-7. Online publication date: 23-Nov-2022
  • (2021) A comprehensive survey on data provenance. Journal of Computer Security 29:4 (423-446). DOI: 10.3233/JCS-200108. Online publication date: 1-Jan-2021
  • (2021) A Roadmap for Automating Lineage Tracing to Aid Automatically Explaining Machine Learning Predictions for Clinical Decision Support. JMIR Medical Informatics 9:5 (e27778). DOI: 10.2196/27778. Online publication date: 27-May-2021
