The following is the intended content model for the metadata content repository:

.
`-- repositories/
    `-- central/
        |-- config/
        |   |-- name=
        |   |-- storageUrl=
        |   `-- uri=
        |-- content/
        |   `-- org/
        |       `-- apache/
        |           |-- archiva/
        |           |   `-- platform/
        |           |       |-- scanner/
        |           |       |   |-- 1.0-SNAPSHOT/
        |           |       |   |   |-- scanner-1.0-20091120.012345-1.pom/
        |           |       |   |   |   |-- asc=
        |           |       |   |   |   |-- created=
        |           |       |   |   |   |-- fileCreated=
        |           |       |   |   |   |-- fileLastModified=
        |           |       |   |   |   |-- maven:buildNumber=
        |           |       |   |   |   |-- maven:classifier
        |           |       |   |   |   |-- maven:timestamp=
        |           |       |   |   |   |-- maven:type=
        |           |       |   |   |   |-- md5=
        |           |       |   |   |   |-- sha1=
        |           |       |   |   |   |-- size=
        |           |       |   |   |   |-- updated=
        |           |       |   |   |   `-- version=
        |           |       |   |   |-- ciManagement.system=
        |           |       |   |   |-- ciManagement.url=
        |           |       |   |   |-- created=
        |           |       |   |   |-- dependencies.0.artifactId=
        |           |       |   |   |-- dependencies.0.classifier=
        |           |       |   |   |-- dependencies.0.groupId=
        |           |       |   |   |-- dependencies.0.optional=
        |           |       |   |   |-- dependencies.0.scope=
        |           |       |   |   |-- dependencies.0.systemPath=
        |           |       |   |   |-- dependencies.0.type=
        |           |       |   |   |-- dependencies.0.version=
        |           |       |   |   |-- description=
        |           |       |   |   |-- individuals.0.email=
        |           |       |   |   |-- individuals.0.name=
        |           |       |   |   |-- individuals.0.properties.scmId=
        |           |       |   |   |-- individuals.0.roles.0=
        |           |       |   |   |-- individuals.0.timezone=
        |           |       |   |   |-- issueManagement.system=
        |           |       |   |   |-- issueManagement.url=
        |           |       |   |   |-- licenses.0.name=
        |           |       |   |   |-- licenses.0.url=
        |           |       |   |   |-- mailingLists.0.mainArchiveUrl=
        |           |       |   |   |-- mailingLists.0.name=
        |           |       |   |   |-- mailingLists.0.otherArchives.0=
        |           |       |   |   |-- mailingLists.0.postAddress=
        |           |       |   |   |-- mailingLists.0.subscribeAddress=
        |           |       |   |   |-- mailingLists.0.unsubscribeAddress=
        |           |       |   |   |-- maven:buildExtensions.0.artifactId=
        |           |       |   |   |-- maven:buildExtensions.0.groupId=
        |           |       |   |   |-- maven:buildExtensions.0.version=
        |           |       |   |   |-- maven:packaging=
        |           |       |   |   |-- maven:parent.artifactId=
        |           |       |   |   |-- maven:parent.groupId=
        |           |       |   |   |-- maven:parent.version=
        |           |       |   |   |-- maven:plugins.0.artifactId=
        |           |       |   |   |-- maven:plugins.0.groupId=
        |           |       |   |   |-- maven:plugins.0.reporting=
        |           |       |   |   |-- maven:plugins.0.version=
        |           |       |   |   |-- maven:properties.mavenVersion=
        |           |       |   |   |-- maven:repositories.0.id=
        |           |       |   |   |-- maven:repositories.0.layout=
        |           |       |   |   |-- maven:repositories.0.name=
        |           |       |   |   |-- maven:repositories.0.plugins=
        |           |       |   |   |-- maven:repositories.0.releases=
        |           |       |   |   |-- maven:repositories.0.snapshots=
        |           |       |   |   |-- maven:repositories.0.url=
        |           |       |   |   |-- name=
        |           |       |   |   |-- organization.favicon=
        |           |       |   |   |-- organization.logo=
        |           |       |   |   |-- organization.name=
        |           |       |   |   |-- organization.url=
        |           |       |   |   |-- relocatedTo.namespace=
        |           |       |   |   |-- relocatedTo.project=
        |           |       |   |   |-- relocatedTo.projectVersion=
        |           |       |   |   |-- scm.connection=
        |           |       |   |   |-- scm.developerConnection=
        |           |       |   |   |-- scm.url=
        |           |       |   |   |-- updated=
        |           |       |   |   `-- url=
        |           |       |   `-- maven:artifactId=
        |           |       `-- maven:groupId=
        |           `-- maven/
        |               `-- plugins/
        |                   |-- maven:groupId=
        |                   |-- maven:plugins.compiler.artifactId=
        |                   `-- maven:plugins.compiler.name=
        |-- references/
        |   `-- org/
        |       `-- apache/
        |           `-- archiva/
        |               |-- parent/
        |               |   `-- 1/
        |               |       `-- references/
        |               |           `-- org/
        |               |               `-- apache/
        |               |                   `-- archiva/
        |               |                       |-- platform/
        |               |                       |   `-- scanner/
        |               |                       |       `-- 1.0-SNAPSHOT/
        |               |                       |           `-- referenceType=parent
        |               |                       `-- web/
        |               |                           `-- webapp/
        |               |                               `-- 1.0-SNAPSHOT/
        |               |                                   `-- referenceType=parent
        |               `-- platform/
        |                   `-- scanner/
        |                       `-- 1.0-SNAPSHOT/
        |                           `-- references/
        |                               `-- org/
        |                                   `-- apache/
        |                                       `-- archiva/
        |                                           `-- web/
        |                                               `-- webapp/
        |                                                   `-- 1.0-SNAPSHOT/
        |                                                       `-- referenceType=dependency
        `-- stats/
            `-- 2009/
                `-- 12/
                    |-- 02/
                    |   `-- 23/
                    |       `-- 47/
                    |           `-- 00/
                    |               |-- scanEndTime=
                    |               |-- scanStartTime=
                    |               |-- totalArtifactCount=
                    |               |-- totalArtifactFileSize=
                    |               |-- totalFileCount=
                    |               |-- totalGroupCount=
                    |               `-- totalProjectCount=
                    `-- 03/
                        `-- 09/
                            `-- 00/
                                `-- 00/
                                    |-- scanEndTime=
                                    |-- scanStartTime=
                                    |-- totalArtifactCount=
                                    |-- totalArtifactFileSize=
                                    |-- totalFileCount=
                                    |-- totalGroupCount=
                                    `-- totalProjectCount=

(To update - run "tree --dirsfirst -F" on the unpacked content-model.zip from
the sandbox)

Notes:

*) config should be reflected to an
   external configuration file and only stored in the content repository for
   purposes of accessing through a REST API, for example
*) In the above example, we have the following coordinates:
   - namespace = org.apache.archiva.platform (namespaces are of arbitrary
     depth, and are project namespaces, not to be confused with JCR's
     item/node namespaces)
   - project = scanner
   - version = 1.0-SNAPSHOT
   - artifact = scanner-1.0-20091120.012345-1.pom
*) the filename (scanner-1.0-20091120.012345-1.pom) is a node, and each file
   gets its own node, except for checksums, signatures, etc., which are stored
   as properties (md5, sha1, asc) of the file they belong to.
*) the top level version (1.0-SNAPSHOT) is the version best used to describe
   the project (the "marketed version"). It must still be unique for lookup
   and for comparing project versions to each other, but can contain several
   different "build" artifacts.
*) A project is a single code project. It does not have subprojects - if such
   modeling needs to be done, then we can create a products tree that will map
   what "Archiva 1.0" contains from the other repositories.
*) There is no Maven-native information here, other than that in the maven:
   namespace. POM and other files are not treated as special - each is stored,
   and it is up to the reader to interpret it
*) artifact data is not stored in the metadata repository (there is no data=
   property on the file). The information here is enough to locate the file in
   the original storageUrl when it is requested
*) The API will still use separate namespace and project identifiers (the
   namespace can be null if there isn't one). This is chosen to allow
   splitting the namespace on '.', while also allowing '.' in the project
   identifier without splitting
*) properties with '.' may be nested in other representations such as Java
   models or XML, if appropriate
*) we only keep one set of project information for a "version" - this differs
   from Maven's storage of one POM per snapshot. The Maven 2 module will take
   the latest. Those that need Maven's behaviour should retrieve the POM
   directly.
   Implementations are also free to store as much information as desired
   within the artifact node in addition to whatever is shared in the project
   version node.
*) while some information is stored at the most generic level in the metadata
   repository (eg maven:groupId, maven:artifactId), for convenience when
   loaded by the implementation it may all be pushed into the projectVersion's
   information. The metadata repository implementation can decide how best to
   store and retrieve the information.
*) created/updated timestamps may be maintained by the metadata repository
   implementation for the metadata itself. Timestamps for individual files are
   stored as additional properties (fileCreated, fileLastModified). It may
   make sense to add a "discovered" timestamp if an artifact is known to have
   been created at a different time from when it was added to the metadata
   repository.
*) references are stored outside the main model so that their creation doesn't
   imply a "stub" model - we know if the project exists whether a reference is
   created or not. References need not imply referential integrity.
*) some of the above needs to be reviewed before going into production. For
   example:
   - the maven specific aspects of dependencies should become a faceted part
     of the content
   - more of the metadata might be faceted in general, keeping the content
     model basic by default
   - determine if any of the stats can be derived by functions of the content
     repository rather than storing and trying to keep them up to date.
     Historical data might be retained by versioning and taking a snapshot at
     a given point in time. The current approach of tying them to the scanning
     process is not optimal
   - the metadata currently stored as 0-indexed lists would be better stored
     as child nodes.
     This might require additional levels in the current repository
     (.../scanner/versions/1.0-SNAPSHOT/artifacts/scanner-1.0-20091120.012345-1.pom),
     or for listed information to be in a separate tree
     (/metadata/org/apache/archiva/platform/scanner/1.0-SNAPSHOT/mailingLists/users),
     or to use some 'reserved names' for nodes (by using a content
     repository's namespacing capabilities). The first has the advantage of
     keeping information together, but has a longer path name and is less
     familiar to Maven users. The second arbitrarily divides metadata. The
     third option seems preferable but needs more investigation at this stage.
*) Future possibilities:
   - audit metadata on artifacts (who uploaded, when, and how), or whether it
     was discovered by scanning
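
Illustration (not part of the model): the notes on coordinates say that the
namespace is split on '.', while the project identifier is kept whole. A
minimal Python sketch of how coordinates might map onto node paths in the tree
above is shown here. The function names, the Coordinates tuple and the use of
'/' as a path separator are assumptions made for this example only; they are
not an Archiva API. The reference path layout (referenced project first, then
the referring project) is one reading of the references/ subtree.

```python
from typing import List, Optional, Tuple

# Hypothetical helper type: (namespace, project, version).
# The namespace may be None, as the notes allow.
Coordinates = Tuple[Optional[str], str, str]


def namespace_segments(namespace: Optional[str]) -> List[str]:
    """Split a project namespace on '.'; a missing namespace gives no segments."""
    return namespace.split(".") if namespace else []


def project_version_segments(coords: Coordinates) -> List[str]:
    """Path segments for a project version. The project id is never split."""
    namespace, project, version = coords
    return namespace_segments(namespace) + [project, version]


def content_path(repo_id: str, coords: Coordinates,
                 artifact: Optional[str] = None) -> str:
    """Path of a project version node, or of an artifact node beneath it."""
    parts = ["repositories", repo_id, "content"] + project_version_segments(coords)
    if artifact is not None:
        parts.append(artifact)
    return "/".join(parts)


def reference_path(repo_id: str, referenced: Coordinates,
                   referrer: Coordinates) -> str:
    """Path of the node that would hold referenceType= for one reference.

    Assumption: the referenced project version comes first, followed by a
    'references' node and then the project version that refers to it.
    """
    parts = (["repositories", repo_id, "references"]
             + project_version_segments(referenced)
             + ["references"]
             + project_version_segments(referrer))
    return "/".join(parts)


if __name__ == "__main__":
    scanner = ("org.apache.archiva.platform", "scanner", "1.0-SNAPSHOT")
    parent = ("org.apache.archiva", "parent", "1")
    print(content_path("central", scanner, "scanner-1.0-20091120.012345-1.pom"))
    print(reference_path("central", parent, scanner))
```

Keeping the namespace and project as separate arguments is what lets a project
identifier such as "my.project" pass through unsplit, which a single dotted
string could not express.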