aboutsummaryrefslogtreecommitdiffstats
path: root/archiva-modules/src/site/apt/metadata-content-model.apt
blob: f5c193615ff90756c8dda8358c3678b7a1f3e0aa (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
 ----
 Metadata Control Model
 ----

Metadata Content Model

    The metadata repository stores all known information about a repository in a common format that other plugins can
    understand, and that eventually external applications will be able to query.

    The content model is designed such that it models the most likely structure of the data both for storage and
    retrieval. For example, audit logs are stored by the time they occur, not grouped under an action.

* Content Model Structure

    The following is a sample tree that represents the content model:

----
.
`-- repositories/
    `-- central/
        |-- config/
        |   |-- name=
        |   |-- storageUrl=
        |   `-- uri=
        |-- content/
        |   `-- org/
        |       `-- apache/
        |           |-- archiva/
        |           |   `-- platform/
        |           |       |-- scanner/
        |           |       |   |-- 1.0-SNAPSHOT/
        |           |       |   |   |-- scanner-1.0-20091120.012345-1.pom/
        |           |       |   |   |   |-- asc=
        |           |       |   |   |   |-- created=
        |           |       |   |   |   |-- fileCreated=
        |           |       |   |   |   |-- fileLastModified=
        |           |       |   |   |   |-- maven:buildNumber=
        |           |       |   |   |   |-- maven:classifier
        |           |       |   |   |   |-- maven:timestamp=
        |           |       |   |   |   |-- maven:type=
        |           |       |   |   |   |-- md5=
        |           |       |   |   |   |-- sha1=
        |           |       |   |   |   |-- size=
        |           |       |   |   |   |-- updated=
        |           |       |   |   |   `-- version=
        |           |       |   |   |-- ciManagement.system=
        |           |       |   |   |-- ciManagement.url=
        |           |       |   |   |-- created=
        |           |       |   |   |-- dependencies.0.artifactId=
        |           |       |   |   |-- dependencies.0.classifier=
        |           |       |   |   |-- dependencies.0.groupId=
        |           |       |   |   |-- dependencies.0.optional=
        |           |       |   |   |-- dependencies.0.scope=
        |           |       |   |   |-- dependencies.0.systemPath=
        |           |       |   |   |-- dependencies.0.type=
        |           |       |   |   |-- dependencies.0.version=
        |           |       |   |   |-- description=
        |           |       |   |   |-- individuals.0.email=
        |           |       |   |   |-- individuals.0.name=
        |           |       |   |   |-- individuals.0.properties.scmId=
        |           |       |   |   |-- individuals.0.roles.0=
        |           |       |   |   |-- individuals.0.timezone=
        |           |       |   |   |-- issueManagement.system=
        |           |       |   |   |-- issueManagement.url=
        |           |       |   |   |-- licenses.0.name=
        |           |       |   |   |-- licenses.0.url=
        |           |       |   |   |-- mailingLists.0.mainArchiveUrl=
        |           |       |   |   |-- mailingLists.0.name=
        |           |       |   |   |-- mailingLists.0.otherArchives.0=
        |           |       |   |   |-- mailingLists.0.postAddress=
        |           |       |   |   |-- mailingLists.0.subscribeAddress=
        |           |       |   |   |-- mailingLists.0.unsubscribeAddress=
        |           |       |   |   |-- maven:buildExtensions.0.artifactId=
        |           |       |   |   |-- maven:buildExtensions.0.groupId=
        |           |       |   |   |-- maven:buildExtensions.0.version=
        |           |       |   |   |-- maven:packaging=
        |           |       |   |   |-- maven:parent.artifactId=
        |           |       |   |   |-- maven:parent.groupId=
        |           |       |   |   |-- maven:parent.version=
        |           |       |   |   |-- maven:plugins.0.artifactId=
        |           |       |   |   |-- maven:plugins.0.groupId=
        |           |       |   |   |-- maven:plugins.0.reporting=
        |           |       |   |   |-- maven:plugins.0.version=
        |           |       |   |   |-- maven:properties.mavenVersion=
        |           |       |   |   |-- maven:repositories.0.id=
        |           |       |   |   |-- maven:repositories.0.layout=
        |           |       |   |   |-- maven:repositories.0.name=
        |           |       |   |   |-- maven:repositories.0.plugins=
        |           |       |   |   |-- maven:repositories.0.releases=
        |           |       |   |   |-- maven:repositories.0.snapshots=
        |           |       |   |   |-- maven:repositories.0.url=
        |           |       |   |   |-- name=
        |           |       |   |   |-- organization.favicon=
        |           |       |   |   |-- organization.logo=
        |           |       |   |   |-- organization.name=
        |           |       |   |   |-- organization.url=
        |           |       |   |   |-- relocatedTo.namespace=
        |           |       |   |   |-- relocatedTo.project=
        |           |       |   |   |-- relocatedTo.projectVersion=
        |           |       |   |   |-- scm.connection=
        |           |       |   |   |-- scm.developerConnection=
        |           |       |   |   |-- scm.url=
        |           |       |   |   |-- updated=
        |           |       |   |   `-- url=
        |           |       |   `-- maven:artifactId=
        |           |       `-- maven:groupId=
        |           `-- maven/
        |               `-- plugins/
        |                   |-- maven:groupId=
        |                   |-- maven:plugins.compiler.artifactId=
        |                   `-- maven:plugins.compiler.name=
        |-- facets/
        |   |-- org.apache.archiva.audit/
        |   |   `-- 2010/
        |   |       `-- 01/
        |   |           `-- 19/
        |   |               `-- 093600.000/
        |   |                   |-- action=
        |   |                   |-- artifact.id=
        |   |                   |-- artifact.namespace=
        |   |                   |-- artifact.projectId=
        |   |                   |-- artifact.version=
        |   |                   |-- remoteIP=
        |   |                   `-- user=
        |   |-- org.apache.archiva.metadata.repository.stats/
        |   |   `-- 2009/
        |   |       `-- 12/
        |   |           `-- 03/
        |   |               `-- 090000.000/
        |   |                   |-- scanEndTime=
        |   |                   |-- scanStartTime=
        |   |                   |-- totalArtifactCount=
        |   |                   |-- totalArtifactFileSize=
        |   |                   |-- totalFileCount=
        |   |                   |-- totalGroupCount=
        |   |                   `-- totalProjectCount=
        |   `-- org.apache.archiva.reports/
        `-- references/
            `-- org/
                `-- apache/
                    `-- archiva/
                        |-- parent/
                        |   `-- 1/
                        |       `-- references/
                        |           `-- org/
                        |               `-- apache/
                        |                   `-- archiva/
                        |                       |-- platform/
                        |                       |   `-- scanner/
                        |                       |       `-- 1.0-SNAPSHOT/
                        |                       |           `-- referenceType=parent
                        |                       `-- web/
                        |                           `-- webapp/
                        |                               `-- 1.0-SNAPSHOT/
                        |                                   `-- referenceType=parent
                        `-- platform/
                            `-- scanner/
                                `-- 1.0-SNAPSHOT/
                                    `-- references/
                                        `-- org/
                                            `-- apache/
                                                `-- archiva/
                                                    `-- web/
                                                        `-- webapp/
                                                            `-- 1.0-SNAPSHOT/
                                                                `-- referenceType=dependency
----

    ~~ To update - run "tree --dirsfirst -F" on the unpacked content-model.zip from the sandbox

    This uses a typical content repository structure, where there is a path to a particular node (the last paths in
    the structure above), and nodes can have properties and values (shown as <<<property=value>>> above).

    Properties with '.' may be nested in other representations such as Java models or XML, if appropriate - this is
    the decision of the content repository persistence implementation.

    Additionally, while some information is stored at the most generic level in the metadata repository (eg
    <<<maven:groupId>>>, <<<maven:artifactId>>>), for convenience when loaded by the implementation it may all be pushed
    into the project version's information. The metadata repository implementation can decide how best to store and
    retrieve the information.

    <Note:> Some of the properties have been put in place temporarily but need to be revisited - for example the use
            of index counters for the lists of Maven POM information are not ideal, and some Maven specific aspects of
            the dependencies should become faceted content

    The following sections walk through parts of the tree.

** Configuration section

   <Note:> The configuration section is not currently implemented in the code. It should be shadowed to a file on the
           file system for easy editing and pre-configuration outside the server. A possible implementation is to use
           the same storage and resolution mechanism to access the configuration so that this can be achieved, and it
           can be loaded on the fly, etc.

   It is desirable to be able to access and modify all configuration through the same interfaces, so it is also stored
   in the content repository.

   Each repository will have it's own metadata, but there also needs to be a server-level configuration for other parts
   of the system.

** Content section

    The content section houses the information directly about the artifacts in the repository. As described in the
    {{{./terminology.html} Terminology}} document, artifacts are described by the following coordinates (with the values
    shown from the example above):

        * Namespace (<<<org.apache.archiva.platform>>>)

        * Project ID (<<<scanner>>>)

        * Project version (<<<1.0-SNAPSHOT>>>)

        * Artifact ID (<<<scanner-1.0-20091120.012345-1.pom>>>)

        []

    Namespaces are of arbitrary depth, and are project namespaces, not to be confused with JCR's item/node namespaces.
    A separate namespace and project identifier are retained to allow '.' in the project identifier without splitting,
    while still allowing splitting on '.' in the namespace, when determining the most appropriate path for an artifact
    in the content repository. The namespace may be null if there isn't one.

    Projects are very simple entities. They do not have subprojects - if such modeling needs to be done, then we
    would create a "products" tree (or similar) that will map what "Archiva 1.0" contains as a collection of project
    version nodes, for example.

    Each artifact in the repository will contain an entry, though not necessarily every file. For example, in a Maven
    repository it is known that the <<<.md5>>>, <<<.sha1>>> and <<<.asc>>> files represent metadata about the artifact
    of the same name, so that is attached to that node instead.

    Metadata is stored at the level most appropriate to that piece of information. This means that in a Maven
    repository, while both the POM and other artifact(s) are considered be separate artifacts, they all share the
    information in the POM that is stored at the project version or even project level. We only keep one set of project
    information for a version - this differs from Maven's storage of one POM per snapshot. The Maven 2 module will take
    the latest snapshot data and use that. Those that need Maven's behaviour should retrieve the POM directly. 

    Note that artifact data is not stored in the metadata repository (there is no data= property on the file). The
    information here is enough to locate the file in the original storage when it is requested.

    The following describes some of the metadata at each level. Note that the Maven extensions are covered here - these
    are optional, and they wouldn't be present on a non-Maven storage repository. Likewise, plugins may store
    additional metadata for each artifact.

*** Namespace Metadata

    * <<<maven:groupId>>> - the Maven group ID

*** Project Metadata

    * <<<maven:artifactId>>> - the Maven artifact ID

*** Project Version Metadata

    * <<<created>>> - when the metadata was added to the repository (see [1] below)

    * <<<updated>>> - when the metadata was last updated (see [1] below)

    * <<<name>>> - human-readable project name

    * <<<description>>> - a human-readable description of this project

    * <<<url>>> - a URL to the project's documentation or other information

    * <<<organization[].*>>> - information about the organization

    * <<<licenses[].*>>> - the license the project source code is available under

    * <<<issueManagement.*>>> - the issue tracker used by the application

    * <<<ciManagement.*>>> - continuous integration server information

    * <<<dependencies[].*>>> - other projects that this project version depends on. Note that currently this contains
                               Maven specifics that are expected to be abstracted out into the Maven extensions

    * <<<individuals[].*>>> - participants in the development of, or otherwise associated with, the project

    * <<<scm.*>>> - information about the SCM used to store the project source code

    * <<<relocatedTo.*>>> - co-ordinates that this artifact has been relocated to

    * <<<maven:packaging>>> - the packaging value in the Maven POM

    * <<<maven:parent.*>>> - a reference to the Maven parent POM

    * <<<maven.plugins[].*>>> - references to build plugins in a Maven POM

    * <<<maven.repositories[].*>>> - references to other repositories in a Maven POM

    * <<<maven:buildExtensions[].*>>> - references to build extensions in a Maven POM

    * <<<maven:properties.*>>> - properties stored in a Maven POM

    []

    Footnotes:

        [[1]] created/updated timestamps may be maintained by the metadata repository implementation for the metadata
              itself. Timestamps for individual files are stored as additional properties (<<<fileCreated>>>,
              <<<fileLastModified>>>). It may make sense to add a "discovered" timestamp if an artifact is known to be
              created at a different time to which it is added to the metadata repository.

** Facets Section

    The facets section allows storage of other repository metadata for specific plugins. Each is named by the plugin's
    unique identifier.

*** Audit Logs (<<<org.apache.archiva.audit>>>)

    Audit logs are stored hierarchically by name, breaking down the date until getting to the timestamp of a particular
    event. The event details are stored as properties of that node. Presently filtering by an action or other field
    would require querying the content repository.

        * <<<action>>> - the action that was taken, such as uploading an artifact

        * <<<artifact.*>>> - the co-ordinates of the artifact affected

        * <<<remoteIP>>> - the IP address of the person executing the action, if applicable

        * <<<user>>> - the user affecting the action, if applicable

        []

    A future possibility is to store audit metadata on artifacts themselves (who uploaded, when, and how), or whether it
    was discovered by scanning. While this duplicates some information, it would reduce the need to query by a certain
    artifact ID and the nodes could be lined referentially.

    Audit metadata may also need to be extended to other nodes such as configuration. In this case, it may make sense
    to alter the artifact reference to a content repository path instead, or to utilise a native mechanism of the
    content repository.

*** Repository Statistics (<<<org.apache.archiva.metadata.repository.stats>>>)

    Like audit logs, repository statistics are stored by timestamp, marking the time a scan started. The results are
    stored as properties of the scan:

        * <<<scanStartTime>>>, <<<scanEndTime>>> - when the scan ran from and until

        * <<<total*>>> - the statistics gathered about certain totals in the repository

        []

    The current approach of tying statistics to the scanning process is not optimal, as it cannot be 'live'. We may
    later determine if any of the stats can be derived by functions of the content repository rather than storing and
    trying to keep them up to date. Historical data might be retained by versioning and taking a snapshot at a given
    point in time. 

*** Problem Reports (<<<org.apache.archiva.reports>>>)

    While not shown above, the problem reporting plugin similarly stores a facet of information, recording particular
    issues noticed in the repository such as invalid Maven POMs, etc.

** References Section

    The references section contains information about references to a given artifact. It is the inverse of the
    dependency relationship.

    References are stored outside the main model so that their creation doesn't imply a "stub" model - we know if the
    project exists whether a reference is created or not. References need not infer referential integrity.