git-svn-id: https://svn.apache.org/repos/asf/maven/repository-manager/trunk@425308 13f79535-47bb-0310-9956-ffa450edef68

87b0eda46c
--- a/maven-repository-indexer/src/site/apt/design.apt
+++ b/maven-repository-indexer/src/site/apt/design.apt
@@ -0,0 +1,77 @@
 -----
 Indexer Design
 -----
 Brett Porter
 -----
 25 July 2006
 -----

 Indexer Design

  <<Note: The current indexer design is under review. This document will grow into the intended design, and the code
  and tests will be refactored to match.>>

 * Standard Artifact Index

  We currently want to index these elements from the repository:

    * for each artifact file: the group ID, artifact ID, version, classifier, type (extension), filename,
      checksums (MD5 and SHA-1) and size

    * for each artifact POM: the packaging, licenses, dependencies, build plugins and reporting plugins

    * plugin prefix from the repository metadata (in the future, more may be indexed)

    * the identifier of the source repository
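  The following Java sketch shows one possible shape for such a record, with the fields grouped by the file they come
  from. The class <<<StandardArtifactRecord>>>, its field names and the <<<toFields()>>> method are hypothetical and
  are not taken from the indexer's code. A real implementation would turn these name/value pairs into Lucene document
  fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shape of one standard index record; class and field names are illustrative only.
class StandardArtifactRecord {
    // Taken from the artifact file itself
    String groupId;
    String artifactId;
    String version;
    String classifier;   // null when the artifact has no classifier
    String type;         // the file extension
    String filename;
    String md5;
    String sha1;
    long size;

    // Taken from the artifact's POM (unset until the POM has been seen)
    String packaging;
    final List<String> licenses = new ArrayList<>();
    final List<String> dependencies = new ArrayList<>();
    final List<String> buildPlugins = new ArrayList<>();
    final List<String> reportingPlugins = new ArrayList<>();

    // Taken from the repository metadata (plugins only)
    String pluginPrefix;

    // Identifier of the repository the artifact was discovered in
    String repositoryId;

    /** Flattens the record into name/value pairs, skipping values that are not known yet. */
    Map<String, String> toFields() {
        Map<String, String> fields = new LinkedHashMap<>();
        put(fields, "groupId", groupId);
        put(fields, "artifactId", artifactId);
        put(fields, "version", version);
        put(fields, "classifier", classifier);
        put(fields, "type", type);
        put(fields, "filename", filename);
        put(fields, "md5", md5);
        put(fields, "sha1", sha1);
        fields.put("size", Long.toString(size));
        put(fields, "packaging", packaging);
        putList(fields, "licenses", licenses);
        putList(fields, "dependencies", dependencies);
        putList(fields, "buildPlugins", buildPlugins);
        putList(fields, "reportingPlugins", reportingPlugins);
        put(fields, "pluginPrefix", pluginPrefix);
        put(fields, "repositoryId", repositoryId);
        return fields;
    }

    private static void put(Map<String, String> fields, String name, String value) {
        if (value != null) {
            fields.put(name, value);
        }
    }

    // Multi-valued fields are joined with commas in this sketch
    private static void putList(Map<String, String> fields, String name, List<String> values) {
        if (!values.isEmpty()) {
            fields.put(name, String.join(",", values));
        }
    }
}
```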

  Each record in the index refers to an artifact. Since the content of a record can come from various sources, the
  record may need to be updated when different files related to the same artifact are discovered (i.e., the POM or,
  for plugins, the repository metadata that contains the plugin prefix).

  Records in the index are generally keyed by their dependency conflict ID (i.e., a combination of group, artifact,
  version, type and classifier). The exception to this rule is the POM: if an entry already exists with the same group,
  artifact and version, no classifier and a different type, then no POM entry is added and the model fields are
  applied to the existing entry. Conversely, if a POM is added first and an artifact with the same group, artifact and
  version and no classifier is added later, the artifact's record overwrites the POM's record.
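  The keying rules can be sketched with an in-memory map. This sketch assumes a hypothetical <<<RecordKeying>>> class
  whose single <<<packaging>>> field stands in for all of the model fields taken from the POM. When an artifact
  overwrites a POM-only record, the sketch carries the POM's model fields over. That step is an assumption, not
  something this design states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of the keying rules; a real index would read and write Lucene documents.
class RecordKeying {
    static final class Entry {
        final String groupId, artifactId, version, type, classifier;
        String packaging; // stands in for all of the fields taken from the POM model

        Entry(String groupId, String artifactId, String version, String type, String classifier) {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.version = version;
            this.type = type;
            this.classifier = classifier;
        }
    }

    /** Dependency conflict ID: group:artifact:version:type, plus :classifier when present. */
    static String conflictId(String groupId, String artifactId, String version, String type, String classifier) {
        String id = groupId + ":" + artifactId + ":" + version + ":" + type;
        return classifier == null ? id : id + ":" + classifier;
    }

    final Map<String, Entry> records = new LinkedHashMap<>();

    /** Indexes an artifact file; an unclassified artifact replaces a POM-only record for the same GAV. */
    void addArtifact(String groupId, String artifactId, String version, String type, String classifier) {
        Entry entry = new Entry(groupId, artifactId, version, type, classifier);
        if (classifier == null) {
            Entry pom = records.remove(conflictId(groupId, artifactId, version, "pom", null));
            if (pom != null) {
                // Assumption: the POM's model fields are carried over rather than lost
                entry.packaging = pom.packaging;
            }
        }
        records.put(conflictId(groupId, artifactId, version, type, classifier), entry);
    }

    /** Indexes a POM; its model fields go onto an existing unclassified artifact when there is one. */
    void addPom(String groupId, String artifactId, String version, String packaging) {
        // Linear scan for brevity; a real index would query by group, artifact and version
        for (Entry e : records.values()) {
            boolean sameGav = e.groupId.equals(groupId) && e.artifactId.equals(artifactId)
                    && e.version.equals(version);
            if (sameGav && e.classifier == null && !"pom".equals(e.type)) {
                e.packaging = packaging; // no separate POM entry is added
                return;
            }
        }
        Entry pom = new Entry(groupId, artifactId, version, "pom", null);
        pom.packaging = packaging;
        records.put(conflictId(groupId, artifactId, version, "pom", null), pom);
    }
}
```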

  The above process, especially the handling of the POM, would be much simpler if the discoverer associated each POM
  with its artifact instead of feeding them in separately as it does at present.

  While some of the information stored is specific to a particular type of file, it is all maintained in a single index
  for simplicity. In the future, if the content of the various documents diverges greatly, it may be split into separate
  indexes. In that case, we may consider using Lucene's
  {{{http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-b11296f9e7b2a5e7496d67118d0a5898f2fd9823} multiple index
  searching capabilities}}.

  Currently, the discoverer returns POMs as entries separate from the actual artifact and from any derived artifacts
  in the repository. To accommodate this, POMs and artifacts are reconciled when indexed, using the keying rules
  described above.

  Note that archetypes currently have no packaging of their own in Maven, so the POM does not record one. To make
  archetypes searchable by this type, the indexer will look for a <<<META-INF/maven/archetype.xml>>> file in the
  artifact and, if one is found, set the record's packaging to <<<maven-archetype>>>. This handling will be deprecated
  once POMs begin to use the appropriate packaging.
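  Below is a minimal sketch of the archetype check using <<<java.util.jar.JarFile>>>. It assumes a hypothetical
  <<<ArchetypeDetector>>> helper that receives the artifact file and the packaging declared in its POM.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper illustrating the archetype check; the names are illustrative only.
class ArchetypeDetector {
    static final String ARCHETYPE_DESCRIPTOR = "META-INF/maven/archetype.xml";
    static final String ARCHETYPE_PACKAGING = "maven-archetype";

    /**
     * Returns "maven-archetype" when the JAR contains an archetype descriptor,
     * otherwise the packaging declared in the POM.
     */
    static String effectivePackaging(File jar, String pomPackaging) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            if (jarFile.getEntry(ARCHETYPE_DESCRIPTOR) != null) {
                return ARCHETYPE_PACKAGING;
            }
        }
        return pomPackaging;
    }
}
```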

  The index is shared among multiple repositories, and each record stores the repository it came from. To avoid
  duplicates, the indexer should complain if an artifact is later updated from a different repository. Ideally, the
  discovery and conversion mechanisms would deal with this before the artifact reaches the indexer.
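  The check could look like the sketch below. It assumes a hypothetical <<<RepositoryGuard>>> that remembers which
  repository first supplied each dependency conflict ID. Here, complaining is modelled as an exception. Logging a
  warning and skipping the update would be an equally valid reading of the design.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical guard: remembers the first repository that supplied each artifact.
class RepositoryGuard {
    // dependency conflict ID -> ID of the repository it was first indexed from
    private final Map<String, String> sourceRepository = new HashMap<>();

    /** Records the source of an update, rejecting updates that come from a different repository. */
    void recordSource(String conflictId, String repositoryId) {
        String existing = sourceRepository.putIfAbsent(conflictId, repositoryId);
        if (existing != null && !existing.equals(repositoryId)) {
            throw new IllegalStateException("Artifact " + conflictId + " was indexed from repository '"
                    + existing + "'; refusing an update from '" + repositoryId + "'");
        }
    }
}
```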

  When indexing metadata from a POM, the POM should be loaded using the Maven project builder so that inheritance and
  interpolation are performed. This ensures that the record is as complete as possible, and that searching by
  fields that are inherited will reveal both the parent and the children in the search results.
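  The Maven project builder itself is not shown here. Instead, the heavily simplified sketch below is built around a
  hypothetical <<<SimpleModel>>> class and illustrates why an effective model helps search. Fields missing from the
  child are taken from its parent, and the <<<project.version>>> expression is expanded. The real builder handles
  considerably more, such as profiles and dependency management.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified model; this is NOT the Maven project builder.
class SimpleModel {
    String groupId;    // null means "inherit from the parent"
    String artifactId; // never inherited
    String version;    // null means "inherit from the parent"
    final List<String> licenses = new ArrayList<>();           // empty means "inherit from the parent"
    final List<String> dependencyVersions = new ArrayList<>(); // may contain ${project.version}
    SimpleModel parent;

    /** Returns a copy with inherited values filled in and ${project.version} expanded. */
    SimpleModel effective() {
        SimpleModel base = (parent == null) ? null : parent.effective();
        SimpleModel result = new SimpleModel();
        result.artifactId = artifactId;
        result.groupId = (groupId != null || base == null) ? groupId : base.groupId;
        result.version = (version != null || base == null) ? version : base.version;
        result.licenses.addAll((!licenses.isEmpty() || base == null) ? licenses : base.licenses);
        for (String v : dependencyVersions) {
            // Interpolation: substitute the (possibly inherited) project version
            result.dependencyVersions.add(
                    result.version == null ? v : v.replace("${project.version}", result.version));
        }
        return result;
    }
}
```

  With the effective model, a search on an inherited field such as the license matches both the parent and the child.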

 * Reduced Size Index

  An additional index is maintained by the repository manager in the
  {{{../apidocs/org/apache/maven/repository/indexing/MinimalIndex.html} MinimalIndex}} class. It indexes the same
  artifacts as the standard index, but stores less information under shorter field names to keep the index small.
  This makes it suitable for clients that need fast searching, such as IDE integrations. For fuller access to the
  repository information, such integrations should use the XML-RPC interface.
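  To illustrate the idea, the sketch below maps a full record onto a reduced one. The hypothetical
  <<<MinimalRecordMapper>>>, its short names and the choice of fields to keep are assumptions made for illustration.
  The actual <<<MinimalIndex>>> field names may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mapper: the short names and the retained field set are illustrative only.
class MinimalRecordMapper {
    // Full field name -> short field name; fields not listed here are dropped
    private static final Map<String, String> SHORT_NAMES = new LinkedHashMap<>();

    static {
        SHORT_NAMES.put("groupId", "g");
        SHORT_NAMES.put("artifactId", "a");
        SHORT_NAMES.put("version", "v");
        SHORT_NAMES.put("classifier", "c");
        SHORT_NAMES.put("type", "t");
        SHORT_NAMES.put("filename", "f");
    }

    /** Keeps only the listed fields, renamed to their short names, in a fixed order. */
    static Map<String, String> toMinimal(Map<String, String> fullRecord) {
        Map<String, String> minimal = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : SHORT_NAMES.entrySet()) {
            String value = fullRecord.get(field.getKey());
            if (value != null) {
                minimal.put(field.getValue(), value);
            }
        }
        return minimal;
    }
}
```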

  ~~TODO: finish!

 * Limitations

  Currently, because POMs and artifacts are fed in separately, there is no way to associate an artifact that has a
  classifier with its POM, so the index holds less information about it. This may be acceptable by design: while it
  is desirable to search by classifier, for browsing you usually want to find only the main artifact and see the
  derived artifacts listed under it. How this evolves should be carefully considered.
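  If the browsing behaviour described above were adopted, search hits could be grouped under their main artifact
  along the lines of the following sketch. The <<<BrowseGrouping>>> class is hypothetical. It assumes conflict IDs in
  the form <<<group:artifact:version:type[:classifier]>>>.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical browse view over search hits, keyed by group:artifact:version.
class BrowseGrouping {
    /** Groups conflict IDs by group:artifact:version; unclassified artifacts are listed first. */
    static Map<String, List<String>> groupByMainArtifact(List<String> conflictIds) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String id : conflictIds) {
            String[] parts = id.split(":");
            String gav = parts[0] + ":" + parts[1] + ":" + parts[2];
            List<String> members = groups.computeIfAbsent(gav, key -> new ArrayList<>());
            if (parts.length == 4) {
                members.add(0, id); // main artifact (no classifier) goes to the front
            } else {
                members.add(id);    // derived artifact, listed under the main one
            }
        }
        return groups;
    }
}
```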
--- a/maven-repository-indexer/src/site/site.xml
+++ b/maven-repository-indexer/src/site/site.xml
@@ -0,0 +1,24 @@
 <?xml version="1.0" encoding="ISO-8859-1"?>
 <!--
  ~ Copyright 2005-2006 The Apache Software Foundation.
  ~
  ~ Licensed under the Apache License, Version 2.0 (the "License");
  ~ you may not use this file except in compliance with the License.
  ~ You may obtain a copy of the License at
  ~
  ~      http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing, software
  ~ distributed under the License is distributed on an "AS IS" BASIS,
  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  ~ See the License for the specific language governing permissions and
  ~ limitations under the License.
  -->

 <project>
  <body>
    <menu name="Design Documentation">
      <item name="Indexing Design" href="/design.html"/>
    </menu>
  </body>
 </project>