Download Ontology Draft

SebastianTrueg: I am proposing the following download ontology. The idea is that each download process (such as an http download, a torrent download, or even saving an attachment) is represented by an instance of ndo:Download (or, more precisely, of a subclass). This download is then linked to the local files that have been downloaded and to things like the source URL or the IMAccount. As e-mail clients also talk about "downloading" an attachment (e.g. Thunderbird "downloads" files when storing attachments), the same conceptual model will be used to annotate e-mail attachments that are stored in the filesystem.

LeoSauermann supports all this.

LeoSauermann suggests to host the namespace on as all the other ontologies are also there.

evgeny.egorochkin: objects.

This draft is of course useful so that we can see what download manager people need. However, I'd suggest a number of changes.

The most important change is instead of making tons of ndo:RemoteDownload children and defining properties for them, reuse NIE stuff. Make a property like ndo:sourcedFrom property which would point to a nie:DataObject or nie:InformationElement representing the original location. This would transparently support:

  • simple cases like an http source (also providing login, password details and whatnot)
  • ED2K/Kad could store the file hash using nfo:FileHash (if the download is in progress, the hash of the local file copy is still unavailable)
  • BitTorrent could use NIE to express torrent contents (afaik bittorrent provides file names, sizes and hashes). Also the .torrent file itself could be modelled as either a local file or a remote file using the same NIE stuff. This way you can also represent .torrent files not involved in any downloads. Of course this means some properties need to be added to the NIE tree.
  • email attachments and files sent via IM are supported by this approach as well, thanks to nfo:Attachment. This also removes the need for the ndo:message property and possibly some others.

Files sent via IM are best represented as a message with a file attached. Message text can be empty although I think I saw IM services letting people add a description to the file being sent. This also lets you easily log IM file transfers just like you log message history.

Also please consider a related use case: copying files locally or without involving a download manager. This possibly means that it's better to use an nxx:sourcedFrom property to directly link the original file nie:DataObject and its local copy, and either add extra ndo properties to the local copy or create a resource, linked to the local copy, to hold these properties. One more reason to go this way is multi-sourced downloads, including multi-protocol downloads.

One more thing: ndo:TorrentDownload obviously asks for a ndo:P2PDownload parent since BitTorrent isn't the only P2P network.
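
A minimal sketch of where such a parent class could sit, written as plain Python rather than RDF so the hierarchy can be checked by running it. The placement of ndo:P2PDownload under ndo:RemoteDownload is an assumption (the comment above only asks for a parent of ndo:TorrentDownload), and ndo:ED2KDownload is a hypothetical sibling added purely for illustration.

```python
# Direct rdfs:subClassOf links, child -> parent.
# Assumption: ndo:P2PDownload sits under ndo:RemoteDownload.
# ndo:ED2KDownload is hypothetical; it only shows a second P2P network.
SUBCLASS_OF = {
    "ndo:RemoteDownload": "ndo:Download",
    "ndo:MessageDownload": "ndo:Download",
    "ndo:IMDownload": "ndo:Download",
    "ndo:P2PDownload": "ndo:RemoteDownload",
    "ndo:TorrentDownload": "ndo:P2PDownload",
    "ndo:ED2KDownload": "ndo:P2PDownload",
}


def ancestors(cls, subclass_of=SUBCLASS_OF):
    """Return all superclasses of cls, nearest first."""
    result = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        result.append(cls)
    return result


def is_a(cls, parent, subclass_of=SUBCLASS_OF):
    """True if cls equals parent or is a (transitive) subclass of it."""
    return cls == parent or parent in ancestors(cls, subclass_of)


print(ancestors("ndo:TorrentDownload"))
print(is_a("ndo:ED2KDownload", "ndo:P2PDownload"))
```

With this layout a query for all P2P downloads would pick up torrent downloads without naming them explicitly.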

An example of a file downloaded from a torrent represented by a .torrent file which is itself stored on a remote site (a typical use case of a user clicking a link to a .torrent file in their browser and ktorrent popping up):

  a nfo:FileDataObject, nfo:Document;
  nfo:fileName "somefile.odf";
  nxx:sourcedFrom user:SomeFile.

  a nfo:EmbeddedFileDataObject, nfo:Document; # a more specific DataObject could be used
  nfo:fileName "somefile.odf";
  nfo:fileSize 834756;
  nie:isLogicalPartOf user:SomeTorrent.

  a nfo:RemoteDataObject, nfo:Container; # a more specific container could be used
  nfo:fileName "test.torrent";
  nie:hasLogicalPart user:SomeFile.

  a ndo:Download;
  ndo:startTime "1233333";
  ndo:endTime "28734612";
  ndo:downloadApplication application:/KTorrent;
  nie:hasLogicalPart user:SomeLocalFile,
                     user:someLocalFile2. # list of files that are a part of this download action
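
To make the links in this example easier to follow, here is a rough sketch that stores a few of its statements as plain Python tuples and walks them. The subject names are assumptions: user:SomeLocalFile, user:SomeFile and user:SomeTorrent are inferred from the cross-references in the example, and user:SomeDownload is a made-up name for the ndo:Download resource.

```python
# Statements from the example as (subject, predicate, object) tuples.
# Subject names are inferred or hypothetical, see the note above.
TRIPLES = [
    ("user:SomeLocalFile", "nxx:sourcedFrom", "user:SomeFile"),
    ("user:SomeTorrent", "nie:hasLogicalPart", "user:SomeFile"),
    ("user:SomeDownload", "ndo:downloadApplication", "application:/KTorrent"),
    ("user:SomeDownload", "nie:hasLogicalPart", "user:SomeLocalFile"),
    ("user:SomeDownload", "nie:hasLogicalPart", "user:someLocalFile2"),
]


def objects(subject, predicate, triples=TRIPLES):
    """All objects o with (subject, predicate, o) in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj, triples=TRIPLES):
    """All subjects s with (s, predicate, obj) in the data."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def describe(local_file, triples=TRIPLES):
    """Collect origin and download-action information for a local file."""
    sources = objects(local_file, "nxx:sourcedFrom", triples)
    # Containers (here: the .torrent) that list a source as a logical part.
    containers = [c for src in sources
                  for c in subjects("nie:hasLogicalPart", src, triples)]
    # Download actions that list the local file as one of their parts.
    downloads = subjects("nie:hasLogicalPart", local_file, triples)
    apps = [a for d in downloads
            for a in objects(d, "ndo:downloadApplication", triples)]
    return {"sources": sources, "containers": containers,
            "downloads": downloads, "applications": apps}


print(describe("user:SomeLocalFile"))
```

The point of the sketch is that origin (nxx:sourcedFrom) and download action (ndo:Download) are reached through separate links, so either can exist without the other.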

SebastianTrueg: I see why you want to simplify the system, evgeny. However, using files for everything cannot be the answer. My goal was to model the actual download action using NDO, not just to link files to their sources. With your approach there is no way to remember the application that downloaded or the time the download was started and finished. At least not in the easy way I propose. We would have to add other properties alongside "sourcedFrom" like "downloadTime" and "downloadApp". And then exactly what I wanted to prevent happens: data duplication for all files. And to top it off we still would not have a grouping of downloaded files, since even if the download apps and times of two files are the same, they could still be two separate downloads. Thus, we would lose information at the additional cost of data duplication.
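
The duplication and grouping argument can be illustrated with a small sketch that builds both models for the same two downloads and compares them. Everything here is illustrative: the property names nxx:downloadApp, nxx:downloadStart and nxx:downloadEnd are hypothetical stand-ins for the "downloadApp" and "downloadTime" properties mentioned above, and the file names, resource names and timestamps are made up.

```python
from collections import defaultdict

# Two separate downloads that happen to share application and times.
DOWNLOADS = [
    {"id": "user:Download1", "app": "application:/KTorrent",
     "start": "2009-03-18T12:00:00", "end": "2009-03-18T12:05:00",
     "files": ["file:a", "file:b", "file:c"]},
    {"id": "user:Download2", "app": "application:/KTorrent",
     "start": "2009-03-18T12:00:00", "end": "2009-03-18T12:05:00",
     "files": ["file:d"]},
]


def per_file_model(downloads):
    """Application and times repeated on every downloaded file."""
    triples = []
    for d in downloads:
        for f in d["files"]:
            triples += [(f, "nxx:downloadApp", d["app"]),
                        (f, "nxx:downloadStart", d["start"]),
                        (f, "nxx:downloadEnd", d["end"])]
    return triples


def download_resource_model(downloads):
    """Application and times stored once on an ndo:Download resource."""
    triples = []
    for d in downloads:
        triples += [(d["id"], "ndo:downloadApplication", d["app"]),
                    (d["id"], "ndo:startTime", d["start"]),
                    (d["id"], "ndo:endTime", d["end"])]
        triples += [(d["id"], "nie:hasLogicalPart", f) for f in d["files"]]
    return triples


def groups_from_per_file(triples):
    """Best-effort grouping: files with identical property values."""
    props = defaultdict(dict)
    for s, p, o in triples:
        props[s][p] = o
    groups = defaultdict(set)
    for f, values in props.items():
        groups[tuple(sorted(values.items()))].add(f)
    return sorted(sorted(g) for g in groups.values())


def groups_from_download_resources(triples):
    """Exact grouping: files listed as parts of the same download."""
    groups = defaultdict(set)
    for s, p, o in triples:
        if p == "nie:hasLogicalPart":
            groups[s].add(o)
    return sorted(sorted(g) for g in groups.values())


a = per_file_model(DOWNLOADS)
b = download_resource_model(DOWNLOADS)
print(len(a), groups_from_per_file(a))
print(len(b), groups_from_download_resources(b))
```

In the per-file model the statement count grows with three statements per file, and the two downloads can no longer be told apart; with a download resource each file costs one link and the grouping is explicit.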

Leo, I agree to the namespace.

evgeny.egorochkin: I have elaborated my example to address your concerns. My explanation was too cryptic, I guess. The idea is that you still keep the download action resource (ndo:Download): it groups downloaded files and carries the necessary properties such as startTime. But instead of ndo:Download linking to the download source, each file is linked to its source by ndo:sourcedFrom.

What advantages this provides:

  • avoids having ndo:Download class tree mirror a good deal of nie:DataObject tree.
  • support of copying/download actions not involving a download manager, by decoupling data origin information from user actions
  • cleaner data origin semantics:
    • each separate file "knows" where it originated from, which makes a difference for e.g. torrents, since knowing a file was downloaded off a specific torrent doesn't mean we know which of the source files it was
    • ability to better describe data origin relation semantics in terms of other similar relations such as derivation relation.
  • elaborate description of original data files such as being able to represent a .torrent as a container with folders, files, file metadata such as size and hash.

SebastianTrueg: I think I understand better now. So basically instead of defining subclasses such as TorrentDownload you want to locally represent the files. I see some problems with that:

  1. The torrent file is not at all the source of the download, it is merely a means to finding the source. Following the same logic the source file is not part of the torrent file. Thus, your example does not model the situation correctly. You cannot really model the source file as it is not really a file but something like a stream. The "original" file is scattered across multiple machines throughout the torrent network. Thus, to be exact one would have to model the entire torrent locally, i.e. introduce a new class "Torrent" which has a list of virtual files which then could be the sources.
  2. There is no way to model IM downloads as I discussed in a blog a few weeks back. This is because you cannot model the original file using NIE. At least I would not know how.
  3. A download source such as a URI might not even represent a real file. It might be a stream that is created on the fly. IMHO in cases like that using a NFO object to model it will result in imprecise information again.
  4. Minor issue: if the downloaded file is deleted, the source resource is still dangling around without any real use. A download object, on the other hand, still has some use. It would even allow the download to be reproduced, since it contains the information of being a download.

evgeny.egorochkin: Replies in the same order:

  1. You are correct. The torrent file should be using nie:hasLogicalPart to link to its contents and not nie:hasPart, since it is indeed just a collection of references. I've fixed my example to address this.
  2. If you refer to this one: , I see two issues here:
    • "Sent by Tudor Groza". It's best modelled by introducing a nxx:recommendedBy relation. It wouldn't matter then if the person sent you the file directly, in an email or sent a link to the file. The question is: should nxx:recommendedBy link to the original file or the local copy. Original seems to be more logical.
    • Representation of files sent via IM apps. Two approaches here:
      • Kopete/KIO current approach. Model the file as a remote resource of sorts
      • (Empty) message with an attachment approach: lets you keep a common log of messages and files sent and reuses nmo framework to store the sender of the message etc, while in Kopete approach you'd have to define a subclass of RemoteResource to store this.
  3. All resources have a limited lifetime. An HTTP URL might one day return 404, yet the URL itself is still useful. If you pay attention to how Kopete transfers files between users, you'll see a kopete:/ uri ;)
  4. The decision of which objects to keep and which to delete is up to the implementation. Do you delete an artist's nco:Contact if all songs by this artist are deleted? As to reproducibility of the download: yes, my approach could still keep nxx:recommendedBy, but the Download object itself would be pretty useless. Is this a serious use case? I don't know, but if you require regular redownloading of stuff, then you'd probably need some other approach to define this and the reason why you need to redownload the thing. To make things short: ndo:Download (my version) defines the download action only. If you need to regularly redownload a file, this is better decoupled from the download and described as a Job or some important data source, or marked in some other meaningful way.

<> {
        a rdfs:Class ;
        rdfs:label "download" ;
        rdfs:comment "Represent one download process like an http or a torrent download or even saving an email attachment" .

        a rdfs:Class ;
        rdfs:subClassOf ndo:Download ;
        rdfs:label "remote download" ;
        rdfs:comment "abstract base class for all remote downloads" .

        a rdfs:Class ;
        rdfs:subClassOf ndo:RemoteDownload ;
        rdfs:label "HTTP download" ;
        rdfs:comment "An HTTP download" .

        a rdfs:Class ;
        rdfs:subClassOf ndo:RemoteDownload ;
        rdfs:label "FTP download" ;
        rdfs:comment "An FTP download" .

        a rdfs:Class ;
        rdfs:subClassOf ndo:RemoteDownload ;
        rdfs:label "torrent download" ;
        rdfs:comment "A torrent download" .

        a rdf:Property ;
        rdfs:label "The source URL of a download" ;
        rdfs:comment "Abstract property which stores the source URL" ;
        rdfs:domain ndo:RemoteDownload ;
        rdfs:range rdfs:Resource .

        a rdfs:Class ;
        rdfs:subClassOf ndo:Download ;
        rdfs:label "message download" ;
        rdfs:comment "a download from a message such as saving an attachment from an email" .

        a rdfs:Class ;
        rdfs:subClassOf ndo:Download ;
        rdfs:label "IM download" ;
        rdfs:comment "a download from an instant messaging system" .

        a rdf:Property ;
        rdfs:label "sending contact" ;
        rdfs:comment "the contact that sent the file(s)" ;
        rdfs:domain ndo:IMDownload ;
        rdfs:range nco:IMAccount ;
        nrl:cardinality "1"^^xsd:nonNegativeInteger .

        a rdf:Property ;
        rdfs:label "message" ;
        rdfs:comment "the message from which the file/attachment was saved" ;
        rdfs:domain ndo:MessageDownload ;
        rdfs:range nmo:Message .

    # normally this should have domain nfo:FileLike and xesam:File
    # but Nepomuk sadly does not support multiple domains yet
        a rdf:Property ;
        rdfs:domain rdfs:Resource ;
        # rdfs:domain xesam:File ;
        # rdfs:domain nfo:FileLike ;
        rdfs:range ndo:Download ;
        rdfs:label "download" ;
        rdfs:comment "the download which created this file" ;
        nrl:maxCardinality "1"^^xsd:nonNegativeInteger .

        a rdf:Property ;
        rdfs:subPropertyOf ndo:sourceUrl ;
        rdfs:label "torrent tracker" ;
        rdfs:comment "URL of the tracker that provided the torrent" ;
        rdfs:domain ndo:TorrentDownload .

        a rdf:Property ;
        rdfs:label "originating web page" ;
        rdfs:comment "the web page where this download originated. This often differs significantly from the source URL" ;
        rdfs:domain ndo:RemoteDownload ;
        rdfs:range nfo:Website ;
        nrl:maxCardinality "1"^^xsd:nonNegativeInteger .

        a rdf:Property ;
        rdfs:label "start time" ;
        rdfs:comment "time the download was started" ;
        rdfs:domain ndo:Download ;
        rdfs:range xsd:dateTime ;
        nrl:maxCardinality "1"^^xsd:nonNegativeInteger .

        a rdf:Property ;
        rdfs:label "end time" ;
        rdfs:comment "time the download was finished" ;
        rdfs:domain ndo:Download ;
        rdfs:range xsd:dateTime ;
        nrl:maxCardinality "1"^^xsd:nonNegativeInteger .

        a rdf:Property ;
        rdfs:label "download application" ;
        rdfs:comment "the application that performed the download" ;
        rdfs:domain ndo:Download ;
        rdfs:range naso:Service ;
        nrl:maxCardinality "1"^^xsd:nonNegativeInteger .
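
Several properties in the draft carry nrl:maxCardinality 1. As a rough illustration of what that constraint means for instance data, the sketch below counts values per (resource, property) pair over plain Python tuples and reports any pair that exceeds its limit. The mapping of the "start time", "end time" and "download application" definitions to ndo:startTime, ndo:endTime and ndo:downloadApplication is an assumption based on the names used in the example earlier on this page; user:SomeDownload and the timestamps are made up.

```python
from collections import Counter

# Assumed property names for the definitions labelled "start time",
# "end time" and "download application", each with maxCardinality 1.
MAX_CARDINALITY = {
    "ndo:startTime": 1,
    "ndo:endTime": 1,
    "ndo:downloadApplication": 1,
}


def cardinality_violations(triples, limits=MAX_CARDINALITY):
    """Return (subject, property, count) for every exceeded limit.

    Identical statements are counted once, as in an RDF graph.
    """
    counts = Counter((s, p) for s, p, o in set(triples) if p in limits)
    return sorted((s, p, n) for (s, p), n in counts.items()
                  if n > limits[p])


GOOD = [
    ("user:SomeDownload", "ndo:startTime", "2009-03-18T12:00:00"),
    ("user:SomeDownload", "ndo:endTime", "2009-03-18T12:05:00"),
    ("user:SomeDownload", "ndo:downloadApplication", "application:/KTorrent"),
]
# A second, different start time on the same download breaks the limit.
BAD = GOOD + [("user:SomeDownload", "ndo:startTime", "2009-03-18T13:00:00")]

print(cardinality_violations(GOOD))
print(cardinality_violations(BAD))
```

This is only a data check over tuples, not a statement about how Nepomuk itself enforces NRL constraints.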
Last modified on 03/18/09 12:24:20