Email Attachment Linker

The Email-Attachment Linkage algorithm aims to preserve the semantic connection between an attachment and the email it was sent with. When an email attachment is saved to disk, it becomes an ordinary file, and the connection to the email it was originally attached to is lost. The algorithm adds a triple to the RDF Repository that links the email's attachment to the corresponding file.

The algorithm searches all the metadata in the RDF Repository for pairs of resources, one of type File and the other of type Attachment, that have:

  • the same size and extension,
  • a file creation date later than the arrival date of the email, and
  • similar names (compared using the SecondString library class com.wcohen.ss.Jaccard, requiring a score of at least 0.5).

The logic behind this search follows the typical behavior of a user who saves a file from an email: such a user would not change the file's extension, modify the file itself, or drastically change the name of the attachment.
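
The matching criteria described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the Resource class, its field names, and the isLikelySavedCopy method are hypothetical, and the name comparison uses a small hand-written token-based Jaccard measure as a stand-in for com.wcohen.ss.Jaccard, so the scores may differ from those the SecondString library produces.

```java
import java.time.Instant;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class AttachmentLinkerSketch {

    /** Hypothetical, simplified view of a File or Attachment resource. */
    static final class Resource {
        final String name;
        final long sizeInBytes;
        // Creation date for a File, arrival date of the email for an Attachment.
        final Instant timestamp;

        Resource(String name, long sizeInBytes, Instant timestamp) {
            this.name = name;
            this.sizeInBytes = sizeInBytes;
            this.timestamp = timestamp;
        }
    }

    /** Returns the lower-cased extension without the dot, or "" if there is none. */
    static String extension(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Splits the name (without its extension) into lower-cased letter/digit tokens. */
    static Set<String> tokens(String name) {
        int dot = name.lastIndexOf('.');
        String base = dot < 0 ? name : name.substring(0, dot);
        Set<String> result = new HashSet<>();
        for (String token : base.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    /** Token-based Jaccard similarity: |A ∩ B| / |A ∪ B| (stand-in, not SecondString). */
    static double jaccard(String a, String b) {
        Set<String> tokensA = tokens(a);
        Set<String> tokensB = tokens(b);
        if (tokensA.isEmpty() && tokensB.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(tokensA);
        intersection.retainAll(tokensB);
        Set<String> union = new HashSet<>(tokensA);
        union.addAll(tokensB);
        return (double) intersection.size() / union.size();
    }

    /** Applies the three criteria: same size and extension, later creation date, similar names. */
    static boolean isLikelySavedCopy(Resource file, Resource attachment) {
        return file.sizeInBytes == attachment.sizeInBytes
                && extension(file.name).equals(extension(attachment.name))
                && file.timestamp.isAfter(attachment.timestamp)
                && jaccard(file.name, attachment.name) >= 0.5;
    }
}
```

In this sketch, an attachment named "Project Report 2008.pdf" and a later-created file "project_report_2008_v2.pdf" of the same size would be treated as a match, since three of the four distinct name tokens are shared. A file with a different extension or an earlier creation date would be rejected regardless of name similarity.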

JUnit test

There is one JUnit test, which currently fails.

TODO: The test breaks at the moment. The SPARQL query in the Generator may be incorrect. ticket:400 should be closed once this is fixed.

There is a file with test data.