Author Topic: Blog: KDE Nepomuk Miscellaneous Stuff  (Read 889 times)

Offline menotu

  • PCLinuxOS Tester
  • Super Villain
  • *******
  • Posts: 15288
  • ┌∩┐(◕_◕)┌∩┐
Re: Nepomuk Miscellaneous Stuff
« Reply #15 on: March 10, 2013, 11:34:57 AM »
Only being into Linux a year, exactly what does "What's new with Nepomuk 4.10" do? It's on monty but I tried it and it seemed to slow me down, so I disabled it. So does the usual user need it?

The version in KDE 4.9.5 isn't such a resource hog as the earlier ones used to be. It shouldn't slow you down noticeably. It seems to be useful, particularly with a Search and Launch desktop.

It's slightly irritating though that a search will give you different results depending on whether you use Dolphin, Konqueror, the Search and Launch desktop or the command line. (But try all of them.)

+1

Having to "manually" enable Nepomuk/Strigi indexing (I do it via the System Tray) is now far better than how it worked in the "old days" ;D as that was t'other way round...

Resource usage is much lighter as well.
PCLinuxOS 32bit KDE 4.10.1; kernel-3.4.11-pclos1.bfs & 64bit 3.2.18bfs; NVidia GeForce 8400GS 1GB 310.19 driver

Sony Vaio SVE1513A4ESI Laptop, Intel Core i5, 2.6GHz, 6GB RAM, 750GB, 15.6" Intel HD Graphics 4000

Offline menotu

Re: Nepomuk Miscellaneous Stuff
« Reply #16 on: March 22, 2013, 06:15:45 AM »
By Vishesh Handa - 21 March  2013

Nepomuk - Simplifying the debugging process

When something goes wrong in Nepomuk, it's easy for us Nepomuk developers to track it down, but for other developers and users it can be quite hard. Even simple things like reporting which component is malfunctioning aren't completely obvious.

Over the last month, we have simplified some of the external details and added tools which will help us debug your problems so that we can fix things more easily. All of these will be shipped with nepomuk-core in 4.11.

Nepomuk2::Service2

Nepomuk, like most modular architectures, has a number of different plugins or, as we like to call them, “services”. Traditionally each service would be installed as a library that would be loaded by the nepomukservicestub process. When users try to provide debugging information, they mostly just provide the process name - nepomukservicestub. This doesn’t tell us much, since all the heavy lifting is done by the Nepomuk services. The client libraries are mostly just light wrappers.

With the 4.10 release we have 3 major services -

    Storage Service
    File Watch Service
    File Indexing Service

Each can be started by calling nepomukservicestub along with the service name, e.g. nepomukservicestub nepomukfilewatch.

Currently in master, we have moved away from this approach and each service now installs its own process. So you should no longer see any nepomukservicestubs. Instead you’ll see a nepomukstorage, nepomukfilewatch and nepomukfileindexer process.

This greatly simplifies the debugging process as the users can easily report which process is problematic, and starting a service is just a matter of running the correct executable.
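As a rough illustration of why one process per service helps with bug reports, here is a minimal Python sketch that checks a list of process names for the three executables named above. The function name and the sample process list are made up for this example; only the three process names come from the post.

```python
# The three process names come from the post; everything else is illustrative.
NEPOMUK_PROCESSES = ("nepomukstorage", "nepomukfilewatch", "nepomukfileindexer")


def nepomuk_service_status(process_names):
    """Map each expected Nepomuk process name to True if it is in the list."""
    running = set(process_names)
    return {name: name in running for name in NEPOMUK_PROCESSES}


# Hypothetical process listing, e.g. the command names a tool like ps reports.
sample = ["plasma-desktop", "nepomukstorage", "nepomukfileindexer", "kwin"]
for name, is_running in nepomuk_service_status(sample).items():
    print(f"{name}: {'running' if is_running else 'not running'}")
```

With the old nepomukservicestub approach every service would have shown up under the same name, so a check like this could not tell them apart.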

Tools

Restarting Nepomuk and looking into the database has traditionally required some dbus commands. These commands were always apparent to us developers, but it’s good to have some standard ways of managing Nepomuk.

NepomukCtl

Thanks to Gabriel, we now have nepomukctl, which acts very similarly to akonadictl. It can easily be used to start, stop and restart Nepomuk or any individual service.

$ nepomukctl start
$ nepomukctl restart
$ nepomukctl restart fileindexer


This is a great tool to check if Nepomuk is actually running.

Nepomuk Show

In order to view the data inside the Nepomuk database one typically needs to issue a query. This requires the developers to know SPARQL. Typically users do not want to go into so much effort when they are debugging simple stuff.

Now nepomukshow can be used to easily view the resource information.

$ nepomukshow Politik.mp3

<nepomuk:/res/94e816c8-2466-4172-9bcd-0b9129cd17f8>
  rdf:type            nmm:MusicPiece
  rdf:type            nfo:FileDataObject
  rdf:type            nfo:Audio
  rdf:type            nie:InformationElement
  nao:created         2013-03-21T18:06:54Z
  nao:lastModified    2013-03-21T18:06:57Z
  nie:url             file:///home/vishesh/Music/Coldplay/Politik.mp3
  nie:mimeType        audio/mpeg
  nie:title           Politik
  nie:lastModified    2009-08-20T11:58:04Z
  nie:contentCreated  2002-03-21T18:06:54Z
  nie:created         2011-11-21T15:32:52Z
  nfo:averageBitrate  1.2800000000e+05
  nfo:sampleRate      4.4100000000e+04
  nfo:fileSize        5088374
  nfo:fileName        Politik.mp3
  nfo:channels        2
  nfo:duration        318
  nmm:performer       nepomuk:/res/8ffe10cc-c375-485f-f00-b1d5b211ae7f
  nmm:trackNumber     1
  nmm:genre           20
  nmm:musicAlbum      nepomuk:/res/22c13ad9-f234-4e60-910a-9159b47cd290
  kext:indexingLevel  2

This is a great tool for seeing what information has been indexed about a file, or whether a file has been indexed at all. It can even be used to check if an email has been indexed, though the syntax is a little different.

$ nepomukshow 'akonadi:?item=39618'

<nepomuk:/res/b8ef2a3f-9112-4dfb-9071-4b1ce7544b1b>
  rdf:type            aneo:AkonadiDataObject
  rdf:type            nmo:Email
  nao:created         2012-12-11T10:29:20Z
  nao:hasSymbol       internet-mail
  nie:isPartOf        <akonadi:?collection=44>
  nie:byteSize        3998
  nie:url             <akonadi:?item=39618>
  nmo:isRead          1
  nmo:to              nepomuk:/res/cbee003e-7a36-4dd3-8978-c71c7c91d359
  aneo:akonadiItemId  39618
  nmo:messageSubject  Re: proposals marked as to be accepted in Melange now
  nmo:sentDate        2011-04-18T16:45:59Z
  nmo:from            nepomuk:/res/b900fd2e-dacf-4395-bb72-6c9ed78f5b71
  nmo:messageId       <BANLkTikYWvzvGCM4-N0cOw0NdVdO8BmOpg@mail.gmail.com>
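Output in this shape is easy to post-process. The following Python sketch is only an illustration, not part of nepomukshow: it assumes the layout shown above (one resource URI line followed by indented "property value" lines), and the function name is invented here.

```python
def parse_nepomukshow(text):
    """Parse a '<uri>' line followed by indented 'property  value' lines.

    Returns (uri, properties), where properties maps each property name to
    a list of values, because a property such as rdf:type can repeat.
    """
    uri = None
    properties = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if uri is None:
            # The first non-empty line is assumed to be the resource URI.
            uri = stripped.strip("<>")
            continue
        # Split on the first run of whitespace: property name, then value.
        parts = stripped.split(None, 1)
        name = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        properties.setdefault(name, []).append(value)
    return uri, properties


sample = """\
<nepomuk:/res/b8ef2a3f-9112-4dfb-9071-4b1ce7544b1b>
  rdf:type            aneo:AkonadiDataObject
  rdf:type            nmo:Email
  nie:byteSize        3998
  nmo:messageSubject  Re: proposals marked as to be accepted in Melange now
"""
uri, props = parse_nepomukshow(sample)
print(uri)
print(props["rdf:type"])
```

Keeping values in lists avoids silently dropping the repeated rdf:type entries.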

NepomukCmd

Most developers already know of nepomukcmd which was an alias on top of sopranocmd. It could be used to query Nepomuk and contained many more soprano specific features which are now no longer applicable.

We are now shipping our own nepomukcmd tool, which currently only supports SPARQL queries. It does however support a neat --inference option which can be used to selectively enable and disable inferencing.
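For readers who have not seen SPARQL, the Python sketch below builds the kind of query that lists every property of the resource stored for a given file URL. It is an illustration only: the helper name is invented, the NIE namespace URI is the commonly published one and should be treated as an assumption, and the exact nepomukcmd invocation is deliberately not shown because the post does not give it.

```python
# Assumed namespace URI for the nie: prefix; not stated in the post.
NIE = "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#"


def properties_query(file_url):
    """Build a SPARQL query listing all properties of the resource at file_url."""
    if "<" in file_url or ">" in file_url:
        raise ValueError("URL must not contain angle brackets")
    return (
        f"PREFIX nie: <{NIE}>\n"
        "SELECT ?resource ?property ?value WHERE {\n"
        f"  ?resource nie:url <{file_url}> .\n"
        "  ?resource ?property ?value .\n"
        "}"
    )


print(properties_query("file:///home/vishesh/Music/Coldplay/Politik.mp3"))
```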

Right now these tools are in a very early, simple but working state, and could use some polishing and extra features in the future. Contributing to them would be a great way to get involved in Nepomuk. Message me for more details.

http://vhanda.in/blog/2013/03/nepomuk-simplifying-the-debug-process/

Offline cirehawk

  • Hero Member
  • *****
  • Posts: 581
Re: Nepomuk Miscellaneous Stuff
« Reply #17 on: March 23, 2013, 01:29:34 PM »
Honestly, I've never really understood how to make use of Nepomuk.  It's enabled in the KCC, and if I uncheck/recheck indexing I can see it indexing my files.  However, filters in Dolphin don't seem to do anything.  On the left panel, if I click in the "Search For" area (i.e. Documents, Images, etc.) nothing ever shows up in the query results.  Am I missing something?

Offline menotu

Re: Nepomuk Miscellaneous Stuff
« Reply #18 on: April 17, 2013, 08:49:40 AM »
By Vishesh Handa - 16 April 2013

Merging Nepomuk Graphs

Since I’ve become the maintainer of Nepomuk we have put a strong emphasis on performance and stability. One of the core parts of Nepomuk is the set of high-level operations that are exposed to the applications. These operations are typically used to insert data into Nepomuk and modify it. Each of these operations is quite complex and involves a number of complicated queries.

For this 4.11 release we wanted to simplify that code and make it more efficient. This blog post delves into the technical details of what has changed, and then finally goes into how that affects the users.

New Graph Handling

With this 4.11 release I have simplified the concept of graphs. Now there are a limited number of graphs based on the number of Agents that push data into Nepomuk. Each Agent gets its own graph. This way we can still easily implement ‘removeDataByApplication’, and decrease the complexity of our code base.

This greatly simplifies the internals of the Nepomuk code base since we no longer need to worry about all the complicated graph handling.
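To make the one-graph-per-Agent idea concrete, here is a toy Python model. It is not Nepomuk code: the class, its method names and the sample statements are invented for illustration, and only the idea that removeDataByApplication becomes "drop that Agent's graph" is taken from the post.

```python
from collections import defaultdict


class ToyStore:
    """Toy model: every statement lives in the single graph owned by its agent."""

    def __init__(self):
        # agent name -> set of (subject, property, value) statements
        self.graphs = defaultdict(set)

    def store(self, agent, subject, prop, value):
        self.graphs[agent].add((subject, prop, value))

    def remove_data_by_application(self, agent):
        """Drop everything an agent pushed by discarding its one graph."""
        return len(self.graphs.pop(agent, set()))


store = ToyStore()
store.store("fileindexer", "file:///a.mp3", "nie:title", "Politik")
store.store("fileindexer", "file:///a.mp3", "nfo:channels", "2")
store.store("dolphin", "file:///a.mp3", "nao:numericRating", "8")
removed = store.remove_data_by_application("fileindexer")
print(removed, sorted(store.graphs))
```

Because each agent owns exactly one graph, removal does not need to search for which graphs belong to which application.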

This big change is still in a feature/mergeGraphs branch. I’m still not completely ready to merge it into master.

Benchmarks

These are initial benchmarks that were taken about a month ago. There is still scope for more optimizations, especially if we combine more of our SPARQL calls. Also, these benchmarks were run on a blank database. The difference should be a lot larger when there is some real world data.



The numbers are in msecs. The functions are the higher level functions that all applications use to push any data into Nepomuk.

Whenever you add a tag in Nepomuk, the addProperty/setProperty methods are called. The file indexers and PIM feeder mostly use the storeResources function to push new data and removeDataByApplication to remove existing indexing data.

If you look at the results you’ll notice a substantial increase with 4.10 except for the removeDataByApplication functions, which seem to have taken a severe hit. I’m not too sure why this has happened; the only change in that code base has been one bug fix which should have increased performance.

As noted above, the number of graphs in Nepomuk is now limited and we no longer create superfluous graphs. However, all those extra graphs are still present and need to be merged into a finite number. This can be a time consuming process.

Tomorrow, I’ll go into the details of how we plan to counter that.

A lot more info  here

Offline menotu

Re: Nepomuk Miscellaneous Stuff
« Reply #19 on: April 19, 2013, 09:05:58 AM »
By Vishesh Handa - 19 April 2013

The Nepomuk Migration

A couple of days ago I talked about how we have been clearing up some unwanted data in Nepomuk for the 4.11 release - mainly graphs. This change comes with a performance increase of over 100% in many cases, and makes the codebase simpler and easier to maintain.

Unfortunately, it comes at a cost.

The graphs in the old database need to be merged into a small number of graphs. This operation is very time consuming because merging graphs is equivalent to slowly removing your entire database and reinserting it.
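A toy Python sketch shows why the merge is expensive. The data layout and function name are assumptions made for illustration, not the actual migration code: each old graph is modelled as an (agent, statements) pair, and merging has to touch every statement once to move it into its agent's single graph.

```python
def merge_graphs(old_graphs):
    """Merge many old graphs into one graph per agent.

    old_graphs is a list of (agent, statements) pairs, one pair per old graph.
    Returns (merged, moved): merged maps agent -> set of statements, and
    moved counts how many statements had to be rewritten.
    """
    merged = {}
    moved = 0
    for agent, statements in old_graphs:
        target = merged.setdefault(agent, set())
        for statement in statements:
            target.add(statement)
            moved += 1
    return merged, moved


old = [
    ("fileindexer", {("file:///a.mp3", "nie:title", "Politik")}),
    ("fileindexer", {("file:///a.mp3", "nfo:channels", "2")}),
    ("dolphin", {("file:///a.mp3", "nao:numericRating", "8")}),
]
merged, moved = merge_graphs(old)
print(len(old), "graphs ->", len(merged), "graphs;", moved, "statements moved")
```

The count of moved statements equals the size of the whole database, which matches the "remove everything and reinsert it" cost described above.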

Given that all users will have to go through this migration, we decided to add some additional methods of migrating:



1.  Backup Tags and Rating - The user can choose to back up only their file tags and ratings, remove the entire database, and then restore them. The graph merging process is only performed on the tags and ratings when creating the backup.

This process is very fast and is the recommended way.



Once this has been performed, all your files and emails will need to be indexed again, which is actually a good thing because historically a lot of the indexed data has been quite inferior.

2.  Migrate the existing Data - We can obviously go through the slow process of migrating all of the graphs. This can however easily take a couple of hours for medium sized databases (2.5 GB). I would not recommend this unless you have some really important data that you added on your own that option (1) does not cover.

3.  Start Afresh - Just remove the existing database and start with a fresh Nepomuk installation.
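Option (1) can be pictured with a short Python sketch. This is a simplified model and not the real Nepomuk Backup code: statements are plain (subject, property, value) tuples, and treating nao:hasTag and nao:numericRating as the tag and rating properties is an assumption made for this example.

```python
# Assumed property names for user-created tags and ratings.
USER_PROPERTIES = {"nao:hasTag", "nao:numericRating"}


def backup_tags_and_ratings(statements):
    """Keep only the statements the user created by hand."""
    return [s for s in statements if s[1] in USER_PROPERTIES]


def migrate_option_one(statements):
    """Back up tags and ratings, wipe the database, then restore the backup."""
    backup = backup_tags_and_ratings(statements)
    fresh_database = []            # "remove the entire database"
    fresh_database.extend(backup)  # restore tags and ratings
    return fresh_database          # indexed data is rebuilt by re-indexing


old_database = [
    ("file:///a.mp3", "nie:title", "Politik"),
    ("file:///a.mp3", "nao:hasTag", "favourite"),
    ("file:///a.mp3", "nao:numericRating", "8"),
    ("file:///a.mp3", "nfo:channels", "2"),
]
print(migrate_option_one(old_database))
```

Only the small, user-created subset has to be carried over, which is why this path is so much faster than migrating everything.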

So far, the user is given a choice the first time Nepomuk runs in 4.11. It’s a little ugly, but that can be fixed.

I’m hoping that this will not be too much of a pain and that the users can just click through the wizard using the default option (1). For medium sized databases this entire process gets done in just a couple of minutes.

On the positive side, because of this migration I had a chance to fix and test Nepomuk Backup - Just backing up the Tags and Ratings works very well right now. Backing up the full Nepomuk system still needs to be tested.

I recommend that developers check out the feature/mergeGraphs branch, try out the migration and just use that branch of Nepomuk for a while. When I’m confident about the migration and the internal changes, I’ll merge it into master.

Happy testing!

http://vhanda.in/blog/2013/04/the-nepomuk-migration/

Offline menotu

Re: Nepomuk Miscellaneous Stuff
« Reply #20 on: April 23, 2013, 08:01:56 AM »
This blog may answer a few questions that people may have about  Akonadi and Nepomuk
=========================================================

Posted by Rob Boudreau 19-Apr-2013

Semantic Desktop: Akonadi and Nepomuk

Praised, cursed, often misunderstood, what are KDE's semantic desktop tools for anyway?

The idea of taking the myriad kinds of information stored on a computer, and trying to find the relationships between it so it's more usable, has been around for a long time. "Semantics", the dictionary tells us, "is the study of meaning". The goal of a "semantic desktop" is to take all the bits and pieces of information we as users collect over time, and make it more meaningful, and ultimately more useful.

Akonadi

The Akonadi Framework was created as one piece in an effort to realize a semantic desktop. It's basically a service for collecting, storing and retrieving personal information management (PIM) data. This is actually harder than it would seem. Most PIM applications like calendars, e-mail, address books, journals, notebooks and the like traditionally use unique file types to store their information. Compounding the problem are all the web-based PIM applications people use today. And if that wasn't difficult enough, there are also the constantly changing APIs (Application Programming Interfaces) used by some of these. Google is a perfect example of this: they've recently announced they're dropping support for CalDAV. It's one of the reasons Akonadi has received so much unfair criticism: it's extremely hard to create something as complex as Akonadi on the constantly shifting sands of APIs.

Despite these hurdles Akonadi has come a long way in its development. Most of the KDE Plasma Desktop PIM applications now make use of Akonadi, as well as several of the Plasma Desktop widgets. Despite misconceptions, PIM data is kept in its native file formats, or kept on the remote server in the case of web-based and groupware data. Akonadi only pulls the important, frequently-used bits out, and places them in a database cache for quick, unified access. This information is also handed off to Nepomuk for the real semantic work.

The advantages of Akonadi for PIM users are great. Let's say you want to know when was the last time you had a meeting with John Doe. You could of course open your calendar app and search for John Doe. Akonadi will enable you to do much more with less effort. Instead of opening your calendar application you'll just open Krunner and search for John Doe, which will not only return your last meeting with him, but also any meetings, any mails sent between you, any to-do's you might have had concerning him, his contact information (IM, e-mail, address, phone, etc.), essentially anything you have in your PIM concerning John Doe, as well as any documents and other files that he's mentioned in. By searching for when you last had a meeting with John Doe you just may be reminded he sent you a follow-up e-mail you haven't read yet. Uh-oh, good thing you didn't just search your calendar. Akonadi's not quite there yet, but it's real close.

In our hypothetical example above, Akonadi was responsible for pulling all the information from the various places John Doe is referenced in the PIM. It also allows all the various PIM applications to share this information, as well as some Plasma Desktop widgets. Creating the relationships between the data and returning the search results to Krunner, though, was the work of the other member of the semantic desktop duo, Nepomuk.

Nepomuk

I have a habit, as I think most of us do, of putting files in "relate-able" folders. Got pictures from the fishing trip last year? Create a folder called something like "Fishing Trip 2012" and put all the pictures there. Got a project with lots of files? I'd create a folder with a short but (hopefully) meaningful name. This is what people have done since the creation of computers with disks. It works... kind of. But it has a couple of major drawbacks. With modern digital cameras, video recorders, phones and other devices, and the abundance of excellent media applications available, we're storing a lot of files with names like "DSC00023.jpg". Not a lot of help when you're writing to a friend bragging about a whale-sized trout you caught and want to include the picture. Since it was only last year I'd remember which folder it was in, and after browsing with another program for a bit I'd find it, but that took me away from my e-mail and possibly my train of thought: I had to open another application, navigate to the needed folder, etc. And what if the picture I was looking for was taken years ago? I'd probably be looking through folders for quite a while.

Add to that the fact that many of us also copy lots of information from the web - text clippings, media files, PDFs, whole web pages, and even more images. It's a lot of files to try and keep track of. The old folder naming and hierarchy strategy I've used becomes too difficult to manage, making finding that picture of the whale-sized trout, or the recipe for apple turnovers clipped from the web, difficult and time-consuming.

Nepomuk was created as an answer to this problem, and more. Nepomuk and its associated libraries and utilities pull information in the form of metadata from files and create a searchable database of that information. Added to that is the ability for the user to create their own metadata in the form of tags, ratings and comments.

With Nepomuk, what the file is, its contents and its metadata, are more pertinent than where it is. With Nepomuk it's feasible to place all of a user's files in one folder, like Documents, and still find them quickly when needed, based simply on some known trait like file type, date, tag, comment, rating, contents or a metadata value like a camera model, video length or document creator.

However, Nepomuk's database isn't of the kind that programs like Beagle, Tracker or Recoll create and use. Those programs use ones that are much like what we think of when we hear "database": information gathered and stored in a searchable table. Nepomuk handles and stores its data based on "ontologies". In a nutshell, what ontologies do is create relationships between data, very much like the human brain does. This is a far more complex problem than creating a simple database, and overcoming this problem is one of the reasons Nepomuk had gained a bad, though mostly deserved, reputation. Being memory-hungry and CPU-intensive were the usual labels assigned when users were willing to try using it. That's not the case anymore.

Up until KDE 4.9 Nepomuk used a file indexing program called Strigi. While a good, fairly light-weight indexer, Strigi has drawbacks in getting it to work with Nepomuk in the way needed. But with the release of KDE Plasma Desktop 4.10, Nepomuk is Strigi-less. Thanks largely to the hard work of a talented developer named Vishesh Handa, Nepomuk was re-worked from the ground up, and even has its own indexer now. The difference in performance and speed is extremely noticeable. That, and well, it works!

If you use a PIM, or are thinking you may want to try one to organize your digital life, give Kontact, the KDE Plasma Desktop PIM application, a try. Akonadi, the back-end for Kontact, has come a long way and just may surprise you.

If you're a Plasma Desktop 4.10 user and don't have Nepomuk indexing turned on in your System Settings because of past bad experiences, give it a try. But note - if you have an older Nepomuk database, run Nepomuk Cleaner first. When that's done, turn on Nepomuk's file indexing. Initially Nepomuk will just index file names and mimetypes, basic stuff to work with the file manager. It will then wait for the computer to be idle for a while before doing a deeper metadata and content index. How long this takes depends on how many files you have. But don't worry, the new Nepomuk is very well-behaved. It will nicely move out of your way if you start using your computer again. Nepomuk is starting to live up to its original vision. The semantic desktop envisioned years ago is becoming a reality.
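The two-stage behaviour described above (a cheap first pass, then a deeper pass only while the machine is idle) can be sketched in Python. This is an illustrative model, not Nepomuk's indexer: the function names, the level numbers and the idle and extractor callbacks are all assumptions.

```python
import mimetypes
import os


def basic_index(path):
    """Stage one: record only the file name and a guessed MIME type."""
    mime_type, _encoding = mimetypes.guess_type(path)
    return {"fileName": os.path.basename(path), "mimeType": mime_type, "level": 1}


def deep_index(entry, is_idle, extract):
    """Stage two: run the expensive extractor only while the machine is idle."""
    if not is_idle():
        return entry  # step out of the user's way and try again later
    updated = dict(entry)
    updated.update(extract(entry))
    updated["level"] = 2
    return updated


entry = basic_index("/home/user/Music/Politik.mp3")
# Hypothetical callbacks: pretend the machine is idle and the extractor
# found a title in the file's metadata.
entry = deep_index(entry, is_idle=lambda: True, extract=lambda e: {"title": "Politik"})
print(entry)
```

Passing the idle check in as a callback keeps the sketch independent of how idleness is actually detected on a given desktop.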

http://www.muktware.com/5417/semantic-desktop-akonadi-and-nepomuk
« Last Edit: April 23, 2013, 08:03:34 AM by menotu »

Offline menotu

Re: Nepomuk Miscellaneous Stuff
« Reply #21 on: May 14, 2013, 05:56:21 AM »
joergs weblog - Monday, May 13, 2013

Nepomuk WebMiner 0.6

A few months have passed since my last WebMiner update. In the meantime I finished my Master Thesis, moved to a new location and started my new job. Perfect time to release a new version with the changes I have made since.

Besides several bugfixes, the Nepomuk WebMiner 0.6 adds:

    User changeable regular expression for the filename parsing.

    Removed its own file indexing; it now reuses the Nepomuk internal file indexing to get ID3 tags and other file metadata.

    Added a whitelist for automatic web search. You might like to look up the folder with your publication PDFs but not your private documents. Or the network share with your TV shows, but not your private family videos. This works on top of the Nepomuk whitelist. So Nepomuk can index these files, but not all of them will be web searched.

    Instead of the dull treeview that shows the raw fetched metadata, you can now see and edit the metadata in several fancy edit fields.
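Two of the items above, the user-changeable filename pattern and the web search whitelist layered on top of the Nepomuk whitelist, can be illustrated with a small Python sketch. None of this is WebMiner code: the default pattern, the function names and the folder paths are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Hypothetical default pattern for names like "Some.Show.S01E02.mkv".
DEFAULT_PATTERN = r"(?P<show>.+?)[ ._-]+S(?P<season>\d+)E(?P<episode>\d+)"


def parse_filename(path, pattern=DEFAULT_PATTERN):
    """Extract show, season and episode from a file name, or return None."""
    match = re.match(pattern, PurePosixPath(path).stem, re.IGNORECASE)
    if match is None:
        return None
    return {
        "show": match["show"].replace(".", " "),
        "season": int(match["season"]),
        "episode": int(match["episode"]),
    }


def is_under(path, folders):
    """True if path lies inside any of the given folders."""
    parents = PurePosixPath(path).parents
    return any(PurePosixPath(folder) in parents for folder in folders)


def should_web_search(path, nepomuk_folders, webminer_folders):
    """A file is web searched only if both whitelists cover it."""
    return is_under(path, nepomuk_folders) and is_under(path, webminer_folders)


nepomuk_folders = ["/home/user"]
webminer_folders = ["/home/user/tvshows"]
for path in ["/home/user/tvshows/Some.Show.S01E02.mkv",
             "/home/user/family/Birthday.S01E01.mkv"]:
    print(path, should_web_search(path, nepomuk_folders, webminer_folders),
          parse_filename(path))
```

A user who names files differently would pass their own expression as the pattern argument instead of editing the code.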

You can find the latest release on projects.kde.org or the tarball on kde-apps.org.
Even though I wanted to get this into KDE SC 4.11, I doubt this is going to happen. Soft feature freeze is around the corner and I don't feel comfortable enough to let this be part of SC and annoy all users with this service yet. There are still a lot of usability problems I would like to have solved properly before this can be part of any KDE installation.

So please test the latest release and report any errors back to me.

Blog and images

The Nepomuk-WebMiner Handbook - KDE Documentation - PDF
« Last Edit: May 14, 2013, 06:00:58 AM by menotu »