Author Topic: Blog: KDE Nepomuk Miscellaneous Stuff  (Read 859 times)

Offline menotu

  • PCLinuxOS Tester
  • Super Villain
  • *******
  • Posts: 15279
  • ┌∩┐(◕_◕)┌∩┐
Blog: KDE Nepomuk Miscellaneous Stuff
« on: February 02, 2013, 05:04:49 AM »
by Vishesh Handa - Friday, 01 February, 2013

What's new with Nepomuk 4.10

I've blogged about some of the more prominent changes in this new Nepomuk release. I thought it would be a good idea to document all the changes, most of which I haven't publicly blogged about.

File Indexing

As the release announcement has been saying, the file indexer is the component that has changed the most.

New Double Queue Architecture

We've split the indexer's work into two parts: basic indexing first, then full file indexing. Basic indexing quickly records the basic information about a file, such as its filename and mimetype. This allows us to always answer at least simple queries. The other queue, which only runs when the user is idle, extracts the full information about the file.
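The two-queue idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the `extract` callback and the `user_is_idle` flag are hypothetical and are not taken from the Nepomuk code.

```python
import mimetypes
import os
from collections import deque


class TwoQueueIndexer:
    """Toy model: a cheap basic queue plus a full queue that runs only when idle."""

    def __init__(self):
        self.basic_queue = deque()  # cheap pass: filename and mimetype
        self.full_queue = deque()   # expensive pass: full extraction
        self.index = {}             # path -> dict of extracted fields

    def add_file(self, path):
        self.basic_queue.append(path)

    def run_basic(self):
        # Drain the basic queue right away so simple queries always work.
        while self.basic_queue:
            path = self.basic_queue.popleft()
            mimetype, _encoding = mimetypes.guess_type(path)
            self.index[path] = {
                "filename": os.path.basename(path),
                "mimetype": mimetype,
                "full": False,
            }
            self.full_queue.append(path)

    def run_full(self, user_is_idle, extract):
        # Handle at most one file per call, and only while the user is idle.
        if not user_is_idle or not self.full_queue:
            return False
        path = self.full_queue.popleft()
        self.index[path].update(extract(path))
        self.index[path]["full"] = True
        return True
```

After `run_basic()` every queued file can already be found by name, while `run_full()` fills in the rest one file at a time whenever the caller reports that the user is idle.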

New File Indexer

We've had some problems with Strigi earlier. With 4.10, we have finally decided to release our own solution. Our solution is arguably technologically inferior, but it's more maintainable and, for now, provides a better user experience.

Mimetype Filtering

Mimetypes are a central part of the new file indexing architecture: every file indexing plugin uses mimetypes to identify which types of files it can index. Given that, we decided to let the user control which types of files are indexed.

By default, source code is no longer indexed. Common types such as documents, images, audio and video are.
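A mimetype filter of this kind can be sketched as follows. The policy below is an assumption for illustration: the categories come from the post, but the exact mimetype strings and function names are made up and do not reflect Nepomuk's real configuration. Guessing the type from the file extension is also a simplification.

```python
import mimetypes

# Hypothetical default policy: categories from the post, strings illustrative.
DEFAULT_INDEXED = ("image/", "audio/", "video/", "application/pdf", "text/plain")


def type_is_indexed(mimetype, indexed=DEFAULT_INDEXED):
    """True if the mimetype starts with one of the allowed prefixes."""
    return mimetype is not None and mimetype.startswith(tuple(indexed))


def should_index(path, indexed=DEFAULT_INDEXED):
    """Guess the mimetype from the file name and apply the policy."""
    mimetype, _encoding = mimetypes.guess_type(path)
    return type_is_indexed(mimetype, indexed)
```

A user who does want source code indexed would extend the tuple, for example `DEFAULT_INDEXED + ("text/x-python",)`.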



KioSlave changes

Until the 4.9 release, the kioslave code hadn't changed much. With 4.9.1, we managed to optimize some of the code. The 4.10 release, however, takes this to an entirely different level.

Massive Optimizations

The 'nepomuksearch' kioslave could initially show both file and non-file data. This meant that it would occasionally also show contacts, albums and other details. Selecting any of those would trigger another search for resources related to that contact. For this release, we decided to optimize for the most common use case: listing files.

The 'nepomuksearch' kioslave, and all other Nepomuk kioslaves, no longer show any result which does not have a URL. This, coupled with a LOT of other optimizations, has yielded a super fast kioslave which can display thousands of results in under a second.

There is also some interesting userbase documentation about custom queries on the nepomuksearch kioslave: http://userbase.kde.org/Nepomuk/kioslaves/search

Tagging KioSlave

As previously stated, we are also introducing a new tagging kioslave. This slave allows you to easily manage your Nepomuk tags, and to browse files based on the tags they carry.




One of the largest parts of the Dolphin Information Panel was the KFileMetadataWidget, which was provided by kdelibs/kio. This widget was one of the last parts of Dolphin that still used Nepomuk1. Since kdelibs was frozen, we couldn't port it to Nepomuk2. Thus emerged the Nepomuk2::FileMetadataWidget in nepomuk-widgets.

The KFileMetadataWidget historically fetched all the data in another process. This was done because Strigi was a little unreliable. With KDE Workspaces 4.10, we are no longer using Strigi in Nepomuk. This means the widget now uses the nepomukindexer to extract the data. It also no longer uses this multi-process architecture when loading the Nepomuk data. This results in a massive performance improvement, because we can rely on the Nepomuk cache in Dolphin instead of recreating it each time.

In terms of appearance, the widget has become a little more uniform, and by default only shows the properties that really matter.

Improved Removable Media Handling

Nepomuk has supported indexing of removable media for quite some time. However, it didn't always work that well. From a design point of view, the solution was great and extremely robust. This, however, came at a steep cost for the rest of Nepomuk. Every other query was affected by these features, and not in a small way. In some simple tests of basic indexing, it made a difference of around 20%.

With this new release, we have gone to a simpler solution which has a lighter performance cost. We have also removed the "Automatic Invalid File Metadata Cleaner" which removed the metadata for any file it could not access. The client code now always checks if the file can be accessed before displaying it to the user.
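The client-side check described above could look roughly like this. It is a sketch only: the function name is hypothetical, and the real client code is not shown in the post.

```python
import os


def visible_results(paths):
    """Keep only results whose file currently exists and is readable."""
    return [p for p in paths if os.path.exists(p) and os.access(p, os.R_OK)]
```

Stale entries (for example, files on an unplugged drive) stay in the database but are simply skipped when results are displayed.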

Nepomuk Backup Changes

With KDE Workspaces 4.6, my Google Summer of Code project, Nepomuk Backup, was finally merged. It was a very ambitious project which attempted to synchronize, backup and restore data in a non-destructible manner. In the end, it was just a little bit too complex. Large parts of the synchronization code eventually migrated into the data feeding code, which is now used by anyone pushing data into Nepomuk. So it wasn't a complete loss.

With this new release, I finally got around to throwing away most of the complex code, and implementing a very simple and reliable backup solution. This new method does not require a separate service to be running, and therefore consumes less memory. Additionally, we also have some basic unit tests to ensure that the backups are restored properly!

Please keep in mind that this only backs up the non-destructible data. This does not include the file or email index information. If you want that to be backed up, you're better off just making a copy of the database file.

Nepomuk Cleaner



The Nepomuk Cleaner originated from a series of scripts I was writing to clear up my own database. It eventually occurred to me that other people might suffer from the same problem. The scripts were eventually combined into a cohesive form, and released. The application is very simple right now, but that will change in future releases. I even contemplated not releasing it for 4.10, but it clearly provides some value, even if it doesn't look that great.

Other Changes

Surprisingly, I didn't want to include many new features in this release. I was trying to focus more on stabilization. Over the last 6 months, a total of 246 bugs have been resolved, of which 188 were reported within those same 6 months. This seems like a good improvement to me.

Apart from these simple changes, there have been a number of optimizations all across Nepomuk and Soprano. Nepomuk should be running faster and better than ever before. In some cases we have even seen a performance increase of over 200%.

Anyway, Enjoy the new release! :)

Blog and links
« Last Edit: May 16, 2013, 05:20:37 AM by menotu »
PCLinuxOS 32bit KDE 4.10.1; kernel-3.4.11-pclos1.bfs & 64bit 3.2.18bfs; NVidia GeForce 8400GS 1GB 310.19 driver

Sony Vaio SVE1513A4ESI Laptop, Intel Core i5, 2.6GHz, 6GB RAM, 750GB, 15.6" Intel HD Graphics 4000

Offline menotu

Re: Nepomuk - Miscellaneous Stuff
« Reply #1 on: February 07, 2013, 07:27:41 AM »
joergs weblog - Wednesday, February 6, 2013

Nepomuk WebMiner 0.5

Since my last post a lot has happened to the Nepomuk-WebMiner (formerly MetaDataExtractor).

The WebMiner went through the KDE Review process and got cleaned up a bit during this process. The new location of it is extragear/base/nepomuk-webminer.

On the code side, I have fixed several bugs and integrated the automatic fetching better into the current Nepomuk system.

The new WebMiner-Service respects suspend/resume and event monitoring (no internet, low disk space, on battery mode) in the same way the FileIndexer does.

When the automatic fetching is started via the command line or the Dolphin command, the service is used for the actual fetching. This makes it possible to show the current fetching progress in the nepomukcontroller (in the systray).

Starting with KDE 4.11 the Systemsettings for Nepomuk and the Nepomuk WebMiner are combined and won't show up as two different entries anymore.

         

Instead of the buggy imdb python script that has a hard time following the changes on the imdb website to allow proper fetching of movie resources, a new plugin for themoviedb.org was created.

The next step for the WebMiner will be the full integration into KDE SC for the 4.11 release.
That means moving out of extragear again and into some other, more suitable place.

In order to make this happen there is still one large blocker task that needs to be done.

So if anyone is good with python and has some time, the script at nepomuk-core/services/storage/rcgen/nepomuk-simpleresource-rcgen.py needs to be improved.

This script is responsible for generating the SimpleResource classes from the ontologies in use.
As it takes nearly all ontologies into account and is rather slow right now, each generation run takes ~20 minutes. This is a pain for anyone compiling the WebMiner from source.

http://joerg-weblog.blogspot.co.uk/search?updated-min=2013-01-01T00:00:00%2B01:00&updated-max=2014-01-01T00:00:00%2B01:00&max-results=1

Offline david1958

  • PCLinuxOS Tester
  • Full Member
  • *******
  • Posts: 206
  • Lovin Linux
    • Ed's Cuckoo Uhren
Re: Nepomuk Miscellaneous Stuff
« Reply #2 on: February 12, 2013, 07:20:03 PM »
I've only been into Linux for a year. What exactly does
Quote
What's new with Nepomuk 4.10
do? It's on Monty, but I tried it and it seemed to slow me down, so I disabled it. Does the usual user need it?
To all Windows Users, Quit being Lazy and learn Linux. You'll Love it after you get the hang of it!
FullMonty Release:            2013.04
Kernel-version:    3.2.18-pclos2.pae.bfs
KDE4-version:                        4.10.1
Biostar mother Board A55MH,  CPU chip A8-3807K

8 gig ram

Offline menotu

Re: Nepomuk Miscellaneous Stuff
« Reply #3 on: February 13, 2013, 06:32:44 AM »
Christian Mollekopf, (cmollekopf.) 13 Feb 2013 - Finding New Ways

Kontact-Nepomuk Integration: Why data from akonadi is indexed in nepomuk

So Akonadi is already a “cache” for your PIM-data, and now we’re trying hard to feed all that data into a second “cache” called Nepomuk, just for some searching? We clearly must be crazy.

The process of keeping these two caches in sync is not entirely trivial, storing the data in Nepomuk is rather expensive, and obviously we’re duplicating all data. Rest assured we have our reasons though.

    Akonadi handles the payload of items stored in it transparently, meaning it has no idea what it is actually caching (apart from some hints such as mimetypes). While that is a very good design decision (great flexibility), it has the drawback that we can’t really search for anything inside the payload (because we don’t know what we’re searching through, where to look, etc.).

    The solution to the searching problem is of course building an index, which is a cache of all data optimized for searching. It essentially structures the data in a way that content->item lookups become fast (while normal usage does this the other way round). So that already means duplicating all your data (more or less), because we’re trading disk space and memory for searching speed. And Nepomuk is what we’re using as the index for that.
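The content->item direction is the classic inverted index. A deliberately tiny sketch, assuming plain whitespace-separated words and integer item ids (both are simplifications, and this is the "simple, text-based index" kind of structure, not how Nepomuk stores data):

```python
from collections import defaultdict


def build_index(items):
    """Map each lowercase word to the set of item ids whose text contains it."""
    index = defaultdict(set)
    for item_id, text in items.items():
        for word in text.lower().split():
            index[word].add(item_id)
    return index


def search(index, word):
    """Return the sorted ids of all items containing the word."""
    return sorted(index.get(word.lower(), set()))
```

The index repeats every word of every item, which is exactly the disk-space-for-speed trade described above.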

Now there would of course be simpler ways to build an index for searching than using Nepomuk, but Nepomuk provides way more opportunities than just a simple, textbased index, allowing us to build awesome features on top of it, while the latter would essentially be a dead end.

To build that cache we’re doing the following:

  • analyze all items in Akonadi
  • split them up into individual parts such as (for an email example): subject, plaintext content, email addresses, flags
  • store that separated data in Nepomuk in a structured way


This results in networks of data stored in Nepomuk:

    PersonA [hasEMailAddress] addressA
    PersonA [hasEMailAddress] addressB
    emailA [hasSender] addressA
    emailB [hasSender] addressB

So this “network” relates emails to email-addresses, and email-addresses to contacts, and contacts to actual persons, and suddenly you can ask the system for all emails from a person, no matter which of the person’s email-addresses have been used in the mails. Of course we can add to that IM conversations with the same Person, or documents you exchanged during that conversation, … the possibilities are almost endless.
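To make the "all emails from a person" lookup concrete, here is a small sketch that treats the example network as a list of (subject, predicate, object) triples. The predicate names follow the example above; the list layout and the function name are assumptions for illustration and not the Nepomuk query API.

```python
# The example network as plain triples (illustrative data only).
TRIPLES = [
    ("PersonA", "hasEMailAddress", "addressA"),
    ("PersonA", "hasEMailAddress", "addressB"),
    ("emailA", "hasSender", "addressA"),
    ("emailB", "hasSender", "addressB"),
]


def emails_from(person, triples):
    """Return all emails sent from any address that belongs to the person."""
    addresses = {
        obj for subj, pred, obj in triples
        if subj == person and pred == "hasEMailAddress"
    }
    return sorted(
        subj for subj, pred, obj in triples
        if pred == "hasSender" and obj in addresses
    )
```

Here `emails_from("PersonA", TRIPLES)` finds both emails, even though each was sent from a different address.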

Based on that information much more powerful interfaces can be written. For instance one could write a communication tool which doesn’t really care anymore which communication channel you’re using and dynamically mixes IM and email depending on whether/where the other person is currently available for a chat or would rather have a mail, which can be read later on, and doing so without splitting the conversation across various mail/chat interfaces.

This is of course just one example of many (nor am I claiming the idea as mine; it’s just a nice example of what is possible).

So that’s basically why we took the difficult route for searching (At least that is why I am working on this).

Now, we’re not quite there yet, but we are already getting the first fruits of our labor:

  • KMail can now automatically complete addresses from all emails you have ever received
  • Filtering in KMail does fulltext searching, making it a lot easier to find old conversations
  • The kpeoples library already uses this data for contacts merging, which will result in a much nicer addressbook
  • And of course having the data available in Nepomuk enables other developers to start working with it


I’ll follow up on that post with some more technical background on how the feeders are working and possibly some information on the problematic areas from a client perspective (such as the address auto-completion in KMail).

https://cmollekopf.wordpress.com/2013/02/13/kontact-nepomuk-integration-why-data-from-akonadi-is-indexed-in-nepomuk/

Offline Bald Brick

  • PCLinuxOS Tester
  • Hero Member
  • *******
  • Posts: 6371
  • I'm going South
Re: Nepomuk Miscellaneous Stuff
« Reply #4 on: February 13, 2013, 08:08:34 AM »
I've only been into Linux for a year. What exactly does
Quote
What's new with Nepomuk 4.10
do? It's on Monty, but I tried it and it seemed to slow me down, so I disabled it. Does the usual user need it?

The version in KDE 4.9.5 isn't such a resource hog as the earlier ones used to be. It shouldn't slow you down noticeably. It seems to be useful, particularly with a Search and Launch desktop.

It's slightly irritating, though, that a search will give you different results depending on whether you use Dolphin, Konqueror, the Search and Launch desktop or the command line. (But try all of them.)

Feed the trolls!
They need it!

AMD Athlon 7450 Dual-Core Processor, 7.80 GiB RAM, Nvidia GeForce GT 120/PCIe/SSE2, OpenGL/ES-version: 3.3 0 NVIDIA 295.40, SBx00 Azalia (Intel HDA) soundcard, ‎Logitech B500 webcam, SAA7146 DVB card, HDDs: Seagate 250824AS, Western Digital WD10EAVS-00D

Offline david1958

Re: Nepomuk Miscellaneous Stuff
« Reply #5 on: February 20, 2013, 08:02:55 PM »
tks bald brick. I haven't been on for a while since the weather went way downhill here for February. I do want to ask: do you yourself have Nepomuk enabled? I was just playing with it to see what it might do to performance. I'm happy with just what I have.

Offline menotu

Re: Nepomuk Miscellaneous Stuff
« Reply #6 on: February 21, 2013, 05:41:32 AM »
By Gabriel Poesia  - Wednesday, February 20, 2013

More Nepomuk File Watcher backends

After being in an ICPC Training Camp here in Brazil, I'm back to work on Nepomuk. I'll now work on implementing the support for more back-end options in the Nepomuk File Watcher service.

Nepomuk has a service called File Watcher, which monitors the file system, waiting for changes in files (content changed, file deleted, file moved, renamed, etc). When that happens, the changed file has to be reindexed, so the search will use its up-to-date contents.

The Linux kernel has a subsystem called inotify that allows you to do that efficiently. You tell inotify which folders you want to monitor, and it notifies you when it spots an event of interest. Currently, Nepomuk uses inotify on Linux to watch for changes. But it has its limitations. For example, the number of watches you can create in a default installation is small, which may be a problem.

Fortunately, there are some alternatives. KDE itself has a mechanism for doing that (KDirWatch), and Linux has the more recent fanotify. Each one has its advantages and disadvantages. What I'll be doing is making the File Watcher support these 2 additional back-ends, and use any subset of the three simultaneously (each of which can be independently enabled or disabled by the user). With a lot of help from Vishesh, of course.
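The watch limit can be inspected on a running Linux system. A minimal sketch, assuming the usual procfs location of the limit; the helper names are made up for illustration, and this is not how Nepomuk itself checks it. Since inotify watches are set per directory, counting directories gives a rough idea of how many watches a folder tree needs.

```python
import os

# Usual procfs location of the per-user inotify watch limit on Linux.
WATCH_LIMIT_PATH = "/proc/sys/fs/inotify/max_user_watches"


def read_watch_limit(path=WATCH_LIMIT_PATH):
    """Return the watch limit as an int, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def count_directories(root):
    """Count root and every directory below it (one watch each)."""
    return sum(1 for _ in os.walk(root))


def fits_in_limit(root, limit):
    """True if watching every directory under root stays within the limit."""
    return limit is not None and count_directories(root) <= limit
```

For example, `fits_in_limit(os.path.expanduser("~"), read_watch_limit())` gives a quick estimate of whether a whole home directory could be watched.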

http://g-poesia.blogspot.co.uk/2013/02/more-nepomuk-file-watcher-backends.html

Offline Bald Brick

Re: Nepomuk Miscellaneous Stuff
« Reply #7 on: February 21, 2013, 05:58:31 AM »
tks bald brick. I haven't been on for a while since the weather went way downhill here for February. I do want to ask: do you yourself have Nepomuk enabled?

I do indeed.

Offline david1958

Re: Nepomuk Miscellaneous Stuff
« Reply #8 on: February 21, 2013, 07:29:54 AM »
BB, after I just sent you the email I found the post. I got it set OK now. I had to go into Desktop Search for Nepomuk, re-enable it and set it to update the files each week. tks again
david

Offline Bald Brick

Re: Nepomuk Miscellaneous Stuff
« Reply #9 on: February 21, 2013, 10:32:28 AM »
BB, after I just sent you the email I found the post. I got it set OK now. I had to go into Desktop Search for Nepomuk, re-enable it and set it to update the files each week. tks again
david

I just saw your PM, but you posted before I had time to reply.

To avoid getting a lot of search results that you aren't interested in, it might be a good idea to customize the list of directories that are to be indexed. At the very least, you probably don't want the tmp directories indexed. And how often do you search for something in /var? Indexing /proc would be next to ridiculous. (Some people only index their own files in their home directory....)
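The effect of such a folder list can be sketched as a "deepest matching rule wins" check. This is only an illustration: the function, the rule and the example paths are hypothetical and are not taken from Nepomuk's own settings code.

```python
from pathlib import PurePosixPath


def is_indexed(path, include, exclude):
    """Return True if the deepest folder rule matching path is an include."""
    target = PurePosixPath(path)
    best_depth, decision = -1, False
    for folders, allowed in ((include, True), (exclude, False)):
        for folder in folders:
            rule = PurePosixPath(folder)
            # A rule matches the folder itself and everything below it.
            if target == rule or rule in target.parents:
                depth = len(rule.parts)
                if depth > best_depth:
                    best_depth, decision = depth, allowed
    return decision
```

With `include=["/home/david"]` and `exclude=["/home/david/tmp"]`, a file under Documents is indexed, anything under tmp is skipped, and /var or /proc never match at all.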
« Last Edit: February 21, 2013, 10:34:25 AM by Bald Brick »

Offline david1958

Re: Nepomuk Miscellaneous Stuff
« Reply #10 on: February 21, 2013, 12:39:45 PM »
Quote
(Some people only index their own files in their home directory....)

tks bb. I will do just what you say on the home side of things. So on the home side, are you suggesting I do not index the temp folder either?

Offline Bald Brick

Re: Nepomuk Miscellaneous Stuff
« Reply #11 on: February 21, 2013, 01:01:31 PM »
Quote
(Some people only index their own files in their home directory....)

tks bb. I will do just what you say on the home side of things. So on the home side, are you suggesting I do not index the temp folder either?

I wouldn't index any tmp folder. They aren't intended for permanently stored files.

Offline david1958

Re: Nepomuk Miscellaneous Stuff
« Reply #12 on: February 21, 2013, 01:09:06 PM »
My next question: do I have the right version? My setup screen does not look like the one in his screenshot.
Mine does say
Quote
Nepomuk/Strigi Server Configuration

Through the advanced settings I did uncheck the temp folder in home. Root was not highlighted.

Offline Bald Brick

Re: Nepomuk Miscellaneous Stuff
« Reply #13 on: February 21, 2013, 01:38:01 PM »
My next question: do I have the right version? My setup screen does not look like the one in his screenshot.
Mine does say
Quote
Nepomuk/Strigi Server Configuration

The picture from Vishesh Handa's page that menotu posted is from version 4.10 of Nepomuk, where Strigi has been replaced by something else that is "arguably technologically inferior, but [...] more maintainable and, for now, provides a better user experience". We still use version 4.9.5 - and even that only if we are fully updated.

Offline david1958

Re: Nepomuk Miscellaneous Stuff
« Reply #14 on: February 21, 2013, 03:18:27 PM »
Got ya, BB. As long as I've got the right version, that is what matters. I got it set up as you advised: I did not index the temp folder, nor the torrent folder I have for downloaded movies, since those are not important to keep. Got to go, BB. tks again
david