
TO DO

Name

* Come up with a good name

* YODA (from the image in the talk)?

* "Answers you seek?"

Distribution

Docker

* ulimit for crawling processes

Windows distribution

* .msi package including Perl, Apache Tika and Elasticsearch

* Debian package

Pages

Search page

* Simple search

* Autocomplete/recommend

Simple (HTML) Results page

* Search images

Result fragment / document rendering

Come up with a concept to render different mime types differently.

Ideally, this would avoid the hardcoding we use for audio/mpeg currently.
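One possible concept is a registry that maps mime types (or wildcard patterns) to renderer functions, so that audio/mpeg becomes one entry among many instead of a special case. The following is only a sketch in Python; the names `renders`, `render` and the document fields `mime_type`, `url` and `content` are assumptions for illustration, not part of the existing code.

```python
import html

# Registry of renderers, keyed by an exact mime type or a "major/*" pattern.
_RENDERERS = {}


def renders(mime_pattern):
    """Decorator that registers a renderer for one mime type or pattern."""
    def register(func):
        _RENDERERS[mime_pattern] = func
        return func
    return register


@renders("audio/mpeg")
def render_audio(doc):
    src = html.escape(doc["url"], quote=True)
    return f'<audio controls src="{src}"></audio>'


@renders("text/*")
def render_text(doc):
    return f"<pre>{html.escape(doc['content'])}</pre>"


def render_fallback(doc):
    # Used when no renderer is registered: just link to the document.
    url = html.escape(doc["url"], quote=True)
    return f'<a href="{url}">{url}</a>'


def render(doc):
    """Pick the most specific renderer: exact type, then major/*, then fallback."""
    mime = doc.get("mime_type", "")
    major = mime.split("/", 1)[0]
    func = _RENDERERS.get(mime) or _RENDERERS.get(major + "/*") or render_fallback
    return func(doc)
```

Supporting a new type then means registering one more function; the lookup code stays unchanged.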

Customization

* Auto-session

* Refinement using the last search, if the last search was recent

Plack

* Plack hook/example for /search to tie the search application into arbitrary websites

Dancer

* Elasticsearch plugin / configuration through YAML

Mojolicious

* Elasticsearch plugin / configuration through YAML

Search multiple indices

Having different Elasticsearch clusters available (or not) should be recognized and the search results should be combined. For example, a work cluster should be searched in addition to the local cluster, if the work network is available.

This calls for using the asynchronous API not only for searching but also for progressively enhancing the results page as new results become available.
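The fan-out described above could look roughly like the sketch below. It is written in Python with `asyncio` purely for illustration; `search_cluster` is a hypothetical stand-in for a real asynchronous Elasticsearch query, and the `score` field and the `on_partial` callback are assumptions as well.

```python
import asyncio


async def search_cluster(name, delay, hits, reachable=True):
    """Hypothetical stand-in for an asynchronous query against one cluster."""
    await asyncio.sleep(delay)
    if not reachable:
        raise ConnectionError(f"{name} is not reachable")
    return [{"cluster": name, **hit} for hit in hits]


async def search_all(searches, on_partial):
    """Merge results as they arrive and report every intermediate state."""
    merged = []
    for finished in asyncio.as_completed(searches):
        try:
            hits = await finished
        except ConnectionError:
            continue  # an unavailable cluster is skipped, not an error
        merged.extend(hits)
        merged.sort(key=lambda hit: hit["score"], reverse=True)
        on_partial(list(merged))  # e.g. push an update to the results page
    return merged


def demo():
    snapshots = []

    async def run():
        searches = [
            search_cluster("local", 0.01, [{"id": "a", "score": 1.0}]),
            search_cluster("work", 0.02, [{"id": "b", "score": 2.0}]),
            search_cluster("offline", 0.0, [], reachable=False),
        ]
        return await search_all(searches, snapshots.append)

    merged = asyncio.run(run())
    return merged, snapshots
```

Sorting by raw score is a simplification: scores from different clusters are not necessarily comparable, so a real implementation would probably need some normalization.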

Recognizing new versions of old documents

How can we/Elasticsearch recognize similarity between two documents?

If two documents live in the same directory, the newest one should take precedence and fold the similar documents below it.
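As a starting point for experiments outside of Elasticsearch, similarity can be approximated by comparing word shingles with the Jaccard coefficient. The sketch below (Python, for illustration only) folds older documents below the newest similar one in the same directory; the field names `path`, `mtime` and `text`, the shingle size and the threshold of 0.5 are all assumptions.

```python
import os


def shingles(text, size=3):
    """Return the set of overlapping word n-grams of a text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard coefficient of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fold_versions(docs, threshold=0.5):
    """Group similar documents per directory, newest document first.

    Returns a list of (newest_doc, [older_similar_docs]) pairs.
    """
    groups = []
    for doc in sorted(docs, key=lambda d: d["mtime"], reverse=True):
        signature = shingles(doc["text"])
        directory = os.path.dirname(doc["path"])
        for head, head_signature, folded in groups:
            same_dir = os.path.dirname(head["path"]) == directory
            if same_dir and jaccard(signature, head_signature) >= threshold:
                folded.append(doc)
                break
        else:
            groups.append((doc, signature, []))
    return [(head, folded) for head, _, folded in groups]
```

Comparing every document against every group head is quadratic in the worst case, so this is only suitable for small directories.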

Java ES plugins

Currently better written in Perl

ES Analyzers

FS scanner

* Don't rescan/reanalyze elements that already exist in Elasticsearch

* Delete entries that don't exist in the filesystem anymore
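Both points boil down to comparing what the index knows with what is on disk. A minimal sketch in Python, assuming that the index stores a modification time per path (this is an assumption; a content hash would work the same way) and that both sides are available as plain dictionaries:

```python
def plan_sync(indexed, on_disk):
    """Decide what to (re)index and what to delete.

    indexed: {path: mtime} as stored in the index (assumed field)
    on_disk: {path: mtime} as found by the filesystem scan
    """
    to_index = sorted(
        path for path, mtime in on_disk.items()
        if path not in indexed or indexed[path] < mtime
    )
    to_delete = sorted(path for path in indexed if path not in on_disk)
    return to_index, to_delete
```

Unchanged files appear in neither list, so they are not rescanned or reanalyzed.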

Video data

Which module provides interesting video metadata?

Audio data

* MP3s get imported but could use a nicer body rendering.

* Playback duration should be calculated

* Also import audio lyrics - how could these be linked to their MP3s?

Playlist data

Playlists should get custom rendering (album art etc.)

Playlists should ideally also hotlink their contents

Test data

* Consider importing a Wikipedia dump

* Some other larger, mixed corpus, like http://eur-lex.europa.eu/

Synonyms

Find out which one(s) we want:

https://www.elastic.co/guide/en/elasticsearch/guide/current/synonyms-expand-or-contract.html

At first glance, we might want Simple Expansion, but Genre Expansion also seems interesting.

We want to treat some synonyms as identical though, like 'MMSR' and its German translation 'Geldmarktstatistik'.
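For the "identical" case, an Elasticsearch synonym token filter with a comma-separated rule treats all listed terms as equivalent. The sketch below builds such index settings as a Python dictionary and serializes them to JSON; the filter name `identical_terms` and the analyzer name `with_synonyms` are made up for illustration.

```python
import json

# Index settings with a synonym filter; terms separated by commas
# within one rule are treated as equivalent to each other.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "identical_terms": {  # hypothetical filter name
                    "type": "synonym",
                    "synonyms": ["mmsr, geldmarktstatistik"],
                }
            },
            "analyzer": {
                "with_synonyms": {  # hypothetical analyzer name
                    "tokenizer": "standard",
                    # lowercase first, so the rule can be written in lowercase
                    "filter": ["lowercase", "identical_terms"],
                }
            },
        }
    }
}

settings_json = json.dumps(settings, indent=2)
```

The resulting JSON would be sent as the body when creating the index; the fields that should honor synonyms then need to reference the analyzer in their mapping.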

User Introduction

Videos

* Create screencasts using http://www.openshot.org/videos/ or

Code structure

Crawlers

* Create Dancer-crawler - skip the HTTP generation process and reuse App::Wallflower for crawling a Dancer website.

* Create tree-structure-importer

Both IMAP folder hierarchies and file systems are basically trees and far easier to crawl than the cyclic graphs of web pages. Abstract out the crawling of a tree into a common module.

* Turn index-imap and index-filesystem into modules so they no longer depend on being invoked from an external shell.

This also implies they become runnable directly from the web interface without an intermediate shell.

* Add attachment import to the imap crawler
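The common tree-crawling module mentioned under the tree-structure-importer item could be as small as a walker that only knows how to ask a node for its children. A sketch in Python, for illustration only; `crawl_tree`, `fs_children` and the example folder names are assumptions:

```python
import os


def crawl_tree(root, children):
    """Yield every node of a tree in depth-first, pre-order sequence.

    children(node) returns the child nodes of a node; this callback is the
    only part that differs between IMAP and the filesystem.
    """
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Reverse so that the first child is visited first.
        stack.extend(reversed(list(children(node))))


def fs_children(path):
    """Children callback for a filesystem; symlinks are not followed."""
    if os.path.isdir(path) and not os.path.islink(path):
        return sorted(entry.path for entry in os.scandir(path))
    return []


# An IMAP folder hierarchy, modelled here as a plain dictionary.
imap_folders = {
    "INBOX": ["INBOX/Work", "INBOX/Private"],
    "INBOX/Work": ["INBOX/Work/2016"],
}
imap_order = list(crawl_tree("INBOX", lambda f: imap_folders.get(f, [])))
```

Because no cycles are expected, the walker needs no "already seen" bookkeeping, which is exactly what makes trees easier to crawl than web pages.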

Metasearch

Implement metasearch across multiple ES instances