Audio::DB - Tools for generating relational databases of MP3s
 use Audio::DB;

 my $mp3 = Audio::DB->new(-user    => 'user',
                          -pass    => 'password',
                          -host    => 'db_host',
                          -dsn     => 'music_db',
                          -adaptor => 'mysql');

 $mp3->initialize(1);

 $mp3->load_database(-dirs => ['/path/to/MP3s/'],
                     -tmp  => '/tmp');
Audio::DB is a module for creating relational databases of MP3 files directly from data stored in ID3 tags or from flat files of track information. Once the database is created, Audio::DB provides various methods for creating reports and web pages of your collection. Although it's nutritious and delicious on its own, Audio::DB was created for use with Apache::Audio::DB, a subclass of Apache::MP3. This module makes it easy to make your collection web-accessible, complete with browsing, searching, streaming, multiple users, playlists, ratings, and more!
MP3::Info for reading ID3 tags; LWP::MediaTypes for distinguishing types of readable files.
No methods are exported.
Metrics for assigning songs to albums: Since Audio::DB processes file-by-file, it uses a number of parameters to assign tracks to albums. The quality of the results of Audio::DB will depend directly on the quality and integrity of the ID3 tags of your files.
Single tracks (those not belonging to a specific album) are distinguished by either undef or the label "single" in the album tag. In this way, all the single tracks for a given artist can be easily grouped together and fetched as a sort of pseudo-album. Of course, since you've ripped all of your MP3z from albums that you own, this shouldn't be a problem ;).
If two or more albums have the same name ("Greatest Hits"), Audio::DB checks whether the year they were released and the total number of tracks are the same. If so, it treats them as the same album, and all tracks are grouped together. This works most of the time, but obviously will fail sometimes. If you haven't assigned either of these tags, you'll have one less metric for distinguishing albums. If you have a better metric for distinguishing them, please let me know!
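A minimal sketch of the same-name check described above, assuming albums are compared on name, year, and total track count. The function name `same_album` and the hash keys are hypothetical and are not part of Audio::DB's interface; the sample years and track counts are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether two tag records refer to the same album.
# A missing year or total_tracks value is treated as "no information", so it
# cannot tell two albums apart (one less metric, as noted above).
sub same_album {
    my ($x, $y) = @_;
    return 0 unless defined $x->{album} && defined $y->{album};
    return 0 unless $x->{album} eq $y->{album};
    for my $key (qw(year total_tracks)) {
        next unless defined $x->{$key} && defined $y->{$key};
        return 0 if $x->{$key} ne $y->{$key};
    }
    return 1;
}

my %hits_1 = (album => 'Greatest Hits', year => 1981, total_tracks => 12);
my %hits_2 = (album => 'Greatest Hits', year => 1994, total_tracks => 17);
print same_album(\%hits_1, \%hits_2) ? "same\n" : "different\n";   # prints "different"
```

Note the failure mode: two untagged "Greatest Hits" albums have nothing to distinguish them and would be merged.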
 Title   : initialize
 Usage   : $mp3->initialize(-erase=>$erase);
 Function: initialize a new database
 Returns : true if initialization successful
 Args    : a set of named parameters
 Status  : Public
This method can be used to initialize an empty database. It takes the following named arguments:
-erase A boolean value. If true the database will be wiped clean if it already contains data.
A single true argument ($mp3->initialize(1)) is the same as initialize(-erase=>1). Future versions may support additional options for initialization and database construction (i.e. custom schemas).
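The two calling styles can be reconciled with a small argument normalizer. This is only a sketch of the idea: `parse_init_args` is a hypothetical name, and how Audio::DB actually unpacks its arguments is not shown in this document.

```perl
use strict;
use warnings;

# Hypothetical normalizer: accepts either a single value, taken as the
# erase flag, or a list of named parameters such as (-erase => 1).
sub parse_init_args {
    my @args = @_;
    return { erase => $args[0] ? 1 : 0 } if @args == 1;
    my %named = @args;
    return { erase => $named{-erase} ? 1 : 0 };
}

my $positional = parse_init_args(1);
my $named      = parse_init_args(-erase => 1);
print "equivalent\n" if $positional->{erase} == $named->{erase};
```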
 Title   : load_database
 Usage   : Creating a database by reading the tags from MP3 files:
             $stats = $mp3->load_database(-dirs    => ['/path/to/MP3s/'],
                                          -tmp     => '/tmp',
                                          -verbose => 100);

           Creating a database from a flat file of file information:
             $stats = $mp3->load_database(-files   => ['/path/to/files/'],
                                          -columns => '[columns in file]',
                                          -tmp     => '/tmp',
                                          -verbose => 100);

           Creating a database from the iTunes Music Library.xml file:
             $stats = $mp3->load_database(-library => '/path/to/iTunes\ Music\ Library.xml',
                                          -verbose => 100);
 Function: Parses mp3s and loads database
 Returns : Hash reference containing number of artists, albums, songs, and genres processed.
 Args    : array of top-level paths to mp3s; path to tmp directory, verbose flag
 Status  : Public
load_database is a broad wrapper method that provides simplified access to many of Audio::DB's less-public methods. load_database expects an array of top-level paths to directories containing MP3s to load. The second required parameter is the path to a suitable temporary directory such as /tmp. Audio::DB::Build will write temporary files to this directory prior to doing bulk loads into the database.
The optional -verbose flag causes a variety of messages to be displayed to STDERR during processing. The value of -verbose controls how frequently a message is displayed during song processing.
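As a rough illustration, a -verbose value of 100 can be read as "report once every 100 songs". That interpretation is an assumption, and `should_report` is a hypothetical helper, not a method of Audio::DB.

```perl
use strict;
use warnings;

# Hypothetical progress check: true on every $verbose-th song,
# and never when verbosity is off (0 or undef).
sub should_report {
    my ($count, $verbose) = @_;
    return 0 unless $verbose;
    return $count % $verbose == 0 ? 1 : 0;
}

for my $count (1 .. 250) {
    print STDERR "Processed $count songs...\n" if should_report($count, 100);
}
```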
Instead of reading the tags directly, Audio::DB can read a flat file or files containing the ID3 tag information. This is particularly useful for offline files that have been cataloged with utilities like MP3Rage. Furthermore, I've found that the MP3::Info module that Audio::DB::Build relies on isn't as robust at reading tags as some other applications. The paths to individual files, or to directories containing batches of these files, should be passed in as an anonymous array. A second parameter, -columns, should also be passed, giving the order of the fields in the file. Minimally, the file should contain album, artist, and title. The following column names should be adhered to:
 title        => song title
 artist       => performing artist
 album        => containing album
 track        => song track number
 total_tracks => total tracks on album
 duration     => [optional] formatted string of song duration
 seconds      => [optional] song duration in seconds
 bitrate      => [optional] integer. The bitrate of the song
 samplerate   => [optional] sample rate of encoding
 comment      => [optional] song comment
 filename     => [optional] duh.
 filesize     => [optional] file size in kb
 filepath     => [optional] absolute file path
 tagtypes     => [optional] ID3 tag types present
 fileformat   => [optional] file format
 channels     => [optional] number of channels
 year         => [optional] year of the album
 rating       => [optional] user rating
 playcount    => [optional] song play count
 playdate     => [optional] date song last played
 dateadded    => [optional] date song added to collection
 datemodified => [optional] date song information last modified
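To make the role of the column list concrete, here is a minimal sketch that maps one tab-delimited line onto named fields. The function `parse_flat_line` is hypothetical, and passing the column order as a plain Perl list is an assumption; the exact format Audio::DB expects for -columns is not spelled out here.

```perl
use strict;
use warnings;

# Hypothetical parser: map one tab-delimited line onto the column names,
# given in the order they appear in the file.
sub parse_flat_line {
    my ($line, @columns) = @_;
    chomp $line;
    my @values = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    my %song;
    @song{@columns} = @values;
    # Album, artist, and title are the minimum required fields.
    for my $required (qw(album artist title)) {
        return unless defined $song{$required} && length $song{$required};
    }
    return \%song;
}

my @columns = qw(artist album title track total_tracks year);
my $song = parse_flat_line("Miles Davis\tKind of Blue\tSo What\t1\t5\t1959\n", @columns);
print "$song->{artist} / $song->{album} / $song->{title}\n" if $song;
```

Lines missing any of the three required fields are rejected rather than loaded with blanks, which keeps incomplete records out of the album-matching step.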
 Title   : update_database
 Usage   : $mp3->update_database(-dirs    => ['/path/to/MP3s/'],
                                 -tmp     => '/tmp',
                                 -verbose => 100);

           $mp3->update_database(-files   => ['/path/to/files'],
                                 -columns => '[columns in file]',
                                 -tmp     => '/tmp',
                                 -verbose => 100);
 Function: Parses new mp3s and adds them to a pre-existing database
 Returns : true if successful
 Args    : array of top-level paths to new mp3s; path to tmp directory
 Status  : Public
update_database accepts the same parameters as load_database and is similar in function, except that it takes a path to new mp3s and adds them to a preexisting database. The artist and album of these new files will be checked against those already in the database to prevent the addition of duplicates. Duplicate songs, however, will be added. This is a feature, since you may want multiple copies of some tracks. It's up to you to remove duplicates in advance if you don't want them listed in your database. See the section below "Appending To A Preexisting Database" for more information on using this method.
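The update rule (reuse known artists and albums, always add songs) can be sketched with an in-memory hash standing in for the database. Everything here is hypothetical: `%db`, `artist_id_for`, and `add_song` are illustration names, and the real method works against the database itself.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical in-memory stand-in for an existing database.
my %db = (
    artists => { 'Nina Simone' => 1 },
    songs   => [ { title => 'Sinnerman', artist_id => 1 } ],
);

# Reuse the ID of a known artist, or assign one past the highest ID in use.
sub artist_id_for {
    my ($db, $name) = @_;
    my $artists = $db->{artists};
    return $artists->{$name} if exists $artists->{$name};
    my $id = max(0, values %$artists) + 1;
    $artists->{$name} = $id;
    return $id;
}

# Songs are always appended, even when an identical one is already present.
sub add_song {
    my ($db, $artist, $title) = @_;
    push @{ $db->{songs} }, { title => $title, artist_id => artist_id_for($db, $artist) };
    return scalar @{ $db->{songs} };
}

add_song(\%db, 'Nina Simone',  'Sinnerman');   # duplicate song is kept
add_song(\%db, 'Bill Withers', 'Use Me');      # new artist gets a new ID
printf "%d artists, %d songs\n", scalar keys %{ $db{artists} }, scalar @{ $db{songs} };
```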
Like load_database, update_database can read information directly from flat files instead of the MP3s themselves. See load_database for more information.
Audio::DB::Build contains several additional public methods that you are welcome to use if you'd like greater control over file parsing and database loading. In the normal course of things, you probably will not need to use these methods directly, but they are described here for completeness.
 Title   : cache_song
 Usage   : $mp3->cache_song(-full_path=>$full_path,-file=>$file);
           $mp3->cache_song(-song=>$song);
 Function: Parses new mp3s and adds them to a pre-existing database
 Returns : true if successful
 Args    : a pre-processed data hash arising from one of the Parse modules
 Status  : Public
cache_song accepts the filename and full path to a file to be processed. It makes separate calls to MP3::Info to extract ID3 tag info. Once extracted, song information is checked against the database to determine whether the artist or album has been seen before, adding the song to that artist or album or inserting new artists / albums into the internal temporary data structure as required. Finally, the song is added to this structure.
Alternatively, cache_song can be passed a single tab-delimited line of data that holds the relevant information. See load_database for more information on using this interface.
 Title   : get_couldnt_read
 Usage   : $mp3->get_couldnt_read()
 Function: Fetch a list of files that could not be read
 Returns : Array reference of files whose tags could not be read
 Args    : none
 Status  : Public

 Title   : get_stats
 Usage   : $mp3->get_stats;
 Function: Get some info on files loaded
 Returns : Hash reference containing the number of artists, albums,
           genres, and songs loaded into the database.
 Args    : none
 Status  : Public
There are a number of private methods, described here for my own sanity. These methods are not part of the public interface.
 Title   : _establish_counters
 Usage   : $mp3->_establish_counters
 Function: Used to determine the highest values for keys before
           adding new data to the database.
 Returns : Hash reference containing the number of artists, albums,
           genres, and songs loaded into the database.
 Args    : none
 Status  : Private

 Title   : get_tags
 Usage   : $mp3->get_tags(@args);
 Function: Fetch and processes raw ID3 tags from files
 Returns : Hash reference of parsed tag data
 Status  : Private

 _check_artist_mem
 _check_album_mem
 _check_genre_mem
 _check_artist_db
 _check_album_db
 _check_genre_db

 Title   : _check_*_mem or _check_*_db
 Usage   : $mp3->_check_album_mem($artist);
 Function: Checks for the existence of the current tag
 Returns : ID of the appropriate album, artist, genre, if it already exists
 Args    : artist, album, or genre, as appropriate
 Status  : Private
The _check_* methods check for the pre-existence of the current artist, album, or genre for the file currently being examined. The two variations, *_mem and *_db, control whether this look up is done against the internal data structure in memory or against a pre-existing database.
_check_album_* is necessarily more complex. It attempts to assign songs to albums based on both the year and total number of tracks. See "Caveats" above for more information.
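The difference between the two lookup variants can be sketched as follows. This is a simplified illustration, not the module's code: `check_artist_mem` and `check_artist_db` are hypothetical stand-ins, and a plain hash (`fake_db`) takes the place of a real database query.

```perl
use strict;
use warnings;

my $self = {
    lookups => { artists => { 'Portishead'     => 7 } },   # in-memory structure
    fake_db => { artists => { 'Massive Attack' => 3 } },   # stand-in for a real database
};

# Look the artist up in the structure built during the current run.
sub check_artist_mem {
    my ($self, $artist) = @_;
    return $self->{lookups}{artists}{$artist};
}

# Look the artist up in the (simulated) pre-existing database.
sub check_artist_db {
    my ($self, $artist) = @_;
    return $self->{fake_db}{artists}{$artist};
}

for my $artist ('Portishead', 'Massive Attack') {
    my $mem = check_artist_mem($self, $artist);
    my $db  = check_artist_db($self, $artist);
    printf "%s: mem=%s db=%s\n", $artist, $mem // 'none', $db // 'none';
}
```

Both return the existing ID or undef, so a caller can treat undef as "not seen yet, allocate a new ID".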
 Title   : _dump_data_structures
 Usage   : _dump_data_structures
 Function: Wrapper around all the _dump_* subroutines
 Returns : true if successful
 Args    : none
 Status  : Private

 _dump_artists
 _dump_albums
 _dump_songs
 _dump_genres

 Title   : _dump_*
 Usage   : _dump_artists()
 Function: Create temp files for loading into the database
 Returns : true if successful
 Args    : none
 Status  : Private
These methods dump out the appropriate data from the internal data structure into the temporary directory path. Some dump multiple tables:
 _dump_artists : artists and artist_genres tables
 _dump_albums  : album and album_artists tables
 _dump_songs   : songs table
 _dump_genres  : genres table
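A minimal sketch of what such a dump might look like for the genres table, assuming one tab-delimited row per record. The function `dump_genres`, the file name `genres.txt`, and the row layout are all hypothetical; the module's actual file names and column order are not documented here.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

my %genres = (1 => { genre => 'Jazz' }, 2 => { genre => 'Trip-Hop' });

# Hypothetical dump routine: one tab-delimited row per genre, sorted by ID.
sub dump_genres {
    my ($genres, $tmp_dir) = @_;
    my $path = File::Spec->catfile($tmp_dir, 'genres.txt');
    open my $fh, '>', $path or die "Cannot write $path: $!";
    for my $id (sort { $a <=> $b } keys %$genres) {
        print {$fh} join("\t", $id, $genres->{$id}{genre}), "\n";
    }
    close $fh or die "Cannot close $path: $!";
    return $path;
}

my $tmp  = tempdir(CLEANUP => 1);   # removed automatically at exit
my $file = dump_genres(\%genres, $tmp);
print "Wrote $file\n";
```

Tab-delimited rows are the default input format for MySQL's LOAD DATA statement, which makes them convenient for bulk loads; whether Audio::DB::Build uses that statement is an assumption.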
 Title   : _load_db
 Usage   : _load_db()
 Function: Loads data from temporary tables into the database
 Returns : true if successful
 Args    : none
 Status  : Private

 Title   : _stuff_album
 Usage   : _stuff_album()
 Function: Stuffs the current album into the internal data structure
 Returns : true if successful
 Args    : none
 Status  : Private
Audio::DB::Build builds a large internal data structure as it reads each file. The data structure is:
 Lookups - For quick lookups to see if an artist, album or genre has been encountered

   $self->{lookups}->{artists}->{$artist} = $artist_id;
   $self->{lookups}->{albums}->{$album}   = $album_id;
   $self->{lookups}->{songs}->{$song}     = $song_id;
   $self->{lookups}->{genres}->{$genre}   = $genre_id;

 Counters - for tracking the number of artists, albums, songs, and genres

   $self->{counters}->{artists} = $total;
   $self->{counters}->{albums}  = $total;
   $self->{counters}->{songs}   = $total;
   $self->{counters}->{genres}  = $total;

   $self->{couldnt_read} = [ files that could not be read ];
The main data structure of artists, albums, songs, and genres. I know, I know, it's partially denormalized.
   $self->{artists}->{$artist_id} = {
       artist => artist name,
       genres => { $genre_ids => total },
       albums => { $album => $album_id }
   };

   $self->{albums}->{$album_id} = {
       album => $album,
       # For tracking multiple genres per album
       genres => { $genre_ids => ++ },
       # For tracking multiple artists per album (compilation CDs)
       contributing_artists => { $artist_id => ++ },
       # Internal measure for distinguishing same-named albums
       total_tracks => total number of tracks,
       year         => year released
   };

   $self->{songs}->{$song_id} = {
       title        => song title,
       artist_id    => artist_id,
       album_id     => album_id,
       genre_id     => genre_id,
       track        => track number,
       total_tracks => total tracks on album,
       duration     => formatted duration,
       seconds      => raw seconds,
       bitrate      => song bitrate,
       samplerate   => sample rate,
       comment      => id3 comment,
       filename     => filename,
       filesize     => filesize,
       filepath     => filepath,
       tagtypes     => types of ID3 tags found,
       format       => MPEG layer,
       channels     => stereo / mono / joint,
       song_year    => year (also with album),
       rating       => user rating,
       playcount    => play count
   }

   $self->{genres}->{$genre_id} = { genre => $genre }
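To show how the lookups, counters, and main tables fit together, here is a simplified sketch that stores two songs. `id_for` and `stuff_song` are hypothetical helpers; only a few fields are filled in, and albums are keyed by name alone, which skips the year and track-count checks the module applies.

```perl
use strict;
use warnings;

my $self = { lookups => {}, counters => {}, artists => {}, albums => {}, songs => {}, genres => {} };

# Hypothetical helper: return the known ID for a name, or allocate the next one.
sub id_for {
    my ($self, $type, $name) = @_;
    return $self->{lookups}{$type}{$name} //= ++$self->{counters}{$type};
}

# Hypothetical helper: record one song and update every table it touches.
sub stuff_song {
    my ($self, %tag) = @_;
    my $artist_id = id_for($self, 'artists', $tag{artist});
    my $album_id  = id_for($self, 'albums',  $tag{album});
    my $genre_id  = id_for($self, 'genres',  $tag{genre});
    my $song_id   = ++$self->{counters}{songs};

    $self->{artists}{$artist_id}{artist} = $tag{artist};
    $self->{artists}{$artist_id}{genres}{$genre_id}++;
    $self->{artists}{$artist_id}{albums}{ $tag{album} } = $album_id;

    $self->{albums}{$album_id}{album} = $tag{album};
    $self->{albums}{$album_id}{genres}{$genre_id}++;
    $self->{albums}{$album_id}{contributing_artists}{$artist_id}++;

    $self->{genres}{$genre_id} = { genre => $tag{genre} };
    $self->{songs}{$song_id}   = {
        title     => $tag{title},
        artist_id => $artist_id,
        album_id  => $album_id,
        genre_id  => $genre_id,
        track     => $tag{track},
    };
    return $song_id;
}

stuff_song($self, artist => 'Portishead', album => 'Dummy', genre => 'Trip-Hop',
           title => 'Mysterons', track => 1);
stuff_song($self, artist => 'Portishead', album => 'Dummy', genre => 'Trip-Hop',
           title => 'Sour Times', track => 2);

printf "%d artist(s), %d album(s), %d song(s)\n",
    map { $self->{counters}{$_} } qw(artists albums songs);
```

The `++` entries in the structure above correspond to the per-album and per-artist tallies incremented here, which is what lets a compilation album accumulate several contributing artists.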
This module implements a fairly complex internal data structure, which in itself rests upon lots of things going right, like reading ID3 tags, tag naming conventions, etc. On top of that, I wrote this in a Starbucks full of screaming children.
Need a reasonable way of dealing with tags that can't be read
Lots of error checking needs to be added. Support for custom data schemas, including new data types like more extensive artist info, paths to images, etc.
Keep track of stats for updates. Fix update - needs to use mysql (these are the _check_artist_db routines that all need to be implemented)
Robusticize new for different adaptor types
Add in full MP4 support
Make the data dumps rely on the schema in the module
Put the schema into its own module
Copyright 2002-2004, Todd W. Harris <harris@cshl.org>.
This module is distributed under the same terms as Perl itself. Feel free to use, modify and redistribute it as long as you retain the correct attribution.
Chris Nandor <pudge@pudge.net> wrote MP3::Info, the module responsible for reading MP3 tags. Without it, this module would be a best-selling pulp romance novel behind the gum at the grocery store checkout. Chris has been really helpful with issues that arose with various MP3 tags from different taggers. Kudos, dude!
Lincoln (Dr. Leichtenstein) Stein <lstein@cshl.org> wrote much of the original adaptor code as part of the Bio::DB::GFF module. Much of that code is incorporated here, albeit in a pared-down form. The code for reading ID3 tags only from files with appropriate MIME types is borrowed from his Apache::MP3 module. This was much more elegant than my lame solution of checking for .mp3! Lincoln tolerates having me in his lab, too, even though I use a Mac.
Audio::DB::Adaptor::dbi::mysql, Audio::DB::Util::Reports, Apache::MP3, Apache::Audio::DB, MP3::Info