Thoth’s Suggested Tags

Thoth’s Suggested Tags is a WordPress plugin that recommends tags by scanning a post and displaying recurring words and phrases as a tag cloud.

Author: Jimmy O'Higgins (profile at wordpress.org)
WordPress version required: 3.0
WordPress version tested: 3.0.5
Plugin version: 1.3
Added to WordPress repository: 08-07-2010
Last updated: 26-07-2010
Warning! This plugin has not been updated in over 2 years. It may no longer be maintained or supported and may have compatibility issues when used with more recent versions of WordPress.
Rating, %: 0
Rated by: 0
Plugin URI: http://wiki.github.com/edlab/Thoth-Suggested-...
Total downloads: 2,133
Active installs: 30+

Thoth adds a widget to the "New Post" authoring page that recommends tags for your post based on its content.

Thoth scans the text for candidate tags and assigns each one a "tag strength", an integer that represents how appropriate the tag is for the post. This value is determined by the number of words in the tag, its frequency in the post, and its count in the WordPress database (the number of times it has been used as a tag on other posts).

How It Works

Every time the user saves a draft or updates a post, Thoth:

  • Splits the post content into chunks delimited by stop-characters.

  • Uses a simple k-mer counting algorithm to record every possible phrase in the content and its frequency.

  • Post-processes the phrase list and displays the final list as a tag cloud. For more information, consult the "Features" section.
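
The first two steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the plugin's own PHP code: the stop-character set, the three-word limit on phrase length, and the names split_chunks and count_phrases are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-character set; the plugin's actual list is not given here.
STOP_CHARS = r'[.,;:!?()\[\]"\n]+'
MAX_WORDS = 3  # assumed upper bound on phrase length


def split_chunks(text):
    """Split text into chunks at stop-characters, dropping empty chunks."""
    return [c.strip() for c in re.split(STOP_CHARS, text) if c.strip()]


def count_phrases(text, max_words=MAX_WORDS):
    """Count every run of 1..max_words consecutive words within each chunk."""
    counts = Counter()
    for chunk in split_chunks(text):
        words = chunk.split()
        for n in range(1, max_words + 1):
            # Slide a window of n words across the chunk.
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts
```

Because phrases are counted inside each chunk only, a phrase never spans a stop-character: in "tag cloud plugin. tag cloud widget", the phrase "tag cloud" is counted twice, while "plugin tag" is never recorded.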

Features

  • Tag strength ($strength in the code) is an integer representing the likelihood of the tag being appropriate to the post. A tag's strength is initially determined by its frequency in the post.

  • Stop words - Phrases that begin or end with a word in the stop-word list are filtered out.

  • Tag length - Tag strength is multiplied by the number of words in the tag, up to a maximum of 3. Longer recurring phrases are therefore ranked higher than shorter ones.

  • Pluralization - For every potential single-word tag, Thoth adds a plural suffix 's' and searches for matches in the potential tag list. If a match is found, the tag strength of the singular is transferred to the plural version (e.g. "download" becomes "downloads"). If a match is not found, the singular is used.

  • Capitalization - Capitalization is preserved only for words or phrases that are capitalized in every instance in which they appear in the post (i.e. they are proper nouns or noun phrases). Otherwise, capitalization is removed and the tag strengths of the capitalized and uncapitalized forms are combined.

  • Existing tags - Thoth also retrieves all the tags used in your blog and searches for instances of them in the content of the post. In the case of a match, the tag strength is multiplied by 2 and incremented by the number of times that tag has been used in your blog. This means that if your blog has a unifying theme, certain tags are likely to be reused and will enable Thoth to make better suggestions.
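
To make the arithmetic concrete, the scoring rules above might be combined along these lines. This is a hedged Python sketch rather than the plugin's PHP implementation: the function name score_tags, the order in which the rules are applied, and the choice to add the singular's strength to the plural's (instead of replacing it) are assumptions. Capitalization handling is left out, so phrases are assumed to be lowercase already.

```python
def score_tags(phrase_counts, stop_words, blog_tag_counts):
    """Turn raw phrase frequencies into tag strengths.

    phrase_counts:   phrase -> frequency in the post (lowercase phrases)
    stop_words:      set of words that may not begin or end a tag
    blog_tag_counts: existing tag -> number of times it is used on the blog
    """
    strengths = {}
    for phrase, freq in phrase_counts.items():
        words = phrase.split()
        if not words:
            continue
        # Stop words: drop phrases that begin or end with a stop word.
        if words[0] in stop_words or words[-1] in stop_words:
            continue
        # Tag length: multiply frequency by word count, capped at 3.
        strengths[phrase] = freq * min(len(words), 3)

    # Pluralization: if a single word's 's' plural is also a candidate,
    # move the singular's strength onto the plural (assumed to be additive).
    for phrase in list(strengths):
        plural = phrase + "s"
        if " " not in phrase and plural in strengths:
            strengths[plural] += strengths.pop(phrase)

    # Existing tags: double the strength, then add the blog-wide usage count.
    for tag, used in blog_tag_counts.items():
        if tag in strengths:
            strengths[tag] = strengths[tag] * 2 + used
    return strengths
```

For example, with phrase counts {"download": 2, "downloads": 1, "tag cloud": 2, "the tag": 3}, the stop word "the", and an existing blog tag "tag cloud" used 5 times, this sketch drops "the tag", folds "download" into "downloads" (1 + 2 = 3), and scores "tag cloud" as (2 × 2) × 2 + 5 = 13.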