This version adds support for the OBO file format for the Gene Ontology in addition to XML. In the future OBO is likely to be the recommended file type to use for ErmineJ.
We are also taking this opportunity to announce the availability of R support for ErmineJ, with the ErmineR library.
Changes in this version affect the behaviour of the software in one particular gene score situation: when the smallest value marks the top gene. This contrasts with the typical inputs, which are p-values (which are -log10 transformed, so that large values rank highest) or fold-changes and other scores that rank genes in decreasing order (“bigger is better”). Specifically:
- Fixes a bug in the precision-recall method that caused incorrect results when using such scores.
- A related fix was made to the multifunctionality bias assessments when using such scores.
- Fixes a bug in the display of score distributions in the gene set details window (GUI) when using such scores.
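The two ranking conventions described above can be illustrated with a minimal sketch. The function name `rank_genes` and its parameters are hypothetical and chosen for illustration only; this is not ErmineJ's implementation.

```python
import math

def rank_genes(scores, log_transform=False, bigger_is_better=True):
    """Return gene names ordered from best to worst (illustrative only)."""
    values = {}
    for gene, score in scores.items():
        # p-values are typically -log10 transformed so that large values rank highest
        values[gene] = -math.log10(score) if log_transform else score
    return sorted(values, key=values.get, reverse=bigger_is_better)

scores = {"geneA": 0.001, "geneB": 0.5, "geneC": 0.02}

# Typical case: p-values, -log10 transformed, "bigger is better"
by_p = rank_genes(scores, log_transform=True, bigger_is_better=True)

# The case affected by this release: untransformed scores where the
# smallest value marks the top gene
by_small = rank_genes(scores, log_transform=False, bigger_is_better=False)
```

Both calls should produce the same gene order here, since the smallest raw value corresponds to the largest -log10 value.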
This update mostly affects the command line interface (CLI), but has some impact on the GUI. This includes removing some rarely-used methods, changing some defaults, and renaming some options. This may break (or change the output of) existing scripts using the CLI, which is why we have bumped the version number to 3.1.
- The CLI now has an option for limiting analysis to GO aspects such as biological process (as already existed for the GUI)
- The CLI option to save the configuration to a file (-S) now works as intended
- The CLI has an option to set the random seed (-seed) (used for resampling-based approaches)
- CLI and GUI options for turning off multifunctionality assessment are removed (now always true).
- CLI option to set “filter non-specific” is removed (always true)
- CLI multiple test correction methods are renamed “FDR” and “FWE”
- CLI: The “Westfall-Young” multiple test correction method is no longer supported
- Support for alternative annotation file formats has been dropped.
- The options that are no longer supported in the CLI are also ignored when present in configuration files
- Default class size limits are now 20 and 200 (rather than 10 and 100).
- Fix for a bug in retrieving annotation files from Gemma via the GUI.
- Various fixes to the documentation.
- ErmineJ now requires Java 8
- Updated external URLs for GO terms
- Multi-gene-mapping microarray probes listed in Gemma platform annotation files are now ignored by default. This behaviour can be modified through a configuration.
- Changes for compatibility with the new Gemma RESTful API
- The Mac OS distribution is now a DMG file.
This release fixes several problems reported by users as well as including some documentation updates.
- The “generic” distribution of the software had an error in the configuration causing it to fail to execute.
- The logging configuration was causing some problems for command line use.
- In the details screen, if your “raw data” matrix is missing data matching the rows in your score file, saving the image or data would fail (bug 3898)
- The data matrix reader was skipping the first two columns of data relative to the documented meaning of the “first data column” (bug 3922)
- We reduced use of the term “probe” in favor of the more platform-neutral “element”, to refer to the potential many-to-one relationship between the data and genes.
- There have been some additions and clarifications to the on-line documentation.
This release fixes several issues:
- Quicklists did not correctly deal with genes in your hitlist that are unannotated (are not members of any gene groups).
- Error handling for the “multifunctionality diagnostics” has been improved.
- Problems with selecting score files and poor interaction with the file validation in the Analysis Wizard (bug 3874)
- The ermineJ log file is now stored in your ermineJ.data directory by default, and the location is configurable by editing your ermineJ.properties file. This will help reduce and track down issues some users had with missing log files.
Version 3.0 – (June 2013) Many major and minor new features, numerous bug fixes, and major code rewrites.
Note that this version of ErmineJ is not completely backwards-compatible with previous versions. Your old results files may not load properly, and old settings (erminej.properties) may not be read in completely. Also note that version 3.x files will not work with version 2.x.
- Application package for Mac OSX
- New feature: Overhaul of management of user-defined (non-GO) gene groups. This simplifies the use of schemes such as KEGG. See the help pages for more information. We removed some rarely used (and confusing) features such as the ability to redefine GO groups.
- New feature: Introduction of “gene multifunctionality” features, a unique new feature of ErmineJ. See the documentation for details.
- New feature: Introduction of a new gene set scoring method based on precision-recall.
- New feature: Allow the creation of “projects” that store information on the annotations that were used as well as the data and analysis results (in addition to the existing “save/load results”). This means it is now possible to switch the annotation file without restarting ErmineJ.
- New feature: ‘Quick list’ makes it easy to test over-representation in a list of genes. See the documentation for details.
- The search utilities have been fixed and streamlined.
- Algorithm clarification: Genes that are not in any group are removed from analysis. This should help ameliorate the effects of multifunctionality, since genes that are completely unannotated are less likely to be “discovered” in an assay, and don’t contribute to the analysis in any positive way. This affects the computation of null distributions.
- Tree view enhancements: You can now hide nodes in the GO hierarchy based on significance, making it easier to browse your results. See the documentation for details. You can also hide terms that are empty.
- Algorithm improvement: In previous versions of ErmineJ, groups that were completely redundant with others (containing the same genes) were identified but not handled particularly well. Now redundant groups are exhaustively identified. While these groups are included in the analysis (to avoid confusion and other problems), they are considered as one in multiple testing correction.
- Table view enhancements: You can hide rows in the table to remove clutter due to empty gene sets.
- Improvement: A major cleanup of the way ErmineJ handles gene annotations. A number of lurking problems were flushed out. Perhaps the most serious was that it was not clear whether redundant gene sets were being excluded from analyses (as they should be). It is now much easier to find out how many gene sets there are, how many have genes, and how many are redundant.
- Output file format changes: The output is more informative so settings which were not used are not listed. The output also now includes additional columns on multifunctionality.
- Bug fix: Error handling is improved, so mysterious stoppages of analyses should be easier to diagnose.
- Bug fix: Command line tool didn’t correctly deal with correlation method when no score file was supplied.
- Bug fix: Command line tool assumed gene scores were p-value like when validating the “threshold” option.
- Bug fix: Replacing gene URL patterns in Gene Set Details was broken.
- Bug fix: Switching gene score files was probably broken.
- Bug fix: “Obsolete” GO terms no longer displayed in tree or table.
- Bug fix: Affymetrix CSV files work again (NetAffx format changed)
- Bug fix: Webstart startup dialog was not working consistently. This was fixed for good by making the startup dialog part of the main application window.
- Bug fix: Problems with hangs and (related to this) missing log files should be fixed for Windows users.
- Bug fix: MacOSX and Linux graphical interface problems have been ameliorated.
- Numerous documentation updates. Added documentation on how to deal with “down-regulated” vs. “up-regulated” genes (FAQ). There are quite a few updates to the manual.
- (Behind the scenes) Large parts of ErmineJ’s code have been rewritten to improve future maintainability and modularity. A wide range of other minor bugs have been fixed.
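The handling of redundant groups mentioned in the list above (groups containing the same genes are counted as one in multiple testing correction) can be sketched as follows. This is an illustration under simple assumptions, not ErmineJ's code: the function name `group_redundant` and the toy gene set names are hypothetical.

```python
from collections import defaultdict

def group_redundant(gene_sets):
    """Group gene set names whose gene membership is identical.

    gene_sets maps a gene set name to an iterable of gene identifiers.
    Returns a sorted list of name lists, one list per distinct membership.
    """
    by_members = defaultdict(list)
    for name, genes in gene_sets.items():
        # frozenset ignores gene order and duplicates within a set
        by_members[frozenset(genes)].append(name)
    return sorted(sorted(names) for names in by_members.values())

gene_sets = {
    "setA": ["g1", "g2"],
    "setB": ["g2", "g1"],  # same genes as setA, so redundant with it
    "setC": ["g3"],
}
groups = group_redundant(gene_sets)

# All three sets stay in the analysis, but only the distinct memberships
# count as separate tests when correcting for multiple testing.
effective_tests = len(groups)
```

In this toy example there are three gene sets but only two effective tests.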
Version 2.1.22 – (November 2010) Changes include:
- Fix CLI bug that caused gene score file to be required even when using the correlation method.
Version 2.1.21 – (September 2010) Changes include:
- Update our GO parser to correctly handle the revised GO hierarchy structure. This only affects the view of GO in the “Tree view”. Thanks to Makio Tamura for pointing out the problem.
Version 2.1.20 – (May 2010) Changes include:
- Another bug fix for ORA to correct the way duplicate genes are handled.
- Custom creation of gene sets through the GUI accepts lists of probes as well as genes.
- The analysis wizard allows you to preview the gene scores you are loading.
Version 2.1.19 – (April 2010) Changes include:
- Change in the way ORA p-values are computed. Prior versions computed the upper tail of the distribution exclusive of the actual number of represented genes in the group. Now probabilities are computed inclusive of this value (“at least this many” instead of “more than this many”).
- The use of configuration files has been changed when running the command line version of ermineJ. This affects users of the command line invoking the classScoreCMD main class. Instead of always reading the ermineJ.properties file from the user’s home directory, the configuration file is ignored unless it is specified or the user supplies the -G option to start the GUI. Please see the User manual for the CLI for details.
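The difference between the exclusive and inclusive upper tail in the ORA change above can be made concrete with a small sketch. It assumes a hypergeometric model for over-representation; these notes do not state which distribution ErmineJ uses, and the function names and the numbers in the example are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(exactly k group genes among n selected genes).

    N: total genes, K: genes in the group, n: genes selected (the hit list).
    """
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def upper_tail(k, N, K, n, inclusive=True):
    """Upper-tail probability of observing k group genes in the hit list.

    inclusive=True  -> P(X >= k), "at least this many" (new behaviour)
    inclusive=False -> P(X >  k), "more than this many" (old behaviour)
    """
    start = k if inclusive else k + 1
    stop = min(K, n)
    return sum(hypergeom_pmf(i, N, K, n) for i in range(start, stop + 1))

# Illustrative numbers only: 100 genes, a group of 10, 20 selected, 4 overlap
p_inclusive = upper_tail(4, 100, 10, 20, inclusive=True)
p_exclusive = upper_tail(4, 100, 10, 20, inclusive=False)
```

The inclusive p-value is larger than the exclusive one by exactly the probability of the observed count, so results computed with 2.1.19 or later are slightly more conservative than those from earlier versions.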
Version 2.1.18 – (July 2009) Changes include:
- Fix broken links on GO file documentation page.
- Update our mailing list address. Old subscribers have been subscribed to the new list
- Fix a bug that kept the -g option from working on the command line.
Version 2.1.17 – (October 2008) Changes include:
- Fix problem with gene set details page not working.
- ‘About’ box was broken
- Affymetrix annotation files for some organisms failed due to missing gene symbol information.
Version 2.1.16 – (September 2008) Changes include:
- Using the ‘ROC’ method with ‘bigger is better’ gave incorrect results.
- GSR scores could be unstable from run to run, depending on the dataset and the parameters set.
- Fix ‘generic’ ermineJ.bat script to use java.exe instead of javaw.exe, and fix stray “\nul”.
- Fix broken link to annotation files on the help pages.
- Help works for webstart.
- Behind the scenes: finally upgraded to maven 2.
Version 2.1.15 – (July 2007) Changes include:
- An error was made in the two previous distributions: an incorrect jar file was included in the Windows installer, so some bug fixes were not correctly included. In particular, the way the GO classes are read in is now consistent across all distributions.
- Output file improvement: When ‘Include all genes in output’ is checked, the gene names are sorted.
- Some bugs in the Command Line Interface were fixed, which caused errors when using some option combinations.
- A bug in the software API which rendered it pretty much unusable was fixed.
- There were problems with some command line arguments not being recognized correctly.
- Minor enhancement: when the gene symbols are included in analysis output files, they are sorted by name.
Version 2.1.14 – (July 2007) Changes include:
- Minor bug fix: Probes lacking any annotations are handled a little differently, avoiding an annoying error message. This does have a minor effect on the analysis.
- A variety of cosmetic and GUI bugs have been fixed, primarily affecting Linux Fedora Core users.
Version 2.1.13 – () Changes include:
- Bug fix: If gene scores were negative, the “use best score” option might not work correctly. Thanks to Hubert Rehrauer for the fix.
- Bug fix: Some parent terms were not being included when inferring annotations. Thanks to Salvatore Micciche for reporting this.
- Command line interface overhauled. This should fix problems some users had with certain option combinations.
- Enhancement: repeated occurrences of the same identifier in the gene score file results in a warning.
- Minor enhancement: GO categories can be viewed even when score file doesn’t match annotation file.
- Various minor cosmetic fixes.
Version 2.1.12 – (August 8 2006) Changes include:
- Added internationalization support for input files (addresses bugs 302, 321).
- Output file format had misaligned columns. Duplicate column names were fixed and an unused column was removed (bug 322).
- Affymetrix annotation files properly parsed in the command line interface (bug 323).
Version 2.1.11 – (July 12 2006) Internal release
Version 2.1.10 – (May 24 2006) Changes include:
- ErmineJ is now licensed under the Apache 2.0 license.
- (Input files) Annotation files from Agilent are supported directly.
- (Dev) Code has been restructured. This will only affect developers using ermineJ code.
- Fixed bug: If you tried to run a correlation analysis without a gene score file set (which is supposed to be irrelevant), the analysis would fail.
- Fixed bug: “Null pointer exceptions” when custom gene score files are missing.
- Fixed bug: It was still possible to get errors when loading raw data files for analysis; this has finally been fixed.
- (Docs) Documentation of custom gene sets fixed to reflect correct location of ermineJ.data (in your home directory, not the ermineJ installation directory).
- The command to show user-defined gene sets can sometimes be “confused”. The work-around is to just hit “Ctrl-U” twice.
- The id and name of user-defined gene sets cannot be changed. This limitation should be restricted to GO gene sets (bug 224). The work-around is to edit the file and then reload the gene sets.
- (Minor GUI) In tables shown in wizards, the table columns are resizable and sortable, but this is not indicated by the cursor. (bug 212)
Version 2.1.9 – (Feb 8 2006) Changes include:
- Restricting analysis to specific GO aspects wasn’t working for ROC analysis (fixes bug 285).
- Loading an analysis from disk failed if the raw data file was not set (bug 291).
- You can now save the gene set information in the details view even without loading the raw data (bug 290).
- Small p-values were rounded to zero in the output file (bug 289).
- The path to the annotation file is stored in the output file. This currently isn’t used by the software but is good for user records.
- Command line interface: The -F (--format) option (which took an argument) has been changed to -A. Using the -A option indicates that the annotation file is in “Affymetrix” format. Fixes bug 272.
- The format of the Annotation files has been made more flexible. The GO term field can be delimited with characters other than ‘|’ such as commas or spaces (issue 286).
- The format of the annotation files has been documented (see this page) (issue 287).
- Minor GUI tweaks to improve portability (bug 269).
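The more flexible GO term delimiters mentioned in the list above (‘|’, commas or spaces) could be handled along these lines. This is a minimal sketch, not ErmineJ's parser; the function name `split_go_terms` is hypothetical, and the exact set of accepted delimiters is an assumption based on the examples given.

```python
import re

# Assumed delimiters: pipe, comma, or whitespace, in any run or combination
_DELIMITERS = re.compile(r"[|,\s]+")

def split_go_terms(field):
    """Split the GO term field of an annotation file line into term IDs."""
    return [term for term in _DELIMITERS.split(field.strip()) if term]

pipe_terms = split_go_terms("GO:0008150|GO:0003674")
comma_terms = split_go_terms("GO:0008150, GO:0003674")
space_terms = split_go_terms("GO:0008150 GO:0003674")
```

All three inputs should yield the same two term identifiers, and an empty field yields an empty list.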
Version 2.1.8 – (December 20 2005) Changes include:
- Bug fixed that caused “null” error in viewing gene set data under some conditions.
Version 2.1.7 – (November 9 2005) Changes include:
- Command line option to output all genes in each gene set added (“-j”).
- Command line error output is more helpful.
- Result file contains parameters when using command line.
- Bug in handling custom annotation files fixed.
- Correlation score performance improved (again).
Version 2.1.6 – September 2, 2005. Changes include:
- User-defined gene sets were not being used for analysis. There were also related problems with editing and viewing gene sets once created (Fixes for bugs 256, 257 and 258).
Version 2.1.5 – August 12, 2005. Changes include:
- Fixed a bug that caused the “cancel” function to fail.
- Improved performance when reading raw matrix files.
- Correlation analysis is dramatically faster again thanks to a bug fix. In addition, a number of optimizations have been implemented to further improve its performance.
- Fixed a bug that caused poor performance of the tree view after analysis.
- Fixed a bug that left grouping separators in numbers in the gene details matrix output files.
Version 2.1.4. Changes include:
- Fixed bug which caused problems after loading two or more gene score files that had only partial gene lists with respect to the annotation file (bug 230).
- Fixed bug which caused error in reading preferences for the “details” view. The file choosers for saving data have also been improved to suggest a file name.
- (File format) The first line of a loaded gene score file is now ignored (e.g., it is treated as a header), as documented in the help. This was causing spurious warnings about mismatched names (bug 229). The warnings given are also more informative.
- (Performance) Bug which caused unnecessary slowdown of correlation analysis fixed. It’s still slow, but usable.
- (Performance) Sorting of details view when sorting by gene symbol is now as fast as for the other columns (bug 195).
- (Documentation) New FAQs and more guidance on setting parameters and choosing methods.
Version 2.1.3. Changes include:
- A number of bugs in the display of user-modified or user-defined gene sets have been fixed.
- When modifying a GO gene set, you are no longer allowed to change the name or description. You can still change the members of the gene set.
- New options in gene set popup menus to manage custom gene sets.
- Add Gene Set menu item to reload the user-defined gene sets from disk (keyboard Ctrl-E).
- Both the data file used and the gene score file can be changed from menu items as well as from the analysis wizard.
- Additional bug and feature fixes (including issues 170, 208, 214, 215, 218, 219, 220, 222, 223).
Version 2.1.2. Changes include:
- New Feature: Receiver operator characteristic (ROC) method now available from all interfaces.
- Fixed Issue 210: Failure to save results files for some situations.
- Fixed Issue 204: default for “Larger scores are better” should be ‘false’.
- After an analysis, the results are checked for “reasonableness”. If they are not, the user is advised to check the validity of the analysis settings. Unreasonable results would be p-values that are all zero, for example.
- Inconsistent behavior of “next” and “back” buttons in Analysis Wizard fixed.
- Errors during analysis now result in less silent death.
- Some other minor issues resolved.
Version 2.1.1. Fixes bugs which caused incorrect parsing of gene descriptions and resulted in multiple custom gene set directories. All users should upgrade. (issues 198,199)
Version 2.1. Changes include:
- Gene sets and their results are now available in a new tree diagram.
- URLs for gene links in the details view are now modifiable.
- You can change the data set you are viewing in the details view.
- Result sets (‘runs’) can be renamed by using the context menu in the table view.
- A feature has been added to find gene sets by genes, in addition to by gene set name.
- The saved output files can optionally contain the names of the genes in each gene set.
- The GO aspect(s) to be analyzed can be controlled.
- New tooltips have been added, to display the aspect and definition of a gene set
- GO and gene annotation files can be ZIP or GZIP compressed.
- Logfile is saved for all interactive runs of ermineJ, and is viewable from the “help” menu.
- For programmers, a simplified API is available to help integrate ermineJ functionality into your code.
- Further improvements and updates to the documentation.
- Program startup time has been improved, fixing slowdown introduced in 2.0.4
- Assorted minor bug fixes and improvements, too many to list.
(not released): Version 2.0.5. Annotation files can be Affymetrix CSV files. Documentation has been improved. A number of other minor fixes and improvements, primarily affecting the GUI.
April 27 2005: Version 2.0.4 released. Annotation files no longer need to contain lists of all parent terms; these are added by the software. This means initializing the software is a little slower but makes it much easier to use third-party annotation files. A number of other minor issues and bugs were also fixed.
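As an illustration of what adding parent terms involves, the sketch below expands a gene's direct annotations with every ancestor reachable through a child-to-parents map. It is an assumption-laden example, not ErmineJ's code: the function names and the toy term identifiers are hypothetical, and the hierarchy is represented as a plain dictionary.

```python
def ancestors(term, parents):
    """Return all ancestors of a term, given a child -> parents mapping."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def expand_annotations(direct_terms, parents):
    """Add every inferred parent term to a gene's direct annotations."""
    expanded = set(direct_terms)
    for term in direct_terms:
        expanded |= ancestors(term, parents)
    return expanded

# Toy hierarchy with made-up identifiers: leaf -> mid -> root, leaf -> other -> root
toy_parents = {
    "T:leaf": ["T:mid", "T:other"],
    "T:mid": ["T:root"],
    "T:other": ["T:root"],
}
expanded = expand_annotations(["T:leaf"], toy_parents)
```

Walking the hierarchy for every annotated term is extra work at load time, which is consistent with the slightly slower initialization noted above.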
April 7 2005: Version 2.0.3 released. This fixes a few GUI bugs and increases the usability of the ‘load analysis’ feature
January 4 2005: Version 2.0.2 released. This release addresses several minor bugs relating to software usability. The software now handles gene scores that are not p values more gracefully.
December 23 2004: Version 2.0.1 released. This maintenance release addresses a number of bugs, the most serious of which was a bug in the “ORA” analysis that could lead to inaccurate findings in some cases. In addition, the handling of gene scores that are not raw p-values has been made more flexible. Most of the remaining bugs are minor and relate to the usability of the software, but all users should update to this version. There are also many improvements to the documentation.
October 13 2004 Version 2.0 released. This fixes several GUI bugs that affected use under Linux and Mac, as well as some other minor bugs and documentation improvements.
September 28 2004 Version 2.0b7. This further improves documentation and fixes bugs, including one that caused gene set probe numbers to be displayed incorrectly when data had been filtered.
September 22 2004 Version 2.0b6 released. This fixes some major bugs in the visualization component of the software, and includes some improvements to the documentation and error messages. New feature: some methods for speeding up the resampling-based analysis have been implemented. These are particularly helpful for the correlation analysis. The new methods use some approximations that are documented here.
August 18 2004 – Version 2.0b5 released. This introduces some minor new features and fixes several bugs, one of which is serious and affects the correlation analysis. In addition, the advertised command line tool is now operational.
August 12 2004 – Version 2.0b4 released. This is a maintenance release that fixes a number of bugs and issues, some of which are serious. All users should update to this version.
July 20 2004 – Version 2.0b3 released. This is an extensive rewrite of the user interface, with numerous new features, including the ability to define your own gene sets and visualize the analysis results. This is a late-beta release as we iron out bugs and performance issues.