by David.Chiang@SageNResearch.com

The brand-new GEMYNI(tm) platform for SORCERER iDAs, designed for mass spec data mining, enables rapid algorithm development, deployment, and automation using R, the popular open source programming language for statistics and data science.

Bioinformaticians researching new algorithms will publish more quickly by focusing only on the R module rather than the whole application. Biopharma labs with specialized needs, commonly for automation tools that characterize drug compound variants, can develop such tools in-house or with our help. Biotech startups relying on proprietary algorithms can keep the secret sauce in their own R code, yet benefit from the rest of the robust SORCERER Enterprise system. Specialty quantitation, often poorly addressed by off-the-shelf or academic software, can be built and optimized using abundant R math libraries for area-under-curve and 2D peak extraction calculations.

As an internal test, I was able to prototype a next-generation algorithm within 3 weeks, one that utilized the Support Vector Machine statistical model, used in UW’s Percolator algorithm, to combine multiple search engines, SEQUEST 3G and the binomial score (based on Max Planck’s Andromeda), including fragment mass errors, to derive a modified XCorr score that is plug-in compatible with the current SORCERER workflow. This will significantly improve results for data from newer instruments like the Q Exactive using the current workflow, and without ad hoc search engine tweaks. (Watch this space for technical details later.)

An important point is how quickly this algorithm could be prototyped and deployed in the production workflow, by partitioning the project between the mathematical manipulation in R and the input/output handled by standard MUSE library functions. To be sure, some tuning will be needed as the algorithm is stress-tested with different data-sets. Nevertheless, such an effort would take far longer to develop as a standalone program, and would be far more difficult to maintain, than as a script within the GEMYNI platform.

The new GEMYNI platform for rapid algorithm development is the latest industry-leading innovation introduced by SORCERER iDA products. SORCERER iDAs (Integrated Data Appliances) increase overall research productivity both by reducing or eliminating low-value distractions (software maintenance, server administration) and by making high-value activities simpler (deep data analysis, algorithm development, data mining). They are designed for mass spec labs that have more than $1M in research capability and can improve productivity by at least 10% with an automated, maintenance-free, server-based data analysis platform.

The first version of GEMYNI (v4.3), a “Linux-only” version, and the Sorcerer-Score sample script will be formally introduced at ASMS in June. They will be available for all late-model SORCERER systems as part of the TSP maintenance plan.

To find out more, please contact us at: sales@SageNResearch.com. Hope to see everyone at ASMS!

References:
Cox, J. et al. (2011) Andromeda: A Peptide Search Engine Integrated into the MaxQuant Environment. J. Proteome Res., 10, 1794–1805. [Andromeda]
Käll, L. et al. (2007) A semi-supervised machine learning technique for peptide identification from shotgun proteomics datasets. Nat. Methods, 4, 923–925. [Percolator]
Käll, L. et al. (2008) Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. J. Proteome Res., 7, 29–34. [q-value]

CARE Notice – Comprehensive Assessment and Response Event

As part of our ongoing Total Support Program, this CARE notice is being sent to all friends of Sage-N Research:

Based on our extensive research, we recommend that all in-warranty (TSP) customers contact us in order to upgrade to the latest v4.6 of the ReAdW module. Enhanced features of this most current version include:

  • Correctly identifies QExactive instruments
  • Skips zero-intensity peaks (resulting in smaller files)
  • Retains the proper precursor charge state information for spectra in the mzXML file

To schedule this software upgrade, please contact us at:  Support@SageNResearch.com

**This CARE notice is your TSP at work. TSP not only encompasses hardware and software support; it also contributes to our ongoing research and development, so that we can provide cutting-edge products and bring you the tools you need to refine your research.

To renew or re-instate your TSP, please contact us at: Sales@SageNResearch.com

Members of Sage-N Research’s Total Support Program (TSP) will want to read the following courtesy advisory bulletin carefully and contact our support team if deemed necessary.
Note:  If you are currently out of TSP, or not using newer hardware servers (e.g. Fujitsu), the following message will be of the utmost importance to you as well:

By far the most common problem we see in the field is hard drive failures. Given there are moving parts inside hard drives which spin at a very high rate of speed for years on end, it is common for these drives to fail at some point in time. This can be a catastrophic failure resulting in a total loss of your data in the event more than one drive fails.  By monitoring and taking the correct course of action when a drive fails, data loss can probably be avoided.

SORCERER™ systems are configured with a disk technology called RAID that allows the system to continue to function normally if a single hard drive fails. It is possible that your SORCERER system has a failed hard drive right now without your being aware of it. It is crucial that a failed drive be replaced as soon as possible. If it is not replaced, the failure of another drive will lead to a total loss of data.

How do you know if a drive has failed or not?  The newer Fujitsu systems come equipped with a few options that make checking for hard drive failures easy and they allow automated email notification of hardware failures:

  • A quick way to tell if a hard drive has failed is by checking the lights on the front of the system. On Primergy-based SORCERER 2 and Lab systems, the hard drive bays are located at the bottom front of the tower. For Enterprise systems, they are in the separate disk subsystem. You should see a green light on every active drive in the system. If any drive has a red light on, that indicates the drive has failed. Please contact our support team immediately if this is the case at  support@sagenresearch.com.
  • SORCERERs built on newer server platforms also feature hardware monitoring software which can send out automated email alerts when a hardware problem arises. Given that most people do not physically check the lights on their system daily, we highly recommend that all customers set up the alerting software if they have not already done so. Please contact our support team at support@sagenresearch.com for assistance in setting up the alerts. We would also like to offer our TSP members complimentary monitoring of email alerts by routing them to our support address.
  •  For those experienced in using the Linux command line: If you run the command “PrimeCollect” as the root user, the Fujitsu system can generate a diagnostics report for your system. You may upload your file at:  http://dropbox.yousendit.com/SageN and we will get back to you once we have analyzed the results.

We hope that this advisory bulletin will give our TSP members peace of mind that their SORCERER system is running smoothly, that their data is protected, and that the Sage-N Research support team is at their fingertips.

We will continue to strive to offer our customers the very best in available advanced hardware features for performance, reliability and expansion by using enterprise-grade (vs. consumer-grade) components that are designed for years of continuous 24/7 peak operation.

**If you are not currently covered under our TSP maintenance plan, or if your SORCERER hardware is something other than the newer server (e.g. Fujitsu), please contact info@sagenresearch.com to discuss options for rejoining TSP and/or upgrading your hardware.


The newest release of Sorcerer Proteomics Edition (Sorcerer PE) software is now available in beta to supported Sorcerer customers. It introduces several new enhancements:

  • New native file formats based on MS2 and SQT for greater data handling efficiency
  • The obsolete DTA and OUT formats have been removed from the internal flows of Sorcerer but are still available for import and export to legacy applications
  • Improved system performance and efficiency throughout.
  • Support for the multiple biosample feature of Scaffold — spectra files can be pre-grouped in the search to become separate biosamples in the Scaffold file
  • Built-in processing for Raw files from Thermo LTQ Orbi Velos and Q Exactive mass spectrometers
  • Now bundles the most recent TPP 4.5.2 software
  • Support for Scaffold V3.4

Release 4.2 is the latest in the V4 series of Sorcerer Proteomics Edition (Sorcerer PE) software, and is immediately available for beta testing, which means that all the new features have now been implemented and tested internally, but that the software has not yet received full testing in real-world conditions. If you would like to try out the new features, then please contact support@sagenresearch.com to request the new beta software. If you are currently using version 3.5 or earlier releases, you will also need to enter new license keys.

Sorcerer PE V4’s new file formats offer greater performance

This release completes the transition to new file formats that was begun with v4.1 (which still used the old formats behind the scenes), and now all of Sorcerer PE’s internal use of the legacy Sequest DTA and OUT file formats has been replaced by the more modern MS2 and SQT formats for representing MS2 spectra and peptide matches respectively.  In these days in which tandem mass spectrometers can generate tens of thousands of spectra every hour, it is very inefficient to represent each data item in a separate file — there is a substantial overhead in opening and closing each file, and transfers in a network environment such as Sorcerer uses are typically slow. It also wastes a lot of disk space. So using MS2 and SQT natively throughout the Sorcerer search engine greatly improves the overall performance of the system.
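As a rough illustration of why one multi-spectrum file is cheaper to handle than one file per spectrum, the sketch below reads every spectrum from a single MS2-style text stream in one pass, with a single open and close. It assumes the commonly described MS2 layout (H header lines, S scan lines carrying the precursor m/z, Z charge lines, then "m/z intensity" peak pairs). The function name `read_ms2` and the sample data are hypothetical, and this is not Sorcerer's own parser.

```python
import io

def read_ms2(handle):
    """Yield one dict per spectrum from an MS2-style text stream (assumed layout)."""
    spectrum = None
    for line in handle:
        fields = line.split()
        if not fields or fields[0] == "H":      # blank line or file-level header
            continue
        if fields[0] == "S":                    # S <first scan> <last scan> <precursor m/z>
            if spectrum is not None:
                yield spectrum
            spectrum = {"scan": int(fields[1]),
                        "precursor_mz": float(fields[3]),
                        "charges": [],
                        "peaks": []}
        elif spectrum is None:
            continue                            # ignore anything before the first S line
        elif fields[0] == "Z":                  # Z <charge> <M+H mass>
            spectrum["charges"].append(int(fields[1]))
        elif fields[0] in ("I", "D"):           # per-spectrum annotation lines
            continue
        else:                                   # peak line: <m/z> <intensity>
            spectrum["peaks"].append((float(fields[0]), float(fields[1])))
    if spectrum is not None:
        yield spectrum

# Illustrative data only: two spectra stored in a single file-like object.
EXAMPLE_MS2 = """H\tComment\tillustrative data only
S\t1\t1\t500.25
Z\t2\t999.49
100.1\t20.0
200.2\t35.5
S\t2\t2\t612.80
Z\t3\t1836.38
150.0\t12.0
"""

spectra = list(read_ms2(io.StringIO(EXAMPLE_MS2)))
print(len(spectra), "spectra read from a single stream")
```

With the legacy one-file-per-spectrum approach, the same two spectra would need two separate files, each opened and closed on its own; at tens of thousands of spectra per hour that per-file overhead adds up quickly.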

However, although they work well internally to the system, we don’t recommend these formats for an end-user to work with directly —  the formats are neither standardized, amenable to upstream and downstream processing tools, nor easily generalized to other search algorithms. Rather, for input to and output from Sorcerer, we’ve standardized on mzXML for spectra and pepXML for peptide matches as interchange formats that are more general and with extensive community support. PepXML is now generated by default, even if you do not select TPP postprocessing overall. Of course, Sorcerer supports  other formats, too, such as Thermo’s Raw files, but these will be converted to pass through the standard formats — mzXML in the case of Raw files.

One more word about DTA and OUT file legacy support: these files are no longer directly supported by the Sorcerer PE search engine, but you can still import DTAs, and we will have a script to generate OUT files from pepXML, if your downstream processing requires them. Please note that there is one spot in the TPP suite that expects .out files, and that is the “spectrum” hyperlink in the Peptide Viewer, which actually brings up a view of the out file, if any. Most of the scores, masses etc. for the spectrum match that are presented in that view can be added as columns directly to the peptide report. But if you do want to view these OUT files and you don’t mind the extra overhead, then consider running the OUT file compatibility script as a post-processing step. Please consult support@sagenresearch.com for further assistance with the compatibility script.

Multiple Biosample support for Scaffold

One common request from our clients who are keen Scaffold users is for enhanced support in the Sorcerer-Scaffold integration that can take advantage of Scaffold’s ability to group data into different biosamples, corresponding to different columns in the Scaffold view. We’re happy to announce a new feature in the Sorcerer PE software that speaks to this. The way it works is very simple, and requires only a minor difference to the way you have always searched data on Sorcerer.

Previously, if you selected multiple items for searching in the Web GUI, they would all be searched together and would wind up being a single biosample in the Scaffold file. Now, any separately selected item — either a single spectra file, or a folder of several files — will become its own biosample. Typically, the way this is used is to pre-group raw files in subfolders of the search data folder, and each of those subfolders will become a separate biosample, so long as they are each individually selected from within the search data folder. If, however, you select the search data folder itself at the top level, then all its contents will become a single biosample.
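The grouping rule described above can be sketched in a few lines. This is only an illustration of the rule as stated (each separately selected item becomes one biosample, and a selected folder contributes everything beneath it); the function `group_into_biosamples` and the example paths are hypothetical and are not part of the Sorcerer software.

```python
from pathlib import PurePosixPath

def group_into_biosamples(selections, spectra_files):
    """Map each separately selected item (file or folder) to its own biosample.

    selections    -- paths the user ticked in the GUI (hypothetical input)
    spectra_files -- every spectra file under the search data folder
    """
    biosamples = {}
    for item in selections:
        item_path = PurePosixPath(item)
        members = [f for f in spectra_files
                   if PurePosixPath(f) == item_path              # the item is the file itself
                   or item_path in PurePosixPath(f).parents]     # or a folder containing it
        biosamples[item] = sorted(members)
    return biosamples

# Hypothetical layout: raw files pre-grouped into subfolders.
files = ["search_data/control/a.raw",
         "search_data/control/b.raw",
         "search_data/treated/c.raw"]

# Selecting each subfolder individually gives two biosamples.
per_folder = group_into_biosamples(["search_data/control", "search_data/treated"], files)

# Selecting the top-level folder gives a single biosample with everything in it.
top_level = group_into_biosamples(["search_data"], files)

print(len(per_folder), "biosamples vs", len(top_level), "biosample")
```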

Of course, the existing method of working with Scaffold Desktop to add new biosamples based on merging with another Scaffold file is still available, so you can choose whichever method is more suitable for your needs.

Do be aware, though, that searching more data in one run will add to the load of the Scaffold analysis. The system resources that Scaffold needs, particularly in terms of memory, are primarily a function of the number of files and the number of spectra represented by those files. We recommend that any Sorcerer that is used for intensive Scaffold analysis be upgraded to a minimum of 24GB of system RAM, and that users discuss their Scaffold analysis usage and possible upgrades to their system with Sage-N support in order to ensure the best performance.

New method for extracting Thermo RAW files in Sorcerer

When Thermo introduced XCalibur 2.1 and 2.2 supporting the Orbi Velos and Q-Exactive instruments, incompatibilities in their libraries meant that the method of extracting spectra from Raw files that Sorcerer then used suddenly stopped working. In response to this, Sage-N Research developed a solution based on a new software method, but that was Windows-specific, and not well suited to other platforms such as Linux. Nevertheless, at the cost of some complexity, particularly in terms of installation, we made it work on Sorcerer, and once again had an integrated flow with Sorcerer PE for XCalibur 2.1 and above.

Now we have implemented an alternative approach, based on a method developed by Dr. Patrick Pedrioli at the University of Dundee, that allows Sorcerer’s built-in extraction software to be used successfully with the latest XCalibur libraries. It is a lot easier to deploy on Sorcerer than the Windows-based solution, and just requires a few tweaks that Sage-N customer support can easily guide you through or do remotely. This method is now the default flow for the Sorcerer PE 4.2 release.

The Windows/msconvert method remains available for qualified customers who have the requirement to use its different feature set.

New versions of TPP and Scaffold software

The bundled Trans-Proteomic Pipeline (TPP) software has been updated to the most recent version, 4.5.2, which provides several new enhancements and bug fixes. Also, the most recent version of Scaffold, V3.4, is now supported. Licensed users may obtain this software at the Proteome Software download site.

Other Sorcerer PE V4 enhancements

The new release rolls up other enhancements from earlier V4 releases including:

  • The SEQUEST 3G scoring module with new features to improve the sensitivity and thoroughness of peptide searches.
  • A new Web API for submitting and getting results from Sorcerer searches over the network has been implemented to help developers use Sorcerer as a search engine within their programs and scripts.
  • A component design for the Sorcerer-as-a-platform architecture, co-existing with other life science analysis software
  • Enhancements to the MUSE scripting framework to allow more powerful scripts to customize Sorcerer searching.

Please review an earlier posting for further details of these and other enhancements in Sorcerer PE V4.

 


As part of our growth plan, we are expanding Proteomics into Microbiology. We recently signed an exclusive agreement with the U.S. Army Edgewood Chemical and Biological Center (ECBC).

This license allows for the integration of the ECBC Agents of Biological Origins Identification (ABOID) system into our existing SORCERER™ proteomics platform, enabling rapid and cost-effective detection and identification of microorganisms.

ABOID is a broad-based technology that is well known for its role in studying the honeybee Colony Collapse Disorder. It has also reported success in rapid identification of pathogens in human infections, food-borne illness, and other fields.

Read more here:

http://www.ecbc.army.mil/news/Patent_Licensing_Agreement_signed_at_ECBC.html

Release 4.1 is an update to V4.0, which was only released as beta software to a limited number of users, so this release will be the first general release in the Sorcerer PE version 4 series.   The release is currently entering a beta-testing period, following which (probably in late summer), it will be made available to Sorcerer customers with active support arrangements, as well as installed on newly purchased Sorcerer systems.

This release contains enhancements in many different areas of the Sorcerer software:

  • The SEQUEST 3G scoring module has new features to improve the sensitivity and thoroughness of peptide searches.
  • The data flows for Sorcerer processing have been rearchitected to use MS2 and SQT data formats instead of the legacy SEQUEST DTA and OUT file formats.
  • As a solution for the issue of extracting from recent RAW files, an interface has been developed within the Sorcerer software to connect to a separate Windows system and to remotely run ProteoWizard’s new MSConvert extractor with instrument-specific libraries.
  • The bundled version of Trans-Proteomic Pipeline software is updated to V4.4.1, which offers multiple enhancements.
  • The new Sorcerer software now supports Scaffold V3.1.2, with new features in TIC quantitation and batch file merging
  • The Scaffold flow has also been reworked on the Sorcerer side, enabling users to identify multiple biosamples for Scaffold in a single search.
  • A new Web API for submitting and getting results from Sorcerer searches over the network has been implemented to help developers use Sorcerer as a search engine within their programs and scripts.
  • This software release has been designed as a component for the Sorcerer-as-a-platform architecture, co-existing with other life science analysis software
  • Enhancements to the MUSE scripting framework to allow more powerful scripts to customize Sorcerer searching.


Many of you are asking the question, what do we do about quantitation, and how can Sage-N Research help in this area?

Our goal at Sage-N Research is to provide you with a complete proteomics platform.

Quantitation is an important part of the overall workflow, and can be used for both differential protein expression and protein characterisation applications.

After looking at different software on the market, we found that Nonlinear Dynamics’ Progenesis LC-MS is the best solution available.

Progenesis LC-MS is a data analysis program that helps you to find and quantify the proteins showing interesting behaviour in your label-free samples. It can be used for both differential protein expression and protein characterisation applications.

Progenesis LC-MS quantifies peptides and proteins independently of identification, thus ensuring users capture all of the interesting protein data in their experiments. The software is platform independent and will integrate with a wide range of instrumentation.

For more information on Progenesis LC‑MS and its approach to data analysis, you may find the following links useful

If you already own Progenesis LC‑MS, but want to know about the features in the latest release, please see the FAQ, “What’s new in the latest version?”

Overall we think the approach used by Nonlinear Dynamics can greatly assist you in your workflow.

Mark Your Calendars! Sage-N Research
User Group Meeting ASMS – Denver
June 4th 2011!

The meeting is open to in-warranty Sorcerer customers and by invitation only. Pre-registration is required. A buffet dinner and refreshments are being provided, and there will be a drawing for customer door prizes. As usual, we will have an ultra-cool door prize! (But make sure you come on time for the best chance to win!)

As usual, we will have great speakers, and also have training talks on the new SEQUEST 3G and the new SORCERER Proteomics Edition Software.

Date: Saturday Evening, June 4, 2011
Time: 5 PM to 8:30 PM
Address: Sheraton Denver Downtown Hotel, 1550 Court Place, Denver, CO 80202, (303) 626-2517

Important Note: We are meeting on Saturday this year!

We have recently formed a marketing partnership with GenoLogics. Many of our clients have been asking for “front-end” solutions, and we have found GenoLogics to be one of the best providers in this area. GenoLogics solutions offer the flexibility of a custom in-house built LIMS, with all the benefits of a commercial LIMS solution. Their solutions are built on a scalable informatics platform that adapts to constant change, and can expand to multiple labs. GenoLogics is the proven industry leader in LIMS and data management for next generation sequencing. For more information on our solutions for proteomics, contact us directly or visit www.genologics.com.

Sage-N Research Marketing


by David.Chiang@SageNResearch.com

Proteomics technology is now a robust discovery tool, at least in capable hands with the right tools, for characterizing post-translational modifications such as phosphorylation, right alongside gene expression and cellular imaging for tumor and stem cell research.

However, the complexity, scale, and criticality of the data from a modern mass spectrometer such as an Orbitrap Velos are well beyond the capability of desktop PCs and require specialized infrastructure IT solutions.

When losing data becomes catastrophic rather than merely annoying, it is time to move beyond PCs into robust infrastructure solutions, such as Sage-N Research’s SORCERER Enterprise system. Unlike traditional business-oriented IT systems, the SORCERER Enterprise system is optimized for the large multi-gigabyte data files of proteomics research.

Robust servers and storage systems provide the needed capacity, reliability, and throughput for storing and analyzing proteomics mass spec data that inexpensive PCs cannot provide. For example, a typical throughput of 300GB of raw data per week for a single mass spec will fill up a PC in less than a month. As well, the lower grade disk drives used in cost-sensitive, consumer-oriented PCs and external USB drives can lead to costly data loss and system downtime.
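The arithmetic behind the "less than a month" estimate is easy to check. The sketch below uses the 300GB per week figure from the text; the PC disk capacity of 1000GB is purely an assumption for illustration, since the text does not state one.

```python
def weeks_until_full(capacity_gb, weekly_raw_gb):
    """Weeks of acquisition a disk can absorb, ignoring OS and software overhead."""
    return capacity_gb / weekly_raw_gb

WEEKLY_RAW_GB = 300          # from the text: typical raw data per week for one mass spec
ASSUMED_PC_DISK_GB = 1000    # assumption: a 1 TB desktop drive, not a figure from the text

weeks = weeks_until_full(ASSUMED_PC_DISK_GB, WEEKLY_RAW_GB)
print(f"An assumed {ASSUMED_PC_DISK_GB} GB disk fills in about {weeks:.1f} weeks")
```

Search results and intermediate files would only shorten this further, so the estimate is, if anything, optimistic for a PC.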

In addition, the nature of the data analysis needed for proteomics is changing, as it becomes more akin to hedge fund data mining than an administrative assistant running an Excel spreadsheet. This is especially true for quantitation and ETD data analyses where the field has not settled onto a de facto one-size-fits-all methodology, and where some semi-customization of the analysis to query and adapt to a particular data-set will be necessary. This is why the large-scale SILAC papers are always done by research groups with their own bioinformatics resource, and why just about any off-the-shelf software you can download or buy will probably not work well for your needs without some customization.

Why does quantitation or ETD software need to be semi-customized?

The meeting is open to in-warranty Sorcerer customers and by invitation only. Pre-registration is required. A light buffet and refreshments are being provided, and there will be a drawing for customer door prizes. We will have the brand-new, ultra-cool Apple iPad as our door prize! (But make sure you come on time for the best chance to win!)

We are privileged to have Profs John Yates (Scripps) and Steven Gygi (Harvard) confirmed to give talks. We will also have training talks on the new SEQUEST 3G and the new VersaSearch technology on the SORCERER platform.

If you wish to receive a meeting invitation, please contact: tnowak@sagenresearch.com.
Seating will be limited, so reserve your spot today!

Date: Sunday 23rd May 2010
Time: 1:30 PM to 5:00 PM
Address: Hotel Monaco, 15 West 200 South, Salt Lake City, UT 84101 (801) 595-0000
Room: Suite Paris A

Hope to see many of you there!



What exactly is SEQUEST 3G? What can it do for me?
SEQUEST 3G is the latest, next-generation SEQUEST standard specifically developed and optimized for translational proteomic applications involving phosphorylation and other post-translational modifications (PTMs). Defined by Sage-N Research in close collaboration with Dr. John R. Yates, III of the Scripps Research Institute, SEQUEST 3G defines a single common standard for similarity scores, search parameters and statistics, and input/output file formats, and is ideal for noisy and poor-quality spectra. This robust proteomics search engine maintains compatibility with existing TPP and Scaffold workflows, and incorporates new features for the latest technologies such as electron-transfer dissociation (ETD) and high mass accuracy instruments. SEQUEST 3G also supports multiple scoring and rescoring algorithms such as ASCORE. In addition to improving identification sensitivity for phosphopeptides and low-abundance spectra, SEQUEST 3G is a significant update that enables future complex functionality and resolves incompatibilities and calculation variances among prior versions.

How do I obtain SEQUEST 3G?
SEQUEST 3G is being released as an embedded component within the SORCERER v4.0 platform being released in Q1 of 2010. Whereas earlier versions (such as SORCERER-SEQUEST) relied upon a single pass, the new v4.0 platform uses a more efficient “multiple-pass” search engine architecture involving a first-pass with SEQUEST 3G to keep the top 50 candidates and then subsequent rescoring passes with other search engine modules. Current customers will receive an automatic Q1 update as part of their standard maintenance package. This update is conveniently backward-compatible in two ways: the subsequent pass is optional, and can be combined with other rescoring functions in a MUSE scripting environment.
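The "multiple-pass" idea described above (a first pass that keeps the top 50 candidates, followed by an optional rescoring pass) can be sketched generically. Everything here is illustrative: `two_pass_search` is a hypothetical name, and the two toy scoring functions merely stand in for SEQUEST 3G and a rescoring module; they are not the real scores.

```python
import heapq

def two_pass_search(spectrum, candidates, first_pass_score, rescore=None, keep=50):
    """Keep the top `keep` candidates from the first pass, then optionally rescore them."""
    shortlist = heapq.nlargest(
        keep, candidates, key=lambda cand: first_pass_score(spectrum, cand))
    if rescore is None:            # the second pass is optional
        return shortlist
    return sorted(shortlist, key=lambda cand: rescore(spectrum, cand), reverse=True)

# Toy stand-ins (assumptions for illustration only): candidates are integers,
# the "spectrum" is a target value, and the scores are simple arithmetic.
def toy_first_pass(spectrum, cand):
    return -abs(cand - spectrum)   # closer to the target scores higher

def toy_rescore(spectrum, cand):
    return -(cand % 7)             # an unrelated second opinion

candidates = range(200)
first_only = two_pass_search(100, candidates, toy_first_pass)
rescored = two_pass_search(100, candidates, toy_first_pass, rescore=toy_rescore)

print(len(first_only), first_only[0], rescored[0])
```

The point of the design is that the expensive or specialized rescoring function only ever sees the short list, not the full candidate set.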

SEQUEST 3G is available for licensing within third-party bioinformatics software suites for a variety of mass spectrometers and technologies. A SEQUEST 3G press release can be found at www.sagenresearch.com/news_11.html


As you may know, Sage-N Research offers three SORCERER platforms — the SORCERER Enterprise, the SORCERER 2, and the SORCERER Lab.

The SORCERER Enterprise
The SORCERER Enterprise is a customizable and secure throughput and biorepository system for aggregate analysis. Provided as a scalable Blade server with integrated life-science optimized storage, the Enterprise system handles up to a few terabytes of data and is highly scalable using additional computing resources. This platform is ideal for labs that are highly productive or focused on generating continuous high-throughput data, and can analyze 100,000+ spectra/hour, even those with post-translational modifications.

The SORCERER 2
The SORCERER 2 is a mainstream product for most proteomic labs — those with frequent but not continuous high-throughput needs. It is well-suited for modern instruments like the Thermo LTQ and Orbitrap, ABI 4700 and 4800, and Waters SYNAPT G2 and handles 2+ MS spectra/second, or 30,000+ spectra/hour. This system is a stand-alone platform and not scalable like the SORCERER Enterprise.

The SORCERER Lab
The SORCERER Lab is a lower-cost, entry system that is simple, lightweight, and compact. Designed to plug-and-play and require low IT maintenance, it contains the most essential yet basic capabilities, including SEQUEST 3G, TPP, and Scaffold. The SORCERER Lab is still more powerful compared to other PC-based systems. At 10,000+ spectra/hour, it has one-third the analytical throughput of the Discovery system.


“All models are wrong, but some are useful.” George E. P. Box

The Truth About Probability Scores

What is Probability Scoring?
Probability Scoring is a popular method of ranking possible peptide sequences by how well they fit an observed tandem mass spectrum. This can be computed as the primary score in a search engine (e.g. Mascot), or as a second-stage re-scoring of, say, the top 10 results from another search engine.

Why is it important?

Of all the different types and styles of similarity scores used in proteomics search engines, Probability Scoring is considered conceptually simple and easy to understand. Other scores, notably SEQUEST’s cross-correlation score (XCorr) based on vectors and linear algebra, can be more mathematically rigorous, but require more technical background to understand their calculation.

What does it derive from?
The Probability Scoring functions used by both the Matthias Mann Lab from Max Planck and Steven Gygi Lab from Harvard use the coin-flip model with a biased coin (i.e. the binomial distribution).

For example, suppose a peptide sequence is predicted to yield N=18 fragment ions, exactly K=6 observed peaks match them, and the success probability is modeled as p=0.05 (we will get to that later). The “random-chance probability” of that happening (i.e. the p-value) is then computed as the probability of getting exactly 6 Heads out of 18 Tosses using a Biased Coin, where each toss is modeled to have a 5% chance of yielding Heads.
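The coin-flip calculation in this example can be reproduced with the standard binomial formula. The sketch below computes the "exactly K of N" probability described in the text and, for comparison, the "at least K" tail, which is the more conventional definition of a p-value; the function names are ours and are not taken from either lab's software.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k Heads in n tosses of a coin with P(Heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_tail(k, n, p):
    """Probability of k or more Heads in n tosses (upper tail)."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

N, K, P = 18, 6, 0.05        # values from the example in the text

exact = binomial_pmf(K, N, P)
tail = binomial_tail(K, N, P)

print(f"P(exactly {K} of {N}) = {exact:.3e}")
print(f"P(at least {K} of {N}) = {tail:.3e}")
```

Both numbers come out on the order of 10^-4: matching 6 of 18 predicted fragments purely by chance is unlikely under this model, and that is what makes such a match score well.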

Starting with v4.0 software, the Sage-N Research SORCERER platform will provide the 2-stage scoring (i.e. different from 2-stage searching) architecture that generally mimics the current Gygi Lab workflow, which does a first-stage SEQUEST (i.e. SEQUEST 3G starting with v4.0) followed by our open-source MUSE scripting version of the Gygi Lab’s “Bino 5-score”. (This is analogous to the Mann Lab workflow, which generally uses a Mascot first-stage followed by a “6-score” re-score stage, according to private communication with Matthias.)

Users can also modify these re-score modules to incorporate their own scoring functions, such as to accommodate water and/or ammonia losses, incorporate special cleavage rules, or otherwise tune coefficients and parameters.

Anything to keep in mind about Probability Scoring?
Some researchers mistakenly believe that a “probability” is some kind of absolute “word of God”, but probabilities are very much a creation of man. Indeed, in science, the probability of an event has more to do with you, or rather your lack of all the relevant information, than with the event itself, and is best considered a “degree of confidence” measure based on incomplete information. After all, mass spectrometers do not measure peptide sequence per se, but only a collection of mass/charge ratios from which you infer sequence information.

Probability Scores are simply tools that are based on underlying (sometimes hidden) models, which as George Box observed are always “wrong” because they necessarily involve simplifications and assumptions. Probability Scoring, by its nature, tends to have increased specificity but reduced sensitivity. Its Achilles heel is the filtering step: how does one decide which peaks are “real” and which are “noise”, particularly for the noisy spectra common for phospho and low-abundance peptides? With only a handful of matching peaks to determine the Score, their accurate selection becomes critical.

Therefore, Probability Scores are best used as a second re-scoring stage on results from a search engine like SEQUEST 3G, which is designed to find specific patterns amid significant noise. In addition, it is important to note that p-values are NOT true probabilities, since there is no requirement for such values of competing hypotheses to sum to 1. (See this Proteomics 2.0 blog entry for further discussion: http://proteomics2.com/?p=65 )

We have developed a new flow for processing Thermo RAW files that works both with the most recent XCalibur V2.1 and with earlier versions. This flow has been giving good results in internal testing, and we are now releasing it for beta testing to any interested, actively supported Sorcerer customer.

Thermo LTQ Velos users will have noticed the major changes to the XCalibur software that were introduced at version 2.1. The installation process is different and requires a new component called Thermo Foundation, and some of the file names and locations have changed. As a result of these changes, XCalibur 2.1 is no longer compatible with the ReAdW program that is used within the CrossOver environment by Sorcerer. One workaround which has been commonly suggested in the Thermo field is to down-rev the XCalibur used on the instrument to V2.0 and to continue using the old software for analysis. This remains a viable option, but with our newly developed solution, it is now also possible to use 2.1 RAW files on Sorcerer.

We are moving to a new spectrum extractor called msconvert (part of the ProteoWizard suite), which works with a different version of the Thermo libraries, and for which we have developed a new integration in the CrossOver environment. We are offering this as a beta release to our in-warranty customers. This solution entails a few Linux operations: reinstalling CrossOver with the latest release, configuring the required libraries, and installing a new Sorcerer workflow script. It is fairly straightforward for people comfortable with the Linux environment, or alternatively, we can do it for you if you give us remote access to your system. Please contact us at support@sagenresearch.com for more information.

Prof. Josh Elias (left) of Stanford University receives a thank-you gift from David Chiang after his talk.

Ever wondered about target-decoy searching? Want to gain a better understanding and realistic expectation of this effective tool? SageNResearch’s video “Addressing Peptide Identification Signal-to-noise With Target-Decoy Searching”, given by Professor Josh Elias of Stanford University at our “Translational Proteomics 2.0” meeting, can help. Dr. Elias is an Assistant Professor in Chemical and Systems Biology at Stanford University, and was part of the Steven Gygi Lab at Harvard Medical School before that. His lab is keenly interested in developing and applying methods to meet the current challenges facing scientists engaged in large scale proteome characterization.

Josh kicked off his talk with a stunning and very powerful visual to hit home the concept of what target-decoy database searching can do — you’ll never look at coffee beans in quite the same way. With this talk, you’ll know how to better find a happy medium for thresholds, smarter ways of designing your filtering criteria, when not to even consider using the method, how to get the most out of (really easy) decoy searching in SORCERER, and what’s so good about partial tryptic searches.

The 30-minute presentation is available at: http://www.scivee.tv/node/15544
To view slides, I recommend using the “full screen” mode. The slide set can also be downloaded as a Powerpoint file.

Prof. Alexey Nesvizhskii (left) of University of Michigan receives a thank-you gift from David Chiang after his talk.

If you really want to understand how peptide and protein identification is done, this video talk is a must-see!

Professor Alexey Nesvizhskii of the University of Michigan is one of the co-inventors (with Dr. Andy Keller) of the popular PeptideProphet/ProteinProphet algorithm for turning search engine results into statistically consistent peptide and protein identifications. (This algorithm is also the basis for the popular Scaffold software.)

At the “Translational Proteomics 2.0” meeting, we were privileged to have Alexey give his insightful talk that reviews the various steps involved in inferring peptide and protein identifications from large spectra datasets.

In this talk, you will learn why False Discovery Rates are preferred over P-values, why you probably should not run more than 4 replicates of a MudPIT experiment, how FDR estimations from decoy differ from Peptide/ProteinProphet, how “The Two Prophets” compute probabilities by curve-fitting the score distributions, how sensitivity and FDR are computed, and the what and why of some advanced TPP options.
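
To give a feel for what “computing probabilities by curve-fitting the score distributions” means, here is a minimal sketch of the general idea: once the score distributions of correct and incorrect identifications have been fitted, Bayes’ rule turns a score into a probability of being correct. The Gaussian shapes, the parameter values, and the function names are all assumptions made for this illustration and should not be read as the actual PeptideProphet model, and the fitting step itself is omitted.

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

def posterior_correct(score, prior_correct, correct, incorrect):
    """Bayes' rule for a two-component mixture.

    `correct` and `incorrect` are (mean, sd) pairs describing the fitted
    score distributions (hypothetical values in this sketch).
    """
    num = prior_correct * gaussian_pdf(score, *correct)
    den = num + (1 - prior_correct) * gaussian_pdf(score, *incorrect)
    return num / den

# Hypothetical fitted parameters: 30% of identifications are correct.
PRIOR = 0.3
CORRECT = (3.0, 1.0)
INCORRECT = (0.0, 1.0)

for s in (0.0, 1.5, 3.0, 4.5):
    print(s, round(posterior_correct(s, PRIOR, CORRECT, INCORRECT), 3))
```

Note that the same score can map to a different probability in a different dataset, because the fitted curves and the proportion of correct identifications change from run to run.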

The talk is available at: http://www.scivee.tv/node/12671 (45 minutes).

I recommend using the “full screen” mode so you can view the slides, which are also available as a download from the site. (Please be aware that the slideset order is different from that in the presentation.)

(Note: Both Trans-Proteomic Pipeline and Scaffold Batch software are integrated into the SORCERER platforms.)

by David.Chiang@SageNResearch.com

Proteomics mass spectrometry is finally sensitive and specific enough for robust translational medicine (at least in capable hands), and holds tremendous promise to revolutionize biology and medicine. For some, it holds the key to incredible research power for decades to come.

However, there is a chasm that continues to grow between the productive and unproductive labs, because too many proteomics practitioners focus too early on low-level issues (e.g. cost, automation, ease-of-use) without first resolving high-level ones (e.g. sensitivity in the presence of noise, quality of results, algorithmic suitability).

For many researchers experimenting with a new high-resolution instrument, the most common scenario is to select a workflow based on running a simple protein solution, usually a purified BSA solution or a commercial protein mixture.

Since different workflows will give basically identical protein ID results for these simple test cases, they may conclude that all search engines are equivalent. While true when there is almost no signal noise, it is largely irrelevant in translational research. In fact, the exact same test will likely show that low-resolution and high-resolution mass specs are equivalent, the lowest quality reagents will suffice, or maybe you don’t have to clean your glassware as often. These are also true when there is little or no signal noise, but again, that is irrelevant for real-world research.

Seeing that there is little difference in protein IDs, some focus on using protein coverage as the sole metric for evaluating search engines. However, this is actually the opposite of what is needed for sensitive discovery proteomics. For example, if you are hunting for new protein biomarkers (especially a “one-hit wonder”), you do not want the protein inference engine tuned to assign any ambiguous peptides to already found proteins, thereby hiding them from further study.

Not surprisingly, a workflow selected based on low-noise experiments and focused on protein coverage will excel for simple mixtures, but is not sensitive enough to analyze complex mixtures with wide dynamic range, such as in translational research. Scientists will be able to see the abundant peptides and proteins, but probably little else. That is roughly what most proteomics researchers find today: nothing meaningful, but enough of the obvious to not change their methodologies.

The result is that most labs are not getting the value commensurate with their investments in proteomics mass spectrometry. Under the current economic environment, this is both wasteful and dangerous.

Within the academic world, while many proteomics researchers have trouble getting any interest, a select few are swamped and have to turn away collaborators. Within drug discovery firms, while many are staring at their mostly idle mass spectrometers, a select few are running multiple mass spectrometers 24/7 sieving productively through millions of peptides.

So why is the majority of proteomics research not producing high-value results?

With our access into the world’s top academic and drug discovery proteomics labs, we have a unique bird’s eye view into the answer. (However, like attorneys, we never give out client-specific information.)

Please allow me to share some secrets to your future success.

“Translational Proteomics 2.0” 2009 Users Meeting in Philadelphia.
Guest speakers Jimmy Eng (UWashington), Alexey Nesvizhskii (UMichigan), Josh Elias (Stanford), along with SAB member John Yates (Scripps) are in the middle row.


Stanford’s Dr. Chris Adams (left) must be feeling pretty lucky!
He gets to use a SORCERER 2 for his research (as part of Allis Chien’s mass spec core facility), AND wins an Acer One netbook door prize from David Chiang!

Translational proteomics — aka Proteomics 2.0 — is high-sensitivity proteomics for translational research, whose mastery is your key to unimaginable fame and fortune in biology and medicine!

Whether you need to catch up or to keep up, you need to hear the leading proteomics technologists reveal their secrets!

We were fortunate to have three of the most accomplished technologists (Mr. Jimmy Eng, Prof Josh Elias, and Prof Alexey Nesvizhskii) at our “Translational Proteomics 2.0 Meeting” give their insider insights on high-sensitivity data analysis.

In addition, we were privileged to have Sage-N Research SAB advisor Prof John Yates, one of the fathers of proteomics, attend our meeting and join in our lively panel discussions regarding the present and future of translational proteomics.

From the talks, these are tips for best sensitivity and specificity:

* There are several equivalent ways to calculate precursor mass, all of which can result in several AMUs of mass error due to incorrect isotope assignment.
* Semi-tryptic settings for database searching give the best performance
* Use a wider mass tolerance than your experiments will yield
* However, you don’t need a wide mass tolerance for searching if (a) you use isotope shift check and (b) you have a decent source of noisy peptide, e.g. with semi-enzyme search
* Post-process peptide IDs with proper statistical tools (e.g. PeptideProphet, DTASelect or target-decoy analysis)
* Key is to monitor the false discovery rates (FDR) with different filtering criteria
* Use monoisotopic mass for fragment ions, and for precursor ions if using high-resolution instrument
* P-values or E-values are not good for large-scale proteomics, because they don’t give you estimated error rates for a given score cut-off, and they ignore other relevant factors (e.g. retention time, mass accuracy, etc.)
* The target-decoy method is a simple and effective means of FDR estimation. It gives scores more discriminatory power by improving signal-to-noise ratio.
* Can use search scores in combination with other characteristics to get more good IDs at a particular FDR than by using score alone
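
As a rough illustration of the target-decoy tip, the sketch below estimates the FDR at several score cut-offs. The score values are invented, and the FDR convention used here (decoy hits divided by target hits above the cut-off) is only one of several in common use, so treat this as a sketch of the idea rather than the method used in any particular tool.

```python
def estimate_fdr(psms, cutoff):
    """Estimate FDR among target hits scoring at or above `cutoff`.

    `psms` is a list of (score, is_decoy) pairs. Returns a tuple of
    (number of accepted target hits, estimated FDR).
    """
    targets = sum(1 for score, is_decoy in psms if score >= cutoff and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= cutoff and is_decoy)
    fdr = decoys / targets if targets else 0.0
    return targets, fdr

# Hypothetical peptide-spectrum matches: (score, matched a decoy sequence?)
psms = [
    (4.1, False), (3.8, False), (3.5, False), (3.2, True), (3.0, False),
    (2.7, False), (2.5, True), (2.2, False), (2.0, True), (1.8, True),
]

# Monitoring FDR across filtering criteria, as the talks recommend.
for cutoff in (3.4, 3.0, 2.0):
    accepted, fdr = estimate_fdr(psms, cutoff)
    print(f"cutoff {cutoff}: {accepted} target IDs, estimated FDR {fdr:.0%}")
```

Loosening the cut-off yields more identifications at the cost of a higher estimated error rate, which is the trade-off to watch when choosing filtering criteria.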

We will be publishing the meeting talks online. Watch this space for details!

Hear Khatereh discuss her work and her success with the SORCERER 2 system!

Dr. Khatereh Motamedchaboki is currently the Manager of the Proteomics Facility at the Burnham Institute for Medical Research.

She is one of our increasing number of two-time SORCERER success stories, as a previous user at the Ebrahim Zandi Lab at the University of Southern California.

Reference: Laurence M. Brill, Khatereh Motamedchaboki, Shuangding Wu, and Dieter A. Wolf, “Comprehensive proteomic analysis of Schizosaccharomyces pombe by two-dimensional HPLC-tandem mass spectrometry”, Methods (2009), doi:10.1016/j.ymeth.2009.02.023.

Click Here to See Video

Our R&D team is busy working on the next major version of the Sorcerer-PE software, and expects to release it to then-in-warranty customers in the next few weeks.  Early previews and beta tests of some of the components will be made available by arrangement to qualified customer sites.

Highlights of the upcoming release include:

  • ETD fragmentation support and analysis
  • MUSE scripting modules for rescoring peptide matches with Olsen-Mann and Sadygov-Coon scores
  • Interoperation with major components of the Yates lab Sequest suite, including the DTASelect filtering and statistical analysis tool, and the Census quantitation application
  • Enhancements to the SEQUEST engine which provide first-pass cross-correlation scoring and E-values for greater accuracy and sensitivity

We’ve developed a new Muse workflow for target-decoy analysis and false discovery rate estimation, based on our integration of DTASelect from the Yates lab. DTASelect can now use target-decoy FASTA files that are installed on Sorcerer to support its statistical analysis. It provides an easy-to-interpret results report complete with match statistics and estimated false discovery rates.

Our DTASelect on Sorcerer page on this blog has been updated to describe the target-decoy workflow, in addition to the existing material on installing, configuring and running DTASelect and the Muse script. Please visit it to get links to the latest scripts and for a detailed How-To.

Three of the world’s leading experts on MS-MS protein identification came together recently at Sage-N Research’s annual user group meeting, and presented methods and results for the techniques and tools with which they are associated:

  • Jimmy Eng, co-inventor of Sequest and developer of many proteomics tools, presented tips for Sequest analysis
  • Josh Elias, who pioneered the systematic use of decoy databases for FDR estimation, gave a talk on how to use that technique to address Peptide ID signal-to-noise.
  • Alexey Nesvizhskii spoke about the tools he co-authored, in “Peptide identification and protein inference using PeptideProphet and ProteinProphet”

Their talks were wide-ranging and full of practical insights for the proteomics user community, and they explored the different research interests, data sets, analysis methods and workflows in the individual labs. However, they all had this in common: they had kept a careful eye on their search settings, monitored sensitivity and error rates, and come to a common, if perhaps not entirely intuitive, conclusion. The most sensitive search and the lowest error rates for shotgun proteomics are achieved when using semi-enzymatic searches, that is, when one end of the peptide, but not both, is allowed to diverge from the expected cleavage site.
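
To make the definition of a semi-enzymatic (here, semi-tryptic) peptide concrete, the sketch below counts how many ends of a candidate peptide fall on an expected trypsin cleavage site. It assumes the commonly used rule that trypsin cuts after K or R unless the next residue is P, and it treats the protein termini as valid ends. The protein sequence and function names are hypothetical.

```python
def is_cleavage_site(protein, i):
    """True if trypsin is expected to cut between residues i-1 and i."""
    if i == 0 or i == len(protein):
        return True  # protein termini count as valid peptide ends
    return protein[i - 1] in "KR" and protein[i] != "P"

def classify(protein, start, end):
    """Classify the peptide protein[start:end] by its number of tryptic ends."""
    tryptic_ends = is_cleavage_site(protein, start) + is_cleavage_site(protein, end)
    return {2: "fully tryptic", 1: "semi-tryptic", 0: "non-tryptic"}[tryptic_ends]

# Hypothetical protein sequence used only for illustration.
protein = "MAGKTLLRPEDSKAVNR"

for start, end in ((4, 13), (6, 13), (5, 11)):
    print(protein[start:end], "->", classify(protein, start, end))
```

A fully enzymatic search would consider only the first kind of peptide, whereas a semi-enzymatic search also considers the second, which enlarges the search space but lets peptides with one unexpected end be found.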

Jimmy Eng (left) of University of Washington receives a thank-you gift from David Chiang after his talk.

During our Translational Proteomics 2.0 Meeting, we were privileged to have Jimmy Eng (University of Washington) give us his uncommon insights into using SEQUEST with the Trans-Proteomic Pipeline (TPP).

This talk will be invaluable for advanced users of the SEQUEST search engine for sensitive translational proteomics analysis. All active SEQUEST users should listen to this talk!

Researchers will benefit by increasing their sensitivity and decreasing their false discovery rates when identifying proteins and post-translational modifications using proteomics mass spectrometers like the Orbitrap.

Jimmy has been one of the most prolific proteomics developers for almost two decades, as the co-inventor (with John Yates) of proteomic search engines, SEQUEST in particular, as well as the developer of a number of TPP tools.

Conclusions from slides:
- Semi-tryptic searches are better
- Use monoisotopic masses for fragment ions
- Use monoisotopic masses for precursor ions if the data come from a high-res instrument
- Narrow mass tolerance searches are better if the search considers precursor mass isotope assignment error
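
The last point can be sketched as follows: if the instrument picked the first or second isotopic peak instead of the monoisotopic one, the observed precursor mass will be too high by roughly one or two 13C-12C mass differences. A search can keep a narrow tolerance and still match by testing those offsets explicitly. The function name, the tolerance, and the masses below are assumptions for illustration, not SEQUEST’s actual implementation.

```python
C13_C12_DIFF = 1.0033548  # mass difference between 13C and 12C, in Da

def matching_isotope_shift(observed, theoretical, tol_ppm=10.0, max_shift=2):
    """Return the isotope offset (0, 1, 2, ...) at which the observed
    precursor mass agrees with the theoretical mass within `tol_ppm`,
    or None if no offset up to `max_shift` gives a match."""
    for shift in range(max_shift + 1):
        corrected = observed - shift * C13_C12_DIFF
        error_ppm = abs(corrected - theoretical) / theoretical * 1e6
        if error_ppm <= tol_ppm:
            return shift
    return None

# Hypothetical peptide mass of 1500.0 Da and three observed precursors.
print(matching_isotope_shift(1500.005, 1500.0))   # monoisotopic peak picked
print(matching_isotope_shift(1501.004, 1500.0))   # first isotope peak picked
print(matching_isotope_shift(1500.500, 1500.0))   # no plausible match
```

Without the offset check, matching the second case would require a tolerance of more than 1 Da, which admits far more random candidates.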

The talk is available at:  http://www.scivee.tv/node/11920 (31 minutes).

I recommend using the “full screen” mode so you can view the slides, which are also available as a download from the site.

Many of our customers have found DTASelect to be a very useful postprocessing tool for Sequest results, and have reported success using it with Sorcerer output. Up until now, however, these customers have generally run the tool manually on a separate desktop computer. Now we have developed a Muse script to make it easy to do this automatically, on Sorcerer itself.

See our DTASelect on Sorcerer page on this blog for a detailed How-to on installing, configuring and running DTASelect and the Muse script.

If you are interested in using Ascore as described in the application note on the blog, please contact us for new Muse scripts for your Sorcerer. We’ve just updated them, and they are needed to work with the recent v4.0 release of TPP, which is what’s in the current Sorcerer release.

Here’s a how-to for technically advanced users who need to update the Java platform on Sorcerer. It’s not required for the base Sorcerer software, including ScaffoldBatch, but it may be necessary for Phenyx installation. Please consult our technical support staff before deciding to do the update.

These instructions assume that you have a recent 64-bit Sorcerer operating platform (either RHEL 5.2 or CentOS 5-based), and that your Sorcerer software is at V3.5.

Here are the steps:

  1. Get the latest Java Development Kit (JDK)  (currently v6 update 18) from http://java.sun.com/javase/downloads/index.jsp. Click on the ‘Download JDK’ button. Get the Linux x64 platform, and download the non-rpm file which has a name like jdk-6u18-linux-x64.bin
  2. Log in as root in a terminal window and type: cd /opt
  3. Copy the file you downloaded to /opt, and unpack it:  /bin/sh jdk-6u18-linux-x64.bin
  4. Note the path to java in the unpacked directory for use in the next step, e.g. /opt/jdk1.6.0_18/bin/java
  5. Type:  /usr/sbin/alternatives --install /usr/bin/java java /opt/jdk1.6.0_18/bin/java 2
    • This sets up a system of links from /usr/bin/java to the new installation
  6. Type: /usr/sbin/alternatives --config java
    • Enter ‘2’ at the prompt to select the newly installed alternative
  7. Check you have the latest java by typing:  java -version

(Optional) Update Firefox Java plugin:

  1. Create a plugins directory in the Firefox installation directory if the plugins directory does not exist. Please check your version of Firefox to determine the correct path to use: mkdir /usr/lib64/firefox-3.x.x/plugins
  2. Create a symbolic link to the new Java plugin. Again please check your Firefox and JRE version for the correct paths: ln -s /opt/jdk1.6.0_18/jre/lib/amd64/libnpjp2.so /usr/lib64/firefox-3.0.5/plugins/

by David.Chiang@SageNResearch.com

First off, I may need to apologize to those who take offense at the equivalent of someone trying to lift spirits at a funeral, as I am not trying to make light of the seriousness of today’s challenging economic circumstances.

However, I subscribe to the philosophy of author Anthony Robbins and others that there is always a positive to any negative, and that a proper mindset is key to moving yourself forward, no matter what life throws at you. If life gives you lemons, it’s an opportunity to build a lemonade business empire.

Today, it is more important than ever to focus one’s mind on a positive path forward, because quite honestly, there are signs that the post-recession recovery could well be the opportunity of a lifetime for many of you!

It may seem perverse to have such a view given the prevalence of all the bad news, but history is on my side.

In fact, for those of you relatively early in your career, with at least 10 to 20 good working years ahead of you, I believe the career gods may well be smiling on you, as you have the best chances of catching the wave of the upcoming Biotech Revolution 2.0 — the one centered around proteins rather than DNA or cDNA.

Let me explain why this is so, and what you must know to win big in the next decades.

Sage-N Research is hosting its annual users’ meeting on the afternoon of Sunday, 31st May, immediately before the ASMS meeting in Philadelphia. We are proud to announce a compelling  agenda with talks from the principal developers of several of the key proteomics data analysis methods that are used as standard in the community, including SEQUEST, target-decoy search strategies, and Peptide/Protein Prophet.  The insights our clients will take away from this meeting will be very relevant to their use of Sorcerer, and promise to enhance their proteomics analysis productivity greatly.

2009 Users’ Meeting Arrangements

The meeting is open to in-warranty Sorcerer customers and by invitation only. Pre-registration is required. A light buffet and refreshments are being provided, and there will be a drawing for customer door prizes. Attending this meeting is your chance to win one of three Acer Aspire One netbook computers that we are giving away to our customers (must pre-register and be present to win)!

Date: Sunday 31st May 2009
Time: 1:30 PM to 5:00 PM
Address: Courtyard Marriott Hotel, 21 N. Juniper St, Philadelphia.
Room: Ballroom Level 1

Agenda

1:30 PM   Welcome and introductory remarks

1:45 PM    “What’s new in Sorcerer”
James Candlin, Sage-N Research, Inc.

2:05 PM    “Sequest analysis tips”
Jimmy Eng, University of Washington

2:45 PM    “Using target-decoy searching to visualize peptide identification signal-to-noise”
Dr. Joshua Elias, Stanford University School of Medicine

3:15 PM    Break and refreshments

3:50 PM   “Peptide identification and protein inference using PeptideProphet and ProteinProphet”
Dr. Alexey Nesvizhskii, University of Michigan

4:30 PM   Panel Discussion: “Putting it together: strategies for a productive proteomics analysis workflow”

4:50 PM   Concluding remarks

5:00 PM   End of meeting

Hear Dr. Laurence Brill, senior research scientist at the Burnham Institute (La Jolla, CA) describe his advanced proteomics setup with the SORCERER 2 system:

Click here to hear Dr. Laurence Brill

Reference: Laurence M. Brill, Khatereh Motamedchaboki, Shuangding Wu, and Dieter A. Wolf, “Comprehensive proteomic analysis of Schizosaccharomyces pombe by two-dimensional HPLC-tandem mass spectrometry”, Methods (2009), doi:10.1016/j.ymeth.2009.02.023.

Click here for another Success Profile

Many people I’ve talked to in the science and investment community equate Digital Biology with High-Throughput Biology. While related, they are not the same thing.

High-throughput is about speed, but digital (in the sense of Moore’s Law type geometric scaling) is about acceleration, or geometrically increasing throughput. The distinction is important for predicting the eventual successes of different technologies, and maybe your career if it is closely tied to a particular technology.

Moore’s Law exemplifies geometric scaling in semiconductors, and generally predicts that the number of transistors on a single chip doubles approximately every 18 months. If you work out the math, that is 10 doublings, or about 1000x, after 15 years. When that happens, a field is revolutionized once it reaches a tipping point.
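
The arithmetic behind the “about 1000x after 15 years” figure is easy to check. The helper name below is made up for this sketch, and the 18-month doubling period is the one quoted above.

```python
def growth_factor(years, doubling_months=18):
    """Total growth after `years` if capacity doubles every `doubling_months`."""
    doublings = years * 12 / doubling_months
    return 2 ** doublings

# 15 years at one doubling per 18 months is 10 doublings: 2**10 = 1024.
print(growth_factor(15))

# For comparison, half that time gives only about a 32x gain.
print(growth_factor(7.5))
```

This is why geometric scaling feels slow at first and overwhelming later: most of the total gain arrives in the last few doublings.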

Indeed, when I started working in Silicon Valley as a freshly minted MIT engineer in the mid-1980s, I was astonished that my friend’s company had a whopping 3 gigabytes of disks on their central computers (that’s for the ENTIRE mid-size company). Today, those 3GB fit on a USB pen drive, and you can get portable drives with 1 terabyte. In another 15 years, we may have portable 1 petabyte drives, and computers 1000x more powerful. It’s mind-boggling.

In contrast, high-throughput technologies include flow cytometry and robotics. While they are “fast” today, it is doubtful that they will become 1000x faster in 15 years. There are 96-well, 384-well, and 1536-well plates, but it is doubtful these will continue to geometrically scale to having million-well plates anytime soon.

Proteomics mass spectrometry, which often relies on ever faster and more sensitive electronics and sensors, is geometrically scalable for some time, and as such holds the possibility for a revolution. With continued scaling, it is possible to imagine when geometrically more proteins can be characterized from a single organelle.

Such will be the power of the Digital Biology Revolution.

APEX (‘Absolute Proteomics Expression’) is a technique developed by Lu et al. for label-free quantitation of proteins based on MS-MS spectral counting of peptides. Unlike basic methods of this sort, which suffer from variable detection probabilities that depend on the physicochemical properties of the peptides, APEX includes correction factors that predict the detection rates of the peptides for a better protein quantitation result.

There is an open source APEX Quantitative Proteomics Tool that implements this technique and that can use Sequest-based protein IDs as analyzed by the Trans-Proteomic Pipeline. Sorcerer users had the idea of using the tool in conjunction with Sorcerer, and now we have developed a workflow and MUSE script to help other users use this combination.

For more information, please read the application note ‘Sorcerer Workflow for the APEX Quantitative Proteomics Tool’.

President Obama unveiled a stimulus package in February that includes about $10B funding for NIH over two years. He specifically called for cancer research, which will get about $1.26B. In March, he will lift federal funding restrictions for stem cells. Other countries may follow suit. Since cancer and stem cell research make up more than 2/3 of advanced proteomics research today, this is very good news for proteomics!

Stimulus grants are likely one-time grants, so they should be viewed as start-up grants for building up or completing your advanced research capability. In particular, focus on tools that increase automation and reduce manual intervention, including any tech support and maintenance that can improve your research productivity over the next 3 years.

For advanced “Proteomics 2.0” analyses capable of large-scale analysis of important PTMs (phosphorylation and ubiquitination), you would need (1) one or two high-throughput mass-accurate mass spectrometers, (2) a high-throughput software workflow capable of sensitive PTM analysis, (3) a robust compute server and storage system, and (4) several years of software and hardware warranty and maintenance.

For example, this is a proven basic setup suitable for advanced phospho-proteomics for cancer and stem cell research:

1) LTQ-Orbitrap mass spec
2) SORCERER 2 integrated data appliance
3) SORCERER ISIS-10 storage system with 10 terabytes
4) Recommended optional software on PC: Proteome Discoverer, Scaffold
5) Recommended optional software on SORCERER: Mascot, Phenyx

The SORCERER 2 IDA system includes a high-throughput SEQUEST and two integrated high-throughput workflows (Scaffold Batch and Trans-Proteomic Pipeline), as well as specific tools for phosphorylation and an integrated scripting environment for workflow customization. The SORCERER ISIS storage subsystem works directly with the SORCERER IDA to provide sufficient secure storage for several years of typical throughput. They both share the same warranty for 3 years, simplifying the IT maintenance.

A more advanced system, particularly for statistically robust biomarker discovery, including isotope labelled quantitation (iTRAQ), is the following:

1) LTQ-Orbitrap with HPLC
2) MALDI-TOF/TOF mass spec (e.g. ABI 4800)
3) SORCERER 2 integrated data appliance
4) SORCERER ISIS-20 storage system with 20 terabytes
5) Recommended optional software on PC: Proteome Discoverer, Scaffold
6) Recommended optional software on SORCERER: Mascot, Phenyx

In most cases, Sage-N Research can assemble the entire data analysis system with the noted tools pre-installed and pre-configured as a plug-and-play system with unified warranty.

Not surprisingly, what we recommend coincides with our own product portfolio. Undoubtedly, some would think that we are simply recommending what we sell.

Actually, quite the opposite is true — we sell what we would recommend. As the only prominent search engine provider that doesn’t promote our own proprietary search algorithm (we resell SEQUEST, Phenyx, and X!Tandem inside SORCERER, and would pre-install Mascot on request), we are free to pick and choose the best-of-class software and hardware components to deliver the most robust, sensitive workflow systems.

Advanced proteomics holds great promise, especially for cancer and stem cell research. Unfortunately, six years after Scott Patterson proclaimed in Nature Biotechnology that data analysis is proteomics’ Achilles heel, it is still true. The fact is, there is no way you can realize the full potential of an Orbitrap for advanced proteomics using just a PC.

Please contact me personally to see if we can help. The best way is to send me an email at: david@SageNResearch.com. Together, we can realize the full potential of proteomics to benefit everyone.

We are pleased to announce the availability of the ISIS (Integrated Storage and Information System), which is configured and integrated to work directly with the SORCERER Enterprise bladecenter system to provide 4 to 100+ terabytes of integrated, protected storage for proteomics, genomics, imaging, and other repository needs. A second backup ISIS system can be configured offsite to provide additional backup and disaster recovery needs. To simplify maintenance and warranty for our clients, it will be covered under the same warranty plan as the SORCERER system for 3 years or 5 years.

The base ISIS system will provide approximately 4.1 terabytes of secure storage in a “2U” height, rack-mount system, consisting of twelve 450 GB SAS disks with 2 disk redundancy in RAID6.
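
The capacity figure can be sanity-checked with simple RAID6 arithmetic: two disks’ worth of space goes to redundancy, leaving ten data disks. The sketch below assumes that the quoted 4.1 terabytes is expressed in binary units and ignores hot spares and file system overhead; both of these are assumptions of this example, not statements from the product specification.

```python
def raid6_usable_bytes(n_disks, disk_bytes):
    """RAID6 reserves the equivalent of two disks for parity."""
    return (n_disks - 2) * disk_bytes

# Twelve 450 GB disks, as described for the base ISIS system.
usable = raid6_usable_bytes(12, 450 * 10**9)

print(f"{usable / 10**12:.2f} TB (decimal units)")
print(f"{usable / 2**40:.2f} TiB (binary units)")
```

Under these assumptions the ten data disks give 4.5 decimal terabytes, which is about 4.1 when counted in binary units.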

In most countries, the ISIS system consists of the following:

- ISIS storage integration software interface running on SORCERER platform
- Fujitsu ETERNUS DX80 with single controller
- Approximately 4.1TB usable (12 x 450GB SAS disks in RAID6) per 2U rack, with up to 20 racks
- Min 3 year warranty is included (subject to the TSP coverage of the SORCERER)

Note that future expansion to 100+ TB will require additional ISIS expansion units or higher density SAS drives.

New clients can order the SORCERER Enterprise blade system with the ISIS system together as two rack-mount units. Clients with newer SORCERER 2 integrated data appliances with at least 8 CPU cores can simply add the ISIS to their existing system. (Older SORCERER systems will require a hardware upgrade.)

Please contact sales@SageNResearch.com for more information.

We are lining up an exciting program for our SORCERER clients on Sunday, May 31, 2009, from 1:30pm to 5:00pm, in Philadelphia, just before the ASMS reception:

http://www.asms.org/Default.aspx?tabid=209

Like last year, we will have plenty of toys and give-aways. It will continue to be a closed meeting, open to in-warranty clients and special invited guests only. Pre-registration will be required due to space limitations.

Please stay tuned for more details.

Here are some notes from the TPP support group on using Tandem Mass Tags (i.e. similar to iTRAQ):

http://groups.google.com/group/spctools-discuss/browse_thread/thread/98dcb28f8dfa2349?hl=en

Here is Thermo’s TMT information:

http://www.thermo.com/com/cda/article/general/1,,20815,00.html

Note that TMT pre-dates iTRAQ, and is a significantly larger molecular tag. At present, iTRAQ has a larger market share than TMT.

There is a continuing market shift toward “appliances” (application-specific systems) and away from software and hardware.

During this last week, both Microsoft and Intel have announced rare layoffs.

So who is doing well?

Apple (nasdaq:AAPL), for one, experienced unexpectedly strong financial results: http://blogs.zdnet.com/BTL/?p=11563

Its sales are increasingly dominated by its iPod and iPhone appliances — computers that are tuned for one application for increased usability and reliability.

Network Appliance (nasdaq:NTAP), for another. NetApp sells storage appliances that are servers configured only for maintaining a large file system. Its products are also tuned for one application for usability and reliability. NetApp beat out Google as the top place to work in Fortune Magazine’s survey: http://www.siliconvalley.com/opinion/ci_11529119

Why should you in proteomics research care?

Because data analysis continues to be the Achilles Heel of proteomics (see Scott Patterson’s Nature Biotech 2003 article), and IT trends determine the optimal value proposition comprising hardware and software systems.

The shift toward appliances is driven by changing cost dynamics.

Twenty years ago, you would buy $500 software for your $5K computer, so you needed a general-purpose computer to handle multiple software applications.

Today, it’s the reverse: you are more likely to buy $5K software for your $500 computer. Therefore, it makes less sense to re-use hardware for cost reasons alone, especially if it compromises application usability.

I think it is amazing that a high-end iPod costs more than a low-end Dell PC with monitor! Such is the case when the value is in the application, not the hardware.

Therefore, we expect that advanced proteomics analysis — where your server may see 80%+ usage — may similarly be best served by a dedicated analysis appliance, such as the SORCERER integrated data appliance, for improved usability and reliability.

Common PC proteomic software is designed primarily to be easy to use with low throughput and small datasets of up to a few thousand spectra. PC programs like Mascot or other software generally work fine at this scale.

However, high-throughput and large-scale analysis (e.g. 100K+ spectra experiments), a foundation capability for biomarker discovery, molecular profiling, and advanced post-translational modification research, requires a different methodology because of the increased need for sensitivity, noise reduction, and automation.

Horses for Courses

This British maxim states that what may be suitable for one situation may not be suitable for another, as no one race horse is ideal for all course conditions.

When you need to go somewhere, you would walk, drive, or take a plane depending on whether the distance is 1, 100, or 10000 miles/kilometers, respectively.

If your annual income is USD $1K, $100K, or $10M, you would prepare your tax forms manually, use the TurboTax software, or hire a very expensive accountant, respectively.

However, I still occasionally meet scientists who mistakenly believe they can evaluate a large-scale workflow by using a simple BSA or other standard commercial mixture.

Advanced, large-scale analysis is highly specialized, and requires a lot of messy statistics tested against big datasets for true validation. Unless you enjoy that sort of thing, it’s easier to find someone else you respect who has done the heavy statistical lifting for you, so you can focus on what’s really important for you.

Two common large-scale workflows, both use SEQUEST

by David.Chiang@SageNResearch.com

In the classic A Tale of Two Cities, Charles Dickens wrote the famous line, “It was the best of times, it was the worst of times …”

This is an accurate description for these extraordinary times. It has been very difficult for those among our friends and families who suffered losses in jobs, housing, and retirement accounts. With things likely to get worse before getting better, these are indeed the worst of times in recent memory.

However, it is important to keep in mind that extraordinary times breed extraordinary opportunities. Many great inventions and companies were created or forged during deep recessions. If history is any indication, these will also turn out to be the best of times, at least for those who can plan and act strategically.

In the middle of a tsunami, whether oceanic or financial, there is not much one can do but try to hang on and not get swept away. Afterward though, the sea level returns to normal and renewal begins, eventually leaving everything better and fundamentally stronger than it once was.

Looking at the positive side of the global correction, the pendulum will likely swing back toward substance (particularly medical research and technology) and away from fluff. It would be refreshing for the world to value drug discovery and stem cell research on par with YouTube and Facebook, for instance.

During downturns, there is also a “flight to quality.” This is good news for those of us focused on high-end quality in our products and services. This is true for Core Facilities like you as well as tool providers like us. Only those of us who bring unique value survive during times like these, and we alone will thrive once recovery gets under way.

Indeed, the future outlook is brighter than ever for those who have special expertise in the right domain. In fact, I believe there are special opportunities ahead for those well-versed in advanced proteomic data analysis.

The “Digital Biology” revolution happening now will change everything

MSQuant, from the Centre for Experimental Bioinformatics (DK), is a leading tool used by the Matthias Mann group and others for quantitative proteomics, and in particular, SILAC analysis. It is a Windows program that is designed to take MS-MS raw files and protein IDs in the form of a Mascot Peptide Summary Report. So up until now, if you wanted to use MSQuant, the only practical way of doing it was to have Mascot installed and run it first.

Now, however, there is another option: MSQuant users can use a Sequest/TPP-based toolchain for protein ID and, using a conversion utility, transform the ProtXML/PepXML files from TPP into a format which MSQuant can load. Using Sorcerer’s scripting environment, MUSE, the transform can be done automatically as post-processing of a Sorcerer search. A further advantage of doing it this way is that the Sequest/TPP toolchain needs no special preparation of the input peaklist files to extract all the information that MSQuant requires for links to the scans in the raw file.

A ‘Muse’ is a Greek goddess with inspirational and creative power — perhaps someone you might expect to hang out with Sorcerers!

Indeed, MUSE(R) is a recursive acronym for “MUSE Utilities for Search Engines”. The MUSE platform is developed to allow rapid prototyping of new scoring algorithms, such as for “Proteomics 2.0” analyses of PTMs, ETD, and quantitation.

There are currently 3 big challenges in proteomic data analysis:

  1. Data scale and throughput
  2. Workflow integration
  3. Analysis flexibility

The few proteomic labs with the compute servers to handle large-scale data-sets, the know-how to integrate robust workflows, and the programming capability to develop semi-custom analyses and algorithms can do big science. Today, much more than instrumentation, data analysis capability separates the ‘haves’ from the ‘have-nots’ in proteomics research.

The SORCERER 2 appliance already addresses throughput and workflow integration. With the new MUSE integrated scripting platform, the SORCERER 2 appliance can now address all three to provide the most advanced platform for advanced proteomic data analysis.

The MUSE platform is specifically designed to allow trained researchers to quickly interrogate, filter, and manipulate their large-scale data-sets interactively, along with easy-to-use scripting libraries for developing new scoring functions that compare spectra against a peptide sequence with PTMs.

Technically, the MUSE platform consists of two components: the MUSE scripting language and the MUSE scripting environment.

The MUSE scripting language is a proteomics extension of the Lua language, popularized by online video games due to its speed and extensibility. (It is considered one of the fastest scripting languages, is easier to read than Perl, and is syntactically similar to Java.)

The MUSE scripting environment is based on the Bash shell, and includes Perl, PHP, sed, awk, and other popular tools on a 64-bit Enterprise Linux platform, with three decades of robust history.

Even with the very first MUSE platform, it is possible to write single lines to make regular expression substitutions, sort search results by score or delta-mass, write new scoring functions, re-arrange or combine fields, or change formats.

In one test case, we were able to write out the search results into a virtual spreadsheet with 6000 rows and 6000 columns that can be filtered and sorted at will. With adequate training and tech support, researchers can, for example, rapidly sort results by XCorr or mass difference, search for phosphorylated sites, and convert PTM symbols to actual masses without programming.
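To give a flavor of these kinds of manipulations, here is a minimal sketch written in Python rather than MUSE, since the MUSE library calls are not shown in this post. The record fields, the symbol-to-mass table, and the function names are all assumptions made for illustration; on a real system the PTM symbols map to whatever modifications were defined for the search.

```python
import re

# Hypothetical symbol table: which mass each PTM symbol stands for depends on the search setup.
SYMBOL_MASS = {"*": 79.96633, "#": 15.99491}


def symbols_to_masses(peptide):
    """Replace each PTM symbol with an explicit mass tag, e.g. 'S*' -> 'S[+79.9663]'."""
    return re.sub(r"[*#]", lambda m: f"[+{SYMBOL_MASS[m.group(0)]:.4f}]", peptide)


def sort_by_xcorr(hits):
    """Highest XCorr first."""
    return sorted(hits, key=lambda h: h["xcorr"], reverse=True)


def sort_by_delta_mass(hits):
    """Smallest absolute precursor mass difference first."""
    return sorted(hits, key=lambda h: abs(h["delta_mass"]))


# Made-up records standing in for search results.
hits = [
    {"peptide": "K.LYNS*EPK.E", "xcorr": 2.1, "delta_mass": 0.8},
    {"peptide": "R.AM#GGK.T", "xcorr": 3.4, "delta_mass": -0.2},
]
best = sort_by_xcorr(hits)[0]
phospho_hits = [h for h in hits if "*" in h["peptide"]]
```

Each operation is a line or two, which is the point of the paragraph above: filtering and re-sorting large result sets should not require a full program.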

You can see MUSE examples at the Proteomics 2.0 blog by searching for “MUSE”:   http://www.proteomics2.com/ .

The MUSE script ‘sorcout.mu’ can be used to summarize the top peptide scores from SORCERER-SEQUEST into a CSV format for importing into Excel.

This is useful for performing non-standard analyses (e.g. separate from PeptideProphet or Scaffold), or for further manipulation of the data using scripting languages like Perl or MUSE.

Simply type “sorcout.mu” in the MUSE box (under Advanced Options in the Search page).

It can also be run interactively after the search, by running it inside the output directory for the search job (e.g. “/home/sorcerer/output/45/”), just above the ‘original’ directory.

It will search all subdirectories for *.out files, and turn the top peptide from each *.out file into a single CSV line.
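For readers who want to see the general idea, here is a rough Python sketch of that behavior. It is not the sorcout.mu script itself. It assumes, purely for illustration, that the top hit in each *.out file is the first line beginning with "1." and that its columns are separated by whitespace; the function names are ours.

```python
import csv
import re
from pathlib import Path

# Assumption: the top-ranked hit is the first line that starts with "1." after optional spaces.
TOP_HIT = re.compile(r"^\s*1\.\s")


def top_hit_fields(out_text):
    """Return the whitespace-split fields of the top-hit line, or None if there is none."""
    for line in out_text.splitlines():
        if TOP_HIT.match(line):
            return line.split()
    return None


def summarize_out_files(root, csv_path):
    """Walk all subdirectories of root and write one CSV row per *.out file with a top hit."""
    rows = 0
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        for out_file in sorted(Path(root).rglob("*.out")):
            fields = top_hit_fields(out_file.read_text(errors="replace"))
            if fields is not None:
                writer.writerow([str(out_file.relative_to(root))] + fields)
                rows += 1
    return rows
```

A call such as `summarize_out_files("/home/sorcerer/output/45", "summary.csv")` would mirror the interactive usage described above; the output file name here is only an example.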

As well, the MUSE script can be copied and modified as needed to customize to a specific format.

Note: sorcout.mu is available in Sorcerer PE v3.5+ revisions.

Electron transfer dissociation (ETD) is a promising dissociation technology for analyzing labile post-translational modifications (PTMs) such as phosphorylation. Unlike CID, ETD generates positively charged c and z* (z-radical) ions instead of b and y ions. There are two caveats in using standard SEQUEST for ETD tandem mass spectra:

  1. Standard c/z option doesn’t compute z* ions correctly.
  2. Standard SEQUEST allows only low charge states, and would not work for highly charged, long peptides.

It is important to note that z* ions are not the same as z ions, and have an extra hydrogen (1.008 Da monoisotopic mass). This means that the standard SEQUEST option of searching c/z ions will not search ETD spectra correctly, since the computed z ions will have the wrong mass. On SORCERER, correct c/z* ions can be obtained using user-defined static peptide terminus modifications on standard b/y searches, as described below. As well, SORCERER-SEQUEST* allows very high precursor charge states (up to +255) in order to accommodate highly charged species. Here is how to search ETD spectra using SORCERER …

1. Define peptide terminus mods that shift b/y ions to c/z* ions, and use these for ETD searches.

Define the following static peptide terminus modifications using the web interface (click “Add/edit modifications…” on the Search page, then click “New/edit modifications” on top):

  • Name: “BtoC” with Mono Mass: “17.02655” and Type=“N-Terminus”
  • Name: “YtoZrad” with Mono Mass: “-16.01872407” and Type=“C-Terminus”

In both cases, Residue is left blank.
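The two mass shifts above can be cross-checked from atomic masses. The short Python sketch below assumes the usual fragment-ion relationships (a c ion is the b ion plus NH3; a z ion is the y ion minus NH3; the z* radical carries one extra hydrogen) and standard monoisotopic masses for H and N. The variable names are ours.

```python
# Standard monoisotopic atomic masses in Da.
H = 1.00782503
N = 14.0030740

NH3 = N + 3 * H                   # ammonia

# b -> c: add NH3 at the N-terminal fragment.
b_to_c = NH3

# y -> z*: remove NH3, then add back one hydrogen for the radical.
y_to_z_radical = -NH3 + H

print(f"BtoC    = {b_to_c:+.5f} Da")
print(f"YtoZrad = {y_to_z_radical:+.8f} Da")
```

Both values agree with the modification masses entered above to within rounding of the atomic masses, and the difference between a z ion and a z* ion comes out as one hydrogen, about 1.008 Da.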

2. Define a new search profile that incorporates the above peptide terminus mods.

In the Search page under “(2) Choose a Search profile”, select the most similar existing search profile, then click “Edit this profile…”. Be sure to name it something different and memorable, then select the above 2 mods under “Terminus modifications” and “Static”. Select other applicable options.

3. Include a MUSE script to generate an Excel-readable tab-delimited text (TDT) summary file of the SEQUEST top peptides.

In many cases, it can be useful to have a TDT file of the SEQUEST outputs for your Excel analysis, especially for ETD analysis of purified proteins or very simple mixtures. (See note below.) Simply include the MUSE script “sorcout.mu” (part of Sorcerer PE v3.5) as follows: Click Advanced Options “Expand”, and type “sorcout.mu” into the MUSE custom script box. (From now on, any submitted search will have a “sorcout.tdt” file automatically created in the appropriate ‘output’ directory.) Save the search profile. It is now ready for SEQUEST searches on ETD spectra.

4. Try the search using this test DTA file.

Download the following ETD test DTA file and search against SwissProt.

Right Click to Download Sample ETD DTA file

If using TPP’s built-in Spectrum Viewer, simply set the display options to “c” and “z” ions (here, “z” really means “z*”). The z* ions should match pretty well against peptide “KLYNKEPSEIVELK”.

 

Note that many common post-SEQUEST probability re-scoring algorithms, such as PeptideProphet or Scaffold, are not tuned for ETD scores. From first principles, we believe that the resulting probabilities may not be wrong per se, but rather be lacking in specificity. Therefore, particularly for ETD analysis of PTMs in purified proteins or other simple mixtures, we recommend downloading the SEQUEST scores to an Excel spreadsheet for manual interpretation rather than using CID-tuned tools.

*The Yates Lab’s version of SEQUEST has 2 code modifications for ETD. The first is the increased charge state (same as in SORCERER-SEQUEST). The second is exclusion of the Proline cleavage, which is not implemented in standard SORCERER-SEQUEST. However, this can be done with a MUSE post-processing step in the future if it is found to have a large effect.

As always, in-warranty clients can contact our TechTeam for help on this and other advanced capabilities.

Article on Sage-N Research and Thermo Fisher Scientific collaboration:

http://www.drugdiscoverynews.com/index.php?newsarticle=2475

N-linked protein glycosylation is a common post-translational modification (PTM) in many cellular processes. Atwood et al (RCMS 2005) describe a tandem mass spec-based methodology to analyze N-linked glycopeptides.

Enriched glycopeptides are treated with peptide N-glycosidase F, which removes the carbohydrate moieties from the peptide backbone. Deglycosylated peptides are analyzed with a tandem mass spec. The resulting MS/MS spectra are searched against a modified protein sequence database that allows only PTMs on N’s within the consensus sequence N-x-y, where x is any residue other than proline, and y is either serine or threonine.

To analyze this PTM on the deglycosylated peptides on SORCERER, we need to search for a monoisotopic mass shift of 0.9840 Da on N’s only in the {N[^P][ST]} consensus sequence.

To search this PTM on the SORCERER, we do the following 3 steps:

1) Create a new protein sequence database that replaces ‘N’ with ‘J’ in the consensus sequence.

2) Prepare this new sequence database for searching by defining ‘J’ to have the same mass as ‘N’ using a static modification setting on ‘J’.

3) Submit a search on SORCERER with a variable modification search on ‘J’ with a mass shift of +0.9840 Da.

Create New Protein Database

Use the MUSE script ‘nlinkglyco-fasta.mu’ (part of SORCERER PE v3.5) to create a new protein sequence database that replaces each N in the consensus sequence with J.

Simply log onto SORCERER, go to directory ‘/home/sorcerer/fasta/’ where the protein sequences are, and create a new fasta file from an existing one (for example, create ‘ipi.human_n2j.fasta’ from ‘ipi.HUMAN.fasta’). Then prepare this new fasta file for searching as you would any other protein sequence file.

Once you log onto the SORCERER, type the following 2 commands (do not type the ‘sorc$’ which is the SORCERER prompt):

   sorc$ cd /home/sorcerer/fasta/

   sorc$ nlinkglyco-fasta.mu < ipi.HUMAN.fasta > ipi.human_n2j.fasta

The latter command literally means to run the MUSE script using “standard input” from file ipi.HUMAN.fasta (after the ‘<’ symbol) and sending the “standard output” to the new file ipi.human_n2j.fasta (after the ‘>’ symbol).

(The script may be easily copied and modified for another consensus sequence. Contact TechTeam for details.)
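For those curious about what such a conversion involves, here is a minimal Python sketch of the N-to-J substitution. It is not the nlinkglyco-fasta.mu script, whose source is not shown here; the function names are ours. It reads FASTA on standard input and writes to standard output, joining each record's sequence lines first so that a motif split across a line break is still found.

```python
import re
import sys

# N followed by any residue except P, then S or T. The lookahead does not consume
# the two following residues, so overlapping sites such as "NNST" are both converted.
SEQUON = re.compile(r"N(?=[^P][ST])")


def convert_sequence(seq):
    """Replace each N inside the N-x-[S/T] consensus (x not P) with J."""
    return SEQUON.sub("J", seq)


def convert_fasta(lines, width=60):
    """Yield output lines: headers unchanged, sequences converted and re-wrapped."""
    header, chunks = None, []

    def flush():
        if header is not None:
            yield header
            seq = convert_sequence("".join(chunks))
            for i in range(0, len(seq), width):
                yield seq[i:i + width]

    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            yield from flush()
            header, chunks = line, []
        elif line:
            chunks.append(line)
    yield from flush()


if __name__ == "__main__":
    for out_line in convert_fasta(sys.stdin):
        print(out_line)
```

Saved under a hypothetical name such as n2j.py, it would be run the same way as the commands above, with `<` and `>` redirecting the input and output files.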

Prepare Database for Searching

When the new protein sequence database is prepared for searching, assign a static modification ‘MakeN’ of -9885.95707256 Da. This will cause the final ‘J’ mass to be the monoisotopic mass of 114.04292744 Da. (The normally unused codes ‘J’ and ‘U’ are set at 10,000 Da to flag any inadvertent usage.) The resulting peptide database will be used for subsequent searching.
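The arithmetic behind the ‘MakeN’ value can be checked in a few lines of Python. This is only a worked example using the numbers quoted above; the variable names are ours.

```python
PLACEHOLDER_MASS = 10000.0        # mass assigned to the normally unused code 'J'
ASN_RESIDUE_MASS = 114.04292744   # monoisotopic mass of an N residue, as quoted above
DEAMIDATION_SHIFT = 0.9840        # variable modification searched on 'J'

# Static modification that brings 'J' down from the placeholder to the mass of N.
make_n = ASN_RESIDUE_MASS - PLACEHOLDER_MASS

# Mass of a 'J' residue when the variable modification is also applied.
modified_j = PLACEHOLDER_MASS + make_n + DEAMIDATION_SHIFT

print(f"MakeN static mod: {make_n:.8f} Da")
print(f"J with Nlinkglyco: {modified_j:.4f} Da")
```

So an unmodified ‘J’ scores exactly like ‘N’, and a modified ‘J’ weighs about 115.0269 Da.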

SORCERER Search

The search can now be submitted by creating a user-defined variable modification ‘Nlinkglyco’ with mass of 0.9840 Da on the residue ‘J’ against the new peptide database.

 

We thank Dr. Rebekah Gundry from the Van Eyk Lab at Johns Hopkins for bringing this SORCERER application to our attention!

Reference: Atwood et al (Rapid Comm Mass Spec 2005; 19: 3002-3006 DOI: 10.1002/rcm.2162)

Dr. John Yates from the Scripps Research Institute gave the talk “Driving Biological Discovery using Quantitative Mass Spectrometry” at the 2008 Proteomics 2.0 Meeting hosted by Sage-N Research.

 

The audio MP3 file is available by download here (click to play, right click to download):

   SageN002_JYates_2008Jun_57m.mp3

The complete slideset is available by download in 5 parts here (click to view, right click to download):

  SageN002_JYates_2008Jun_part1.pdf

   SageN002_JYates_2008Jun_part2.pdf

   SageN002_JYates_2008Jun_part3.pdf

   SageN002_JYates_2008Jun_part4.pdf

   SageN002_JYates_2008Jun_part5.pdf

The meeting was held on June 1, 2008 in Denver, just before the ASMS conference.
 

by David.Chiang@SageNResearch.com

Orbitraps and other fast ion trap mass spectrometers (e.g. FT, LTQ) are popular instruments for discovery proteomics research.

The SEQUEST cross-correlation score is almost tailor-made for the spectral characteristics of ion trap data, whose information-rich spectra are challenging due to multiply-charged ions reported with relatively low fragment mass accuracy. This is especially important for analyzing noisy spectra that arise from low-abundance peptides and phosphorylated peptides, where the information content is embedded in the abundant small peaks.

However, you may be unaware of how the basic SEQUEST functionality has evolved from the first ‘sequest27’ prototype program to the latest SORCERER-SEQUEST implementation.

Software continues to evolve to adapt to new requirements. Like a home remodeling job that never ends, at some point it becomes more practical to start over from scratch. After all, maintenance costs are several times higher than the initial development costs over the life of a software product.

The recommended architecture for high-throughput analysis is a client-server system architecture, which separates the interactive user client computer from the heavy-duty number-crunching server. This simplifies the sharing, updating, and backup of the central server, and isolates it from viruses and other sources of system instability from the user accessible client PCs.

Sequest27

Proteomic search engines were first invented by John Yates and Jimmy Eng at the University of Washington in the early 1990s, based on the novel idea that a peptide sequence can be inferred not just from the tandem mass spectrum alone (i.e. de novo sequencing), but using known protein sequences as a reference.

The prototype search engine software was a standalone program named ‘sequest27’ comprising approximately 3000 lines of C code. The source code has since been separately maintained by the Yates Lab and by Thermo, with PTM searches and other modifications added later.

The ‘sequest27’ program processes one mass spectrum at a time, and searches a protein sequence database from beginning to end each time it is run. For example, to analyze a MudPIT experiment with 8,000 spectra, the ‘sequest27’ program is run exactly 8,000 times to generate 8,000 output files, with no attempt to use information from one ‘sequest27’ run to another.

SEQUEST Cluster

The simplest way to scale up the throughput is to run the same program on many computers at once, such as in a Beowulf cluster architecture (http://www.beowulf.org/). 

The SEQUEST Cluster (“SC”) product once marketed by ThermoFinnigan uses this approach, with typically 4 to 32 Linux slave node computers running ‘sequest27’ under the control of the Windows master node computer running Bioworks.

The SC architecture partitions the set of input spectra into smaller sets for each node, and uses the master node to aggregate the results. While this approach is simpler to implement than partitioning the protein sequences, it requires each local disk to contain the same protein files, resulting in inefficient disk usage (e.g. a 16-node cluster searching the NCBI nr file must store 16 identical copies). As well, it makes the indexed search capability impractical. If the local files are large, then manually copying the files across the network to each node will take a lot of time.

To proteomics researchers new to clusters, the SC architecture seems to offer two benefits: (1) higher throughput than a single computer, and (2) ability to expand throughput in the future by adding nodes. 

However, the devil is in the details. In practice, the cluster may not offer higher throughput than an optimized, non-cluster architecture. As well, future expansion for this software architecture is impractical in light of Moore’s Law.

Depending on the search conditions, one high-end server (say with 8 GB RAM, 1.6 terabyte disk) with an optimized software architecture can outrun a 16-node cluster, whereby each slave node has 1/16th the resources (i.e. 512 MB RAM and 100 GB disk). And it will be simpler to maintain, easier to program, and approximately 16x more reliable. The partitioned RAM and disk resources make system-wide optimization difficult.

Future expansion is also impractical beyond the first year for the SC architecture, since all the slave nodes are assumed to have identical specs. With Moore’s Law predicting 2x performance increase every 18 months at the same price, it is more effective to replace the computing hardware every 2 to 3 years with a brand-new system rather than to try to buy older nodes to add to an old cluster.

Server vs. PC

Servers are not just big Personal Computers (PCs). Quality server hardware is designed for reliable 24/7 multi-processing and continuous disk access, unlike PC hardware designed for the cost-sensitive consumer market.

Robust server operating systems like Enterprise Linux are designed to simultaneously run dozens of independent programs in multi-user environments and to prevent crashed programs from affecting other programs.

Server programs have fewer restrictions than PC programs designed for easy installation and use by non-experts. Therefore, they can incorporate powerful server modules like Perl, PHP, Ruby on Rails, Apache, and MySQL, but require IT expertise for installation and configuration. 

One important benefit of the server platform is ease of integration, which is increasingly important as the workflow evolves from just the search engine to a full proteomic workflow. 

In contrast, integration can be very complex on the standard Windows operating system. For example, some mass spec software from different vendors cannot co-exist on the same Windows PC. In general, PC software is easy to install but difficult to integrate, while server software tends to be the opposite.

SORCERER-SEQUEST

The SORCERER software architecture was developed from the ground up as a server platform for high-throughput search engines and workflows, with focus on robustness, scripting flexibility, and scalable performance. 

The SORCERER platform is not hard-coded for SEQUEST, but instead is a general-purpose proteomics search platform that uses the scoring subsystem for algorithm customization. (It was initially prototyped with X!Tandem, and later introduced with SEQUEST.)

At the heart of the SORCERER software architecture is the micro-partitioning of a search job into self-contained “micro-jobs” that are distributed and managed by a relational database.

In order to further reduce search time, the protein sequences are re-arranged into a peptide-centric data structure when they are first loaded into the SORCERER and “prepared” for peptide searches. Specifically, protein sequences are pre-digested in silico into unmodified peptides, which are sorted by mass, and partitioned into 0.5 GB chunks called ‘seqblobs’.
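The idea of a peptide-centric structure can be illustrated with a toy Python sketch. This is not the SORCERER implementation: it assumes a simple trypsin-like rule (cleave after K or R unless followed by P) with no missed cleavages, uses standard monoisotopic residue masses, and partitions by peptide count rather than by 0.5 GB of data. All names are ours.

```python
# Standard monoisotopic residue masses in Da, rounded to 5 decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def digest(protein):
    """Cleave after K or R unless the next residue is P (assumed rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        at_end = i == len(protein) - 1
        if at_end or (aa in "KR" and protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def build_blobs(proteins, blob_size):
    """Digest all proteins, de-duplicate, sort by mass, and split into fixed-size blobs."""
    unique = {pep for protein in proteins for pep in digest(protein)}
    ordered = sorted(unique, key=lambda pep: (peptide_mass(pep), pep))
    return [ordered[i:i + blob_size] for i in range(0, len(ordered), blob_size)]
```

Because each blob covers a contiguous mass range, a spectrum only needs to be scored against the blobs whose range overlaps its precursor mass, rather than against the whole protein file.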

When a large search job is submitted to the SORCERER, it is added to the queue by the queuing subsystem. The Sorcerer PE Application Layer subsystem partitions each search job into possibly thousands of self-contained micro-jobs, each containing 300 spectra with associated seqblobs. With PTM searches, the same spectra unit may be searched against different seqblobs with different mass ranges. (For example, a spectrum with 1000 amu precursor mass may have its unmodified peptide sequence be 1000 amu with no mods, or 920 amu with a single phospho-site.)
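A small Python sketch may make the partitioning concrete. It is only an illustration under simple assumptions: spectra are split into units of 300, and for each precursor mass we list the unmodified peptide masses that would need to be looked up for 0, 1, ... variable modifications. The function names are ours, and the 80 amu phospho mass is the rounded figure used in the example above.

```python
def spectra_units(spectra, unit_size=300):
    """Split a list of spectra into consecutive units of at most unit_size."""
    return [spectra[i:i + unit_size] for i in range(0, len(spectra), unit_size)]


def candidate_base_masses(precursor_mass, mod_mass, max_mods):
    """Unmodified peptide masses consistent with 0..max_mods copies of one variable mod."""
    return [precursor_mass - k * mod_mass for k in range(max_mods + 1)]


# 650 placeholder spectra become three units: 300, 300, and 50.
units = spectra_units(list(range(650)))

# A 1000 amu precursor with up to one phospho-site (about 80 amu).
masses = candidate_base_masses(1000.0, 80.0, 1)   # [1000.0, 920.0]
```

Each candidate mass may fall in a different seqblob, which is why one spectra unit can end up paired with several seqblobs in a PTM search.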

All the micro-jobs are recorded in a MySQL relational database. Available CPU cores from either the master or slave nodes will query the database for the next micro-job, and submit the results when completed. 
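The pull-based pattern described here can be sketched in Python. To keep the example self-contained it uses the standard-library sqlite3 module as a stand-in for MySQL, and the table layout, status values, and function names are assumptions for illustration, not the SORCERER schema.

```python
import sqlite3


def make_queue(n_jobs):
    """Create an in-memory job table with n_jobs pending micro-jobs."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions by hand
    conn.execute(
        "CREATE TABLE microjobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL, result TEXT)"
    )
    conn.executemany("INSERT INTO microjobs (status) VALUES (?)", [("pending",)] * n_jobs)
    return conn


def claim_next(conn):
    """Mark the oldest pending micro-job as running and return its id, or None if none remain."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock so two workers cannot claim the same job
    row = conn.execute(
        "SELECT id FROM microjobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is not None:
        conn.execute("UPDATE microjobs SET status = 'running' WHERE id = ?", (row[0],))
    conn.execute("COMMIT")
    return None if row is None else row[0]


def submit_result(conn, job_id, result):
    """Store a finished micro-job's result."""
    conn.execute(
        "UPDATE microjobs SET status = 'done', result = ? WHERE id = ?", (result, job_id)
    )


def pending_count(conn):
    return conn.execute("SELECT COUNT(*) FROM microjobs WHERE status != 'done'").fetchone()[0]
```

A worker loop simply calls `claim_next`, scores the micro-job, and calls `submit_result` until `claim_next` returns None. Because workers pull work rather than having it pushed to them, faster cores naturally take on more micro-jobs.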

Since each seqblob contains pre-searched peptide information, each micro-job performs only the scoring function, which is the only part customized to SEQUEST or other search engines. (Before the advent of multi-core CPUs, FPGA subsystems were also used to execute search micro-jobs. Other exotic architectures, such as Nvidia GPUs and the upcoming Intel Larrabee, are also compatible and may be implemented depending on market needs.)

When all the micro-jobs associated with one queued search job are done, the results are aggregated and written out to the file subsystem. As well, an optional MUSE script is run at this time on the output directory. For example, Ascore phospho-site localization can be done with the search results, or additional re-scoring using different user-defined search engines.

This powerful mechanism also allows algorithm developers to use the SORCERER search as a pre-search function to enrich the peptide candidates to perhaps the top 50 or 500, and then use MUSE scripting to rapidly develop scoring functions to increase accuracy. In particular, algorithm developers can optimize the important scoring functions without needing to first develop the base software to read FASTA files, compute PTM combinations, or perform other necessary but low-value operations.

Applications include the analysis of CID+ETD spectra, whereby the top CID search results are used to drive the ETD search, and MS2/MS3 phosphorylation analysis, whereby associated MS3 spectra may be separately searched in MUSE and re-combined with the MS2 results.

The SORCERER architecture includes a ‘custom’ directory, which has a higher priority than the application directory, to allow knowledgeable developers to substitute and overwrite almost any part of the SORCERER platform. (By confining all customization to this directory, it is simple to revert back to the original factory state.) Therefore, researchers can start with a powerful, functional workflow using a standard SORCERER product, then customize it as needed from simple MUSE scripts to a full re-architecting of major subsystems.
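The override mechanism amounts to a prioritized lookup, which can be sketched in a few lines of Python. This is an illustration of the concept only; the directory arguments and the function name are hypothetical and do not describe the actual SORCERER file layout.

```python
from pathlib import Path


def resolve_component(name, custom_dir, app_dir):
    """Return the path of a component, preferring the custom directory over the application one."""
    for base in (custom_dir, app_dir):
        candidate = Path(base) / name
        if candidate.exists():
            return candidate
    return None
```

With a lookup like this, dropping a modified file into the custom directory replaces the stock version, and deleting it restores the factory behavior without touching the application directory.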

Discovery proteomics research, such as for biomarker discovery, requires advanced “Proteomics 2.0” analyses for PTMs like phosphorylation, ETD, and quantitation in addition to high throughput.

With the transfer of the high-throughput SEQUEST Cluster business, the choice for high-throughput data analysis is simplified to one of two SORCERER products, both of which bring powerful “Proteomics 2.0” capabilities with the integrated MUSE scripting environment.

Many advanced proteomics analyses require some level of customization, so the MUSE scripting can be invaluable. For example, some PTMs of interest occur only on certain residues at a peptide terminus, which can be implemented as a post-search filtering step. Workflow automation, such as the compression and copying of results after search completion, can be easily scripted in MUSE. Indeed, the Ascore phospho-site localization algorithm is scripted entirely within MUSE.

Algorithm developers can quickly experiment with new scoring functions, such as for ETD, PTMs, quantitation, or even replicating other common peptide search engines, by simply re-scoring, say, the top 50 candidate peptides from a Sorcerer search. 

SEQUEST Cluster users who have developed custom interface modules to their workflow can most likely adapt their infrastructure to SORCERER with little or no change.

The SORCERER 2 system will be the product of choice for most high-throughput users. It is a plug-and-play, pre-configured Enterprise Linux server. Users can install it in minutes, and immediately use a web browser interface (with a password) from any network PC for uploading and downloading data and submitting search jobs. They will also appreciate the reliability, as many Sorcerer systems in the field have been continuously running for more than a year without downtime.

The SORCERER Enterprise software will be a better fit for high-throughput users who must run software on approved servers within a data center, such as in biopharmaceutical companies or large centralized labs. It can be viewed as an “a la carte” version of the software architecture within the SORCERER 2 IDA, and allows other software to co-exist on the same server. 

The SORCERER Enterprise software can be purchased pre-installed and tested on customer-specified servers. Otherwise, it and its dependent components must be installed and configured by qualified IT staff on qualified powerful servers. As well, the semi-custom nature of the installation and maintenance will result in higher support costs.

Like the SEQUEST Cluster, the SORCERER Enterprise product allows throughput to be increased with additional slave nodes running the SORCERER Enterprise Plus software. Note, however, that each high-performance slave node may be worth 16 nodes of SEQUEST Cluster under common search conditions, so you won’t need as many.

Furthermore, the combination of Thermo Discoverer and Sage-N Research SORCERER provides a powerful, customizable, client-server data analysis platform. Discoverer provides a Windows user interface customizable using the Windows .NET environment, while the SORCERER provides the back-end Enterprise Linux server with MUSE customizability.

See the joint press release at: http://www.sagenresearch.com/news_10.html

If you plan to buy a new Orbitrap or other fast mass spectrometer for discovery proteomics, we would strongly recommend that you include a SORCERER 2 system (or SORCERER Enterprise software if you must run in a data center) in your budget. PC software will not be able to keep up with a frequently used Orbitrap. 

If you have a SEQUEST Cluster that is over 2 years old, we recommend that you upgrade to SORCERER within one year to replace the outdated hardware. And please inquire about the special time-limited upgrade offer to make this transition easier.


We were privileged to have talks by Drs. John Yates (Scripps), Roman Zubarev (Uppsala), Alexander Ivanov (Harvard), Sean Beausoleil (Harvard Med), and Aaron Klammer (U. Washington) on large-scale quantitative analysis, ETD, and phosphorylation (and other PTMs).

These talks offer a glimpse into the upcoming capabilities of the Sorcerer 2 platform. 


“Proteomics 2.0” Users Meeting Group Picture


Dr. John Yates, at right, receives a Nintendo Wii from David Chiang. Each of the 5 speakers was given one this year.


Dr. Nick Morrice (U. Dundee) won one of the three Wii door prizes. Other lucky winners were Drs. Lynn Spruce (Children’s Hospital of Philadelphia) and Patrick Everley (Harvard Med, now in US Army). Eight Wii systems were given away in all.

Find out why all the participants were so excited about this special meeting by listening to the talks! Stay tuned to this space for how to download the talks, as well as for a chance to attend our meeting next year. (The 2009 “Proteomics 2.0” Users Meeting will again be by invitation only.)

The following are a few of the comments from the meeting participants.


“Sage-N is in tune with the constantly evolving needs of proteomics labs throughout the world.”
Sean Beausoleil (Harvard Medical School)

“I find the Sorcerer convenient to use and when I dig carefully into selected data, I feel confident in the large-scale results from Sorcerer.”
Larry Brill (Burnham Institute)

“It was one of the best, if not the best event of this year’s ASMS! Fantastic speakers with talks about highly relevant subjects. I enjoyed it enormously. Thank you again.”
Markus Brosch (Wellcome Trust Sanger Institute, UK)

“Great opportunity to learn from field pioneers in an intimate setting.”
Josh Elias (Harvard Medical School)

“The scope of the meeting is excellent!”
Alexander Ivanov (Harvard School of Public Health)

“Great talks overall. Great prizes! Couldn’t imagine them any better.”
Aaron Klammer (University of Washington)

“Sage-N is enabling the forefront of proteomics research by providing data appliances that are robust, cutting edge, and easy to use.”
Mark Pitman (Geneva Bioinformatics SA)

“I thought you lined up a great list of speakers on useful current topics of interest. We found that Sage-N’s Sorcerer 2 product is a key component at our facility’s informatics system. The meeting was really worth my time! I was going to hop around, but I stayed on.”
Alexander B. Schilling (University of Illinois Chicago)

“A-list speakers – well worth the time.”
Lynn Spruce (Children’s Hospital of Philadelphia)

“It is a very nice small environment for people learn about the company and products.”
Ru Wei (Pfizer)

“Sorcerer provides rapid protein identification with minimal IT support requirement. It is a highly efficient tool for proteomics studies.”
Wenhong Zhu (Burnham Institute)
