SorcererScore: Science of Fragment Mass for Low-Abundance Peptides

by David.Chiang (at) SageNResearch.com

The deep proteomics revolution is here!

Deep proteomics is a hi-res “biochemical x-ray” for low-abundance proteins in cells. It will revolutionize early detection of cancer and infections, and medical research in general.


Figure 1: Scatterplot of “PeakCount vs. Average Fragment Delta-Mass” of top-200 peptide ID hypotheses with non-decoys (green) and decoys (black).

SorcererScore(tm) brings proteomics to its tipping point by making deep proteomics possible for the first time. Researchers skilled at deep data analysis will most benefit. No matter how accurate the data, deep insights come from interpreting ambiguous data beyond the reach of fully-automated workflows.

Here we illustrate the theory and practice of deep data analysis of fragment mass data. Continue reading

Posted in Interesting | Leave a comment

SorcererScore, Low-Abundance Peptides, & the Molecular Medicine Revolution

by David.Chiang (at) SageNResearch.com

Figure 1: Logarithmic scatterplot of rank vs. average RMS fragment delta-mass, with low-ranked correct IDs (i.e. LAMPs) highlighted. [decoys in black; non-decoys in green]

We perceive light frequency as color. Newton’s prism, which split light into its component colors, was thus an early demonstration that a signal can be decomposed into component frequencies.

Isaac Newton was not a dumb guy, but it took a century and a half for Joseph Fourier to write down his famous mathematical transform in 1822, and nearly another century and a half for Cooley and Tukey to publish the Fast Fourier Transform (FFT) algorithm in 1965. The FFT lets system engineers like me boost signal-to-noise, driving the mobile communications revolution.

Fortunately, it took us only two years, not centuries, to work out the math and IP for SorcererScore(tm) to identify low-abundance modified peptides (LAMPs), the foundation of deep proteomics. SorcererScore saves proteomics from irreproducible irrelevance and drives clinical discovery in the trillion-dollar Molecular Medicine Revolution.

People say opportunity knocks but once. They joke that some people make things happen while others wonder, “What happened?” They are talking to you, the proteomics scientist!

Here we take an analytical approach to dissect the technology and market discontinuities that make deep proteomics, and medical technology at large, likely the greatest opportunity of our lifetime. They will make medical research unrecognizably efficient in 5 to 10 years.

SorcererScore will help the savvy stake their claim in the second ever Medical Gold Rush. For many reasons, a window of opportunity rarely remains open for long. Get it while it’s hot — don’t say I didn’t warn you!  Continue reading

Posted in Uncategorized | Leave a comment

How to Identify Labile Phosphorylation and PTMs with SorcererScore

by David.Chiang (at) SageNResearch.com

Figure 1: 3-D data-cube of the S-score’s three components, with the S-score=0 plane.

A scuba diver has a sophisticated dive computer on his wrist. But if he becomes disoriented, he blows bubbles and follows them in a slow ascent. The bubbles directly tell him (1) which way is up, and (2) the safe ascent speed to avoid the bends.

That’s the success strategy amidst disorienting complexity: back to direct fundamentals.

Proteomics is robust for easy problems, like identifying semi-pure proteins under ideal conditions, but it mostly struggles with clinically valuable low-abundance modified peptides (LAMPs) and proteins due to limitations in analytics. Here we illustrate how to identify labile phosphorylation with our novel hypothesis-driven methodology called SorcererScore(tm).

To be sure, the analytics is perhaps 90% there, with the cross-correlation search engine (Eng et al, 1994), target-decoy search statistics (Elias and Gygi, 2010), and Bayesian statistical validation (Keller et al, 2002) already integrated within the SORCERER GEMYNI platform.

The last 10%, comprising peptide search and the post-search filter, is its Achilles heel. Everything hinges on correct peptide ID of individual spectra, including protein ID, quantitation, and characterization of post-translational modifications (PTMs); otherwise, everything downstream becomes garbage-in, garbage-out. We now know many irreproducible results can be explained by “p-hacking” (more on this later).

In any case, the challenge is to incorporate accurate fragment mass with statistical rigor, a deceptively tricky problem. After two years of stealth development, with twists and turns, we believe we’ve solved this ‘massive’ puzzle by applying high-school math to direct fundamentals, i.e. mass accuracy.

The patent-pending SorcererScore analytics (Chiang, 2016) is technically compatible with all high-accuracy tandem mass spectrometers regardless of technology or brand, needing only accurate mass data. This automated software capability has been delivered to customer sites.

The impact of the 3 R’s of deep proteomics — robust, rigorous, and reproducible — to revolutionize all molecular biology research cannot be overstated. SorcererScore makes it possible for the very first time. Continue reading

Posted in Application notes, Muse | Leave a comment

Breakthrough SorcererScore Identifies Low-Abundance Peptides

by David.Chiang (at) SageNResearch.com

Low-abundance analysis in proteomics is billion-dollar valuable but was unsolved until now. After 18 months in stealth mode, we announce SorcererScore(tm), a breakthrough in analytics that successfully finds low-abundance, modified peptides (LAMPs).


Figure 1: Peptide Identification Part of Proteomics Workflow

Deep proteomics needs both high-accuracy mass data and quality interpretation. Many proteomics labs produce the former but struggle with the latter, akin to an x-ray lab that produces super-sharp images but grossly misidentifies early tumors. Funding and attention will explode once quality results can be dependably delivered.

The issue is analytics, defined as the discovery of meaningful patterns in data using math and computing. Every big-data field invests heavily in beefy server-based analytics to data-mine deep insights, but not proteomics. This fact alone should raise red flags over popular speedy PC programs. The problem is depth perception: many are unable to see beyond the shallow when judging tools and expertise. This stalls the field, but it also presents an opportunity.

Continue reading

Posted in Interesting, Muse | Leave a comment

The Key Secret to Low-Abundance Peptides

by David.Chiang (at) SageNResearch.com

High-accuracy mass spectrometry is a revolutionary game-changer that transforms proteomics into a powerful clinical R&D tool — a biochemical “x-ray” for cells. However, common workflows seem to produce semi-random results for important low-abundance and modified peptides/proteins (LAMPs). Here we present a theory using first principles that pinpoints the cause of LAMP irreproducibility. We also outline our patent-pending SorcererScore(tm) methodology specifically for robust LAMP identification and quantitation.

With abundant high-accuracy data, researchers no longer need tricky-to-analyze statistical approaches. The best high-sensitivity workflows use simple math (to avoid modeling artifacts) done on a large scale, a precedent set by particle physics which also analyzes noisy mass/charge data but for elementary particles.

Why accurate mass triggers a genuine paradigm shift

The Scientific Method requires applying data to test hypotheses. Since mass spectrometry measures only masses, hypotheses must be mass-testable. Therefore, when a search engine generates candidate peptide identities ‘X’ for each spectrum ‘Y’, it is best viewed as a semi-automated hypothesis generator producing hypotheses of the form “Peptide X produced Spectrum Y”. Critically, such peptide hits must be explicitly tested against the observed intact (precursor) mass, and rejected if the delta-mass is excessive. Otherwise it isn’t science.
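To make the mass-test concrete, here is a minimal R sketch of the precursor delta-mass filter (the data frame and its column names are illustrative, not actual SORCERER output fields):

    # Toy PSM table: observed precursor m/z, charge state, and the calculated
    # neutral mass of each candidate peptide hypothesis.
    psm <- data.frame(
      precursor_mz = c(721.3629, 721.3800),
      charge       = c(2, 2),
      calc_mass    = c(1440.7112, 1440.7112)
    )
    proton <- 1.00727646688                      # proton mass in Da

    obs_mass <- psm$precursor_mz * psm$charge - psm$charge * proton
    ppm_err  <- (obs_mass - psm$calc_mass) / psm$calc_mass * 1e6

    tol_ppm  <- 10                               # example tolerance; instrument-dependent
    psm$pass <- abs(ppm_err) <= tol_ppm          # reject hypotheses failing the mass test
    psm

Any hypothesis whose delta-mass exceeds the tolerance is rejected outright, no matter how good its search-engine score looks.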

In other words, high-accuracy mass spectrometry makes proteomics a rigorous science for the very first time. One key implication: workflows must structurally change to accommodate this.

Unlike DNA sequencing, which is typically a direct biochemical “measurement”, peptide sequencing with mass spectrometry is not a direct measurement at all but an implicit “experiment” involving hypotheses auto-generated and auto-tested by the software, which is the field’s Achilles heel. Continue reading

Posted in Interesting | Leave a comment

Seeing is Believing: Use Accurate Masses Correctly for Reproducible Research

by David.Chiang (at) SageNResearch.com

Being able to “see” deeply is invaluable. Proteomics uniquely lets you see the cellular response to a drug, genetic modification (CRISPR), or pathogen by taking a molecular snapshot of proteins, the circuitry of life. However, until recently, mass spectrometers lacked the mass accuracy and throughput for clinical research. Now that mass spectrometers are capable, the limitation is bioinformatics, in terms of both tools and expertise. SORCERER GEMYNI lets you see the analytical response of a filter or algorithm on your data, so you can evaluate new protocols while avoiding costly mistakes.

Figure 1: Precursor delta-mass distributions of superimposed target (gray) and decoy (yellow) peptide-sequence matches from a typical 5000-spectrum proteomics dataset searched with the Yates-Eng XCorr score, for the top match (left) and top-10 matches (right). A population of likely correct matches emerges near dMass ~ 0 amu. Search conditions: 50 ppm precursor mass tolerance and partial trypsin-digested peptides.

Figure 2: Same as Figure 1 but with narrow search conditions (10 ppm, complete trypsin digestion). The “reverse-filtering” effect concentrates incorrect matches on top of correct matches, complicating correct/incorrect discrimination. It is more pronounced for the top-10 matches (right) vs. the top match (left).

Err apparent

Traditional statistics dealt with scarcity of data, but the new field of “data science” deals with an over-abundance of indirect data (i.e. inferring biochemistry from mass/charge data), posing new challenges in noise, complexity, and self-inconsistency. Accurate estimation of error is tricky and a big part of data science, with hidden traps for the untrained.

The false-discovery rate (FDR) is probabilistically derived, based on the premise that incorrect IDs have the same target/decoy ratio as the search space (typically 1:1), a premise not necessarily valid if the algorithm, like a witness in a police lineup, is “tipped off” about decoys. This makes the FDR a random variable with non-zero variance: its calculated value is merely a statistical estimate subject to variation, not a definitive number. Therefore, an algorithm that loops to optimize peptide IDs by minimizing the estimated FDR is susceptible to under-reporting FDR (and over-reporting IDs) unless precautions are taken.
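To make the premise concrete, here is a minimal R sketch of the standard target-decoy estimate, using simulated scores (a 1:1 concatenated target-decoy search is assumed):

    # Under the 1:1 premise, the decoy count above a score threshold estimates
    # the number of incorrect target IDs above that threshold.
    estimate_fdr <- function(score, is_decoy, threshold) {
      n_target <- sum(score >= threshold & !is_decoy)
      n_decoy  <- sum(score >= threshold &  is_decoy)
      n_decoy / max(n_target, 1)                 # estimated FDR among accepted targets
    }

    set.seed(1)
    target_score <- c(rnorm(500, 2, 1), rnorm(1000, 4, 1))  # incorrect + correct targets
    decoy_score  <- rnorm(500, 2, 1)                        # decoys mimic the incorrect population
    score    <- c(target_score, decoy_score)
    is_decoy <- c(rep(FALSE, 1500), rep(TRUE, 500))
    estimate_fdr(score, is_decoy, threshold = 4)

Because both counts fluctuate from dataset to dataset, the estimate itself fluctuates, which is exactly why optimizing against it invites under-reporting.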

Data science teaches us to partition the dataset into separate training and test subsets (to decouple model optimization from error estimation) and to limit the dimensionality of the model where possible (often <= 4), both to minimize over-fitting. Algorithms that don’t heed these lessons (many published ones don’t) systematically underestimate FDR, possibly by a factor large enough to undermine the integrity of the analysis. (See Figure 3.)
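A minimal R illustration of the partition, on a made-up PSM table:

    # Split PSMs so that score optimization and FDR estimation never see the
    # same spectra; this keeps the reported error from being a training error.
    set.seed(42)
    d <- data.frame(score = rnorm(2000), is_decoy = rep(c(TRUE, FALSE), 1000))
    train_idx <- sample(nrow(d), size = floor(0.5 * nrow(d)))
    d_train   <- d[train_idx, ]    # tune the discriminant and cutoffs here only
    d_test    <- d[-train_idx, ]   # report the FDR from this held-out half only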

(We discuss a methodology to sanity-check FDR at http://www.proteomics2.com/?p=1292 .)

Figure 3: Typical curves for training error vs. true error. Software that does not separate training and test data subsets reports the FDR as a training error, which is systematically lower than the true error due to over-fitting.
Figure courtesy of http://www.experian.com/blogs/marketing-forward/2013/05/29/ .

Massive potential

Continue reading

Posted in Interesting | Comments Off on Seeing is Believing: Use Accurate Masses Correctly for Reproducible Research

Why False-Discovery-Rate is Meaningless for Low-Abundance Peptides

by David.Chiang (at) SageNResearch.com

About three decades ago, a paradigm shift in my field of semiconductors foreshadowed what is likely to happen to medical research. Traditionally, research was centralized and separate from product development, but in the 1980s research became decentralized and merged into product R&D. Though decried as the end of research, in reality it made innovation more efficient and better funded, and it became the standard model in high-tech.

In my view, the Digital Medical Revolution is starting to remake medical research the same way, shifting traditional research into commercial R&D (whether within academia or industry) as its public support trends downward. An abrupt, trillion-dollar, totally predictable paradigm shift, triggered by US healthcare reform, is a once-in-a-lifetime opportunity for those savvy and skilled. New $B titans, little more than fledglings now, will emerge as the medical Apples and Googles. On the flip side, many sacred cows will become hamburgers. Like a lobster shedding its too-restrictive shell to keep growing, medical research is beginning to break out of its rigid, outdated institutions to unleash explosive growth. Long-established research-only institutions must re-invent themselves or risk perishing (e.g. Bell Labs).

Deep characterization of low-abundance pathway proteins is the ticket to success, but data analysis remains the Achilles heel because of signal-to-noise. It’s useful to note that “data analysis” and “software” are very general terms that range from the trivial (low-value) to the arbitrarily complex (high-value). In every field, the analysis, and hence the software tools, range from simple to sophisticated to fit the user’s capability. Continue reading

Posted in tips and hints, Views | Leave a comment

Deep Characterization of Low-Abundance Peptides: A Theoretical Framework

by David.Chiang (at) SageNResearch.com

Deep proteomics technology recently became robust for high-accuracy clinical R&D but data analysis remains its Achilles heel. The limitation is no longer mass accuracy but signal-to-noise, especially for valuable low-abundance peptides. The solution requires a non-trivial understanding of the technical problem.

Modern complex datasets really comprise two distinct subsets, low-noise and high-noise, requiring opposing analysis tradeoffs. Most proteomics software tools are tuned for clean spectra from abundant peptides, because those are easier to demo and sell to the mass spec novices who make up most of the market, but they actually worsen results from valuable noisy spectra. Meanwhile, workflows developed for noisy data from older instrumentation lost favor because their results ‘looked’ worse for clean spectra, especially when used as-is on data from modern instruments; they also need updating, because many necessarily over-exploited secondary parameters (especially ‘dCn’) that helped then but hurt now. The net effect is that demo analyses can look near-perfect while results for critical low-abundance peptides can seem almost random, with most researchers having little idea why. The good news is that, for labs with a modern mass spectrometer less than 3-5 years old, the missing piece now exists as a robust deep data analysis system, consistent with the biology-as-information-science paradigm shift.

Here, we outline the theory and solution behind a new statistically robust, adaptive, multi-score methodology that combines the best of both approaches in a single unified workflow. Unlike tools that optimize only the overall false-discovery rate (FDR), the SorcererScore methodology optimizes the individual peptide-sequence matches (PSMs) important for clinical applications. It is available as a transparent sample script in the turnkey SORCERER GEMYNI software platform, strongly recommended for all datasets above a minimum size and open to optimization and customization by licensed users. Continue reading

Posted in Muse | Leave a comment

Analyzing Accurate Fragment Mass Data in SORCERER and TPP

Some clients have reported poorer result quality with accurate fragment mass data (e.g. from the Q Exactive and Fusion) on SORCERER using the Trans-Proteomic Pipeline (TPP). The problem appears to stem from PeptideProphet’s poor compatibility with the ‘dCn’ distribution of such data.
Our recommended solution is incorporated in the SX script, which:
  1. Combines classical XCorr with a peak-match score that incorporates fragment mass accuracy, and
  2. Replaces the PeptideProphet discriminant function with the SorcererScore (see blog entry: http://www.proteomics2.com/?page_id=943). This is recommended for all datasets with more than 8000 spectra analyzed by TPP.

With accurate fragment mass widely available, it seems tempting to modify the Yates-Eng XCorr score (aka SEQUEST) to allow a fragment mass tolerance to be specified. Indeed, some versions do. We investigated this and believe it would neutralize XCorr’s noise-reduction. (Note that the ~1 amu bin is not a fragment mass tolerance per se, but the average m/z periodicity for peptides — i.e., roughly one proton and one electron, a physical constant used for estimating base noise. Redefining it as an instrumentation parameter with a smaller value alters the background noise estimate.) In other words, this makes XCorr behave like a peak-match score, which defeats XCorr’s purpose.

XCorr does well for noisy spectra but loses specificity for low-noise spectra. In contrast, peak-match scores (e.g. Mascot, Andromeda) do well for low-noise spectra but poorly for noisy spectra, especially those with “additive” noise (extra noise peaks sized similarly to signal peaks).

Experienced researchers know that some datasets do well with XCorr and others with peak-match scores. The SX script’s approach is to combine them, with weighting coefficients adaptively calculated for each dataset using a Support Vector Machine algorithm with semi-supervised extraction of training sets.
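We won’t reproduce the SX script’s internals here, but the flavor of such an adaptive combination can be sketched in R with the e1071 package’s SVM (the feature names and training labels below are illustrative):

    library(e1071)   # provides svm()

    # Simulated PSM features: an XCorr-like score, a peak-match-like score, and
    # RMS fragment delta-mass; labels stand in for decoys vs. confident targets.
    set.seed(7)
    n <- 400
    psm <- data.frame(
      xcorr      = c(rnorm(n, 3.5, 0.6), rnorm(n, 2.0, 0.6)),
      peak_match = c(rnorm(n, 0.7, 0.1), rnorm(n, 0.4, 0.1)),
      frag_dmass = c(abs(rnorm(n, 0, 0.005)), abs(rnorm(n, 0, 0.05))),
      label      = factor(rep(c("target", "decoy"), each = n))
    )

    # A linear SVM learns per-dataset weighting coefficients for the scores.
    fit <- svm(label ~ xcorr + peak_match + frag_dmass, data = psm, kernel = "linear")
    combined <- attr(predict(fit, psm, decision.values = TRUE),
                     "decision.values")          # weighted composite score per PSM
    head(combined)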

The recalculated Gaussian-like True and False population curves then replace the classic PeptideProphet discriminant, substituting these composite scores for XCorr and dCn.

Posted in tips and hints | Leave a comment

Secrets to Next-Gen Research Success

by David.Chiang@SageNResearch.com

It’s finally here! The Digital Medical Revolution was likely triggered in 2014, the result of a perfect storm of technology readiness (e.g. deep proteomics), market discontinuity (US healthcare reform), and a surging economy. Here I discuss why you must start mastering “data science” concepts now (via free online classes), and why you need to transition from opaque instrument-specific ‘programs’ to transparent vendor-neutral ‘platforms’ as your research foundation before you miss your window of opportunity. A data-driven tsunami is coming to sweep away inefficient research built on weak foundations, clearing the way for specially trained innovators to remake medicine to be unrecognizably efficient. Technology already gave us Captain Kirk’s communicator; now we create Dr. McCoy’s medical scanner, which I believe will be a mass spectrometer with a data-mining platform.

We all know about climate change, but are we prepared for financial climate change? As the competitive atmosphere heats up from the uncontrolled emission of data, a financial drought will continue in most areas (the 80/20 Rule) while a few focused areas are flooded with funding and opportunity, including privately funded clinical proteomics. If this digital revolution unfolds like the others, it will bring dream opportunities to talented individuals and hot young companies while devastating once-successful companies and institutions that don’t re-invent themselves quickly enough. (Remember Bell Labs?) Many long-established institutions are at risk if they stay stuck in the past, focusing resources on ‘things’ (buildings, big machines) rather than ‘ideas’ (people, data infrastructure) as if this were still the industrial revolution. Continue reading

Posted in Interesting | Leave a comment

New SORCERER-V: Future of Mass Spec on Your Laptop

by David.Chiang (at) SageNResearch.com

(Special intro pricing before ASMS: See end of blog entry for details.)

People say opportunity knocks but once, and that everybody is one key skill away from the next level of success. Being a mass spectrometry (MS) expert at the cusp of possibly the greatest digital revolution of all, in medical research, could be that knock of a lifetime. Yet the 80/20 Rule suggests that once the field expands, only the minority with the critical skill will advance: specifically, script-based deep analysis of large datasets using Linux, the modus operandi across many digital disciplines.

The new SORCERER-V (V for virtual) iDA is specifically designed to help MS labs and researchers bridge the paradigm shift gracefully, by using your own desktop or laptop[1] to “simulate” a scaled-down but otherwise fully functional SORCERER iDA on a network, with a full proteomics search engine and workflow. This “personal server” is targeted at script training, development, and testing. As well, SORCERER-V is positioned to become the industry-standard entry point for smaller labs with initially low-throughput needs.

SORCERER-V allows scientists to begin the transition from the current ad hoc paradigm of a haphazard collection of incompatible, difficult-to-maintain, vendor-specific PC programs, to a flexible and scalable script-based paradigm that accommodates best-of-class instrumentation from multiple vendors, and that allows custom algorithm development for general-purpose MS data analysis.



Transparent, Accelerated, and Scalable

The SORCERER-V is the solution to the single biggest problem facing the MS field — opaque bioinformatics software — which makes results from complex datasets difficult if not impossible to validate[2]. Continue reading

Posted in Views | Leave a comment

*Updated: A Technical Look at Deep Data Analysis with GEMYNI and R

by David.Chiang (at) SageNResearch.com

Biomolecule mass spectrometry (MS) will revolutionize medicine just as fingerprinting revolutionized forensics. The evolution of forensics foretells the future of MS and speaks to the “why” of deep data analysis; more on that later. Here, we start with the “how”.

Mass spectrometers are such nifty instruments that it’s easy to forget they do mainly one thing — detect the mass/charge ratio (m/z) of chemical entities. MS data consist entirely of numbers — m/z, search engine scores, mass errors, etc. — huge piles of them that beg for your innovative numerical analysis and algorithms. Success in the new information-science paradigm of biology requires a real data “platform” professionally developed and supported by data professionals. In contrast, oversimplified summaries from canned “programs” are sufficient only for less challenging experiments, and offer little or no clues when problems arise.

To illustrate, let’s dive deep into a sample proteomics output[1] generated by the new SX1102 GEMYNI™ script on SORCERER™. It improves your results in the same SORCERER workflow of the last decade (it can even be applied retroactively), by replacing the XCorr score with a fully compatible composite score that incorporates secondary parameters, including another complementary search engine score, delta-mass of fragment ions, and other features.

It turns the top-hit XCorr distribution (above) into the more discriminating SorcererScore distribution (below), showing clear bimodal true and false populations and yielding as many as 10% to 20%+ more identified peptides across the board, depending on data quality.

Importantly, SX1102’s SorcererScore™ methodology can be easily user-extended by adding new parameter columns into the training set fed to its machine learning algorithm. Characteristic fragment ion m/z from certain post-translational modifications, for example, may be added to aid their identification. Additional search engine scores can also be included to increase specificity.
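For instance, flagging spectra that contain the phosphotyrosine immonium ion near m/z 216.04 becomes one extra column (a hypothetical sketch; the list-column of observed peaks is made up for illustration):

    # Toy PSM table where each row carries its observed fragment m/z values.
    psm <- data.frame(xcorr = c(3.1, 2.4))
    psm$peaks <- list(c(175.119, 216.0427, 498.270), c(147.113, 351.180))

    # New feature column: does the spectrum contain the diagnostic ion?
    psm$has_pY <- vapply(psm$peaks,
                         function(mz) any(abs(mz - 216.0426) < 0.01),
                         logical(1))
    psm        # extended training set for the machine learning algorithm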

Note that I try to convey the compact power of R within the GEMYNI platform with real commands, but feel free to skim over their cryptic syntax in this lengthy post. In practice, researchers need only basic working knowledge to edit sample scripts from our tech support staff, and we are available to help.

Please contact us (info@SageNResearch.com) if you wish to obtain an SX1102 csv output for your dataset, which you can explore with these R commands on your Mac or PC.

Precursor Delta-Mass

We start with how easily we can plot the density of a parameter, determine its mean and standard deviation, and transform it into a “score” that helps separate true positives from false positives.
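For example, with simulated delta-masses standing in for a real SX1102 csv column:

    # Simulated precursor delta-masses (Da): a tight correct population near 0
    # plus a broad incorrect background.
    set.seed(3)
    dmass <- c(rnorm(800, 0, 0.002), runif(400, -0.05, 0.05))

    plot(density(dmass))                   # visualize the mixture
    m <- mean(dmass)
    s <- sd(dmass)
    dmass_score <- -abs(dmass - m) / s     # simple score: nearer the center, higher the score

A few lines of R turn a raw column into a plotted distribution and a usable score.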

Continue reading

Posted in Uncategorized | Tagged | Leave a comment

Success Strategy for the Digital Mass Spec Revolution

by David.Chiang @ SageNResearch.com

These are turbulent times for medical research, as labs and companies big and small face uncertainty and layoffs. This too shall pass, but there will be big changes ahead. More than ever, you want to think strategically with a game plan, like a chess player looking seven moves ahead, in order to survive today’s turmoil and thrive in the coming recovery.

This three-point plan will increase your added Value[1] now and position you for the future:

  1. Propose high-Value projects for your institution
  2. Transform yourself into a “digital” scientist
  3. Build relationships

The hottest opportunity for mass spec labs today? Digitalizing decades of frozen bio-samples into molecular profiles for diagnostic data mining. (One proteomics startup raised $50M+ total last month to do just that.) Your institution needs your help to propose a plan to monetize its frozen assets. And we can help. You can also develop important “digital” research skills at the same time — a win-win all around!

The far greater opportunity comes when the Digital Revolution in molecular biology takes hold, driven by exponential advances in mass spec, microscopy, etc. It will dwarf the already incredible revolutions in microelectronics, software, and the Internet because its added Value impacts more people.

At that time, in addition to having the right skills, you will need relationships to land the best opportunities. So always be generous to colleagues, collaborators, and friends, because the good ones will inevitably return the favor. That’s not only good karma but good strategy!

Scalability & Interconnectivity: Why Digital is Unstoppable

Continue reading

Posted in Views | Leave a comment

How to use SORCERER to search ETD ms/ms spectra

Electron transfer dissociation (ETD) is a promising dissociation technology for analyzing labile post-translational modifications (PTMs) such as phosphorylation. Unlike CID, ETD generates positively charged c and z* (z-radical) ions instead of b and y ions. There are two caveats when using standard SEQUEST for ETD tandem mass spectra:

  1. Standard c/z option doesn’t compute z* ions correctly.
  2. Standard SEQUEST allows only low charge states, and would not work for highly charged, long peptides.

It is important to note that z* ions are not the same as z ions; they have an extra hydrogen (about 1.008 Da monoisotopic mass). This means that the standard SEQUEST option of searching c/z ions will not search ETD spectra correctly, since the computed z ions will have the wrong mass. On SORCERER, correct c/z* ions can be obtained using user-defined static peptide-terminus modifications on standard b/y searches, as described below. As well, SORCERER* allows very high precursor charge states (up to +255) to accommodate highly charged species. Here is how to search ETD spectra using SORCERER …

1. Define peptide terminus mods that shift b/y ions to c/z* ions, and use these for ETD searches.

Define the following static peptide terminus modifications using the web interface (click “Add/edit modifications…” on the Search page, then click “New/edit modifications” on top):

  • Name: “BtoC” with Mono Mass “17.02655” and Type=”N-Terminus”
  • Name: “YtoZrad” with Mono Mass “-16.01872407” and Type=”C-Terminus”

In both cases, Residue is left blank.
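As a sanity check, the two shift values follow from simple arithmetic on monoisotopic elemental masses (c = b + NH3; z* = y - NH3 + H), which a few lines of R confirm:

    H <- 1.00782503207    # monoisotopic mass of hydrogen, Da
    N <- 14.0030740048    # monoisotopic mass of nitrogen-14, Da

    NH3     <- N + 3 * H
    BtoC    <- +NH3           # expect +17.02655
    YtoZrad <- -(NH3 - H)     # expect -16.01872407

    sprintf("BtoC = %+.5f Da; YtoZrad = %+.8f Da", BtoC, YtoZrad)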

2. Define a new search profile that incorporates the above peptide terminus mods.

In the Search page under “(2) Choose a Search profile”, select the most similar existing search profile, then click “Edit this profile…”. Be sure to name it something different and memorable, then select the above 2 mods under “Terminus modifications” and “Static”. Select other applicable options.

Note that many common post-SEQUEST probability re-scoring algorithms, such as PeptideProphet or Scaffold, are not tuned for ETD scores. From first principles, we believe the resulting probabilities may not be wrong per se, but rather lacking in specificity.

*The Yates Lab’s version of SEQUEST has two code modifications for ETD. The first is the increased charge state (same as in SORCERER). The second is exclusion of the Proline cleavage, which is not implemented in the standard SORCERER search engine. However, this can be done with a MUSE post-processing step in the future if it is found to have a large effect. As always, in-warranty clients can contact our TechTeam for help with this and other advanced capabilities.

Posted in Application notes | Tagged , , , | Leave a comment

The Value of a Solid Foundation: The Key to Success at the Top

All true competitors want to sustain their best performance, and they know that having the right infrastructure is instrumental to competing at the top. In response to your needs and the expanding Omics fields, we’ve expanded SORCERER™ capabilities to new levels.

We’re reaching farther and deeper to give you the solid foundation necessary to capitalize on opportunities. There’s no doubt that competing at the top is challenging. But we’ve got the foundation tools to complement your talent — just add mass specs! See a Bulleted Summary of Our New Expanded Capabilities here

Posted in Uncategorized | Leave a comment

Sorcerer PE 4.3 released: streamlined data handling and enhanced results analysis

The latest version of the Sorcerer Proteomics Edition (Sorcerer PE) software is version 4.3, and it is now available to Sorcerer customers under the TSP support plan. The focus of this release is on more efficient, higher-performance handling of spectral data, and on more powerful results analysis both in the Sorcerer PE flow and in external post-processing software — TPP, Scaffold, and custom MUSE scripts. The highlights of the new capabilities include:

  • Streamlined MS2 and SQT file-handling
  • New optimizations for huge (~1M) spectra sets
  • ReAdW 4.6: support for Q Exactive
  • Interoperates with ScaffoldBatch version 4.0
  • Integrated TPP 4.6.2
  • New Petunia integration offers advanced TPP analysis
  • New and different handling of decoys
    • Sorcerer automated decoy generation on by default

The net benefit of these enhancements is overall improved performance and productivity in Sorcerer, and a simpler route to high-resolution results interpretation.

Continue reading

Posted in Application notes, Interesting, News | Leave a comment

“Cell Surface Chemoproteomics for Capturing States of Human Pluripotent Stem Cells and their Cardiac Derivatives”: Dr. Rebekah Gundry speaks at 2013 User Group

Guest Speaker, 2013 Sage-N Research Annual User Group Meeting

Dr. Rebekah Gundry, Medical College of Wisconsin, gave the talk “Cell Surface Chemoproteomics for Capturing States of Human Pluripotent Stem Cells and their Cardiac Derivatives” at the 2013 Annual User Group meeting in Minneapolis, just before the ASMS Conference.

The complete slide set is available by clicking here: 2013 UG-Gundry slides

The audio/video portion of the talk is available here (part 1 of 3): SageNUG_Gundry_13min_part1

The audio/video portion of the talk is available here (part 2 of 3): SageNUG_Gundry_13min_part2

The audio/video portion of the talk is available here (part 3 of 3): SageNUG_Gundry_13min_part3

Posted in Interesting | Leave a comment

“New Capabilities for SORCERER and the GEMYNI platform”: James Candlin from Sage-N Research speaks at 2013 User Group

James Candlin gave the talk “New Capabilities for SORCERER and the GEMYNI platform” at the 2013 Annual User Group meeting in Minneapolis, just before the ASMS Conference.

The complete slide set is available by clicking here: 2013 UG Mtg JC SPE4.3 Slides

*Note: SORCERER PE v4.3 is now officially released

Posted in News | Leave a comment

“Percolator’s Confidence in Your Identifications”: Dr. Lukas Käll speaks at 2013 User Group

Guest Speaker, 2013 Sage-N Research Annual User Group Meeting

Dr. Lukas Käll, Royal Institute of Technology, Sweden, gave the talk “Percolator’s Confidence for Your Identifications” at the 2013 Annual User Group meeting in Minneapolis, just before the ASMS Conference.

The complete slide set is available by clicking here: 2013 UG-Kall slides

A portion of the video/audio is available here: 2013 UG-Kall video

Posted in Interesting | Leave a comment

“High Performance, Label-Free Quantification of Deep (Phospho)proteomes of Human Pluripotent and Multipotent Stem Cells”: By Dr. Laurence Brill, Sanford-Burnham Medical Research Institute

Dr. Laurence Brill gave the talk “High Performance, Label-Free Quantification of Deep (Phospho)proteomes of Human Pluripotent and Multipotent Stem Cells” at the Annual Sage-N Research User Group meeting.

The complete slide set is available by clicking here: Larry Brill_2012UG_Slides

The audio/video portion of the talk is available here (part 1 of 3): SageNUG_Brill_L_13min_part1

The audio/video portion of the talk is available here (part 2 of 3): SageNUG_Brill_L_13min_part2

The audio/video portion of the talk is available here (part 3 of 3): SageNUG_Brill_L_13min_part3

Posted in Interesting | Leave a comment

The Science of Breakout Success: An Analytical Approach

by David.Chiang @ SageNResearch.com

Recovery from a deep recession always fuels incredible prosperity and opportunity for decades. One strategy for success, proven in every major industry, is to invest in server automation to maximize Value (i.e., make things as valuable as possible) and reduce opportunity cost (aka time). Everyone benefits by achieving more with less, thereby increasing their own Value (and salary) and maximizing the Value of their employer. Everybody wins!

Concerningly, the IT revolution transforming everyone else’s productivity is bypassing the mass spectrometry field, which if anything is regressing to the computing paradigm of the 1980s: manually running software on desktop PCs. Instead of investing money to save time, mass spec labs often waste time to save money, by not investing in automation and by focusing on price instead of quality for information tools. This has profound implications for biologists and chemists, who will want to tell future employers about their unique expertise in setting up automated workflows and defining algorithms for methods development, not about doing PC maintenance better done by bright high school students.

The root problem seems to be misunderstanding in three key areas:

  1. Success is about maximizing added Value, by choosing professional tools that add maximum Value to you and, as a result, allow you to add maximum Value to your lab. Too many misunderstand this and instead focus on having the most raw features at the lowest price. True professionals, whether master chefs or trained scientists, need robust tools to perfect their craft without compromise.
  2. Software is the technology for processing information in its purest form. As such, its powerful Value-multiplication potential has allowed exceptional young entrepreneurs to famously create $B’s in market value seemingly out of thin air. Yet many mistakenly equate ‘software’ with ‘programming’ (akin to trivializing ‘mathematics’ as just ‘algebra’), and see information tools not as an investment for rapidly maximizing one’s Value, but as an expense to minimize, hence missing out on software’s Value-multiplication power.
  3. Servers, unlike desktop PCs, are industrial-strength computers designed for robust automation and storage, built with enterprise-class components designed for years of 24/7 operation, and running a robust server operating system like Linux. Many mass spec labs shy away from servers because: (a) they may not fully understand the value proposition of automation and near-zero downtime, (b) they may confuse servers with reliability-challenged Beowulf clusters (i.e., “compute servers”) that provide neither automation nor storage, and (c) they may be overly concerned about system maintenance, which is largely eliminated with products like the SORCERER iDAs.

Scientists and labs that can leverage server automation to maximize Value will have a key competitive advantage in the post-recession recovery.

Theory of Value: Relative and Quantifiable

A solid understanding of Value is the basis for unbounded success, whether as a scientist, investor, family member or friend, or human being. It is important to note that: (1) Value is always relative to its recipient, not an absolute attribute of an object or person; (2) Value can be quantified and hence maximized; and (3) quality server-automation tools can rapidly multiply, not just add to, your and your lab’s Value.

Continue reading

Posted in Interesting, Views | Leave a comment

What’s Mined is Yours: Turbocharge Your Mass Spec Research with Rapid Algorithm Development

by David.Chiang@SageNResearch.com

The brand-new GEMYNI(tm) platform for SORCERER iDAs, designed for mass spec data mining, enables rapid algorithm development, deployment, and automation using R, the popular open-source programming language for statistics and data science.

Bioinformaticians researching new algorithms will publish more quickly by focusing only on the R module rather than the whole application. Biopharma labs with specialized needs, commonly automation tools to characterize drug-compound variants, can develop such tools in-house or with our help. Biotech startups relying on proprietary algorithms can keep the secret sauce in their own R code, yet benefit from the rest of the robust SORCERER Enterprise system. Specialty quantitation, often poorly addressed by off-the-shelf or academic software, can be built and optimized using abundant R math libraries for area-under-curve and 2D peak-extraction calculations.
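For instance, a label-free peak area reduces to a few lines of base R (the chromatogram values below are made up for illustration):

    # Trapezoidal area-under-curve for an extracted ion chromatogram.
    rt        <- c(10.0, 10.1, 10.2, 10.3, 10.4)   # retention time, min
    intensity <- c(0, 2.1e6, 5.8e6, 3.0e6, 0)      # ion intensity

    auc <- sum(diff(rt) * (head(intensity, -1) + tail(intensity, -1)) / 2)
    auc   # peak area by the trapezoid rule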

As an internal test, I was able to prototype a next-generation algorithm within three weeks, one that utilized the Support Vector Machine statistical model (used in UW’s Percolator algorithm) to combine multiple search engine scores, SEQUEST 3G and a binomial score based on Max Planck’s Andromeda, along with fragment mass errors, to derive a modified XCorr score that is plug-in compatible with the current SORCERER workflow. This will significantly improve results for data from newer instruments like the Q Exactive within the current workflow, and without ad hoc search engine tweaks. (Watch this space for technical details later.)

An important point is how quickly this algorithm could be prototyped and deployed in the production workflow, by partitioning the project between the mathematical manipulation in R and the input/output handled by standard MUSE library functions. To be sure, some tuning will be needed as the algorithm is stress-tested with different datasets. Nevertheless, such an effort would take far longer to develop, and be far more difficult to maintain, as a standalone program than as a script within the GEMYNI platform.

The new GEMYNI platform for rapid algorithm development is the latest industry-leading innovation introduced by SORCERER iDA products. SORCERER iDAs (Integrated Data Appliances) increase overall research productivity by both reducing or eliminating low-value distractions (software maintenance, server administration) and making high-value activities simpler (deep data analysis, algorithm development, data mining). They are designed for mass spec labs with more than $1M in research capability that can improve productivity by at least 10% with an automated, maintenance-free, server-based data analysis platform.

The first version of GEMYNI (v4.3), a “Linux-only” version, and the Sorcerer-Score sample script will be formally introduced at ASMS in June. They will be available for all late-model SORCERER systems as part of the TSP maintenance plan.

To find out more, please contact us at sales@SageNResearch.com. Hope to see everyone at ASMS!

References:
Cox, J. et al. (2011) Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res., 10, 1794–1805.
Käll, L. et al. (2007) A semi-supervised machine learning technique for peptide identification from shotgun proteomics datasets. Nat. Methods, 4, 923–925. [Percolator]
Käll, L. et al. (2008) Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. J. Proteome Res., 7, 29–34. [q-value]

Posted in Application notes, Muse | Leave a comment

Upgrading the ReAdW module for extracting spectra from Thermo RAW files

CARE Notice – Comprehensive Assessment and Response Event

As part of our ongoing Total Support Program, this CARE notice is being sent to all friends of Sage-N Research:

Based on our extensive research, we recommend that all in-warranty (TSP) customers contact us to upgrade to the latest v4.6 of the ReAdW module. Enhanced features of this most current version include:

  • Correctly identifies Q Exactive instruments
  • Skips zero-intensity peaks (resulting in smaller files)
  • Retains the proper precursor charge-state information for spectra in the mzXML file

To schedule this software upgrade, please contact us at: Support@SageNResearch.com

**This CARE notice is your TSP at work. TSP encompasses not only hardware and software support; it also contributes to our ongoing research and development to provide the most cutting-edge products and bring you the tools you need to refine your research.

To renew or re-instate your TSP, please contact us at: Sales@SageNResearch.com

Posted in Application notes, News | Leave a comment

Advisory Bulletin: Diagnostics for Preventing Data Loss

Members of Sage-N Research’s Total Support Program (TSP) will want to read the following courtesy advisory bulletin carefully and contact our support team if necessary.
Note: If you are currently out of TSP, or not using a newer hardware server (e.g. Fujitsu), the following message will be of the utmost importance to you as well:

By far the most common problem we see in the field is hard drive failure. Given that hard drives contain moving parts that spin at very high speed for years on end, it is common for these drives to fail at some point. This can be a catastrophic failure resulting in total loss of your data if more than one drive fails. By monitoring and taking the correct course of action when a drive fails, data loss can usually be avoided.

SORCERER™ systems are configured with a disk technology called RAID that allows the system to continue to function normally if a single hard drive fails. It is possible that your SORCERER system has a failed hard drive right now without your being aware of it. It is crucial that, if a single drive has failed, it be replaced as soon as possible. If it is not replaced, another drive failure will lead to total loss of data.

How do you know whether a drive has failed? The newer Fujitsu systems come equipped with a few options that make checking for hard drive failures easy, and they allow automated email notification of hardware failures:

  • A quick way to tell whether a hard drive has failed is by checking the lights on the front of the system. On Primergy-based SORCERER 2 and Lab systems, the hard drive bays are located at the bottom front of the tower. For Enterprise systems, they are in the separate disk subsystem. You should see a green light on every active drive in the system. If any drive has a red light on, that drive has failed: please contact our support team immediately at support@sagenresearch.com.
  • SORCERERs built on newer server platforms also feature hardware-monitoring software which can send out automated email alerts when a hardware problem arises. Given that most people do not physically check the lights on their system daily, we highly recommend all customers set up the alerting software if you have not already done so. Please contact our support team at support@sagenresearch.com for assistance in setting up the alerts. As well, we would like to offer our TSP members complimentary monitoring of email alerts by routing them to our support address.
  • For those experienced in using the Linux command line: if you run the command “PrimeCollect” as the root user, the Fujitsu system can generate a diagnostics report for your system. You may upload your file at http://dropbox.yousendit.com/SageN and we will get back to you once we have analyzed the results.

We hope that this advisory bulletin gives our TSP members peace of mind that their SORCERER system is running smoothly, their data is protected, and the Sage-N Research support team is at their fingertips.

We will continue to strive to offer our customers the very best in advanced hardware features for performance, reliability, and expansion by using enterprise-grade (vs. consumer-grade) components that are designed for years of continuous 24/7 peak operation.

**If you are not currently covered under our TSP maintenance plan, or if your SORCERER hardware is something other than a newer server (e.g. Fujitsu), please contact info@sagenresearch.com to discuss options for rejoining TSP and/or upgrading your hardware.

Posted in Application notes, Interesting, Q&A, tips and hints | Tagged , , , , | Leave a comment

Sorcerer Proteomics Edition 4.2 software released

The newest release of Sorcerer Proteomics Edition (Sorcerer PE) software is now available in beta to supported Sorcerer customers. It introduces several new enhancements:

  • New native file formats based on MS2 and SQT for greater data-handling efficiency
  • The obsolete DTA and OUT formats have been removed from the internal flows of Sorcerer but are still available for import and export to legacy applications
  • Improved system performance and efficiency throughout
  • Support for the multiple-biosample feature of Scaffold — spectra files can be pre-grouped in the search to become separate biosamples in the Scaffold file
  • Built-in processing for Raw files from Thermo LTQ Orbitrap Velos and Q Exactive mass spectrometers
  • Now bundles the most recent TPP 4.5.2 software
  • Support for Scaffold V3.4

Release 4.2 is the latest in the V4 series of Sorcerer Proteomics Edition (Sorcerer PE) software, and is immediately available for beta testing, which means that all the new features have been implemented and tested internally, but the software has not yet received full testing in real-world conditions. If you would like to try out the new features, please contact support@sagenresearch.com to request the new beta software. If you are currently using version 3.5 or an earlier release, you will also need to enter new license keys.

Sorcerer PE V4’s new file formats offer greater performance

This release completes the transition to new file formats that was begun with v4.1 (which still used the old formats behind the scenes); all of Sorcerer PE’s internal use of the legacy SEQUEST DTA and OUT file formats has now been replaced by the more modern MS2 and SQT formats for representing MS2 spectra and peptide matches, respectively. These days, when tandem mass spectrometers can generate tens of thousands of spectra every hour, it is very inefficient to represent each data item in a separate file — there is substantial overhead in opening and closing each file, and transfers in a network environment such as Sorcerer uses are typically slow. It also wastes a lot of disk space. So using MS2 and SQT natively throughout the Sorcerer search engine greatly improves the overall performance of the system.

However, although they work well internally, we don’t recommend these formats for end-users to work with directly — they are not standardized, not amenable to upstream and downstream processing tools, and not easily generalized to other search algorithms. Rather, for input to and output from Sorcerer, we’ve standardized on mzXML for spectra and pepXML for peptide matches, as interchange formats that are more general and have extensive community support. PepXML is now generated by default, even if you do not select TPP postprocessing. Of course, Sorcerer supports other formats too, such as Thermo’s Raw files, but these are converted to pass through the standard formats — mzXML in the case of Raw files.

One more word about DTA and OUT legacy support: these files are no longer directly supported by the Sorcerer PE search engine, but you can still import DTAs, and we will have a script to generate OUT files from pepXML if your downstream processing requires them. Please note that there is one spot in the TPP suite that expects .out files: the “spectrum” hyperlink in the Peptide Viewer, which actually brings up a view of the OUT file, if any. Most of the scores, masses, etc. for the spectrum match presented in that view can be added as columns directly to the peptide report. But if you do want to view these OUT files and don’t mind the extra overhead, consider running the OUT-file compatibility script as a post-processing step. Please consult support@sagenresearch.com for further assistance with the compatibility script.

Multiple Biosample support for Scaffold

One common request from our clients who are keen Scaffold users is for enhanced support in the Sorcerer-Scaffold integration to take advantage of Scaffold’s ability to group data into different biosamples, corresponding to different columns in the Scaffold view. We’re happy to announce a new feature in the Sorcerer PE software that speaks to this. The way it works is very simple, and requires only a minor change to the way you have always searched data on Sorcerer.

Previously, if you selected multiple items for searching in the Web GUI, they would all be searched together and would wind up as a single biosample in the Scaffold file. Now, any separately selected item — either a single spectra file, or a folder of several files — becomes its own biosample. Typically, this is used by pre-grouping raw files in subfolders of the search data folder; each of those subfolders becomes a separate biosample, so long as each is individually selected from within the search data folder. If, however, you select the search data folder itself at the top level, then all its contents become a single biosample.

Of course, the existing method of working with Scaffold Desktop to add new biosamples by merging with another Scaffold file is still available, so you can choose whichever method better suits your needs.

Do be aware, though, that searching more data in one run adds to the load of the Scaffold analysis. The system resources that Scaffold needs, particularly memory, are primarily a function of the number of files and the number of spectra represented by those files. We recommend that any Sorcerer used for intensive Scaffold analysis be upgraded to a minimum of 24GB of system RAM, and that users discuss their Scaffold analysis usage and possible system upgrades with Sage-N support to ensure the best performance.

New method for extracting Thermo RAW files in Sorcerer

When Thermo introduced XCalibur 2.1 and 2.2 supporting the Orbitrap Velos and Q Exactive instruments, incompatibilities in their libraries meant that the method of extracting spectra from Raw files that Sorcerer then used suddenly stopped working. In response, Sage-N Research developed a solution based on a new software method, but it was Windows-specific and not well suited to other platforms such as Linux. Nevertheless, at the cost of some complexity, particularly in terms of installation, we made it work on Sorcerer, and once again had an integrated flow with Sorcerer PE for XCalibur 2.1 and above.

Now we have implemented an alternative approach, based on a method developed by Dr. Patrick Pedrioli at the University of Dundee, that allows Sorcerer’s built-in extraction software to be used successfully with the latest XCalibur libraries. It is a lot easier to deploy on Sorcerer than the Windows-based solution, and just requires a few tweaks that Sage-N customer support can easily guide you through or apply remotely. This method is now the default flow for the Sorcerer PE 4.2 release.

The Windows/msconvert method remains available for qualified customers who require its different feature set.

New versions of TPP and Scaffold software

The version of the bundled Trans-Proteomic Pipeline (TPP) software has been updated to the most recent 4.5.2 release, which provides several new enhancements and bug fixes. Also, the most recent version of Scaffold, V3.4, is now supported. Licensed users may obtain this software at the Proteome Software download site.

Other Sorcerer PE V4 enhancements

The new release rolls up other enhancements from earlier V4 releases, including:

  • The SORCERER scoring module, with new features to improve the sensitivity and thoroughness of peptide searches
  • A new Web API for submitting searches and getting results from Sorcerer over the network, to help developers use Sorcerer as a search engine within their programs and scripts
  • A component design for the Sorcerer-as-a-platform architecture, co-existing with other life-science analysis software
  • Enhancements to the MUSE scripting framework to allow more powerful scripts to customize Sorcerer searching

Please review an earlier posting for further details of these and other enhancements in Sorcerer PE V4.

Posted in Interesting, News, Q&A | Tagged , , , , | Leave a comment

What’s New for Sorcerer Proteomics Edition V4.1

Release 4.1 is an update to V4.0, which was released only as beta software to a limited number of users, so this release will be the first general release in the Sorcerer PE version 4 series. The release is currently entering a beta-testing period, following which (probably in late summer) it will be made available to Sorcerer customers with active support arrangements, as well as installed on newly purchased Sorcerer systems.

This release contains enhancements in many different areas of the Sorcerer software:

  • The SORCERER scoring module has new features to improve the sensitivity and thoroughness of peptide searches.
  • The data flows for Sorcerer processing have been rearchitected to use MS2 and SQT data formats instead of the legacy SEQUEST DTA and OUT file formats.
  • As a solution to the issue of extracting from recent RAW files, an interface has been developed within the Sorcerer software to connect to a separate Windows system and remotely run ProteoWizard’s new MSConvert extractor with instrument-specific libraries.
  • The bundled version of Trans-Proteomic Pipeline software is updated to V4.4.1, which offers multiple enhancements.
  • The new Sorcerer software now supports Scaffold V3.1.2, with new features in TIC quantitation and batch file merging.
  • The Scaffold flow has also been reworked on the Sorcerer side, enabling users to identify multiple biosamples for Scaffold in a single search.
  • A new Web API for submitting and getting results from Sorcerer searches over the network has been implemented to help developers use Sorcerer as a search engine within their programs and scripts.
  • This software release has been designed as a component of the Sorcerer-as-a-platform architecture, co-existing with other life-science analysis software.
  • Enhancements to the MUSE scripting framework allow more powerful scripts to customize Sorcerer searching.

Continue reading

Posted in Interesting, Muse, News | Tagged , , , | Leave a comment

    Marketing Partnership with Nonlinear Dynamics

    Many of you are asking: what do we do about quantitation, and how can Sage-N Research help in this area?

    Our goal at Sage-N Research is to provide you with a complete proteomics platform.

    Quantitation is an important part of the overall workflow, and can be used for both differential protein expression and protein characterisation applications.

    After looking at different software in the market, we found Nonlinear Dynamics' Progenesis LC-MS to be the best solution available.

    Progenesis LC-MS is a data analysis program that helps you find and quantify the proteins showing interesting behaviour in your label-free samples.

    Progenesis LC-MS quantifies peptides and proteins independently of identification, thus ensuring users capture all of the interesting protein data in their experiments. The software is platform independent and will integrate with a wide range of instrumentation.

    For more information on Progenesis LC‑MS and its approach to data analysis, you may find the following links useful:

    If you already own Progenesis LC‑MS but want to know about the features in the latest release, please see the FAQ, "What's new in the latest version?"

    Overall we think the approach used by Nonlinear Dynamics can greatly assist you in your workflow.

    Posted in Uncategorized | Leave a comment

    Marketing Partnership With GenoLogics

    We have recently created a marketing partnership with GenoLogics. Many of our clients have been asking for "front-end" solutions, and we have found that GenoLogics is one of the best providers in this area. GenoLogics solutions offer the flexibility of a custom in-house built LIMS with all the benefits of a commercial LIMS solution. Their solutions are built on a scalable informatics platform that adapts to constant change and can expand to multiple labs. GenoLogics is the proven industry leader in LIMS and data management for next-generation sequencing. For more information on their solutions for proteomics, contact us directly or visit www.genologics.com.

    Sage-N Research Marketing

    Posted in News | Tagged , , | Leave a comment

    Changing nature of software in Proteomics (or why you can’t buy great software for SILAC)

    by David.Chiang@SageNResearch.com

    Proteomics technology is now a robust discovery tool, at least in capable hands with the right tools, for characterizing post-translational modifications such as phosphorylation, right alongside gene expression and cellular imaging for tumor and stem cell research.

    However, the complexity, scale, and criticality of the data from a modern mass spectrometer such as an Orbitrap Velos are well beyond the capability of desktop PCs and require specialized infrastructure IT solutions.

    When losing data becomes catastrophic rather than merely annoying, it is time to move beyond PCs to robust infrastructure solutions, such as Sage-N Research's SORCERER Enterprise system. Unlike traditional business-oriented IT systems, the SORCERER Enterprise system is optimized for the large multi-gigabyte data files of proteomics research.

    Robust servers and storage systems provide the capacity, reliability, and throughput for storing and analyzing proteomics mass spec data that inexpensive PCs cannot. For example, a typical throughput of 300GB of raw data per week for a single mass spec will fill up a PC in less than a month (a 1TB consumer drive holds barely three weeks of such data). As well, the lower-grade disk drives used in cost-sensitive, consumer-oriented PCs and external USB drives can lead to costly data loss and system downtime.

    In addition, the nature of the data analysis needed for proteomics is changing, as it becomes more akin to hedge fund data mining than an administrative assistant running an Excel spreadsheet. This is especially true for quantitation and ETD data analyses where the field has not settled onto a de facto one-size-fits-all methodology, and where some semi-customization of the analysis to query and adapt to a particular data-set will be necessary. This is why the large-scale SILAC papers are always done by research groups with their own bioinformatics resource, and why just about any off-the-shelf software you can download or buy will probably not work well for your needs without some customization.

    Why does quantitation or ETD software need to be semi-customized? Continue reading

    Posted in Views | Tagged , , , | Leave a comment

    The Truth About Probability Scores

    “All models are wrong, but some are useful.” George E. P. Box

    The Truth About Probability Scores

    What is Probability Scoring?
    Probability Scoring is a popular method of ranking the possible peptide sequences that best fit an observed tandem mass spectrum. It can be computed as the primary score in a search engine (e.g. Mascot), or as a second-stage re-scoring of, say, the top 10 results from another search engine.

    Why is it important?

    Of all the different types and styles of similarity scores used in proteomics search engines, Probability Scoring is considered conceptually easy and simple to understand. Other scores, notably SEQUEST's cross-correlation score (XCorr) based on vectors and linear algebra, can be more mathematically rigorous, but require more technical background to understand their calculation.

    What does it derive from?
    The Probability Scoring functions used by both the Matthias Mann Lab at Max Planck and the Steven Gygi Lab at Harvard are based on the coin-flip model with a biased coin (i.e. the binomial distribution).

    For example, suppose a peptide sequence is predicted to yield N=18 fragment ions, exactly K=6 observed peaks match these, and the success probability is modeled as p=0.05 (we will get to that later). Then the "random-chance probability" of that happening (i.e. the p-value) is computed as the probability of getting exactly 6 heads out of 18 tosses of a biased coin, where each toss is modeled to have a 5% chance of yielding heads.
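
    To make the arithmetic concrete, here is a minimal Python sketch (our illustration, not any lab's published code) that evaluates the binomial model for the numbers above. Note that a conventional p-value sums the probabilities of K or more matches, not just exactly K:

        from math import comb  # Python 3.8+

        def binom_pmf(k, n, p):
            # Probability of exactly k matches among n predicted fragments
            return comb(n, k) * p**k * (1 - p)**(n - k)

        def binom_pvalue(k, n, p):
            # Upper-tail p-value: k or more matches by random chance
            return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

        n, k, p = 18, 6, 0.05
        print(binom_pmf(k, n, p))     # ~1.6e-4: exactly 6 of 18 match
        print(binom_pvalue(k, n, p))  # ~1.7e-4: 6 or more of 18 match

    Either way, so small a random-chance probability suggests the match is unlikely to be noise.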
    Continue reading

    Posted in tips and hints | Tagged , , | Leave a comment

    What Probability Scoring algorithm does the SORCERER offer?

    Starting with v4.0 software, the Sage-N Research SORCERER platform will provide a 2-stage scoring (i.e. different from 2-stage searching) architecture that generally mimics the current Gygi Lab workflow: a first-stage SEQUEST search (SEQUEST 3G starting with v4.0) followed by our open-source MUSE scripting version of the Gygi Lab's "Bino 5-score". (This is analogous to the Mann Lab workflow, which generally uses a first-stage Mascot search followed by a "6-score" re-score stage, according to private communication with Matthias.)

    Users can also modify these re-score modules to incorporate their own scoring functions, such as to accommodate water and/or ammonia losses, incorporate special cleavage rules, or otherwise tune coefficients and parameters.
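
    Schematically, the two-stage idea looks like the following minimal Python sketch (our simplified illustration, not the actual SORCERER or MUSE code; the field names and scoring function are hypothetical):

        def rescore_top_candidates(candidates, rescore_fn, top_n=10):
            # Stage 1: keep the search engine's top hits by XCorr
            first_pass = sorted(candidates, key=lambda c: c["xcorr"],
                                reverse=True)[:top_n]
            # Stage 2: re-rank those hits with an independent score,
            # e.g. a binomial probability score
            for c in first_pass:
                c["rescore"] = rescore_fn(c["peptide"], c["spectrum"])
            return sorted(first_pass, key=lambda c: c["rescore"], reverse=True)

    Because the second stage sees only a handful of candidates, it can afford a more expensive or more specialized scoring model than the first-pass search.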

    Anything to keep in mind about Probability Scoring?
    Some researchers mistakenly believe that a "probability" is some kind of absolute "word of God", but probabilities are very much a creation of man. Indeed, in science, the probability of an event has more to do with you — or rather your lack of all the relevant information — than with the event itself, and is best considered a "degree of confidence" measure based on incomplete information. After all, mass spectrometers do not measure peptide sequences per se, but only a collection of mass/charge ratios from which you infer sequence information.

    Probability Scores are simply tools based on underlying (sometimes hidden) models, which as George Box observed are always "wrong" because they necessarily involve simplifications and assumptions. Probability Scoring, by its nature, tends to have increased specificity but reduced sensitivity. Its Achilles heel is the filtering step: how does one decide which peaks are "real" and which are "noise", particularly for the noisy spectra common to phospho- and low-abundance peptides? With only a handful of matching peaks determining the score, their accurate selection becomes critical.

    Therefore, they are best used as a second re-scoring stage of results from a search engine like SEQUEST 3G designed to find specific patterns with significant noise. In addition, it is important to note that p-values are NOT true probabilities, since there is no requirement for such values of competing hypotheses to sum to 1. (See this Proteomics 2.0 blog entry for further discussion: http://proteomics2.com/?p=65 )
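
    To make the filtering step concrete, one common heuristic is to keep only the most intense peaks in each m/z window. Here is a minimal Python sketch (our illustration; the window width and peak count are arbitrary choices, not anyone's published defaults):

        from collections import defaultdict

        def filter_peaks(peaks, window=100.0, top_k=6):
            # Keep the top_k most intense (mz, intensity) peaks per m/z
            # window; discard the rest as presumed noise
            bins = defaultdict(list)
            for mz, intensity in peaks:
                bins[int(mz // window)].append((mz, intensity))
            kept = []
            for bucket in bins.values():
                bucket.sort(key=lambda peak: peak[1], reverse=True)
                kept.extend(bucket[:top_k])
            return sorted(kept)

    Any such threshold is a modeling choice: set it too aggressively and the real fragment peaks of a low-abundance peptide are discarded along with the noise, which is exactly the sensitivity problem described above.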

    Continue reading

    Posted in Views | Leave a comment

    New workflow for XCalibur 2.1 RAW files (Velos) is released to beta

    We have developed a new flow for processing Thermo RAW files that works both with the most recent XCalibur V2.1 and with earlier versions. This flow has been giving good results in internal testing, and we are now releasing it for beta testing to any interested, actively supported Sorcerer customer.

    Thermo LTQ Velos users will have noticed the major changes to the XCalibur software introduced at version 2.1. The installation process is different and requires a new component called Thermo Foundation, and some of the file names and locations have changed. These changes break compatibility with the ReAdW program that Sorcerer uses within its CrossOver environment. One workaround commonly suggested in the Thermo field is to down-rev the XCalibur on the instrument to V2.0 and continue using the old software for analysis. This remains a viable option, but with our newly developed solution, it is now also possible to use 2.1 RAW files on Sorcerer.

    We are moving to a new spectrum extractor called msconvert (part of the ProteoWizard suite), which works with a different version of the Thermo libraries and for which we have developed a new integration in the CrossOver environment. We are offering this as a beta release to our in-warranty customers. The solution entails a few Linux operations to reinstall CrossOver with the latest release, configure the required libraries, and install a new Sorcerer workflow script. It is fairly straightforward for people comfortable with the Linux environment; alternatively, we can do it for you if you give us remote access to your system. Please contact us at support@sagenresearch.com for more information.

    Posted in Application notes, Interesting, News, tips and hints | Tagged | Leave a comment

    Video: “Peptide ID with Target-Decoy Searching” by Prof. Josh Elias (Stanford)


    Prof. Josh Elias (left) of Stanford University receives a thank-you gift from David Chiang after his talk.

    Ever wondered about target-decoy searching? Want to gain a better understanding and realistic expectations of this effective tool? Sage-N Research's video "Addressing Peptide Identification Signal-to-noise With Target-Decoy Searching", given by Professor Josh Elias of Stanford University at our "Translational Proteomics 2.0" meeting, can help. Dr. Elias is an Assistant Professor in Chemical and Systems Biology at Stanford University, and was previously part of the Steven Gygi Lab at Harvard Medical School. His lab is keenly interested in developing and applying methods to meet the current challenges facing scientists engaged in large-scale proteome characterization.

    Josh kicked off his talk with a stunning and very powerful visual to drive home what target-decoy database searching can do — you'll never look at coffee beans in quite the same way. From this talk, you'll learn how to find a happy medium for thresholds, smarter ways of designing your filtering criteria, when not to even consider using the method, how to get the most out of (really easy) decoy searching in SORCERER, and what's so good about partial tryptic searches.

    The 30-minute presentation is available at: http://www.scivee.tv/node/15544
    To view the slides, I recommend using "full screen" mode. The slide set can also be downloaded as a PowerPoint file.

    Posted in Download, Interesting, tips and hints, Webcasts | Tagged , , , , , | Comments Off on Video: “Peptide ID with Target-Decoy Searching” by Prof. Josh Elias (Stanford)

    Video: “Peptide ID and Protein Inference..” by Prof. Alexey Nesvizhskii (U. Michigan)


    Prof. Alexey Nesvizhskii (left) of University of Michigan receives a thank-you gift from David Chiang after his talk.

    If you really want to understand how peptide and protein identification is done, this video talk is a must-see!

    Professor Alexey Nesvizhskii of the University of Michigan is one of the co-inventors (with Dr. Andy Keller) of the popular PeptideProphet/ProteinProphet algorithm for turning search engine results into statistically consistent peptide and protein identifications. (This algorithm is also the basis for the popular Scaffold software.)

    At the “Translational Proteomics 2.0” meeting, we were privileged to have Alexey give his insightful talk that reviews the various steps involved in inferring peptide and protein identifications from large spectra datasets.

    In this talk, you will learn why False Discovery Rates are preferred over p-values, why you probably should not run more than 4 replicates of a MudPIT experiment, how decoy-based FDR estimates differ from those of PeptideProphet/ProteinProphet, how "The Two Prophets" compute probabilities by curve-fitting the score distributions, how sensitivity and FDR are computed, and the what and why of some advanced TPP options.

    The talk is available at: http://www.scivee.tv/node/12671 (45 minutes).

    I recommend using the “full screen” mode so you can view the slides, which are also available as a download from the site. (Please be aware that the slideset order is different from that in the presentation.)

    (Note: Both Trans-Proteomic Pipeline and Scaffold Batch software are integrated into the SORCERER platforms.)

    Posted in Download, Interesting, News, tips and hints, Webcasts | Tagged , , , , , | Leave a comment

    Secret Insights to Translational Proteomics Success

    by David.Chiang@SageNResearch.com

    Proteomics mass spectrometry is finally sensitive and specific enough for robust translational medicine (at least in capable hands), and holds tremendous promise to revolutionize biology and medicine. For some, it holds the key to incredible research power for decades to come.

    However, there is a chasm that continues to grow between the productive and unproductive labs, because too many proteomics practitioners focus too early on low-level issues (e.g. cost, automation, ease of use) without first resolving high-level ones (e.g. sensitivity in the presence of noise, quality of results, algorithmic suitability).

    For many researchers experimenting with a new high-resolution instrument, the most common scenario is to select a workflow based on running a simple protein solution, usually a purified BSA solution or a commercial protein mixture.

    Since different workflows will give basically identical protein ID results for these simple test cases, they may conclude that all search engines are equivalent. While true when there is almost no signal noise, this is largely irrelevant in translational research. In fact, the exact same test will likely show that low-resolution and high-resolution mass specs are equivalent, that the lowest-quality reagents will suffice, or that maybe you don't have to clean your glassware as often. These are also true when there is little or no signal noise, but again, that is irrelevant for real-world research.

    Seeing that there is little difference in protein IDs, some focus on protein coverage as the sole metric for evaluating search engines. However, this is actually the opposite of what is needed for sensitive discovery proteomics. For example, if you are hunting for new protein biomarkers (especially a "one-hit wonder"), you do not want the protein inference engine tuned to assign ambiguous peptides to already-found proteins, thereby hiding them from further study.

    Not surprisingly, a workflow selected based on low-noise experiments and focused on protein coverage will excel for simple mixtures, but is not sensitive enough to analyze complex mixtures with wide dynamic range, such as in translational research. Scientists will be able to see the abundant peptides and proteins, but probably little else. That is roughly what most proteomics researchers find today: nothing meaningful, but enough of the obvious not to change their methodologies.

    The result is that most labs are not getting the value commensurate with their investments in proteomics mass spectrometry. Under the current economic environment, this is both wasteful and dangerous.

    Within the academic world, while many proteomics researchers have trouble getting any interest, a select few are swamped and have to turn away collaborators. Within drug discovery firms, while many are staring at their mostly idle mass spectrometers, a select few are running multiple mass spectrometers 24/7 sieving productively through millions of peptides.

    So why is the majority of proteomics research not producing high-value results?

    With our access into the world’s top academic and drug discovery proteomics labs, we have a unique bird’s eye view into the answer. (However, like attorneys, we never give out client-specific information.)

    Please allow me to share some secrets to your future success.

    Continue reading

    Posted in Interesting | Tagged , , , , , , | Leave a comment

    Translational Proteomics Meeting: Secrets from the Masters


    “Translational Proteomics 2.0” 2009 Users Meeting in Philadelphia.
    Guest speakers Jimmy Eng (UWashington), Alexey Nesvizhskii (UMichigan), Josh Elias (Stanford), along with SAB member John Yates (Scripps) are in the middle row.


    Stanford’s Dr. Chris Adams (left) must be feeling pretty lucky!
    He gets to use a SORCERER 2 for his research (as part of Allis Chien’s mass spec core facility), AND wins an Acer One netbook door prize from David Chiang!

    Translational proteomics — aka Proteomics 2.0 — is high-sensitivity proteomics for translational research, whose mastery is your key to unimaginable fame and fortune in biology and medicine!

    Whether you need to catch up or to keep up, you need to hear the leading proteomics technologists reveal their secrets!

    We were fortunate to have three of the most accomplished technologists (Mr. Jimmy Eng, Prof. Josh Elias, and Prof. Alexey Nesvizhskii) at our "Translational Proteomics 2.0" meeting give their insider insights on high-sensitivity data analysis.

    In addition, we were privileged to have Sage-N Research SAB advisor Prof John Yates, one of the fathers of proteomics, attend our meeting and join in our lively panel discussions regarding the present and future of translational proteomics.

    From the talks, these are tips for best sensitivity and specificity:

    * There are several equivalent ways to calculate precursor mass, all of which can result in several AMUs of mass error due to incorrect isotope assignment.
    * Semi-tryptic settings for database searching give the best performance.
    * Use a wider mass tolerance than your experiments will yield.
    * However, you don't need a wide mass tolerance for searching if (a) you use an isotope shift check and (b) you have a decent source of noisy peptides, e.g. with a semi-enzyme search.
    * Post-process peptide IDs with proper statistical tools (e.g. PeptideProphet, DTASelect, or target-decoy analysis).
    * The key is to monitor the false discovery rate (FDR) under different filtering criteria.
    * Use monoisotopic masses for fragment ions, and for precursor ions if using a high-resolution instrument.
    * P-values or E-values are not good for large-scale proteomics, because they don't give you estimated error rates for a given score cut-off, and they ignore other relevant factors (e.g. retention time, mass accuracy, etc.).
    * The target-decoy method is a simple and effective means of FDR estimation (see the sketch after this list). It gives scores more discriminatory power by improving the signal-to-noise ratio.
    * You can use search scores in combination with other characteristics to get more good IDs at a particular FDR than by using the score alone.
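
    As a concrete illustration of the last two tips, here is a minimal Python sketch of target-decoy FDR estimation (our own illustration, not any speaker's code; the scores are made up):

        def estimated_fdr(target_scores, decoy_scores, threshold):
            # Decoy hits model random matching, so decoys passing the
            # threshold approximate the false positives among the targets
            targets = sum(s >= threshold for s in target_scores)
            decoys = sum(s >= threshold for s in decoy_scores)
            return decoys / targets if targets else 0.0

        targets = [4.1, 3.8, 3.5, 2.9, 2.7, 2.2, 1.9, 1.5]
        decoys = [2.4, 1.8, 1.6, 1.2]
        print(estimated_fdr(targets, decoys, 2.0))  # 1/6, i.e. ~17% FDR

    Sweeping the threshold and watching the estimated FDR is exactly the "monitor the FDR under different filtering criteria" advice above.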

    We will be publishing the meeting talks online. Watch this space for details!

    Posted in Interesting, News | Tagged , , , , , , , | Leave a comment

    Success Profile: Dr. Khatereh Motamedchaboki from Burnham

    Hear Khatereh discuss her work and her success with the SORCERER 2 system!

    Dr. Khatereh Motamedchaboki is currently the Manager of the Proteomics Facility at the Burnham Institute for Medical Research.

    She is one of our increasing number of two-time SORCERER success stories, as a previous user at the Ebrahim Zandi Lab at the University of Southern California.

    Reference: Laurence M. Brill, Khatereh Motamedchaboki, Shuangding Wu, and Dieter A. Wolf, "Comprehensive proteomic analysis of Schizosaccharomyces pombe by two-dimensional HPLC-tandem mass spectrometry", Methods (2009), doi:10.1016/j.ymeth.2009.02.023.

    Click Here to See Video

    Posted in Interesting, News | Tagged , , , | Leave a comment

    Announcing Sorcerer PE v4.0 with Enhanced ETD and Quantitation

    Our R&D team is busy working on the next major version of the Sorcerer-PE software, and expects to release it to then-in-warranty customers in the next few weeks.  Early previews and beta tests of some of the components will be made available by arrangement to qualified customer sites.

    Highlights of the upcoming release include:

    • ETD fragmentation support and analysis
    • MUSE scripting modules for rescoring peptide matches with Olsen-Mann and Sadygov-Coon scores
    • Interoperation with major components of the Yates lab Sequest suite, including the DTASelect filtering and statistical analysis tool, and the Census quantitation application
    • Enhancements to the SEQUEST engine which provide first-pass cross-correlation scoring and E-values for greater accuracy and sensitivity

    Continue reading

    Posted in Interesting, Muse, News | Tagged , , , , | Leave a comment

    New Target-Decoy capabilities with DTASelect and Muse

    We’ve developed a new Muse workflow for target-decoy analysis and false discovery rate estimation, based on our integration of DTASelect from the Yates lab. DTASelect can now use target-decoy FASTA files that are installed on Sorcerer to support its statistical analysis. It provides an easy-to-interpret results report complete with match statistics and estimated false discovery rates.

    Our DTASelect on Sorcerer page on this blog has been updated to describe the target-decoy workflow, in addition to the existing material on installing, configuring and running DTASelect and the Muse script. Please visit it to get links to the latest scripts and for a detailed How-To.

    Posted in Application notes, Muse, tips and hints | Tagged , , , , | Leave a comment

    Experts agree: use semi-enzymatic search for best sensitivity and specificity

    Three of the world’s leading experts on MS-MS protein identification came together recently at Sage-N Research’s annual user group meeting, and presented methods and results for the techniques and tools with which they are associated:

    • Jimmy Eng, co-inventor of Sequest and developer of many proteomics tools, presented tips for Sequest analysis
    • Josh Elias, who pioneered the systematic use of decoy databases for FDR estimation, gave a talk on how to use that technique to address Peptide ID signal-to-noise.
    • Alexey Nesvizhskii spoke about the tools he co-authored, in “Peptide identification and protein inference using PeptideProphet and ProteinProphet”

    Their talks were very wide-ranging and full of practical insights for the proteomics user community, and they explored the different research interests, data sets, analysis methods and workflows in the individual labs.  However, they all had this in common: they had kept a careful eye on their search settings, monitored sensitivity and error rates, and come to a common, if perhaps not entirely intuitive, conclusion: the most sensitive search and the lowest error rates for shotgun proteomics are achieved when using semi-enzymatic searches — that is, when one end, but not both, of the peptide is allowed to diverge from the expected cleavage site.

    Continue reading

    Posted in Application notes, Interesting, News, tips and hints | Tagged , , , , , , , , , , | Leave a comment

    Video: “SEQUEST and TPP Tips” by Jimmy Eng (U. Washington)


    Jimmy Eng (left) of University of Washington receives a thank-you gift from David Chiang after his talk.

    During our Translational Proteomics 2.0 Meeting, we were privileged to have Jimmy Eng (University of Washington) give us his uncommon insights into using SEQUEST with the Trans-Proteomic Pipeline (TPP).

    This talk will be invaluable for advanced users of the SEQUEST search engine for sensitive translational proteomics analysis. All active SEQUEST users should listen to this talk!

    Researchers will benefit by increasing their sensitivity and decreasing their false discovery rates when identifying proteins and post-translational modifications using proteomics mass spectrometers like the Orbitrap.

    Jimmy has been one of the most prolific proteomics developers for almost two decades, as the co-inventor (with John Yates) of the SEQUEST search engine, as well as the developer of a number of TPP tools.

    Conclusions from the slides:
    – Semi-tryptic searches are better
    – Use monoisotopic masses for fragment ions
    (and for precursor ions if the data come from a high-res instrument)
    – Narrow mass tolerance searches perform better if the search accounts for precursor-mass isotope assignment error

    The talk is available at:  http://www.scivee.tv/node/11920 (31 minutes).

    I recommend using the “full screen” mode so you can view the slides, which are also available as a download from the site.

    Posted in Application notes, tips and hints | Tagged , , , | Leave a comment

    DTASelect is now supported on Sorcerer

    Many of our customers have found DTASelect to be a very useful postprocessing tool for Sequest results, and have reported success using it with Sorcerer output. Up until now, however, these customers have generally run the tool manually on a separate desktop computer. Now we have developed a Muse script to make it easy to do this automatically, on Sorcerer itself.

    See our DTASelect on Sorcerer page on this blog for a detailed How-to on installing, configuring and running DTASelect and the Muse script.

    Posted in Application notes, Muse, tips and hints | Leave a comment

    New Ascore scripts available for Sorcerer PE V3.5

    If you are interested in using Ascore as described in the application note on the blog, please contact us for new Muse scripts for your Sorcerer. We’ve just updated them, and they are needed to work with the recent v4.0 release of TPP, which is what’s in the current Sorcerer release.

    Posted in Muse, tips and hints | Tagged , , | Leave a comment

    How to update Java on Sorcerer

    Here’s a how-to for technically advanced users who need to update the Java platform on Sorcerer. It’s not required for the base Sorcerer software, including ScaffoldBatch, but it may be necessary for Phenyx installation. Please consult our technical support staff before deciding to do the update.

    These instructions assume that you have a recent 64-bit Sorcerer operating platform (either RHEL 5.2 or Centos 5-based), and that your Sorcerer software is at V3.5.

    Here are the steps:

    1. Get the latest Java Development Kit (JDK)  (currently v6 update 18) from http://java.sun.com/javase/downloads/index.jsp. Click on the ‘Download JDK’ button. Get the Linux x64 platform, and download the non-rpm file which has a name like jdk-6u18-linux-x64.bin
    2. Log in as root in a terminal window and type: cd /opt
    3. Copy the file you downloaded to /opt, and unpack it:  /bin/sh jdk-6u18-linux-x64.bin
    4. Note the name of the pathname to java in the unpacked directory for use in the next step, e.g. /opt/jdk1.6.0_18/bin/java
    5. Type:  /usr/sbin/alternatives --install /usr/bin/java java /opt/jdk1.6.0_18/bin/java 2
      • This sets up a system of links from /usr/bin/java to the new installation
    6. Type: /usr/sbin/alternatives --config java
      • Enter '2' at the prompt to select the newly installed alternative
    7. Check you have the latest java by typing:  java -version

    (Optional) Update Firefox Java plugin:

    1. Create a plugins directory in the Firefox installation directory if the plugins directory does not exist. Please check your version of Firefox to determine the correct path to use: mkdir /usr/lib64/firefox-3.x.x/plugins
    2. Create a symbolic link to the new Java plugin. Again please check your Firefox and JRE version for the correct paths: ln -s /opt/jdk1.6.0_18/jre/lib/amd64/libnpjp2.so /usr/lib64/firefox-3.0.5/plugins/
    Posted in Application notes, Muse, tips and hints | Leave a comment

    Opportunity of a lifetime coming: lessons from the 1980’s

    by David.Chiang@SageNResearch.com

    First off, I may need to apologize to those who take offense at the equivalent of someone trying to lift spirits at a funeral, as I am not trying to make light of the seriousness of today’s challenging economic circumstances.

    However, I subscribe to the philosophy of author Anthony Robbins and others that there is always a positive to any negative, and that a proper mindset is key to move yourself forward, no matter what life throws at you. If life gives you lemons, it’s an opportunity to build a lemonade business empire.

    Today, it is more important than ever to focus one’s mind on a positive path forward, because quite honestly, there are signs that the post-recession recovery could well be the opportunity of a lifetime for many of you!

    It may seem perverse to have such a view given the prevalence of all the bad news, but history is on my side.

    In fact, for those of you relatively early in your career, with at least 10 to 20 good working years ahead of you, I believe the career gods may well be smiling on you, as you have the best chances of catching the wave of the upcoming Biotech Revolution 2.0 — the one centered around proteins rather than DNA or cDNA.

    Let me explain why this is so, and what you must know to win big in the next decades.

    Continue reading

    Posted in Views | Tagged , , | Leave a comment

    Success Profile: Laurence Brill (Burnham) on YouTube describes advanced proteomics setup

    Hear Dr. Laurence Brill, senior research scientist at the Burnham Institute (La Jolla, CA) describe his advanced proteomics setup with the SORCERER 2 system:

    Click here to hear Dr. Laurence Brill

    Reference: Laurence M. Brill, Khatereh Motamedchaboki, Shuangding Wu, and Dieter A. Wolf, "Comprehensive proteomic analysis of Schizosaccharomyces pombe by two-dimensional HPLC-tandem mass spectrometry", Methods (2009), doi:10.1016/j.ymeth.2009.02.023.

    Click here for another Success Profile

    Posted in tips and hints | Tagged , , , | Leave a comment

    Why Digital Biology is more than high-throughput biology

    Many people I’ve talked to in the science and investment community equate Digital Biology with High-Throughput Biology. While related, they are not the same thing.

    High-throughput is about speed, but digital (in the sense of Moore’s Law type geometric scaling) is about acceleration, or geometrically increasing throughput. The distinction is important for predicting the eventual successes of different technologies, and maybe your career if it is closely tied to a particular technology.

    Moore's Law exemplifies geometric scaling in semiconductors, and generally predicts that the number of transistors on a single chip doubles approximately every 18 months. Work out the math and that is about 1000x after 15 years: 15 years is ten 18-month doublings, and 2^10 = 1024. When that happens, a field is revolutionized once it reaches its tipping point.

    Indeed, when I started working in Silicon Valley as a freshly minted MIT engineer in the mid 1980's, I was astonished that my friend's company had a whopping 3 gigabytes of disks on their central computers (that's for the ENTIRE mid-size company). Today, those 3GB fit on a USB pen drive, and you can get portable drives with 1 terabyte. In another 15 years, we may have portable 1-petabyte drives, and computers 1000x more powerful. It's mind-boggling.

    In contrast, high-throughput technologies include flow cytometry and robotics. While they are “fast” today, it is doubtful that they will become 1000x faster in 15 years. There are 96-well, 384-well, and 1536-well plates, but it is doubtful these will continue to geometrically scale to having million-well plates anytime soon.

    Proteomics mass spectrometry, which often relies on ever faster and more sensitive electronics and sensors, is geometrically scalable for some time, and as such holds the possibility for a revolution. With continued scaling, it is possible to imagine when geometrically more proteins can be characterized from a single organelle.

    Such will be the power of the Digital Biology Revolution.

    Posted in Uncategorized | Tagged , | Leave a comment

    Using Sorcerer with APEX for label-free quantitation

    APEX ("Absolute Protein Expression") is a technique developed by Lu et al. for label-free quantitation of proteins based on MS-MS spectral counting of peptides. Unlike basic methods of this sort, which suffer from variable detection probabilities that depend on the physicochemical properties of the peptides, APEX includes correction factors that predict the detection rates of the peptides for a better protein quantitation result.
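
    For intuition, here is a minimal Python sketch of the detectability-corrected spectral counting idea behind APEX (our simplification of the published method, not the APEX tool's actual code; the protein names and numbers are hypothetical):

        def apex_abundances(spectral_counts, detection_probs, total=1.0):
            # Correct each protein's spectral count by its predicted
            # detection probability, then normalize across all proteins;
            # `total` scales to absolute copy numbers if known
            corrected = {p: n / detection_probs[p]
                         for p, n in spectral_counts.items()}
            denom = sum(corrected.values())
            return {p: total * c / denom for p, c in corrected.items()}

        counts = {"ProtA": 120, "ProtB": 30, "ProtC": 30}
        probs = {"ProtA": 0.8, "ProtB": 0.2, "ProtC": 0.6}
        print(apex_abundances(counts, probs))
        # ProtB's low detectability lifts its share to equal ProtA's,
        # despite a 4x lower raw spectral count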

    There is an open-source APEX Quantitative Proteomics Tool that implements this technique and can use Sequest-based protein IDs as analyzed by the Trans-Proteomic Pipeline. Sorcerer users had the idea of combining the tool with Sorcerer, and we have now developed a workflow and MUSE script to help other users adopt this combination.

    For more information, please read the application note ‘Sorcerer Workflow for the APEX Quantitative Proteomics Tool’.

    Posted in Application notes, Muse, News | Leave a comment

    How to make the most of economic stimulus funding

    President Obama unveiled a stimulus package in February that includes about $10B funding for NIH over two years. He specifically called for cancer research, which will get about $1.26B. In March, he will lift federal funding restrictions for stem cells. Other countries may follow suit. Since cancer and stem cell research make up more than 2/3 of advanced proteomics research today, this is very good news for proteomics!

    Stimulus grants are likely one-time grants, so it should be viewed as a start-up grant for building up or completing your advanced research capability. In particular, focus on tools that increase automation and reduce manual intervention, including any tech support and maintenance that can improve your research productivity over the next 3 years.

    For advanced "Proteomics 2.0" analyses capable of large-scale analysis of important PTMs (phosphorylation and ubiquitination), you would need (1) one or two high-throughput, mass-accurate mass spectrometers, (2) a high-throughput software workflow capable of sensitive PTM analysis, (3) a robust compute server and storage system, and (4) several years of software and hardware warranty and maintenance.

    For example, this is a proven basic setup suitable for advanced phospho-proteomics for cancer and stem cell research:

    1) LTQ-Orbitrap mass spec
    2) SORCERER 2 integrated data appliance
    3) SORCERER ISIS-10 storage system with 10 terabytes
    4) Recommended optional software on PC: Proteome Discoverer, Scaffold
    5) Recommended optional software on SORCERER: Mascot, Phenyx

    The SORCERER 2 IDA system includes a high-throughput SEQUEST and two integrated high-throughput workflows (Scaffold Batch and Trans-Proteomic Pipeline), as well as specific tools for phosphorylation and an integrated scripting environment for workflow customization. The SORCERER ISIS storage subsystem works directly with the SORCERER IDA to provide sufficient secure storage for several years of typical throughput. They both share the same warranty for 3 years, simplifying the IT maintenance.

    A more advanced system, particularly for statistically robust biomarker discovery, including isotope labelled quantitation (iTRAQ), is the following:

    1) LTQ-Orbitrap with HPLC
    2) MALDI-TOF/TOF mass spec (e.g. ABI 4800)
    3) SORCERER 2 integrated data appliance
    4) SORCERER ISIS-20 storage system with 20 terabytes
    5) Recommended optional software on PC: Proteome Discoverer, Scaffold
    6) Recommended optional software on SORCERER: Mascot, Phenyx

    In most cases, Sage-N Research can assemble the entire data analysis system with the noted tools pre-installed and pre-configured as a plug-and-play system with unified warranty.

    Not surprisingly, what we recommend coincides with our own product portfolio. Undoubtedly, some would think that we are simply recommending what we sell.

    Actually, quite the opposite is true — we sell what we would recommend. As the only prominent search engine provider that does not promote a proprietary search algorithm of its own (we resell SEQUEST, Phenyx, and X!Tandem inside SORCERER, and will pre-install Mascot on request), we are free to pick and choose best-in-class software and hardware components to deliver the most robust, sensitive workflow systems.

    Advanced proteomics holds great promise, especially for cancer and stem cell research. Unfortunately, six years after Scott Patterson proclaimed in Nature Biotechnology that data analysis is proteomics’ Achilles heel, it is still true. The fact is, there is no way you can realize the full potential of an Orbitrap for advanced proteomics using just a PC.

    Please contact me personally to see if we can help. The best way is to send me an email at: david@SageNResearch.com. Together, we can realize the full potential of proteomics to benefit everyone.

    Posted in Views | Tagged , , , , | Leave a comment

    Introducing ISIS storage system with 4.1 to 100+ Terabytes

    We are pleased to announce the availability of ISIS (the Integrated Storage and Information System), which is configured and integrated to work directly with the SORCERER Enterprise bladecenter system to provide 4.1 to 100+ terabytes of integrated, protected storage for proteomics, genomics, imaging, and other repository needs. A second ISIS system can be configured offsite for additional backup and disaster recovery. To simplify maintenance and warranty for our clients, it is covered under the same warranty plan as the SORCERER system, for 3 or 5 years.

    The base ISIS system provides approximately 4.1 terabytes of secure storage in a "2U" height, rack-mount system consisting of twelve 450 GB SAS disks with 2-disk redundancy in RAID6. (RAID6 dedicates two disks' worth of capacity to parity, so ten of the twelve disks remain usable: 10 x 450 GB = 4.5 TB, or roughly 4.1 binary terabytes.)

    In most countries, the ISIS system consists of the following:

    – ISIS storage integration software interface running on SORCERER platform
    – Fujitsu ETERNUS DX80 with single controller
    – Approximately 4.1TB usable (12 x 1TB SATA disks in RAID6) per 2U rack, with up to 20 racks
    – Min 3 year warranty is included (subject to the TSP coverage of the SORCERER)

    Note that future expansion to 100+ TB will require additional ISIS expansion units or higher density SAS drives.

    New clients can order the SORCERER Enterprise blade system with the ISIS system together as two rack-mount units. Clients with newer SORCERER 2 integrated data appliances with at least 8 CPU cores can simply add the ISIS to their existing system. (Older SORCERER systems will require a hardware upgrade.)

    Please contact sales@SageNResearch.com for more information.

    Posted in News | Tagged , , , | Leave a comment