Tag Archives: arXiv

An Amplitudes Flurry

Now that we’re finally done with flurries of snow here in Canada, in the last week arXiv has been hit with a flurry of amplitudes papers.


We’re also seeing a flurry of construction, but that’s less welcome.

Andrea Guerrieri, Yu-tin Huang, Zhizhong Li, and Congkao Wen have a paper on what are known as soft theorems. Most famously studied by Weinberg, soft theorems are proofs about what happens when a particle in an amplitude becomes “soft”, or when its momentum becomes very small. Recently, these theorems have gained renewed interest, as new amplitudes techniques have allowed researchers to go beyond Weinberg’s initial results (to “sub-leading” order) in a variety of theories.

Guerrieri, Huang, Li, and Wen’s contribution to the topic looks like it clarifies things quite a bit. Previously, most of the papers I’d seen about this had been isolated examples. This paper ties the various cases together in a very clean way, and does important work in making some older observations more rigorous.


Vittorio Del Duca, Claude Duhr, Robin Marzucca, and Bram Verbeek wrote about transcendental weight in something known as the multi-Regge limit. I’ve talked about transcendental weight before: loosely, it’s counting the power of pi that shows up in formulas. The multi-Regge limit concerns amplitudes with very high energies, in which we have a much better understanding of how the amplitudes should behave. I’ve used this limit before, to calculate amplitudes in N=4 super Yang-Mills.

One slogan I love to repeat is that N=4 super Yang-Mills isn’t just a toy model, it’s the most transcendental part of QCD. I’m usually fairly vague about this, because it’s not always true: while often a calculation in N=4 super Yang-Mills will give the part of the same calculation in QCD with the highest power of pi, this isn’t always the case, and it’s hard to propose a systematic principle for when it should happen. Del Duca, Duhr, Marzucca, and Verbeek’s work is a big step in that direction. While some descriptions of the multi-Regge limit obey this property, others don’t, and in looking at the ones that don’t the authors gain a better understanding of what sorts of theories only have a “maximally transcendental part”. What they find is that even when such theories aren’t restricted to N=4 super Yang-Mills, they have shared properties, like supersymmetry and conformal symmetry. Somehow these properties are tied to the transcendentality of functions in the amplitude, in a way that’s still not fully understood.


My colleagues at Perimeter released two papers over the last week: one, by Freddy Cachazo and Alfredo Guevara, uses amplitudes techniques to look at classical gravity, while the other, by Sebastian Mizera and Guojun Zhang, looks at one of the “pieces” inside string theory amplitudes.

I worked with Freddy and Alfredo on an early version of their result, back at the PSI Winter School. While I was off lazing about in Santa Barbara, they were hard at work trying to understand how the quantum-looking “loops” one can use to make predictions for potential energy in classical gravity are secretly classical. What they ended up finding was a trick to figure out whether a given amplitude was going to have a classical part or be purely quantum. So far, the trick works for amplitudes with one loop, and a few special cases at higher loops. It’s still not clear if it works for the general case, and there’s a lot of work still to do to understand what it means, but it definitely seems like an idea with potential. (Pun mostly not intended.)

I’ve talked before about “Z theory”, the weird thing you get when you isolate the “stringy” part of string theory amplitudes. What Sebastian and Guojun have carved out isn’t quite the same piece, but it’s related. I’m still not sure of the significance of cutting string amplitudes up in this way, I’ll have to read the paper more thoroughly (or chat with the authors) to find out.

A Tale of Two Archives

When it comes to articles about theoretical physics, I have a pet peeve, one made all the more annoying by the fact that it appears even in pieces that are otherwise well written. It involves the following disclaimer:

“This article has not been peer-reviewed.”

Here’s the thing: if you’re dealing with experiments, peer review is very important. Plenty of experiments have subtle problems with their methods, enough that it’s important to have a group of experts who can check them. In experimental fields, you really shouldn’t trust things that haven’t been through a journal yet: there’s just a lot that can go wrong.

In theoretical physics, though, peer review is important for different reasons. Most papers are mathematically rigorous enough that they’re not going to be wrong per se, and most of the ways they could be wrong won’t be caught by peer review. While peer review sometimes does catch mistakes, much more often it’s about assessing the significance of a result. Peer review determines whether a result gets into a prestigious journal or a less prestigious one, which in turn matters for job and grant applications.

As such, it doesn’t really make sense for a journalist to point out that a theoretical physics paper hasn’t been peer reviewed yet. If you think it’s important enough to write an article about, then you’ve already decided it’s significant: peer review wasn’t going to tell you anything else.

We physicists post our papers to arXiv, a free-to-access paper repository, before submitting them to journals. While arXiv does have some moderation, it’s not much: pretty much anyone in the field can post whatever they want.

This leaves a lot of people confused. In that sort of system, how do we know which papers to trust?

Let’s compare to another archive: Archive of Our Own, or AO3 for short.

Unlike arXiv, AO3 hosts not physics, but fanfiction. However, like arXiv it’s quite lightly moderated and free to access. On arXiv you want papers you can trust, on AO3 you want stories you enjoy. In each case, if anyone can post, how do you find them?

The first step is filtering. AO3 and arXiv both have systems of tags and subject headings. The headings on arXiv are simpler and more heavily moderated than those on AO3, but they both serve the purpose of letting people filter out the subjects, whether scientific or fictional, that they find interesting. If you’re interested in astrophysics, try astro-ph on arXiv. If you want Harry Potter fanfiction, try the “Harry Potter – J.K. Rowling” tag on AO3.

Beyond that, it helps to pay attention to authors. When an author has written something you like, it’s worth it not only to keep up with other things they write, but to see which other authors they like and pay attention to them as well. That’s true whether the author is Juan Maldacena or your favorite source of Twilight fanfic.

Even if you follow all of this, you can’t trust every paper you find on arXiv. You also won’t enjoy everything you dig up on AO3. Either way, publication (in journals or books) won’t solve your problem: both are an additional filter, but not an infallible one. Judgement is still necessary.

This is all to say that “this article has not been peer-reviewed” can be a useful warning, but often isn’t. In theoretical physics, knowing who wrote an article and what it’s about will often tell you much more than whether or not it’s been peer-reviewed yet.

arXiv vs. snarXiv: Can You Tell the Difference?

Have you ever played arXiv vs snarXiv?

arXiv is a preprint repository: it’s where we physicists put our papers before they’re published to journals.

snarXiv is…well..sound it out.

A creation of David Simmons-Duffin, snarXiv randomly generates titles and abstracts out of trendy arXiv buzzwords. It’s designed so that the papers on it look almost plausible…until you take a closer look, anyway.

Hence the game, arXiv vs snarXiv. Given just the titles of two papers, can you figure out which one is real, and which is fake?

I played arXiv vs snarXiv for a bit today, waiting for some code to run. Out of twenty questions, I only got two wrong.

Sometimes, it was fairly clear which paper was fake because snarXiv overreached. By trying to pile on too many buzzwords, it ended up with a title that repeated itself, or didn’t quite work grammatically.

Other times, I had to use some actual physics knowledge. Usually, this meant noticing when a title tied together unrelated areas in an implausible way. When a title claims to tie obscure mathematical concepts from string theory to a concrete problem in astronomy, it’s pretty clearly snarXiv talking.

The toughest questions, including the ones I got wrong, were when snarXiv went for something subtle. For short enough titles, the telltale signs of snarXiv were suppressed. There just weren’t enough buzzwords for a mistake to show up. I’m not sure there’s a way to distinguish titles like that, even for people in the relevant sub-field.

How well do you do at arXiv vs snarXiv? Any tips?

arXiv, Our Printing Press


Johannes Gutenberg, inventor of the printing press, and possibly the only photogenic thing on the Mainz campus

I’ve had a few occasions to dig into older papers recently, and I’ve noticed a trend: old papers are hard to read!

Ok, that might not be surprising. The older a paper is, the greater the chance it will use obsolete notation, or assume a context that has long passed by. Older papers have different assumptions about what matters, or what rigor requires, and their readers cared about different things. All this is to be expected: a slow, gradual approach to a modern style and understanding.

I’ve been noticing, though, that this slow, gradual approach doesn’t always hold. Specifically, it seems to speed up quite dramatically at one point: the introduction of arXiv, the website where we store all our papers.

Part of this could just be a coincidence. As it happens, the founding papers in my subfield, those that started Amplitudes with a capital “A”, were right around the time that arXiv first got going. It could be that all I’m noticing is the difference between Amplitudes and “pre-Amplitudes”, with the Amplitudes subfield sharing notation more than they did before they had a shared identity.

But I suspect that something else is going on. With arXiv, we don’t just share papers (that was done, piecemeal, before arXiv). We also share LaTeX.

LaTeX is a document formatting language, like a programming language for papers. It’s used pretty much universally in physics and math, and increasingly in other fields. As it turns out, when we post a paper to arXiv, we don’t just send a pdf: we include the raw LaTeX code as well.

Before arXiv, if you wanted to include an equation from another paper, you’d format it yourself. You’d probably do it a little differently from the other paper, in accord with your own conventions, and just to make it easier on yourself. Over time, more and more differences would crop up, making older papers harder and harder to read.

With arXiv, you can still do all that. But you can also just copy.

Since arXiv makes the LaTeX code behind a paper public, it’s easy to lift the occasional equation. Even if you’re not lifting it directly, you can see how they coded it. Even if you don’t plan on copying, the default gets flipped around: instead of having to try to make your equation like the one in the previous paper and accidentally getting it wrong, every difference is intentional.

This reminds me, in a small-scale way, of the effect of the printing press on anatomy books.

Before the printing press, books on anatomy tended to be full of descriptions, but not illustrations. Illustrations weren’t reliable: there was no guarantee the monk who copied them would do so correctly, so nobody bothered. This made it hard to tell when an anatomist (fine it was always Galen) was wrong: he could just be using an odd description. It was only after the printing press that books could actually have illustrations that were reliable across copies of a book. Suddenly, it was possible to point out that a fellow anatomist had left something out: it would be missing from the illustration!

In a similar way, arXiv seems to have led to increasingly standard notation. We still aren’t totally consistent…but we do seem a lot more consistent than older papers, and I think arXiv is the reason why.

Who Plagiarizes an Acknowledgements Section?

I’ve got plagiarists on the brain.

Maybe it was running into this interesting discussion about a plagiarized application for the National Science Foundation’s prestigious Graduate Research Fellowship Program. Maybe it’s due to the talk Paul Ginsparg, founder of arXiv, gave this week about, among other things, detecting plagiarism.

Using arXiv’s repository of every paper someone in physics thought was worth posting, Ginsparg has been using statistical techniques to sift out cases of plagiarism. Probably the funniest cases involved people copying a chunk of their thesis acknowledgements section, as excerpted here. Compare:

“I cannot describe how indebted I am to my wonderful girlfriend, Amanda, whose love and encouragement will always motivate me to achieve all that I can. I could not have written this thesis without her support; in particular, my peculiar working hours and erratic behaviour towards the end could not have been easy to deal with!”

“I cannot describe how indebted I am to my wonderful wife, Renata, whose love and encouragement will always motivate me to achieve all that I can. I could not have written this thesis without her support; in particular, my peculiar working hours and erratic behaviour towards the end could not have been easy to deal with!”

Why would someone do this? Copying the scientific part of a thesis makes sense, in a twisted way: science is hard! But why would someone copy the fluff at the end, the easy part that’s supposed to be a genuine take on your emotions?

The thing is, the acknowledgements section of a thesis isn’t exactly genuine. It’s very formal: a required section of the thesis, with tacit expectations about what’s appropriate to include and what isn’t. It’s also the sort of thing you only write once in your life: while published papers also have acknowledgements sections, they’re typically much shorter, and have different conventions.

If you ever were forced to write thank-you notes as a kid, you know where I’m going with this.

It’s not that you don’t feel grateful, you do! But when you feel grateful, you express it by saying “thank you” and moving on. Writing a note about it isn’t very intuitive, it’s not a way you’re used to expressing gratitude, so the whole experience feels like you’re just following a template.

Literally in some cases.

That sort of situation: where it doesn’t matter how strongly you feel something, only whether you express it in the right way, is a breeding ground for plagiarism. Aunt Mildred isn’t going to care what you write in your thank-you note, and Amanda/Renata isn’t going to be moved by your acknowledgements section. It’s so easy to decide, in that kind of situation, that it’s better to just grab whatever appropriate text you can than to teach yourself a new style of writing.

In general, plagiarism happens because there’s a disconnect between incentives and what they’re meant to be for. In a world where very few beginning graduate students actually have a solid research plan, the NSF’s fellowship application feels like a demand for creative lying, not an honest way to judge scientific potential. In countries eager for highly-cited faculty but low on preexisting experts able to judge scientific merit, tenure becomes easier to get by faking a series of papers than by doing the actual work.

If we want to get rid of plagiarism, we need to make sure our incentives match our intent. We need a system in which people succeed when they do real work, get fellowships when they honestly have talent, and where we care about whether someone was grateful, not how they express it. If we can’t do that, then there will always be people trying to sneak through the cracks.

Amplitudes on Paperscape

Paperscape is a very cool tool developed by Damien George and Rob Knegjens. It analyzes papers from arXiv, the paper repository where almost all physics and math papers live these days. By putting papers that cite each other closer together and pushing papers that don’t cite each other further apart, Paperscape creates a map of all the papers on arXiv, arranged into “continents” based on the links between them. Papers with more citations are shown larger, newer papers are shown brighter, and subject categories are indicated by color-coding.

Here’s a zoomed-out view:


Already you can see several distinct continents, corresponding to different arXiv categories like high energy theory and astrophysics.

If you want to find amplitudes on this map, just zoom in between the purple continent (high energy theory, much of which is string theory) and the green one (high energy lattice, nuclear experiment, high energy experiment, and high energy phenomenology, broadly speaking these are all particle physics).


When you zoom in, Paperscape shows words that commonly appear in a given region of papers. Zoomed in this far, you can see amplitudes!

Amplitudeologists like me live on an island between particle physics and string theory. We’re connected on both sides by bridges of citations and shared terms, linking us to people who study quarks and gluons on one side to people who study strings and geometry on the other. Think of us like Manhattan, an island between two shores, densely networked in to the surroundings.


Zoom in further, and you can see common keywords for individual papers. Exploring around here shows not only what is getting talked about, but what sort of subjects as well. You can see by the color-coding that many papers in amplitudes are published as hep-th, or high energy theory, but there’s a fair number of papers from hep-ph (phenomenology) and from nuclear physics as well.

There’s a lot of interesting things you can do with Paperscape. You can search for individuals, or look at individual papers, seeing who they cite and who cite them. Try it out!

What’s up with arXiv?

First of all, I wanted to take a moment to say that this is the one-year anniversary of this blog. I’ve been posting every week, (almost always) on Friday, since I first was motivated to start blogging back in November 2012. It’s been a fun ride, through ups and downs, Ars Technica and Amplituhedra, and I hope it’s been fun for you, the reader, as well!

I’ve been giving links to arXiv since my very first post, but I haven’t gone into detail about what arXiv is. Since arXiv is a rather unique phenomenon, it could use a more full description.


The word arXiv is pronounced much like the normal word archive, just think of the capital X like a Greek letter Chi.

Much as the name would suggest, arXiv is an archive, specifically a preprint archive. A pre-print is in a sense a paper before it becomes a paper; more accurately, it is a scientific paper that has not yet been published in a journal. In the past, such preprints would be kept by individual universities, or passed between interested individuals. Now arXiv, for an increasing range of fields (first physics and mathematics, now also computer science, quantitative biology, quantitative finance, and statistics) puts all of the preprints in one easily accessible, free to access place.

Different fields have different conventions when it comes to using arXiv. As a theoretical physicist, I can only really speak to how we use the system.

When theoretical physicists write a paper, it is often not immediately clear which journal we should submit it to. Different journals have different standards, and a paper that will gather more interest can be published in a more prestigious journal. In order to gauge how much interest a paper will raise, most theoretical physicists will put their papers up on arXiv as preprints first, letting them sit there for a few months to drum up attention and get feedback before formally submitting the paper to a journal.

The arXiv isn’t just for preprints, though. Once a paper is published in a journal, a copy of the paper remains on arXiv. Often, the copy on arXiv will be updated when the paper is updated, changed to the journal’s preferred format and labeled with the correct journal reference. So arXiv, ultimately, contains almost all of the papers published in theoretical physics in the last decade or two, all free to read.

But it’s not just papers! The digital format of arXiv makes it much easier to post other files alongside a paper, so that many people upload not just their results, but the computer code they used to generate them, or their raw data in long files. You can also post papers too long or unwieldy to publish in a journal, making arXiv an excellent dropping-off point for information in whatever format you think is best.

We stand at the edge of a new age of freely accessible science. As more and more disciplines start to use arXiv and similar services, we’ll have more flexibility to get more information to more people, while still keeping the advantage of peer review for publication in actual journals. It’s going to be very interesting to see where things go from here.