Surviving the flood of papers

HOW TO TAME THE FLOOD OF LITERATURE

BY ELIZABETH GIBNEY







Bergman says the stream of new content was manageable, but if he left it for more than a day, “it became a burden”.

Casey Bergman’s daily research routine used to include checking all his e-mails and web alerts to pick out fresh papers in his field. But he grew dissatisfied with table-of-contents alerts from journals, RSS (Rich Site Summary) feeds and automated e-mails from the PubMed database. The flow of content was manageable, but if he left it for more than a day, “it became a burden”, he says.


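So last year he built a Twitter account (a bot) called FlyPapers, which finds papers containing the word Drosophila and posts them to its followers. Around 55 similar bots have since appeared in other disciplines.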

So last year Bergman, a computational geneticist studying fruit flies (Drosophila) at the University of Manchester, UK, turned to a fresh approach: an automated Twitter account (or ‘twitterbot’) that he named FlyPapers. The bot trawls PubMed and the arXiv preprint server to find papers containing the word Drosophila, and spits them out into its followers’ feeds. Bergman finds it much easier to catch up with FlyPapers popping up in his Twitter feed — and his idea has spawned around 55 twitterbots in other disciplines.
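The article does not show FlyPapers' code. As a minimal sketch, assuming papers have already been fetched from PubMed and arXiv into simple records, this is the keyword filter and de-duplication step a bot like this needs. The `Paper` record, the `new_posts` function and the 280-character cap are hypothetical; fetching and the actual posting to Twitter are left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Paper:
    """Hypothetical record for a paper already fetched from PubMed or arXiv."""
    title: str
    abstract: str
    source: str  # e.g. "PubMed" or "arXiv"


def matches_keyword(paper: Paper, keyword: str) -> bool:
    """True if the keyword appears as a whole word in the title or abstract (case-insensitive)."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return bool(pattern.search(paper.title) or pattern.search(paper.abstract))


def new_posts(papers: list[Paper], keyword: str, already_posted: set[str]) -> list[str]:
    """Return post texts for matching papers that have not been posted before.

    Titles are recorded in `already_posted` so a later run skips them.
    """
    posts = []
    for paper in papers:
        if paper.title in already_posted:
            continue  # avoid re-posting the same paper
        if matches_keyword(paper, keyword):
            already_posted.add(paper.title)
            text = f"{paper.title} ({paper.source})"
            posts.append(text[:280])  # assumed length limit for one post
    return posts


papers = [
    Paper("Wing development in Drosophila", "Imaginal disc patterning.", "PubMed"),
    Paper("Zebrafish fin regeneration", "Blastema formation.", "arXiv"),
    Paper("Gut microbes and larval growth", "We studied drosophila larvae.", "arXiv"),
]
posted: set[str] = set()
first_run = new_posts(papers, "Drosophila", posted)
second_run = new_posts(papers, "Drosophila", posted)
```

On the first run both fly papers are returned (one matched on its title, one on its abstract); the second run returns nothing because their titles are already recorded.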


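With around 6,000 papers published every day, nobody wants to be buried in recommendations, yet missing a key paper is “mortifying”. Sally Burn uses a service called Scizzle, which regularly sends her the results of her saved PubMed searches.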

It is no surprise that academics are coming up with their own ways to keep on top of the flood of literature. “It’s a common struggle,” says Bergman. A staggering 6,000 papers are published every day — and although no one wants to be overloaded with recommendations, missing key papers is “mortifying”, says Sally Burn, a developmental geneticist at Columbia University in New York City. She uses a service called Scizzle, which regularly sends her the results of saved PubMed searches. “Unless you have all day, and ten people working for you trawling the literature, I think it’s the best situation you’re going to get,” she says.


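But a keyword-based stream of papers is only a fraction of what is technically possible. Emerging literature-recommendation engines not only filter the flood down to the papers a user needs, but also learn from the user's interests to suggest others. “It’s basically what Netflix or Amazon do,” says Matthew Davis.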

But a stream of papers based on keywords only scratches the surface of what is technologically possible. Emerging literature-recommendation engines promise not only to filter the flood of papers to a trickle, but also to learn from their users’ interests to add personalized suggestions (see ‘A guide to reading’). “In spirit, it’s similar to what Netflix or Amazon do,” says Matthew Davis, a computational biologist at the University of Texas at Austin who wrote the algorithm for one such service, PubChase — now owned by ZappyLab, a firm in Berkeley, California, that makes web- and phone-based tools for scientists.



If you like that, you’ll like this

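One of the earliest services, and still the best known, is Google Scholar. It recommends papers by applying a statistical model to a researcher's own papers and citations. But graduate students may not yet have published enough for it to help, notes Patrick Mineault.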

One of the first, and still best-known, services comes from Google Scholar. Its Updates tool suggests articles by applying a statistical model to a record of a researcher’s authored papers and citations. “The recommendations are almost scarily good,” says Roger Schonfeld, programme director at Ithaka S+R, a non-profit consultancy based in New York City that advises academia on digital technology. But graduate students may not have a sufficient body of work for the site to help, notes Patrick Mineault, a computational neuroscientist at the University of California, Los Angeles.


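PubChase recommends papers from PubMed based on a user's publishing record, but it also learns from the articles the user has read and saved. On top of that it uses another machine-learning technique: comparing the user's library with other people's collections.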

PubChase suggests articles from PubMed on the basis of a user’s publishing record, but it also learns from the articles that the user has read and stored in his or her online library. And it adds another machine-learning technique: comparing this library with other people’s collections, with the logic that people with common research interests might benefit from each other’s preferences. “I’ve been really impressed: nearly every article it has recommended has been relevant to my research,” says Kelsey Wood, a geneticist at the University of California, Davis, who uses the service along with reference-manager tool Mendeley, owned by Amsterdam-based publisher Elsevier.
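The article does not describe PubChase's algorithm in detail. The sketch below shows one common way to implement the idea of comparing a user's library with other people's collections: user-based collaborative filtering, with Jaccard similarity between libraries. Libraries are modelled as sets of paper identifiers; the user names, identifiers, `recommend` function and scoring rule are all assumptions for illustration.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two libraries: |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user: str, libraries: dict[str, set[str]], top_n: int = 5) -> list[str]:
    """Suggest papers the user has not saved, weighted by how similar their savers are.

    Each paper's score is the sum of the similarities of every other user who
    saved it, so papers shared by close "research neighbours" rank highest.
    """
    mine = libraries[user]
    scores: dict[str, float] = {}
    for other, theirs in libraries.items():
        if other == user:
            continue
        sim = jaccard(mine, theirs)
        if sim == 0:
            continue  # no shared interests, so no evidence from this user
        for paper in theirs - mine:
            scores[paper] = scores.get(paper, 0.0) + sim
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:top_n]


# Hypothetical users and paper IDs.
libraries = {
    "alice": {"p1", "p2", "p3"},
    "bob": {"p2", "p3", "p4", "p5"},   # similarity to alice: 2/5
    "carol": {"p3", "p6"},             # similarity to alice: 1/4
    "dave": {"p7", "p8"},              # no overlap, ignored
}
suggestions = recommend("alice", libraries)
```

Here `suggestions` is `["p4", "p5", "p6"]`: Bob's papers outrank Carol's because his library overlaps more with Alice's, and Dave's papers never appear. A production system would also weigh recency, the user's own reading history and much larger, sparser data.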


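Ross Mounce says PubChase is of little use to researchers whose interests lie outside PubMed. He prefers Sparrho, which generates recommendations from a keyword-based feed and asks users to train it by flagging suggestions as relevant or irrelevant. It covers articles, grants, patents, posters and conference proceedings. As with PubChase, recommendations are based on connections between similar users.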

Ross Mounce, an evolutionary biologist at the University of Bath, UK, says that PubChase is not useful for those whose interests fall outside the boundaries of PubMed. He prefers Sparrho, a fledgling London-based venture that generates recommendations with a keyword-based feed, and asks users to train the tool by flagging suggestions as relevant or irrelevant. It includes articles, grants, patents, posters and conference proceedings from all the sciences. “The breadth is a real strength,” says Mounce. As with PubChase, recommendations are based on connections between similar users. “We’re allowing intelligent curators, humans, to join the scattered dots,” says chief executive Vivian Chan, who co-founded Sparrho after she struggled to keep up with the literature while studying for a biochemistry PhD at the University of Cambridge, UK.
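The article does not say how Sparrho learns from these flags. As a minimal sketch, assuming a simple word-weight (perceptron-style) update rather than whatever Sparrho actually uses, this shows how marking suggestions as relevant or irrelevant can retrain a ranking. The `FeedbackFilter` class, its whitespace tokenizer and the learning rate are hypothetical, and the sketch ignores the connections between similar users.

```python
def tokens(text: str) -> set[str]:
    """Lower-cased words with simple punctuation stripped (a deliberately crude tokenizer)."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return {w for w in words if w}


class FeedbackFilter:
    """Ranks suggestions by word weights learned from relevant/irrelevant flags."""

    def __init__(self, learning_rate: float = 1.0) -> None:
        self.weights: dict[str, float] = {}
        self.lr = learning_rate

    def score(self, text: str) -> float:
        # Unseen words contribute 0; .get() avoids adding them to the weights.
        return sum(self.weights.get(t, 0.0) for t in tokens(text))

    def flag(self, text: str, relevant: bool) -> None:
        # Push every word in the flagged item up (relevant) or down (irrelevant).
        delta = self.lr if relevant else -self.lr
        for t in tokens(text):
            self.weights[t] = self.weights.get(t, 0.0) + delta

    def rank(self, texts: list[str]) -> list[str]:
        return sorted(texts, key=self.score, reverse=True)


f = FeedbackFilter()
f.flag("Phylogenetic analysis of fossil birds", relevant=True)
f.flag("Marketing survey of bird watchers", relevant=False)
ranking = f.rank([
    "Survey of bird watchers in Bath",
    "Protein folding kinetics",
    "Fossil birds from the Cretaceous",
])
```

After two flags, `ranking` puts "Fossil birds from the Cretaceous" first (score 2), the unrelated protein paper second (score 0) and the survey last (score -3). Words flagged both ways, such as "of", cancel out to zero.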


As start-ups seeking investment, PubChase and Sparrho are guarded about how many users they have. The numbers appear to be small, but both say their user base is growing quickly.

As start-ups seeking investment, PubChase and Sparrho are guarded about how many users they have. It is clear that numbers are small. (A Nature survey of more than 3,000 scientists found that only 8% had heard of PubChase, and fewer than 1% visited it regularly; see Nature 512, 126–129 (2014).) But both say that their user base is growing quickly.



Back to basics

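Bergman is wary of algorithm-based searches: a machine that learns and tailors recommendations can become like “blinders on your intellectual scope”, he says. He also found that the interdisciplinary nature of his work confused Google Scholar. But Davis says that this narrowing is offset by the new doors opened by recommendations based on the profiles of people with similar interests.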

Bergman is wary of algorithm-based searches. A machine that learns and tailors recommendations can become like “blinders on your intellectual scope”, he says. And he has found that the interdisciplinary nature of his work, which melds genomics and text-mining, confused Google Scholar — the tool threw up irrelevant papers and missed important ones. But Davis says that this narrowing is counteracted by the new doors opened by recommendations based on the profiles of people with similar interests.


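Many researchers avoid algorithms altogether and simply follow colleagues on social networks to see what is worth reading. “Twitter is the unsung hero,” says Cassie Ettinger. Others check which papers rise to the top among users of services such as Faculty of 1000 Prime and Mendeley.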

Many researchers eschew algorithms altogether, and simply follow colleagues on social networks to find out what is worth reading. “Twitter is the unsung hero of the paper-recommendation world,” says Cassie Ettinger, a geneticist in the same research group as Wood. Other scientists check which papers rise to the top in online communities or among users of reference-management services such as Faculty of 1000 Prime and Mendeley.


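But not everyone wants to share recommendations or upload their library to find new papers. Derek Lowe remains a fan of RSS feeds from journal websites, and Burn says she does not have time to train a recommendation engine. Mineault acknowledges that automated tools will never find all the papers a scientist wants, but he thinks they will improve and eventually play a significant role in guiding what scientists read.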

But the desire to share recommendations or upload libraries to find new papers is hardly universal. Derek Lowe, a chemist at Vertex Pharmaceuticals in Boston, Massachusetts, who writes the blog In the Pipeline, remains a fan of RSS feeds from journal websites. And Burn says that she does not have the time to train a recommendation engine. Mineault acknowledges that automated learning devices will never find all the papers a scientist wants, but he thinks that they will improve. Techniques for gleaning meaning from content will become more sophisticated, he says, and will eventually have a significant role in guiding scientists’ reading choices.


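For Bergman, much of this is a matter of taste. His twitterbot has convened an online fruit-fly community, and its suggestions have been retweeted by researchers in other disciplines and even by non-scientists. Bergman has not ruled out trying other technologies, but for now he is sticking with FlyPapers. “I haven’t felt the need to try any others. It’s working for me, and that’s all that matters.”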

For Bergman, a lot of this is a matter of taste. His twitterbot has convened an online fruit-fly community; its suggestions have been retweeted by researchers in other disciplines, and even by non-scientists. Bergman has not ruled out trying further technologies, but he is sticking to FlyPapers for now. “I haven’t felt the need to try any others. It’s working for me, and that’s all that matters,” he says.




How to tame the flood of literature. 2014 Sep 4;513(7516):129-30. doi: 10.1038/513129a. PMID: 25186906 [PubMed - indexed for MEDLINE]




