Whether they are investigating a disease to develop new drugs or researching new materials for future batteries, scientists must navigate a sea of data. Today’s entire ecosystem of scientific tools generates a huge variety of data that needs to be explored. That exploration will now become much easier thanks to scientists at the National Synchrotron Light Source II (NSLS-II), located at the US Department of Energy’s (DOE) Brookhaven National Laboratory. With their newly released software tool, Tiled, researchers can view, slice, and study their data more conveniently than ever. Compared to previous methods, this new data access tool helps scientists find and analyze the right pieces of data, laying the groundwork for the next scientific breakthrough.
As one of 28 DOE Office of Science user facilities across the United States, NSLS-II welcomes nearly 2,000 scientists each year, who use its ultrabright light to address some of the biggest challenges in materials and life sciences. These visiting researchers come from all over the world to collaborate with experts and use unique research tools at NSLS-II. They probe samples, from ancient rocks to new quantum materials, with powerful X-rays and use advanced detectors to capture the outgoing signals. In turn, these detectors produce streams of data waiting to be analyzed by scientists.
“Working with data is a central part of any research, but it is a challenge in itself. It comes in a variety of formats, in different sizes and shapes, and not all pieces of data are useful to researchers. That is why we developed Tiled. It’s a software tool that makes accessing, viewing, and sorting through data much easier,” said computational scientist Dan Allan.
Tiled is a data access service for data-aware portals and data science tools. This means that Tiled sits on top of databases and file systems, allowing scientists to access data through, for example, a web browser or their data analysis software. While the Data Science and Systems Integration (DSSI) program will roll out Tiled to all experimental stations at NSLS-II, the service, like its cousin project Bluesky (data acquisition software also developed at NSLS-II), is available to labs around the world. This is possible because Tiled is published under a popular open source software license.
“We developed Tiled in the programming language Python, so it integrates naturally with Python-based data science libraries, but the service isn’t limited to Python,” said Stuart Campbell, NSLS-II’s lead data scientist. “The client uses an API, or application programming interface, to connect a user application with the server. An API is basically a set of rules or contracts that define how different pieces of software communicate with each other. The good thing about this approach is that once these rules and interfaces are defined, it gives users and developers a structure to build great tools and extend their capabilities beyond what was originally envisioned.”
Tiled’s flexibility lets the service integrate with any database or collection of files, so it can be used for a wide range of experiments covering many techniques and kinds of data.
Squaring data requirements
“During my PhD, I helped my advisor download data from facilities like NSLS-II. This was tedious because we had to download all the data at once before we could sort out the useful parts. Also, the data could be in a format specific to one type of detector. This meant that after a long download, we had to convert the data before we could even look at it,” Allan said.
“If Dan had had Tiled then, he would have been able to easily browse the data in a web browser or data analysis application, pick out the good parts, and share only the parts of interest with his advisor via a single link,” Campbell adds.
With Tiled, scientists can preview data and access only what they want without large downloads. They can also choose the format of the downloaded data or feed it directly into their analysis software. At the same time, Tiled provides access control based on web security standards to keep the data safe. Because setting up a new account can be a barrier, Tiled can be configured to allow logins through third-party services such as Google and ORCID.
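The idea of fetching only a slice of a dataset, in a format chosen by the caller, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Tiled's API: the `DATASET` contents and the `serve_slice` function are hypothetical names invented for illustration, and a real service would do this on the server side over HTTP.

```python
import csv
import io
import json

# Hypothetical stand-in for a large dataset held on a server:
# 1000 rows of (frame index, made-up intensity). Not real NSLS-II data.
DATASET = [[i, (i * i) % 97] for i in range(1000)]


def serve_slice(rows, start, stop, fmt):
    """Return only rows[start:stop], serialized in the requested format."""
    selected = rows[start:stop]
    if fmt == "json":
        return json.dumps(selected)
    if fmt == "csv":
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerows(selected)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


# Preview the first three rows in two formats without touching the rest.
preview_json = serve_slice(DATASET, 0, 3, "json")
preview_csv = serve_slice(DATASET, 0, 3, "csv")
print(preview_json)
print(preview_csv)
```

The point of the sketch is that the caller names a range and a format, and only that small piece is serialized and sent, instead of the whole dataset in whatever format it happens to be stored in.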
“Remote capabilities are more important than ever,” said Dylan McReynolds, a computer systems engineer at the Advanced Light Source, a DOE Office of Science user facility located at Lawrence Berkeley National Laboratory, which collaborated on Tiled. “Tiled will enhance our scientific capabilities by making it easy to move data where it is needed.”
The new software also enables a kind of “airplane mode” where data is stored on the user’s laptop, allowing researchers to continue working offline or over a slow internet connection.
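One way to picture such an offline mode is a client that keeps a local copy of everything it fetches and falls back to that copy when no connection is available. The sketch below is an illustration only and does not show how Tiled implements its cache: `CachingClient`, the `online` flag, and the `REMOTE` dictionary standing in for a server are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class CachingClient:
    """Toy client: fetch through a callable and keep a copy on local disk."""

    def __init__(self, fetch, cache_dir):
        self._fetch = fetch  # callable(key) -> JSON-serializable data
        self._cache_dir = Path(cache_dir)
        self.online = True  # flipped by hand here to simulate losing the network

    def get(self, key):
        cached = self._cache_dir / f"{key}.json"
        if self.online:
            data = self._fetch(key)
            cached.write_text(json.dumps(data))  # refresh the local copy
            return data
        if cached.exists():
            return json.loads(cached.read_text())
        raise LookupError(f"{key!r} is not cached and the client is offline")


# Made-up "remote" data standing in for a server.
REMOTE = {"scan_42": [1.0, 2.5, 4.0]}

with tempfile.TemporaryDirectory() as tmp:
    client = CachingClient(REMOTE.__getitem__, tmp)
    online_result = client.get("scan_42")   # fetched and cached
    client.online = False                   # simulate airplane mode
    offline_result = client.get("scan_42")  # served from the laptop's disk

print(online_result == offline_result)
```

Data that was never fetched while online raises `LookupError` in this sketch, which mirrors the obvious limit of any offline mode: only what has already been stored locally is available.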
“Tiled’s goal is to simplify data access for everyone, so that scientists don’t have to worry about converting data types to other formats or picking information out of file names,” said Thomas Caswell, computational scientist at NSLS-II.
Simplifying and standardizing data access is critical to optimizing existing workflows and enabling future workflows centered around machine learning, AI, and other advanced analytics. To reach their full potential, these new technologies rely heavily on frictionless access to data, regardless of how it is collected or stored.
Tiled: Fitting into any research puzzle
The first users of Tiled have already built interesting and sophisticated tools to aid their research.
“Tiled provides a whole new way to access data that will simplify and streamline the processing and analysis pipeline for experiments. No more wasting time on cumbersome downloads or importing data in 12 different formats to analyze an experiment!” said Denis Leschev, an assistant physicist at NSLS-II who tested Tiled. “Tiled also enables a more direct way to share data, paving the way for a more open and transparent science in the future.”
The new software is not exclusive to NSLS-II users. The team designed the software to be applicable to any data source. It can be deployed at scale at facilities such as NSLS-II, but it can also run on a student’s laptop or on a research group’s workstation. Other laboratories and institutions can already tailor this software to their own needs.
An early user of Tiled, Peter Beaucage, a staff scientist at the National Institute of Standards and Technology (NIST), integrated Tiled with his scientific data analysis program, PyHyperScattering. He lets Tiled handle the data transfer and security details and, building on that, gives users the specific interface they need to work with.
“The amount of synchrotron data required for common analyses has expanded dramatically over the past decade and is rapidly outgrowing the capabilities of traditional data transfer platforms. Tiled and similar solutions promise to help users seamlessly access the right data at the right time and to speed up discovery based on X-ray science,” said Beaucage.
In addition to Beaucage, other users of Tiled have built data analysis pipelines that move data from real-time experiments at NSLS-II to remote clusters, as well as custom software for visualizing and exploring data. Each step was supported by Tiled.
“Overall, we’re very proud to launch Tiled,” Campbell said. “This is the culmination of our work over the past six years. It combines all the features we want in a modern data access tool and is used together with Bluesky.”
The way forward
Tiled enables an entire garden of useful tools for a variety of techniques. The team is now turning to building web applications focused on specific research techniques. The team also wants to design an open data interface so that anyone can use Tiled to explore publicly available real-world data.
“Grants often require public data access, but it is difficult for researchers to achieve this in a practical and immediately useful way. Tiled opens the door for researchers to make their data findable, accessible, interoperable, and reusable, following the FAIR Guiding Principles for scientific data management and stewardship, with tools they already use,” Allan added.
By decoupling how data is stored from how it is accessed, Tiled lets facilities use cutting-edge storage and retrieval technology internally, while exposing a time-tested and established standard to researchers. It meets them where they are and takes responsibility for formatting the data the way they are comfortable working with it.
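A minimal Python sketch can illustrate what decoupling storage from access means in practice. It is an illustration under stated assumptions rather than Tiled's design: the `Storage` protocol, the two backends, and the `read` function are hypothetical names, and the data is made up.

```python
import csv
import io
from typing import Dict, List, Optional, Protocol


class Storage(Protocol):
    """What every backend must offer: load a named dataset as floats."""

    def load(self, key: str) -> List[float]: ...


class CsvStorage:
    """Backend that keeps each dataset as one CSV row: key,v1,v2,..."""

    def __init__(self, text: str) -> None:
        self._rows: Dict[str, List[float]] = {
            row[0]: [float(v) for v in row[1:]]
            for row in csv.reader(io.StringIO(text))
            if row
        }

    def load(self, key: str) -> List[float]:
        return list(self._rows[key])


class DictStorage:
    """Backend that keeps datasets in memory, standing in for a database."""

    def __init__(self, data: Dict[str, List[float]]) -> None:
        self._data = data

    def load(self, key: str) -> List[float]:
        return list(self._data[key])


def read(storage: Storage, key: str, start: int = 0,
         stop: Optional[int] = None) -> List[float]:
    """Single access path; callers never see how the data is stored."""
    return storage.load(key)[start:stop]


csv_backend = CsvStorage("scan_a,1.0,2.0,3.0\n")
dict_backend = DictStorage({"scan_a": [1.0, 2.0, 3.0]})
from_csv = read(csv_backend, "scan_a", 1, 3)
from_dict = read(dict_backend, "scan_a", 1, 3)
print(from_csv == from_dict)
```

Because `read` depends only on the small `Storage` interface, a facility could swap one backend for another without changing any code written by the researchers who call `read`.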
“Following other NSLS-II software efforts, Tiled aims to grow a friendly community of contributors and users. We are actively looking to collaborate with facilities and researchers around the world who are facing similar challenges, whether in industry, academia, or government, and we’re excited to see what we can build together on this platform,” Allan said.
Daniel Allan et al., Bluesky’s Ahead: A Multi-Facility Collaboration for an a la Carte Software Project for Data Acquisition and Management, Synchrotron Radiation News (2019). DOI: 10.1080/08940886.2019.1608121
Tiled documentation: blueskyproject.io/tiled
Tiled demo (for programmers): tiled-demo.blueskyproject.io/
Bluesky open source project home page: blueskyproject.io/
Citation: Revolutionizing data access through new software tool: Tiled (2021, November 24), retrieved 24 November 2021 from https://techxplore.com/news/2021-11-revolutionizing-access-software-tool-tiled.html