Scholars are increasingly incorporating cloud computing, machine learning and other modern tools into their research.
For people who want to take the plunge into these new technologies, Princeton University’s Schmidt DataX Initiative serves as a diving board through its programming, which includes hands-on workshops. Past sessions have covered topics such as cloud computing and Python packaging, all organized with the goal of spreading and deepening the use of data science and machine learning across campus.
“These tools have the potential to advance science, speed up the development of important technologies, and increase our understanding of the world,” said Peter Ramadge, director of the Center for Statistics and Machine Learning (CSML), which oversees a range of DataX efforts.
On March 4th, DataX sponsored part one of a workshop on cloud computing. Twenty people attended, in person and via Zoom. The session focused on setting up an integrated development environment for local and cloud computing, and a video of it will be available soon. Part two of the workshop will be held on April 1st.
Additional upcoming DataX events include a “Tutorial Workshop on Machine Learning for Experimental Science” in April and “Synthetic Control Methods” in early June.
Jose Garrido Torres, a DataX data scientist in the computer science department, led the March 4th workshop with the assistance of Vineet Bansal, senior research software engineer jointly appointed to CSML and the Princeton Institute for Computational Science and Engineering.
The workshop introduced attendees to PyCharm, an integrated development environment (IDE) that lets programmers write, run and debug code in a single graphical interface. IDEs are helpful because they streamline various aspects of coding, Torres said.
In the hands-on workshop, Torres showed attendees how to build, run and debug simple examples using PyCharm. Though the examples used Python, Torres said the skills learned during the session apply to other languages.
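The workshop's actual examples are not described here, so the following is only a hypothetical sketch of the kind of short Python script that is convenient for practicing the build, run and debug cycle in an IDE. The function name `running_mean` and the sample data are illustrative assumptions, not workshop material.

```python
"""Hypothetical practice script; not taken from the workshop materials."""


def running_mean(values):
    """Return the mean of the first 1, 2, ..., n items of `values`."""
    means = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value  # a breakpoint on this line lets you watch `total` grow
        means.append(total / count)
    return means


if __name__ == "__main__":
    readings = [2.0, 4.0, 6.0, 8.0]  # made-up sample data
    print(running_mean(readings))
```

Running the file prints the list of running means. Pausing at a breakpoint inside the loop and stepping through it shows how `total` and `means` change on each pass, which is the kind of inspection a debugger makes easy.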
Torres said his lesson plan is designed to prepare attendees to leap from coding on their laptops to coding on virtual machines in the cloud. Part two, on April 1st, will show how to build virtual machines in Microsoft Azure and access them using PyCharm.
Building and using virtual machines in the cloud is valuable in data science and machine learning because it lets people run computationally expensive processes, said Torres. Modern machine learning algorithms can overwhelm a laptop, but cloud computing lets these methods run on powerful off-site servers. Users can create virtual machines in the cloud to execute their code, so their laptops no longer limit their computational resources. They can also scale cloud resources on demand as their computational needs change.
“That’s what’s great about cloud computing. From a laptop, you can employ the best resources on the cloud,” said Torres.
Jackson Deobald, a graduate student in chemistry, attended the workshop and found it helpful.
“There are a lot of cool predictive capabilities in machine learning,” said Deobald, who focuses on organic chemistry and wants to incorporate more machine learning and cloud computing into his work.
Tyler Cochran, a graduate student in physics, also found the workshop instructive.
“I thought it might be interesting to attend. It’s not at all related to what I do,” said Cochran, whose research focuses on experimental condensed matter physics. “But it’s good to learn something new.”
You can find the DataX workshop schedule and previously recorded DataX workshops at this link.
To register for part two of this workshop, scheduled for April 1st, apply here.