The database giant is leveraging assets from its acquisition of DataScience.com to deliver a new enterprise-grade, cloud-based data science platform focused on collaboration.

Credit: Stephen Lawson

Oracle on Wednesday staked its claim in the data science platform space with the availability of the Oracle Cloud Data Science Platform. The platform, built on the foundation of DataScience.com, which Oracle acquired in 2018, is geared toward teams of data scientists working collaboratively. Its capabilities include shared projects, model catalogs, team security policies, reproducibility, and auditability.

The platform has the Oracle Cloud Infrastructure Data Science service at its core. It gives users the ability to build, train, and manage machine learning algorithms on the Oracle Cloud using Python, TensorFlow, Keras, Jupyter, and other popular data science tools.

Six additional services round out the platform: new machine learning capabilities integrated in Oracle Autonomous Database, the Oracle Cloud Infrastructure Data Catalog, Oracle Big Data Service, Oracle Cloud SQL, Oracle Cloud Infrastructure Data Flow, and Oracle Cloud Infrastructure Virtual Machines for Data Science.

“The service is really the first of its kind in terms of a native cloud service in that it’s really targeted for the enterprise,” says Greg Pavlik, senior vice president of product development for Oracle Data and AI Services. “It is focused on providing an environment for collaboration and governance for data scientists.”

According to Pavlik, the offering targets the full lifecycle of machine learning within the enterprise: not just developing and training models, but also taking those models into production and maintaining them.

“As data changes, models become potentially less valid and users need to be able to continue to leverage them inside of applications or inside the analytic reports on the one hand.
On the other hand, they have to have a high confidence that what they’re using is actually giving them good answers or correct responses,” Pavlik says.

Simplifying data science

With Oracle Cloud Infrastructure Data Science, Oracle is taking on competing platforms such as Alteryx, KNIME Analytics Platform, and RapidMiner with a focus on automating the data science workflow.

The platform uses AutoML for algorithm selection and tuning: machine learning models select the best-fit algorithm for a specific use case and help users choose algorithm inputs and tune the model, Pavlik says. The platform also simplifies feature engineering by automatically identifying key predictive features from larger data sets.

Oracle Cloud Infrastructure Data Science also aids in model evaluation by generating a suite of metrics and visualizations that help users measure model performance against new data and rank models over time.

To support regulatory compliance efforts and help data teams establish trust in the output of their algorithms, Oracle’s offering provides automated explanation of the weighting and importance of the factors used to generate a prediction.

“We have advanced capabilities that we’ve developed in our Oracle Labs organization for model explainability,” Pavlik says. “That’s really understanding what is driving the model to its prediction, which is particularly important for regulatory situations where you have to be able to give an accounting of why: Why is the business making this decision? Why is the model telling us to do this?”

Shared projects

To support collaboration, Oracle has drawn inspiration from modern software development processes, adding capabilities that support shared projects, model catalogs, team-based security policies, and reproducibility and accountability.
“The big problem that we often see with teams is the data scientists are off downloading a bunch of stuff on their laptop and then they’re working in relative isolation,” Pavlik says. “You lose some of the sense of accountability, safety, some of the best practices you’d have from software development. So, we’re looking to help organizations solve that problem without taking anything away from the data scientist.”

The platform enables teams to use version control and to share data and notebook sessions. Using model catalogs, teams can also share models and the artifacts necessary to modify and deploy them. Team-based security policies provide access controls for models, code, and data, all integrated with Oracle Cloud Infrastructure Identity and Access Management. Enterprises can also track assets via the platform, ensuring that models can be reproduced and audited even if team members leave.

Additional data and machine learning services

Oracle Cloud Infrastructure Data Science sits at the core of the new Oracle Cloud Data Science Platform, but Oracle also unveiled six other data and machine learning services to support the platform and integrate it with the company’s overall cloud offering.

“If you’re working in your notebook, you’re doing Python training, it allows you to transparently go out, use compute resources, do scale-out training jobs, without having to drop into an IT administrative type mode. You can, within the tool itself, leverage the elastic capabilities of the cloud as part of your model training and model experimentation process,” Pavlik says.

The additional six services include:
New machine learning capabilities in Oracle Autonomous Database. Oracle has added support for Python and automated machine learning to Oracle Autonomous Database. Forthcoming integration with Oracle Cloud Infrastructure Data Science will give data scientists the ability to develop models using open source and scalable in-database algorithms.
Oracle Cloud Infrastructure Data Catalog. The data catalog provides the ability to discover, organize, enrich, and trace data assets. It features a built-in business glossary.
Oracle Big Data Service. This service offers a full Cloudera Hadoop implementation, as well as machine learning for Spark.
Oracle Cloud SQL. This service gives users the ability to run SQL queries on data in HDFS, Hive, Kafka, NoSQL, and Object Storage.
Oracle Cloud Infrastructure Data Flow. This fully managed service lets users run Apache Spark applications without deploying or managing infrastructure.
Oracle Cloud Infrastructure Virtual Machines for Data Science. This service offers preconfigured GPU-based environments for $30 a day.