by Thor Olavsrud

Pentaho adds native Python integration

Jan 27, 2016
Analytics, Data Architecture, Software Development

The integration brings the most popular coding language to Pentaho's data integration environment, allowing it to better support machine learning and analytical environments.

Aiming to better support machine learning and analytical environments, Pentaho Labs yesterday announced that it has developed a native integration for the Python language through Pentaho Data Integration (PDI).

PDI is essentially a portable “data machine” for ETL, which you can deploy as a stand-alone Pentaho cluster or inside a Hadoop cluster through MapReduce or YARN. Will Gorman, vice president of Pentaho Labs at Hitachi subsidiary Pentaho, says the integration means data scientists can now use one of the most popular and flexible open-source languages to increase productivity and data governance while supporting predictive analytics and machine learning. He says the integration will also make data science and predictive modeling more accessible to the developer community.

“Python is the environment that is growing the fastest from a community perspective,” Gorman says. “And a large portion of teams are working with Python to build out machine learning and analytical environments.”

Python wins popularity contest

Last year, CodeEval said its data showed that Python was the most popular coding language for the fourth year running, followed by Java, C++ and JavaScript. And a study commissioned by Ocado Technology that year found that Python had become the most popular language taught in primary schools, beating out French.

“As the field of data science continues to grow outside the world of research and statisticians, it is important for our team to arm developers with a wide range of programming languages,” Gorman says. “Python provides developers another option for data science with a general-purpose language. With these languages, data scientists have the ability to use the most appropriate language with increased use of data preprocessing through PDI.”

Gorman also says that Python is the preferred language for deep learning researchers, providing engineers in data science the ability to more easily develop predictive models.

“Python is widely deployed by developers and engineers to create statistical analytic workflows, particularly in areas such as finance, oil and gas and physics,” Matt Aslett, research director, 451 Research, said in a statement Tuesday. “We see Python as a primary language for artificial intelligence engines and Pentaho’s native integration of Python will allow organizations to apply their deep domain expertise and improve predictive analytics and machine learning algorithms.”

PDI for Python is available for download in the Pentaho Marketplace.