What do the universe, open source software, a 12,000-core supercomputer, a cool $2.5 million of high-grade silicon and one of the country’s largest data sets have in common? They all underpin a five-year Australian initiative to map and study the observable universe from the southern hemisphere.

The story begins back in 2003 when the Great Melbourne Telescope was destroyed during the bush fires of January that year.

From the ashes rose the SkyMapper observatory, based in the Siding Spring Observatory at the safer central-west NSW location of Coonabarabran and tasked with scanning the night skies to create the Southern Sky Survey.

A deep digital map of the southern sky, the Southern Sky Survey -- with a little help from the National Computational Infrastructure (NCI) National Facility -- will allow astronomers to study interstellar objects ranging from nearby asteroids to super-distant objects like quasars.

The data from SkyMapper will also be shared globally via the Virtual Observatory initiative, to allow astronomers all over the world to explore its every possibility.
Experts say this advance heralds the arrival of a new era in astronomy -- one where researchers can draw on freely available online data about the universe instead of having to wait months, or even years, for a chance to observe the night sky through a billion-dollar physical telescope.

(Check out CIO’s SkyMapper slideshow here.)

Southern Sky Survey

According to Stefan Keller, SkyMapper scientist and research fellow at the Australian National University’s Research School of Astronomy and Astrophysics, the underlying idea -- and significance of the survey -- is that it will form the first digital, optical map of the southern skies.

Mapping about a billion objects, the survey will provide a fundamental resource for future astronomical studies in the near and distant universe.

“The southern sky has traditionally not been as observed as the northern sky, as there are fewer people, so there is the potential to find objects the size of Pluto drifting around out there [in our solar system] and as yet unseen,” Keller says.

Through particular attention to the use of different coloured glass filters in SkyMapper’s 268-megapixel camera, astronomers will be able to focus on particular parts of the stellar spectrum to help decipher the heat, density and chemical abundances of stars.

At the far edge of the optically observable universe, SkyMapper will also be able to pick up things such as ‘high red-shift’ quasi-stellar objects (QSOs), Keller says.

“Here we have galaxies powered by central black holes,” he says. “As they consume material they spit out jets of material and create a lot of light. Those objects form very valuable probes through the murk between us and them, and in that way we can determine what the material is along that line of sight.”

“SkyMapper is really about finding the needles in the haystacks -- the incredibly rare objects.
That’s really the power of SkyMapper; by drawing in that many objects you can spot all the oddball ones.”

SkyMapper is also notable for the speed and breadth at which it can take images -- about 1000 degrees of space a night, according to Keller; about 20 times the amount of data available through any other observatory in the southern hemisphere.

Unsurprisingly, the Southern Sky Survey will result in a large volume of raw data -- about 470 terabytes, or about 100,000 DVDs’ worth -- when complete, according to Keller.

Each night’s scan produces about 0.7 terabytes of data, which is transferred from Coonabarabran to the NCI National Facility in Canberra using a data trickler and a secure gigabit link to Australia’s Academic and Research Network (AARNet).

The data is then stored using a hierarchical storage management system, which mixes disk and a large robotic tape library system to help preserve the data in a regular, categorised form and duplicate it in two separate locations for backup purposes.

To increase the usability and accessibility of the data, the project will also shrink the raw data down to the actual numbers that are most important to researchers -- the shapes, sizes and brightness of the billion objects in the southern sky.

“We are going to image those one billion-odd objects in 36 images spaced in time over five years,” Keller says. “In that way we can look at objects that vary across the sky, objects such as pulsating stars and moving asteroids.

“Then we reduce the data and end up with a database of about 30 terabytes, which we then make available via the Web.
As far as we know it will be Australia’s largest database.”

Automation

Needing to identify and catalogue around a billion objects, and scan for five years, SkyMapper relies on a high level of intelligent automation, Keller says.

Using an automation and scheduling application, SkyMapper is capable of independently assessing night sky conditions -- whether the moon is out, how bright the stars are, whether there are clouds -- and then progressing through the most suitable scientific program.

The cataloguing of one billion objects across more than 4000 survey fields is also automated, meaning that SkyMapper is able to discern objects based on factors such as brightness and shape, Keller says.

“We can cleanly extract all the stars and measure their brightness,” he says. “Galaxies are a bit harder as they can have spiral arms on them but we can still easily find the interesting objects that lie in that data set.”

Open Source

According to Keller, the SkyMapper project is heavily based on open source software, largely because of its low cost.

“We are on a very tight budget, so any expertise we can draw on in a shared way is extremely valuable to us,” Keller says.

“Most of our pipeline is comprised of components written by astronomers elsewhere in the world over many decades. We draw together those little units and basically script them all up together with Perl and Python, and that makes for an efficient coding process.”

For its databases SkyMapper uses PostgreSQL, which is front-ended by a standard Web form through which relational searches can be done.

“You may be interested in a galaxy at a certain position, so you can get on to the Web page, download the data and image of that galaxy. In this way we can save astronomers a lot of time,” Keller says.
“They don’t have to go and survey it themselves to decide if they’re interested in it for further research.”

This is particularly important for the current generation of massive 20-30 metre telescopes, Keller says. These behemoths cost about a billion dollars each, so time on them is extremely valuable.

Keller says that with the increasing importance of online data as a reference for the sky, astronomy is on the verge of a paradigm shift. This new online data is served by the International Virtual Observatory Alliance, a consortium of international astronomical facilities that make their data freely available to researchers and scientists.

“SkyMapper will be a key component in the Virtual Observatory by providing coverage for the southern sky, allowing astronomers to cross match objects seen in gamma-rays through optical to radio wavelengths and open new windows of exploration,” Keller says.

High Performance Data Transfer

Ben Evans, head of the ANU Supercomputer Facility and manager at the NCI National Facility, says that data transfer between SkyMapper and the NCI National Facility site is performed using GridFTP, which is designed to provide reliable, high-performance file transfer for grid computing applications.

To handle data replication, the NCI National Facility uses a modified version of the data replication techniques in the Globus Alliance’s Globus Toolkit to verify that a full data copy has been received in Canberra before the images created at Siding Spring Observatory are deleted, Evans says.

“What’s unique in Australia is us expanding the way in which we manage data,” Evans says. “Normally, we would just manage data in the local domain, but the grid software allows us to push our management technique right out to the instrument.
That’s not typically how grid software is being used in other parts of the world.”

According to Evans, the decision to use open source software as a control mechanism to manage the SkyMapper project’s large volumes of data came down to simplicity and flexibility -- and a lack of commercial software choices.

“We just wanted to adapt something that was already out there rather than develop our own,” he says. “There aren’t too many commercial software apps that do this and there is too much already available in the open source domain.”

Supercomputing

Evans says the bulk of the analysis of the SkyMapper data will be done on a brand-new, next-generation Sun supercomputer kitted out with 12,000 cores. Due to be fully online by December, the supercomputer will offer a tenfold increase in performance over the facility’s current setup of two SGI machines, each with just under 3500 cores in total.

Along with processing data from SkyMapper, the new Sun machine will also be used for atmospheric and weather research as well as serving other high-performance computing needs around the country, Evans says.

Data hosting will be done on a data storage cloud hosted next to the supercomputer to allow for easy access for data processing. This cloud is based on a hybrid of software and hardware including: SAN QFS software from Sun to help manage the storage domain; virtualization software from VMware; Linux as a core operating system; Solaris; and databases from MySQL and PostgreSQL.
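
For readers curious about the verify-before-delete replication step Evans describes under High Performance Data Transfer, the idea can be illustrated with a minimal Python sketch. This is not the NCI implementation and it does not use GridFTP or the Globus Toolkit: it assumes a plain local file copy stands in for the transfer and a SHA-256 comparison stands in for the verification. The function names sha256_of and replicate_then_delete are hypothetical, chosen only for illustration.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_then_delete(source_path, destination_path):
    """Copy a file, then delete the source only if the copy verifies.

    Assumption: a local copy stands in for the network transfer.
    Returns True if the source was deleted, False if it was kept.
    """
    shutil.copyfile(source_path, destination_path)
    if sha256_of(source_path) != sha256_of(destination_path):
        # The copy is incomplete or corrupt, so keep the original.
        return False
    os.remove(source_path)
    return True
```

The point of the design is ordering: the original image is removed only after the remote copy has been confirmed as complete, so a failed or partial transfer never costs the only copy of a night's data.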