In 2008, to make data more accessible for scientific sharing and integration, the Department of Energy came up with the vision for a “Systems Biology Knowledgebase” (also referred to as Kbase) for plant and microbial life. In 2010, the Genomic Science program completed the research and development needed to design and implement Kbase, and implementation work is now underway.
Kbase should be a boon both for those who want a better understanding of such life forms for the sake of pure science and for those who would apply Kbase data, metadata, and tools for modeling and predictive technologies to aid the production of renewable biofuels and the reduction of carbon in the environment.
According to the Kbase implementation plan, “In general, a knowledgebase is an organized collection of data, organizational methods, standards, analysis tools, and interfaces representing a body of knowledge.”
Michael Schatz, a quantitative biologist involved in the project, observes of the present state of microbial life data: "There are many different 'silos' of information that have been painstakingly collected; and there are a number of existing tools that bring some strands of data into relation. But there is no overarching tool that can be used across silos." The integration of data generated by scientists around the world, combined with computational models, is expected to greatly accelerate scientific advancements.
Source: Genomic Science Program, US Dept. of Energy
Kbase would foster “open community science.” According to Kbase literature, instead of scientists working on their particular areas of investigation with their own limited data, now “any laboratory or project, regardless of size,” will have “free and open access to data, analysis tools, resources for modeling and simulation, and information.” In turn, any one of these labs could also contribute to “a transformative community-wide effort” to advance the body of knowledge and spur innovations in predictive biology.
The plan is for Kbase to start off with seven data centers on ESnet (the Department of Energy's Energy Sciences Network). There is one center for each of the six defined scientific objectives of Kbase; the seventh is devoted to coordinating the infrastructure development of the project. According to the current timetable, it should take 12 months to get the Kbase hardware platform operational. Version 1.0 is anticipated to be accessible after 18 months and version 2.0 after 36 months; five years is the estimated time to achieve operation and support at target levels.
The idea is to implement a system that can grow as needed and be easily used by scientists without extensive training in applications. It should produce understandable results based on clear scientific assumptions, engage all members of the scientific community, and encourage further discovery, with findings that inspire “new rounds of experiments or lines of research.”
There are three “guiding principles” for Kbase as stated in the implementation plan, and they all begin with the word “open”:
Open Access. Data and methods are available for anyone to use.
Open Source. Source code is freely available to access, modify, and redistribute.
Open Development. Anyone can contribute to the development of Kbase resources by following guidelines defined by the community.
Source: Genomic Science Program, US Dept. of Energy
The hope, of course, is that this will open the way to new discoveries and, possibly, new sources of renewable energy.
— Ariella Brown is a freelance writer, editor, and social media consultant.
Absolutely, this type of technology can be a great asset to healthcare, and it can certainly help the advance of other sciences, as well. Perhaps once the DOE paves the way, others will be able to come up with the knowledgebase systems that will reach their goals in less than 5 years.
Ariella, I wonder what other disciplines will begin to apply the concept of a knowledgebase?
Actually, IBM is trying to apply the knowledgebase of Watson to the healthcare industry by providing doctors with the latest databases and evidence-based literature for diagnosis, etc.
If we begin to think about the sharing of knowledge and then the patterning, and then use the technology tools developed, I believe we will unleash new opportunities for discovery.
You're right, DHager. The genome project also showed that a great deal of data is generated by research and that a solution for organizing it and making it accessible could greatly accelerate scientific advances.
Ariella, great blog. That is fascinating; obviously they were stimulated by the recent genome project where they pooled knowledge and now see the benefits of applying technology.
You and jabailo are correct in looking at the engineering structure and comparing with Watson. What I think will be interesting is not only the various data engineering tools, but the models that evolve in the further use and development of artificial intelligence.
Examining Technologies for Database Management Systems that Support Computational Biology and Bioinformatics Applications
Principal Investigator: Victor Markowitz (Lawrence Berkeley National Laboratory and the DOE Joint Genome Institute)
This project focused on evaluating new database management system technologies that allow efficient analysis of very large datasets. Prototypes of a large database based on the DOE JGI's Integrated Microbial Genomes (IMG) data management system were implemented using several of these technologies. Performance tests of IMG "all versus all" data were conducted in Hbase on the DOE National Energy Research Scientific Computing Center's Magellan Hadoop cluster and on a smaller departmental Hadoop cluster. Results show that distributed tabular storage has significant long-term potential for Kbase but that it is not yet ready for large-scale production use. Investigators note that Hadoop and Hbase currently are undergoing rapid development, and they anticipate that stability issues will be addressed within the next 2 years. DOE JGI is now implementing the Magellan cloud infrastructure for microbial genome assembly and annotation based on the results of this pilot test.
The same PDF also contains a couple of models of the architecture, which includes several layers: the Kbase core services, Kbase infrastructure services, the unified application interface, the data consistency and experimental design engine, and the Kbase end user web interface.
From what I read, that sounds like a lot of the infrastructure that Watson was based on. Hadoop was a key part of the multithreading. The OS for Watson was SUSE Linux.
However, I guess I'm speaking more to the technologies of knowledge retrieval, data mining, knowledge representation, and natural language understanding that ride on top of that infrastructure.
Kbase does not appear to have a Watson. It has a Kandinsky:
the hardware configuration includes over 0.5 petabytes of storage on local nodes under the direction of the Hadoop Distributed File System.
In addition to supporting Hadoop based applications, support for private cloud virtualization will be added via the Eucalyptus infrastructure software that enables establishment of private cloud computing environments. Eucalyptus is interface-compatible with the Amazon Web Services (AWS) cloud infrastructure, which means users can reuse existing AWS-compatible tools and scripts to manage their own private cloud, run Amazon Machine Images on their private cloud and cloud-burst to other public-clouds (also known as hybrid clouds -- a private on-premise cloud, in this case, a Eucalyptus cloud, working seamlessly with a public cloud).
It was a country fried pork chop with mashed potatoes and hot apple sauce!
Should they be using IBM systems?
Well, more to the point (and maybe I should stick to being a writer instead of a videographer), I am specifically talking about the AI program Watson, which dazzled us on Jeopardy. When I heard about multiple lines of logic and inquiry, it sounded exactly like what they had to do to get Watson to be able to evaluate across many types of expertise: they built multiple, multithreaded inference engines to be able to handle all the types.
Just seeing Watson in action makes me think we need a whole new mindset when we plan any large-scale, broad-based knowledge systems...starting with looking at Watson itself as an option!
Jabailo, how can you point to your TV dinner tray without telling us exactly what you ate? I mean that's what the internet is all about -- sharing what you have for dinner, at least as far as a lot of Twitter and FB posts go.
I contacted 4 people whose names were connected to this project about their choice of cloud service provider but got no answers. At best, they replied with a friendly email that said they had passed my queries on to the PR person. I even found her contact info myself and emailed her directly. No response. It looks like they are planning to build something very elaborate on their own. Should they be using IBM systems? Possibly, yes. But I didn't get to discuss that with anyone involved in the project.