Water, water everywhere, but not a drop to drink. When those words from the poem “The Rime of the Ancient Mariner” were first published in 1798, the English poet Samuel Taylor Coleridge was telling the story of a sailor who had returned from a long sea voyage, and of what was gained and what was lost on the […]
But the research community has been able to pull in information from Fitbits and other connected, wearable devices for four years with the help of a research platform called Fitabase.
This week, Fitbit announced that Fitabase, made by San Diego-based startup Small Steps Labs, has now collected more than 2 billion minutes of Fitbit data for research purposes. The company also disclosed that Fitabase has supported more than 200 research projects since its 2012 founding.
“What we’ve built is kind of the missing piece for research,” said Fitabase CEO Aaron Coleman. The platform collects and de-identifies data from Fitbit users and offers data pools to academic researchers, including many in healthcare. “This removes a lot of privacy concerns,” including those around HIPAA, Coleman said.
“This is a technology that bridges a consumer device like Fitbit with the needs of research,” Coleman said. “Researchers are loving this new paradigm of research.”
That’s important because millions have purchased and regularly use activity trackers. The data these wearables collect provide insights about movement, heart rate and sleep patterns that previously had not been available, plus people actually enjoy wearing their Fitbits.
“It was really difficult to get people to use pedometers,” Coleman noted. That made it tough for researchers and clinicians alike to collect good data and, more importantly, improve health.
“Devices help people better tailor their activities and their health,” Coleman said. “Interventions shouldn’t be the same for everyone.”
For example, Fitabase is helping researchers determine how quickly people regain their previous level of activity following surgery. “They can tailor interventions to people who need it most,” Coleman said.
So what about the “2 billion minutes” of Fitbit data? “We provide the researcher with de-identified data at the minute level,” Coleman explained. Each person’s activity levels can vary at different times in the day. Having this insight allows researchers — and, ultimately, healthcare professionals and caregivers — to schedule interventions when they are most likely to be effective, according to Coleman.
Coleman pointed to a research project at Arizona State University, where Eric Hekler, director of the school’s Designing Health Lab, is applying engineering strategies to study what Hekler calls “precision behavior change,” a complement to precision medicine. Hekler and research partner Daniel Rivera, director of the ASU Control Systems Engineering Laboratory, are testing “health interventions that are adaptive and individualized, versus static and generalized,” according to a Fitbit statement.
Coleman himself has applied individual Fitbit data to control the level of difficulty in an app called Tappy Fit, a Flappy Bird-like mobile fitness game.
Photos: Fitabase, Fitbit
Data science has lost none of its cachet in recent years; companies all over the world very much need data scientists to crunch enormous datasets and provide insights. Job opportunities abound. But does that demand actually make it harder for a tech pro to land a data-science job?
Hilary Mason, a data scientist and founding CEO of New York City-based Fast Forward Labs, sees “a ton of people asking for data scientists.” Her company’s newsletter has likewise experienced an increase in job postings.
Large corporations such as Ford Motor Co. have also “increased the number of data scientists we hire,” according to Laura Kurtz, the auto-giant’s manager of recruiting. “We recently created a new data analytics group to understand and make better use of them.” Ford relies on data scientists for everything from human resources (to develop better strategic workforce plans) to manufacturing (to study process efficiencies and throughput). The company is hiring data workers from across the whole experience spectrum, including recent college graduates.
But not every company is hungry for more data scientists. Kaggle.com, which organizes data-science competitions and hosts job listings, recently cut seven of its approximately 20 positions. (Despite its shrinking staff, the firm still runs dozens of contests, some with significant payouts; that’s in addition to posting newsworthy datasets such as Hillary Clinton’s email collection, stored in a SQL database.)
Kaggle isn’t alone in the data-competition department: DrivenData currently runs seven different data science contests, most of which focus on improving conditions in far-flung parts of the globe. Texata.com offers an annual Big Data business-world championship, specifically designed for college students. Numerous hackathons make use of data-science techniques, as well.
If you want to enter this still-vibrant field and land a job, here are a few suggestions from the pros:
Understand What You’re Getting Into
Not all data science jobs are alike, and not all positions carry equal prominence at all companies. Dave Holtz, writing a post for online-learning site Udacity, has put together a great list of suggestions on how you can evaluate different job openings and company types.
His post also suggests eight different skills that you should have in your toolkit, such as statistics, data visualization, and basic software engineering. Also on the list: advanced calculus and linear algebra.
Mason feels the hiring market has matured to the point where “companies are a bit more aware of what skills they actually need, rather than asking for the kitchen sink. Over the last few years, companies have gotten better at hiring data scientists, both in defining the skills they actually need and in interviewing and supporting data scientists once they join a team.”
If you’re interested in brushing up on your day-to-day data skills, look at some of the online tutorials at Datacamp.com, where you can find more practical exercises such as how to use R and Python scripting for large datasets.
Participate in a Contest
Another way to hone your skills is by participating in a data-science contest. Kaggle’s CTO has put together a list of suggestions on how to win such competitions. These include entering alone (rather than as part of a team), using some kind of data visualization tool, and doing frequent iterations on whatever solution you come up with. If you’re interested, take a look at the next GlobalHack contest, held in the fall in St. Louis, with a total purse of a million dollars in various prizes.
Look inside your own company to see if you can spearhead a data-science approach to some of your thorniest issues. “A number of companies get to the point where they have a lot of traffic (and an increasingly large amount of data),” said Udacity’s Holtz, “and they’re looking for someone to set up a lot of the data infrastructure that the company will need moving forward.” This could be the best opportunity; after all, you should already know your own business.
Interested in working for a healthcare IT startup? While the potential rewards are vast, so are the challenges.
“In healthcare, many great ideas falter because of technology—or more specifically, the difficulty in integrating to legacy systems,” John Sung Kim, founder of Five9 and DoctorBase, wrote in a new TechCrunch column. “Whether you’re selling to a small doctor’s office or a large hospital, healthcare organizations of any size are juggling multiple software systems, many of which do not speak to each other.”
Although many experts blame the woes of the healthcare IT industry on a lack of integration between healthcare databases and software platforms, there’s also the issue of regulations. Every app that interacts with patient data needs to follow the Health Insurance Portability and Accountability Act (HIPAA), which protects health data both in movement between databases and at rest. Hospitals and other entities that handle such data must ensure that they can maintain necessary privacy and security standards.
According to Kim, startups in healthcare IT face entrenched competition from Electronic Health Record (EHR) vendors, whose executives have no desire to find their business “disrupted” by some tiny company with an innovative new platform.
Whether working for a tiny startup or a massive vendor, tech pros interested in the healthcare IT field need to familiarize themselves with not only the basic building blocks of any software platform—programming languages such as C# and Python, and management methods including Agile—but also the sort of creative thinking that allows people to solve thorny problems.
That being said, much of the software employed in healthcare is complex and unique to the industry, making it hard for tech pros to get a handle on much of it until they have a number of years of experience under their belts. Health Level Seven (HL7, a set of standards for exchanging electronic health information) and DICOM (a standard for storing and transmitting medical images) are just two of the standards that workers will need to get familiar with.
But given the importance of data protection, perhaps the most important skill to learn is everything HIPAA-related. Whatever the nature of your startup, there’s nothing more important than ensuring patient data is shielded.
If you’re an enterprise architect or systems administrator, you know how a data center works—and if your career spans several years, chances are good that you’ve dealt with some pretty massive systems. But it’s hard to think of infrastructure more massive than what Google just revealed.
“From relatively humble beginnings, and after a misstep or two, we’ve built and deployed five generations of datacenter network infrastructure,” Amin Vahdat, a Google Fellow, wrote in an Aug. 18 posting on the Google Research Blog. “Our latest-generation Jupiter network has improved capacity by more than 100x relative to our first generation network, delivering more than 1 petabit/sec of total bisection bandwidth.” That means 100,000 servers communicating “with one another in an arbitrary pattern at 10Gb/s.”
Google built the system out of necessity; traditional networking hardware simply couldn’t scale to meet its enormous needs. The company has released four papers on how it managed to customize bandwidth to serve thousands of applications, design data center network topologies, deal with epic congestion and latency, and build a muscular cluster manager. For any tech pro interested in data centers, they’re well worth reading for an idea of how far networking and server technology has progressed—even if you have no intention of working with a system as large as Google’s.
Harper Reed is most famous for his role as CTO of Obama’s 2012 re-election campaign, but he’s served other prominent roles over the past several years, including CTO of Threadless (the t-shirt company) and CEO of e-commerce startup Modest.
In order to operate effectively in such high-profile roles, you can’t just be a great developer—you need to have people skills, including the ability to wrangle some strong personalities. How did Reed develop those skills?
Through a bit of trial and error, according to a new Medium posting where he describes his formative years. Reed got into computers early, becoming obsessed with not only hardware and software but also Bulletin Board Systems (BBS), the ancestors of today’s social networks. At first, Reed didn’t exactly use his newfound know-how for good; in one early hack, he made his school’s computers display profanities, a stunt that cost him school computer privileges for the rest of the year.
After a local kid used instructions Reed found on a website to build a bomb, agents from the Bureau of Alcohol, Tobacco and Firearms came calling. “Thankfully, I didn’t get kicked off the computers again — because I had already parlayed my experience into running IT for the high school,” Reed wrote, “and thus knew more about the school’s computers than any of my teachers. They needed me.”
Reed believes those early experiences gave him the attitude necessary to run the tech side of Obama’s re-election campaign. “Somehow knew I could do the job,” he wrote. “I attribute that confidence to my experience as a hacker and the subsequent willingness to take risks. If you never break through that wall of doubt, you will never see what might’ve been possible.”
Obama’s campaign deployed dozens of data scientists, developers, and engineers to analyze and work with huge mountains of data gathered from Facebook and other online sources. The data-analytics initiatives included Project Narwhal, which made voter information accessible to campaign workers across the country. It was the sort of job capable of intimidating even the most experienced tech executive, but Reed was evidently well-equipped to handle it, thanks to a hefty dose of hacker attitude.
When MongoDB came out six years ago, the NoSQL fad was in full swing. Rather than rely on a table-based relational database structure, MongoDB utilizes BSON, or “Binary JSON,” which many users found attractive; the platform quickly grew into a leader among NoSQL databases.
As more people took the MongoDB plunge, however, many began to complain about various issues, including its lack of speed. For the past few years, people have debated whether that initial hype was worth it. Now that the technology’s reached maturity, and many of its most annoying bugs have been squashed, that question can be confronted head on: Is it worth using MongoDB when you can stick with an old, rugged SQL standby such as MySQL?
Before we dig into the issue, let’s look at the purpose behind each technology:
- MySQL is for storing relational data
- MongoDB is for storing documents in JSON and BSON
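To make the contrast concrete, here is a minimal sketch of the same customer order stored both ways. SQLite stands in for MySQL, and a plain JSON document stands in for a MongoDB/BSON document; the table names and fields are invented for illustration.

```python
import json
import sqlite3

# Relational approach (MySQL-style, sketched here with SQLite):
# the order is split across two normalized tables.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
db.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")
db.execute("INSERT INTO orders VALUES (1, 'Alice')")
db.executemany("INSERT INTO order_items VALUES (1, ?, ?)",
               [("WIDGET", 2), ("GADGET", 1)])

# Document approach (MongoDB-style): the whole order is one JSON document.
order_doc = json.dumps({
    "_id": 1,
    "customer": "Alice",
    "items": [{"sku": "WIDGET", "qty": 2}, {"sku": "GADGET", "qty": 1}],
})

# The same data, reassembled from the two tables.
rows = db.execute("""SELECT sku, qty FROM order_items
                     WHERE order_id = 1 ORDER BY sku""").fetchall()
print(rows)
print(json.loads(order_doc)["items"])
```

Neither shape is “right” in the abstract: the normalized tables make updates and cross-order queries cheap, while the single document keeps everything for one order in one place.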
I’ve seen a lot of developers favor a NoSQL solution such as MongoDB for the bulk of their work, because they assume that MySQL remains unable to accomplish certain tasks. “My data has a unique layout that can’t be put into relational, tabular form,” these people claim. But if you do some digging, you’ll often find that people are able to accomplish whatever they need with a relational database, provided they code carefully and don’t try to take shortcuts. The format is more versatile than many people will admit; for decades, people have created complex structures and stored them in relational databases just fine.
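As one sketch of that versatility, consider the classic “my data is a tree, not a table” objection. A comment thread stores cleanly in a single relational table using an adjacency list, and a recursive query walks the whole tree. (SQLite stands in for MySQL here; the schema is invented for illustration, and recursive CTEs of this kind arrived in MySQL itself only in version 8.0.)

```python
import sqlite3

# A comment thread stored relationally: each row points at its parent.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE comments (
    id INTEGER PRIMARY KEY, parent_id INTEGER, body TEXT)""")
db.executemany("INSERT INTO comments VALUES (?, ?, ?)", [
    (1, None, "root post"),
    (2, 1, "first reply"),
    (3, 1, "second reply"),
    (4, 2, "reply to the first reply"),
])

# A recursive common table expression reconstructs the tree in one query.
tree = db.execute("""
    WITH RECURSIVE thread(id, body, depth) AS (
        SELECT id, body, 0 FROM comments WHERE parent_id IS NULL
        UNION ALL
        SELECT c.id, c.body, t.depth + 1
        FROM comments c JOIN thread t ON c.parent_id = t.id
    )
    SELECT depth, body FROM thread ORDER BY id
""").fetchall()

for depth, body in tree:
    print("  " * depth + body)
```

The hierarchy lives happily in a flat table; only the query needs to know it’s a tree.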
Developers also dislike certain aspects of MongoDB. It doesn’t support joins, for example, which leads people to stuff everything into gigantic documents full of repeated data, creating a royal mess. Making things worse is MongoDB’s locking behavior: writes are atomic at the document level, so if you’re adding customer orders to one giant document, other writers have to wait until you’re finished. With MySQL, on the other hand, people in that situation could add order rows simultaneously. (Those who want join-like functionality in MongoDB can put records in separate documents that reference each other, although this requires some creative coding on the developer’s part.)
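That reference pattern amounts to doing the join in application code. A minimal sketch, using plain Python dicts to mimic two MongoDB collections (the collection and field names are invented; with a real driver, each lookup would be a second query):

```python
# Two "collections" sketched as plain dicts: orders hold a customer _id
# instead of an embedded copy of the customer record.
customers = {
    "c1": {"_id": "c1", "name": "Alice"},
    "c2": {"_id": "c2", "name": "Bob"},
}
orders = [
    {"_id": "o1", "customer_id": "c1", "total": 40},
    {"_id": "o2", "customer_id": "c2", "total": 15},
    {"_id": "o3", "customer_id": "c1", "total": 99},
]

def orders_with_customers(orders, customers):
    """The 'join' happens here, in application code: for each order,
    look up the referenced customer and merge in its name."""
    joined = []
    for order in orders:
        merged = dict(order)
        merged["customer"] = customers[order["customer_id"]]["name"]
        joined.append(merged)
    return joined

for row in orders_with_customers(orders, customers):
    print(row["_id"], row["customer"], row["total"])
```

It works, but you’re reimplementing by hand what a relational database gives you in one `JOIN` clause, which is exactly the trade-off the paragraph above describes.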
Even those companies and prominent developers who like MongoDB don’t rely on it for every database-related issue. For example, some firms have a real problem with what they perceive as MongoDB’s lack of ACID compliance. If your project requires large amounts of normalized data, with loads of reads and writes, it’s probably best to go with MySQL, rather than “trying to make it work” in MongoDB.
As with any popular software platform, both MongoDB and MySQL suffer from perception issues, some of which are, frankly, unfair. People claim that MySQL doesn’t easily scale horizontally, and that the only option is to add memory to a single MySQL server. That’s an outmoded claim; replication and sharding offer plenty of ways to scale out. In a similar fashion, a lot of people insist that MongoDB will crash and cause massive data loss; while that was an issue in the past, developers have squashed many of the related bugs.
So is it worth using MongoDB over MySQL? Not in every situation. When deciding which is better suited to your project, spend time studying the documentation for both, and try working with a dataset in MongoDB to see whether it fits your needs. Understand the role that benchmarks play: For instance, if you’re building a website that does a handful of database queries per second, does it matter which database can perform a couple thousand per second? Bottlenecks can have a huge impact on application performance; it pays to read up on how to optimize your implementation, regardless of which one you choose.
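To put benchmark numbers in perspective, here is a rough throughput sanity check you can adapt. It uses SQLite as a stand-in for whatever engine you’re evaluating; the absolute number is unimportant — what matters is comparing it against the handful of queries per second your application actually issues.

```python
import sqlite3
import time

# Load a small table, then time simple indexed lookups against it.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO users VALUES (?, ?)",
               [(i, f"user{i}") for i in range(10_000)])

n_queries = 5_000
start = time.perf_counter()
for i in range(n_queries):
    db.execute("SELECT name FROM users WHERE id = ?", (i % 10_000,)).fetchone()
elapsed = time.perf_counter() - start

qps = n_queries / elapsed
print(f"~{qps:,.0f} queries/sec")
# If your site issues ten queries per second at peak, the difference
# between engines that each manage thousands is not your bottleneck.
```

Run the same loop against your real candidate databases with your real query shapes before letting a vendor benchmark decide for you.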
I’ve been engaged in SQL programming since around 1990. When I found MongoDB, it felt like a godsend, as it allowed me to store complex structures easily. Over time, however, I realized I was relying on MongoDB as a sort of crutch, and moved back to MySQL. Today, I use MongoDB when I need document storage that doesn’t require continual tweaks to small fields, but I use MySQL for most of my work.
But my situation doesn’t apply to everyone: Explore both, learn them well, and you’ll find what works best for your needs.