Is it really possible to store 256GB of data on a plain A4 sheet of paper?
Yes, according to Sainul Abideen, an engineering student in India who claims his Rainbow format encoding can do it by turning the basic data into coloured geometric shapes. Abideen stated that he could store 2.7GB of data in a square inch, and up to 450GB on a large sheet of paper - which he was then photographed with.
But following huge interest in the story, Mr Abideen's claims have come under determined scrutiny across the Internet, with a broad consensus that it is in fact impossible to store that amount of information with the apparatus he has outlined. One expert has called the claims "the storage equivalent of perpetual motion".
The feasibility of the entire method boils down to the encoding system used and the technology behind a special scanner that Abideen has demonstrated. However, Abideen has so far refused to divulge any details of either, adding to the general mood of scepticism.
Techworld has spoken twice with Mr Abideen and sent a lengthy email asking questions about his claimed technology but we are, as yet, unsatisfied with the response. We're not the only ones.
Regarding the printing and scanning technology, Alex Young, director of technical marketing at RAID specialist Infortrend Europe said: "With today's laser printer technology (600 dpi or 1200 dpi), it will be hard to achieve this. Also from today's scanner technology, the scanner has to be very precise and free of dust. From the colour-matching/correction software technology, the colours have to be precise or the data might be misinterpreted."
A storage expert, Robin Harris, agreed: "Mr. Abideen has misplaced a few decimal places. Modern offset presses used to print magazines, which he suggests as a medium operate at about 300 dpi, or about 8.7 million dots on an A4 sheet, or 35 million dots with four color printing. There is no lossless way to compress 256 GB (or 2 trillion bits) of data into 35 million dots, or even 35 billion dots. It is the storage equivalent of perpetual motion."
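Harris's figures are easy to check. The sketch below is only a rough calculation: it assumes A4 measures 8.27 x 11.69 inches, that "256 GB" means decimal gigabytes, and that each of the four colour passes contributes one independent dot per position. The names used are illustrative.

```python
# Rough check of the dot counts quoted above (assumptions: A4 in inches,
# decimal gigabytes, one independent dot per ink per position).
A4_WIDTH_IN = 8.27    # 210 mm
A4_HEIGHT_IN = 11.69  # 297 mm


def dots_on_a4(dpi: int) -> int:
    """Addressable dots on one side of an A4 sheet at the given resolution."""
    return round(A4_WIDTH_IN * dpi) * round(A4_HEIGHT_IN * dpi)


dots = dots_on_a4(300)                # single-colour dots at 300 dpi
four_colour_dots = dots * 4           # four-colour printing
claimed_bits = 256 * 10**9 * 8        # 256 GB expressed in bits

print(f"dots at 300 dpi:        {dots:,}")
print(f"four-colour dots:       {four_colour_dots:,}")
print(f"claimed bits:           {claimed_bits:,}")
print(f"bits needed per dot:    {claimed_bits / four_colour_dots:,.0f}")
```

On these assumptions every printed dot would have to carry tens of thousands of bits, which is the gap Harris is pointing at.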
Large numbers of Net users have also raised doubts.
An article focuses on scanning/printing technology and compression difficulties of the method outlined. Daily Tech forum commentators list a large number of problems and issues they see with the reported claims. So too do Digg, Slashdot and ITsoup.
The postings point to three types of problem overall:
- If the stored data is simply represented as bits on the page then there aren't enough reliably detectable bits, using current scanning and printing technology, to do the job.
- If there is some new method of encoding information then the computational tasks associated with encoding it and then reading it are potentially immense.
- Paper as a medium distorts, is not level, changes its shape in response to heat and humidity, and can be folded which decreases information clarity in the fold area. As such it is a poor medium for recording such dense amounts of information.
Is the original report suspect?
There is some confusion about the people mentioned in the original Arab News story. The MES College of Engineering (MESCE), Kuttippuram, exists and is listed on a Kerala government website. It also has its own website. However, a Professor Hyderali is not listed as a member of the MCA (Master of Computer Application) faculty. A Professor Sainul Abdeen, though, is listed as chief warden at MESCE.
A Mr. K Hyderali is listed as a lecturer in the MCA faculty at MESCE. The same listing includes Professor Sainul Abideen in an MCA adjunct faculty status. However a 2003 student admission listing includes a Sainul Abideen.
The Arab News report, and a near-identical Deccan Herald report by the same writer, state Sainul Abideen has just gained his MCA qualification and is aged 24. This seems young for a professor. So, if taken literally, there are two Sainul Abideens but no Professor Hyderali. The most likely explanation is journalistic inaccuracy, which also raises doubts about the story overall.
FAQ: The issues and problems
We have identified no fewer than 14 queries over the feasibility of the Rainbow technology. Here they are, with brief explanations:
1. Scanning To scan the Rainbow-encoded image would require a scanner to be able to scan 256GB-worth of data. A 1200dpi scanner might pick out 1,440,000 dots per square inch. At one bit per dot that computes to 180,000 bytes, and 180KB per square inch is a long way short of 2.7GB per square inch. Assuming the scanner is perfectly calibrated and that the paper is positioned correctly in the scanner then we might say it could pick out one of 256 colours per dot, which is one byte per dot. That leads to 1.44MB per square inch - still a long way shy of the claimed figure.
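The per-square-inch arithmetic for a 1200dpi scan can be worked through in a few lines. This is a sketch under idealised assumptions (every dot is read perfectly, and 2.7GB means 2.7 billion bytes); the variable names are illustrative.

```python
# Idealised capacity of one square inch scanned at 1200 dpi.
DPI = 1200
CLAIMED_BYTES_PER_SQ_IN = 2_700_000_000  # 2.7GB, decimal (assumption)

dots_per_sq_in = DPI * DPI                     # dots in one square inch
bytes_if_one_bit_per_dot = dots_per_sq_in // 8  # each dot is on or off
bytes_if_one_byte_per_dot = dots_per_sq_in      # each dot is one of 256 colours

# How many bits each dot would have to carry to meet the claim.
bits_needed_per_dot = CLAIMED_BYTES_PER_SQ_IN * 8 / dots_per_sq_in

print(f"dots per square inch:      {dots_per_sq_in:,}")
print(f"bytes at 1 bit per dot:    {bytes_if_one_bit_per_dot:,}")
print(f"bytes at 1 byte per dot:   {bytes_if_one_byte_per_dot:,}")
print(f"bits needed per dot:       {bits_needed_per_dot:,.0f}")
```

Even with a full byte per dot, each dot would need to hold thousands of times more information than it plausibly can.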
2. Compression In the Rainbow scheme, geometric shapes as well as colours are used to represent the information. It's asserted that this is a form of compression scheme and that it can't be better than existing compression methods otherwise it would be in use already. Compressing data into a Zip archive could increase the storage capacity but only by a factor of two or three and that depends upon the data type. You can only compress so far before losing information.
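How much compression helps depends heavily on the data, as a quick experiment with Python's standard zlib module suggests. This is an illustration only: it says nothing about what the Rainbow format actually does, and the sample data and the 1200dpi, one-byte-per-dot baseline are assumptions chosen for the example.

```python
import os
import zlib


def ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, 9))


# Highly repetitive data compresses very well...
text_like = b"the quick brown fox jumps over the lazy dog " * 2000
# ...but random data (similar to already-compressed files) does not.
random_like = os.urandom(len(text_like))

# Ratio needed to squeeze 2.7GB into one square inch that holds
# 1200 x 1200 dots at one byte per dot (assumed baseline).
required_ratio = 2_700_000_000 / (1200 * 1200)

print(f"repetitive text ratio: {ratio(text_like):.1f}:1")
print(f"random data ratio:     {ratio(random_like):.3f}:1")
print(f"ratio the claim needs: {required_ratio:.0f}:1")
```

A lossless scheme that reached the required ratio on arbitrary data would have to map more possible inputs than it has possible outputs, which cannot be done without losing information.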
3. Data capacity and number of bits It's asserted that there is no way that storing data as coloured shapes can be any better (at increasing storage capacity) than storing coloured bits. A shape on paper is made of coloured bits. A bit is a bit, and N bits can represent 2 to the N distinct values. Eight bits, a byte, have 256 possible values.
We can think of a byte as a code for a geometric shape, like a square or a triangle, but the byte still only contains 8 bits, meaning 256 possible values, and that is not enough for the Rainbow format claims. So the claims would appear to be impossible.
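The point can be put in numbers: the information carried by one symbol is the base-2 logarithm of how many distinct symbols there are to choose from. The sketch below assumes a hypothetical alphabet of 16 shapes in 16 colours; those figures are invented for illustration and are not taken from the Rainbow claims.

```python
import math


def bits_per_symbol(distinct_symbols: int) -> float:
    """Information carried by one symbol drawn from an alphabet of this size."""
    return math.log2(distinct_symbols)


# Hypothetical alphabet: 16 shapes, each printable in 16 colours.
shapes, colours = 16, 16
alphabet_size = shapes * colours

print(f"alphabet size:   {alphabet_size}")
print(f"bits per symbol: {bits_per_symbol(alphabet_size)}")
```

A shape-and-colour symbol from a 256-entry alphabet carries exactly what a byte carries, no more.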
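4. A dot equaling a byte still doesn't provide enough capacity. Let's suppose a dot could have 256 values through its presence/absence and colour. Then are there enough dots per square inch to provide the claimed capacity? No. Assuming a 10x10 inch piece of paper and a 2400dpi scanner then the scanned information amount is 2400 x 2400 dots per square inch x 100 square inches = 576,000,000 dots, or 576MB at one byte per dot, against the 270GB that 2.7GB per square inch would imply for the same sheet.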
It's not enough.
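The figure for this scenario can be worked out directly. The sketch assumes one byte per dot, perfect scanning, and decimal units; the names are illustrative.

```python
# Capacity of a 10 x 10 inch sheet scanned at 2400 dpi, one byte per dot.
DPI = 2400
SHEET_AREA_SQ_IN = 10 * 10
CLAIMED_BYTES_PER_SQ_IN = 2_700_000_000  # 2.7GB, decimal (assumption)

dots_on_sheet = DPI * DPI * SHEET_AREA_SQ_IN
scanned_bytes = dots_on_sheet  # one byte (256 values) per dot
claimed_bytes = CLAIMED_BYTES_PER_SQ_IN * SHEET_AREA_SQ_IN
shortfall = claimed_bytes / scanned_bytes

print(f"scanned capacity: {scanned_bytes / 1e6:,.0f} MB")
print(f"claimed capacity: {claimed_bytes / 1e9:,.0f} GB")
print(f"shortfall factor: {shortfall:,.2f}x")
```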
5. Real life Scanning density isn't good enough. Hard disks can have unimaginably high bit per square inch numbers because disk and read/write head are fantastically accurately aligned. Having paper, a medium that changes shape according to temperature and humidity fed into a scanner, meaning its position is variable, will make it impossible for a scanner to retrieve information at the dpi measure required.
Also you have to allow for error correction, which decreases the theoretical scanning density to a lower, real-life level. Detecting the difference between a 12-sided and a 13-sided polygon could be very difficult, and the difficulty increases with the number of sides. The same goes for telling colours apart as the number of distinct colours in use increases.
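To get a feel for the scale of both effects, here is a rough sketch. The 0.1 percent dimensional change and the choice of a Reed-Solomon (255,223) code are assumptions picked purely for illustration; nothing is known about what tolerances or error correction the Rainbow system would use.

```python
def drift_in_dots(length_in: float, dimensional_change: float, dpi: int) -> float:
    """How many dot positions the far edge moves if the paper changes size."""
    return length_in * dimensional_change * dpi


def usable_capacity(raw_bytes: float, data_symbols: int, total_symbols: int) -> float:
    """Capacity left after reserving part of every block for error correction."""
    return raw_bytes * data_symbols / total_symbols


# Assumed: a 10-inch sheet that grows by 0.1% with humidity, read at 2400 dpi.
drift = drift_in_dots(length_in=10, dimensional_change=0.001, dpi=2400)

# Assumed: 576MB raw, protected with a (255, 223) Reed-Solomon code.
raw = 576_000_000
usable = usable_capacity(raw, data_symbols=223, total_symbols=255)

print(f"edge drift:      about {drift:.0f} dots")
print(f"usable capacity: {usable / 1e6:.0f} MB of {raw / 1e6:.0f} MB raw")
```

On these assumptions a tiny change in paper size shifts the far edge by many dot widths, and error correction takes a further slice off the raw figure.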
6. Using symbols does not increase the storage capacity of the medium. The amount of information that can be read off the paper is measured in bits. Just because a post-scanning algorithm finds out that the bits represent colours, hex numbers, geometric shapes or Chinese characters does not increase or decrease the basic amount of information, which is limited by what can be printed and what can be read by a scanner. Using symbols divides the bit-level information into chunks. It doesn't increase the amount of bit-level information.
A shape that is made of ten pixels (bits) contains no more information than the ten pixels.
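This can be checked by brute force for a small case: enumerate every pattern that ten two-colour pixels can form and count them. The sketch below treats every possible pattern as a potential "shape", which is the most generous assumption available.

```python
import itertools
import math


def distinct_patterns(pixels: int, colours: int) -> int:
    """Count every arrangement of the given number of pixels by enumeration."""
    return sum(1 for _ in itertools.product(range(colours), repeat=pixels))


pixels, colours = 10, 2
count = distinct_patterns(pixels, colours)

print(f"distinct shapes from {pixels} pixels: {count}")
print(f"bits those shapes can encode:   {math.log2(count)}")
```

However the patterns are named or grouped, ten binary pixels never yield more than ten bits.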
7. Depth You could greatly increase the storage capacity if you stored data in 3D, i.e. by using the depth of the media to store more information, as in holographic storage. But the Rainbow format makes no claims to this effect. We are left to assume that only the storage medium's surface is used and that is 2D.
8. Printer and scanner calibration issue To reliably and accurately scan one of 256 colours would need the originating printer and the reading scanner set to use exactly the same colour wavelengths. This becomes a practical impossibility as the number of wavelengths (colours) used increases. When combined with the response of paper and ink to temperature and humidity changes this becomes even more of a problem.
9. More on scanner alignment To accurately position the scanner's read head you would need tracking information, meaning bits, added to the paper. (This is similar to track positioning data on magnetic tape.) The space taken up by these would reduce the storage capacity.
10. Colour printing doesn't add much more capacity A colour printer prints, at most, in four primary colours: cyan, magenta, yellow and black in the CMYK scheme. It can print many more recognisable colours but only by combining and partially overlapping dots of these four components to form the desired colour. The primary colour dots overlap so that the area of the page needed for one apparent dot of the desired colour is many times greater than a dot of a primary colour. This decreases the bit dpi rate on the page. In other words the new colour is like a chunk of the original bits.
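A simplified model of halftone printing shows the trade-off. It assumes a 1200dpi printer that builds each apparent colour from a 16 x 16 cell of primary dots, and that a scanner reads only the overall tone of each ink in the cell; real printers and screening methods vary, so treat the numbers as indicative only.

```python
import math


def halftone_summary(printer_dpi: int, cell_size: int, inks: int = 4):
    """Compare raw dot capacity with colour capacity for one halftone cell."""
    cells_per_inch = printer_dpi / cell_size      # effective colour resolution
    levels_per_ink = cell_size ** 2 + 1           # 0..N dots filled in the cell
    raw_bits = cell_size ** 2 * inks              # every dot read individually
    colour_bits = inks * math.log2(levels_per_ink)  # only the tone is read
    return cells_per_inch, levels_per_ink, raw_bits, colour_bits


cells, levels, raw_bits, colour_bits = halftone_summary(1200, 16)

print(f"effective resolution: {cells:.0f} colour cells per inch")
print(f"tone levels per ink:  {levels}")
print(f"bits as raw dots:     {raw_bits}")
print(f"bits as a colour:     {colour_bits:.1f}")
```

In this model, treating the cell as one coloured dot carries far fewer bits than the primary dots it is made from.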
11. If there is a new way of encoding information so that you could store 2.7GB/sq in then you could use it to store much more information on CDs and DVDs too. With the storage market worth billions of dollars a year, it is highly unlikely that one Indian student has uncovered something not considered by thousands of storage specialists.
12. The encoding method We know numbering schemes can increase the information content of a single symbol. For example, a single hexadecimal digit carries more information than a single binary digit. Until we know what the Rainbow format encoding scheme involves and what the 'scanner' actually consists of and does we can't ascribe believability to the Rainbow technology claims.
If it's a form of hash encoding then there is a huge computational problem in reversing a hash to get the original information back, raising questions about any practical application.
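There is no evidence that Rainbow uses hashing, but the difficulty is easy to demonstrate. The sketch below uses SHA-256 from Python's standard hashlib module, truncated to 16 bits so that the search finishes quickly; the truncation and the helper names are assumptions made for the example. It finds two different inputs with the same hash, which shows that a hash alone cannot identify the original data.

```python
import hashlib


def tiny_hash(data: bytes, bits: int = 16) -> int:
    """First `bits` bits of SHA-256, standing in for any short hash."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_collision(bits: int = 16):
    """Hash successive counters until two different inputs share a hash."""
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode()
        value = tiny_hash(message, bits)
        if value in seen:
            return seen[value], message, value
        seen[value] = message
        counter += 1


first, second, shared = find_collision()
print(f"{first!r} and {second!r} both hash to {shared}")
```

With a full-length hash the collisions still exist, since there are far more possible inputs than hash values; they are simply much harder to find, which is the computational problem in reverse.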
13. The alphabet problem Hexadecimal numbering works because the reading device "understands" hexadecimal. Suppose you could use coloured and shape-grouped bits to store more information, you would then need to "understand" it. If every pixel represented a 32-bit colour then it could take any one of 2 to the power 32 values. A contributor to Daily Tech calculated that you could have a 4096x4096 grid using pixels of 1-32 colours and so arrive at 6MB of data. Two such "super bits" could represent 16GB (16 trillion) pieces of information but ... you have to invent an alphabet with 16 trillion letters and map that to a binary alphabet. This is not a trivial computational problem.
14. Paper problems Paper distorts and inks fade so the long term storage potential is strictly limited. Paper also burns and can get torn which also restricts the method's usability. Paper can be folded which would distort the represented information in the area of the fold.
So, in summary, the claims made for this Rainbow technology are almost certainly incorrect. There is no pot of gold at the end of it, and we'll just have to stick with hard disk drives for the moment. That said, new holographic technologies are beginning to offer huge new capacities, and advances in Flash memory technology mean that increasingly large amounts of data can be stored in smaller and smaller areas.
But 256GB on an A4 sheet? No way!