
Sunday, December 15, 2013

Scientific Computing: Open Source Software


     There is a great need in the scientific community for software that simplifies and reduces the work required to solve complex mathematical equations; solving science-related problems by hand would take far too long and be error-prone. Scientific computing aims to solve complicated problems in a range of fields including the physical and engineering sciences, finance and economics, and the medical, social, and biological sciences. It can also enhance communication by creating visual representations of scientific data. The major numerical computing environment and programming language most have heard of is MATLAB. Unfortunately MATLAB is proprietary software and thus has a high monetary cost. Fortunately there are open source alternatives that have much, if not all, of the capability required for scientific computation.

     SciPy is an open source computing environment built on the Python programming language. Its core elements are the NumPy and SciPy libraries, which include the algorithms and mathematical tools required for core scientific computing. Additional libraries expand SciPy's features, such as Matplotlib, which is used to display plots.
 

Here’s a list of some of SciPy’s features and their packages:
• Special Functions (scipy.special)
• Signal Processing (scipy.signal)
• Fourier Transforms (scipy.fftpack)
• Optimization (scipy.optimize)
• General plotting (scipy.[plt, xplt, gplt])
• Numerical Integration (scipy.integrate)
• Linear Algebra (scipy.linalg)
• Input/Output (scipy.io)
• Genetic Algorithms (scipy.ga)
• Statistics (scipy.stats)
• Distributed Computing (scipy.cow)
• Fast Execution (weave)
• Clustering Algorithms (scipy.cluster)
• Sparse Matrices (scipy.sparse)

These packages provide the vast variety of functions required by the scientific community. If you are looking for a powerful open source environment for scientific computing, visit http://www.scipy.org/ and download the software.
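To give a flavor of the libraries, here is a minimal sketch (assuming NumPy and SciPy are installed) of two of the features listed above: numerical integration with scipy.integrate and solving a linear system with scipy.linalg.

```python
# A minimal taste of SciPy: numerical integration and linear algebra.
import numpy as np
from scipy import integrate, linalg

# Integrate f(x) = x**2 from 0 to 1; the exact answer is 1/3.
area, abs_error = integrate.quad(lambda x: x**2, 0, 1)

# Solve the linear system A @ x = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)  # expected solution: x = [2, 3]
```

The same pattern extends to the other packages: import the subpackage you need and call its functions on NumPy arrays.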

Get started with Python and SciPy: Introduction to Scientific Computing

Sunday, December 8, 2013

Computer Graphics: CAPTCHA Image Processing

In 1999, slashdot.com created an online poll asking which graduate school had the best computer science program. This was a big mistake. Students at both MIT and Carnegie Mellon wrote programs, or “bots,” that voted for their school. As a result the poll became a contest between the voting bots: each of those schools ended up with over 20,000 votes while the rest had fewer than 1,000. This led to research into preventing such programs, and the CAPTCHA was created. CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart.” The idea is that a CAPTCHA is a test that humans can pass but computers cannot pass with a probability greater than guessing. So what does the CAPTCHA have to do with computer image processing?


CAPTCHAs are distorted images that computers can't solve because of the segmentation problem. Computers are actually quite good at recognizing single, already-separated characters, yet they fail at separating letters from each other, recognizing heavily distorted letters, and understanding the context of each letter. Humans, on the other hand, excel at recognizing both the letters and the resulting words. Computers struggle with distorted letters because there are infinitely many possible distortions, and they struggle to separate letters because CAPTCHA images add lines running across the words and confusing background patterns. Thus CAPTCHA image processing is a difficult problem in the field of artificial intelligence. One last interesting thought: a CAPTCHA generator is a program that can generate and grade tests that it itself cannot pass.
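The segmentation problem can be illustrated with a toy sketch (hypothetical ASCII "images," nothing like a real CAPTCHA solver): a naive segmenter splits an image at fully blank columns, which works for cleanly spaced letters but fails the moment the letters touch.

```python
# Toy illustration of the segmentation problem: split an ASCII "image"
# into letter segments at fully blank columns.
def segment_count(image):
    """Count letter segments by splitting at blank columns."""
    width = len(image[0])
    blank = [all(row[c] == ' ' for row in image) for c in range(width)]
    segments, inside = 0, False
    for is_blank in blank:
        if not is_blank and not inside:
            segments += 1
        inside = not is_blank
    return segments

clean = ["#  # ",
         "## # ",
         "#  # "]        # two letters separated by a blank column
touching = ["# ##",
            "####",
            "# ##"]      # the same letters squeezed together

print(segment_count(clean))     # finds 2 segments
print(segment_count(touching))  # finds only 1: the letters can't be separated
```

CAPTCHA distortion deliberately removes those blank gaps, which is exactly what defeats this kind of naive approach.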


Sunday, December 1, 2013

Cryptography: TLS/SSL Protocol

     Network interactions require specific protocols in order to take place. These protocols are built around user authentication and confidentiality; a protocol can provide authentication, confidentiality, or both. They allow you to make secure transactions, application connections, and user connections over insecure networks. Examples of such protocols are TLS/SSL, IPSec, and Kerberos. I’ll focus this blog on TLS/SSL as that is the protocol most visible to everyone today.


TLS/SSL
     We all use this protocol when we browse the internet, because TLS/SSL is the underlying security protocol for HTTPS. The protocol is implemented at the socket layer (applications have to implement it to use it) and is relatively simple. TLS/SSL’s main purpose is to secure transactions. To purchase an item you want to be sure you are dealing with the real business (authentication), you want your credit card information to be protected (confidentiality and integrity), and the business does not need to authenticate you since all they want is the money (no mutual authentication).

Now to the actual steps of the protocol. If you are ready to purchase an item on Amazon, the first step is for you to request a connection with Amazon. Along with the request you send a list of ciphers that you support and a random nonce (number used once). Amazon then replies with their certificate, a chosen cipher, and their own random nonce. You reply with a secret encrypted with Amazon’s public key and another encrypted message that is used for an integrity check and establishes a session key. Amazon replies with one last message to prove they were able to decrypt your previous messages.

A couple of parts deserve emphasis: the certificate sent by Amazon and the established session key. The certificate prevents a man-in-the-middle attack because it is signed by a certificate authority, and your browser checks the certificate signature. If an attacker sends a false certificate, the browser will see that it is not properly signed and warn you. Unfortunately users can ignore this warning and allow the connection to proceed, which lets the man-in-the-middle attack succeed; this is a flaw in human nature, not the protocol. The other important part is the session key, which is a hash of the secret you sent and both of the nonces. Your browser often opens multiple parallel connections to improve performance. TLS/SSL sessions are costly to set up, but given an existing session new connections are cheap, so any number of new connections can be derived from the existing session to allow multiple parallel connections.
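The session-key idea above can be sketched in a few lines. This is a simplified illustration of the principle, not the real TLS key schedule: both sides hash the shared secret together with the two nonces, so both independently derive the same key.

```python
# Simplified sketch: session key = hash(secret + both nonces).
import hashlib
import os

client_nonce = os.urandom(16)       # "number used once" from the browser
server_nonce = os.urandom(16)       # nonce from the server
pre_master_secret = os.urandom(48)  # secret sent under the server's public key

def derive_session_key(secret, n_client, n_server):
    """Both sides hash the same inputs, so both get the same key."""
    return hashlib.sha256(secret + n_client + n_server).digest()

browser_key = derive_session_key(pre_master_secret, client_nonce, server_nonce)
server_key = derive_session_key(pre_master_secret, client_nonce, server_nonce)
assert browser_key == server_key  # both ends derive an identical key
```

Because the nonces are fresh for every handshake, a replayed recording of an old session yields a different key, which is part of why the nonces matter.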

Book I've been reading: Information Security: Principles and Practice by Mark Stamp.

Sunday, November 24, 2013

Artificial Intelligence: Technological Singularity

The concept of artificial intelligence (AI) has been around as long as the idea of machines and computers. People are fascinated by the idea that it is possible to write software that can “think” to a certain extent. Technologies with AI are all around us, but we don't always think of them as AI. That may be because movie AI is far more advanced than today's AI, and/or because we have become used to AI being part of the world. Current examples of AI are robots in car factories, automated customer service, the Roomba vacuum cleaner, IBM's Watson, and self-parking cars. The two major AI areas right now are voice-recognition software and self-driving cars. The major use of AI is to improve efficiency and to help humans with dangerous or difficult tasks; there are smart robots disabling land mines and handling radioactive materials.


As mentioned earlier, the AI technology available today is rather one-dimensional compared to what one can see in movies. AI is only as smart as the code it uses. I don't think we are anywhere near creating a truly “intelligent” AI, one that has the capabilities of human thought, and whether self-awareness can ever be achieved in a machine is debatable. One view is that if Moore's law continues to hold, then it's only a matter of time before humans create a machine with superhuman intelligence. This view was put forward by Vernor Vinge, who went as far as saying it will occur by the year 2030. If mankind ever develops software that allows a machine to analyze data, make decisions, and act autonomously, then we can expect to see machines begin to design and build even better machines, and those new machines to build still more powerful ones. Once machines are able to improve themselves, they will have more intelligence than us and humans will become obsolete; this point is called the technological singularity. What will happen then?


Sunday, November 17, 2013

History of Computer Science: Cryptography with Digital Computers

                With the computer revolution came more advanced cryptographic techniques that were previously impossible, or at the very least very inefficient. Claude Shannon started the cryptographic revolution with his 1949 paper, Communication Theory of Secrecy Systems, in which he applied rigorous mathematical techniques to analyze and prove the security of cryptographic algorithms; his 1948 paper A Mathematical Theory of Communication had already crowned him “The Father of Information Theory.” The Lucifer cipher, developed by Horst Feistel in the 1970’s while working for IBM, paved the way for symmetric key ciphers. By the mid 1970’s the computer revolution was at full strength and it became clear that digital data needed to be secured. At the time cryptography was a field only for the military and the government, until the National Bureau of Standards called for cipher proposals. The only serious contender was the Lucifer cipher, which the NBS handed to the government experts, the NSA, who modified it and created the Data Encryption Standard (DES). With the ever increasing computational power of computers, DES has since been replaced by Triple-DES and AES.



                During the same time symmetric key cryptography was being developed, another cryptographic technique was being born: public key cryptography. In 1976, Whitfield Diffie and Martin Hellman published a paper titled New Directions in Cryptography which introduced public key cryptography and one-way functions. Unlike symmetric keys, which require the key to be shared before communication begins, the Diffie-Hellman key exchange allows making secure connections without prior key sharing. One-way functions allowed public key cryptosystems to flourish because they are easy to compute in one direction but computationally infeasible to invert. Diffie-Hellman inspired RSA, which is still used today for public key cryptography; RSA was published in 1977 by Ronald L. Rivest, Adi Shamir and Leonard M. Adleman. For internet security, PGP was released in 1991 and is still considered secure today. PGP uses public key cryptography, so knowing the public encryption key does not let anyone determine the private decryption key. Cryptography has become extremely important and will only become more so as the power of computers increases along with the growth of digital data and the internet.
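The Diffie-Hellman exchange is simple enough to sketch with toy numbers (real deployments use primes of 2048 bits or more; the values below are for illustration only). Each party keeps a private exponent, exchanges only the public values, and both arrive at the same shared secret:

```python
# Toy Diffie-Hellman key exchange with a tiny prime.
p, g = 23, 5                 # public modulus and generator

a = 6                        # Alice's private value (kept secret)
b = 15                       # Bob's private value (kept secret)

A = pow(g, a, p)             # Alice sends g^a mod p over the wire
B = pow(g, b, p)             # Bob sends g^b mod p over the wire

alice_shared = pow(B, a, p)  # (g^b)^a mod p
bob_shared = pow(A, b, p)    # (g^a)^b mod p
assert alice_shared == bob_shared  # both arrive at the same shared secret
```

An eavesdropper sees only p, g, A, and B; recovering a or b from those is the discrete logarithm problem, the one-way function that makes the exchange work.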

More detailed history here and here.
More info on different cryptography systems here.
Book I've been reading: Information Security: Principles and Practice by Mark Stamp.

Saturday, November 16, 2013

History of Computer Science: Cryptography Before Digital Computers

The beginning of cryptography came when humans spoke their first words. Even to this day a language can be considered a form of cryptography: if you don’t know the language another person is speaking, you will have no idea what secrets they are discussing. The same goes for written language, since a majority of people, until recently, were not able to read. Speech and writing are easily broken nowadays, though. Egyptian hieroglyphs could be considered a form of cryptography too, as they used pictures to hide their stories. The first use of an algorithm to secure a message came from the Greeks, who came up with the Spartan Scytale around the 7th century B.C.: a strip of parchment was wrapped around a rod of a particular diameter and the message written along it. The Caesar Cipher appeared during, you guessed it, Julius Caesar’s rule and was used for war (as was the Scytale). The Caesar Cipher, a monoalphabetic cipher, used simple substitution as a form of confusion. There was little advancement in cryptography until the Middle Ages, though the Arabs made headway in cryptanalysis by using frequency analysis.
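The Caesar Cipher's simple substitution fits in a few lines: shift every letter a fixed number of places down the alphabet, and shift back to decrypt.

```python
# The Caesar cipher: shift each letter by a fixed amount, wrapping
# around the alphabet; non-letters pass through unchanged.
def caesar(text, shift):
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)  # leave spaces and punctuation alone
    return ''.join(out)

ciphertext = caesar("ATTACK AT DAWN", 3)  # Caesar reputedly used shift 3
plaintext = caesar(ciphertext, -3)        # shifting back recovers the message
```

With only 25 possible shifts, the cipher falls to trying every key by hand, which is exactly why frequency analysis and brute force made monoalphabetic ciphers obsolete.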



                In the 1460’s, Leon Battista Alberti, “The Father of Western Cryptology,” developed polyalphabetic substitution. A polyalphabetic cipher uses multiple alphabets to hide the plaintext by allowing different ciphertext symbols to represent the same plaintext symbol. During the 16th century, Blaise de Vigenère improved polyalphabetic substitution, and his cipher was used into the era of the American Civil War. Around WWI, codebook ciphers and the one-time pad appeared. The one-time pad was originated by Gilbert Vernam and improved by Joseph Mauborgne; if the key is truly random and used only once, the one-time pad provides perfect secrecy. Arthur Scherbius invented the Enigma machine at the end of WWI; it was used commercially at first and then improved by the German government for use in WWII. The machine was broken by a Polish cryptologist, Marian Rejewski, and his work was passed on to Alan Turing and the code breakers at Bletchley Park, who built Bombes, electromechanical machines designed specifically to break Enigma.
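The one-time pad mentioned above is the one cipher in this story that is provably unbreakable, and it is also the simplest to sketch: XOR the message with a truly random key of the same length, and never reuse the key.

```python
# One-time pad sketch: XOR with a random key of equal length.
import os

def xor_bytes(data, key):
    return bytes(d ^ k for d, k in zip(data, key))

message = b"MEET AT BLETCHLEY PARK"
key = os.urandom(len(message))  # one random key byte per message byte

ciphertext = xor_bytes(message, key)
recovered = xor_bytes(ciphertext, key)  # XOR with the same key decrypts
assert recovered == message
```

The perfect secrecy comes from the fact that for any ciphertext, every plaintext of the same length corresponds to some key; the catch is distributing a fresh random key as long as every message, which is why the pad saw limited practical use.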

More detailed history here and here.

More info on different cryptography systems here.

Sunday, November 10, 2013

File Sharing: Sharing is Caring

File sharing is what makes up the internet; the internet would not exist if it were not possible to share files between applications and people. Whether you are browsing the web, sending emails, or checking Facebook, you are sharing files. The issue that comes into play when sharing files is security. For most files, integrity is enough for sharing across the internet, but files containing sensitive information need both confidentiality and integrity. And if you are downloading files from third-party sources, torrents, or possibly even from Dropbox, they could include viruses or malware. Many layers of security are required, on both the host and the user end, to make sure files are safe and secure.
            One aspect of file sharing is checking the integrity of the file. When you upload or send a file, someone could capture the packets and modify the file any way they desire. This is where integrity comes in: it tells the parties involved whether the original file has been tampered with. The two most common methods for proving file integrity are the MD5 and SHA-1 hash functions, which compute a hash over the data sent; unfortunately they are no longer as secure as once believed. The next level of security for file sharing is confidentiality, which requires files to be encrypted with a key before being sent out. The key is either a symmetric key established between the parties, a public key, or a session key if a connection was established (hopefully using a secure protocol). The files are then encrypted with algorithms such as AES or DES. Executed properly, file sharing can provide both integrity and confidentiality.
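An integrity check of this kind is easy to sketch. The post mentions MD5 and SHA-1, but since both have known weaknesses, the sketch below uses SHA-256: the sender publishes the digest, the receiver recomputes it, and any mismatch reveals tampering.

```python
# File-integrity sketch: compare the sender's published digest against
# the digest of the bytes actually received.
import hashlib

original = b"quarterly-report.pdf contents"
tampered = b"quarterly-report.pdf contents!"  # one byte appended in transit

sent_digest = hashlib.sha256(original).hexdigest()      # published by sender
received_digest = hashlib.sha256(tampered).hexdigest()  # computed by receiver

# A mismatch tells the receiver the file was modified along the way.
assert sent_digest != received_digest
```

Note this only detects tampering if the digest itself arrives untampered, which is why digests are typically served over a separate, authenticated channel.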




            The last part I want to touch upon is downloading files from file sharing applications. Third-party sites and torrents are often tricky because anyone could have uploaded a file under any name. The most common example I have seen: you are looking for a specific PDF and find a file with a similar name, but instead of a pdf extension it is an executable. One has to be very careful when the source is unknown or open to anyone.

Sunday, November 3, 2013

Data Structures: Efficiency is Key!

                Data structures are one of the most integral parts of computer software. These days our processors can go through billions of calculations per second, but searching, accessing, inserting, and deleting data can still take a large chunk of the processing power. This is where data structures save the day, as long as they are built and implemented properly: a data structure stores and organizes data for efficient access. There are many different types of data structures and many different applications for them. Arrays, lists, binary trees, heaps, B-trees, and hash tables are all ways to manage an application's data, and data structures are also the foundation of efficient algorithms. In combination, data structures and the resulting algorithms save a great deal of processing power that can be spent on more important tasks.


                Big O notation is used to analyze the efficiency of the operations that go along with data structures, such as searching, inserting, and deleting, as well as the amount of space they use. Most commonly the notation describes the average and worst cases. This is analogous to looking for noodles in a grocery store: the aisles are nicely divided by food category, so you look in the pasta aisle and voilà, there they are, the first item in the row. But at a brand new grocery store the noodles might be at the far end of the aisle, so now you have to walk an extra 30 feet. Now to the actual notation. Common big O classes include O(1), O(n), O(n²), and O(n log n), where O() is the big O and the growth function goes inside the parentheses. Each data structure has its own efficiency for each operation.
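The difference between those classes can be made concrete by counting comparisons. The sketch below contrasts a linear search of a list, O(n), with a binary search of a sorted list, O(log n):

```python
# Counting comparisons: linear search (O(n)) vs binary search (O(log n)).
def linear_search_steps(items, target):
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps

def binary_search_steps(items, target):
    lo, hi, steps = 0, len(items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            break
        elif items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps

data = list(range(1024))                # already sorted
print(linear_search_steps(data, 1000))  # 1001 comparisons
print(binary_search_steps(data, 1000))  # at most 11 comparisons
```

For a million items the gap widens to roughly a million comparisons versus twenty, which is the whole argument for choosing the right structure.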

Saturday, October 26, 2013

Hacking: How Encryption Keeps Your Data Secure

With billions of people and devices connected to the internet, there is an enormous amount of data being sent over networks, saved on computers, and stored in databases. The network and our devices make it convenient to do all of this with our data. The majority of this data is background noise that isn't of much use to others, but we also have sensitive data. Even when data isn't sensitive, most of us feel much more comfortable knowing the data we share is not being looked at by a third party. Life would be easy if our data were visible only to us and to the parties we have given permission to. Unfortunately, all technological devices and networks are susceptible to hacking. Security is implemented at every layer of technology because every layer has ways of leaking data that allow hackers to get unauthorized access. There are evildoers out in the world who want to hack you; even the U.S. government is trying to hack you (and succeeding)!


One of the major layers of protection, specifically for sensitive information, is data encryption. People have found ways to encrypt data for thousands of years, possibly even longer if only we could decipher the hieroglyphs… Data encryption begins with a cipher used to hide the message, or plaintext. Most early examples of ciphers come from times of war, used to hide information about an army’s strategy; examples include the Spartan Scytale, the Caesar Cipher, and the Enigma machine. Unfortunately these ciphers were easily breakable. Against the power of today’s computers, cipher algorithms need to be strong enough that the best possible attack is brute force.

A common encryption method today is AES in CBC mode, which gives the data confidentiality (integrity requires an additional mechanism such as a message authentication code). Confidentiality prevents unauthorized reading of your data, whereas integrity prevents unauthorized writing of it. In this method, AES divides your plaintext into 128 bit blocks and encrypts each block with a key, after CBC obscures the plaintext block by XORing it with the previous ciphertext block. The plaintext gets XOR’d because otherwise two blocks with the exact same plaintext would have equal encrypted output, and that gives valuable information to the attacker. Since the first block does not have a previous ciphertext block to XOR with, a special initialization value is used. This encryption cleverly applies XOR and a strong block cipher to completely obscure the plaintext. The attacker can only use brute force, and the shortest possible key for AES is 128 bits, which would take an extremely long time to break.
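The chaining idea can be sketched with a toy cipher. This is NOT real AES (for real use, reach for a vetted library such as `cryptography`); the "block cipher" here is just XOR with the key, which is enough to show why chaining hides repeated plaintext blocks.

```python
# Toy CBC sketch: XOR each plaintext block with the previous ciphertext
# block (the IV for the first block), then "encrypt" by XOR with the key.
import os

BLOCK = 16

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(blocks, key, iv):
    ciphertext, previous = [], iv
    for block in blocks:
        previous = xor(xor(block, previous), key)  # chain, then "encrypt"
        ciphertext.append(previous)
    return ciphertext

key = os.urandom(BLOCK)
iv = os.urandom(BLOCK)                       # initialization value for block 1
plaintext_blocks = [b"ATTACK AT DAWN!!"] * 2  # two identical 16-byte blocks

ct = cbc_encrypt(plaintext_blocks, key, iv)
assert ct[0] != ct[1]  # chaining makes the identical blocks encrypt differently
```

Without the chaining step, both ciphertext blocks would be identical, telling an eavesdropper the two plaintext blocks match even though neither is readable.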

*Currently taking a course on information security so most of my knowledge on cryptology is from Information Security: Principles and Practice by Mark Stamp.


Sunday, October 13, 2013

Open Source: Open World

State of Linux Distros
What is open source? Open source is software made freely available to everyone in the world! The source code can be modified and distributed by anyone under the terms of the original software's license. Usually open source software is developed not by a single company but as a collaborative project amongst the public. It’s amazing that people across the world put in their time and effort so free software can be enjoyed by everyone. There is an open source equivalent to nearly every piece of proprietary software, e.g. LibreOffice and OpenOffice for Microsoft Office, Thunderbird for Microsoft Outlook. Other open source software I would recommend: Wireshark, HandBrake, GIMP, phpMyAdmin, Dev-C++, Notepad++, and more. Using open source software has saved people an estimated $60 billion. The problem with open source is that much of it is not aimed at regular users; you need a decent understanding of computers. Proprietary software has a business behind it with money to advertise, but open source software tends to stay hidden, and you have to know which projects are safe.


 I enjoy downloading all types of software, except viruses, and playing around with them. Open source makes that much easier since I don’t have to pay. My favorite open source software is the Linux distributions. The Linux distros are nearly all open source, and the software that comes with them is open source too, so you can kill two birds with one stone. Different distros are set up for different types of use: general distros include Ubuntu and Linux Mint, multimedia-centric distros include ArtistX, and there are advanced distros like Arch Linux and BackTrack. Linux is even being pushed out onto mobile phones. Take a look at more options: Linux Distros. They all have interesting open source software built in for all types of uses. I just recently installed BackTrack on my computer to mess around with it, since it’s made specifically for information security. It's a bit complicated at the moment, but soon I’ll get it figured out.

Sunday, October 6, 2013

Agile: Software Development Done Smart

When it comes to working on projects there are a few different ways to approach them. One way is to create a plan at the beginning and follow it for the entire project; in this option there is really only one route, as the whole product is put together at once. An increasingly popular alternative is agile development. Agile development is more flexible: progress happens in intervals, and the customer sees parts of the project come to life, receiving working software continuously and frequently. The Agile Manifesto.


How does one get started with agile development? The key to agile is communication between customer and developers. Each agile development interval is called a sprint, which usually lasts two to four weeks. The sprint starts with a scrum where the customer gives a relatively simple description, a “user story,” of what they want, and the developers figure out how to solve the customer’s problem. A list of what the customer wants is created, and the team decides the order of implementation and which sprint each item will be worked on in. A leader is also chosen to communicate with the customer and divide the work amongst the team members. In the scrum meetings that follow, team members share what they worked on, what they will work on, and any blocks to progress. At the end of the sprint a working, though not necessarily final, product is shown to the customer, who can then decide on any changes that need to be made. An interesting way to estimate the effort of a “user story” is a method called planning poker, where each developer picks a numbered card reflecting the amount of work they think is required; everyone shows their cards at once and discusses why they picked their number. You can test out and get a better idea of planning poker at www.planningpoker.com.
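The planning-poker rule above can be sketched as a tiny helper (a hypothetical illustration; real teams use their own thresholds and card decks): if the highest and lowest estimates in a round are far apart, the team discusses and votes again.

```python
# Hypothetical planning-poker helper: flag rounds whose estimates
# diverge enough to warrant discussion before re-voting.
def needs_discussion(cards, max_spread=2):
    """Flag a round whose highest and lowest estimates are far apart."""
    return max(cards) - min(cards) > max_spread

round_one = [3, 5, 13]  # one developer sees far more work than the others
round_two = [5, 5, 6]   # after discussion the estimates converge

print(needs_discussion(round_one))  # True: talk it through, then re-vote
print(needs_discussion(round_two))  # False: close enough to agree
```

The point of the mechanism is the discussion the divergence triggers, not the numbers themselves.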

Friday, September 20, 2013

LinkedIn and Branding: Brand Your Career


LinkedIn has redefined social networking by creating a business-oriented networking site to map your professional life. With around 225 million user accounts, LinkedIn connects you to a large professional network to boost your career. In this day and age paper resumes are becoming obsolete, if they aren't already, as LinkedIn has replaced them with electronic resumes. The entire site is your resume, and because it is all online in an easily accessible format, recruiters can find you with a click. This simplified interaction between job seeker and recruiter makes LinkedIn the future. Recruiters have all the tools they need on LinkedIn: they can easily search for the right candidates and get access to their resumes, filtering by profession, industry, and geography and using the associated graphs to pick out candidates. No longer do businesses have to store peoples’ resumes, and the profiles are constantly kept up to date.

           To attract connections and recruiters you need to keep your LinkedIn profile actively updated. First of all, set up your profile with all of your professional information: add your picture, summary, experience, skills, education, and so on. Also edit your public profile link so that it is easily remembered, not a bunch of numbers (preferably your name or a permutation of it). Another important way to keep your profile active is to follow companies and groups that interest you. This helps you connect with related professionals and improves your chances of getting recruited. Posting in these groups and on related news topics is a good way to spread your name as well.

More info: http://imonlinkedinnowwhat.com/